Subject: Re: Aligning stack offsets for spills
To: Michael Matz
CC: GCC Patches
From: Jeff Law
Date: Tue, 8 Jun 2021 08:47:26 -0600
Message-ID: <1a10d2db-1867-5dfc-bf08-3b34557c85d4@tachyum.com>
References: <98179c8e-bcec-83ed-5b99-6f54791bd7cd@tachyum.com>
List-Id: Gcc-patches mailing list

On 6/8/2021 8:08 AM, Michael Matz wrote:
> Hello,
>
> On Mon, 7 Jun 2021, Jeff Law wrote:
>
>> So, as many of you know I left Red Hat a while ago and joined Tachyum.  We're
>> building a new processor and we've come across an issue where I think we need
>> upstream discussion.
>>
>> I can't divulge many of the details right now, but one of the quirks of our
>> architecture is that reg+d addressing modes for our vector loads/stores
>> require the displacement to be aligned.  This is an artifact of how these
>> instructions are encoded.
>>
>> Obviously we can emit a load of the address into a register when the
>> displacement isn't aligned.  From a correctness point that works perfectly.
>> Unfortunately, it's a significant performance hit on some standard benchmarks
>> (spec) where we have a great number of spills of vector objects into the stack
>> at unaligned offsets in the hot parts of the code.
>>
>>
>> We've considered 3 possible approaches to solve this problem.
>>
>> 1. When the displacement isn't properly aligned, allocate more space in
>> assign_stack_local so that we can make the offset aligned.  The downside is
>> this potentially burns a lot of stack space, but in practice the cost was
>> minimal (16 bytes in a 9k frame)  From a performance standpoint this works
>> perfectly.
>>
>> 2. Abuse the register elimination code to create a second pointer into the
>> stack.  Spills would start as + offset, then either get eliminated
>> to sp+offset' when the offset is aligned or gpr+offset'' when the offset
>> wasn't properly aligned. We started a bit down this path, but with #1 working
>> so well, we didn't get this approach to proof-of-concept.
>>
>> 3. Hack up the post-reload optimizers to fix things up as best as we can.
>> This may still be advantageous, but again with #1 working so well, we didn't
>> explore this in any significant way.  We may still look at this at some point
>> in other contexts.
>>
>> Here's what we're playing with. Obviously we'd need a target hook to
>> drive this behavior. I was thinking that we'd pass in any slot offset
>> alignment requirements (from the target hook) to assign_stack_local and
>> that would bubble down to this point in try_fit_stack_local:
> Why is the machinery involving STACK_SLOT_ALIGNMENT and
> spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
> backing stack slots) not working for you? If everything is setup
> correctly the input alignment to try_fit_stack_local ought to be correct
> already.

We don't need the MEM as a whole aligned, just the offset in the address
calculation due to how we encode those instructions.  If I've read that
code correctly, it would arrange for a dynamic realignment of the stack
so that it could then align the slot.  None of that is necessary for us
and we'd like to avoid forcing the dynamic stack realignment.  Or did I
misread the code?

jeff
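
To make the distinction above concrete (aligning only the displacement in
reg+d versus aligning the MEM itself), here is a minimal sketch in C.  It is
not GCC code and not the patch under discussion: the helper names, the
32-byte alignment and the non-negative offsets are all assumptions chosen
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function: true if DISP could be used
   directly as the displacement of a reg+d vector load/store, assuming the
   encoding only accepts multiples of ALIGN (a power of two).  */
static bool disp_is_encodable(int64_t disp, int64_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    assert(disp >= 0);  /* sketch only handles non-negative offsets */
    return (disp & (align - 1)) == 0;
}

/* Hypothetical helper modelling approach 1: round the next free slot
   offset up to a multiple of ALIGN, wasting the bytes in between.  */
static int64_t round_up_offset(int64_t offset, int64_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    assert(offset >= 0);
    return (offset + align - 1) & ~(align - 1);
}

/* True if the full address SP + DISP is ALIGN-aligned.  This is the
   stronger property that MEM alignment would demand, and the one that
   can force dynamic stack realignment when SP itself is less aligned.  */
static bool address_is_aligned(uint64_t sp, int64_t disp, int64_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return ((sp + (uint64_t)disp) & (uint64_t)(align - 1)) == 0;
}
```

With an assumed 32-byte requirement, a slot that would land at offset 40 is
not encodable; rounding up moves it to 64 at a cost of 24 bytes of padding.
If sp is only 8-byte aligned (say 0x1008), sp+64 is still not a 32-byte
aligned address, yet the displacement 64 is encodable, which is all the
instruction encoding described above needs.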