From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jlaw@tachyum.com>
Received: from mx2.tachyum.com (mx2.tachyum.com [50.229.46.110])
 by sourceware.org (Postfix) with ESMTPS id F2ECE3838036
 for <gcc-patches@gcc.gnu.org>; Tue,  8 Jun 2021 14:47:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F2ECE3838036
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=tachyum.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tachyum.com
Received: by mx2.tachyum.com (Postfix, from userid 1000)
 id A543910055F2; Tue,  8 Jun 2021 07:47:29 -0700 (PDT)
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 NICE_REPLY_A, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.2
Received: from THQ-EX1.tachyum.com (thq-ex1.tachyum.com [10.7.1.6])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mx2.tachyum.com (Postfix) with ESMTPS id 9C6FE10055E5;
 Tue,  8 Jun 2021 07:47:28 -0700 (PDT)
Received: from [10.0.96.2] (10.0.96.2) by THQ-EX1.tachyum.com (10.7.1.6) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2176.14; Tue, 8 Jun
 2021 07:47:27 -0700
Subject: Re: Aligning stack offsets for spills
To: Michael Matz <matz@suse.de>
CC: GCC Patches <gcc-patches@gcc.gnu.org>
References: <98179c8e-bcec-83ed-5b99-6f54791bd7cd@tachyum.com>
 <alpine.LSU.2.22.394.2106081402410.3803@wotan.suse.de>
From: Jeff Law <jlaw@tachyum.com>
Message-ID: <1a10d2db-1867-5dfc-bf08-3b34557c85d4@tachyum.com>
Date: Tue, 8 Jun 2021 08:47:26 -0600
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
 Thunderbird/78.10.2
MIME-Version: 1.0
In-Reply-To: <alpine.LSU.2.22.394.2106081402410.3803@wotan.suse.de>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
X-Originating-IP: [10.0.96.2]
X-ClientProxiedBy: THQ-EX3.tachyum.com (10.7.1.26) To THQ-EX1.tachyum.com
 (10.7.1.6)
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Jun 2021 14:47:31 -0000


On 6/8/2021 8:08 AM, Michael Matz wrote:
> Hello,
>
> On Mon, 7 Jun 2021, Jeff Law wrote:
>
>> So, as many of you know I left Red Hat a while ago and joined Tachyum.  We're
>> building a new processor and we've come across an issue where I think we need
>> upstream discussion.
>>
>> I can't divulge many of the details right now, but one of the quirks of our
>> architecture is that reg+d addressing modes for our vector loads/stores
>> require the displacement to be aligned.  This is an artifact of how these
>> instructions are encoded.
>>
>> Obviously we can emit a load of the address into a register when the
>> displacement isn't aligned.  From a correctness point that works perfectly.
>> Unfortunately, it's a significant performance hit on some standard benchmarks
>> (spec) where we have a great number of spills of vector objects into the stack
>> at unaligned offsets in the hot parts of the code.
>>
>>
>> We've considered 3 possible approaches to solve this problem.
>>
>> 1. When the displacement isn't properly aligned, allocate more space in
>> assign_stack_local so that we can make the offset aligned.  The downside is
>> this potentially burns a lot of stack space, but in practice the cost was
>> minimal (16 bytes in a 9k frame).  From a performance standpoint this works
>> perfectly.
>>
>> 2. Abuse the register elimination code to create a second pointer into the
>> stack.  Spills would start as <virtual> + offset, then either get eliminated
>> to sp+offset' when the offset is aligned or gpr+offset'' when the offset
>> wasn't properly aligned. We started a bit down this path, but with #1 working
>> so well, we didn't get this approach to proof-of-concept.
>>
>> 3. Hack up the post-reload optimizers to fix things up as best as we can.
>> This may still be advantageous, but again with #1 working so well, we didn't
>> explore this in any significant way.  We may still look at this at some point
>> in other contexts.
>>
>> Here's what we're playing with.  Obviously we'd need a target hook to
>> drive this behavior.  I was thinking that we'd pass in any slot offset
>> alignment requirements (from the target hook) to assign_stack_local and
>> that would bubble down to this point in try_fit_stack_local:
> Why is the machinery involving STACK_SLOT_ALIGNMENT and
> spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
> backing stack slots) not working for you?  If everything is set up
> correctly the input alignment to try_fit_stack_local ought to be correct
> already.
We don't need the MEM as a whole to be aligned, just the offset in the
address calculation, because of how those instructions are encoded.  If
I've read that code correctly, it would arrange for a dynamic realignment
of the stack so that it could then align the slot.  None of that is
necessary for us, and we'd like to avoid forcing the dynamic stack
realignment.  Or did I misread the code?
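
A minimal sketch of the distinction, in C.  This is not GCC code: the
helper names are hypothetical, the 16-byte granule is an assumption (the
actual requirement isn't stated here), and offsets are taken to be
non-negative for simplicity.  It shows only that rounding the slot
offset is independent of whether the final address is aligned:

```c
#include <stdint.h>

/* Hypothetical helper: round a non-negative frame offset up to the next
   multiple of ALIGN, which is assumed to be a power of two.  */
int64_t
round_up_offset (int64_t offset, int64_t align)
{
  return (offset + align - 1) & -align;
}

/* Hypothetical check for the encoding constraint: the displacement in a
   reg+d address must be a multiple of ALIGN.  */
int
displacement_ok (int64_t disp, int64_t align)
{
  return (disp & (align - 1)) == 0;
}

/* Separate property: is the final address BASE + DISP aligned?  This is
   what slot alignment would guarantee, and it is not needed here.  */
int
address_aligned (uint64_t base, int64_t disp, int64_t align)
{
  return ((base + (uint64_t) disp) & (uint64_t) (align - 1)) == 0;
}
```

With a 16-byte granule, an offset of 40 rounds up to 48 at a cost of 8
bytes of padding.  If the stack pointer happens to be 0x1008, the
displacement 48 is encodable even though the address 0x1038 is not
16-byte aligned, so no stack realignment is involved.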

jeff