public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Renlin Li <renlin.li@foss.arm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Subject: Re: BLKmode parameters are stored in unaligned stack slot when passed via registers.
Date: Tue, 06 Mar 2018 20:03:00 -0000	[thread overview]
Message-ID: <122112c3-91cd-bc68-4a28-adf4fea42ee4@foss.arm.com> (raw)
In-Reply-To: <CAFiYyc3XS6YYt13T08Qpm9Fq-hJ_i5=RHxR71ZqwkXc6Ecr0Rg@mail.gmail.com>

Hi Richard,

On 06/03/18 16:04, Richard Biener wrote:
> On Tue, Mar 6, 2018 at 4:21 PM, Renlin Li <renlin.li@arm.com> wrote:
>> Hi all,
>>
>> The problem described here probably only affects targets whose ABI allow to
>> pass structured
>> arguments of certain size via registers.
>>
>> If the mode of the parameter type is BLKmode, in the callee, during RTL
>> expanding,
>> a stack slot will be reserved for this parameter, and the incoming value
>> will be copied into
>> the stack slot.
>>
>> However, the stack slot for the parameter will not be aligned if the
>> alignment of parameter type
>> exceeds MAX_SUPPORTED_STACK_ALIGNMENT.
>> Chances are, unaligned memory access might cause run-time errors.
>>
>> For local variable on the stack, the alignment of the data type is honored,
>> although the document states that it is not guaranteed.
>>
>> For example:
>>
>> #include <stdint.h>
>> union U {
>>      uint32_t M0;
>>      uint32_t M1;
>>      uint32_t M2;
>>      uint32_t M3;
>> } __attribute((aligned(16)));
>>
>> void tmp (union U *);
>> void foo (union U P0)
>> {
>>    union U P1 = P0;
>>    tmp (&P1);
>> }
>>
>> The code-gen from armv7-a is like this:
>>
>> foo:
>>      @ args = 0, pretend = 0, frame = 48
>>      @ frame_needed = 0, uses_anonymous_args = 0
>>      str    lr, [sp, #-4]!
>>      sub    sp, sp, #52
>>      mov    ip, sp
>>      stm    ip, {r0, r1, r2, r3}  --> ip is not 128-bit aligned
>>      add    lr, sp, #39
>>      bic    lr, lr, #15
>>      ldm    ip, {r0, r1, r2, r3}
>>      stm    lr, {r0, r1, r2, r3} --> lr is 128-bit aligned
>>      mov    r0, lr
>>      bl    tmp
>>      add    sp, sp, #52
>>      @ sp needed
>>      ldr    pc, [sp], #4
>>
>> There are other obvious missed optimizations in the code-generation above.
>> The stack slot for parameter P0 and local variable P1 could be merged.
>> So that some of the load/store instructions could be removed.
>> I think this is a known missed optimization case.
>>
>> To summaries, there are two issues here:
>> 1, (wrong code) unaligned stack slot allocated for parameters during
>> function expansion.
>> 2, (missed optimization) stack slot for parameter sometimes is not
>> necessary.
>>     In certain scenario, the argument register could directly be used.
>>     Currently, this is only possible when the parameter mode is not BLKmode.
>>
>> For issue 1, we can do similar things as expand_used_vars.
>> Dynamically align the stack slot address for parameters whose alignment
>> exceeds
>> PREDERRED_STACK_BOUNDARY. Other parameters could be store in gap between the
>> aligned address and fp when possible.
>>
>> For issue 2, I checked the behavior of LLVM, it seems the stack slot
>> allocation
>> for parameters are explicitly exposed by the alloca IR instruction at the
>> very beginning.
>> Later, there are optimization/transformation passes like mem2reg, reg2mem,
>> sroa etc. to remove
>> unnecessary alloca instructions.
>>
>> In gcc, the stack allocation for parameters and local variables are done
>> during expand pass, implicitly.
>> And RTL passes are not able to remove the unnecessary stack allocation and
>> load/store operations.
>>
>> For example:
>>
>> uint32_t bar(union U P0)
>> {
>>    return P0.M0;
>> }
>>
>> Currently, the code-gen is different on different targets.
>> There are various backend hooks which make the code-gen sub-optimal.
>> For example, aarch64 target could directly return with w0 while armv7-a
>> target generates unnecessary
>> store and load.
>>
>> However, this optimization should be target independent, unrelated target
>> alignment configuration.
>> Both issue 1&2 could be resolved if gcc has a similar approach. But I assume
>> the change is big.
>>
>> Is there any suggestions for solving issue 1 and improving issue 2 in a
>> generic way?
>> I can create a bugzilla ticket to record the issue.
> 
> What does the ABI say for passing such over-aligned data types?
> 
> For solving 1) you could copy the argument as passed by the ABI
> to a properly aligned stack location in the callee.
> 
> Generally it sounds like either the ABI doesn't specify anything
> or the ABI specifies something that violates user expectation.
> 
> For 2) again, it is the ABI which specifies whether an argument
> is passed via the stack or via registers.  So - what does the ABI say?


The compiler is doing the right thing here to pass argument via registers.
To be specific, there are such clause in the arm PCS:

> B.5 If the argument is an alignment adjusted type its value is passed as a copy of the actual value. The
> copy will have an alignment defined as follows.
> ...
> For a Composite Type, the alignment of the copy will have 4-byte alignment if its natural alignment is
> <= 4 and 8-byte alignment if its natural alignment is >= 8

> C.3 If the argument requires double-word alignment (8-byte), the NCRN is rounded up to the next even
> register number.
> C.4 If the size in words of the argument is not more than r4 minus NCRN, the argument is copied into
> core registers, starting at the NCRN. The NCRN is incremented by the number of registers used.
> Successive registers hold the parts of the argument they would hold if its value were loaded into
> those registers from memory using an LDM instruction. The argument has now been allocated.


This is quite similar for other RISC machines.
Here, the problem here how arguments/parameters are received in the callee.
To store the incoming parameters on the stack, it seems an implementation decision.

Even for the following case without over-alignment, in the callee, it will save r0-r3 into local
stack first, and load M3 from local copy.

struct U {
     uint32_t M0;
     uint32_t M1;
     uint32_t M2;
     uint32_t M3;
};

int x (struct U p)
{
   return p.M3;
}


Regards,
Renlin

> 
> Richard.
> 
>> Regards,
>> Renlin

  reply	other threads:[~2018-03-06 20:03 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-06 15:21 Renlin Li
2018-03-06 16:04 ` Richard Biener
2018-03-06 20:03   ` Renlin Li [this message]
2018-03-07  8:49     ` Richard Biener
2018-03-07 17:02 ` Jeff Law
2018-03-12 12:23   ` Renlin Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=122112c3-91cd-bc68-4a28-adf4fea42ee4@foss.arm.com \
    --to=renlin.li@foss.arm.com \
    --cc=gcc@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).