public inbox for gcc@gcc.gnu.org
* BLKmode parameters are stored in unaligned stack slot when passed via registers.
@ 2018-03-06 15:21 Renlin Li
  2018-03-06 16:04 ` Richard Biener
  2018-03-07 17:02 ` Jeff Law
  0 siblings, 2 replies; 6+ messages in thread
From: Renlin Li @ 2018-03-06 15:21 UTC (permalink / raw)
  To: gcc

Hi all,

The problem described here probably only affects targets whose ABIs allow structured
arguments of certain sizes to be passed in registers.

If the mode of the parameter type is BLKmode, the callee reserves a stack slot for the
parameter during RTL expansion and copies the incoming value into that slot.

However, this stack slot is not aligned to the parameter type's alignment if that
alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT.
The resulting unaligned memory accesses may cause run-time errors.

For local variables on the stack, the alignment of the data type is honored,
even though the documentation states that this is not guaranteed.

For example:

#include <stdint.h>
union U {
     uint32_t M0;
     uint32_t M1;
     uint32_t M2;
     uint32_t M3;
} __attribute((aligned(16)));

void tmp (union U *);
void foo (union U P0)
{
   union U P1 = P0;
   tmp (&P1);
}
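A quick way to see the layout of this type, assuming GCC or Clang (both honor `aligned` on a union and round its size up to a multiple of the alignment): every member aliases the same 32-bit word at offset 0, yet the attribute makes the union 16 bytes large and 16-byte aligned. That is why the armv7-a code below moves four registers, r0-r3, for a single word of data. `core_regs_needed` is a hypothetical helper, not something defined by the ABI or GCC.

```c
#include <stddef.h>
#include <stdint.h>

/* Same type as in the example above (spelled with __attribute__). */
union U {
    uint32_t M0;
    uint32_t M1;
    uint32_t M2;
    uint32_t M3;
} __attribute__((aligned(16)));

/* All members alias the same 32-bit word at offset 0 ... */
_Static_assert(offsetof(union U, M3) == 0, "members share offset 0");
/* ... but the attribute raises the alignment and pads the size to 16. */
_Static_assert(_Alignof(union U) == 16, "union U is 16-byte aligned");
_Static_assert(sizeof(union U) == 16, "union U is padded to 16 bytes");

/* Hypothetical helper: number of 4-byte core registers needed to hold
   an object of SIZE bytes, rounding up to whole words. */
static size_t core_regs_needed(size_t size)
{
    return (size + 3) / 4;
}
```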

The code generated for armv7-a looks like this:

foo:
     @ args = 0, pretend = 0, frame = 48
     @ frame_needed = 0, uses_anonymous_args = 0
     str    lr, [sp, #-4]!
     sub    sp, sp, #52
     mov    ip, sp
     stm    ip, {r0, r1, r2, r3}  @ ip is not 128-bit aligned
     add    lr, sp, #39
     bic    lr, lr, #15
     ldm    ip, {r0, r1, r2, r3}
     stm    lr, {r0, r1, r2, r3}  @ lr is 128-bit aligned
     mov    r0, lr
     bl    tmp
     add    sp, sp, #52
     @ sp needed
     ldr    pc, [sp], #4
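The `add lr, sp, #39` / `bic lr, lr, #15` pair computes `(sp + 39) & ~15`, which is `sp + 24` rounded up to a multiple of 16: that is the 16-byte-aligned slot used for P1. P0's slot, by contrast, is simply `ip = sp`. The C sketch below (function names are hypothetical) replays that arithmetic, assuming the incoming sp is 8-byte aligned as the AAPCS requires at public interfaces. The prologue moves sp by 4 + 52 = 56 bytes, so sp stays 8-byte aligned but may or may not be 16-byte aligned.

```c
#include <stdint.h>

/* Round ADDR up to a multiple of ALIGN (ALIGN must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* What "add lr, sp, #39; bic lr, lr, #15" computes from SP. */
static uintptr_t p1_slot_from_asm(uintptr_t sp)
{
    return (sp + 39) & ~(uintptr_t)15;
}

/* Returns 1 if the slot chosen by the asm is 16-byte aligned, equals
   align_up(sp + 24, 16), does not overlap P0's 16-byte slot at
   [sp, sp + 16), and ends inside the 52 bytes reserved by the prologue. */
static int p1_slot_ok(uintptr_t sp)
{
    uintptr_t p1 = p1_slot_from_asm(sp);
    return p1 % 16 == 0
        && p1 == align_up(sp + 24, 16)
        && p1 >= sp + 16
        && p1 + 16 <= sp + 52;
}
```

For example, `p1_slot_from_asm(0x1008)` yields `0x1020`, while P0's slot at `0x1008` is only 8-byte aligned.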

There are other obvious missed optimizations in the code above:
the stack slots for parameter P0 and local variable P1 could be merged,
so that some of the load/store instructions could be removed.
I think this is a known missed-optimization case.

To summarize, there are two issues here:
1. (wrong code) An unaligned stack slot is allocated for parameters during function expansion.
2. (missed optimization) The stack slot for a parameter is sometimes unnecessary.
   In certain scenarios, the argument registers could be used directly.
   Currently, this is only possible when the parameter mode is not BLKmode.

For issue 1, we could do something similar to expand_used_vars:
dynamically align the stack slot address for parameters whose alignment exceeds
PREFERRED_STACK_BOUNDARY. Other parameters could be stored in the gap between the
aligned address and fp when possible.
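A minimal sketch of that proposal, assuming the frame base address is known. A real compiler does not know it at compile time; it would reserve worst-case padding and compute the aligned address at run time, as the `add`/`bic` pair in the armv7-a code does. `struct parm_slot` and `layout_parms` are hypothetical names, not GCC internals. Over-aligned parameters are placed from the first address satisfying the largest alignment; the remaining parameters go into the gap between the base and that address when they fit, and after the over-aligned block otherwise.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one incoming parameter that needs a slot. */
struct parm_slot {
    size_t size;    /* bytes */
    size_t align;   /* bytes, power of two */
    uintptr_t addr; /* filled in by layout_parms */
};

static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Lay out N parameters starting at BASE, which is only guaranteed to be
   BOUNDARY-aligned.  Returns the first address past the last slot. */
static uintptr_t layout_parms(struct parm_slot *p, size_t n,
                              uintptr_t base, size_t boundary)
{
    size_t max_align = boundary;
    for (size_t i = 0; i < n; i++)
        if (p[i].align > max_align)
            max_align = p[i].align;

    /* Dynamically realigned start for the over-aligned parameters. */
    uintptr_t aligned = align_up(base, max_align);
    uintptr_t end = aligned;
    for (size_t i = 0; i < n; i++)
        if (p[i].align > boundary) {
            p[i].addr = align_up(end, p[i].align);
            end = p[i].addr + p[i].size;
        }

    /* Other parameters: use the gap [base, aligned) first if they fit. */
    uintptr_t gap = base;
    for (size_t i = 0; i < n; i++)
        if (p[i].align <= boundary) {
            uintptr_t a = align_up(gap, p[i].align);
            if (a + p[i].size <= aligned) {
                p[i].addr = a;
                gap = a + p[i].size;
            } else {
                p[i].addr = align_up(end, p[i].align);
                end = p[i].addr + p[i].size;
            }
        }
    return end;
}
```

With a frame base of `0x1008` and an 8-byte boundary, a 16-byte parameter with 16-byte alignment lands at `0x1010`, and a 4-byte parameter fits into the gap at `0x1008`.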

For issue 2, I checked the behavior of LLVM. There, the stack slot allocation
for parameters seems to be exposed explicitly by alloca IR instructions at the very beginning.
Later optimization/transformation passes such as mem2reg and SROA remove
unnecessary alloca instructions.

In GCC, stack allocation for parameters and local variables is done implicitly during the expand pass,
and the RTL passes are not able to remove the unnecessary stack allocation and load/store operations.

For example:

uint32_t bar(union U P0)
{
   return P0.M0;
}

Currently, the generated code differs between targets, and various backend hooks make it sub-optimal.
For example, the aarch64 target can return directly with w0, while the armv7-a target generates
an unnecessary store and load.

However, this optimization should be target-independent and unrelated to the target's alignment configuration.
Both issues 1 and 2 could be resolved if GCC took a similar approach, but I assume the change would be big.

Are there any suggestions for solving issue 1 and improving issue 2 in a generic way?
I can create a Bugzilla ticket to record the issue.

Regards,
Renlin


* Re: BLKmode parameters are stored in unaligned stack slot when passed via registers.
  2018-03-06 15:21 BLKmode parameters are stored in unaligned stack slot when passed via registers Renlin Li
@ 2018-03-06 16:04 ` Richard Biener
  2018-03-06 20:03   ` Renlin Li
  2018-03-07 17:02 ` Jeff Law
  1 sibling, 1 reply; 6+ messages in thread
From: Richard Biener @ 2018-03-06 16:04 UTC (permalink / raw)
  To: Renlin Li; +Cc: gcc

On Tue, Mar 6, 2018 at 4:21 PM, Renlin Li <renlin.li@arm.com> wrote:

What does the ABI say for passing such over-aligned data types?

For solving 1) you could copy the argument as passed by the ABI
to a properly aligned stack location in the callee.

Generally it sounds like either the ABI doesn't specify anything
or the ABI specifies something that violates user expectation.

For 2) again, it is the ABI which specifies whether an argument
is passed via the stack or via registers.  So - what does the ABI say?

Richard.


* Re: BLKmode parameters are stored in unaligned stack slot when passed via registers.
  2018-03-06 16:04 ` Richard Biener
@ 2018-03-06 20:03   ` Renlin Li
  2018-03-07  8:49     ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Renlin Li @ 2018-03-06 20:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc

Hi Richard,

On 06/03/18 16:04, Richard Biener wrote:
> What does the ABI say for passing such over-aligned data types?
> 
> For solving 1) you could copy the argument as passed by the ABI
> to a properly aligned stack location in the callee.
> 
> Generally it sounds like either the ABI doesn't specify anything
> or the ABI specifies something that violates user expectation.
> 
> For 2) again, it is the ABI which specifies whether an argument
> is passed via the stack or via registers.  So - what does the ABI say?


The compiler is doing the right thing here by passing the argument in registers.
Specifically, the Arm PCS contains the following clauses:

> B.5 If the argument is an alignment adjusted type its value is passed as a copy of the actual value. The
> copy will have an alignment defined as follows.
> ...
> For a Composite Type, the alignment of the copy will have 4-byte alignment if its natural alignment is
> <= 4 and 8-byte alignment if its natural alignment is >= 8

> C.3 If the argument requires double-word alignment (8-byte), the NCRN is rounded up to the next even
> register number.
> C.4 If the size in words of the argument is not more than r4 minus NCRN, the argument is copied into
> core registers, starting at the NCRN. The NCRN is incremented by the number of registers used.
> Successive registers hold the parts of the argument they would hold if its value were loaded into
> those registers from memory using an LDM instruction. The argument has now been allocated.
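To make the quoted rules concrete, here is a simplified C model of B.5, C.3 and C.4 only; the function and type names are hypothetical and come from neither the PCS nor GCC. It treats every argument as if its copy alignment came from B.5, ignores VFP registers, and does not model the later rules (C.5 onward) that split an argument between registers and the stack or place it on the stack. For `union U` (natural alignment 16, size 16 bytes), B.5 gives the copy 8-byte alignment, C.3 rounds the NCRN up to an even register, and C.4 assigns r0-r3 when it is the first argument.

```c
#include <stdbool.h>
#include <stddef.h>

/* B.5 (composite types): alignment of the copy that is actually passed. */
static size_t pcs_copy_align(size_t natural_align)
{
    return natural_align <= 4 ? 4 : 8;
}

/* Result of trying to allocate an argument to core registers r0-r3. */
struct core_alloc {
    bool in_regs;       /* true if C.4 allocated it entirely to registers */
    unsigned first_reg; /* first register used (valid if in_regs) */
    unsigned next_ncrn; /* NCRN after C.3 (and C.4, if it applied) */
};

/* C.3 and C.4 for an argument of SIZE bytes and natural alignment
   NATURAL_ALIGN, given the current Next Core Register Number NCRN. */
static struct core_alloc pcs_alloc_core(unsigned ncrn, size_t size,
                                        size_t natural_align)
{
    struct core_alloc r = { false, 0, ncrn };
    unsigned words = (unsigned)((size + 3) / 4);

    /* C.3: double-word aligned arguments start at an even register. */
    if (pcs_copy_align(natural_align) == 8 && (ncrn % 2) != 0)
        ncrn++;
    r.next_ncrn = ncrn;

    /* C.4: does the argument fit in r[ncrn]..r3? */
    if (ncrn <= 4 && words <= 4 - ncrn) {
        r.in_regs = true;
        r.first_reg = ncrn;
        r.next_ncrn = ncrn + words;
    }
    return r;
}
```

For instance, `pcs_alloc_core(1, 16, 16)` rounds the NCRN up to 2 and then finds only two registers left, so C.4 does not apply and the later rules would take over.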


This is similar on other RISC machines.
The problem here is how arguments/parameters are received in the callee:
storing the incoming parameters on the stack seems to be an implementation decision.

Even in the following case, without over-alignment, the callee first saves r0-r3 to its
local stack and then loads M3 from that local copy.

#include <stdint.h>
struct U {
     uint32_t M0;
     uint32_t M1;
     uint32_t M2;
     uint32_t M3;
};

int x (struct U p)
{
   return p.M3;
}
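Rule C.4 says the registers hold what an `LDM` from the argument's memory image would load. Because every member of `struct U` is a 32-bit word at a word-aligned offset, each member maps to exactly one register, so `x` could in principle return r3 directly instead of spilling r0-r3 first. The sketch below models this with `memcpy` into an array standing in for r0-r3; `regs_from_struct` and `x_from_regs` are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct U {
    uint32_t M0;
    uint32_t M1;
    uint32_t M2;
    uint32_t M3;
};

/* Model of C.4: the register image is what an LDM from the argument's
   memory image would load, i.e. four consecutive 32-bit words. */
static void regs_from_struct(const struct U *p, uint32_t regs[4])
{
    memcpy(regs, p, sizeof *p);
}

/* What x(p) needs: p.M3 lives at byte offset 12, i.e. in word 3 (r3),
   so no stack copy is required to read it. */
static uint32_t x_from_regs(const uint32_t regs[4])
{
    return regs[offsetof(struct U, M3) / 4];
}
```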


Regards,
Renlin


* Re: BLKmode parameters are stored in unaligned stack slot when passed via registers.
  2018-03-06 20:03   ` Renlin Li
@ 2018-03-07  8:49     ` Richard Biener
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Biener @ 2018-03-07  8:49 UTC (permalink / raw)
  To: Renlin Li; +Cc: gcc

On Tue, Mar 6, 2018 at 9:02 PM, Renlin Li <renlin.li@foss.arm.com> wrote:
> Even in the following case, without over-alignment, the callee first saves
> r0-r3 to its local stack and then loads M3 from that local copy.
>
> struct U {
>     uint32_t M0;
>     uint32_t M1;
>     uint32_t M2;
>     uint32_t M3;
> };
>
> int x (struct U p)
> {
>   return p.M3;
> }

Ah, so the possible bug is that we "spill" the aggregate and that
reloads from it may end up assuming a bigger alignment than is guaranteed
by the spilling.  I guess this might happen on x86_64 as well, but the
reloads should use the alignment of the stack slot actually assigned,
not the alignment of the spilled type.
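The rule described above can be sketched as follows: the alignment a reload may assume is limited both by the alignment actually guaranteed for the stack slot and by the byte offset of the access within it (the largest power of two dividing both), and it never exceeds the access type's own alignment. This is an illustration, not GCC's implementation; `known_access_align` and `reload_align` are hypothetical names. RTL dumps print MEM alignment in bits, so A128 means 16 bytes and A16 means 2 bytes; a load at byte offset 6 from a 16-byte-aligned slot can only assume 2-byte alignment.

```c
#include <stddef.h>

/* Largest power of two that divides both SLOT_ALIGN (a power of two,
   in bytes) and OFFSET.  With OFFSET == 0 this is just SLOT_ALIGN. */
static size_t known_access_align(size_t slot_align, size_t offset)
{
    size_t low_bit = offset & (~offset + 1); /* lowest set bit of OFFSET */
    if (offset == 0 || low_bit > slot_align)
        return slot_align;
    return low_bit;
}

/* Alignment to record on the reload: bounded by what the slot and offset
   guarantee, regardless of the alignment of the spilled aggregate, and
   never more than the access type's alignment TYPE_ALIGN. */
static size_t reload_align(size_t type_align, size_t slot_align, size_t offset)
{
    size_t a = known_access_align(slot_align, offset);
    return a < type_align ? a : type_align;
}
```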

Hmm, on x86, with optimization even bitfield extracts manage to operate
on registers only; just -O0 spills here, and we indeed see

(insn 7 6 8 2 (set (mem/c:TI (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -16 [0xfffffffffffffff0])) [1 x+0 S16 A128])
        (reg:TI 90)) "t.c":9 -1
     (nil))
^^^ overaligned spill

but the reload has the correct alignment (well, it was expanded as the
original unaligned load):

(insn 13 12 16 2 (set (reg:SI 88 [ _3 ])
        (mem:SI (reg/f:DI 87 [ _1 ]) [3 *_1+0 S4 A16])) "t.c":10 -1
     (nil))

Testcase I was playing with:

struct  __attribute__((aligned(16))) X{
    double a;
    double b;
};

typedef int ui __attribute__((aligned(2)));

int foo (struct X x)
{
  return *(ui *)((char *)&x + 6);
}

on x86_64 x is passed in xmm0/xmm1.

Richard.


* Re: BLKmode parameters are stored in unaligned stack slot when passed via registers.
  2018-03-06 15:21 BLKmode parameters are stored in unaligned stack slot when passed via registers Renlin Li
  2018-03-06 16:04 ` Richard Biener
@ 2018-03-07 17:02 ` Jeff Law
  2018-03-12 12:23   ` Renlin Li
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff Law @ 2018-03-07 17:02 UTC (permalink / raw)
  To: Renlin Li, gcc

On 03/06/2018 08:21 AM, Renlin Li wrote:
> However, the stack slot for the parameter will not be aligned if the
> alignment of parameter type
> exceeds MAX_SUPPORTED_STACK_ALIGNMENT.
> Chances are, unaligned memory access might cause run-time errors.

My recollection here (the PA has an ABI which mandates this kind of
stuff) is that you have to copy the object out of the potentially
unaligned location into a suitably aligned local.

The copy should be occurring in an alignment safe way.  It also has to
handle structures that are partially in registers, partially in memory
and structures which are justified in the wrong direction.
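A minimal C sketch of such an alignment-safe copy, assuming the incoming argument is described by a register image plus an optional remainder in (possibly unaligned) caller memory; `copy_split_arg` and its parameters are hypothetical. Using `memcpy` avoids assuming any alignment for either source, and the destination is a local declared with `_Alignas(16)`. Justification or padding of structures smaller than a register (the "wrong direction" case, e.g. on big-endian targets) is not modeled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: reassemble an argument of SIZE bytes whose first
   REG_BYTES bytes arrived in registers (REGS) and whose remainder is in
   caller memory at MEM, which may be unaligned.  DST must be suitably
   aligned for the argument's type. */
static void copy_split_arg(void *dst, size_t size,
                           const uint32_t *regs, size_t reg_bytes,
                           const unsigned char *mem)
{
    if (reg_bytes > size)
        reg_bytes = size;
    memcpy(dst, regs, reg_bytes);                /* register part */
    if (size > reg_bytes)                        /* memory part, any alignment */
        memcpy((unsigned char *)dst + reg_bytes, mem, size - reg_bytes);
}
```

For example, a 16-byte argument with two words in registers and two words at an odd address can be copied into `_Alignas(16) unsigned char local[16]` and then accessed with full alignment.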

We never tried to optimize this stuff.  It was rare enough to not worry
about.

jeff


* Re: BLKmode parameters are stored in unaligned stack slot when passed via registers.
  2018-03-07 17:02 ` Jeff Law
@ 2018-03-12 12:23   ` Renlin Li
  0 siblings, 0 replies; 6+ messages in thread
From: Renlin Li @ 2018-03-12 12:23 UTC (permalink / raw)
  To: Jeff Law, gcc

Hi Jeff,

On 07/03/18 17:02, Jeff Law wrote:
> On 03/06/2018 08:21 AM, Renlin Li wrote:
>> However, the stack slot for the parameter will not be aligned if the
>> alignment of parameter type
>> exceeds MAX_SUPPORTED_STACK_ALIGNMENT.
>> Chances are, unaligned memory access might cause run-time errors.
> My recollection here (the PA has a ABI which mandates this kind of
> stuff) is that you have to copy the object out of the potentially
> unaligned location into a suitably aligned local.
> 
> The copy should be occurring in an alignment safe way.  It also has to
> handle structures that are partially in registers, partially in memory
> and structures which are justified in the wrong direction.

Yes, I realize this as well. This case arises in the Arm PCS,
so indeed the copy of the parameter on the stack should also be aligned.


> 
> We never tried to optimize this stuff.  It was rare enough to not worry
> about.

From what I have observed, yes, GCC doesn't do any optimization here.
Consider the following code, which has no special alignment requirement:

#include <stdint.h>
struct U {
     uint32_t M0;
     uint32_t M1;
};

void tmp (struct U *);
void foo(struct U P0)
{
   struct U P1 = P0;
   tmp (&P1);
}

void bar(struct U P0)
{
   tmp (&P0);
}

int __attribute__ ((noinline)) x (struct U p)
{
   return p.M1;
}

In the arm code generation, the address that foo passes to tmp is 16-byte aligned.
This is because P1 is a local stack variable, and its stack slot is aligned. There
is a copy operation that copies the data from the stack slot of the parameter into
the stack slot of the local variable.

The process is:
store r0-r1 into stack_slot_for_parm (this will be unaligned if large alignment is required)
load r0-r1 from stack_slot_for_parm
store r0-r1 into stack_slot_for_p1 (aligned)

For bar, the address passed to tmp is the address of the parameter's stack slot, which may not be aligned.

In function x, the return value is loaded from the parameter's stack slot after
the parameter has been saved to the stack.

Renlin

