[PATCH] Fix regression caused by PR65310 fix

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Fix regression caused by PR65310 fix
@ 2015-03-11 15:09 Richard Biener
  2015-03-30 12:15 ` New regression on ARM Linux (was: Re: [PATCH] Fix regression caused by PR65310 fix) Alan Lawrence
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2015-03-11 15:09 UTC (permalink / raw)
  To: gcc-patches


This fixes a vectorizer testcase regression on powerpc where SRA
drops alignment info unnecessarily.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-03-11  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/65310
	* tree-sra.c (build_ref_for_offset): Also preserve larger
	alignment.

Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c	(revision 221324)
+++ gcc/tree-sra.c	(working copy)
@@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
   misalign = (misalign + offset) & (align - 1);
   if (misalign != 0)
     align = (misalign & -misalign);
-  if (align < TYPE_ALIGN (exp_type))
+  if (align != TYPE_ALIGN (exp_type))
     exp_type = build_aligned_type (exp_type, align);
 
   mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* New regression on ARM Linux (was: Re: [PATCH] Fix regression caused by PR65310 fix)
  2015-03-11 15:09 [PATCH] Fix regression caused by PR65310 fix Richard Biener
@ 2015-03-30 12:15 ` Alan Lawrence
  2015-03-30 12:18   ` New regression on ARM Linux Alan Lawrence
  0 siblings, 1 reply; 31+ messages in thread
From: Alan Lawrence @ 2015-03-30 12:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Marcus Shawcroft

We've been seeing a bunch of new failures in the *libffi* testsuite on ARM Linux 
(arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this one-liner 
fix. I've reduced the testcase down to the attached (including removing any 
dependency on libffi); with gcc r221347, this prints the expected
7 8 9
whereas with gcc r221348, instead it prints
0 8 0

The action of r221348 is to change the alignment of a mem_ref, and a var_decl of 
b1, from 32 to 64; both have type
  type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0 BLK
         size <integer_cst 0x2b9b8d3720a8 constant 192>
         unit size <integer_cst 0x2b9b8d372078 constant 24>
         align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
         fields <field_decl 0x2b9b8d42b098 a type <integer_type 0x2b9b8d092690 int>
             SI file reduced.c line 12 col 7
             size <integer_cst 0x2b9b8d08eeb8 constant 32>
             unit size <integer_cst 0x2b9b8d08eed0 constant 4>
             align 32 offset_align 64
             offset <integer_cst 0x2b9b8d08eee8 constant 0>
             bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context 
<record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl 0x2b9b8d42b130 
b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
         pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl 
0x2b9b8d42b000 D.6044>>

The tree-optimized output is the same with both compilers (as this does not 
mention alignment); the expand output differs.

Still investigating...

--Alan


Richard Biener wrote:
> This fixes a vectorizer testcase regression on powerpc where SRA
> drops alignment info unnecessarily.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> 
> Richard.
> 
> 2015-03-11  Richard Biener  <rguenther@suse.de>
> 
> 	PR tree-optimization/65310
> 	* tree-sra.c (build_ref_for_offset): Also preserve larger
> 	alignment.
> 
> Index: gcc/tree-sra.c
> ===================================================================
> --- gcc/tree-sra.c	(revision 221324)
> +++ gcc/tree-sra.c	(working copy)
> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>    misalign = (misalign + offset) & (align - 1);
>    if (misalign != 0)
>      align = (misalign & -misalign);
> -  if (align < TYPE_ALIGN (exp_type))
> +  if (align != TYPE_ALIGN (exp_type))
>      exp_type = build_aligned_type (exp_type, align);
>  
>    mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-30 12:15 ` New regression on ARM Linux (was: Re: [PATCH] Fix regression caused by PR65310 fix) Alan Lawrence
@ 2015-03-30 12:18   ` Alan Lawrence
  2015-03-30 13:01     ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Alan Lawrence @ 2015-03-30 12:18 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: Richard Biener, gcc-patches, Marcus Shawcroft

[-- Attachment #1: Type: text/plain, Size: 2722 bytes --]

...actually attach the testcase...

Alan Lawrence wrote:
> We've been seeing a bunch of new failures in the *libffi* testsuite on ARM Linux 
> (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this one-liner 
> fix. I've reduced the testcase down to the attached (including removing any 
> dependency on libffi); with gcc r221347, this prints the expected
> 7 8 9
> whereas with gcc r221348, instead it prints
> 0 8 0
> 
> The action of r221348 is to change the alignment of a mem_ref, and a var_decl of 
> b1, from 32 to 64; both have type
>   type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0 BLK
>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>          align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
>          fields <field_decl 0x2b9b8d42b098 a type <integer_type 0x2b9b8d092690 int>
>              SI file reduced.c line 12 col 7
>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>              align 32 offset_align 64
>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context 
> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl 0x2b9b8d42b130 
> b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl 
> 0x2b9b8d42b000 D.6044>>
> 
> The tree-optimized output is the same with both compilers (as this does not 
> mention alignment); the expand output differs.
> 
> Still investigating...
> 
> --Alan
> 
> 
> Richard Biener wrote:
>> This fixes a vectorizer testcase regression on powerpc where SRA
>> drops alignment info unnecessarily.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>
>> Richard.
>>
>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>
>> 	PR tree-optimization/65310
>> 	* tree-sra.c (build_ref_for_offset): Also preserve larger
>> 	alignment.
>>
>> Index: gcc/tree-sra.c
>> ===================================================================
>> --- gcc/tree-sra.c	(revision 221324)
>> +++ gcc/tree-sra.c	(working copy)
>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>    misalign = (misalign + offset) & (align - 1);
>>    if (misalign != 0)
>>      align = (misalign & -misalign);
>> -  if (align < TYPE_ALIGN (exp_type))
>> +  if (align != TYPE_ALIGN (exp_type))
>>      exp_type = build_aligned_type (exp_type, align);
>>  
>>    mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);
>>
>>
> 
> 




[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: reduced.c --]
[-- Type: text/x-csrc; name=reduced.c, Size: 694 bytes --]

/* Reduced from libffi.call/cls_16byte.c
   Originator:	<andreast@gcc.gnu.org> 20030828	 */

typedef struct cls_struct_16byte {
  int a;
  double b;
  int c;
} cls_struct_16byte;

void cls_struct_16byte_gn(void** args, void *resp)
{
  struct cls_struct_16byte b1 = *(struct cls_struct_16byte*)(args[0]);

  __builtin_printf("%d %g %d\n", b1.a, b1.b, b1.c);

  // Note that commenting this line out, restores correct printing in the line above
  *(cls_struct_16byte*)resp = b1;
}

int main (void)
{
  struct cls_struct_16byte h_dbl = { 7, 8.0, 9 };
  struct cls_struct_16byte result;
  void *args = &h_dbl;
  cls_struct_16byte_gn (&args, &result);
  return 0;
}




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-30 12:18   ` New regression on ARM Linux Alan Lawrence
@ 2015-03-30 13:01     ` Richard Biener
  2015-03-30 13:16       ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2015-03-30 13:01 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches, Marcus Shawcroft

On Mon, 30 Mar 2015, Alan Lawrence wrote:

> ...actually attach the testcase...

What compile options?

> Alan Lawrence wrote:
> > We've been seeing a bunch of new failures in the *libffi* testsuite on ARM
> > Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this
> > one-liner fix. I've reduced the testcase down to the attached (including
> > removing any dependency on libffi); with gcc r221347, this prints the
> > expected
> > 7 8 9
> > whereas with gcc r221348, instead it prints
> > 0 8 0
> > 
> > The action of r221348 is to change the alignment of a mem_ref, and a
> > var_decl of b1, from 32 to 64; both have type
> >   type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0
> > BLK
> >          size <integer_cst 0x2b9b8d3720a8 constant 192>
> >          unit size <integer_cst 0x2b9b8d372078 constant 24>
> >          align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
> >          fields <field_decl 0x2b9b8d42b098 a type <integer_type
> > 0x2b9b8d092690 int>
> >              SI file reduced.c line 12 col 7
> >              size <integer_cst 0x2b9b8d08eeb8 constant 32>
> >              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
> >              align 32 offset_align 64
> >              offset <integer_cst 0x2b9b8d08eee8 constant 0>
> >              bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context
> > <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
> > 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
> >          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl
> > 0x2b9b8d42b000 D.6044>>
> > 
> > The tree-optimized output is the same with both compilers (as this does not
> > mention alignment); the expand output differs.
> > 
> > Still investigating...
> > 
> > --Alan
> > 
> > 
> > Richard Biener wrote:
> > > This fixes a vectorizer testcase regression on powerpc where SRA
> > > drops alignment info unnecessarily.
> > > 
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > > 
> > > Richard.
> > > 
> > > 2015-03-11  Richard Biener  <rguenther@suse.de>
> > > 
> > > 	PR tree-optimization/65310
> > > 	* tree-sra.c (build_ref_for_offset): Also preserve larger
> > > 	alignment.
> > > 
> > > Index: gcc/tree-sra.c
> > > ===================================================================
> > > --- gcc/tree-sra.c	(revision 221324)
> > > +++ gcc/tree-sra.c	(working copy)
> > > @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
> > >    misalign = (misalign + offset) & (align - 1);
> > >    if (misalign != 0)
> > >      align = (misalign & -misalign);
> > > -  if (align < TYPE_ALIGN (exp_type))
> > > +  if (align != TYPE_ALIGN (exp_type))
> > >      exp_type = build_aligned_type (exp_type, align);
> > >     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);
> > > 
> > > 
> > 
> > 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-30 13:01     ` Richard Biener
@ 2015-03-30 13:16       ` Richard Biener
  2015-03-30 16:45         ` Alan Lawrence
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2015-03-30 13:16 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches, Marcus Shawcroft

On Mon, 30 Mar 2015, Richard Biener wrote:

> On Mon, 30 Mar 2015, Alan Lawrence wrote:
> 
> > ...actually attach the testcase...
> 
> What compile options?

Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
I can't see anything not guaranteeing that:

        .section        .rodata
        .align  3
.LANCHOR0 = . + 0
.LC1:
        .ascii  "%d %g %d\012\000"
        .space  6
.LC0:
        .word   7
        .space  4
        .word   0
        .word   1075838976
        .word   9
        .space  4

maybe there is some more generic code-gen bug for aligned aggregate
copy?  That is, the patch tells the backend that the loads and
stores to the 'int' vars (which have padding followed) is aligned
to 8 bytes.

I don't see what is wrong in the final assembler, but maybe
some endian issue exists?  The code looks quite ugly though ;)

Richard.

> 
> > Alan Lawrence wrote:
> > > We've been seeing a bunch of new failures in the *libffi* testsuite on ARM
> > > Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this
> > > one-liner fix. I've reduced the testcase down to the attached (including
> > > removing any dependency on libffi); with gcc r221347, this prints the
> > > expected
> > > 7 8 9
> > > whereas with gcc r221348, instead it prints
> > > 0 8 0
> > > 
> > > The action of r221348 is to change the alignment of a mem_ref, and a
> > > var_decl of b1, from 32 to 64; both have type
> > >   type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0
> > > BLK
> > >          size <integer_cst 0x2b9b8d3720a8 constant 192>
> > >          unit size <integer_cst 0x2b9b8d372078 constant 24>
> > >          align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
> > >          fields <field_decl 0x2b9b8d42b098 a type <integer_type
> > > 0x2b9b8d092690 int>
> > >              SI file reduced.c line 12 col 7
> > >              size <integer_cst 0x2b9b8d08eeb8 constant 32>
> > >              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
> > >              align 32 offset_align 64
> > >              offset <integer_cst 0x2b9b8d08eee8 constant 0>
> > >              bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context
> > > <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
> > > 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
> > >          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl
> > > 0x2b9b8d42b000 D.6044>>
> > > 
> > > The tree-optimized output is the same with both compilers (as this does not
> > > mention alignment); the expand output differs.
> > > 
> > > Still investigating...
> > > 
> > > --Alan
> > > 
> > > 
> > > Richard Biener wrote:
> > > > This fixes a vectorizer testcase regression on powerpc where SRA
> > > > drops alignment info unnecessarily.
> > > > 
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > > > 
> > > > Richard.
> > > > 
> > > > 2015-03-11  Richard Biener  <rguenther@suse.de>
> > > > 
> > > > 	PR tree-optimization/65310
> > > > 	* tree-sra.c (build_ref_for_offset): Also preserve larger
> > > > 	alignment.
> > > > 
> > > > Index: gcc/tree-sra.c
> > > > ===================================================================
> > > > --- gcc/tree-sra.c	(revision 221324)
> > > > +++ gcc/tree-sra.c	(working copy)
> > > > @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
> > > >    misalign = (misalign + offset) & (align - 1);
> > > >    if (misalign != 0)
> > > >      align = (misalign & -misalign);
> > > > -  if (align < TYPE_ALIGN (exp_type))
> > > > +  if (align != TYPE_ALIGN (exp_type))
> > > >      exp_type = build_aligned_type (exp_type, align);
> > > >     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> > 
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-30 13:16       ` Richard Biener
@ 2015-03-30 16:45         ` Alan Lawrence
  2015-03-30 20:13           ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Alan Lawrence @ 2015-03-30 16:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Marcus Shawcroft, Richard Earnshaw

-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes it.

The problem appears to be in laying out arguments, specifically varargs. From 
the "good" -fdump-rtl-expand:

(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0  S4 A32])
         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
      (nil))
(insn 19 18 20 2 (set (reg:DF 2 r2)
         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
      (nil))
(insn 20 19 21 2 (set (reg:SI 1 r1)
         (reg:SI 113 [ b1 ])) reduced.c:14 -1
      (nil))
(insn 21 20 22 2 (set (reg:SI 0 r0)
         (reg:SI 118)) reduced.c:14 -1
      (nil))
(call_insn 22 21 23 2 (parallel [
             (set (reg:SI 0 r0)
                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] 
<function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 A32])

The struct members are
reg:SI 113 => int a;
reg:DF 112 => double b;
reg:SI 111 => int c;

r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is pushed 
into virtual-outgoing-args. In contrast, post-change to build_ref_of_offset, we get:

(insn 17 16 18 2 (set (reg:SI 118)
         (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750 
*.LC1>)) reduced.c:14 -1
      (nil))
(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 virtual-outgoing-args)
                 (const_int 8 [0x8])) [0  S4 A64])
         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
      (nil))
(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0  S8 A64])
         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
      (nil))
(insn 20 19 21 2 (set (reg:SI 2 r2)
         (reg:SI 113 [ b1 ])) reduced.c:14 -1
      (nil))
(insn 21 20 22 2 (set (reg:SI 0 r0)
         (reg:SI 118)) reduced.c:14 -1
      (nil))
(call_insn 22 21 23 2 (parallel [
             (set (reg:SI 0 r0)
                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] 
<function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 A32])

r0 still gets the format string, but 'int b1.a' now goes in r2, and the 
double+following int are all pushed into virtual-outgoing-args. This is because 
arm_function_arg is fed a 64-bit-aligned int as type of the second argument (the 
type constructed by build_ref_for_offset); it then executes (aapcs_layout_arg, 
arm.c line ~~5914)

   /* C3 - For double-word aligned arguments, round the NCRN up to the
      next even number.  */
   ncrn = pcum->aapcs_ncrn;
   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
     ncrn++;

Which changes r1 to r2. Passing -fno-tree-sra, or removing from the testcase 
"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a 
32-bit-aligned int instead, which works as previously.

Passing the same members of that struct in a non-vargs call, works ok - I think 
because these use the type of the declared parameters, rather than the provided 
arguments, and the former do not have the increased alignment from 
build_ref_for_offset.

FWIW, I also tried:

__attribute__((__aligned__((16)))) int x;
int main (void)
{
   __builtin_printf("%d\n", x);
}

but in that case, the arm_function_arg is still fed a type with alignment 32 
(bits), i.e. distinct from the type of the field 'x' in memory, which has 
alignment 128.

--Alan

Richard Biener wrote:
> On Mon, 30 Mar 2015, Richard Biener wrote:
> 
>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>
>>> ...actually attach the testcase...
>> What compile options?
> 
> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
> I can't see anything not guaranteeing that:
> 
>         .section        .rodata
>         .align  3
> .LANCHOR0 = . + 0
> .LC1:
>         .ascii  "%d %g %d\012\000"
>         .space  6
> .LC0:
>         .word   7
>         .space  4
>         .word   0
>         .word   1075838976
>         .word   9
>         .space  4
> 
> maybe there is some more generic code-gen bug for aligned aggregate
> copy?  That is, the patch tells the backend that the loads and
> stores to the 'int' vars (which have padding followed) is aligned
> to 8 bytes.
> 
> I don't see what is wrong in the final assembler, but maybe
> some endian issue exists?  The code looks quite ugly though ;)
> 
> Richard.
> 
>>> Alan Lawrence wrote:
>>>> We've been seeing a bunch of new failures in the *libffi* testsuite on ARM
>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this
>>>> one-liner fix. I've reduced the testcase down to the attached (including
>>>> removing any dependency on libffi); with gcc r221347, this prints the
>>>> expected
>>>> 7 8 9
>>>> whereas with gcc r221348, instead it prints
>>>> 0 8 0
>>>>
>>>> The action of r221348 is to change the alignment of a mem_ref, and a
>>>> var_decl of b1, from 32 to 64; both have type
>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0
>>>> BLK
>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>          align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>> 0x2b9b8d092690 int>
>>>>              SI file reduced.c line 12 col 7
>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>              align 32 offset_align 64
>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context
>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl
>>>> 0x2b9b8d42b000 D.6044>>
>>>>
>>>> The tree-optimized output is the same with both compilers (as this does not
>>>> mention alignment); the expand output differs.
>>>>
>>>> Still investigating...
>>>>
>>>> --Alan
>>>>
>>>>
>>>> Richard Biener wrote:
>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>> drops alignment info unnecessarily.
>>>>>
>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>
>>>>> Richard.
>>>>>
>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>
>>>>> 	PR tree-optimization/65310
>>>>> 	* tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>> 	alignment.
>>>>>
>>>>> Index: gcc/tree-sra.c
>>>>> ===================================================================
>>>>> --- gcc/tree-sra.c	(revision 221324)
>>>>> +++ gcc/tree-sra.c	(working copy)
>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>    if (misalign != 0)
>>>>>      align = (misalign & -misalign);
>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-30 16:45         ` Alan Lawrence
@ 2015-03-30 20:13           ` Richard Biener
  2015-03-31  7:50             ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2015-03-30 20:13 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches, Marcus Shawcroft, Richard Earnshaw

On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>it.
>
>The problem appears to be in laying out arguments, specifically
>varargs. From 
>the "good" -fdump-rtl-expand:
>
>(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 
>S4 A32])
>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>      (nil))
>(insn 19 18 20 2 (set (reg:DF 2 r2)
>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>      (nil))
>(insn 20 19 21 2 (set (reg:SI 1 r1)
>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>      (nil))
>(insn 21 20 22 2 (set (reg:SI 0 r0)
>         (reg:SI 118)) reduced.c:14 -1
>      (nil))
>(call_insn 22 21 23 2 (parallel [
>             (set (reg:SI 0 r0)
>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] 
><function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>A32])
>
>The struct members are
>reg:SI 113 => int a;
>reg:DF 112 => double b;
>reg:SI 111 => int c;
>
>r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>pushed 
>into virtual-outgoing-args. In contrast, post-change to
>build_ref_of_offset, we get:
>
>(insn 17 16 18 2 (set (reg:SI 118)
>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750 
>*.LC1>)) reduced.c:14 -1
>      (nil))
>(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>virtual-outgoing-args)
>                 (const_int 8 [0x8])) [0  S4 A64])
>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>      (nil))
>(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 
>S8 A64])
>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>      (nil))
>(insn 20 19 21 2 (set (reg:SI 2 r2)
>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>      (nil))
>(insn 21 20 22 2 (set (reg:SI 0 r0)
>         (reg:SI 118)) reduced.c:14 -1
>      (nil))
>(call_insn 22 21 23 2 (parallel [
>             (set (reg:SI 0 r0)
>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] 
><function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>A32])
>
>r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>
>double+following int are all pushed into virtual-outgoing-args. This is
>because 
>arm_function_arg is fed a 64-bit-aligned int as type of the second
>argument (the 
>type constructed by build_ref_for_offset); it then executes
>(aapcs_layout_arg, 
>arm.c line ~~5914)
>
>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>      next even number.  */
>   ncrn = pcum->aapcs_ncrn;
>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>     ncrn++;
>
>Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>testcase 
>"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a 
>32-bit-aligned int instead, which works as previously.
>
>Passing the same members of that struct in a non-vargs call, works ok -
>I think 
>because these use the type of the declared parameters, rather than the
>provided 
>arguments, and the former do not have the increased alignment from 
>build_ref_for_offset.

It doesn't make sense to use the alignment of passed values.  That looks like bs.

This means that

Int I __aligned__(8);

Is passed differently than int.

Arm_function_arg needs to be fixed.

Richard.

>
>FWIW, I also tried:
>
>__attribute__((__aligned__((16)))) int x;
>int main (void)
>{
>   __builtin_printf("%d\n", x);
>}
>
>but in that case, the arm_function_arg is still fed a type with
>alignment 32 
>(bits), i.e. distinct from the type of the field 'x' in memory, which
>has 
>alignment 128.
>
>--Alan
>
>Richard Biener wrote:
>> On Mon, 30 Mar 2015, Richard Biener wrote:
>> 
>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>
>>>> ...actually attach the testcase...
>>> What compile options?
>> 
>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>> I can't see anything not guaranteeing that:
>> 
>>         .section        .rodata
>>         .align  3
>> .LANCHOR0 = . + 0
>> .LC1:
>>         .ascii  "%d %g %d\012\000"
>>         .space  6
>> .LC0:
>>         .word   7
>>         .space  4
>>         .word   0
>>         .word   1075838976
>>         .word   9
>>         .space  4
>> 
>> maybe there is some more generic code-gen bug for aligned aggregate
>> copy?  That is, the patch tells the backend that the loads and
>> stores to the 'int' vars (which have padding followed) is aligned
>> to 8 bytes.
>> 
>> I don't see what is wrong in the final assembler, but maybe
>> some endian issue exists?  The code looks quite ugly though ;)
>> 
>> Richard.
>> 
>>>> Alan Lawrence wrote:
>>>>> We've been seeing a bunch of new failures in the *libffi*
>testsuite on ARM
>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>following this
>>>>> one-liner fix. I've reduced the testcase down to the attached
>(including
>>>>> removing any dependency on libffi); with gcc r221347, this prints
>the
>>>>> expected
>>>>> 7 8 9
>>>>> whereas with gcc r221348, instead it prints
>>>>> 0 8 0
>>>>>
>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>a
>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>sizes-gimplified type_0
>>>>> BLK
>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>          align 64 symtab 0 alias set 1 canonical type
>0x2b9b8d428d20
>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>> 0x2b9b8d092690 int>
>>>>>              SI file reduced.c line 12 col 7
>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>              align 32 offset_align 64
>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>context
>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>D.6070>
>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
><type_decl
>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>
>>>>> The tree-optimized output is the same with both compilers (as this
>does not
>>>>> mention alignment); the expand output differs.
>>>>>
>>>>> Still investigating...
>>>>>
>>>>> --Alan
>>>>>
>>>>>
>>>>> Richard Biener wrote:
>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>> drops alignment info unnecessarily.
>>>>>>
>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>
>>>>>> 	PR tree-optimization/65310
>>>>>> 	* tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>> 	alignment.
>>>>>>
>>>>>> Index: gcc/tree-sra.c
>>>>>>
>===================================================================
>>>>>> --- gcc/tree-sra.c	(revision 221324)
>>>>>> +++ gcc/tree-sra.c	(working copy)
>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>    if (misalign != 0)
>>>>>>      align = (misalign & -misalign);
>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>off);
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-30 20:13           ` Richard Biener
@ 2015-03-31  7:50             ` Richard Biener
  2015-03-31  9:43               ` Richard Earnshaw
  2015-03-31  9:50               ` Alan Lawrence
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Biener @ 2015-03-31  7:50 UTC (permalink / raw)
  To: Richard Biener
  Cc: Alan Lawrence, gcc-patches, Marcus Shawcroft, Richard Earnshaw

On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>it.
>>
>>The problem appears to be in laying out arguments, specifically
>>varargs. From
>>the "good" -fdump-rtl-expand:
>>
>>(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>S4 A32])
>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>      (nil))
>>(insn 19 18 20 2 (set (reg:DF 2 r2)
>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>      (nil))
>>(insn 20 19 21 2 (set (reg:SI 1 r1)
>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>      (nil))
>>(insn 21 20 22 2 (set (reg:SI 0 r0)
>>         (reg:SI 118)) reduced.c:14 -1
>>      (nil))
>>(call_insn 22 21 23 2 (parallel [
>>             (set (reg:SI 0 r0)
>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>><function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>A32])
>>
>>The struct members are
>>reg:SI 113 => int a;
>>reg:DF 112 => double b;
>>reg:SI 111 => int c;
>>
>>r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>pushed
>>into virtual-outgoing-args. In contrast, post-change to
>>build_ref_of_offset, we get:
>>
>>(insn 17 16 18 2 (set (reg:SI 118)
>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>*.LC1>)) reduced.c:14 -1
>>      (nil))
>>(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>virtual-outgoing-args)
>>                 (const_int 8 [0x8])) [0  S4 A64])
>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>      (nil))
>>(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>S8 A64])
>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>      (nil))
>>(insn 20 19 21 2 (set (reg:SI 2 r2)
>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>      (nil))
>>(insn 21 20 22 2 (set (reg:SI 0 r0)
>>         (reg:SI 118)) reduced.c:14 -1
>>      (nil))
>>(call_insn 22 21 23 2 (parallel [
>>             (set (reg:SI 0 r0)
>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>><function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>A32])
>>
>>r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>
>>double+following int are all pushed into virtual-outgoing-args. This is
>>because
>>arm_function_arg is fed a 64-bit-aligned int as type of the second
>>argument (the
>>type constructed by build_ref_for_offset); it then executes
>>(aapcs_layout_arg,
>>arm.c line ~~5914)
>>
>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>      next even number.  */
>>   ncrn = pcum->aapcs_ncrn;
>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>     ncrn++;
>>
>>Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>testcase
>>"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>32-bit-aligned int instead, which works as previously.
>>
>>Passing the same members of that struct in a non-vargs call, works ok -
>>I think
>>because these use the type of the declared parameters, rather than the
>>provided
>>arguments, and the former do not have the increased alignment from
>>build_ref_for_offset.
>
> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>
> This means that
>
> Int I __aligned__(8);
>
> Is passed differently than int.
>
> Arm_function_arg needs to be fixed.

That is,

typedef int myint __attribute__((aligned(8)));

int main()
{
  myint i = 1;
  int j = 2;
  __builtin_printf("%d %d\n", i, j);
}

or

myint i;
int j;
myint *p = &i;
int *q = &j;

int main()
{
  __builtin_printf("%d %d", *p, *q);
}

should behave the same.  There isn't a printf modifier for an "aligned int"
because that sort of thing doesn't make sense.  Special-casing aligned vs.
non-aligned values only makes sense for things passed by value on the stack.
And then obviously only dependent on the functuion type signature, not
on the type of the passed value.

Richard.

> Richard.
>
>>
>>FWIW, I also tried:
>>
>>__attribute__((__aligned__((16)))) int x;
>>int main (void)
>>{
>>   __builtin_printf("%d\n", x);
>>}
>>
>>but in that case, the arm_function_arg is still fed a type with
>>alignment 32
>>(bits), i.e. distinct from the type of the field 'x' in memory, which
>>has
>>alignment 128.
>>
>>--Alan
>>
>>Richard Biener wrote:
>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>
>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>
>>>>> ...actually attach the testcase...
>>>> What compile options?
>>>
>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>> I can't see anything not guaranteeing that:
>>>
>>>         .section        .rodata
>>>         .align  3
>>> .LANCHOR0 = . + 0
>>> .LC1:
>>>         .ascii  "%d %g %d\012\000"
>>>         .space  6
>>> .LC0:
>>>         .word   7
>>>         .space  4
>>>         .word   0
>>>         .word   1075838976
>>>         .word   9
>>>         .space  4
>>>
>>> maybe there is some more generic code-gen bug for aligned aggregate
>>> copy?  That is, the patch tells the backend that the loads and
>>> stores to the 'int' vars (which have padding followed) is aligned
>>> to 8 bytes.
>>>
>>> I don't see what is wrong in the final assembler, but maybe
>>> some endian issue exists?  The code looks quite ugly though ;)
>>>
>>> Richard.
>>>
>>>>> Alan Lawrence wrote:
>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>testsuite on ARM
>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>following this
>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>(including
>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>the
>>>>>> expected
>>>>>> 7 8 9
>>>>>> whereas with gcc r221348, instead it prints
>>>>>> 0 8 0
>>>>>>
>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>a
>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>sizes-gimplified type_0
>>>>>> BLK
>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>0x2b9b8d428d20
>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>> 0x2b9b8d092690 int>
>>>>>>              SI file reduced.c line 12 col 7
>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>              align 32 offset_align 64
>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>context
>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>D.6070>
>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>><type_decl
>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>
>>>>>> The tree-optimized output is the same with both compilers (as this
>>does not
>>>>>> mention alignment); the expand output differs.
>>>>>>
>>>>>> Still investigating...
>>>>>>
>>>>>> --Alan
>>>>>>
>>>>>>
>>>>>> Richard Biener wrote:
>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>> drops alignment info unnecessarily.
>>>>>>>
>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>
>>>>>>>  PR tree-optimization/65310
>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>  alignment.
>>>>>>>
>>>>>>> Index: gcc/tree-sra.c
>>>>>>>
>>===================================================================
>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>    if (misalign != 0)
>>>>>>>      align = (misalign & -misalign);
>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>off);
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31  7:50             ` Richard Biener
@ 2015-03-31  9:43               ` Richard Earnshaw
  2015-03-31 10:00                 ` Richard Biener
  2015-03-31  9:50               ` Alan Lawrence
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Earnshaw @ 2015-03-31  9:43 UTC (permalink / raw)
  To: Richard Biener, Richard Biener
  Cc: Alan Lawrence, gcc-patches, Marcus Shawcroft

On 31/03/15 08:50, Richard Biener wrote:
> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>> it.
>>>
>>> The problem appears to be in laying out arguments, specifically
>>> varargs. From
>>> the "good" -fdump-rtl-expand:
>>>
>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>> S4 A32])
>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>         (reg:SI 118)) reduced.c:14 -1
>>>      (nil))
>>> (call_insn 22 21 23 2 (parallel [
>>>             (set (reg:SI 0 r0)
>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>> A32])
>>>
>>> The struct members are
>>> reg:SI 113 => int a;
>>> reg:DF 112 => double b;
>>> reg:SI 111 => int c;
>>>
>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>> pushed
>>> into virtual-outgoing-args. In contrast, post-change to
>>> build_ref_of_offset, we get:
>>>
>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>> *.LC1>)) reduced.c:14 -1
>>>      (nil))
>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>> virtual-outgoing-args)
>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>> S8 A64])
>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>         (reg:SI 118)) reduced.c:14 -1
>>>      (nil))
>>> (call_insn 22 21 23 2 (parallel [
>>>             (set (reg:SI 0 r0)
>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>> A32])
>>>
>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>
>>> double+following int are all pushed into virtual-outgoing-args. This is
>>> because
>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>> argument (the
>>> type constructed by build_ref_for_offset); it then executes
>>> (aapcs_layout_arg,
>>> arm.c line ~~5914)
>>>
>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>      next even number.  */
>>>   ncrn = pcum->aapcs_ncrn;
>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>     ncrn++;
>>>
>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>> testcase
>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>> 32-bit-aligned int instead, which works as previously.
>>>
>>> Passing the same members of that struct in a non-vargs call, works ok -
>>> I think
>>> because these use the type of the declared parameters, rather than the
>>> provided
>>> arguments, and the former do not have the increased alignment from
>>> build_ref_for_offset.
>>
>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>
>> This means that
>>
>> Int I __aligned__(8);
>>
>> Is passed differently than int.
>>
>> Arm_function_arg needs to be fixed.
> 
> That is,
> 
> typedef int myint __attribute__((aligned(8)));
> 
> int main()
> {
>   myint i = 1;
>   int j = 2;
>   __builtin_printf("%d %d\n", i, j);
> }
> 
> or
> 
> myint i;
> int j;
> myint *p = &i;
> int *q = &j;
> 
> int main()
> {
>   __builtin_printf("%d %d", *p, *q);
> }
> 
> should behave the same.  There isn't a printf modifier for an "aligned int"
> because that sort of thing doesn't make sense.  Special-casing aligned vs.
> non-aligned values only makes sense for things passed by value on the stack.
> And then obviously only dependent on the functuion type signature, not
> on the type of the passed value.
> 

I think the testcase is ill-formed.  Just because printf doesn't have
such a modifier, doesn't mean that another variadic function might not
have a means to detect when an object in the variadic list needs to be
over-aligned.  As such, the test should really be written as:

typedef int myint __attribute__((aligned(8)));

int main()
{
  myint i = 1;
  int j = 2;
  __builtin_printf("%d %d\n", (int) i, j);
}

Variadic functions take the types of their arguments from the types of
the actuals passed.  The compiler should either be applying appropriate
promotion rules to make the types conformant by the language
specification or respecting the types exactly.  However, that should be
done in the mid-end not the back-end.  If incorrect alignment
information is passed to the back-end it can't help but make the wrong
choice.  Examining the mode here wouldn't help either.  Consider:

int foo (int a, int b, int c, int d, int e, int f
__attribute__((aligned(8))));

On ARM that should pass f in a 64-bit aligned location (since the
parameter will be on the stack).

R.


> Richard.
> 
>> Richard.
>>
>>>
>>> FWIW, I also tried:
>>>
>>> __attribute__((__aligned__((16)))) int x;
>>> int main (void)
>>> {
>>>   __builtin_printf("%d\n", x);
>>> }
>>>
>>> but in that case, the arm_function_arg is still fed a type with
>>> alignment 32
>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
>>> has
>>> alignment 128.
>>>
>>> --Alan
>>>
>>> Richard Biener wrote:
>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>>
>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>>
>>>>>> ...actually attach the testcase...
>>>>> What compile options?
>>>>
>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>>> I can't see anything not guaranteeing that:
>>>>
>>>>         .section        .rodata
>>>>         .align  3
>>>> .LANCHOR0 = . + 0
>>>> .LC1:
>>>>         .ascii  "%d %g %d\012\000"
>>>>         .space  6
>>>> .LC0:
>>>>         .word   7
>>>>         .space  4
>>>>         .word   0
>>>>         .word   1075838976
>>>>         .word   9
>>>>         .space  4
>>>>
>>>> maybe there is some more generic code-gen bug for aligned aggregate
>>>> copy?  That is, the patch tells the backend that the loads and
>>>> stores to the 'int' vars (which have padding followed) is aligned
>>>> to 8 bytes.
>>>>
>>>> I don't see what is wrong in the final assembler, but maybe
>>>> some endian issue exists?  The code looks quite ugly though ;)
>>>>
>>>> Richard.
>>>>
>>>>>> Alan Lawrence wrote:
>>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>> testsuite on ARM
>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>> following this
>>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>> (including
>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>> the
>>>>>>> expected
>>>>>>> 7 8 9
>>>>>>> whereas with gcc r221348, instead it prints
>>>>>>> 0 8 0
>>>>>>>
>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>> a
>>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>> sizes-gimplified type_0
>>>>>>> BLK
>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>> 0x2b9b8d428d20
>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>>> 0x2b9b8d092690 int>
>>>>>>>              SI file reduced.c line 12 col 7
>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>>              align 32 offset_align 64
>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>> context
>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>> D.6070>
>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>>> <type_decl
>>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>>
>>>>>>> The tree-optimized output is the same with both compilers (as this
>>> does not
>>>>>>> mention alignment); the expand output differs.
>>>>>>>
>>>>>>> Still investigating...
>>>>>>>
>>>>>>> --Alan
>>>>>>>
>>>>>>>
>>>>>>> Richard Biener wrote:
>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>>> drops alignment info unnecessarily.
>>>>>>>>
>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>>
>>>>>>>>  PR tree-optimization/65310
>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>>  alignment.
>>>>>>>>
>>>>>>>> Index: gcc/tree-sra.c
>>>>>>>>
>>> ===================================================================
>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>>    if (misalign != 0)
>>>>>>>>      align = (misalign & -misalign);
>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>> off);
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31  7:50             ` Richard Biener
  2015-03-31  9:43               ` Richard Earnshaw
@ 2015-03-31  9:50               ` Alan Lawrence
  1 sibling, 0 replies; 31+ messages in thread
From: Alan Lawrence @ 2015-03-31  9:50 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, gcc-patches, Marcus Shawcroft, Richard Earnshaw

Richard Biener wrote:
> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>
>> This means that
>>
>> Int I __aligned__(8);
>>
>> Is passed differently than int.
>>
>> Arm_function_arg needs to be fixed.
> 
> That is,
> 
> typedef int myint __attribute__((aligned(8)));
> 
> int main()
> {
>   myint i = 1;
>   int j = 2;
>   __builtin_printf("%d %d\n", i, j);
> }
> 
> or
> 
> myint i;
> int j;
> myint *p = &i;
> int *q = &j;
> 
> int main()
> {
>   __builtin_printf("%d %d", *p, *q);
> }
> 
> should behave the same.  There isn't a printf modifier for an "aligned int"
> because that sort of thing doesn't make sense.

Agreed. All of the cases you post do indeed behave the same, and correctly (they 
pass the format string in r0, the next argument in r1, and then r2). This is 
what my "aligned(16)" example was trying to achieve also. From the 
-fdump-rtl-expand of your last example:

(insn 7 6 8 2 (set (reg:SI 117)
         (mem:SI (reg/f:SI 116) [2 *_4+0 S4 A32])) richie2.c:10 -1
      (nil))
(insn 8 7 9 2 (set (reg/f:SI 118)
         (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])) richie2.c:10 -1
      (nil))
(insn 9 8 10 2 (set (reg/f:SI 119)
         (mem/f/c:SI (plus:SI (reg/f:SI 118)
                 (const_int 4 [0x4])) [1 p+0 S4 A32])) richie2.c:10 -1
      (nil))
(insn 10 9 11 2 (set (reg:SI 120)
         (mem:SI (reg/f:SI 119) [2 *_2+0 S4 A64])) richie2.c:10 -1      ***
      (nil))
(insn 11 10 12 2 (set (reg:SI 121)
         (symbol_ref/v/f:SI ("*.LC0") [flags 0x82]  <var_decl 0x2b2c3b35d240 
*.LC0>)) richie2.c:10 -1
      (nil))
(insn 12 11 13 2 (set (reg:SI 2 r2)
         (reg:SI 117)) richie2.c:10 -1
      (nil))
(insn 13 12 14 2 (set (reg:SI 1 r1)
         (reg:SI 120)) richie2.c:10 -1
      (nil))
(insn 14 13 15 2 (set (reg:SI 0 r0)
         (reg:SI 121)) richie2.c:10 -1
      (nil))

*** is the load of *p. The mem has alignment 64, but the type describing the 
loaded value *p, passed to arm_function_arg, has alignment 32. Even in the first 
of your examples, or even with an explicit cast to the aligned type:
__builtin_printf("%d %d\n", (myint) i, j);
we still get alignment 32 in arm_function_arg. It's only if SRA is applied, do 
we get the situation where the int with 64-bit alignment, is passed to 
arm_function_arg.

Disclaimer, I haven't tracked down how the alignment information flows through 
the compiler i.e. from build_ref_for_offset into expand_call. But it looks to me 
like something different is happening in the SRA case...no?

--Alan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31  9:43               ` Richard Earnshaw
@ 2015-03-31 10:00                 ` Richard Biener
  2015-03-31 10:10                   ` Richard Earnshaw
  2015-03-31 10:20                   ` Richard Biener
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Biener @ 2015-03-31 10:00 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 08:50, Richard Biener wrote:
> > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> >> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
> >>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> >>> it.
> >>>
> >>> The problem appears to be in laying out arguments, specifically
> >>> varargs. From
> >>> the "good" -fdump-rtl-expand:
> >>>
> >>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> >>> S4 A32])
> >>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> >>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> >>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>         (reg:SI 118)) reduced.c:14 -1
> >>>      (nil))
> >>> (call_insn 22 21 23 2 (parallel [
> >>>             (set (reg:SI 0 r0)
> >>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
> >>> A32])
> >>>
> >>> The struct members are
> >>> reg:SI 113 => int a;
> >>> reg:DF 112 => double b;
> >>> reg:SI 111 => int c;
> >>>
> >>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> >>> pushed
> >>> into virtual-outgoing-args. In contrast, post-change to
> >>> build_ref_of_offset, we get:
> >>>
> >>> (insn 17 16 18 2 (set (reg:SI 118)
> >>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
> >>> *.LC1>)) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> >>> virtual-outgoing-args)
> >>>                 (const_int 8 [0x8])) [0  S4 A64])
> >>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> >>> S8 A64])
> >>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> >>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>         (reg:SI 118)) reduced.c:14 -1
> >>>      (nil))
> >>> (call_insn 22 21 23 2 (parallel [
> >>>             (set (reg:SI 0 r0)
> >>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
> >>> A32])
> >>>
> >>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> >>>
> >>> double+following int are all pushed into virtual-outgoing-args. This is
> >>> because
> >>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> >>> argument (the
> >>> type constructed by build_ref_for_offset); it then executes
> >>> (aapcs_layout_arg,
> >>> arm.c line ~~5914)
> >>>
> >>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >>>      next even number.  */
> >>>   ncrn = pcum->aapcs_ncrn;
> >>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> >>>     ncrn++;
> >>>
> >>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> >>> testcase
> >>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> >>> 32-bit-aligned int instead, which works as previously.
> >>>
> >>> Passing the same members of that struct in a non-vargs call, works ok -
> >>> I think
> >>> because these use the type of the declared parameters, rather than the
> >>> provided
> >>> arguments, and the former do not have the increased alignment from
> >>> build_ref_for_offset.
> >>
> >> It doesn't make sense to use the alignment of passed values.  That looks like bs.
> >>
> >> This means that
> >>
> >> Int I __aligned__(8);
> >>
> >> Is passed differently than int.
> >>
> >> Arm_function_arg needs to be fixed.
> > 
> > That is,
> > 
> > typedef int myint __attribute__((aligned(8)));
> > 
> > int main()
> > {
> >   myint i = 1;
> >   int j = 2;
> >   __builtin_printf("%d %d\n", i, j);
> > }
> > 
> > or
> > 
> > myint i;
> > int j;
> > myint *p = &i;
> > int *q = &j;
> > 
> > int main()
> > {
> >   __builtin_printf("%d %d", *p, *q);
> > }
> > 
> > should behave the same.  There isn't a printf modifier for an "aligned int"
> > because that sort of thing doesn't make sense.  Special-casing aligned vs.
> > non-aligned values only makes sense for things passed by value on the stack.
> > And then obviously only dependent on the functuion type signature, not
> > on the type of the passed value.
> > 
> 
> I think the testcase is ill-formed.  Just because printf doesn't have
> such a modifier, doesn't mean that another variadic function might not
> have a means to detect when an object in the variadic list needs to be
> over-aligned.  As such, the test should really be written as:

A value doesn't have "alignment".  A function may have alignment
requirements on its arguments, clearly printf doesn't.

> typedef int myint __attribute__((aligned(8)));
> 
> int main()
> {
>   myint i = 1;
>   int j = 2;
>   __builtin_printf("%d %d\n", (int) i, j);
> }
> 
> Variadic functions take the types of their arguments from the types of
> the actuals passed.  The compiler should either be applying appropriate
> promotion rules to make the types conformant by the language
> specification or respecting the types exactly.  However, that should be
> done in the mid-end not the back-end.  If incorrect alignment
> information is passed to the back-end it can't help but make the wrong
> choice.  Examining the mode here wouldn't help either.  Consider:
> 
> int foo (int a, int b, int c, int d, int e, int f
> __attribute__((aligned(8))));
> 
> On ARM that should pass f in a 64-bit aligned location (since the
> parameter will be on the stack).
> 
> R.
> 
> 
> > Richard.
> > 
> >> Richard.
> >>
> >>>
> >>> FWIW, I also tried:
> >>>
> >>> __attribute__((__aligned__((16)))) int x;
> >>> int main (void)
> >>> {
> >>>   __builtin_printf("%d\n", x);
> >>> }
> >>>
> >>> but in that case, the arm_function_arg is still fed a type with
> >>> alignment 32
> >>> (bits), i.e. distinct from the type of the field 'x' in memory, which
> >>> has
> >>> alignment 128.
> >>>
> >>> --Alan
> >>>
> >>> Richard Biener wrote:
> >>>> On Mon, 30 Mar 2015, Richard Biener wrote:
> >>>>
> >>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
> >>>>>
> >>>>>> ...actually attach the testcase...
> >>>>> What compile options?
> >>>>
> >>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
> >>>> I can't see anything not guaranteeing that:
> >>>>
> >>>>         .section        .rodata
> >>>>         .align  3
> >>>> .LANCHOR0 = . + 0
> >>>> .LC1:
> >>>>         .ascii  "%d %g %d\012\000"
> >>>>         .space  6
> >>>> .LC0:
> >>>>         .word   7
> >>>>         .space  4
> >>>>         .word   0
> >>>>         .word   1075838976
> >>>>         .word   9
> >>>>         .space  4
> >>>>
> >>>> maybe there is some more generic code-gen bug for aligned aggregate
> >>>> copy?  That is, the patch tells the backend that the loads and
> >>>> stores to the 'int' vars (which have padding followed) is aligned
> >>>> to 8 bytes.
> >>>>
> >>>> I don't see what is wrong in the final assembler, but maybe
> >>>> some endian issue exists?  The code looks quite ugly though ;)
> >>>>
> >>>> Richard.
> >>>>
> >>>>>> Alan Lawrence wrote:
> >>>>>>> We've been seeing a bunch of new failures in the *libffi*
> >>> testsuite on ARM
> >>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
> >>> following this
> >>>>>>> one-liner fix. I've reduced the testcase down to the attached
> >>> (including
> >>>>>>> removing any dependency on libffi); with gcc r221347, this prints
> >>> the
> >>>>>>> expected
> >>>>>>> 7 8 9
> >>>>>>> whereas with gcc r221348, instead it prints
> >>>>>>> 0 8 0
> >>>>>>>
> >>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
> >>> a
> >>>>>>> var_decl of b1, from 32 to 64; both have type
> >>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
> >>> sizes-gimplified type_0
> >>>>>>> BLK
> >>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
> >>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
> >>>>>>>          align 64 symtab 0 alias set 1 canonical type
> >>> 0x2b9b8d428d20
> >>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
> >>>>>>> 0x2b9b8d092690 int>
> >>>>>>>              SI file reduced.c line 12 col 7
> >>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
> >>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
> >>>>>>>              align 32 offset_align 64
> >>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
> >>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
> >>> context
> >>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
> >>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
> >>> D.6070>
> >>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
> >>> <type_decl
> >>>>>>> 0x2b9b8d42b000 D.6044>>
> >>>>>>>
> >>>>>>> The tree-optimized output is the same with both compilers (as this
> >>> does not
> >>>>>>> mention alignment); the expand output differs.
> >>>>>>>
> >>>>>>> Still investigating...
> >>>>>>>
> >>>>>>> --Alan
> >>>>>>>
> >>>>>>>
> >>>>>>> Richard Biener wrote:
> >>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
> >>>>>>>> drops alignment info unnecessarily.
> >>>>>>>>
> >>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >>>>>>>>
> >>>>>>>> Richard.
> >>>>>>>>
> >>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
> >>>>>>>>
> >>>>>>>>  PR tree-optimization/65310
> >>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
> >>>>>>>>  alignment.
> >>>>>>>>
> >>>>>>>> Index: gcc/tree-sra.c
> >>>>>>>>
> >>> ===================================================================
> >>>>>>>> --- gcc/tree-sra.c       (revision 221324)
> >>>>>>>> +++ gcc/tree-sra.c       (working copy)
> >>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
> >>>>>>>>    misalign = (misalign + offset) & (align - 1);
> >>>>>>>>    if (misalign != 0)
> >>>>>>>>      align = (misalign & -misalign);
> >>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
> >>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
> >>>>>>>>      exp_type = build_aligned_type (exp_type, align);
> >>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
> >>> off);
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:00                 ` Richard Biener
@ 2015-03-31 10:10                   ` Richard Earnshaw
  2015-03-31 10:32                     ` Jakub Jelinek
  2015-03-31 10:36                     ` Richard Biener
  2015-03-31 10:20                   ` Richard Biener
  1 sibling, 2 replies; 31+ messages in thread
From: Richard Earnshaw @ 2015-03-31 10:10 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On 31/03/15 11:00, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 08:50, Richard Biener wrote:
>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>> it.
>>>>>
>>>>> The problem appears to be in laying out arguments, specifically
>>>>> varargs. From
>>>>> the "good" -fdump-rtl-expand:
>>>>>
>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>> S4 A32])
>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>      (nil))
>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>             (set (reg:SI 0 r0)
>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>> A32])
>>>>>
>>>>> The struct members are
>>>>> reg:SI 113 => int a;
>>>>> reg:DF 112 => double b;
>>>>> reg:SI 111 => int c;
>>>>>
>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>> pushed
>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>> build_ref_of_offset, we get:
>>>>>
>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>> *.LC1>)) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>> virtual-outgoing-args)
>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>> S8 A64])
>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>      (nil))
>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>             (set (reg:SI 0 r0)
>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>> A32])
>>>>>
>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>
>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>> because
>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>> argument (the
>>>>> type constructed by build_ref_for_offset); it then executes
>>>>> (aapcs_layout_arg,
>>>>> arm.c line ~~5914)
>>>>>
>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>      next even number.  */
>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>     ncrn++;
>>>>>
>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>> testcase
>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>
>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>> I think
>>>>> because these use the type of the declared parameters, rather than the
>>>>> provided
>>>>> arguments, and the former do not have the increased alignment from
>>>>> build_ref_for_offset.
>>>>
>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>>>
>>>> This means that
>>>>
>>>> Int I __aligned__(8);
>>>>
>>>> Is passed differently than int.
>>>>
>>>> Arm_function_arg needs to be fixed.
>>>
>>> That is,
>>>
>>> typedef int myint __attribute__((aligned(8)));
>>>
>>> int main()
>>> {
>>>   myint i = 1;
>>>   int j = 2;
>>>   __builtin_printf("%d %d\n", i, j);
>>> }
>>>
>>> or
>>>
>>> myint i;
>>> int j;
>>> myint *p = &i;
>>> int *q = &j;
>>>
>>> int main()
>>> {
>>>   __builtin_printf("%d %d", *p, *q);
>>> }
>>>
>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>> non-aligned values only makes sense for things passed by value on the stack.
>>> And then obviously only dependent on the functuion type signature, not
>>> on the type of the passed value.
>>>
>>
>> I think the testcase is ill-formed.  Just because printf doesn't have
>> such a modifier, doesn't mean that another variadic function might not
>> have a means to detect when an object in the variadic list needs to be
>> over-aligned.  As such, the test should really be written as:
> 
> A value doesn't have "alignment".  A function may have alignment
> requirements on its arguments, clearly printf doesn't.
> 

Values don't.  But types do and variadic functions are special in that
they derive their types from the types of the actual parameters passed
not from the formals in the prototype.  Any manipulation of the types
should be done in the front end, not in the back end.

R.

>> typedef int myint __attribute__((aligned(8)));
>>
>> int main()
>> {
>>   myint i = 1;
>>   int j = 2;
>>   __builtin_printf("%d %d\n", (int) i, j);
>> }
>>
>> Variadic functions take the types of their arguments from the types of
>> the actuals passed.  The compiler should either be applying appropriate
>> promotion rules to make the types conformant by the language
>> specification or respecting the types exactly.  However, that should be
>> done in the mid-end not the back-end.  If incorrect alignment
>> information is passed to the back-end it can't help but make the wrong
>> choice.  Examining the mode here wouldn't help either.  Consider:
>>
>> int foo (int a, int b, int c, int d, int e, int f
>> __attribute__((aligned(8))));
>>
>> On ARM that should pass f in a 64-bit aligned location (since the
>> parameter will be on the stack).
>>
>> R.
>>
>>
>>> Richard.
>>>
>>>> Richard.
>>>>
>>>>>
>>>>> FWIW, I also tried:
>>>>>
>>>>> __attribute__((__aligned__((16)))) int x;
>>>>> int main (void)
>>>>> {
>>>>>   __builtin_printf("%d\n", x);
>>>>> }
>>>>>
>>>>> but in that case, the arm_function_arg is still fed a type with
>>>>> alignment 32
>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
>>>>> has
>>>>> alignment 128.
>>>>>
>>>>> --Alan
>>>>>
>>>>> Richard Biener wrote:
>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>>>>
>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>>>>
>>>>>>>> ...actually attach the testcase...
>>>>>>> What compile options?
>>>>>>
>>>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>>>>> I can't see anything not guaranteeing that:
>>>>>>
>>>>>>         .section        .rodata
>>>>>>         .align  3
>>>>>> .LANCHOR0 = . + 0
>>>>>> .LC1:
>>>>>>         .ascii  "%d %g %d\012\000"
>>>>>>         .space  6
>>>>>> .LC0:
>>>>>>         .word   7
>>>>>>         .space  4
>>>>>>         .word   0
>>>>>>         .word   1075838976
>>>>>>         .word   9
>>>>>>         .space  4
>>>>>>
>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
>>>>>> copy?  That is, the patch tells the backend that the loads and
>>>>>> stores to the 'int' vars (which have padding followed) is aligned
>>>>>> to 8 bytes.
>>>>>>
>>>>>> I don't see what is wrong in the final assembler, but maybe
>>>>>> some endian issue exists?  The code looks quite ugly though ;)
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>> Alan Lawrence wrote:
>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>>>> testsuite on ARM
>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>>>> following this
>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>>>> (including
>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>>>> the
>>>>>>>>> expected
>>>>>>>>> 7 8 9
>>>>>>>>> whereas with gcc r221348, instead it prints
>>>>>>>>> 0 8 0
>>>>>>>>>
>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>>>> a
>>>>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>>>> sizes-gimplified type_0
>>>>>>>>> BLK
>>>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>>>> 0x2b9b8d428d20
>>>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>>>>> 0x2b9b8d092690 int>
>>>>>>>>>              SI file reduced.c line 12 col 7
>>>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>>>>              align 32 offset_align 64
>>>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>>>> context
>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>>>> D.6070>
>>>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>>>>> <type_decl
>>>>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>>>>
>>>>>>>>> The tree-optimized output is the same with both compilers (as this
>>>>> does not
>>>>>>>>> mention alignment); the expand output differs.
>>>>>>>>>
>>>>>>>>> Still investigating...
>>>>>>>>>
>>>>>>>>> --Alan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Richard Biener wrote:
>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>>>>> drops alignment info unnecessarily.
>>>>>>>>>>
>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>>>>
>>>>>>>>>>  PR tree-optimization/65310
>>>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>>>>  alignment.
>>>>>>>>>>
>>>>>>>>>> Index: gcc/tree-sra.c
>>>>>>>>>>
>>>>> ===================================================================
>>>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>>>>    if (misalign != 0)
>>>>>>>>>>      align = (misalign & -misalign);
>>>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>>>> off);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:00                 ` Richard Biener
  2015-03-31 10:10                   ` Richard Earnshaw
@ 2015-03-31 10:20                   ` Richard Biener
  2015-03-31 10:31                     ` Richard Earnshaw
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Biener @ 2015-03-31 10:20 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On Tue, 31 Mar 2015, Richard Biener wrote:

> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
> > On 31/03/15 08:50, Richard Biener wrote:
> > > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> > >> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
> > >>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> > >>> it.
> > >>>
> > >>> The problem appears to be in laying out arguments, specifically
> > >>> varargs. From
> > >>> the "good" -fdump-rtl-expand:
> > >>>
> > >>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> > >>> S4 A32])
> > >>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> > >>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> > >>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> > >>>         (reg:SI 118)) reduced.c:14 -1
> > >>>      (nil))
> > >>> (call_insn 22 21 23 2 (parallel [
> > >>>             (set (reg:SI 0 r0)
> > >>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> > >>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
> > >>> A32])
> > >>>
> > >>> The struct members are
> > >>> reg:SI 113 => int a;
> > >>> reg:DF 112 => double b;
> > >>> reg:SI 111 => int c;
> > >>>
> > >>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> > >>> pushed
> > >>> into virtual-outgoing-args. In contrast, post-change to
> > >>> build_ref_of_offset, we get:
> > >>>
> > >>> (insn 17 16 18 2 (set (reg:SI 118)
> > >>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
> > >>> *.LC1>)) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> > >>> virtual-outgoing-args)
> > >>>                 (const_int 8 [0x8])) [0  S4 A64])
> > >>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> > >>> S8 A64])
> > >>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> > >>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> > >>>         (reg:SI 118)) reduced.c:14 -1
> > >>>      (nil))
> > >>> (call_insn 22 21 23 2 (parallel [
> > >>>             (set (reg:SI 0 r0)
> > >>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> > >>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
> > >>> A32])
> > >>>
> > >>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> > >>>
> > >>> double+following int are all pushed into virtual-outgoing-args. This is
> > >>> because
> > >>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> > >>> argument (the
> > >>> type constructed by build_ref_for_offset); it then executes
> > >>> (aapcs_layout_arg,
> > >>> arm.c line ~~5914)
> > >>>
> > >>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> > >>>      next even number.  */
> > >>>   ncrn = pcum->aapcs_ncrn;
> > >>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> > >>>     ncrn++;
> > >>>
> > >>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> > >>> testcase
> > >>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> > >>> 32-bit-aligned int instead, which works as previously.
> > >>>
> > >>> Passing the same members of that struct in a non-vargs call, works ok -
> > >>> I think
> > >>> because these use the type of the declared parameters, rather than the
> > >>> provided
> > >>> arguments, and the former do not have the increased alignment from
> > >>> build_ref_for_offset.
> > >>
> > >> It doesn't make sense to use the alignment of passed values.  That looks like bs.
> > >>
> > >> This means that
> > >>
> > >> Int I __aligned__(8);
> > >>
> > >> Is passed differently than int.
> > >>
> > >> Arm_function_arg needs to be fixed.
> > > 
> > > That is,
> > > 
> > > typedef int myint __attribute__((aligned(8)));
> > > 
> > > int main()
> > > {
> > >   myint i = 1;
> > >   int j = 2;
> > >   __builtin_printf("%d %d\n", i, j);
> > > }
> > > 
> > > or
> > > 
> > > myint i;
> > > int j;
> > > myint *p = &i;
> > > int *q = &j;
> > > 
> > > int main()
> > > {
> > >   __builtin_printf("%d %d", *p, *q);
> > > }
> > > 
> > > should behave the same.  There isn't a printf modifier for an "aligned int"
> > > because that sort of thing doesn't make sense.  Special-casing aligned vs.
> > > non-aligned values only makes sense for things passed by value on the stack.
> > > And then obviously only dependent on the functuion type signature, not
> > > on the type of the passed value.
> > > 
> > 
> > I think the testcase is ill-formed.  Just because printf doesn't have
> > such a modifier, doesn't mean that another variadic function might not
> > have a means to detect when an object in the variadic list needs to be
> > over-aligned.  As such, the test should really be written as:
> 
> A value doesn't have "alignment".  A function may have alignment
> requirements on its arguments, clearly printf doesn't.

Btw, it looks like function_arg is supposed to pass whether the argument
is to the variadic part of the function or not, but it gets passed
false as in

      args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
                                                argpos < n_named_args);

n_named_args is 4.  That is because ARM asks to do this via

  if (type_arg_types != 0
      && targetm.calls.strict_argument_naming (args_so_far))
    ;
  else if (type_arg_types != 0
           && ! targetm.calls.pretend_outgoing_varargs_named 
(args_so_far))
    /* Don't include the last named arg.  */
    --n_named_args;
  else
    /* Treat all args as named.  */
    n_named_args = num_actuals;

thus you lose that info (you could have that readily available).

> > typedef int myint __attribute__((aligned(8)));
> > 
> > int main()
> > {
> >   myint i = 1;
> >   int j = 2;
> >   __builtin_printf("%d %d\n", (int) i, j);
> > }
> > 
> > Variadic functions take the types of their arguments from the types of
> > the actuals passed.  The compiler should either be applying appropriate
> > promotion rules to make the types conformant by the language
> > specification or respecting the types exactly.  However, that should be
> > done in the mid-end not the back-end.  If incorrect alignment
> > information is passed to the back-end it can't help but make the wrong
> > choice.  Examining the mode here wouldn't help either.  Consider:
> > 
> > int foo (int a, int b, int c, int d, int e, int f
> > __attribute__((aligned(8))));
> > 
> > On ARM that should pass f in a 64-bit aligned location (since the
> > parameter will be on the stack).
> > 
> > R.
> > 
> > 
> > > Richard.
> > > 
> > >> Richard.
> > >>
> > >>>
> > >>> FWIW, I also tried:
> > >>>
> > >>> __attribute__((__aligned__((16)))) int x;
> > >>> int main (void)
> > >>> {
> > >>>   __builtin_printf("%d\n", x);
> > >>> }
> > >>>
> > >>> but in that case, the arm_function_arg is still fed a type with
> > >>> alignment 32
> > >>> (bits), i.e. distinct from the type of the field 'x' in memory, which
> > >>> has
> > >>> alignment 128.
> > >>>
> > >>> --Alan
> > >>>
> > >>> Richard Biener wrote:
> > >>>> On Mon, 30 Mar 2015, Richard Biener wrote:
> > >>>>
> > >>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
> > >>>>>
> > >>>>>> ...actually attach the testcase...
> > >>>>> What compile options?
> > >>>>
> > >>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
> > >>>> I can't see anything not guaranteeing that:
> > >>>>
> > >>>>         .section        .rodata
> > >>>>         .align  3
> > >>>> .LANCHOR0 = . + 0
> > >>>> .LC1:
> > >>>>         .ascii  "%d %g %d\012\000"
> > >>>>         .space  6
> > >>>> .LC0:
> > >>>>         .word   7
> > >>>>         .space  4
> > >>>>         .word   0
> > >>>>         .word   1075838976
> > >>>>         .word   9
> > >>>>         .space  4
> > >>>>
> > >>>> maybe there is some more generic code-gen bug for aligned aggregate
> > >>>> copy?  That is, the patch tells the backend that the loads and
> > >>>> stores to the 'int' vars (which have padding followed) is aligned
> > >>>> to 8 bytes.
> > >>>>
> > >>>> I don't see what is wrong in the final assembler, but maybe
> > >>>> some endian issue exists?  The code looks quite ugly though ;)
> > >>>>
> > >>>> Richard.
> > >>>>
> > >>>>>> Alan Lawrence wrote:
> > >>>>>>> We've been seeing a bunch of new failures in the *libffi*
> > >>> testsuite on ARM
> > >>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
> > >>> following this
> > >>>>>>> one-liner fix. I've reduced the testcase down to the attached
> > >>> (including
> > >>>>>>> removing any dependency on libffi); with gcc r221347, this prints
> > >>> the
> > >>>>>>> expected
> > >>>>>>> 7 8 9
> > >>>>>>> whereas with gcc r221348, instead it prints
> > >>>>>>> 0 8 0
> > >>>>>>>
> > >>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
> > >>> a
> > >>>>>>> var_decl of b1, from 32 to 64; both have type
> > >>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
> > >>> sizes-gimplified type_0
> > >>>>>>> BLK
> > >>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
> > >>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
> > >>>>>>>          align 64 symtab 0 alias set 1 canonical type
> > >>> 0x2b9b8d428d20
> > >>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
> > >>>>>>> 0x2b9b8d092690 int>
> > >>>>>>>              SI file reduced.c line 12 col 7
> > >>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
> > >>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
> > >>>>>>>              align 32 offset_align 64
> > >>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
> > >>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
> > >>> context
> > >>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
> > >>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
> > >>> D.6070>
> > >>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
> > >>> <type_decl
> > >>>>>>> 0x2b9b8d42b000 D.6044>>
> > >>>>>>>
> > >>>>>>> The tree-optimized output is the same with both compilers (as this
> > >>> does not
> > >>>>>>> mention alignment); the expand output differs.
> > >>>>>>>
> > >>>>>>> Still investigating...
> > >>>>>>>
> > >>>>>>> --Alan
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Richard Biener wrote:
> > >>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
> > >>>>>>>> drops alignment info unnecessarily.
> > >>>>>>>>
> > >>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > >>>>>>>>
> > >>>>>>>> Richard.
> > >>>>>>>>
> > >>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
> > >>>>>>>>
> > >>>>>>>>  PR tree-optimization/65310
> > >>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
> > >>>>>>>>  alignment.
> > >>>>>>>>
> > >>>>>>>> Index: gcc/tree-sra.c
> > >>>>>>>>
> > >>> ===================================================================
> > >>>>>>>> --- gcc/tree-sra.c       (revision 221324)
> > >>>>>>>> +++ gcc/tree-sra.c       (working copy)
> > >>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
> > >>>>>>>>    misalign = (misalign + offset) & (align - 1);
> > >>>>>>>>    if (misalign != 0)
> > >>>>>>>>      align = (misalign & -misalign);
> > >>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
> > >>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
> > >>>>>>>>      exp_type = build_aligned_type (exp_type, align);
> > >>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
> > >>> off);
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>
> > >>
> > > 
> > 
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:20                   ` Richard Biener
@ 2015-03-31 10:31                     ` Richard Earnshaw
  2015-03-31 10:45                       ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Earnshaw @ 2015-03-31 10:31 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On 31/03/15 11:20, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Biener wrote:
> 
>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>
>>> On 31/03/15 08:50, Richard Biener wrote:
>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>>> it.
>>>>>>
>>>>>> The problem appears to be in laying out arguments, specifically
>>>>>> varargs. From
>>>>>> the "good" -fdump-rtl-expand:
>>>>>>
>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>> S4 A32])
>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>             (set (reg:SI 0 r0)
>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>> A32])
>>>>>>
>>>>>> The struct members are
>>>>>> reg:SI 113 => int a;
>>>>>> reg:DF 112 => double b;
>>>>>> reg:SI 111 => int c;
>>>>>>
>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>>> pushed
>>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>>> build_ref_of_offset, we get:
>>>>>>
>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>>> *.LC1>)) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>>> virtual-outgoing-args)
>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>> S8 A64])
>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>             (set (reg:SI 0 r0)
>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>> A32])
>>>>>>
>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>>
>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>>> because
>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>>> argument (the
>>>>>> type constructed by build_ref_for_offset); it then executes
>>>>>> (aapcs_layout_arg,
>>>>>> arm.c line ~~5914)
>>>>>>
>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>>      next even number.  */
>>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>>     ncrn++;
>>>>>>
>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>>> testcase
>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>>
>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>>> I think
>>>>>> because these use the type of the declared parameters, rather than the
>>>>>> provided
>>>>>> arguments, and the former do not have the increased alignment from
>>>>>> build_ref_for_offset.
>>>>>
>>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>>>>
>>>>> This means that
>>>>>
>>>>> Int I __aligned__(8);
>>>>>
>>>>> Is passed differently than int.
>>>>>
>>>>> Arm_function_arg needs to be fixed.
>>>>
>>>> That is,
>>>>
>>>> typedef int myint __attribute__((aligned(8)));
>>>>
>>>> int main()
>>>> {
>>>>   myint i = 1;
>>>>   int j = 2;
>>>>   __builtin_printf("%d %d\n", i, j);
>>>> }
>>>>
>>>> or
>>>>
>>>> myint i;
>>>> int j;
>>>> myint *p = &i;
>>>> int *q = &j;
>>>>
>>>> int main()
>>>> {
>>>>   __builtin_printf("%d %d", *p, *q);
>>>> }
>>>>
>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>>> non-aligned values only makes sense for things passed by value on the stack.
>>>> And then obviously only dependent on the functuion type signature, not
>>>> on the type of the passed value.
>>>>
>>>
>>> I think the testcase is ill-formed.  Just because printf doesn't have
>>> such a modifier, doesn't mean that another variadic function might not
>>> have a means to detect when an object in the variadic list needs to be
>>> over-aligned.  As such, the test should really be written as:
>>
>> A value doesn't have "alignment".  A function may have alignment
>> requirements on its arguments, clearly printf doesn't.
> 
> Btw, it looks like function_arg is supposed to pass whether the argument
> is to the variadic part of the function or not, but it gets passed
> false as in
> 
>       args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
>                                                 argpos < n_named_args);
> 
> n_named_args is 4.  That is because ARM asks to do this via
> 
>   if (type_arg_types != 0
>       && targetm.calls.strict_argument_naming (args_so_far))
>     ;
>   else if (type_arg_types != 0
>            && ! targetm.calls.pretend_outgoing_varargs_named 
> (args_so_far))
>     /* Don't include the last named arg.  */
>     --n_named_args;
>   else
>     /* Treat all args as named.  */
>     n_named_args = num_actuals;
> 
> thus you lose that info (you could have that readily available).
> 

From the calling side of a function it shouldn't matter.  They only
thing the caller has to know is that the function is variadic (and
therefore that the base-standard rules from the AAPCS apply -- no use of
FP registers for parameters).  The behaviour after that is *as if* all
the arguments were pushed onto the stack in the correct order and
finally the lowest 4 words are popped off the stack again into r0-r3.
Hence any alignment that would apply to arguments on the stack has to be
reflected in their allocation into r0-r3 (since the push/pop of those
registers never happens).

R.

>>> typedef int myint __attribute__((aligned(8)));
>>>
>>> int main()
>>> {
>>>   myint i = 1;
>>>   int j = 2;
>>>   __builtin_printf("%d %d\n", (int) i, j);
>>> }
>>>
>>> Variadic functions take the types of their arguments from the types of
>>> the actuals passed.  The compiler should either be applying appropriate
>>> promotion rules to make the types conformant by the language
>>> specification or respecting the types exactly.  However, that should be
>>> done in the mid-end not the back-end.  If incorrect alignment
>>> information is passed to the back-end it can't help but make the wrong
>>> choice.  Examining the mode here wouldn't help either.  Consider:
>>>
>>> int foo (int a, int b, int c, int d, int e, int f
>>> __attribute__((aligned(8))));
>>>
>>> On ARM that should pass f in a 64-bit aligned location (since the
>>> parameter will be on the stack).
>>>
>>> R.
>>>
>>>
>>>> Richard.
>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> FWIW, I also tried:
>>>>>>
>>>>>> __attribute__((__aligned__((16)))) int x;
>>>>>> int main (void)
>>>>>> {
>>>>>>   __builtin_printf("%d\n", x);
>>>>>> }
>>>>>>
>>>>>> but in that case, the arm_function_arg is still fed a type with
>>>>>> alignment 32
>>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
>>>>>> has
>>>>>> alignment 128.
>>>>>>
>>>>>> --Alan
>>>>>>
>>>>>> Richard Biener wrote:
>>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>>>>>
>>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>>>>>
>>>>>>>>> ...actually attach the testcase...
>>>>>>>> What compile options?
>>>>>>>
>>>>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>>>>>> I can't see anything not guaranteeing that:
>>>>>>>
>>>>>>>         .section        .rodata
>>>>>>>         .align  3
>>>>>>> .LANCHOR0 = . + 0
>>>>>>> .LC1:
>>>>>>>         .ascii  "%d %g %d\012\000"
>>>>>>>         .space  6
>>>>>>> .LC0:
>>>>>>>         .word   7
>>>>>>>         .space  4
>>>>>>>         .word   0
>>>>>>>         .word   1075838976
>>>>>>>         .word   9
>>>>>>>         .space  4
>>>>>>>
>>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
>>>>>>> copy?  That is, the patch tells the backend that the loads and
>>>>>>> stores to the 'int' vars (which have padding followed) is aligned
>>>>>>> to 8 bytes.
>>>>>>>
>>>>>>> I don't see what is wrong in the final assembler, but maybe
>>>>>>> some endian issue exists?  The code looks quite ugly though ;)
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>> Alan Lawrence wrote:
>>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>>>>> testsuite on ARM
>>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>>>>> following this
>>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>>>>> (including
>>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>>>>> the
>>>>>>>>>> expected
>>>>>>>>>> 7 8 9
>>>>>>>>>> whereas with gcc r221348, instead it prints
>>>>>>>>>> 0 8 0
>>>>>>>>>>
>>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>>>>> a
>>>>>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>>>>> sizes-gimplified type_0
>>>>>>>>>> BLK
>>>>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>>>>> 0x2b9b8d428d20
>>>>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>>>>>> 0x2b9b8d092690 int>
>>>>>>>>>>              SI file reduced.c line 12 col 7
>>>>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>>>>>              align 32 offset_align 64
>>>>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>>>>> context
>>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>>>>> D.6070>
>>>>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>>>>>> <type_decl
>>>>>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>>>>>
>>>>>>>>>> The tree-optimized output is the same with both compilers (as this
>>>>>> does not
>>>>>>>>>> mention alignment); the expand output differs.
>>>>>>>>>>
>>>>>>>>>> Still investigating...
>>>>>>>>>>
>>>>>>>>>> --Alan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Richard Biener wrote:
>>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>>>>>> drops alignment info unnecessarily.
>>>>>>>>>>>
>>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>>>>>
>>>>>>>>>>>  PR tree-optimization/65310
>>>>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>>>>>  alignment.
>>>>>>>>>>>
>>>>>>>>>>> Index: gcc/tree-sra.c
>>>>>>>>>>>
>>>>>> ===================================================================
>>>>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>>>>>    if (misalign != 0)
>>>>>>>>>>>      align = (misalign & -misalign);
>>>>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>>>>> off);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:10                   ` Richard Earnshaw
@ 2015-03-31 10:32                     ` Jakub Jelinek
  2015-03-31 10:36                     ` Richard Biener
  1 sibling, 0 replies; 31+ messages in thread
From: Jakub Jelinek @ 2015-03-31 10:32 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Richard Biener, Richard Biener, Alan Lawrence, gcc-patches,
	Marcus Shawcroft

On Tue, Mar 31, 2015 at 11:10:39AM +0100, Richard Earnshaw wrote:
> >>> That is,
> >>>
> >>> typedef int myint __attribute__((aligned(8)));
> >>>
> >>> int main()
> >>> {
> >>>   myint i = 1;
> >>>   int j = 2;
> >>>   __builtin_printf("%d %d\n", i, j);
> >>> }
> >>>
> >>> or
> >>>
> >>> myint i;
> >>> int j;
> >>> myint *p = &i;
> >>> int *q = &j;
> >>>
> >>> int main()
> >>> {
> >>>   __builtin_printf("%d %d", *p, *q);
> >>> }

Note that starting with r221348, gcc fails to profiledbootstrap on
armv7hl-linux-gnueabi.  I'd hope it is the same thing.

To middle-end, all integral type conversions that differ just in alignment
are useless - for INTEGRAL_TYPE_P all useless_type_conversion_p cares about
is sign and precision.

So, if arm has a weirdo ABI that wants to pass aligned types differently
(why, I'd say it is just a bug in the ABI), then for named arguments it
really has to look at the function type - the type of the argument, rather
than the passed in value's type, and for varargs it would be best if it
remembered such thing early (in the FEs), because the middle-end can change
things any time, with or without Richard B.'s recent SRA change.

	Jakub

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:10                   ` Richard Earnshaw
  2015-03-31 10:32                     ` Jakub Jelinek
@ 2015-03-31 10:36                     ` Richard Biener
  2015-03-31 10:40                       ` Richard Earnshaw
  2015-03-31 10:47                       ` Alan Lawrence
  1 sibling, 2 replies; 31+ messages in thread
From: Richard Biener @ 2015-03-31 10:36 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 11:00, Richard Biener wrote:
> > On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> > 
> >> On 31/03/15 08:50, Richard Biener wrote:
> >>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> >>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
> >>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> >>>>> it.
> >>>>>
> >>>>> The problem appears to be in laying out arguments, specifically
> >>>>> varargs. From
> >>>>> the "good" -fdump-rtl-expand:
> >>>>>
> >>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>> S4 A32])
> >>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>      (nil))
> >>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> >>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>      (nil))
> >>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> >>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>      (nil))
> >>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>      (nil))
> >>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>             (set (reg:SI 0 r0)
> >>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>> A32])
> >>>>>
> >>>>> The struct members are
> >>>>> reg:SI 113 => int a;
> >>>>> reg:DF 112 => double b;
> >>>>> reg:SI 111 => int c;
> >>>>>
> >>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> >>>>> pushed
> >>>>> into virtual-outgoing-args. In contrast, post-change to
> >>>>> build_ref_of_offset, we get:
> >>>>>
> >>>>> (insn 17 16 18 2 (set (reg:SI 118)
> >>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
> >>>>> *.LC1>)) reduced.c:14 -1
> >>>>>      (nil))
> >>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> >>>>> virtual-outgoing-args)
> >>>>>                 (const_int 8 [0x8])) [0  S4 A64])
> >>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>      (nil))
> >>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>> S8 A64])
> >>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>      (nil))
> >>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> >>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>      (nil))
> >>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>      (nil))
> >>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>             (set (reg:SI 0 r0)
> >>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>> A32])
> >>>>>
> >>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> >>>>>
> >>>>> double+following int are all pushed into virtual-outgoing-args. This is
> >>>>> because
> >>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> >>>>> argument (the
> >>>>> type constructed by build_ref_for_offset); it then executes
> >>>>> (aapcs_layout_arg,
> >>>>> arm.c line ~~5914)
> >>>>>
> >>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >>>>>      next even number.  */
> >>>>>   ncrn = pcum->aapcs_ncrn;
> >>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> >>>>>     ncrn++;
> >>>>>
> >>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> >>>>> testcase
> >>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> >>>>> 32-bit-aligned int instead, which works as previously.
> >>>>>
> >>>>> Passing the same members of that struct in a non-vargs call, works ok -
> >>>>> I think
> >>>>> because these use the type of the declared parameters, rather than the
> >>>>> provided
> >>>>> arguments, and the former do not have the increased alignment from
> >>>>> build_ref_for_offset.
> >>>>
> >>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
> >>>>
> >>>> This means that
> >>>>
> >>>> Int I __aligned__(8);
> >>>>
> >>>> Is passed differently than int.
> >>>>
> >>>> Arm_function_arg needs to be fixed.
> >>>
> >>> That is,
> >>>
> >>> typedef int myint __attribute__((aligned(8)));
> >>>
> >>> int main()
> >>> {
> >>>   myint i = 1;
> >>>   int j = 2;
> >>>   __builtin_printf("%d %d\n", i, j);
> >>> }
> >>>
> >>> or
> >>>
> >>> myint i;
> >>> int j;
> >>> myint *p = &i;
> >>> int *q = &j;
> >>>
> >>> int main()
> >>> {
> >>>   __builtin_printf("%d %d", *p, *q);
> >>> }
> >>>
> >>> should behave the same.  There isn't a printf modifier for an "aligned int"
> >>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
> >>> non-aligned values only makes sense for things passed by value on the stack.
> >>> And then obviously only dependent on the functuion type signature, not
> >>> on the type of the passed value.
> >>>
> >>
> >> I think the testcase is ill-formed.  Just because printf doesn't have
> >> such a modifier, doesn't mean that another variadic function might not
> >> have a means to detect when an object in the variadic list needs to be
> >> over-aligned.  As such, the test should really be written as:
> > 
> > A value doesn't have "alignment".  A function may have alignment
> > requirements on its arguments, clearly printf doesn't.
> > 
> 
> Values don't.  But types do and variadic functions are special in that
> they derive their types from the types of the actual parameters passed
> not from the formals in the prototype.  Any manipulation of the types
> should be done in the front end, not in the back end.

The following seems to help the testcase (by luck I'd say?).  It
makes us drop alignment information from the temporaries that
SRA creates as memory replacement.

But I find it odd that on ARM passing *((aligned_int *)p) as
vararg (only as varargs?) changes calling conventions independent
of the functions type signature.

Richard.

Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c      (revision 221770)
+++ gcc/tree-sra.c      (working copy)
@@ -2012,7 +2012,9 @@ create_access_replacement (struct access
       DECL_CONTEXT (repl) = current_function_decl;
     }
   else
-    repl = create_tmp_var (access->type, "SR");
+    repl = create_tmp_var (build_qualified_type
+                            (TYPE_MAIN_VARIANT (access->type),
+                             TYPE_QUALS (access->type)), "SR");
   if (TREE_CODE (access->type) == COMPLEX_TYPE
       || TREE_CODE (access->type) == VECTOR_TYPE)
     {

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:36                     ` Richard Biener
@ 2015-03-31 10:40                       ` Richard Earnshaw
  2015-03-31 10:45                         ` Richard Biener
  2015-03-31 10:47                       ` Alan Lawrence
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Earnshaw @ 2015-03-31 10:40 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On 31/03/15 11:36, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 11:00, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>
>>>> On 31/03/15 08:50, Richard Biener wrote:
>>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>>>> it.
>>>>>>>
>>>>>>> The problem appears to be in laying out arguments, specifically
>>>>>>> varargs. From
>>>>>>> the "good" -fdump-rtl-expand:
>>>>>>>
>>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>> S4 A32])
>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>      (nil))
>>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>      (nil))
>>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>      (nil))
>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>      (nil))
>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>> A32])
>>>>>>>
>>>>>>> The struct members are
>>>>>>> reg:SI 113 => int a;
>>>>>>> reg:DF 112 => double b;
>>>>>>> reg:SI 111 => int c;
>>>>>>>
>>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>>>> pushed
>>>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>>>> build_ref_of_offset, we get:
>>>>>>>
>>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>>>> *.LC1>)) reduced.c:14 -1
>>>>>>>      (nil))
>>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>>>> virtual-outgoing-args)
>>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>      (nil))
>>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>> S8 A64])
>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>      (nil))
>>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>      (nil))
>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>      (nil))
>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>> A32])
>>>>>>>
>>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>>>
>>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>>>> because
>>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>>>> argument (the
>>>>>>> type constructed by build_ref_for_offset); it then executes
>>>>>>> (aapcs_layout_arg,
>>>>>>> arm.c line ~~5914)
>>>>>>>
>>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>>>      next even number.  */
>>>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>>>     ncrn++;
>>>>>>>
>>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>>>> testcase
>>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>>>
>>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>>>> I think
>>>>>>> because these use the type of the declared parameters, rather than the
>>>>>>> provided
>>>>>>> arguments, and the former do not have the increased alignment from
>>>>>>> build_ref_for_offset.
>>>>>>
>>>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>>>>>
>>>>>> This means that
>>>>>>
>>>>>> Int I __aligned__(8);
>>>>>>
>>>>>> Is passed differently than int.
>>>>>>
>>>>>> Arm_function_arg needs to be fixed.
>>>>>
>>>>> That is,
>>>>>
>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>
>>>>> int main()
>>>>> {
>>>>>   myint i = 1;
>>>>>   int j = 2;
>>>>>   __builtin_printf("%d %d\n", i, j);
>>>>> }
>>>>>
>>>>> or
>>>>>
>>>>> myint i;
>>>>> int j;
>>>>> myint *p = &i;
>>>>> int *q = &j;
>>>>>
>>>>> int main()
>>>>> {
>>>>>   __builtin_printf("%d %d", *p, *q);
>>>>> }
>>>>>
>>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>>>> non-aligned values only makes sense for things passed by value on the stack.
>>>>> And then obviously only dependent on the functuion type signature, not
>>>>> on the type of the passed value.
>>>>>
>>>>
>>>> I think the testcase is ill-formed.  Just because printf doesn't have
>>>> such a modifier, doesn't mean that another variadic function might not
>>>> have a means to detect when an object in the variadic list needs to be
>>>> over-aligned.  As such, the test should really be written as:
>>>
>>> A value doesn't have "alignment".  A function may have alignment
>>> requirements on its arguments, clearly printf doesn't.
>>>
>>
>> Values don't.  But types do and variadic functions are special in that
>> they derive their types from the types of the actual parameters passed
>> not from the formals in the prototype.  Any manipulation of the types
>> should be done in the front end, not in the back end.
> 
> The following seems to help the testcase (by luck I'd say?).  It
> makes us drop alignment information from the temporaries that
> SRA creates as memory replacement.
> 
> But I find it odd that on ARM passing *((aligned_int *)p) as
> vararg (only as varargs?) changes calling conventions independent
> of the functions type signature.
> 
> Richard.
> 
> Index: gcc/tree-sra.c
> ===================================================================
> --- gcc/tree-sra.c      (revision 221770)
> +++ gcc/tree-sra.c      (working copy)
> @@ -2012,7 +2012,9 @@ create_access_replacement (struct access
>        DECL_CONTEXT (repl) = current_function_decl;
>      }
>    else
> -    repl = create_tmp_var (access->type, "SR");
> +    repl = create_tmp_var (build_qualified_type
> +                            (TYPE_MAIN_VARIANT (access->type),
> +                             TYPE_QUALS (access->type)), "SR");
>    if (TREE_CODE (access->type) == COMPLEX_TYPE
>        || TREE_CODE (access->type) == VECTOR_TYPE)
>      {
> 

I guess we could do something similar (use TYPE_MAIN_VARIANT) in the
backend code - but then we'd always ignore any over-alignment
requirements (or under, for that matter).

R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:40                       ` Richard Earnshaw
@ 2015-03-31 10:45                         ` Richard Biener
  2015-03-31 10:51                           ` Richard Earnshaw
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2015-03-31 10:45 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 11:36, Richard Biener wrote:
> > On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> > 
> >> On 31/03/15 11:00, Richard Biener wrote:
> >>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> >>>
> >>>> On 31/03/15 08:50, Richard Biener wrote:
> >>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> >>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
> >>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> >>>>>>> it.
> >>>>>>>
> >>>>>>> The problem appears to be in laying out arguments, specifically
> >>>>>>> varargs. From
> >>>>>>> the "good" -fdump-rtl-expand:
> >>>>>>>
> >>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>>>> S4 A32])
> >>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>>>      (nil))
> >>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> >>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>>>      (nil))
> >>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> >>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>>>      (nil))
> >>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>>>      (nil))
> >>>>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>>>             (set (reg:SI 0 r0)
> >>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>>>> A32])
> >>>>>>>
> >>>>>>> The struct members are
> >>>>>>> reg:SI 113 => int a;
> >>>>>>> reg:DF 112 => double b;
> >>>>>>> reg:SI 111 => int c;
> >>>>>>>
> >>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> >>>>>>> pushed
> >>>>>>> into virtual-outgoing-args. In contrast, post-change to
> >>>>>>> build_ref_of_offset, we get:
> >>>>>>>
> >>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
> >>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
> >>>>>>> *.LC1>)) reduced.c:14 -1
> >>>>>>>      (nil))
> >>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> >>>>>>> virtual-outgoing-args)
> >>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
> >>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>>>      (nil))
> >>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>>>> S8 A64])
> >>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>>>      (nil))
> >>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> >>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>>>      (nil))
> >>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>>>      (nil))
> >>>>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>>>             (set (reg:SI 0 r0)
> >>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>>>> A32])
> >>>>>>>
> >>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> >>>>>>>
> >>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
> >>>>>>> because
> >>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> >>>>>>> argument (the
> >>>>>>> type constructed by build_ref_for_offset); it then executes
> >>>>>>> (aapcs_layout_arg,
> >>>>>>> arm.c line ~~5914)
> >>>>>>>
> >>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >>>>>>>      next even number.  */
> >>>>>>>   ncrn = pcum->aapcs_ncrn;
> >>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> >>>>>>>     ncrn++;
> >>>>>>>
> >>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> >>>>>>> testcase
> >>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> >>>>>>> 32-bit-aligned int instead, which works as previously.
> >>>>>>>
> >>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
> >>>>>>> I think
> >>>>>>> because these use the type of the declared parameters, rather than the
> >>>>>>> provided
> >>>>>>> arguments, and the former do not have the increased alignment from
> >>>>>>> build_ref_for_offset.
> >>>>>>
> >>>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
> >>>>>>
> >>>>>> This means that
> >>>>>>
> >>>>>> Int I __aligned__(8);
> >>>>>>
> >>>>>> Is passed differently than int.
> >>>>>>
> >>>>>> Arm_function_arg needs to be fixed.
> >>>>>
> >>>>> That is,
> >>>>>
> >>>>> typedef int myint __attribute__((aligned(8)));
> >>>>>
> >>>>> int main()
> >>>>> {
> >>>>>   myint i = 1;
> >>>>>   int j = 2;
> >>>>>   __builtin_printf("%d %d\n", i, j);
> >>>>> }
> >>>>>
> >>>>> or
> >>>>>
> >>>>> myint i;
> >>>>> int j;
> >>>>> myint *p = &i;
> >>>>> int *q = &j;
> >>>>>
> >>>>> int main()
> >>>>> {
> >>>>>   __builtin_printf("%d %d", *p, *q);
> >>>>> }
> >>>>>
> >>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
> >>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
> >>>>> non-aligned values only makes sense for things passed by value on the stack.
> >>>>> And then obviously only dependent on the functuion type signature, not
> >>>>> on the type of the passed value.
> >>>>>
> >>>>
> >>>> I think the testcase is ill-formed.  Just because printf doesn't have
> >>>> such a modifier, doesn't mean that another variadic function might not
> >>>> have a means to detect when an object in the variadic list needs to be
> >>>> over-aligned.  As such, the test should really be written as:
> >>>
> >>> A value doesn't have "alignment".  A function may have alignment
> >>> requirements on its arguments, clearly printf doesn't.
> >>>
> >>
> >> Values don't.  But types do and variadic functions are special in that
> >> they derive their types from the types of the actual parameters passed
> >> not from the formals in the prototype.  Any manipulation of the types
> >> should be done in the front end, not in the back end.
> > 
> > The following seems to help the testcase (by luck I'd say?).  It
> > makes us drop alignment information from the temporaries that
> > SRA creates as memory replacement.
> > 
> > But I find it odd that on ARM passing *((aligned_int *)p) as
> > vararg (only as varargs?) changes calling conventions independent
> > of the functions type signature.
> > 
> > Richard.
> > 
> > Index: gcc/tree-sra.c
> > ===================================================================
> > --- gcc/tree-sra.c      (revision 221770)
> > +++ gcc/tree-sra.c      (working copy)
> > @@ -2012,7 +2012,9 @@ create_access_replacement (struct access
> >        DECL_CONTEXT (repl) = current_function_decl;
> >      }
> >    else
> > -    repl = create_tmp_var (access->type, "SR");
> > +    repl = create_tmp_var (build_qualified_type
> > +                            (TYPE_MAIN_VARIANT (access->type),
> > +                             TYPE_QUALS (access->type)), "SR");
> >    if (TREE_CODE (access->type) == COMPLEX_TYPE
> >        || TREE_CODE (access->type) == VECTOR_TYPE)
> >      {
> > 
> 
> I guess we could do something similar (use TYPE_MAIN_VARIANT) in the
> backend code - but then we'd always ignore any over-alignment
> requirements (or under, for that matter).

The question is whether those are even in the scope of AACPS...

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:31                     ` Richard Earnshaw
@ 2015-03-31 10:45                       ` Richard Biener
  2015-03-31 10:53                         ` Richard Earnshaw
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2015-03-31 10:45 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 11:20, Richard Biener wrote:
> > On Tue, 31 Mar 2015, Richard Biener wrote:
> > 
> >> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> >>
> >>> On 31/03/15 08:50, Richard Biener wrote:
> >>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> >>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
> >>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> >>>>>> it.
> >>>>>>
> >>>>>> The problem appears to be in laying out arguments, specifically
> >>>>>> varargs. From
> >>>>>> the "good" -fdump-rtl-expand:
> >>>>>>
> >>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>>> S4 A32])
> >>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> >>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> >>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>>             (set (reg:SI 0 r0)
> >>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>>> A32])
> >>>>>>
> >>>>>> The struct members are
> >>>>>> reg:SI 113 => int a;
> >>>>>> reg:DF 112 => double b;
> >>>>>> reg:SI 111 => int c;
> >>>>>>
> >>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> >>>>>> pushed
> >>>>>> into virtual-outgoing-args. In contrast, post-change to
> >>>>>> build_ref_of_offset, we get:
> >>>>>>
> >>>>>> (insn 17 16 18 2 (set (reg:SI 118)
> >>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
> >>>>>> *.LC1>)) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> >>>>>> virtual-outgoing-args)
> >>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
> >>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>>> S8 A64])
> >>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> >>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>>             (set (reg:SI 0 r0)
> >>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>>> A32])
> >>>>>>
> >>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> >>>>>>
> >>>>>> double+following int are all pushed into virtual-outgoing-args. This is
> >>>>>> because
> >>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> >>>>>> argument (the
> >>>>>> type constructed by build_ref_for_offset); it then executes
> >>>>>> (aapcs_layout_arg,
> >>>>>> arm.c line ~~5914)
> >>>>>>
> >>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >>>>>>      next even number.  */
> >>>>>>   ncrn = pcum->aapcs_ncrn;
> >>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> >>>>>>     ncrn++;
> >>>>>>
> >>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> >>>>>> testcase
> >>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> >>>>>> 32-bit-aligned int instead, which works as previously.
> >>>>>>
> >>>>>> Passing the same members of that struct in a non-vargs call, works ok -
> >>>>>> I think
> >>>>>> because these use the type of the declared parameters, rather than the
> >>>>>> provided
> >>>>>> arguments, and the former do not have the increased alignment from
> >>>>>> build_ref_for_offset.
> >>>>>
> >>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
> >>>>>
> >>>>> This means that
> >>>>>
> >>>>> Int I __aligned__(8);
> >>>>>
> >>>>> Is passed differently than int.
> >>>>>
> >>>>> Arm_function_arg needs to be fixed.
> >>>>
> >>>> That is,
> >>>>
> >>>> typedef int myint __attribute__((aligned(8)));
> >>>>
> >>>> int main()
> >>>> {
> >>>>   myint i = 1;
> >>>>   int j = 2;
> >>>>   __builtin_printf("%d %d\n", i, j);
> >>>> }
> >>>>
> >>>> or
> >>>>
> >>>> myint i;
> >>>> int j;
> >>>> myint *p = &i;
> >>>> int *q = &j;
> >>>>
> >>>> int main()
> >>>> {
> >>>>   __builtin_printf("%d %d", *p, *q);
> >>>> }
> >>>>
> >>>> should behave the same.  There isn't a printf modifier for an "aligned int"
> >>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
> >>>> non-aligned values only makes sense for things passed by value on the stack.
> >>>> And then obviously only dependent on the functuion type signature, not
> >>>> on the type of the passed value.
> >>>>
> >>>
> >>> I think the testcase is ill-formed.  Just because printf doesn't have
> >>> such a modifier, doesn't mean that another variadic function might not
> >>> have a means to detect when an object in the variadic list needs to be
> >>> over-aligned.  As such, the test should really be written as:
> >>
> >> A value doesn't have "alignment".  A function may have alignment
> >> requirements on its arguments, clearly printf doesn't.
> > 
> > Btw, it looks like function_arg is supposed to pass whether the argument
> > is to the variadic part of the function or not, but it gets passed
> > false as in
> > 
> >       args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
> >                                                 argpos < n_named_args);
> > 
> > n_named_args is 4.  That is because ARM asks to do this via
> > 
> >   if (type_arg_types != 0
> >       && targetm.calls.strict_argument_naming (args_so_far))
> >     ;
> >   else if (type_arg_types != 0
> >            && ! targetm.calls.pretend_outgoing_varargs_named 
> > (args_so_far))
> >     /* Don't include the last named arg.  */
> >     --n_named_args;
> >   else
> >     /* Treat all args as named.  */
> >     n_named_args = num_actuals;
> > 
> > thus you lose that info (you could have that readily available).
> > 
> 
> From the calling side of a function it shouldn't matter.  They only
> thing the caller has to know is that the function is variadic (and
> therefore that the base-standard rules from the AAPCS apply -- no use of
> FP registers for parameters).  The behaviour after that is *as if* all
> the arguments were pushed onto the stack in the correct order and
> finally the lowest 4 words are popped off the stack again into r0-r3.
> Hence any alignment that would apply to arguments on the stack has to be
> reflected in their allocation into r0-r3 (since the push/pop of those
> registers never happens).

But how do you compute the alignment of, say a myint '1'?  Of course
__attribute__((aligned())) is a C extension thus AAPCS likely doesn't 
consider it.

But yes, as given even on x86_64 we might spill a 8-byte aligned
int register to a 8-byte aligned stack slot so the proposed patch
makes sense in that regard.  I wonder how many other passes can
get confused by this (PRE, I suppose, inserting an explicitely
aligned load as well as all passes after the vectorizer which
also creates loads/store with explicit (usually lower) alignment).

Richard.

> R.
> 
> >>> typedef int myint __attribute__((aligned(8)));
> >>>
> >>> int main()
> >>> {
> >>>   myint i = 1;
> >>>   int j = 2;
> >>>   __builtin_printf("%d %d\n", (int) i, j);
> >>> }
> >>>
> >>> Variadic functions take the types of their arguments from the types of
> >>> the actuals passed.  The compiler should either be applying appropriate
> >>> promotion rules to make the types conformant by the language
> >>> specification or respecting the types exactly.  However, that should be
> >>> done in the mid-end not the back-end.  If incorrect alignment
> >>> information is passed to the back-end it can't help but make the wrong
> >>> choice.  Examining the mode here wouldn't help either.  Consider:
> >>>
> >>> int foo (int a, int b, int c, int d, int e, int f
> >>> __attribute__((aligned(8))));
> >>>
> >>> On ARM that should pass f in a 64-bit aligned location (since the
> >>> parameter will be on the stack).
> >>>
> >>> R.
> >>>
> >>>
> >>>> Richard.
> >>>>
> >>>>> Richard.
> >>>>>
> >>>>>>
> >>>>>> FWIW, I also tried:
> >>>>>>
> >>>>>> __attribute__((__aligned__((16)))) int x;
> >>>>>> int main (void)
> >>>>>> {
> >>>>>>   __builtin_printf("%d\n", x);
> >>>>>> }
> >>>>>>
> >>>>>> but in that case, the arm_function_arg is still fed a type with
> >>>>>> alignment 32
> >>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
> >>>>>> has
> >>>>>> alignment 128.
> >>>>>>
> >>>>>> --Alan
> >>>>>>
> >>>>>> Richard Biener wrote:
> >>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
> >>>>>>>
> >>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
> >>>>>>>>
> >>>>>>>>> ...actually attach the testcase...
> >>>>>>>> What compile options?
> >>>>>>>
> >>>>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
> >>>>>>> I can't see anything not guaranteeing that:
> >>>>>>>
> >>>>>>>         .section        .rodata
> >>>>>>>         .align  3
> >>>>>>> .LANCHOR0 = . + 0
> >>>>>>> .LC1:
> >>>>>>>         .ascii  "%d %g %d\012\000"
> >>>>>>>         .space  6
> >>>>>>> .LC0:
> >>>>>>>         .word   7
> >>>>>>>         .space  4
> >>>>>>>         .word   0
> >>>>>>>         .word   1075838976
> >>>>>>>         .word   9
> >>>>>>>         .space  4
> >>>>>>>
> >>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
> >>>>>>> copy?  That is, the patch tells the backend that the loads and
> >>>>>>> stores to the 'int' vars (which have padding followed) is aligned
> >>>>>>> to 8 bytes.
> >>>>>>>
> >>>>>>> I don't see what is wrong in the final assembler, but maybe
> >>>>>>> some endian issue exists?  The code looks quite ugly though ;)
> >>>>>>>
> >>>>>>> Richard.
> >>>>>>>
> >>>>>>>>> Alan Lawrence wrote:
> >>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
> >>>>>> testsuite on ARM
> >>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
> >>>>>> following this
> >>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
> >>>>>> (including
> >>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
> >>>>>> the
> >>>>>>>>>> expected
> >>>>>>>>>> 7 8 9
> >>>>>>>>>> whereas with gcc r221348, instead it prints
> >>>>>>>>>> 0 8 0
> >>>>>>>>>>
> >>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
> >>>>>> a
> >>>>>>>>>> var_decl of b1, from 32 to 64; both have type
> >>>>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
> >>>>>> sizes-gimplified type_0
> >>>>>>>>>> BLK
> >>>>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
> >>>>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
> >>>>>>>>>>          align 64 symtab 0 alias set 1 canonical type
> >>>>>> 0x2b9b8d428d20
> >>>>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
> >>>>>>>>>> 0x2b9b8d092690 int>
> >>>>>>>>>>              SI file reduced.c line 12 col 7
> >>>>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
> >>>>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
> >>>>>>>>>>              align 32 offset_align 64
> >>>>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
> >>>>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
> >>>>>> context
> >>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
> >>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
> >>>>>> D.6070>
> >>>>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
> >>>>>> <type_decl
> >>>>>>>>>> 0x2b9b8d42b000 D.6044>>
> >>>>>>>>>>
> >>>>>>>>>> The tree-optimized output is the same with both compilers (as this
> >>>>>> does not
> >>>>>>>>>> mention alignment); the expand output differs.
> >>>>>>>>>>
> >>>>>>>>>> Still investigating...
> >>>>>>>>>>
> >>>>>>>>>> --Alan
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Richard Biener wrote:
> >>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
> >>>>>>>>>>> drops alignment info unnecessarily.
> >>>>>>>>>>>
> >>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >>>>>>>>>>>
> >>>>>>>>>>> Richard.
> >>>>>>>>>>>
> >>>>>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
> >>>>>>>>>>>
> >>>>>>>>>>>  PR tree-optimization/65310
> >>>>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
> >>>>>>>>>>>  alignment.
> >>>>>>>>>>>
> >>>>>>>>>>> Index: gcc/tree-sra.c
> >>>>>>>>>>>
> >>>>>> ===================================================================
> >>>>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
> >>>>>>>>>>> +++ gcc/tree-sra.c       (working copy)
> >>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
> >>>>>>>>>>>    misalign = (misalign + offset) & (align - 1);
> >>>>>>>>>>>    if (misalign != 0)
> >>>>>>>>>>>      align = (misalign & -misalign);
> >>>>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
> >>>>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
> >>>>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
> >>>>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
> >>>>>> off);
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >>
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:36                     ` Richard Biener
  2015-03-31 10:40                       ` Richard Earnshaw
@ 2015-03-31 10:47                       ` Alan Lawrence
  2015-03-31 11:05                         ` Richard Biener
  2015-03-31 11:07                         ` Jakub Jelinek
  1 sibling, 2 replies; 31+ messages in thread
From: Alan Lawrence @ 2015-03-31 10:47 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Earnshaw, Richard Biener, gcc-patches, Marcus Shawcroft

Richard Biener wrote:
>
> But I find it odd that on ARM passing *((aligned_int *)p) as
> vararg (only as varargs?) changes calling conventions independent
> of the functions type signature.

Does it? Do you have a testcase, and compilation flags, that'll make this show 
up in an RTL dump? I've tried numerous cases, including AFAICT yours, and I 
always get the value being passed in the expected ("unaligned") register?

Cheers, Alan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:45                         ` Richard Biener
@ 2015-03-31 10:51                           ` Richard Earnshaw
  2015-03-31 11:09                             ` Richard Biener
  2015-03-31 12:11                             ` Alan Lawrence
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Earnshaw @ 2015-03-31 10:51 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On 31/03/15 11:45, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 11:36, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>
>>>> On 31/03/15 11:00, Richard Biener wrote:
>>>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>>>
>>>>>> On 31/03/15 08:50, Richard Biener wrote:
>>>>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>>>>>> it.
>>>>>>>>>
>>>>>>>>> The problem appears to be in laying out arguments, specifically
>>>>>>>>> varargs. From
>>>>>>>>> the "good" -fdump-rtl-expand:
>>>>>>>>>
>>>>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>>> S4 A32])
>>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>>      (nil))
>>>>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>>      (nil))
>>>>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>>      (nil))
>>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>>      (nil))
>>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>>> A32])
>>>>>>>>>
>>>>>>>>> The struct members are
>>>>>>>>> reg:SI 113 => int a;
>>>>>>>>> reg:DF 112 => double b;
>>>>>>>>> reg:SI 111 => int c;
>>>>>>>>>
>>>>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>>>>>> pushed
>>>>>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>>>>>> build_ref_of_offset, we get:
>>>>>>>>>
>>>>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>>>>>> *.LC1>)) reduced.c:14 -1
>>>>>>>>>      (nil))
>>>>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>>>>>> virtual-outgoing-args)
>>>>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>>      (nil))
>>>>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>>> S8 A64])
>>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>>      (nil))
>>>>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>>      (nil))
>>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>>      (nil))
>>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>>> A32])
>>>>>>>>>
>>>>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>>>>>
>>>>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>>>>>> because
>>>>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>>>>>> argument (the
>>>>>>>>> type constructed by build_ref_for_offset); it then executes
>>>>>>>>> (aapcs_layout_arg,
>>>>>>>>> arm.c line ~~5914)
>>>>>>>>>
>>>>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>>>>>      next even number.  */
>>>>>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>>>>>     ncrn++;
>>>>>>>>>
>>>>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>>>>>> testcase
>>>>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>>>>>
>>>>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>>>>>> I think
>>>>>>>>> because these use the type of the declared parameters, rather than the
>>>>>>>>> provided
>>>>>>>>> arguments, and the former do not have the increased alignment from
>>>>>>>>> build_ref_for_offset.
>>>>>>>>
>>>>>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>>>>>>>
>>>>>>>> This means that
>>>>>>>>
>>>>>>>> Int I __aligned__(8);
>>>>>>>>
>>>>>>>> Is passed differently than int.
>>>>>>>>
>>>>>>>> Arm_function_arg needs to be fixed.
>>>>>>>
>>>>>>> That is,
>>>>>>>
>>>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>>>
>>>>>>> int main()
>>>>>>> {
>>>>>>>   myint i = 1;
>>>>>>>   int j = 2;
>>>>>>>   __builtin_printf("%d %d\n", i, j);
>>>>>>> }
>>>>>>>
>>>>>>> or
>>>>>>>
>>>>>>> myint i;
>>>>>>> int j;
>>>>>>> myint *p = &i;
>>>>>>> int *q = &j;
>>>>>>>
>>>>>>> int main()
>>>>>>> {
>>>>>>>   __builtin_printf("%d %d", *p, *q);
>>>>>>> }
>>>>>>>
>>>>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>>>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>>>>>> non-aligned values only makes sense for things passed by value on the stack.
>>>>>>> And then obviously only dependent on the functuion type signature, not
>>>>>>> on the type of the passed value.
>>>>>>>
>>>>>>
>>>>>> I think the testcase is ill-formed.  Just because printf doesn't have
>>>>>> such a modifier, doesn't mean that another variadic function might not
>>>>>> have a means to detect when an object in the variadic list needs to be
>>>>>> over-aligned.  As such, the test should really be written as:
>>>>>
>>>>> A value doesn't have "alignment".  A function may have alignment
>>>>> requirements on its arguments, clearly printf doesn't.
>>>>>
>>>>
>>>> Values don't.  But types do and variadic functions are special in that
>>>> they derive their types from the types of the actual parameters passed
>>>> not from the formals in the prototype.  Any manipulation of the types
>>>> should be done in the front end, not in the back end.
>>>
>>> The following seems to help the testcase (by luck I'd say?).  It
>>> makes us drop alignment information from the temporaries that
>>> SRA creates as memory replacement.
>>>
>>> But I find it odd that on ARM passing *((aligned_int *)p) as
>>> vararg (only as varargs?) changes calling conventions independent
>>> of the functions type signature.
>>>
>>> Richard.
>>>
>>> Index: gcc/tree-sra.c
>>> ===================================================================
>>> --- gcc/tree-sra.c      (revision 221770)
>>> +++ gcc/tree-sra.c      (working copy)
>>> @@ -2012,7 +2012,9 @@ create_access_replacement (struct access
>>>        DECL_CONTEXT (repl) = current_function_decl;
>>>      }
>>>    else
>>> -    repl = create_tmp_var (access->type, "SR");
>>> +    repl = create_tmp_var (build_qualified_type
>>> +                            (TYPE_MAIN_VARIANT (access->type),
>>> +                             TYPE_QUALS (access->type)), "SR");
>>>    if (TREE_CODE (access->type) == COMPLEX_TYPE
>>>        || TREE_CODE (access->type) == VECTOR_TYPE)
>>>      {
>>>
>>
>> I guess we could do something similar (use TYPE_MAIN_VARIANT) in the
>> backend code - but then we'd always ignore any over-alignment
>> requirements (or under, for that matter).
> 
> The question is whether those are even in the scope of AACPS...
> 

Technically, they aren't.  The argument passing rules are all based on
the alignment rules for fundamental types (and derived rules for
composite types based on those fundamental types) written in the AAPCS
itself.  There's no provision for over-aligning a data type, so all of
this is off-piste.

R.

> Richard.
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:45                       ` Richard Biener
@ 2015-03-31 10:53                         ` Richard Earnshaw
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Earnshaw @ 2015-03-31 10:53 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On 31/03/15 11:44, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 11:20, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Biener wrote:
>>>
>>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>>
>>>>> On 31/03/15 08:50, Richard Biener wrote:
>>>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>>>>> it.
>>>>>>>>
>>>>>>>> The problem appears to be in laying out arguments, specifically
>>>>>>>> varargs. From
>>>>>>>> the "good" -fdump-rtl-expand:
>>>>>>>>
>>>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>> S4 A32])
>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>> A32])
>>>>>>>>
>>>>>>>> The struct members are
>>>>>>>> reg:SI 113 => int a;
>>>>>>>> reg:DF 112 => double b;
>>>>>>>> reg:SI 111 => int c;
>>>>>>>>
>>>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>>>>> pushed
>>>>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>>>>> build_ref_of_offset, we get:
>>>>>>>>
>>>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>>>>> *.LC1>)) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>>>>> virtual-outgoing-args)
>>>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>> S8 A64])
>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>> A32])
>>>>>>>>
>>>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>>>>
>>>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>>>>> because
>>>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>>>>> argument (the
>>>>>>>> type constructed by build_ref_for_offset); it then executes
>>>>>>>> (aapcs_layout_arg,
>>>>>>>> arm.c line ~~5914)
>>>>>>>>
>>>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>>>>      next even number.  */
>>>>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>>>>     ncrn++;
>>>>>>>>
>>>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>>>>> testcase
>>>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>>>>
>>>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>>>>> I think
>>>>>>>> because these use the type of the declared parameters, rather than the
>>>>>>>> provided
>>>>>>>> arguments, and the former do not have the increased alignment from
>>>>>>>> build_ref_for_offset.
>>>>>>>
>>>>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>>>>>>
>>>>>>> This means that
>>>>>>>
>>>>>>> Int I __aligned__(8);
>>>>>>>
>>>>>>> Is passed differently than int.
>>>>>>>
>>>>>>> Arm_function_arg needs to be fixed.
>>>>>>
>>>>>> That is,
>>>>>>
>>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>>
>>>>>> int main()
>>>>>> {
>>>>>>   myint i = 1;
>>>>>>   int j = 2;
>>>>>>   __builtin_printf("%d %d\n", i, j);
>>>>>> }
>>>>>>
>>>>>> or
>>>>>>
>>>>>> myint i;
>>>>>> int j;
>>>>>> myint *p = &i;
>>>>>> int *q = &j;
>>>>>>
>>>>>> int main()
>>>>>> {
>>>>>>   __builtin_printf("%d %d", *p, *q);
>>>>>> }
>>>>>>
>>>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>>>>> non-aligned values only makes sense for things passed by value on the stack.
>>>>>> And then obviously only dependent on the functuion type signature, not
>>>>>> on the type of the passed value.
>>>>>>
>>>>>
>>>>> I think the testcase is ill-formed.  Just because printf doesn't have
>>>>> such a modifier, doesn't mean that another variadic function might not
>>>>> have a means to detect when an object in the variadic list needs to be
>>>>> over-aligned.  As such, the test should really be written as:
>>>>
>>>> A value doesn't have "alignment".  A function may have alignment
>>>> requirements on its arguments, clearly printf doesn't.
>>>
>>> Btw, it looks like function_arg is supposed to pass whether the argument
>>> is to the variadic part of the function or not, but it gets passed
>>> false as in
>>>
>>>       args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
>>>                                                 argpos < n_named_args);
>>>
>>> n_named_args is 4.  That is because ARM asks to do this via
>>>
>>>   if (type_arg_types != 0
>>>       && targetm.calls.strict_argument_naming (args_so_far))
>>>     ;
>>>   else if (type_arg_types != 0
>>>            && ! targetm.calls.pretend_outgoing_varargs_named 
>>> (args_so_far))
>>>     /* Don't include the last named arg.  */
>>>     --n_named_args;
>>>   else
>>>     /* Treat all args as named.  */
>>>     n_named_args = num_actuals;
>>>
>>> thus you lose that info (you could have that readily available).
>>>
>>
>> From the calling side of a function it shouldn't matter.  They only
>> thing the caller has to know is that the function is variadic (and
>> therefore that the base-standard rules from the AAPCS apply -- no use of
>> FP registers for parameters).  The behaviour after that is *as if* all
>> the arguments were pushed onto the stack in the correct order and
>> finally the lowest 4 words are popped off the stack again into r0-r3.
>> Hence any alignment that would apply to arguments on the stack has to be
>> reflected in their allocation into r0-r3 (since the push/pop of those
>> registers never happens).
> 
> But how do you compute the alignment of, say a myint '1'?  Of course
> __attribute__((aligned())) is a C extension thus AAPCS likely doesn't 
> consider it.
> 

Front-end problem :-)

> But yes, as given even on x86_64 we might spill a 8-byte aligned
> int register to a 8-byte aligned stack slot so the proposed patch
> makes sense in that regard.  I wonder how many other passes can
> get confused by this (PRE, I suppose, inserting an explicitely
> aligned load as well as all passes after the vectorizer which
> also creates loads/store with explicit (usually lower) alignment).
> 
> Richard.
> 
>> R.
>>
>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>
>>>>> int main()
>>>>> {
>>>>>   myint i = 1;
>>>>>   int j = 2;
>>>>>   __builtin_printf("%d %d\n", (int) i, j);
>>>>> }
>>>>>
>>>>> Variadic functions take the types of their arguments from the types of
>>>>> the actuals passed.  The compiler should either be applying appropriate
>>>>> promotion rules to make the types conformant by the language
>>>>> specification or respecting the types exactly.  However, that should be
>>>>> done in the mid-end not the back-end.  If incorrect alignment
>>>>> information is passed to the back-end it can't help but make the wrong
>>>>> choice.  Examining the mode here wouldn't help either.  Consider:
>>>>>
>>>>> int foo (int a, int b, int c, int d, int e, int f
>>>>> __attribute__((aligned(8))));
>>>>>
>>>>> On ARM that should pass f in a 64-bit aligned location (since the
>>>>> parameter will be on the stack).
>>>>>
>>>>> R.
>>>>>
>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> FWIW, I also tried:
>>>>>>>>
>>>>>>>> __attribute__((__aligned__((16)))) int x;
>>>>>>>> int main (void)
>>>>>>>> {
>>>>>>>>   __builtin_printf("%d\n", x);
>>>>>>>> }
>>>>>>>>
>>>>>>>> but in that case, the arm_function_arg is still fed a type with
>>>>>>>> alignment 32
>>>>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
>>>>>>>> has
>>>>>>>> alignment 128.
>>>>>>>>
>>>>>>>> --Alan
>>>>>>>>
>>>>>>>> Richard Biener wrote:
>>>>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>>>>>>>
>>>>>>>>>>> ...actually attach the testcase...
>>>>>>>>>> What compile options?
>>>>>>>>>
>>>>>>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>>>>>>>> I can't see anything not guaranteeing that:
>>>>>>>>>
>>>>>>>>>         .section        .rodata
>>>>>>>>>         .align  3
>>>>>>>>> .LANCHOR0 = . + 0
>>>>>>>>> .LC1:
>>>>>>>>>         .ascii  "%d %g %d\012\000"
>>>>>>>>>         .space  6
>>>>>>>>> .LC0:
>>>>>>>>>         .word   7
>>>>>>>>>         .space  4
>>>>>>>>>         .word   0
>>>>>>>>>         .word   1075838976
>>>>>>>>>         .word   9
>>>>>>>>>         .space  4
>>>>>>>>>
>>>>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
>>>>>>>>> copy?  That is, the patch tells the backend that the loads and
>>>>>>>>> stores to the 'int' vars (which have padding followed) is aligned
>>>>>>>>> to 8 bytes.
>>>>>>>>>
>>>>>>>>> I don't see what is wrong in the final assembler, but maybe
>>>>>>>>> some endian issue exists?  The code looks quite ugly though ;)
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>>> Alan Lawrence wrote:
>>>>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>>>>>>> testsuite on ARM
>>>>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>>>>>>> following this
>>>>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>>>>>>> (including
>>>>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>>>>>>> the
>>>>>>>>>>>> expected
>>>>>>>>>>>> 7 8 9
>>>>>>>>>>>> whereas with gcc r221348, instead it prints
>>>>>>>>>>>> 0 8 0
>>>>>>>>>>>>
>>>>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>>>>>>> a
>>>>>>>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>>>>>>> sizes-gimplified type_0
>>>>>>>>>>>> BLK
>>>>>>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>>>>>>> 0x2b9b8d428d20
>>>>>>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>>>>>>>> 0x2b9b8d092690 int>
>>>>>>>>>>>>              SI file reduced.c line 12 col 7
>>>>>>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>>>>>>>              align 32 offset_align 64
>>>>>>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>>>>>>> context
>>>>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>>>>>>> D.6070>
>>>>>>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>>>>>>>> <type_decl
>>>>>>>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>>>>>>>
>>>>>>>>>>>> The tree-optimized output is the same with both compilers (as this
>>>>>>>> does not
>>>>>>>>>>>> mention alignment); the expand output differs.
>>>>>>>>>>>>
>>>>>>>>>>>> Still investigating...
>>>>>>>>>>>>
>>>>>>>>>>>> --Alan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Richard Biener wrote:
>>>>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>>>>>>>> drops alignment info unnecessarily.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  PR tree-optimization/65310
>>>>>>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>>>>>>>  alignment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Index: gcc/tree-sra.c
>>>>>>>>>>>>>
>>>>>>>> ===================================================================
>>>>>>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>>>>>>>    if (misalign != 0)
>>>>>>>>>>>>>      align = (misalign & -misalign);
>>>>>>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>>>>>>> off);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:47                       ` Alan Lawrence
@ 2015-03-31 11:05                         ` Richard Biener
  2015-03-31 11:07                         ` Jakub Jelinek
  1 sibling, 0 replies; 31+ messages in thread
From: Richard Biener @ 2015-03-31 11:05 UTC (permalink / raw)
  To: Alan Lawrence
  Cc: Richard Earnshaw, Richard Biener, gcc-patches, Marcus Shawcroft

On Tue, 31 Mar 2015, Alan Lawrence wrote:

> Richard Biener wrote:
> > 
> > But I find it odd that on ARM passing *((aligned_int *)p) as
> > vararg (only as varargs?) changes calling conventions independent
> > of the functions type signature.
> 
> Does it? Do you have a testcase, and compilation flags, that'll make this show
> up in an RTL dump? I've tried numerous cases, including AFAICT yours, and I
> always get the value being passed in the expected ("unaligned") register?

Yep, it seems that loading from p with the following has a value of
type 'int', not 'myint'.

typedef int myint __attribute__((aligned(8)));
myint i;
myint *p = &i;

Well, the gimplifier does that:

/* Create a temporary with a name derived from VAL.  Subroutine of
   lookup_tmp_var; nobody else should call this function.  */

static inline tree
create_tmp_from_val (tree val)
{
  /* Drop all qualifiers and address-space information from the value 
type.  */
  tree type = TYPE_MAIN_VARIANT (TREE_TYPE (val));
  tree var = create_tmp_var (type, get_name (val));

it even drops address-space info which looks suspicious to me for

int *p [addr-space:X];
int * [addr-space:X] *q = &p;

and

 **q;

it would generate

 int *tem = *q;
 int tem = *tem;

and thus drop the addr-space from the 2nd load.  Similar for atomics
if they are also on the variant chain (ah, but those are lowered
early, so it doesn't matter for them).

So you indeed need a testcase where the compiler creates a temporary
of such type (or you need a type the gimplifier doesn't consider
a register type).

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:47                       ` Alan Lawrence
  2015-03-31 11:05                         ` Richard Biener
@ 2015-03-31 11:07                         ` Jakub Jelinek
  2015-03-31 11:11                           ` Richard Biener
  2015-03-31 13:11                           ` Alan Lawrence
  1 sibling, 2 replies; 31+ messages in thread
From: Jakub Jelinek @ 2015-03-31 11:07 UTC (permalink / raw)
  To: Alan Lawrence
  Cc: Richard Biener, Richard Earnshaw, Richard Biener, gcc-patches,
	Marcus Shawcroft

On Tue, Mar 31, 2015 at 11:47:37AM +0100, Alan Lawrence wrote:
> Richard Biener wrote:
> >
> >But I find it odd that on ARM passing *((aligned_int *)p) as
> >vararg (only as varargs?) changes calling conventions independent
> >of the functions type signature.
> 
> Does it? Do you have a testcase, and compilation flags, that'll make this
> show up in an RTL dump? I've tried numerous cases, including AFAICT yours,
> and I always get the value being passed in the expected ("unaligned")
> register?

If the integral type alignment right now matters, I'd try something like:

typedef int V __attribute__((aligned (8)));
V x;

int foo (int x, ...)
{
  int z;
  __builtin_va_list va;
  __builtin_va_start (va, x);
  switch (x)
    {
    case 1:
    case 3:
    case 6:
      z = __builtin_va_arg (va, int);
      break;
    default:
      z = __builtin_va_arg (va, V);
      break;
    }
  __builtin_va_end (va);
  return z;
}

int
bar (void)
{
  V v = 3;
  int w = 3;
  foo (1, (int) v);
  foo (2, (V) w);
  v = 3;
  w = (int) v;
  foo (3, w);
  foo (4, (V) w);
  v = (V) w;
  foo (5, v);
  foo (6, (int) v);
  foo (7, x);
  return 0;
}

(of course, most likely with passing a different value each time and
verification of the result).
As the compiler treats all those casts there as useless, I'd expect
that the types of the passed argument would be pretty much random.
And, note that even on x86_64, the __builtin_va_arg with V expands into
  # addr.1_3 = PHI <addr.1_27(9), _31(10)>
  z_35 = MEM[(V * {ref-all})addr.1_3];
using exactly the same address for int as well as V va_arg - if you increase
the overalignment arbitrarily, it will surely be a wrong IL because nobody
really guarantees anything about the overalignment.

So, I think the tree-sra.c patch is a good idea - try to keep using the main
type variants as the types in the IL where possible except for the MEM_REF
first argument (i.e. even the lhs of the load should IMHO not be
overaligned).

As Eric Botcazou said, GCC right now isn't really prepared for under or
overaligned scalars, only when they are in structs (or for middle-end in
*MEM_REFs).

	Jakub

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:51                           ` Richard Earnshaw
@ 2015-03-31 11:09                             ` Richard Biener
  2015-03-31 12:15                               ` Richard Earnshaw
  2015-03-31 12:11                             ` Alan Lawrence
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Biener @ 2015-03-31 11:09 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 11:45, Richard Biener wrote:
> > On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> > 
> >> On 31/03/15 11:36, Richard Biener wrote:
> >>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> >>>
> >>>> On 31/03/15 11:00, Richard Biener wrote:
> >>>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> >>>>>
> >>>>>> On 31/03/15 08:50, Richard Biener wrote:
> >>>>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> >>>>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
> >>>>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> >>>>>>>>> it.
> >>>>>>>>>
> >>>>>>>>> The problem appears to be in laying out arguments, specifically
> >>>>>>>>> varargs. From
> >>>>>>>>> the "good" -fdump-rtl-expand:
> >>>>>>>>>
> >>>>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>>>>>> S4 A32])
> >>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>>>>>      (nil))
> >>>>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> >>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>>>>>      (nil))
> >>>>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> >>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>>>>>      (nil))
> >>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>>>>>      (nil))
> >>>>>>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>>>>>             (set (reg:SI 0 r0)
> >>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>>>>>> A32])
> >>>>>>>>>
> >>>>>>>>> The struct members are
> >>>>>>>>> reg:SI 113 => int a;
> >>>>>>>>> reg:DF 112 => double b;
> >>>>>>>>> reg:SI 111 => int c;
> >>>>>>>>>
> >>>>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> >>>>>>>>> pushed
> >>>>>>>>> into virtual-outgoing-args. In contrast, post-change to
> >>>>>>>>> build_ref_of_offset, we get:
> >>>>>>>>>
> >>>>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
> >>>>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
> >>>>>>>>> *.LC1>)) reduced.c:14 -1
> >>>>>>>>>      (nil))
> >>>>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> >>>>>>>>> virtual-outgoing-args)
> >>>>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
> >>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>>>>>      (nil))
> >>>>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>>>>>> S8 A64])
> >>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>>>>>      (nil))
> >>>>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> >>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>>>>>      (nil))
> >>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>>>>>      (nil))
> >>>>>>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>>>>>             (set (reg:SI 0 r0)
> >>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>>>>>> A32])
> >>>>>>>>>
> >>>>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> >>>>>>>>>
> >>>>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
> >>>>>>>>> because
> >>>>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> >>>>>>>>> argument (the
> >>>>>>>>> type constructed by build_ref_for_offset); it then executes
> >>>>>>>>> (aapcs_layout_arg,
> >>>>>>>>> arm.c line ~~5914)
> >>>>>>>>>
> >>>>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >>>>>>>>>      next even number.  */
> >>>>>>>>>   ncrn = pcum->aapcs_ncrn;
> >>>>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> >>>>>>>>>     ncrn++;
> >>>>>>>>>
> >>>>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> >>>>>>>>> testcase
> >>>>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> >>>>>>>>> 32-bit-aligned int instead, which works as previously.
> >>>>>>>>>
> >>>>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
> >>>>>>>>> I think
> >>>>>>>>> because these use the type of the declared parameters, rather than the
> >>>>>>>>> provided
> >>>>>>>>> arguments, and the former do not have the increased alignment from
> >>>>>>>>> build_ref_for_offset.
> >>>>>>>>
> >>>>>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
> >>>>>>>>
> >>>>>>>> This means that
> >>>>>>>>
> >>>>>>>> Int I __aligned__(8);
> >>>>>>>>
> >>>>>>>> Is passed differently than int.
> >>>>>>>>
> >>>>>>>> Arm_function_arg needs to be fixed.
> >>>>>>>
> >>>>>>> That is,
> >>>>>>>
> >>>>>>> typedef int myint __attribute__((aligned(8)));
> >>>>>>>
> >>>>>>> int main()
> >>>>>>> {
> >>>>>>>   myint i = 1;
> >>>>>>>   int j = 2;
> >>>>>>>   __builtin_printf("%d %d\n", i, j);
> >>>>>>> }
> >>>>>>>
> >>>>>>> or
> >>>>>>>
> >>>>>>> myint i;
> >>>>>>> int j;
> >>>>>>> myint *p = &i;
> >>>>>>> int *q = &j;
> >>>>>>>
> >>>>>>> int main()
> >>>>>>> {
> >>>>>>>   __builtin_printf("%d %d", *p, *q);
> >>>>>>> }
> >>>>>>>
> >>>>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
> >>>>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
> >>>>>>> non-aligned values only makes sense for things passed by value on the stack.
> >>>>>>> And then obviously only dependent on the functuion type signature, not
> >>>>>>> on the type of the passed value.
> >>>>>>>
> >>>>>>
> >>>>>> I think the testcase is ill-formed.  Just because printf doesn't have
> >>>>>> such a modifier, doesn't mean that another variadic function might not
> >>>>>> have a means to detect when an object in the variadic list needs to be
> >>>>>> over-aligned.  As such, the test should really be written as:
> >>>>>
> >>>>> A value doesn't have "alignment".  A function may have alignment
> >>>>> requirements on its arguments, clearly printf doesn't.
> >>>>>
> >>>>
> >>>> Values don't.  But types do and variadic functions are special in that
> >>>> they derive their types from the types of the actual parameters passed
> >>>> not from the formals in the prototype.  Any manipulation of the types
> >>>> should be done in the front end, not in the back end.
> >>>
> >>> The following seems to help the testcase (by luck I'd say?).  It
> >>> makes us drop alignment information from the temporaries that
> >>> SRA creates as memory replacement.
> >>>
> >>> But I find it odd that on ARM passing *((aligned_int *)p) as
> >>> vararg (only as varargs?) changes calling conventions independent
> >>> of the functions type signature.
> >>>
> >>> Richard.
> >>>
> >>> Index: gcc/tree-sra.c
> >>> ===================================================================
> >>> --- gcc/tree-sra.c      (revision 221770)
> >>> +++ gcc/tree-sra.c      (working copy)
> >>> @@ -2012,7 +2012,9 @@ create_access_replacement (struct access
> >>>        DECL_CONTEXT (repl) = current_function_decl;
> >>>      }
> >>>    else
> >>> -    repl = create_tmp_var (access->type, "SR");
> >>> +    repl = create_tmp_var (build_qualified_type
> >>> +                            (TYPE_MAIN_VARIANT (access->type),
> >>> +                             TYPE_QUALS (access->type)), "SR");
> >>>    if (TREE_CODE (access->type) == COMPLEX_TYPE
> >>>        || TREE_CODE (access->type) == VECTOR_TYPE)
> >>>      {
> >>>
> >>
> >> I guess we could do something similar (use TYPE_MAIN_VARIANT) in the
> >> backend code - but then we'd always ignore any over-alignment
> >> requirements (or under, for that matter).
> > 
> > The question is whether those are even in the scope of AACPS...
> > 
> 
> Technically, they aren't.  The argument passing rules are all based on
> the alignment rules for fundamental types (and derived rules for
> composite types based on those fundamental types) written in the AAPCS
> itself.  There's no provision for over-aligning a data type, so all of
> this is off-piste.

So in arm_needs_doubleword_align you could do

/* Return true if mode/type need doubleword alignment.  */
static bool
arm_needs_doubleword_align (machine_mode mode, const_tree type)
{
  return (GET_MODE_ALIGNMENT (mode) > PARM_BOUNDARY
          || (type
              && (AGGREGATE_TYPE_P (type) || mode == BLKmode)
              && TYPE_ALIGN (type) > PARM_BOUNDARY));
}

?  Thus always trust the 'mode' when it doesn't apply to an aggregate?

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 11:07                         ` Jakub Jelinek
@ 2015-03-31 11:11                           ` Richard Biener
  2015-03-31 13:11                           ` Alan Lawrence
  1 sibling, 0 replies; 31+ messages in thread
From: Richard Biener @ 2015-03-31 11:11 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Alan Lawrence, Richard Earnshaw, Richard Biener, gcc-patches,
	Marcus Shawcroft

On Tue, 31 Mar 2015, Jakub Jelinek wrote:

> On Tue, Mar 31, 2015 at 11:47:37AM +0100, Alan Lawrence wrote:
> > Richard Biener wrote:
> > >
> > >But I find it odd that on ARM passing *((aligned_int *)p) as
> > >vararg (only as varargs?) changes calling conventions independent
> > >of the functions type signature.
> > 
> > Does it? Do you have a testcase, and compilation flags, that'll make this
> > show up in an RTL dump? I've tried numerous cases, including AFAICT yours,
> > and I always get the value being passed in the expected ("unaligned")
> > register?
> 
> If the integral type alignment right now matters, I'd try something like:
> 
> typedef int V __attribute__((aligned (8)));
> V x;
> 
> int foo (int x, ...)
> {
>   int z;
>   __builtin_va_list va;
>   __builtin_va_start (va, x);
>   switch (x)
>     {
>     case 1:
>     case 3:
>     case 6:
>       z = __builtin_va_arg (va, int);
>       break;
>     default:
>       z = __builtin_va_arg (va, V);
>       break;
>     }
>   __builtin_va_end (va);
>   return z;
> }
> 
> int
> bar (void)
> {
>   V v = 3;
>   int w = 3;
>   foo (1, (int) v);
>   foo (2, (V) w);
>   v = 3;
>   w = (int) v;
>   foo (3, w);
>   foo (4, (V) w);
>   v = (V) w;
>   foo (5, v);
>   foo (6, (int) v);
>   foo (7, x);
>   return 0;
> }
> 
> (of course, most likely with passing a different value each time and
> verification of the result).
> As the compiler treats all those casts there as useless, I'd expect
> that the types of the passed argument would be pretty much random.
> And, note that even on x86_64, the __builtin_va_arg with V expands into
>   # addr.1_3 = PHI <addr.1_27(9), _31(10)>
>   z_35 = MEM[(V * {ref-all})addr.1_3];
> using exactly the same address for int as well as V va_arg - if you increase
> the overalignment arbitrarily, it will surely be a wrong IL because nobody
> really guarantees anything about the overalignment.
> 
> So, I think the tree-sra.c patch is a good idea - try to keep using the main
> type variants as the types in the IL where possible except for the MEM_REF
> first argument (i.e. even the lhs of the load should IMHO not be
> overaligned).

Yeah, I'm testing it right now as it seems to fix the regression and
should be certainly safe.

Richard.

> As Eric Botcazou said, GCC right now isn't really prepared for under or
> overaligned scalars, only when they are in structs (or for middle-end in
> *MEM_REFs).
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 10:51                           ` Richard Earnshaw
  2015-03-31 11:09                             ` Richard Biener
@ 2015-03-31 12:11                             ` Alan Lawrence
  1 sibling, 0 replies; 31+ messages in thread
From: Alan Lawrence @ 2015-03-31 12:11 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Richard Biener, Richard Biener, gcc-patches, Marcus Shawcroft

Richard Earnshaw wrote:
> On 31/03/15 11:45, Richard Biener wrote:
>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>
>>> On 31/03/15 11:36, Richard Biener wrote:
>>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>>
>>>>> On 31/03/15 11:00, Richard Biener wrote:
>>>>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>>>>
>>>>>>> On 31/03/15 08:50, Richard Biener wrote:
>>>>>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>>>>>>> it.
>>>>>>>>>>
>>>>>>>>>> The problem appears to be in laying out arguments, specifically
>>>>>>>>>> varargs. From
>>>>>>>>>> the "good" -fdump-rtl-expand:
>>>>>>>>>>
>>>>>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>>>> S4 A32])
>>>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>>>      (nil))
>>>>>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>>>      (nil))
>>>>>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>>>      (nil))
>>>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>>>      (nil))
>>>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>>>> A32])
>>>>>>>>>>
>>>>>>>>>> The struct members are
>>>>>>>>>> reg:SI 113 => int a;
>>>>>>>>>> reg:DF 112 => double b;
>>>>>>>>>> reg:SI 111 => int c;
>>>>>>>>>>
>>>>>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>>>>>>> pushed
>>>>>>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>>>>>>> build_ref_of_offset, we get:
>>>>>>>>>>
>>>>>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>>>>>>> *.LC1>)) reduced.c:14 -1
>>>>>>>>>>      (nil))
>>>>>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>>>>>>> virtual-outgoing-args)
>>>>>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>>>      (nil))
>>>>>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>>>> S8 A64])
>>>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>>>      (nil))
>>>>>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>>>      (nil))
>>>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>>>      (nil))
>>>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>>>> A32])
>>>>>>>>>>
>>>>>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>>>>>>
>>>>>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>>>>>>> because
>>>>>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>>>>>>> argument (the
>>>>>>>>>> type constructed by build_ref_for_offset); it then executes
>>>>>>>>>> (aapcs_layout_arg,
>>>>>>>>>> arm.c line ~~5914)
>>>>>>>>>>
>>>>>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>>>>>>      next even number.  */
>>>>>>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>>>>>>     ncrn++;
>>>>>>>>>>
>>>>>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>>>>>>> testcase
>>>>>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>>>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>>>>>>
>>>>>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>>>>>>> I think
>>>>>>>>>> because these use the type of the declared parameters, rather than the
>>>>>>>>>> provided
>>>>>>>>>> arguments, and the former do not have the increased alignment from
>>>>>>>>>> build_ref_for_offset.
>>>>>>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>>>>>>>>
>>>>>>>>> This means that
>>>>>>>>>
>>>>>>>>> Int I __aligned__(8);
>>>>>>>>>
>>>>>>>>> Is passed differently than int.
>>>>>>>>>
>>>>>>>>> Arm_function_arg needs to be fixed.
>>>>>>>> That is,
>>>>>>>>
>>>>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>>>>
>>>>>>>> int main()
>>>>>>>> {
>>>>>>>>   myint i = 1;
>>>>>>>>   int j = 2;
>>>>>>>>   __builtin_printf("%d %d\n", i, j);
>>>>>>>> }
>>>>>>>>
>>>>>>>> or
>>>>>>>>
>>>>>>>> myint i;
>>>>>>>> int j;
>>>>>>>> myint *p = &i;
>>>>>>>> int *q = &j;
>>>>>>>>
>>>>>>>> int main()
>>>>>>>> {
>>>>>>>>   __builtin_printf("%d %d", *p, *q);
>>>>>>>> }
>>>>>>>>
>>>>>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>>>>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>>>>>>> non-aligned values only makes sense for things passed by value on the stack.
>>>>>>>> And then obviously only dependent on the functuion type signature, not
>>>>>>>> on the type of the passed value.
>>>>>>>>
>>>>>>> I think the testcase is ill-formed.  Just because printf doesn't have
>>>>>>> such a modifier, doesn't mean that another variadic function might not
>>>>>>> have a means to detect when an object in the variadic list needs to be
>>>>>>> over-aligned.  As such, the test should really be written as:
>>>>>> A value doesn't have "alignment".  A function may have alignment
>>>>>> requirements on its arguments, clearly printf doesn't.
>>>>>>
>>>>> Values don't.  But types do and variadic functions are special in that
>>>>> they derive their types from the types of the actual parameters passed
>>>>> not from the formals in the prototype.  Any manipulation of the types
>>>>> should be done in the front end, not in the back end.
>>>> The following seems to help the testcase (by luck I'd say?).  It
>>>> makes us drop alignment information from the temporaries that
>>>> SRA creates as memory replacement.
>>>>
>>>> But I find it odd that on ARM passing *((aligned_int *)p) as
>>>> vararg (only as varargs?) changes calling conventions independent
>>>> of the functions type signature.
>>>>
>>>> Richard.
>>>>
>>>> Index: gcc/tree-sra.c
>>>> ===================================================================
>>>> --- gcc/tree-sra.c      (revision 221770)
>>>> +++ gcc/tree-sra.c      (working copy)
>>>> @@ -2012,7 +2012,9 @@ create_access_replacement (struct access
>>>>        DECL_CONTEXT (repl) = current_function_decl;
>>>>      }
>>>>    else
>>>> -    repl = create_tmp_var (access->type, "SR");
>>>> +    repl = create_tmp_var (build_qualified_type
>>>> +                            (TYPE_MAIN_VARIANT (access->type),
>>>> +                             TYPE_QUALS (access->type)), "SR");
>>>>    if (TREE_CODE (access->type) == COMPLEX_TYPE
>>>>        || TREE_CODE (access->type) == VECTOR_TYPE)
>>>>      {
>>>>
>>> I guess we could do something similar (use TYPE_MAIN_VARIANT) in the
>>> backend code - but then we'd always ignore any over-alignment
>>> requirements (or under, for that matter).
>> The question is whether those are even in the scope of AACPS...
>>
> 
> Technically, they aren't.  The argument passing rules are all based on
> the alignment rules for fundamental types (and derived rules for
> composite types based on those fundamental types) written in the AAPCS
> itself.  There's no provision for over-aligning a data type, so all of
> this is off-piste.
> 
> R.

FWIW, I've just tested arm-none-linux-gnueabi with:

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 48342d0..37662f4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6020,7 +6020,7 @@ static bool
  arm_needs_doubleword_align (machine_mode mode, const_tree type)
  {
    return (GET_MODE_ALIGNMENT (mode) > PARM_BOUNDARY
-         || (type && TYPE_ALIGN (type) > PARM_BOUNDARY));
+         || (type && TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) > PARM_BOUNDARY));
  }

and found no regressions (and yes this fixes the original case). Whether this is 
the right way to do it however is another question!

--Alan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 11:09                             ` Richard Biener
@ 2015-03-31 12:15                               ` Richard Earnshaw
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Earnshaw @ 2015-03-31 12:15 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, Alan Lawrence, gcc-patches, Marcus Shawcroft

On 31/03/15 12:08, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 11:45, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>
>>>> On 31/03/15 11:36, Richard Biener wrote:
>>>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>>>
>>>>>> On 31/03/15 11:00, Richard Biener wrote:
>>>>>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>>>>>
>>>>>>>> On 31/03/15 08:50, Richard Biener wrote:
>>>>>>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>>>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>>>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>>>>>>>> it.
>>>>>>>>>>>
>>>>>>>>>>> The problem appears to be in laying out arguments, specifically
>>>>>>>>>>> varargs. From
>>>>>>>>>>> the "good" -fdump-rtl-expand:
>>>>>>>>>>>
>>>>>>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>>>>> S4 A32])
>>>>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>>>>      (nil))
>>>>>>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>>>>      (nil))
>>>>>>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>>>>      (nil))
>>>>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>>>>      (nil))
>>>>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>>>>> A32])
>>>>>>>>>>>
>>>>>>>>>>> The struct members are
>>>>>>>>>>> reg:SI 113 => int a;
>>>>>>>>>>> reg:DF 112 => double b;
>>>>>>>>>>> reg:SI 111 => int c;
>>>>>>>>>>>
>>>>>>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>>>>>>>> pushed
>>>>>>>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>>>>>>>> build_ref_of_offset, we get:
>>>>>>>>>>>
>>>>>>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>>>>>>>> *.LC1>)) reduced.c:14 -1
>>>>>>>>>>>      (nil))
>>>>>>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>>>>>>>> virtual-outgoing-args)
>>>>>>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>>>>      (nil))
>>>>>>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>>>>> S8 A64])
>>>>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>>>>      (nil))
>>>>>>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>>>>      (nil))
>>>>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>>>>      (nil))
>>>>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>>>>> A32])
>>>>>>>>>>>
>>>>>>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>>>>>>>
>>>>>>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>>>>>>>> because
>>>>>>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>>>>>>>> argument (the
>>>>>>>>>>> type constructed by build_ref_for_offset); it then executes
>>>>>>>>>>> (aapcs_layout_arg,
>>>>>>>>>>> arm.c line ~~5914)
>>>>>>>>>>>
>>>>>>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>>>>>>>      next even number.  */
>>>>>>>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>>>>>>>     ncrn++;
>>>>>>>>>>>
>>>>>>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>>>>>>>> testcase
>>>>>>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>>>>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>>>>>>>
>>>>>>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>>>>>>>> I think
>>>>>>>>>>> because these use the type of the declared parameters, rather than the
>>>>>>>>>>> provided
>>>>>>>>>>> arguments, and the former do not have the increased alignment from
>>>>>>>>>>> build_ref_for_offset.
>>>>>>>>>>
>>>>>>>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>>>>>>>>>
>>>>>>>>>> This means that
>>>>>>>>>>
>>>>>>>>>> Int I __aligned__(8);
>>>>>>>>>>
>>>>>>>>>> Is passed differently than int.
>>>>>>>>>>
>>>>>>>>>> Arm_function_arg needs to be fixed.
>>>>>>>>>
>>>>>>>>> That is,
>>>>>>>>>
>>>>>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>>>>>
>>>>>>>>> int main()
>>>>>>>>> {
>>>>>>>>>   myint i = 1;
>>>>>>>>>   int j = 2;
>>>>>>>>>   __builtin_printf("%d %d\n", i, j);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> or
>>>>>>>>>
>>>>>>>>> myint i;
>>>>>>>>> int j;
>>>>>>>>> myint *p = &i;
>>>>>>>>> int *q = &j;
>>>>>>>>>
>>>>>>>>> int main()
>>>>>>>>> {
>>>>>>>>>   __builtin_printf("%d %d", *p, *q);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>>>>>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>>>>>>>> non-aligned values only makes sense for things passed by value on the stack.
>>>>>>>>> And then obviously only dependent on the functuion type signature, not
>>>>>>>>> on the type of the passed value.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think the testcase is ill-formed.  Just because printf doesn't have
>>>>>>>> such a modifier, doesn't mean that another variadic function might not
>>>>>>>> have a means to detect when an object in the variadic list needs to be
>>>>>>>> over-aligned.  As such, the test should really be written as:
>>>>>>>
>>>>>>> A value doesn't have "alignment".  A function may have alignment
>>>>>>> requirements on its arguments, clearly printf doesn't.
>>>>>>>
>>>>>>
>>>>>> Values don't.  But types do and variadic functions are special in that
>>>>>> they derive their types from the types of the actual parameters passed
>>>>>> not from the formals in the prototype.  Any manipulation of the types
>>>>>> should be done in the front end, not in the back end.
>>>>>
>>>>> The following seems to help the testcase (by luck I'd say?).  It
>>>>> makes us drop alignment information from the temporaries that
>>>>> SRA creates as memory replacement.
>>>>>
>>>>> But I find it odd that on ARM passing *((aligned_int *)p) as
>>>>> vararg (only as varargs?) changes calling conventions independent
>>>>> of the functions type signature.
>>>>>
>>>>> Richard.
>>>>>
>>>>> Index: gcc/tree-sra.c
>>>>> ===================================================================
>>>>> --- gcc/tree-sra.c      (revision 221770)
>>>>> +++ gcc/tree-sra.c      (working copy)
>>>>> @@ -2012,7 +2012,9 @@ create_access_replacement (struct access
>>>>>        DECL_CONTEXT (repl) = current_function_decl;
>>>>>      }
>>>>>    else
>>>>> -    repl = create_tmp_var (access->type, "SR");
>>>>> +    repl = create_tmp_var (build_qualified_type
>>>>> +                            (TYPE_MAIN_VARIANT (access->type),
>>>>> +                             TYPE_QUALS (access->type)), "SR");
>>>>>    if (TREE_CODE (access->type) == COMPLEX_TYPE
>>>>>        || TREE_CODE (access->type) == VECTOR_TYPE)
>>>>>      {
>>>>>
>>>>
>>>> I guess we could do something similar (use TYPE_MAIN_VARIANT) in the
>>>> backend code - but then we'd always ignore any over-alignment
>>>> requirements (or under, for that matter).
>>>
>>> The question is whether those are even in the scope of AACPS...
>>>
>>
>> Technically, they aren't.  The argument passing rules are all based on
>> the alignment rules for fundamental types (and derived rules for
>> composite types based on those fundamental types) written in the AAPCS
>> itself.  There's no provision for over-aligning a data type, so all of
>> this is off-piste.
> 
> So in arm_needs_doubleword_align you could do
> 
> /* Return true if mode/type need doubleword alignment.  */
> static bool
> arm_needs_doubleword_align (machine_mode mode, const_tree type)
> {
>   return (GET_MODE_ALIGNMENT (mode) > PARM_BOUNDARY
>           || (type
>               && (AGGREGATE_TYPE_P (type) || mode == BLKmode)
>               && TYPE_ALIGN (type) > PARM_BOUNDARY));
> }
> 
> ?  Thus always trust the 'mode' when it doesn't apply to an aggregate?
> 
> Richard.
> 

Experience has shown that the front/mid ends always end up buggering the
mode argument to the point where it even causes the wrong register class
to be chosen.  It really can't be trusted unless type is NULL
(libcalls).  The type, on the other hand, should always be correct and
should always be the preferred way to get the information needed.

R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 11:07                         ` Jakub Jelinek
  2015-03-31 11:11                           ` Richard Biener
@ 2015-03-31 13:11                           ` Alan Lawrence
  2015-03-31 13:25                             ` Richard Biener
  1 sibling, 1 reply; 31+ messages in thread
From: Alan Lawrence @ 2015-03-31 13:11 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Richard Earnshaw, Richard Biener, gcc-patches,
	Marcus Shawcroft

Jakub Jelinek wrote:
> On Tue, Mar 31, 2015 at 11:47:37AM +0100, Alan Lawrence wrote:
>> Richard Biener wrote:
>>> But I find it odd that on ARM passing *((aligned_int *)p) as
>>> vararg (only as varargs?) changes calling conventions independent
>>> of the functions type signature.
>> Does it? Do you have a testcase, and compilation flags, that'll make this
>> show up in an RTL dump? I've tried numerous cases, including AFAICT yours,
>> and I always get the value being passed in the expected ("unaligned")
>> register?
> 
> If the integral type alignment right now matters, I'd try something like:
> 
> typedef int V __attribute__((aligned (8)));
> V x;
> 
> int foo (int x, ...)
> {
>   int z;
>   __builtin_va_list va;
>   __builtin_va_start (va, x);
>   switch (x)
>     {
>     case 1:
>     case 3:
>     case 6:
>       z = __builtin_va_arg (va, int);
>       break;
>     default:
>       z = __builtin_va_arg (va, V);
>       break;
>     }
>   __builtin_va_end (va);
>   return z;
> }
> 
> int
> bar (void)
> {
>   V v = 3;
>   int w = 3;
>   foo (1, (int) v);
>   foo (2, (V) w);
>   v = 3;
>   w = (int) v;
>   foo (3, w);
>   foo (4, (V) w);
>   v = (V) w;
>   foo (5, v);
>   foo (6, (int) v);
>   foo (7, x);
>   return 0;
> }
> 
> (of course, most likely with passing a different value each time and
> verification of the result).
> As the compiler treats all those casts there as useless, I'd expect
> that the types of the passed argument would be pretty much random.
> And, note that even on x86_64, the __builtin_va_arg with V expands into
>   # addr.1_3 = PHI <addr.1_27(9), _31(10)>
>   z_35 = MEM[(V * {ref-all})addr.1_3];
> using exactly the same address for int as well as V va_arg - if you increase
> the overalignment arbitrarily, it will surely be a wrong IL because nobody
> really guarantees anything about the overalignment.
> 
> So, I think the tree-sra.c patch is a good idea - try to keep using the main
> type variants as the types in the IL where possible except for the MEM_REF
> first argument (i.e. even the lhs of the load should IMHO not be
> overaligned).
> 
> As Eric Botcazou said, GCC right now isn't really prepared for under or
> overaligned scalars, only when they are in structs (or for middle-end in
> *MEM_REFs).
> 
> 	Jakub
> 

On ARM, I get the arguments being passed in r0 & r1 for every call in bar() 
above. It sounds as if this is because the casts are being removed as useless; 
so the only way for overalignment info to be present, is when SRA puts it there.

The only way I can get a register to be skipped, is by providing a prototype 
with alignment specified via a typedef:

typedef int aligned_int __attribute__((aligned((8))));
int foo(int a, aligned_int b) {...} //compiles ok

whereas specifying alignment directly, is rejected:

nonvar.c:2:20: error: alignment may not be specified for 'b'
  int foo(int a, int b __attribute__((aligned((8)))))
                     ^
Note this is using the GNU __attribute__((aligned)) extension. Trying to use C11 
_Alignas results in a frontend error either way; IIUC the C11 spec deems that 
sort of thing illegal.

(1) If we wish to keep the AAPCS principle that varargs are passed just as named 
args, we should use TYPE_MAIN_VARIANT inside arm_needs_doubleword_alignment, 
which will then ignore overalignment on both varargs _and named args_. However 
this would be silently ABI-changing....?

(2) It seems to me that SRA is the only way for overalignment info to be present 
on a value, so the patch to tree-sra.c/create_access_replacement seems to make 
things more consistent?

(3) Given C11 forbids overaligned arguments/parameters, do we wish GNU 
__attribute__((aligned)) to allow this (via typedefs but not attributes on the 
parameters themselves) ?

--Alan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 13:11                           ` Alan Lawrence
@ 2015-03-31 13:25                             ` Richard Biener
  2015-04-02 14:59                               ` Alan Lawrence
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2015-03-31 13:25 UTC (permalink / raw)
  To: Alan Lawrence
  Cc: Jakub Jelinek, Richard Earnshaw, Richard Biener, gcc-patches,
	Marcus Shawcroft

On Tue, 31 Mar 2015, Alan Lawrence wrote:

> Jakub Jelinek wrote:
> > On Tue, Mar 31, 2015 at 11:47:37AM +0100, Alan Lawrence wrote:
> > > Richard Biener wrote:
> > > > But I find it odd that on ARM passing *((aligned_int *)p) as
> > > > vararg (only as varargs?) changes calling conventions independent
> > > > of the functions type signature.
> > > Does it? Do you have a testcase, and compilation flags, that'll make this
> > > show up in an RTL dump? I've tried numerous cases, including AFAICT yours,
> > > and I always get the value being passed in the expected ("unaligned")
> > > register?
> > 
> > If the integral type alignment right now matters, I'd try something like:
> > 
> > typedef int V __attribute__((aligned (8)));
> > V x;
> > 
> > int foo (int x, ...)
> > {
> >   int z;
> >   __builtin_va_list va;
> >   __builtin_va_start (va, x);
> >   switch (x)
> >     {
> >     case 1:
> >     case 3:
> >     case 6:
> >       z = __builtin_va_arg (va, int);
> >       break;
> >     default:
> >       z = __builtin_va_arg (va, V);
> >       break;
> >     }
> >   __builtin_va_end (va);
> >   return z;
> > }
> > 
> > int
> > bar (void)
> > {
> >   V v = 3;
> >   int w = 3;
> >   foo (1, (int) v);
> >   foo (2, (V) w);
> >   v = 3;
> >   w = (int) v;
> >   foo (3, w);
> >   foo (4, (V) w);
> >   v = (V) w;
> >   foo (5, v);
> >   foo (6, (int) v);
> >   foo (7, x);
> >   return 0;
> > }
> > 
> > (of course, most likely with passing a different value each time and
> > verification of the result).
> > As the compiler treats all those casts there as useless, I'd expect
> > that the types of the passed argument would be pretty much random.
> > And, note that even on x86_64, the __builtin_va_arg with V expands into
> >   # addr.1_3 = PHI <addr.1_27(9), _31(10)>
> >   z_35 = MEM[(V * {ref-all})addr.1_3];
> > using exactly the same address for int as well as V va_arg - if you increase
> > the overalignment arbitrarily, it will surely be a wrong IL because nobody
> > really guarantees anything about the overalignment.
> > 
> > So, I think the tree-sra.c patch is a good idea - try to keep using the main
> > type variants as the types in the IL where possible except for the MEM_REF
> > first argument (i.e. even the lhs of the load should IMHO not be
> > overaligned).
> > 
> > As Eric Botcazou said, GCC right now isn't really prepared for under or
> > overaligned scalars, only when they are in structs (or for middle-end in
> > *MEM_REFs).
> > 
> > 	Jakub
> > 
> 
> On ARM, I get the arguments being passed in r0 & r1 for every call in bar()
> above. It sounds as if this is because the casts are being removed as useless;
> so the only way for overalignment info to be present, is when SRA puts it
> there.
> 
> The only way I can get a register to be skipped, is by providing a prototype
> with alignment specified via a typedef:
> 
> typedef int aligned_int __attribute__((aligned((8))));
> int foo(int a, aligned_int b) {...} //compiles ok
> 
> whereas specifying alignment directly, is rejected:
> 
> nonvar.c:2:20: error: alignment may not be specified for 'b'
>  int foo(int a, int b __attribute__((aligned((8)))))
>                     ^
> Note this is using the GNU __attribute__((aligned)) extension. Trying to use
> C11 _Alignas results in a frontend error either way; IIUC the C11 spec deems
> that sort of thing illegal.
> 
> (1) If we wish to keep the AAPCS principle that varargs are passed just as
> named args, we should use TYPE_MAIN_VARIANT inside
> arm_needs_doubleword_alignment, which will then ignore overalignment on both
> varargs _and named args_. However this would be silently ABI-changing....?
> 
> (2) It seems to me that SRA is the only way for overalignment info to be
> present on a value, so the patch to tree-sra.c/create_access_replacement seems
> to make things more consistent?

I'm not so sure about (2), SCCVN records the type of a reference
and PRE uses it to create the LHS temporaries to insert them.
You'd need some tricky order of optimizations to expose that to
a call argument though (copy-propagating the inserted value to
a call argument).  LIM may have similar issues (when doing store-motion),
so may predictive commoning and loop distribution (and maybe others I
forgot).

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New regression on ARM Linux
  2015-03-31 13:25                             ` Richard Biener
@ 2015-04-02 14:59                               ` Alan Lawrence
  0 siblings, 0 replies; 31+ messages in thread
From: Alan Lawrence @ 2015-04-02 14:59 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, Richard Earnshaw, Richard Biener, gcc-patches,
	Marcus Shawcroft

Richard Biener wrote:
 > > On Tue, 31 Mar 2015, Alan Lawrence wrote:
 > >
 >> >> (1) If we wish to keep the AAPCS principle that varargs are passed just as
 >> >> named args, we should use TYPE_MAIN_VARIANT inside
 >> >> arm_needs_doubleword_alignment, which will then ignore overalignment on both
 >> >> varargs _and named args_. However this would be silently ABI-changing....?
 >> >>
 >> >> (2) It seems to me that SRA is the only way for overalignment info to be
 >> >> present on a value, so the patch to tree-sra.c/create_access_replacement 
seems
 >> >> to make things more consistent?
 > >
 > > I'm not so sure about (2), SCCVN records the type of a reference
 > > and PRE uses it to create the LHS temporaries to insert them.
 > > You'd need some tricky order of optimizations to expose that to
 > > a call argument though (copy-propagating the inserted value to
 > > a call argument).  LIM may have similar issues (when doing store-motion),
 > > so may predictive commoning and loop distribution (and maybe others I
 > > forgot).


Hmmm. The other cases you mention are somewhat worrisome, as we don't know if
there are testcases that might tickle these too. But I think making a
last-minute change to the backend, *that would affect the ABI for prototyped
arguments*, is asking for trouble; we don't have time to think through the
implications or make a proper spec, and I can easily see us wanting to change it
*again* for gcc 6! So I think your patch to tree-sra.c
(create_access_replacement) seems the right fix?

--Alan


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-04-02 14:59 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-11 15:09 [PATCH] Fix regression caused by PR65310 fix Richard Biener
2015-03-30 12:15 ` New regression on ARM Linux (was: Re: [PATCH] Fix regression caused by PR65310 fix) Alan Lawrence
2015-03-30 12:18   ` New regression on ARM Linux Alan Lawrence
2015-03-30 13:01     ` Richard Biener
2015-03-30 13:16       ` Richard Biener
2015-03-30 16:45         ` Alan Lawrence
2015-03-30 20:13           ` Richard Biener
2015-03-31  7:50             ` Richard Biener
2015-03-31  9:43               ` Richard Earnshaw
2015-03-31 10:00                 ` Richard Biener
2015-03-31 10:10                   ` Richard Earnshaw
2015-03-31 10:32                     ` Jakub Jelinek
2015-03-31 10:36                     ` Richard Biener
2015-03-31 10:40                       ` Richard Earnshaw
2015-03-31 10:45                         ` Richard Biener
2015-03-31 10:51                           ` Richard Earnshaw
2015-03-31 11:09                             ` Richard Biener
2015-03-31 12:15                               ` Richard Earnshaw
2015-03-31 12:11                             ` Alan Lawrence
2015-03-31 10:47                       ` Alan Lawrence
2015-03-31 11:05                         ` Richard Biener
2015-03-31 11:07                         ` Jakub Jelinek
2015-03-31 11:11                           ` Richard Biener
2015-03-31 13:11                           ` Alan Lawrence
2015-03-31 13:25                             ` Richard Biener
2015-04-02 14:59                               ` Alan Lawrence
2015-03-31 10:20                   ` Richard Biener
2015-03-31 10:31                     ` Richard Earnshaw
2015-03-31 10:45                       ` Richard Biener
2015-03-31 10:53                         ` Richard Earnshaw
2015-03-31  9:50               ` Alan Lawrence

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).