From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-394397-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 43741 invoked by alias); 31 Mar 2015 10:53:19 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 43715 invoked by uid 89); 31 Mar 2015 10:53:18 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2
X-HELO: foss.arm.com
Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 31 Mar 2015 10:53:14 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E304E27;	Tue, 31 Mar 2015 03:53:12 -0700 (PDT)
Received: from e105689-lin.cambridge.arm.com (e105689-lin.cambridge.arm.com [10.2.207.32])	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 458D43F218;	Tue, 31 Mar 2015 03:53:12 -0700 (PDT)
Message-ID: <551A7C96.9050900@foss.arm.com>
Date: Tue, 31 Mar 2015 10:53:00 -0000
From: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: Richard Biener <rguenther@suse.de>
CC: Richard Biener <richard.guenther@gmail.com>,  Alan Lawrence <Alan.Lawrence@arm.com>, "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,  Marcus Shawcroft <Marcus.Shawcroft@arm.com>
Subject: Re: New regression on ARM Linux
References: <alpine.LSU.2.11.1503111607530.10796@zhemvz.fhfr.qr> <55193E77.3040401@arm.com> <55193EFB.6030009@arm.com> <alpine.LSU.2.11.1503301501070.31545@zhemvz.fhfr.qr> <alpine.LSU.2.11.1503301509150.31545@zhemvz.fhfr.qr> <55197DAE.1050004@arm.com> <16A8C5AA-994C-4FA3-BC25-8547166DC13C@suse.de> <CAFiYyc1MOL8AAwtVhvYDYmSSo+MdGi2FFi-6ELfRxdsvDxtSew@mail.gmail.com> <551A6C46.7010305@foss.arm.com> <alpine.LSU.2.11.1503311200090.31545@zhemvz.fhfr.qr> <alpine.LSU.2.11.1503311207080.31545@zhemvz.fhfr.qr> <551A776B.3020103@foss.arm.com> <alpine.LSU.2.11.1503311239590.31545@zhemvz.fhfr.qr>
In-Reply-To: <alpine.LSU.2.11.1503311239590.31545@zhemvz.fhfr.qr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2015-03/txt/msg01637.txt.bz2

On 31/03/15 11:44, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 11:20, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Biener wrote:
>>>
>>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>>
>>>>> On 31/03/15 08:50, Richard Biener wrote:
>>>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>>>>> it.
>>>>>>>>
>>>>>>>> The problem appears to be in laying out arguments, specifically
>>>>>>>> varargs. From
>>>>>>>> the "good" -fdump-rtl-expand:
>>>>>>>>
>>>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>> S4 A32])
>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>> A32])
>>>>>>>>
>>>>>>>> The struct members are
>>>>>>>> reg:SI 113 => int a;
>>>>>>>> reg:DF 112 => double b;
>>>>>>>> reg:SI 111 => int c;
>>>>>>>>
>>>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>>>>> pushed
>>>>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>>>>> build_ref_of_offset, we get:
>>>>>>>>
>>>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>>>>> *.LC1>)) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>>>>> virtual-outgoing-args)
>>>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>> S8 A64])
>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>> A32])
>>>>>>>>
>>>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>>>>
>>>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>>>>> because
>>>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>>>>> argument (the
>>>>>>>> type constructed by build_ref_for_offset); it then executes
>>>>>>>> (aapcs_layout_arg,
>>>>>>>> arm.c line ~~5914)
>>>>>>>>
>>>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>>>>      next even number.  */
>>>>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>>>>     ncrn++;
>>>>>>>>
>>>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>>>>> testcase
>>>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>>>>
>>>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>>>>> I think
>>>>>>>> because these use the type of the declared parameters, rather than the
>>>>>>>> provided
>>>>>>>> arguments, and the former do not have the increased alignment from
>>>>>>>> build_ref_for_offset.
>>>>>>>
>>>>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>>>>>>
>>>>>>> This means that
>>>>>>>
>>>>>>> Int I __aligned__(8);
>>>>>>>
>>>>>>> Is passed differently than int.
>>>>>>>
>>>>>>> Arm_function_arg needs to be fixed.
>>>>>>
>>>>>> That is,
>>>>>>
>>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>>
>>>>>> int main()
>>>>>> {
>>>>>>   myint i = 1;
>>>>>>   int j = 2;
>>>>>>   __builtin_printf("%d %d\n", i, j);
>>>>>> }
>>>>>>
>>>>>> or
>>>>>>
>>>>>> myint i;
>>>>>> int j;
>>>>>> myint *p = &i;
>>>>>> int *q = &j;
>>>>>>
>>>>>> int main()
>>>>>> {
>>>>>>   __builtin_printf("%d %d", *p, *q);
>>>>>> }
>>>>>>
>>>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>>>>> non-aligned values only makes sense for things passed by value on the stack.
>>>>>> And then obviously only dependent on the functuion type signature, not
>>>>>> on the type of the passed value.
>>>>>>
>>>>>
>>>>> I think the testcase is ill-formed.  Just because printf doesn't have
>>>>> such a modifier, doesn't mean that another variadic function might not
>>>>> have a means to detect when an object in the variadic list needs to be
>>>>> over-aligned.  As such, the test should really be written as:
>>>>
>>>> A value doesn't have "alignment".  A function may have alignment
>>>> requirements on its arguments, clearly printf doesn't.
>>>
>>> Btw, it looks like function_arg is supposed to pass whether the argument
>>> is to the variadic part of the function or not, but it gets passed
>>> false as in
>>>
>>>       args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
>>>                                                 argpos < n_named_args);
>>>
>>> n_named_args is 4.  That is because ARM asks to do this via
>>>
>>>   if (type_arg_types != 0
>>>       && targetm.calls.strict_argument_naming (args_so_far))
>>>     ;
>>>   else if (type_arg_types != 0
>>>            && ! targetm.calls.pretend_outgoing_varargs_named 
>>> (args_so_far))
>>>     /* Don't include the last named arg.  */
>>>     --n_named_args;
>>>   else
>>>     /* Treat all args as named.  */
>>>     n_named_args = num_actuals;
>>>
>>> thus you lose that info (you could have that readily available).
>>>
>>
>> From the calling side of a function it shouldn't matter.  They only
>> thing the caller has to know is that the function is variadic (and
>> therefore that the base-standard rules from the AAPCS apply -- no use of
>> FP registers for parameters).  The behaviour after that is *as if* all
>> the arguments were pushed onto the stack in the correct order and
>> finally the lowest 4 words are popped off the stack again into r0-r3.
>> Hence any alignment that would apply to arguments on the stack has to be
>> reflected in their allocation into r0-r3 (since the push/pop of those
>> registers never happens).
> 
> But how do you compute the alignment of, say a myint '1'?  Of course
> __attribute__((aligned())) is a C extension thus AAPCS likely doesn't 
> consider it.
> 

Front-end problem :-)

> But yes, as given even on x86_64 we might spill a 8-byte aligned
> int register to a 8-byte aligned stack slot so the proposed patch
> makes sense in that regard.  I wonder how many other passes can
> get confused by this (PRE, I suppose, inserting an explicitely
> aligned load as well as all passes after the vectorizer which
> also creates loads/store with explicit (usually lower) alignment).
> 
> Richard.
> 
>> R.
>>
>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>
>>>>> int main()
>>>>> {
>>>>>   myint i = 1;
>>>>>   int j = 2;
>>>>>   __builtin_printf("%d %d\n", (int) i, j);
>>>>> }
>>>>>
>>>>> Variadic functions take the types of their arguments from the types of
>>>>> the actuals passed.  The compiler should either be applying appropriate
>>>>> promotion rules to make the types conformant by the language
>>>>> specification or respecting the types exactly.  However, that should be
>>>>> done in the mid-end not the back-end.  If incorrect alignment
>>>>> information is passed to the back-end it can't help but make the wrong
>>>>> choice.  Examining the mode here wouldn't help either.  Consider:
>>>>>
>>>>> int foo (int a, int b, int c, int d, int e, int f
>>>>> __attribute__((aligned(8))));
>>>>>
>>>>> On ARM that should pass f in a 64-bit aligned location (since the
>>>>> parameter will be on the stack).
>>>>>
>>>>> R.
>>>>>
>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> FWIW, I also tried:
>>>>>>>>
>>>>>>>> __attribute__((__aligned__((16)))) int x;
>>>>>>>> int main (void)
>>>>>>>> {
>>>>>>>>   __builtin_printf("%d\n", x);
>>>>>>>> }
>>>>>>>>
>>>>>>>> but in that case, the arm_function_arg is still fed a type with
>>>>>>>> alignment 32
>>>>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
>>>>>>>> has
>>>>>>>> alignment 128.
>>>>>>>>
>>>>>>>> --Alan
>>>>>>>>
>>>>>>>> Richard Biener wrote:
>>>>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>>>>>>>
>>>>>>>>>>> ...actually attach the testcase...
>>>>>>>>>> What compile options?
>>>>>>>>>
>>>>>>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>>>>>>>> I can't see anything not guaranteeing that:
>>>>>>>>>
>>>>>>>>>         .section        .rodata
>>>>>>>>>         .align  3
>>>>>>>>> .LANCHOR0 = . + 0
>>>>>>>>> .LC1:
>>>>>>>>>         .ascii  "%d %g %d\012\000"
>>>>>>>>>         .space  6
>>>>>>>>> .LC0:
>>>>>>>>>         .word   7
>>>>>>>>>         .space  4
>>>>>>>>>         .word   0
>>>>>>>>>         .word   1075838976
>>>>>>>>>         .word   9
>>>>>>>>>         .space  4
>>>>>>>>>
>>>>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
>>>>>>>>> copy?  That is, the patch tells the backend that the loads and
>>>>>>>>> stores to the 'int' vars (which have padding followed) is aligned
>>>>>>>>> to 8 bytes.
>>>>>>>>>
>>>>>>>>> I don't see what is wrong in the final assembler, but maybe
>>>>>>>>> some endian issue exists?  The code looks quite ugly though ;)
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>>> Alan Lawrence wrote:
>>>>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>>>>>>> testsuite on ARM
>>>>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>>>>>>> following this
>>>>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>>>>>>> (including
>>>>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>>>>>>> the
>>>>>>>>>>>> expected
>>>>>>>>>>>> 7 8 9
>>>>>>>>>>>> whereas with gcc r221348, instead it prints
>>>>>>>>>>>> 0 8 0
>>>>>>>>>>>>
>>>>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>>>>>>> a
>>>>>>>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>>>>>>> sizes-gimplified type_0
>>>>>>>>>>>> BLK
>>>>>>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>>>>>>> 0x2b9b8d428d20
>>>>>>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>>>>>>>> 0x2b9b8d092690 int>
>>>>>>>>>>>>              SI file reduced.c line 12 col 7
>>>>>>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>>>>>>>              align 32 offset_align 64
>>>>>>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>>>>>>> context
>>>>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>>>>>>> D.6070>
>>>>>>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>>>>>>>> <type_decl
>>>>>>>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>>>>>>>
>>>>>>>>>>>> The tree-optimized output is the same with both compilers (as this
>>>>>>>> does not
>>>>>>>>>>>> mention alignment); the expand output differs.
>>>>>>>>>>>>
>>>>>>>>>>>> Still investigating...
>>>>>>>>>>>>
>>>>>>>>>>>> --Alan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Richard Biener wrote:
>>>>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>>>>>>>> drops alignment info unnecessarily.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  PR tree-optimization/65310
>>>>>>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>>>>>>>  alignment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Index: gcc/tree-sra.c
>>>>>>>>>>>>>
>>>>>>>> ===================================================================
>>>>>>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>>>>>>>    if (misalign != 0)
>>>>>>>>>>>>>      align = (misalign & -misalign);
>>>>>>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>>>>>>> off);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>