From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 43741 invoked by alias); 31 Mar 2015 10:53:19 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 43715 invoked by uid 89); 31 Mar 2015 10:53:18 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 31 Mar 2015 10:53:14 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E304E27; Tue, 31 Mar 2015 03:53:12 -0700 (PDT) Received: from e105689-lin.cambridge.arm.com (e105689-lin.cambridge.arm.com [10.2.207.32]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 458D43F218; Tue, 31 Mar 2015 03:53:12 -0700 (PDT) Message-ID: <551A7C96.9050900@foss.arm.com> Date: Tue, 31 Mar 2015 10:53:00 -0000 From: Richard Earnshaw User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: Richard Biener CC: Richard Biener , Alan Lawrence , "gcc-patches@gcc.gnu.org" , Marcus Shawcroft Subject: Re: New regression on ARM Linux References: <55193E77.3040401@arm.com> <55193EFB.6030009@arm.com> <55197DAE.1050004@arm.com> <16A8C5AA-994C-4FA3-BC25-8547166DC13C@suse.de> <551A6C46.7010305@foss.arm.com> <551A776B.3020103@foss.arm.com> In-Reply-To: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2015-03/txt/msg01637.txt.bz2 On 31/03/15 11:44, Richard Biener wrote: > On Tue, 31 Mar 2015, Richard Earnshaw wrote: > >> On 31/03/15 11:20, Richard Biener wrote: >>> On Tue, 31 Mar 2015, Richard Biener wrote: >>> >>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote: >>>> >>>>> On 31/03/15 08:50, Richard Biener wrote: >>>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener wrote: >>>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence wrote: >>>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes >>>>>>>> it. >>>>>>>> >>>>>>>> The problem appears to be in laying out arguments, specifically >>>>>>>> varargs. From >>>>>>>> the "good" -fdump-rtl-expand: >>>>>>>> >>>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 >>>>>>>> S4 A32]) >>>>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2) >>>>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1) >>>>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) >>>>>>>> (reg:SI 118)) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (call_insn 22 21 23 2 (parallel [ >>>>>>>> (set (reg:SI 0 r0) >>>>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>>>>>>> ) [0 __builtin_printf S4 >>>>>>>> A32]) >>>>>>>> >>>>>>>> The struct members are >>>>>>>> reg:SI 113 => int a; >>>>>>>> reg:DF 112 => double b; >>>>>>>> reg:SI 111 => int c; >>>>>>>> >>>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is >>>>>>>> pushed >>>>>>>> into virtual-outgoing-args. In contrast, post-change to >>>>>>>> build_ref_of_offset, we get: >>>>>>>> >>>>>>>> (insn 17 16 18 2 (set (reg:SI 118) >>>>>>>> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] >>>>>>> *.LC1>)) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 >>>>>>>> virtual-outgoing-args) >>>>>>>> (const_int 8 [0x8])) [0 S4 A64]) >>>>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 >>>>>>>> S8 A64]) >>>>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2) >>>>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) >>>>>>>> (reg:SI 118)) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (call_insn 22 21 23 2 (parallel [ >>>>>>>> (set (reg:SI 0 r0) >>>>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>>>>>>> ) [0 __builtin_printf S4 >>>>>>>> A32]) >>>>>>>> >>>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the >>>>>>>> >>>>>>>> double+following int are all pushed into virtual-outgoing-args. This is >>>>>>>> because >>>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second >>>>>>>> argument (the >>>>>>>> type constructed by build_ref_for_offset); it then executes >>>>>>>> (aapcs_layout_arg, >>>>>>>> arm.c line ~~5914) >>>>>>>> >>>>>>>> /* C3 - For double-word aligned arguments, round the NCRN up to the >>>>>>>> next even number. */ >>>>>>>> ncrn = pcum->aapcs_ncrn; >>>>>>>> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) >>>>>>>> ncrn++; >>>>>>>> >>>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the >>>>>>>> testcase >>>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a >>>>>>>> 32-bit-aligned int instead, which works as previously. >>>>>>>> >>>>>>>> Passing the same members of that struct in a non-vargs call, works ok - >>>>>>>> I think >>>>>>>> because these use the type of the declared parameters, rather than the >>>>>>>> provided >>>>>>>> arguments, and the former do not have the increased alignment from >>>>>>>> build_ref_for_offset. >>>>>>> >>>>>>> It doesn't make sense to use the alignment of passed values. That looks like bs. >>>>>>> >>>>>>> This means that >>>>>>> >>>>>>> Int I __aligned__(8); >>>>>>> >>>>>>> Is passed differently than int. >>>>>>> >>>>>>> Arm_function_arg needs to be fixed. >>>>>> >>>>>> That is, >>>>>> >>>>>> typedef int myint __attribute__((aligned(8))); >>>>>> >>>>>> int main() >>>>>> { >>>>>> myint i = 1; >>>>>> int j = 2; >>>>>> __builtin_printf("%d %d\n", i, j); >>>>>> } >>>>>> >>>>>> or >>>>>> >>>>>> myint i; >>>>>> int j; >>>>>> myint *p = &i; >>>>>> int *q = &j; >>>>>> >>>>>> int main() >>>>>> { >>>>>> __builtin_printf("%d %d", *p, *q); >>>>>> } >>>>>> >>>>>> should behave the same. There isn't a printf modifier for an "aligned int" >>>>>> because that sort of thing doesn't make sense. Special-casing aligned vs. >>>>>> non-aligned values only makes sense for things passed by value on the stack. >>>>>> And then obviously only dependent on the functuion type signature, not >>>>>> on the type of the passed value. >>>>>> >>>>> >>>>> I think the testcase is ill-formed. Just because printf doesn't have >>>>> such a modifier, doesn't mean that another variadic function might not >>>>> have a means to detect when an object in the variadic list needs to be >>>>> over-aligned. As such, the test should really be written as: >>>> >>>> A value doesn't have "alignment". A function may have alignment >>>> requirements on its arguments, clearly printf doesn't. >>> >>> Btw, it looks like function_arg is supposed to pass whether the argument >>> is to the variadic part of the function or not, but it gets passed >>> false as in >>> >>> args[i].reg = targetm.calls.function_arg (args_so_far, mode, type, >>> argpos < n_named_args); >>> >>> n_named_args is 4. That is because ARM asks to do this via >>> >>> if (type_arg_types != 0 >>> && targetm.calls.strict_argument_naming (args_so_far)) >>> ; >>> else if (type_arg_types != 0 >>> && ! targetm.calls.pretend_outgoing_varargs_named >>> (args_so_far)) >>> /* Don't include the last named arg. */ >>> --n_named_args; >>> else >>> /* Treat all args as named. */ >>> n_named_args = num_actuals; >>> >>> thus you lose that info (you could have that readily available). >>> >> >> From the calling side of a function it shouldn't matter. They only >> thing the caller has to know is that the function is variadic (and >> therefore that the base-standard rules from the AAPCS apply -- no use of >> FP registers for parameters). The behaviour after that is *as if* all >> the arguments were pushed onto the stack in the correct order and >> finally the lowest 4 words are popped off the stack again into r0-r3. >> Hence any alignment that would apply to arguments on the stack has to be >> reflected in their allocation into r0-r3 (since the push/pop of those >> registers never happens). > > But how do you compute the alignment of, say a myint '1'? Of course > __attribute__((aligned())) is a C extension thus AAPCS likely doesn't > consider it. > Front-end problem :-) > But yes, as given even on x86_64 we might spill a 8-byte aligned > int register to a 8-byte aligned stack slot so the proposed patch > makes sense in that regard. I wonder how many other passes can > get confused by this (PRE, I suppose, inserting an explicitely > aligned load as well as all passes after the vectorizer which > also creates loads/store with explicit (usually lower) alignment). > > Richard. > >> R. >> >>>>> typedef int myint __attribute__((aligned(8))); >>>>> >>>>> int main() >>>>> { >>>>> myint i = 1; >>>>> int j = 2; >>>>> __builtin_printf("%d %d\n", (int) i, j); >>>>> } >>>>> >>>>> Variadic functions take the types of their arguments from the types of >>>>> the actuals passed. The compiler should either be applying appropriate >>>>> promotion rules to make the types conformant by the language >>>>> specification or respecting the types exactly. However, that should be >>>>> done in the mid-end not the back-end. If incorrect alignment >>>>> information is passed to the back-end it can't help but make the wrong >>>>> choice. Examining the mode here wouldn't help either. Consider: >>>>> >>>>> int foo (int a, int b, int c, int d, int e, int f >>>>> __attribute__((aligned(8)))); >>>>> >>>>> On ARM that should pass f in a 64-bit aligned location (since the >>>>> parameter will be on the stack). >>>>> >>>>> R. >>>>> >>>>> >>>>>> Richard. >>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> >>>>>>>> FWIW, I also tried: >>>>>>>> >>>>>>>> __attribute__((__aligned__((16)))) int x; >>>>>>>> int main (void) >>>>>>>> { >>>>>>>> __builtin_printf("%d\n", x); >>>>>>>> } >>>>>>>> >>>>>>>> but in that case, the arm_function_arg is still fed a type with >>>>>>>> alignment 32 >>>>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which >>>>>>>> has >>>>>>>> alignment 128. >>>>>>>> >>>>>>>> --Alan >>>>>>>> >>>>>>>> Richard Biener wrote: >>>>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote: >>>>>>>>> >>>>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: >>>>>>>>>> >>>>>>>>>>> ...actually attach the testcase... >>>>>>>>>> What compile options? >>>>>>>>> >>>>>>>>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but >>>>>>>>> I can't see anything not guaranteeing that: >>>>>>>>> >>>>>>>>> .section .rodata >>>>>>>>> .align 3 >>>>>>>>> .LANCHOR0 = . + 0 >>>>>>>>> .LC1: >>>>>>>>> .ascii "%d %g %d\012\000" >>>>>>>>> .space 6 >>>>>>>>> .LC0: >>>>>>>>> .word 7 >>>>>>>>> .space 4 >>>>>>>>> .word 0 >>>>>>>>> .word 1075838976 >>>>>>>>> .word 9 >>>>>>>>> .space 4 >>>>>>>>> >>>>>>>>> maybe there is some more generic code-gen bug for aligned aggregate >>>>>>>>> copy? That is, the patch tells the backend that the loads and >>>>>>>>> stores to the 'int' vars (which have padding followed) is aligned >>>>>>>>> to 8 bytes. >>>>>>>>> >>>>>>>>> I don't see what is wrong in the final assembler, but maybe >>>>>>>>> some endian issue exists? The code looks quite ugly though ;) >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>>> Alan Lawrence wrote: >>>>>>>>>>>> We've been seeing a bunch of new failures in the *libffi* >>>>>>>> testsuite on ARM >>>>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), >>>>>>>> following this >>>>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached >>>>>>>> (including >>>>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints >>>>>>>> the >>>>>>>>>>>> expected >>>>>>>>>>>> 7 8 9 >>>>>>>>>>>> whereas with gcc r221348, instead it prints >>>>>>>>>>>> 0 8 0 >>>>>>>>>>>> >>>>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and >>>>>>>> a >>>>>>>>>>>> var_decl of b1, from 32 to 64; both have type >>>>>>>>>>>> type >>>>>>> sizes-gimplified type_0 >>>>>>>>>>>> BLK >>>>>>>>>>>> size >>>>>>>>>>>> unit size >>>>>>>>>>>> align 64 symtab 0 alias set 1 canonical type >>>>>>>> 0x2b9b8d428d20 >>>>>>>>>>>> fields >>>>>>>>>>> 0x2b9b8d092690 int> >>>>>>>>>>>> SI file reduced.c line 12 col 7 >>>>>>>>>>>> size >>>>>>>>>>>> unit size >>>>>>>>>>>> align 32 offset_align 64 >>>>>>>>>>>> offset >>>>>>>>>>>> bit offset >>>>>>>> context >>>>>>>>>>>> chain >>>>>>>>>>> 0x2b9b8d42b130 b>> context >>>>>>> D.6070> >>>>>>>>>>>> pointer_to_this chain >>>>>>>> >>>>>>>>>>> 0x2b9b8d42b000 D.6044>> >>>>>>>>>>>> >>>>>>>>>>>> The tree-optimized output is the same with both compilers (as this >>>>>>>> does not >>>>>>>>>>>> mention alignment); the expand output differs. >>>>>>>>>>>> >>>>>>>>>>>> Still investigating... >>>>>>>>>>>> >>>>>>>>>>>> --Alan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Richard Biener wrote: >>>>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA >>>>>>>>>>>>> drops alignment info unnecessarily. >>>>>>>>>>>>> >>>>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >>>>>>>>>>>>> >>>>>>>>>>>>> Richard. >>>>>>>>>>>>> >>>>>>>>>>>>> 2015-03-11 Richard Biener >>>>>>>>>>>>> >>>>>>>>>>>>> PR tree-optimization/65310 >>>>>>>>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger >>>>>>>>>>>>> alignment. >>>>>>>>>>>>> >>>>>>>>>>>>> Index: gcc/tree-sra.c >>>>>>>>>>>>> >>>>>>>> =================================================================== >>>>>>>>>>>>> --- gcc/tree-sra.c (revision 221324) >>>>>>>>>>>>> +++ gcc/tree-sra.c (working copy) >>>>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >>>>>>>>>>>>> misalign = (misalign + offset) & (align - 1); >>>>>>>>>>>>> if (misalign != 0) >>>>>>>>>>>>> align = (misalign & -misalign); >>>>>>>>>>>>> - if (align < TYPE_ALIGN (exp_type)) >>>>>>>>>>>>> + if (align != TYPE_ALIGN (exp_type)) >>>>>>>>>>>>> exp_type = build_aligned_type (exp_type, align); >>>>>>>>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, >>>>>>>> off); >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >> >> >