From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 101328 invoked by alias); 31 Mar 2015 10:20:17 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 101304 invoked by uid 89); 31 Mar 2015 10:20:16 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.2 required=5.0 tests=AWL,BAYES_00,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mx2.suse.de Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (CAMELLIA256-SHA encrypted) ESMTPS; Tue, 31 Mar 2015 10:20:14 +0000 Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id DCF64ABD1; Tue, 31 Mar 2015 10:20:10 +0000 (UTC) Date: Tue, 31 Mar 2015 10:20:00 -0000 From: Richard Biener To: Richard Earnshaw cc: Richard Biener , Alan Lawrence , "gcc-patches@gcc.gnu.org" , Marcus Shawcroft Subject: Re: New regression on ARM Linux In-Reply-To: Message-ID: References: <55193E77.3040401@arm.com> <55193EFB.6030009@arm.com> <55197DAE.1050004@arm.com> <16A8C5AA-994C-4FA3-BC25-8547166DC13C@suse.de> <551A6C46.7010305@foss.arm.com> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-SW-Source: 2015-03/txt/msg01627.txt.bz2 On Tue, 31 Mar 2015, Richard Biener wrote: > On Tue, 31 Mar 2015, Richard Earnshaw wrote: > > > On 31/03/15 08:50, Richard Biener wrote: > > > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener wrote: > > >> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence wrote: > > >>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes > > >>> it. > > >>> > > >>> The problem appears to be in laying out arguments, specifically > > >>> varargs. From > > >>> the "good" -fdump-rtl-expand: > > >>> > > >>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 > > >>> S4 A32]) > > >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 19 18 20 2 (set (reg:DF 2 r2) > > >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 20 19 21 2 (set (reg:SI 1 r1) > > >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 21 20 22 2 (set (reg:SI 0 r0) > > >>> (reg:SI 118)) reduced.c:14 -1 > > >>> (nil)) > > >>> (call_insn 22 21 23 2 (parallel [ > > >>> (set (reg:SI 0 r0) > > >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] > > >>> ) [0 __builtin_printf S4 > > >>> A32]) > > >>> > > >>> The struct members are > > >>> reg:SI 113 => int a; > > >>> reg:DF 112 => double b; > > >>> reg:SI 111 => int c; > > >>> > > >>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is > > >>> pushed > > >>> into virtual-outgoing-args. In contrast, post-change to > > >>> build_ref_of_offset, we get: > > >>> > > >>> (insn 17 16 18 2 (set (reg:SI 118) > > >>> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] > >>> *.LC1>)) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 > > >>> virtual-outgoing-args) > > >>> (const_int 8 [0x8])) [0 S4 A64]) > > >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 > > >>> S8 A64]) > > >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 20 19 21 2 (set (reg:SI 2 r2) > > >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 21 20 22 2 (set (reg:SI 0 r0) > > >>> (reg:SI 118)) reduced.c:14 -1 > > >>> (nil)) > > >>> (call_insn 22 21 23 2 (parallel [ > > >>> (set (reg:SI 0 r0) > > >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] > > >>> ) [0 __builtin_printf S4 > > >>> A32]) > > >>> > > >>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the > > >>> > > >>> double+following int are all pushed into virtual-outgoing-args. This is > > >>> because > > >>> arm_function_arg is fed a 64-bit-aligned int as type of the second > > >>> argument (the > > >>> type constructed by build_ref_for_offset); it then executes > > >>> (aapcs_layout_arg, > > >>> arm.c line ~~5914) > > >>> > > >>> /* C3 - For double-word aligned arguments, round the NCRN up to the > > >>> next even number. */ > > >>> ncrn = pcum->aapcs_ncrn; > > >>> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) > > >>> ncrn++; > > >>> > > >>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the > > >>> testcase > > >>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a > > >>> 32-bit-aligned int instead, which works as previously. > > >>> > > >>> Passing the same members of that struct in a non-vargs call, works ok - > > >>> I think > > >>> because these use the type of the declared parameters, rather than the > > >>> provided > > >>> arguments, and the former do not have the increased alignment from > > >>> build_ref_for_offset. > > >> > > >> It doesn't make sense to use the alignment of passed values. That looks like bs. > > >> > > >> This means that > > >> > > >> Int I __aligned__(8); > > >> > > >> Is passed differently than int. > > >> > > >> Arm_function_arg needs to be fixed. > > > > > > That is, > > > > > > typedef int myint __attribute__((aligned(8))); > > > > > > int main() > > > { > > > myint i = 1; > > > int j = 2; > > > __builtin_printf("%d %d\n", i, j); > > > } > > > > > > or > > > > > > myint i; > > > int j; > > > myint *p = &i; > > > int *q = &j; > > > > > > int main() > > > { > > > __builtin_printf("%d %d", *p, *q); > > > } > > > > > > should behave the same. There isn't a printf modifier for an "aligned int" > > > because that sort of thing doesn't make sense. Special-casing aligned vs. > > > non-aligned values only makes sense for things passed by value on the stack. > > > And then obviously only dependent on the functuion type signature, not > > > on the type of the passed value. > > > > > > > I think the testcase is ill-formed. Just because printf doesn't have > > such a modifier, doesn't mean that another variadic function might not > > have a means to detect when an object in the variadic list needs to be > > over-aligned. As such, the test should really be written as: > > A value doesn't have "alignment". A function may have alignment > requirements on its arguments, clearly printf doesn't. Btw, it looks like function_arg is supposed to pass whether the argument is to the variadic part of the function or not, but it gets passed false as in args[i].reg = targetm.calls.function_arg (args_so_far, mode, type, argpos < n_named_args); n_named_args is 4. That is because ARM asks to do this via if (type_arg_types != 0 && targetm.calls.strict_argument_naming (args_so_far)) ; else if (type_arg_types != 0 && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far)) /* Don't include the last named arg. */ --n_named_args; else /* Treat all args as named. */ n_named_args = num_actuals; thus you lose that info (you could have that readily available). > > typedef int myint __attribute__((aligned(8))); > > > > int main() > > { > > myint i = 1; > > int j = 2; > > __builtin_printf("%d %d\n", (int) i, j); > > } > > > > Variadic functions take the types of their arguments from the types of > > the actuals passed. The compiler should either be applying appropriate > > promotion rules to make the types conformant by the language > > specification or respecting the types exactly. However, that should be > > done in the mid-end not the back-end. If incorrect alignment > > information is passed to the back-end it can't help but make the wrong > > choice. Examining the mode here wouldn't help either. Consider: > > > > int foo (int a, int b, int c, int d, int e, int f > > __attribute__((aligned(8)))); > > > > On ARM that should pass f in a 64-bit aligned location (since the > > parameter will be on the stack). > > > > R. > > > > > > > Richard. > > > > > >> Richard. > > >> > > >>> > > >>> FWIW, I also tried: > > >>> > > >>> __attribute__((__aligned__((16)))) int x; > > >>> int main (void) > > >>> { > > >>> __builtin_printf("%d\n", x); > > >>> } > > >>> > > >>> but in that case, the arm_function_arg is still fed a type with > > >>> alignment 32 > > >>> (bits), i.e. distinct from the type of the field 'x' in memory, which > > >>> has > > >>> alignment 128. > > >>> > > >>> --Alan > > >>> > > >>> Richard Biener wrote: > > >>>> On Mon, 30 Mar 2015, Richard Biener wrote: > > >>>> > > >>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: > > >>>>> > > >>>>>> ...actually attach the testcase... > > >>>>> What compile options? > > >>>> > > >>>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but > > >>>> I can't see anything not guaranteeing that: > > >>>> > > >>>> .section .rodata > > >>>> .align 3 > > >>>> .LANCHOR0 = . + 0 > > >>>> .LC1: > > >>>> .ascii "%d %g %d\012\000" > > >>>> .space 6 > > >>>> .LC0: > > >>>> .word 7 > > >>>> .space 4 > > >>>> .word 0 > > >>>> .word 1075838976 > > >>>> .word 9 > > >>>> .space 4 > > >>>> > > >>>> maybe there is some more generic code-gen bug for aligned aggregate > > >>>> copy? That is, the patch tells the backend that the loads and > > >>>> stores to the 'int' vars (which have padding followed) is aligned > > >>>> to 8 bytes. > > >>>> > > >>>> I don't see what is wrong in the final assembler, but maybe > > >>>> some endian issue exists? The code looks quite ugly though ;) > > >>>> > > >>>> Richard. > > >>>> > > >>>>>> Alan Lawrence wrote: > > >>>>>>> We've been seeing a bunch of new failures in the *libffi* > > >>> testsuite on ARM > > >>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), > > >>> following this > > >>>>>>> one-liner fix. I've reduced the testcase down to the attached > > >>> (including > > >>>>>>> removing any dependency on libffi); with gcc r221347, this prints > > >>> the > > >>>>>>> expected > > >>>>>>> 7 8 9 > > >>>>>>> whereas with gcc r221348, instead it prints > > >>>>>>> 0 8 0 > > >>>>>>> > > >>>>>>> The action of r221348 is to change the alignment of a mem_ref, and > > >>> a > > >>>>>>> var_decl of b1, from 32 to 64; both have type > > >>>>>>> type > >>> sizes-gimplified type_0 > > >>>>>>> BLK > > >>>>>>> size > > >>>>>>> unit size > > >>>>>>> align 64 symtab 0 alias set 1 canonical type > > >>> 0x2b9b8d428d20 > > >>>>>>> fields > >>>>>>> 0x2b9b8d092690 int> > > >>>>>>> SI file reduced.c line 12 col 7 > > >>>>>>> size > > >>>>>>> unit size > > >>>>>>> align 32 offset_align 64 > > >>>>>>> offset > > >>>>>>> bit offset > > >>> context > > >>>>>>> chain > >>>>>>> 0x2b9b8d42b130 b>> context > >>> D.6070> > > >>>>>>> pointer_to_this chain > > >>> > >>>>>>> 0x2b9b8d42b000 D.6044>> > > >>>>>>> > > >>>>>>> The tree-optimized output is the same with both compilers (as this > > >>> does not > > >>>>>>> mention alignment); the expand output differs. > > >>>>>>> > > >>>>>>> Still investigating... > > >>>>>>> > > >>>>>>> --Alan > > >>>>>>> > > >>>>>>> > > >>>>>>> Richard Biener wrote: > > >>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA > > >>>>>>>> drops alignment info unnecessarily. > > >>>>>>>> > > >>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. > > >>>>>>>> > > >>>>>>>> Richard. > > >>>>>>>> > > >>>>>>>> 2015-03-11 Richard Biener > > >>>>>>>> > > >>>>>>>> PR tree-optimization/65310 > > >>>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger > > >>>>>>>> alignment. > > >>>>>>>> > > >>>>>>>> Index: gcc/tree-sra.c > > >>>>>>>> > > >>> =================================================================== > > >>>>>>>> --- gcc/tree-sra.c (revision 221324) > > >>>>>>>> +++ gcc/tree-sra.c (working copy) > > >>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr > > >>>>>>>> misalign = (misalign + offset) & (align - 1); > > >>>>>>>> if (misalign != 0) > > >>>>>>>> align = (misalign & -misalign); > > >>>>>>>> - if (align < TYPE_ALIGN (exp_type)) > > >>>>>>>> + if (align != TYPE_ALIGN (exp_type)) > > >>>>>>>> exp_type = build_aligned_type (exp_type, align); > > >>>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, > > >>> off); > > >>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>> > > >>>> > > >> > > >> > > > > > > > > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)