From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 37297 invoked by alias); 31 Mar 2015 07:50:42 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 37283 invoked by uid 89); 31 Mar 2015 07:50:41 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,KAM_FROM_URIBL_PCCC,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=no version=3.3.2 X-HELO: mail-ob0-f181.google.com Received: from mail-ob0-f181.google.com (HELO mail-ob0-f181.google.com) (209.85.214.181) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 31 Mar 2015 07:50:35 +0000 Received: by obbgh1 with SMTP id gh1so14421608obb.1 for ; Tue, 31 Mar 2015 00:50:33 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.182.72.169 with SMTP id e9mr31992532obv.23.1427788233268; Tue, 31 Mar 2015 00:50:33 -0700 (PDT) Received: by 10.76.98.137 with HTTP; Tue, 31 Mar 2015 00:50:33 -0700 (PDT) In-Reply-To: <16A8C5AA-994C-4FA3-BC25-8547166DC13C@suse.de> References: <55193E77.3040401@arm.com> <55193EFB.6030009@arm.com> <55197DAE.1050004@arm.com> <16A8C5AA-994C-4FA3-BC25-8547166DC13C@suse.de> Date: Tue, 31 Mar 2015 07:50:00 -0000 Message-ID: Subject: Re: New regression on ARM Linux From: Richard Biener To: Richard Biener Cc: Alan Lawrence , "gcc-patches@gcc.gnu.org" , Marcus Shawcroft , Richard Earnshaw Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2015-03/txt/msg01616.txt.bz2 On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener wrote: > On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence wrote: >>-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes >>it. >> >>The problem appears to be in laying out arguments, specifically >>varargs. From >>the "good" -fdump-rtl-expand: >> >>(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 >>S4 A32]) >> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >> (nil)) >>(insn 19 18 20 2 (set (reg:DF 2 r2) >> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >> (nil)) >>(insn 20 19 21 2 (set (reg:SI 1 r1) >> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >> (nil)) >>(insn 21 20 22 2 (set (reg:SI 0 r0) >> (reg:SI 118)) reduced.c:14 -1 >> (nil)) >>(call_insn 22 21 23 2 (parallel [ >> (set (reg:SI 0 r0) >> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>) [0 __builtin_printf S4 >>A32]) >> >>The struct members are >>reg:SI 113 => int a; >>reg:DF 112 => double b; >>reg:SI 111 => int c; >> >>r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is >>pushed >>into virtual-outgoing-args. In contrast, post-change to >>build_ref_of_offset, we get: >> >>(insn 17 16 18 2 (set (reg:SI 118) >> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] >*.LC1>)) reduced.c:14 -1 >> (nil)) >>(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 >>virtual-outgoing-args) >> (const_int 8 [0x8])) [0 S4 A64]) >> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >> (nil)) >>(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 >>S8 A64]) >> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >> (nil)) >>(insn 20 19 21 2 (set (reg:SI 2 r2) >> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >> (nil)) >>(insn 21 20 22 2 (set (reg:SI 0 r0) >> (reg:SI 118)) reduced.c:14 -1 >> (nil)) >>(call_insn 22 21 23 2 (parallel [ >> (set (reg:SI 0 r0) >> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>) [0 __builtin_printf S4 >>A32]) >> >>r0 still gets the format string, but 'int b1.a' now goes in r2, and the >> >>double+following int are all pushed into virtual-outgoing-args. This is >>because >>arm_function_arg is fed a 64-bit-aligned int as type of the second >>argument (the >>type constructed by build_ref_for_offset); it then executes >>(aapcs_layout_arg, >>arm.c line ~~5914) >> >> /* C3 - For double-word aligned arguments, round the NCRN up to the >> next even number. */ >> ncrn = pcum->aapcs_ncrn; >> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) >> ncrn++; >> >>Which changes r1 to r2. Passing -fno-tree-sra, or removing from the >>testcase >>"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a >>32-bit-aligned int instead, which works as previously. >> >>Passing the same members of that struct in a non-vargs call, works ok - >>I think >>because these use the type of the declared parameters, rather than the >>provided >>arguments, and the former do not have the increased alignment from >>build_ref_for_offset. > > It doesn't make sense to use the alignment of passed values. That looks like bs. > > This means that > > Int I __aligned__(8); > > Is passed differently than int. > > Arm_function_arg needs to be fixed. That is, typedef int myint __attribute__((aligned(8))); int main() { myint i = 1; int j = 2; __builtin_printf("%d %d\n", i, j); } or myint i; int j; myint *p = &i; int *q = &j; int main() { __builtin_printf("%d %d", *p, *q); } should behave the same. There isn't a printf modifier for an "aligned int" because that sort of thing doesn't make sense. Special-casing aligned vs. non-aligned values only makes sense for things passed by value on the stack. And then obviously only dependent on the functuion type signature, not on the type of the passed value. Richard. > Richard. > >> >>FWIW, I also tried: >> >>__attribute__((__aligned__((16)))) int x; >>int main (void) >>{ >> __builtin_printf("%d\n", x); >>} >> >>but in that case, the arm_function_arg is still fed a type with >>alignment 32 >>(bits), i.e. distinct from the type of the field 'x' in memory, which >>has >>alignment 128. >> >>--Alan >> >>Richard Biener wrote: >>> On Mon, 30 Mar 2015, Richard Biener wrote: >>> >>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: >>>> >>>>> ...actually attach the testcase... >>>> What compile options? >>> >>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but >>> I can't see anything not guaranteeing that: >>> >>> .section .rodata >>> .align 3 >>> .LANCHOR0 = . + 0 >>> .LC1: >>> .ascii "%d %g %d\012\000" >>> .space 6 >>> .LC0: >>> .word 7 >>> .space 4 >>> .word 0 >>> .word 1075838976 >>> .word 9 >>> .space 4 >>> >>> maybe there is some more generic code-gen bug for aligned aggregate >>> copy? That is, the patch tells the backend that the loads and >>> stores to the 'int' vars (which have padding followed) is aligned >>> to 8 bytes. >>> >>> I don't see what is wrong in the final assembler, but maybe >>> some endian issue exists? The code looks quite ugly though ;) >>> >>> Richard. >>> >>>>> Alan Lawrence wrote: >>>>>> We've been seeing a bunch of new failures in the *libffi* >>testsuite on ARM >>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), >>following this >>>>>> one-liner fix. I've reduced the testcase down to the attached >>(including >>>>>> removing any dependency on libffi); with gcc r221347, this prints >>the >>>>>> expected >>>>>> 7 8 9 >>>>>> whereas with gcc r221348, instead it prints >>>>>> 0 8 0 >>>>>> >>>>>> The action of r221348 is to change the alignment of a mem_ref, and >>a >>>>>> var_decl of b1, from 32 to 64; both have type >>>>>> type >sizes-gimplified type_0 >>>>>> BLK >>>>>> size >>>>>> unit size >>>>>> align 64 symtab 0 alias set 1 canonical type >>0x2b9b8d428d20 >>>>>> fields >>>>> 0x2b9b8d092690 int> >>>>>> SI file reduced.c line 12 col 7 >>>>>> size >>>>>> unit size >>>>>> align 32 offset_align 64 >>>>>> offset >>>>>> bit offset >>context >>>>>> chain >>>>> 0x2b9b8d42b130 b>> context >D.6070> >>>>>> pointer_to_this chain >>>>>>> 0x2b9b8d42b000 D.6044>> >>>>>> >>>>>> The tree-optimized output is the same with both compilers (as this >>does not >>>>>> mention alignment); the expand output differs. >>>>>> >>>>>> Still investigating... >>>>>> >>>>>> --Alan >>>>>> >>>>>> >>>>>> Richard Biener wrote: >>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA >>>>>>> drops alignment info unnecessarily. >>>>>>> >>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>> 2015-03-11 Richard Biener >>>>>>> >>>>>>> PR tree-optimization/65310 >>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger >>>>>>> alignment. >>>>>>> >>>>>>> Index: gcc/tree-sra.c >>>>>>> >>=================================================================== >>>>>>> --- gcc/tree-sra.c (revision 221324) >>>>>>> +++ gcc/tree-sra.c (working copy) >>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >>>>>>> misalign = (misalign + offset) & (align - 1); >>>>>>> if (misalign != 0) >>>>>>> align = (misalign & -misalign); >>>>>>> - if (align < TYPE_ALIGN (exp_type)) >>>>>>> + if (align != TYPE_ALIGN (exp_type)) >>>>>>> exp_type = build_aligned_type (exp_type, align); >>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, >>off); >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>> > >