From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 86288 invoked by alias); 30 Mar 2015 20:13:25 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 86275 invoked by uid 89); 30 Mar 2015 20:13:24 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.2 required=5.0 tests=AWL,BAYES_00,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mx2.suse.de Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (CAMELLIA256-SHA encrypted) ESMTPS; Mon, 30 Mar 2015 20:13:23 +0000 Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id C29C0AD28; Mon, 30 Mar 2015 20:13:19 +0000 (UTC) User-Agent: K-9 Mail for Android In-Reply-To: <55197DAE.1050004@arm.com> References: <55193E77.3040401@arm.com> <55193EFB.6030009@arm.com> <55197DAE.1050004@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Subject: Re: New regression on ARM Linux From: Richard Biener Date: Mon, 30 Mar 2015 20:13:00 -0000 To: Alan Lawrence CC: "gcc-patches@gcc.gnu.org" ,Marcus Shawcroft ,Richard Earnshaw Message-ID: <16A8C5AA-994C-4FA3-BC25-8547166DC13C@suse.de> X-SW-Source: 2015-03/txt/msg01590.txt.bz2 On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence wrote: >-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes >it. > >The problem appears to be in laying out arguments, specifically >varargs. From >the "good" -fdump-rtl-expand: > >(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 >S4 A32]) > (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > (nil)) >(insn 19 18 20 2 (set (reg:DF 2 r2) > (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > (nil)) >(insn 20 19 21 2 (set (reg:SI 1 r1) > (reg:SI 113 [ b1 ])) reduced.c:14 -1 > (nil)) >(insn 21 20 22 2 (set (reg:SI 0 r0) > (reg:SI 118)) reduced.c:14 -1 > (nil)) >(call_insn 22 21 23 2 (parallel [ > (set (reg:SI 0 r0) > (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >) [0 __builtin_printf S4 >A32]) > >The struct members are >reg:SI 113 => int a; >reg:DF 112 => double b; >reg:SI 111 => int c; > >r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is >pushed >into virtual-outgoing-args. In contrast, post-change to >build_ref_of_offset, we get: > >(insn 17 16 18 2 (set (reg:SI 118) > (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] *.LC1>)) reduced.c:14 -1 > (nil)) >(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 >virtual-outgoing-args) > (const_int 8 [0x8])) [0 S4 A64]) > (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > (nil)) >(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 >S8 A64]) > (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > (nil)) >(insn 20 19 21 2 (set (reg:SI 2 r2) > (reg:SI 113 [ b1 ])) reduced.c:14 -1 > (nil)) >(insn 21 20 22 2 (set (reg:SI 0 r0) > (reg:SI 118)) reduced.c:14 -1 > (nil)) >(call_insn 22 21 23 2 (parallel [ > (set (reg:SI 0 r0) > (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >) [0 __builtin_printf S4 >A32]) > >r0 still gets the format string, but 'int b1.a' now goes in r2, and the > >double+following int are all pushed into virtual-outgoing-args. This is >because >arm_function_arg is fed a 64-bit-aligned int as type of the second >argument (the >type constructed by build_ref_for_offset); it then executes >(aapcs_layout_arg, >arm.c line ~~5914) > > /* C3 - For double-word aligned arguments, round the NCRN up to the > next even number. */ > ncrn = pcum->aapcs_ncrn; > if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) > ncrn++; > >Which changes r1 to r2. Passing -fno-tree-sra, or removing from the >testcase >"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a >32-bit-aligned int instead, which works as previously. > >Passing the same members of that struct in a non-vargs call, works ok - >I think >because these use the type of the declared parameters, rather than the >provided >arguments, and the former do not have the increased alignment from >build_ref_for_offset. It doesn't make sense to use the alignment of passed values. That looks like bs. This means that Int I __aligned__(8); Is passed differently than int. Arm_function_arg needs to be fixed. Richard. > >FWIW, I also tried: > >__attribute__((__aligned__((16)))) int x; >int main (void) >{ > __builtin_printf("%d\n", x); >} > >but in that case, the arm_function_arg is still fed a type with >alignment 32 >(bits), i.e. distinct from the type of the field 'x' in memory, which >has >alignment 128. > >--Alan > >Richard Biener wrote: >> On Mon, 30 Mar 2015, Richard Biener wrote: >> >>> On Mon, 30 Mar 2015, Alan Lawrence wrote: >>> >>>> ...actually attach the testcase... >>> What compile options? >> >> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but >> I can't see anything not guaranteeing that: >> >> .section .rodata >> .align 3 >> .LANCHOR0 = . + 0 >> .LC1: >> .ascii "%d %g %d\012\000" >> .space 6 >> .LC0: >> .word 7 >> .space 4 >> .word 0 >> .word 1075838976 >> .word 9 >> .space 4 >> >> maybe there is some more generic code-gen bug for aligned aggregate >> copy? That is, the patch tells the backend that the loads and >> stores to the 'int' vars (which have padding followed) is aligned >> to 8 bytes. >> >> I don't see what is wrong in the final assembler, but maybe >> some endian issue exists? The code looks quite ugly though ;) >> >> Richard. >> >>>> Alan Lawrence wrote: >>>>> We've been seeing a bunch of new failures in the *libffi* >testsuite on ARM >>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), >following this >>>>> one-liner fix. I've reduced the testcase down to the attached >(including >>>>> removing any dependency on libffi); with gcc r221347, this prints >the >>>>> expected >>>>> 7 8 9 >>>>> whereas with gcc r221348, instead it prints >>>>> 0 8 0 >>>>> >>>>> The action of r221348 is to change the alignment of a mem_ref, and >a >>>>> var_decl of b1, from 32 to 64; both have type >>>>> type sizes-gimplified type_0 >>>>> BLK >>>>> size >>>>> unit size >>>>> align 64 symtab 0 alias set 1 canonical type >0x2b9b8d428d20 >>>>> fields >>>> 0x2b9b8d092690 int> >>>>> SI file reduced.c line 12 col 7 >>>>> size >>>>> unit size >>>>> align 32 offset_align 64 >>>>> offset >>>>> bit offset >context >>>>> chain >>>> 0x2b9b8d42b130 b>> context D.6070> >>>>> pointer_to_this chain >>>>> 0x2b9b8d42b000 D.6044>> >>>>> >>>>> The tree-optimized output is the same with both compilers (as this >does not >>>>> mention alignment); the expand output differs. >>>>> >>>>> Still investigating... >>>>> >>>>> --Alan >>>>> >>>>> >>>>> Richard Biener wrote: >>>>>> This fixes a vectorizer testcase regression on powerpc where SRA >>>>>> drops alignment info unnecessarily. >>>>>> >>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >>>>>> >>>>>> Richard. >>>>>> >>>>>> 2015-03-11 Richard Biener >>>>>> >>>>>> PR tree-optimization/65310 >>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger >>>>>> alignment. >>>>>> >>>>>> Index: gcc/tree-sra.c >>>>>> >=================================================================== >>>>>> --- gcc/tree-sra.c (revision 221324) >>>>>> +++ gcc/tree-sra.c (working copy) >>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >>>>>> misalign = (misalign + offset) & (align - 1); >>>>>> if (misalign != 0) >>>>>> align = (misalign & -misalign); >>>>>> - if (align < TYPE_ALIGN (exp_type)) >>>>>> + if (align != TYPE_ALIGN (exp_type)) >>>>>> exp_type = build_aligned_type (exp_type, align); >>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, >off); >>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>> >>