Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Message-ID: <564B2462.90704@arm.com>
Date: Tue, 17 Nov 2015 12:58:00 -0000
From: Kyrill Tkachov
To: Ramana Radhakrishnan, GCC Patches
CC: Ramana Radhakrishnan, Richard Earnshaw
Subject: Re: [PATCH][ARM] PR 68143 Properly update memory offsets when expanding setmem
References: <563C84ED.4010603@arm.com> <564B1764.8000202@foss.arm.com>
In-Reply-To: <564B1764.8000202@foss.arm.com>
X-SW-Source: 2015-11/txt/msg02079.txt.bz2

Hi Ramana,

On 17/11/15 12:02, Ramana Radhakrishnan wrote:
>
> On 06/11/15 10:46, Kyrill Tkachov wrote:
>> Hi all,
>>
>> In this wrong-code PR the vector setmem expansion and arm_block_set_aligned_vect in particular
>> use the wrong offset when calling adjust_automodify_address. In the attached testcase during the
>> initial zeroing out we get two V16QI stores, but they both are recorded by adjust_automodify_address
>> as modifying x+0 rather than x+0 and x+12 (the total size to be written is 28).
>>
>> This led to the scheduling pass moving the store from "x.g = 2;" to before the zeroing stores.
>>
>> This patch fixes the problem by keeping track of the offset to which stores are emitted and
>> passing it to adjust_automodify_address as appropriate.
>>
>> From inspection I see arm_block_set_unaligned_vect also has this issue so I performed the same
>> fix in that function as well.
>>
>> Bootstrapped and tested on arm-none-linux-gnueabihf.
>>
>> Ok for trunk?
>>
>> This bug appears on GCC 5 too and I'm currently testing this patch there.
>> Ok to backport to GCC 5 as well?
>> Thanks,
>> Kyrill
>>
>> 2015-11-06  Kyrylo Tkachov
>>
>>     PR target/68143
>>     * config/arm/arm.c (arm_block_set_unaligned_vect): Keep track of
>>     offset from dstbase and use it appropriately in
>>     adjust_automodify_address.
>>     (arm_block_set_aligned_vect): Likewise.
>>
>> 2015-11-06  Kyrylo Tkachov
>>
>>     PR target/68143
>>     * gcc.target/arm/pr68143_1.c: New test.
> Sorry about the delay in reviewing this. There's nothing arm specific about this test - I'd just put this in gcc.c-torture/execute, there are enough auto-testers with neon on that will show up issues if this starts failing.

Thanks, will do.
I was on the fence about whether this should go in torture.
I'll put it there.

Kyrill

>
> Ok with that change.
>
> Ramana
>
>> arm-setmem-offset.patch
>>
>>
>> commit 78c6989a7af1df672ea227057180d79d717ed5f3
>> Author: Kyrylo Tkachov
>> Date:   Wed Oct 28 17:29:18 2015 +0000
>>
>>     [ARM] Properly update memory offsets when expanding setmem
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index 66e8afc..adf3143 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -29268,7 +29268,7 @@ arm_block_set_unaligned_vect (rtx dstbase,
>>    rtx (*gen_func) (rtx, rtx);
>>    machine_mode mode;
>>    unsigned HOST_WIDE_INT v = value;
>> -
>> +  unsigned int offset = 0;
>>    gcc_assert ((align & 0x3) != 0);
>>    nelt_v8 = GET_MODE_NUNITS (V8QImode);
>>    nelt_v16 = GET_MODE_NUNITS (V16QImode);
>> @@ -29289,7 +29289,7 @@ arm_block_set_unaligned_vect (rtx dstbase,
>>      return false;
>>
>>    dst = copy_addr_to_reg (XEXP (dstbase, 0));
>> -  mem = adjust_automodify_address (dstbase, mode, dst, 0);
>> +  mem = adjust_automodify_address (dstbase, mode, dst, offset);
>>
>>    v = sext_hwi (v, BITS_PER_WORD);
>>    val_elt = GEN_INT (v);
>> @@ -29306,7 +29306,11 @@ arm_block_set_unaligned_vect (rtx dstbase,
>>      {
>>        emit_insn ((*gen_func) (mem, reg));
>>        if (i + 2 * nelt_mode <= length)
>> -        emit_insn (gen_add2_insn (dst, GEN_INT (nelt_mode)));
>> +        {
>> +          emit_insn (gen_add2_insn (dst, GEN_INT (nelt_mode)));
>> +          offset += nelt_mode;
>> +          mem = adjust_automodify_address (dstbase, mode, dst, offset);
>> +        }
>>      }
>>
>>    /* If there are not less than nelt_v8 bytes leftover, we must be in
>> @@ -29317,6 +29321,9 @@ arm_block_set_unaligned_vect (rtx dstbase,
>>    if (i + nelt_v8 < length)
>>      {
>>        emit_insn (gen_add2_insn (dst, GEN_INT (length - i)));
>> +      offset += length - i;
>> +      mem = adjust_automodify_address (dstbase, mode, dst, offset);
>> +
>>        /* We are shifting bytes back, set the alignment accordingly.  */
>>        if ((length & 1) != 0 && align >= 2)
>>          set_mem_align (mem, BITS_PER_UNIT);
>> @@ -29327,12 +29334,13 @@ arm_block_set_unaligned_vect (rtx dstbase,
>>    else if (i < length && i + nelt_v8 >= length)
>>      {
>>        if (mode == V16QImode)
>> -        {
>> -          reg = gen_lowpart (V8QImode, reg);
>> -          mem = adjust_automodify_address (dstbase, V8QImode, dst, 0);
>> -        }
>> +        reg = gen_lowpart (V8QImode, reg);
>> +
>>        emit_insn (gen_add2_insn (dst, GEN_INT ((length - i)
>>                                                + (nelt_mode - nelt_v8))));
>> +      offset += (length - i) + (nelt_mode - nelt_v8);
>> +      mem = adjust_automodify_address (dstbase, V8QImode, dst, offset);
>> +
>>        /* We are shifting bytes back, set the alignment accordingly.  */
>>        if ((length & 1) != 0 && align >= 2)
>>          set_mem_align (mem, BITS_PER_UNIT);
>> @@ -29359,6 +29367,7 @@ arm_block_set_aligned_vect (rtx dstbase,
>>    rtx rval[MAX_VECT_LEN];
>>    machine_mode mode;
>>    unsigned HOST_WIDE_INT v = value;
>> +  unsigned int offset = 0;
>>
>>    gcc_assert ((align & 0x3) == 0);
>>    nelt_v8 = GET_MODE_NUNITS (V8QImode);
>> @@ -29390,14 +29399,15 @@ arm_block_set_aligned_vect (rtx dstbase,
>>    /* Handle first 16 bytes specially using vst1:v16qi instruction.  */
>>    if (mode == V16QImode)
>>      {
>> -      mem = adjust_automodify_address (dstbase, mode, dst, 0);
>> +      mem = adjust_automodify_address (dstbase, mode, dst, offset);
>>        emit_insn (gen_movmisalignv16qi (mem, reg));
>>        i += nelt_mode;
>>        /* Handle (8, 16) bytes leftover using vst1:v16qi again.  */
>>        if (i + nelt_v8 < length && i + nelt_v16 > length)
>>          {
>>            emit_insn (gen_add2_insn (dst, GEN_INT (length - nelt_mode)));
>> -          mem = adjust_automodify_address (dstbase, mode, dst, 0);
>> +          offset += length - nelt_mode;
>> +          mem = adjust_automodify_address (dstbase, mode, dst, offset);
>>            /* We are shifting bytes back, set the alignment accordingly.  */
>>            if ((length & 0x3) == 0)
>>              set_mem_align (mem, BITS_PER_UNIT * 4);
>> @@ -29419,7 +29429,7 @@ arm_block_set_aligned_vect (rtx dstbase,
>>    for (; (i + nelt_mode <= length); i += nelt_mode)
>>      {
>>        addr = plus_constant (Pmode, dst, i);
>> -      mem = adjust_automodify_address (dstbase, mode, addr, i);
>> +      mem = adjust_automodify_address (dstbase, mode, addr, offset + i);
>>        emit_move_insn (mem, reg);
>>      }
>>
>> @@ -29428,8 +29438,8 @@ arm_block_set_aligned_vect (rtx dstbase,
>>    if (i + UNITS_PER_WORD == length)
>>      {
>>        addr = plus_constant (Pmode, dst, i - UNITS_PER_WORD);
>> -      mem = adjust_automodify_address (dstbase, mode,
>> -                                       addr, i - UNITS_PER_WORD);
>> +      offset += i - UNITS_PER_WORD;
>> +      mem = adjust_automodify_address (dstbase, mode, addr, offset);
>>        /* We are shifting 4 bytes back, set the alignment accordingly.  */
>>        if (align > UNITS_PER_WORD)
>>          set_mem_align (mem, BITS_PER_UNIT * UNITS_PER_WORD);
>> @@ -29441,7 +29451,8 @@ arm_block_set_aligned_vect (rtx dstbase,
>>    else if (i < length)
>>      {
>>        emit_insn (gen_add2_insn (dst, GEN_INT (length - nelt_mode)));
>> -      mem = adjust_automodify_address (dstbase, mode, dst, 0);
>> +      offset += length - nelt_mode;
>> +      mem = adjust_automodify_address (dstbase, mode, dst, offset);
>>        /* We are shifting bytes back, set the alignment accordingly.  */
>>        if ((length & 1) == 0)
>>          set_mem_align (mem, BITS_PER_UNIT * 2);
>> diff --git a/gcc/testsuite/gcc.target/arm/pr68143_1.c b/gcc/testsuite/gcc.target/arm/pr68143_1.c
>> new file mode 100644
>> index 0000000..323473f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/pr68143_1.c
>> @@ -0,0 +1,36 @@
>> +/* { dg-do run } */
>> +/* { dg-require-effective-target arm_neon_hw } */
>> +/* { dg-options "-O3 -mcpu=cortex-a57" } */
>> +/* { dg-add-options arm_neon } */
>> +
>> +#define NULL 0
>> +
>> +struct stuff
>> +{
>> +  int a;
>> +  int b;
>> +  int c;
>> +  int d;
>> +  int e;
>> +  char *f;
>> +  int g;
>> +};
>> +
>> +void __attribute__ ((noinline))
>> +bar (struct stuff *x)
>> +{
>> +  if (x->g != 2)
>> +    __builtin_abort ();
>> +}
>> +
>> +int
>> +main (int argc, char** argv)
>> +{
>> +  struct stuff x = {0, 0, 0, 0, 0, NULL, 0};
>> +  x.a = 100;
>> +  x.d = 100;
>> +  x.g = 2;
>> +  /* Struct should now look like {100, 0, 0, 100, 0, 0, 0, 2}.  */
>> +  bar (&x);
>> +  return 0;
>> +}
>>
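
For anyone reading this in the archive, the "x+0 and x+12" arithmetic from the original description can be checked with a small standalone model. This is only a sketch and not GCC code: the function names second_store_offset and store_overlaps_field are hypothetical, and it assumes 4-byte int and pointer so that the field g sits at byte 24 of the 28-byte struct.

```c
#include <assert.h>
#include <stdbool.h>

enum { NELT_V16 = 16, NELT_V8 = 8 };  /* bytes written by a V16QI / V8QI store */

/* Hypothetical model of the "(8, 16) bytes leftover" case: after one
   16-byte store at offset 0, the second 16-byte store is shifted back
   so that it ends exactly at LENGTH.  Returns its offset from the base.  */
unsigned
second_store_offset (unsigned length)
{
  unsigned i = NELT_V16;  /* bytes already covered by the first store */
  assert (i + NELT_V8 < length && i + NELT_V16 > length);
  return length - NELT_V16;
}

/* True if a store of SIZE bytes recorded at OFFSET overlaps the 4-byte
   field that starts at FIELD_OFF (4-byte fields are an assumption).  */
bool
store_overlaps_field (unsigned offset, unsigned size, unsigned field_off)
{
  return field_off < offset + size && offset < field_off + 4;
}
```

With length 28 the second store lands at offset 12 and covers bytes 12 to 27, which includes g at byte 24. Recorded at offset 0 instead, the same store appears to cover only bytes 0 to 15 and so looks independent of the "x.g = 2" store, which is consistent with the scheduler feeling free to reorder them.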