From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 31202 invoked by alias); 6 May 2011 09:56:31 -0000 Received: (qmail 31193 invoked by uid 22791); 6 May 2011 09:56:30 -0000 X-SWARE-Spam-Status: No, hits=-2.4 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW X-Spam-Check-By: sourceware.org Received: from service87.mimecast.com (HELO service87.mimecast.com) (94.185.240.25) by sourceware.org (qpsmtpd/0.43rc1) with SMTP; Fri, 06 May 2011 09:56:11 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Fri, 06 May 2011 10:56:07 +0100 Received: from [10.1.67.34] ([10.1.255.212]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.0); Fri, 6 May 2011 10:56:03 +0100 Subject: Re: [patch, ARM] Fix PR target/48252 From: Richard Earnshaw To: Ira Rosen Cc: Ramana Radhakrishnan , gcc-patches@gcc.gnu.org, Ira Rosen , Patch Tracking , Ulrich Weigand In-Reply-To: References: <4D9DAB2C.6010306@linaro.org> Date: Fri, 06 May 2011 09:57:00 -0000 Message-Id: <1304675762.5165.1.camel@e102346-lin.cambridge.arm.com> Mime-Version: 1.0 X-MC-Unique: 111050610560703101 Content-Type: text/plain; charset=WINDOWS-1252 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-05/txt/msg00494.txt.bz2 On Sun, 2011-05-01 at 10:30 +0300, Ira Rosen wrote: >=20 > Ramana Radhakrishnan wrote on 07/04/2011 > 03:16:44 PM: >=20 > > > > On 07/04/11 08:42, Ira Rosen wrote: > > > Hi, > > > > > > This patch makes both outputs of neon_vzip/vuzp/vtrn_internal > > > explicitly dependent on both inputs, preventing incorrect > > > optimization: > > > for > > > (a,b)<- vzip (c,d) > > > and > > > (e,f)<- vzip (g,d) > > > CSE decides that b=3D=3Df, since b and f depend only on d. > > > > > > Tested on arm-linux-gnueabi. OK for trunk? > > > > This is OK for trunk. > > > > > OK for 4.6 after testing? > > I don't understand how it has happened, but the 4.6 patch that has been committed is corrupt (the patch submitted here looks OK). Please remember that it is essential to test release branches before commits are made. R. > > I have no objections to this going into 4.5 and 4.6 since it corrects > > the implementation of the neon intrinsics but please check with the > > release managers. >=20 > OK to backport to 4.5 and 4.6 - both tested on arm-linux-gnueabi? >=20 > Thanks, > Ira >=20 > 4.5 and 4.6 ChangeLog: >=20 > Backport from mainline: > 2011-04-18 Ulrich Weigand > Ira Rosen >=20 > PR target/48252 > * config/arm/arm.c (neon_emit_pair_result_insn): Swap arguments > to match neon_vzip/vuzp/vtrn_internal. > * config/arm/neon.md (neon_vtrn_internal): Make both > outputs explicitly dependent on both inputs. > (neon_vzip_internal, neon_vuzp_internal): Likewise. >=20 > testsuite/Changelog: >=20 > Backport from mainline: > 2011-04-18 Ulrich Weigand > Ira Rosen >=20 > PR target/48252 > * gcc.target/arm/pr48252.c: New test. >=20 >=20 > 4.5 patch: >=20 > Index: config/arm/arm.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- config/arm/arm.c (revision 172714) > +++ config/arm/arm.c (working copy) > @@ -18237,7 +18237,7 @@ neon_emit_pair_result_insn (enum machine_mode mode > rtx tmp1 =3D gen_reg_rtx (mode); > rtx tmp2 =3D gen_reg_rtx (mode); >=20 > - emit_insn (intfn (tmp1, op1, tmp2, op2)); > + emit_insn (intfn (tmp1, op1, op2, tmp2)); >=20 > emit_move_insn (mem, tmp1); > mem =3D adjust_address (mem, mode, GET_MODE_SIZE (mode)); > Index: config/arm/neon.md > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- config/arm/neon.md (revision 172714) > +++ config/arm/neon.md (working copy) > @@ -3895,13 +3895,14 @@ >=20 > (define_insn "neon_vtrn_internal" > [(set (match_operand:VDQW 0 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] > - UNSPEC_VTRN1)) > - (set (match_operand:VDQW 2 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] > - UNSPEC_VTRN2))] > + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") > + (match_operand:VDQW 2 "s_register_operand" "w")] > + UNSPEC_VTRN1)) > + (set (match_operand:VDQW 3 "s_register_operand" "=3D2") > + (unspec:VDQW [(match_dup 1) (match_dup 2)] > + UNSPEC_VTRN2))] > "TARGET_NEON" > - "vtrn.\t%0, %2" > + "vtrn.\t%0, %3" > [(set (attr "neon_type") > (if_then_else (ne (symbol_ref "") (const_int 0)) > (const_string "neon_bp_simple") > @@ -3921,13 +3922,14 @@ >=20 > (define_insn "neon_vzip_internal" > [(set (match_operand:VDQW 0 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] > - UNSPEC_VZIP1)) > - (set (match_operand:VDQW 2 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] > - UNSPEC_VZIP2))] > + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") > + (match_operand:VDQW 2 "s_register_operand" "w")] > + UNSPEC_VZIP1)) > + (set (match_operand:VDQW 3 "s_register_operand" "=3D2") > + (unspec:VDQW [(match_dup 1) (match_dup 2)] > + UNSPEC_VZIP2))] > "TARGET_NEON" > - "vzip.\t%0, %2" > + "vzip.\t%0, %3" > [(set (attr "neon_type") > (if_then_else (ne (symbol_ref "") (const_int 0)) > (const_string "neon_bp_simple") > @@ -3947,13 +3949,14 @@ >=20 > (define_insn "neon_vuzp_internal" > [(set (match_operand:VDQW 0 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] > + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") > + (match_operand:VDQW 2 "s_register_operand" "w")] > UNSPEC_VUZP1)) > - (set (match_operand:VDQW 2 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] > - UNSPEC_VUZP2))] > + (set (match_operand:VDQW 3 "s_register_operand" "=3D2") > + (unspec:VDQW [(match_dup 1) (match_dup 2)] > + UNSPEC_VUZP2))] > "TARGET_NEON" > - "vuzp.\t%0, %2" > + "vuzp.\t%0, %3" > [(set (attr "neon_type") > (if_then_else (ne (symbol_ref "") (const_int 0)) > (const_string "neon_bp_simple") > Index: testsuite/gcc.target/arm/pr48252.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- testsuite/gcc.target/arm/pr48252.c (revision 0) > +++ testsuite/gcc.target/arm/pr48252.c (revision 0) > @@ -0,0 +1,32 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target arm_neon_hw } */ > +/* { dg-options "-O2" } */ > +/* { dg-add-options arm_neon } */ > + > +#include "arm_neon.h" > +#include > + > +int main(void) > +{ > + uint8x8_t v1 =3D {1, 1, 1, 1, 1, 1, 1, 1}; > + uint8x8_t v2 =3D {2, 2, 2, 2, 2, 2, 2, 2}; > + uint8x8x2_t vd1, vd2; > + union {uint8x8_t v; uint8_t buf[8];} d1, d2, d3, d4; > + int i; > + > + vd1 =3D vzip_u8(v1, vdup_n_u8(0)); > + vd2 =3D vzip_u8(v2, vdup_n_u8(0)); > + > + vst1_u8(d1.buf, vd1.val[0]); > + vst1_u8(d2.buf, vd1.val[1]); > + vst1_u8(d3.buf, vd2.val[0]); > + vst1_u8(d4.buf, vd2.val[1]); > + > + for (i =3D 0; i < 8; i++) > + if ((i % 2 =3D=3D 0 && d4.buf[i] !=3D 2) > + || (i % 2 =3D=3D 1 && d4.buf[i] !=3D 0)) > + abort (); > + > + return 0; > +} > + >=20 >=20 > 4.6 patch: >=20 > Index: config/arm/arm.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- config/arm/arm.c (revision 172810) > +++ config/arm/arm.c (working copy) > @@ -19564,7 +19564,7 @@ neon_emit_pair_result_insn (enum machine_mode mode > rtx tmp1 =3D gen_reg_rtx (mode); > rtx tmp2 =3D gen_reg_rtx (mode); >=20 > - emit_insn (intfn (tmp1, op1, tmp2, op2)); > + emit_insn (intfn (tmp1, op1, op2, tmp2)); >=20 > emit_move_insn (mem, tmp1); > mem =3D adjust_address (mem, mode, GET_MODE_SIZE (mode)); > Index: config/arm/neon.md > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- config/arm/neon.md (revision 172810) > +++ config/arm/neon.md (working copy) > @@ -4079,13 +4079,14 @@ >=20 > (define_insn "neon_vtrn_internal" > [(set (match_operand:VDQW 0 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] > - UNSPEC_VTRN1)) > - (set (match_operand:VDQW 2 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] > - UNSPEC_VTRN2))] > + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") > + (match_operand:VDQW 2 "s_register_operand" "w")] > + UNSPEC_VTRN1)) > + (set (match_operand:VDQW 3 "s_register_operand" "=3D2") > + (unspec:VDQW [(match_dup 1) (match_dup 2)] > + UNSPEC_VTRN2))] > "TARGET_NEON" > - "vtrn.\t%0, %2" > + "vtrn.\t%0, %3" > [(set (attr "neon_type") > (if_then_else (ne (symbol_ref "") (const_int 0)) > (const_string "neon_bp_simple") > @@ -4105,13 +4106,14 @@ >=20 > (define_insn "neon_vzip_internal" > [(set (match_operand:VDQW 0 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] > - UNSPEC_VZIP1)) > - (set (match_operand:VDQW 2 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] > - UNSPEC_VZIP2))] > + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") > + (match_operand:VDQW 2 "s_register_operand" "w")] > + UNSPEC_VZIP1)) > + (set (match_operand:VDQW 3 "s_register_operand" "=3D2") > + (unspec:VDQW [(match_dup 1) (match_dup 2)] > + UNSPEC_VZIP2))] > "TARGET_NEON" > - "vzip.\t%0, %2" > + "vzip.\t%0, %3" > [(set (attr "neon_type") > (if_then_else (ne (symbol_ref "") (const_int 0)) > (const_string "neon_bp_simple") > @@ -4131,13 +4133,14 @@ >=20 > (define_insn "neon_vuzp_internal" > [(set (match_operand:VDQW 0 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] > + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") > + (match_operand:VDQW 2 "s_register_operand" "w")] > UNSPEC_VUZP1)) > - (set (match_operand:VDQW 2 "s_register_operand" "=3Dw") > - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] > - UNSPEC_VUZP2))] > + (set (match_operand:VDQW 3 "s_register_operand" "=3D2") > + (unspec:VDQW [(match_dup 1) (match_dup 2)] > + UNSPEC_VUZP2))] > "TARGET_NEON" > - "vuzp.\t%0, %2" > + "vuzp.\t%0, %3" > [(set (attr "neon_type") > (if_then_else (ne (symbol_ref "") (const_int 0)) > (const_string "neon_bp_simple") > Index: testsuite/gcc.target/arm/pr48252.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- testsuite/gcc.target/arm/pr48252.c (revision 0) > +++ testsuite/gcc.target/arm/pr48252.c (revision 0) > @@ -0,0 +1,32 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target arm_neon_hw } */ > +/* { dg-options "-O2" } */ > +/* { dg-add-options arm_neon } */ > + > +#include "arm_neon.h" > +#include > + > +int main(void) > +{ > + uint8x8_t v1 =3D {1, 1, 1, 1, 1, 1, 1, 1}; > + uint8x8_t v2 =3D {2, 2, 2, 2, 2, 2, 2, 2}; > + uint8x8x2_t vd1, vd2; > + union {uint8x8_t v; uint8_t buf[8];} d1, d2, d3, d4; > + int i; > + > + vd1 =3D vzip_u8(v1, vdup_n_u8(0)); > + vd2 =3D vzip_u8(v2, vdup_n_u8(0)); > + > + vst1_u8(d1.buf, vd1.val[0]); > + vst1_u8(d2.buf, vd1.val[1]); > + vst1_u8(d3.buf, vd2.val[0]); > + vst1_u8(d4.buf, vd2.val[1]); > + > + for (i =3D 0; i < 8; i++) > + if ((i % 2 =3D=3D 0 && d4.buf[i] !=3D 2) > + || (i % 2 =3D=3D 1 && d4.buf[i] !=3D 0)) > + abort (); > + > + return 0; > +} > + >=20 >=20 > > > > cheers > > Ramana > > > > > > > > Thanks, > > > Ira > > > > > > ChangeLog: > > > > > > 2011-04-07 Ulrich Weigand > > > Ira Rosen > > > > > > PR target/48252 > > > * config/arm/arm.c (neon_emit_pair_result_insn): Swap arguments > > > to match neon_vzip/vuzp/vtrn_internal. > > > * config/arm/neon.md (neon_vtrn_internal): Make both > > > outputs explicitly dependent on both inputs. > > > (neon_vzip_internal, neon_vuzp_internal): Likewise. > > > > > > testsuite/Changelog: > > > > > > PR target/48252 > > > * gcc.target/arm/pr48252.c: New test. > > >=20