From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 17444 invoked by alias); 15 Jun 2011 16:40:56 -0000 Received: (qmail 17242 invoked by uid 22791); 15 Jun 2011 16:40:54 -0000 X-SWARE-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,TW_MX,TW_VP X-Spam-Check-By: sourceware.org Received: from mail-pv0-f175.google.com (HELO mail-pv0-f175.google.com) (74.125.83.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 15 Jun 2011 16:40:38 +0000 Received: by pvc30 with SMTP id 30so466364pvc.20 for ; Wed, 15 Jun 2011 09:40:38 -0700 (PDT) MIME-Version: 1.0 Received: by 10.68.25.201 with SMTP id e9mr539016pbg.22.1308156037947; Wed, 15 Jun 2011 09:40:37 -0700 (PDT) Received: by 10.68.47.69 with HTTP; Wed, 15 Jun 2011 09:40:37 -0700 (PDT) In-Reply-To: <20110615095406.GI17079@tyan-ft48-01.lab.bos.redhat.com> References: <20110615095406.GI17079@tyan-ft48-01.lab.bos.redhat.com> Date: Wed, 15 Jun 2011 17:00:00 -0000 Message-ID: Subject: Re: [PATCH] Fix ICEs with -mxop __builtin_ia32_vpermil2p[sd]{,256} and __builtin_ia32_vprot[bwdq]i intrinsics (PR target/49411) From: Quentin Neill To: Jakub Jelinek Cc: Uros Bizjak , Sebastian Pop , "Fang, Changpeng" , gcc-patches@gcc.gnu.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-06/txt/msg01170.txt.bz2 On Wed, Jun 15, 2011 at 4:54 AM, Jakub Jelinek wrote: > Hi! > > All of these _mm{,256}_permute2_p[sd] and _mm_roti_epi{8,16,32,64} > intrinsics ICE if the last argument is constant integer, but not in the > expected range. > > I could only find MSFT documentation for these intrinsics, where for > *permute2* it says that the last argument must be 0, 1, 2 or 3, > for *roti* it says that the last argument is integer rotation count, > preferrably constant and that if count is negative, it performs right > rotation instead of left rotation. > This patch adjusts the builtins to match that, if we want to instead > e.g. always mandate _mm_roti_epi* last argument is constant integer, > or constant integer in the range -N+1 .. N-1 where N is the number > after _mm_roti_epi, or in the range 0 .. N-1, it can be easily adjusted. > > Regtested on x86_64-linux {-m32,-m64}, unfortunately on a SandyBridge > box, so I couldn't verify if xop-rotate[12]-int.c actually succeeds > on xop capable HW. > > 2011-06-15 =A0Jakub Jelinek =A0 > > =A0 =A0 =A0 =A0PR target/49411 > =A0 =A0 =A0 =A0* config/i386/i386.c (ix86_expand_multi_arg_builtins): If > =A0 =A0 =A0 =A0last_arg_constant and last argument doesn't match its pred= icate, > =A0 =A0 =A0 =A0for xop_vpermil23 error out and for xop_rotl3 > =A0 =A0 =A0 =A0if it is CONST_INT, mask it, otherwise expand using rotl3. > > =A0 =A0 =A0 =A0* gcc.target/i386/xop-vpermil2px-1.c: New test. > =A0 =A0 =A0 =A0* gcc.target/i386/xop-vpermil2px-2.c: New test. > =A0 =A0 =A0 =A0* gcc.target/i386/xop-rotate1-int.c: New test. > =A0 =A0 =A0 =A0* gcc.target/i386/xop-rotate2-int.c: New test. > > --- gcc/config/i386/i386.c.jj =A0 2011-06-09 16:56:56.000000000 +0200 > +++ gcc/config/i386/i386.c =A0 =A0 =A02011-06-15 11:17:12.000000000 +0200 > @@ -26149,16 +26149,66 @@ ix86_expand_multi_arg_builtin (enum insn > =A0 =A0 =A0 int adjust =3D (comparison_p) ? 1 : 0; > =A0 =A0 =A0 enum machine_mode mode =3D insn_data[icode].operand[i+adjust+= 1].mode; > > - =A0 =A0 =A0if (last_arg_constant && i =3D=3D nargs-1) > + =A0 =A0 =A0if (last_arg_constant && i =3D=3D nargs - 1) > =A0 =A0 =A0 =A0{ > - =A0 =A0 =A0 =A0 if (!CONST_INT_P (op)) > + =A0 =A0 =A0 =A0 if (!insn_data[icode].operand[i + 1].predicate (op, mod= e)) > =A0 =A0 =A0 =A0 =A0 =A0{ > - =A0 =A0 =A0 =A0 =A0 =A0 error ("last argument must be an immediate"); > - =A0 =A0 =A0 =A0 =A0 =A0 return gen_reg_rtx (tmode); > + =A0 =A0 =A0 =A0 =A0 =A0 enum insn_code new_icode =3D icode; > + =A0 =A0 =A0 =A0 =A0 =A0 switch (icode) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_vpermil2v2df3: > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_vpermil2v4sf3: > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_vpermil2v4df3: > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_vpermil2v8sf3: > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (!CONST_INT_P (op)) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 error ("last argument must be a= n immediate"); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return gen_reg_rtx (tmode); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 error ("last argument must be in the ra= nge 0 .. 3"); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return gen_reg_rtx (tmode); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_rotlv2di3: > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 new_icode =3D CODE_FOR_rotlv2di3; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto xop_rotl; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_rotlv4si3: > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 new_icode =3D CODE_FOR_rotlv4si3; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto xop_rotl; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_rotlv8hi3: > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 new_icode =3D CODE_FOR_rotlv8hi3; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto xop_rotl; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_rotlv16qi3: > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 new_icode =3D CODE_FOR_rotlv16qi3; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 xop_rotl: > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (CONST_INT_P (op)) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 int mask =3D GET_MODE_BITSIZE (= GET_MODE_INNER (tmode)) - 1; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 op =3D GEN_INT (INTVAL (op) & m= ask); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 gcc_checking_assert > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (insn_data[icode].operand[i= + 1].predicate (op, mode)); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 else > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 gcc_checking_assert > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (nargs =3D=3D 2 > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&& insn_data[new_icode].= operand[0].mode =3D=3D tmode > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&& insn_data[new_icode].= operand[1].mode =3D=3D tmode > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&& insn_data[new_icode].= operand[2].mode =3D=3D mode > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&& insn_data[new_icode].= operand[0].predicate > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =3D=3D insn_data[ic= ode].operand[0].predicate > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&& insn_data[new_icode].= operand[1].predicate > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =3D=3D insn_data[ic= ode].operand[1].predicate); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 icode =3D new_icode; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto non_constant; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 default: > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 gcc_unreachable (); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > =A0 =A0 =A0 =A0 =A0 =A0} > =A0 =A0 =A0 =A0} > =A0 =A0 =A0 else > =A0 =A0 =A0 =A0{ > + =A0 =A0 =A0 non_constant: > =A0 =A0 =A0 =A0 =A0if (VECTOR_MODE_P (mode)) > =A0 =A0 =A0 =A0 =A0 =A0op =3D safe_vector_operand (op, mode); > > --- gcc/testsuite/gcc.target/i386/xop-vpermil2px-1.c.jj 2011-06-15 10:18:= 29.000000000 +0200 > +++ gcc/testsuite/gcc.target/i386/xop-vpermil2px-1.c =A0 =A02011-06-15 10= :41:13.000000000 +0200 > @@ -0,0 +1,25 @@ > +/* PR target/49411 */ > +/* { dg-do compile } */ > +/* { dg-options "-O0 -mxop" } */ > + > +#include > + > +__m128d a1, a2, a3; > +__m256d b1, b2, b3; > +__m128 c1, c2, c3; > +__m256 d1, d2, d3; > +__m128i s; > +__m256i t; > + > +void > +foo (int i) > +{ > + =A0a1 =3D _mm_permute2_pd (a2, a3, s, 3); > + =A0b1 =3D _mm256_permute2_pd (b2, b3, t, 3); > + =A0c1 =3D _mm_permute2_ps (c2, c3, s, 3); > + =A0d1 =3D _mm256_permute2_ps (d2, d3, t, 3); > + =A0a1 =3D _mm_permute2_pd (a2, a3, s, 17); =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0/* { dg-error "last argument must be in the range 0 .. 3" } */ > + =A0b1 =3D _mm256_permute2_pd (b2, b3, t, 17); =A0 =A0 /* { dg-error "la= st argument must be in the range 0 .. 3" } */ > + =A0c1 =3D _mm_permute2_ps (c2, c3, s, 17); =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0/* { dg-error "last argument must be in the range 0 .. 3" } */ > + =A0d1 =3D _mm256_permute2_ps (d2, d3, t, 17); =A0 =A0 /* { dg-error "la= st argument must be in the range 0 .. 3" } */ > +} > --- gcc/testsuite/gcc.target/i386/xop-vpermil2px-2.c.jj 2011-06-15 10:39:= 36.000000000 +0200 > +++ gcc/testsuite/gcc.target/i386/xop-vpermil2px-2.c =A0 =A02011-06-15 10= :39:44.000000000 +0200 > @@ -0,0 +1,21 @@ > +/* PR target/49411 */ > +/* { dg-do compile } */ > +/* { dg-options "-O0 -mxop" } */ > + > +#include > + > +__m128d a1, a2, a3; > +__m256d b1, b2, b3; > +__m128 c1, c2, c3; > +__m256 d1, d2, d3; > +__m128i s; > +__m256i t; > + > +void > +foo (int i) > +{ > + =A0a1 =3D _mm_permute2_pd (a2, a3, s, i); =A0 =A0 =A0 =A0 /* { dg-error= "last argument must be an immediate" } */ > + =A0b1 =3D _mm256_permute2_pd (b2, b3, t, i); =A0 =A0 =A0/* { dg-error "= last argument must be an immediate" } */ > + =A0c1 =3D _mm_permute2_ps (c2, c3, s, i); =A0 =A0 =A0 =A0 /* { dg-error= "last argument must be an immediate" } */ > + =A0d1 =3D _mm256_permute2_ps (d2, d3, t, i); =A0 =A0 =A0/* { dg-error "= last argument must be an immediate" } */ > +} > --- gcc/testsuite/gcc.target/i386/xop-rotate1-int.c.jj =A02011-06-15 10:4= 7:29.000000000 +0200 > +++ gcc/testsuite/gcc.target/i386/xop-rotate1-int.c =A0 =A0 2011-06-15 11= :25:25.000000000 +0200 > @@ -0,0 +1,63 @@ > +/* PR target/49411 */ > +/* { dg-do run } */ > +/* { dg-require-effective-target xop } */ > +/* { dg-options "-O2 -mxop" } */ > + > +#include "xop-check.h" > + > +#include > + > +extern void abort (void); > + > +union > +{ > + =A0__m128i v; > + =A0unsigned char c[16]; > + =A0unsigned short s[8]; > + =A0unsigned int i[4]; > + =A0unsigned long long l[2]; > +} a, b, c, d; > + > +#define TEST1(F, N, S, SS) \ > +do { =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 \ > + =A0for (i =3D 0; i < sizeof (a.F) / sizeof (a.F[0]); i++) \ > + =A0 =A0a.F[i] =3D i * 17; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 \ > + =A0s =3D _mm_set1_epi##SS (N); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0\ > + =A0b.v =3D _mm_roti_epi##S (a.v, N); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0\ > + =A0c.v =3D _mm_rot_epi##S (a.v, s); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 \ > + =A0for (i =3D 0; i < sizeof (a.F) / sizeof (a.F[0]); i++) \ > + =A0 =A0{ =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0\ > + =A0 =A0 =A0int mask =3D __CHAR_BIT__ * sizeof (a.F[i]) - 1; =A0 \ > + =A0 =A0 =A0d.F[i] =3D a.F[i] << (N & mask); =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 \ > + =A0 =A0 =A0if (N & mask) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0\ > + =A0 =A0 =A0 d.F[i] |=3D a.F[i] >> (mask + 1 - (N & mask)); =A0 =A0\ > + =A0 =A0 =A0if (b.F[i] !=3D c.F[i] || b.F[i] !=3D d.F[i]) =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0\ > + =A0 =A0 =A0 abort (); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 \ > + =A0 =A0} =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0\ > +} while (0) > +#define TEST(N) \ > + =A0TEST1 (c, N, 8, 8); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0\ > + =A0TEST1 (s, N, 16, 16); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0\ > + =A0TEST1 (i, N, 32, 32); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0\ > + =A0TEST1 (l, N, 64, 64x) > + > +volatile int n; > + > +static void > +xop_test (void) > +{ > + =A0unsigned int i; > + =A0__m128i s; > + > +#ifndef NON_CONST > + =A0TEST (5); > + =A0TEST (-5); > + =A0TEST (0); > + =A0TEST (31); > +#else > + =A0n =3D 5; TEST (n); > + =A0n =3D -5; TEST (n); > + =A0n =3D 0; TEST (n); > + =A0n =3D 31; TEST (n); > +#endif > +} > --- gcc/testsuite/gcc.target/i386/xop-rotate2-int.c.jj =A02011-06-15 11:2= 5:42.000000000 +0200 > +++ gcc/testsuite/gcc.target/i386/xop-rotate2-int.c =A0 =A0 2011-06-15 11= :26:03.000000000 +0200 > @@ -0,0 +1,7 @@ > +/* PR target/49411 */ > +/* { dg-do run } */ > +/* { dg-require-effective-target xop } */ > +/* { dg-options "-O2 -mxop" } */ > + > +#define NON_CONST 1 > +#include "xop-rotate1-int.c" > > =A0 =A0 =A0 =A0Jakub > I will test on AMD HW. --=20 Quentin