From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-294485-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 17444 invoked by alias); 15 Jun 2011 16:40:56 -0000
Received: (qmail 17242 invoked by uid 22791); 15 Jun 2011 16:40:54 -0000
X-SWARE-Spam-Status: No, hits=-2.0 required=5.0	tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,TW_MX,TW_VP
X-Spam-Check-By: sourceware.org
Received: from mail-pv0-f175.google.com (HELO mail-pv0-f175.google.com) (74.125.83.175)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 15 Jun 2011 16:40:38 +0000
Received: by pvc30 with SMTP id 30so466364pvc.20        for <gcc-patches@gcc.gnu.org>; Wed, 15 Jun 2011 09:40:38 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.68.25.201 with SMTP id e9mr539016pbg.22.1308156037947; Wed, 15 Jun 2011 09:40:37 -0700 (PDT)
Received: by 10.68.47.69 with HTTP; Wed, 15 Jun 2011 09:40:37 -0700 (PDT)
In-Reply-To: <20110615095406.GI17079@tyan-ft48-01.lab.bos.redhat.com>
References: <20110615095406.GI17079@tyan-ft48-01.lab.bos.redhat.com>
Date: Wed, 15 Jun 2011 17:00:00 -0000
Message-ID: <BANLkTi=TJojLGHXNTZTsjBaCKLz4W7dkiw@mail.gmail.com>
Subject: Re: [PATCH] Fix ICEs with -mxop __builtin_ia32_vpermil2p[sd]{,256} and __builtin_ia32_vprot[bwdq]i intrinsics (PR target/49411)
From: Quentin Neill <quentin.neill.gnu@gmail.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>, Sebastian Pop <sebpop@gmail.com>, 	"Fang, Changpeng" <Changpeng.Fang@amd.com>, gcc-patches@gcc.gnu.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2011-06/txt/msg01170.txt.bz2

On Wed, Jun 15, 2011 at 4:54 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> All of these _mm{,256}_permute2_p[sd] and _mm_roti_epi{8,16,32,64}
> intrinsics ICE if the last argument is a constant integer that is
> not in the expected range.
>
> I could only find MSFT documentation for these intrinsics. For
> *permute2* it says that the last argument must be 0, 1, 2 or 3;
> for *roti* it says that the last argument is an integer rotation
> count, preferably constant, and that if the count is negative, it
> performs a right rotation instead of a left rotation.
> This patch adjusts the builtins to match that. If we instead want
> to, e.g., always mandate that the last argument of _mm_roti_epi* is
> a constant integer, or a constant integer in the range -N+1 .. N-1
> where N is the number after _mm_roti_epi, or in the range 0 .. N-1,
> it can be easily adjusted.
>
> Regtested on x86_64-linux {-m32,-m64}, unfortunately on a SandyBridge
> box, so I couldn't verify if xop-rotate[12]-int.c actually succeeds
> on xop capable HW.
>
> 2011-06-15 =A0Jakub Jelinek =A0<jakub@redhat.com>
>
> =A0 =A0 =A0 =A0PR target/49411
> =A0 =A0 =A0 =A0* config/i386/i386.c (ix86_expand_multi_arg_builtin): If
> =A0 =A0 =A0 =A0last_arg_constant and last argument doesn't match its pred=
icate,
> =A0 =A0 =A0 =A0for xop_vpermil2<mode>3 error out and for xop_rotl<mode>3
> =A0 =A0 =A0 =A0if it is CONST_INT, mask it, otherwise expand using rotl<m=
ode>3.
>
> =A0 =A0 =A0 =A0* gcc.target/i386/xop-vpermil2px-1.c: New test.
> =A0 =A0 =A0 =A0* gcc.target/i386/xop-vpermil2px-2.c: New test.
> =A0 =A0 =A0 =A0* gcc.target/i386/xop-rotate1-int.c: New test.
> =A0 =A0 =A0 =A0* gcc.target/i386/xop-rotate2-int.c: New test.
>
> --- gcc/config/i386/i386.c.jj =A0 2011-06-09 16:56:56.000000000 +0200
> +++ gcc/config/i386/i386.c =A0 =A0 =A02011-06-15 11:17:12.000000000 +0200
> @@ -26149,16 +26149,66 @@ ix86_expand_multi_arg_builtin (enum insn
> =A0 =A0 =A0 int adjust =3D (comparison_p) ? 1 : 0;
> =A0 =A0 =A0 enum machine_mode mode =3D insn_data[icode].operand[i+adjust+=
1].mode;
>
> - =A0 =A0 =A0if (last_arg_constant && i =3D=3D nargs-1)
> + =A0 =A0 =A0if (last_arg_constant && i =3D=3D nargs - 1)
> =A0 =A0 =A0 =A0{
> - =A0 =A0 =A0 =A0 if (!CONST_INT_P (op))
> + =A0 =A0 =A0 =A0 if (!insn_data[icode].operand[i + 1].predicate (op, mod=
e))
> =A0 =A0 =A0 =A0 =A0 =A0{
> - =A0 =A0 =A0 =A0 =A0 =A0 error ("last argument must be an immediate");
> - =A0 =A0 =A0 =A0 =A0 =A0 return gen_reg_rtx (tmode);
> + =A0 =A0 =A0 =A0 =A0 =A0 enum insn_code new_icode =3D icode;
> + =A0 =A0 =A0 =A0 =A0 =A0 switch (icode)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 {
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_vpermil2v2df3:
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_vpermil2v4sf3:
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_vpermil2v4df3:
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_vpermil2v8sf3:
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (!CONST_INT_P (op))
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 {
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 error ("last argument must be a=
n immediate");
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return gen_reg_rtx (tmode);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 error ("last argument must be in the ra=
nge 0 .. 3");
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return gen_reg_rtx (tmode);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_rotlv2di3:
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 new_icode =3D CODE_FOR_rotlv2di3;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto xop_rotl;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_rotlv4si3:
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 new_icode =3D CODE_FOR_rotlv4si3;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto xop_rotl;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_rotlv8hi3:
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 new_icode =3D CODE_FOR_rotlv8hi3;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto xop_rotl;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 case CODE_FOR_xop_rotlv16qi3:
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 new_icode =3D CODE_FOR_rotlv16qi3;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 xop_rotl:
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (CONST_INT_P (op))
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 {
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 int mask =3D GET_MODE_BITSIZE (=
GET_MODE_INNER (tmode)) - 1;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 op =3D GEN_INT (INTVAL (op) & m=
ask);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 gcc_checking_assert
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (insn_data[icode].operand[i=
 + 1].predicate (op, mode));
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 else
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 {
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 gcc_checking_assert
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (nargs =3D=3D 2
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&& insn_data[new_icode].=
operand[0].mode =3D=3D tmode
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&& insn_data[new_icode].=
operand[1].mode =3D=3D tmode
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&& insn_data[new_icode].=
operand[2].mode =3D=3D mode
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&& insn_data[new_icode].=
operand[0].predicate
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =3D=3D insn_data[ic=
ode].operand[0].predicate
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&& insn_data[new_icode].=
operand[1].predicate
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =3D=3D insn_data[ic=
ode].operand[1].predicate);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 icode =3D new_icode;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto non_constant;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 default:
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 gcc_unreachable ();
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
> =A0 =A0 =A0 =A0 =A0 =A0}
> =A0 =A0 =A0 =A0}
> =A0 =A0 =A0 else
> =A0 =A0 =A0 =A0{
> + =A0 =A0 =A0 non_constant:
> =A0 =A0 =A0 =A0 =A0if (VECTOR_MODE_P (mode))
> =A0 =A0 =A0 =A0 =A0 =A0op =3D safe_vector_operand (op, mode);
>
> --- gcc/testsuite/gcc.target/i386/xop-vpermil2px-1.c.jj 2011-06-15 10:18:=
29.000000000 +0200
> +++ gcc/testsuite/gcc.target/i386/xop-vpermil2px-1.c =A0 =A02011-06-15 10=
:41:13.000000000 +0200
> @@ -0,0 +1,25 @@
> +/* PR target/49411 */
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -mxop" } */
> +
> +#include <x86intrin.h>
> +
> +__m128d a1, a2, a3;
> +__m256d b1, b2, b3;
> +__m128 c1, c2, c3;
> +__m256 d1, d2, d3;
> +__m128i s;
> +__m256i t;
> +
> +void
> +foo (int i)
> +{
> + =A0a1 =3D _mm_permute2_pd (a2, a3, s, 3);
> + =A0b1 =3D _mm256_permute2_pd (b2, b3, t, 3);
> + =A0c1 =3D _mm_permute2_ps (c2, c3, s, 3);
> + =A0d1 =3D _mm256_permute2_ps (d2, d3, t, 3);
> + =A0a1 =3D _mm_permute2_pd (a2, a3, s, 17); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0/* { dg-error "last argument must be in the range 0 .. 3" } */
> + =A0b1 =3D _mm256_permute2_pd (b2, b3, t, 17); =A0 =A0 /* { dg-error "la=
st argument must be in the range 0 .. 3" } */
> + =A0c1 =3D _mm_permute2_ps (c2, c3, s, 17); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0/* { dg-error "last argument must be in the range 0 .. 3" } */
> + =A0d1 =3D _mm256_permute2_ps (d2, d3, t, 17); =A0 =A0 /* { dg-error "la=
st argument must be in the range 0 .. 3" } */
> +}
> --- gcc/testsuite/gcc.target/i386/xop-vpermil2px-2.c.jj 2011-06-15 10:39:=
36.000000000 +0200
> +++ gcc/testsuite/gcc.target/i386/xop-vpermil2px-2.c =A0 =A02011-06-15 10=
:39:44.000000000 +0200
> @@ -0,0 +1,21 @@
> +/* PR target/49411 */
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -mxop" } */
> +
> +#include <x86intrin.h>
> +
> +__m128d a1, a2, a3;
> +__m256d b1, b2, b3;
> +__m128 c1, c2, c3;
> +__m256 d1, d2, d3;
> +__m128i s;
> +__m256i t;
> +
> +void
> +foo (int i)
> +{
> + =A0a1 =3D _mm_permute2_pd (a2, a3, s, i); =A0 =A0 =A0 =A0 /* { dg-error=
 "last argument must be an immediate" } */
> + =A0b1 =3D _mm256_permute2_pd (b2, b3, t, i); =A0 =A0 =A0/* { dg-error "=
last argument must be an immediate" } */
> + =A0c1 =3D _mm_permute2_ps (c2, c3, s, i); =A0 =A0 =A0 =A0 /* { dg-error=
 "last argument must be an immediate" } */
> + =A0d1 =3D _mm256_permute2_ps (d2, d3, t, i); =A0 =A0 =A0/* { dg-error "=
last argument must be an immediate" } */
> +}
> --- gcc/testsuite/gcc.target/i386/xop-rotate1-int.c.jj =A02011-06-15 10:4=
7:29.000000000 +0200
> +++ gcc/testsuite/gcc.target/i386/xop-rotate1-int.c =A0 =A0 2011-06-15 11=
:25:25.000000000 +0200
> @@ -0,0 +1,63 @@
> +/* PR target/49411 */
> +/* { dg-do run } */
> +/* { dg-require-effective-target xop } */
> +/* { dg-options "-O2 -mxop" } */
> +
> +#include "xop-check.h"
> +
> +#include <x86intrin.h>
> +
> +extern void abort (void);
> +
> +union
> +{
> + =A0__m128i v;
> + =A0unsigned char c[16];
> + =A0unsigned short s[8];
> + =A0unsigned int i[4];
> + =A0unsigned long long l[2];
> +} a, b, c, d;
> +
> +#define TEST1(F, N, S, SS) \
> +do { =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 \
> + =A0for (i =3D 0; i < sizeof (a.F) / sizeof (a.F[0]); i++) \
> + =A0 =A0a.F[i] =3D i * 17; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 \
> + =A0s =3D _mm_set1_epi##SS (N); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0\
> + =A0b.v =3D _mm_roti_epi##S (a.v, N); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0\
> + =A0c.v =3D _mm_rot_epi##S (a.v, s); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 =A0 \
> + =A0for (i =3D 0; i < sizeof (a.F) / sizeof (a.F[0]); i++) \
> + =A0 =A0{ =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0\
> + =A0 =A0 =A0int mask =3D __CHAR_BIT__ * sizeof (a.F[i]) - 1; =A0 \
> + =A0 =A0 =A0d.F[i] =3D a.F[i] << (N & mask); =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 =A0 \
> + =A0 =A0 =A0if (N & mask) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0\
> + =A0 =A0 =A0 d.F[i] |=3D a.F[i] >> (mask + 1 - (N & mask)); =A0 =A0\
> + =A0 =A0 =A0if (b.F[i] !=3D c.F[i] || b.F[i] !=3D d.F[i]) =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0\
> + =A0 =A0 =A0 abort (); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 \
> + =A0 =A0} =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0\
> +} while (0)
> +#define TEST(N) \
> + =A0TEST1 (c, N, 8, 8); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0\
> + =A0TEST1 (s, N, 16, 16); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0\
> + =A0TEST1 (i, N, 32, 32); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0\
> + =A0TEST1 (l, N, 64, 64x)
> +
> +volatile int n;
> +
> +static void
> +xop_test (void)
> +{
> + =A0unsigned int i;
> + =A0__m128i s;
> +
> +#ifndef NON_CONST
> + =A0TEST (5);
> + =A0TEST (-5);
> + =A0TEST (0);
> + =A0TEST (31);
> +#else
> + =A0n =3D 5; TEST (n);
> + =A0n =3D -5; TEST (n);
> + =A0n =3D 0; TEST (n);
> + =A0n =3D 31; TEST (n);
> +#endif
> +}
> --- gcc/testsuite/gcc.target/i386/xop-rotate2-int.c.jj =A02011-06-15 11:2=
5:42.000000000 +0200
> +++ gcc/testsuite/gcc.target/i386/xop-rotate2-int.c =A0 =A0 2011-06-15 11=
:26:03.000000000 +0200
> @@ -0,0 +1,7 @@
> +/* PR target/49411 */
> +/* { dg-do run } */
> +/* { dg-require-effective-target xop } */
> +/* { dg-options "-O2 -mxop" } */
> +
> +#define NON_CONST 1
> +#include "xop-rotate1-int.c"
>
> =A0 =A0 =A0 =A0Jakub
>

I will test this on XOP-capable AMD hardware.
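
For reference while testing, here is a minimal scalar sketch of the
rotate-count handling the patch describes for constant counts: the count
is masked to the element width, so a negative count ends up acting as a
right rotation. The helper names are hypothetical and not part of the
patch; only the 32-bit element case is shown, and it follows the
reference computation used in xop-rotate1-int.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: rotate a 32-bit value left
   by COUNT after masking COUNT to the element width.  A negative COUNT
   therefore behaves like a right rotation by -COUNT.  */
static uint32_t
rotl32_masked (uint32_t x, int count)
{
  unsigned int mask = 32 - 1;
  unsigned int n = (unsigned int) count & mask;
  uint32_t r = x << n;

  /* Skip the right shift when n is 0; shifting by 32 would be undefined.  */
  if (n)
    r |= x >> (32 - n);
  return r;
}

/* Hypothetical self-check covering a positive and a negative count.  */
static void
rotl32_masked_selfcheck (void)
{
  assert (rotl32_masked (1u, 5) == 32u);
  assert (rotl32_masked (1u, -5) == (1u << 27));
}
```

The counts 5, -5, 0 and 31 used by TEST () in xop-rotate1-int.c all
reduce to in-range values under this masking for 32-bit elements.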
--=20
Quentin