From: "Roger Sayle"
To: "'Hongtao Liu'"
Cc: "'GCC Patches'"
Subject: RE: [x86 PATCH] UNSPEC_PALIGNR optimizations and clean-ups.
Date: Mon, 4 Jul 2022 18:48:09 +0100
Message-ID: <008701d88fce$39c9c8b0$ad5d5a10$@nextmovesoftware.com>

Hi Hongtao,

Many thanks for your review.  This revised patch implements your
suggestions of removing the combine splitters, and instead reusing the
functionality of the ssse3_palignrdi define_insn_and_split.

This revised patch has been tested on x86_64-pc-linux-gnu with make
bootstrap and make -k check, both with and without
--target_board=unix{-m32}, with no new failures.
Is this revised version Ok for mainline?

2022-07-04  Roger Sayle
	    Hongtao Liu

gcc/ChangeLog
	* config/i386/i386-builtin.def (__builtin_ia32_palignr128): Change
	CODE_FOR_ssse3_palignrti to CODE_FOR_ssse3_palignrv1ti.
	* config/i386/i386-expand.cc (expand_vec_perm_palignr): Use V1TImode
	and gen_ssse3_palignrv1ti instead of TImode.
	* config/i386/sse.md (SSESCALARMODE): Delete.
	(define_mode_attr ssse3_avx2): Handle V1TImode instead of TImode.
	(<ssse3_avx2>_palignr<mode>): Use VIMAX_AVX2_AVX512BW as a mode
	iterator instead of SSESCALARMODE.
	(ssse3_palignrdi): Optimize cases where operands[3] is 0 or 64,
	using a single move instruction (if required).

gcc/testsuite/ChangeLog
	* gcc.target/i386/ssse3-palignr-2.c: New test case.

Thanks in advance,
Roger
--

> -----Original Message-----
> From: Hongtao Liu
> Sent: 01 July 2022 03:40
> To: Roger Sayle
> Cc: GCC Patches
> Subject: Re: [x86 PATCH] UNSPEC_PALIGNR optimizations and clean-ups.
>
> On Fri, Jul 1, 2022 at 10:12 AM Hongtao Liu wrote:
> >
> > On Fri, Jul 1, 2022 at 2:42 AM Roger Sayle wrote:
> > >
> > >
> > > This patch is a follow-up to Hongtao's fix for PR target/105854.
> > > That fix is perfectly correct, but the thing that caught my eye was
> > > why is the compiler generating a shift by zero at all.  Digging
> > > deeper it turns out that we can easily optimize
> > > __builtin_ia32_palignr for alignments of 0 and 64 respectively,
> > > which may be simplified to moves from the highpart or lowpart.
> > >
> > > After adding optimizations to simplify the 64-bit DImode palignr, I
> > > started to add the corresponding optimizations for vpalignr (i.e.
> > > 128-bit).  The first oddity is that sse.md uses TImode and a special
> > > SSESCALARMODE iterator, rather than V1TImode, and indeed the comment
> > > above SSESCALARMODE hints that this should be "dropped in favor of
> > > VIMAX_AVX2_AVX512BW".
> > > Hence this patch includes the migration of
> > > <ssse3_avx2>_palignr<mode> to use VIMAX_AVX2_AVX512BW, basically
> > > using V1TImode instead of TImode for 128-bit palignr.
> > >
> > > But it was only after I'd implemented this clean-up that I stumbled
> > > across the strange semantics of 128-bit [v]palignr.  According to
> > > https://www.felixcloutier.com/x86/palignr, the semantics are subtly
> > > different based upon how the instruction is encoded.  PALIGNR leaves
> > > the highpart unmodified, whilst VEX.128 encoded VPALIGNR clears the
> > > highpart, and (unless I'm mistaken) it looks like GCC currently uses
> > > the exact same RTL/templates for both, treating one as an
> > > alternative for the other.
> > I think as long as patterns or intrinsics only care about the low
> > part, they should be ok.
> > But if we want to use the default behavior for the upper bits, we need
> > to restrict them under a specific isa (i.e. vmovq in vec_set_0).
> > Generally, 128-bit SSE legacy instructions have different behaviors
> > for the upper bits from the AVX ones, and that's why vzeroupper is
> > introduced for sse <-> avx instruction transitions.
> > >
> > > Hence I thought I'd post what I have so far (part optimization and
> > > part clean-up), to then ask the x86 experts for their opinions.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > bootstrap and make -k check, both with and without
> > > --target_board=unix{-m32}, with no new failures.  Ok for mainline?
> > >
> > >
> > > 2022-06-30  Roger Sayle
> > >
> > > gcc/ChangeLog
> > > 	* config/i386/i386-builtin.def (__builtin_ia32_palignr128): Change
> > > 	CODE_FOR_ssse3_palignrti to CODE_FOR_ssse3_palignrv1ti.
> > > 	* config/i386/i386-expand.cc (expand_vec_perm_palignr): Use V1TImode
> > > 	and gen_ssse3_palignrv1ti instead of TImode.
> > > 	* config/i386/sse.md (SSESCALARMODE): Delete.
> > > 	(define_mode_attr ssse3_avx2): Handle V1TImode instead of TImode.
> > > 	(<ssse3_avx2>_palignr<mode>): Use VIMAX_AVX2_AVX512BW as a mode
> > > 	iterator instead of SSESCALARMODE.
> > >
> > > 	(ssse3_palignrdi): Optimize cases when operands[3] is 0 or 64,
> > > 	using a single move instruction (if required).
> > > 	(define_split): Likewise split UNSPEC_PALIGNR $0 into a move.
> > > 	(define_split): Likewise split UNSPEC_PALIGNR $64 into a move.
> > >
> > > gcc/testsuite/ChangeLog
> > > 	* gcc.target/i386/ssse3-palignr-2.c: New test case.
> > >
> > >
> > > Thanks in advance,
> > > Roger
> > > --
> > >
> >
> > +(define_split
> > +  [(set (match_operand:DI 0 "register_operand")
> > +	(unspec:DI
> > +	  [(match_operand:DI 1 "register_operand")
> > +	   (match_operand:DI 2 "register_mmxmem_operand")
> > +	   (const_int 0)]
> > +	  UNSPEC_PALIGNR))]
> > +  ""
> > +  [(set (match_dup 0) (match_dup 2))])
> > +
> > +(define_split
> > +  [(set (match_operand:DI 0 "register_operand")
> > +	(unspec:DI
> > +	  [(match_operand:DI 1 "register_operand")
> > +	   (match_operand:DI 2 "register_mmxmem_operand")
> > +	   (const_int 64)]
> > +	  UNSPEC_PALIGNR))]
> > +  ""
> > +  [(set (match_dup 0) (match_dup 1))])
> > +
> > define_split is assumed to be split into 2 (or more) insns, hence
> > pass_combine will only try a define_split if the number of merged
> > insns is greater than 2.
> > For palignr, I think most of the time there would be only 2 merged
> > insns (constant propagation), so better to change them to a pre_reload
> > splitter (i.e. define_insn_and_split "*avx512bw_permvar_truncv16siv16hi_1").
> I think you can just merge the 2 define_splits into define_insn_and_split
> "ssse3_palignrdi" by relaxing the split condition as
>
> -  "TARGET_SSSE3 && reload_completed
> -   && SSE_REGNO_P (REGNO (operands[0]))"
> +  "(TARGET_SSSE3 && reload_completed
> +    && SSE_REGNO_P (REGNO (operands[0])))
> +   || INTVAL (operands[3]) == 0
> +   || INTVAL (operands[3]) == 64"
>
> and you have already handled them by
>
> +  if (operands[3] == const0_rtx)
> +    {
> +      if (!rtx_equal_p (operands[0], operands[2]))
> +	emit_move_insn (operands[0], operands[2]);
> +      else
> +	emit_note (NOTE_INSN_DELETED);
> +      DONE;
> +    }
> +  else if (INTVAL (operands[3]) == 64)
> +    {
> +      if (!rtx_equal_p (operands[0], operands[1]))
> +	emit_move_insn (operands[0], operands[1]);
> +      else
> +	emit_note (NOTE_INSN_DELETED);
> +      DONE;
> +    }
> +
>
> >
> > --
> > BR,
> > Hongtao
>
> --
> BR,
> Hongtao

[Attachment: patchvs5.txt]

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index e6daad4..fd16093 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -900,7 +900,7 @@ BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_ssse3_psignv4si3, "__builtin_ia32_psig
 BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_ssse3_psignv2si3, "__builtin_ia32_psignd", IX86_BUILTIN_PSIGND, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
 
 /* SSSE3.  */
-BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_ssse3_palignrti, "__builtin_ia32_palignr128", IX86_BUILTIN_PALIGNR128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT_CONVERT)
+BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_ssse3_palignrv1ti, "__builtin_ia32_palignr128", IX86_BUILTIN_PALIGNR128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT_CONVERT)
 BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_ssse3_palignrdi, "__builtin_ia32_palignr", IX86_BUILTIN_PALIGNR, UNKNOWN, (int) V1DI_FTYPE_V1DI_V1DI_INT_CONVERT)
 
 /* SSE4.1 */
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 8bc5430..6a3fcde 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -19548,9 +19548,11 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool single_insn_only_p)
       shift = GEN_INT (min * GET_MODE_UNIT_BITSIZE (d->vmode));
       if (GET_MODE_SIZE (d->vmode) == 16)
 	{
-	  target = gen_reg_rtx (TImode);
-	  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, dcopy.op1),
-					  gen_lowpart (TImode, dcopy.op0), shift));
+	  target = gen_reg_rtx (V1TImode);
+	  emit_insn (gen_ssse3_palignrv1ti (target,
+					    gen_lowpart (V1TImode, dcopy.op1),
+					    gen_lowpart (V1TImode, dcopy.op0),
+					    shift));
 	}
       else
 	{
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f2f72e8..adf05bf 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -575,10 +575,6 @@
 (define_mode_iterator VIMAX_AVX2
   [(V2TI "TARGET_AVX2") V1TI])
 
-;; ??? This should probably be dropped in favor of VIMAX_AVX2_AVX512BW.
-(define_mode_iterator SSESCALARMODE
-  [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") TI])
-
 (define_mode_iterator VI12_AVX2
   [(V32QI "TARGET_AVX2") V16QI
    (V16HI "TARGET_AVX2") V8HI])
@@ -712,7 +708,7 @@
    (V4HI "ssse3") (V8HI "ssse3") (V16HI "avx2") (V32HI "avx512bw")
    (V4SI "ssse3") (V8SI "avx2")
    (V2DI "ssse3") (V4DI "avx2")
-   (TI "ssse3") (V2TI "avx2") (V4TI "avx512bw")])
+   (V1TI "ssse3") (V2TI "avx2") (V4TI "avx512bw")])
 
 (define_mode_attr sse4_1_avx2
   [(V16QI "sse4_1") (V32QI "avx2") (V64QI "avx512bw")
@@ -21092,10 +21088,10 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<ssse3_avx2>_palignr<mode>"
-  [(set (match_operand:SSESCALARMODE 0 "register_operand" "=x,<v_Yw>")
-	(unspec:SSESCALARMODE
-	  [(match_operand:SSESCALARMODE 1 "register_operand" "0,<v_Yw>")
-	   (match_operand:SSESCALARMODE 2 "vector_operand" "xBm,<v_Yw>m")
+  [(set (match_operand:VIMAX_AVX2_AVX512BW 0 "register_operand" "=x,<v_Yw>")
+	(unspec:VIMAX_AVX2_AVX512BW
+	  [(match_operand:VIMAX_AVX2_AVX512BW 1 "register_operand" "0,<v_Yw>")
+	   (match_operand:VIMAX_AVX2_AVX512BW 2 "vector_operand" "xBm,<v_Yw>m")
 	   (match_operand:SI 3 "const_0_to_255_mul_8_operand")]
 	  UNSPEC_PALIGNR))]
   "TARGET_SSSE3"
@@ -21141,11 +21137,30 @@
       gcc_unreachable ();
     }
 }
-  "TARGET_SSSE3 && reload_completed
-   && SSE_REGNO_P (REGNO (operands[0]))"
+  "(TARGET_SSSE3 && reload_completed
+    && SSE_REGNO_P (REGNO (operands[0])))
+   || operands[3] == const0_rtx
+   || INTVAL (operands[3]) == 64"
   [(set (match_dup 0)
	(lshiftrt:V1TI (match_dup 0) (match_dup 3)))]
 {
+  if (operands[3] == const0_rtx)
+    {
+      if (!rtx_equal_p (operands[0], operands[2]))
+	emit_move_insn (operands[0], operands[2]);
+      else
+	emit_note (NOTE_INSN_DELETED);
+      DONE;
+    }
+  else if (INTVAL (operands[3]) == 64)
+    {
+      if (!rtx_equal_p (operands[0], operands[1]))
+	emit_move_insn (operands[0], operands[1]);
+      else
+	emit_note (NOTE_INSN_DELETED);
+      DONE;
+    }
+
   /* Emulate MMX palignrdi with SSE psrldq.  */
   rtx op0 = lowpart_subreg (V2DImode, operands[0],
			    GET_MODE (operands[0]));
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-palignr-2.c b/gcc/testsuite/gcc.target/i386/ssse3-palignr-2.c
new file mode 100644
index 0000000..791222d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ssse3-palignr-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mssse3" } */
+
+typedef long long __attribute__ ((__vector_size__ (8))) T;
+
+T x;
+T y;
+T z;
+
+void foo()
+{
+  z = __builtin_ia32_palignr (x, y, 0);
+}
+
+void bar()
+{
+  z = __builtin_ia32_palignr (x, y, 64);
+}
+/* { dg-final { scan-assembler-not "punpcklqdq" } } */
+/* { dg-final { scan-assembler-not "pshufd" } } */
+/* { dg-final { scan-assembler-not "psrldq" } } */