From: "Roger Sayle"
To: "'Uros Bizjak'"
Cc: "'GCC Patches'"
References: <00e501d916ee$0ef7d210$2ce77630$@nextmovesoftware.com>
Subject: RE: [x86 PATCH] Use movss/movsd to implement V4SI/V2DI VEC_PERM.
Date: Sun, 25 Dec 2022 12:47:48 -0000
Message-ID: <008601d9185f$18c650b0$4a52f210$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0087_01D9185F.18C6ECF0"
Content-Language: en-gb

This is a multipart message in MIME format.

------=_NextPart_000_0087_01D9185F.18C6ECF0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

Hi Uros,
Many thanks and merry Christmas.  Here's the version as committed,
implemented using your preferred idiom with mode iterators for movss/movsd.
Thanks again.

2022-12-25  Roger Sayle
            Uroš Bizjak

gcc/ChangeLog
        * config/i386/i386-builtin.def (__builtin_ia32_movss): Update
        CODE_FOR_sse_movss to CODE_FOR_sse_movss_v4sf.
        (__builtin_ia32_movsd): Likewise, update CODE_FOR_sse2_movsd to
        CODE_FOR_sse2_movsd_v2df.
        * config/i386/i386-expand.cc (ix86_split_convert_uns_si_sse): Update
        gen_sse_movss call to gen_sse_movss_v4sf, and gen_sse2_movsd call
        to gen_sse2_movsd_v2df.
        (expand_vec_perm_movs): Also allow V4SImode with TARGET_SSE and
        V2DImode with TARGET_SSE2.
        * config/i386/sse.md (avx512fp16_fcmaddcsh_v8hf_mask3): Update
        gen_sse_movss call to gen_sse_movss_v4sf.
        (avx512fp16_fmaddcsh_v8hf_mask3): Likewise.
        (sse_movss_<mode>): Renamed from sse_movss using VI4F_128 mode
        iterator to handle both V4SF and V4SI.
        (sse2_movsd_<mode>): Likewise, renamed from sse2_movsd using
        VI8F_128 mode iterator to handle both V2DF and V2DI.

gcc/testsuite/ChangeLog
        * gcc.target/i386/sse-movss-4.c: New test case.
        * gcc.target/i386/sse2-movsd-3.c: New test case.

Roger
--

> -----Original Message-----
> From: Uros Bizjak
> Sent: 23 December 2022 17:18
> To: Roger Sayle
> Cc: GCC Patches
> Subject: Re: [x86 PATCH] Use movss/movsd to implement V4SI/V2DI VEC_PERM.
>
> On Fri, Dec 23, 2022 at 5:46 PM Roger Sayle wrote:
> >
> >
> > This patch tweaks the x86 backend to use the movss and movsd
> > instructions to perform some vector permutations on integer vectors
> > (V4SI and V2DI) in the same way they are used for floating point
> > vectors (V4SF and V2DF).
> >
> > As a motivating example, consider:
> >
> > typedef unsigned int v4si __attribute__((vector_size(16)));
> > typedef float v4sf __attribute__((vector_size(16)));
> > v4si foo(v4si x,v4si y) { return (v4si){y[0],x[1],x[2],x[3]}; }
> > v4sf bar(v4sf x,v4sf y) { return (v4sf){y[0],x[1],x[2],x[3]}; }
> >
> > which is currently compiled with -O2 to:
> >
> > foo:    movdqa  %xmm0, %xmm2
> >         shufps  $80, %xmm0, %xmm1
> >         movdqa  %xmm1, %xmm0
> >         shufps  $232, %xmm2, %xmm0
> >         ret
> >
> > bar:    movss   %xmm1, %xmm0
> >         ret
> >
> > With this patch, both functions compile to the same form.
> > Likewise for the V2DI case:
> >
> > typedef unsigned long v2di __attribute__((vector_size(16)));
> > typedef double v2df __attribute__((vector_size(16)));
> >
> > v2di foo(v2di x,v2di y) { return (v2di){y[0],x[1]}; }
> > v2df bar(v2df x,v2df y) { return (v2df){y[0],x[1]}; }
> >
> > which currently generates:
> >
> > foo:    shufpd  $2, %xmm0, %xmm1
> >         movdqa  %xmm1, %xmm0
> >         ret
> >
> > bar:    movsd   %xmm1, %xmm0
> >         ret
> >
> > There are two possible approaches to adding integer vector forms of
> > the sse_movss and sse2_movsd instructions.  One is to use a mode
> > iterator (VI4F_128 or VI8F_128) on the existing define_insn patterns,
> > but this requires renaming the patterns to sse_movss_<mode>, which then
> > requires changes to i386-builtin.def and throughout the backend to
> > reflect the new naming of gen_sse_movss_v4sf.  The alternate approach
> > (taken here) is to simply clone and specialize the existing patterns.
> > Uros, if you'd prefer the first approach, I'm happy to make/test/commit
> > those changes.
>
> I would really prefer the variant with VI4F_128/VI8F_128; these two
> iterators were introduced specifically for this case (see e.g.
> sse_shufps_<mode> and sse2_shufpd_<mode>.  The internal name of the
> pattern is fairly irrelevant and a trivial search and replace operation
> can replace the grand total of 6 occurrences ...)
>
> Also, changing sse2_movsd to use the VI8F_128 mode iterator would enable
> more alternatives besides movsd, so we give the combine pass some more
> opportunities with memory operands.
>
> So, the patch with those two iterators is pre-approved.
>
> Uros.
>
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> > 2022-12-23  Roger Sayle
> >
> > gcc/ChangeLog
> >         * config/i386/i386-expand.cc (expand_vec_perm_movs): Also allow
> >         V4SImode with TARGET_SSE and V2DImode with TARGET_SSE2.
> >         * config/i386/sse.md (sse_movss_v4si): New define_insn, a V4SI
> >         specialization of sse_movss.
> >         (sse2_movsd_v2di): Likewise, a V2DI specialization of sse2_movsd.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.target/i386/sse-movss-4.c: New test case.
> >         * gcc.target/i386/sse2-movsd-3.c: New test case.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >

------=_NextPart_000_0087_01D9185F.18C6ECF0
Content-Type: text/plain; name="patchvz.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="patchvz.txt"

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index d85b175..0d1fc34 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -679,7 +679,7 @@ BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_xorv4sf3, "__builtin_ia32_xorps", IX86_
 
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_copysignv4sf3, "__builtin_ia32_copysignps", IX86_BUILTIN_CPYSGNPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF)
 
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_movss, "__builtin_ia32_movss", IX86_BUILTIN_MOVSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF)
+BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_movss_v4sf, "__builtin_ia32_movss", IX86_BUILTIN_MOVSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF)
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_movhlps_exp, "__builtin_ia32_movhlps", IX86_BUILTIN_MOVHLPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF)
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_movlhps_exp, "__builtin_ia32_movlhps", IX86_BUILTIN_MOVLHPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF)
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_vec_interleave_highv4sf, "__builtin_ia32_unpckhps", IX86_BUILTIN_UNPCKHPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF)
@@ -781,7 +781,7 @@ BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_xorv2df3, "__builtin_ia32_xorpd", IX86
 
 BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_copysignv2df3, "__builtin_ia32_copysignpd", IX86_BUILTIN_CPYSGNPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF)
 
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_movsd, "__builtin_ia32_movsd", IX86_BUILTIN_MOVSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_movsd_v2df, "__builtin_ia32_movsd", IX86_BUILTIN_MOVSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF)
 BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_vec_interleave_highv2df, "__builtin_ia32_unpckhpd", IX86_BUILTIN_UNPCKHPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF)
 BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_vec_interleave_lowv2df, "__builtin_ia32_unpcklpd", IX86_BUILTIN_UNPCKLPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF)
 
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a45640f..e144b5e 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1774,9 +1774,9 @@ ix86_split_convert_uns_si_sse (rtx operands[])
       input = gen_rtx_REG (vecmode, REGNO (input));
       emit_move_insn (value, CONST0_RTX (vecmode));
       if (vecmode == V4SFmode)
-        emit_insn (gen_sse_movss (value, value, input));
+        emit_insn (gen_sse_movss_v4sf (value, value, input));
       else
-        emit_insn (gen_sse2_movsd (value, value, input));
+        emit_insn (gen_sse2_movsd_v2df (value, value, input));
     }
 
   emit_move_insn (large, two31);
@@ -18903,8 +18903,10 @@ expand_vec_perm_movs (struct expand_vec_perm_d *d)
     return false;
 
   if (!(TARGET_SSE && vmode == V4SFmode)
+      && !(TARGET_SSE && vmode == V4SImode)
       && !(TARGET_MMX_WITH_SSE && vmode == V2SFmode)
-      && !(TARGET_SSE2 && vmode == V2DFmode))
+      && !(TARGET_SSE2 && vmode == V2DFmode)
+      && !(TARGET_SSE2 && vmode == V2DImode))
     return false;
 
   /* Only the first element is changed.  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index de632b2..d50627a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6825,7 +6825,7 @@
   if (!MEM_P (operands[3]))
     operands[3] = force_reg (V8HFmode, operands[3]);
   op1 = lowpart_subreg (V4SFmode, operands[3], V8HFmode);
-  emit_insn (gen_sse_movss (dest, op1, op0));
+  emit_insn (gen_sse_movss_v4sf (dest, op1, op0));
   emit_move_insn (operands[0], lowpart_subreg (V8HFmode, dest, V4SFmode));
   DONE;
 })
@@ -6855,7 +6855,7 @@
   if (!MEM_P (operands[3]))
     operands[3] = force_reg (V8HFmode, operands[3]);
   op1 = lowpart_subreg (V4SFmode, operands[3], V8HFmode);
-  emit_insn (gen_sse_movss (dest, op1, op0));
+  emit_insn (gen_sse_movss_v4sf (dest, op1, op0));
   emit_move_insn (operands[0], lowpart_subreg (V8HFmode, dest, V4SFmode));
   DONE;
 })
@@ -10498,11 +10498,11 @@
    (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
 
-(define_insn "sse_movss"
-  [(set (match_operand:V4SF 0 "register_operand" "=x,v")
-        (vec_merge:V4SF
-          (match_operand:V4SF 2 "register_operand" " x,v")
-          (match_operand:V4SF 1 "register_operand" " 0,v")
+(define_insn "sse_movss_<mode>"
+  [(set (match_operand:VI4F_128 0 "register_operand" "=x,v")
+        (vec_merge:VI4F_128
+          (match_operand:VI4F_128 2 "register_operand" " x,v")
+          (match_operand:VI4F_128 1 "register_operand" " 0,v")
           (const_int 1)))]
   "TARGET_SSE"
   "@
@@ -13481,11 +13481,11 @@
   [(set (match_dup 0) (match_dup 1))]
   "operands[0] = adjust_address (operands[0], DFmode, 0);")
 
-(define_insn "sse2_movsd"
-  [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,v,x,v,m,x,x,v,o")
-        (vec_merge:V2DF
-          (match_operand:V2DF 2 "nonimmediate_operand" " x,v,m,m,v,0,0,v,0")
-          (match_operand:V2DF 1 "nonimmediate_operand" " 0,v,0,v,0,x,o,o,v")
+(define_insn "sse2_movsd_<mode>"
+  [(set (match_operand:VI8F_128 0 "nonimmediate_operand" "=x,v,x,v,m,x,x,v,o")
+        (vec_merge:VI8F_128
+          (match_operand:VI8F_128 2 "nonimmediate_operand" " x,v,m,m,v,0,0,v,0")
+          (match_operand:VI8F_128 1 "nonimmediate_operand" " 0,v,0,v,0,x,o,o,v")
           (const_int 1)))]
   "TARGET_SSE2"
   "@
diff --git a/gcc/testsuite/gcc.target/i386/sse-movss-4.c b/gcc/testsuite/gcc.target/i386/sse-movss-4.c
new file mode 100644
index 0000000..ec3019c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse-movss-4.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse" } */
+
+typedef unsigned int v4si __attribute__((vector_size(16)));
+typedef float v4sf __attribute__((vector_size(16)));
+
+v4si foo(v4si x,v4si y) { return (v4si){y[0],x[1],x[2],x[3]}; }
+v4sf bar(v4sf x,v4sf y) { return (v4sf){y[0],x[1],x[2],x[3]}; }
+
+/* { dg-final { scan-assembler-times "\tv?movss\t" 2 } } */
+/* { dg-final { scan-assembler-not "movaps" } } */
+/* { dg-final { scan-assembler-not "shufps" } } */
+/* { dg-final { scan-assembler-not "vpblendw" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-movsd-3.c b/gcc/testsuite/gcc.target/i386/sse2-movsd-3.c
new file mode 100644
index 0000000..fadbe2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-movsd-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef unsigned long long v2di __attribute__((vector_size(16)));
+typedef double v2df __attribute__((vector_size(16)));
+
+v2di foo(v2di x,v2di y) { return (v2di){y[0],x[1]}; }
+v2df bar(v2df x,v2df y) { return (v2df){y[0],x[1]}; }
+
+/* { dg-final { scan-assembler-times "\tv?movsd\t" 2 } } */
+/* { dg-final { scan-assembler-not "v?shufpd" } } */
+/* { dg-final { scan-assembler-not "movdqa" } } */
+/* { dg-final { scan-assembler-not "pshufd" } } */
+/* { dg-final { scan-assembler-not "v?punpckldq" } } */
+/* { dg-final { scan-assembler-not "v?movq" } } */
------=_NextPart_000_0087_01D9185F.18C6ECF0--
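
To try the permutation outside the testsuite, here is a minimal sketch in C.
It assumes a compiler with GCC-style vector extensions, as the new test cases
already do.  The helper names merge_low_v4si and merge_low_v2di are made up
for this illustration and do not appear in the patch.  Each returns a vector
whose element 0 comes from y and whose remaining elements come from x, which
is the shuffle that expand_vec_perm_movs now accepts for V4SImode and
V2DImode.

```c
/* Sketch only: relies on GCC-style vector extensions, like the test cases. */
typedef unsigned int v4si __attribute__((vector_size(16)));
typedef unsigned long long v2di __attribute__((vector_size(16)));

/* Hypothetical helper: lane 0 is taken from y, lanes 1..3 from x.
   This mirrors foo() in sse-movss-4.c.  */
v4si merge_low_v4si(v4si x, v4si y)
{
    return (v4si){ y[0], x[1], x[2], x[3] };
}

/* Hypothetical helper: lane 0 is taken from y, lane 1 from x.
   This mirrors foo() in sse2-movsd-3.c.  */
v2di merge_low_v2di(v2di x, v2di y)
{
    return (v2di){ y[0], x[1] };
}
```

Building this with -O2 -msse2 -S on a compiler that includes the patch and
looking for movss/movsd in the output is one way to check the effect by hand;
the exact assembly will depend on the compiler version and target flags.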