public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [x86 PATCH] Split SUBREGs of SSE vector registers into vec_select insns.
@ 2023-08-03  7:10 Roger Sayle
  2023-08-03  9:37 ` Uros Bizjak
  0 siblings, 1 reply; 2+ messages in thread
From: Roger Sayle @ 2023-08-03  7:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: 'Uros Bizjak'

[-- Attachment #1: Type: text/plain, Size: 2577 bytes --]


This patch is the final piece in the series to improve the ABI issues
affecting PR 88873.  The previous patches tackled inserting DFmode
values into V2DFmode registers, by introducing insvti_{low,high}part
patterns.  This patch improves the extraction of DFmode values from
v2DFmode registers via TImode intermediates.

I'd initially thought this would require new extvti_{low,high}part
patterns to be defined, but all that's required is to recognize that
the SUBREG idioms produced by combine are equivalent to (forms of)
vec_select patterns.  The target-independent middle-end can't be sure
that the appropriate vec_select instruction exists on the target,
hence doesn't canonicalize a SUBREG of a vector mode as a vec_select,
but the backend can provide a define_split stating where and when
this is useful, for example, considering whether the operand is in
memory, or whether !TARGET_SSE_MATH and the destination is i387.

For pr88873.c, gcc -O2 -march=cascadelake currently generates:

foo:    vpunpcklqdq     %xmm3, %xmm2, %xmm7
        vpunpcklqdq     %xmm1, %xmm0, %xmm6
        vpunpcklqdq     %xmm5, %xmm4, %xmm2
        vmovdqa %xmm7, -24(%rsp)
        vmovdqa %xmm6, %xmm1
        movq    -16(%rsp), %rax
        vpinsrq $1, %rax, %xmm7, %xmm4
        vmovapd %xmm4, %xmm6
        vfmadd132pd     %xmm1, %xmm2, %xmm6
        vmovapd %xmm6, -24(%rsp)
        vmovsd  -16(%rsp), %xmm1
        vmovsd  -24(%rsp), %xmm0
        ret

with this patch, we now generate:

foo:    vpunpcklqdq     %xmm1, %xmm0, %xmm6
        vpunpcklqdq     %xmm3, %xmm2, %xmm7
        vpunpcklqdq     %xmm5, %xmm4, %xmm2
        vmovdqa %xmm6, %xmm1
        vfmadd132pd     %xmm7, %xmm2, %xmm1
        vmovsd  %xmm1, %xmm1, %xmm0
        vunpckhpd       %xmm1, %xmm1, %xmm1
        ret

The improvement is even more dramatic when compared to the original
29 instructions shown in comment #8.  GCC 13, for example, required
12 transfers to/from memory.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-08-03  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/sse.md (define_split): Convert highpart:DF extract
        from V2DFmode register into a sse2_storehpd instruction.
        (define_split): Likewise, convert lowpart:DF extract from V2DF
        register into a sse2_storelpd instruction.

gcc/testsuite/ChangeLog
        * gcc.target/i386/pr88873.c: Tweak to check for improved code.


Thanks in advance,
Roger
--


[-- Attachment #2: patchet.txt --]
[-- Type: text/plain, Size: 1707 bytes --]

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 35fd66e..bc419ff 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -13554,6 +13554,14 @@
   [(set_attr "type" "ssemov")
    (set_attr "mode" "V2SF,V4SF,V2SF")])
 
+;; Convert highpart SUBREG in sse2_storehpd or *vec_extractv2df_1_sse.
+(define_split
+  [(set (match_operand:DF 0 "register_operand")
+	(subreg:DF (match_operand:V2DF 1 "register_operand") 8))]
+  "TARGET_SSE"
+  [(set (match_dup 0)
+	(vec_select:DF (match_dup 1) (parallel [(const_int 1)])))])
+
 ;; Avoid combining registers from different units in a single alternative,
 ;; see comment above inline_secondary_memory_needed function in i386.cc
 (define_insn "sse2_storelpd"
@@ -13599,6 +13607,14 @@
   [(set_attr "type" "ssemov")
    (set_attr "mode" "V2SF,V4SF,V2SF")])
 
+;; Convert lowpart SUBREG into sse2_storelpd or *vec_extractv2df_0_sse.
+(define_split
+  [(set (match_operand:DF 0 "register_operand")
+	(subreg:DF (match_operand:V2DF 1 "register_operand") 0))]
+  "TARGET_SSE"
+  [(set (match_dup 0)
+	(vec_select:DF (match_dup 1) (parallel [(const_int 0)])))])
+
 (define_expand "sse2_loadhpd_exp"
   [(set (match_operand:V2DF 0 "nonimmediate_operand")
 	(vec_concat:V2DF
diff --git a/gcc/testsuite/gcc.target/i386/pr88873.c b/gcc/testsuite/gcc.target/i386/pr88873.c
index d893aac..a3a7ef2 100644
--- a/gcc/testsuite/gcc.target/i386/pr88873.c
+++ b/gcc/testsuite/gcc.target/i386/pr88873.c
@@ -9,3 +9,5 @@ s_t foo (s_t a, s_t b, s_t c)
 }
 
 /* { dg-final { scan-assembler-times "vpunpcklqdq" 3 } } */
+/* { dg-final { scan-assembler "vunpckhpd" } } */
+/* { dg-final { scan-assembler-not "rsp" } } */

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [x86 PATCH] Split SUBREGs of SSE vector registers into vec_select insns.
  2023-08-03  7:10 [x86 PATCH] Split SUBREGs of SSE vector registers into vec_select insns Roger Sayle
@ 2023-08-03  9:37 ` Uros Bizjak
  0 siblings, 0 replies; 2+ messages in thread
From: Uros Bizjak @ 2023-08-03  9:37 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches

On Thu, Aug 3, 2023 at 9:10 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch is the final piece in the series to improve the ABI issues
> affecting PR 88873.  The previous patches tackled inserting DFmode
> values into V2DFmode registers, by introducing insvti_{low,high}part
> patterns.  This patch improves the extraction of DFmode values from
> v2DFmode registers via TImode intermediates.
>
> I'd initially thought this would require new extvti_{low,high}part
> patterns to be defined, but all that's required is to recognize that
> the SUBREG idioms produced by combine are equivalent to (forms of)
> vec_select patterns.  The target-independent middle-end can't be sure
> that the appropriate vec_select instruction exists on the target,
> hence doesn't canonicalize a SUBREG of a vector mode as a vec_select,
> but the backend can provide a define_split stating where and when
> this is useful, for example, considering whether the operand is in
> memory, or whether !TARGET_SSE_MATH and the destination is i387.
>
> For pr88873.c, gcc -O2 -march=cascadelake currently generates:
>
> foo:    vpunpcklqdq     %xmm3, %xmm2, %xmm7
>         vpunpcklqdq     %xmm1, %xmm0, %xmm6
>         vpunpcklqdq     %xmm5, %xmm4, %xmm2
>         vmovdqa %xmm7, -24(%rsp)
>         vmovdqa %xmm6, %xmm1
>         movq    -16(%rsp), %rax
>         vpinsrq $1, %rax, %xmm7, %xmm4
>         vmovapd %xmm4, %xmm6
>         vfmadd132pd     %xmm1, %xmm2, %xmm6
>         vmovapd %xmm6, -24(%rsp)
>         vmovsd  -16(%rsp), %xmm1
>         vmovsd  -24(%rsp), %xmm0
>         ret
>
> with this patch, we now generate:
>
> foo:    vpunpcklqdq     %xmm1, %xmm0, %xmm6
>         vpunpcklqdq     %xmm3, %xmm2, %xmm7
>         vpunpcklqdq     %xmm5, %xmm4, %xmm2
>         vmovdqa %xmm6, %xmm1
>         vfmadd132pd     %xmm7, %xmm2, %xmm1
>         vmovsd  %xmm1, %xmm1, %xmm0
>         vunpckhpd       %xmm1, %xmm1, %xmm1
>         ret
>
> The improvement is even more dramatic when compared to the original
> 29 instructions shown in comment #8.  GCC 13, for example, required
> 12 transfers to/from memory.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-08-03  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/sse.md (define_split): Convert highpart:DF extract
>         from V2DFmode register into a sse2_storehpd instruction.
>         (define_split): Likewise, convert lowpart:DF extract from V2DF
>         register into a sse2_storelpd instruction.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/pr88873.c: Tweak to check for improved code.

OK.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-08-03  9:37 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-03  7:10 [x86 PATCH] Split SUBREGs of SSE vector registers into vec_select insns Roger Sayle
2023-08-03  9:37 ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).