[x86 PATCH] Support logical shifts by (some) integer constants in TImode STV.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [x86 PATCH] Support logical shifts by (some) integer constants in TImode STV.
@ 2022-07-28 22:18 Roger Sayle
  2022-07-31 17:22 ` Uros Bizjak
  0 siblings, 1 reply; 4+ messages in thread
From: Roger Sayle @ 2022-07-28 22:18 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]


This patch improves TImode STV by adding support for logical shifts by
integer constants that are multiples of 8.  For the test case:

__int128 a, b;
void foo() { a = b << 16; }

on x86_64, gcc -O2 currently generates:

        movq    b(%rip), %rax
        movq    b+8(%rip), %rdx
        shldq   $16, %rax, %rdx
        salq    $16, %rax
        movq    %rax, a(%rip)
        movq    %rdx, a+8(%rip)
        ret

with this patch we now generate:

        movdqa  b(%rip), %xmm0
        pslldq  $2, %xmm0
        movaps  %xmm0, a(%rip)
        ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check. both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-07-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386-features.cc (compute_convert_gain): Add gain
        for converting suitable TImode shift to a V1TImode shift.
        (timode_scalar_chain::convert_insn): Add support for converting
        suitable ASHIFT and LSHIFTRT.
        (timode_scalar_to_vector_candidate_p): Consider logical shifts
        by integer constants that are multiples of 8 to be candidates.

gcc/testsuite/ChangeLog
        * gcc.target/i386/sse4_1-stv-7.c: New test case.


Thanks again,
Roger
--


[-- Attachment #2: patchvs.txt --]
[-- Type: text/plain, Size: 1949 bytes --]

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index aa5de71..e1e0645 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1221,6 +1221,13 @@ timode_scalar_chain::compute_convert_gain ()
 	    igain = COSTS_N_INSNS (1);
 	  break;
 
+	case ASHIFT:
+	case LSHIFTRT:
+	  /* For logical shifts by constant multiples of 8. */
+	  igain = optimize_insn_for_size_p () ? COSTS_N_BYTES (4)
+					      : COSTS_N_INSNS (1);
+	  break;
+
 	default:
 	  break;
 	}
@@ -1462,6 +1469,12 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
       src = convert_compare (XEXP (src, 0), XEXP (src, 1), insn);
       break;
 
+    case ASHIFT:
+    case LSHIFTRT:
+      convert_op (&XEXP (src, 0), insn);
+      PUT_MODE (src, V1TImode);
+      break;
+
     default:
       gcc_unreachable ();
     }
@@ -1796,6 +1809,14 @@ timode_scalar_to_vector_candidate_p (rtx_insn *insn)
     case NOT:
       return REG_P (XEXP (src, 0)) || timode_mem_p (XEXP (src, 0));
 
+    case ASHIFT:
+    case LSHIFTRT:
+      /* Handle logical shifts by integer constants between 0 and 120
+	 that are multiples of 8.  */
+      return REG_P (XEXP (src, 0))
+	     && CONST_INT_P (XEXP (src, 1))
+	     && (INTVAL (XEXP (src, 1)) & ~0x78) == 0;
+
     default:
       return false;
     }
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-7.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-7.c
new file mode 100644
index 0000000..b0d5fce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-7.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+unsigned __int128 a;
+unsigned __int128 b;
+
+void foo()
+{
+  a = b << 16;
+}
+
+void bar()
+{
+  a = b >> 16;
+}
+
+/* { dg-final { scan-assembler "pslldq" } } */
+/* { dg-final { scan-assembler "psrldq" } } */

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [x86 PATCH] Support logical shifts by (some) integer constants in TImode STV.
  2022-07-28 22:18 [x86 PATCH] Support logical shifts by (some) integer constants in TImode STV Roger Sayle
@ 2022-07-31 17:22 ` Uros Bizjak
  2022-08-02 17:02   ` Roger Sayle
  0 siblings, 1 reply; 4+ messages in thread
From: Uros Bizjak @ 2022-07-31 17:22 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches

On Fri, Jul 29, 2022 at 12:18 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch improves TImode STV by adding support for logical shifts by
> integer constants that are multiples of 8.  For the test case:
>
> __int128 a, b;
> void foo() { a = b << 16; }
>
> on x86_64, gcc -O2 currently generates:
>
>         movq    b(%rip), %rax
>         movq    b+8(%rip), %rdx
>         shldq   $16, %rax, %rdx
>         salq    $16, %rax
>         movq    %rax, a(%rip)
>         movq    %rdx, a+8(%rip)
>         ret
>
> with this patch we now generate:
>
>         movdqa  b(%rip), %xmm0
>         pslldq  $2, %xmm0
>         movaps  %xmm0, a(%rip)
>         ret
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check. both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-07-28  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-features.cc (compute_convert_gain): Add gain
>         for converting suitable TImode shift to a V1TImode shift.
>         (timode_scalar_chain::convert_insn): Add support for converting
>         suitable ASHIFT and LSHIFTRT.
>         (timode_scalar_to_vector_candidate_p): Consider logical shifts
>         by integer constants that are multiples of 8 to be candidates.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse4_1-stv-7.c: New test case.

+ case ASHIFT:
+ case LSHIFTRT:
+  /* For logical shifts by constant multiples of 8. */
+  igain = optimize_insn_for_size_p () ? COSTS_N_BYTES (4)
+      : COSTS_N_INSNS (1);

Isn't the conversion an universal win for -O2 as well as for -Os? The
conversion to/from XMM register is already accounted for, so for -Os
substituting shldq/salq with pslldq should always be a win. I'd expect
the cost calculation to be similar to the
general_scalar_chain::compute_convert_gain cost calculation with m =
2.

Uros.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [x86 PATCH] Support logical shifts by (some) integer constants in TImode STV.
  2022-07-31 17:22 ` Uros Bizjak
@ 2022-08-02 17:02   ` Roger Sayle
  2022-08-02 17:22     ` Uros Bizjak
  0 siblings, 1 reply; 4+ messages in thread
From: Roger Sayle @ 2022-08-02 17:02 UTC (permalink / raw)
  To: 'Uros Bizjak'; +Cc: gcc-patches

Hi Uros,

> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: 31 July 2022 18:23
> To: Roger Sayle <roger@nextmovesoftware.com>
> On Fri, Jul 29, 2022 at 12:18 AM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> > This patch improves TImode STV by adding support for logical shifts by
> > integer constants that are multiples of 8.  For the test case:
> >
> > __int128 a, b;
> > void foo() { a = b << 16; }
> >
> > on x86_64, gcc -O2 currently generates:
> >
> >         movq    b(%rip), %rax
> >         movq    b+8(%rip), %rdx
> >         shldq   $16, %rax, %rdx
> >         salq    $16, %rax
> >         movq    %rax, a(%rip)
> >         movq    %rdx, a+8(%rip)
> >         ret
> >
> > with this patch we now generate:
> >
> >         movdqa  b(%rip), %xmm0
> >         pslldq  $2, %xmm0
> >         movaps  %xmm0, a(%rip)
> >         ret
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check. both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> > 2022-07-28  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/i386-features.cc (compute_convert_gain): Add gain
> >         for converting suitable TImode shift to a V1TImode shift.
> >         (timode_scalar_chain::convert_insn): Add support for converting
> >         suitable ASHIFT and LSHIFTRT.
> >         (timode_scalar_to_vector_candidate_p): Consider logical shifts
> >         by integer constants that are multiples of 8 to be candidates.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.target/i386/sse4_1-stv-7.c: New test case.
> 
> + case ASHIFT:
> + case LSHIFTRT:
> +  /* For logical shifts by constant multiples of 8. */  igain =
> + optimize_insn_for_size_p () ? COSTS_N_BYTES (4)
> +      : COSTS_N_INSNS (1);
> 
> Isn't the conversion an universal win for -O2 as well as for -Os? The conversion
> to/from XMM register is already accounted for, so for -Os substituting
> shldq/salq with pslldq should always be a win. I'd expect the cost calculation to
> be similar to the general_scalar_chain::compute_convert_gain cost calculation
> with m = 2.

I agree that the terminology is perhaps a little confusing.  The
compute_convert_gain function calculates the total "gain" from an
STV chain, summing the igain of each instruction, and performs
the STV transformation if this total gain is greater than zero.
Hence positive values are good and negative values are bad.

In this case, of a logical shift by multiple of 8, converting the chain is indeed always
beneficial, reducing by 4 bytes in size when optimizing for size, and avoiding 1 fast
instruction when optimizing for speed.  Having a "positive gain of four bytes" sounds bad,
but in this sense the gain is used as a synonym of "benefit" not "magnitude".

By comparison, shifting by a single bit 128 bit value is always a net loss, requiring
three addition fast instructions, or 15 extra bytes in size.  However, it's still worth
considering/capturing these counter-productive (i.e. negative) values, as they
might be compensated for by other wins in the chain.

Dealing with COSTS_N_BYTES (when optimizing for size) and COSTS_N_INSNS
(when optimizing for speed) allows much finer granularity.  For example,
the constant number of bits used in a shift/rotate, or the value of an
immediate constant in a compare have significant effects on the size/speed
of scalar vs. vector code, and this isn't (yet) something easily handled by the
simple "m" approximation used in general_scalar_chain::compute_convert_gain.

See (comment #5 of) PR target/105034 which mentions the need for more
accurate parameterization of compute_convert_gain (in response to the
undesirable suggestion of simply disabling STV when optimizing for size). 

I hope this explains the above idiom.  Hopefully, things will become clearer
when support for shifts by other bit counts, and arithmetic shifts, are added
to this part of the code (STV).  I'll be sure to add more comments.

[Ok for mainline?]

Cheers,
Roger
--

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [x86 PATCH] Support logical shifts by (some) integer constants in TImode STV.
  2022-08-02 17:02   ` Roger Sayle
@ 2022-08-02 17:22     ` Uros Bizjak
  0 siblings, 0 replies; 4+ messages in thread
From: Uros Bizjak @ 2022-08-02 17:22 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches

On Tue, Aug 2, 2022 at 7:02 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
>
> > From: Uros Bizjak <ubizjak@gmail.com>
> > Sent: 31 July 2022 18:23
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > On Fri, Jul 29, 2022 at 12:18 AM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > > This patch improves TImode STV by adding support for logical shifts by
> > > integer constants that are multiples of 8.  For the test case:
> > >
> > > __int128 a, b;
> > > void foo() { a = b << 16; }
> > >
> > > on x86_64, gcc -O2 currently generates:
> > >
> > >         movq    b(%rip), %rax
> > >         movq    b+8(%rip), %rdx
> > >         shldq   $16, %rax, %rdx
> > >         salq    $16, %rax
> > >         movq    %rax, a(%rip)
> > >         movq    %rdx, a+8(%rip)
> > >         ret
> > >
> > > with this patch we now generate:
> > >
> > >         movdqa  b(%rip), %xmm0
> > >         pslldq  $2, %xmm0
> > >         movaps  %xmm0, a(%rip)
> > >         ret
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check. both with and without --target_board=unix{-m32},
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2022-07-28  Roger Sayle  <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         * config/i386/i386-features.cc (compute_convert_gain): Add gain
> > >         for converting suitable TImode shift to a V1TImode shift.
> > >         (timode_scalar_chain::convert_insn): Add support for converting
> > >         suitable ASHIFT and LSHIFTRT.
> > >         (timode_scalar_to_vector_candidate_p): Consider logical shifts
> > >         by integer constants that are multiples of 8 to be candidates.
> > >
> > > gcc/testsuite/ChangeLog
> > >         * gcc.target/i386/sse4_1-stv-7.c: New test case.
> >
> > + case ASHIFT:
> > + case LSHIFTRT:
> > +  /* For logical shifts by constant multiples of 8. */  igain =
> > + optimize_insn_for_size_p () ? COSTS_N_BYTES (4)
> > +      : COSTS_N_INSNS (1);
> >
> > Isn't the conversion an universal win for -O2 as well as for -Os? The conversion
> > to/from XMM register is already accounted for, so for -Os substituting
> > shldq/salq with pslldq should always be a win. I'd expect the cost calculation to
> > be similar to the general_scalar_chain::compute_convert_gain cost calculation
> > with m = 2.
>
> I agree that the terminology is perhaps a little confusing.  The
> compute_convert_gain function calculates the total "gain" from an
> STV chain, summing the igain of each instruction, and performs
> the STV transformation if this total gain is greater than zero.
> Hence positive values are good and negative values are bad.
>
> In this case, of a logical shift by multiple of 8, converting the chain is indeed always
> beneficial, reducing by 4 bytes in size when optimizing for size, and avoiding 1 fast
> instruction when optimizing for speed.  Having a "positive gain of four bytes" sounds bad,
> but in this sense the gain is used as a synonym of "benefit" not "magnitude".
>
> By comparison, shifting by a single bit 128 bit value is always a net loss, requiring
> three addition fast instructions, or 15 extra bytes in size.  However, it's still worth
> considering/capturing these counter-productive (i.e. negative) values, as they
> might be compensated for by other wins in the chain.
>
> Dealing with COSTS_N_BYTES (when optimizing for size) and COSTS_N_INSNS
> (when optimizing for speed) allows much finer granularity.  For example,
> the constant number of bits used in a shift/rotate, or the value of an
> immediate constant in a compare have significant effects on the size/speed
> of scalar vs. vector code, and this isn't (yet) something easily handled by the
> simple "m" approximation used in general_scalar_chain::compute_convert_gain.
>
> See (comment #5 of) PR target/105034 which mentions the need for more
> accurate parameterization of compute_convert_gain (in response to the
> undesirable suggestion of simply disabling STV when optimizing for size).
>
> I hope this explains the above idiom.  Hopefully, things will become clearer
> when support for shifts by other bit counts, and arithmetic shifts, are added
> to this part of the code (STV).  I'll be sure to add more comments.

Thanks for the explanation, I was confused by the differences between
32bit DImode and 64bit TImode costs (these should IMO be quite
similar, so if the TImode cost function is now more precise, DImode
cost function should follow it).

> [Ok for mainline?]

OK. The cost functions are always in need of more accuracy..

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-08-02 17:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-28 22:18 [x86 PATCH] Support logical shifts by (some) integer constants in TImode STV Roger Sayle
2022-07-31 17:22 ` Uros Bizjak
2022-08-02 17:02   ` Roger Sayle
2022-08-02 17:22     ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).