public inbox for gcc-cvs@sourceware.org
* [gcc r13-2038] Move V1TI shift/rotate lowering from expand to pre-reload split on x86_64.
@ 2022-08-13 13:54 Roger Sayle
From: Roger Sayle @ 2022-08-13 13:54 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:4991e20923b658ce9fbdf5621cab39f71b98fbc2

commit r13-2038-g4991e20923b658ce9fbdf5621cab39f71b98fbc2
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Sat Aug 13 14:52:45 2022 +0100

    Move V1TI shift/rotate lowering from expand to pre-reload split on x86_64.
    
    This patch moves the lowering of 128-bit V1TImode shifts and rotations by
    constant bit counts to sequences of SSE operations from the RTL expansion
    pass to the pre-reload split pass.  Postponing this splitting of shifts
    and rotates will enable the TImode equivalents of these operations to be
    considered as candidates by the (TImode) STV pass.
    Technically, this patch changes the existing expanders so that they
    continue to lower shifts and rotates by variable amounts, while constant
    operands become RTL instructions, specified by define_insn_and_split,
    whose splitting is enabled by ix86_pre_reload_split.  The one minor
    complication is that logical shifts by multiples of eight don't get
    split, but are handled by existing insn patterns, such as sse2_ashlv1ti3
    and sse2_lshrv1ti3.  There should be no changes in generated code with
    this patch, which just adjusts the pass in which transformations get
    applied.
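
As a rough illustration of the division of labour described above, the following C sketch classifies where a V1TI shift or rotate would be lowered after this change. The enum and function names are hypothetical and are not part of GCC. The sketch only models constant counts in the range 0 to 255 and mirrors the operand predicates used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC code: where a V1TI shift/rotate is lowered.  */
enum v1ti_op { OP_ASHIFT, OP_LSHIFTRT, OP_ASHIFTRT, OP_ROTATE, OP_ROTATERT };

enum v1ti_route
{
  ROUTE_EXPAND,            /* variable count: lowered during RTL expansion */
  ROUTE_EXISTING_INSN,     /* logical shift by a multiple of 8 bits */
  ROUTE_PRE_RELOAD_SPLIT   /* other constants: new *_internal patterns */
};

static enum v1ti_route
v1ti_route_for (enum v1ti_op op, bool count_is_const, unsigned count)
{
  if (!count_is_const)
    return ROUTE_EXPAND;

  /* The sketch only covers the constant range the new patterns accept.  */
  assert (count <= 255);

  bool logical = (op == OP_ASHIFT || op == OP_LSHIFTRT);
  if (logical && count % 8 == 0)
    return ROUTE_EXISTING_INSN;

  return ROUTE_PRE_RELOAD_SPLIT;
}
```

Note that only the two logical shifts have the multiple-of-eight exception. In this model, arithmetic right shifts and rotates by any constant in range go to the pre-reload split.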
    
    2022-08-13  Roger Sayle  <roger@nextmovesoftware.com>
                Uroš Bizjak  <ubizjak@gmail.com>
    
    gcc/ChangeLog
            * config/i386/predicates.md (const_0_to_255_not_mul_8_operand):
            New predicate for values between 1 and 255 that are not
            multiples of 8.
            * config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
            shifts by constant bit counts.
            (*ashlv1ti3_internal): New define_insn_and_split that lowers
            logical left shifts by constant bit counts, that aren't multiples
            of 8, before reload.
            (lshrv1ti3): Delay lowering of logical right shifts by constant.
            (*lshrv1ti3_internal): New define_insn_and_split that lowers
            logical right shifts by constant bit counts, that aren't multiples
            of 8, before reload.
            (ashrv1ti3): Delay lowering of arithmetic right shifts by
            constant bit counts.
            (*ashrv1ti3_internal): New define_insn_and_split that lowers
            arithmetic right shifts by constant bit counts before reload.
            (rotlv1ti3): Delay lowering of rotate left by constant.
            (*rotlv1ti3_internal): New define_insn_and_split that lowers
            rotate left by constant bit counts before reload.
            (rotrv1ti3): Delay lowering of rotate right by constant.
            (*rotrv1ti3_internal): New define_insn_and_split that lowers
            rotate right by constant bit counts before reload.
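
The range check performed by the new predicate can be tried in isolation. The following is a minimal stand-alone C model of its body, assuming a 64-bit unsigned HOST_WIDE_INT. The function name is hypothetical, and int64_t/uint64_t stand in for GCC's own types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-alone model of const_0_to_255_not_mul_8_operand.
   In GCC the value comes from INTVAL (op) and is stored in an unsigned
   HOST_WIDE_INT; uint64_t is assumed to stand in for that type here.  */
static bool
model_const_0_to_255_not_mul_8 (int64_t intval)
{
  /* Negative values wrap to very large unsigned values and are rejected.  */
  uint64_t val = (uint64_t) intval;
  return val <= 255 && val % 8 != 0;
}
```

Because 0 is itself a multiple of 8, the effective range accepted by this check is 1 to 255, excluding 8, 16, and so on up to 248.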

Diff:
---
 gcc/config/i386/predicates.md |  8 ++++
 gcc/config/i386/sse.md        | 95 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 064596d9594..4f16bb748b5 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -931,6 +931,14 @@
   return val <= 255*8 && val % 8 == 0;
 })
 
+;; Match 1 to 255 except multiples of 8
+(define_predicate "const_0_to_255_not_mul_8_operand"
+  (match_code "const_int")
+{
+  unsigned HOST_WIDE_INT val = INTVAL (op);
+  return val <= 255 && val % 8 != 0;
+})
+
 ;; Return true if OP is CONST_INT >= 1 and <= 31 (a valid operand
 ;; for shift & compare patterns, as shifting by 0 does not change flags).
 (define_predicate "const_1_to_31_operand"
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ccd9d002e93..b23f07e08c6 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15994,11 +15994,29 @@
 })
 
 (define_expand "ashlv1ti3"
+  [(set (match_operand:V1TI 0 "register_operand")
+        (ashift:V1TI
+         (match_operand:V1TI 1 "register_operand")
+         (match_operand:QI 2 "general_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT"
+{
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (ASHIFT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*ashlv1ti3_internal"
   [(set (match_operand:V1TI 0 "register_operand")
 	(ashift:V1TI
 	 (match_operand:V1TI 1 "register_operand")
-	 (match_operand:QI 2 "general_operand")))]
-  "TARGET_SSE2 && TARGET_64BIT"
+	 (match_operand:SI 2 "const_0_to_255_not_mul_8_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
 {
   ix86_expand_v1ti_shift (ASHIFT, operands);
   DONE;
@@ -16010,6 +16028,24 @@
 	 (match_operand:V1TI 1 "register_operand")
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
+{
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (LSHIFTRT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*lshrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(lshiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_not_mul_8_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
 {
   ix86_expand_v1ti_shift (LSHIFTRT, operands);
   DONE;
@@ -16021,6 +16057,25 @@
 	 (match_operand:V1TI 1 "register_operand")
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
+{
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_ashiftrt (operands);
+      DONE;
+    }
+})
+
+
+(define_insn_and_split "*ashrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(ashiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
 {
   ix86_expand_v1ti_ashiftrt (operands);
   DONE;
@@ -16032,6 +16087,24 @@
 	 (match_operand:V1TI 1 "register_operand")
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
+{
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATE, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotate:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
 {
   ix86_expand_v1ti_rotate (ROTATE, operands);
   DONE;
@@ -16043,6 +16116,24 @@
 	 (match_operand:V1TI 1 "register_operand")
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
+{
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATERT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotatert:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
 {
   ix86_expand_v1ti_rotate (ROTATERT, operands);
   DONE;

