[x86 PATCH take #2] Move V1TI shift/rotate lowering from expand to pre-reload split.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Uros Bizjak'" <ubizjak@gmail.com>
Cc: <gcc-patches@gcc.gnu.org>
Subject: [x86 PATCH take #2] Move V1TI shift/rotate lowering from expand to pre-reload split.
Date: Fri, 12 Aug 2022 22:24:39 +0100	[thread overview]
Message-ID: <008101d8ae91$ef2ca100$cd85e300$@nextmovesoftware.com> (raw)
In-Reply-To: <CAFULd4Y_29tA1wE9A6RsiZNHguFcU0BG_DYi1godKmqeUAvxeg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5631 bytes --]


Hi Uros,
As requested, here's an updated version of my patch that introduces a new
const_0_to_255_not_mul_8_operand as you've requested.  I think in this
instance, having mutually exclusive patterns that can appear in any order,
without imposing implicit ordering constraints, is slightly preferable,
especially as (thanks to STV)  some related patterns may appear in
sse.md and others appear in i386.md (making ordering tricky).

This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-08-12  Roger Sayle  <roger@nextmovesoftware.com>
            Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
        * config/i386/predicates.md (const_0_to_255_not_mul_8_operand):
        New predicate for values between 0/1 and 255, not multiples of 8.
        * config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
        shifts by constant bit counts.
        (*ashlvti3_internal): New define_insn_and_split that lowers
        logical left shifts by constant bit counts, that aren't multiples
        of 8, before reload.
        (lshrv1ti3): Delay lowering of logical right shifts by constant.
        (*lshrv1ti3_internal): New define_insn_and_split that lowers
        logical right shifts by constant bit counts, that aren't multiples
        of 8, before reload.
        (ashrv1ti3):: Delay lowering of arithmetic right shifts by
        constant bit counts.
        (*ashrv1ti3_internal): New define_insn_and_split that lowers
        arithmetic right shifts by constant bit counts before reload.
        (rotlv1ti3): Delay lowering of rotate left by constant.
        (*rotlv1ti3_internal): New define_insn_and_split that lowers
        rotate left by constant bits counts before reload.
        (rotrv1ti3): Delay lowering of rotate right by constant.
        (*rotrv1ti3_internal): New define_insn_and_split that lowers
        rotate right by constant bits counts before reload.


Thanks again,
Roger

> -----Original Message-----
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: 08 August 2022 08:48
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] Move V1TI shift/rotate lowering from expand to pre-
> reload split.
> 
> On Fri, Aug 5, 2022 at 8:36 PM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> >
> > This patch moves the lowering of 128-bit V1TImode shifts and rotations
> > by constant bit counts to sequences of SSE operations from the RTL
> > expansion pass to the pre-reload split pass.  Postponing this
> > splitting of shifts and rotates enables (will enable) the TImode
> > equivalents of these operations/ instructions to be considered as
> > candidates by the (TImode) STV pass.
> > Technically, this patch changes the existing expanders to continue to
> > lower shifts by variable amounts, but constant operands become RTL
> > instructions, specified by define_insn_and_split that are triggered by
> > x86_pre_reload_split.  The one minor complication is that logical
> > shifts by multiples of eight, don't get split, but are handled by
> > existing insn patterns, such as sse2_ashlv1ti3 and sse2_lshrv1ti3.
> > There should be no changes in generated code with this patch, which
> > just adjusts the pass in which transformations get applied.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> >
> > 2022-08-05  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
> >         shifts by constant bit counts.
> >         (*ashlvti3_internal): New define_insn_and_split that lowers
> >         logical left shifts by constant bit counts, that aren't multiples
> >         of 8, before reload.
> >         (lshrv1ti3): Delay lowering of logical right shifts by constant.
> >         (*lshrv1ti3_internal): New define_insn_and_split that lowers
> >         logical right shifts by constant bit counts, that aren't multiples
> >         of 8, before reload.
> >         (ashrv1ti3):: Delay lowering of arithmetic right shifts by
> >         constant bit counts.
> >         (*ashrv1ti3_internal): New define_insn_and_split that lowers
> >         arithmetic right shifts by constant bit counts before reload.
> >         (rotlv1ti3): Delay lowering of rotate left by constant.
> >         (*rotlv1ti3_internal): New define_insn_and_split that lowers
> >         rotate left by constant bits counts before reload.
> >         (rotrv1ti3): Delay lowering of rotate right by constant.
> >         (*rotrv1ti3_internal): New define_insn_and_split that lowers
> >         rotate right by constant bits counts before reload.
> 
> +(define_insn_and_split "*ashlv1ti3_internal"
> +  [(set (match_operand:V1TI 0 "register_operand")
>   (ashift:V1TI
>   (match_operand:V1TI 1 "register_operand")
> - (match_operand:QI 2 "general_operand")))]
> -  "TARGET_SSE2 && TARGET_64BIT"
> + (match_operand:SI 2 "const_0_to_255_operand")))]
> +  "TARGET_SSE2
> +   && TARGET_64BIT
> +   && (INTVAL (operands[2]) & 7) != 0
> 
> Please introduce const_0_to_255_not_mul_8_operand predicate.
> Alternatively, and preferably, you can use pattern shadowing, where the
> preceding, more constrained pattern will match before the following, more
> broad pattern will.
> 
> Uros.

[-- Attachment #2: patchrt2.txt --]
[-- Type: text/plain, Size: 4220 bytes --]

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 064596d..4f16bb7 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -931,6 +931,14 @@
   return val <= 255*8 && val % 8 == 0;
 })
 
+;; Match 1 to 255 except multiples of 8
+(define_predicate "const_0_to_255_not_mul_8_operand"
+  (match_code "const_int")
+{
+  unsigned HOST_WIDE_INT val = INTVAL (op);
+  return val <= 255 && val % 8 != 0;
+})
+
 ;; Return true if OP is CONST_INT >= 1 and <= 31 (a valid operand
 ;; for shift & compare patterns, as shifting by 0 does not change flags).
 (define_predicate "const_1_to_31_operand"
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 14d12d1..468c5f1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15995,10 +15995,29 @@
 
 (define_expand "ashlv1ti3"
   [(set (match_operand:V1TI 0 "register_operand")
+        (ashift:V1TI
+         (match_operand:V1TI 1 "register_operand")
+         (match_operand:QI 2 "general_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT"
+{
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (ASHIFT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*ashlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
 	(ashift:V1TI
 	 (match_operand:V1TI 1 "register_operand")
-	 (match_operand:QI 2 "general_operand")))]
-  "TARGET_SSE2 && TARGET_64BIT"
+	 (match_operand:SI 2 "const_0_to_255_not_mul_8_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
 {
   ix86_expand_v1ti_shift (ASHIFT, operands);
   DONE;
@@ -16011,6 +16030,25 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (LSHIFTRT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*lshrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(lshiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_not_mul_8_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_shift (LSHIFTRT, operands);
   DONE;
 })
@@ -16022,6 +16060,26 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_ashiftrt (operands);
+      DONE;
+    }
+})
+
+
+(define_insn_and_split "*ashrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(ashiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_ashiftrt (operands);
   DONE;
 })
@@ -16033,6 +16091,25 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATE, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotate:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_rotate (ROTATE, operands);
   DONE;
 })
@@ -16044,6 +16121,25 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATERT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotatert:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_rotate (ROTATERT, operands);
   DONE;
 })

next prev parent reply	other threads:[~2022-08-12 21:24 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-05 18:36 [x86 PATCH] " Roger Sayle
2022-08-08  7:48 ` Uros Bizjak
2022-08-12 21:24   ` Roger Sayle [this message]
2022-08-13  9:57     ` [x86 PATCH take #2] " Uros Bizjak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='008101d8ae91$ef2ca100$cd85e300$@nextmovesoftware.com' \
    --to=roger@nextmovesoftware.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=ubizjak@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).