public inbox for gcc-patches@gcc.gnu.org
* Check rrotate optab first when transforming lrotate
@ 2019-07-15  8:59 Kewen.Lin
  2019-07-15  9:16 ` Jakub Jelinek
  0 siblings, 1 reply; 42+ messages in thread
From: Kewen.Lin @ 2019-07-15  8:59 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, richard.sandiford

Hi all,

In match.pd and expmed.c, we have some code to transform an lrotate
into an rrotate when the rotation count is constant.  But that code
doesn't consider whether the target supports the rrotate at all.  This
leads to suboptimal generated code, since some optimizations can't get
the expected result when querying the target optab.  One typical case
is that we miss some rotation vectorization opportunities on Power.
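
As a small made-up illustration (not one of the attached testcases),
a loop like the one below is recognized as a rotate by a constant;
once it is canonicalized to a right rotate, a target that only
provides a vector left-rotate pattern can't vectorize it:

/* Illustration only: a left rotate by a constant written with the
   usual shift/or idiom.  GIMPLE recognizes this as a rotate and then
   canonicalizes it to a right rotate, regardless of which direction
   the target actually supports.  */
#define N 256
unsigned int s[N], r[N];

void
rotl_by_8 (void)
{
  for (int i = 0; i < N; i++)
    r[i] = (s[i] << 8) | (s[i] >> (sizeof (s[0]) * 8 - 8));
}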

This patch teaches that code to check target optab availability first.

Regression testing has just been launched; is it OK for trunk if it
passes and bootstraps cleanly?

Thanks,
Kewen

-----------------------------------------------

gcc/ChangeLog

2019-07-15  Kewen Lin  <linkw@gcc.gnu.org>

	* expmed.c (expand_shift_1): Only transform to opposite rotate
	when it's supported on the target.
	* match.pd (lrot N -> rrot N'): Check for target support first.

gcc/testsuite/ChangeLog

2019-07-15  Kewen Lin  <linkw@gcc.gnu.org>

	* gcc.target/powerpc/vec_rotate.c: New test.


diff --git a/gcc/expmed.c b/gcc/expmed.c
index d7f8e9a5d76..6cd3341a0e4 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -2491,7 +2491,9 @@ expand_shift_1 (enum tree_code code, machine_mode mode, rtx shifted,
   if (rotate
       && CONST_INT_P (op1)
       && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
-                  GET_MODE_BITSIZE (scalar_mode) - 1))
+                  GET_MODE_BITSIZE (scalar_mode) - 1)
+      && ((left && optab_handler (rrotate_optab, mode) != CODE_FOR_nothing) ||
+          (!left && optab_handler (lrotate_optab, mode) != CODE_FOR_nothing)))
     {
       op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
                                         - INTVAL (op1)));
diff --git a/gcc/match.pd b/gcc/match.pd
index 88dae4231d8..a63cd15e129 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2418,11 +2418,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

 /* Rewrite an LROTATE_EXPR by a constant into an
    RROTATE_EXPR by a new constant.  */
+
 (simplify
  (lrotate @0 INTEGER_CST@1)
+ (if ((VECTOR_TYPE_P (type) && target_supports_op_p (type, RROTATE_EXPR, optab_vector))
+  || (!VECTOR_TYPE_P (type) && target_supports_op_p (type, RROTATE_EXPR, optab_scalar)))
  (rrotate @0 { const_binop (MINUS_EXPR, TREE_TYPE (@1),
                            build_int_cst (TREE_TYPE (@1),
-                                          element_precision (type)), @1); }))
+                                          element_precision (type)), @1); })))

 /* Turn (a OP c1) OP c2 into a OP (c1+c2).  */
 (for op (lrotate rrotate rshift lshift)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate.c
new file mode 100644
index 00000000000..16d1f297d2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate.c
@@ -0,0 +1,47 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+/* Check that the LROTATE to RROTATE transformation with a const rotation
+   count is guarded by an optab check on Power; this helps the vectorizer
+   exploit vector rotation instructions.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> 8) | (sud[i] << (sizeof (sud[0]) * 8 - 8));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> 8) | (suw[i] << (sizeof (suw[0]) * 8 - 8));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> 9)
+            | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - 9));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> 5)
+            | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - 5));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */


* Re: Check rrotate optab first when transforming lrotate
  2019-07-15  8:59 Check rrotate optab first when transforming lrotate Kewen.Lin
@ 2019-07-15  9:16 ` Jakub Jelinek
  2019-07-15  9:19   ` Richard Biener
                     ` (4 more replies)
  0 siblings, 5 replies; 42+ messages in thread
From: Jakub Jelinek @ 2019-07-15  9:16 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: GCC Patches, Richard Biener, richard.sandiford

On Mon, Jul 15, 2019 at 04:50:13PM +0800, Kewen.Lin wrote:
> In match.pd and expmed.c, we have some codes to transform lrotate to 
> rrotate if rotation count is const.  But they don't consider the target
> whether supports the rrotate.  It leads to some suboptimal generated
> code since some optimization can't get expected result by querying
> target optab.  One typical case is that we miss some rotation 
> vectorization opportunity on Power.

This will not do the right thing if neither lrotate nor rrotate is
supported, you want to canonicalize in that case IMHO.
The code formatting is wrong (|| at the end of line, overly long lines).

Finally, what is the reason why Power doesn't support one of the rotates?
Especially for rotates by constant, you can transform those in the
define_expand.

Canonicalizing code is highly desirable during GIMPLE, it means if you have
say left rotate by 23 in one spot and right rotate by bitsize - 23 in
another spot, e.g. SCCVN can merge that code.
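
To illustrate with a trivial example of my own (assuming a 32-bit
unsigned int), the two functions below compute the same rotation;
with a single canonical rotate direction on GIMPLE, value numbering
sees identical expressions:

/* Both bodies are a rotation of x by 23 bits to the left; only the
   surface syntax differs.  */
unsigned int
rot_a (unsigned int x)
{
  return (x << 23) | (x >> (32 - 23));  /* left rotate by 23 */
}

unsigned int
rot_b (unsigned int x)
{
  return (x >> 9) | (x << (32 - 9));    /* right rotate by 32 - 23 */
}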

So, IMNSHO, just improve the backend.

	Jakub


* Re: Check rrotate optab first when transforming lrotate
  2019-07-15  9:16 ` Jakub Jelinek
@ 2019-07-15  9:19   ` Richard Biener
  2019-07-15  9:20   ` Richard Sandiford
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: Richard Biener @ 2019-07-15  9:19 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Kewen.Lin, GCC Patches, Richard Sandiford

On Mon, Jul 15, 2019 at 10:59 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Jul 15, 2019 at 04:50:13PM +0800, Kewen.Lin wrote:
> > In match.pd and expmed.c, we have some codes to transform lrotate to
> > rrotate if rotation count is const.  But they don't consider the target
> > whether supports the rrotate.  It leads to some suboptimal generated
> > code since some optimization can't get expected result by querying
> > target optab.  One typical case is that we miss some rotation
> > vectorization opportunity on Power.
>
> This will not do the right thing if neither lrotate nor rrotate is
> supported, you want to canonicalize in that case IMHO.
> The code formatting is wrong (|| at the end of line, overly long lines).
>
> Finally, what is the reason why Power doesn't support one of the rotates?
> Especially for rotates by constant, you can transform those in the
> define_expand.
>
> Canonicalizing code is highly desirable during GIMPLE, it means if you have
> say left rotate by 23 in one spot and right rotate by bitsize - 23 in
> another spot, e.g. SCCVN can merge that code.
>
> So, IMNSHO, just improve the backend.

Agreed.  Or alternatively improve the generic expander.

Richard.

>         Jakub


* Re: Check rrotate optab first when transforming lrotate
  2019-07-15  9:16 ` Jakub Jelinek
  2019-07-15  9:19   ` Richard Biener
@ 2019-07-15  9:20   ` Richard Sandiford
  2019-07-15 10:54   ` Kewen.Lin
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: Richard Sandiford @ 2019-07-15  9:20 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Kewen.Lin, GCC Patches, Richard Biener

Jakub Jelinek <jakub@redhat.com> writes:
> On Mon, Jul 15, 2019 at 04:50:13PM +0800, Kewen.Lin wrote:
>> In match.pd and expmed.c, we have some codes to transform lrotate to 
>> rrotate if rotation count is const.  But they don't consider the target
>> whether supports the rrotate.  It leads to some suboptimal generated
>> code since some optimization can't get expected result by querying
>> target optab.  One typical case is that we miss some rotation 
>> vectorization opportunity on Power.
>
> This will not do the right thing if neither lrotate nor rrotate is
> supported, you want to canonicalize in that case IMHO.
> The code formatting is wrong (|| at the end of line, overly long lines).
>
> Finally, what is the reason why Power doesn't support one of the rotates?
> Especially for rotates by constant, you can transform those in the
> define_expand.
>
> Canonicalizing code is highly desirable during GIMPLE, it means if you have
> say left rotate by 23 in one spot and right rotate by bitsize - 23 in
> another spot, e.g. SCCVN can merge that code.
>
> So, IMNSHO, just improve the backend.

Agreed FWIW, if the target supports rotates by a constant.  If it
only supports rotates by a variable then I think expand should try
to use the rrotate optab for lrotates.

The current code looks odd though.  The comment above the expmed.c
hunk is:

  /* Canonicalize rotates by constant amount.  If op1 is bitsize / 2,
     prefer left rotation, if op1 is from bitsize / 2 + 1 to
     bitsize - 1, use other direction of rotate with 1 .. bitsize / 2 - 1
     amount instead.  */

whereas rtl.texi says:

@item (rotate:@var{m} @var{x} @var{c})
@itemx (rotatert:@var{m} @var{x} @var{c})
Similar but represent left and right rotate.  If @var{c} is a constant,
use @code{rotate}.

simplify-rtx.c seems to stick to the rtl.texi version and use ROTATE
rather than ROTATERT for constant rotates.

If a target prefers the expmed.c version then I think it should handle
that in the asm template.

Thanks,
Richard


* Re: Check rrotate optab first when transforming lrotate
  2019-07-15  9:16 ` Jakub Jelinek
  2019-07-15  9:19   ` Richard Biener
  2019-07-15  9:20   ` Richard Sandiford
@ 2019-07-15 10:54   ` Kewen.Lin
  2019-07-15 14:51   ` Segher Boessenkool
       [not found]   ` <d2ccc831-c805-c7b8-5a90-cb3e5ee5ed8b@linux.ibm.com>
  4 siblings, 0 replies; 42+ messages in thread
From: Kewen.Lin @ 2019-07-15 10:54 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: GCC Patches, Richard Biener, richard.sandiford,
	Segher Boessenkool, Bill Schmidt

Hi Jakub,

on 2019/7/15 4:59 PM, Jakub Jelinek wrote:
> On Mon, Jul 15, 2019 at 04:50:13PM +0800, Kewen.Lin wrote:
>> In match.pd and expmed.c, we have some codes to transform lrotate to 
>> rrotate if rotation count is const.  But they don't consider the target
>> whether supports the rrotate.  It leads to some suboptimal generated
>> code since some optimization can't get expected result by querying
>> target optab.  One typical case is that we miss some rotation 
>> vectorization opportunity on Power.
> 
> This will not do the right thing if neither lrotate nor rrotate is
> supported, you want to canonicalize in that case IMHO.
> The code formatting is wrong (|| at the end of line, overly long lines).
> 
> Finally, what is the reason why Power doesn't support one of the rotates?
> Especially for rotates by constant, you can transform those in the
> define_expand.
> 
> Canonicalizing code is highly desirable during GIMPLE, it means if you have
> say left rotate by 23 in one spot and right rotate by bitsize - 23 in
> another spot, e.g. SCCVN can merge that code.
> 
> So, IMNSHO, just improve the backend.
> 

OK, I see.  Thanks for the explanation.  I'll try to fix it in the backend.


Thanks,
Kewen


* Re: Check rrotate optab first when transforming lrotate
  2019-07-15  9:16 ` Jakub Jelinek
                     ` (2 preceding siblings ...)
  2019-07-15 10:54   ` Kewen.Lin
@ 2019-07-15 14:51   ` Segher Boessenkool
       [not found]   ` <d2ccc831-c805-c7b8-5a90-cb3e5ee5ed8b@linux.ibm.com>
  4 siblings, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-07-15 14:51 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Kewen.Lin, GCC Patches, Richard Biener, richard.sandiford

On Mon, Jul 15, 2019 at 10:59:29AM +0200, Jakub Jelinek wrote:
> On Mon, Jul 15, 2019 at 04:50:13PM +0800, Kewen.Lin wrote:
> > In match.pd and expmed.c, we have some codes to transform lrotate to 
> > rrotate if rotation count is const.  But they don't consider the target
> > whether supports the rrotate.  It leads to some suboptimal generated
> > code since some optimization can't get expected result by querying
> > target optab.  One typical case is that we miss some rotation 
> > vectorization opportunity on Power.
> 
> This will not do the right thing if neither lrotate nor rrotate is
> supported, you want to canonicalize in that case IMHO.
> The code formatting is wrong (|| at the end of line, overly long lines).
> 
> Finally, what is the reason why Power doesn't support one of the rotates?

Because it is pointless duplication, and we have some dozen rotate
patterns already.

Canonicalising this representation is highly desirable.  Everything in
RTL already does work fine, fwiw (and did since before rotatert
existed).


Segher


* [RFC] Consider lrotate const rotation in vectorizer
       [not found]   ` <d2ccc831-c805-c7b8-5a90-cb3e5ee5ed8b@linux.ibm.com>
@ 2019-07-16  8:48     ` Kewen.Lin
  2019-07-17  8:42       ` [PATCH, rs6000] Support vrotr<mode>3 for int vector types Kewen.Lin
  2019-07-17 10:39       ` [RFC] Consider lrotate const rotation in vectorizer Richard Biener
  0 siblings, 2 replies; 42+ messages in thread
From: Kewen.Lin @ 2019-07-16  8:48 UTC (permalink / raw)
  To: GCC Patches
  Cc: Jakub Jelinek, Richard Biener, richard.sandiford,
	Segher Boessenkool, Bill Schmidt

Hi all,

Based on the previous comments (thank you!), I tried to update the
handling in the expander and the vectorizer.  The middle end always
optimizes an lrotate with a constant rotation count into an rrotate,
which makes the vectorizer fail to vectorize if rrotate isn't supported
on the target.  We can at least teach it about constant rotation
counts; the cost should be the same.  At the same time, the expander
already tries the opposite rotation optab for scalars, so we can teach
it to deal with vectors as well.

Is this on the right track and reasonable?


Thanks,
Kewen

--------------

One informal patch to help describe this new idea:


diff --git a/gcc/optabs.c b/gcc/optabs.c
index a0e361b8bfe..ebebb0ad145 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -1273,6 +1273,7 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, rtx op1,
   if (mclass == MODE_VECTOR_INT)
     {
       optab otheroptab = unknown_optab;
+      optab otheroptab1 = unknown_optab;

       if (binoptab == ashl_optab)
        otheroptab = vashl_optab;
@@ -1281,23 +1282,50 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, rtx op1,
       else if (binoptab == lshr_optab)
        otheroptab = vlshr_optab;
       else if (binoptab == rotl_optab)
-       otheroptab = vrotl_optab;
+       {
+         otheroptab = vrotl_optab;
+         otheroptab1 = vrotr_optab;
+       }
       else if (binoptab == rotr_optab)
-       otheroptab = vrotr_optab;
+       {
+         otheroptab = vrotr_optab;
+         otheroptab1 = vrotl_optab;
+       }
+
+      bool other_ok = (otheroptab && (icode = optab_handler (otheroptab, mode)) != CODE_FOR_nothing);
+      bool other1_ok = false;
+      if (!other_ok && otheroptab1)
+       other1_ok
+         = ((icode = optab_handler (otheroptab1, mode)) != CODE_FOR_nothing)
+           && SCALAR_INT_MODE_P (GET_MODE_INNER (mode));

-      if (otheroptab
-         && (icode = optab_handler (otheroptab, mode)) != CODE_FOR_nothing)
+      if (other_ok || other1_ok)
        {
          /* The scalar may have been extended to be too wide.  Truncate
             it back to the proper size to fit in the broadcast vector.  */
          scalar_mode inner_mode = GET_MODE_INNER (mode);
-         if (!CONST_INT_P (op1)
-             && (GET_MODE_BITSIZE (as_a <scalar_int_mode> (GET_MODE (op1)))
+         rtx newop1 = op1;
+         if (other1_ok)
+           {
+             unsigned int bits = GET_MODE_PRECISION (inner_mode);
+
+             if (CONST_INT_P (op1))
+               newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
+             else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
+               newop1 = negate_rtx (GET_MODE (op1), op1);
+             else
+               newop1 = expand_binop (GET_MODE (op1), sub_optab,
+                                      gen_int_mode (bits, GET_MODE (op1)), op1,
+                                      NULL_RTX, unsignedp, OPTAB_DIRECT);
+           }
+         if (!CONST_INT_P (newop1)
+             && (GET_MODE_BITSIZE (as_a<scalar_int_mode> (GET_MODE (newop1)))
                  > GET_MODE_BITSIZE (inner_mode)))
-           op1 = force_reg (inner_mode,
-                            simplify_gen_unary (TRUNCATE, inner_mode, op1,
-                                                GET_MODE (op1)));
-         rtx vop1 = expand_vector_broadcast (mode, op1);
+           newop1 = force_reg (inner_mode,
+                               simplify_gen_unary (TRUNCATE, inner_mode,
+                                                   newop1, GET_MODE (newop1)));
+
+         rtx vop1 = expand_vector_broadcast (mode, newop1);
          if (vop1)
            {
              temp = expand_binop_directly (icode, mode, otheroptab, op0, vop1,

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index ff952d6f464..c05ce1acba4 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2039,6 +2039,15 @@ vect_recog_rotate_pattern (stmt_vec_info stmt_vinfo, tree *type_out)
   if (optab1
       && optab_handler (optab1, TYPE_MODE (vectype)) != CODE_FOR_nothing)
     return NULL;
+  /* middle-end canonicalizing LROTATE to RROTATE with const rotation count,
+     let's try the LROTATE as well.  */
+  if (rhs_code == RROTATE_EXPR && TREE_CODE(oprnd1) == INTEGER_CST)
+    {
+      optab1 = optab_for_tree_code (LROTATE_EXPR, vectype, optab_vector);
+      if (optab1
+         && optab_handler (optab1, TYPE_MODE (vectype)) != CODE_FOR_nothing)
+       return NULL;
+    }

   if (is_a <bb_vec_info> (vinfo) || dt != vect_internal_def)
     {
@@ -2046,6 +2055,14 @@ vect_recog_rotate_pattern (stmt_vec_info stmt_vinfo, tree *type_out)
       if (optab2
          && optab_handler (optab2, TYPE_MODE (vectype)) != CODE_FOR_nothing)
        return NULL;
+      if (rhs_code == RROTATE_EXPR && TREE_CODE(oprnd1) == INTEGER_CST)
+       {
+         optab2 = optab_for_tree_code (LROTATE_EXPR, vectype, optab_scalar);
+         if (optab2
+             && optab_handler (optab2, TYPE_MODE (vectype))
+                  != CODE_FOR_nothing)
+           return NULL;
+       }
     }

   /* If vector/vector or vector/scalar shifts aren't supported by the target,
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 21046931243..9e1a2f971a1 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5665,6 +5665,12 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
   if (!scalar_shift_arg)
     {
       optab = optab_for_tree_code (code, vectype, optab_vector);
+
+      if (TREE_CODE (op1) == INTEGER_CST && code == RROTATE_EXPR
+         && !(optab
+              && optab_handler (optab, TYPE_MODE (vectype))
+                   != CODE_FOR_nothing))
+       optab = optab_for_tree_code (LROTATE_EXPR, vectype, optab_vector);
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
                          "vector/vector shift/rotate found.\n");
@@ -5696,7 +5702,10 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
       else
         {
           optab = optab_for_tree_code (code, vectype, optab_vector);
-          if (optab
+         if (TREE_CODE (op1) == INTEGER_CST && code == RROTATE_EXPR
+             && !(optab && optab_handler (optab, TYPE_MODE (vectype))))
+           optab = optab_for_tree_code (LROTATE_EXPR, vectype, optab_vector);
+         if (optab
                && (optab_handler (optab, TYPE_MODE (vectype))
                       != CODE_FOR_nothing))
             {


* [PATCH, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-16  8:48     ` [RFC] Consider lrotate const rotation in vectorizer Kewen.Lin
@ 2019-07-17  8:42       ` Kewen.Lin
  2019-07-17  8:44         ` Jakub Jelinek
  2019-07-17 13:48         ` Segher Boessenkool
  2019-07-17 10:39       ` [RFC] Consider lrotate const rotation in vectorizer Richard Biener
  1 sibling, 2 replies; 42+ messages in thread
From: Kewen.Lin @ 2019-07-17  8:42 UTC (permalink / raw)
  To: GCC Patches
  Cc: Jakub Jelinek, Richard Biener, richard.sandiford,
	Segher Boessenkool, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 1534 bytes --]

Hi all,

This patch follows the idea of improving the rs6000 backend instead of
the generic expander.  I think this is a better solution?  I was
thinking a generic expander change might benefit other targets that
suffer from similar issues, but the previous RFC seems too restricted
to const rotation counts, although it's possible to extend it.  Any
comments on the pros and cons of the two approaches would be really
helpful to me (a noob).

Regression testing has just been launched; is it OK for trunk if it is
bootstrapped and regression tested on powerpc64le-unknown-linux-gnu?


Thanks,
Kewen

---- 

gcc/ChangeLog

2019-07-17  Kewen Lin  <linkw@gcc.gnu.org>

	* config/rs6000/predicates.md (vint_reg_or_const_vector): New predicate.
	* config/rs6000/vector.md (vrotr<mode>3): New define_expand.

gcc/testsuite/ChangeLog

2019-07-17  Kewen Lin  <linkw@gcc.gnu.org>

	* gcc.target/powerpc/vec_rotate-1.c: New test.
	* gcc.target/powerpc/vec_rotate-2.c: New test.

on 2019/7/16 4:45 PM, Kewen.Lin wrote:
> Hi all,
> 
> Based on the previous comments (thank you!), I tried to update the 
> handling in expander and vectorizer.  Middle-end optimizes lrotate
> with const rotation count to rrotate all the time, it makes vectorizer
> fail to vectorize if rrotate isn't supported on the target.  We can at
> least teach it on const rotation count, the cost should be the same? 
> At the same time, the expander already tries to use the opposite 
> rotation optable for scalar, we can teach it to deal with vector as well.
> 
> Is it on the right track and reasonable?
> 
> 
> Thanks,
> Kewen
> 

[-- Attachment #2: expand_v1.diff --]
[-- Type: text/plain, Size: 5003 bytes --]

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8ca98299950..c4c74630d26 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -163,6 +163,17 @@
   return VINT_REGNO_P (REGNO (op));
 })
 
+;; Return 1 if op is a vector register that operates on integer vectors
+;; or if op is a const vector with integer vector modes.
+(define_predicate "vint_reg_or_const_vector"
+  (match_code "reg,subreg,const_vector")
+{
+  if (GET_CODE (op) == CONST_VECTOR && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    return 1;
+
+  return vint_operand (op, mode);
+})
+
 ;; Return 1 if op is a vector register to do logical operations on (and, or,
 ;; xor, etc.)
 (define_predicate "vlogical_operand"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 70bcfe02e22..5c6a344e452 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1260,6 +1260,32 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; Expanders for rotatert to make use of vrotl
+(define_expand "vrotr<mode>3"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
+		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
+{
+  machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);
+  unsigned int bits = GET_MODE_PRECISION (inner_mode);
+  rtx imm_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits));
+  rtx rot_count = gen_reg_rtx (<MODE>mode);
+  if (GET_CODE (operands[2]) == CONST_VECTOR)
+    {
+      imm_vec = simplify_const_binary_operation (MINUS, <MODE>mode, imm_vec,
+						 operands[2]);
+      rot_count = force_reg (<MODE>mode, imm_vec);
+    }
+  else
+    {
+      rtx imm_reg = force_reg (<MODE>mode, imm_vec);
+      emit_insn (gen_sub<mode>3 (rot_count, imm_reg, operands[2]));
+    }
+  emit_insn (gen_vrotl<mode>3 (operands[0], operands[1], rot_count));
+  DONE;
+})
+
 ;; Expanders for arithmetic shift left on each vector element
 (define_expand "vashl<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
new file mode 100644
index 00000000000..80aca1a94a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
@@ -0,0 +1,46 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+/* Check that the vectorizer can exploit vector rotation instructions on Power,
+   mainly for the case where the rotation count is a constant.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> 8) | (sud[i] << (sizeof (sud[0]) * 8 - 8));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> 8) | (suw[i] << (sizeof (suw[0]) * 8 - 8));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> 9)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - 9));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> 5)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - 5));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
new file mode 100644
index 00000000000..affda6c023b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
@@ -0,0 +1,47 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+/* Check that the vectorizer can exploit vector rotation instructions on Power,
+   mainly for the case where the rotation count isn't a constant.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+extern unsigned char rot_cnt;
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> rot_cnt) | (sud[i] << (sizeof (sud[0]) * 8 - rot_cnt));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> rot_cnt) | (suw[i] << (sizeof (suw[0]) * 8 - rot_cnt));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> rot_cnt)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - rot_cnt));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> rot_cnt)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - rot_cnt));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */


* Re: [PATCH, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-17  8:42       ` [PATCH, rs6000] Support vrotr<mode>3 for int vector types Kewen.Lin
@ 2019-07-17  8:44         ` Jakub Jelinek
  2019-07-17  9:38           ` Kewen.Lin
  2019-07-17 13:48         ` Segher Boessenkool
  1 sibling, 1 reply; 42+ messages in thread
From: Jakub Jelinek @ 2019-07-17  8:44 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Richard Biener, richard.sandiford,
	Segher Boessenkool, Bill Schmidt

On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -1260,6 +1260,32 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
>  
> +;; Expanders for rotatert to make use of vrotl
> +(define_expand "vrotr<mode>3"
> +  [(set (match_operand:VEC_I 0 "vint_operand")
> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
> +{
> +  machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);
> +  unsigned int bits = GET_MODE_PRECISION (inner_mode);
> +  rtx imm_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits));
> +  rtx rot_count = gen_reg_rtx (<MODE>mode);
> +  if (GET_CODE (operands[2]) == CONST_VECTOR)
> +    {
> +      imm_vec = simplify_const_binary_operation (MINUS, <MODE>mode, imm_vec,
> +						 operands[2]);
> +      rot_count = force_reg (<MODE>mode, imm_vec);
> +    }
> +  else
> +    {
> +      rtx imm_reg = force_reg (<MODE>mode, imm_vec);
> +      emit_insn (gen_sub<mode>3 (rot_count, imm_reg, operands[2]));
> +    }

Is this actually correct if one or more elements in operands[2] are 0?
If vrotl<mode>3 acts with truncated shift count, that is not an issue
(but then perhaps you wouldn't have to compute imm_reg - operands[2] but
just - operands[2]), but if it does something else, then prec - 0 will be
prec and thus outside of the allowed rotate count.  Or does rs6000 allow
rotate counts to be 0 to prec inclusive?

	Jakub


* Re: [PATCH, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-17  8:44         ` Jakub Jelinek
@ 2019-07-17  9:38           ` Kewen.Lin
  2019-07-17 10:18             ` Jakub Jelinek
  0 siblings, 1 reply; 42+ messages in thread
From: Kewen.Lin @ 2019-07-17  9:38 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: GCC Patches, Richard Biener, richard.sandiford,
	Segher Boessenkool, Bill Schmidt

on 2019/7/17 4:42 PM, Jakub Jelinek wrote:
> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:
>> --- a/gcc/config/rs6000/vector.md
>> +++ b/gcc/config/rs6000/vector.md
>> @@ -1260,6 +1260,32 @@
>>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>>    "")
>>  
>> +;; Expanders for rotatert to make use of vrotl
>> +(define_expand "vrotr<mode>3"
>> +  [(set (match_operand:VEC_I 0 "vint_operand")
>> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
>> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>> +{
>> +  machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);
>> +  unsigned int bits = GET_MODE_PRECISION (inner_mode);
>> +  rtx imm_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits));
>> +  rtx rot_count = gen_reg_rtx (<MODE>mode);
>> +  if (GET_CODE (operands[2]) == CONST_VECTOR)
>> +    {
>> +      imm_vec = simplify_const_binary_operation (MINUS, <MODE>mode, imm_vec,
>> +						 operands[2]);
>> +      rot_count = force_reg (<MODE>mode, imm_vec);
>> +    }
>> +  else
>> +    {
>> +      rtx imm_reg = force_reg (<MODE>mode, imm_vec);
>> +      emit_insn (gen_sub<mode>3 (rot_count, imm_reg, operands[2]));
>> +    }
> 
> Is this actually correct if one or more elements in operands[2] are 0?
> If vrotl<mode>3 acts with truncated shift count, that is not an issue
> (but then perhaps you wouldn't have to compute imm_reg - operands[2] but
> just - operands[2]), but if it does something else, then prec - 0 will be
> prec and thus outside of the allowed rotate count.  Or does rs6000 allow
> rotate counts to be 0 to prec inclusive?
> 
> 	Jakub
> 

Hi Jakub,

Good question.  The vector rotation for bytes looks like this (the others are similar):

vrlb VRT,VRA,VRB
  do i=0 to 127 by 8
   sh = (VRB)[i+5:i+7]
   VRT[i:i+7] = (VRA)[i:i+7] <<< sh
  end

It only looks at counts from 0 to prec-1 (inclusive), i.e. the low
log2(prec) bits, so it's fine even if operands[2] is zero or negative.

Take bytes as an example, where prec is 8:
  - rot count 0: the subtraction gives 8, which is outside the 3-bit
    range and wraps to 0, the same as the original count.
  - rot count 9: the subtraction gives -1, whose low 3 bits are 7; the
    original count 9 is parsed as 1 (within the 3-bit range).
  - rot count -1: the subtraction gives 9, whose low 3 bits are 1; the
    original count -1 is parsed as 7 (within the 3-bit range).

It's a good idea to just use negate!  Thanks!!
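
To double check that, here is a quick standalone sketch of mine (not
part of the patch) comparing "prec - n" with a plain negate under the
low-3-bit truncation:

#include <stdio.h>

int
main (void)
{
  const int prec = 8;                   /* byte element */
  for (int n = -1; n <= 9; n++)
    {
      /* Both pick the same 3-bit rotate amount for every n.  */
      unsigned int a = (unsigned int) (prec - n) & (prec - 1);
      unsigned int b = (unsigned int) (-n) & (prec - 1);
      printf ("n=%2d  prec-n -> %u  -n -> %u\n", n, a, b);
    }
  return 0;
}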


Kewen


* Re: [PATCH, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-17  9:38           ` Kewen.Lin
@ 2019-07-17 10:18             ` Jakub Jelinek
  0 siblings, 0 replies; 42+ messages in thread
From: Jakub Jelinek @ 2019-07-17 10:18 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Richard Biener, richard.sandiford,
	Segher Boessenkool, Bill Schmidt

On Wed, Jul 17, 2019 at 05:22:38PM +0800, Kewen.Lin wrote:
> Good question, the vector rotation for byte looks like (others are similar):
> 
> vrlb VRT,VRA,VRB
>   do i=0 to 127 by 8
>    sh = (VRB)[i+5:i+7]
>    VRT[i:i+7] = (VRA)[i:i+7] <<< sh
>   end
> 
> It only takes care of the counts from 0 to prec-1 (inclusive) [log2(prec) bits]
> So it's fine even operands[2] are zero or negative.
> 
> Take byte as example, prec is 8.
>   - rot count is 0, then minus res gets 8. (out of 3 bits range), same as 0.
>   - rot count is 9, then minus res gets -1. (3 bits parsed as 7), the original 
>     rot count 9 was parsed as 1 (in 3 bits range).
>   - rot count is -1, then minus res gets 9, (3 bits parsed as 1), the original
>     rot count was parsed as 7 (in 3 bits range).
> 
> It's a good idea to just use negate!  Thanks!!

Ok, so the hw for the vectors truncates; the question is how happy the
RTL generic code will be with that.  rs6000 defines SHIFT_COUNT_TRUNCATED to 0,
so the generic code can't assume there is a truncation going on.  Either it
will punt some optimizations when it sees say negative or too large
shift/rotate count (that is the better case), or it might just assume there
is UB.
As the documentation says, for zero SHIFT_COUNT_TRUNCATED there is an option
of having a pattern with the truncation being explicit, so in your case
*vrotl<mode>3_and or similar that would have an explicit AND on the shift
operand with say {7, 7...} vector for the byte shifts etc. but emit in the
end identical instruction to vrotl<mode>3 and use the MINUS + that pattern
for vrotr<mode>3.  If the rotate argument is CONST_VECTOR, you can of course
just canonicalize, i.e. perform -operands[2] & mask, fold that into constant
and keep using vrotl<mode>3 in that case.
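
Roughly, the identity being relied on for the constant case is that a
right rotate by c equals a left rotate by (-c) & (prec - 1).  A small
sketch of my own (plain C, not GCC code) with the hardware-style
truncation of the count to the low 3 bits, as vrlb does per byte:

#include <stdio.h>

static unsigned char
rotl8 (unsigned char x, unsigned int n)
{
  n &= 7;
  return (unsigned char) ((x << n) | (x >> ((8 - n) & 7)));
}

static unsigned char
rotr8 (unsigned char x, unsigned int n)
{
  n &= 7;
  return (unsigned char) ((x >> n) | (x << ((8 - n) & 7)));
}

int
main (void)
{
  unsigned char x = 0xb4;
  /* Every constant right-rotate count c maps to the left-rotate count
     (-c) & 7, which is what would be folded for a CONST_VECTOR.  */
  for (unsigned int c = 0; c < 8; c++)
    printf ("rotr8 (0x%02x, %u) = 0x%02x, rotl8 (0x%02x, %u) = 0x%02x\n",
            x, c, rotr8 (x, c), x, (-c) & 7, rotl8 (x, (-c) & 7));
  return 0;
}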

	Jakub


* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-16  8:48     ` [RFC] Consider lrotate const rotation in vectorizer Kewen.Lin
  2019-07-17  8:42       ` [PATCH, rs6000] Support vrotr<mode>3 for int vector types Kewen.Lin
@ 2019-07-17 10:39       ` Richard Biener
  2019-07-17 11:19         ` Jakub Jelinek
  2019-07-18  6:28         ` Kewen.Lin
  1 sibling, 2 replies; 42+ messages in thread
From: Richard Biener @ 2019-07-17 10:39 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Jakub Jelinek, Richard Sandiford,
	Segher Boessenkool, Bill Schmidt

On Tue, Jul 16, 2019 at 10:45 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi all,
>
> Based on the previous comments (thank you!), I tried to update the
> handling in expander and vectorizer.  Middle-end optimizes lrotate
> with const rotation count to rrotate all the time, it makes vectorizer
> fail to vectorize if rrotate isn't supported on the target.  We can at
> least teach it on const rotation count, the cost should be the same?
> At the same time, the expander already tries to use the opposite
> rotation optable for scalar, we can teach it to deal with vector as well.
>
> Is it on the right track and reasonable?

So you're basically fixing this up in the expander.  I think on
the GIMPLE level you then miss to update tree-vect-generic.c?

I'm not sure if it makes sense to have both LROTATE_EXPR and
RROTATE_EXPR on the GIMPLE level then (that CPUs only
support one direction is natural though).  So maybe simply get
rid of one?  Its semantics are also nowhere documented
(do we allow negative rotation amounts?  how are
non-mode-precision entities rotated? etc.).

Richard.

>
> Thanks,
> Kewen
>
> --------------
>
> One informal patch to help describing this new thought:
>
>
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index a0e361b8bfe..ebebb0ad145 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -1273,6 +1273,7 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, rtx op1,
>    if (mclass == MODE_VECTOR_INT)
>      {
>        optab otheroptab = unknown_optab;
> +      optab otheroptab1 = unknown_optab;
>
>        if (binoptab == ashl_optab)
>         otheroptab = vashl_optab;
> @@ -1281,23 +1282,50 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, rtx op1,
>        else if (binoptab == lshr_optab)
>         otheroptab = vlshr_optab;
>        else if (binoptab == rotl_optab)
> -       otheroptab = vrotl_optab;
> +       {
> +         otheroptab = vrotl_optab;
> +         otheroptab1 = vrotr_optab;
> +       }
>        else if (binoptab == rotr_optab)
> -       otheroptab = vrotr_optab;
> +       {
> +         otheroptab = vrotr_optab;
> +         otheroptab1 = vrotl_optab;
> +       }
> +
> +      bool other_ok = (otheroptab && (icode = optab_handler (otheroptab, mode)) != CODE_FOR_nothing);
> +      bool other1_ok = false;
> +      if (!other_ok && otheroptab1)
> +       other1_ok
> +         = ((icode = optab_handler (otheroptab1, mode)) != CODE_FOR_nothing)
> +           && SCALAR_INT_MODE_P (GET_MODE_INNER (mode));
>
> -      if (otheroptab
> -         && (icode = optab_handler (otheroptab, mode)) != CODE_FOR_nothing)
> +      if (other_ok || other1_ok)
>         {
>           /* The scalar may have been extended to be too wide.  Truncate
>              it back to the proper size to fit in the broadcast vector.  */
>           scalar_mode inner_mode = GET_MODE_INNER (mode);
> -         if (!CONST_INT_P (op1)
> -             && (GET_MODE_BITSIZE (as_a <scalar_int_mode> (GET_MODE (op1)))
> +         rtx newop1 = op1;
> +         if (other1_ok)
> +           {
> +             unsigned int bits = GET_MODE_PRECISION (inner_mode);
> +
> +             if (CONST_INT_P (op1))
> +               newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
> +             else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
> +               newop1 = negate_rtx (GET_MODE (op1), op1);
> +             else
> +               newop1 = expand_binop (GET_MODE (op1), sub_optab,
> +                                      gen_int_mode (bits, GET_MODE (op1)), op1,
> +                                      NULL_RTX, unsignedp, OPTAB_DIRECT);
> +           }
> +         if (!CONST_INT_P (newop1)
> +             && (GET_MODE_BITSIZE (as_a<scalar_int_mode> (GET_MODE (newop1)))
>                   > GET_MODE_BITSIZE (inner_mode)))
> -           op1 = force_reg (inner_mode,
> -                            simplify_gen_unary (TRUNCATE, inner_mode, op1,
> -                                                GET_MODE (op1)));
> -         rtx vop1 = expand_vector_broadcast (mode, op1);
> +           newop1 = force_reg (inner_mode,
> +                               simplify_gen_unary (TRUNCATE, inner_mode,
> +                                                   newop1, GET_MODE (newop1)));
> +
> +         rtx vop1 = expand_vector_broadcast (mode, newop1);
>           if (vop1)
>             {
>               temp = expand_binop_directly (icode, mode, otheroptab, op0, vop1,
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index ff952d6f464..c05ce1acba4 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -2039,6 +2039,15 @@ vect_recog_rotate_pattern (stmt_vec_info stmt_vinfo, tree *type_out)
>    if (optab1
>        && optab_handler (optab1, TYPE_MODE (vectype)) != CODE_FOR_nothing)
>      return NULL;
> +  /* middle-end canonicalizing LROTATE to RROTATE with const rotation count,
> +     let's try the LROTATE as well.  */
> +  if (rhs_code == RROTATE_EXPR && TREE_CODE(oprnd1) == INTEGER_CST)
> +    {
> +      optab1 = optab_for_tree_code (LROTATE_EXPR, vectype, optab_vector);
> +      if (optab1
> +         && optab_handler (optab1, TYPE_MODE (vectype)) != CODE_FOR_nothing)
> +       return NULL;
> +    }
>
>    if (is_a <bb_vec_info> (vinfo) || dt != vect_internal_def)
>      {
> @@ -2046,6 +2055,14 @@ vect_recog_rotate_pattern (stmt_vec_info stmt_vinfo, tree *type_out)
>        if (optab2
>           && optab_handler (optab2, TYPE_MODE (vectype)) != CODE_FOR_nothing)
>         return NULL;
> +      if (rhs_code == RROTATE_EXPR && TREE_CODE(oprnd1) == INTEGER_CST)
> +       {
> +         optab2 = optab_for_tree_code (LROTATE_EXPR, vectype, optab_scalar);
> +         if (optab2
> +             && optab_handler (optab2, TYPE_MODE (vectype))
> +                  != CODE_FOR_nothing)
> +           return NULL;
> +       }
>      }
>
>    /* If vector/vector or vector/scalar shifts aren't supported by the target,
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 21046931243..9e1a2f971a1 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -5665,6 +5665,12 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>    if (!scalar_shift_arg)
>      {
>        optab = optab_for_tree_code (code, vectype, optab_vector);
> +
> +      if (TREE_CODE (op1) == INTEGER_CST && code == RROTATE_EXPR
> +         && !(optab
> +              && optab_handler (optab, TYPE_MODE (vectype))
> +                   != CODE_FOR_nothing))
> +       optab = optab_for_tree_code (LROTATE_EXPR, vectype, optab_vector);
>        if (dump_enabled_p ())
>          dump_printf_loc (MSG_NOTE, vect_location,
>                           "vector/vector shift/rotate found.\n");
> @@ -5696,7 +5702,10 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>        else
>          {
>            optab = optab_for_tree_code (code, vectype, optab_vector);
> -          if (optab
> +         if (TREE_CODE (op1) == INTEGER_CST && code == RROTATE_EXPR
> +             && !(optab && optab_handler (optab, TYPE_MODE (vectype))))
> +           optab = optab_for_tree_code (LROTATE_EXPR, vectype, optab_vector);
> +         if (optab
>                 && (optab_handler (optab, TYPE_MODE (vectype))
>                        != CODE_FOR_nothing))
>              {
>


* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-17 10:39       ` [RFC] Consider lrotate const rotation in vectorizer Richard Biener
@ 2019-07-17 11:19         ` Jakub Jelinek
  2019-07-17 11:35           ` Richard Biener
  2019-07-17 17:51           ` Segher Boessenkool
  2019-07-18  6:28         ` Kewen.Lin
  1 sibling, 2 replies; 42+ messages in thread
From: Jakub Jelinek @ 2019-07-17 11:19 UTC (permalink / raw)
  To: Richard Biener
  Cc: Kewen.Lin, GCC Patches, Richard Sandiford, Segher Boessenkool,
	Bill Schmidt

On Wed, Jul 17, 2019 at 12:37:59PM +0200, Richard Biener wrote:
> On Tue, Jul 16, 2019 at 10:45 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> > Based on the previous comments (thank you!), I tried to update the
> > handling in expander and vectorizer.  Middle-end optimizes lrotate
> > with const rotation count to rrotate all the time, it makes vectorizer
> > fail to vectorize if rrotate isn't supported on the target.  We can at
> > least teach it on const rotation count, the cost should be the same?
> > At the same time, the expander already tries to use the opposite
> > rotation optable for scalar, we can teach it to deal with vector as well.
> >
> > Is it on the right track and reasonable?
> 
> So you're basically fixing this up in the expander.  I think on
> the GIMPLE level you then miss to update tree-vect-generic.c?
> 
> I'm not sure if it makes sense to have both LROTATE_EXPR and
> RROTATE_EXPR on the GIMPLE level then (that CPUs only
> support one direction is natural though).  So maybe simply get
> rid of one?  Its semantics are also nowhere documented

A lot of targets support both, and I think not all targets do the
truncation, so at least with non-constant rotate count emitting one over the
other is important and trying to match it up only during combine might be
too late and not work well in many cases.
Then there are some targets that only support left rotates and not right
rotates (rs6000, s390, tilegx, ...), and other targets that only support
right rotates (mips, iq2000, ...).
So only having one GIMPLE code doesn't seem to be good enough.

I think handling it during expansion in generic code is fine, especially
when we clearly have several targets that do support only one of the
rotates.  As you wrote, it needs corresponding code in tree-vect-generic.c,
and shouldn't hardcode the rs6000 direction of mapping rotr to rotl, but
support also the other direction - rotl to rotr.  For the sake of
!SHIFT_COUNT_TRUNCATED targets for constant shift counts it needs to do
negation + masking and for variable shift counts probably punt and let the
backend code handle it if it can do the truncation in there?

	Jakub


* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-17 11:19         ` Jakub Jelinek
@ 2019-07-17 11:35           ` Richard Biener
  2019-07-17 11:56             ` Richard Biener
  2019-07-17 13:58             ` Segher Boessenkool
  2019-07-17 17:51           ` Segher Boessenkool
  1 sibling, 2 replies; 42+ messages in thread
From: Richard Biener @ 2019-07-17 11:35 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Kewen.Lin, GCC Patches, Richard Sandiford, Segher Boessenkool,
	Bill Schmidt

On Wed, Jul 17, 2019 at 12:54 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Jul 17, 2019 at 12:37:59PM +0200, Richard Biener wrote:
> > On Tue, Jul 16, 2019 at 10:45 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> > > Based on the previous comments (thank you!), I tried to update the
> > > handling in expander and vectorizer.  Middle-end optimizes lrotate
> > > with const rotation count to rrotate all the time, it makes vectorizer
> > > fail to vectorize if rrotate isn't supported on the target.  We can at
> > > least teach it on const rotation count, the cost should be the same?
> > > At the same time, the expander already tries to use the opposite
> > > rotation optable for scalar, we can teach it to deal with vector as well.
> > >
> > > Is it on the right track and reasonable?
> >
> > So you're basically fixing this up in the expander.  I think on
> > the GIMPLE level you then miss to update tree-vect-generic.c?
> >
> > I'm not sure if it makes sense to have both LROTATE_EXPR and
> > RROTATE_EXPR on the GIMPLE level then (that CPUs only
> > support one direction is natural though).  So maybe simply get
> > rid of one?  Its semantics are also nowhere documented
>
> A lot of targets support both, and I think not all targets do the
> truncation, so at least with non-constant rotate count emitting one over the
> other is important and trying to match it up only during combine might be
> too late and not work well in many cases.
> Then there are some targets that only support left rotates and not right
> rotates (rs6000, s390, tilegx, ...), and other targets that only support
> right rotates (mips, iq2000, ...).
> So only having one GIMPLE code doesn't seem to be good enough.

It seems for constants it is by means of canonicalization.  The lack
of canonicalization for non-constants then makes us fail to CSE
lrotate<x, a> and rrotate<x, width-a>.  Given rotates are only
detected on GIMPLE always creating one or the other should be
reasonably easy and fixup during expansion can happen either via TER
or via pre-expand generation of optab corresponding IFNs?

Might get tricky if we face width - (a + 5) so the pattern matching
of an opposing direction rotate gets harder.
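
For instance (a made-up shape), something like the function below
needs the matcher to look through the subtraction to see the
opposite-direction rotate by a + 5:

/* Left-rotate count written as width - (a + 5); the masking keeps the
   idiom well defined for every a.  */
unsigned int
rot_tricky (unsigned int x, unsigned int a)
{
  unsigned int n = (32U - (a + 5)) & 31U;
  return (x << n) | (x >> ((32U - n) & 31U));
}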

> I think handling it during expansion in generic code is fine, especially
> when we clearly have several targets that do support only one of the
> rotates.

Yes.

> As you wrote, it needs corresponding code in tree-vect-generic.c,
> and shouldn't hardcode the rs6000 direction of mapping rotr to rotl, but
> support also the other direction - rotl to rotr.  For the sake of
> !SHIFT_COUNT_TRUNCATED targets for constant shift counts it needs to do
> negation + masking and for variable shift counts probably punt and let the
> backend code handle it if it can do the truncation in there?

Ick.  I wouldn't even touch SHIFT_COUNT_TRUNCATED with a 10-foot pole
here.  And for rotates we can simply always truncate constant amounts to
the rotated operands width, no?  For non-constant ones I fear targets
would need to support both to get reliable expansion.

Richard.

>         Jakub


* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-17 11:35           ` Richard Biener
@ 2019-07-17 11:56             ` Richard Biener
  2019-07-17 13:58             ` Segher Boessenkool
  1 sibling, 0 replies; 42+ messages in thread
From: Richard Biener @ 2019-07-17 11:56 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Kewen.Lin, GCC Patches, Richard Sandiford, Segher Boessenkool,
	Bill Schmidt

On Wed, Jul 17, 2019 at 1:32 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Jul 17, 2019 at 12:54 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Wed, Jul 17, 2019 at 12:37:59PM +0200, Richard Biener wrote:
> > > On Tue, Jul 16, 2019 at 10:45 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> > > > Based on the previous comments (thank you!), I tried to update the
> > > > handling in expander and vectorizer.  Middle-end optimizes lrotate
> > > > with const rotation count to rrotate all the time, it makes vectorizer
> > > > fail to vectorize if rrotate isn't supported on the target.  We can at
> > > > least teach it on const rotation count, the cost should be the same?
> > > > At the same time, the expander already tries to use the opposite
> > > > rotation optable for scalar, we can teach it to deal with vector as well.
> > > >
> > > > Is it on the right track and reasonable?
> > >
> > > So you're basically fixing this up in the expander.  I think on
> > > the GIMPLE level you then miss to update tree-vect-generic.c?
> > >
> > > I'm not sure if it makes sense to have both LROTATE_EXPR and
> > > RROTATE_EXPR on the GIMPLE level then (that CPUs only
> > > support one direction is natural though).  So maybe simply get
> > > rid of one?  Its semantics are also nowhere documented
> >
> > A lot of targets support both, and I think not all targets do the
> > truncation, so at least with non-constant rotate count emitting one over the
> > other is important and trying to match it up only during combine might be
> > too late and not work well in many cases.
> > Then there are some targets that only support left rotates and not right
> > rotates (rs6000, s390, tilegx, ...), and other targets that only support
> > right rotates (mips, iq2000, ...).
> > So only having one GIMPLE code doesn't seem to be good enough.
>
> It seems for constants it is by means of canonicalization.  The lack
> of canonicalization for non-constants then makes us fail to CSE
> lrotate<x, a> and rrotate<x, width-a>.  Given rotates are only
> detected on GIMPLE always creating one or the other should be
> reasonably easy and fixup during expansion can happen either via TER
> or via pre-expand generation of optab corresponding IFNs?
>
> Might get tricky if we face width - (a + 5) so the pattern matching
> of an opposing direction rotate gets harder.
>
> > I think handling it during expansion in generic code is fine, especially
> > when we clearly have several targets that do support only one of the
> > rotates.
>
> Yes.
>
> > As you wrote, it needs corresponding code in tree-vect-generic.c,
> > and shouldn't hardcode the rs6000 direction of mapping rotr to rotl, but
> > support also the other direction - rotl to rotr.  For the sake of
> > !SHIFT_COUNT_TRUNCATED targets for constant shift counts it needs to do
> > negation + masking and for variable shift counts probably punt and let the
> > backend code handle it if it can do the truncation in there?
>
> Ick.  I wouldn't even touch SHIFT_COUNT_TRUNCATED with a 10-foot pole
> here.  And for rotates we can simply always truncate constant amounts to
> the rotated operands width, no?  For non-constant ones I fear targets
> would need to support both to get reliable expansion.

Btw, the docs of SHIFT_COUNT_TRUNCATED do not mention rotates
unless you treat a rotate as a shift.

Richard.

> Richard.
>
> >         Jakub


* Re: [PATCH, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-17  8:42       ` [PATCH, rs6000] Support vrotr<mode>3 for int vector types Kewen.Lin
  2019-07-17  8:44         ` Jakub Jelinek
@ 2019-07-17 13:48         ` Segher Boessenkool
  2019-07-18  6:06           ` Kewen.Lin
  1 sibling, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-07-17 13:48 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

Hi Kewen,

On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:
> Regression testing just launched, is it OK for trunk if it's bootstrapped
> and regresstested on powerpc64le-unknown-linux-gnu?

> +;; Expanders for rotatert to make use of vrotl
> +(define_expand "vrotr<mode>3"
> +  [(set (match_operand:VEC_I 0 "vint_operand")
> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]

Having any rotatert in a define_expand or define_insn will regress.

So, nope, sorry.


Segher


* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-17 11:35           ` Richard Biener
  2019-07-17 11:56             ` Richard Biener
@ 2019-07-17 13:58             ` Segher Boessenkool
  1 sibling, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-07-17 13:58 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, Kewen.Lin, GCC Patches, Richard Sandiford, Bill Schmidt

On Wed, Jul 17, 2019 at 01:32:50PM +0200, Richard Biener wrote:
> On Wed, Jul 17, 2019 at 12:54 PM Jakub Jelinek <jakub@redhat.com> wrote:
> Ick.  I wouldn't even touch SHIFT_COUNT_TRUNCATED with a 10-foot pole
> here.  And for rotates we can simply always truncate constant amounts to
> the rotated operands width, no?  For non-constant ones I fear targets
> would need to support both to get reliable expansion.

SHIFT_COUNT_TRUNCATED has no meaning for rotate instructions, as far as
I can see.  Mathematically it doesn't, and are there any CPUs that fail
for it for no reason?  Any archs where some rotate amounts give
undefined results?  So if this is true, SHIFT_COUNT_TRUNCATED can be
treated as always *on* for rotates.


Segher


* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-17 11:19         ` Jakub Jelinek
  2019-07-17 11:35           ` Richard Biener
@ 2019-07-17 17:51           ` Segher Boessenkool
  2019-07-18  7:03             ` Jakub Jelinek
  2019-07-18 15:17             ` Richard Earnshaw (lists)
  1 sibling, 2 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-07-17 17:51 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Kewen.Lin, GCC Patches, Richard Sandiford, Bill Schmidt

On Wed, Jul 17, 2019 at 12:54:32PM +0200, Jakub Jelinek wrote:
> On Wed, Jul 17, 2019 at 12:37:59PM +0200, Richard Biener wrote:
> > I'm not sure if it makes sense to have both LROTATE_EXPR and
> > RROTATE_EXPR on the GIMPLE level then (that CPUs only
> > support one direction is natural though).  So maybe simply get
> > rid of one?  Its semantics are also nowhere documented
> 
> A lot of targets support both,

Of all the linux targets, we have:

No rotate:
  alpha microblaze riscv sparc

Both directions:
  aarch64 c6x ia64 m68k nios2 parisc sh x86 xtensa

Left only:
  csky h8300 powerpc s390

Right only:
  arc arm mips nds32 openrisc

> Then there are some targets that only support left rotates and not right
> rotates (rs6000, s390, tilegx, ...), and other targets that only support
> right rotates (mips, iq2000, ...).
> So only having one GIMPLE code doesn't seem to be good enough.
> 
> I think handling it during expansion in generic code is fine, especially
> when we clearly have several targets that do support only one of the
> rotates.  As you wrote, it needs corresponding code in tree-vect-generic.c,
> and shouldn't hardcode the rs6000 direction of mapping rotr to rotl, but
> support also the other direction - rotl to rotr.  For the sake of
> !SHIFT_COUNT_TRUNCATED targets for constant shift counts it needs to do
> negation + masking and for variable shift counts probably punt and let the
> backend code handle it if it can do the truncation in there?

I think we can say that *all* targets behave like SHIFT_COUNT_TRUNCATED
for rotates?  Not all immediates are valid of course, but that is a
separate issue.


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-17 13:48         ` Segher Boessenkool
@ 2019-07-18  6:06           ` Kewen.Lin
  2019-07-18 20:06             ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Kewen.Lin @ 2019-07-18  6:06 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

Hi Segher,

on 2019/7/17 下午9:40, Segher Boessenkool wrote:
> Hi Kewen,
> 
> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:
>> Regression testing just launched, is it OK for trunk if it's bootstrapped
>> and regresstested on powerpc64le-unknown-linux-gnu?
> 
>> +;; Expanders for rotatert to make use of vrotl
>> +(define_expand "vrotr<mode>3"
>> +  [(set (match_operand:VEC_I 0 "vint_operand")
>> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
>> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
> 
> Having any rotatert in a define_expand or define_insn will regress.
> 
> So, nope, sorry.
> 

Thanks for clarifying!  Since regression testing passed on powerpc64le,
I'd like to double-check the meaning of "regress": does it mean it's
a regression from a design point of view?  Is it specific to rotatert and
related ones like vrotr?

If yes, it sounds we can't go with vrotr way. :(


Thanks,
Kewen

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-17 10:39       ` [RFC] Consider lrotate const rotation in vectorizer Richard Biener
  2019-07-17 11:19         ` Jakub Jelinek
@ 2019-07-18  6:28         ` Kewen.Lin
  1 sibling, 0 replies; 42+ messages in thread
From: Kewen.Lin @ 2019-07-18  6:28 UTC (permalink / raw)
  To: Richard Biener
  Cc: GCC Patches, Jakub Jelinek, Richard Sandiford,
	Segher Boessenkool, Bill Schmidt

on 2019/7/17 下午6:37, Richard Biener wrote:
> On Tue, Jul 16, 2019 at 10:45 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi all,
>>
>> Based on the previous comments (thank you!), I tried to update the
>> handling in expander and vectorizer.  Middle-end optimizes lrotate
>> with const rotation count to rrotate all the time, it makes vectorizer
>> fail to vectorize if rrotate isn't supported on the target.  We can at
>> least teach it on const rotation count, the cost should be the same?
>> At the same time, the expander already tries to use the opposite
>> rotation optable for scalar, we can teach it to deal with vector as well.
>>
>> Is it on the right track and reasonable?
> 
> So you're basically fixing this up in the expander.  I think on
> the GIMPLE level you then miss to update tree-vect-generic.c?
> 

Thanks, I will update it.  Another question on variable rotation
counts: where is the best place to add the additional cost in the
vectorizer (for the negate + possible mask-gen/and)?  Or, to avoid this,
should we transform the stmt into several stmts with the opposite
direction before the vectorizer?

> I'm not sure if it makes sense to have both LROTATE_EXPR and
> RROTATE_EXPR on the GIMPLE level then (that CPUs only
> support one direction is natural though).  So maybe simply get
> rid of one?  

One maybe impractical idea: have a single ROTATE_EXPR to unify both,
and use a positive or negative amount for the direction?

> Its semantics are also nowhere documented
> (do we allow negative rotation amounts?  how are
> non-mode-precision entities rotated? etc.).
> 

I think a negative rotation amount is OK; I'm not sure about
non-mode-precision entities.  It's a good point that we should guard
against that when we would like to use the opposite direction.


Thanks,
Kewen

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-17 17:51           ` Segher Boessenkool
@ 2019-07-18  7:03             ` Jakub Jelinek
  2019-07-18 19:45               ` Segher Boessenkool
  2019-07-18 15:17             ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 42+ messages in thread
From: Jakub Jelinek @ 2019-07-18  7:03 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Biener, Kewen.Lin, GCC Patches, Richard Sandiford, Bill Schmidt

On Wed, Jul 17, 2019 at 12:00:32PM -0500, Segher Boessenkool wrote:
> I think we can say that *all* targets behave like SHIFT_COUNT_TRUNCATED
> for rotates?  Not all immediates are valid of course, but that is a
> separate issue.

Well, we'd need to double check all the hw rotate instructions on all the
targets to be sure.
As for the current GCC code, SHIFT_COUNT_TRUNCATED value is used even for
rotates at least in combine.c, expmed.c and simplify-rtx.c and in
optabs.c through targetm.shift_truncation_mask, but e.g. in cse.c is used
only for shifts and not rotates.

And speaking of optabs.c, it already has code to emit the other rotate
if one rotate isn't supported, the:
  /* If we were trying to rotate, and that didn't work, try rotating
     the other direction before falling back to shifts and bitwise-or.  */
  if (((binoptab == rotl_optab
        && (icode = optab_handler (rotr_optab, mode)) != CODE_FOR_nothing)
       || (binoptab == rotr_optab
           && (icode = optab_handler (rotl_optab, mode)) != CODE_FOR_nothing))
      && is_int_mode (mode, &int_mode))
    {
      optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab);
hunk in there, it is just limited to scalar rotates ATM rather than vector
ones through is_int_mode.  So I bet the problem with the vector shifts is
just that the tree-vect-generic.c support isn't there.  And the
above-mentioned code actually emits the
        newop1 = expand_binop (GET_MODE (op1), sub_optab,
                               gen_int_mode (bits, GET_MODE (op1)), op1,
                               NULL_RTX, unsignedp, OPTAB_DIRECT);
as the fallback, rather than masking of negation with some mask, so if there
is some target that doesn't truncate the rotate count, it will be
problematic with variable rotate by 0.  And note that the other rotate
code explicitly uses targetm.shift_truncation_mask.
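
To make the difference between the two fallbacks concrete, here is a
hedged C sketch (32-bit elements and the helper names are just for
illustration, none of this is GCC code):

  #include <stdint.h>

  /* Right rotate; the count is masked here, modelling a machine whose
     rotate instruction truncates its count.  */
  static uint32_t
  rotr32 (uint32_t x, unsigned n)
  {
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
  }

  /* Roughly what the fallback quoted above computes: rotate-left by c
     becomes rotate-right by bits - c.  For c == 0 this asks for a
     rotate by 32, which is only safe if the machine truncates counts.  */
  static uint32_t
  rotl_via_sub (uint32_t x, unsigned c)
  {
    return rotr32 (x, 32 - c);
  }

  /* The negate-and-mask form: the count always ends up in [0, 31], so
     it does not rely on the machine truncating anything.  */
  static uint32_t
  rotl_via_negmask (uint32_t x, unsigned c)
  {
    return rotr32 (x, (-c) & 31);
  }

Both give the same result here only because rotr32 masks its count; the
point is that the second form keeps the count in range by itself.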

	Jakub

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-17 17:51           ` Segher Boessenkool
  2019-07-18  7:03             ` Jakub Jelinek
@ 2019-07-18 15:17             ` Richard Earnshaw (lists)
  2019-07-18 15:26               ` Jakub Jelinek
  2019-07-18 18:04               ` Segher Boessenkool
  1 sibling, 2 replies; 42+ messages in thread
From: Richard Earnshaw (lists) @ 2019-07-18 15:17 UTC (permalink / raw)
  To: Segher Boessenkool, Jakub Jelinek
  Cc: Richard Biener, Kewen.Lin, GCC Patches, Richard Sandiford, Bill Schmidt



On 17/07/2019 18:00, Segher Boessenkool wrote:
> On Wed, Jul 17, 2019 at 12:54:32PM +0200, Jakub Jelinek wrote:
>> On Wed, Jul 17, 2019 at 12:37:59PM +0200, Richard Biener wrote:
>>> I'm not sure if it makes sense to have both LROTATE_EXPR and
>>> RROTATE_EXPR on the GIMPLE level then (that CPUs only
>>> support one direction is natural though).  So maybe simply get
>>> rid of one?  Its semantics are also nowhere documented
>>
>> A lot of targets support both,
> 
> Of all the linux targets, we have:
> 
> No rotate:
>    alpha microblaze riscv sparc
> 
> Both directions:
>    aarch64 c6x ia64 m68k nios2 parisc sh x86 xtensa

AArch64 is Right only.

R.

> 
> Left only:
>    csky h8300 powerpc s390
> 
> Right only:
>    arc arm mips nds32 openrisc
> 
>> Then there are some targets that only support left rotates and not right
>> rotates (rs6000, s390, tilegx, ...), and other targets that only support
>> right rotates (mips, iq2000, ...).
>> So only having one GIMPLE code doesn't seem to be good enough.
>>
>> I think handling it during expansion in generic code is fine, especially
>> when we clearly have several targets that do support only one of the
>> rotates.  As you wrote, it needs corresponding code in tree-vect-generic.c,
>> and shouldn't hardcode the rs6000 direction of mapping rotr to rotl, but
>> support also the other direction - rotl to rotr.  For the sake of
>> !SHIFT_COUNT_TRUNCATED targets for constant shift counts it needs to do
>> negation + masking and for variable shift counts probably punt and let the
>> backend code handle it if it can do the truncation in there?
> 
> I think we can say that *all* targets behave like SHIFT_COUNT_TRUNCATED
> for rotates?  Not all immediates are valid of course, but that is a
> separate issue.
> 
> 
> Segher
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-18 15:17             ` Richard Earnshaw (lists)
@ 2019-07-18 15:26               ` Jakub Jelinek
  2019-07-18 15:31                 ` Richard Earnshaw (lists)
  2019-07-18 18:04               ` Segher Boessenkool
  1 sibling, 1 reply; 42+ messages in thread
From: Jakub Jelinek @ 2019-07-18 15:26 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Segher Boessenkool, Richard Biener, Kewen.Lin, GCC Patches,
	Richard Sandiford, Bill Schmidt

On Thu, Jul 18, 2019 at 04:12:48PM +0100, Richard Earnshaw (lists) wrote:
> > Both directions:
> >    aarch64 c6x ia64 m68k nios2 parisc sh x86 xtensa
> 
> AArch64 is Right only.

Maybe hw-wise, but it has both rotr<mode>3 and rotl<mode>3 expanders.
At least for GPRs.

	Jakub

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-18 15:26               ` Jakub Jelinek
@ 2019-07-18 15:31                 ` Richard Earnshaw (lists)
  2019-07-18 15:35                   ` Jakub Jelinek
  0 siblings, 1 reply; 42+ messages in thread
From: Richard Earnshaw (lists) @ 2019-07-18 15:31 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Segher Boessenkool, Richard Biener, Kewen.Lin, GCC Patches,
	Richard Sandiford, Bill Schmidt



On 18/07/2019 16:17, Jakub Jelinek wrote:
> On Thu, Jul 18, 2019 at 04:12:48PM +0100, Richard Earnshaw (lists) wrote:
>>> Both directions:
>>>     aarch64 c6x ia64 m68k nios2 parisc sh x86 xtensa
>>
>> AArch64 is Right only.
> 
> Maybe hw-wise, but it has both rotr<mode>3 and rotl<mode>3 expanders.
> At least for GPRs.
> 
> 	Jakub
> 

Only for immediates.  And the patterns that support that just write out 
assembly as "ror (<size> - n)".

R.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-18 15:31                 ` Richard Earnshaw (lists)
@ 2019-07-18 15:35                   ` Jakub Jelinek
  2019-07-18 15:44                     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 42+ messages in thread
From: Jakub Jelinek @ 2019-07-18 15:35 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Segher Boessenkool, Richard Biener, Kewen.Lin, GCC Patches,
	Richard Sandiford, Bill Schmidt

On Thu, Jul 18, 2019 at 04:26:26PM +0100, Richard Earnshaw (lists) wrote:
> 
> 
> On 18/07/2019 16:17, Jakub Jelinek wrote:
> > On Thu, Jul 18, 2019 at 04:12:48PM +0100, Richard Earnshaw (lists) wrote:
> > > > Both directions:
> > > >     aarch64 c6x ia64 m68k nios2 parisc sh x86 xtensa
> > > 
> > > AArch64 is Right only.
> > 
> > Maybe hw-wise, but it has both rotr<mode>3 and rotl<mode>3 expanders.
> > At least for GPRs.
> > 
> > 	Jakub
> > 
> 
> Only for immediates.  And the patterns that support that just write out
> assembly as "ror (<size> - n)".

For registers too (for those it negates and uses rotr).
Note, the middle-end ought to be able to do the same thing already,
except if not SHIFT_COUNT_TRUNCATED it will use bits - count instead of
-count.
(define_expand "rotl<mode>3"
  [(set (match_operand:GPI 0 "register_operand")
        (rotatert:GPI (match_operand:GPI 1 "register_operand")
                      (match_operand:QI 2 "aarch64_reg_or_imm")))]
  ""
  {
    /* (SZ - cnt) % SZ == -cnt % SZ */
    if (CONST_INT_P (operands[2]))
      {
        operands[2] = GEN_INT ((-INTVAL (operands[2]))
                               & (GET_MODE_BITSIZE (<MODE>mode) - 1));
        if (operands[2] == const0_rtx)
          {
            emit_insn (gen_mov<mode> (operands[0], operands[1]));
            DONE;
          }
      }
    else
      operands[2] = expand_simple_unop (QImode, NEG, operands[2],
                                        NULL_RTX, 1);
  }
)

	Jakub

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-18 15:35                   ` Jakub Jelinek
@ 2019-07-18 15:44                     ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 42+ messages in thread
From: Richard Earnshaw (lists) @ 2019-07-18 15:44 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Segher Boessenkool, Richard Biener, Kewen.Lin, GCC Patches,
	Richard Sandiford, Bill Schmidt



On 18/07/2019 16:30, Jakub Jelinek wrote:
> On Thu, Jul 18, 2019 at 04:26:26PM +0100, Richard Earnshaw (lists) wrote:
>>
>>
>> On 18/07/2019 16:17, Jakub Jelinek wrote:
>>> On Thu, Jul 18, 2019 at 04:12:48PM +0100, Richard Earnshaw (lists) wrote:
>>>>> Both directions:
>>>>>      aarch64 c6x ia64 m68k nios2 parisc sh x86 xtensa
>>>>
>>>> AArch64 is Right only.
>>>
>>> Maybe hw-wise, but it has both rotr<mode>3 and rotl<mode>3 expanders.
>>> At least for GPRs.
>>>
>>> 	Jakub
>>>
>>
>> Only for immediates.  And the patterns that support that just write out
>> assembly as "ror (<size> - n)".
> 
> For registers too (for those it negates and uses rotr).
> Note, the middle-end ought to be able to do the same thing already,
> except if not SHIFT_COUNT_TRUNCATED it will use bits - count instead of
> -count.
> (define_expand "rotl<mode>3"
>    [(set (match_operand:GPI 0 "register_operand")
>          (rotatert:GPI (match_operand:GPI 1 "register_operand")
>                        (match_operand:QI 2 "aarch64_reg_or_imm")))]
>    ""
>    {
>      /* (SZ - cnt) % SZ == -cnt % SZ */
>      if (CONST_INT_P (operands[2]))
>        {
>          operands[2] = GEN_INT ((-INTVAL (operands[2]))
>                                 & (GET_MODE_BITSIZE (<MODE>mode) - 1));
>          if (operands[2] == const0_rtx)
>            {
>              emit_insn (gen_mov<mode> (operands[0], operands[1]));
>              DONE;
>            }
>        }
>      else
>        operands[2] = expand_simple_unop (QImode, NEG, operands[2],
>                                          NULL_RTX, 1);
>    }
> )
> 
> 	Jakub
> 

Well, Arm has that sort of expansion as well, but it was listed as 'right
only'.

R.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-18 15:17             ` Richard Earnshaw (lists)
  2019-07-18 15:26               ` Jakub Jelinek
@ 2019-07-18 18:04               ` Segher Boessenkool
  1 sibling, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-07-18 18:04 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Jakub Jelinek, Richard Biener, Kewen.Lin, GCC Patches,
	Richard Sandiford, Bill Schmidt

On Thu, Jul 18, 2019 at 04:12:48PM +0100, Richard Earnshaw (lists) wrote:
> 
> 
> On 17/07/2019 18:00, Segher Boessenkool wrote:
> >On Wed, Jul 17, 2019 at 12:54:32PM +0200, Jakub Jelinek wrote:
> >>On Wed, Jul 17, 2019 at 12:37:59PM +0200, Richard Biener wrote:
> >>>I'm not sure if it makes sense to have both LROTATE_EXPR and
> >>>RROTATE_EXPR on the GIMPLE level then (that CPUs only
> >>>support one direction is natural though).  So maybe simply get
> >>>rid of one?  Its semantics are also nowhere documented
> >>
> >>A lot of targets support both,
> >
> >Of all the linux targets, we have:
> >
> >No rotate:
> >   alpha microblaze riscv sparc
> >
> >Both directions:
> >   aarch64 c6x ia64 m68k nios2 parisc sh x86 xtensa
> 
> AArch64 is Right only.

This is whether a port has any rotate resp. rotatert in recog, which
generates HAVE_rotate and HAVE_rotatert in insn-config.h .

If the hardware can only do right rotates you should either handle
immediate left rotates everywhere as well, or avoid using left rotates
in define_expand and define_insn.


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Consider lrotate const rotation in vectorizer
  2019-07-18  7:03             ` Jakub Jelinek
@ 2019-07-18 19:45               ` Segher Boessenkool
  0 siblings, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-07-18 19:45 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Kewen.Lin, GCC Patches, Richard Sandiford, Bill Schmidt

On Thu, Jul 18, 2019 at 09:01:13AM +0200, Jakub Jelinek wrote:
> On Wed, Jul 17, 2019 at 12:00:32PM -0500, Segher Boessenkool wrote:
> > I think we can say that *all* targets behave like SHIFT_COUNT_TRUNCATED
> > for rotates?  Not all immediates are valid of course, but that is a
> > separate issue.
> 
> Well, we'd need to double check all the hw rotate instructions on all the
> targets to be sure.

Yes :-(

> As for the current GCC code, SHIFT_COUNT_TRUNCATED value is used even for
> rotates at least in combine.c, expmed.c and simplify-rtx.c and in
> optabs.c through targetm.shift_truncation_mask, but e.g. in cse.c is used
> only for shifts and not rotates.

Many targets cannot use SHIFT_COUNT_TRUNCATED, and not even
targetm.shift_truncation_mask, because the actual mask depends on the
insn used, or rtl code used at least, not just the mode.

[snip]
> hunk in there, just it is limited to scalar rotates ATM rather than vector
> ones through is_int_mode.  So I bet the problem with the vector shifts is just that
> tree-vect-generic.c support isn't there.

:-)

Should we always allow both directions in gimple, and pretend both are
cheap?  Should we allow only one direction, and let the target select
which, or both?


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-18  6:06           ` Kewen.Lin
@ 2019-07-18 20:06             ` Segher Boessenkool
  2019-07-19  6:51               ` Kewen.Lin
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-07-18 20:06 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

On Thu, Jul 18, 2019 at 01:44:36PM +0800, Kewen.Lin wrote:
> Hi Segher,
> 
> on 2019/7/17 下午9:40, Segher Boessenkool wrote:
> > Hi Kewen,
> > 
> > On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:
> >> Regression testing just launched, is it OK for trunk if it's bootstrapped
> >> and regresstested on powerpc64le-unknown-linux-gnu?
> > 
> >> +;; Expanders for rotatert to make use of vrotl
> >> +(define_expand "vrotr<mode>3"
> >> +  [(set (match_operand:VEC_I 0 "vint_operand")
> >> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
> >> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
> > 
> > Having any rotatert in a define_expand or define_insn will regress.
> > 
> > So, nope, sorry.
> 
> Thanks for clarifying!  Since regression testing passed on powerpc64le,
> I'd like to double-check the meaning of "regress": does it mean it's
> a regression from a design point of view?  Is it specific to rotatert
> and related ones like vrotr?

You will get HAVE_rotatert defined in insn-config.h if you do this patch,
and then simplify-rtx.c will not work correctly, generating rotatert by
an immediate, which we have no instructions for.

This might be masked because many of our rl*.c tests already fail because
of other changes, I should fix that :-/


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-18 20:06             ` Segher Boessenkool
@ 2019-07-19  6:51               ` Kewen.Lin
  2019-07-19 15:49                 ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Kewen.Lin @ 2019-07-19  6:51 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

on 2019/7/19 上午3:48, Segher Boessenkool wrote:
> On Thu, Jul 18, 2019 at 01:44:36PM +0800, Kewen.Lin wrote:
>> Hi Segher,
>>
>> on 2019/7/17 下午9:40, Segher Boessenkool wrote:
>>> Hi Kewen,
>>>
>>> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:
>>>> Regression testing just launched, is it OK for trunk if it's bootstrapped
>>>> and regresstested on powerpc64le-unknown-linux-gnu?
>>>
>>>> +;; Expanders for rotatert to make use of vrotl
>>>> +(define_expand "vrotr<mode>3"
>>>> +  [(set (match_operand:VEC_I 0 "vint_operand")
>>>> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
>>>> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
>>>
>>> Having any rotatert in a define_expand or define_insn will regress.
>>>
>>> So, nope, sorry.
>>
>> Thanks for clarifying!  Since regression testing passed on powerpc64le,
>> I'd like to double-check the meaning of "regress": does it mean it's
>> a regression from a design point of view?  Is it specific to rotatert
>> and related ones like vrotr?
> 
> You will get HAVE_rotatert defined in insn-config.h if you do this patch,
> and then simplify-rtx.c will not work correctly, generating rotatert by
> an immediate, which we have no instructions for.
> 
> This might be masked because many of our rl*.c tests already fail because
> of other changes, I should fix that :-/
> 

Hi Segher,

Thanks for the further explanation!  Sorry, but I didn't find this
HAVE_rotatert definition.  I guess it's because the preparation always
ends with "DONE", so it doesn't really generate rotatert.
Although I can see rotatert in an insn like the one below, it seems fine
in a note?

(insn 10 9 11 4 (set (reg:V4SI 122 [ vect__2.7 ])
        (rotate:V4SI (reg:V4SI 121 [ vect__1.6 ])
            (reg:V4SI 124))) "t.c":17:28 1596 {*altivec_vrlw}
     (expr_list:REG_EQUAL (rotatert:V4SI (reg:V4SI 121 [ vect__1.6 ])
            (const_vector:V4SI [
                    (const_int 8 [0x8]) repeated x4
                ]))
        (nil)))


Thanks,
Kewen

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-19  6:51               ` Kewen.Lin
@ 2019-07-19 15:49                 ` Segher Boessenkool
  2019-07-23  7:32                   ` [PATCH V2, " Kewen.Lin
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-07-19 15:49 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

Hi!

On Fri, Jul 19, 2019 at 10:21:06AM +0800, Kewen.Lin wrote:
> on 2019/7/19 上午3:48, Segher Boessenkool wrote:
> > On Thu, Jul 18, 2019 at 01:44:36PM +0800, Kewen.Lin wrote:
> >> on 2019/7/17 下午9:40, Segher Boessenkool wrote:
> >>> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:
> >>>> Regression testing just launched, is it OK for trunk if it's bootstrapped
> >>>> and regresstested on powerpc64le-unknown-linux-gnu?
> >>>
> >>>> +;; Expanders for rotatert to make use of vrotl
> >>>> +(define_expand "vrotr<mode>3"
> >>>> +  [(set (match_operand:VEC_I 0 "vint_operand")
> >>>> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
> >>>> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
> >>>
> >>> Having any rotatert in a define_expand or define_insn will regress.

This is wrong.  I don't know why I thought this for a while.

There shouldn't be any rotatert in anything that goes through recog, but
that is everything *except* define_expand.  So define_insn, define_split,
define_peephole, define_peephole2 (and define_insn_and_split, which is
just syntactic sugar).

> Thanks for the further explanation!  Sorry, but I didn't find this
> HAVE_rotatert definition.  I guess it's because the preparation always
> ends with "DONE", so it doesn't really generate rotatert.

You only had one in a define_expand.  That is fine, that pattern is never
recognised against.  HAVE_rotatert means that something somewhere will
recognise rotatert RTL insns; if it isn't set, it doesn't make sense to
ever create them, because they will never match.

> Although I can see rotatert in an insn like the one below, it seems fine in a note?

Sure, many things are allowed in notes that can never show up in RTL
proper.

So, this approach will work fine, and not be too bad.  Could you do a
new patch with it?  It's simple to do, and even if the generic thing
will happen eventually, this is a nice stepping stone for that.

Thanks, and sorry for the confusion,


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH V2, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-19 15:49                 ` Segher Boessenkool
@ 2019-07-23  7:32                   ` Kewen.Lin
  2019-07-25 14:24                     ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Kewen.Lin @ 2019-07-23  7:32 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 2828 bytes --]

Hi Segher,

Thanks for the clarification! 

Compared with the previous one, this adds vrl<mode>_and define_insns
for the explicit AND (truncation), as Jakub suggested.

Bootstrapped and regtested on powerpc64le-unknown-linux-gnu, is it ok
for trunk?

Thanks,
Kewen

----------------
gcc/ChangeLog

2019-07-23  Kewen Lin  <linkw@gcc.gnu.org>

	* config/rs6000/predicates.md (vint_reg_or_const_vector): New predicate.
	* config/rs6000/vector.md (vrotr<mode>3): New define_expand.
	* config/rs6000/altivec.md (vrl<mode>_and): New define_insns.

gcc/testsuite/ChangeLog

2019-07-23  Kewen Lin  <linkw@gcc.gnu.org>

	* gcc.target/powerpc/vec_rotate-1.c: New test.
	* gcc.target/powerpc/vec_rotate-2.c: New test.

on 2019/7/19 下午11:06, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Jul 19, 2019 at 10:21:06AM +0800, Kewen.Lin wrote:
>> on 2019/7/19 上午3:48, Segher Boessenkool wrote:
>>> On Thu, Jul 18, 2019 at 01:44:36PM +0800, Kewen.Lin wrote:
>>>> on 2019/7/17 下午9:40, Segher Boessenkool wrote:
>>>>> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:
>>>>>> Regression testing just launched, is it OK for trunk if it's bootstrapped
>>>>>> and regresstested on powerpc64le-unknown-linux-gnu?
>>>>>
>>>>>> +;; Expanders for rotatert to make use of vrotl
>>>>>> +(define_expand "vrotr<mode>3"
>>>>>> +  [(set (match_operand:VEC_I 0 "vint_operand")
>>>>>> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
>>>>>> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
>>>>>
>>>>> Having any rotatert in a define_expand or define_insn will regress.
> 
> This is wrong.  I don't know why I thought this for a while.
> 
> There shouldn't be any rotatert in anything that goes through recog, but
> that is everything *except* define_expand.  So define_insn, define_split,
> define_peephole, define_peephole2 (and define_insn_and_split, which is
> just syntactic sugar).
> 
>> Thanks for the further explanation!  Sorry, but I didn't find this
>> HAVE_rotatert definition.  I guess it's because the preparation always
>> ends with "DONE", so it doesn't really generate rotatert.
> 
> You only had one in a define_expand.  That is fine, that pattern is never
> recognised against.  HAVE_rotatert means that something somewhere will
> recognise rotatert RTL insns; if it isn't set, it doesn't make sense to
> ever create them, because they will never match.
> 
>> Although I can see rotatert in an insn like the one below, it seems fine in a note?
> 
> Sure, many things are allowed in notes that can never show up in RTL
> proper.
> 
> So, this approach will work fine, and not be too bad.  Could you do a
> new patch with it?  It's simple to do, and even if the generic thing
> will happen eventually, this is a nice stepping stone for that.
> 
> Thanks, and sorry for the confusion,
> 
> 
> Segher
> 

[-- Attachment #2: expand.diff --]
[-- Type: text/plain, Size: 7649 bytes --]

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index b6a22d9010c..2b6a957d4a6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1666,6 +1666,60 @@
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Here these vrl<VI2>_and are for vrotr<mode>3 expansion.
+;; since SHIFT_COUNT_TRUNCATED is set as zero, to append one explicit
+;; AND to indicate truncation but emit vrl<VI_char> insn.
+(define_insn "vrlv2di_and"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+	(and:V2DI
+	  (rotate:V2DI (match_operand:V2DI 1 "register_operand" "v")
+	     (match_operand:V2DI 2 "register_operand" "v"))
+	  (const_vector:V2DI [(const_int 63) (const_int 63)])))]
+  "VECTOR_UNIT_P8_VECTOR_P (V2DImode)"
+  "vrld %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vrlv4si_and"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+	(and:V4SI
+	  (rotate:V4SI (match_operand:V4SI 1 "register_operand" "v")
+	     (match_operand:V4SI 2 "register_operand" "v"))
+	  (const_vector:V4SI [(const_int 31) (const_int 31)
+			      (const_int 31) (const_int 31)])))]
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
+  "vrlw %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vrlv8hi_and"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+	(and:V8HI
+	  (rotate:V8HI (match_operand:V8HI 1 "register_operand" "v")
+	     (match_operand:V8HI 2 "register_operand" "v"))
+	  (const_vector:V8HI [(const_int 15) (const_int 15)
+			      (const_int 15) (const_int 15)
+			      (const_int 15) (const_int 15)
+			      (const_int 15) (const_int 15)])))]
+  "VECTOR_UNIT_ALTIVEC_P (V8HImode)"
+  "vrlh %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vrlv16qi_and"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(and:V16QI
+	(rotate:V16QI (match_operand:V16QI 1 "register_operand" "v")
+	   (match_operand:V16QI 2 "register_operand" "v"))
+	(const_vector:V16QI [(const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)])))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
+  "vrlb %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8ca98299950..c4c74630d26 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -163,6 +163,17 @@
   return VINT_REGNO_P (REGNO (op));
 })
 
+;; Return 1 if op is a vector register that operates on integer vectors
+;; or if op is a const vector with integer vector modes.
+(define_predicate "vint_reg_or_const_vector"
+  (match_code "reg,subreg,const_vector")
+{
+  if (GET_CODE (op) == CONST_VECTOR && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    return 1;
+
+  return vint_operand (op, mode);
+})
+
 ;; Return 1 if op is a vector register to do logical operations on (and, or,
 ;; xor, etc.)
 (define_predicate "vlogical_operand"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 70bcfe02e22..b53f9717eb2 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1260,6 +1260,35 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; Expanders for rotatert to make use of vrotl
+(define_expand "vrotr<mode>3"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
+		(match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
+{
+  rtx rot_count = gen_reg_rtx (<MODE>mode);
+  if (GET_CODE (operands[2]) == CONST_VECTOR)
+    {
+      machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);
+      unsigned int bits = GET_MODE_PRECISION (inner_mode);
+      rtx mask_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits - 1));
+      rtx imm_vec
+	= simplify_const_unary_operation (NEG, <MODE>mode, operands[2],
+					  GET_MODE (operands[2]));
+      imm_vec
+	= simplify_const_binary_operation (AND, <MODE>mode, imm_vec, mask_vec);
+      rot_count = force_reg (<MODE>mode, imm_vec);
+      emit_insn (gen_vrotl<mode>3 (operands[0], operands[1], rot_count));
+    }
+  else
+    {
+      emit_insn (gen_neg<mode>2 (rot_count, operands[2]));
+      emit_insn (gen_vrl<mode>_and (operands[0], operands[1], rot_count));
+    }
+  DONE;
+})
+
 ;; Expanders for arithmetic shift left on each vector element
 (define_expand "vashl<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
new file mode 100644
index 00000000000..80aca1a94a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
@@ -0,0 +1,46 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count is const number.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> 8) | (sud[i] << (sizeof (sud[0]) * 8 - 8));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> 8) | (suw[i] << (sizeof (suw[0]) * 8 - 8));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> 9)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - 9));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> 5)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - 5));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
new file mode 100644
index 00000000000..affda6c023b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
@@ -0,0 +1,47 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count isn't const number.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+extern unsigned char rot_cnt;
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> rot_cnt) | (sud[i] << (sizeof (sud[0]) * 8 - rot_cnt));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> rot_cnt) | (suw[i] << (sizeof (suw[0]) * 8 - rot_cnt));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> rot_cnt)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - rot_cnt));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> rot_cnt)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - rot_cnt));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V2, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-23  7:32                   ` [PATCH V2, " Kewen.Lin
@ 2019-07-25 14:24                     ` Segher Boessenkool
  2019-07-26  3:33                       ` [PATCH V3, " Kewen.Lin
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-07-25 14:24 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

Hi Kewen,

On Tue, Jul 23, 2019 at 02:28:28PM +0800, Kewen.Lin wrote:
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1666,6 +1666,60 @@
>    "vrl<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
>  
> +;; Here these vrl<VI2>_and are for vrotr<mode>3 expansion.
> +;; since SHIFT_COUNT_TRUNCATED is set as zero, to append one explicit
> +;; AND to indicate truncation but emit vrl<VI_char> insn.
> +(define_insn "vrlv2di_and"
> +  [(set (match_operand:V2DI 0 "register_operand" "=v")
> +	(and:V2DI
> +	  (rotate:V2DI (match_operand:V2DI 1 "register_operand" "v")
> +	     (match_operand:V2DI 2 "register_operand" "v"))
> +	  (const_vector:V2DI [(const_int 63) (const_int 63)])))]
> +  "VECTOR_UNIT_P8_VECTOR_P (V2DImode)"
> +  "vrld %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

"vrlv2di_and" is an a bit unhappy name, we have a "vrlv" intruction.
Just something like "rotatev2di_something", maybe?

Do we have something similar for non-rotate vector shifts, already?  We
probably should, so please keep that in mind for naming things.

"vrlv2di_and" sounds like you first do the rotate, and then on what
that results in you do the and.  And that is what the pattern does,
too.  But this is wrong: it should mask off all but the lower bits
of operand 2, instead.
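
A tiny C illustration of that point (rotl64 is just an example helper,
not part of the patch; 63 is the V2DI element mask):

  #include <stdint.h>

  static uint64_t
  rotl64 (uint64_t x, unsigned n)
  {
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
  }

  /* What the pattern as written describes: rotate first, then mask the
     *result* down to its low six bits.  */
  static uint64_t
  and_after_rotate (uint64_t x, unsigned n)
  {
    return rotl64 (x, n) & 63;
  }

  /* What vrld actually does: mask the *count*, then rotate.  */
  static uint64_t
  and_on_count (uint64_t x, unsigned n)
  {
    return rotl64 (x, n & 63);
  }

These are clearly different operations, which is why the AND belongs
around operand 2.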

> +(define_insn "vrlv4si_and"
> +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> +	(and:V4SI
> +	  (rotate:V4SI (match_operand:V4SI 1 "register_operand" "v")
> +	     (match_operand:V4SI 2 "register_operand" "v"))
> +	  (const_vector:V4SI [(const_int 31) (const_int 31)
> +			      (const_int 31) (const_int 31)])))]
> +  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
> +  "vrlw %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vrlv8hi_and"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> +	(and:V8HI
> +	  (rotate:V8HI (match_operand:V8HI 1 "register_operand" "v")
> +	     (match_operand:V8HI 2 "register_operand" "v"))
> +	  (const_vector:V8HI [(const_int 15) (const_int 15)
> +			      (const_int 15) (const_int 15)
> +			      (const_int 15) (const_int 15)
> +			      (const_int 15) (const_int 15)])))]
> +  "VECTOR_UNIT_ALTIVEC_P (V8HImode)"
> +  "vrlh %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vrlv16qi_and"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> +	(and:V16QI
> +	(rotate:V16QI (match_operand:V16QI 1 "register_operand" "v")
> +	   (match_operand:V16QI 2 "register_operand" "v"))
> +	(const_vector:V16QI [(const_int 7) (const_int 7)
> +			    (const_int 7) (const_int 7)
> +			    (const_int 7) (const_int 7)
> +			    (const_int 7) (const_int 7)
> +			    (const_int 7) (const_int 7)
> +			    (const_int 7) (const_int 7)
> +			    (const_int 7) (const_int 7)
> +			    (const_int 7) (const_int 7)])))]
> +  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
> +  "vrlb %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

All the patterns can be merged into one (using some code_iterator).  That
can be a later improvement.

> +;; Return 1 if op is a vector register that operates on integer vectors
> +;; or if op is a const vector with integer vector modes.
> +(define_predicate "vint_reg_or_const_vector"
> +  (match_code "reg,subreg,const_vector")
> +{
> +  if (GET_CODE (op) == CONST_VECTOR && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> +    return 1;
> +
> +  return vint_operand (op, mode);
> +})

Hrm, I don't like this name very much.  Why is just vint_operand not
enough for what you use this for?

> +;; Expanders for rotatert to make use of vrotl
> +(define_expand "vrotr<mode>3"
> +  [(set (match_operand:VEC_I 0 "vint_operand")
> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
> +		(match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
> +{
> +  rtx rot_count = gen_reg_rtx (<MODE>mode);
> +  if (GET_CODE (operands[2]) == CONST_VECTOR)
> +    {
> +      machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);
> +      unsigned int bits = GET_MODE_PRECISION (inner_mode);
> +      rtx mask_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits - 1));
> +      rtx imm_vec
> +	= simplify_const_unary_operation (NEG, <MODE>mode, operands[2],

(The "=" goes on the previous line).

> +					  GET_MODE (operands[2]));
> +      imm_vec
> +	= simplify_const_binary_operation (AND, <MODE>mode, imm_vec, mask_vec);
> +      rot_count = force_reg (<MODE>mode, imm_vec);
> +      emit_insn (gen_vrotl<mode>3 (operands[0], operands[1], rot_count));
> +    }
> +  else
> +    {
> +      emit_insn (gen_neg<mode>2 (rot_count, operands[2]));
> +      emit_insn (gen_vrl<mode>_and (operands[0], operands[1], rot_count));
> +    }
> +  DONE;
> +})

Why do you have to emit it as the "and" form here?  Emitting the "bare"
rotate should work just as well here?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
> @@ -0,0 +1,46 @@
> +/* { dg-options "-O3" } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */

> +/* { dg-final { scan-assembler {\mvrld\M} } } */
> +/* { dg-final { scan-assembler {\mvrlw\M} } } */
> +/* { dg-final { scan-assembler {\mvrlh\M} } } */
> +/* { dg-final { scan-assembler {\mvrlb\M} } } */

You need to generate code for whatever cpu introduced those insns,
if you expect those to be generated ;-)

vsx_ok isn't needed.
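
As a sketch, something along these lines in the test header would do
(the exact cpu option and effective-target pairing is of course yours
to pick):

  /* { dg-options "-mcpu=power8 -O3" } */
  /* { dg-require-effective-target powerpc_p8vector_ok } */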


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH V3, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-25 14:24                     ` Segher Boessenkool
@ 2019-07-26  3:33                       ` Kewen.Lin
  2019-07-26  3:37                         ` [PATCH V4, " Kewen.Lin
  2019-07-26 14:28                         ` [PATCH V3, " Segher Boessenkool
  0 siblings, 2 replies; 42+ messages in thread
From: Kewen.Lin @ 2019-07-26  3:33 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 4622 bytes --]

Hi Segher,

on 2019/7/25 下午9:49, Segher Boessenkool wrote:
> Hi Kewen,
> 
> On Tue, Jul 23, 2019 at 02:28:28PM +0800, Kewen.Lin wrote:
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -1666,6 +1666,60 @@
>>    "vrl<VI_char> %0,%1,%2"
>>    [(set_attr "type" "vecsimple")])
>>  
>> +;; Here these vrl<VI2>_and are for vrotr<mode>3 expansion.
>> +;; since SHIFT_COUNT_TRUNCATED is set as zero, to append one explicit
>> +;; AND to indicate truncation but emit vrl<VI_char> insn.
>> +(define_insn "vrlv2di_and"
>> +  [(set (match_operand:V2DI 0 "register_operand" "=v")
>> +	(and:V2DI
>> +	  (rotate:V2DI (match_operand:V2DI 1 "register_operand" "v")
>> +	     (match_operand:V2DI 2 "register_operand" "v"))
>> +	  (const_vector:V2DI [(const_int 63) (const_int 63)])))]
>> +  "VECTOR_UNIT_P8_VECTOR_P (V2DImode)"
>> +  "vrld %0,%1,%2"
>> +  [(set_attr "type" "vecsimple")])
> 
> "vrlv2di_and" is an a bit unhappy name, we have a "vrlv" intruction.
> Just something like "rotatev2di_something", maybe?
> 
> Do we have something similar for non-rotate vector shifts, already?  We
> probably should, so please keep that in mind for naming things.
> 
> "vrlv2di_and" sounds like you first do the rotate, and then on what
> that results in you do the and.  And that is what the pattern does,
> too.  But this is wrong: it should mask off all but the lower bits
> of operand 2, instead.
> 

Thanks for reviewing!

You are right, the name matches the pattern but not what we want.
How about the name trunc_vrl<mode>, which first truncates operand 2 and
then does the vector rotation?  I didn't find any existing shifts with a
similar pattern.

I've updated the name and associated pattern in the new patch.

>> +(define_insn "vrlv16qi_and"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +	(and:V16QI
>> +	(rotate:V16QI (match_operand:V16QI 1 "register_operand" "v")
>> +	   (match_operand:V16QI 2 "register_operand" "v"))
>> +	(const_vector:V16QI [(const_int 7) (const_int 7)
>> +			    (const_int 7) (const_int 7)
>> +			    (const_int 7) (const_int 7)
>> +			    (const_int 7) (const_int 7)
>> +			    (const_int 7) (const_int 7)
>> +			    (const_int 7) (const_int 7)
>> +			    (const_int 7) (const_int 7)
>> +			    (const_int 7) (const_int 7)])))]
>> +  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
>> +  "vrlb %0,%1,%2"
>> +  [(set_attr "type" "vecsimple")])
> 
> All the patterns can be merged into one (using some code_iterator).  That
> can be a later improvement.
> 

I guess you mean mode_attr?
I did try to merge them since they look tedious, but a mode_attr can't
contain either "[" or "(" inside, so it seems it can't be used for the
different const vector mappings.  I'd really appreciate it if you could
show me some examples.

>> +;; Return 1 if op is a vector register that operates on integer vectors
>> +;; or if op is a const vector with integer vector modes.
>> +(define_predicate "vint_reg_or_const_vector"
>> +  (match_code "reg,subreg,const_vector")

> Hrm, I don't like this name very much.  Why is just vint_operand not
> enough for what you use this for?
> 

vint_operand isn't enough, since the expander legalizes the const vector
into a vector register, so I'm unable to get the feeder (const vector) of
the input register operand.


>> +      rtx imm_vec
>> +	= simplify_const_unary_operation (NEG, <MODE>mode, operands[2],
> 
> (The "=" goes on the previous line).

OK, thanks.

>> +      emit_insn (gen_vrl<mode>_and (operands[0], operands[1], rot_count));
>> +    }
>> +  DONE;
>> +})
> 
> Why do you have to emit it as the "and" form here?  Emitting the "bare"
> rotate should work just as well here?

Yes, the emitted insn is exactly the same.

It follows Jakub's suggestion via
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01159.html

It appends one explicit AND to indicate the truncation for the
!SHIFT_COUNT_TRUNCATED case.  (Sorry if the previous pattern was
misleading.)

> 
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
>> @@ -0,0 +1,46 @@
>> +/* { dg-options "-O3" } */
>> +/* { dg-require-effective-target powerpc_vsx_ok } */
> 
>> +/* { dg-final { scan-assembler {\mvrld\M} } } */
>> +/* { dg-final { scan-assembler {\mvrlw\M} } } */
>> +/* { dg-final { scan-assembler {\mvrlh\M} } } */
>> +/* { dg-final { scan-assembler {\mvrlb\M} } } */
> 
> You need to generate code for whatever cpu introduced those insns,
> if you expect those to be generated ;-)
> 
> vsx_ok isn't needed.
> 

Thanks for catching that; I've updated it to use altivec_ok in the new
patch.  I think we can still have this guard, since those instructions
originate from ISA 2.03.


[-- Attachment #2: expand_v3.diff --]
[-- Type: text/plain, Size: 7715 bytes --]

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index b6a22d9010c..2b0682ad2ba 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1666,6 +1666,56 @@
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Here these vrl<VI2>_and are for vrotr<mode>3 expansion.
+;; since SHIFT_COUNT_TRUNCATED is set as zero, to append one explicit
+;; AND to indicate truncation but emit vrl<VI_char> insn.
+(define_insn "trunc_vrlv2di"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+	(rotate:V2DI (match_operand:V2DI 1 "register_operand" "v")
+	  (and:V2DI (match_operand:V2DI 2 "register_operand" "v")
+	    (const_vector:V2DI [(const_int 63) (const_int 63)]))))]
+  "VECTOR_UNIT_P8_VECTOR_P (V2DImode)"
+  "vrld %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+	(rotate:V4SI (match_operand:V4SI 1 "register_operand" "v")
+	  (and:V4SI (match_operand:V4SI 2 "register_operand" "v")
+	    (const_vector:V4SI [(const_int 31) (const_int 31)
+			      (const_int 31) (const_int 31)]))))]
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
+  "vrlw %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv8hi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+	(rotate:V8HI (match_operand:V8HI 1 "register_operand" "v")
+	  (and:V8HI (match_operand:V8HI 2 "register_operand" "v")
+	    (const_vector:V8HI [(const_int 15) (const_int 15)
+			      (const_int 15) (const_int 15)
+			      (const_int 15) (const_int 15)
+			      (const_int 15) (const_int 15)]))))]
+  "VECTOR_UNIT_ALTIVEC_P (V8HImode)"
+  "vrlh %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(rotate:V16QI (match_operand:V16QI 1 "register_operand" "v")
+	  (and:V16QI (match_operand:V16QI 2 "register_operand" "v")
+	    (const_vector:V16QI [(const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)]))))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
+  "vrlb %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8ca98299950..c4c74630d26 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -163,6 +163,17 @@
   return VINT_REGNO_P (REGNO (op));
 })
 
+;; Return 1 if op is a vector register that operates on integer vectors
+;; or if op is a const vector with integer vector modes.
+(define_predicate "vint_reg_or_const_vector"
+  (match_code "reg,subreg,const_vector")
+{
+  if (GET_CODE (op) == CONST_VECTOR && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    return 1;
+
+  return vint_operand (op, mode);
+})
+
 ;; Return 1 if op is a vector register to do logical operations on (and, or,
 ;; xor, etc.)
 (define_predicate "vlogical_operand"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 70bcfe02e22..8c50d09a7bf 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1260,6 +1260,35 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; Expanders for rotatert to make use of vrotl
+(define_expand "vrotr<mode>3"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
+		(match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
+{
+  rtx rot_count = gen_reg_rtx (<MODE>mode);
+  if (GET_CODE (operands[2]) == CONST_VECTOR)
+    {
+      machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);
+      unsigned int bits = GET_MODE_PRECISION (inner_mode);
+      rtx mask_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits - 1));
+      rtx imm_vec
+	= simplify_const_unary_operation (NEG, <MODE>mode, operands[2],
+					  GET_MODE (operands[2]));
+      imm_vec
+	= simplify_const_binary_operation (AND, <MODE>mode, imm_vec, mask_vec);
+      rot_count = force_reg (<MODE>mode, imm_vec);
+      emit_insn (gen_vrotl<mode>3 (operands[0], operands[1], rot_count));
+    }
+  else
+    {
+      emit_insn (gen_neg<mode>2 (rot_count, operands[2]));
+      emit_insn (gen_trunc_vrl<mode> (operands[0], operands[1], rot_count));
+    }
+  DONE;
+})
+
 ;; Expanders for arithmetic shift left on each vector element
 (define_expand "vashl<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
new file mode 100644
index 00000000000..7461f3b6317
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
@@ -0,0 +1,46 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count is const number.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> 8) | (sud[i] << (sizeof (sud[0]) * 8 - 8));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> 8) | (suw[i] << (sizeof (suw[0]) * 8 - 8));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> 9)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - 9));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> 5)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - 5));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} { target powerpc_p8vector_ok } } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
new file mode 100644
index 00000000000..bdfa1e25d07
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
@@ -0,0 +1,47 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count isn't const number.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+extern unsigned char rot_cnt;
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> rot_cnt) | (sud[i] << (sizeof (sud[0]) * 8 - rot_cnt));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> rot_cnt) | (suw[i] << (sizeof (suw[0]) * 8 - rot_cnt));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> rot_cnt)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - rot_cnt));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> rot_cnt)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - rot_cnt));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} { target powerpc_p8vector_ok } } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH V4, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-26  3:33                       ` [PATCH V3, " Kewen.Lin
@ 2019-07-26  3:37                         ` Kewen.Lin
  2019-07-26 14:28                         ` [PATCH V3, " Segher Boessenkool
  1 sibling, 0 replies; 42+ messages in thread
From: Kewen.Lin @ 2019-07-26  3:37 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 165 bytes --]

Hi Segher,

You can just ignore the previous version and review the current one,
which only has one comment change and the "=" line moved.

Sorry for the inconvenience.


Thanks,
Kewen

[-- Attachment #2: expand_v4.diff --]
[-- Type: text/plain, Size: 7717 bytes --]

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index b6a22d9010c..217c7afdf17 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1666,6 +1666,56 @@
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Here these trunc_vrl<VI2> are for vrotr<mode>3 expansion.
+;; since SHIFT_COUNT_TRUNCATED is set as zero, to append one explicit
+;; AND to indicate truncation but emit vrl<VI_char> insn.
+(define_insn "trunc_vrlv2di"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+	(rotate:V2DI (match_operand:V2DI 1 "register_operand" "v")
+	  (and:V2DI (match_operand:V2DI 2 "register_operand" "v")
+	    (const_vector:V2DI [(const_int 63) (const_int 63)]))))]
+  "VECTOR_UNIT_P8_VECTOR_P (V2DImode)"
+  "vrld %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+	(rotate:V4SI (match_operand:V4SI 1 "register_operand" "v")
+	  (and:V4SI (match_operand:V4SI 2 "register_operand" "v")
+	    (const_vector:V4SI [(const_int 31) (const_int 31)
+			      (const_int 31) (const_int 31)]))))]
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
+  "vrlw %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv8hi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+	(rotate:V8HI (match_operand:V8HI 1 "register_operand" "v")
+	  (and:V8HI (match_operand:V8HI 2 "register_operand" "v")
+	    (const_vector:V8HI [(const_int 15) (const_int 15)
+			      (const_int 15) (const_int 15)
+			      (const_int 15) (const_int 15)
+			      (const_int 15) (const_int 15)]))))]
+  "VECTOR_UNIT_ALTIVEC_P (V8HImode)"
+  "vrlh %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(rotate:V16QI (match_operand:V16QI 1 "register_operand" "v")
+	  (and:V16QI (match_operand:V16QI 2 "register_operand" "v")
+	    (const_vector:V16QI [(const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)
+			    (const_int 7) (const_int 7)]))))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
+  "vrlb %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8ca98299950..c4c74630d26 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -163,6 +163,17 @@
   return VINT_REGNO_P (REGNO (op));
 })
 
+;; Return 1 if op is a vector register that operates on integer vectors
+;; or if op is a const vector with integer vector modes.
+(define_predicate "vint_reg_or_const_vector"
+  (match_code "reg,subreg,const_vector")
+{
+  if (GET_CODE (op) == CONST_VECTOR && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    return 1;
+
+  return vint_operand (op, mode);
+})
+
 ;; Return 1 if op is a vector register to do logical operations on (and, or,
 ;; xor, etc.)
 (define_predicate "vlogical_operand"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 70bcfe02e22..a7e6c6c256f 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1260,6 +1260,35 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; Expanders for rotatert to make use of vrotl
+(define_expand "vrotr<mode>3"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
+		(match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
+{
+  rtx rot_count = gen_reg_rtx (<MODE>mode);
+  if (GET_CODE (operands[2]) == CONST_VECTOR)
+    {
+      machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);
+      unsigned int bits = GET_MODE_PRECISION (inner_mode);
+      rtx mask_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits - 1));
+      rtx imm_vec =
+	simplify_const_unary_operation (NEG, <MODE>mode, operands[2],
+					  GET_MODE (operands[2]));
+      imm_vec =
+	simplify_const_binary_operation (AND, <MODE>mode, imm_vec, mask_vec);
+      rot_count = force_reg (<MODE>mode, imm_vec);
+      emit_insn (gen_vrotl<mode>3 (operands[0], operands[1], rot_count));
+    }
+  else
+    {
+      emit_insn (gen_neg<mode>2 (rot_count, operands[2]));
+      emit_insn (gen_trunc_vrl<mode> (operands[0], operands[1], rot_count));
+    }
+  DONE;
+})
+
 ;; Expanders for arithmetic shift left on each vector element
 (define_expand "vashl<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
new file mode 100644
index 00000000000..7461f3b6317
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
@@ -0,0 +1,46 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count is const number.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> 8) | (sud[i] << (sizeof (sud[0]) * 8 - 8));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> 8) | (suw[i] << (sizeof (suw[0]) * 8 - 8));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> 9)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - 9));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> 5)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - 5));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} { target powerpc_p8vector_ok } } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
new file mode 100644
index 00000000000..bdfa1e25d07
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
@@ -0,0 +1,47 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count isn't const number.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+extern unsigned char rot_cnt;
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> rot_cnt) | (sud[i] << (sizeof (sud[0]) * 8 - rot_cnt));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> rot_cnt) | (suw[i] << (sizeof (suw[0]) * 8 - rot_cnt));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> rot_cnt)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - rot_cnt));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> rot_cnt)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - rot_cnt));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} { target powerpc_p8vector_ok } } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */


* Re: [PATCH V3, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-26  3:33                       ` [PATCH V3, " Kewen.Lin
  2019-07-26  3:37                         ` [PATCH V4, " Kewen.Lin
@ 2019-07-26 14:28                         ` Segher Boessenkool
  2019-08-02  8:59                           ` [PATCH V5, " Kewen.Lin
  1 sibling, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-07-26 14:28 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

On Fri, Jul 26, 2019 at 11:20:27AM +0800, Kewen.Lin wrote:
> on 2019/7/25 9:49 PM, Segher Boessenkool wrote:
> > "vrlv2di_and" is a bit of an unhappy name; we have a "vrlv" instruction.
> > Just something like "rotatev2di_something", maybe?
> > 
> > Do we have something similar for non-rotate vector shifts, already?  We
> > probably should, so please keep that in mind for naming things.
> > 
> > "vrlv2di_and" sounds like you first do the rotate, and then on what
> > that results in you do the and.  And that is what the pattern does,
> > too.  But this is wrong: it should mask off all but the lower bits
> > of operand 2, instead.
> 
> You are right, the name matches the pattern but not what we want.
> How about the name trunc_vrl<mode>: first do the truncation on
> operand 2, then do the vector rotation?  I didn't find any existing
> shifts with a similar pattern.

vector.md has define_expand "vrotl<mode>3".  Can you use that one
directly?

> > All the patterns can be merged into one (using some code_iterator).  That
> > can be a later improvement.
> 
> I guess you mean mode_attr?

code iterator, mode iterator, what's the difference ;-)  (Yup you're right
of course).

> I did try to merge them since they look tedious, but a mode_attr can't
> contain "[" or "(", so it seems it can't be used to map to different const
> vectors.  I'd really appreciate it if you could show me some examples.

You might need to define a new predicate?  I'll have a look later.

> >> +;; Return 1 if op is a vector register that operates on integer vectors
> >> +;; or if op is a const vector with integer vector modes.
> >> +(define_predicate "vint_reg_or_const_vector"
> >> +  (match_code "reg,subreg,const_vector")
> 
> > Hrm, I don't like this name very much.  Why is just vint_operand not
> > enough for what you use this for?
> 
> vint_operand isn't enough: the expander legalizes the const vector into a
> vector register, so I'm unable to get at the feeder (the const vector) of
> the input register operand.

Hrm.  Ideally you wouldn't expand to such very specific patterns in the
first place.

> > Why do you have to emit as the "and" form here?  Emitting the "bare"
> > rotate should work just as well here?
> 
> Yes, the emitted insn is exactly the same.
> 
> It follows Jakub's suggestion via
> https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01159.html
> 
> It appends one explicit AND to indicate the truncation for the
> !SHIFT_COUNT_TRUNCATED case.  (Sorry if the previous pattern misled you.)

This is rs6000-specific code.  We always have !SHIFT_COUNT_TRUNCATED, but
we *do* in all cases have a similar effect (exactly how many bits are
masked just differs per mode and per operation).

> >> +/* { dg-options "-O3" } */
> >> +/* { dg-require-effective-target powerpc_vsx_ok } */
> > 
> >> +/* { dg-final { scan-assembler {\mvrld\M} } } */
> >> +/* { dg-final { scan-assembler {\mvrlw\M} } } */
> >> +/* { dg-final { scan-assembler {\mvrlh\M} } } */
> >> +/* { dg-final { scan-assembler {\mvrlb\M} } } */
> > 
> > You need to generate code for whatever cpu introduced those insns,
> > if you expect those to be generated ;-)
> > 
> > vsx_ok isn't needed.
> > 
> 
> Thanks for catching that; I've updated it to altivec_ok in the new patch.
> I think we can still keep this guard, since those instructions
> originate from ISA 2.03?

The ISA doc says "2.03" for the altivec insns, because that was the first
ISA version that describes altivec.  Before this it was a separate doc;
altivec is as old as ISA 1.xx.

vrld needs power8, as you discovered.  That means you need to have two
testcases, one you compile with -mdejagnu-cpu=power8, one without.


Segher


* [PATCH V5, rs6000] Support vrotr<mode>3 for int vector types
  2019-07-26 14:28                         ` [PATCH V3, " Segher Boessenkool
@ 2019-08-02  8:59                           ` Kewen.Lin
  2019-08-03 20:52                             ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Kewen.Lin @ 2019-08-02  8:59 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 1228 bytes --]

Hi Segher,

Sorry for the late reply; I've addressed your comments in the attached patch.
Some points:
  1) Removed the explicit AND part.
  2) Renamed the predicate to vint_reg_or_vint_const.
  3) Split the test cases into altivec and power8 variants.

As to the predicate name and usage, I checked the current vector shifts;
they don't need to check const_vector specially (as this right-to-left
conversion does), except for "vec_shr_<mode>", but that one checks for a
scalar const int.

Btw, I've changed the 
  +      rtx imm_vec = 
  +	simplify_const_unary_operation
back to 
  +      rtx imm_vec
  +	= simplify_const_unary_operation
Otherwise check_GNU_style will report a "Trailing operator" error.  :(

Bootstrapped and regtested on powerpc64le-unknown-linux-gnu.

Thanks,
Kewen

------------

gcc/ChangeLog

2019-08-02  Kewen Lin  <linkw@gcc.gnu.org>

	* config/rs6000/predicates.md (vint_reg_or_vint_const): New predicate.
	* config/rs6000/vector.md (vrotr<mode>3): New define_expand.

gcc/testsuite/ChangeLog

2019-08-02  Kewen Lin  <linkw@gcc.gnu.org>

	* gcc.target/powerpc/vec_rotate-1.c: New test.
	* gcc.target/powerpc/vec_rotate-2.c: New test.
	* gcc.target/powerpc/vec_rotate-3.c: New test.
	* gcc.target/powerpc/vec_rotate-4.c: New test.


[-- Attachment #2: expand_v5.diff --]
[-- Type: text/plain, Size: 6384 bytes --]

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8ca98299950..faf057425a8 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -163,6 +163,17 @@
   return VINT_REGNO_P (REGNO (op));
 })
 
+;; Return 1 if op is a vector register that operates on integer vectors
+;; or if op is a const vector with integer vector modes.
+(define_predicate "vint_reg_or_vint_const"
+  (match_code "reg,subreg,const_vector")
+{
+  if (GET_CODE (op) == CONST_VECTOR && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    return 1;
+
+  return vint_operand (op, mode);
+})
+
 ;; Return 1 if op is a vector register to do logical operations on (and, or,
 ;; xor, etc.)
 (define_predicate "vlogical_operand"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 70bcfe02e22..3111ca9029f 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1260,6 +1260,33 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; Expanders for rotatert to make use of vrotl
+(define_expand "vrotr<mode>3"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
+		(match_operand:VEC_I 2 "vint_reg_or_vint_const")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
+{
+  rtx rot_count = gen_reg_rtx (<MODE>mode);
+  if (GET_CODE (operands[2]) == CONST_VECTOR)
+    {
+      machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);
+      unsigned int bits = GET_MODE_PRECISION (inner_mode);
+      rtx mask_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits - 1));
+      rtx imm_vec
+	= simplify_const_unary_operation (NEG, <MODE>mode, operands[2],
+					  GET_MODE (operands[2]));
+      imm_vec
+	= simplify_const_binary_operation (AND, <MODE>mode, imm_vec, mask_vec);
+      rot_count = force_reg (<MODE>mode, imm_vec);
+    }
+  else
+    emit_insn (gen_neg<mode>2 (rot_count, operands[2]));
+
+  emit_insn (gen_vrotl<mode>3 (operands[0], operands[1], rot_count));
+  DONE;
+})
+
 ;; Expanders for arithmetic shift left on each vector element
 (define_expand "vashl<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
new file mode 100644
index 00000000000..f035a578292
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
@@ -0,0 +1,39 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count is const number.
+
+   Check for instructions vrlb/vrlh/vrlw only available if altivec supported. */
+
+#define N 256
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> 8) | (suw[i] << (sizeof (suw[0]) * 8 - 8));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> 9)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - 9));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> 5)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - 5));
+}
+
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
new file mode 100644
index 00000000000..0a2a965ddcb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
@@ -0,0 +1,19 @@
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O3 -mdejagnu-cpu=power8" } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power8, mainly
+   for the case rotation count is const number.
+
+   Check for vrld which is available on Power8 and above.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> 8) | (sud[i] << (sizeof (sud[0]) * 8 - 8));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-3.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-3.c
new file mode 100644
index 00000000000..5e90ae6fd63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-3.c
@@ -0,0 +1,40 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count isn't const number.
+
+   Check for instructions vrlb/vrlh/vrlw only available if altivec supported. */
+
+#define N 256
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+extern unsigned char rot_cnt;
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> rot_cnt) | (suw[i] << (sizeof (suw[0]) * 8 - rot_cnt));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> rot_cnt)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - rot_cnt));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> rot_cnt)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - rot_cnt));
+}
+
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-4.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-4.c
new file mode 100644
index 00000000000..0d3e8378ed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-4.c
@@ -0,0 +1,20 @@
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O3 -mdejagnu-cpu=power8" } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power8, mainly
+   for the case rotation count isn't const number.
+
+   Check for vrld which is available on Power8 and above.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+extern unsigned char rot_cnt;
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> rot_cnt) | (sud[i] << (sizeof (sud[0]) * 8 - rot_cnt));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */


* Re: [PATCH V5, rs6000] Support vrotr<mode>3 for int vector types
  2019-08-02  8:59                           ` [PATCH V5, " Kewen.Lin
@ 2019-08-03 20:52                             ` Segher Boessenkool
  2019-08-05  3:41                               ` Kewen.Lin
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-08-03 20:52 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

Hi!

I somehow lost track of this email, sorry.

On Fri, Aug 02, 2019 at 04:59:44PM +0800, Kewen.Lin wrote:
> As to the predicate name and usage, I checked the current vector shifts;
> they don't need to check const_vector specially (as this right-to-left
> conversion does), except for "vec_shr_<mode>", but that one checks for a
> scalar const int.

I don't understand why we want to expand rotate-by-vector-of-immediates
if we have no insns for that?  If you just use vint_operand, what happens
then?

> Btw, I've changed the 
>   +      rtx imm_vec = 
>   +	simplify_const_unary_operation
> back to 
>   +      rtx imm_vec
>   +	= simplify_const_unary_operation
> Otherwise check_GNU_style will report a "Trailing operator" error.  :(

Yeah I got it the wrong way around.  Either way is ugly.  Oh well.

> +/* { dg-options "-O3" } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */

If you use altivec_ok, you need to use -maltivec in the options, too.
This test should probably work with -O2 as well; use that, if possible.

> +/* { dg-require-effective-target powerpc_p8vector_ok } */

I don't think we need this anymore?  Not sure.


Segher


* Re: [PATCH V5, rs6000] Support vrotr<mode>3 for int vector types
  2019-08-03 20:52                             ` Segher Boessenkool
@ 2019-08-05  3:41                               ` Kewen.Lin
  2019-08-05 21:50                                 ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Kewen.Lin @ 2019-08-05  3:41 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 2147 bytes --]

Hi Segher,

on 2019/8/4 4:52 AM, Segher Boessenkool wrote:
> Hi!
> 
> I somehow lost track of this email, sorry.
> 
> On Fri, Aug 02, 2019 at 04:59:44PM +0800, Kewen.Lin wrote:
>> As to the predicate name and usage, I checked the current vector shifts;
>> they don't need to check const_vector specially (as this right-to-left
>> conversion does), except for "vec_shr_<mode>", but that one checks for a
>> scalar const int.
> 
> I don't understand why we want to expand rotate-by-vector-of-immediates
> if we have no insns for that?  If you just use vint_operand, what happens
> then?
> 

You are right: if we just use vint_operand, the functionality should be the
same; the only small difference is that the adjusted constant rotation count
isn't masked, but that is fine functionally.

One example, for ULL >r 8: with the const vector handling, it gets
  xxspltib 33,56

Without the handling, it gets
  xxspltib 33,248

But I agree that it's trivial, and I've unified it as in the attached patch
below.
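
As a tiny self-contained illustration (plain scalar C, just a model of the
per-element behaviour, not the actual generated vector code), both splat
constants end up requesting the same rotate, because vrld only reads the
low 6 bits of each element:

  #include <stdint.h>
  #include <assert.h>

  int
  main (void)
  {
    /* Right-rotating a 64-bit element by 8 is a left rotate by 56
       (the masked count); 248 is the unmasked -8 seen as a byte, and
       its low 6 bits are also 56, so the hardware does the same thing.  */
    assert ((248 & 63) == 56);

    uint64_t x = 0x0123456789abcdefULL;
    uint64_t rotr8 = (x >> 8) | (x << 56);
    unsigned int c = 248 & 63;		/* what vrld effectively uses */
    uint64_t rotl_c = (x << c) | (x >> (64 - c));
    assert (rotl_c == rotr8);
    return 0;
  }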

>> +/* { dg-options "-O3" } */
>> +/* { dg-require-effective-target powerpc_altivec_ok } */
> 
> If you use altivec_ok, you need to use -maltivec in the options, too.
> This test should probably work with -O2 as well; use that, if possible.
> 

Sorry, the test case depends on vectorization which isn't enabled at -O2
by default.

>> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> 
> I don't think we need this anymore?  Not sure.
> 

I thought -mdejagnu-cpu=power8 only ensures that the power8 CPU setting takes
preference, but can't guarantee that the current driver supports power8
compilation.  From your comments, I guess that since the GCC configuration
doesn't have without-cpu= etc., power8 support is always guaranteed?


Thanks,
Kewen

-----------------

gcc/ChangeLog

2019-08-05  Kewen Lin  <linkw@gcc.gnu.org>

	* config/rs6000/vector.md (vrotr<mode>3): New define_expand.

gcc/testsuite/ChangeLog

2019-08-05  Kewen Lin  <linkw@gcc.gnu.org>

	* gcc.target/powerpc/vec_rotate-1.c: New test.
	* gcc.target/powerpc/vec_rotate-2.c: New test.
	* gcc.target/powerpc/vec_rotate-3.c: New test.
	* gcc.target/powerpc/vec_rotate-4.c: New test.

[-- Attachment #2: expand_v6.diff --]
[-- Type: text/plain, Size: 5092 bytes --]

diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 70bcfe02e22..886cbad1655 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1260,6 +1260,19 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; Expanders for rotatert to make use of vrotl
+(define_expand "vrotr<mode>3"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
+		(match_operand:VEC_I 2 "vint_operand")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
+{
+  rtx rot_count = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_neg<mode>2 (rot_count, operands[2]));
+  emit_insn (gen_vrotl<mode>3 (operands[0], operands[1], rot_count));
+  DONE;
+})
+
 ;; Expanders for arithmetic shift left on each vector element
 (define_expand "vashl<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
new file mode 100644
index 00000000000..f035a578292
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
@@ -0,0 +1,39 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count is const number.
+
+   Check for instructions vrlb/vrlh/vrlw only available if altivec supported. */
+
+#define N 256
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> 8) | (suw[i] << (sizeof (suw[0]) * 8 - 8));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> 9)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - 9));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> 5)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - 5));
+}
+
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
new file mode 100644
index 00000000000..0a2a965ddcb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
@@ -0,0 +1,19 @@
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O3 -mdejagnu-cpu=power8" } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power8, mainly
+   for the case rotation count is const number.
+
+   Check for vrld which is available on Power8 and above.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> 8) | (sud[i] << (sizeof (sud[0]) * 8 - 8));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-3.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-3.c
new file mode 100644
index 00000000000..5e90ae6fd63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-3.c
@@ -0,0 +1,40 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count isn't const number.
+
+   Check for instructions vrlb/vrlh/vrlw only available if altivec supported. */
+
+#define N 256
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+extern unsigned char rot_cnt;
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> rot_cnt) | (suw[i] << (sizeof (suw[0]) * 8 - rot_cnt));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> rot_cnt)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - rot_cnt));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> rot_cnt)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - rot_cnt));
+}
+
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-4.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-4.c
new file mode 100644
index 00000000000..0d3e8378ed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-4.c
@@ -0,0 +1,20 @@
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O3 -mdejagnu-cpu=power8" } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power8, mainly
+   for the case rotation count isn't const number.
+
+   Check for vrld which is available on Power8 and above.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+extern unsigned char rot_cnt;
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> rot_cnt) | (sud[i] << (sizeof (sud[0]) * 8 - rot_cnt));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */


* Re: [PATCH V5, rs6000] Support vrotr<mode>3 for int vector types
  2019-08-05  3:41                               ` Kewen.Lin
@ 2019-08-05 21:50                                 ` Segher Boessenkool
  2019-08-06  3:11                                   ` Kewen.Lin
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-08-05 21:50 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

On Mon, Aug 05, 2019 at 11:41:41AM +0800, Kewen.Lin wrote:
> on 2019/8/4 4:52 AM, Segher Boessenkool wrote:
> > On Fri, Aug 02, 2019 at 04:59:44PM +0800, Kewen.Lin wrote:
> >> As to the predicate name and usage, I checked the current vector shifts;
> >> they don't need to check const_vector specially (as this right-to-left
> >> conversion does), except for "vec_shr_<mode>", but that one checks for a
> >> scalar const int.
> > 
> > I don't understand why we want to expand rotate-by-vector-of-immediates
> > if we have no insns for that?  If you just use vint_operand, what happens
> > then?
> 
> You are right: if we just use vint_operand, the functionality should be the
> same; the only small difference is that the adjusted constant rotation count
> isn't masked, but that is fine functionally.
> 
> One example, for ULL >r 8: with the const vector handling, it gets
>   xxspltib 33,56
> 
> Without the handling, it gets
>   xxspltib 33,248
> 
> But I agree that it's trivial, and I've unified it as in the attached patch
> below.

There are two cases: either all elements are rotated by the same amount,
or they are not.  When they are, on p8 and later we can always use
xxspltib, which allows immediates 0..255, and the rotates look only at
the low bits they need, in that element, for that element (so you can
always splat it to all bytes; all but the low-order bytes are just
ignored by the rotate insns).  Before p8, we use vsplti[bhw], and those
allow -16..15, so for vrlw you do *not* want to mask it with 31.
There is some machinery with easy_altivec_constant that should help
here.  Maybe it can use some improvement.

The other case is if not all shift counts are the same.  I'm not sure
we actually care much about this case :-)
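
To spell the splat trick out with a small scalar sketch (illustrative only,
not the rs6000 implementation): replicating the count byte into every byte
of an element leaves the element's low bits equal to the count's low bits,
which is all the rotate insn reads; and keeping a negative count unmasked
keeps it inside the vsplti -16..15 range:

  #include <stdint.h>
  #include <assert.h>

  int
  main (void)
  {
    /* Byte splat: the low 5 bits of 0xBBBBBBBB come entirely from the
       low-order byte B, so vrlw still sees the intended count.  */
    uint8_t count = 8;
    uint32_t word = 0x01010101u * count;	/* 0x08080808 */
    assert ((word & 31) == (count & 31));

    /* Pre-p8 vsplti[bhw] only encode -16..15: -8 can be splatted as-is
       and the rotate uses its low 5 bits (24), whereas masking with 31
       first would require splatting 24, which does not fit that range.  */
    assert (((uint32_t) -8 & 31) == 24);
    return 0;
  }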

> >> +/* { dg-options "-O3" } */
> >> +/* { dg-require-effective-target powerpc_altivec_ok } */
> > 
> > If you use altivec_ok, you need to use -maltivec in the options, too.
> > This test should probably work with -O2 as well; use that, if possible.
> 
> Sorry, the test case depends on vectorization which isn't enabled at -O2
> by default.

Ah yes, that is pretty obvious.

> >> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> > 
> > I don't think we need this anymore?  Not sure.
> 
> I thought -mdejagnu-cpu=power8 only ensures that the power8 CPU setting takes
> preference, but can't guarantee that the current driver supports power8
> compilation.  From your comments, I guess that since the GCC configuration
> doesn't have without-cpu= etc., power8 support is always guaranteed?

The compiler always supports all CPUs.

If you are using something like -maltivec, things are different: you
might have selected a CPU that does not allow -maltivec, so we do need
altivec_ok.  But if you use -mcpu=power8 (or -mdejagnu-cpu=power8), you
can use all p8 insns, including the vector ones (unless you disable
them again with -mno-vsx or similar; just don't do that ;-) )

[ In the past, it was possible to configure the compiler without support
for p8 vector insns, if your assembler doesn't support them.  We do not
do this anymore: now, if your compiler does support things that your
assembler does not, you'll get errors from that assembler if you try to
use those instructions.  Which is fine, just make sure you use a new
enough assembler for the GCC version you use.  This always has been true,
but with a somewhat larger window of allowed versions.  But this "don't
support all insns if the assembler does not" means we need to test a lot
more configurations (or leave them untested, even worse).

As a side effect, most *_ok now do nothing.  *_hw of course is still
needed to check if the test system allows running the testcase.  ]

> gcc/ChangeLog
> 
> 2019-08-05  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	* config/rs6000/vector.md (vrotr<mode>3): New define_expand.
> 
> gcc/testsuite/ChangeLog
> 
> 2019-08-05  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	* gcc.target/powerpc/vec_rotate-1.c: New test.
> 	* gcc.target/powerpc/vec_rotate-2.c: New test.
> 	* gcc.target/powerpc/vec_rotate-3.c: New test.
> 	* gcc.target/powerpc/vec_rotate-4.c: New test.

Approved for trunk (with or without the p8vector_ok change).  Thank you!


Segher


* Re: [PATCH V5, rs6000] Support vrotr<mode>3 for int vector types
  2019-08-05 21:50                                 ` Segher Boessenkool
@ 2019-08-06  3:11                                   ` Kewen.Lin
  2019-08-06 15:12                                     ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Kewen.Lin @ 2019-08-06  3:11 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

Hi Segher,

on 2019/8/6 5:21 AM, Segher Boessenkool wrote:
> On Mon, Aug 05, 2019 at 11:41:41AM +0800, Kewen.Lin wrote:
>> on 2019/8/4 4:52 AM, Segher Boessenkool wrote:
>>> On Fri, Aug 02, 2019 at 04:59:44PM +0800, Kewen.Lin wrote:
> There are two cases: either all elements are rotated by the same amount,
> or they are not.  When they are, on p8 and later we can always use
> xxspltib, which allows immediates 0..255, and the rotates look only at
> the low bits they need, in that element, for that element (so you can
> always splat it to all bytes; all but the low-order bytes are just
> ignored by the rotate insns).  Before p8, we use vsplti[bhw], and those
> allow -16..15, so for vrlw you do *not* want to mask it with 31.
> There is some machinery with easy_altivec_constant that should help
> here.  Maybe it can use some improvement.
> 
> The other case is if not all shift counts are the same.  I'm not sure
> we actually care much about this case :-)
> 

Got it.  I think even for the "other" case the neg operation without masking
is safe, since the hardware instructions ignore the unrelated bits.
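
A minimal scalar sketch of that, assuming 32-bit elements (illustrative
only, not the expander itself):

  #include <stdint.h>
  #include <assert.h>

  int
  main (void)
  {
    unsigned int n = 8;		/* some variable right-rotate count */
    uint32_t neg = 0u - n;	/* per-element result of the neg, 0xfffffff8 */

    /* vrlw only reads the low 5 bits of each element, so the unmasked
       negative still selects the correct left-rotate count 32 - n.  */
    assert ((neg & 31) == 32 - n);

    uint32_t x = 0x12345678u;
    uint32_t rotr = (x >> n) | (x << (32 - n));
    uint32_t rotl = (x << (neg & 31)) | (x >> (32 - (neg & 31)));
    assert (rotl == rotr);
    return 0;
  }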

>> I thought -mdejagnu-cpu=power8 only ensures that the power8 CPU setting takes
>> preference, but can't guarantee that the current driver supports power8
>> compilation.  From your comments, I guess that since the GCC configuration
>> doesn't have without-cpu= etc., power8 support is always guaranteed?
> 
> The compiler always supports all CPUs.
> 
> If you are using something like -maltivec, things are different: you
> might have selected a CPU that does not allow -maltivec, so we do need
> altivec_ok.  But if you use -mcpu=power8 (or -mdejagnu-cpu=power8), you
> can use all p8 insns, including the vector ones (unless you disable
> them again with -mno-vsx or similar; just don't do that ;-) )
> 
> [ In the past, it was possible to configure the compiler without support
> for p8 vector insns, if your assembler doesn't support them.  We do not
> do this anymore: now, if your compiler does support things that your
> assembler does not, you'll get errors from that assembler if you try to
> use those instructions.  Which is fine, just make sure you use a new
> enough assembler for the GCC version you use.  This always has been true,
> but with a somewhat larger window of allowed versions.  But this "don't
> support all insns if the assembler does not" means we need to test a lot
> more configurations (or leave them untested, even worse).
> 
> As a side effect, most *_ok now do nothing.  *_hw of course is still
> needed to check if the test system allows running the testcase.  ]
> 

Thanks a lot for the detailed explanation!  I'll remove p8vector_ok.

>> gcc/ChangeLog
>>
>> 2019-08-05  Kewen Lin  <linkw@gcc.gnu.org>
>>
>> 	* config/rs6000/vector.md (vrotr<mode>3): New define_expand.
>>
>> gcc/testsuite/ChangeLog
>>
>> 2019-08-05  Kewen Lin  <linkw@gcc.gnu.org>
>>
>> 	* gcc.target/powerpc/vec_rotate-1.c: New test.
>> 	* gcc.target/powerpc/vec_rotate-2.c: New test.
>> 	* gcc.target/powerpc/vec_rotate-3.c: New test.
>> 	* gcc.target/powerpc/vec_rotate-4.c: New test.
> 
> Approved for trunk (with or without the p8vector_ok change).  Thank you!
> 

Thank you!  :)


Kewen


* Re: [PATCH V5, rs6000] Support vrotr<mode>3 for int vector types
  2019-08-06  3:11                                   ` Kewen.Lin
@ 2019-08-06 15:12                                     ` Segher Boessenkool
  0 siblings, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-08-06 15:12 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford,
	Bill Schmidt

On Tue, Aug 06, 2019 at 10:19:34AM +0800, Kewen.Lin wrote:
> on 2019/8/6 5:21 AM, Segher Boessenkool wrote:
> > The other case is if not all shift counts are the same.  I'm not sure
> > we actually care much about this case :-)
> 
> Got it.  I think even for the "other" case the neg operation without masking
> is safe, since the hardware instructions ignore the unrelated bits.

Oh it is perfectly safe.  Only wondering how well we optimise this :-)


Segher



Thread overview: 42+ messages
2019-07-15  8:59 Check rrotate optab first when transforming lrotate Kewen.Lin
2019-07-15  9:16 ` Jakub Jelinek
2019-07-15  9:19   ` Richard Biener
2019-07-15  9:20   ` Richard Sandiford
2019-07-15 10:54   ` Kewen.Lin
2019-07-15 14:51   ` Segher Boessenkool
     [not found]   ` <d2ccc831-c805-c7b8-5a90-cb3e5ee5ed8b@linux.ibm.com>
2019-07-16  8:48     ` [RFC] Consider lrotate const rotation in vectorizer Kewen.Lin
2019-07-17  8:42       ` [PATCH, rs6000] Support vrotr<mode>3 for int vector types Kewen.Lin
2019-07-17  8:44         ` Jakub Jelinek
2019-07-17  9:38           ` Kewen.Lin
2019-07-17 10:18             ` Jakub Jelinek
2019-07-17 13:48         ` Segher Boessenkool
2019-07-18  6:06           ` Kewen.Lin
2019-07-18 20:06             ` Segher Boessenkool
2019-07-19  6:51               ` Kewen.Lin
2019-07-19 15:49                 ` Segher Boessenkool
2019-07-23  7:32                   ` [PATCH V2, " Kewen.Lin
2019-07-25 14:24                     ` Segher Boessenkool
2019-07-26  3:33                       ` [PATCH V3, " Kewen.Lin
2019-07-26  3:37                         ` [PATCH V4, " Kewen.Lin
2019-07-26 14:28                         ` [PATCH V3, " Segher Boessenkool
2019-08-02  8:59                           ` [PATCH V5, " Kewen.Lin
2019-08-03 20:52                             ` Segher Boessenkool
2019-08-05  3:41                               ` Kewen.Lin
2019-08-05 21:50                                 ` Segher Boessenkool
2019-08-06  3:11                                   ` Kewen.Lin
2019-08-06 15:12                                     ` Segher Boessenkool
2019-07-17 10:39       ` [RFC] Consider lrotate const rotation in vectorizer Richard Biener
2019-07-17 11:19         ` Jakub Jelinek
2019-07-17 11:35           ` Richard Biener
2019-07-17 11:56             ` Richard Biener
2019-07-17 13:58             ` Segher Boessenkool
2019-07-17 17:51           ` Segher Boessenkool
2019-07-18  7:03             ` Jakub Jelinek
2019-07-18 19:45               ` Segher Boessenkool
2019-07-18 15:17             ` Richard Earnshaw (lists)
2019-07-18 15:26               ` Jakub Jelinek
2019-07-18 15:31                 ` Richard Earnshaw (lists)
2019-07-18 15:35                   ` Jakub Jelinek
2019-07-18 15:44                     ` Richard Earnshaw (lists)
2019-07-18 18:04               ` Segher Boessenkool
2019-07-18  6:28         ` Kewen.Lin
