[PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector

public inbox for gcc-patches@gcc.gnu.org

* [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare
@ 2022-11-16  6:48 Kewen.Lin
  2022-11-16  6:51 ` [PATCH 2/2] rs6000: Refine integer comparison handlings " Kewen.Lin
  2022-11-16 18:44 ` [PATCH 1/2] rs6000: Emit vector fp comparison directly " Segher Boessenkool
  0 siblings, 2 replies; 10+ messages in thread
From: Kewen.Lin @ 2022-11-16  6:48 UTC
  To: GCC Patches
  Cc: Segher Boessenkool, David Edelsohn, Peter Bergner, Michael Meissner

Hi,

All kinds of vector float comparison operators are already
supported by a single RTL comparison pattern in vector.md, so
we can just emit an rtx comparison insn with the given
comparison operator in rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.

This also prepares for a subsequent patch that deals with some
comparison operators depending on whether trapping math is
enabled or disabled, so it's important to have one centralized
place for handling vector float comparisons, for easier maintenance.

Bootstrapped and regtested on powerpc64-linux-gnu P7 and P8,
and powerpc64le-linux-gnu P9 and P10.

I'm going to push this later this week if there are no objections.

BR,
Kewen
-----

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
	float only comparison operators.
	(rs6000_emit_vector_compare): Emit vector comparison insn directly for
	float modes.
---
 gcc/config/rs6000/rs6000.cc | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 635aced6105..56db12f08a0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15660,10 +15660,6 @@ rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
     case EQ:
     case GT:
     case GTU:
-    case ORDERED:
-    case UNORDERED:
-    case UNEQ:
-    case LTGT:
       mask = gen_reg_rtx (mode);
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
       return mask;
@@ -15681,12 +15677,24 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
 			    machine_mode dmode)
 {
   rtx mask;
-  bool swap_operands = false;
-  bool try_again = false;
-
   gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode));
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));

+  /* In vector.md, we support all kinds of vector float point
+     comparison operators in a comparison rtl pattern, we can
+     just emit the comparison rtx insn directly here.  Besides,
+     we should have a centralized place to handle the possibility
+     of raising invalid exception.  */
+  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
+    {
+      mask = gen_reg_rtx (dmode);
+      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
+      return mask;
+    }
+
+  bool swap_operands = false;
+  bool try_again = false;
+
   /* See if the comparison works as is.  */
   mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
   if (mask)
@@ -15705,10 +15713,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
       try_again = true;
       break;
     case NE:
-    case UNLE:
-    case UNLT:
-    case UNGE:
-    case UNGT:
       /* Invert condition and try again.
 	 e.g., A != B becomes ~(A==B).  */
       {
--
2.27.0

* [PATCH 2/2] rs6000: Refine integer comparison handlings in rs6000_emit_vector_compare
  2022-11-16  6:48 [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare Kewen.Lin
@ 2022-11-16  6:51 ` Kewen.Lin
  2022-11-16 18:58   ` Segher Boessenkool
  2022-11-16 18:44 ` [PATCH 1/2] rs6000: Emit vector fp comparison directly " Segher Boessenkool
  1 sibling, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2022-11-16  6:51 UTC
  To: GCC Patches
  Cc: Segher Boessenkool, David Edelsohn, Peter Bergner, Michael Meissner

Hi,

The current handling in rs6000_emit_vector_compare seems a bit
complicated to me, especially now that we emit the vector float
comparison insn with the given code directly.  This patch
refines the handling of vector integer comparison operators:
the function is no longer recursive, and we don't need the
helper function rs6000_emit_vector_compare_inner any more.
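The non-recursive scheme can be modeled outside GCC: every integer code maps onto one of the natively supported codes EQ/GT/GTU, plus an optional operand swap and an optional inversion (the one_cmpl step). The sketch below mirrors the case split in this patch, but it is only an illustration: the cmp_code enum and all function names are hypothetical stand-ins rather than GCC's rtx_code machinery, and it checks the mapping exhaustively on a few 32-bit lane values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the rtx codes seen for vector integer modes.  */
enum cmp_code
{
  CMP_EQ, CMP_NE, CMP_GT, CMP_GE, CMP_LT, CMP_LE,
  CMP_GTU, CMP_GEU, CMP_LTU, CMP_LEU
};

/* A code lowered onto a native code (EQ, GT or GTU), optionally with
   swapped operands and/or an inverted result.  */
struct lowering
{
  enum cmp_code base;
  bool swap;
  bool invert;
};

static struct lowering
lower_int_compare (enum cmp_code code)
{
  switch (code)
    {
    case CMP_EQ: case CMP_GT: case CMP_GTU:
      return (struct lowering) { code, false, false };
    case CMP_LT:  return (struct lowering) { CMP_GT, true, false };  /* gt(b,a) */
    case CMP_LTU: return (struct lowering) { CMP_GTU, true, false };
    case CMP_NE:  return (struct lowering) { CMP_EQ, false, true };  /* ~eq(a,b) */
    case CMP_LE:  return (struct lowering) { CMP_GT, false, true };  /* ~gt(a,b) */
    case CMP_LEU: return (struct lowering) { CMP_GTU, false, true };
    case CMP_GE:  return (struct lowering) { CMP_GT, true, true };   /* ~gt(b,a) */
    case CMP_GEU: return (struct lowering) { CMP_GTU, true, true };
    }
  assert (!"unknown comparison code");
  return (struct lowering) { CMP_EQ, false, false };
}

/* Reference semantics of every code on one 32-bit lane.  */
static bool
eval_direct (enum cmp_code code, int32_t a, int32_t b)
{
  uint32_t ua = (uint32_t) a, ub = (uint32_t) b;
  switch (code)
    {
    case CMP_EQ:  return a == b;
    case CMP_NE:  return a != b;
    case CMP_GT:  return a > b;
    case CMP_GE:  return a >= b;
    case CMP_LT:  return a < b;
    case CMP_LE:  return a <= b;
    case CMP_GTU: return ua > ub;
    case CMP_GEU: return ua >= ub;
    case CMP_LTU: return ua < ub;
    case CMP_LEU: return ua <= ub;
    }
  assert (!"unknown comparison code");
  return false;
}

/* One native compare plus optional swap and inversion; no recursion.  */
static bool
eval_lowered (enum cmp_code code, int32_t a, int32_t b)
{
  struct lowering l = lower_int_compare (code);
  if (l.swap)
    {
      int32_t t = a;
      a = b;
      b = t;
    }
  bool r = eval_direct (l.base, a, b);
  return l.invert ? !r : r;
}

/* Number of (code, a, b) combinations where the lowering is wrong.  */
int
count_lowering_mismatches (void)
{
  const int32_t vals[] = { INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX };
  const size_t n = sizeof vals / sizeof vals[0];
  int mismatches = 0;
  for (int c = CMP_EQ; c <= CMP_LEU; c++)
    for (size_t i = 0; i < n; i++)
      for (size_t j = 0; j < n; j++)
        mismatches += eval_lowered ((enum cmp_code) c, vals[i], vals[j])
                      != eval_direct ((enum cmp_code) c, vals[i], vals[j]);
  return mismatches;
}
```

count_lowering_mismatches () returns 0, i.e. each code needs at most one native compare plus one inversion, which is what removes the need for the recursive retries.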

Bootstrapped and regtested on powerpc64-linux-gnu P7 and P8,
and powerpc64le-linux-gnu P9 and P10.

I'm going to push this later this week if there are no objections.

BR,
Kewen
-----
gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove.
	(rs6000_emit_vector_compare): Refine it by directly using the reversed
	or swapped code, to avoid the recursion.
---
 gcc/config/rs6000/rs6000.cc | 159 ++++++++----------------------------
 1 file changed, 34 insertions(+), 125 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 56db12f08a0..21f4cda7b80 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15639,169 +15639,78 @@ output_cbranch (rtx op, const char *label, int reversed, rtx_insn *insn)
   return string;
 }

-/* Return insn for VSX or Altivec comparisons.  */
-
-static rtx
-rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
-{
-  rtx mask;
-  machine_mode mode = GET_MODE (op0);
-
-  switch (code)
-    {
-    default:
-      break;
-
-    case GE:
-      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
-	return NULL_RTX;
-      /* FALLTHRU */
-
-    case EQ:
-    case GT:
-    case GTU:
-      mask = gen_reg_rtx (mode);
-      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
-      return mask;
-    }
-
-  return NULL_RTX;
-}
-
 /* Emit vector compare for operands OP0 and OP1 using code RCODE.
-   DMODE is expected destination mode. This is a recursive function.  */
+   DMODE is expected destination mode.  */

 static rtx
 rs6000_emit_vector_compare (enum rtx_code rcode,
 			    rtx op0, rtx op1,
 			    machine_mode dmode)
 {
-  rtx mask;
   gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode));
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));
+  rtx mask = gen_reg_rtx (dmode);

   /* In vector.md, we support all kinds of vector float point
      comparison operators in a comparison rtl pattern, we can
      just emit the comparison rtx insn directly here.  Besides,
      we should have a centralized place to handle the possibility
-     of raising invalid exception.  */
-  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
+     of raising invalid exception.  Also emit directly for vector
+     integer comparison operators EQ/GT/GTU.  */
+  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
+      || rcode == EQ
+      || rcode == GT
+      || rcode == GTU)
     {
-      mask = gen_reg_rtx (dmode);
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
       return mask;
     }

   bool swap_operands = false;
-  bool try_again = false;
-
-  /* See if the comparison works as is.  */
-  mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
-  if (mask)
-    return mask;
+  bool need_invert = false;
+  enum rtx_code code = UNKNOWN;

   switch (rcode)
     {
     case LT:
-      rcode = GT;
-      swap_operands = true;
-      try_again = true;
-      break;
     case LTU:
-      rcode = GTU;
+      code = swap_condition (rcode);
       swap_operands = true;
-      try_again = true;
       break;
     case NE:
-      /* Invert condition and try again.
-	 e.g., A != B becomes ~(A==B).  */
-      {
-	enum rtx_code rev_code;
-	enum insn_code nor_code;
-	rtx mask2;
-
-	rev_code = reverse_condition_maybe_unordered (rcode);
-	if (rev_code == UNKNOWN)
-	  return NULL_RTX;
-
-	nor_code = optab_handler (one_cmpl_optab, dmode);
-	if (nor_code == CODE_FOR_nothing)
-	  return NULL_RTX;
-
-	mask2 = rs6000_emit_vector_compare (rev_code, op0, op1, dmode);
-	if (!mask2)
-	  return NULL_RTX;
-
-	mask = gen_reg_rtx (dmode);
-	emit_insn (GEN_FCN (nor_code) (mask, mask2));
-	return mask;
-      }
+    case LE:
+    case LEU:
+      code = reverse_condition (rcode);
+      need_invert = true;
       break;
     case GE:
+      code = GT;
+      swap_operands = true;
+      need_invert = true;
+      break;
     case GEU:
-    case LE:
-    case LEU:
-      /* Try GT/GTU/LT/LTU OR EQ */
-      {
-	rtx c_rtx, eq_rtx;
-	enum insn_code ior_code;
-	enum rtx_code new_code;
-
-	switch (rcode)
-	  {
-	  case  GE:
-	    new_code = GT;
-	    break;
-
-	  case GEU:
-	    new_code = GTU;
-	    break;
-
-	  case LE:
-	    new_code = LT;
-	    break;
-
-	  case LEU:
-	    new_code = LTU;
-	    break;
-
-	  default:
-	    gcc_unreachable ();
-	  }
-
-	ior_code = optab_handler (ior_optab, dmode);
-	if (ior_code == CODE_FOR_nothing)
-	  return NULL_RTX;
-
-	c_rtx = rs6000_emit_vector_compare (new_code, op0, op1, dmode);
-	if (!c_rtx)
-	  return NULL_RTX;
-
-	eq_rtx = rs6000_emit_vector_compare (EQ, op0, op1, dmode);
-	if (!eq_rtx)
-	  return NULL_RTX;
-
-	mask = gen_reg_rtx (dmode);
-	emit_insn (GEN_FCN (ior_code) (mask, c_rtx, eq_rtx));
-	return mask;
-      }
+      code = GTU;
+      swap_operands = true;
+      need_invert = true;
       break;
     default:
-      return NULL_RTX;
+      gcc_unreachable ();
+      break;
     }

-  if (try_again)
-    {
-      if (swap_operands)
-	std::swap (op0, op1);
+  if (swap_operands)
+    std::swap (op0, op1);
+
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));

-      mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
-      if (mask)
-	return mask;
+  if (need_invert)
+    {
+      enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
+      gcc_assert (nor_code != CODE_FOR_nothing);
+      emit_insn (GEN_FCN (nor_code) (mask, mask));
     }

-  /* You only get two chances.  */
-  return NULL_RTX;
+  return mask;
 }

 /* Emit vector conditional expression.  DEST is destination. OP_TRUE and
--
2.27.0

* Re: [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare
  2022-11-16  6:48 [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare Kewen.Lin
  2022-11-16  6:51 ` [PATCH 2/2] rs6000: Refine integer comparison handlings " Kewen.Lin
@ 2022-11-16 18:44 ` Segher Boessenkool
  2022-11-17  6:59   ` Kewen.Lin
  1 sibling, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2022-11-16 18:44 UTC
  To: Kewen.Lin; +Cc: GCC Patches, David Edelsohn, Peter Bergner, Michael Meissner

Hi!

On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
> 	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
> 	float only comparison operators.

Why?  Is that correct?  Your mail says nothing about this :-(

Is there any testcase that covers this, and that shows things still
generate the same code?


Segher

* Re: [PATCH 2/2] rs6000: Refine integer comparison handlings in rs6000_emit_vector_compare
  2022-11-16  6:51 ` [PATCH 2/2] rs6000: Refine integer comparison handlings " Kewen.Lin
@ 2022-11-16 18:58   ` Segher Boessenkool
  2022-11-17  7:52     ` Kewen.Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2022-11-16 18:58 UTC
  To: Kewen.Lin; +Cc: GCC Patches, David Edelsohn, Peter Bergner, Michael Meissner

Hi!

On Wed, Nov 16, 2022 at 02:51:04PM +0800, Kewen.Lin wrote:
> The current handling in rs6000_emit_vector_compare seems a bit
> complicated to me, especially now that we emit the vector float
> comparison insn with the given code directly.  This patch
> refines the handling of vector integer comparison operators:
> the function is no longer recursive, and we don't need the
> helper function rs6000_emit_vector_compare_inner any more.

That sounds nice.

>    /* In vector.md, we support all kinds of vector float point
>       comparison operators in a comparison rtl pattern, we can
>       just emit the comparison rtx insn directly here.  Besides,
>       we should have a centralized place to handle the possibility
> -     of raising invalid exception.  */
> -  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
> +     of raising invalid exception.  Also emit directly for vector
> +     integer comparison operators EQ/GT/GTU.  */
> +  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
> +      || rcode == EQ
> +      || rcode == GT
> +      || rcode == GTU)

The comment still says it handles FP only.  That would be best to keep
imo: add a separate block of code to handle the integer stuff you want
to add.  You get the same or better generated code, the compiler is
smart enough.  Code is for the user to read, and C is not a portable
assembler language.

This whole series needs to be factored better, it does way too many
things, and only marginally related things, at every step.  Or I don't
see it anyway :-)


Segher

* Re: [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare
  2022-11-16 18:44 ` [PATCH 1/2] rs6000: Emit vector fp comparison directly " Segher Boessenkool
@ 2022-11-17  6:59   ` Kewen.Lin
  2022-11-18 15:10     ` Segher Boessenkool
  0 siblings, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2022-11-17  6:59 UTC
  To: Segher Boessenkool
  Cc: GCC Patches, David Edelsohn, Peter Bergner, Michael Meissner

Hi Segher,

Thanks for the comments!

on 2022/11/17 02:44, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
>> 	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
>> 	float only comparison operators.
> 
> Why?  Is that correct?  Your mail says nothing about this :-(
> 
> Is there any testcase that covers this, and that shows things still
> generate the same code?
> 

Sorry for the unclear description; I mistakenly thought it was
straightforward.

With the change in this patch, all 14 vector float comparison operators
(unordered/ordered/eq/ne/gt/lt/ge/le/ungt/unge/unlt/unle/uneq/ltgt)
are handled early in rs6000_emit_vector_compare.

For unordered/ordered/ltgt/uneq, the new way is exactly the same
as what we do in rs6000_emit_vector_compare_inner, which means
rs6000_emit_vector_compare_inner can no longer be reached with any of them.
For eq/ge/gt, it's the same too, but since they are shared with vector
integer comparison, I just left them alone here.  I just noticed we can
safely remove ge too, as it's guarded by !MODE_VECTOR_INT.

For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code
with reverse_condition_maybe_unordered and inverts the result, which is
the same as what we have in vector.md:

; unge(a,b) = ~lt(a,b)
; unle(a,b) = ~gt(a,b)
; ne(a,b)   = ~eq(a,b)
; ungt(a,b) = ~le(a,b)
; unlt(a,b) = ~ge(a,b)

Then eq/ge/gt on the right-hand side match the cases mentioned
above, so we just need to focus on lt and le.
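The inversion identities above can be checked with a scalar model of a single vector lane. This is only an illustrative sketch, not GCC code: each (hypothetically named) function models one float comparison code on doubles with IEEE semantics, where "unordered" means at least one operand is a NaN, and each pair lists a code together with the code it is the complement of (the five vector.md lines plus ltgt/uneq and ordered/unordered). It must be built without -ffast-math, otherwise the NaN cases may be folded away.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Lane-wise model of the float comparison codes (IEEE semantics).  */
static bool unord (double a, double b) { return isnan (a) || isnan (b); }
static bool ord (double a, double b) { return !isnan (a) && !isnan (b); }
static bool eq (double a, double b) { return a == b; }
static bool ne (double a, double b) { return a != b; }  /* true on NaN */
static bool gt (double a, double b) { return a > b; }
static bool ge (double a, double b) { return a >= b; }
static bool lt (double a, double b) { return a < b; }
static bool le (double a, double b) { return a <= b; }
static bool uneq (double a, double b) { return unord (a, b) || a == b; }
static bool ltgt (double a, double b) { return a < b || a > b; }
static bool ungt (double a, double b) { return unord (a, b) || a > b; }
static bool unge (double a, double b) { return unord (a, b) || a >= b; }
static bool unlt (double a, double b) { return unord (a, b) || a < b; }
static bool unle (double a, double b) { return unord (a, b) || a <= b; }

typedef bool (*cmp_fn) (double, double);

/* code(a,b) is claimed to equal ~complement(a,b).  */
static const struct { cmp_fn code, complement; } pairs[] = {
  { unge, lt }, { unle, gt }, { ne, eq }, { ungt, le }, { unlt, ge },
  { ltgt, uneq }, { ord, unord },
};

/* Count (pair, a, b) combinations where the claimed identity fails.  */
int
count_inversion_mismatches (void)
{
  const double vals[] = { -1.0, 0.0, 1.0, INFINITY, -INFINITY, NAN };
  const size_t nvals = sizeof vals / sizeof vals[0];
  int mismatches = 0;
  for (size_t p = 0; p < sizeof pairs / sizeof pairs[0]; p++)
    for (size_t i = 0; i < nvals; i++)
      for (size_t j = 0; j < nvals; j++)
        if (pairs[p].code (vals[i], vals[j])
            != !pairs[p].complement (vals[i], vals[j]))
          mismatches++;
  return mismatches;
}
```

count_inversion_mismatches () returns 0, so a single compare followed by an inversion is enough for each of these codes, including when a lane holds a NaN.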

For lt, rs6000_emit_vector_compare swaps the operands and changes the
operator to gt, which is the same as what we have in vector.md:

; lt(a,b)   = gt(b,a)

and the result then matches the gt case mentioned above.

As to le, rs6000_emit_vector_compare splits it into lt IOR eq
and then handles lt recursively, that is:
   le = lt(a,b) || eq(a,b)
      = gt(b,a) || eq(a,b)

Actually, this is worse than what vector.md supports:

; le(a,b)   = ge(b,a)
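A similar scalar sketch can confirm that the old two-compare split for le and the single swapped compare from vector.md agree on every lane, including NaN and signed-zero inputs. The function names are hypothetical, the semantics are plain IEEE (build without -ffast-math), and this only models one lane rather than the actual insns.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Ordered IEEE comparisons: false whenever either operand is a NaN.  */
static bool eq (double a, double b) { return a == b; }
static bool gt (double a, double b) { return a > b; }
static bool ge (double a, double b) { return a >= b; }
static bool lt (double a, double b) { return a < b; }
static bool le (double a, double b) { return a <= b; }

/* Count the (a, b) pairs where one of these rewrites disagrees:
     lt(a,b) == gt(b,a)              (operand swap)
     le(a,b) == ge(b,a)              (operand swap, one compare)
     le(a,b) == gt(b,a) || eq(a,b)   (old split: two compares + IOR)  */
int
count_swap_mismatches (void)
{
  const double vals[] = { -2.0, -0.0, 0.0, 3.5, INFINITY, -INFINITY, NAN };
  const size_t n = sizeof vals / sizeof vals[0];
  int mismatches = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        double a = vals[i], b = vals[j];
        mismatches += lt (a, b) != gt (b, a);
        mismatches += le (a, b) != ge (b, a);
        mismatches += le (a, b) != (gt (b, a) || eq (a, b));
      }
  return mismatches;
}
```

count_swap_mismatches () returns 0: the two forms compute the same value, and the difference is only the instruction count (two compares plus an OR versus a single compare).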

In short, rs6000_emit_vector_compare_inner is only called twice, both
times from rs6000_emit_vector_compare, and it can no longer be entered
with the codes unordered/ordered/ltgt/uneq, so I think it's safe to make
this change in rs6000_emit_vector_compare_inner.  Besides, the proposed
way of handling vector float comparison slightly improves the UNGT and
LE cases.

I constructed a test case, compiled with -O2 -ftree-vectorize
-fno-vect-cost-model on ppc64le, that reaches rs6000_emit_vector_compare
with all 14 vector float comparison codes.  The assembly of most
functions doesn't change with this patch, except for
test_UNGT_{float,double} and test_LE_{float,double}.

One example, before:

          lxvx 12,3,9
          lxvx 11,4,9
          xvcmpgtsp 0,11,12
          xvcmpeqsp 12,12,11
          xxlor 0,0,12
          xxlandc 0,32,0
          stxvx 0,5,9
          addi 9,9,16
          bdnz .L77

vs. 

after (the loop now also gets unrolled):

          lxvx 0,4,10
          lxvx 12,3,10
          addi 9,10,16
          lxvx 11,3,9
          xvcmpgesp 12,0,12
          lxvx 0,4,9
          xvcmpgesp 0,0,11
          xxlandc 12,32,12
          stxvx 12,5,10
          addi 10,10,32
          xxlandc 0,32,0
          stxvx 0,5,9
          bdnz .L77
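The two listings appear to be the loop body of test_UNGT_float (four float lanes per vector, with x loaded via r3 and y via r4). Reading them that way, and assuming vs32 holds a splat of the int value 1 so that xxlandc both inverts the mask and produces the 0/1 result, the two sequences can be modeled lane by lane. This is a hypothetical model written for illustration, not something generated from GCC; all names are made up.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4 /* single-precision lanes in a 128-bit VSX register */

/* A comparison result: each lane is all-ones (true) or all-zeros.  */
typedef struct
{
  uint32_t w[LANES];
} vec_u32;

enum vcmp_op { VCMP_GT, VCMP_GE, VCMP_EQ };

/* Lane-wise model of xvcmpgtsp / xvcmpgesp / xvcmpeqsp.  */
static vec_u32
vcmp (enum vcmp_op op, const float *a, const float *b)
{
  vec_u32 r;
  for (int i = 0; i < LANES; i++)
    {
      bool t = op == VCMP_GT ? a[i] > b[i]
               : op == VCMP_GE ? a[i] >= b[i]
               : a[i] == b[i];
      r.w[i] = t ? UINT32_MAX : 0;
    }
  return r;
}

/* Model of xxlor: x | y.  */
static vec_u32
vor (vec_u32 x, vec_u32 y)
{
  for (int i = 0; i < LANES; i++)
    x.w[i] |= y.w[i];
  return x;
}

/* Model of xxlandc: x & ~y.  */
static vec_u32
vandc (vec_u32 x, vec_u32 y)
{
  for (int i = 0; i < LANES; i++)
    x.w[i] &= ~y.w[i];
  return x;
}

/* Assumed contents of vs32: the int value 1 in every lane.  */
static vec_u32
splat_one (void)
{
  vec_u32 r;
  for (int i = 0; i < LANES; i++)
    r.w[i] = 1;
  return r;
}

/* Before: 1 & ~(gt(y,x) | eq(x,y)), i.e. ~le(x,y) via two compares.  */
static vec_u32
ungt_before (const float *x, const float *y)
{
  return vandc (splat_one (),
                vor (vcmp (VCMP_GT, y, x), vcmp (VCMP_EQ, x, y)));
}

/* After: 1 & ~ge(y,x), the same ~le(x,y) via a single compare.  */
static vec_u32
ungt_after (const float *x, const float *y)
{
  return vandc (splat_one (), vcmp (VCMP_GE, y, x));
}

/* True if both sequences give UNGT(x[i], y[i]) as 0/1 in every lane.  */
bool
ungt_sequences_agree (const float *x, const float *y)
{
  vec_u32 before = ungt_before (x, y);
  vec_u32 after = ungt_after (x, y);
  for (int i = 0; i < LANES; i++)
    {
      uint32_t expect = isnan (x[i]) || isnan (y[i]) || x[i] > y[i];
      if (before.w[i] != expect || after.w[i] != expect)
        return false;
    }
  return true;
}
```

Under these assumptions, the saving is one compare and one xxlor per vector, and the smaller loop body is presumably what allows the unrolling seen in the second listing.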


===============
$ cat test.h

#define UNORD(a, b) (__builtin_isunordered ((a), (b)))
#define ORD(a, b) (!__builtin_isunordered ((a), (b)))
#define LTGT(a, b) (__builtin_islessgreater ((a), (b)))
#define UNEQ(a, b) (!__builtin_islessgreater ((a), (b)))
#define UNGT(a, b) (!__builtin_islessequal ((a), (b)))
#define UNGE(a, b) (!__builtin_isless ((a), (b)))
#define UNLT(a, b) (!__builtin_isgreaterequal ((a), (b)))
#define UNLE(a, b) (!__builtin_isgreater ((a), (b)))
#define GT(a, b) (((a) > (b)))
#define GE(a, b) (((a) >= (b)))
#define LT(a, b) (((a) < (b)))
#define LE(a, b) (((a) <= (b)))
#define EQ(a, b) (((a) == (b)))
#define NE(a, b) (((a) != (b)))

#define TEST_VECT(NAME, TYPE)                                                  \
  __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y,         \
                                                     int *res, int n)          \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      res[i] = NAME (x[i], y[i]);                                              \
  }

===============
$ cat test.c

#include "test.h"

#define TEST(TYPE)                                                             \
  TEST_VECT (UNORD, TYPE)                                                      \
  TEST_VECT (ORD, TYPE)                                                        \
  TEST_VECT (LTGT, TYPE)                                                       \
  TEST_VECT (UNEQ, TYPE)                                                       \
  TEST_VECT (UNGT, TYPE)                                                       \
  TEST_VECT (UNGE, TYPE)                                                       \
  TEST_VECT (UNLT, TYPE)                                                       \
  TEST_VECT (UNLE, TYPE)                                                       \
  TEST_VECT (GT, TYPE)                                                         \
  TEST_VECT (GE, TYPE)                                                         \
  TEST_VECT (LT, TYPE)                                                         \
  TEST_VECT (LE, TYPE)                                                         \
  TEST_VECT (EQ, TYPE)                                                         \
  TEST_VECT (NE, TYPE)

TEST (float)
TEST (double)
===============

Maybe it's good to add a test case with functions test_{UNGT,LE}_{float,double}
that checks that xvcmp{gt,eq}[sd]p are no longer generated.

With the above explanation, does this patch look good to you?

BR,
Kewen

* Re: [PATCH 2/2] rs6000: Refine integer comparison handlings in rs6000_emit_vector_compare
  2022-11-16 18:58   ` Segher Boessenkool
@ 2022-11-17  7:52     ` Kewen.Lin
  2022-11-18 15:18       ` Segher Boessenkool
  0 siblings, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2022-11-17  7:52 UTC
  To: Segher Boessenkool
  Cc: GCC Patches, David Edelsohn, Peter Bergner, Michael Meissner

Hi Segher,

Thanks for the comments!

on 2022/11/17 02:58, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Nov 16, 2022 at 02:51:04PM +0800, Kewen.Lin wrote:
>> The current handling in rs6000_emit_vector_compare seems a bit
>> complicated to me, especially now that we emit the vector float
>> comparison insn with the given code directly.  This patch
>> refines the handling of vector integer comparison operators:
>> the function is no longer recursive, and we don't need the
>> helper function rs6000_emit_vector_compare_inner any more.
> 
> That sounds nice.
> 
>>    /* In vector.md, we support all kinds of vector float point
>>       comparison operators in a comparison rtl pattern, we can
>>       just emit the comparison rtx insn directly here.  Besides,
>>       we should have a centralized place to handle the possibility
>> -     of raising invalid exception.  */
>> -  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
>> +     of raising invalid exception.  Also emit directly for vector
>> +     integer comparison operators EQ/GT/GTU.  */
>> +  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
>> +      || rcode == EQ
>> +      || rcode == GT
>> +      || rcode == GTU)
> 
> The comment still says it handles FP only.  That would be best to keep
> imo: add a separate block of code to handle the integer stuff you want
> to add.  You get the same or better generated code, the compiler is
> smart enough.  Code is for the user to read, and C is not a portable
> assembler language.

OK, I'll make two blocks for FP and integer respectively.  I struggled
a bit when updating the comment in this hunk to cover integer
comparisons, since one could argue that both can share the same
handling if the condition is updated.

> 
> This whole series needs to be factored better, it does way too many
> things, and only marginally related things, at every step.  Or I don't
> see it anyway :-)

OK.  My thinking was that patch 1/2 unifies the current vector float
comparison handling, and patch 2/2 refines the remaining handling
for vector integer comparison.  I'm happy to factor it better; any
suggestions on concrete code are highly appreciated.  :)

BTW, I constructed the test case below; its assembly doesn't change
with this patch.

#define GT(a, b) (((a) > (b)))
#define GE(a, b) (((a) >= (b)))
#define LT(a, b) (((a) < (b)))
#define LE(a, b) (((a) <= (b)))
#define EQ(a, b) (((a) == (b)))
#define NE(a, b) (((a) != (b)))

#define TEST_VECT(NAME, TYPE)                                                  \
  __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y,         \
                                                     int *res, int n)          \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      res[i] = NAME (x[i], y[i]);                                              \
  }

#include <stdint.h>

#define TEST(TYPE)                                                             \
  TEST_VECT (GT, TYPE)                                                         \
  TEST_VECT (GE, TYPE)                                                         \
  TEST_VECT (LT, TYPE)                                                         \
  TEST_VECT (LE, TYPE)                                                         \
  TEST_VECT (EQ, TYPE)                                                         \
  TEST_VECT (NE, TYPE)

TEST (int64_t)
TEST (uint64_t)
TEST (int32_t)
TEST (uint32_t)
TEST (int16_t)
TEST (uint16_t)
TEST (int8_t)
TEST (uint8_t)



BR,
Kewen

* Re: [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare
  2022-11-17  6:59   ` Kewen.Lin
@ 2022-11-18 15:10     ` Segher Boessenkool
  2022-11-21  2:01       ` Kewen.Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2022-11-18 15:10 UTC
  To: Kewen.Lin; +Cc: GCC Patches, David Edelsohn, Peter Bergner, Michael Meissner

Hi!

On Thu, Nov 17, 2022 at 02:59:00PM +0800, Kewen.Lin wrote:
> on 2022/11/17 02:44, Segher Boessenkool wrote:
> > On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
> >> 	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
> >> 	float only comparison operators.
> > 
> > Why?  Is that correct?  Your mail says nothing about this :-(
> > 
> > Is there any testcase that covers this, and that shows things still
> > generate the same code?
> > 
> 
> Sorry for the unclear description; I mistakenly thought it was
> straightforward.
> 
> With the change in this patch, all 14 vector float comparison operators
> (unordered/ordered/eq/ne/gt/lt/ge/le/ungt/unge/unlt/unle/uneq/ltgt)
> are handled early in rs6000_emit_vector_compare.
>
> For unordered/ordered/ltgt/uneq, the new way is exactly the same
> as what we do in rs6000_emit_vector_compare_inner, which means
> rs6000_emit_vector_compare_inner can no longer be reached with any of them.

Ah!  In that case, please add an assert there.  It helps catch problems,
but much more importantly even, it helps the reader understand what is
going on :-)

> For eq/ge/gt, it's the same too, but since they are shared with vector
> integer comparison, I just left them alone here.  I just noticed we can
> safely remove ge too, as it's guarded by !MODE_VECTOR_INT.

ge is nasty for float, it means something different with and without
-ffast-math (with fast-math ge means not lt, le means not gt; both can
be done with a simple single condition, no cror needed).  (Compare to ne,
which is the same with and without -ffast-math; that is because it has a
"not" in its definition!)
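The scalar difference described here is easy to see in plain C: when NaNs are honored (no -ffast-math), ge(a,b) and !lt(a,b) differ exactly when an operand is a NaN, while ne(a,b) and !eq(a,b) always agree. Under -ffast-math the compiler may assume there are no NaNs, so the two forms of ge become interchangeable. A minimal sketch; the function names are made up for illustration.

```c
#include <math.h>
#include <stdbool.h>

/* IEEE semantics: ordered comparisons involving a NaN are false,
   while != is true.  Build without -ffast-math.  */
bool ge_direct (double a, double b) { return a >= b; }
bool ge_as_not_lt (double a, double b) { return !(a < b); }
bool ne_direct (double a, double b) { return a != b; }
bool ne_as_not_eq (double a, double b) { return !(a == b); }
```

For example, ge_direct (NAN, 1.0) is false while ge_as_not_lt (NAN, 1.0) is true, whereas the two ne forms agree for every input.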

> For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code
> with reverse_condition_maybe_unordered and inverts the result, which is
> the same as what we have in vector.md:
> 
> ; unge(a,b) = ~lt(a,b)
> ; unle(a,b) = ~gt(a,b)
> ; ne(a,b)   = ~eq(a,b)
> ; ungt(a,b) = ~le(a,b)
> ; unlt(a,b) = ~ge(a,b)

But for these last two, do we still generate identical code?  Since
forever we have only used cror here (with CCEQ), not crnor etc. (and will
CCEQ still always do the correct thing then?)

> Then eq/ge/gt on the right-hand side match the cases mentioned
> above, so we just need to focus on lt and le.
> 
> For lt, rs6000_emit_vector_compare swaps the operands and changes the
> operator to gt, which is the same as what we have in vector.md:
> 
> ; lt(a,b)   = gt(b,a)
> 
> and the result then matches the gt case mentioned above.
> 
> As to le, rs6000_emit_vector_compare splits it into lt IOR eq
> and then handles lt recursively, that is:
>    le = lt(a,b) || eq(a,b)
>       = gt(b,a) || eq(a,b)
> 
> Actually, this is worse than what vector.md supports:
> 
> ; le(a,b)   = ge(b,a)
> 
> In short, rs6000_emit_vector_compare_inner is only called twice, both
> times from rs6000_emit_vector_compare, and it can no longer be entered
> with the codes unordered/ordered/ltgt/uneq, so I think it's safe to make
> this change in rs6000_emit_vector_compare_inner.  Besides, the proposed
> way of handling vector float comparison slightly improves the UNGT and
> LE cases.

Thanks for the explanation!

Can you do this in multiple steps, which will make it much easier to
review, and to spot the problem if some unexpected problem shows up?

> I constructed a test case, compiled with -O2 -ftree-vectorize
> -fno-vect-cost-model on ppc64le, that reaches rs6000_emit_vector_compare
> with all 14 vector float comparison codes.  The assembly of most
> functions doesn't change with this patch, except for
> test_UNGT_{float,double} and test_LE_{float,double}.

So this should be a separate change, a separate patch, and then the other
patches will show no changes in generated code at all.

> Maybe it's good to add a test case with functions test_{UNGT,LE}_{float,double}
> that checks that xvcmp{gt,eq}[sd]p are no longer generated.

In the patch that changes code gen for those, sure :-)


Segher

* Re: [PATCH 2/2] rs6000: Refine integer comparison handlings in rs6000_emit_vector_compare
  2022-11-17  7:52     ` Kewen.Lin
@ 2022-11-18 15:18       ` Segher Boessenkool
  0 siblings, 0 replies; 10+ messages in thread
From: Segher Boessenkool @ 2022-11-18 15:18 UTC
  To: Kewen.Lin; +Cc: GCC Patches, David Edelsohn, Peter Bergner, Michael Meissner

Hi!

On Thu, Nov 17, 2022 at 03:52:26PM +0800, Kewen.Lin wrote:
> on 2022/11/17 02:58, Segher Boessenkool wrote:
> > On Wed, Nov 16, 2022 at 02:51:04PM +0800, Kewen.Lin wrote:
> >>    /* In vector.md, we support all kinds of vector float point
> >>       comparison operators in a comparison rtl pattern, we can
> >>       just emit the comparison rtx insn directly here.  Besides,
> >>       we should have a centralized place to handle the possibility
> >> -     of raising invalid exception.  */
> >> -  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
> >> +     of raising invalid exception.  Also emit directly for vector
> >> +     integer comparison operators EQ/GT/GTU.  */
> >> +  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
> >> +      || rcode == EQ
> >> +      || rcode == GT
> >> +      || rcode == GTU)
> > 
> > The comment still says it handles FP only.  That would be best to keep
> > imo: add a separate block of code to handle the integer stuff you want
> > to add.  You get the same or better generated code, the compiler is
> > smart enough.  Code is for the user to read, and C is not a portable
> > assembler language.
> 
> OK, I'll make two blocks for FP and integer respectively.  I struggled
> a bit when updating the comment in this hunk to cover integer
> comparisons, since one could argue that both can share the same
> handling if the condition is updated.

The golden rule of programming: if something is hard to explain, you
probably overcomplicated it.  Sometimes more code is much easier to
read, too.

> > This whole series needs to be factored better, it does way too many
> > things, and only marginally related things, at every step.  Or I don't
> > see it anyway :-)
> 
> OK.  My thinking was that patch 1/2 unifies the current vector float
> comparison handling, and patch 2/2 refines the remaining handling
> for vector integer comparison.  I'm happy to factor it better; any
> suggestions on concrete code are highly appreciated.  :)

Often it helps to start with a patch (or patches) that only restructures
existing code, without changing what it does, so that the patch that
does change anything material is smaller and easier to read and review.
The (perhaps big) patch that doesn't change anything is easy to review
as well then, simply because it *says* it does not change anything, and
reviewing it boils down to verifying that is true.


Segher

* Re: [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare
  2022-11-18 15:10     ` Segher Boessenkool
@ 2022-11-21  2:01       ` Kewen.Lin
  2022-11-27 18:16         ` Segher Boessenkool
  0 siblings, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2022-11-21  2:01 UTC
  To: Segher Boessenkool
  Cc: GCC Patches, David Edelsohn, Peter Bergner, Michael Meissner

Hi Segher,

on 2022/11/18 23:10, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Nov 17, 2022 at 02:59:00PM +0800, Kewen.Lin wrote:
>> on 2022/11/17 02:44, Segher Boessenkool wrote:
>>> On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
>>>> 	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
>>>> 	float only comparison operators.
>>>
>>> Why?  Is that correct?  Your mail says nothing about this :-(
>>>
>>> Is there any testcase that covers this, and that shows things still
>>> generate the same code?
>>>
>>
>> Sorry for the unclear description; I mistakenly thought it was
>> straightforward.
>>
>> With the change in this patch, all 14 vector float comparison operators
>> (unordered/ordered/eq/ne/gt/lt/ge/le/ungt/unge/unlt/unle/uneq/ltgt)
>> would be handled early in rs6000_emit_vector_compare.
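For reference, the per-lane meaning of the 14 codes listed above can be modelled in plain C. This is a minimal sketch, not GCC code: the enum `fcmp` and the function `fcmp_lane` are hypothetical names that only mirror the RTL comparison codes, and each call models a single vector lane.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical names mirroring the RTL float comparison codes.  */
enum fcmp
{
  FCMP_UNORDERED, FCMP_ORDERED, FCMP_EQ, FCMP_NE, FCMP_GT, FCMP_LT,
  FCMP_GE, FCMP_LE, FCMP_UNGT, FCMP_UNGE, FCMP_UNLT, FCMP_UNLE,
  FCMP_UNEQ, FCMP_LTGT
};

/* Value of one lane of CODE (A, B).  The plain ordered codes are false
   when either operand is NaN; the UN* codes mean "unordered or ...".  */
bool
fcmp_lane (enum fcmp code, double a, double b)
{
  bool un = isunordered (a, b);

  switch (code)
    {
    case FCMP_UNORDERED: return un;
    case FCMP_ORDERED:   return !un;
    case FCMP_EQ:        return !un && a == b;
    case FCMP_NE:        return un || a != b;
    case FCMP_GT:        return !un && a > b;
    case FCMP_LT:        return !un && a < b;
    case FCMP_GE:        return !un && a >= b;
    case FCMP_LE:        return !un && a <= b;
    case FCMP_UNGT:      return un || a > b;
    case FCMP_UNGE:      return un || a >= b;
    case FCMP_UNLT:      return un || a < b;
    case FCMP_UNLE:      return un || a <= b;
    case FCMP_UNEQ:      return un || a == b;
    case FCMP_LTGT:      return !un && a != b;
    }
  return false;
}
```

For example, with one NaN operand only unordered, ne, ungt, unge, unlt, unle and uneq are true; ltgt and all the plain ordered codes are false.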
>>
>> For unordered/ordered/ltgt/uneq, the new way is exactly the same
>> as what we do in rs6000_emit_vector_compare_inner, which means there is
>> no chance to get into rs6000_emit_vector_compare_inner with any of them.
> 
> Ah!  In that case, please add an assert there.  It helps catch problems,
> but, much more importantly, it helps the reader understand what is
> going on :-)

Good idea, will do.
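A minimal sketch of the suggested assertion follows. It uses a hypothetical, trimmed-down `enum cmp_code` instead of GCC's `enum rtx_code`, and standard `assert` where GCC code would use `gcc_assert`. The real rs6000_emit_vector_compare_inner also takes the operands and mode, emits RTL, and handles more codes; all of that is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for GCC's enum rtx_code.  */
enum cmp_code
{
  CMP_EQ, CMP_GT, CMP_GTU, CMP_GE,
  CMP_ORDERED, CMP_UNORDERED, CMP_UNEQ, CMP_LTGT
};

/* Sketch of the inner emitter once every float-only code is handled by
   the caller.  Reaching it with one of those codes would be a bug, and
   the assertion documents that for the reader.  Returns true if this
   sketch handles CODE directly.  */
bool
vector_compare_inner_sketch (enum cmp_code code)
{
  assert (code != CMP_ORDERED && code != CMP_UNORDERED
          && code != CMP_UNEQ && code != CMP_LTGT);

  switch (code)
    {
    case CMP_EQ:
    case CMP_GT:
    case CMP_GTU:
      return true;
    default:
      return false;
    }
}
```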

> 
>> For eq/ge/gt, it's the same too, but they are shared with vector integer
>> comparison, I just left them alone here.  Just noticed we can remove ge
>> safely too as it's guarded with !MODE_VECTOR_INT.
> 
> ge is nasty for float: it means something different with and without
> -ffast-math (with fast-math, ge means not lt and le means not gt; both can
> be done with a single simple condition, no cror needed).  (Compare to ne,
> which is the same with and without -ffast-math; that is because it has a
> "not" in its definition!)
> 

That's true for scalar float comparison, but the context here is vector
comparison: the result of the comparison is still a vector (of booleans),
and we have the corresponding vector comparison instruction for ge, so I
think it should be fine here.
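A scalar sketch of the distinction being discussed: per lane, an IEEE ge(a,b) and "not lt(a,b)" agree on ordered inputs but disagree when either operand is NaN. -ffast-math implies -ffinite-math-only, which lets the compiler assume the NaN case never happens, so the two forms are interchangeable only under that assumption. The function names are illustrative, and the sketch itself must be compiled without -ffast-math.

```c
#include <math.h>   /* NAN, for callers.  */
#include <stdbool.h>

/* IEEE ge: an ordered comparison, false if either operand is NaN.  */
bool
ge_ieee (double a, double b)
{
  return a >= b;
}

/* "Not lt": true if either operand is NaN, so it is really unge.  */
bool
not_lt (double a, double b)
{
  return !(a < b);
}
```

For example, ge_ieee (NAN, 1.0) is false while not_lt (NAN, 1.0) is true; for any pair of non-NaN values the two return the same result.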

>> For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code
>> with reverse_condition_maybe_unordered and inverts the result, which is the
>> same as what we have in vector.md.
>>
>> ; unge(a,b) = ~lt(a,b)
>> ; unle(a,b) = ~gt(a,b)
>> ; ne(a,b)   = ~eq(a,b)
>> ; ungt(a,b) = ~le(a,b)
>> ; unlt(a,b) = ~ge(a,b)
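The five identities above can be checked exhaustively over a small sample of values that includes signed zeros, infinities and NaN. This is a plain C sketch (count_identity_failures is a hypothetical helper). It models each lane as a scalar comparison where the un* codes mean "unordered or ...", and it should report zero failures when built without -ffast-math.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Count (a, b) pairs from a fixed sample for which one of the
   vector.md identities does not hold.  */
int
count_identity_failures (void)
{
  const double v[] = { -INFINITY, -1.0, -0.0, 0.0, 1.0, INFINITY, NAN };
  const size_t n = sizeof v / sizeof v[0];
  int failures = 0;

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        double a = v[i], b = v[j];
        bool un = isunordered (a, b);

        /* Left: the unordered-or code.  Right: the inverted compare.  */
        failures += ((un || a >= b) != !(a < b));   /* unge = ~lt */
        failures += ((un || a <= b) != !(a > b));   /* unle = ~gt */
        failures += ((a != b) != !(a == b));        /* ne   = ~eq */
        failures += ((un || a > b) != !(a <= b));   /* ungt = ~le */
        failures += ((un || a < b) != !(a >= b));   /* unlt = ~ge */
      }
  return failures;
}
```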
> 
> But for these last two, do we still generate identical code?  Since
> forever we have only used cror here (with CCEQ), not crnor etc. (and will
> CCEQ still do the correct thing always then?)

For unlt(a,b) = ~ge(a,b), yes; for ungt(a,b) = ~le(a,b), no, as explained in my earlier reply quoted below.

> 
>> Then eq/ge/gt on the right side would match the cases that were mentioned
>> above.  So we just need to focus on lt and le then.
>>
>> For lt, rs6000_emit_vector_compare swaps operands and the operator to gt,
>> it's the same as what we have in vector.md:
>>
>> ; lt(a,b)   = gt(b,a)
>>
>> , and further matches the case mentioned above.
>>
>> As to le, rs6000_emit_vector_compare tries to split it into lt IOR eq,
>> and further handles lt recursively, that is:
>>    le = lt(a,b) || eq(a,b)
>>       = gt(b,a) || eq(a,b)
>>
>> actually this is worse than what vector.md supports:
>>
>> ; le(a,b)   = ge(b,a)
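To illustrate why the le(a,b) = ge(b,a) form is preferable, the sketch below models a 4 x float vector compare as all-ones/all-zeros lane masks (the form the VSX xvcmp* instructions produce). It compares the old lowering, two compares plus an OR, with the single ge compare. The types and the vcmp_* and vor helpers are hypothetical portable stand-ins, not AltiVec/VSX intrinsics.

```c
#include <math.h>   /* NAN, used in the example below.  */
#include <stdint.h>

/* Hypothetical portable model of a 4 x float vector and of the
   all-ones / all-zeros lane masks a vector compare produces.  */
typedef struct { float f[4]; } vf4;
typedef struct { uint32_t lane[4]; } mask4;

static mask4
vcmp_gt (vf4 a, vf4 b)
{
  mask4 m;
  for (int i = 0; i < 4; i++)
    m.lane[i] = a.f[i] > b.f[i] ? UINT32_MAX : 0;
  return m;
}

static mask4
vcmp_ge (vf4 a, vf4 b)
{
  mask4 m;
  for (int i = 0; i < 4; i++)
    m.lane[i] = a.f[i] >= b.f[i] ? UINT32_MAX : 0;
  return m;
}

static mask4
vcmp_eq (vf4 a, vf4 b)
{
  mask4 m;
  for (int i = 0; i < 4; i++)
    m.lane[i] = a.f[i] == b.f[i] ? UINT32_MAX : 0;
  return m;
}

static mask4
vor (mask4 x, mask4 y)
{
  mask4 m;
  for (int i = 0; i < 4; i++)
    m.lane[i] = x.lane[i] | y.lane[i];
  return m;
}

/* Old lowering: le (a, b) = gt (b, a) | eq (a, b); two compares + OR.  */
mask4
le_old (vf4 a, vf4 b)
{
  return vor (vcmp_gt (b, a), vcmp_eq (a, b));
}

/* vector.md form: le (a, b) = ge (b, a); a single compare.  */
mask4
le_new (vf4 a, vf4 b)
{
  return vcmp_ge (b, a);
}
```

With a = {1, 2, NAN, 3} and b = {2, 2, 1, 1}, both functions give all-ones in lanes 0 and 1 and zero in lanes 2 and 3. The results match in every lane, including the NaN lane, so only the instruction count differs.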
>>
>> In short, the function rs6000_emit_vector_compare_inner is only called
>> twice in rs6000_emit_vector_compare, and there is no chance to enter
>> rs6000_emit_vector_compare_inner with codes unordered/ordered/ltgt/uneq
>> any more, so I think it's safe to make the change in function
>> rs6000_emit_vector_compare_inner.  Besides, the proposed way to handle
>> vector float comparison slightly improves the UNGT and LE handling.
> 
> Thanks for the explanation!
> 
> Can you do this in multiple steps, which will make it much easier to
> review, and to spot the problem if some unexpected problem shows up?

Sure, I'll try my best to separate it into some steps and show how it
evolves gradually.

> 
>> I constructed a test case, compiled with option -O2 -ftree-vectorize
>> -fno-vect-cost-model on ppc64le, which goes into this function
>> rs6000_emit_vector_compare with all 14 vector float comparison codes,
>> the assembly of most functions doesn't change after this patch,
>> except for test_UNGT_{float,double} and test_LE_{float,double}.
> 
> So, this is a separate change, a separate patch, and the other patches will
> show no changes in generated code at all.

Good point, will separate it.

> 
>> Maybe it's good to add one test case with function test_{UNGT,LE}_{float,double}
>> and scan not xvcmp{gt,eq}[sd]p.
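A hypothetical reduced version of such a test case could look like the following. Only the float variants are shown (the double ones are analogous), and the function bodies are guesses, not Kewen's actual test. The idea would be to compile with -O2 -ftree-vectorize -fno-vect-cost-model on powerpc64le and inspect the xvcmp* instructions. With NaNs honored, !(a <= b) is the UNGT predicate, but whether it reaches rs6000_emit_vector_compare as UNGT may depend on options such as -fno-trapping-math; treat that as an assumption.

```c
/* Hypothetical reduced test; see the lead-in for the assumed options.  */
#define N 64

/* UNGT: "unordered or greater than", written as the negation of an
   ordered <=.  */
void
test_UNGT_float (const float *restrict a, const float *restrict b,
                 int *restrict r)
{
  for (int i = 0; i < N; i++)
    r[i] = !(a[i] <= b[i]);
}

/* LE: ordered less than or equal.  */
void
test_LE_float (const float *restrict a, const float *restrict b,
               int *restrict r)
{
  for (int i = 0; i < N; i++)
    r[i] = a[i] <= b[i];
}
```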
> 
> In the patch that changes code gen for those, sure :-)
> 

Thanks for all the comments again.

BR,
Kewen


* Re: [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare
  2022-11-21  2:01       ` Kewen.Lin
@ 2022-11-27 18:16         ` Segher Boessenkool
  0 siblings, 0 replies; 10+ messages in thread
From: Segher Boessenkool @ 2022-11-27 18:16 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: GCC Patches, David Edelsohn, Peter Bergner, Michael Meissner

Hi!

Whoops I missed following up to this.

On Mon, Nov 21, 2022 at 10:01:14AM +0800, Kewen.Lin wrote:
> on 2022/11/18 23:10, Segher Boessenkool wrote:
> > ge is nasty for float: it means something different with and without
> > -ffast-math (with fast-math, ge means not lt and le means not gt; both can
> > be done with a single simple condition, no cror needed).  (Compare to ne,
> > which is the same with and without -ffast-math; that is because it has a
> > "not" in its definition!)
> 
> That's true for scalar float comparison, but the context here is vector
> comparison: the result of the comparison is still a vector (of booleans),
> and we have the corresponding vector comparison instruction for ge, so I
> think it should be fine here.

It is fine if all contexts it is used in allow ge insns, sure.  But you
have to make sure that is true; ge is still nasty, it truly means
something different with fast-math (which applies to vector float just
the same as it does to scalar float).

> > Thanks for the explanation!
> > 
> > Can you do this in multiple steps, which will make it much easier to
> > review, and to spot the problem if some unexpected problem shows up?
> 
> Sure, I'll try my best to separate it into some steps and show how it
> evolves gradually.

If you can make the bulk of the series not actually change code
generation, just rearrange and massage the compiler code, that is much
easier to review (and, as a bonus, it also helps to spot the problems if
there are regressions).

Cheers,


Segher



Thread overview: 10+ messages
2022-11-16  6:48 [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare Kewen.Lin
2022-11-16  6:51 ` [PATCH 2/2] rs6000: Refine integer comparison handlings " Kewen.Lin
2022-11-16 18:58   ` Segher Boessenkool
2022-11-17  7:52     ` Kewen.Lin
2022-11-18 15:18       ` Segher Boessenkool
2022-11-16 18:44 ` [PATCH 1/2] rs6000: Emit vector fp comparison directly " Segher Boessenkool
2022-11-17  6:59   ` Kewen.Lin
2022-11-18 15:10     ` Segher Boessenkool
2022-11-21  2:01       ` Kewen.Lin
2022-11-27 18:16         ` Segher Boessenkool
