[PATCH 0/9] rs6000: Rework rs6000_emit_vector

public inbox for gcc-patches@gcc.gnu.org

* [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare
@ 2022-11-24  9:15 Kewen Lin
  2022-11-24  9:15 ` [PATCH 1/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1 Kewen Lin
                   ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: Kewen Lin @ 2022-11-24  9:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kewen Lin, segher, dje.gcc, bergner, meissner

Hi,

Following Segher's suggestion, this patch series reworks
function rs6000_emit_vector_compare for vector float and int
in multiple steps.  It's based on the previous attempts [1][2].
As mentioned in [1], the float rework is needed to make one
centralized place for vector float comparison handling,
instead of supporting it with operand swapping, code reversing
etc. scattered in different places.  It also prepares for a
subsequent patch to handle comparison operators with or
without trapping math (PR105480).  With the vector float
handling reworked, we can further simplify the vector int
handling, as the later patches show.

For Segher's concern about whether this rework causes any
assembly change, I previously constructed two test cases, for
vector float [3] and int [4] respectively.  They showed that
most of the generated code is unchanged, except for a
difference on LE and UNGT.  That difference is an improvement,
since the new code uses GE instead of GT ior EQ.  The
associated test case in patch 3/9 is a good example.

Besides, I built the whole SPEC2017 at -O3 and -Ofast
separately, with and without the whole patch series, and
checked the differences in the object assembly.  Most object
files are unchanged, except for:

  * at -O3, 521.wrf_r has 9 object files and 526.blender_r has
    9 object files with differences.

  * at -Ofast, 521.wrf_r has 12 object files, 526.blender_r has
    one and 527.cam4_r has 4 object files with differences.

Looking into these differences, all significant ones are
caused by the known improvement mentioned above, which
transforms GT ior EQ into GE and can also affect unrolling
decisions due to the changed insn count.  The other, trivial
differences are branch target offsets, nops for alignment,
vsx register numbers etc.

I also evaluated the runtime performance of these changed
benchmarks; the result is neutral.

These patches are bootstrapped and regress-tested
incrementally on powerpc64-linux-gnu P7 & P8, and
powerpc64le-linux-gnu P9 & P10.

Is it ok for trunk?

BR,
Kewen
-----
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606375.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606376.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606504.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606506.html

Kewen Lin (9):
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5

 gcc/config/rs6000/rs6000.cc                 | 180 ++++++--------------
 gcc/testsuite/gcc.target/powerpc/vcond-fp.c |  25 +++
 2 files changed, 74 insertions(+), 131 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c

-- 
2.27.0


* [PATCH 1/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
  2022-11-24  9:15 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen Lin
@ 2022-11-24  9:15 ` Kewen Lin
  2022-11-24  9:15 ` [PATCH 2/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2 Kewen Lin
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Kewen Lin @ 2022-11-24  9:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kewen Lin, segher, dje.gcc, bergner, meissner

All kinds of vector float comparison operators are already
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare, instead of
checking and handling the reverse condition cases.

This is part 1.  It only handles the operators which were
previously already emitted as an rtx comparison in function
rs6000_emit_vector_compare_inner: EQ/GT/GE/ORDERED/UNORDERED/
UNEQ/LTGT.  There is no functionality change.

With this change, rs6000_emit_vector_compare_inner only
handles vector integer comparison.  It will be cleaned up
later in the vector integer comparison rework.

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Move
	MODE_VECTOR_FLOAT handlings out.
	(rs6000_emit_vector_compare): Emit rtx comparison for operators EQ/GT/
	GE/UNORDERED/ORDERED/UNEQ/LTGT of MODE_VECTOR_FLOAT directly, and
	adjust one call site of rs6000_emit_vector_compare_inner to
	rs6000_emit_vector_compare.
---
 gcc/config/rs6000/rs6000.cc | 47 ++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d2743f7bce6..5a8f7ff3bf8 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15644,7 +15644,6 @@ output_cbranch (rtx op, const char *label, int reversed, rtx_insn *insn)
 static rtx
 rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
 {
-  rtx mask;
   machine_mode mode = GET_MODE (op0);
 
   switch (code)
@@ -15652,19 +15651,11 @@ rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
     default:
       break;
 
-    case GE:
-      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
-	return NULL_RTX;
-      /* FALLTHRU */
-
     case EQ:
     case GT:
     case GTU:
-    case ORDERED:
-    case UNORDERED:
-    case UNEQ:
-    case LTGT:
-      mask = gen_reg_rtx (mode);
+      gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
+      rtx mask = gen_reg_rtx (mode);
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
       return mask;
     }
@@ -15680,18 +15671,42 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
 			    rtx op0, rtx op1,
 			    machine_mode dmode)
 {
-  rtx mask;
-  bool swap_operands = false;
-  bool try_again = false;
-
   gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode));
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));
+  rtx mask;
+
+  /* In vector.md, we support all kinds of vector float point
+     comparison operators in a comparison rtl pattern, we can
+     just emit the comparison rtx insn directly here.  Besides,
+     we should have a centralized place to handle the possibility
+     of raising invalid exception.  As the first step, only check
+     operators EQ/GT/GE/UNORDERED/ORDERED/LTGT/UNEQ for now, they
+     are handled equivalently as before.
+
+     FIXME: Handle the remaining vector float comparison operators
+     here.  */
+  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
+      && (rcode == EQ
+	  || rcode == GT
+	  || rcode == GE
+	  || rcode == UNORDERED
+	  || rcode == ORDERED
+	  || rcode == LTGT
+	  || rcode == UNEQ))
+    {
+      mask = gen_reg_rtx (dmode);
+      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
+      return mask;
+    }
 
   /* See if the comparison works as is.  */
   mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
   if (mask)
     return mask;
 
+  bool swap_operands = false;
+  bool try_again = false;
+
   switch (rcode)
     {
     case LT:
@@ -15791,7 +15806,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
       if (swap_operands)
 	std::swap (op0, op1);
 
-      mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
+      mask = rs6000_emit_vector_compare (rcode, op0, op1, dmode);
       if (mask)
 	return mask;
     }
-- 
2.27.0



* [PATCH 2/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
  2022-11-24  9:15 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen Lin
  2022-11-24  9:15 ` [PATCH 1/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1 Kewen Lin
@ 2022-11-24  9:15 ` Kewen Lin
  2022-11-24  9:15 ` [PATCH 3/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3 Kewen Lin
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Kewen Lin @ 2022-11-24  9:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kewen Lin, segher, dje.gcc, bergner, meissner

All kinds of vector float comparison operators are already
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare, instead of
checking and handling the reverse condition cases.

This is part 2.  It further handles comparison operators
NE/UNLE/UNLT.  In rs6000_emit_vector_compare, they were
handled by emitting the reversed code, which is queried from
function reverse_condition_maybe_unordered, and inverting the
result with one_cmpl_optab.  It's the same as what we have in
vector.md:

; ne(a,b)   = ~eq(a,b)
; unle(a,b) = ~gt(a,b)
; unlt(a,b) = ~ge(a,b)

The operators on the right side have been supported in part 1.
This patch should not cause any functionality change either.
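
As an illustration only (not part of the patch), the three
identities above can be checked on scalars with a small C sketch.
It assumes IEEE semantics for the C relational operators, and all
helper names in it are hypothetical:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Reference semantics: NE and the "UN" operators are also true when
   the operands are unordered, i.e. at least one of them is a NaN.  */
static bool ref_ne (double a, double b)
{
  return isunordered (a, b) || islessgreater (a, b);
}

static bool ref_unle (double a, double b)
{
  return isunordered (a, b) || islessequal (a, b);
}

static bool ref_unlt (double a, double b)
{
  return isunordered (a, b) || isless (a, b);
}

/* Inverted forms, mirroring the vector.md identities quoted above.  */
static bool inv_ne (double a, double b)   { return !(a == b); }
static bool inv_unle (double a, double b) { return !(a > b); }
static bool inv_unlt (double a, double b) { return !(a >= b); }

/* Compare both forms on every pair of a few values, NaN included.  */
static void check_inverted_float_identities (void)
{
  const double vals[] = { -1.0, 0.0, 1.0, INFINITY, NAN };
  const size_t n = sizeof vals / sizeof vals[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
	assert (inv_ne (vals[i], vals[j]) == ref_ne (vals[i], vals[j]));
	assert (inv_unle (vals[i], vals[j]) == ref_unle (vals[i], vals[j]));
	assert (inv_unlt (vals[i], vals[j]) == ref_unlt (vals[i], vals[j]));
      }
}
```

Calling check_inverted_float_identities () from a test driver
exercises all pairs.  The NaN entries are the interesting ones:
there the plain EQ/GT/GE are false, so the inverted forms are true.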

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Emit rtx
	comparison for operators NE/UNLE/UNLT of MODE_VECTOR_FLOAT directly.
---
 gcc/config/rs6000/rs6000.cc | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 5a8f7ff3bf8..09299bef6a2 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15679,20 +15679,18 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
      comparison operators in a comparison rtl pattern, we can
      just emit the comparison rtx insn directly here.  Besides,
      we should have a centralized place to handle the possibility
-     of raising invalid exception.  As the first step, only check
-     operators EQ/GT/GE/UNORDERED/ORDERED/LTGT/UNEQ for now, they
-     are handled equivalently as before.
+     of raising invalid exception.  For EQ/GT/GE/UNORDERED/
+     ORDERED/LTGT/UNEQ, they are handled equivalently as before;
+     for NE/UNLE/UNLT, they are handled with reversed code
+     and inverting, it's the same as before.
 
      FIXME: Handle the remaining vector float comparison operators
      here.  */
   if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
-      && (rcode == EQ
-	  || rcode == GT
-	  || rcode == GE
-	  || rcode == UNORDERED
-	  || rcode == ORDERED
-	  || rcode == LTGT
-	  || rcode == UNEQ))
+      && rcode != LT
+      && rcode != LE
+      && rcode != UNGE
+      && rcode != UNGT)
     {
       mask = gen_reg_rtx (dmode);
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
@@ -15720,8 +15718,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
       try_again = true;
       break;
     case NE:
-    case UNLE:
-    case UNLT:
     case UNGE:
     case UNGT:
       /* Invert condition and try again.
-- 
2.27.0



* [PATCH 3/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
  2022-11-24  9:15 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen Lin
  2022-11-24  9:15 ` [PATCH 1/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1 Kewen Lin
  2022-11-24  9:15 ` [PATCH 2/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2 Kewen Lin
@ 2022-11-24  9:15 ` Kewen Lin
  2022-11-24  9:15 ` [PATCH 4/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4 Kewen Lin
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Kewen Lin @ 2022-11-24  9:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kewen Lin, segher, dje.gcc, bergner, meissner

All kinds of vector float comparison operators are already
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare, instead of
checking and handling the reverse condition cases.

This is part 3.  It further handles comparison operators
LE/UNGT.  In rs6000_emit_vector_compare, UNGT was handled
with the reversed code LE and inverting with one_cmpl_optab,
and LE was handled with LT ior EQ, while vector.md supports
them as:

; le(a,b)   = ge(b,a)
; ungt(a,b) = ~le(a,b)

The associated test case shows it's an improvement.
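
To illustrate why the two expansions of LE agree, here is a
scalar C sketch (illustration only, with hypothetical helper
names, assuming IEEE semantics for the C relational operators).
It models the old form as two comparisons joined by ior and the
new form as a single GE with swapped operands:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Old expansion of LE: two comparisons joined by an inclusive or.  */
static bool le_via_gt_ior_eq (float a, float b)
{
  return (b > a) | (a == b);
}

/* New expansion: le(a,b) = ge(b,a), a single comparison.  */
static bool le_via_ge (float a, float b)
{
  return b >= a;
}

/* ungt(a,b) = ~le(a,b).  */
static bool ungt_via_le (float a, float b)
{
  return !le_via_ge (a, b);
}

/* Check both LE forms and UNGT on every pair, NaN included.  */
static void check_le_ungt (void)
{
  const float vals[] = { -2.0f, 0.0f, 2.0f, INFINITY, NAN };
  const size_t n = sizeof vals / sizeof vals[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
	float a = vals[i], b = vals[j];
	bool ref_le = islessequal (a, b) != 0;
	bool ref_ungt = isunordered (a, b) || isgreater (a, b);

	assert (le_via_gt_ior_eq (a, b) == ref_le);
	assert (le_via_ge (a, b) == ref_le);
	assert (ungt_via_le (a, b) == ref_ungt);
      }
}
```

Both LE forms give the same truth value for every pair, NaN
included, so only the number of comparisons differs.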

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Emit rtx
	comparison for operators LE/UNGT of MODE_VECTOR_FLOAT directly.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/vcond-fp.c: New test.
---
 gcc/config/rs6000/rs6000.cc                 |  9 ++++----
 gcc/testsuite/gcc.target/powerpc/vcond-fp.c | 25 +++++++++++++++++++++
 2 files changed, 29 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 09299bef6a2..98754805bd2 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15682,15 +15682,15 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
      of raising invalid exception.  For EQ/GT/GE/UNORDERED/
      ORDERED/LTGT/UNEQ, they are handled equivalently as before;
      for NE/UNLE/UNLT, they are handled with reversed code
-     and inverting, it's the same as before.
+     and inverting, it's the same as before; for LE/UNGT, they
+     are handled with LE ior EQ previously, emitting directly
+     here will make use of GE later, it's slightly better;
 
      FIXME: Handle the remaining vector float comparison operators
      here.  */
   if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
       && rcode != LT
-      && rcode != LE
-      && rcode != UNGE
-      && rcode != UNGT)
+      && rcode != UNGE)
     {
       mask = gen_reg_rtx (dmode);
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
@@ -15719,7 +15719,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
       break;
     case NE:
     case UNGE:
-    case UNGT:
       /* Invert condition and try again.
 	 e.g., A != B becomes ~(A==B).  */
       {
diff --git a/gcc/testsuite/gcc.target/powerpc/vcond-fp.c b/gcc/testsuite/gcc.target/powerpc/vcond-fp.c
new file mode 100644
index 00000000000..b71861d3588
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vcond-fp.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -mpower8-vector" } */
+
+/* Test we use xvcmpge[sd]p rather than xvcmpeq[sd]p and xvcmpgt[sd]p
+   for UNGT and LE handlings.  */
+
+#define UNGT(a, b) (!__builtin_islessequal ((a), (b)))
+#define LE(a, b) (((a) <= (b)))
+
+#define TEST_VECT(NAME, TYPE)                                                  \
+  __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y,         \
+						     int *res, int n)          \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      res[i] = NAME (x[i], y[i]);                                              \
+  }
+
+#define TEST(TYPE)                                                             \
+  TEST_VECT (UNGT, TYPE)                                                       \
+  TEST_VECT (LE, TYPE)
+
+TEST (float)
+TEST (double)
+
+/* { dg-final { scan-assembler-not {\mxvcmp(gt|eq)[sd]p\M} } } */
-- 
2.27.0



* [PATCH 4/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
  2022-11-24  9:15 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen Lin
                   ` (2 preceding siblings ...)
  2022-11-24  9:15 ` [PATCH 3/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3 Kewen Lin
@ 2022-11-24  9:15 ` Kewen Lin
  2022-11-24  9:15 ` [PATCH 5/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1 Kewen Lin
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Kewen Lin @ 2022-11-24  9:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kewen Lin, segher, dje.gcc, bergner, meissner

All kinds of vector float comparison operators are already
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare, instead of
checking and handling the reverse condition cases.

This is part 4.  It further handles comparison operators
LT/UNGE.  In rs6000_emit_vector_compare, LT was handled by
switching to code GT, swapping the operands and trying again,
which is exactly the same as what we have in vector.md:

; lt(a,b)   = gt(b,a)

As to UNGE, rs6000_emit_vector_compare used the reversed code
LT and then inverted the result with one_cmpl, which is also
the same as what's in vector.md:

; unge(a,b) = ~lt(a,b)

This patch should not cause any functionality change either.
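
The following C sketch (illustration only, not part of the
patch) models a 4-lane comparison mask to show the two
identities together.  The lane count, the all-ones/all-zeros
mask encoding and every function name are assumptions made for
the example:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define LANES 4

/* Lane-wise GT: all-ones where a[i] > b[i], zero otherwise.  */
static void lanes_gt (const float *a, const float *b, uint32_t *mask)
{
  for (int i = 0; i < LANES; i++)
    mask[i] = a[i] > b[i] ? UINT32_MAX : 0;
}

/* lt(a,b) = gt(b,a): same comparison with swapped operands.  */
static void lanes_lt (const float *a, const float *b, uint32_t *mask)
{
  lanes_gt (b, a, mask);
}

/* unge(a,b) = ~lt(a,b): one's complement of the LT mask.  */
static void lanes_unge (const float *a, const float *b, uint32_t *mask)
{
  lanes_lt (a, b, mask);
  for (int i = 0; i < LANES; i++)
    mask[i] = ~mask[i];
}

/* Check each lane against the <math.h> comparison macros.  */
static void check_lt_unge_lanes (void)
{
  const float a[LANES] = { 1.0f, 2.0f, NAN, 3.0f };
  const float b[LANES] = { 2.0f, 2.0f, 1.0f, NAN };
  uint32_t lt[LANES], unge[LANES];

  lanes_lt (a, b, lt);
  lanes_unge (a, b, unge);

  for (int i = 0; i < LANES; i++)
    {
      uint32_t want_lt = isless (a[i], b[i]) ? UINT32_MAX : 0;
      uint32_t want_unge
	= (isunordered (a[i], b[i]) || isgreaterequal (a[i], b[i]))
	  ? UINT32_MAX : 0;
      assert (lt[i] == want_lt);
      assert (unge[i] == want_unge);
    }
}
```

The two lanes holding a NaN end up set in the UNGE mask, because
the LT mask is zero there and the complement turns it into
all-ones.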

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Move
	assertion of no MODE_VECTOR_FLOAT to function beginning.
	(rs6000_emit_vector_compare): Emit rtx comparison for operators LT/UNGE
	of MODE_VECTOR_FLOAT directly.
---
 gcc/config/rs6000/rs6000.cc | 24 ++++--------------------
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 98754805bd2..94e039649f5 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15645,6 +15645,7 @@ static rtx
 rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
 {
   machine_mode mode = GET_MODE (op0);
+  gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
 
   switch (code)
     {
@@ -15654,7 +15655,6 @@ rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
     case EQ:
     case GT:
     case GTU:
-      gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
       rtx mask = gen_reg_rtx (mode);
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
       return mask;
@@ -15679,18 +15679,8 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
      comparison operators in a comparison rtl pattern, we can
      just emit the comparison rtx insn directly here.  Besides,
      we should have a centralized place to handle the possibility
-     of raising invalid exception.  For EQ/GT/GE/UNORDERED/
-     ORDERED/LTGT/UNEQ, they are handled equivalently as before;
-     for NE/UNLE/UNLT, they are handled with reversed code
-     and inverting, it's the same as before; for LE/UNGT, they
-     are handled with LE ior EQ previously, emitting directly
-     here will make use of GE later, it's slightly better;
-
-     FIXME: Handle the remaining vector float comparison operators
-     here.  */
-  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
-      && rcode != LT
-      && rcode != UNGE)
+     of raising invalid exception.  */
+  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
     {
       mask = gen_reg_rtx (dmode);
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
@@ -15718,23 +15708,17 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
       try_again = true;
       break;
     case NE:
-    case UNGE:
       /* Invert condition and try again.
 	 e.g., A != B becomes ~(A==B).  */
       {
-	enum rtx_code rev_code;
 	enum insn_code nor_code;
 	rtx mask2;
 
-	rev_code = reverse_condition_maybe_unordered (rcode);
-	if (rev_code == UNKNOWN)
-	  return NULL_RTX;
-
 	nor_code = optab_handler (one_cmpl_optab, dmode);
 	if (nor_code == CODE_FOR_nothing)
 	  return NULL_RTX;
 
-	mask2 = rs6000_emit_vector_compare (rev_code, op0, op1, dmode);
+	mask2 = rs6000_emit_vector_compare (EQ, op0, op1, dmode);
 	if (!mask2)
 	  return NULL_RTX;
 
-- 
2.27.0



* [PATCH 5/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
  2022-11-24  9:15 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen Lin
                   ` (3 preceding siblings ...)
  2022-11-24  9:15 ` [PATCH 4/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4 Kewen Lin
@ 2022-11-24  9:15 ` Kewen Lin
  2022-11-24  9:15 ` [PATCH 6/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2 Kewen Lin
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Kewen Lin @ 2022-11-24  9:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kewen Lin, segher, dje.gcc, bergner, meissner

The current handling in rs6000_emit_vector_compare is a bit
complicated, especially now that we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 1.  It removes the helper function
rs6000_emit_vector_compare_inner and moves its logic into
rs6000_emit_vector_compare.  This patch doesn't introduce any
functionality change.

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove.
	(rs6000_emit_vector_compare): Emit rtx comparison for operators EQ/
	GT/GTU directly.
---
 gcc/config/rs6000/rs6000.cc | 37 +++++++++----------------------------
 1 file changed, 9 insertions(+), 28 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 94e039649f5..0a5826800c5 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15639,30 +15639,6 @@ output_cbranch (rtx op, const char *label, int reversed, rtx_insn *insn)
   return string;
 }
 
-/* Return insn for VSX or Altivec comparisons.  */
-
-static rtx
-rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
-{
-  machine_mode mode = GET_MODE (op0);
-  gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
-
-  switch (code)
-    {
-    default:
-      break;
-
-    case EQ:
-    case GT:
-    case GTU:
-      rtx mask = gen_reg_rtx (mode);
-      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
-      return mask;
-    }
-
-  return NULL_RTX;
-}
-
 /* Emit vector compare for operands OP0 and OP1 using code RCODE.
    DMODE is expected destination mode. This is a recursive function.  */
 
@@ -15687,10 +15663,15 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
       return mask;
     }
 
-  /* See if the comparison works as is.  */
-  mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
-  if (mask)
-    return mask;
+  /* For any of vector integer comparison operators for which we
+     have direct hardware instructions, just emit it directly
+     here.  */
+  if (rcode == EQ || rcode == GT || rcode == GTU)
+    {
+      mask = gen_reg_rtx (dmode);
+      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
+      return mask;
+    }
 
   bool swap_operands = false;
   bool try_again = false;
-- 
2.27.0



* [PATCH 6/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
  2022-11-24  9:15 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen Lin
                   ` (4 preceding siblings ...)
  2022-11-24  9:15 ` [PATCH 5/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1 Kewen Lin
@ 2022-11-24  9:15 ` Kewen Lin
  2022-11-24  9:15 ` [PATCH 7/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3 Kewen Lin
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Kewen Lin @ 2022-11-24  9:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kewen Lin, segher, dje.gcc, bergner, meissner

The current handling in rs6000_emit_vector_compare is a bit
complicated, especially now that we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 2.  It refactors the handling of LT and LTU.
This patch doesn't introduce any functionality change.
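
The LT/LTU handling relies on lt{,u}(a,b) = gt{,u}(b,a), where
the signedness of the code has to be kept when it is swapped.
A scalar C sketch of that (illustration only; the one-lane mask
model and the helper names are assumptions of the example):

```c
#include <assert.h>
#include <stdint.h>

/* One lane of signed GT: all-ones if a > b, zero otherwise.  */
static uint32_t mask_gt (int32_t a, int32_t b)
{
  return a > b ? UINT32_MAX : 0;
}

/* One lane of unsigned GTU.  */
static uint32_t mask_gtu (uint32_t a, uint32_t b)
{
  return a > b ? UINT32_MAX : 0;
}

/* lt(a,b) = gt(b,a): keep the signed code, swap the operands.  */
static uint32_t mask_lt (int32_t a, int32_t b)
{
  return mask_gt (b, a);
}

/* ltu(a,b) = gtu(b,a): keep the unsigned code, swap the operands.  */
static uint32_t mask_ltu (uint32_t a, uint32_t b)
{
  return mask_gtu (b, a);
}

static void check_int_lt_swap (void)
{
  /* The same bit pattern compares differently as signed and unsigned.  */
  assert (mask_lt (-1, 1) == UINT32_MAX);
  assert (mask_ltu ((uint32_t) -1, 1u) == 0);
  assert (mask_lt (5, 5) == 0);
  assert (mask_ltu (1u, 2u) == UINT32_MAX);
}
```

The first two assertions use the same bit pattern for the first
operand, which is why LT has to map to GT and LTU to GTU rather
than both mapping to one code.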

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refine the
	handlings for operators LT and LTU.
---
 gcc/config/rs6000/rs6000.cc | 32 +++++++++-----------------------
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 0a5826800c5..c1aebbb5c03 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15672,22 +15672,18 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
       return mask;
     }
-
-  bool swap_operands = false;
-  bool try_again = false;
+  else if (rcode == LT || rcode == LTU)
+    {
+      /* lt{,u}(a,b) = gt{,u}(b,a)  */
+      enum rtx_code code = swap_condition (rcode);
+      std::swap (op0, op1);
+      mask = gen_reg_rtx (dmode);
+      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+      return mask;
+    }
 
   switch (rcode)
     {
-    case LT:
-      rcode = GT;
-      swap_operands = true;
-      try_again = true;
-      break;
-    case LTU:
-      rcode = GTU;
-      swap_operands = true;
-      try_again = true;
-      break;
     case NE:
       /* Invert condition and try again.
 	 e.g., A != B becomes ~(A==B).  */
@@ -15761,16 +15757,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
       return NULL_RTX;
     }
 
-  if (try_again)
-    {
-      if (swap_operands)
-	std::swap (op0, op1);
-
-      mask = rs6000_emit_vector_compare (rcode, op0, op1, dmode);
-      if (mask)
-	return mask;
-    }
-
   /* You only get two chances.  */
   return NULL_RTX;
 }
-- 
2.27.0



* [PATCH 7/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
  2022-11-24  9:15 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen Lin
                   ` (5 preceding siblings ...)
  2022-11-24  9:15 ` [PATCH 6/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2 Kewen Lin
@ 2022-11-24  9:15 ` Kewen Lin
  2022-11-24  9:15 ` [PATCH 8/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4 Kewen Lin
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Kewen Lin @ 2022-11-24  9:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kewen Lin, segher, dje.gcc, bergner, meissner

The current handling in rs6000_emit_vector_compare is a bit
complicated, especially now that we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 3.  It refactors the handling of NE.
This patch doesn't introduce any functionality change.

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refactor the
	handlings for operator NE.
---
 gcc/config/rs6000/rs6000.cc | 30 ++++++++++--------------------
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index c1aebbb5c03..b4ca7b3d1b1 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15681,29 +15681,19 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
       return mask;
     }
+  else if (rcode == NE)
+    {
+      /* ne(a,b) = ~eq(a,b)  */
+      mask = gen_reg_rtx (dmode);
+      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (EQ, dmode, op0, op1)));
+      enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
+      gcc_assert (nor_code != CODE_FOR_nothing);
+      emit_insn (GEN_FCN (nor_code) (mask, mask));
+      return mask;
+    }
 
   switch (rcode)
     {
-    case NE:
-      /* Invert condition and try again.
-	 e.g., A != B becomes ~(A==B).  */
-      {
-	enum insn_code nor_code;
-	rtx mask2;
-
-	nor_code = optab_handler (one_cmpl_optab, dmode);
-	if (nor_code == CODE_FOR_nothing)
-	  return NULL_RTX;
-
-	mask2 = rs6000_emit_vector_compare (EQ, op0, op1, dmode);
-	if (!mask2)
-	  return NULL_RTX;
-
-	mask = gen_reg_rtx (dmode);
-	emit_insn (GEN_FCN (nor_code) (mask, mask2));
-	return mask;
-      }
-      break;
     case GE:
     case GEU:
     case LE:
-- 
2.27.0



* [PATCH 8/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
  2022-11-24  9:15 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen Lin
                   ` (6 preceding siblings ...)
  2022-11-24  9:15 ` [PATCH 7/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3 Kewen Lin
@ 2022-11-24  9:15 ` Kewen Lin
  2022-11-24  9:15 ` [PATCH 9/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5 Kewen Lin
  2022-12-14 11:23 ` PING^1 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen.Lin
  9 siblings, 0 replies; 19+ messages in thread
From: Kewen Lin @ 2022-11-24  9:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kewen Lin, segher, dje.gcc, bergner, meissner

The current handling in rs6000_emit_vector_compare is a bit
complicated, especially now that we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 4.  It reworks the handling of GE/GEU/LE/LEU and
also makes the function no longer recursive.  This patch
doesn't introduce any functionality change.
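
For integers there is no unordered case, so GE/GEU/LE/LEU can
all be written as an inverted GT/GTU: le{,u}(a,b) = ~gt{,u}(a,b)
and ge{,u}(a,b) = ~gt{,u}(b,a).  A scalar C sketch of these
identities (illustration only; the one-lane mask model and the
helper names are assumptions of the example):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One lane of GT/GTU: all-ones if a > b, zero otherwise.  */
static uint32_t lane_gt (int32_t a, int32_t b)
{
  return a > b ? UINT32_MAX : 0;
}

static uint32_t lane_gtu (uint32_t a, uint32_t b)
{
  return a > b ? UINT32_MAX : 0;
}

/* le{,u}(a,b) = ~gt{,u}(a,b)  */
static uint32_t lane_le (int32_t a, int32_t b)    { return ~lane_gt (a, b); }
static uint32_t lane_leu (uint32_t a, uint32_t b) { return ~lane_gtu (a, b); }

/* ge{,u}(a,b) = ~gt{,u}(b,a), note the swapped operands.  */
static uint32_t lane_ge (int32_t a, int32_t b)    { return ~lane_gt (b, a); }
static uint32_t lane_geu (uint32_t a, uint32_t b) { return ~lane_gtu (b, a); }

/* Compare against the plain C operators on signed and unsigned views.  */
static void check_int_inverted_gt (void)
{
  const int32_t vals[] = { INT32_MIN, -1, 0, 1, INT32_MAX };
  const size_t n = sizeof vals / sizeof vals[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
	int32_t a = vals[i], b = vals[j];
	uint32_t ua = (uint32_t) a, ub = (uint32_t) b;

	assert (lane_le (a, b) == (a <= b ? UINT32_MAX : 0));
	assert (lane_ge (a, b) == (a >= b ? UINT32_MAX : 0));
	assert (lane_leu (ua, ub) == (ua <= ub ? UINT32_MAX : 0));
	assert (lane_geu (ua, ub) == (ua >= ub ? UINT32_MAX : 0));
      }
}
```

Using ~gt(a,b) without the operand swap for GE would yield LE
instead, which the equal-operand pairs alone would not reveal.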

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refine the
	handlings for operators GE/GEU/LE/LEU.
---
 gcc/config/rs6000/rs6000.cc | 87 ++++++++-----------------------------
 1 file changed, 17 insertions(+), 70 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index b4ca7b3d1b1..a3645e321a7 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15640,7 +15640,7 @@ output_cbranch (rtx op, const char *label, int reversed, rtx_insn *insn)
 }
 
 /* Emit vector compare for operands OP0 and OP1 using code RCODE.
-   DMODE is expected destination mode. This is a recursive function.  */
+   DMODE is expected destination mode.  */
 
 static rtx
 rs6000_emit_vector_compare (enum rtx_code rcode,
@@ -15649,7 +15649,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
 {
   gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode));
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));
-  rtx mask;
+  rtx mask = gen_reg_rtx (dmode);
 
   /* In vector.md, we support all kinds of vector float point
      comparison operators in a comparison rtl pattern, we can
@@ -15658,7 +15658,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
      of raising invalid exception.  */
   if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
     {
-      mask = gen_reg_rtx (dmode);
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
       return mask;
     }
@@ -15667,11 +15666,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
      have direct hardware instructions, just emit it directly
      here.  */
   if (rcode == EQ || rcode == GT || rcode == GTU)
-    {
-      mask = gen_reg_rtx (dmode);
-      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
-      return mask;
-    }
+    emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
   else if (rcode == LT || rcode == LTU)
     {
       /* lt{,u}(a,b) = gt{,u}(b,a)  */
@@ -15679,76 +15674,28 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
       std::swap (op0, op1);
       mask = gen_reg_rtx (dmode);
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
-      return mask;
     }
-  else if (rcode == NE)
+  else if (rcode == NE || rcode == LE || rcode == LEU)
     {
-      /* ne(a,b) = ~eq(a,b)  */
+      /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b)  */
+      enum rtx_code code = reverse_condition (rcode);
       mask = gen_reg_rtx (dmode);
-      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (EQ, dmode, op0, op1)));
+      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+      enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
+      gcc_assert (nor_code != CODE_FOR_nothing);
+      emit_insn (GEN_FCN (nor_code) (mask, mask));
+    } else {
+      /* ge{,u}(a,b) = ~gt{,u}(b,a)  */
+      gcc_assert (rcode == GE || rcode == GEU);
+      enum rtx_code code = rcode == GE ? GT : GTU;
+      mask = gen_reg_rtx (dmode);
+      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
       enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
       gcc_assert (nor_code != CODE_FOR_nothing);
       emit_insn (GEN_FCN (nor_code) (mask, mask));
-      return mask;
-    }
-
-  switch (rcode)
-    {
-    case GE:
-    case GEU:
-    case LE:
-    case LEU:
-      /* Try GT/GTU/LT/LTU OR EQ */
-      {
-	rtx c_rtx, eq_rtx;
-	enum insn_code ior_code;
-	enum rtx_code new_code;
-
-	switch (rcode)
-	  {
-	  case  GE:
-	    new_code = GT;
-	    break;
-
-	  case GEU:
-	    new_code = GTU;
-	    break;
-
-	  case LE:
-	    new_code = LT;
-	    break;
-
-	  case LEU:
-	    new_code = LTU;
-	    break;
-
-	  default:
-	    gcc_unreachable ();
-	  }
-
-	ior_code = optab_handler (ior_optab, dmode);
-	if (ior_code == CODE_FOR_nothing)
-	  return NULL_RTX;
-
-	c_rtx = rs6000_emit_vector_compare (new_code, op0, op1, dmode);
-	if (!c_rtx)
-	  return NULL_RTX;
-
-	eq_rtx = rs6000_emit_vector_compare (EQ, op0, op1, dmode);
-	if (!eq_rtx)
-	  return NULL_RTX;
-
-	mask = gen_reg_rtx (dmode);
-	emit_insn (GEN_FCN (ior_code) (mask, c_rtx, eq_rtx));
-	return mask;
-      }
-      break;
-    default:
-      return NULL_RTX;
     }
 
-  /* You only get two chances.  */
-  return NULL_RTX;
+  return mask;
 }
 
 /* Emit vector conditional expression.  DEST is destination. OP_TRUE and
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 9/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5
  2022-11-24  9:15 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen Lin
                   ` (7 preceding siblings ...)
  2022-11-24  9:15 ` [PATCH 8/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4 Kewen Lin
@ 2022-11-24  9:15 ` Kewen Lin
  2022-12-14 11:23 ` PING^1 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen.Lin
  9 siblings, 0 replies; 19+ messages in thread
From: Kewen Lin @ 2022-11-24  9:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kewen Lin, segher, dje.gcc, bergner, meissner

The current handling in rs6000_emit_vector_compare is a bit
complicated, especially now that we emit vector float
comparison insns with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 5.  It refactors all the handling of vector
integer comparison to make it neat.  This patch doesn't
introduce any functional change.

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refactor the
	handling of vector integer comparison.
---
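The shape of the refactored switch (pick a native code, then
optionally swap the operands and/or invert the result) can be
modelled outside of GCC.  This is a hedged sketch only: Cmp,
Lowering, lower, reference, evaluate and check_lowering are
hypothetical names for illustration, and scalar int32_t values
stand in for vector lanes.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-ins for the rtx codes involved (not enum rtx_code).
enum class Cmp { EQ, NE, GT, GTU, LT, LTU, GE, GEU, LE, LEU };

// How a comparison is lowered: a native code plus two flags.
struct Lowering
{
  Cmp code;            // always EQ, GT or GTU
  bool swap_operands;  // compare (b, a) instead of (a, b)
  bool need_invert;    // complement the result afterwards
};

static Lowering lower (Cmp rcode)
{
  switch (rcode)
    {
    case Cmp::EQ:
    case Cmp::GT:
    case Cmp::GTU: return { rcode, false, false };
    case Cmp::LT:  return { Cmp::GT, true, false };   // lt(a,b) = gt(b,a)
    case Cmp::LTU: return { Cmp::GTU, true, false };
    case Cmp::NE:  return { Cmp::EQ, false, true };   // ne(a,b) = ~eq(a,b)
    case Cmp::LE:  return { Cmp::GT, false, true };   // le(a,b) = ~gt(a,b)
    case Cmp::LEU: return { Cmp::GTU, false, true };
    case Cmp::GE:  return { Cmp::GT, true, true };    // ge(a,b) = ~gt(b,a)
    case Cmp::GEU: return { Cmp::GTU, true, true };
    }
  assert (false && "unhandled comparison");
  return { rcode, false, false };
}

// Plain C++ meaning of each comparison, used as the oracle.
static bool reference (Cmp c, int32_t a, int32_t b)
{
  const uint32_t ua = static_cast<uint32_t> (a);
  const uint32_t ub = static_cast<uint32_t> (b);
  switch (c)
    {
    case Cmp::EQ:  return a == b;
    case Cmp::NE:  return a != b;
    case Cmp::GT:  return a > b;
    case Cmp::GTU: return ua > ub;
    case Cmp::LT:  return a < b;
    case Cmp::LTU: return ua < ub;
    case Cmp::GE:  return a >= b;
    case Cmp::GEU: return ua >= ub;
    case Cmp::LE:  return a <= b;
    case Cmp::LEU: return ua <= ub;
    }
  return false;
}

// Evaluate RCODE using only a native compare, a swap and an inversion.
static bool evaluate (Cmp rcode, int32_t a, int32_t b)
{
  const Lowering l = lower (rcode);
  assert (l.code == Cmp::EQ || l.code == Cmp::GT || l.code == Cmp::GTU);
  if (l.swap_operands)
    std::swap (a, b);
  const bool r = reference (l.code, a, b);
  return l.need_invert ? !r : r;
}

// Check every code on every operand pair; return the number of checks.
static int check_lowering ()
{
  const Cmp all[] = { Cmp::EQ, Cmp::NE, Cmp::GT, Cmp::GTU, Cmp::LT,
                      Cmp::LTU, Cmp::GE, Cmp::GEU, Cmp::LE, Cmp::LEU };
  const int32_t vals[] = { INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX };
  int checked = 0;
  for (Cmp c : all)
    for (int32_t a : vals)
      for (int32_t b : vals)
        {
          assert (evaluate (c, a, b) == reference (c, a, b));
          ++checked;
        }
  return checked;
}
```

Keeping the decision (lower) separate from the emission
(evaluate) mirrors the two-flag structure of the switch: each
case only records what to do, and the swap, compare and invert
steps are written once.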
 gcc/config/rs6000/rs6000.cc | 68 ++++++++++++++++++++++++-------------
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a3645e321a7..b694080840a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15662,34 +15662,54 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
       return mask;
     }
 
-  /* For any of vector integer comparison operators for which we
-     have direct hardware instructions, just emit it directly
-     here.  */
-  if (rcode == EQ || rcode == GT || rcode == GTU)
-    emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
-  else if (rcode == LT || rcode == LTU)
+  bool swap_operands = false;
+  bool need_invert = false;
+  enum rtx_code code = rcode;
+
+  switch (rcode)
     {
+    case EQ:
+    case GT:
+    case GTU:
+      /* Emit directly with native hardware insn.  */
+      break;
+    case LT:
+    case LTU:
       /* lt{,u}(a,b) = gt{,u}(b,a)  */
-      enum rtx_code code = swap_condition (rcode);
-      std::swap (op0, op1);
-      mask = gen_reg_rtx (dmode);
-      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+      code = swap_condition (rcode);
+      swap_operands = true;
+      break;
+    case NE:
+    case LE:
+    case LEU:
+      /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b)  */
+      code = reverse_condition (rcode);
+      need_invert = true;
+      break;
+    case GE:
+      /* ge(a,b) = ~gt(b,a)  */
+      code = GT;
+      swap_operands = true;
+      need_invert = true;
+      break;
+    case GEU:
+      /* geu(a,b) = ~gtu(b,a)  */
+      code = GTU;
+      swap_operands = true;
+      need_invert = true;
+      break;
+    default:
+      gcc_unreachable ();
+      break;
     }
-  else if (rcode == NE || rcode == LE || rcode == LEU)
+
+  if (swap_operands)
+    std::swap (op0, op1);
+
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+
+  if (need_invert)
     {
-      /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b)  */
-      enum rtx_code code = reverse_condition (rcode);
-      mask = gen_reg_rtx (dmode);
-      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
-      enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
-      gcc_assert (nor_code != CODE_FOR_nothing);
-      emit_insn (GEN_FCN (nor_code) (mask, mask));
-    } else {
-      /* ge{,u}(a,b) = ~gt{,u}(b,a)  */
-      gcc_assert (rcode == GE || rcode == GEU);
-      enum rtx_code code = rcode == GE ? GT : GTU;
-      mask = gen_reg_rtx (dmode);
-      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
       enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
       gcc_assert (nor_code != CODE_FOR_nothing);
       emit_insn (GEN_FCN (nor_code) (mask, mask));
-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* PING^1 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare
  2022-11-24  9:15 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen Lin
                   ` (8 preceding siblings ...)
  2022-11-24  9:15 ` [PATCH 9/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5 Kewen Lin
@ 2022-12-14 11:23 ` Kewen.Lin
  2023-05-17  6:26   ` PING^2 " Kewen.Lin
  9 siblings, 1 reply; 19+ messages in thread
From: Kewen.Lin @ 2022-12-14 11:23 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc, bergner, meissner

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

on 2022/11/24 17:15, Kewen Lin wrote:
> Hi,
> 
> Following Segher's suggestion, this patch series is to rework
> function rs6000_emit_vector_compare for vector float and int
> in multiple steps, it's based on the previous attempts [1][2].
> As mentioned in [1], the need to rework this for float is to
> make a centralized place for vector float comparison handlings
> instead of supporting with swapping ops and reversing code etc.
> dispersedly.  It's also for a subsequent patch to handle
> comparison operators with or without trapping math (PR105480).
> With the handling on vector float reworked, we can further make
> the handling on vector int simplified as shown.
> 
> For Segher's concern about whether this rework causes any
> assembly change, I constructed two testcases for vector float[3]
> and int[4] respectively before, it showed the most are fine
> excepting for the difference on LE and UNGT, it's demonstrated
> as improvement since it uses GE instead of GT ior EQ.  The
> associated test case in patch 3/9 is a good example.
> 
> Besides, w/ and w/o the whole patch series, I built the whole
> SPEC2017 at options -O3 and -Ofast separately, checked the
> differences on object assembly.  The result showed that the
> most are unchanged, except for:
> 
>   * at -O3, 521.wrf_r has 9 object files and 526.blender_r has
>     9 object files with differences.
> 
>   * at -Ofast, 521.wrf_r has 12 object files, 526.blender_r has
>     one and 527.cam4_r has 4 object files with differences.
> 
> By looking into these differences, all significant differences
> are caused by the known improvement mentioned above transforming
> GT ior EQ to GE, which can also affect unrolling decision due
> to insn count.  Some other trivial differences are branch
> target offset difference, nop difference for alignment, vsx
> register number differences etc.
> 
> I also evaluated the runtime performance for these changed
> benchmarks, the result is neutral.
> 
> These patches are bootstrapped and regress-tested
> incrementally on powerpc64-linux-gnu P7 & P8, and
> powerpc64le-linux-gnu P9 & P10.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -----
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606375.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606376.html
> [3] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606504.html
> [4] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606506.html
> 
> Kewen Lin (9):
>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5
> 
>  gcc/config/rs6000/rs6000.cc                 | 180 ++++++--------------
>  gcc/testsuite/gcc.target/powerpc/vcond-fp.c |  25 +++
>  2 files changed, 74 insertions(+), 131 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* PING^2 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare
  2022-12-14 11:23 ` PING^1 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen.Lin
@ 2023-05-17  6:26   ` Kewen.Lin
  2023-06-15  6:38     ` PING^3 " Kewen.Lin
  0 siblings, 1 reply; 19+ messages in thread
From: Kewen.Lin @ 2023-05-17  6:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc, bergner, meissner

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

^ permalink raw reply	[flat|nested] 19+ messages in thread

* PING^3 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare
  2023-05-17  6:26   ` PING^2 " Kewen.Lin
@ 2023-06-15  6:38     ` Kewen.Lin
  2023-07-06 21:54       ` Michael Meissner
  2023-08-07 10:05       ` PING^4 " Kewen.Lin
  0 siblings, 2 replies; 19+ messages in thread
From: Kewen.Lin @ 2023-06-15  6:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc, bergner, meissner

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: PING^3 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare
  2023-06-15  6:38     ` PING^3 " Kewen.Lin
@ 2023-07-06 21:54       ` Michael Meissner
  2023-08-07 10:05       ` PING^4 " Kewen.Lin
  1 sibling, 0 replies; 19+ messages in thread
From: Michael Meissner @ 2023-07-06 21:54 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: gcc-patches, segher, dje.gcc, bergner, meissner

I get the following warning which prevents gcc from bootstrapping due to
-Werror:

/home/meissner/fsf-src/work124-sfsplat/gcc/config/rs6000/rs6000-p10sfopt.cc: In function ‘void {anonymous}::process_chain_from_load(gimple*)’:
/home/meissner/fsf-src/work124-sfsplat/gcc/config/rs6000/rs6000-p10sfopt.cc:505:30: warning: zero-length gcc_dump_printf format string [-Wformat-zero-length]
  505 |       dump_printf (MSG_NOTE, "");
      |                              ^~

I just commented out the dump_printf call.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* PING^4 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare
  2023-06-15  6:38     ` PING^3 " Kewen.Lin
  2023-07-06 21:54       ` Michael Meissner
@ 2023-08-07 10:05       ` Kewen.Lin
  2023-10-25  2:47         ` PING^5 " Kewen.Lin
  1 sibling, 1 reply; 19+ messages in thread
From: Kewen.Lin @ 2023-08-07 10:05 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc, bergner, meissner

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

^ permalink raw reply	[flat|nested] 19+ messages in thread

* PING^5 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare
  2023-08-07 10:05       ` PING^4 " Kewen.Lin
@ 2023-10-25  2:47         ` Kewen.Lin
  2023-11-08  2:50           ` PING^6 " Kewen.Lin
  0 siblings, 1 reply; 19+ messages in thread
From: Kewen.Lin @ 2023-10-25  2:47 UTC (permalink / raw)
  To: segher, dje.gcc; +Cc: bergner, meissner, gcc-patches

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

^ permalink raw reply	[flat|nested] 19+ messages in thread

* PING^6 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare
  2023-10-25  2:47         ` PING^5 " Kewen.Lin
@ 2023-11-08  2:50           ` Kewen.Lin
  2023-12-04  9:50             ` PING^7 " Kewen.Lin
  0 siblings, 1 reply; 19+ messages in thread
From: Kewen.Lin @ 2023-11-08  2:50 UTC (permalink / raw)
  To: segher, dje.gcc; +Cc: bergner, meissner, gcc-patches

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

* PING^7 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare
  2023-11-08  2:50           ` PING^6 " Kewen.Lin
@ 2023-12-04  9:50             ` Kewen.Lin
  2023-12-12  6:08               ` PING^8 " Kewen.Lin
  0 siblings, 1 reply; 19+ messages in thread
From: Kewen.Lin @ 2023-12-04  9:50 UTC (permalink / raw)
  To: segher, dje.gcc; +Cc: bergner, meissner, gcc-patches

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

* PING^8 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare
  2023-12-04  9:50             ` PING^7 " Kewen.Lin
@ 2023-12-12  6:08               ` Kewen.Lin
  0 siblings, 0 replies; 19+ messages in thread
From: Kewen.Lin @ 2023-12-12  6:08 UTC (permalink / raw)
  To: segher, dje.gcc; +Cc: bergner, meissner, gcc-patches

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

Thread overview: 19+ messages
2022-11-24  9:15 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen Lin
2022-11-24  9:15 ` [PATCH 1/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1 Kewen Lin
2022-11-24  9:15 ` [PATCH 2/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2 Kewen Lin
2022-11-24  9:15 ` [PATCH 3/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3 Kewen Lin
2022-11-24  9:15 ` [PATCH 4/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4 Kewen Lin
2022-11-24  9:15 ` [PATCH 5/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1 Kewen Lin
2022-11-24  9:15 ` [PATCH 6/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2 Kewen Lin
2022-11-24  9:15 ` [PATCH 7/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3 Kewen Lin
2022-11-24  9:15 ` [PATCH 8/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4 Kewen Lin
2022-11-24  9:15 ` [PATCH 9/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5 Kewen Lin
2022-12-14 11:23 ` PING^1 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare Kewen.Lin
2023-05-17  6:26   ` PING^2 " Kewen.Lin
2023-06-15  6:38     ` PING^3 " Kewen.Lin
2023-07-06 21:54       ` Michael Meissner
2023-08-07 10:05       ` PING^4 " Kewen.Lin
2023-10-25  2:47         ` PING^5 " Kewen.Lin
2023-11-08  2:50           ` PING^6 " Kewen.Lin
2023-12-04  9:50             ` PING^7 " Kewen.Lin
2023-12-12  6:08               ` PING^8 " Kewen.Lin
