[PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move
@ 2020-09-22  3:39 Michael Meissner
  2020-09-22  3:41 ` [PATCH 1/2] Power10: Add IEEE 128-bit xsmaxcqp and xsmincqp support Michael Meissner
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Michael Meissner @ 2020-09-22  3:39 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

These patches are my latest versions of the patches to add IEEE 128-bit min,
max, and conditional move to GCC.  They correspond to the earlier patches #3
and #4 (patches #1 and #2 have been installed).

The first patch just adds support for the xsmincqp and xsmaxcqp instructions.
Due to the NaN rules, this patch will only generate the minc/maxc instructions
if -ffast-math is used.  Unlike the previous patch, I did not try to combine
min/max for IEEE 128-bit with the existing power9 min/max for SF/DF mode.
Instead, I just created a new insn (without a generator for it).

The second patch adds the support for doing a conditional move, where the
comparison is one of the 4 binary floating point scalar types.  The types being
moved do not have to be the same as the types for comparison.  Unlike the
previous patches, I did not try to combine the normal comparison and the
reversed comparison into one insn.  Instead, I kept the two insns.

After using it for awhile, I did not like using SFDFKFTF as the mode for all 4
floating point types.  Instead, I used FPMASK and FPMASK2 for the cases that we
generate the compare and set mask instruction to do the conditional move.

As with power9, the conditional move allows some forms of min/max to be
generated without using -ffast-math, assuming the tests give the right results
for NaN's.

I have built bootstrapped compilers with/without these patches, and there were
no regressions in the test suite.  Can I check these patches into the master
branch?

I do not anticipate needing to back port these changes to GCC 10.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] Power10: Add IEEE 128-bit xsmaxcqp and xsmincqp support.
  2020-09-22  3:39 [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move Michael Meissner
@ 2020-09-22  3:41 ` Michael Meissner
  2020-09-22  3:42 ` [PATCH 2/2] Power10: Add IEEE 128-bit fp conditional move Michael Meissner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Michael Meissner @ 2020-09-22  3:41 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

Power10: Add IEEE 128-bit xsmaxcqp and xsmincqp support.

This patch adds the support for the IEEE 128-bit floating point C minimum and
maximum instructions.  The next patch will add the support for using the
compare and set mask instruction to implement conditional moves.

Rather than trying to overload the current SF/DF min/max support, it was
simpler to just provide the new instructions as a separate insn.

gcc/
2020-09-17  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/rs6000.c (rs6000_emit_minmax): Add support for ISA
	3.1 IEEE 128-bit floating point xsmaxcqp and xsmincqp instructions.
	* config/rs6000/rs60000.h (FLOAT128_MIN_MAX_FPMASK_P): New macro.
	* config/rs6000/rs6000.md (s<minmax><mode>3): Add support for the
	ISA 3.1 IEEE 128-bit minimum and maximum instructions.

gcc/testsuite/
2020-09-17  Michael Meissner  <meissner@linux.ibm.com>

	* gcc.target/powerpc/float128-minmax-2.c: New test.
---
 gcc/config/rs6000/rs6000.c                        |  3 ++-
 gcc/config/rs6000/rs6000.h                        |  5 +++++
 gcc/config/rs6000/rs6000.md                       | 11 +++++++++++
 .../gcc.target/powerpc/float128-minmax-2.c        | 15 +++++++++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6f204ca202a..e055d9a7979 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15462,7 +15462,8 @@ rs6000_emit_minmax (rtx dest, enum rtx_code code, rtx op0, rtx op1)
   /* VSX/altivec have direct min/max insns.  */
   if ((code == SMAX || code == SMIN)
       && (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
-	  || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))))
+	  || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))
+	  || FLOAT128_MIN_MAX_FPMASK_P (mode)))
     {
       emit_insn (gen_rtx_SET (dest, gen_rtx_fmt_ee (code, mode, op0, op1)));
       return;
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index bbd8060e143..0f30b77ca5c 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -345,6 +345,11 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || ((MODE) == TDmode)						\
    || (!TARGET_FLOAT128_TYPE && FLOAT128_IEEE_P (MODE)))
 
+/* Macro whether the float128 minimum, maximum, and set compare mask
+   instructions are enabled.  */
+#define FLOAT128_MIN_MAX_FPMASK_P(MODE)					\
+  (TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (MODE))
+
 /* Return true for floating point that does not use a vector register.  */
 #define SCALAR_FLOAT_MODE_NOT_VECTOR_P(MODE)				\
   (SCALAR_FLOAT_MODE_P (MODE) && !FLOAT128_VECTOR_P (MODE))
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 694ff70635e..04b89113196 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5163,6 +5163,17 @@ (define_insn "*s<minmax><mode>3_vsx"
 }
   [(set_attr "type" "fp")])
 
+;; Min/max for ISA 3.1 IEEE 128-bit floating point
+(define_insn "s<minmax><mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(fp_minmax:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "xs<minmax>cqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")
+   (set_attr "size" "128")])
+
 ;; The conditional move instructions allow us to perform max and min operations
 ;; even when we don't have the appropriate max/min instruction using the FSEL
 ;; instruction.
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
new file mode 100644
index 00000000000..c71ba08c9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
@@ -0,0 +1,15 @@
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+/* Test that the fminf128/fmaxf128 functions generate if/then/else and not a
+   call.  */
+TYPE f128_min (TYPE a, TYPE b) { return __builtin_fminf128 (a, b); }
+TYPE f128_max (TYPE a, TYPE b) { return __builtin_fmaxf128 (a, b); }
+
+/* { dg-final { scan-assembler {\mxsmaxcqp\M} } } */
+/* { dg-final { scan-assembler {\mxsmincqp\M} } } */
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] Power10: Add IEEE 128-bit fp conditional move
  2020-09-22  3:39 [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move Michael Meissner
  2020-09-22  3:41 ` [PATCH 1/2] Power10: Add IEEE 128-bit xsmaxcqp and xsmincqp support Michael Meissner
@ 2020-09-22  3:42 ` Michael Meissner
  2020-09-24  8:24 ` [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and " Florian Weimer
  2020-10-12 23:02 ` Ping: " Michael Meissner
  3 siblings, 0 replies; 9+ messages in thread
From: Michael Meissner @ 2020-09-22  3:42 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

Power10: Add IEEE 128-bit fp conditional move.

This patch adds the support for power10 IEEE 128-bit floating point conditional
move and for automatically generating min/max.  Unlike the previous patch, I
decided to keep two separate patterns for fpmask before splitting (one pattern
for normal compares, and the other pattern for inverted compares).  I can go
back to a single pattern with a new predicate that allows either comparison.

Compared to the original code, these patterns do simplify the fpmask insns to
having one alternative instead of two.  In the original code, the first
alternative tried to use the result as a temporary register.  But that doesn't
work if you are doing a conditional move with SF/DF types, but the comparison
is KF/TF.  That is because the SF/DF types can use the traditional FPR
registers, but IEEE 128-bit floating point can only do arithmetic in the
traditional Altivec registers.

This code also has to insert a XXPERMDI if you are moving KF/TF values, but
the comparison is done with SF/DF values.  In this case, the set and compare
mask for SF/DF clears the bottom 64-bits of the register, and the XXPERMDI is
needed to fill it.

gcc/
2020-09-17  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/rs6000.c (have_compare_and_set_mask): Add IEEE
	128-bit floating point types.
	* config/rs6000/rs6000.md (FPMASK): New iterator.
	(FPMASK2): New iterator.
	(Fv mode attribute): Add KFmode and TFmode.
	(mov<FPMASK:mode><FPMASK2:mode>cc_fpmask): Replace
	mov<SFDF:mode><SFDF2:mode>cc_p9.  Add IEEE 128-bit fp support.
	(mov<FPMASK:mode><FPMASK2:mode>cc_invert_fpmask): Replace
	mov<SFDF:mode><SFDF2:mode>cc_invert_p9.  Add IEEE 128-bit fp
	support.
	(fpmask<mode>): Add IEEE 128-bit fp support.  Enable generator to
	build te RTL.
	(xxsel<mode>): Add IEEE 128-bit fp support.  Enable generator to
	build te RTL.

gcc/testsuite/
2020-09-17  Michael Meissner  <meissner@linux.ibm.com>

	* gcc.target/powerpc/float128-cmove.c: New test.
	* gcc.target/powerpc/float128-minmax-3.c: New test.
---
 gcc/config/rs6000/rs6000.c                    |   8 +-
 gcc/config/rs6000/rs6000.md                   | 188 ++++++++++++------
 .../gcc.target/powerpc/float128-cmove.c       |  93 +++++++++
 .../gcc.target/powerpc/float128-minmax-3.c    |  15 ++
 4 files changed, 237 insertions(+), 67 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-cmove.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e055d9a7979..92b080a96df 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15057,8 +15057,8 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
   return 1;
 }
 
-/* Possibly emit the xsmaxcdp and xsmincdp instructions to emit a maximum or
-   minimum with "C" semantics.
+/* Possibly emit the xsmaxc{dp,qp} and xsminc{dp,qp} instructions to emit a
+   maximum or minimum with "C" semantics.
 
    Unless you use -ffast-math, you can't use these instructions to replace
    conditions that implicitly reverse the condition because the comparison
@@ -15194,6 +15194,10 @@ have_compare_and_set_mask (machine_mode mode)
     case E_DFmode:
       return TARGET_P9_MINMAX;
 
+    case E_KFmode:
+    case E_TFmode:
+      return FLOAT128_MIN_MAX_FPMASK_P (mode);
+
     default:
       break;
     }
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 04b89113196..1aebaa7fade 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -564,6 +564,19 @@ (define_mode_iterator SFDF [SF DF])
 ; And again, for when we need two FP modes in a pattern.
 (define_mode_iterator SFDF2 [SF DF])
 
+; Floating scalars that supports the set compare mask instruction.
+(define_mode_iterator FPMASK [SF
+			      DF
+			      (KF "FLOAT128_MIN_MAX_FPMASK_P (KFmode)")
+			      (TF "FLOAT128_MIN_MAX_FPMASK_P (TFmode)")])
+
+; And again, for patterns that need two (potentially) different floating point
+; scalars that support the set compare mask instruction.
+(define_mode_iterator FPMASK2 [SF
+			       DF
+			       (KF "FLOAT128_MIN_MAX_FPMASK_P (KFmode)")
+			       (TF "FLOAT128_MIN_MAX_FPMASK_P (TFmode)")])
+
 ; A generic s/d attribute, for sp/dp for example.
 (define_mode_attr sd [(SF   "s") (DF   "d")
 		      (V4SF "s") (V2DF "d")])
@@ -597,8 +610,13 @@ (define_mode_attr Ff		[(SF "f") (DF "d") (DI "d")])
 ; SF/DF constraint for arithmetic on VSX registers using instructions added in
 ; ISA 2.06 (power7).  This includes instructions that normally target DF mode,
 ; but are used on SFmode, since internally SFmode values are kept in the DFmode
-; format.
-(define_mode_attr Fv		[(SF "wa") (DF "wa") (DI "wa")])
+; format.  Also include IEEE 128-bit instructions which are restricted to the
+; Altivec registers.
+(define_mode_attr Fv		[(SF "wa")
+				 (DF "wa")
+				 (DI "wa")
+				 (KF "v")
+				 (TF "v")])
 
 ; Which isa is needed for those float instructions?
 (define_mode_attr Fisa		[(SF "p8v")  (DF "*") (DI "*")])
@@ -5285,10 +5303,10 @@ (define_insn "*setnbcr_<un>signed_<GPR:mode>"
 
 ;; Floating point conditional move
 (define_expand "mov<mode>cc"
-   [(set (match_operand:SFDF 0 "gpc_reg_operand")
-	 (if_then_else:SFDF (match_operand 1 "comparison_operator")
-			    (match_operand:SFDF 2 "gpc_reg_operand")
-			    (match_operand:SFDF 3 "gpc_reg_operand")))]
+   [(set (match_operand:FPMASK 0 "gpc_reg_operand")
+	 (if_then_else:FPMASK (match_operand 1 "comparison_operator")
+			      (match_operand:FPMASK 2 "gpc_reg_operand")
+			      (match_operand:FPMASK 3 "gpc_reg_operand")))]
   "TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT"
 {
   if (rs6000_emit_cmove (operands[0], operands[1], operands[2], operands[3]))
@@ -5308,92 +5326,132 @@ (define_insn "*fsel<SFDF:mode><SFDF2:mode>4"
   "fsel %0,%1,%2,%3"
   [(set_attr "type" "fp")])
 
-(define_insn_and_split "*mov<SFDF:mode><SFDF2:mode>cc_p9"
-  [(set (match_operand:SFDF 0 "vsx_register_operand" "=&<SFDF:Fv>,<SFDF:Fv>")
-	(if_then_else:SFDF
+(define_insn_and_split "*mov<FPMASK:mode><FPMASK2:mode>cc_fpmask"
+  [(set (match_operand:FPMASK 0 "vsx_register_operand" "=<FPMASK:Fv>")
+	(if_then_else:FPMASK
 	 (match_operator:CCFP 1 "fpmask_comparison_operator"
-		[(match_operand:SFDF2 2 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")
-		 (match_operand:SFDF2 3 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")])
-	 (match_operand:SFDF 4 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")
-	 (match_operand:SFDF 5 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")))
-   (clobber (match_scratch:V2DI 6 "=0,&wa"))]
+	    [(match_operand:FPMASK2 2 "vsx_register_operand" "<FPMASK2:Fv>")
+	     (match_operand:FPMASK2 3 "vsx_register_operand" "<FPMASK2:Fv>")])
+	 (match_operand:FPMASK 4 "vsx_register_operand" "<FPMASK:Fv>")
+	 (match_operand:FPMASK 5 "vsx_register_operand" "<FPMASK:Fv>")))
+   (clobber (match_scratch:V2DI 6 "=&<FPMASK2:Fv>"))]
   "TARGET_P9_MINMAX"
   "#"
-  ""
-  [(set (match_dup 6)
-	(if_then_else:V2DI (match_dup 1)
-			   (match_dup 7)
-			   (match_dup 8)))
-   (set (match_dup 0)
-	(if_then_else:SFDF (ne (match_dup 6)
-			       (match_dup 8))
-			   (match_dup 4)
-			   (match_dup 5)))]
+  "&& 1"
+  [(pc)]
 {
-  if (GET_CODE (operands[6]) == SCRATCH)
-    operands[6] = gen_reg_rtx (V2DImode);
+  rtx dest = operands[0];
+  rtx cmp = operands[1];
+  rtx cmp_op1 = operands[2];
+  rtx cmp_op2 = operands[3];
+  rtx move_t = operands[4];
+  rtx move_f = operands[5];
+  rtx mask_reg = operands[6];
+  rtx mask_m1 = CONSTM1_RTX (V2DImode);
+  rtx mask_0 = CONST0_RTX (V2DImode);
+  machine_mode move_mode = <FPMASK:MODE>mode;
+  machine_mode compare_mode = <FPMASK2:MODE>mode;
 
-  operands[7] = CONSTM1_RTX (V2DImode);
-  operands[8] = CONST0_RTX (V2DImode);
+  if (GET_CODE (mask_reg) == SCRATCH)
+    mask_reg = gen_reg_rtx (V2DImode);
+
+  /* Emit the compare and set mask instruction.  */
+  emit_insn (gen_fpmask<FPMASK2:mode> (mask_reg, cmp, cmp_op1, cmp_op2,
+				       mask_m1, mask_0));
+
+  /* If we have a 64-bit comparison, but an 128-bit move, we need to extend the
+     mask.  Because we are using the splat builtin to extend the V2DImode, we
+     need to use element 1 on little endian systems.  */
+  if (!FLOAT128_IEEE_P (compare_mode) && FLOAT128_IEEE_P (move_mode))
+    {
+      rtx element = WORDS_BIG_ENDIAN ? const0_rtx : const1_rtx;
+      emit_insn (gen_vsx_xxspltd_v2di (mask_reg, mask_reg, element));
+    }
+
+  /* Now emit the XXSEL insn.  */
+  emit_insn (gen_xxsel<FPMASK:mode> (dest, mask_reg, mask_0, move_t, move_f));
+  DONE;
 }
- [(set_attr "length" "8")
+ ;; length is 12 in case we need to add XXPERMDI
+ [(set_attr "length" "12")
   (set_attr "type" "vecperm")])
 
 ;; Handle inverting the fpmask comparisons.
-(define_insn_and_split "*mov<SFDF:mode><SFDF2:mode>cc_invert_p9"
-  [(set (match_operand:SFDF 0 "vsx_register_operand" "=&<SFDF:Fv>,<SFDF:Fv>")
-	(if_then_else:SFDF
+(define_insn_and_split "*mov<FPMASK:mode><FPMASK2:mode>cc_invert_fpmask"
+  [(set (match_operand:FPMASK 0 "vsx_register_operand" "=<FPMASK:Fv>")
+	(if_then_else:FPMASK
 	 (match_operator:CCFP 1 "invert_fpmask_comparison_operator"
-		[(match_operand:SFDF2 2 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")
-		 (match_operand:SFDF2 3 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")])
-	 (match_operand:SFDF 4 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")
-	 (match_operand:SFDF 5 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")))
-   (clobber (match_scratch:V2DI 6 "=0,&wa"))]
+	    [(match_operand:FPMASK2 2 "vsx_register_operand" "<FPMASK2:Fv>")
+	     (match_operand:FPMASK2 3 "vsx_register_operand" "<FPMASK2:Fv>")])
+	 (match_operand:FPMASK 4 "vsx_register_operand" "<FPMASK:Fv>")
+	 (match_operand:FPMASK 5 "vsx_register_operand" "<FPMASK:Fv>")))
+   (clobber (match_scratch:V2DI 6 "=&<FPMASK2:Fv>"))]
   "TARGET_P9_MINMAX"
   "#"
   "&& 1"
-  [(set (match_dup 6)
-	(if_then_else:V2DI (match_dup 9)
-			   (match_dup 7)
-			   (match_dup 8)))
-   (set (match_dup 0)
-	(if_then_else:SFDF (ne (match_dup 6)
-			       (match_dup 8))
-			   (match_dup 5)
-			   (match_dup 4)))]
+  [(pc)]
 {
-  rtx op1 = operands[1];
-  enum rtx_code cond = reverse_condition_maybe_unordered (GET_CODE (op1));
+  rtx dest = operands[0];
+  rtx old_cmp = operands[1];
+  rtx cmp_op1 = operands[2];
+  rtx cmp_op2 = operands[3];
+  enum rtx_code cond = reverse_condition_maybe_unordered (GET_CODE (old_cmp));
+  rtx cmp_rev = gen_rtx_fmt_ee (cond, CCFPmode, cmp_op1, cmp_op2);
+  rtx move_f = operands[4];
+  rtx move_t = operands[5];
+  rtx mask_reg = operands[6];
+  rtx mask_m1 = CONSTM1_RTX (V2DImode);
+  rtx mask_0 = CONST0_RTX (V2DImode);
+  machine_mode move_mode = <FPMASK:MODE>mode;
+  machine_mode compare_mode = <FPMASK2:MODE>mode;
+
+  if (GET_CODE (mask_reg) == SCRATCH)
+    mask_reg = gen_reg_rtx (V2DImode);
 
-  if (GET_CODE (operands[6]) == SCRATCH)
-    operands[6] = gen_reg_rtx (V2DImode);
+  /* Emit the compare and set mask instruction.  */
+  emit_insn (gen_fpmask<FPMASK2:mode> (mask_reg, cmp_rev, cmp_op1, cmp_op2,
+				       mask_m1, mask_0));
 
-  operands[7] = CONSTM1_RTX (V2DImode);
-  operands[8] = CONST0_RTX (V2DImode);
+  /* If we have a 64-bit comparison, but an 128-bit move, we need to extend the
+     mask.  Because we are using the splat builtin to extend the V2DImode, we
+     need to use element 1 on little endian systems.  */
+  if (!FLOAT128_IEEE_P (compare_mode) && FLOAT128_IEEE_P (move_mode))
+    {
+      rtx element = WORDS_BIG_ENDIAN ? const0_rtx : const1_rtx;
+      emit_insn (gen_vsx_xxspltd_v2di (mask_reg, mask_reg, element));
+    }
 
-  operands[9] = gen_rtx_fmt_ee (cond, CCFPmode, operands[2], operands[3]);
+  /* Now emit the XXSEL insn.  */
+  emit_insn (gen_xxsel<FPMASK:mode> (dest, mask_reg, mask_0, move_t, move_f));
+  DONE;
 }
- [(set_attr "length" "8")
+ ;; length is 12 in case we need to add XXPERMDI
+ [(set_attr "length" "12")
   (set_attr "type" "vecperm")])
 
-(define_insn "*fpmask<mode>"
-  [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
+(define_insn "fpmask<mode>"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=<Fv>")
 	(if_then_else:V2DI
 	 (match_operator:CCFP 1 "fpmask_comparison_operator"
-		[(match_operand:SFDF 2 "vsx_register_operand" "<Fv>")
-		 (match_operand:SFDF 3 "vsx_register_operand" "<Fv>")])
+		[(match_operand:FPMASK 2 "vsx_register_operand" "<Fv>")
+		 (match_operand:FPMASK 3 "vsx_register_operand" "<Fv>")])
 	 (match_operand:V2DI 4 "all_ones_constant" "")
 	 (match_operand:V2DI 5 "zero_constant" "")))]
   "TARGET_P9_MINMAX"
-  "xscmp%V1dp %x0,%x2,%x3"
+{
+  return (FLOAT128_IEEE_P (<MODE>mode)
+	  ? "xscmp%V1qp %0,%2,%3"
+	  : "xscmp%V1dp %x0,%x2,%x3");
+}
   [(set_attr "type" "fpcompare")])
 
-(define_insn "*xxsel<mode>"
-  [(set (match_operand:SFDF 0 "vsx_register_operand" "=<Fv>")
-	(if_then_else:SFDF (ne (match_operand:V2DI 1 "vsx_register_operand" "wa")
-			       (match_operand:V2DI 2 "zero_constant" ""))
-			   (match_operand:SFDF 3 "vsx_register_operand" "<Fv>")
-			   (match_operand:SFDF 4 "vsx_register_operand" "<Fv>")))]
+(define_insn "xxsel<mode>"
+  [(set (match_operand:FPMASK 0 "vsx_register_operand" "=wa")
+	(if_then_else:FPMASK
+	 (ne (match_operand:V2DI 1 "vsx_register_operand" "wa")
+	     (match_operand:V2DI 2 "zero_constant" ""))
+	 (match_operand:FPMASK 3 "vsx_register_operand" "wa")
+	 (match_operand:FPMASK 4 "vsx_register_operand" "wa")))]
   "TARGET_P9_MINMAX"
   "xxsel %x0,%x4,%x3,%x1"
   [(set_attr "type" "vecmove")])
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-cmove.c b/gcc/testsuite/gcc.target/powerpc/float128-cmove.c
new file mode 100644
index 00000000000..639d5a77087
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-cmove.c
@@ -0,0 +1,93 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+/* { dg-final { scan-assembler     {\mxscmpeq[dq]p\M} } } */
+/* { dg-final { scan-assembler     {\mxxpermdi\M}     } } */
+/* { dg-final { scan-assembler     {\mxxsel\M}        } } */
+/* { dg-final { scan-assembler-not {\mxscmpu[dq]p\M}  } } */
+/* { dg-final { scan-assembler-not {\mfcmp[uo]\M}     } } */
+/* { dg-final { scan-assembler-not {\mfsel\M}         } } */
+
+/* This series of tests tests whether you can do a conditional move where the
+   test is one floating point type, and the result is another floating point
+   type.
+
+   If the comparison type is SF/DFmode, and the move type is IEEE 128-bit
+   floating point, we have to duplicate the mask in the lower 64-bits with
+   XXPERMDI because XSCMPEQDP clears the bottom 64-bits of the mask register.
+
+   Going the other way (IEEE 128-bit comparsion, 64-bit move) is fine as the
+   mask word will be 128-bits.  */
+
+float
+eq_f_d (float a, float b, double x, double y)
+{
+  return (x == y) ? a : b;
+}
+
+double
+eq_d_f (double a, double b, float x, float y)
+{
+  return (x == y) ? a : b;
+}
+
+float
+eq_f_f128 (float a, float b, __float128 x, __float128 y)
+{
+  return (x == y) ? a : b;
+}
+
+double
+eq_d_f128 (double a, double b, __float128 x, __float128 y)
+{
+  return (x == y) ? a : b;
+}
+
+__float128
+eq_f128_f (__float128 a, __float128 b, float x, float y)
+{
+  return (x == y) ? a : b;
+}
+
+__float128
+eq_f128_d (__float128 a, __float128 b, double x, double y)
+{
+  return (x != y) ? a : b;
+}
+
+float
+ne_f_d (float a, float b, double x, double y)
+{
+  return (x != y) ? a : b;
+}
+
+double
+ne_d_f (double a, double b, float x, float y)
+{
+  return (x != y) ? a : b;
+}
+
+float
+ne_f_f128 (float a, float b, __float128 x, __float128 y)
+{
+  return (x != y) ? a : b;
+}
+
+double
+ne_d_f128 (double a, double b, __float128 x, __float128 y)
+{
+  return (x != y) ? a : b;
+}
+
+__float128
+ne_f128_f (__float128 a, __float128 b, float x, float y)
+{
+  return (x != y) ? a : b;
+}
+
+__float128
+ne_f128_d (__float128 a, __float128 b, double x, double y)
+{
+  return (x != y) ? a : b;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c b/gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c
new file mode 100644
index 00000000000..6f7627c0f2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c
@@ -0,0 +1,15 @@
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+/* Test that the fminf128/fmaxf128 functions generate if/then/else and not a
+   call.  */
+TYPE f128_min (TYPE a, TYPE b) { return (a < b) ? a : b; }
+TYPE f128_max (TYPE a, TYPE b) { return (b > a) ? b : a; }
+
+/* { dg-final { scan-assembler {\mxsmaxcqp\M} } } */
+/* { dg-final { scan-assembler {\mxsmincqp\M} } } */
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move
  2020-09-22  3:39 [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move Michael Meissner
  2020-09-22  3:41 ` [PATCH 1/2] Power10: Add IEEE 128-bit xsmaxcqp and xsmincqp support Michael Meissner
  2020-09-22  3:42 ` [PATCH 2/2] Power10: Add IEEE 128-bit fp conditional move Michael Meissner
@ 2020-09-24  8:24 ` Florian Weimer
  2020-09-24 20:56   ` Michael Meissner
  2020-10-12 23:02 ` Ping: " Michael Meissner
  3 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2020-09-24  8:24 UTC (permalink / raw)
  To: Michael Meissner
  Cc: gcc-patches, Segher Boessenkool, David Edelsohn, Bill Schmidt,
	Peter Bergner

* Michael Meissner via Gcc-patches:

> These patches are my latest versions of the patches to add IEEE 128-bit min,
> max, and conditional move to GCC.  They correspond to the earlier patches #3
> and #4 (patches #1 and #2 have been installed).

Is this about IEEE min or IEEE minimum?  My understanding is that they
are not the same (or that the behavior depends on the standard version,
but I think min was replaced with minimum in the 2019 standard or
something like that).

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move
  2020-09-24  8:24 ` [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and " Florian Weimer
@ 2020-09-24 20:56   ` Michael Meissner
  2020-09-25 18:52     ` Segher Boessenkool
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Meissner @ 2020-09-24 20:56 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

On Thu, Sep 24, 2020 at 10:24:52AM +0200, Florian Weimer wrote:
> * Michael Meissner via Gcc-patches:
> 
> > These patches are my latest versions of the patches to add IEEE 128-bit min,
> > max, and conditional move to GCC.  They correspond to the earlier patches #3
> > and #4 (patches #1 and #2 have been installed).
> 
> Is this about IEEE min or IEEE minimum?  My understanding is that they
> are not the same (or that the behavior depends on the standard version,
> but I think min was replaced with minimum in the 2019 standard or
> something like that).
> 
> Thanks,
> Florian

The ISA 3.0 added 2 min/max variants to add to the original variant in power7
(ISA 2.6).

	xsmaxdp   Maximum value
	xsmaxcdp  Maximum value with "C" semantics
	xsmaxjdp  Maximum value with "Java" semantics

Due to the NaN rules, unless you use -ffast-math, the compiler won't generate
these by default.  However with the compare and set mask instruction that was
also introduced in ISA 3.0, the compiler can use compare and set mask to
implement maximum and minimum in some cases, that would return the 'right'
value with NaNs.

In ISA 3.1 (power10) the decision was made to only provide the "C" form on
maximum and minimum.  Hence the test in the first patch that uses -ffast-math
to get XSMAXCQP generated.  The second patch adds the conditional move support,
which like for SF/DF modes, can generate maximum and minimums in some cases.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move
  2020-09-24 20:56   ` Michael Meissner
@ 2020-09-25 18:52     ` Segher Boessenkool
  0 siblings, 0 replies; 9+ messages in thread
From: Segher Boessenkool @ 2020-09-25 18:52 UTC (permalink / raw)
  To: Michael Meissner, Florian Weimer, gcc-patches, David Edelsohn,
	Bill Schmidt, Peter Bergner

Hi!

On Thu, Sep 24, 2020 at 04:56:27PM -0400, Michael Meissner wrote:
> On Thu, Sep 24, 2020 at 10:24:52AM +0200, Florian Weimer wrote:
> > * Michael Meissner via Gcc-patches:
> > 
> > > These patches are my latest versions of the patches to add IEEE 128-bit min,
> > > max, and conditional move to GCC.  They correspond to the earlier patches #3
> > > and #4 (patches #1 and #2 have been installed).
> > 
> > Is this about IEEE min or IEEE minimum?  My understanding is that they
> > are not the same (or that the behavior depends on the standard version,
> > but I think min was replaced with minimum in the 2019 standard or
> > something like that).

This is about the GCC internal RTX code "smin", which returns an
undefined result if either operand is a NAN, or both are zeros (of
different sign).

> The ISA 3.0 added 2 min/max variants to add to the original variant in power7
> (ISA 2.6).

2.06, fwiw.

> 	xsmaxdp   Maximum value
> 	xsmaxcdp  Maximum value with "C" semantics
> 	xsmaxjdp  Maximum value with "Java" semantics

xsmaxdp implements IEEE behaviour fine.  xsmaxcdp is simply the C
expression  (x > y ? x : y) (or something like that), and xsmaxjdp is
something like that for Java.

> Due to the NaN rules, unless you use -ffast-math, the compiler won't generate
> these by default.

Simply because the RTL would be undefined!

> In ISA 3.1 (power10) the decision was made to only provide the "C" form on
> maximum and minimum.

... for quad precision.


Segher

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Ping: [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move
  2020-09-22  3:39 [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move Michael Meissner
                   ` (2 preceding siblings ...)
  2020-09-24  8:24 ` [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and " Florian Weimer
@ 2020-10-12 23:02 ` Michael Meissner
  3 siblings, 0 replies; 9+ messages in thread
From: Michael Meissner @ 2020-10-12 23:02 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool
  Cc: gcc-patches, David Edelsohn, Bill Schmidt, Peter Bergner

Ping the following two patches to add IEEE 128-bit minimum, maximu, and
conditional move support:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554460.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554461.html

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] Power10: Add IEEE 128-bit fp conditional move.
  2021-01-15  4:40 [PATCH 0/2] PowerPC: Add power10 IEEE 128-bit min/max/cmove Michael Meissner
@ 2021-01-15  4:42 ` Michael Meissner
  0 siblings, 0 replies; 9+ messages in thread
From: Michael Meissner @ 2021-01-15  4:42 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

[PATCH 2/2] Power10: Add IEEE 128-bit fp conditional move.

This patch adds the support for power10 IEEE 128-bit floating point conditional
move and for automatically generating min/max.  Unlike the previous patch, I
decided to keep two separate patterns for fpmask before splitting (one pattern
for normal compares, and the other pattern for inverted compares).  I can go
back to a single pattern with a new predicate that allows either comparison.

Compared to the original code, these patterns do simplify the fpmask insns to
having one alternative instead of two.  In the original code, the first
alternative tried to use the result as a temporary register.  But that doesn't
work if you are doing a conditional move with SF/DF types, but the comparison
is KF/TF.  That is because the SF/DF types can use the traditional FPR
registers, but IEEE 128-bit floating point can only do arithmetic in the
traditional Altivec registers.

This code also has to insert a XXPERMDI if you are moving KF/TF values, but
the comparison is done with SF/DF values.  In this case, the set and compare
mask for SF/DF clears the bottom 64-bits of the register, and the XXPERMDI is
needed to fill it.

I have tested this on little endian power9 servers, and there were no
regressions.  I have also done unit tests on power10 hardware and the code
generates the min/max instructions.  Can I check this into the master branch?

gcc/
2021-01-14 Michael Meissner  <meissner@linux.ibm.com>

        * config/rs6000/rs6000.c (have_compare_and_set_mask): Add IEEE
        128-bit floating point types.
        * config/rs6000/rs6000.md (FPMASK): New iterator.
        (FPMASK2): New iterator.
        (Fv mode attribute): Add KFmode and TFmode.
        (mov<FPMASK:mode><FPMASK2:mode>cc_fpmask): Replace
        mov<SFDF:mode><SFDF2:mode>cc_p9.  Add IEEE 128-bit fp support.
        (mov<FPMASK:mode><FPMASK2:mode>cc_invert_fpmask): Replace
        mov<SFDF:mode><SFDF2:mode>cc_invert_p9.  Add IEEE 128-bit fp
        support.
        (fpmask<mode>): Add IEEE 128-bit fp support.  Enable generator to
        build te RTL.
        (xxsel<mode>): Add IEEE 128-bit fp support.  Enable generator to
        build te RTL.

gcc/testsuite/
2021-01-14  Michael Meissner  <meissner@linux.ibm.com>

        * gcc.target/powerpc/float128-cmove.c: New test.
        * gcc.target/powerpc/float128-minmax-3.c: New test.
---
 gcc/config/rs6000/rs6000.c                    |   8 +-
 gcc/config/rs6000/rs6000.md                   | 188 ++++++++++++------
 .../gcc.target/powerpc/float128-cmove.c       |  93 +++++++++
 .../gcc.target/powerpc/float128-minmax-3.c    |  15 ++
 4 files changed, 237 insertions(+), 67 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-cmove.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 08b407030a3..47a56912e27 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15337,8 +15337,8 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
   return 1;
 }
 
-/* Possibly emit the xsmaxcdp and xsmincdp instructions to emit a maximum or
-   minimum with "C" semantics.
+/* Possibly emit the xsmaxc{dp,qp} and xsminc{dp,qp} instructions to emit a
+   maximum or minimum with "C" semantics.
 
    Unless you use -ffast-math, you can't use these instructions to replace
    conditions that implicitly reverse the condition because the comparison
@@ -15474,6 +15474,10 @@ have_compare_and_set_mask (machine_mode mode)
     case E_DFmode:
       return TARGET_P9_MINMAX;
 
+    case E_KFmode:
+    case E_TFmode:
+      return FLOAT128_MIN_MAX_FPMASK_P (mode);
+
     default:
       break;
     }
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 45be9c381bc..d658b45ee3a 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -564,6 +564,19 @@ (define_mode_iterator SFDF [SF DF])
 ; And again, for when we need two FP modes in a pattern.
 (define_mode_iterator SFDF2 [SF DF])
 
+; Floating scalars that supports the set compare mask instruction.
+(define_mode_iterator FPMASK [SF
+			      DF
+			      (KF "FLOAT128_MIN_MAX_FPMASK_P (KFmode)")
+			      (TF "FLOAT128_MIN_MAX_FPMASK_P (TFmode)")])
+
+; And again, for patterns that need two (potentially) different floating point
+; scalars that support the set compare mask instruction.
+(define_mode_iterator FPMASK2 [SF
+			       DF
+			       (KF "FLOAT128_MIN_MAX_FPMASK_P (KFmode)")
+			       (TF "FLOAT128_MIN_MAX_FPMASK_P (TFmode)")])
+
 ; A generic s/d attribute, for sp/dp for example.
 (define_mode_attr sd [(SF   "s") (DF   "d")
 		      (V4SF "s") (V2DF "d")])
@@ -597,8 +610,13 @@ (define_mode_attr Ff		[(SF "f") (DF "d") (DI "d")])
 ; SF/DF constraint for arithmetic on VSX registers using instructions added in
 ; ISA 2.06 (power7).  This includes instructions that normally target DF mode,
 ; but are used on SFmode, since internally SFmode values are kept in the DFmode
-; format.
-(define_mode_attr Fv		[(SF "wa") (DF "wa") (DI "wa")])
+; format.  Also include IEEE 128-bit instructions which are restricted to the
+; Altivec registers.
+(define_mode_attr Fv		[(SF "wa")
+				 (DF "wa")
+				 (DI "wa")
+				 (KF "v")
+				 (TF "v")])
 
 ; Which isa is needed for those float instructions?
 (define_mode_attr Fisa		[(SF "p8v")  (DF "*") (DI "*")])
@@ -5285,10 +5303,10 @@ (define_insn "*setnbcr_<un>signed_<GPR:mode>"
 
 ;; Floating point conditional move
 (define_expand "mov<mode>cc"
-   [(set (match_operand:SFDF 0 "gpc_reg_operand")
-	 (if_then_else:SFDF (match_operand 1 "comparison_operator")
-			    (match_operand:SFDF 2 "gpc_reg_operand")
-			    (match_operand:SFDF 3 "gpc_reg_operand")))]
+   [(set (match_operand:FPMASK 0 "gpc_reg_operand")
+	 (if_then_else:FPMASK (match_operand 1 "comparison_operator")
+			      (match_operand:FPMASK 2 "gpc_reg_operand")
+			      (match_operand:FPMASK 3 "gpc_reg_operand")))]
   "TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT"
 {
   if (rs6000_emit_cmove (operands[0], operands[1], operands[2], operands[3]))
@@ -5308,92 +5326,132 @@ (define_insn "*fsel<SFDF:mode><SFDF2:mode>4"
   "fsel %0,%1,%2,%3"
   [(set_attr "type" "fp")])
 
-(define_insn_and_split "*mov<SFDF:mode><SFDF2:mode>cc_p9"
-  [(set (match_operand:SFDF 0 "vsx_register_operand" "=&<SFDF:Fv>,<SFDF:Fv>")
-	(if_then_else:SFDF
+(define_insn_and_split "*mov<FPMASK:mode><FPMASK2:mode>cc_fpmask"
+  [(set (match_operand:FPMASK 0 "vsx_register_operand" "=<FPMASK:Fv>")
+	(if_then_else:FPMASK
 	 (match_operator:CCFP 1 "fpmask_comparison_operator"
-		[(match_operand:SFDF2 2 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")
-		 (match_operand:SFDF2 3 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")])
-	 (match_operand:SFDF 4 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")
-	 (match_operand:SFDF 5 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")))
-   (clobber (match_scratch:V2DI 6 "=0,&wa"))]
+	    [(match_operand:FPMASK2 2 "vsx_register_operand" "<FPMASK2:Fv>")
+	     (match_operand:FPMASK2 3 "vsx_register_operand" "<FPMASK2:Fv>")])
+	 (match_operand:FPMASK 4 "vsx_register_operand" "<FPMASK:Fv>")
+	 (match_operand:FPMASK 5 "vsx_register_operand" "<FPMASK:Fv>")))
+   (clobber (match_scratch:V2DI 6 "=&<FPMASK2:Fv>"))]
   "TARGET_P9_MINMAX"
   "#"
-  ""
-  [(set (match_dup 6)
-	(if_then_else:V2DI (match_dup 1)
-			   (match_dup 7)
-			   (match_dup 8)))
-   (set (match_dup 0)
-	(if_then_else:SFDF (ne (match_dup 6)
-			       (match_dup 8))
-			   (match_dup 4)
-			   (match_dup 5)))]
+  "&& 1"
+  [(pc)]
 {
-  if (GET_CODE (operands[6]) == SCRATCH)
-    operands[6] = gen_reg_rtx (V2DImode);
+  rtx dest = operands[0];
+  rtx cmp = operands[1];
+  rtx cmp_op1 = operands[2];
+  rtx cmp_op2 = operands[3];
+  rtx move_t = operands[4];
+  rtx move_f = operands[5];
+  rtx mask_reg = operands[6];
+  rtx mask_m1 = CONSTM1_RTX (V2DImode);
+  rtx mask_0 = CONST0_RTX (V2DImode);
+  machine_mode move_mode = <FPMASK:MODE>mode;
+  machine_mode compare_mode = <FPMASK2:MODE>mode;
 
-  operands[7] = CONSTM1_RTX (V2DImode);
-  operands[8] = CONST0_RTX (V2DImode);
+  if (GET_CODE (mask_reg) == SCRATCH)
+    mask_reg = gen_reg_rtx (V2DImode);
+
+  /* Emit the compare and set mask instruction.  */
+  emit_insn (gen_fpmask<FPMASK2:mode> (mask_reg, cmp, cmp_op1, cmp_op2,
+				       mask_m1, mask_0));
+
+  /* If we have a 64-bit comparison, but an 128-bit move, we need to extend the
+     mask.  Because we are using the splat builtin to extend the V2DImode, we
+     need to use element 1 on little endian systems.  */
+  if (!FLOAT128_IEEE_P (compare_mode) && FLOAT128_IEEE_P (move_mode))
+    {
+      rtx element = WORDS_BIG_ENDIAN ? const0_rtx : const1_rtx;
+      emit_insn (gen_vsx_xxspltd_v2di (mask_reg, mask_reg, element));
+    }
+
+  /* Now emit the XXSEL insn.  */
+  emit_insn (gen_xxsel<FPMASK:mode> (dest, mask_reg, mask_0, move_t, move_f));
+  DONE;
 }
- [(set_attr "length" "8")
+ ;; length is 12 in case we need to add XXPERMDI
+ [(set_attr "length" "12")
   (set_attr "type" "vecperm")])
 
 ;; Handle inverting the fpmask comparisons.
-(define_insn_and_split "*mov<SFDF:mode><SFDF2:mode>cc_invert_p9"
-  [(set (match_operand:SFDF 0 "vsx_register_operand" "=&<SFDF:Fv>,<SFDF:Fv>")
-	(if_then_else:SFDF
+(define_insn_and_split "*mov<FPMASK:mode><FPMASK2:mode>cc_invert_fpmask"
+  [(set (match_operand:FPMASK 0 "vsx_register_operand" "=<FPMASK:Fv>")
+	(if_then_else:FPMASK
 	 (match_operator:CCFP 1 "invert_fpmask_comparison_operator"
-		[(match_operand:SFDF2 2 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")
-		 (match_operand:SFDF2 3 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")])
-	 (match_operand:SFDF 4 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")
-	 (match_operand:SFDF 5 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")))
-   (clobber (match_scratch:V2DI 6 "=0,&wa"))]
+	    [(match_operand:FPMASK2 2 "vsx_register_operand" "<FPMASK2:Fv>")
+	     (match_operand:FPMASK2 3 "vsx_register_operand" "<FPMASK2:Fv>")])
+	 (match_operand:FPMASK 4 "vsx_register_operand" "<FPMASK:Fv>")
+	 (match_operand:FPMASK 5 "vsx_register_operand" "<FPMASK:Fv>")))
+   (clobber (match_scratch:V2DI 6 "=&<FPMASK2:Fv>"))]
   "TARGET_P9_MINMAX"
   "#"
   "&& 1"
-  [(set (match_dup 6)
-	(if_then_else:V2DI (match_dup 9)
-			   (match_dup 7)
-			   (match_dup 8)))
-   (set (match_dup 0)
-	(if_then_else:SFDF (ne (match_dup 6)
-			       (match_dup 8))
-			   (match_dup 5)
-			   (match_dup 4)))]
+  [(pc)]
 {
-  rtx op1 = operands[1];
-  enum rtx_code cond = reverse_condition_maybe_unordered (GET_CODE (op1));
+  rtx dest = operands[0];
+  rtx old_cmp = operands[1];
+  rtx cmp_op1 = operands[2];
+  rtx cmp_op2 = operands[3];
+  enum rtx_code cond = reverse_condition_maybe_unordered (GET_CODE (old_cmp));
+  rtx cmp_rev = gen_rtx_fmt_ee (cond, CCFPmode, cmp_op1, cmp_op2);
+  rtx move_f = operands[4];
+  rtx move_t = operands[5];
+  rtx mask_reg = operands[6];
+  rtx mask_m1 = CONSTM1_RTX (V2DImode);
+  rtx mask_0 = CONST0_RTX (V2DImode);
+  machine_mode move_mode = <FPMASK:MODE>mode;
+  machine_mode compare_mode = <FPMASK2:MODE>mode;
+
+  if (GET_CODE (mask_reg) == SCRATCH)
+    mask_reg = gen_reg_rtx (V2DImode);
 
-  if (GET_CODE (operands[6]) == SCRATCH)
-    operands[6] = gen_reg_rtx (V2DImode);
+  /* Emit the compare and set mask instruction.  */
+  emit_insn (gen_fpmask<FPMASK2:mode> (mask_reg, cmp_rev, cmp_op1, cmp_op2,
+				       mask_m1, mask_0));
 
-  operands[7] = CONSTM1_RTX (V2DImode);
-  operands[8] = CONST0_RTX (V2DImode);
+  /* If we have a 64-bit comparison, but an 128-bit move, we need to extend the
+     mask.  Because we are using the splat builtin to extend the V2DImode, we
+     need to use element 1 on little endian systems.  */
+  if (!FLOAT128_IEEE_P (compare_mode) && FLOAT128_IEEE_P (move_mode))
+    {
+      rtx element = WORDS_BIG_ENDIAN ? const0_rtx : const1_rtx;
+      emit_insn (gen_vsx_xxspltd_v2di (mask_reg, mask_reg, element));
+    }
 
-  operands[9] = gen_rtx_fmt_ee (cond, CCFPmode, operands[2], operands[3]);
+  /* Now emit the XXSEL insn.  */
+  emit_insn (gen_xxsel<FPMASK:mode> (dest, mask_reg, mask_0, move_t, move_f));
+  DONE;
 }
- [(set_attr "length" "8")
+ ;; length is 12 in case we need to add XXPERMDI
+ [(set_attr "length" "12")
   (set_attr "type" "vecperm")])
 
-(define_insn "*fpmask<mode>"
-  [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
+(define_insn "fpmask<mode>"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=<Fv>")
 	(if_then_else:V2DI
 	 (match_operator:CCFP 1 "fpmask_comparison_operator"
-		[(match_operand:SFDF 2 "vsx_register_operand" "<Fv>")
-		 (match_operand:SFDF 3 "vsx_register_operand" "<Fv>")])
+		[(match_operand:FPMASK 2 "vsx_register_operand" "<Fv>")
+		 (match_operand:FPMASK 3 "vsx_register_operand" "<Fv>")])
 	 (match_operand:V2DI 4 "all_ones_constant" "")
 	 (match_operand:V2DI 5 "zero_constant" "")))]
   "TARGET_P9_MINMAX"
-  "xscmp%V1dp %x0,%x2,%x3"
+{
+  return (FLOAT128_IEEE_P (<MODE>mode)
+	  ? "xscmp%V1qp %0,%2,%3"
+	  : "xscmp%V1dp %x0,%x2,%x3");
+}
   [(set_attr "type" "fpcompare")])
 
-(define_insn "*xxsel<mode>"
-  [(set (match_operand:SFDF 0 "vsx_register_operand" "=<Fv>")
-	(if_then_else:SFDF (ne (match_operand:V2DI 1 "vsx_register_operand" "wa")
-			       (match_operand:V2DI 2 "zero_constant" ""))
-			   (match_operand:SFDF 3 "vsx_register_operand" "<Fv>")
-			   (match_operand:SFDF 4 "vsx_register_operand" "<Fv>")))]
+(define_insn "xxsel<mode>"
+  [(set (match_operand:FPMASK 0 "vsx_register_operand" "=wa")
+	(if_then_else:FPMASK
+	 (ne (match_operand:V2DI 1 "vsx_register_operand" "wa")
+	     (match_operand:V2DI 2 "zero_constant" ""))
+	 (match_operand:FPMASK 3 "vsx_register_operand" "wa")
+	 (match_operand:FPMASK 4 "vsx_register_operand" "wa")))]
   "TARGET_P9_MINMAX"
   "xxsel %x0,%x4,%x3,%x1"
   [(set_attr "type" "vecmove")])
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-cmove.c b/gcc/testsuite/gcc.target/powerpc/float128-cmove.c
new file mode 100644
index 00000000000..639d5a77087
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-cmove.c
@@ -0,0 +1,93 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+/* { dg-final { scan-assembler     {\mxscmpeq[dq]p\M} } } */
+/* { dg-final { scan-assembler     {\mxxpermdi\M}     } } */
+/* { dg-final { scan-assembler     {\mxxsel\M}        } } */
+/* { dg-final { scan-assembler-not {\mxscmpu[dq]p\M}  } } */
+/* { dg-final { scan-assembler-not {\mfcmp[uo]\M}     } } */
+/* { dg-final { scan-assembler-not {\mfsel\M}         } } */
+
+/* This series of tests tests whether you can do a conditional move where the
+   test is one floating point type, and the result is another floating point
+   type.
+
+   If the comparison type is SF/DFmode, and the move type is IEEE 128-bit
+   floating point, we have to duplicate the mask in the lower 64-bits with
+   XXPERMDI because XSCMPEQDP clears the bottom 64-bits of the mask register.
+
+   Going the other way (IEEE 128-bit comparsion, 64-bit move) is fine as the
+   mask word will be 128-bits.  */
+
+float
+eq_f_d (float a, float b, double x, double y)
+{
+  return (x == y) ? a : b;
+}
+
+double
+eq_d_f (double a, double b, float x, float y)
+{
+  return (x == y) ? a : b;
+}
+
+float
+eq_f_f128 (float a, float b, __float128 x, __float128 y)
+{
+  return (x == y) ? a : b;
+}
+
+double
+eq_d_f128 (double a, double b, __float128 x, __float128 y)
+{
+  return (x == y) ? a : b;
+}
+
+__float128
+eq_f128_f (__float128 a, __float128 b, float x, float y)
+{
+  return (x == y) ? a : b;
+}
+
+__float128
+eq_f128_d (__float128 a, __float128 b, double x, double y)
+{
+  return (x != y) ? a : b;
+}
+
+float
+ne_f_d (float a, float b, double x, double y)
+{
+  return (x != y) ? a : b;
+}
+
+double
+ne_d_f (double a, double b, float x, float y)
+{
+  return (x != y) ? a : b;
+}
+
+float
+ne_f_f128 (float a, float b, __float128 x, __float128 y)
+{
+  return (x != y) ? a : b;
+}
+
+double
+ne_d_f128 (double a, double b, __float128 x, __float128 y)
+{
+  return (x != y) ? a : b;
+}
+
+__float128
+ne_f128_f (__float128 a, __float128 b, float x, float y)
+{
+  return (x != y) ? a : b;
+}
+
+__float128
+ne_f128_d (__float128 a, __float128 b, double x, double y)
+{
+  return (x != y) ? a : b;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c b/gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c
new file mode 100644
index 00000000000..6f7627c0f2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c
@@ -0,0 +1,15 @@
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+/* Test that the fminf128/fmaxf128 functions generate if/then/else and not a
+   call.  */
+TYPE f128_min (TYPE a, TYPE b) { return (a < b) ? a : b; }
+TYPE f128_max (TYPE a, TYPE b) { return (b > a) ? b : a; }
+
+/* { dg-final { scan-assembler {\mxsmaxcqp\M} } } */
+/* { dg-final { scan-assembler {\mxsmincqp\M} } } */
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] Power10: Add IEEE 128-bit fp conditional move
  2020-11-16  4:45 [PATCH 0/2] Power10 IEEE 128-bit min, max, cmove Michael Meissner
@ 2020-11-16  4:53 ` Michael Meissner
  0 siblings, 0 replies; 9+ messages in thread
From: Michael Meissner @ 2020-11-16  4:53 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

Power10: Add IEEE 128-bit fp conditional move.

This patch adds the support for power10 IEEE 128-bit floating point conditional
move and for automatically generating min/max.  Unlike the previous patch, I
decided to keep two separate patterns for fpmask before splitting (one pattern
for normal compares, and the other pattern for inverted compares).  I can go
back to a single pattern with a new predicate that allows either comparison.

Compared to the original code, these patterns do simplify the fpmask insns to
having one alternative instead of two.  In the original code, the first
alternative tried to use the result as a temporary register.  But that doesn't
work if you are doing a conditional move with SF/DF types, but the comparison
is KF/TF.  That is because the SF/DF types can use the traditional FPR
registers, but IEEE 128-bit floating point can only do arithmetic in the
traditional Altivec registers.

This code also has to insert a XXPERMDI if you are moving KF/TF values, but
the comparison is done with SF/DF values.  In this case, the set and compare
mask for SF/DF clears the bottom 64-bits of the register, and the XXPERMDI is
needed to fill it.

I have tested these patches with a bootstrap compiler on a little endian power9
system and also on a big endian power8 system.  There were no regressions.  Can
I check this patch into the master branch?

gcc/
2020-11-15 Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/rs6000.c (have_compare_and_set_mask): Add IEEE
	128-bit floating point types.
	* config/rs6000/rs6000.md (FPMASK): New iterator.
	(FPMASK2): New iterator.
	(Fv mode attribute): Add KFmode and TFmode.
	(mov<FPMASK:mode><FPMASK2:mode>cc_fpmask): Replace
	mov<SFDF:mode><SFDF2:mode>cc_p9.  Add IEEE 128-bit fp support.
	(mov<FPMASK:mode><FPMASK2:mode>cc_invert_fpmask): Replace
	mov<SFDF:mode><SFDF2:mode>cc_invert_p9.  Add IEEE 128-bit fp
	support.
	(fpmask<mode>): Add IEEE 128-bit fp support.  Enable generator to
	build te RTL.
	(xxsel<mode>): Add IEEE 128-bit fp support.  Enable generator to
	build te RTL.

gcc/testsuite/
2020-11-15  Michael Meissner  <meissner@linux.ibm.com>

	* gcc.target/powerpc/float128-cmove.c: New test.
	* gcc.target/powerpc/float128-minmax-3.c: New test.
---
 gcc/config/rs6000/rs6000.c                    |   8 +-
 gcc/config/rs6000/rs6000.md                   | 188 ++++++++++++------
 .../gcc.target/powerpc/float128-cmove.c       |  93 +++++++++
 .../gcc.target/powerpc/float128-minmax-3.c    |  15 ++
 4 files changed, 237 insertions(+), 67 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-cmove.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f6a1f63e842..b6fd21a5d6f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15336,8 +15336,8 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
   return 1;
 }
 
-/* Possibly emit the xsmaxcdp and xsmincdp instructions to emit a maximum or
-   minimum with "C" semantics.
+/* Possibly emit the xsmaxc{dp,qp} and xsminc{dp,qp} instructions to emit a
+   maximum or minimum with "C" semantics.
 
    Unless you use -ffast-math, you can't use these instructions to replace
    conditions that implicitly reverse the condition because the comparison
@@ -15473,6 +15473,10 @@ have_compare_and_set_mask (machine_mode mode)
     case E_DFmode:
       return TARGET_P9_MINMAX;
 
+    case E_KFmode:
+    case E_TFmode:
+      return FLOAT128_MIN_MAX_FPMASK_P (mode);
+
     default:
       break;
     }
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d8fbac124fb..0a40f3acf78 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -564,6 +564,19 @@ (define_mode_iterator SFDF [SF DF])
 ; And again, for when we need two FP modes in a pattern.
 (define_mode_iterator SFDF2 [SF DF])
 
+; Floating scalars that supports the set compare mask instruction.
+(define_mode_iterator FPMASK [SF
+			      DF
+			      (KF "FLOAT128_MIN_MAX_FPMASK_P (KFmode)")
+			      (TF "FLOAT128_MIN_MAX_FPMASK_P (TFmode)")])
+
+; And again, for patterns that need two (potentially) different floating point
+; scalars that support the set compare mask instruction.
+(define_mode_iterator FPMASK2 [SF
+			       DF
+			       (KF "FLOAT128_MIN_MAX_FPMASK_P (KFmode)")
+			       (TF "FLOAT128_MIN_MAX_FPMASK_P (TFmode)")])
+
 ; A generic s/d attribute, for sp/dp for example.
 (define_mode_attr sd [(SF   "s") (DF   "d")
 		      (V4SF "s") (V2DF "d")])
@@ -597,8 +610,13 @@ (define_mode_attr Ff		[(SF "f") (DF "d") (DI "d")])
 ; SF/DF constraint for arithmetic on VSX registers using instructions added in
 ; ISA 2.06 (power7).  This includes instructions that normally target DF mode,
 ; but are used on SFmode, since internally SFmode values are kept in the DFmode
-; format.
-(define_mode_attr Fv		[(SF "wa") (DF "wa") (DI "wa")])
+; format.  Also include IEEE 128-bit instructions which are restricted to the
+; Altivec registers.
+(define_mode_attr Fv		[(SF "wa")
+				 (DF "wa")
+				 (DI "wa")
+				 (KF "v")
+				 (TF "v")])
 
 ; Which isa is needed for those float instructions?
 (define_mode_attr Fisa		[(SF "p8v")  (DF "*") (DI "*")])
@@ -5285,10 +5303,10 @@ (define_insn "*setnbcr_<un>signed_<GPR:mode>"
 
 ;; Floating point conditional move
 (define_expand "mov<mode>cc"
-   [(set (match_operand:SFDF 0 "gpc_reg_operand")
-	 (if_then_else:SFDF (match_operand 1 "comparison_operator")
-			    (match_operand:SFDF 2 "gpc_reg_operand")
-			    (match_operand:SFDF 3 "gpc_reg_operand")))]
+   [(set (match_operand:FPMASK 0 "gpc_reg_operand")
+	 (if_then_else:FPMASK (match_operand 1 "comparison_operator")
+			      (match_operand:FPMASK 2 "gpc_reg_operand")
+			      (match_operand:FPMASK 3 "gpc_reg_operand")))]
   "TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT"
 {
   if (rs6000_emit_cmove (operands[0], operands[1], operands[2], operands[3]))
@@ -5308,92 +5326,132 @@ (define_insn "*fsel<SFDF:mode><SFDF2:mode>4"
   "fsel %0,%1,%2,%3"
   [(set_attr "type" "fp")])
 
-(define_insn_and_split "*mov<SFDF:mode><SFDF2:mode>cc_p9"
-  [(set (match_operand:SFDF 0 "vsx_register_operand" "=&<SFDF:Fv>,<SFDF:Fv>")
-	(if_then_else:SFDF
+(define_insn_and_split "*mov<FPMASK:mode><FPMASK2:mode>cc_fpmask"
+  [(set (match_operand:FPMASK 0 "vsx_register_operand" "=<FPMASK:Fv>")
+	(if_then_else:FPMASK
 	 (match_operator:CCFP 1 "fpmask_comparison_operator"
-		[(match_operand:SFDF2 2 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")
-		 (match_operand:SFDF2 3 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")])
-	 (match_operand:SFDF 4 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")
-	 (match_operand:SFDF 5 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")))
-   (clobber (match_scratch:V2DI 6 "=0,&wa"))]
+	    [(match_operand:FPMASK2 2 "vsx_register_operand" "<FPMASK2:Fv>")
+	     (match_operand:FPMASK2 3 "vsx_register_operand" "<FPMASK2:Fv>")])
+	 (match_operand:FPMASK 4 "vsx_register_operand" "<FPMASK:Fv>")
+	 (match_operand:FPMASK 5 "vsx_register_operand" "<FPMASK:Fv>")))
+   (clobber (match_scratch:V2DI 6 "=&<FPMASK2:Fv>"))]
   "TARGET_P9_MINMAX"
   "#"
-  ""
-  [(set (match_dup 6)
-	(if_then_else:V2DI (match_dup 1)
-			   (match_dup 7)
-			   (match_dup 8)))
-   (set (match_dup 0)
-	(if_then_else:SFDF (ne (match_dup 6)
-			       (match_dup 8))
-			   (match_dup 4)
-			   (match_dup 5)))]
+  "&& 1"
+  [(pc)]
 {
-  if (GET_CODE (operands[6]) == SCRATCH)
-    operands[6] = gen_reg_rtx (V2DImode);
+  rtx dest = operands[0];
+  rtx cmp = operands[1];
+  rtx cmp_op1 = operands[2];
+  rtx cmp_op2 = operands[3];
+  rtx move_t = operands[4];
+  rtx move_f = operands[5];
+  rtx mask_reg = operands[6];
+  rtx mask_m1 = CONSTM1_RTX (V2DImode);
+  rtx mask_0 = CONST0_RTX (V2DImode);
+  machine_mode move_mode = <FPMASK:MODE>mode;
+  machine_mode compare_mode = <FPMASK2:MODE>mode;
 
-  operands[7] = CONSTM1_RTX (V2DImode);
-  operands[8] = CONST0_RTX (V2DImode);
+  if (GET_CODE (mask_reg) == SCRATCH)
+    mask_reg = gen_reg_rtx (V2DImode);
+
+  /* Emit the compare and set mask instruction.  */
+  emit_insn (gen_fpmask<FPMASK2:mode> (mask_reg, cmp, cmp_op1, cmp_op2,
+				       mask_m1, mask_0));
+
+  /* If we have a 64-bit comparison, but an 128-bit move, we need to extend the
+     mask.  Because we are using the splat builtin to extend the V2DImode, we
+     need to use element 1 on little endian systems.  */
+  if (!FLOAT128_IEEE_P (compare_mode) && FLOAT128_IEEE_P (move_mode))
+    {
+      rtx element = WORDS_BIG_ENDIAN ? const0_rtx : const1_rtx;
+      emit_insn (gen_vsx_xxspltd_v2di (mask_reg, mask_reg, element));
+    }
+
+  /* Now emit the XXSEL insn.  */
+  emit_insn (gen_xxsel<FPMASK:mode> (dest, mask_reg, mask_0, move_t, move_f));
+  DONE;
 }
- [(set_attr "length" "8")
+ ;; length is 12 in case we need to add XXPERMDI
+ [(set_attr "length" "12")
   (set_attr "type" "vecperm")])
 
 ;; Handle inverting the fpmask comparisons.
-(define_insn_and_split "*mov<SFDF:mode><SFDF2:mode>cc_invert_p9"
-  [(set (match_operand:SFDF 0 "vsx_register_operand" "=&<SFDF:Fv>,<SFDF:Fv>")
-	(if_then_else:SFDF
+(define_insn_and_split "*mov<FPMASK:mode><FPMASK2:mode>cc_invert_fpmask"
+  [(set (match_operand:FPMASK 0 "vsx_register_operand" "=<FPMASK:Fv>")
+	(if_then_else:FPMASK
 	 (match_operator:CCFP 1 "invert_fpmask_comparison_operator"
-		[(match_operand:SFDF2 2 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")
-		 (match_operand:SFDF2 3 "vsx_register_operand" "<SFDF2:Fv>,<SFDF2:Fv>")])
-	 (match_operand:SFDF 4 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")
-	 (match_operand:SFDF 5 "vsx_register_operand" "<SFDF:Fv>,<SFDF:Fv>")))
-   (clobber (match_scratch:V2DI 6 "=0,&wa"))]
+	    [(match_operand:FPMASK2 2 "vsx_register_operand" "<FPMASK2:Fv>")
+	     (match_operand:FPMASK2 3 "vsx_register_operand" "<FPMASK2:Fv>")])
+	 (match_operand:FPMASK 4 "vsx_register_operand" "<FPMASK:Fv>")
+	 (match_operand:FPMASK 5 "vsx_register_operand" "<FPMASK:Fv>")))
+   (clobber (match_scratch:V2DI 6 "=&<FPMASK2:Fv>"))]
   "TARGET_P9_MINMAX"
   "#"
   "&& 1"
-  [(set (match_dup 6)
-	(if_then_else:V2DI (match_dup 9)
-			   (match_dup 7)
-			   (match_dup 8)))
-   (set (match_dup 0)
-	(if_then_else:SFDF (ne (match_dup 6)
-			       (match_dup 8))
-			   (match_dup 5)
-			   (match_dup 4)))]
+  [(pc)]
 {
-  rtx op1 = operands[1];
-  enum rtx_code cond = reverse_condition_maybe_unordered (GET_CODE (op1));
+  rtx dest = operands[0];
+  rtx old_cmp = operands[1];
+  rtx cmp_op1 = operands[2];
+  rtx cmp_op2 = operands[3];
+  enum rtx_code cond = reverse_condition_maybe_unordered (GET_CODE (old_cmp));
+  rtx cmp_rev = gen_rtx_fmt_ee (cond, CCFPmode, cmp_op1, cmp_op2);
+  rtx move_f = operands[4];
+  rtx move_t = operands[5];
+  rtx mask_reg = operands[6];
+  rtx mask_m1 = CONSTM1_RTX (V2DImode);
+  rtx mask_0 = CONST0_RTX (V2DImode);
+  machine_mode move_mode = <FPMASK:MODE>mode;
+  machine_mode compare_mode = <FPMASK2:MODE>mode;
+
+  if (GET_CODE (mask_reg) == SCRATCH)
+    mask_reg = gen_reg_rtx (V2DImode);
 
-  if (GET_CODE (operands[6]) == SCRATCH)
-    operands[6] = gen_reg_rtx (V2DImode);
+  /* Emit the compare and set mask instruction.  */
+  emit_insn (gen_fpmask<FPMASK2:mode> (mask_reg, cmp_rev, cmp_op1, cmp_op2,
+				       mask_m1, mask_0));
 
-  operands[7] = CONSTM1_RTX (V2DImode);
-  operands[8] = CONST0_RTX (V2DImode);
+  /* If we have a 64-bit comparison, but an 128-bit move, we need to extend the
+     mask.  Because we are using the splat builtin to extend the V2DImode, we
+     need to use element 1 on little endian systems.  */
+  if (!FLOAT128_IEEE_P (compare_mode) && FLOAT128_IEEE_P (move_mode))
+    {
+      rtx element = WORDS_BIG_ENDIAN ? const0_rtx : const1_rtx;
+      emit_insn (gen_vsx_xxspltd_v2di (mask_reg, mask_reg, element));
+    }
 
-  operands[9] = gen_rtx_fmt_ee (cond, CCFPmode, operands[2], operands[3]);
+  /* Now emit the XXSEL insn.  */
+  emit_insn (gen_xxsel<FPMASK:mode> (dest, mask_reg, mask_0, move_t, move_f));
+  DONE;
 }
- [(set_attr "length" "8")
+ ;; length is 12 in case we need to add XXPERMDI
+ [(set_attr "length" "12")
   (set_attr "type" "vecperm")])
 
-(define_insn "*fpmask<mode>"
-  [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
+(define_insn "fpmask<mode>"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=<Fv>")
 	(if_then_else:V2DI
 	 (match_operator:CCFP 1 "fpmask_comparison_operator"
-		[(match_operand:SFDF 2 "vsx_register_operand" "<Fv>")
-		 (match_operand:SFDF 3 "vsx_register_operand" "<Fv>")])
+		[(match_operand:FPMASK 2 "vsx_register_operand" "<Fv>")
+		 (match_operand:FPMASK 3 "vsx_register_operand" "<Fv>")])
 	 (match_operand:V2DI 4 "all_ones_constant" "")
 	 (match_operand:V2DI 5 "zero_constant" "")))]
   "TARGET_P9_MINMAX"
-  "xscmp%V1dp %x0,%x2,%x3"
+{
+  return (FLOAT128_IEEE_P (<MODE>mode)
+	  ? "xscmp%V1qp %0,%2,%3"
+	  : "xscmp%V1dp %x0,%x2,%x3");
+}
   [(set_attr "type" "fpcompare")])
 
-(define_insn "*xxsel<mode>"
-  [(set (match_operand:SFDF 0 "vsx_register_operand" "=<Fv>")
-	(if_then_else:SFDF (ne (match_operand:V2DI 1 "vsx_register_operand" "wa")
-			       (match_operand:V2DI 2 "zero_constant" ""))
-			   (match_operand:SFDF 3 "vsx_register_operand" "<Fv>")
-			   (match_operand:SFDF 4 "vsx_register_operand" "<Fv>")))]
+(define_insn "xxsel<mode>"
+  [(set (match_operand:FPMASK 0 "vsx_register_operand" "=wa")
+	(if_then_else:FPMASK
+	 (ne (match_operand:V2DI 1 "vsx_register_operand" "wa")
+	     (match_operand:V2DI 2 "zero_constant" ""))
+	 (match_operand:FPMASK 3 "vsx_register_operand" "wa")
+	 (match_operand:FPMASK 4 "vsx_register_operand" "wa")))]
   "TARGET_P9_MINMAX"
   "xxsel %x0,%x4,%x3,%x1"
   [(set_attr "type" "vecmove")])
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-cmove.c b/gcc/testsuite/gcc.target/powerpc/float128-cmove.c
new file mode 100644
index 00000000000..639d5a77087
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-cmove.c
@@ -0,0 +1,93 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+/* { dg-final { scan-assembler     {\mxscmpeq[dq]p\M} } } */
+/* { dg-final { scan-assembler     {\mxxpermdi\M}     } } */
+/* { dg-final { scan-assembler     {\mxxsel\M}        } } */
+/* { dg-final { scan-assembler-not {\mxscmpu[dq]p\M}  } } */
+/* { dg-final { scan-assembler-not {\mfcmp[uo]\M}     } } */
+/* { dg-final { scan-assembler-not {\mfsel\M}         } } */
+
+/* This series of tests tests whether you can do a conditional move where the
+   test is one floating point type, and the result is another floating point
+   type.
+
+   If the comparison type is SF/DFmode, and the move type is IEEE 128-bit
+   floating point, we have to duplicate the mask in the lower 64-bits with
+   XXPERMDI because XSCMPEQDP clears the bottom 64-bits of the mask register.
+
+   Going the other way (IEEE 128-bit comparsion, 64-bit move) is fine as the
+   mask word will be 128-bits.  */
+
+float
+eq_f_d (float a, float b, double x, double y)
+{
+  return (x == y) ? a : b;
+}
+
+double
+eq_d_f (double a, double b, float x, float y)
+{
+  return (x == y) ? a : b;
+}
+
+float
+eq_f_f128 (float a, float b, __float128 x, __float128 y)
+{
+  return (x == y) ? a : b;
+}
+
+double
+eq_d_f128 (double a, double b, __float128 x, __float128 y)
+{
+  return (x == y) ? a : b;
+}
+
+__float128
+eq_f128_f (__float128 a, __float128 b, float x, float y)
+{
+  return (x == y) ? a : b;
+}
+
+__float128
+eq_f128_d (__float128 a, __float128 b, double x, double y)
+{
+  return (x != y) ? a : b;
+}
+
+float
+ne_f_d (float a, float b, double x, double y)
+{
+  return (x != y) ? a : b;
+}
+
+double
+ne_d_f (double a, double b, float x, float y)
+{
+  return (x != y) ? a : b;
+}
+
+float
+ne_f_f128 (float a, float b, __float128 x, __float128 y)
+{
+  return (x != y) ? a : b;
+}
+
+double
+ne_d_f128 (double a, double b, __float128 x, __float128 y)
+{
+  return (x != y) ? a : b;
+}
+
+__float128
+ne_f128_f (__float128 a, __float128 b, float x, float y)
+{
+  return (x != y) ? a : b;
+}
+
+__float128
+ne_f128_d (__float128 a, __float128 b, double x, double y)
+{
+  return (x != y) ? a : b;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c b/gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c
new file mode 100644
index 00000000000..6f7627c0f2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c
@@ -0,0 +1,15 @@
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+/* Test that the fminf128/fmaxf128 functions generate if/then/else and not a
+   call.  */
+TYPE f128_min (TYPE a, TYPE b) { return (a < b) ? a : b; }
+TYPE f128_max (TYPE a, TYPE b) { return (b > a) ? b : a; }
+
+/* { dg-final { scan-assembler {\mxsmaxcqp\M} } } */
+/* { dg-final { scan-assembler {\mxsmincqp\M} } } */
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-01-15  4:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-22  3:39 [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move Michael Meissner
2020-09-22  3:41 ` [PATCH 1/2] Power10: Add IEEE 128-bit xsmaxcqp and xsmincqp support Michael Meissner
2020-09-22  3:42 ` [PATCH 2/2] Power10: Add IEEE 128-bit fp conditional move Michael Meissner
2020-09-24  8:24 ` [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and " Florian Weimer
2020-09-24 20:56   ` Michael Meissner
2020-09-25 18:52     ` Segher Boessenkool
2020-10-12 23:02 ` Ping: " Michael Meissner
2020-11-16  4:45 [PATCH 0/2] Power10 IEEE 128-bit min, max, cmove Michael Meissner
2020-11-16  4:53 ` [PATCH 2/2] Power10: Add IEEE 128-bit fp conditional move Michael Meissner
2021-01-15  4:40 [PATCH 0/2] PowerPC: Add power10 IEEE 128-bit min/max/cmove Michael Meissner
2021-01-15  4:42 ` [PATCH 2/2] Power10: Add IEEE 128-bit fp conditional move Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).