public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions
@ 2021-11-05  4:02 Michael Meissner
  2021-11-05  4:04 ` [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function) Michael Meissner
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-05  4:02 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

These patches are a refinement of the XXSPLTIDP patches that I posted on
September 13th.  They generate instructions that load a VSX register with
certain constants directly, instead of using PLXV to load the constant from
memory.

On the Power10:

 * XXSPLTIDP is a prefixed instruction that takes a value encoded as a SFmode
   constant, converts it to DFmode, and splats that value to the two 64-bit
   parts of the register.

 * XXSPLTIW is a prefixed instruction that takes a 32-bit value and splats this
   value into the 4 32-bit parts of the vector register, i.e. it can be used to
   generate V4SImode and V4SFmode vector constants where all of the elements
   are the same.

 * XXSPLTI32DX is a prefixed instruction that takes a 32-bit value and splats
   this value into either the 2 even 32-bit parts of the vector register or 2
   odd 32-bit parts.  Thus 2 XXSPLTI32DX instructions can generate a 64-bit
   constant that cannot be generated by XXSPLTIDP.  Note, in the current set of
   patches, I do not add support for XXSPLTI32DX.  I have done so in previous
   patches, and I could add it if desired.  Because it is 2 back-to-back
   prefixed instructions that are serially dependent on each other, I don't
   think it is worthwhile to use XXSPLTI32DX.

 * LXVKQ is a non-prefixed instruction that loads up certain 128-bit values
   that match particular IEEE 128-bit constants (-0.0f128, 1.0f128, 2.0f128,
   etc.).
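
To make the splat behavior described above concrete, here is a minimal C model
of what XXSPLTIW and XXSPLTIDP leave in the register.  It is only an
illustration and not code from the patches: the type and function names
(vsx_words, vsx_dwords, model_xxspltiw, model_xxspltidp) are made up, and it
assumes the host uses the IEEE 754 single and double formats for float and
double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical models of a 128-bit VSX register, viewed as four words or
   as two double-words.  */
typedef struct { uint32_t w[4]; } vsx_words;
typedef struct { uint64_t d[2]; } vsx_dwords;

/* Model of XXSPLTIW: copy the 32-bit immediate into all four words.  */
static vsx_words
model_xxspltiw (uint32_t imm)
{
  vsx_words r;
  for (int i = 0; i < 4; i++)
    r.w[i] = imm;
  return r;
}

/* Model of XXSPLTIDP: treat the immediate as a single precision bit
   pattern, widen it to double precision, and copy the 64-bit result into
   both double-words.  */
static vsx_dwords
model_xxspltidp (uint32_t imm)
{
  float f;
  double d;
  uint64_t bits;
  vsx_dwords r;

  /* This sketch assumes 32-bit float and 64-bit double on the host.  */
  assert (sizeof f == sizeof imm && sizeof d == sizeof bits);

  memcpy (&f, &imm, sizeof f);
  d = (double) f;
  memcpy (&bits, &d, sizeof bits);
  r.d[0] = r.d[1] = bits;
  return r;
}
```

For example, the immediate 0x3F800000 (1.0 in SFmode) gives
0x3FF0000000000000 (1.0 in DFmode) in both double-words.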

There are 5 patches in this set.

One of the takeaways from the last review was that it would be desirable to
generate an instruction whenever it produces the bit pattern of the vector
constant, even if the vector type is not the native vector type for the
instruction.

For example, the following code:

	vector unsigned long long
	foo (void)
	{
	#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	  return (vector unsigned long long) { 0, 1ULL << 63 };
	#else
	  return (vector unsigned long long) { 1ULL << 63, 0 };
	#endif
	}

should generate:

	foo:
		lxvkq 34,16
		blr

To that end, I added support to create a data structure that takes a vector or
scalar constant and represents it as a series of bytes, half-words, words, and
double-words.  Then the recognizer functions use this data structure to decide
if a given instruction can be generated.
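
As a rough sketch of the idea (not the code in the patch), the following C
fragment builds the big endian byte image of a vector of two 64-bit elements
and packs words out of it.  The helper names (store_big_endian, v2di_to_bytes,
word_at) and the target_big_endian parameter are assumptions made for this
illustration; in the patch the element order depends on BYTES_BIG_ENDIAN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store VALUE as SIZE big endian bytes starting at BYTES[POS].  */
static void
store_big_endian (unsigned char *bytes, size_t pos, uint64_t value,
		  size_t size)
{
  assert (size >= 1 && size <= 8);
  for (size_t i = 0; i < size; i++)
    bytes[pos + i] = (unsigned char) (value >> (8 * (size - 1 - i)));
}

/* Build the 16-byte big endian image of a vector of two 64-bit elements.
   For a little endian target the element numbers are reversed, so that
   element 1 ends up in the first 8 bytes of the image.  */
static void
v2di_to_bytes (const uint64_t elt[2], bool target_big_endian,
	       unsigned char bytes[16])
{
  for (size_t num = 0; num < 2; num++)
    {
      size_t slot = target_big_endian ? num : 1 - num;
      store_big_endian (bytes, slot * 8, elt[num], 8);
    }
}

/* Pack the four big endian bytes of word IDX (0..3) into a 32-bit value.  */
static uint32_t
word_at (const unsigned char bytes[16], size_t idx)
{
  const unsigned char *p = bytes + idx * 4;
  return (((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
	  | ((uint32_t) p[2] << 8) | (uint32_t) p[3]);
}
```

With this layout, { 0, 1ULL << 63 } on a little endian target and
{ 1ULL << 63, 0 } on a big endian target produce the same byte image, whose
first word is 0x80000000 and whose remaining words are zero.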

This way functions like easy_vector_constant can avoid repeatedly taking a
vector constant and converting it into internal format before trying to decide
if a given instruction can be generated.  For example, this is the part in
easy_vector_constant that determines if a vector constant can generate LXVKQ,
XXSPLTIDP, or XXSPLTIW:

      /* Constants that can be generated with ISA 3.1 instructions are
         easy.  */
      vec_const_128bit_type vsx_const;
      if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
	{
	  if (constant_generates_lxvkq (&vsx_const))
	    return true;

	  if (constant_generates_xxspltiw (&vsx_const))
	    return true;

	  if (constant_generates_xxspltidp (&vsx_const))
	    return true;
	}
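
Once the constant is in this form, a recognizer only needs cheap checks on the
byte image, such as "are all of the words equal".  The fragment below is a
simplified stand-in for that kind of check and not the code in the patches
(which record the answers as flags in the structure); the names IMAGE_BYTES
and all_groups_same are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define IMAGE_BYTES 16

/* Return true if the 16-byte image is a single WIDTH-byte group repeated
   to fill the register.  WIDTH must divide 16.  */
static bool
all_groups_same (const unsigned char image[IMAGE_BYTES], size_t width)
{
  assert (width > 0 && IMAGE_BYTES % width == 0);

  for (size_t off = width; off < IMAGE_BYTES; off += width)
    if (memcmp (image, image + off, width) != 0)
      return false;

  return true;
}
```

A width of 4 corresponds to the "all words equal" test and a width of 8 to the
"all double-words equal" test.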

In theory, a lot of the altivec constant functions could be converted to use
this data structure, but I haven't rewritten those functions.

The 5 patches are:

1) Add the data structure and function converting vector/scalar constants to
   that data structure.  Note, this function is not used in the current patch,
   but the remaining 4 patches depend on it.
   
2) Add support to recognize when we could generate the LXVKQ instruction.

3) Add support to recognize when we could generate the XXSPLTIW instruction.

4) Add support to recognize when we could generate the XXSPLTIDP instruction
   for vector constants.

5) Add support to recognize when we could generate the XXSPLTIDP instruction
   for SFmode and DFmode constants.

I have built these patches on power9 and power10 little endian systems with no
regressions in the current tests.  I am kicking off a build on a power8 big
endian system as I write this post.  I have run previous versions of the patch
on the big endian system without problems.  I would like to check this into the
GCC 12 trunk branch.

At the moment, I am not asking to be able to back-port the patches to GCC 11,
but we can do this if it is deemed desirable.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com


* [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)
  2021-11-05  4:02 [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner
@ 2021-11-05  4:04 ` Michael Meissner
  2021-11-05 17:01   ` will schmidt
                     ` (2 more replies)
  2021-11-05  4:07 ` [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ) Michael Meissner
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-05  4:04 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Add new constant data structure.

This patch provides the data structure and function to convert a constant
(CONST_INT, CONST_DOUBLE, CONST_VECTOR, or VEC_DUPLICATE of a constant) to
an array of bytes, half-words, words, and double-words that can be loaded
into a 128-bit vector register.

The following patches use this data structure to generate code that loads
the vector/floating point registers using the XXSPLTIDP, XXSPLTIW, and
LXVKQ instructions that were added in power10.
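
The scalar case can be pictured with the following minimal C sketch, which
writes an integer scalar as big endian bytes and then repeats it to fill the
128-bit image.  It is an illustration under simplifying assumptions and not
the function in this patch; the name splat_scalar and its signature are made
up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the low SIZE bytes of VALUE to the start of IMAGE in big endian
   order, then repeat that group until all 16 bytes are filled.  Return
   false if SIZE cannot evenly fill the image.  */
static bool
splat_scalar (uint64_t value, size_t size, unsigned char image[16])
{
  assert (image != NULL);

  if (size == 0 || size > 8 || 16 % size != 0)
    return false;

  /* Most significant byte first.  */
  for (size_t i = 0; i < size; i++)
    image[i] = (unsigned char) (value >> (8 * (size - 1 - i)));

  /* Copy the first group into each following slot.  */
  for (size_t off = size; off < 16; off += size)
    memcpy (image + off, image, size);

  return true;
}
```

For instance, a 2-byte value of 0x1234 yields the bytes 0x12, 0x34 repeated
eight times.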

2021-11-05  Michael Meissner  <meissner@the-meissners.org>

gcc/

	* config/rs6000/rs6000-protos.h (VECTOR_128BIT_*): New macros.
	(vec_const_128bit_type): New structure type.
	(vec_const_128bit_to_bytes): New declaration.
	* config/rs6000/rs6000.c (constant_int_to_128bit_vector): New
	helper function.
	(constant_fp_to_128bit_vector): New helper function.
	(vec_const_128bit_to_bytes): New function.
---
 gcc/config/rs6000/rs6000-protos.h |  28 ++++
 gcc/config/rs6000/rs6000.c        | 253 ++++++++++++++++++++++++++++++
 2 files changed, 281 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..490d6e33736 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -222,6 +222,34 @@ address_is_prefixed (rtx addr,
   return (iform == INSN_FORM_PREFIXED_NUMERIC
 	  || iform == INSN_FORM_PCREL_LOCAL);
 }
+
+/* Functions and data structures relating to 128-bit constants that are
+   converted to byte, half-word, word, and double-word values.  All fields are
+   kept in big endian order.  We also convert scalar values to 128-bits if they
+   are going to be loaded into vector registers.  */
+#define VECTOR_128BIT_BITS		128
+#define VECTOR_128BIT_BYTES		(128 / 8)
+#define VECTOR_128BIT_HALF_WORDS	(128 / 16)
+#define VECTOR_128BIT_WORDS		(128 / 32)
+#define VECTOR_128BIT_DOUBLE_WORDS	(128 / 64)
+
+typedef struct {
+  /* Constant as various sized items.  */
+  unsigned HOST_WIDE_INT double_words[VECTOR_128BIT_DOUBLE_WORDS];
+  unsigned int words[VECTOR_128BIT_WORDS];
+  unsigned short half_words[VECTOR_128BIT_HALF_WORDS];
+  unsigned char bytes[VECTOR_128BIT_BYTES];
+
+  unsigned original_size;		/* Constant size before splat.  */
+  bool fp_constant_p;			/* Is the constant floating point?  */
+  bool all_double_words_same;		/* Are the double words all equal?  */
+  bool all_words_same;			/* Are the words all equal?  */
+  bool all_half_words_same;		/* Are the half words all equal?  */
+  bool all_bytes_same;			/* Are the bytes all equal?  */
+} vec_const_128bit_type;
+
+extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
+				       vec_const_128bit_type *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 01affc7a47c..f285022294a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -28619,6 +28619,259 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
   fprintf (file, "\n");
 }
 
+\f
+/* Copy an integer constant to the vector constant structure.  */
+
+static void
+constant_int_to_128bit_vector (rtx op,
+			       machine_mode mode,
+			       size_t byte_num,
+			       vec_const_128bit_type *info)
+{
+  unsigned HOST_WIDE_INT uvalue = UINTVAL (op);
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+
+  for (int shift = bitsize - 8; shift >= 0; shift -= 8)
+    info->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+}
+
+/* Copy a floating point constant to the vector constant structure.  */
+
+static void
+constant_fp_to_128bit_vector (rtx op,
+			      machine_mode mode,
+			      size_t byte_num,
+			      vec_const_128bit_type *info)
+{
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+  unsigned num_words = bitsize / 32;
+  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
+  long real_words[VECTOR_128BIT_WORDS];
+
+  /* Make sure we don't overflow the real_words array and that it is
+     filled completely.  */
+  gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0);
+
+  real_to_target (real_words, rtype, mode);
+
+  /* Iterate over each 32-bit word in the floating point constant.  The
+     real_to_target function puts out words in target endian order.  We need
+     to arrange for the words to be written in big endian order.  */
+  for (unsigned num = 0; num < num_words; num++)
+    {
+      unsigned endian_num = (BYTES_BIG_ENDIAN
+			     ? num
+			     : num_words - 1 - num);
+
+      unsigned uvalue = real_words[endian_num];
+      for (int shift = 32 - 8; shift >= 0; shift -= 8)
+	info->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+    }
+
+  /* Mark that this constant involves floating point.  */
+  info->fp_constant_p = true;
+}
+
+/* Convert a vector constant OP with mode MODE to a vector 128-bit constant
+   structure INFO.
+
+   Break the constant out into bytes, half words, words, and double words.
+   Return true if we have successfully broken out a constant.
+
+   We handle CONST_INT, CONST_DOUBLE, CONST_VECTOR, and VEC_DUPLICATE of
+   constants.  Integer and floating point scalar constants are splatted to fill
+   out the vector.  */
+
+bool
+vec_const_128bit_to_bytes (rtx op,
+			   machine_mode mode,
+			   vec_const_128bit_type *info)
+{
+  /* Initialize the constant structure.  */
+  memset ((void *)info, 0, sizeof (vec_const_128bit_type));
+
+  /* Assume CONST_INTs are DImode.  */
+  if (mode == VOIDmode)
+    mode = CONST_INT_P (op) ? DImode : GET_MODE (op);
+
+  if (mode == VOIDmode)
+    return false;
+
+  unsigned size = GET_MODE_SIZE (mode);
+  bool splat_p = false;
+
+  if (size > VECTOR_128BIT_BYTES)
+    return false;
+
+  /* Set up the bits.  */
+  switch (GET_CODE (op))
+    {
+      /* Integer constants, default to double word.  */
+    case CONST_INT:
+      {
+	constant_int_to_128bit_vector (op, mode, 0, info);
+	splat_p = true;
+	break;
+      }
+
+      /* Floating point constants.  */
+    case CONST_DOUBLE:
+      {
+	/* Fail if the floating point constant is the wrong mode.  */
+	if (GET_MODE (op) != mode)
+	  return false;
+
+	/* SFmode stored as scalars are stored in DFmode format.  */
+	if (mode == SFmode)
+	  {
+	    mode = DFmode;
+	    size = GET_MODE_SIZE (DFmode);
+	  }
+
+	constant_fp_to_128bit_vector (op, mode, 0, info);
+	splat_p = true;
+	break;
+      }
+
+      /* Vector constants, iterate over each element.  On little endian
+	 systems, we have to reverse the element numbers.  */
+    case CONST_VECTOR:
+      {
+	/* Fail if the vector constant is the wrong mode or size.  */
+	if (GET_MODE (op) != mode
+	    || GET_MODE_SIZE (mode) != VECTOR_128BIT_BYTES)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t ele_size = GET_MODE_SIZE (ele_mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    rtx ele = CONST_VECTOR_ELT (op, num);
+	    size_t byte_num = (BYTES_BIG_ENDIAN
+			       ? num
+			       : nunits - 1 - num) * ele_size;
+
+	    if (CONST_INT_P (ele))
+	      constant_int_to_128bit_vector (ele, ele_mode, byte_num, info);
+	    else if (CONST_DOUBLE_P (ele))
+	      constant_fp_to_128bit_vector (ele, ele_mode, byte_num, info);
+	    else
+	      return false;
+	  }
+
+	break;
+      }
+
+	/* Treat VEC_DUPLICATE of a constant just like a vector constant.
+	   Since we are duplicating the element, we don't have to worry about
+	   endian issues.  */
+    case VEC_DUPLICATE:
+      {
+	/* Fail if the vector duplicate is the wrong mode or size.  */
+	if (GET_MODE (op) != mode
+	    || GET_MODE_SIZE (mode) != VECTOR_128BIT_BYTES)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t ele_size = GET_MODE_SIZE (ele_mode);
+	rtx ele = XEXP (op, 0);
+	size_t nunits = GET_MODE_NUNITS (mode);
+
+	if (!CONST_INT_P (ele) && !CONST_DOUBLE_P (ele))
+	  return false;
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    size_t byte_num = num * ele_size;
+
+	    if (CONST_INT_P (ele))
+	      constant_int_to_128bit_vector (ele, ele_mode, byte_num, info);
+	    else
+	      constant_fp_to_128bit_vector (ele, ele_mode, byte_num, info);
+	  }
+
+	break;
+      }
+
+      /* Anything else, just return failure.  */
+    default:
+      return false;
+    }
+
+  /* Possibly splat the constant to fill a vector size.  */
+  if (splat_p && size < VECTOR_128BIT_BYTES)
+    {
+      if ((VECTOR_128BIT_BYTES % size) != 0)
+	return false;
+
+      for (size_t offset = size;
+	   offset < VECTOR_128BIT_BYTES;
+	   offset += size)
+	memcpy ((void *) &info->bytes[offset],
+		(void *) &info->bytes[0],
+		size);
+    }
+
+  /* Remember original size.  */
+  info->original_size = size;
+
+  /* Determine if the bytes are all the same.  */
+  unsigned char first_byte = info->bytes[0];
+  info->all_bytes_same = true;
+  for (size_t i = 1; i < VECTOR_128BIT_BYTES; i++)
+    if (first_byte != info->bytes[i])
+      {
+	info->all_bytes_same = false;
+	break;
+      }
+
+  /* Pack half words together & determine if all of the half words are the
+     same.  */
+  for (size_t i = 0; i < VECTOR_128BIT_HALF_WORDS; i++)
+    info->half_words[i] = ((info->bytes[i * 2] << 8)
+			   | info->bytes[(i * 2) + 1]);
+
+  unsigned short first_hword = info->half_words[0];
+  info->all_half_words_same = true;
+  for (size_t i = 1; i < VECTOR_128BIT_HALF_WORDS; i++)
+    if (first_hword != info->half_words[i])
+      {
+	info->all_half_words_same = false;
+	break;
+      }
+
+  /* Pack words together & determine if all of the words are the same.  */
+  for (size_t i = 0; i < VECTOR_128BIT_WORDS; i++)
+    info->words[i] = ((info->bytes[i * 4] << 24)
+		      | (info->bytes[(i * 4) + 1] << 16)
+		      | (info->bytes[(i * 4) + 2] << 8)
+		      | info->bytes[(i * 4) + 3]);
+
+  /* All four words are the same only if each of words 1-3 equals word 0.  */
+  info->all_words_same
+    = (info->words[0] == info->words[1]
+       && info->words[0] == info->words[2]
+       && info->words[0] == info->words[3]);
+
+  /* Pack double words together & determine if all of the double words are the
+     same.  */
+  for (size_t i = 0; i < VECTOR_128BIT_DOUBLE_WORDS; i++)
+    {
+      unsigned HOST_WIDE_INT d_word = 0;
+      for (size_t j = 0; j < 8; j++)
+	d_word = (d_word << 8) | info->bytes[(i * 8) + j];
+
+      info->double_words[i] = d_word;
+    }
+
+  info->all_double_words_same
+    = (info->double_words[0] == info->double_words[1]);
+
+  return true;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
-- 
2.31.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com


* [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)
  2021-11-05  4:02 [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner
  2021-11-05  4:04 ` [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function) Michael Meissner
@ 2021-11-05  4:07 ` Michael Meissner
  2021-11-05 17:52   ` will schmidt
                     ` (2 more replies)
  2021-11-05  4:09 ` [PATCH 3/5] Add Power10 XXSPLTIW Michael Meissner
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-05  4:07 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Add LXVKQ support.

This patch adds support to generate the LXVKQ instruction to load specific
IEEE-128 floating point constants.

Compared to the last time I submitted this patch, I modified it so that it
uses the bit pattern of the vector to see if it can generate the LXVKQ
instruction.  This means on a little endian Power10 system, the
following code will generate an LXVKQ 34,16 instruction:

    vector long long foo (void)
    {
    #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
      return (vector long long) { 0x0000000000000000, 0x8000000000000000 };
    #else
      return (vector long long) { 0x8000000000000000, 0x0000000000000000 };
    #endif
    }

because that vector pattern is the same bit pattern as -0.0F128.
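
The matching rule can be summarized with a small table-driven C sketch.  The
high-word values and immediates are copied from constant_generates_lxvkq in
this patch; the table form, the names lxvkq_table and lxvkq_immediate, and the
omission of the target checks are simplifications made only for this
illustration (the patch itself uses a switch statement).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Most significant word of each IEEE 128-bit constant that LXVKQ can load,
   paired with the immediate that selects it.  */
static const struct { uint32_t high_word; unsigned imm; } lxvkq_table[] = {
  { 0x3FFF0000U,  1 },	/* +1.0.  */
  { 0x40000000U,  2 },	/* +2.0.  */
  { 0x40008000U,  3 },	/* +3.0.  */
  { 0x40010000U,  4 },	/* +4.0.  */
  { 0x40014000U,  5 },	/* +5.0.  */
  { 0x40018000U,  6 },	/* +6.0.  */
  { 0x4001C000U,  7 },	/* +7.0.  */
  { 0x7FFF0000U,  8 },	/* +Infinity.  */
  { 0x7FFF8000U,  9 },	/* Quiet NaN.  */
  { 0x80000000U, 16 },	/* -0.0.  */
  { 0xBFFF0000U, 17 },	/* -1.0.  */
  { 0xC0000000U, 18 },	/* -2.0.  */
  { 0xC0008000U, 19 },	/* -3.0.  */
  { 0xC0010000U, 20 },	/* -4.0.  */
  { 0xC0014000U, 21 },	/* -5.0.  */
  { 0xC0018000U, 22 },	/* -6.0.  */
  { 0xC001C000U, 23 },	/* -7.0.  */
  { 0xFFFF0000U, 24 },	/* -Infinity.  */
};

/* Return the LXVKQ immediate for the 128-bit image WORDS (big endian word
   order), or 0 if the image is not one of the supported constants.  */
static unsigned
lxvkq_immediate (const uint32_t words[4])
{
  assert (words != NULL);

  /* The three low order words must be zero for every supported constant.  */
  if (words[1] != 0 || words[2] != 0 || words[3] != 0)
    return 0;

  for (size_t i = 0; i < sizeof lxvkq_table / sizeof lxvkq_table[0]; i++)
    if (lxvkq_table[i].high_word == words[0])
      return lxvkq_table[i].imm;

  return 0;
}
```

The vector in the example above has the word image { 0x80000000, 0, 0, 0 },
which maps to immediate 16.  All zeros is deliberately not in the table, since
0 is the "no match" return value.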

2021-11-05  Michael Meissner  <meissner@the-meissners.org>

gcc/

	* config/rs6000/constraints.md (eQ): New constraint.
	* config/rs6000/predicates.md (easy_fp_constant): Add support for
	generating the LXVKQ instruction.
	(easy_vector_constant_ieee128): New predicate.
	(easy_vector_constant): Add support for generating the LXVKQ
	instruction.
	* config/rs6000/rs6000-protos.h (constant_generates_lxvkq): New
	declaration.
	* config/rs6000/rs6000.c (output_vec_const_move): Add support for
	generating LXVKQ.
	(constant_generates_lxvkq): New function.
	* config/rs6000/rs6000.opt (-mieee128-constant): New debug
	option.
	* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
	generating LXVKQ.
	(vsx_mov<mode>_32bit): Likewise.
	* doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
	eQ constraint.

gcc/testsuite/

	* gcc.target/powerpc/float128-constant.c: New test.
---
 gcc/config/rs6000/constraints.md              |   6 +
 gcc/config/rs6000/predicates.md               |  34 ++++
 gcc/config/rs6000/rs6000-protos.h             |   1 +
 gcc/config/rs6000/rs6000.c                    |  62 +++++++
 gcc/config/rs6000/rs6000.opt                  |   4 +
 gcc/config/rs6000/vsx.md                      |  14 ++
 gcc/doc/md.texi                               |   4 +
 .../gcc.target/powerpc/float128-constant.c    | 160 ++++++++++++++++++
 8 files changed, 285 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-constant.c

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..e72132b4c28 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@ (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; A TF/KF scalar constant or a vector constant that can load certain IEEE
+;; 128-bit constants into vector registers using LXVKQ.
+(define_constraint "eQ"
+  "An IEEE 128-bit constant that can be loaded into VSX registers."
+  (match_operand 0 "easy_vector_constant_ieee128"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..e0d1c718e9f 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,14 @@ (define_predicate "easy_fp_constant"
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
+  vec_const_128bit_type vsx_const;
+  if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
+    {
+      if (constant_generates_lxvkq (&vsx_const) != 0)
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +617,23 @@ (define_predicate "easy_fp_constant"
    return 0;
 })
 
+;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
+;; via the LXVKQ instruction.
+
+(define_predicate "easy_vector_constant_ieee128"
+  (match_code "const_vector,const_double")
+{
+  vec_const_128bit_type vsx_const;
+
+  /* Can we generate the LXVKQ instruction?  */
+  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
+      || !TARGET_VSX)
+    return false;
+
+  return (vec_const_128bit_to_bytes (op, mode, &vsx_const)
+	  && constant_generates_lxvkq (&vsx_const) != 0);
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -653,6 +678,15 @@ (define_predicate "easy_vector_constant"
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      /* Constants that can be generated with ISA 3.1 instructions are
+         easy.  */
+      vec_const_128bit_type vsx_const;
+      if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
+	{
+	  if (constant_generates_lxvkq (&vsx_const) != 0)
+	    return true;
+	}
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 490d6e33736..494a95cc6ee 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -250,6 +250,7 @@ typedef struct {
 
 extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
 				       vec_const_128bit_type *);
+extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f285022294a..06d02085b06 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6991,6 +6991,17 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      vec_const_128bit_type vsx_const;
+      if (TARGET_POWER10 && vec_const_128bit_to_bytes (vec, mode, &vsx_const))
+	{
+	  unsigned imm = constant_generates_lxvkq (&vsx_const);
+	  if (imm)
+	    {
+	      operands[2] = GEN_INT (imm);
+	      return "lxvkq %x0,%2";
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -28872,6 +28883,57 @@ vec_const_128bit_to_bytes (rtx op,
   return true;
 }
 
+/* Determine if an IEEE 128-bit constant can be loaded with LXVKQ.  Return zero
+   if the LXVKQ instruction cannot be used.  Otherwise return the immediate
+   value to be used with the LXVKQ instruction.  */
+
+unsigned
+constant_generates_lxvkq (vec_const_128bit_type *vsx_const)
+{
+  /* The instruction is only available with power10 code generation, IEEE
+     128-bit floating point hardware, and VSX registers.  */
+  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
+      || !TARGET_VSX)
+    return 0;
+
+  /* Verify that all of the bottom 3 words in the constants loaded by the
+     LXVKQ instruction are zero.  */
+  if (vsx_const->words[1] != 0
+      || vsx_const->words[2] != 0
+      || vsx_const->words[3] != 0)
+    return 0;
+
+  /* See if we have a match.  */
+  switch (vsx_const->words[0])
+    {
+    case 0x3FFF0000U: return 1;		/* IEEE 128-bit +1.0.  */
+    case 0x40000000U: return 2;		/* IEEE 128-bit +2.0.  */
+    case 0x40008000U: return 3;		/* IEEE 128-bit +3.0.  */
+    case 0x40010000U: return 4;		/* IEEE 128-bit +4.0.  */
+    case 0x40014000U: return 5;		/* IEEE 128-bit +5.0.  */
+    case 0x40018000U: return 6;		/* IEEE 128-bit +6.0.  */
+    case 0x4001C000U: return 7;		/* IEEE 128-bit +7.0.  */
+    case 0x7FFF0000U: return 8;		/* IEEE 128-bit +Infinity.  */
+    case 0x7FFF8000U: return 9;		/* IEEE 128-bit quiet NaN.  */
+    case 0x80000000U: return 16;	/* IEEE 128-bit -0.0.  */
+    case 0xBFFF0000U: return 17;	/* IEEE 128-bit -1.0.  */
+    case 0xC0000000U: return 18;	/* IEEE 128-bit -2.0.  */
+    case 0xC0008000U: return 19;	/* IEEE 128-bit -3.0.  */
+    case 0xC0010000U: return 20;	/* IEEE 128-bit -4.0.  */
+    case 0xC0014000U: return 21;	/* IEEE 128-bit -5.0.  */
+    case 0xC0018000U: return 22;	/* IEEE 128-bit -6.0.  */
+    case 0xC001C000U: return 23;	/* IEEE 128-bit -7.0.  */
+    case 0xFFFF0000U: return 24;	/* IEEE 128-bit -Infinity.  */
+
+      /* Anything else cannot be loaded.  */
+    default:
+      break;
+    }
+
+  return 0;
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..b7433ec4e30 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mieee128-constant
+Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the LXVKQ instruction.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0bf04feb6c4..0a376ee4c28 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@ (define_insn_and_split "*xxspltib_<mode>_split"
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              LXVKQ
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                eQ,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,35 +1216,43 @@ (define_insn "vsx_mov<mode>_64bit"
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
+;;              LXVKQ
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
+                wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
+                eQ,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1253,14 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit"
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
+                vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
+                *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..4af8fd76992 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3336,6 +3336,10 @@ A constant whose negation is a signed 16-bit constant.
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eQ
+An IEEE 128-bit constant that can be loaded into a VSX register with a
+single instruction.
+
 @ifset INTERNALS
 @item G
 A floating point constant that can be loaded into a register with one
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-constant.c b/gcc/testsuite/gcc.target/powerpc/float128-constant.c
new file mode 100644
index 00000000000..e3286a786a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-constant.c
@@ -0,0 +1,160 @@
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the LXVKQ instruction is generated to load special IEEE 128-bit
+   constants.  */
+
+_Float128
+return_0 (void)
+{
+  return 0.0f128;			/* XXSPLTIB 34,0.  */
+}
+
+_Float128
+return_1 (void)
+{
+  return 1.0f128;			/* LXVKQ 34,1.  */
+}
+
+_Float128
+return_2 (void)
+{
+  return 2.0f128;			/* LXVKQ 34,2.  */
+}
+
+_Float128
+return_3 (void)
+{
+  return 3.0f128;			/* LXVKQ 34,3.  */
+}
+
+_Float128
+return_4 (void)
+{
+  return 4.0f128;			/* LXVKQ 34,4.  */
+}
+
+_Float128
+return_5 (void)
+{
+  return 5.0f128;			/* LXVKQ 34,5.  */
+}
+
+_Float128
+return_6 (void)
+{
+  return 6.0f128;			/* LXVKQ 34,6.  */
+}
+
+_Float128
+return_7 (void)
+{
+  return 7.0f128;			/* LXVKQ 34,7.  */
+}
+
+_Float128
+return_m0 (void)
+{
+  return -0.0f128;			/* LXVKQ 34,16.  */
+}
+
+_Float128
+return_m1 (void)
+{
+  return -1.0f128;			/* LXVKQ 34,17.  */
+}
+
+_Float128
+return_m2 (void)
+{
+  return -2.0f128;			/* LXVKQ 34,18.  */
+}
+
+_Float128
+return_m3 (void)
+{
+  return -3.0f128;			/* LXVKQ 34,19.  */
+}
+
+_Float128
+return_m4 (void)
+{
+  return -4.0f128;			/* LXVKQ 34,20.  */
+}
+
+_Float128
+return_m5 (void)
+{
+  return -5.0f128;			/* LXVKQ 34,21.  */
+}
+
+_Float128
+return_m6 (void)
+{
+  return -6.0f128;			/* LXVKQ 34,22.  */
+}
+
+_Float128
+return_m7 (void)
+{
+  return -7.0f128;			/* LXVKQ 34,23.  */
+}
+
+_Float128
+return_inf (void)
+{
+  return __builtin_inff128 ();		/* LXVKQ 34,8.  */
+}
+
+_Float128
+return_minf (void)
+{
+  return - __builtin_inff128 ();	/* LXVKQ 34,24.  */
+}
+
+_Float128
+return_nan (void)
+{
+  return __builtin_nanf128 ("");	/* LXVKQ 34,9.  */
+}
+
+/* Note, the following NaNs should not generate a LXVKQ instruction.  */
+_Float128
+return_mnan (void)
+{
+  return - __builtin_nanf128 ("");	/* PLXV 34,... */
+}
+
+_Float128
+return_nan2 (void)
+{
+  return __builtin_nanf128 ("1");	/* PLXV 34,... */
+}
+
+_Float128
+return_nans (void)
+{
+  return __builtin_nansf128 ("");	/* PLXV 34,... */
+}
+
+vector long long
+return_longlong_neg_0 (void)
+{
+  /* This vector is the same pattern as -0.0F128.  */
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define FIRST	0x8000000000000000
+#define SECOND	0x0000000000000000
+
+#else
+#define FIRST	0x0000000000000000
+#define SECOND	0x8000000000000000
+#endif
+
+  return (vector long long) { FIRST, SECOND };	/* LXVKQ 34,16.  */
+}
+
+/* { dg-final { scan-assembler-times {\mlxvkq\M}    19 } } */
+/* { dg-final { scan-assembler-times {\mplxv\M}      3 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M}  1 } } */
+
-- 
2.31.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com


* [PATCH 3/5] Add Power10 XXSPLTIW
  2021-11-05  4:02 [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner
  2021-11-05  4:04 ` [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function) Michael Meissner
  2021-11-05  4:07 ` [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ) Michael Meissner
@ 2021-11-05  4:09 ` Michael Meissner
  2021-11-05 18:50   ` will schmidt
                     ` (2 more replies)
  2021-11-05  4:10 ` [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants Michael Meissner
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-05  4:09 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Generate XXSPLTIW on power10.

This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
adding support for vector constants that can be used, and adding a
VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
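
As a rough illustration of the selection rule (a simplified sketch, not the
code in the patch), the check below models when a 128-bit constant is a
candidate for XXSPLTIW.  The names wants_xxspltiw and fits_vspltis are
hypothetical, the byte order within each word is assumed to be most
significant byte first, and the target and option checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if VALUE fits the signed 5-bit immediate
   (-16..15) accepted by VSPLTISH and VSPLTISW.  */
static bool
fits_vspltis (int64_t value)
{
  return value >= -16 && value <= 15;
}

/* Simplified model of the XXSPLTIW decision.  BYTES holds the 128-bit
   constant with the most significant byte of each word first (an assumption
   of this sketch).  Return true and store the 32-bit immediate in *IMM when
   a word splat is the preferred way to build the constant.  */
static bool
wants_xxspltiw (const uint8_t bytes[16], uint32_t *imm)
{
  uint32_t words[4];
  bool all_bytes_same = true;

  for (int i = 0; i < 4; i++)
    words[i] = ((uint32_t) bytes[4 * i] << 24
		| (uint32_t) bytes[4 * i + 1] << 16
		| (uint32_t) bytes[4 * i + 2] << 8
		| (uint32_t) bytes[4 * i + 3]);

  /* All four 32-bit words must be identical.  */
  for (int i = 1; i < 4; i++)
    if (words[i] != words[0])
      return false;

  /* A byte splat (XXSPLTIB) is cheaper if every byte is the same.  */
  for (int i = 1; i < 16; i++)
    if (bytes[i] != bytes[0])
      all_bytes_same = false;
  if (all_bytes_same)
    return false;

  /* Small half word splats can use VSPLTISH.  The XOR/subtract pair sign
     extends the 16-bit value.  */
  uint32_t high = words[0] >> 16;
  uint32_t low = words[0] & 0xffff;
  if (high == low && fits_vspltis ((int64_t) (low ^ 0x8000) - 0x8000))
    return false;

  /* Small word splats can use VSPLTISW.  */
  if (fits_vspltis ((int64_t) (words[0] ^ 0x80000000u) - 0x80000000LL))
    return false;

  *imm = words[0];
  return true;
}
```

Under these assumptions, a V4SI vector of four 126 values is accepted with an
immediate of 126, while a vector of four 1 values is rejected so that
VSPLTISW can be used instead.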

The eP constraint was added to recognize constants that can be loaded into
vector registers with a single prefixed instruction.

I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
constants.

2021-11-05  Michael Meissner  <meissner@linux.ibm.com>

gcc/

	* config/rs6000/constraints.md (eP): Update comment.
	* config/rs6000/predicates.md (easy_fp_constant): Add support for
	generating XXSPLTIW.
	(vsx_prefixed_constant): New predicate.
	(easy_vector_constant): Add support for
	generating XXSPLTIW.
	* config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
	declaration.
	(constant_generates_xxspltiw): Likewise.
	* config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
	XXSPLTIW, don't do XXSPLTIB and sign extend.
	(output_vec_const_move): Add support for XXSPLTIW.
	(prefixed_xxsplti_p): New function.
	(constant_generates_xxspltiw): New function.
	* config/rs6000/rs6000.md (prefixed attribute): Add support to
	mark XXSPLTI* instructions as being prefixed.
	* config/rs6000/rs6000.opt (-msplat-word-constant): New debug
	switch.
	* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
	generating XXSPLTIW or XXSPLTIDP.
	(vsx_mov<mode>_32bit): Likewise.
	* doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
	eP constraint.

gcc/testsuite/

	* gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
	* gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
	* gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
	* gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
	* gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
---
 gcc/config/rs6000/constraints.md              |  6 ++
 gcc/config/rs6000/predicates.md               | 46 ++++++++++-
 gcc/config/rs6000/rs6000-protos.h             |  2 +
 gcc/config/rs6000/rs6000.c                    | 81 +++++++++++++++++++
 gcc/config/rs6000/rs6000.md                   |  5 ++
 gcc/config/rs6000/rs6000.opt                  |  4 +
 gcc/config/rs6000/vsx.md                      | 28 +++----
 gcc/doc/md.texi                               |  4 +
 .../powerpc/vec-splat-constant-v16qi.c        | 27 +++++++
 .../powerpc/vec-splat-constant-v4sf.c         | 67 +++++++++++++++
 .../powerpc/vec-splat-constant-v4si.c         | 51 ++++++++++++
 .../powerpc/vec-splat-constant-v8hi.c         | 62 ++++++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c  |  4 +-
 13 files changed, 369 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index e72132b4c28..a4b05837fa6 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@ (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; A SF/DF scalar constant or a vector constant that can be loaded into vector
+;; registers with one prefixed instruction such as XXSPLTIDP or XXSPLTIW.
+(define_constraint "eP"
+  "A constant that can be loaded into a VSX register with one prefixed insn."
+  (match_operand 0 "vsx_prefixed_constant"))
+
 ;; A TF/KF scalar constant or a vector constant that can load certain IEEE
 ;; 128-bit constants into vector registers using LXVKQ.
 (define_constraint "eQ"
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index e0d1c718e9f..ed6252bd0c4 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -605,7 +605,10 @@ (define_predicate "easy_fp_constant"
   vec_const_128bit_type vsx_const;
   if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
     {
-      if (constant_generates_lxvkq (&vsx_const) != 0)
+      if (constant_generates_lxvkq (&vsx_const))
+	return true;
+
+      if (constant_generates_xxspltiw (&vsx_const))
 	return true;
     }
 
@@ -617,6 +620,42 @@ (define_predicate "easy_fp_constant"
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit floating point scalar constant or a
+;; vector constant that can be loaded to a VSX register with one prefixed
+;; instruction, such as XXSPLTIDP or XXSPLTIW.
+;;
+;; In addition to regular constants, we also recognize constants formed with
+;; the VEC_DUPLICATE insn from scalar constants.
+;;
+;; We don't handle scalar integer constants here because the assumption is the
+;; normal integer constants will be loaded into GPR registers.  For the
+;; constants that need to be loaded into vector registers, the instructions
+;; don't work well with TImode variables assigned a constant.  This is because
+;; the 64-bit scalar constants are splatted into both halves of the register.
+
+(define_predicate "vsx_prefixed_constant"
+  (match_code "const_double,const_vector,vec_duplicate")
+{
+  /* If we can generate the constant with 1-2 Altivec instructions, don't
+      generate a prefixed instruction.  */
+  if (CONST_VECTOR_P (op) && easy_altivec_constant (op, mode))
+    return false;
+
+  /* Do we have prefixed instructions and are VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  vec_const_128bit_type vsx_const;
+  if (!vec_const_128bit_to_bytes (op, mode, &vsx_const))
+    return false;
+
+  if (constant_generates_xxspltiw (&vsx_const))
+    return true;
+
+  return false;
+})
+
 ;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
 ;; via the LXVKQ instruction.
 
@@ -683,7 +722,10 @@ (define_predicate "easy_vector_constant"
       vec_const_128bit_type vsx_const;
       if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
 	{
-	  if (constant_generates_lxvkq (&vsx_const) != 0)
+	  if (constant_generates_lxvkq (&vsx_const))
+	    return true;
+
+	  if (constant_generates_xxspltiw (&vsx_const))
 	    return true;
 	}
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 494a95cc6ee..99c6a671289 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -251,6 +252,7 @@ typedef struct {
 extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
 				       vec_const_128bit_type *);
 extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
+extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 06d02085b06..be24f56eb31 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6940,6 +6940,11 @@ xxspltib_constant_p (rtx op,
   else if (IN_RANGE (value, -1, 0))
     *num_insns_ptr = 1;
 
+  /* If we can generate XXSPLTIW or XXSPLTIDP, don't generate XXSPLTIB and a
+     sign extend operation.  */
+  else if (vsx_prefixed_constant (op, mode))
+    return false;
+
   else
     *num_insns_ptr = 2;
 
@@ -7000,6 +7005,13 @@ output_vec_const_move (rtx *operands)
 	      operands[2] = GEN_INT (imm);
 	      return "lxvkq %x0,%2";
 	    }
+
+	  imm = constant_generates_xxspltiw (&vsx_const);
+	  if (imm)
+	    {
+	      operands[2] = GEN_INT (imm);
+	      return "xxspltiw %x0,%2";
+	    }
 	}
 
       if (TARGET_P9_VECTOR
@@ -26767,6 +26779,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether an instruction is a prefixed XXSPLTI* instruction.  This is called
+   from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  vec_const_128bit_type vsx_const;
+  if (vec_const_128bit_to_bytes (src, mode, &vsx_const))
+    {
+      if (constant_generates_xxspltiw (&vsx_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28933,6 +28980,40 @@ constant_generates_lxvkq (vec_const_128bit_type *vsx_const)
   return 0;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  Return zero if
+   the XXSPLTIW instruction cannot be used.  Otherwise return the immediate
+   value to be used with the XXSPLTIW instruction.  */
+
+unsigned
+constant_generates_xxspltiw (vec_const_128bit_type *vsx_const)
+{
+  if (!TARGET_SPLAT_WORD_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
+    return 0;
+
+  if (!vsx_const->all_words_same)
+    return 0;
+
+  /* If we can use XXSPLTIB, don't generate XXSPLTIW.  */
+  if (vsx_const->all_bytes_same)
+    return 0;
+
+  /* See if we can use VSPLTISH or VSPLTISW.  */
+  if (vsx_const->all_half_words_same)
+    {
+      unsigned short h_word = vsx_const->half_words[0];
+      short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
+      if (EASY_VECTOR_15 (sign_h_word))
+	return 0;
+    }
+
+  unsigned int word = vsx_const->words[0];
+  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
+  if (EASY_VECTOR_15 (sign_word))
+    return 0;
+
+  return vsx_const->words[0];
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..3a7bcd2426e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@ (define_attr "prefixed" "no,yes"
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index b7433ec4e30..ec7b106fddb 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+msplat-word-constant
+Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the XXSPLTIW instruction.
+
 mieee128-constant
 Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the LXVKQ instruction.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0a376ee4c28..9f0c48db6f2 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,19 +1192,19 @@ (define_insn_and_split "*xxspltib_<mode>_split"
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              LXVKQ
+;;              LXVKQ      XXSPLTI*
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
-                wa,
+                wa,        wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
-                eQ,
+                eQ,        eP,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1216,43 +1216,43 @@ (define_insn "vsx_mov<mode>_64bit"
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
-                vecperm,
+                vecperm,   vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
-                *,
+                *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
-                *,
+                *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
-                *,
+                *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
-                p10,
+                p10,       p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              LXVKQ
+;;              LXVKQ      XXSPLTI*
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,
+                wa,        wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                eQ,
+                eQ,        eP,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1264,17 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit"
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecperm,
+                vecperm,   vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,
+                *,         *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,
+                p10,       p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4af8fd76992..41a568b7d4e 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3336,6 +3336,10 @@ A constant whose negation is a signed 16-bit constant.
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eP
+A scalar floating point constant or a vector constant that can be
+loaded with one prefixed instruction to a VSX register.
+
 @item eQ
 An IEEE 128-bit constant that can be loaded into a VSX register with a
 single instruction.
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..27764ddbc83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..1f0475cf47a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..59418d3bb0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,8 +149,8 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
 
 
-- 
2.31.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com


* [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants
  2021-11-05  4:02 [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner
                   ` (2 preceding siblings ...)
  2021-11-05  4:09 ` [PATCH 3/5] Add Power10 XXSPLTIW Michael Meissner
@ 2021-11-05  4:10 ` Michael Meissner
  2021-11-05 19:24   ` will schmidt
                     ` (2 more replies)
  2021-11-05  4:11 ` [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants Michael Meissner
  2021-11-05 13:08 ` [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner
  5 siblings, 3 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-05  4:10 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Generate XXSPLTIDP for vectors on power10.

This patch implements XXSPLTIDP support for all vector constants.  The
XXSPLTIDP instruction is given a 32-bit immediate that is converted to a vector
of two DFmode constants.  The immediate is in SFmode format, so only constants
that fit as SFmode values can be loaded with XXSPLTIDP.
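
To make the "fits as an SFmode value" condition concrete, here is a minimal
sketch that works on host floating point values rather than on the bit
patterns the compiler uses.  The function name xxspltidp_imm_for_double is
hypothetical.  The sketch skips NaNs entirely, and it rejects values that
would be single precision denormals on the assumption that XXSPLTIDP does not
give a usable result for them.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: if VALUE can be represented exactly as a normal
   single precision number (or is zero or infinity), store the 32-bit single
   precision bit pattern in *IMM and return true.  Assumes the host uses
   IEEE 754 formats with a 32-bit float.  */
static bool
xxspltidp_imm_for_double (double value, uint32_t *imm)
{
  /* NaNs are not handled by this sketch.  */
  if (value != value)
    return false;

  double mag = value < 0.0 ? -value : value;

  /* Finite, but too large for single precision.  */
  if (mag > FLT_MAX && mag <= DBL_MAX)
    return false;

  /* Nonzero, but too small to be a normal single precision value.  */
  if (mag != 0.0 && mag < FLT_MIN)
    return false;

  /* The narrowing conversion must not lose any bits.  */
  float narrowed = (float) value;
  if ((double) narrowed != value)
    return false;

  memcpy (imm, &narrowed, sizeof *imm);
  return true;
}
```

With this check, 1.0 yields the immediate 0x3f800000, while 0.1 is rejected
because it has no exact single precision representation.  The sketch does not
model the preference for simpler splat instructions when one is available.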

The constraint (eP) added in the previous patch for XXSPLTIW is also used
for XXSPLTIDP.

DImode scalar constants are not handled.  This is because the majority of
DImode constants will be in GPR registers.  With vector registers, you have the
problem that XXSPLTIDP splats the double word into both elements of the
vector.  However, if TImode is loaded with an integer constant, it wants a full
128-bit constant.

SFmode and DFmode scalar constants are not handled in this patch.  The
support for those constants will be in the next patch.

I have added a temporary switch (-msplat-float-constant) to control whether or
not the XXSPLTIDP instruction is generated.

I added 2 new tests to test loading up V2DI and V2DF vector constants.

2021-11-05  Michael Meissner  <meissner@the-meissners.org>

gcc/

	* config/rs6000/predicates.md (easy_fp_constant): Add support for
	generating XXSPLTIDP.
	(vsx_prefixed_constant): Likewise.
	(easy_vector_constant): Likewise.
	* config/rs6000/rs6000-protos.h (constant_generates_xxspltidp):
	New declaration.
	* config/rs6000/rs6000.c (output_vec_const_move): Add support for
	generating XXSPLTIDP.
	(prefixed_xxsplti_p): Likewise.
	(constant_generates_xxspltidp): New function.
	* config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.

gcc/testsuite/

	* gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
	regex for power10.
	* gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
	* gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
---
 gcc/config/rs6000/predicates.md               |   9 ++
 gcc/config/rs6000/rs6000-protos.h             |   1 +
 gcc/config/rs6000/rs6000.c                    | 108 ++++++++++++++++++
 gcc/config/rs6000/rs6000.opt                  |   4 +
 .../powerpc/pr86731-fwrapv-longlong.c         |   9 +-
 .../powerpc/vec-splat-constant-v2df.c         |  64 +++++++++++
 .../powerpc/vec-splat-constant-v2di.c         |  50 ++++++++
 7 files changed, 241 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index ed6252bd0c4..d748b11857c 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -610,6 +610,9 @@ (define_predicate "easy_fp_constant"
 
       if (constant_generates_xxspltiw (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltidp (&vsx_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -653,6 +656,9 @@ (define_predicate "vsx_prefixed_constant"
   if (constant_generates_xxspltiw (&vsx_const))
     return true;
 
+  if (constant_generates_xxspltidp (&vsx_const))
+    return true;
+
   return false;
 })
 
@@ -727,6 +733,9 @@ (define_predicate "easy_vector_constant"
 
 	  if (constant_generates_xxspltiw (&vsx_const))
 	    return true;
+
+	  if (constant_generates_xxspltidp (&vsx_const))
+	    return true;
 	}
 
       if (TARGET_P9_VECTOR
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 99c6a671289..2d28df7442d 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -253,6 +253,7 @@ extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
 				       vec_const_128bit_type *);
 extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
 extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *);
+extern unsigned constant_generates_xxspltidp (vec_const_128bit_type *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index be24f56eb31..8fde48cf2b3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -7012,6 +7012,13 @@ output_vec_const_move (rtx *operands)
 	      operands[2] = GEN_INT (imm);
 	      return "xxspltiw %x0,%2";
 	    }
+
+	  imm = constant_generates_xxspltidp (&vsx_const);
+	  if (imm)
+	    {
+	      operands[2] = GEN_INT (imm);
+	      return "xxspltidp %x0,%2";
+	    }
 	}
 
       if (TARGET_P9_VECTOR
@@ -26809,6 +26816,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
     {
       if (constant_generates_xxspltiw (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltidp (&vsx_const))
+	return true;
     }
 
   return false;
@@ -29014,6 +29024,104 @@ constant_generates_xxspltiw (vec_const_128bit_type *vsx_const)
   return vsx_const->words[0];
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  Return zero if
+   the XXSPLTIDP instruction cannot be used.  Otherwise return the immediate
+   value to be used with the XXSPLTIDP instruction.  */
+
+unsigned
+constant_generates_xxspltidp (vec_const_128bit_type *vsx_const)
+{
+  if (!TARGET_SPLAT_FLOAT_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
+    return 0;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  if (!vsx_const->all_double_words_same)
+    return 0;
+
+  /* If the bytes, half words, or words are all the same, don't use XXSPLTIDP.
+     Use a simpler instruction (XXSPLTIB, VSPLTISB, VSPLTISH, or VSPLTISW).  */
+  if (vsx_const->all_bytes_same
+      || vsx_const->all_half_words_same
+      || vsx_const->all_words_same)
+    return 0;
+
+  unsigned HOST_WIDE_INT value = vsx_const->double_words[0];
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and the signalling NaN bit pattern.  Recognize infinity and
+     negative infinity.  */
+
+  /* Bit representation of DFmode normal quiet NaN.  */
+#define RS6000_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+
+  /* Bit representation of DFmode normal signaling NaN.  */
+#define RS6000_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+
+  /* Bit representation of DFmode positive infinity.  */
+#define RS6000_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+
+  /* Bit representation of DFmode negative infinity.  */
+#define RS6000_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (value != RS6000_CONST_DF_NAN
+      && value != RS6000_CONST_DF_NANS
+      && value != RS6000_CONST_DF_INF
+      && value != RS6000_CONST_DF_NEG_INF)
+    {
+      /* The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for
+	 the exponent, and 52 bits for the mantissa (not counting the hidden
+	 bit used for normal numbers).  NaN values have the exponent set to all
+	 1 bits, and the mantissa non-zero (mantissa == 0 is infinity).  */
+
+      int df_exponent = (value >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= value & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return 0;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return 0;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vsx_const->words[0], vsx_const->words[1] };
+
+  /* real_from_target takes the target words in target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return 0;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return 0;
+
+  /* Return the immediate to be used.  */
+  return sf_value;
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index ec7b106fddb..c1d661d7e6b 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ msplat-word-constant
 Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the XXSPLTIW instruction.
 
+msplat-float-constant
+Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the XXSPLTIDP instruction.
+
 mieee128-constant
 Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the LXVKQ instruction.
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLXV.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0, which can be generated with
+   XXSPLTIDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
-- 
2.31.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com


* [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.
  2021-11-05  4:02 [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner
                   ` (3 preceding siblings ...)
  2021-11-05  4:10 ` [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants Michael Meissner
@ 2021-11-05  4:11 ` Michael Meissner
  2021-11-05 19:38   ` will schmidt
                     ` (2 more replies)
  2021-11-05 13:08 ` [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner
  5 siblings, 3 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-05  4:11 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Generate XXSPLTIDP for scalars on power10.

The previous patch added XXSPLTIDP support for vector constants.  This patch
adds the same support for SFmode and DFmode scalar constants.

I added 2 new tests to test loading up SF and DF scalar constants.
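
As an illustration only, the constant test that decides whether XXSPLTIDP can
be used may be sketched in standalone C.  This is not the code in the patch:
the name sketch_xxspltidp_imm is hypothetical, it assumes an IEEE 754 host and
uses host float conversion where GCC uses exact_real_truncate, and for
simplicity it rejects every NaN, whereas constant_generates_xxspltidp accepts
the two standard NaN bit patterns.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: given the bit pattern of a DFmode constant, return
   the 32-bit SFmode immediate for XXSPLTIDP, or 0 if it cannot be used.  */
static uint32_t
sketch_xxspltidp_imm (uint64_t df_bits)
{
  uint32_t hi = (uint32_t) (df_bits >> 32);
  uint32_t lo = (uint32_t) df_bits;
  uint64_t df_exp = (df_bits >> 52) & 0x7ff;
  uint64_t df_man = df_bits & ((UINT64_C (1) << 52) - 1);

  /* Both 32-bit words equal: a simpler splat instruction is preferred.  */
  if (hi == lo)
    return 0;

  /* Infinity maps directly to the SFmode infinity with the same sign.  */
  if (df_exp == 0x7ff && df_man == 0)
    return (hi & 0x80000000u) | 0x7f800000u;

  /* Simplification: reject all NaNs.  Also reject DFmode subnormals.  */
  if (df_exp == 0x7ff || (df_exp == 0 && df_man != 0))
    return 0;

  double d;
  memcpy (&d, &df_bits, sizeof d);

  /* Out of SFmode range: cannot be stored as a float.  */
  if (d > FLT_MAX || d < -FLT_MAX)
    return 0;

  /* The value must survive a round trip through SFmode exactly.  */
  float f = (float) d;
  if ((double) f != d)
    return 0;

  uint32_t sf_bits;
  memcpy (&sf_bits, &f, sizeof sf_bits);

  /* SFmode subnormals are undefined for XXSPLTIDP.  */
  if (((sf_bits >> 23) & 0xff) == 0 && (sf_bits & 0x7fffff) != 0)
    return 0;

  return sf_bits;
}
```

Under these assumptions 1.0 and -0.0 yield an immediate, while M_PI and
0x1p-149 (a SFmode subnormal) do not, which matches the expectations written
in the new tests.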

2021-11-05  Michael Meissner  <meissner@the-meissners.org>

gcc/

	* config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
	(UNSPEC_XXSPLTIW_CONST): New unspec.
	(movsf_hardfloat): Add support for generating XXSPLTIDP.
	(mov<mode>_hardfloat32): Likewise.
	(mov<mode>_hardfloat64): Likewise.
	(xxspltidp_<mode>_internal): New insns.
	(xxspltiw_<mode>_internal): New insns.
	(splitters for SF/DFmode): Add new splitters for XXSPLTIDP.

gcc/testsuite/

	* gcc.target/powerpc/vec-splat-constant-df.c: New test.
	* gcc.target/powerpc/vec-splat-constant-sf.c: New test.
---
 gcc/config/rs6000/rs6000.md                   | 97 +++++++++++++++----
 .../powerpc/vec-splat-constant-df.c           | 60 ++++++++++++
 .../powerpc/vec-splat-constant-sf.c           | 60 ++++++++++++
 3 files changed, 199 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 3a7bcd2426e..4122acb98cf 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -156,6 +156,8 @@ (define_c_enum "unspec"
    UNSPEC_PEXTD
    UNSPEC_HASHST
    UNSPEC_HASHCHK
+   UNSPEC_XXSPLTIDP_CONST
+   UNSPEC_XXSPLTIW_CONST
   ])
 
 ;;
@@ -7764,17 +7766,17 @@ (define_split
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eP"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7796,15 +7798,16 @@ (define_insn "movsf_hardfloat"
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8064,18 +8067,18 @@ (define_split
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eP"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8092,20 +8095,21 @@ (define_insn "*mov<mode>_hardfloat32"
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8132,19 +8136,19 @@ (define_insn "*mov<mode>_softfloat32"
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eP"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8166,18 +8170,19 @@ (define_insn "*mov<mode>_hardfloat64"
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -8211,6 +8216,62 @@ (define_insn "*mov<mode>_softfloat64"
    (set_attr "length"
             "*,       *,      *,      *,      *,      8,
              12,      16,     *")])
+
+;; Split the VSX prefixed instruction to support SFmode and DFmode scalar
+;; constants that look like DFmode floating point values where both elements
+;; are the same.  The constant has to be expressible as a SFmode constant that
+;; is not a SFmode denormal value.
+;;
+;; We don't need splitters for the 128-bit types, since the function
+;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
+(define_insn "xxspltidp_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIDP_CONST))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_insn "xxspltiw_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW_CONST))]
+  "TARGET_POWER10"
+  "xxspltiw %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_split
+  [(set (match_operand:SFDF 0 "vsx_register_operand")
+	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
+  "TARGET_POWER10"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  vec_const_128bit_type vsx_const;
+
+  if (!vec_const_128bit_to_bytes (src, <MODE>mode, &vsx_const))
+    gcc_unreachable ();
+
+  unsigned imm = constant_generates_xxspltidp (&vsx_const);
+  if (imm)
+    {
+      emit_insn (gen_xxspltidp_<mode>_internal (dest, GEN_INT (imm)));
+      DONE;
+    }
+
+  imm = constant_generates_xxspltiw (&vsx_const);
+  if (imm)
+    {
+      emit_insn (gen_xxspltiw_<mode>_internal (dest, GEN_INT (imm)));
+      DONE;
+    }
+
+  else
+    gcc_unreachable ();
+})
 \f
 (define_expand "mov<mode>"
   [(set (match_operand:FMOVE128 0 "general_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
-- 
2.31.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com


* Re: [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions
  2021-11-05  4:02 [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner
                   ` (4 preceding siblings ...)
  2021-11-05  4:11 ` [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants Michael Meissner
@ 2021-11-05 13:08 ` Michael Meissner
  5 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-05 13:08 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

I mentioned that I would start a build/check on a big endian power8 system in
the last set of patches.  There were no regressions with this set of patches on
a big endian system, testing both 32-bit and 64-bit code generation.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com


* Re: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)
  2021-11-05  4:04 ` [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function) Michael Meissner
@ 2021-11-05 17:01   ` will schmidt
  2021-11-05 18:13     ` Michael Meissner
  2021-11-15 16:35   ` Ping: " Michael Meissner
  2021-12-13 16:58   ` Ping #2: " Michael Meissner
  2 siblings, 1 reply; 29+ messages in thread
From: will schmidt @ 2021-11-05 17:01 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

On Fri, 2021-11-05 at 00:04 -0400, Michael Meissner wrote:
> Add new constant data structure.
> 
> This patch provides the data structure and function to convert a
> CONST_INT, CONST_DOUBLE, CONST_VECTOR, or VEC_DUPLICATE of a constant to
> an array of bytes, half-words, words, and double words that can be loaded
> into a 128-bit vector register.
> 
> The next patches will use this data structure to generate code that
> loads the vector/floating point registers using the XXSPLTIDP,
> XXSPLTIW, and LXVKQ instructions that were added in power10.
> 
> 2021-11-05  Michael Meissner  <meissner@the-meissners.org>
> 

Email here is different than the from:.  No big deal either way.  

> gcc/
> 
> 	* config/rs6000/rs6000-protos.h (VECTOR_128BIT_*): New macros.

I defer to maintainers.  I like to explicitly include the full macro names here so a later grep can easily find them.


> 	(vec_const_128bit_type): New structure type.
> 	(vec_const_128bit_to_bytes): New declaration.
> 	* config/rs6000/rs6000.c (constant_int_to_128bit_vector): New
> 	helper function.
> 	(constant_fp_to_128bit_vector): New helper function.
> 	(vec_const_128bit_to_bytes): New function.

ok

> ---
>  gcc/config/rs6000/rs6000-protos.h |  28 ++++
>  gcc/config/rs6000/rs6000.c        | 253 ++++++++++++++++++++++++++++++
>  2 files changed, 281 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 14f6b313105..490d6e33736 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -222,6 +222,34 @@ address_is_prefixed (rtx addr,
>    return (iform == INSN_FORM_PREFIXED_NUMERIC
>  	  || iform == INSN_FORM_PCREL_LOCAL);
>  }
> +
> +/* Functions and data structures relating to 128-bit constants that are
> +   converted to byte, half-word, word, and double-word values.  All fields are
> +   kept in big endian order.  We also convert scalar values to 128-bits if they
> +   are going to be loaded into vector registers.  */
> +#define VECTOR_128BIT_BITS		128
> +#define VECTOR_128BIT_BYTES		(128 / 8)
> +#define VECTOR_128BIT_HALF_WORDS	(128 / 16)
> +#define VECTOR_128BIT_WORDS		(128 / 32)
> +#define VECTOR_128BIT_DOUBLE_WORDS	(128 / 64)

ok

> +
> +typedef struct {
> +  /* Constant as various sized items.  */
> +  unsigned HOST_WIDE_INT double_words[VECTOR_128BIT_DOUBLE_WORDS];
> +  unsigned int words[VECTOR_128BIT_WORDS];
> +  unsigned short half_words[VECTOR_128BIT_HALF_WORDS];
> +  unsigned char bytes[VECTOR_128BIT_BYTES];
> +
> +  unsigned original_size;		/* Constant size before splat.  */
> +  bool fp_constant_p;			/* Is the constant floating point?  */
> +  bool all_double_words_same;		/* Are the double words all equal?  */
> +  bool all_words_same;			/* Are the words all equal?  */
> +  bool all_half_words_same;		/* Are the halft words all equal?  */

half

> +  bool all_bytes_same;			/* Are the bytes all equal?  */




> +} vec_const_128bit_type;
> +

ok.  
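
For reference, one way the all_*_same flags could be derived from the 16
bytes is sketched below.  The hunk that actually sets them is not quoted at
this point, so this is an assumption about the intent and not the patch's
code; struct sketch_vec_const and both function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical cut-down version of vec_const_128bit_type.  */
struct sketch_vec_const
{
  unsigned char bytes[16];
  bool all_double_words_same;
  bool all_words_same;
  bool all_half_words_same;
  bool all_bytes_same;
};

/* Return true if every ELT_SIZE-byte element equals the first one.  */
static bool
sketch_all_same (const unsigned char *bytes, size_t elt_size)
{
  for (size_t off = elt_size; off < 16; off += elt_size)
    if (memcmp (bytes, bytes + off, elt_size) != 0)
      return false;
  return true;
}

/* Fill in the four flags from the byte view of the constant.  */
static void
sketch_set_flags (struct sketch_vec_const *info)
{
  info->all_double_words_same = sketch_all_same (info->bytes, 8);
  info->all_words_same = sketch_all_same (info->bytes, 4);
  info->all_half_words_same = sketch_all_same (info->bytes, 2);
  info->all_bytes_same = sketch_all_same (info->bytes, 1);
}
```

With a splatted 1.0 (bytes 3f f0 00 00 00 00 00 00 repeated twice) only
all_double_words_same would be set under this sketch.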


> +extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
> +				       vec_const_128bit_type *);
>  #endif /* RTX_CODE */
> 
>  #ifdef TREE_CODE
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 01affc7a47c..f285022294a 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -28619,6 +28619,259 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
>    fprintf (file, "\n");
>  }
> 
> +
> +/* Copy an integer constant to the vector constant structure.  */
> +

Here and subsequent comments, I'd debate on whether to enhance the
comment to be explicit on the structure name being copied to/from.
(vec_const_128bit_type is easy to search for, vector or constant or
structure are not as unique)

> +static void
> +constant_int_to_128bit_vector (rtx op,
> +			       machine_mode mode,
> +			       size_t byte_num,
> +			       vec_const_128bit_type *info)
> +{
> +  unsigned HOST_WIDE_INT uvalue = UINTVAL (op);
> +  unsigned bitsize = GET_MODE_BITSIZE (mode);
> +
> +  for (int shift = bitsize - 8; shift >= 0; shift -= 8)
> +    info->bytes[byte_num++] = (uvalue >> shift) & 0xff;
> +}

I didn't confirm the maths, but looks OK at a glance.
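
A quick standalone check of the shift loop, as a sketch outside of GCC.  The
name sketch_store_big_endian and the uint64_t stand-in for
unsigned HOST_WIDE_INT are assumptions made for illustration; the loop body
itself is copied from the quoted function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the loop in constant_int_to_128bit_vector:
   store the low BITSIZE bits of UVALUE into BYTES, most significant byte
   first, starting at index BYTE_NUM.  Return the next free index.  */
static size_t
sketch_store_big_endian (uint64_t uvalue, unsigned bitsize,
                         size_t byte_num, unsigned char *bytes)
{
  for (int shift = (int) bitsize - 8; shift >= 0; shift -= 8)
    bytes[byte_num++] = (uvalue >> shift) & 0xff;

  return byte_num;
}
```

Storing the 32-bit value 0x11223344 at byte 4 writes 11 22 33 44 into
bytes[4..7] and leaves the other bytes alone, i.e. big endian order as the
comment in rs6000-protos.h says.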


> +
> +/* Copy an floating point constant to the vector constant structure.  */
> +

s/an/a/

> +static void
> +constant_fp_to_128bit_vector (rtx op,
> +			      machine_mode mode,
> +			      size_t byte_num,
> +			      vec_const_128bit_type *info)
> +{
> +  unsigned bitsize = GET_MODE_BITSIZE (mode);
> +  unsigned num_words = bitsize / 32;
> +  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
> +  long real_words[VECTOR_128BIT_WORDS];
> +
> +  /* Make sure we don't overflow the real_words array and that it is
> +     filled completely.  */
> +  gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0);

It is not clear to me how the real_words array could end up
partially filled.

> +
> +  real_to_target (real_words, rtype, mode);
> +
> +  /* Iterate over each 32-bit word in the floating point constant.  The
> +     real_to_target function puts out words in endian fashion.  We need

Meaning host-endian fashion, or is that meant to be big-endian?

Perhaps also rephrase or move the comment up to indicate that
real_to_target will have placed or has already placed the words in
<whatever> endian fashion.
As stated I was expecting to see a call to real_to_target() below the
comment. 

The description of how real_to_target() works may be better placed 
on that function over in real.c .  :-)
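
To make the question concrete, here is a sketch of the reordering the quoted
loop performs, assuming real_to_target returns the 32-bit words in target
word order (least significant word first on little endian).  That assumption
is taken from how the loop indexes real_words, not from real.c itself.  The
function name and the explicit target_big_endian flag (standing in for
BYTES_BIG_ENDIAN) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: write NUM_WORDS 32-bit words, given in target word
   order, into BYTES in big endian byte order.  */
static void
sketch_words_to_big_endian_bytes (const uint32_t *words, unsigned num_words,
                                  int target_big_endian,
                                  unsigned char *bytes)
{
  size_t byte_num = 0;

  for (unsigned num = 0; num < num_words; num++)
    {
      /* Pick the words most significant first.  */
      unsigned endian_num = (target_big_endian
                             ? num
                             : num_words - 1 - num);
      uint32_t uvalue = words[endian_num];

      for (int shift = 32 - 8; shift >= 0; shift -= 8)
        bytes[byte_num++] = (uvalue >> shift) & 0xff;
    }
}
```

Under that assumption, the DFmode value 1.0 arrives as { 0x3ff00000, 0 } on
big endian and as { 0, 0x3ff00000 } on little endian, and both produce the
bytes 3f f0 00 00 00 00 00 00.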


> +     to arrange so the words are written in big endian order.  */

> +  for (unsigned num = 0; num < num_words; num++)
> +    {
> +      unsigned endian_num = (BYTES_BIG_ENDIAN
> +			     ? num
> +			     : num_words - 1 - num);
> +
> +      unsigned uvalue = real_words[endian_num];
> +      for (int shift = 32 - 8; shift >= 0; shift -= 8)
> +	info->bytes[byte_num++] = (uvalue >> shift) & 0xff;
> +    }
> +
> +  /* Mark that this constant involves floating point.  */
> +  info->fp_constant_p = true;
> +}
> +
> +/* Convert a vector constant OP with mode MODE to a vector 128-bit constant
> +   structure INFO.
> +
> +   Break out the constant out to bytes, half words, words, and double words.
> +   Return true if we have successfully broken out a constant.

Too many outs.  

> +
> +   We handle CONST_INT, CONST_DOUBLE, CONST_VECTOR, and VEC_DUPLICATE of
> +   constants.  Integer and floating point scalar constants are splatted to fill
> +   out the vector.  */
> +
> +bool
> +vec_const_128bit_to_bytes (rtx op,
> +			   machine_mode mode,
> +			   vec_const_128bit_type *info)
> +{
> +  /* Initialize the constant structure.  */
> +  memset ((void *)info, 0, sizeof (vec_const_128bit_type));
> +
> +  /* Assume CONST_INTs are DImode.  */
> +  if (mode == VOIDmode)
> +    mode = CONST_INT_P (op) ? DImode : GET_MODE (op);
> +
> +  if (mode == VOIDmode)
> +    return false;
> +
> +  unsigned size = GET_MODE_SIZE (mode);
> +  bool splat_p = false;
> +
> +  if (size > VECTOR_128BIT_BYTES)
> +    return false;
> +
> +  /* Set up the bits.  */
> +  switch (GET_CODE (op))
> +    {
> +      /* Integer constants, default to double word.  */
> +    case CONST_INT:
> +      {
> +	constant_int_to_128bit_vector (op, mode, 0, info);
> +	splat_p = true;
> +	break;
> +      }
> +
> +      /* Floating point constants.  */
> +    case CONST_DOUBLE:
> +      {
> +	/* Fail if the floating point constant is the wrong mode.  */
> +	if (GET_MODE (op) != mode)
> +	  return false;
> +
> +	/* SFmode stored as scalars are stored in DFmode format.  */
> +	if (mode == SFmode)
> +	  {
> +	    mode = DFmode;
> +	    size = GET_MODE_SIZE (DFmode);
> +	  }
> +
> +	constant_fp_to_128bit_vector (op, mode, 0, info);
> +	splat_p = true;
> +	break;
> +      }
> +
> +      /* Vector constants, iterate over each element.  On little endian
> +	 systems, we have to reverse the element numbers.  */
> +    case CONST_VECTOR:
> +      {
> +	/* Fail if the vector constant is the wrong mode or size.  */
> +	if (GET_MODE (op) != mode
> +	    || GET_MODE_SIZE (mode) != VECTOR_128BIT_BYTES)
> +	  return false;
> +
> +	machine_mode ele_mode = GET_MODE_INNER (mode);
> +	size_t ele_size = GET_MODE_SIZE (ele_mode);
> +	size_t nunits = GET_MODE_NUNITS (mode);
> +
> +	for (size_t num = 0; num < nunits; num++)
> +	  {
> +	    rtx ele = CONST_VECTOR_ELT (op, num);
> +	    size_t byte_num = (BYTES_BIG_ENDIAN
> +			       ? num
> +			       : nunits - 1 - num) * ele_size;
> +
> +	    if (CONST_INT_P (ele))
> +	      constant_int_to_128bit_vector (ele, ele_mode, byte_num, info);
> +	    else if (CONST_DOUBLE_P (ele))
> +	      constant_fp_to_128bit_vector (ele, ele_mode, byte_num, info);
> +	    else
> +	      return false;
> +	  }
> +
> +	break;
> +      }
> +
> +	/* Treat VEC_DUPLICATE of a constant just like a vector constant.
> +	   Since we are duplicating the element, we don't have to worry about
> +	   endian issues.  */
> +    case VEC_DUPLICATE:
> +      {
> +	/* Fail if the vector duplicate is the wrong mode or size.  */
> +	if (GET_MODE (op) != mode
> +	    || GET_MODE_SIZE (mode) != VECTOR_128BIT_BYTES)
> +	  return false;
> +
> +	machine_mode ele_mode = GET_MODE_INNER (mode);
> +	size_t ele_size = GET_MODE_SIZE (ele_mode);
> +	rtx ele = XEXP (op, 0);
> +	size_t nunits = GET_MODE_NUNITS (mode);
> +
> +	if (!CONST_INT_P (ele) && !CONST_DOUBLE_P (ele))
> +	  return false;
> +
> +	for (size_t num = 0; num < nunits; num++)
> +	  {
> +	    size_t byte_num = num * ele_size;
> +
> +	    if (CONST_INT_P (ele))
> +	      constant_int_to_128bit_vector (ele, ele_mode, byte_num, info);
> +	    else
> +	      constant_fp_to_128bit_vector (ele, ele_mode, byte_num, info);
> +	  }
> +
> +	break;
> +      }
> +
> +      /* Any thing else, just return failure.  */
> +    default:
> +      return false;
> +    }

Seems OK.

> +
> +  /* Possibly splat the constant to fill a vector size.  */


Suggest "Splat the constant to fill a vector size if ..."


> +  if (splat_p && size < VECTOR_128BIT_BYTES)
> +    {
> +      if ((VECTOR_128BIT_BYTES % size) != 0)
> +	return false;
> +
> +      for (size_t offset = size;
> +	   offset < VECTOR_128BIT_BYTES;
> +	   offset += size)
> +	memcpy ((void *) &info->bytes[offset],
> +		(void *) &info->bytes[0],
> +		size);
> +    }
> +
> +  /* Remember original size.  */
> +  info->original_size = size;
> +
> +  /* Determine if the bytes are all the same.  */
> +  unsigned char first_byte = info->bytes[0];
> +  info->all_bytes_same = true;
> +  for (size_t i = 1; i < VECTOR_128BIT_BYTES; i++)
> +    if (first_byte != info->bytes[i])
> +      {
> +	info->all_bytes_same = false;
> +	break;
> +      }
> +
> +  /* Pack half words together & determine if all of the half words are the
> +     same.  */
> +  for (size_t i = 0; i < VECTOR_128BIT_HALF_WORDS; i++)
> +    info->half_words[i] = ((info->bytes[i * 2] << 8)
> +			   | info->bytes[(i * 2) + 1]);
> +
> +  unsigned short first_hword = info->half_words[0];
> +  info->all_half_words_same = true;
> +  for (size_t i = 1; i < VECTOR_128BIT_HALF_WORDS; i++)
> +    if (first_hword != info->half_words[i])
> +      {
> +	info->all_half_words_same = false;
> +	break;
> +      }

ok

> +
> +  /* Pack words together & determine if all of the words are the same.  */
> +  for (size_t i = 0; i < VECTOR_128BIT_WORDS; i++)
> +    info->words[i] = ((info->bytes[i * 4] << 24)
> +		      | (info->bytes[(i * 4) + 1] << 16)
> +		      | (info->bytes[(i * 4) + 2] << 8)
> +		      | info->bytes[(i * 4) + 3]);
> +
> +  info->all_words_same
> +    = (info->words[0] == info->words[1]
> +       && info->words[0] == info->words[2]
> +       && info->words[0] == info->words[3]);
> +
> +  /* Pack double words together & determine if all of the double words are the
> +     same.  */
> +  for (size_t i = 0; i < VECTOR_128BIT_DOUBLE_WORDS; i++)
> +    {
> +      unsigned HOST_WIDE_INT d_word = 0;
> +      for (size_t j = 0; j < 8; j++)
> +	d_word = (d_word << 8) | info->bytes[(i * 8) + j];
> +
> +      info->double_words[i] = d_word;
> +    }
> +
> +  info->all_double_words_same
> +    = (info->double_words[0] == info->double_words[1]);
> +
> +  return true;
> +}
> +

ok.
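
For readers following the packing logic, here is a minimal standalone sketch
of the same idea: keep the 16 bytes in big-endian order, build the wider views
from them, and compare fixed-width chunks to set the "all same" flags.  The
struct and helper names (vec128_sketch, pack_big_endian, all_chunks_same,
vec128_sketch_finish) are hypothetical stand-ins, not the names in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16

/* Hypothetical stand-in for vec_const_128bit_type; not the GCC struct.  */
struct vec128_sketch
{
  uint8_t bytes[VEC_BYTES];
  uint16_t half_words[VEC_BYTES / 2];
  uint32_t words[VEC_BYTES / 4];
  uint64_t double_words[VEC_BYTES / 8];
  bool all_bytes_same;
  bool all_half_words_same;
  bool all_words_same;
  bool all_double_words_same;
};

/* Read WIDTH bytes starting at P as one big-endian unsigned value.  */
static uint64_t
pack_big_endian (const uint8_t *p, size_t width)
{
  assert (width >= 1 && width <= 8);
  uint64_t value = 0;
  for (size_t i = 0; i < width; i++)
    value = (value << 8) | p[i];
  return value;
}

/* Return true if every WIDTH-byte chunk of BYTES equals the first chunk.  */
static bool
all_chunks_same (const uint8_t *bytes, size_t width)
{
  for (size_t off = width; off < VEC_BYTES; off += width)
    if (memcmp (bytes, bytes + off, width) != 0)
      return false;
  return true;
}

/* Fill in the wider views and the "all same" flags from INFO->bytes.  */
static void
vec128_sketch_finish (struct vec128_sketch *info)
{
  for (size_t i = 0; i < VEC_BYTES / 2; i++)
    info->half_words[i] = (uint16_t) pack_big_endian (&info->bytes[i * 2], 2);
  for (size_t i = 0; i < VEC_BYTES / 4; i++)
    info->words[i] = (uint32_t) pack_big_endian (&info->bytes[i * 4], 4);
  for (size_t i = 0; i < VEC_BYTES / 8; i++)
    info->double_words[i] = pack_big_endian (&info->bytes[i * 8], 8);

  info->all_bytes_same = all_chunks_same (info->bytes, 1);
  info->all_half_words_same = all_chunks_same (info->bytes, 2);
  info->all_words_same = all_chunks_same (info->bytes, 4);
  info->all_double_words_same = all_chunks_same (info->bytes, 8);
}
```

Comparing chunks with memcmp covers every element width with one helper, so
no comparison has to be written out by hand for each word.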


>  struct gcc_target targetm = TARGET_INITIALIZER;
> 
>  #include "gt-rs6000.h"
> -- 
> 2.31.1
> 
> 


* Re: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)
  2021-11-05  4:07 ` [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ) Michael Meissner
@ 2021-11-05 17:52   ` will schmidt
  2021-11-05 18:01     ` Michael Meissner
  2021-11-15 16:36   ` Ping: " Michael Meissner
  2021-12-13 17:02   ` Ping #2: " Michael Meissner
  2 siblings, 1 reply; 29+ messages in thread
From: will schmidt @ 2021-11-05 17:52 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

On Fri, 2021-11-05 at 00:07 -0400, Michael Meissner wrote:
> Add LXVKQ support.
> 
> This patch adds support to generate the LXVKQ instruction to load specific
> IEEE-128 floating point constants.
> 
> Compared to the last time I submitted this patch, I modified it so that it
> uses the bit pattern of the vector to see if it can generate the LXVKQ
> instruction.  This means on a little endian Power<xxx> system, the
> following code will generate a LXVKQ 34,16 instruction:
> 
>     vector long long foo (void)
>     {
>     #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>       return (vector long long) { 0x0000000000000000, 0x8000000000000000 };
>     #else
>       return (vector long long) { 0x8000000000000000, 0x0000000000000000 };
>     #endif
>     }
> 
> because that vector pattern is the same bit pattern as -0.0F128.
> 
> 2021-11-05  Michael Meissner  <meissner@the-meissners.org>
> 
> gcc/
> 
> 	* config/rs6000/constraints.md (eQ): New constraint.
> 	* config/rs6000/predicates.md (easy_fp_constant): Add support for
> 	generating the LXVKQ instruction.
> 	(easy_vector_constant_ieee128): New predicate.
> 	(easy_vector_constant): Add support for generating the LXVKQ
> 	instruction.
> 	* config/rs6000/rs6000-protos.h (constant_generates_lxvkq): New
> 	declaration.
> 	* config/rs6000/rs6000.c (output_vec_const_move): Add support for
> 	generating LXVKQ.
> 	(constant_generates_lxvkq): New function.
> 	* config/rs6000/rs6000.opt (-mieee128-constant): New debug
> 	option.
> 	* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
> 	generating LXVKQ.
> 	(vsx_mov<mode>_32bit): Likewise.
> 	* doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
> 	eQ constraint.
> 
> gcc/testsuite/
> 
> 	* gcc.target/powerpc/float128-constant.c: New test.
> ---
>  gcc/config/rs6000/constraints.md              |   6 +
>  gcc/config/rs6000/predicates.md               |  34 ++++
>  gcc/config/rs6000/rs6000-protos.h             |   1 +
>  gcc/config/rs6000/rs6000.c                    |  62 +++++++
>  gcc/config/rs6000/rs6000.opt                  |   4 +
>  gcc/config/rs6000/vsx.md                      |  14 ++
>  gcc/doc/md.texi                               |   4 +
>  .../gcc.target/powerpc/float128-constant.c    | 160 ++++++++++++++++++
>  8 files changed, 285 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-constant.c
> 
> diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> index c8cff1a3038..e72132b4c28 100644
> --- a/gcc/config/rs6000/constraints.md
> +++ b/gcc/config/rs6000/constraints.md
> @@ -213,6 +213,12 @@ (define_constraint "eI"
>    "A signed 34-bit integer constant if prefixed instructions are supported."
>    (match_operand 0 "cint34_operand"))
> 
> +;; A TF/KF scalar constant or a vector constant that can load certain IEEE
> +;; 128-bit constants into vector registers using LXVKQ.
> +(define_constraint "eQ"
> +  "An IEEE 128-bit constant that can be loaded into VSX registers."
> +  (match_operand 0 "easy_vector_constant_ieee128"))
> +
>  ;; Floating-point constraints.  These two are defined so that insn
>  ;; length attributes can be calculated exactly.
> 

ok


> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 956e42bc514..e0d1c718e9f 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -601,6 +601,14 @@ (define_predicate "easy_fp_constant"
>    if (TARGET_VSX && op == CONST0_RTX (mode))
>      return 1;
> 
> +  /* Constants that can be generated with ISA 3.1 instructions are easy.  */

Easy is relative, but OK.

> +  vec_const_128bit_type vsx_const;
> +  if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
> +    {
> +      if (constant_generates_lxvkq (&vsx_const) != 0)
> +	return true;
> +    }
> +
>    /* Otherwise consider floating point constants hard, so that the
>       constant gets pushed to memory during the early RTL phases.  This
>       has the advantage that double precision constants that can be
> @@ -609,6 +617,23 @@ (define_predicate "easy_fp_constant"
>     return 0;
>  })
> 
> +;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
> +;; via the LXVKQ instruction.
> +
> +(define_predicate "easy_vector_constant_ieee128"
> +  (match_code "const_vector,const_double")
> +{
> +  vec_const_128bit_type vsx_const;
> +
> +  /* Can we generate the LXVKQ instruction?  */
> +  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
> +      || !TARGET_VSX)
> +    return false;

Presumably all of the checks there are valid.  (Can we have power10
without float128_hw or ieee128_constant flags set?)    I do notice the
addition of an ieee128_constant flag below.
> +
> +  return (vec_const_128bit_to_bytes (op, mode, &vsx_const)
> +	  && constant_generates_lxvkq (&vsx_const) != 0);
> +})
> +

ok


>  ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
>  ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
> 
> @@ -653,6 +678,15 @@ (define_predicate "easy_vector_constant"
>        if (zero_constant (op, mode) || all_ones_constant (op, mode))
>  	return true;
> 
> +      /* Constants that can be generated with ISA 3.1 instructions are
> +         easy.  */
> +      vec_const_128bit_type vsx_const;
> +      if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
> +	{
> +	  if (constant_generates_lxvkq (&vsx_const) != 0)
> +	    return true;
> +	}
> +
>        if (TARGET_P9_VECTOR
>            && xxspltib_constant_p (op, mode, &num_insns, &value))
>  	return true;


ok.

> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 490d6e33736..494a95cc6ee 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -250,6 +250,7 @@ typedef struct {
> 
>  extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
>  				       vec_const_128bit_type *);
> +extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
>  #endif /* RTX_CODE */
> 
>  #ifdef TREE_CODE

ok

> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index f285022294a..06d02085b06 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -6991,6 +6991,17 @@ output_vec_const_move (rtx *operands)
>  	    gcc_unreachable ();
>  	}
> 
> +      vec_const_128bit_type vsx_const;
> +      if (TARGET_POWER10 && vec_const_128bit_to_bytes (vec, mode, &vsx_const))
> +	{
> +	  unsigned imm = constant_generates_lxvkq (&vsx_const);
> +	  if (imm)
> +	    {
> +	      operands[2] = GEN_INT (imm);
> +	      return "lxvkq %x0,%2";
> +	    }
> +	}
> +
>        if (TARGET_P9_VECTOR
>  	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
>  	{

ok

> @@ -28872,6 +28883,57 @@ vec_const_128bit_to_bytes (rtx op,
>    return true;
>  }
> 
> +/* Determine if an IEEE 128-bit constant can be loaded with LXVKQ.  Return zero
> +   if the LXVKQ instruction cannot be used.  Otherwise return the immediate
> +   value to be used with the LXVKQ instruction.  */
> +
> +unsigned
> +constant_generates_lxvkq (vec_const_128bit_type *vsx_const)
> +{
> +  /* Is the instruction supported with power10 code generation, IEEE 128-bit
> +     floating point hardware and VSX registers are available.  */
> +  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
> +      || !TARGET_VSX)
> +    return 0;
> +
> +  /* Verify that all of the bottom 3 words in the constants loaded by the
> +     LXVKQ instruction are zero.  */


Ok.  I did look at this a bit before it clicked, so would suggest a
comment something like "All of the constants that can be loaded by lxvkq will
have zero in the bottom 3 words, so ensure those are zero before we use a
switch based on the nonzero portion of the constant."

It would be fine as-is too.  :-)


> +  if (vsx_const->words[1] != 0
> +      || vsx_const->words[2] != 0
> +      || vsx_const->words[3] != 0)
> +      return 0;
> +
> +  /* See if we have a match.  */
> +  switch (vsx_const->words[0])
> +    {
> +    case 0x3FFF0000U: return 1;		/* IEEE 128-bit +1.0.  */
> +    case 0x40000000U: return 2;		/* IEEE 128-bit +2.0.  */
> +    case 0x40008000U: return 3;		/* IEEE 128-bit +3.0.  */
> +    case 0x40010000U: return 4;		/* IEEE 128-bit +4.0.  */
> +    case 0x40014000U: return 5;		/* IEEE 128-bit +5.0.  */
> +    case 0x40018000U: return 6;		/* IEEE 128-bit +6.0.  */
> +    case 0x4001C000U: return 7;		/* IEEE 128-bit +7.0.  */
> +    case 0x7FFF0000U: return 8;		/* IEEE 128-bit +Infinity.  */
> +    case 0x7FFF8000U: return 9;		/* IEEE 128-bit quiet NaN.  */
> +    case 0x80000000U: return 16;	/* IEEE 128-bit -0.0.  */
> +    case 0xBFFF0000U: return 17;	/* IEEE 128-bit -1.0.  */
> +    case 0xC0000000U: return 18;	/* IEEE 128-bit -2.0.  */
> +    case 0xC0008000U: return 19;	/* IEEE 128-bit -3.0.  */
> +    case 0xC0010000U: return 20;	/* IEEE 128-bit -4.0.  */
> +    case 0xC0014000U: return 21;	/* IEEE 128-bit -5.0.  */
> +    case 0xC0018000U: return 22;	/* IEEE 128-bit -6.0.  */
> +    case 0xC001C000U: return 23;	/* IEEE 128-bit -7.0.  */
> +    case 0xFFFF0000U: return 24;	/* IEEE 128-bit -Infinity.  */

> +
> +      /* anything else cannot be loaded.  */
> +    default:
> +      break;
> +    }
> +
> +  return 0;
> +}
> +
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;


ok
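
As a way to see the check in isolation, the following is a small standalone
sketch of the same lookup.  It assumes the constant is already available as
four 32-bit words in big-endian order (most significant word first); the
function name lxvkq_immediate_sketch is hypothetical, and the target-flag
tests from the patch are deliberately left out.  The table values are copied
from the patch above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: map the four big-endian 32-bit words of a 128-bit constant to an
   LXVKQ immediate.  Return 0 if no immediate matches.  */
static unsigned
lxvkq_immediate_sketch (const uint32_t words[4])
{
  assert (words != NULL);

  /* Every constant LXVKQ can load has zero in the bottom 3 words, so
     reject anything else before looking at the top word.  */
  if (words[1] != 0 || words[2] != 0 || words[3] != 0)
    return 0;

  switch (words[0])
    {
    case 0x3FFF0000U: return 1;		/* +1.0.  */
    case 0x40000000U: return 2;		/* +2.0.  */
    case 0x40008000U: return 3;		/* +3.0.  */
    case 0x40010000U: return 4;		/* +4.0.  */
    case 0x40014000U: return 5;		/* +5.0.  */
    case 0x40018000U: return 6;		/* +6.0.  */
    case 0x4001C000U: return 7;		/* +7.0.  */
    case 0x7FFF0000U: return 8;		/* +Infinity.  */
    case 0x7FFF8000U: return 9;		/* Quiet NaN.  */
    case 0x80000000U: return 16;	/* -0.0.  */
    case 0xBFFF0000U: return 17;	/* -1.0.  */
    case 0xC0000000U: return 18;	/* -2.0.  */
    case 0xC0008000U: return 19;	/* -3.0.  */
    case 0xC0010000U: return 20;	/* -4.0.  */
    case 0xC0014000U: return 21;	/* -5.0.  */
    case 0xC0018000U: return 22;	/* -6.0.  */
    case 0xC001C000U: return 23;	/* -7.0.  */
    case 0xFFFF0000U: return 24;	/* -Infinity.  */
    default: return 0;			/* Anything else cannot be loaded.  */
    }
}
```

Because 0 doubles as the "no match" result, +0.0 (all four words zero) is not
reported as loadable here.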

> 
>  #include "gt-rs6000.h"
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 9d7878f144a..b7433ec4e30 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -640,6 +640,10 @@ mprivileged
>  Target Var(rs6000_privileged) Init(0)
>  Generate code that will run in privileged state.
> 
> +mieee128-constant
> +Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
> +Generate (do not generate) code that uses the LXVKQ instruction.
> +
>  -param=rs6000-density-pct-threshold=
>  Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
>  When costing for loop vectorization, we probably need to penalize the loop body

I do wonder if this option is necessary.  Presumably it is useful at
least for before/after comparison purposes.  Is there any expectation
that this would be necessary long term?


> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0bf04feb6c4..0a376ee4c28 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1192,16 +1192,19 @@ (define_insn_and_split "*xxspltib_<mode>_split"
> 
>  ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
>  ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
> +;;              LXVKQ
>  ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
>  (define_insn "vsx_mov<mode>_64bit"
>    [(set (match_operand:VSX_M 0 "nonimmediate_operand"
>                 "=ZwO,      wa,        wa,        r,         we,        ?wQ,
>                  ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
> +                wa,
>                  ?wa,       v,         <??r>,     wZ,        v")
> 
>  	(match_operand:VSX_M 1 "input_operand" 
>                 "wa,        ZwO,       wa,        we,        r,         r,
>                  wQ,        Y,         r,         r,         wE,        jwM,
> +                eQ,
>                  ?jwM,      W,         <nW>,      v,         wZ"))]
> 
>    "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
> @@ -1213,35 +1216,43 @@ (define_insn "vsx_mov<mode>_64bit"
>    [(set_attr "type"
>                 "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
>                  store,     load,      store,     *,         vecsimple, vecsimple,
> +                vecperm,
>                  vecsimple, *,         *,         vecstore,  vecload")
>     (set_attr "num_insns"
>                 "*,         *,         *,         2,         *,         2,
>                  2,         2,         2,         2,         *,         *,
> +                *,
>                  *,         5,         2,         *,         *")
>     (set_attr "max_prefixed_insns"
>                 "*,         *,         *,         *,         *,         2,
>                  2,         2,         2,         2,         *,         *,
> +                *,
>                  *,         *,         *,         *,         *")
>     (set_attr "length"
>                 "*,         *,         *,         8,         *,         8,
>                  8,         8,         8,         8,         *,         *,
> +                *,
>                  *,         20,        8,         *,         *")
>     (set_attr "isa"
>                 "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
>                  *,         *,         *,         *,         p9v,       *,
> +                p10,
>                  <VSisa>,   *,         *,         *,         *")])
> 
>  ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
> +;;              LXVKQ
>  ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
>  ;;              LVX (VMX)  STVX (VMX)
>  (define_insn "*vsx_mov<mode>_32bit"
>    [(set (match_operand:VSX_M 0 "nonimmediate_operand"
>                 "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
> +                wa,
>                  wa,        v,         ?wa,       v,         <??r>,
>                  wZ,        v")
> 
>  	(match_operand:VSX_M 1 "input_operand" 
>                 "wa,        ZwO,       wa,        Y,         r,         r,
> +                eQ,
>                  wE,        jwM,       ?jwM,      W,         <nW>,
>                  v,         wZ"))]
> 
> @@ -1253,14 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit"
>  }
>    [(set_attr "type"
>                 "vecstore,  vecload,   vecsimple, load,      store,    *,
> +                vecperm,
>                  vecsimple, vecsimple, vecsimple, *,         *,
>                  vecstore,  vecload")
>     (set_attr "length"
>                 "*,         *,         *,         16,        16,        16,
> +                *,
>                  *,         *,         *,         20,        16,
>                  *,         *")
>     (set_attr "isa"
>                 "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
> +                p10,
>                  p9v,       *,         <VSisa>,   *,         *,
>                  *,         *")])
> 


Just skimmed this part, nothing jumps out at me. 


> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 41f1850bf6e..4af8fd76992 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -3336,6 +3336,10 @@ A constant whose negation is a signed 16-bit constant.
>  @item eI
>  A signed 34-bit integer constant if prefixed instructions are supported.
> 
> +@item eQ
> +An IEEE 128-bit constant that can be loaded into a VSX register with a
> +single instruction.
> +
>  @ifset INTERNALS
>  @item G
>  A floating point constant that can be loaded into a register with one


Should 'single instruction' be replaced with 'lxvkq'?   Or have some
lxvkq reference added, since that is the only instruction currently
behind this constraint? 


> diff --git a/gcc/testsuite/gcc.target/powerpc/float128-constant.c b/gcc/testsuite/gcc.target/powerpc/float128-constant.c
> new file mode 100644
> index 00000000000..e3286a786a5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/float128-constant.c
> @@ -0,0 +1,160 @@
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +


Ok.
(Nothing further reviewed in detail).

thanks
-Will


> +/* Test whether the LXVKQ instruction is generated to load special IEEE 128-bit
> +   constants.  */
> +
> +_Float128
> +return_0 (void)
> +{
> +  return 0.0f128;			/* XXSPLTIB 34,0.  */
> +}
> +
> +_Float128
> +return_1 (void)
> +{
> +  return 1.0f128;			/* LXVKQ 34,1.  */
> +}
> +
> +_Float128
> +return_2 (void)
> +{
> +  return 2.0f128;			/* LXVKQ 34,2.  */
> +}
> +
> +_Float128
> +return_3 (void)
> +{
> +  return 3.0f128;			/* LXVKQ 34,3.  */
> +}
> +
> +_Float128
> +return_4 (void)
> +{
> +  return 4.0f128;			/* LXVKQ 34,4.  */
> +}
> +
> +_Float128
> +return_5 (void)
> +{
> +  return 5.0f128;			/* LXVKQ 34,5.  */
> +}
> +
> +_Float128
> +return_6 (void)
> +{
> +  return 6.0f128;			/* LXVKQ 34,6.  */
> +}
> +
> +_Float128
> +return_7 (void)
> +{
> +  return 7.0f128;			/* LXVKQ 34,7.  */
> +}
> +
> +_Float128
> +return_m0 (void)
> +{
> +  return -0.0f128;			/* LXVKQ 34,16.  */
> +}
> +
> +_Float128
> +return_m1 (void)
> +{
> +  return -1.0f128;			/* LXVKQ 34,17.  */
> +}
> +
> +_Float128
> +return_m2 (void)
> +{
> +  return -2.0f128;			/* LXVKQ 34,18.  */
> +}
> +
> +_Float128
> +return_m3 (void)
> +{
> +  return -3.0f128;			/* LXVKQ 34,19.  */
> +}
> +
> +_Float128
> +return_m4 (void)
> +{
> +  return -4.0f128;			/* LXVKQ 34,20.  */
> +}
> +
> +_Float128
> +return_m5 (void)
> +{
> +  return -5.0f128;			/* LXVKQ 34,21.  */
> +}
> +
> +_Float128
> +return_m6 (void)
> +{
> +  return -6.0f128;			/* LXVKQ 34,22.  */
> +}
> +
> +_Float128
> +return_m7 (void)
> +{
> +  return -7.0f128;			/* LXVKQ 34,23.  */
> +}
> +
> +_Float128
> +return_inf (void)
> +{
> +  return __builtin_inff128 ();		/* LXVKQ 34,8.  */
> +}
> +
> +_Float128
> +return_minf (void)
> +{
> +  return - __builtin_inff128 ();	/* LXVKQ 34,24.  */
> +}
> +
> +_Float128
> +return_nan (void)
> +{
> +  return __builtin_nanf128 ("");	/* LXVKQ 34,9.  */
> +}
> +
> +/* Note, the following NaNs should not generate a LXVKQ instruction.  */
> +_Float128
> +return_mnan (void)
> +{
> +  return - __builtin_nanf128 ("");	/* PLXV 34,... */
> +}
> +
> +_Float128
> +return_nan2 (void)
> +{
> +  return __builtin_nanf128 ("1");	/* PLXV 34,... */
> +}
> +
> +_Float128
> +return_nans (void)
> +{
> +  return __builtin_nansf128 ("");	/* PLXV 34,... */
> +}
> +
> +vector long long
> +return_longlong_neg_0 (void)
> +{
> +  /* This vector is the same pattern as -0.0F128.  */
> +#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> +#define FIRST	0x8000000000000000
> +#define SECOND	0x0000000000000000
> +
> +#else
> +#define FIRST	0x0000000000000000
> +#define SECOND	0x8000000000000000
> +#endif
> +
> +  return (vector long long) { FIRST, SECOND };	/* LXVKQ 34,16.  */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mlxvkq\M}    19 } } */
> +/* { dg-final { scan-assembler-times {\mplxv\M}      3 } } */
> +/* { dg-final { scan-assembler-times {\mxxspltib\M}  1 } } */
> +
> -- 
> 2.31.1
> 
> 


* Re: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)
  2021-11-05 17:52   ` will schmidt
@ 2021-11-05 18:01     ` Michael Meissner
  2021-12-14 16:57       ` David Edelsohn
  0 siblings, 1 reply; 29+ messages in thread
From: Michael Meissner @ 2021-11-05 18:01 UTC (permalink / raw)
  To: will schmidt
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

On Fri, Nov 05, 2021 at 12:52:51PM -0500, will schmidt wrote:
> > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> > index 956e42bc514..e0d1c718e9f 100644
> > --- a/gcc/config/rs6000/predicates.md
> > +++ b/gcc/config/rs6000/predicates.md
> > @@ -601,6 +601,14 @@ (define_predicate "easy_fp_constant"
> >    if (TARGET_VSX && op == CONST0_RTX (mode))
> >      return 1;
> > 
> > +  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
> 
> Easy is relative, but OK.

The name of the function is easy_fp_constant.

> > +  vec_const_128bit_type vsx_const;
> > +  if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
> > +    {
> > +      if (constant_generates_lxvkq (&vsx_const) != 0)
> > +	return true;
> > +    }
> > +
> >    /* Otherwise consider floating point constants hard, so that the
> >       constant gets pushed to memory during the early RTL phases.  This
> >       has the advantage that double precision constants that can be
> > @@ -609,6 +617,23 @@ (define_predicate "easy_fp_constant"
> >     return 0;
> >  })
> > 
> > +;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
> > +;; via the LXVKQ instruction.
> > +
> > +(define_predicate "easy_vector_constant_ieee128"
> > +  (match_code "const_vector,const_double")
> > +{
> > +  vec_const_128bit_type vsx_const;
> > +
> > +  /* Can we generate the LXVKQ instruction?  */
> > +  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
> > +      || !TARGET_VSX)
> > +    return false;
> 
> Presumably all of the checks there are valid.  (Can we have power10
> without float128_hw or ieee128_constant flags set?)    I do notice the
> addition of an ieee128_constant flag below.

Yes, we can have power10 without float128_hw.  At the moment, 32-bit big endian
does not enable the 128-bit IEEE instructions.  Also, we need to explicitly turn
off IEEE 128-bit hardware support when we build the bits in libgcc that can
switch between the software routines and the routines used for IEEE hardware,
and when we build the IEEE 128-bit software emulation functions.

Similarly for VSX, if the user explicitly says -mno-vsx, then we can't enable
this instruction.
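
To make the point about independent switches concrete, here is a tiny
standalone sketch.  The struct target_flags_sketch and the function
lxvkq_enabled_sketch are hypothetical; they only mirror the four conditions
tested in the patch (TARGET_IEEE128_CONSTANT, TARGET_FLOAT128_HW,
TARGET_POWER10 and TARGET_VSX) and are not GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the four target conditions the patch tests.  */
struct target_flags_sketch
{
  bool power10;		 /* Power10 code generation.  */
  bool float128_hw;	 /* IEEE 128-bit hardware instructions enabled.  */
  bool vsx;		 /* VSX registers available (no -mno-vsx).  */
  bool ieee128_constant; /* The -mieee128-constant debug option.  */
};

/* LXVKQ is only a candidate when every condition holds.  */
static bool
lxvkq_enabled_sketch (const struct target_flags_sketch *t)
{
  assert (t != NULL);
  return t->ieee128_constant && t->float128_hw && t->power10 && t->vsx;
}
```

In this model, a power10 configuration with float128_hw turned off (as
described above for 32-bit big endian or the libgcc emulation routines)
simply reports false.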

> Ok.  I did look at this a bit before it clicked, so would suggest a
> comment something like "All of the constants that can be loaded by lxvkq will
> have zero in the bottom 3 words, so ensure those are zero before we use a
> switch based on the nonzero portion of the constant."
> 
> It would be fine as-is too.  :-)

Ok.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

* Re: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)
  2021-11-05 17:01   ` will schmidt
@ 2021-11-05 18:13     ` Michael Meissner
  2021-12-14 16:57       ` David Edelsohn
  0 siblings, 1 reply; 29+ messages in thread
From: Michael Meissner @ 2021-11-05 18:13 UTC (permalink / raw)
  To: will schmidt
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

On Fri, Nov 05, 2021 at 12:01:43PM -0500, will schmidt wrote:
> On Fri, 2021-11-05 at 00:04 -0400, Michael Meissner wrote:
> > Add new constant data structure.
> > 
> > This patch provides the data structure and function to convert a
> > CONST_INT, CONST_DOUBLE, CONST_VECTOR, or VEC_DUPLICATE of a constant to
> > an array of bytes, half-words, words, and double words that can be loaded
> > into a 128-bit vector register.
> > 
> > The next patches will use this data structure to generate code that
> > generates load of the vector/floating point registers using the XXSPLTIDP,
> > XXSPLTIW, and LXVKQ instructions that were added in power10.
> > 
> > 2021-11-05  Michael Meissner  <meissner@the-meissners.org>
> > 

Whoops, it should be meissner@linux.ibm.com.

> comment to be explicit on the structure name being copied to/from.
> (vec_const_128bit_type is easy to search for, vector or constant or
> structure are not as unique)

Yes, the original name was more generic (rs6000_const).  Originally it could
potentially handle vector constants that were greater than 128 bits if we ever
have support for larger vectors.  But I thought that extra generality hindered
the code (since you had to check whether the size was exactly 128 bits, etc.).
So I made the data structure tailored to the problem at hand.

> > +
> > +/* Copy an floating point constant to the vector constant structure.  */
> > +
> 
> s/an/a/

Ok.

> > +static void
> > +constant_fp_to_128bit_vector (rtx op,
> > +			      machine_mode mode,
> > +			      size_t byte_num,
> > +			      vec_const_128bit_type *info)
> > +{
> > +  unsigned bitsize = GET_MODE_BITSIZE (mode);
> > +  unsigned num_words = bitsize / 32;
> > +  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
> > +  long real_words[VECTOR_128BIT_WORDS];
> > +
> > +  /* Make sure we don't overflow the real_words array and that it is
> > +     filled completely.  */
> > +  gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0);
> 
> Not clear to me on the potential to partially fill the real_words
> array. 

At the moment we don't support a 16-bit floating point type in the compiler
(the Power10 has limited 16-bit floating point support, but we don't make a
special type for it).  If/when we add the 16-bit floating point, we will
possibly need to revisit this.

> > +
> > +  real_to_target (real_words, rtype, mode);
> > +
> > +  /* Iterate over each 32-bit word in the floating point constant.  The
> > +     real_to_target function puts out words in endian fashion.  We need
> 
> Meaning host-endian fashion, or is that meant to be big-endian ? 

real_to_target puts out the 32-bit words in target endian order.  This data
structure wants to hold everything in big endian fashion to make checking
things simpler.
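
The conversion being discussed can be sketched on its own as follows.  This
is only an illustration: the function name words_to_big_endian_bytes and the
words_big_endian parameter are hypothetical, standing in for whatever
endianness test the real code uses, and the input is assumed to be an array
of 32-bit words such as real_to_target fills in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copy NUM_WORDS 32-bit words into OUT as big-endian bytes.  If the words
   arrive least significant word first (WORDS_BIG_ENDIAN is false), walk
   the array backwards so OUT always starts with the most significant
   byte.  OUT must have room for NUM_WORDS * 4 bytes.  */
static void
words_to_big_endian_bytes (const uint32_t *words, size_t num_words,
			   bool words_big_endian, uint8_t *out)
{
  assert (num_words >= 1 && num_words <= 4);

  for (size_t i = 0; i < num_words; i++)
    {
      size_t src = words_big_endian ? i : num_words - 1 - i;
      uint32_t w = words[src];

      /* Emit the word one byte at a time, most significant byte first.  */
      for (int shift = 24; shift >= 0; shift -= 8)
	*out++ = (uint8_t) (w >> shift);
    }
}
```

For example, the DFmode value 1.0 is 0x3FF0000000000000, so both word orders
({0x3FF00000, 0} and {0, 0x3FF00000}) end up as the same eight bytes starting
with 0x3F 0xF0.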

> Perhaps also rephrase or move the comment up to indicate that
> real_to_target will have placed or has already placed the words in
> <whatever> endian fashion.
> As stated I was expecting to see a call to real_to_target() below the
> comment. 

Yes, I probably should move the real_to_target call after the comment.

> > +
> > +  /* Possibly splat the constant to fill a vector size.  */
> 
> 
> Suggest "Splat the constant to fill a vector size if ..."

Ok.
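
For reference, the splat step that this comment describes can be sketched in
isolation like this.  The function name splat_to_vector_sketch is
hypothetical; it assumes the first SIZE bytes of the buffer already hold the
scalar constant and that the buffer is 16 bytes long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16

/* Replicate the first SIZE bytes of BYTES across the whole 16-byte buffer.
   Return false if SIZE does not evenly divide the vector size.  */
static bool
splat_to_vector_sketch (uint8_t bytes[VEC_BYTES], size_t size)
{
  assert (bytes != NULL);

  if (size == 0 || size > VEC_BYTES || (VEC_BYTES % size) != 0)
    return false;

  /* Copy the original element into each following slot.  */
  for (size_t offset = size; offset < VEC_BYTES; offset += size)
    memcpy (&bytes[offset], &bytes[0], size);

  return true;
}
```

A 16-byte constant passes through unchanged because the loop body never runs
in that case.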

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

* Re: [PATCH 3/5] Add Power10 XXSPLTIW
  2021-11-05  4:09 ` [PATCH 3/5] Add Power10 XXSPLTIW Michael Meissner
@ 2021-11-05 18:50   ` will schmidt
  2021-12-14 16:59     ` David Edelsohn
  2021-11-15 16:37   ` Ping: " Michael Meissner
  2021-12-13 17:04   ` Ping #2: " Michael Meissner
  2 siblings, 1 reply; 29+ messages in thread
From: will schmidt @ 2021-11-05 18:50 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

On Fri, 2021-11-05 at 00:09 -0400, Michael Meissner wrote:
> Generate XXSPLTIW on power10.
> 

Hi,


> This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
> instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
> adding support for vector constants that can be used, and adding a
> VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
> 
> The eP constraint was added to recognize constants that can be loaded into
> vector registers with a single prefixed instruction.

Perhaps swap "... the eP constraint was added ..."  for "Add the eP
constraint to ..."


> 
> I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
> constants.


> 
> 2021-11-05  Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 
> 	* config/rs6000/constraints.md (eP): Update comment.
> 	* config/rs6000/predicates.md (easy_fp_constant): Add support for
> 	generating XXSPLTIW.
> 	(vsx_prefixed_constant): New predicate.
> 	(easy_vector_constant): Add support for
> 	generating XXSPLTIW.
> 	* config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
> 	declaration.
> 	(constant_generates_xxspltiw): Likewise.
> 	* config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
> 	XXSPLTIW, don't do XXSPLTIB and sign extend.

Perhaps just 'generate XXSPLTIW if possible'.  

> 	(output_vec_const_move): Add support for XXSPLTIW.
> 	(prefixed_xxsplti_p): New function.
> 	(constant_generates_xxspltiw): New function.
> 	* config/rs6000/rs6000.md (prefixed attribute): Add support to
> 	mark XXSPLTI* instructions as being prefixed.
> 	* config/rs6000/rs6000.opt (-msplat-word-constant): New debug
> 	switch.
> 	* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
> 	generating XXSPLTIW or XXSPLTIDP.
> 	(vsx_mov<mode>_32bit): Likewise.
> 	* doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
> 	eP constraint.
> 
> gcc/testsuite/
> 
> 	* gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
> 	* gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
> 	* gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
> 	* gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
> 	* gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
> ---
>  gcc/config/rs6000/constraints.md              |  6 ++
>  gcc/config/rs6000/predicates.md               | 46 ++++++++++-
>  gcc/config/rs6000/rs6000-protos.h             |  2 +
>  gcc/config/rs6000/rs6000.c                    | 81 +++++++++++++++++++
>  gcc/config/rs6000/rs6000.md                   |  5 ++
>  gcc/config/rs6000/rs6000.opt                  |  4 +
>  gcc/config/rs6000/vsx.md                      | 28 +++----
>  gcc/doc/md.texi                               |  4 +
>  .../powerpc/vec-splat-constant-v16qi.c        | 27 +++++++
>  .../powerpc/vec-splat-constant-v4sf.c         | 67 +++++++++++++++
>  .../powerpc/vec-splat-constant-v4si.c         | 51 ++++++++++++
>  .../powerpc/vec-splat-constant-v8hi.c         | 62 ++++++++++++++
>  .../gcc.target/powerpc/vec-splati-runnable.c  |  4 +-
>  13 files changed, 369 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
> 
> diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> index e72132b4c28..a4b05837fa6 100644
> --- a/gcc/config/rs6000/constraints.md
> +++ b/gcc/config/rs6000/constraints.md
> @@ -213,6 +213,12 @@ (define_constraint "eI"
>    "A signed 34-bit integer constant if prefixed instructions are supported."
>    (match_operand 0 "cint34_operand"))
> 
> +;; A SF/DF scalar constant or a vector constant that can be loaded into vector
> +;; registers with one prefixed instruction such as XXSPLTIDP or XXSPLTIW.
> +(define_constraint "eP"
> +  "A constant that can be loaded into a VSX register with one prefixed insn."
> +  (match_operand 0 "vsx_prefixed_constant"))
> +
>  ;; A TF/KF scalar constant or a vector constant that can load certain IEEE
>  ;; 128-bit constants into vector registers using LXVKQ.
>  (define_constraint "eQ"
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index e0d1c718e9f..ed6252bd0c4 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -605,7 +605,10 @@ (define_predicate "easy_fp_constant"
>    vec_const_128bit_type vsx_const;
>    if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
>      {
> -      if (constant_generates_lxvkq (&vsx_const) != 0)
> +      if (constant_generates_lxvkq (&vsx_const))
> +	return true;
> +
> +      if (constant_generates_xxspltiw (&vsx_const))
>  	return true;
>      }
> 

ok

> @@ -617,6 +620,42 @@ (define_predicate "easy_fp_constant"
>     return 0;
>  })
> 
> +;; Return 1 if the operand is a 64-bit floating point scalar constant or a
> +;; vector constant that can be loaded to a VSX register with one prefixed
> +;; instruction, such as XXSPLTIDP or XXSPLTIW.
> +;;
> +;; In addition regular constants, we also recognize constants formed with the
> +;; VEC_DUPLICATE insn from scalar constants.
> +;;
> +;; We don't handle scalar integer constants here because the assumption is the
> +;; normal integer constants will be loaded into GPR registers.  For the
> +;; constants that need to be loaded into vector registers, the instructions
> +;; don't work well with TImode variables assigned a constant.  This is because
> +;; the 64-bit scalar constants are splatted into both halves of the register.
> +
> +(define_predicate "vsx_prefixed_constant"
> +  (match_code "const_double,const_vector,vec_duplicate")
> +{
> +  /* If we can generate the constant with 1-2 Altivec instructions, don't
> +      generate a prefixed instruction.  */

"1-2 Altivec instructions" is both vague and specific.  Perhaps replace
it with a comment something like "If ... with easy altivec instructions ..."

> +  if (CONST_VECTOR_P (op) && easy_altivec_constant (op, mode))
> +    return false;
> +
> +  /* Do we have prefixed instructions and are VSX registers available?  Is the
> +     constant recognized?  */
> +  if (!TARGET_PREFIXED || !TARGET_VSX)
> +    return false;
> +
> +  vec_const_128bit_type vsx_const;
> +  if (!vec_const_128bit_to_bytes (op, mode, &vsx_const))
> +    return false;
> +
> +  if (constant_generates_xxspltiw (&vsx_const))
> +    return true;
> +
> +  return false;
> +})

ok

> +
>  ;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
>  ;; via the LXVKQ instruction.
> 
> @@ -683,7 +722,10 @@ (define_predicate "easy_vector_constant"
>        vec_const_128bit_type vsx_const;
>        if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
>  	{
> -	  if (constant_generates_lxvkq (&vsx_const) != 0)
> +	  if (constant_generates_lxvkq (&vsx_const))
> +	    return true;
> +
> +	  if (constant_generates_xxspltiw (&vsx_const))
>  	    return true;
>  	}


ok


> 
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 494a95cc6ee..99c6a671289 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
>  extern bool prefixed_load_p (rtx_insn *);
>  extern bool prefixed_store_p (rtx_insn *);
>  extern bool prefixed_paddi_p (rtx_insn *);
> +extern bool prefixed_xxsplti_p (rtx_insn *);
>  extern void rs6000_asm_output_opcode (FILE *);
>  extern void output_pcrel_opt_reloc (rtx);
>  extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
> @@ -251,6 +252,7 @@ typedef struct {
>  extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
>  				       vec_const_128bit_type *);
>  extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
> +extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *);
>  #endif /* RTX_CODE */
> 
>  #ifdef TREE_CODE


ok

> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 06d02085b06..be24f56eb31 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -6940,6 +6940,11 @@ xxspltib_constant_p (rtx op,
>    else if (IN_RANGE (value, -1, 0))
>      *num_insns_ptr = 1;
> 
> +  /* If we can generate XXSPLTIW or XXSPLTIDP, don't generate XXSPLTIB and a
> +     sign extend operation.  */
> +  else if (vsx_prefixed_constant (op, mode))
> +    return false;

The comment is accurate, but might be clearer as something like:
  Don't generate this (xxspltib) instruction if we will be able to
  generate an xxspltiw or xxspltidp.


> +
>    else
>      *num_insns_ptr = 2;
> 
> @@ -7000,6 +7005,13 @@ output_vec_const_move (rtx *operands)
>  	      operands[2] = GEN_INT (imm);
>  	      return "lxvkq %x0,%2";
>  	    }
> +
> +	  imm = constant_generates_xxspltiw (&vsx_const);
> +	  if (imm)
> +	    {
> +	      operands[2] = GEN_INT (imm);
> +	      return "xxspltiw %x0,%2";
> +	    }
>  	}
> 
>        if (TARGET_P9_VECTOR

ok

> @@ -26767,6 +26779,41 @@ prefixed_paddi_p (rtx_insn *insn)
>    return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
>  }
> 
> +/* Whether an instruction is a prefixed XXSPLTI* instruction.  This is called
> +   from the prefixed attribute processing.  */
> +
> +bool
> +prefixed_xxsplti_p (rtx_insn *insn)
> +{
> +  rtx set = single_set (insn);
> +  if (!set)
> +    return false;
> +
> +  rtx dest = SET_DEST (set);
> +  rtx src = SET_SRC (set);
> +  machine_mode mode = GET_MODE (dest);
> +
> +  if (!REG_P (dest) && !SUBREG_P (dest))
> +    return false;
> +
> +  if (GET_CODE (src) == UNSPEC)
> +    {
> +      int unspec = XINT (src, 1);
> +      return (unspec == UNSPEC_XXSPLTIW
> +	      || unspec == UNSPEC_XXSPLTIDP
> +	      || unspec == UNSPEC_XXSPLTI32DX);
> +    }
> +
> +  vec_const_128bit_type vsx_const;
> +  if (vec_const_128bit_to_bytes (src, mode, &vsx_const))
> +    {
> +      if (constant_generates_xxspltiw (&vsx_const))
> +	return true;
> +    }
> +
> +  return false;
> +}
> +

ok.


>  /* Whether the next instruction needs a 'p' prefix issued before the
>     instruction is printed out.  */
>  static bool prepend_p_to_next_insn;
> @@ -28933,6 +28980,40 @@ constant_generates_lxvkq (vec_const_128bit_type *vsx_const)
>    return 0;
>  }
> 
> +/* Determine if a vector constant can be loaded with XXSPLTIW.  Return zero if
> +   the XXSPLTIW instruction cannot be used.  Otherwise return the immediate
> +   value to be used with the XXSPLTIW instruction.  */
> +
> +unsigned
> +constant_generates_xxspltiw (vec_const_128bit_type *vsx_const)
> +{
> +  if (!TARGET_SPLAT_WORD_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
> +    return 0;
> +
> +  if (!vsx_const->all_words_same)
> +    return 0;
> +
> +  /* If we can use XXSPLTIB, don't generate XXSPLTIW.  */
> +  if (vsx_const->all_bytes_same)
> +    return 0;
> +
> +  /* See if we can use VSPLTISH or VSPLTISW.  */
> +  if (vsx_const->all_half_words_same)
> +    {
> +      unsigned short h_word = vsx_const->half_words[0];
> +      short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
> +      if (EASY_VECTOR_15 (sign_h_word))
> +	return 0;
> +    }
> +
> +  unsigned int word = vsx_const->words[0];
> +  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
> +  if (EASY_VECTOR_15 (sign_word))
> +    return 0;
> +
> +  return vsx_const->words[0];
> +}
> +

ok
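
For anyone following the selection order in constant_generates_xxspltiw,
here is a minimal standalone sketch of the same decision chain.  It is not
the patch's code: sketch_xxspltiw_imm, fits_vspltis, sext16 and sext32 are
hypothetical names, the target-switch checks are left out, and fits_vspltis
assumes EASY_VECTOR_15 means "fits a 5-bit signed immediate" (-16..15).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for EASY_VECTOR_15, i.e. the value fits the
   5-bit signed immediate of VSPLTISH/VSPLTISW.  */
static bool
fits_vspltis (int64_t n)
{
  return n >= -16 && n <= 15;
}

/* Sign extend the low 16 bits of V with the xor/subtract idiom.  */
static int64_t
sext16 (uint32_t v)
{
  return (int64_t) ((v & 0xffffu) ^ 0x8000u) - 0x8000;
}

/* Sign extend a 32-bit value, done in 64 bits to avoid overflow.  */
static int64_t
sext32 (uint32_t v)
{
  return (int64_t) (v ^ 0x80000000u) - INT64_C (0x80000000);
}

/* Hypothetical stand-in for the patch's function: return the XXSPLTIW
   immediate, or 0 when a cheaper instruction (or none) applies.  */
static uint32_t
sketch_xxspltiw_imm (const uint32_t words[4])
{
  uint32_t w = words[0];

  /* XXSPLTIW needs all four words to match.  */
  if (words[1] != w || words[2] != w || words[3] != w)
    return 0;

  /* All bytes identical: XXSPLTIB is enough.  */
  if (w == (w & 0xffu) * 0x01010101u)
    return 0;

  /* Both half words identical and small: VSPLTISH is enough.  */
  if ((w >> 16) == (w & 0xffffu) && fits_vspltis (sext16 (w)))
    return 0;

  /* Small word value: VSPLTISW is enough.  */
  if (fits_vspltis (sext32 (w)))
    return 0;

  return w;
}
```

With this sketch, a splat of 0x01020304 yields the immediate 0x01020304,
while splats of 5 or of 0xfffffff0 (-16) yield 0 because VSPLTISW already
covers them.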

>  
>  struct gcc_target targetm = TARGET_INITIALIZER;
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6bec2bddbde..3a7bcd2426e 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -314,6 +314,11 @@ (define_attr "prefixed" "no,yes"
> 
>  	 (eq_attr "type" "integer,add")
>  	 (if_then_else (match_test "prefixed_paddi_p (insn)")
> +		       (const_string "yes")
> +		       (const_string "no"))
> +
> +	 (eq_attr "type" "vecperm")
> +	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
>  		       (const_string "yes")
>  		       (const_string "no"))]
> 

ok


> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index b7433ec4e30..ec7b106fddb 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -640,6 +640,10 @@ mprivileged
>  Target Var(rs6000_privileged) Init(0)
>  Generate code that will run in privileged state.
> 
> +msplat-word-constant
> +Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
> +Generate (do not generate) code that uses the XXSPLTIW instruction.
> +
>  mieee128-constant
>  Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
>  Generate (do not generate) code that uses the LXVKQ instruction.

ok


> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0a376ee4c28..9f0c48db6f2 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1192,19 +1192,19 @@ (define_insn_and_split "*xxspltib_<mode>_split"
> 
>  ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
>  ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
> -;;              LXVKQ
> +;;              LXVKQ      XXSPLTI*
>  ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
>  (define_insn "vsx_mov<mode>_64bit"
>    [(set (match_operand:VSX_M 0 "nonimmediate_operand"
>                 "=ZwO,      wa,        wa,        r,         we,        ?wQ,
>                  ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
> -                wa,
> +                wa,        wa,
>                  ?wa,       v,         <??r>,     wZ,        v")
> 
>  	(match_operand:VSX_M 1 "input_operand" 
>                 "wa,        ZwO,       wa,        we,        r,         r,
>                  wQ,        Y,         r,         r,         wE,        jwM,
> -                eQ,
> +                eQ,        eP,
>                  ?jwM,      W,         <nW>,      v,         wZ"))]
> 
>    "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
> @@ -1216,43 +1216,43 @@ (define_insn "vsx_mov<mode>_64bit"
>    [(set_attr "type"
>                 "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
>                  store,     load,      store,     *,         vecsimple, vecsimple,
> -                vecperm,
> +                vecperm,   vecperm,
>                  vecsimple, *,         *,         vecstore,  vecload")
>     (set_attr "num_insns"
>                 "*,         *,         *,         2,         *,         2,
>                  2,         2,         2,         2,         *,         *,
> -                *,
> +                *,         *,
>                  *,         5,         2,         *,         *")
>     (set_attr "max_prefixed_insns"
>                 "*,         *,         *,         *,         *,         2,
>                  2,         2,         2,         2,         *,         *,
> -                *,
> +                *,         *,
>                  *,         *,         *,         *,         *")
>     (set_attr "length"
>                 "*,         *,         *,         8,         *,         8,
>                  8,         8,         8,         8,         *,         *,
> -                *,
> +                *,         *,
>                  *,         20,        8,         *,         *")
>     (set_attr "isa"
>                 "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
>                  *,         *,         *,         *,         p9v,       *,
> -                p10,
> +                p10,       p10,
>                  <VSisa>,   *,         *,         *,         *")])
> 
>  ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
> -;;              LXVKQ
> +;;              LXVKQ      XXSPLTI*
>  ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
>  ;;              LVX (VMX)  STVX (VMX)
>  (define_insn "*vsx_mov<mode>_32bit"
>    [(set (match_operand:VSX_M 0 "nonimmediate_operand"
>                 "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
> -                wa,
> +                wa,        wa,
>                  wa,        v,         ?wa,       v,         <??r>,
>                  wZ,        v")
> 
>  	(match_operand:VSX_M 1 "input_operand" 
>                 "wa,        ZwO,       wa,        Y,         r,         r,
> -                eQ,
> +                eQ,        eP,
>                  wE,        jwM,       ?jwM,      W,         <nW>,
>                  v,         wZ"))]
> 
> @@ -1264,17 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit"
>  }
>    [(set_attr "type"
>                 "vecstore,  vecload,   vecsimple, load,      store,    *,
> -                vecperm,
> +                vecperm,   vecperm,
>                  vecsimple, vecsimple, vecsimple, *,         *,
>                  vecstore,  vecload")
>     (set_attr "length"
>                 "*,         *,         *,         16,        16,        16,
> -                *,
> +                *,         *,
>                  *,         *,         *,         20,        16,
>                  *,         *")
>     (set_attr "isa"
>                 "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
> -                p10,
> +                p10,       p10,
>                  p9v,       *,         <VSisa>,   *,         *,
>                  *,         *")])
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 4af8fd76992..41a568b7d4e 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -3336,6 +3336,10 @@ A constant whose negation is a signed 16-bit constant.
>  @item eI
>  A signed 34-bit integer constant if prefixed instructions are supported.
> 
> +@item eP
> +A scalar floating point constant or a vector constant that can be
> +loaded with one prefixed instruction to a VSX register.


...  loaded to a VSX register with one prefixed instruction.


> +
>  @item eQ
>  An IEEE 128-bit constant that can be loaded into a VSX register with a
>  single instruction.
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
> new file mode 100644
> index 00000000000..27764ddbc83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +



> +#include <altivec.h>
> +
> +/* Test whether XXSPLTIW is generated for V16HI vector constants where the
> +   first 4 elements are the same as the next 4 elements, etc.  */
> +
> +vector unsigned char
> +v16qi_const_1 (void)
> +{
> +  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
> +				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSLTPISB.  */
> +}
> +
> +vector unsigned char
> +v16qi_const_2 (void)
> +{
> +  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
> +				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
> +/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
> +/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
> +/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */


ok
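
As an aside on why v16qi_const_2 is an XXSPLTIW candidate while
v16qi_const_1 is not, here is a small sketch that views the 16 bytes as
four words.  The struct and function names are hypothetical, only loosely
modelled on the all_bytes_same/all_words_same fields used in the patch, and
the bytes are assumed to be packed into words in big-endian order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a 128-bit constant.  */
struct sketch_const128
{
  uint32_t words[4];
  bool all_bytes_same;
  bool all_words_same;
};

/* Build the word view from 16 bytes (big-endian packing assumed).  */
static struct sketch_const128
sketch_from_bytes (const uint8_t bytes[16])
{
  struct sketch_const128 c;

  c.all_bytes_same = true;
  c.all_words_same = true;

  for (int i = 0; i < 4; i++)
    c.words[i] = ((uint32_t) bytes[4 * i] << 24)
		 | ((uint32_t) bytes[4 * i + 1] << 16)
		 | ((uint32_t) bytes[4 * i + 2] << 8)
		 | (uint32_t) bytes[4 * i + 3];

  for (int i = 1; i < 16; i++)
    if (bytes[i] != bytes[0])
      c.all_bytes_same = false;

  for (int i = 1; i < 4; i++)
    if (c.words[i] != c.words[0])
      c.all_words_same = false;

  return c;
}
```

For { 1, 2, 3, 4, 1, 2, 3, 4, ... } every word is 0x01020304 but the bytes
differ, so a word splat applies; for sixteen 1s the bytes are all the same,
so a byte splat is enough.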


Nothing jumped out at me in the test cases below.

Thanks
-Will


> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
> new file mode 100644
> index 00000000000..1f0475cf47a

<snip>




* Re: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants
  2021-11-05  4:10 ` [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants Michael Meissner
@ 2021-11-05 19:24   ` will schmidt
  2021-12-14 17:00     ` David Edelsohn
  2021-11-15 16:38   ` Ping: " Michael Meissner
  2021-12-13 17:06   ` Ping #2: " Michael Meissner
  2 siblings, 1 reply; 29+ messages in thread
From: will schmidt @ 2021-11-05 19:24 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

On Fri, 2021-11-05 at 00:10 -0400, Michael Meissner wrote:
> Generate XXSPLTIDP for vectors on power10.
> 
> This patch implements XXSPLTIDP support for all vector constants.  The
> XXSPLTIDP instruction is given a 32-bit immediate that is converted to a vector
> of two DFmode constants.  The immediate is in SFmode format, so only constants
> that fit as SFmode values can be loaded with XXSPLTIDP.
> 
> The constraint (eP) added in the previous patch for XXSPLTIW is also used
> for XXSPLTIDP.
> 

ok


> DImode scalar constants are not handled.  This is due to the majority of DImode
> constants will be in the GPR registers.  With vector registers, you have the
> problem that XXSPLTIDP splats the double word into both elements of the
> vector.  However, if TImode is loaded with an integer constant, it wants a full
> 128-bit constant.

This may be worth adding as a TODO somewhere in the code.

> 
> SFmode and DFmode scalar constants are not handled in this patch.  The
> support for those constants will be in the next patch.

ok

> 
> I have added a temporary switch (-msplat-float-constant) to control whether or
> not the XXSPLTIDP instruction is generated.
> 
> I added 2 new tests to test loading up V2DI and V2DF vector constants.




> 
> 2021-11-05  Michael Meissner  <meissner@the-meissners.org>
> 
> gcc/
> 
> 	* config/rs6000/predicates.md (easy_fp_constant): Add support for
> 	generating XXSPLTIDP.
> 	(vsx_prefixed_constant): Likewise.
> 	(easy_vector_constant): Likewise.
> 	* config/rs6000/rs6000-protos.h (constant_generates_xxspltidp):
> 	New declaration.
> 	* config/rs6000/rs6000.c (output_vec_const_move): Add support for
> 	generating XXSPLTIDP.
> 	(prefixed_xxsplti_p): Likewise.
> 	(constant_generates_xxspltidp): New function.
> 	* config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
> 
> gcc/testsuite/
> 
> 	* gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
> 	regex for power10.
> 	* gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> 	* gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
> ---


ok

>  gcc/config/rs6000/predicates.md               |   9 ++
>  gcc/config/rs6000/rs6000-protos.h             |   1 +
>  gcc/config/rs6000/rs6000.c                    | 108 ++++++++++++++++++
>  gcc/config/rs6000/rs6000.opt                  |   4 +
>  .../powerpc/pr86731-fwrapv-longlong.c         |   9 +-
>  .../powerpc/vec-splat-constant-v2df.c         |  64 +++++++++++
>  .../powerpc/vec-splat-constant-v2di.c         |  50 ++++++++
>  7 files changed, 241 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
> 
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index ed6252bd0c4..d748b11857c 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -610,6 +610,9 @@ (define_predicate "easy_fp_constant"
> 
>        if (constant_generates_xxspltiw (&vsx_const))
>  	return true;
> +
> +      if (constant_generates_xxspltidp (&vsx_const))
> +	return true;
>      }
> 
>    /* Otherwise consider floating point constants hard, so that the
> @@ -653,6 +656,9 @@ (define_predicate "vsx_prefixed_constant"
>    if (constant_generates_xxspltiw (&vsx_const))
>      return true;
> 
> +  if (constant_generates_xxspltidp (&vsx_const))
> +    return true;
> +
>    return false;
>  })
> 
> @@ -727,6 +733,9 @@ (define_predicate "easy_vector_constant"
> 
>  	  if (constant_generates_xxspltiw (&vsx_const))
>  	    return true;
> +
> +	  if (constant_generates_xxspltidp (&vsx_const))
> +	    return true;
>  	}


ok

> 
>        if (TARGET_P9_VECTOR
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 99c6a671289..2d28df7442d 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -253,6 +253,7 @@ extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
>  				       vec_const_128bit_type *);
>  extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
>  extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *);
> +extern unsigned constant_generates_xxspltidp (vec_const_128bit_type *);
>  #endif /* RTX_CODE */
> 
>  #ifdef TREE_CODE
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index be24f56eb31..8fde48cf2b3 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -7012,6 +7012,13 @@ output_vec_const_move (rtx *operands)
>  	      operands[2] = GEN_INT (imm);
>  	      return "xxspltiw %x0,%2";
>  	    }
> +
> +	  imm = constant_generates_xxspltidp (&vsx_const);
> +	  if (imm)


Just a nit that the two lines could be combined into a similar form
as used elsewhere as ...
	if (constant_generates_xxspltidp(&vsx_const))


> +	    {
> +	      operands[2] = GEN_INT (imm);
> +	      return "xxspltidp %x0,%2";
> +	    }

>  	}
> 
>        if (TARGET_P9_VECTOR
> @@ -26809,6 +26816,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
>      {
>        if (constant_generates_xxspltiw (&vsx_const))
>  	return true;
> +
> +      if (constant_generates_xxspltidp (&vsx_const))
> +	return true;
>      }
> 
>    return false;
> @@ -29014,6 +29024,104 @@ constant_generates_xxspltiw (vec_const_128bit_type *vsx_const)
>    return vsx_const->words[0];
>  }
> 
> +/* Determine if a vector constant can be loaded with XXSPLTIDP.  Return zero if
> +   the XXSPLTIDP instruction cannot be used.  Otherwise return the immediate
> +   value to be used with the XXSPLTIDP instruction.  */
> +
> +unsigned
> +constant_generates_xxspltidp (vec_const_128bit_type *vsx_const)
> +{
> +  if (!TARGET_SPLAT_FLOAT_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
> +    return 0;
> +
> +  /* Make sure that the two 64-bit segments are the same.  */
> +  if (!vsx_const->all_double_words_same)
> +    return 0;

Perhaps more like "Reject if the two 64-bit segments are (not?) the
same."


> +
> +  /* If the bytes, half words, or words are all the same, don't use XXSPLTIDP.
> +     Use a simpler instruction (XXSPLTIB, VSPLTISB, VSPLTISH, or VSPLTISW).  */
> +  if (vsx_const->all_bytes_same
> +      || vsx_const->all_half_words_same
> +      || vsx_const->all_words_same)
> +    return 0;
> +
> +  unsigned HOST_WIDE_INT value = vsx_const->double_words[0];
> +
> +  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
> +     pattern and the signalling NaN bit pattern.  Recognize infinity and
> +     negative infinity.  */
> +
> +  /* Bit representation of DFmode normal quiet NaN.  */
> +#define RS6000_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
> +
> +  /* Bit representation of DFmode normal signaling NaN.  */
> +#define RS6000_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
> +
> +  /* Bit representation of DFmode positive infinity.  */
> +#define RS6000_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
> +
> +  /* Bit representation of DFmode negative infinity.  */
> +#define RS6000_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)

Defines may be more useful in a header file?  

> +
> +  if (value != RS6000_CONST_DF_NAN
> +      && value != RS6000_CONST_DF_NANS
> +      && value != RS6000_CONST_DF_INF
> +      && value != RS6000_CONST_DF_NEG_INF)
> +    {
> +      /* The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for
> +	 the exponent, and 52 bits for the mantissa (not counting the hidden
> +	 bit used for normal numbers).  NaN values have the exponent set to all
> +	 1 bits, and the mantissa non-zero (mantissa == 0 is infinity).  */
> +
> +      int df_exponent = (value >> 52) & 0x7ff;
> +      unsigned HOST_WIDE_INT df_mantissa
> +	= value & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);


Should the "=" be on the end of the previous line? 


> +
> +      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
> +	return 0;
> +
> +      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
> +	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
> +	 subnormal, then the hidden bit in the mantissa is not set.  */
> +      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
> +	return 0;
> +    }
> +
> +  /* Change the representation to DFmode constant.  */
> +  long df_words[2] = { vsx_const->words[0], vsx_const->words[1] };
> +
> +  /* real_from_target takes the target words in  target order.  */

Extra space before target order.

> +  if (!BYTES_BIG_ENDIAN)
> +    std::swap (df_words[0], df_words[1]);
> +
> +  REAL_VALUE_TYPE rv_type;
> +  real_from_target (&rv_type, df_words, DFmode);
> +
> +  const REAL_VALUE_TYPE *rv = &rv_type;
> +
> +  /* Validate that the number can be stored as a SFmode value.  */
> +  if (!exact_real_truncate (SFmode, rv))
> +    return 0;
> +
> +  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
> +     mantissa field is non-zero) which is undefined for the XXSPLTIDP
> +     instruction.  */
> +  long sf_value;
> +  real_to_target (&sf_value, rv, SFmode);
> +
> +  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
> +     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
> +     0 bits, and the mantissa non-zero.  */
> +  long sf_exponent = (sf_value >> 23) & 0xFF;
> +  long sf_mantissa = sf_value & 0x7FFFFF;
> +
> +  if (sf_exponent == 0 && sf_mantissa != 0)
> +    return 0;
> +
> +  /* Return the immediate to be used.  */
> +  return sf_value;
> +}

ok
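
To make the "fits as an SFmode value" test concrete, here is a minimal
bit-level sketch of the same idea.  It is not the patch's code, which goes
through real_from_target/exact_real_truncate; sketch_xxspltidp_imm is a
hypothetical name, and as a simplification it rejects every NaN (the patch
accepts the two standard NaN patterns) and skips the checks that prefer
XXSPLTIB/VSPLTIS* when a simpler splat exists.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: given the 64-bit IEEE 754 pattern of one double
   word, store the 32-bit single precision immediate in *IMM and return
   true if the value converts exactly and is not subnormal in either
   format.  Simplification: all NaNs are rejected.  */
static bool
sketch_xxspltidp_imm (uint64_t df_bits, uint32_t *imm)
{
  uint32_t sign = (uint32_t) (df_bits >> 63) << 31;
  int df_exponent = (int) ((df_bits >> 52) & 0x7ff);
  uint64_t df_mantissa = df_bits & ((UINT64_C (1) << 52) - 1);

  if (df_exponent == 0x7ff)
    {
      if (df_mantissa != 0)	/* NaN: out of scope for this sketch.  */
	return false;
      *imm = sign | 0x7f800000u;	/* +/- infinity.  */
      return true;
    }

  if (df_exponent == 0)
    {
      if (df_mantissa != 0)	/* double precision subnormal.  */
	return false;
      *imm = sign;		/* +/- 0.0.  */
      return true;
    }

  /* Single precision normal numbers have unbiased exponents -126..127;
     anything smaller would be a subnormal or underflow.  */
  int unbiased = df_exponent - 1023;
  if (unbiased < -126 || unbiased > 127)
    return false;

  /* Only the top 23 mantissa bits survive, so the low 29 must be zero.  */
  if ((df_mantissa & ((UINT64_C (1) << 29) - 1)) != 0)
    return false;

  *imm = sign | ((uint32_t) (unbiased + 127) << 23)
	 | (uint32_t) (df_mantissa >> 29);
  return true;
}
```

Under these assumptions 1.0 (0x3ff0000000000000) maps to 0x3f800000 and
-0.0 maps to 0x80000000, while PI (0x400921fb54442d18) and 0x1p-149 are
rejected, which lines up with the expectations in the V2DF/V2DI tests.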

> +
>  
>  struct gcc_target targetm = TARGET_INITIALIZER;
> 
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index ec7b106fddb..c1d661d7e6b 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -644,6 +644,10 @@ msplat-word-constant
>  Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
>  Generate (do not generate) code that uses the XXSPLTIW instruction.
> 
> +msplat-float-constant
> +Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
> +Generate (do not generate) code that uses the XXSPLTIDP instruction.
> +
>  mieee128-constant
>  Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
>  Generate (do not generate) code that uses the LXVKQ instruction.

ok


> diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> index bd1502bb30a..dcb30e1d886 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> @@ -24,11 +24,12 @@ vector signed long long splats4(void)
>          return (vector signed long long) vec_sl(mzero, mzero);
>  }
> 
> -/* Codegen will consist of splat and shift instructions for most types.
> -   If folding is enabled, the vec_sl tests using vector long long type will
> -   generate a lvx instead of a vspltisw+vsld pair.  */
> +/* Codegen will consist of splat and shift instructions for most types.  If
> +   folding is enabled, the vec_sl tests using vector long long type will
> +   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
> +   generate a xxspltidp instruction instead of the lvx.  */
> 
>  /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
>  /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
> -/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */


ok

No further comments, 
Thanks
-Will


> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> new file mode 100644
> index 00000000000..82ffc86f8aa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> @@ -0,0 +1,64 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> +   instruction.  */
> +
> +vector double
> +v2df_double_0 (void)
> +{
> +  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
> +}
> +
> +vector double
> +v2df_double_1 (void)
> +{
> +  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
> +}
> +
> +#ifndef __FAST_MATH__
> +vector double
> +v2df_double_m0 (void)
> +{
> +  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
> +}
> +
> +vector double
> +v2df_double_nan (void)
> +{
> +  return (vector double) { __builtin_nan (""),
> +			   __builtin_nan ("") };	/* XXSPLTIDP.  */
> +}
> +
> +vector double
> +v2df_double_inf (void)
> +{
> +  return (vector double) { __builtin_inf (),
> +			   __builtin_inf () };		/* XXSPLTIDP.  */
> +}
> +
> +vector double
> +v2df_double_m_inf (void)
> +{
> +  return (vector double) { - __builtin_inf (),
> +			   - __builtin_inf () };	/* XXSPLTIDP.  */
> +}
> +#endif
> +
> +vector double
> +v2df_double_pi (void)
> +{
> +  return (vector double) { M_PI, M_PI };		/* PLXV.  */
> +}
> +
> +vector double
> +v2df_double_denorm (void)
> +{
> +  return (vector double) { (double)0x1p-149f,
> +			   (double)0x1p-149f };		/* PLXV.  */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
> new file mode 100644
> index 00000000000..4d44f943d26
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
> @@ -0,0 +1,50 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +/* Test generating V2DImode constants that have the same bit pattern as
> +   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
> +   the ISA 3.1 (power10).  */
> +
> +vector long long
> +vector_0 (void)
> +{
> +  /* XXSPLTIB or XXLXOR.  */
> +  return (vector long long) { 0LL, 0LL };
> +}
> +
> +vector long long
> +vector_1 (void)
> +{
> +  /* XXSPLTIB and VEXTSB2D.  */
> +  return (vector long long) { 1LL, 1LL };
> +}
> +
> +/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
> +   with XXSPLTIDP.  */
> +vector long long
> +vector_float_neg_0 (void)
> +{
> +  /* XXSPLTIDP.  */
> +  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
> +}
> +
> +/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
> +   XXSPLTIDP.  */
> +vector long long
> +vector_float_1_0 (void)
> +{
> +  /* XXSPLTIDP.  */
> +  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
> +}
> +
> +/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
> +   with XXSPLTIDP.  */
> +vector long long
> +scalar_pi (void)
> +{
> +  /* PLXV.  */
> +  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
> -- 
> 2.31.1
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.
  2021-11-05  4:11 ` [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants Michael Meissner
@ 2021-11-05 19:38   ` will schmidt
  2021-12-14 17:01     ` David Edelsohn
  2021-11-15 16:38   ` Ping: " Michael Meissner
  2021-12-13 17:07   ` Ping #2: " Michael Meissner
  2 siblings, 1 reply; 29+ messages in thread
From: will schmidt @ 2021-11-05 19:38 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

On Fri, 2021-11-05 at 00:11 -0400, Michael Meissner wrote:
> Generate XXSPLTIDP for scalars on power10.
> 
> This patch implements XXSPLTIDP support for SF and DF scalar constants.
> The previous patch added support for vector constants.  This patch adds
> the support for SFmode and DFmode scalar constants.
> 
> I added 2 new tests to test loading up SF and DF scalar constants.


ok

> 
> 2021-11-05  Michael Meissner  <meissner@the-meissners.org>
> 
> gcc/
> 
> 	* config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
> 	(UNSPEC_XXSPLTIW_CONST): New unspec.
> 	(movsf_hardfloat): Add support for generating XXSPLTIDP.
> 	(mov<mode>_hardfloat32): Likewise.
> 	(mov<mode>_hardfloat64): Likewise.
> 	(xxspltidp_<mode>_internal): New insns.
> 	(xxspltiw_<mode>_internal): New insns.
> 	(splitters for SF/DFmode): Add new splitters for XXSPLTIDP.
> 
> gcc/testsuite/
> 
> 	* gcc.target/powerpc/vec-splat-constant-df.c: New test.
> 	* gcc.target/powerpc/vec-splat-constant-sf.c: New test.
> ---

ok


>  gcc/config/rs6000/rs6000.md                   | 97 +++++++++++++++----
>  .../powerpc/vec-splat-constant-df.c           | 60 ++++++++++++
>  .../powerpc/vec-splat-constant-sf.c           | 60 ++++++++++++
>  3 files changed, 199 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 3a7bcd2426e..4122acb98cf 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -156,6 +156,8 @@ (define_c_enum "unspec"
>     UNSPEC_PEXTD
>     UNSPEC_HASHST
>     UNSPEC_HASHCHK
> +   UNSPEC_XXSPLTIDP_CONST
> +   UNSPEC_XXSPLTIW_CONST
>    ])
> 
>  ;;
> @@ -7764,17 +7766,17 @@ (define_split
>  ;;
>  ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
>  ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
> -;;	MR           MT<x>      MF<x>       NOP
> +;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
> 
>  (define_insn "movsf_hardfloat"
>    [(set (match_operand:SF 0 "nonimmediate_operand"
>  	 "=!r,       f,         v,          wa,        m,         wY,
>  	  Z,         m,         wa,         !r,        f,         wa,
> -	  !r,        *c*l,      !r,         *h")
> +	  !r,        *c*l,      !r,         *h,        wa")
>  	(match_operand:SF 1 "input_operand"
>  	 "m,         m,         wY,         Z,         f,         v,
>  	  wa,        r,         j,          j,         f,         wa,
> -	  r,         r,         *h,         0"))]
> +	  r,         r,         *h,         0,         eP"))]
>    "(register_operand (operands[0], SFmode)
>     || register_operand (operands[1], SFmode))
>     && TARGET_HARD_FLOAT
> @@ -7796,15 +7798,16 @@ (define_insn "movsf_hardfloat"
>     mr %0,%1
>     mt%0 %1
>     mf%1 %0
> -   nop"
> +   nop
> +   #"
>    [(set_attr "type"
>  	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
>  	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
> -	 *,          mtjmpr,    mfjmpr,     *")
> +	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
>     (set_attr "isa"
>  	"*,          *,         p9v,        p8v,       *,         p9v,
>  	 p8v,        *,         *,          *,         *,         *,
> -	 *,          *,         *,          *")])
> +	 *,          *,         *,          *,         p10")])
> 
>  ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
>  ;;	FMR          MR         MT%0       MF%1       NOP
> @@ -8064,18 +8067,18 @@ (define_split
> 
>  ;;           STFD         LFD         FMR         LXSD        STXSD
>  ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
> -;;           LWZ          STW         MR
> +;;           LWZ          STW         MR          XXSPLTIDP
> 
> 
>  (define_insn "*mov<mode>_hardfloat32"
>    [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
>              "=m,          d,          d,          <f64_p9>,   wY,
>                <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
> -              Y,          r,          !r")
> +              Y,          r,          !r,         wa")
>  	(match_operand:FMOVE64 1 "input_operand"
>               "d,          m,          d,          wY,         <f64_p9>,
>                Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
> -              r,          Y,          r"))]
> +              r,          Y,          r,          eP"))]
>    "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
>     && (gpc_reg_operand (operands[0], <MODE>mode)
>         || gpc_reg_operand (operands[1], <MODE>mode))"
> @@ -8092,20 +8095,21 @@ (define_insn "*mov<mode>_hardfloat32"
>     #
>     #
>     #
> +   #
>     #"
>    [(set_attr "type"
>              "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
>               fpload,      fpstore,    veclogical, veclogical, two,
> -             store,       load,       two")
> +             store,       load,       two,        vecperm")
>     (set_attr "size" "64")
>     (set_attr "length"
>              "*,           *,          *,          *,          *,
>               *,           *,          *,          *,          8,
> -             8,           8,          8")
> +             8,           8,          8,          *")
>     (set_attr "isa"
>              "*,           *,          *,          p9v,        p9v,
>               p7v,         p7v,        *,          *,          *,
> -             *,           *,          *")])
> +             *,           *,          *,          p10")])
> 
>  ;;           STW      LWZ     MR      G-const H-const F-const
> 
> @@ -8132,19 +8136,19 @@ (define_insn "*mov<mode>_softfloat32"
>  ;;           STFD         LFD         FMR         LXSD        STXSD
>  ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
>  ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
> -;;           NOP          MFVSRD      MTVSRD
> +;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
> 
>  (define_insn "*mov<mode>_hardfloat64"
>    [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
>             "=m,           d,          d,          <f64_p9>,   wY,
>               <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
>               YZ,          r,          !r,         *c*l,       !r,
> -            *h,           r,          <f64_dm>")
> +            *h,           r,          <f64_dm>,   wa")
>  	(match_operand:FMOVE64 1 "input_operand"
>              "d,           m,          d,          wY,         <f64_p9>,
>               Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
>               r,           YZ,         r,          r,          *h,
> -             0,           <f64_dm>,   r"))]
> +             0,           <f64_dm>,   r,          eP"))]
>    "TARGET_POWERPC64 && TARGET_HARD_FLOAT
>     && (gpc_reg_operand (operands[0], <MODE>mode)
>         || gpc_reg_operand (operands[1], <MODE>mode))"
> @@ -8166,18 +8170,19 @@ (define_insn "*mov<mode>_hardfloat64"
>     mf%1 %0
>     nop
>     mfvsrd %0,%x1
> -   mtvsrd %x0,%1"
> +   mtvsrd %x0,%1
> +   #"
>    [(set_attr "type"
>              "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
>               fpload,      fpstore,    veclogical, veclogical, integer,
>               store,       load,       *,          mtjmpr,     mfjmpr,
> -             *,           mfvsr,      mtvsr")
> +             *,           mfvsr,      mtvsr,      vecperm")
>     (set_attr "size" "64")
>     (set_attr "isa"
>              "*,           *,          *,          p9v,        p9v,
>               p7v,         p7v,        *,          *,          *,
>               *,           *,          *,          *,          *,
> -             *,           p8v,        p8v")])
> +             *,           p8v,        p8v,        p10")])
> 
>  ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
>  ;;           H-const  F-const  Special
> @@ -8211,6 +8216,62 @@ (define_insn "*mov<mode>_softfloat64"
>     (set_attr "length"
>              "*,       *,      *,      *,      *,      8,
>               12,      16,     *")])
> +

ok


> +;; Split the VSX prefixed instruction to support SFmode and DFmode scalar
> +;; constants that look like DFmode floating point values where both elements
> +;; are the same.  The constant has to be expressible as a SFmode constant that
> +;; is not a SFmode denormal value.
> +;;
> +;; We don't need splitters for the 128-bit types, since the function
> +;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.

ok
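
The restriction described in the comment above (the value must be expressible
as an SFmode constant that is not an SFmode denormal) can be sketched in plain
C.  This is only an illustration under stated assumptions: the helper name
fits_sf_non_denormal is hypothetical, it is not GCC's
constant_generates_xxspltidp, and it assumes IEEE 754 float and double.  It
also accepts 0.0, which the tests expect to be loaded with XXSPLTIB or XXLXOR
instead.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: return true if the double D keeps
   its exact bit pattern after a round trip through float, and that float
   is not a denormal.  */
static bool
fits_sf_non_denormal (double d)
{
  float f = (float) d;
  double back = (double) f;
  uint64_t d_bits, back_bits;
  uint32_t f_bits;

  memcpy (&d_bits, &d, sizeof d_bits);
  memcpy (&back_bits, &back, sizeof back_bits);
  memcpy (&f_bits, &f, sizeof f_bits);

  /* Compare bit patterns rather than values so that -0.0 is distinguished
     from 0.0.  */
  if (d_bits != back_bits)
    return false;

  /* A float denormal has a zero exponent field and a nonzero mantissa.  */
  uint32_t exponent = (f_bits >> 23) & 0xff;
  uint32_t mantissa = f_bits & 0x7fffff;
  return !(exponent == 0 && mantissa != 0);
}
```

With this sketch, 1.0 and -0.0 pass, while M_PI (not exact in float) and
0x1p-149 (a float denormal) fail, which matches the expectations written in
the test comments.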

> +(define_insn "xxspltidp_<mode>_internal"
> +  [(set (match_operand:SFDF 0 "register_operand" "=wa")
> +	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> +		     UNSPEC_XXSPLTIDP_CONST))]
> +  "TARGET_POWER10"
> +  "xxspltidp %x0,%1"
> +  [(set_attr "type" "vecperm")
> +   (set_attr "prefixed" "yes")])
> +
> +(define_insn "xxspltiw_<mode>_internal"
> +  [(set (match_operand:SFDF 0 "register_operand" "=wa")
> +	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> +		     UNSPEC_XXSPLTIW_CONST))]
> +  "TARGET_POWER10"
> +  "xxspltiw %x0,%1"
> +  [(set_attr "type" "vecperm")
> +   (set_attr "prefixed" "yes")])
> +
> +(define_split
> +  [(set (match_operand:SFDF 0 "vsx_register_operand")
> +	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
> +  "TARGET_POWER10"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx src = operands[1];
> +  vec_const_128bit_type vsx_const;
> +
> +  if (!vec_const_128bit_to_bytes (src, <MODE>mode, &vsx_const))
> +    gcc_unreachable ();
> +
> +  unsigned imm = constant_generates_xxspltidp (&vsx_const);
> +  if (imm)
> +    {
> +      emit_insn (gen_xxspltidp_<mode>_internal (dest, GEN_INT (imm)));
> +      DONE;
> +    }
> +
> +  imm = constant_generates_xxspltiw (&vsx_const);
> +  if (imm)
> +    {
> +      emit_insn (gen_xxspltiw_<mode>_internal (dest, GEN_INT (imm)));
> +      DONE;
> +    }
> +
> +  else
> +    gcc_unreachable ();
> +})


ok
Nothing further, 
thanks,
-Will


>  
>  (define_expand "mov<mode>"
>    [(set (match_operand:FMOVE128 0 "general_operand")
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> new file mode 100644
> index 00000000000..8f6e176f9af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> +   instruction.  */
> +
> +double
> +scalar_double_0 (void)
> +{
> +  return 0.0;			/* XXSPLTIB or XXLXOR.  */
> +}
> +
> +double
> +scalar_double_1 (void)
> +{
> +  return 1.0;			/* XXSPLTIDP.  */
> +}
> +
> +#ifndef __FAST_MATH__
> +double
> +scalar_double_m0 (void)
> +{
> +  return -0.0;			/* XXSPLTIDP.  */
> +}
> +
> +double
> +scalar_double_nan (void)
> +{
> +  return __builtin_nan ("");	/* XXSPLTIDP.  */
> +}
> +
> +double
> +scalar_double_inf (void)
> +{
> +  return __builtin_inf ();	/* XXSPLTIDP.  */
> +}
> +
> +double
> +scalar_double_m_inf (void)	/* XXSPLTIDP.  */
> +{
> +  return - __builtin_inf ();
> +}
> +#endif
> +
> +double
> +scalar_double_pi (void)
> +{
> +  return M_PI;			/* PLFD.  */
> +}
> +
> +double
> +scalar_double_denorm (void)
> +{
> +  return 0x1p-149f;		/* PLFD.  */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> new file mode 100644
> index 00000000000..72504bdfbbd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> +   instruction.  */
> +
> +float
> +scalar_float_0 (void)
> +{
> +  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
> +}
> +
> +float
> +scalar_float_1 (void)
> +{
> +  return 1.0f;			/* XXSPLTIDP.  */
> +}
> +
> +#ifndef __FAST_MATH__
> +float
> +scalar_float_m0 (void)
> +{
> +  return -0.0f;			/* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_nan (void)
> +{
> +  return __builtin_nanf ("");	/* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_inf (void)
> +{
> +  return __builtin_inff ();	/* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_m_inf (void)	/* XXSPLTIDP.  */
> +{
> +  return - __builtin_inff ();
> +}
> +#endif
> +
> +float
> +scalar_float_pi (void)
> +{
> +  return (float)M_PI;		/* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_denorm (void)
> +{
> +  return 0x1p-149f;		/* PLFS.  */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
> -- 
> 2.31.1
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Ping: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)
  2021-11-05  4:04 ` [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function) Michael Meissner
  2021-11-05 17:01   ` will schmidt
@ 2021-11-15 16:35   ` Michael Meissner
  2021-12-13 16:58   ` Ping #2: " Michael Meissner
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-15 16:35 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch.

| Date: Fri, 5 Nov 2021 00:04:40 -0400
| Subject: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)
| Message-ID: <YYStWC018qK1Ta33@toto.the-meissners.org>

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Ping: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)
  2021-11-05  4:07 ` [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ) Michael Meissner
  2021-11-05 17:52   ` will schmidt
@ 2021-11-15 16:36   ` Michael Meissner
  2021-12-13 17:02   ` Ping #2: " Michael Meissner
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-15 16:36 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch:

| Date: Fri, 5 Nov 2021 00:07:05 -0400
| Subject: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)
| Message-ID: <YYSt6fbNwHqmU1wY@toto.the-meissners.org>

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Ping: [PATCH 3/5] Add Power10 XXSPLTIW
  2021-11-05  4:09 ` [PATCH 3/5] Add Power10 XXSPLTIW Michael Meissner
  2021-11-05 18:50   ` will schmidt
@ 2021-11-15 16:37   ` Michael Meissner
  2021-12-13 17:04   ` Ping #2: " Michael Meissner
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-15 16:37 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch.

| Date: Fri, 5 Nov 2021 00:09:07 -0400
| Subject: [PATCH 3/5] Add Power10 XXSPLTIW
| Message-ID: <YYSuY1C3Dt5RIRd6@toto.the-meissners.org>

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Ping: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants
  2021-11-05  4:10 ` [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants Michael Meissner
  2021-11-05 19:24   ` will schmidt
@ 2021-11-15 16:38   ` Michael Meissner
  2021-12-13 17:06   ` Ping #2: " Michael Meissner
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-15 16:38 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch.

| Date: Fri, 5 Nov 2021 00:10:18 -0400
| Subject: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants
| Message-ID: <YYSuqkJe2ZwADOA9@toto.the-meissners.org>

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Ping: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.
  2021-11-05  4:11 ` [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants Michael Meissner
  2021-11-05 19:38   ` will schmidt
@ 2021-11-15 16:38   ` Michael Meissner
  2021-12-13 17:07   ` Ping #2: " Michael Meissner
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 2021-11-15 16:38 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch.

| Date: Fri, 5 Nov 2021 00:11:20 -0400
| Subject: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.
| Message-ID: <YYSu6FMxMQyhRD3d@toto.the-meissners.org>

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Ping #2: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)
  2021-11-05  4:04 ` [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function) Michael Meissner
  2021-11-05 17:01   ` will schmidt
  2021-11-15 16:35   ` Ping: " Michael Meissner
@ 2021-12-13 16:58   ` Michael Meissner
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 2021-12-13 16:58 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch.

| Date: Fri, 5 Nov 2021 00:04:40 -0400
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)
| Message-ID: <YYStWC018qK1Ta33@toto.the-meissners.org>

Note, I will be on-line until December 20th, and then I won't be on-line until January.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Ping #2: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)
  2021-11-05  4:07 ` [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ) Michael Meissner
  2021-11-05 17:52   ` will schmidt
  2021-11-15 16:36   ` Ping: " Michael Meissner
@ 2021-12-13 17:02   ` Michael Meissner
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 2021-12-13 17:02 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch #2.

| Date: Fri, 5 Nov 2021 00:07:05 -0400
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)
| Message-ID: <YYSt6fbNwHqmU1wY@toto.the-meissners.org>

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583391.html

Note, I will be on-line until December 20th, and then off-line until January.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Ping #2: [PATCH 3/5] Add Power10 XXSPLTIW
  2021-11-05  4:09 ` [PATCH 3/5] Add Power10 XXSPLTIW Michael Meissner
  2021-11-05 18:50   ` will schmidt
  2021-11-15 16:37   ` Ping: " Michael Meissner
@ 2021-12-13 17:04   ` Michael Meissner
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 2021-12-13 17:04 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch #2.

| Date: Fri, 5 Nov 2021 00:09:07 -0400
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: [PATCH 3/5] Add Power10 XXSPLTIW
| Message-ID: <YYSuY1C3Dt5RIRd6@toto.the-meissners.org>

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583392.html

Note, I will be on-line through December 20th.  I will be off-line after that
until January.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Ping #2: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants
  2021-11-05  4:10 ` [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants Michael Meissner
  2021-11-05 19:24   ` will schmidt
  2021-11-15 16:38   ` Ping: " Michael Meissner
@ 2021-12-13 17:06   ` Michael Meissner
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 2021-12-13 17:06 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch #2.

| Date: Fri, 5 Nov 2021 00:10:18 -0400
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants
| Message-ID: <YYSuqkJe2ZwADOA9@toto.the-meissners.org>

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583393.html

Note, I will be on-line through December 20th.  I will be off-line from
December 21st through January 1st.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Ping #2: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.
  2021-11-05  4:11 ` [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants Michael Meissner
  2021-11-05 19:38   ` will schmidt
  2021-11-15 16:38   ` Ping: " Michael Meissner
@ 2021-12-13 17:07   ` Michael Meissner
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 2021-12-13 17:07 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch #2.

| Date: Fri, 5 Nov 2021 00:11:20 -0400
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.
| Message-ID: <YYSu6FMxMQyhRD3d@toto.the-meissners.org>

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583394.html

Note, I will be on-line through December 20th.  I will be off-line December
21st through January 1st.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)
  2021-11-05 18:01     ` Michael Meissner
@ 2021-12-14 16:57       ` David Edelsohn
  0 siblings, 0 replies; 29+ messages in thread
From: David Edelsohn @ 2021-12-14 16:57 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool
  Cc: GCC Patches, will schmidt, Bill Schmidt, Peter Bergner

On Fri, Nov 5, 2021 at 2:01 PM Michael Meissner <meissner@linux.ibm.com> wrote:
>
> On Fri, Nov 05, 2021 at 12:52:51PM -0500, will schmidt wrote:
> > > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> > > index 956e42bc514..e0d1c718e9f 100644
> > > --- a/gcc/config/rs6000/predicates.md
> > > +++ b/gcc/config/rs6000/predicates.md
> > > @@ -601,6 +601,14 @@ (define_predicate "easy_fp_constant"
> > >    if (TARGET_VSX && op == CONST0_RTX (mode))
> > >      return 1;
> > >
> > > +  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
> >
> > Easy is relative, but OK.
>
> The name of the function is easy_fp_constant.
>
> > > +  vec_const_128bit_type vsx_const;
> > > +  if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
> > > +    {
> > > +      if (constant_generates_lxvkq (&vsx_const) != 0)
> > > +   return true;
> > > +    }
> > > +
> > >    /* Otherwise consider floating point constants hard, so that the
> > >       constant gets pushed to memory during the early RTL phases.  This
> > >       has the advantage that double precision constants that can be
> > > @@ -609,6 +617,23 @@ (define_predicate "easy_fp_constant"
> > >     return 0;
> > >  })
> > >
> > > +;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
> > > +;; via the LXVKQ instruction.
> > > +
> > > +(define_predicate "easy_vector_constant_ieee128"
> > > +  (match_code "const_vector,const_double")
> > > +{
> > > +  vec_const_128bit_type vsx_const;
> > > +
> > > +  /* Can we generate the LXVKQ instruction?  */
> > > +  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
> > > +      || !TARGET_VSX)
> > > +    return false;
> >
> > Presumably all of the checks there are valid.  (Can we have power10
> > without float128_hw or ieee128_constant flags set?)    I do notice the
> > addition of an ieee128_constant flag below.
>
> Yes, we can have power10 without float128_hw.  At the moment, 32-bit big endian
> does not enable the 128-bit IEEE instructions.  Also when we are building the
> bits in libgcc that can switch between compiling the software routines and the
> routines used for IEEE hardware, and when we are building the IEEE 128-bit
> software emulation functions we need to explicitly turn off IEEE 128-bit
> hardware support.
>
> Similarly for VSX, if the user explicitly says -mno-vsx, then we can't enable
> this instruction.
>
> > Ok.  I did look at this a bit before it clicked, so would suggest a
> > comment such as "All of the constants that can be loaded by lxvkq will have
> > zero in the bottom 3 words, so ensure those are zero before we use a
> > switch based on the nonzero portion of the constant."
> >
> > It would be fine as-is too.  :-)
>
> Ok.

Okay.

Thanks, David
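
The check suggested in the review comment above (make sure the bottom three
words are zero, then switch on the remaining word) might look roughly like
the following.  This is a sketch under assumptions, not GCC's
constant_generates_lxvkq: the function name is hypothetical, the words are
assumed to be stored most significant first, and only the three constants
named in the cover letter (1.0f128, 2.0f128, -0.0f128) are listed, using
their IEEE binary128 encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch, not GCC code.  WORDS holds a 128-bit constant as
   four 32-bit words, most significant word first.  */
static bool
looks_like_simple_ieee128 (const uint32_t words[4])
{
  /* The low 96 bits must all be zero before the top word is examined.  */
  if (words[1] != 0 || words[2] != 0 || words[3] != 0)
    return false;

  /* Illustrative subset only; the real instruction covers more values.  */
  switch (words[0])
    {
    case 0x3fff0000u:	/* +1.0f128 */
    case 0x40000000u:	/* +2.0f128 */
    case 0x80000000u:	/* -0.0f128 */
      return true;
    default:
      return false;
    }
}
```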

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)
  2021-11-05 18:13     ` Michael Meissner
@ 2021-12-14 16:57       ` David Edelsohn
  0 siblings, 0 replies; 29+ messages in thread
From: David Edelsohn @ 2021-12-14 16:57 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Peter Bergner, will schmidt

On Fri, Nov 5, 2021 at 2:13 PM Michael Meissner <meissner@linux.ibm.com> wrote:
>
> On Fri, Nov 05, 2021 at 12:01:43PM -0500, will schmidt wrote:
> > On Fri, 2021-11-05 at 00:04 -0400, Michael Meissner wrote:
> > > Add new constant data structure.
> > >
> > > This patch provides the data structure and function to convert a constant
> > > (CONST_INT, CONST_DOUBLE, CONST_VECTOR, or VEC_DUPLICATE of a constant) to
> > > an array of bytes, half-words, words, and  double words that can be loaded
> > > into a 128-bit vector register.
> > >
> > > The next patches will use this data structure to generate code that
> > > generates load of the vector/floating point registers using the XXSPLTIDP,
> > > XXSPLTIW, and LXVKQ instructions that were added in power10.
> > >
> > > 2021-11-05  Michael Meissner  <meissner@the-meissners.org>
> > >
>
> Whoops, it should be meissner@linux.ibm.com.
>
> > comment to be explicit on the structure name being copied to/from.
> > (vec_const_128bit_type is easy to search for, vector or constant or
> > structure are not as unique)
>
> Yes, the original name was more generic (rs6000_const).  Originally it could
> potentially handle vector constants that were greater than 128-bits if we ever
> have support for larger vectors.  But I thought that extra generallity hindered
> the code (since you had to check whether the size was exactly 128-bits, etc.).
> So I made the data structure tailored to the problem at hand.
>
> > > +
> > > +/* Copy an floating point constant to the vector constant structure.  */
> > > +
> >
> > s/an/a/
>
> Ok.
>
> > > +static void
> > > +constant_fp_to_128bit_vector (rtx op,
> > > +                         machine_mode mode,
> > > +                         size_t byte_num,
> > > +                         vec_const_128bit_type *info)
> > > +{
> > > +  unsigned bitsize = GET_MODE_BITSIZE (mode);
> > > +  unsigned num_words = bitsize / 32;
> > > +  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
> > > +  long real_words[VECTOR_128BIT_WORDS];
> > > +
> > > +  /* Make sure we don't overflow the real_words array and that it is
> > > +     filled completely.  */
> > > +  gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0);
> >
> > Not clear to me on the potential to partially fill the real_words
> > array.
>
> At the moment we don't support a 16-bit floating point type in the compiler
> (the Power10 has limited 16-bit floating point support, but we don't make a
> special type for it).  If/when we add the 16-bit floating point, we will
> possibly need to revisit this.
>
> > > +
> > > +  real_to_target (real_words, rtype, mode);
> > > +
> > > +  /* Iterate over each 32-bit word in the floating point constant.  The
> > > +     real_to_target function puts out words in endian fashion.  We need
> >
> > Meaning host-endian fashion, or is that meant to be big-endian ?
>
> Real_to_target puts out the 32-bit values in endian fashion.  This data
> structure wants to hold everything in big endian fashion to make checking
> things simpler.
>
> > Perhaps also rephrase or move the comment up to indicate that
> > real_to_target will have placed or has already placed the words in
> > <whatever> endian fashion.
> > As stated I was expecting to see a call to real_to_target() below the
> > comment.
>
> Yes, I probably should move the real_to_target call after the comment.
>
> > > +
> > > +  /* Possibly splat the constant to fill a vector size.  */
> >
> >
> > Suggest "Splat the constant to fill a vector size if ..."
>
> Ok.

Okay.

Thanks, David
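
For readers following the endianness question above, here is a minimal sketch of the normalization being discussed: the 32-bit words of a floating point constant arrive in the target's word order and are stored as bytes in big-endian order.  The helper name and its flag parameter are hypothetical and are not part of the patch; the real code works on the words produced by real_to_target and may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: copy NUM_WORDS 32-bit
   words into BYTES in big-endian order.  WORDS_LITTLE_ENDIAN is nonzero
   when the least significant word comes first in WORDS.  */
static void
words_to_big_endian_bytes (const uint32_t *words, size_t num_words,
                           int words_little_endian, unsigned char *bytes)
{
  for (size_t i = 0; i < num_words; i++)
    {
      /* Pick the source word so that the most significant word is
         always written first.  */
      size_t src = words_little_endian ? num_words - 1 - i : i;
      uint32_t w = words[src];

      /* Store the word most significant byte first.  */
      bytes[4 * i + 0] = (w >> 24) & 0xff;
      bytes[4 * i + 1] = (w >> 16) & 0xff;
      bytes[4 * i + 2] = (w >> 8) & 0xff;
      bytes[4 * i + 3] = w & 0xff;
    }
}
```

With the DFmode constant 1.0 (0x3ff0000000000000) delivered least significant word first, the resulting byte array starts with 0x3f, 0xf0, which is the layout the later all_bytes_same / all_words_same style checks can compare directly.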


* Re: [PATCH 3/5] Add Power10 XXSPLTIW
  2021-11-05 18:50   ` will schmidt
@ 2021-12-14 16:59     ` David Edelsohn
  0 siblings, 0 replies; 29+ messages in thread
From: David Edelsohn @ 2021-12-14 16:59 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Peter Bergner, will schmidt

On Fri, Nov 5, 2021 at 2:50 PM will schmidt <will_schmidt@vnet.ibm.com> wrote:
>
> On Fri, 2021-11-05 at 00:09 -0400, Michael Meissner wrote:
> > Generate XXSPLTIW on power10.
> >
>
> Hi,
>
>
> > This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
> > instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
> > adding support for vector constants that can be used, and adding a
> > VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
> >
> > The eP constraint was added to recognize constants that can be loaded into
> > vector registers with a single prefixed instruction.
>
> Perhaps swap "... the eP constraint was added ..."  for "Add the eP
> constraint to ..."
>
>
> >
> > I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
> > constants.
>
>
> >
> > 2021-11-05  Michael Meissner  <meissner@linux.ibm.com>
> >
> > gcc/
> >
> >       * config/rs6000/constraints.md (eP): Update comment.
> >       * config/rs6000/predicates.md (easy_fp_constant): Add support for
> >       generating XXSPLTIW.
> >       (vsx_prefixed_constant): New predicate.
> >       (easy_vector_constant): Add support for
> >       generating XXSPLTIW.
> >       * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
> >       declaration.
> >       (constant_generates_xxspltiw): Likewise.
> >       * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
> >       XXSPLTIW, don't do XXSPLTIB and sign extend.
>
> Perhaps just 'generate XXSPLTIW if possible'.
>
> >       (output_vec_const_move): Add support for XXSPLTIW.
> >       (prefixed_xxsplti_p): New function.
> >       (constant_generates_xxspltiw): New function.
> >       * config/rs6000/rs6000.md (prefixed attribute): Add support to
> >       mark XXSPLTI* instructions as being prefixed.
> >       * config/rs6000/rs6000.opt (-msplat-word-constant): New debug
> >       switch.
> >       * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
> >       generating XXSPLTIW or XXSPLTIDP.
> >       (vsx_mov<mode>_32bit): Likewise.
> >       * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
> >       eP constraint.
> >
> > gcc/testsuite/
> >
> >       * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
> >       * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
> >       * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
> >       * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
> >       * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
> > ---
> >  gcc/config/rs6000/constraints.md              |  6 ++
> >  gcc/config/rs6000/predicates.md               | 46 ++++++++++-
> >  gcc/config/rs6000/rs6000-protos.h             |  2 +
> >  gcc/config/rs6000/rs6000.c                    | 81 +++++++++++++++++++
> >  gcc/config/rs6000/rs6000.md                   |  5 ++
> >  gcc/config/rs6000/rs6000.opt                  |  4 +
> >  gcc/config/rs6000/vsx.md                      | 28 +++----
> >  gcc/doc/md.texi                               |  4 +
> >  .../powerpc/vec-splat-constant-v16qi.c        | 27 +++++++
> >  .../powerpc/vec-splat-constant-v4sf.c         | 67 +++++++++++++++
> >  .../powerpc/vec-splat-constant-v4si.c         | 51 ++++++++++++
> >  .../powerpc/vec-splat-constant-v8hi.c         | 62 ++++++++++++++
> >  .../gcc.target/powerpc/vec-splati-runnable.c  |  4 +-
> >  13 files changed, 369 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
> >
> > diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> > index e72132b4c28..a4b05837fa6 100644
> > --- a/gcc/config/rs6000/constraints.md
> > +++ b/gcc/config/rs6000/constraints.md
> > @@ -213,6 +213,12 @@ (define_constraint "eI"
> >    "A signed 34-bit integer constant if prefixed instructions are supported."
> >    (match_operand 0 "cint34_operand"))
> >
> > +;; A SF/DF scalar constant or a vector constant that can be loaded into vector
> > +;; registers with one prefixed instruction such as XXSPLTIDP or XXSPLTIW.
> > +(define_constraint "eP"
> > +  "A constant that can be loaded into a VSX register with one prefixed insn."
> > +  (match_operand 0 "vsx_prefixed_constant"))
> > +
> >  ;; A TF/KF scalar constant or a vector constant that can load certain IEEE
> >  ;; 128-bit constants into vector registers using LXVKQ.
> >  (define_constraint "eQ"
> > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> > index e0d1c718e9f..ed6252bd0c4 100644
> > --- a/gcc/config/rs6000/predicates.md
> > +++ b/gcc/config/rs6000/predicates.md
> > @@ -605,7 +605,10 @@ (define_predicate "easy_fp_constant"
> >    vec_const_128bit_type vsx_const;
> >    if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
> >      {
> > -      if (constant_generates_lxvkq (&vsx_const) != 0)
> > +      if (constant_generates_lxvkq (&vsx_const))
> > +     return true;
> > +
> > +      if (constant_generates_xxspltiw (&vsx_const))
> >       return true;
> >      }
> >
>
> ok
>
> > @@ -617,6 +620,42 @@ (define_predicate "easy_fp_constant"
> >     return 0;
> >  })
> >
> > +;; Return 1 if the operand is a 64-bit floating point scalar constant or a
> > +;; vector constant that can be loaded to a VSX register with one prefixed
> > +;; instruction, such as XXSPLTIDP or XXSPLTIW.
> > +;;
> > > +;; In addition to regular constants, we also recognize constants formed with the
> > +;; VEC_DUPLICATE insn from scalar constants.
> > +;;
> > +;; We don't handle scalar integer constants here because the assumption is the
> > +;; normal integer constants will be loaded into GPR registers.  For the
> > +;; constants that need to be loaded into vector registers, the instructions
> > +;; don't work well with TImode variables assigned a constant.  This is because
> > +;; the 64-bit scalar constants are splatted into both halves of the register.
> > +
> > +(define_predicate "vsx_prefixed_constant"
> > +  (match_code "const_double,const_vector,vec_duplicate")
> > +{
> > +  /* If we can generate the constant with 1-2 Altivec instructions, don't
> > +      generate a prefixed instruction.  */
>
> 1-2 Altivec instructions is both vague and specific.  Perhaps swap for
> a comment something like "If ..  with easy altivec instructions ... "
>
> > +  if (CONST_VECTOR_P (op) && easy_altivec_constant (op, mode))
> > +    return false;
> > +
> > +  /* Do we have prefixed instructions and are VSX registers available?  Is the
> > +     constant recognized?  */
> > +  if (!TARGET_PREFIXED || !TARGET_VSX)
> > +    return false;
> > +
> > +  vec_const_128bit_type vsx_const;
> > +  if (!vec_const_128bit_to_bytes (op, mode, &vsx_const))
> > +    return false;
> > +
> > +  if (constant_generates_xxspltiw (&vsx_const))
> > +    return true;
> > +
> > +  return false;
> > +})
>
> ok
>
> > +
> >  ;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
> >  ;; via the LXVKQ instruction.
> >
> > @@ -683,7 +722,10 @@ (define_predicate "easy_vector_constant"
> >        vec_const_128bit_type vsx_const;
> >        if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
> >       {
> > -       if (constant_generates_lxvkq (&vsx_const) != 0)
> > +       if (constant_generates_lxvkq (&vsx_const))
> > +         return true;
> > +
> > +       if (constant_generates_xxspltiw (&vsx_const))
> >           return true;
> >       }
>
>
> ok
>
>
> >
> > diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> > index 494a95cc6ee..99c6a671289 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
> >  extern bool prefixed_load_p (rtx_insn *);
> >  extern bool prefixed_store_p (rtx_insn *);
> >  extern bool prefixed_paddi_p (rtx_insn *);
> > +extern bool prefixed_xxsplti_p (rtx_insn *);
> >  extern void rs6000_asm_output_opcode (FILE *);
> >  extern void output_pcrel_opt_reloc (rtx);
> >  extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
> > @@ -251,6 +252,7 @@ typedef struct {
> >  extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
> >                                      vec_const_128bit_type *);
> >  extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
> > +extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *);
> >  #endif /* RTX_CODE */
> >
> >  #ifdef TREE_CODE
>
>
> ok
>
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 06d02085b06..be24f56eb31 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -6940,6 +6940,11 @@ xxspltib_constant_p (rtx op,
> >    else if (IN_RANGE (value, -1, 0))
> >      *num_insns_ptr = 1;
> >
> > +  /* If we can generate XXSPLTIW or XXSPLTIDP, don't generate XXSPLTIB and a
> > +     sign extend operation.  */
> > +  else if (vsx_prefixed_constant (op, mode))
> > +    return false;
>
> Comment is accurate, but might be clearer with something like
>   Don't generate this (xxspltib) instruction if we will be able to
> generate an xxspltiw or xxspltidp.
>
>
> > +
> >    else
> >      *num_insns_ptr = 2;
> >
> > @@ -7000,6 +7005,13 @@ output_vec_const_move (rtx *operands)
> >             operands[2] = GEN_INT (imm);
> >             return "lxvkq %x0,%2";
> >           }
> > +
> > +       imm = constant_generates_xxspltiw (&vsx_const);
> > +       if (imm)
> > +         {
> > +           operands[2] = GEN_INT (imm);
> > +           return "xxspltiw %x0,%2";
> > +         }
> >       }
> >
> >        if (TARGET_P9_VECTOR
>
> ok
>
> > @@ -26767,6 +26779,41 @@ prefixed_paddi_p (rtx_insn *insn)
> >    return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
> >  }
> >
> > +/* Whether an instruction is a prefixed XXSPLTI* instruction.  This is called
> > +   from the prefixed attribute processing.  */
> > +
> > +bool
> > +prefixed_xxsplti_p (rtx_insn *insn)
> > +{
> > +  rtx set = single_set (insn);
> > +  if (!set)
> > +    return false;
> > +
> > +  rtx dest = SET_DEST (set);
> > +  rtx src = SET_SRC (set);
> > +  machine_mode mode = GET_MODE (dest);
> > +
> > +  if (!REG_P (dest) && !SUBREG_P (dest))
> > +    return false;
> > +
> > +  if (GET_CODE (src) == UNSPEC)
> > +    {
> > +      int unspec = XINT (src, 1);
> > +      return (unspec == UNSPEC_XXSPLTIW
> > +           || unspec == UNSPEC_XXSPLTIDP
> > +           || unspec == UNSPEC_XXSPLTI32DX);
> > +    }
> > +
> > +  vec_const_128bit_type vsx_const;
> > +  if (vec_const_128bit_to_bytes (src, mode, &vsx_const))
> > +    {
> > +      if (constant_generates_xxspltiw (&vsx_const))
> > +     return true;
> > +    }
> > +
> > +  return false;
> > +}
> > +
>
> ok.
>
>
> >  /* Whether the next instruction needs a 'p' prefix issued before the
> >     instruction is printed out.  */
> >  static bool prepend_p_to_next_insn;
> > @@ -28933,6 +28980,40 @@ constant_generates_lxvkq (vec_const_128bit_type *vsx_const)
> >    return 0;
> >  }
> >
> > +/* Determine if a vector constant can be loaded with XXSPLTIW.  Return zero if
> > +   the XXSPLTIW instruction cannot be used.  Otherwise return the immediate
> > +   value to be used with the XXSPLTIW instruction.  */
> > +
> > +unsigned
> > +constant_generates_xxspltiw (vec_const_128bit_type *vsx_const)
> > +{
> > +  if (!TARGET_SPLAT_WORD_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
> > +    return 0;
> > +
> > +  if (!vsx_const->all_words_same)
> > +    return 0;
> > +
> > +  /* If we can use XXSPLTIB, don't generate XXSPLTIW.  */
> > +  if (vsx_const->all_bytes_same)
> > +    return 0;
> > +
> > +  /* See if we can use VSPLTISH or VSPLTISW.  */
> > +  if (vsx_const->all_half_words_same)
> > +    {
> > +      unsigned short h_word = vsx_const->half_words[0];
> > +      short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
> > +      if (EASY_VECTOR_15 (sign_h_word))
> > +     return 0;
> > +    }
> > +
> > +  unsigned int word = vsx_const->words[0];
> > +  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
> > +  if (EASY_VECTOR_15 (sign_word))
> > +    return 0;
> > +
> > +  return vsx_const->words[0];
> > +}
> > +
>
> ok
>
> >
> >  struct gcc_target targetm = TARGET_INITIALIZER;
> >
> > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> > index 6bec2bddbde..3a7bcd2426e 100644
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -314,6 +314,11 @@ (define_attr "prefixed" "no,yes"
> >
> >        (eq_attr "type" "integer,add")
> >        (if_then_else (match_test "prefixed_paddi_p (insn)")
> > +                    (const_string "yes")
> > +                    (const_string "no"))
> > +
> > +      (eq_attr "type" "vecperm")
> > +      (if_then_else (match_test "prefixed_xxsplti_p (insn)")
> >                      (const_string "yes")
> >                      (const_string "no"))]
> >
>
> ok
>
>
> > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> > index b7433ec4e30..ec7b106fddb 100644
> > --- a/gcc/config/rs6000/rs6000.opt
> > +++ b/gcc/config/rs6000/rs6000.opt
> > @@ -640,6 +640,10 @@ mprivileged
> >  Target Var(rs6000_privileged) Init(0)
> >  Generate code that will run in privileged state.
> >
> > +msplat-word-constant
> > +Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
> > +Generate (do not generate) code that uses the XXSPLTIW instruction.
> > +
> >  mieee128-constant
> >  Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
> >  Generate (do not generate) code that uses the LXVKQ instruction.
>
> ok
>
>
> > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> > index 0a376ee4c28..9f0c48db6f2 100644
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -1192,19 +1192,19 @@ (define_insn_and_split "*xxspltib_<mode>_split"
> >
> >  ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
> >  ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
> > -;;              LXVKQ
> > +;;              LXVKQ      XXSPLTI*
> >  ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
> >  (define_insn "vsx_mov<mode>_64bit"
> >    [(set (match_operand:VSX_M 0 "nonimmediate_operand"
> >                 "=ZwO,      wa,        wa,        r,         we,        ?wQ,
> >                  ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
> > -                wa,
> > +                wa,        wa,
> >                  ?wa,       v,         <??r>,     wZ,        v")
> >
> >       (match_operand:VSX_M 1 "input_operand"
> >                 "wa,        ZwO,       wa,        we,        r,         r,
> >                  wQ,        Y,         r,         r,         wE,        jwM,
> > -                eQ,
> > +                eQ,        eP,
> >                  ?jwM,      W,         <nW>,      v,         wZ"))]
> >
> >    "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
> > @@ -1216,43 +1216,43 @@ (define_insn "vsx_mov<mode>_64bit"
> >    [(set_attr "type"
> >                 "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
> >                  store,     load,      store,     *,         vecsimple, vecsimple,
> > -                vecperm,
> > +                vecperm,   vecperm,
> >                  vecsimple, *,         *,         vecstore,  vecload")
> >     (set_attr "num_insns"
> >                 "*,         *,         *,         2,         *,         2,
> >                  2,         2,         2,         2,         *,         *,
> > -                *,
> > +                *,         *,
> >                  *,         5,         2,         *,         *")
> >     (set_attr "max_prefixed_insns"
> >                 "*,         *,         *,         *,         *,         2,
> >                  2,         2,         2,         2,         *,         *,
> > -                *,
> > +                *,         *,
> >                  *,         *,         *,         *,         *")
> >     (set_attr "length"
> >                 "*,         *,         *,         8,         *,         8,
> >                  8,         8,         8,         8,         *,         *,
> > -                *,
> > +                *,         *,
> >                  *,         20,        8,         *,         *")
> >     (set_attr "isa"
> >                 "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
> >                  *,         *,         *,         *,         p9v,       *,
> > -                p10,
> > +                p10,       p10,
> >                  <VSisa>,   *,         *,         *,         *")])
> >
> >  ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
> > -;;              LXVKQ
> > +;;              LXVKQ      XXSPLTI*
> >  ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
> >  ;;              LVX (VMX)  STVX (VMX)
> >  (define_insn "*vsx_mov<mode>_32bit"
> >    [(set (match_operand:VSX_M 0 "nonimmediate_operand"
> >                 "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
> > -                wa,
> > +                wa,        wa,
> >                  wa,        v,         ?wa,       v,         <??r>,
> >                  wZ,        v")
> >
> >       (match_operand:VSX_M 1 "input_operand"
> >                 "wa,        ZwO,       wa,        Y,         r,         r,
> > -                eQ,
> > +                eQ,        eP,
> >                  wE,        jwM,       ?jwM,      W,         <nW>,
> >                  v,         wZ"))]
> >
> > @@ -1264,17 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit"
> >  }
> >    [(set_attr "type"
> >                 "vecstore,  vecload,   vecsimple, load,      store,    *,
> > -                vecperm,
> > +                vecperm,   vecperm,
> >                  vecsimple, vecsimple, vecsimple, *,         *,
> >                  vecstore,  vecload")
> >     (set_attr "length"
> >                 "*,         *,         *,         16,        16,        16,
> > -                *,
> > +                *,         *,
> >                  *,         *,         *,         20,        16,
> >                  *,         *")
> >     (set_attr "isa"
> >                 "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
> > -                p10,
> > +                p10,       p10,
> >                  p9v,       *,         <VSisa>,   *,         *,
> >                  *,         *")])
> >
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 4af8fd76992..41a568b7d4e 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -3336,6 +3336,10 @@ A constant whose negation is a signed 16-bit constant.
> >  @item eI
> >  A signed 34-bit integer constant if prefixed instructions are supported.
> >
> > +@item eP
> > +A scalar floating point constant or a vector constant that can be
> > +loaded with one prefixed instruction to a VSX register.
>
>
> ...  loaded to a VSX register with one prefixed instruction.
>
>
> > +
> >  @item eQ
> >  An IEEE 128-bit constant that can be loaded into a VSX register with a
> >  single instruction.
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
> > new file mode 100644
> > index 00000000000..27764ddbc83
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> > +
>
>
>
> > +#include <altivec.h>
> > +
> > > +/* Test whether XXSPLTIW is generated for V16QI vector constants where the
> > +   first 4 elements are the same as the next 4 elements, etc.  */
> > +
> > +vector unsigned char
> > +v16qi_const_1 (void)
> > +{
> > +  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
> > > +                               1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
> > +}
> > +
> > +vector unsigned char
> > +v16qi_const_2 (void)
> > +{
> > +  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
> > +                               1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
> > +/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
> > +/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
> > +/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
>
>
> ok
>
>
> Nothing jumped out at me with the test cases below..
>
> Thanks
> -Wil

Okay.

Thanks, David
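
As an illustration of the selection logic in constant_generates_xxspltiw quoted above, the following standalone sketch mirrors its checks: reject constants whose words differ, defer to cheaper splat instructions when the bytes are all equal or when the sign-extended halfword or word fits a small immediate, and otherwise return the word as the XXSPLTIW immediate.  The struct is a simplified, hypothetical stand-in for vec_const_128bit_type, and the -16..15 range is an assumption about what EASY_VECTOR_15 accepts; neither is copied from the GCC sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for vec_const_128bit_type.  */
struct splat_info
{
  bool all_bytes_same;
  bool all_half_words_same;
  bool all_words_same;
  uint16_t half_word0;
  uint32_t word0;
};

/* Assumed range of the 5-bit signed immediate used by VSPLTIS[BHW].  */
static bool
fits_vspltis_immediate (int32_t n)
{
  return n >= -16 && n <= 15;
}

/* Sign extend the low 16 bits of X using the xor/subtract idiom.  */
static int32_t
sign_extend_16 (uint32_t x)
{
  return (int32_t) ((x & 0xffffu) ^ 0x8000u) - 0x8000;
}

/* Sign extend a 32-bit value, widening first to avoid overflow.  */
static int32_t
sign_extend_32 (uint32_t x)
{
  return (int32_t) ((int64_t) (x ^ 0x80000000u) - 0x80000000LL);
}

/* Return the XXSPLTIW immediate, or 0 if a cheaper instruction applies
   or the constant is not a word splat.  */
static uint32_t
xxspltiw_immediate (const struct splat_info *c)
{
  if (!c->all_words_same)
    return 0;

  /* A byte splat is better served by XXSPLTIB.  */
  if (c->all_bytes_same)
    return 0;

  /* A small halfword splat is better served by VSPLTISH.  */
  if (c->all_half_words_same
      && fits_vspltis_immediate (sign_extend_16 (c->half_word0)))
    return 0;

  /* A small word splat is better served by VSPLTISW.  */
  if (fits_vspltis_immediate (sign_extend_32 (c->word0)))
    return 0;

  return c->word0;
}
```

For the v16qi_const_2 test above, the repeating bytes 1, 2, 3, 4 form the word 0x01020304, which passes every check and is returned as the immediate.  As in the patch, a return value of 0 doubles as "not usable", which is safe because an all-zero word is a byte splat and is rejected earlier.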


* Re: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants
  2021-11-05 19:24   ` will schmidt
@ 2021-12-14 17:00     ` David Edelsohn
  0 siblings, 0 replies; 29+ messages in thread
From: David Edelsohn @ 2021-12-14 17:00 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Peter Bergner, will schmidt

On Fri, Nov 5, 2021 at 3:24 PM will schmidt <will_schmidt@vnet.ibm.com> wrote:
>
> On Fri, 2021-11-05 at 00:10 -0400, Michael Meissner wrote:
> > Generate XXSPLTIDP for vectors on power10.
> >
> > This patch implements XXSPLTIDP support for all vector constants.  The
> > XXSPLTIDP instruction is given a 32-bit immediate that is converted to a vector
> > of two DFmode constants.  The immediate is in SFmode format, so only constants
> > that fit as SFmode values can be loaded with XXSPLTIDP.
> >
> > The constraint (eP) added in the previous patch for XXSPLTIW is also used
> > for XXSPLTIDP.
> >
>
> ok
>
>
> > > DImode scalar constants are not handled.  This is because the majority of DImode
> > constants will be in the GPR registers.  With vector registers, you have the
> > problem that XXSPLTIDP splats the double word into both elements of the
> > vector.  However, if TImode is loaded with an integer constant, it wants a full
> > 128-bit constant.
>
> This may be worth adding as a TODO somewhere in the code.
>
> >
> > SFmode and DFmode scalar constants are not handled in this patch.  The
> > > support for those constants will be in the next patch.
>
> ok
>
> >
> > I have added a temporary switch (-msplat-float-constant) to control whether or
> > not the XXSPLTIDP instruction is generated.
> >
> > I added 2 new tests to test loading up V2DI and V2DF vector constants.
>
>
>
>
> >
> > 2021-11-05  Michael Meissner  <meissner@the-meissners.org>
> >
> > gcc/
> >
> >       * config/rs6000/predicates.md (easy_fp_constant): Add support for
> >       generating XXSPLTIDP.
> >       (vsx_prefixed_constant): Likewise.
> >       (easy_vector_constant): Likewise.
> >       * config/rs6000/rs6000-protos.h (constant_generates_xxspltidp):
> >       New declaration.
> >       * config/rs6000/rs6000.c (output_vec_const_move): Add support for
> >       generating XXSPLTIDP.
> >       (prefixed_xxsplti_p): Likewise.
> >       (constant_generates_xxspltidp): New function.
> >       * config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
> >
> > gcc/testsuite/
> >
> >       * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
> >       regex for power10.
> >       * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> >       * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
> > ---
>
>
> ok
>
> >  gcc/config/rs6000/predicates.md               |   9 ++
> >  gcc/config/rs6000/rs6000-protos.h             |   1 +
> >  gcc/config/rs6000/rs6000.c                    | 108 ++++++++++++++++++
> >  gcc/config/rs6000/rs6000.opt                  |   4 +
> >  .../powerpc/pr86731-fwrapv-longlong.c         |   9 +-
> >  .../powerpc/vec-splat-constant-v2df.c         |  64 +++++++++++
> >  .../powerpc/vec-splat-constant-v2di.c         |  50 ++++++++
> >  7 files changed, 241 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
> >
> > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> > index ed6252bd0c4..d748b11857c 100644
> > --- a/gcc/config/rs6000/predicates.md
> > +++ b/gcc/config/rs6000/predicates.md
> > @@ -610,6 +610,9 @@ (define_predicate "easy_fp_constant"
> >
> >        if (constant_generates_xxspltiw (&vsx_const))
> >       return true;
> > +
> > +      if (constant_generates_xxspltidp (&vsx_const))
> > +     return true;
> >      }
> >
> >    /* Otherwise consider floating point constants hard, so that the
> > @@ -653,6 +656,9 @@ (define_predicate "vsx_prefixed_constant"
> >    if (constant_generates_xxspltiw (&vsx_const))
> >      return true;
> >
> > +  if (constant_generates_xxspltidp (&vsx_const))
> > +    return true;
> > +
> >    return false;
> >  })
> >
> > @@ -727,6 +733,9 @@ (define_predicate "easy_vector_constant"
> >
> >         if (constant_generates_xxspltiw (&vsx_const))
> >           return true;
> > +
> > +       if (constant_generates_xxspltidp (&vsx_const))
> > +         return true;
> >       }
>
>
> ok
>
> >
> >        if (TARGET_P9_VECTOR
> > diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> > index 99c6a671289..2d28df7442d 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -253,6 +253,7 @@ extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
> >                                      vec_const_128bit_type *);
> >  extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
> >  extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *);
> > +extern unsigned constant_generates_xxspltidp (vec_const_128bit_type *);
> >  #endif /* RTX_CODE */
> >
> >  #ifdef TREE_CODE
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index be24f56eb31..8fde48cf2b3 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -7012,6 +7012,13 @@ output_vec_const_move (rtx *operands)
> >             operands[2] = GEN_INT (imm);
> >             return "xxspltiw %x0,%2";
> >           }
> > +
> > +       imm = constant_generates_xxspltidp (&vsx_const);
> > +       if (imm)
>
>
> Just a nit that the two lines could be combined into a similar form
> as used elsewhere as ...
>         if (constant_generates_xxspltidp(&vsx_const))
>
>
> > +         {
> > +           operands[2] = GEN_INT (imm);
> > +           return "xxspltidp %x0,%2";
> > +         }
>
> >       }
> >
> >        if (TARGET_P9_VECTOR
> > @@ -26809,6 +26816,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
> >      {
> >        if (constant_generates_xxspltiw (&vsx_const))
> >       return true;
> > +
> > +      if (constant_generates_xxspltidp (&vsx_const))
> > +     return true;
> >      }
> >
> >    return false;
> > @@ -29014,6 +29024,104 @@ constant_generates_xxspltiw (vec_const_128bit_type *vsx_const)
> >    return vsx_const->words[0];
> >  }
> >
> > +/* Determine if a vector constant can be loaded with XXSPLTIDP.  Return zero if
> > +   the XXSPLTIDP instruction cannot be used.  Otherwise return the immediate
> > +   value to be used with the XXSPLTIDP instruction.  */
> > +
> > +unsigned
> > +constant_generates_xxspltidp (vec_const_128bit_type *vsx_const)
> > +{
> > +  if (!TARGET_SPLAT_FLOAT_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
> > +    return 0;
> > +
> > +  /* Make sure that the two 64-bit segments are the same.  */
> > +  if (!vsx_const->all_double_words_same)
> > +    return 0;
>
> Perhaps more like "Reject if the two 64-bit segments are (not?) the
> same."
>
>
> > +
> > +  /* If the bytes, half words, or words are all the same, don't use XXSPLTIDP.
> > +     Use a simpler instruction (XXSPLTIB, VSPLTISB, VSPLTISH, or VSPLTISW).  */
> > +  if (vsx_const->all_bytes_same
> > +      || vsx_const->all_half_words_same
> > +      || vsx_const->all_words_same)
> > +    return 0;
> > +
> > +  unsigned HOST_WIDE_INT value = vsx_const->double_words[0];
> > +
> > +  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
> > +     pattern and the signalling NaN bit pattern.  Recognize infinity and
> > +     negative infinity.  */
> > +
> > +  /* Bit representation of DFmode normal quiet NaN.  */
> > +#define RS6000_CONST_DF_NAN  HOST_WIDE_INT_UC (0x7ff8000000000000)
> > +
> > +  /* Bit representation of DFmode normal signaling NaN.  */
> > +#define RS6000_CONST_DF_NANS HOST_WIDE_INT_UC (0x7ff4000000000000)
> > +
> > +  /* Bit representation of DFmode positive infinity.  */
> > +#define RS6000_CONST_DF_INF  HOST_WIDE_INT_UC (0x7ff0000000000000)
> > +
> > +  /* Bit representation of DFmode negative infinity.  */
> > +#define RS6000_CONST_DF_NEG_INF      HOST_WIDE_INT_UC (0xfff0000000000000)
>
> Defines may be more useful in a header file?
>
> > +
> > +  if (value != RS6000_CONST_DF_NAN
> > +      && value != RS6000_CONST_DF_NANS
> > +      && value != RS6000_CONST_DF_INF
> > +      && value != RS6000_CONST_DF_NEG_INF)
> > +    {
> > +      /* The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for
> > +      the exponent, and 52 bits for the mantissa (not counting the hidden
> > +      bit used for normal numbers).  NaN values have the exponent set to all
> > +      1 bits, and the mantissa non-zero (mantissa == 0 is infinity).  */
> > +
> > +      int df_exponent = (value >> 52) & 0x7ff;
> > +      unsigned HOST_WIDE_INT df_mantissa
> > +     = value & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
>
>
> Should the "=" be on the end of the previous line?
>
>
> > +
> > +      if (df_exponent == 0x7ff && df_mantissa != 0)  /* other NaNs.  */
> > +     return 0;
> > +
> > +      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
> > +      the exponent all 0 bits, and the mantissa non-zero.  If the value is
> > +      subnormal, then the hidden bit in the mantissa is not set.  */
> > +      if (df_exponent == 0 && df_mantissa != 0)              /* subnormal.  */
> > +     return 0;
> > +    }
> > +
> > +  /* Change the representation to DFmode constant.  */
> > +  long df_words[2] = { vsx_const->words[0], vsx_const->words[1] };
> > +
> > +  /* real_from_target takes the target words in  target order.  */
>
> Extra space before target order.
>
> > +  if (!BYTES_BIG_ENDIAN)
> > +    std::swap (df_words[0], df_words[1]);
> > +
> > +  REAL_VALUE_TYPE rv_type;
> > +  real_from_target (&rv_type, df_words, DFmode);
> > +
> > +  const REAL_VALUE_TYPE *rv = &rv_type;
> > +
> > +  /* Validate that the number can be stored as a SFmode value.  */
> > +  if (!exact_real_truncate (SFmode, rv))
> > +    return 0;
> > +
> > +  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
> > +     mantissa field is non-zero) which is undefined for the XXSPLTIDP
> > +     instruction.  */
> > +  long sf_value;
> > +  real_to_target (&sf_value, rv, SFmode);
> > +
> > +  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
> > +     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
> > +     0 bits, and the mantissa non-zero.  */
> > +  long sf_exponent = (sf_value >> 23) & 0xFF;
> > +  long sf_mantissa = sf_value & 0x7FFFFF;
> > +
> > +  if (sf_exponent == 0 && sf_mantissa != 0)
> > +    return 0;
> > +
> > +  /* Return the immediate to be used.  */
> > +  return sf_value;
> > +}
>
> ok
>
> > +
> >
> >  struct gcc_target targetm = TARGET_INITIALIZER;
> >
> > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> > index ec7b106fddb..c1d661d7e6b 100644
> > --- a/gcc/config/rs6000/rs6000.opt
> > +++ b/gcc/config/rs6000/rs6000.opt
> > @@ -644,6 +644,10 @@ msplat-word-constant
> >  Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
> >  Generate (do not generate) code that uses the XXSPLTIW instruction.
> >
> > +msplat-float-constant
> > +Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
> > +Generate (do not generate) code that uses the XXSPLTIDP instruction.
> > +
> >  mieee128-constant
> >  Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
> >  Generate (do not generate) code that uses the LXVKQ instruction.
>
> ok
>
>
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> > index bd1502bb30a..dcb30e1d886 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> > @@ -24,11 +24,12 @@ vector signed long long splats4(void)
> >          return (vector signed long long) vec_sl(mzero, mzero);
> >  }
> >
> > -/* Codegen will consist of splat and shift instructions for most types.
> > -   If folding is enabled, the vec_sl tests using vector long long type will
> > -   generate a lvx instead of a vspltisw+vsld pair.  */
> > +/* Codegen will consist of splat and shift instructions for most types.  If
> > +   folding is enabled, the vec_sl tests using vector long long type will
> > +   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
> > +   generate a xxspltidp instruction instead of the lvx.  */
> >
> >  /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
> >  /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
> > -/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
> > +/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
>
>
> ok
>
> No further comments,
> Thanks
> -Will

Okay.

Thanks, David
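
For readers following the review of the function quoted above, the bit-level checks can be sketched outside of GCC. This is only an illustration under stated assumptions, not the code from the patch: sf_immediate_for_df_bits is a hypothetical name, it uses plain integer arithmetic in place of real_from_target/exact_real_truncate, and it rejects every NaN, whereas the patch lets the specific patterns RS6000_CONST_DF_NAN and RS6000_CONST_DF_NANS through. It returns std::optional so that +0.0 (whose SFmode image is 0) is not confused with "cannot be generated", which the quoted function signals by returning 0.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not a GCC function): given the 64-bit image of a
// DFmode value, return the 32-bit SFmode image that widens back to exactly
// that value, or std::nullopt if no SFmode zero/normal/infinity does.
std::optional<std::uint32_t> sf_immediate_for_df_bits (std::uint64_t value)
{
  // DFmode layout: 1 sign bit, 11 exponent bits, 52 mantissa bits.
  const std::uint32_t sign = static_cast<std::uint32_t> (value >> 63) << 31;
  const int df_exponent = static_cast<int> ((value >> 52) & 0x7ff);
  const std::uint64_t df_mantissa = value & ((std::uint64_t{1} << 52) - 1);

  if (df_exponent == 0x7ff)
    {
      // Assumption of this sketch: reject all NaNs (mantissa non-zero).
      if (df_mantissa != 0)
        return std::nullopt;
      return sign | 0x7f800000u;		// +/- infinity
    }

  if (df_exponent == 0)
    {
      // Exponent 0 with a non-zero mantissa is a DFmode subnormal.
      if (df_mantissa != 0)
        return std::nullopt;
      return sign;				// +/- 0.0
    }

  // SFmode keeps 23 mantissa bits, so the low 52 - 23 = 29 bits must be 0
  // for the truncation to be exact.
  if ((df_mantissa & ((std::uint64_t{1} << 29) - 1)) != 0)
    return std::nullopt;

  // The unbiased exponent must fit the SFmode normal range; anything below
  // -126 would become an SFmode subnormal, which is rejected.
  const int unbiased = df_exponent - 1023;
  if (unbiased < -126 || unbiased > 127)
    return std::nullopt;

  const std::uint32_t sf_exponent = static_cast<std::uint32_t> (unbiased + 127);
  const std::uint32_t sf_mantissa = static_cast<std::uint32_t> (df_mantissa >> 29);
  return sign | (sf_exponent << 23) | sf_mantissa;
}
```

With this sketch, the DFmode image of 1.0 (0x3ff0000000000000) maps to the SFmode image 0x3f800000, while the image of 0.1 is rejected because its low mantissa bits are not zero.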


* Re: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.
  2021-11-05 19:38   ` will schmidt
@ 2021-12-14 17:01     ` David Edelsohn
  0 siblings, 0 replies; 29+ messages in thread
From: David Edelsohn @ 2021-12-14 17:01 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool
  Cc: will schmidt, GCC Patches, Bill Schmidt, Peter Bergner

On Fri, Nov 5, 2021 at 3:38 PM will schmidt <will_schmidt@vnet.ibm.com> wrote:
>
> On Fri, 2021-11-05 at 00:11 -0400, Michael Meissner wrote:
> > Generate XXSPLTIDP for scalars on power10.
> >
> > This patch implements XXSPLTIDP support for SFmode and DFmode scalar
> > constants.  The previous patch added support for vector constants.
> >
> > I added 2 new tests to test loading up SF and DF scalar constants.
>
>
> ok
>
> >
> > 2021-11-05  Michael Meissner  <meissner@the-meissners.org>
> >
> > gcc/
> >
> >       * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
> >       (UNSPEC_XXSPLTIW_CONST): New unspec.
> >       (movsf_hardfloat): Add support for generating XXSPLTIDP.
> >       (mov<mode>_hardfloat32): Likewise.
> >       (mov<mode>_hardfloat64): Likewise.
> >       (xxspltidp_<mode>_internal): New insns.
> >       (xxspltiw_<mode>_internal): New insns.
> >       (splitters for SF/DFmode): Add new splitters for XXSPLTIDP.
> >
> > gcc/testsuite/
> >
> >       * gcc.target/powerpc/vec-splat-constant-df.c: New test.
> >       * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
> > ---
>
> ok
>
>
> >  gcc/config/rs6000/rs6000.md                   | 97 +++++++++++++++----
> >  .../powerpc/vec-splat-constant-df.c           | 60 ++++++++++++
> >  .../powerpc/vec-splat-constant-sf.c           | 60 ++++++++++++
> >  3 files changed, 199 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> >
> > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> > index 3a7bcd2426e..4122acb98cf 100644
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -156,6 +156,8 @@ (define_c_enum "unspec"
> >     UNSPEC_PEXTD
> >     UNSPEC_HASHST
> >     UNSPEC_HASHCHK
> > +   UNSPEC_XXSPLTIDP_CONST
> > +   UNSPEC_XXSPLTIW_CONST
> >    ])
> >
> >  ;;
> > @@ -7764,17 +7766,17 @@ (define_split
> >  ;;
> >  ;;   LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
> >  ;;   STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
> > -;;   MR           MT<x>      MF<x>       NOP
> > +;;   MR           MT<x>      MF<x>       NOP        XXSPLTIDP
> >
> >  (define_insn "movsf_hardfloat"
> >    [(set (match_operand:SF 0 "nonimmediate_operand"
> >        "=!r,       f,         v,          wa,        m,         wY,
> >         Z,         m,         wa,         !r,        f,         wa,
> > -       !r,        *c*l,      !r,         *h")
> > +       !r,        *c*l,      !r,         *h,        wa")
> >       (match_operand:SF 1 "input_operand"
> >        "m,         m,         wY,         Z,         f,         v,
> >         wa,        r,         j,          j,         f,         wa,
> > -       r,         r,         *h,         0"))]
> > +       r,         r,         *h,         0,         eP"))]
> >    "(register_operand (operands[0], SFmode)
> >     || register_operand (operands[1], SFmode))
> >     && TARGET_HARD_FLOAT
> > @@ -7796,15 +7798,16 @@ (define_insn "movsf_hardfloat"
> >     mr %0,%1
> >     mt%0 %1
> >     mf%1 %0
> > -   nop"
> > +   nop
> > +   #"
> >    [(set_attr "type"
> >       "load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
> >        fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
> > -      *,          mtjmpr,    mfjmpr,     *")
> > +      *,          mtjmpr,    mfjmpr,     *,         vecperm")
> >     (set_attr "isa"
> >       "*,          *,         p9v,        p8v,       *,         p9v,
> >        p8v,        *,         *,          *,         *,         *,
> > -      *,          *,         *,          *")])
> > +      *,          *,         *,          *,         p10")])
> >
> >  ;;   LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
> >  ;;   FMR          MR         MT%0       MF%1       NOP
> > @@ -8064,18 +8067,18 @@ (define_split
> >
> >  ;;           STFD         LFD         FMR         LXSD        STXSD
> >  ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
> > -;;           LWZ          STW         MR
> > +;;           LWZ          STW         MR          XXSPLTIDP
> >
> >
> >  (define_insn "*mov<mode>_hardfloat32"
> >    [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
> >              "=m,          d,          d,          <f64_p9>,   wY,
> >                <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
> > -              Y,          r,          !r")
> > +              Y,          r,          !r,         wa")
> >       (match_operand:FMOVE64 1 "input_operand"
> >               "d,          m,          d,          wY,         <f64_p9>,
> >                Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
> > -              r,          Y,          r"))]
> > +              r,          Y,          r,          eP"))]
> >    "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
> >     && (gpc_reg_operand (operands[0], <MODE>mode)
> >         || gpc_reg_operand (operands[1], <MODE>mode))"
> > @@ -8092,20 +8095,21 @@ (define_insn "*mov<mode>_hardfloat32"
> >     #
> >     #
> >     #
> > +   #
> >     #"
> >    [(set_attr "type"
> >              "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
> >               fpload,      fpstore,    veclogical, veclogical, two,
> > -             store,       load,       two")
> > +             store,       load,       two,        vecperm")
> >     (set_attr "size" "64")
> >     (set_attr "length"
> >              "*,           *,          *,          *,          *,
> >               *,           *,          *,          *,          8,
> > -             8,           8,          8")
> > +             8,           8,          8,          *")
> >     (set_attr "isa"
> >              "*,           *,          *,          p9v,        p9v,
> >               p7v,         p7v,        *,          *,          *,
> > -             *,           *,          *")])
> > +             *,           *,          *,          p10")])
> >
> >  ;;           STW      LWZ     MR      G-const H-const F-const
> >
> > @@ -8132,19 +8136,19 @@ (define_insn "*mov<mode>_softfloat32"
> >  ;;           STFD         LFD         FMR         LXSD        STXSD
> >  ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
> >  ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
> > -;;           NOP          MFVSRD      MTVSRD
> > +;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
> >
> >  (define_insn "*mov<mode>_hardfloat64"
> >    [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
> >             "=m,           d,          d,          <f64_p9>,   wY,
> >               <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
> >               YZ,          r,          !r,         *c*l,       !r,
> > -            *h,           r,          <f64_dm>")
> > +            *h,           r,          <f64_dm>,   wa")
> >       (match_operand:FMOVE64 1 "input_operand"
> >              "d,           m,          d,          wY,         <f64_p9>,
> >               Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
> >               r,           YZ,         r,          r,          *h,
> > -             0,           <f64_dm>,   r"))]
> > +             0,           <f64_dm>,   r,          eP"))]
> >    "TARGET_POWERPC64 && TARGET_HARD_FLOAT
> >     && (gpc_reg_operand (operands[0], <MODE>mode)
> >         || gpc_reg_operand (operands[1], <MODE>mode))"
> > @@ -8166,18 +8170,19 @@ (define_insn "*mov<mode>_hardfloat64"
> >     mf%1 %0
> >     nop
> >     mfvsrd %0,%x1
> > -   mtvsrd %x0,%1"
> > +   mtvsrd %x0,%1
> > +   #"
> >    [(set_attr "type"
> >              "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
> >               fpload,      fpstore,    veclogical, veclogical, integer,
> >               store,       load,       *,          mtjmpr,     mfjmpr,
> > -             *,           mfvsr,      mtvsr")
> > +             *,           mfvsr,      mtvsr,      vecperm")
> >     (set_attr "size" "64")
> >     (set_attr "isa"
> >              "*,           *,          *,          p9v,        p9v,
> >               p7v,         p7v,        *,          *,          *,
> >               *,           *,          *,          *,          *,
> > -             *,           p8v,        p8v")])
> > +             *,           p8v,        p8v,        p10")])
> >
> >  ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
> >  ;;           H-const  F-const  Special
> > @@ -8211,6 +8216,62 @@ (define_insn "*mov<mode>_softfloat64"
> >     (set_attr "length"
> >              "*,       *,      *,      *,      *,      8,
> >               12,      16,     *")])
> > +
>
> ok
>
>
> > +;; Split the VSX prefixed instruction to support SFmode and DFmode scalar
> > +;; constants that look like DFmode floating point values where both elements
> > +;; are the same.  The constant has to be expressible as a SFmode constant that
> > +;; is not a SFmode denormal value.
> > +;;
> > +;; We don't need splitters for the 128-bit types, since the function
> > +;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
>
> ok
>
> > +(define_insn "xxspltidp_<mode>_internal"
> > +  [(set (match_operand:SFDF 0 "register_operand" "=wa")
> > +     (unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> > +                  UNSPEC_XXSPLTIDP_CONST))]
> > +  "TARGET_POWER10"
> > +  "xxspltidp %x0,%1"
> > +  [(set_attr "type" "vecperm")
> > +   (set_attr "prefixed" "yes")])
> > +
> > +(define_insn "xxspltiw_<mode>_internal"
> > +  [(set (match_operand:SFDF 0 "register_operand" "=wa")
> > +     (unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> > +                  UNSPEC_XXSPLTIW_CONST))]
> > +  "TARGET_POWER10"
> > +  "xxspltiw %x0,%1"
> > +  [(set_attr "type" "vecperm")
> > +   (set_attr "prefixed" "yes")])
> > +
> > +(define_split
> > +  [(set (match_operand:SFDF 0 "vsx_register_operand")
> > +     (match_operand:SFDF 1 "vsx_prefixed_constant"))]
> > +  "TARGET_POWER10"
> > +  [(pc)]
> > +{
> > +  rtx dest = operands[0];
> > +  rtx src = operands[1];
> > +  vec_const_128bit_type vsx_const;
> > +
> > +  if (!vec_const_128bit_to_bytes (src, <MODE>mode, &vsx_const))
> > +    gcc_unreachable ();
> > +
> > +  unsigned imm = constant_generates_xxspltidp (&vsx_const);
> > +  if (imm)
> > +    {
> > +      emit_insn (gen_xxspltidp_<mode>_internal (dest, GEN_INT (imm)));
> > +      DONE;
> > +    }
> > +
> > +  imm = constant_generates_xxspltiw (&vsx_const);
> > +  if (imm)
> > +    {
> > +      emit_insn (gen_xxspltiw_<mode>_internal (dest, GEN_INT (imm)));
> > +      DONE;
> > +    }
> > +
> > +  else
> > +    gcc_unreachable ();
> > +})
>
>
> ok
> Nothing further,
> thanks,
> -Will

Okay.

Thanks, David
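
The splitter quoted above tries XXSPLTIDP first and falls back to XXSPLTIW. A rough model of what each instruction leaves in one 64-bit doubleword of the VSX register may help show why both are candidates for the same constant. This is a sketch based only on the descriptions in this thread (XXSPLTIDP widens an SFmode image to DFmode and splats it to both doublewords; XXSPLTIW splats a 32-bit value to all four words). The function names are hypothetical, and NaN inputs are simply left out of the model.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model: the doubleword XXSPLTIDP would splat for the 32-bit
// SFmode image IMM.  Returns std::nullopt for SFmode subnormals (undefined
// for the instruction, per the patch comment) and for NaNs (not modeled).
std::optional<std::uint64_t> xxspltidp_doubleword (std::uint32_t imm)
{
  // SFmode layout: 1 sign bit, 8 exponent bits, 23 mantissa bits.
  const std::uint64_t sign = static_cast<std::uint64_t> (imm >> 31) << 63;
  const std::uint64_t sf_exponent = (imm >> 23) & 0xff;
  const std::uint64_t sf_mantissa = imm & 0x7fffff;

  if (sf_exponent == 0)
    {
      if (sf_mantissa != 0)
        return std::nullopt;			// SFmode subnormal
      return sign;				// +/- 0.0
    }

  if (sf_exponent == 0xff)
    {
      if (sf_mantissa != 0)
        return std::nullopt;			// NaN, left out of this sketch
      return sign | (std::uint64_t{0x7ff} << 52);	// +/- infinity
    }

  // Rebias the exponent (127 -> 1023) and left-align the 23-bit mantissa
  // in the 52-bit DFmode mantissa field.
  return sign | ((sf_exponent + (1023 - 127)) << 52) | (sf_mantissa << 29);
}

// Hypothetical model: the doubleword XXSPLTIW would splat, i.e. the same
// 32-bit word in both halves.
std::uint64_t xxspltiw_doubleword (std::uint32_t imm)
{
  return (static_cast<std::uint64_t> (imm) << 32) | imm;
}
```

For the immediate 0x3f800000, the first model gives 0x3ff0000000000000 (DFmode 1.0) and the second gives 0x3f8000003f800000, so the two instructions cover different sets of 64-bit patterns.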


end of thread, other threads:[~2021-12-14 17:01 UTC | newest]

Thread overview: 29+ messages
2021-11-05  4:02 [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner
2021-11-05  4:04 ` [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function) Michael Meissner
2021-11-05 17:01   ` will schmidt
2021-11-05 18:13     ` Michael Meissner
2021-12-14 16:57       ` David Edelsohn
2021-11-15 16:35   ` Ping: " Michael Meissner
2021-12-13 16:58   ` Ping #2: " Michael Meissner
2021-11-05  4:07 ` [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ) Michael Meissner
2021-11-05 17:52   ` will schmidt
2021-11-05 18:01     ` Michael Meissner
2021-12-14 16:57       ` David Edelsohn
2021-11-15 16:36   ` Ping: " Michael Meissner
2021-12-13 17:02   ` Ping #2: " Michael Meissner
2021-11-05  4:09 ` [PATCH 3/5] Add Power10 XXSPLTIW Michael Meissner
2021-11-05 18:50   ` will schmidt
2021-12-14 16:59     ` David Edelsohn
2021-11-15 16:37   ` Ping: " Michael Meissner
2021-12-13 17:04   ` Ping #2: " Michael Meissner
2021-11-05  4:10 ` [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants Michael Meissner
2021-11-05 19:24   ` will schmidt
2021-12-14 17:00     ` David Edelsohn
2021-11-15 16:38   ` Ping: " Michael Meissner
2021-12-13 17:06   ` Ping #2: " Michael Meissner
2021-11-05  4:11 ` [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants Michael Meissner
2021-11-05 19:38   ` will schmidt
2021-12-14 17:01     ` David Edelsohn
2021-11-15 16:38   ` Ping: " Michael Meissner
2021-12-13 17:07   ` Ping #2: " Michael Meissner
2021-11-05 13:08 ` [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner
