[PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments
@ 2023-07-07 20:18 Carl Love
  2023-07-13  9:41 ` Kewen.Lin
  0 siblings, 1 reply; 3+ messages in thread
From: Carl Love @ 2023-07-07 20:18 UTC (permalink / raw)
  To: Kewen.Lin, Segher Boessenkool, gcc-patches, David Edelsohn
  Cc: cel, Peter Bergner


GCC maintainers:

Version 3, added code to altivec_resolve_overloaded_builtin so the
correct instruction is selected for the size of the second argument. 
This restores the instruction counts to the original values where the
correct instructions were originally being generated.  The naming of
the overloaded builtin instances and builtin definitions were changed
to reflect the type of the second argument since the type of the first
argument is now the same for all overloaded instances.  A new builtin
test file was added for the case where the first argument is cast to
the unsigned long long type.  This test requires the -flax-vector-
conversions gcc command line option.  Since the other tests do not
require this option, I felt that the new test needed to be in a
separate file.  Finally some formatting fixes were made in the original
test file.  Patch has been retested on Power 10 with no regressions.

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and past error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt <wschmidt@linux.ibm.com>
   Date:   Fri Feb 4 13:07:17 2022 -0600

       rs6000: Correct function prototypes for vec_replace_unaligned

       Due to a pasto error in the documentation, vec_replace_unaligned
was
       implemented with the same function prototypes as
vec_replace_elt.  It was
       intended that vec_replace_unaligned always specify output
vectors as having
       type vector unsigned char, to emphasize that elements are
potentially
       misaligned by this built-in function.  This patch corrects the
       misimplementation.
    ....

This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

                 Carl 

------------------
rs6000, fix vec_replace_unaligned built-in arguments

The first argument of the vec_replace_unaligned built-in should always be
unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the test cases to use
the correct arguments.  The original test file is renamed and a second test
file is added for a new test case.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def: Rename
	__builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi
	__builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi
	__builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df
	__builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di
	__builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf
	__builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si.
	Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as
	VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
	VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
	VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI.
	Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as
	vreplace_un_si, vreplace_un_v2df as vreplace_un_df,
	vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as
	vreplace_un_sf, vreplace_un_v4si as vreplace_un_si.
	* config/rs6000/rs6000-c.cc (find_instance): Add new argument
	nargs.  Add nargs check.  Extend function to handle three arguments.
	(altivec_resolve_overloaded_builtin): Add new argument nargs to
	function calls.  Add case RS6000_OVLD_VEC_REPLACE_UN.
	* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
	Fix first argument type.  Rename VREPLACE_UN_UV4SI as
	VREPLACE_UN_USI, VREPLACE_UN_V4SI as VREPLACE_UN_SI,
	VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_V2DI as
	VREPLACE_UN_DI, VREPLACE_UN_V4SF as VREPLACE_UN_SF,
	VREPLACE_UN_V2DF as VREPLACE_UN_DF.
	* config/rs6000/vsx.md (VEC_RU): New mode iterator.
	(VEC_RU_char): New mode attribute.
	(vreplace_un_<mode>): Change iterator and mode attribute.

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/vec-replace-word-runnable.c: Renamed
	vec-replace-word-runnable_1.c.
	* gcc.target/powerpc/vec-replace-word-runnable_1.c
	(dg-options): add -flax-vector-conversions.
	(vec_replace_unaligned) Fix first argument type.
	(vresult_uchar): Fix expected results.
	(vec_replace_unaligned): Update for loop to check uchar results.
	Remove extra spaces in if statements. Insert missing spaces in
	for statements.
	* gcc.target/powerpc/vec-replace-word-runnable_2.c: New test file.
---
 gcc/config/rs6000/rs6000-builtins.def         |  28 +--
 gcc/config/rs6000/rs6000-c.cc                 |  62 ++++++-
 gcc/config/rs6000/rs6000-overload.def         |  24 +--
 gcc/config/rs6000/vsx.md                      |  13 +-
 ...nnable.c => vec-replace-word-runnable_1.c} | 165 ++++++++++--------
 .../powerpc/vec-replace-word-runnable_2.c     |  50 ++++++
 6 files changed, 233 insertions(+), 109 deletions(-)
 rename gcc/testsuite/gcc.target/powerpc/{vec-replace-word-runnable.c => vec-replace-word-runnable_1.c} (54%)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_2.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 5ac6af4c6e3..8206cc86014 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -3388,26 +3388,26 @@
   const vull __builtin_altivec_vpextd (vull, vull);
     VPEXTD vpextd {}
 
-  const vuc __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \
-                                                 const int<4>);
-    VREPLACE_UN_UV2DI vreplace_un_v2di {}
+  const vuc __builtin_altivec_vreplace_un_udi (vuc, unsigned long long, \
+                                               const int<4>);
+    VREPLACE_UN_UDI vreplace_un_di {}
 
-  const vuc __builtin_altivec_vreplace_un_uv4si (vui, unsigned int, \
-                                                 const int<4>);
-    VREPLACE_UN_UV4SI vreplace_un_v4si {}
+  const vuc __builtin_altivec_vreplace_un_usi (vuc, unsigned int, \
+                                               const int<4>);
+    VREPLACE_UN_USI vreplace_un_si {}
 
-  const vuc __builtin_altivec_vreplace_un_v2df (vd, double, const int<4>);
-    VREPLACE_UN_V2DF vreplace_un_v2df {}
+  const vuc __builtin_altivec_vreplace_un_df (vuc, double, const int<4>);
+    VREPLACE_UN_DF vreplace_un_df {}
 
-  const vuc __builtin_altivec_vreplace_un_v2di (vsll, signed long long, \
+  const vuc __builtin_altivec_vreplace_un_di (vuc, signed long long, \
                                                 const int<4>);
-    VREPLACE_UN_V2DI vreplace_un_v2di {}
+    VREPLACE_UN_DI vreplace_un_di {}
 
-  const vuc __builtin_altivec_vreplace_un_v4sf (vf, float, const int<4>);
-    VREPLACE_UN_V4SF vreplace_un_v4sf {}
+  const vuc __builtin_altivec_vreplace_un_sf (vuc, float, const int<4>);
+    VREPLACE_UN_SF vreplace_un_sf {}
 
-  const vuc __builtin_altivec_vreplace_un_v4si (vsi, signed int, const int<4>);
-    VREPLACE_UN_V4SI vreplace_un_v4si {}
+  const vuc __builtin_altivec_vreplace_un_si (vuc, signed int, const int<4>);
+    VREPLACE_UN_SI vreplace_un_si {}
 
   const vull __builtin_altivec_vreplace_uv2di (vull, unsigned long long, \
                                                const int<1>);
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 8555174d36e..7862ccb0c45 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1674,13 +1674,18 @@ tree
 find_instance (bool *unsupported_builtin, ovlddata **instance,
 	       rs6000_gen_builtins instance_code,
 	       rs6000_gen_builtins fcode,
-	       tree *types, tree *args)
+	       tree *types, tree *args, int nargs)
 {
   while (*instance && (*instance)->bifid != instance_code)
     *instance = (*instance)->next;
 
   ovlddata *inst = *instance;
   gcc_assert (inst != NULL);
+
+  /* Function only supports nargs equal to 2 or 3.  */
+  if (!(nargs == 2 || nargs == 3))
+    gcc_assert(0);
+
   /* It is possible for an instance to require a data type that isn't
      defined on this target, in which case inst->fntype will be NULL.  */
   if (!inst->fntype)
@@ -1688,15 +1693,25 @@ find_instance (bool *unsupported_builtin, ovlddata **instance,
   tree fntype = rs6000_builtin_info[inst->bifid].fntype;
   tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
   tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
+  tree parmtype2;
+  int arg2_compatible = true;
+
+  if (nargs == 3)
+    {
+      parmtype2
+       = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (TYPE_ARG_TYPES (fntype))));
+      arg2_compatible = rs6000_builtin_type_compatible (types[2], parmtype2);
+    }
 
   if (rs6000_builtin_type_compatible (types[0], parmtype0)
-      && rs6000_builtin_type_compatible (types[1], parmtype1))
+      && rs6000_builtin_type_compatible (types[1], parmtype1)
+      && arg2_compatible)
     {
       if (rs6000_builtin_decl (inst->bifid, false) != error_mark_node
 	  && rs6000_builtin_is_supported (inst->bifid))
 	{
 	  tree ret_type = TREE_TYPE (inst->fntype);
-	  return altivec_build_resolved_builtin (args, 2, fntype, ret_type,
+	  return altivec_build_resolved_builtin (args, nargs, fntype, ret_type,
 						 inst->bifid, fcode);
 	}
       else
@@ -1916,7 +1931,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
 	  instance_code = RS6000_BIF_CMPB_32;
 
 	tree call = find_instance (&unsupported_builtin, &instance,
-				   instance_code, fcode, types, args);
+				   instance_code, fcode, types, args, nargs);
 	if (call != error_mark_node)
 	  return call;
 	break;
@@ -1949,7 +1964,44 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
 	  }
 
 	tree call = find_instance (&unsupported_builtin, &instance,
-				   instance_code, fcode, types, args);
+				   instance_code, fcode, types, args, nargs);
+	if (call != error_mark_node)
+	  return call;
+	break;
+      }
+
+    case RS6000_OVLD_VEC_REPLACE_UN:
+      {
+	machine_mode arg2_mode = TYPE_MODE (types[1]);
+
+	if (GET_MODE_SIZE(arg2_mode) == 4)
+	  {
+	    if (GET_MODE_CLASS(arg2_mode) == MODE_INT)
+	      /* Signed and unsigned are handled the same.  */
+	      instance_code = RS6000_BIF_VREPLACE_UN_USI;
+	    else if (GET_MODE_CLASS(arg2_mode) == MODE_FLOAT)
+	      instance_code = RS6000_BIF_VREPLACE_UN_SF;
+	    else
+	      /* Only have support for int and float.  */
+		gcc_assert (0);
+	  }
+	else if (GET_MODE_SIZE(arg2_mode) == 8)
+	  {
+	    if (GET_MODE_CLASS(arg2_mode) == MODE_INT)
+	      /* Signed and unsigned are handled the same.  */
+	      instance_code = RS6000_BIF_VREPLACE_UN_UDI;
+	    else if (GET_MODE_CLASS(arg2_mode) == MODE_FLOAT)
+	      instance_code = RS6000_BIF_VREPLACE_UN_DF;
+	    else
+	      /* Only have support for long long int and double.  */
+	      gcc_assert (0);
+	  }
+	else
+	  /* Only have support for word and double.   */
+	  gcc_assert (0);
+
+	tree call = find_instance (&unsupported_builtin, &instance,
+				   instance_code, fcode, types, args, nargs);
 	if (call != error_mark_node)
 	  return call;
 	break;
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..2e33bc97a6c 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3059,18 +3059,18 @@
     VREPLACE_ELT_V2DF
 
 [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
-  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
-    VREPLACE_UN_UV4SI
-  vuc __builtin_vec_replace_un (vsi, signed int, const int);
-    VREPLACE_UN_V4SI
-  vuc __builtin_vec_replace_un (vull, unsigned long long, const int);
-    VREPLACE_UN_UV2DI
-  vuc __builtin_vec_replace_un (vsll, signed long long, const int);
-    VREPLACE_UN_V2DI
-  vuc __builtin_vec_replace_un (vf, float, const int);
-    VREPLACE_UN_V4SF
-  vuc __builtin_vec_replace_un (vd, double, const int);
-    VREPLACE_UN_V2DF
+  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
+    VREPLACE_UN_USI
+  vuc __builtin_vec_replace_un (vuc, signed int, const int);
+    VREPLACE_UN_SI
+  vuc __builtin_vec_replace_un (vuc, unsigned long long, const int);
+    VREPLACE_UN_UDI
+  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
+    VREPLACE_UN_DI
+  vuc __builtin_vec_replace_un (vuc, float, const int);
+    VREPLACE_UN_SF
+  vuc __builtin_vec_replace_un (vuc, double, const int);
+    VREPLACE_UN_DF
 
 [VEC_REVB, vec_revb, __builtin_vec_revb]
   vss __builtin_vec_revb (vss);
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0a34ceebeb5..50b65a18eac 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -389,6 +389,13 @@ (define_mode_attr REPLACE_ELT_sh [(V4SI "2") (V4SF "2")
 (define_mode_attr REPLACE_ELT_max [(V4SI "12") (V4SF "12")
 				   (V2DI  "8") (V2DF "8")])
 
+;; vector replace unaligned modes
+(define_mode_iterator VEC_RU [SI SF DI DF])
+(define_mode_attr VEC_RU_char [(SI "w")
+			       (SF "w")
+			       (DI "d")
+			       (DF "d")])
+
 ;; Like VM2 in altivec.md, just do char, short, int, long, float and double
 (define_mode_iterator VM3 [V4SI
 			   V8HI
@@ -4191,12 +4198,12 @@ (define_insn "vreplace_elt_<mode>_inst"
 
 (define_insn "vreplace_un_<mode>"
  [(set (match_operand:V16QI 0 "register_operand" "=v")
-  (unspec:V16QI [(match_operand:REPLACE_ELT 1 "register_operand" "0")
-                 (match_operand:<VEC_base> 2 "register_operand" "r")
+  (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0")
+		 (match_operand:VEC_RU 2 "register_operand" "r")
 		 (match_operand:QI 3 "const_0_to_12_operand" "n")]
 		UNSPEC_REPLACE_UN))]
  "TARGET_POWER10"
- "vins<REPLACE_ELT_char> %0,%2,%3"
+ "vins<VEC_RU_char> %0,%2,%3"
  [(set_attr "type" "vecsimple")])
 
 ;; VSX_EXTRACT optimizations
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_1.c
similarity index 54%
rename from gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
rename to gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_1.c
index 27318822871..f5d30a88067 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_1.c
@@ -20,6 +20,9 @@ main (int argc, char *argv [])
   unsigned char ch;
   unsigned int index;
 
+  vector unsigned char src_va_uchar;
+  vector unsigned char expected_vresult_uchar;
+
   vector unsigned int vresult_uint;
   vector unsigned int expected_vresult_uint;
   vector unsigned int src_va_uint;
@@ -64,10 +67,10 @@ main (int argc, char *argv [])
 						 
   vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
 
-  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
 #if DEBUG
     printf("ERROR, vec_replace_elt (src_vb_uint, src_va_uint, index)\n");
-    for(i = 0; i < 4; i++)
+    for (i = 0; i < 4; i++)
       printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
 	     i, vresult_uint[i], i, expected_vresult_uint[i]);
 #else
@@ -82,10 +85,10 @@ main (int argc, char *argv [])
 						 
   vresult_int = vec_replace_elt (src_va_int, src_a_int, 1);
 
-  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
 #if DEBUG
-    printf("ERROR, vec_replace_elt (src_vb_int, src_va_int, index)\n");
-    for(i = 0; i < 4; i++)
+    printf("ERROR, vec_replace_elt (src_vb_int, src_a_int, index)\n");
+    for (i = 0; i < 4; i++)
       printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
 	     i, vresult_int[i], i, expected_vresult_int[i]);
 #else
@@ -100,10 +103,10 @@ main (int argc, char *argv [])
 						 
   vresult_float = vec_replace_elt (src_va_float, src_a_float, 1);
 
-  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
+  if (!vec_all_eq (vresult_float, expected_vresult_float)) {
 #if DEBUG
-    printf("ERROR, vec_replace_elt (src_vb_float, src_va_float, index)\n");
-    for(i = 0; i < 4; i++)
+    printf("ERROR, vec_replace_elt (src_vb_float, src_a_float, index)\n");
+    for (i = 0; i < 4; i++)
       printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n",
 	     i, vresult_float[i], i, expected_vresult_float[i]);
 #else
@@ -121,8 +124,8 @@ main (int argc, char *argv [])
 
   if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
 #if DEBUG
-    printf("ERROR, vec_replace_elt (src_vb_ullint, src_va_ullint, index)\n");
-    for(i = 0; i < 2; i++)
+    printf("ERROR, vec_replace_elt (src_vb_ullint, src_a_ullint, index)\n");
+    for (i = 0; i < 2; i++)
       printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n",
 	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
 #else
@@ -137,10 +140,10 @@ main (int argc, char *argv [])
 						 
   vresult_llint = vec_replace_elt (src_va_llint, src_a_llint, 1);
 
-  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
+  if (!vec_all_eq (vresult_llint, expected_vresult_llint)) {
 #if DEBUG
-    printf("ERROR, vec_replace_elt (src_vb_llint, src_va_llint, index)\n");
-    for(i = 0; i < 2; i++)
+    printf("ERROR, vec_replace_elt (src_vb_llint, src_a_llint, index)\n");
+    for (i = 0; i < 2; i++)
       printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n",
 	     i, vresult_llint[i], i, expected_vresult_llint[i]);
 #else
@@ -155,10 +158,10 @@ main (int argc, char *argv [])
 						 
   vresult_double = vec_replace_elt (src_va_double, src_a_double, 1);
 
-  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
+  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
 #if DEBUG
-    printf("ERROR, vec_replace_elt (src_vb_double, src_va_double, index)\n");
-    for(i = 0; i < 2; i++)
+    printf("ERROR, vec_replace_elt (src_vb_double, src_a_double, index)\n");
+    for (i = 0; i < 2; i++)
       printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n",
 	     i, vresult_double[i], i, expected_vresult_double[i]);
 #else
@@ -169,60 +172,68 @@ main (int argc, char *argv [])
 
   /* Vector replace 32-bit element, unaligned */
   src_a_uint = 345;
-  src_va_uint = (vector unsigned int) { 1, 2, 0, 0 };
-  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
+  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
+					  0, 0, 0, 0, 0, 0, 0, 0 };
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
   /* Byte index 7 will overwrite part of elements 2 and 3 */
-  expected_vresult_uint = (vector unsigned int) { 1, 2, 345*256, 0 };
+  expected_vresult_uchar
+    = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
+			       0, 0x59, 0x1, 0, 0, 0, 0, 0 };
 						 
-  vresult_uchar = vec_replace_unaligned (src_va_uint, src_a_uint, 3);
-  vresult_uint = (vector unsigned int) vresult_uchar;
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_uint, 3);
 
-  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_uint, src_va_uint, index)\n");
-    for(i = 0; i < 4; i++)
-      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
-	     i, vresult_uint[i], i, expected_vresult_uint[i]);
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_uint, index)\n");
+    for (i = 0; i < 16; i++)
+      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uint[%d] = 0x%x\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
 #else
     abort();
 #endif
   }
 
   src_a_int = 234;
-  src_va_int = (vector int) { 1, 0, 3, 4 };
-  vresult_int = (vector int) { 0, 0, 0, 0 };
+  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 0, 0, 0, 0,
+					  3, 0, 0, 0, 4, 0, 0, 0 };
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
   /* Byte index 7 will over write part of elements 1 and 2 */
-  expected_vresult_int = (vector int) { 1, 234*256, 0, 4 };
-						 
-  vresult_uchar = vec_replace_unaligned (src_va_int, src_a_int, 7);
-  vresult_int = (vector signed int) vresult_uchar;
+  expected_vresult_uchar = (vector unsigned char) { 1, 0, 0, 0, 0, 0xea, 0, 0,
+						    0, 0, 0, 0, 4, 0, 0, 0 };
 
-  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_int, 7);
+
+  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_int, src_va_int, index)\n");
-    for(i = 0; i < 4; i++)
-      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
-	     i, vresult_int[i], i, expected_vresult_int[i]);
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_int, index)\n");
+    for (i = 0; i < 16; i++)
+      printf(" vresult_int[%d] = 0x%x, expected_vresult_int[%d] = 0x%x\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
 #else
     abort();
 #endif
   }
 
   src_a_float = 34.0;
-  src_va_float = (vector float) { 0.0, 10.0, 20.0, 30.0 };
-  vresult_float = (vector float) { 0.0, 0.0, 0.0, 0.0 };
-  expected_vresult_float = (vector float) { 0.0, 34.0, 20.0, 30.0 };
+  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0x41, 0xa0,
+					  5, 6, 7, 8, 0x41, 0xf0, 0, 0};
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_uchar
+    = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 8, 0x42,
+			       5, 6, 7, 8, 0x41, 0xf0, 0, 0 };
 						 
-  vresult_uchar = vec_replace_unaligned (src_va_float, src_a_float, 8);
-  vresult_float = (vector float) vresult_uchar;
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_float, 8);
 
-  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
+  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_float, src_va_float, "
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_float, "
 	   "index)\n");
-    for(i = 0; i < 4; i++)
-      printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n",
-	     i, vresult_float[i], i, expected_vresult_float[i]);
+    for (i = 0; i < 16; i++)
+      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uchar[%d] = 0x%x\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
 #else
     abort();
 #endif
@@ -230,72 +241,76 @@ main (int argc, char *argv [])
 
   /* Vector replace 64-bit element, unaligned  */
   src_a_ullint = 456;
-  src_va_ullint = (vector unsigned long long int) { 0, 0x222 };
-  vresult_ullint = (vector unsigned long long int) { 0, 0 };
-  expected_vresult_ullint = (vector unsigned long long int) { 456*256,
-							      0x200 };
+  src_va_uchar = (vector unsigned char) { 0, 0xc, 0x1, 0, 0, 0, 0, 0,
+					  0x22, 0x2, 0, 0, 0, 0, 0, 0 };
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_uchar
+    = (vector unsigned char) { 0, 0xc8, 0x1, 0, 0, 0, 0, 0,
+			       0, 2, 0, 0, 0, 0, 0, 0 };
 						 
   /* Byte index 7 will over write least significant byte of  element 0  */
-  vresult_uchar = vec_replace_unaligned (src_va_ullint, src_a_ullint, 7);
-  vresult_ullint = (vector unsigned long long) vresult_uchar;
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_ullint, 7);
 
-  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
+  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_ullint, src_va_ullint, "
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_ullint, "
 	   "index)\n");
-    for(i = 0; i < 2; i++)
-      printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n",
-	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
+    for (i = 0; i < 15; i++)
+      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uchar[%d] = 0x%x\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
 #else
     abort();
 #endif
   }
 
   src_a_llint = 678;
-  src_va_llint = (vector long long int) { 0, 0x101 };
+  src_va_uchar = (vector unsigned char) { 0, 0xa6, 0x2, 0, 0, 0, 0, 0,
+					  0x0, 0x1, 0, 0, 0, 0, 0, 0 };
   vresult_llint = (vector long long int) { 0, 0 };
   /* Byte index 7 will over write least significant byte of  element 0  */
-  expected_vresult_llint = (vector long long int) { 678*256, 0x100 };
+  expected_vresult_uchar
+    = (vector unsigned char) { 0x0, 0xa6, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0,
+			       0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0 };
 						 
-  vresult_uchar = vec_replace_unaligned (src_va_llint, src_a_llint, 7);
-  vresult_llint = (vector signed long long) vresult_uchar;
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_llint, 7);
 
-  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
+  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_llint, src_va_llint, "
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_llint, "
 	   "index)\n");
-    for(i = 0; i < 2; i++)
-      printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n",
-	     i, vresult_llint[i], i, expected_vresult_llint[i]);
+    for (i = 0; i < 16; i++)
+      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
 #else
     abort();
 #endif
   }
   
   src_a_double = 678.0;
-  src_va_double = (vector double) { 0.0, 50.0 };
+  //  src_va_double = (vector double) { 0.0, 50.0 };
+  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					  0, 0, 0, 0, 0, 0, 0, 0 };
   vresult_double = (vector double) { 0.0, 0.0 };
   expected_vresult_double = (vector double) { 0.0, 678.0 };
 						 
-  vresult_uchar = vec_replace_unaligned (src_va_double, src_a_double, 0);
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_double, 0);
   vresult_double = (vector double) vresult_uchar;
 
-  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
+  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_double, src_va_double, "
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_double, "
 	   "index)\n");
-    for(i = 0; i < 2; i++)
+    for (i = 0; i < 2; i++)
       printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n",
 	     i, vresult_double[i], i, expected_vresult_double[i]);
 #else
     abort();
 #endif
   }
-    
+  
   return 0;
 }
 
 /* { dg-final { scan-assembler-times {\mvinsw\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mvinsd\M} 6 } } */
-
-
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_2.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_2.c
new file mode 100644
index 00000000000..806ff3317e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_2.c
@@ -0,0 +1,50 @@
+/* { dg-do run { target { power10_hw } } } */
+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps -flax-vector-conversions" } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  vector unsigned long long int vresult_ullint;
+  vector unsigned long long int expected_vresult_ullint;
+  vector unsigned long long int src_va_ullint;
+  unsigned int long long src_a_ullint;
+
+  /* Replace doubleword size chunk specified as a constant that can be
+     represented by an int.  Should generate a vinsd instruction.  Note, this
+     test requires command line option -flax-vector-conversions.  */
+  src_a_ullint = 456;
+  src_va_ullint = (vector unsigned long long int) { 0, 11 };
+  vresult_ullint = (vector unsigned long long int) { 0, 2 };
+  expected_vresult_ullint = (vector unsigned long long int) { 0, 456 };
+						 
+  vresult_ullint = vec_replace_unaligned ((vector unsigned char)src_va_ullint,
+					  src_a_ullint, 0);
+
+  if (!vec_all_eq (vresult_ullint, expected_vresult_ullint)) {
+#if DEBUG
+    printf("ERROR, vec_replace_unaligned ((vector unsigned char)src_vb_ullint, src_a_ullint, index)\n");
+    for (i = 0; i < 2; i++)
+      printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n",
+	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
+#else
+    abort();
+#endif
+  }
+  
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvinsd\M} 1 } } */
-- 
2.37.2



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments
  2023-07-07 20:18 [PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments Carl Love
@ 2023-07-13  9:41 ` Kewen.Lin
  2023-07-17 18:33   ` Carl Love
  0 siblings, 1 reply; 3+ messages in thread
From: Kewen.Lin @ 2023-07-13  9:41 UTC (permalink / raw)
  To: Carl Love; +Cc: Peter Bergner, Segher Boessenkool, David Edelsohn, gcc-patches

Hi Carl,

on 2023/7/8 04:18, Carl Love wrote:
> 
> GCC maintainers:
> 
> Version 3, added code to altivec_resolve_overloaded_builtin so the
> correct instruction is selected for the size of the second argument. 
> This restores the instruction counts to the original values where the
> correct instructions were originally being generated.  The naming of

Nice, I have some comments inlined below.

> the overloaded builtin instances and builtin definitions were changed
> to reflect the type of the second argument since the type of the first
> argument is now the same for all overloaded instances.  A new builtin
> test file was added for the case where the first argument is cast to
> the unsigned long long type.  This test requires the -flax-vector-
> conversions gcc command line option.  Since the other tests do not
> require this option, I felt that the new test needed to be in a
> separate file.  Finally some formatting fixes were made in the original
> test file.  Patch has been retested on Power 10 with no regressions.
> 
> Version 2, fixed various typos.  Updated the change log body to say the
> instruction counts were updated.  The instruction counts changed as a
> result of changing the first argument of the vec_replace_unaligned
> builtin call from vector unsigned long long (vull) to vector unsigned
> char (vuc).  When the first argument was vull the builtin call
> generated the vinsd instruction for the two test cases.  The updated
> call with vuc as the first argument generates two vinsw instructions
> instead.  Patch was retested on Power 10 with no regressions.
> 
> The following patch fixes the first argument in the builtin definition
> and the corresponding test cases.  Initially, the builtin specification
> was wrong due to a cut and past error.  The documentation was fixed in:
> 
>    commit ed3fea09b18f67e757b5768b42cb6e816626f1db
>    Author: Bill Schmidt <wschmidt@linux.ibm.com>
>    Date:   Fri Feb 4 13:07:17 2022 -0600
> 
>        rs6000: Correct function prototypes for vec_replace_unaligned
> 
>        Due to a pasto error in the documentation, vec_replace_unaligned
> was
>        implemented with the same function prototypes as
> vec_replace_elt.  It was
>        intended that vec_replace_unaligned always specify output
> vectors as having
>        type vector unsigned char, to emphasize that elements are
> potentially
>        misaligned by this built-in function.  This patch corrects the
>        misimplementation.
>     ....
> 
> This patch fixes the arguments in the definitions and updates the
> testcases accordingly.  Additionally, a few minor spacing issues are
> fixed.
> 
> The patch has been tested on Power 10 with no regressions.  Please let
> me know if the patch is acceptable for mainline.  Thanks.
> 
>                  Carl 
> 
> ------------------
> rs6000, fix vec_replace_unaligned built-in arguments
> 
> The first argument of the vec_replace_unaligned built-in should always be
> unsigned char, as specified in gcc/doc/extend.texi.

Maybe "be with type vector unsigned char"?

> 
> This patch fixes the builtin definitions and updates the test cases to use
> the correct arguments.  The original test file is renamed and a second test
> file is added for a new test case.
> 
> gcc/ChangeLog:
> 	* config/rs6000/rs6000-builtins.def: Rename
> 	__builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi
> 	__builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi
> 	__builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df
> 	__builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di
> 	__builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf
> 	__builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si.
> 	Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as
> 	VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
> 	VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
> 	VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI.
> 	Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as
> 	vreplace_un_si, vreplace_un_v2df as vreplace_un_df,
> 	vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as
> 	vreplace_un_sf, vreplace_un_v4si as vreplace_un_si.
> 	* config/rs6000/rs6000-c.cc (find_instance): Add new argument
> 	nargs.  Add nargs check.  Extend function to handle three arguments.
> 	(altivec_resolve_overloaded_builtin): Add new argument nargs to
> 	function calls.  Add case RS6000_OVLD_VEC_REPLACE_UN.
> 	* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
> 	Fix first argument type.  Rename VREPLACE_UN_UV4SI as
> 	VREPLACE_UN_USI, VREPLACE_UN_V4SI as VREPLACE_UN_SI,
> 	VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_V2DI as
> 	VREPLACE_UN_DI, VREPLACE_UN_V4SF as VREPLACE_UN_SF,
> 	VREPLACE_UN_V2DF as VREPLACE_UN_DF.
> 	* config/rs6000/vsx.md (VEC_RU): New mode iterator.
> 	(VEC_RU_char): New mode attribute.
> 	(vreplace_un_<mode>): Change iterator and mode attribute.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.target/powerpc/vec-replace-word-runnable.c: Renamed
> 	vec-replace-word-runnable_1.c.
> 	* gcc.target/powerpc/vec-replace-word-runnable_1.c
> 	(dg-options): add -flax-vector-conversions.
> 	(vec_replace_unaligned) Fix first argument type.
> 	(vresult_uchar): Fix expected results.
> 	(vec_replace_unaligned): Update for loop to check uchar results.
> 	Remove extra spaces in if statements. Insert missing spaces in
> 	for statements.
> 	* gcc.target/powerpc/vec-replace-word-runnable_2.c: New test file.
> ---
>  gcc/config/rs6000/rs6000-builtins.def         |  28 +--
>  gcc/config/rs6000/rs6000-c.cc                 |  62 ++++++-
>  gcc/config/rs6000/rs6000-overload.def         |  24 +--
>  gcc/config/rs6000/vsx.md                      |  13 +-
>  ...nnable.c => vec-replace-word-runnable_1.c} | 165 ++++++++++--------
>  .../powerpc/vec-replace-word-runnable_2.c     |  50 ++++++
>  6 files changed, 233 insertions(+), 109 deletions(-)
>  rename gcc/testsuite/gcc.target/powerpc/{vec-replace-word-runnable.c => vec-replace-word-runnable_1.c} (54%)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_2.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 5ac6af4c6e3..8206cc86014 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -3388,26 +3388,26 @@
>    const vull __builtin_altivec_vpextd (vull, vull);
>      VPEXTD vpextd {}
>  
> -  const vuc __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \
> -                                                 const int<4>);
> -    VREPLACE_UN_UV2DI vreplace_un_v2di {}
> +  const vuc __builtin_altivec_vreplace_un_udi (vuc, unsigned long long, \
> +                                               const int<4>);
> +    VREPLACE_UN_UDI vreplace_un_di {}
>  
> -  const vuc __builtin_altivec_vreplace_un_uv4si (vui, unsigned int, \
> -                                                 const int<4>);
> -    VREPLACE_UN_UV4SI vreplace_un_v4si {}
> +  const vuc __builtin_altivec_vreplace_un_usi (vuc, unsigned int, \
> +                                               const int<4>);
> +    VREPLACE_UN_USI vreplace_un_si {}
>  
> -  const vuc __builtin_altivec_vreplace_un_v2df (vd, double, const int<4>);
> -    VREPLACE_UN_V2DF vreplace_un_v2df {}
> +  const vuc __builtin_altivec_vreplace_un_df (vuc, double, const int<4>);
> +    VREPLACE_UN_DF vreplace_un_df {}
>  
> -  const vuc __builtin_altivec_vreplace_un_v2di (vsll, signed long long, \
> +  const vuc __builtin_altivec_vreplace_un_di (vuc, signed long long, \
>                                                  const int<4>);
> -    VREPLACE_UN_V2DI vreplace_un_v2di {}
> +    VREPLACE_UN_DI vreplace_un_di {}
>  
> -  const vuc __builtin_altivec_vreplace_un_v4sf (vf, float, const int<4>);
> -    VREPLACE_UN_V4SF vreplace_un_v4sf {}
> +  const vuc __builtin_altivec_vreplace_un_sf (vuc, float, const int<4>);
> +    VREPLACE_UN_SF vreplace_un_sf {}
>  
> -  const vuc __builtin_altivec_vreplace_un_v4si (vsi, signed int, const int<4>);
> -    VREPLACE_UN_V4SI vreplace_un_v4si {}
> +  const vuc __builtin_altivec_vreplace_un_si (vuc, signed int, const int<4>);
> +    VREPLACE_UN_SI vreplace_un_si {}
>  
>    const vull __builtin_altivec_vreplace_uv2di (vull, unsigned long long, \
>                                                 const int<1>);
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 8555174d36e..7862ccb0c45 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -1674,13 +1674,18 @@ tree
>  find_instance (bool *unsupported_builtin, ovlddata **instance,
>  	       rs6000_gen_builtins instance_code,
>  	       rs6000_gen_builtins fcode,
> -	       tree *types, tree *args)
> +	       tree *types, tree *args, int nargs)

I think we should extend this find_instance to cover any number of arguments
(well, positive number :)), not only for the required three arguments here.

Could you do it with a separated patch instead?

I think we can go with the loop bounded with the given NARGS, check each
type[i] is rs6000_builtin_type_compatible with the corresponding paramtype
from TYPE_ARG_TYPES chains, if any of them fails, return error_mark_node.
If everything goes well, further check rs6000_builtin_decl and 
rs6000_builtin_is_supported, if either fails, set unsupported_builtin,
otherwise call altivec_build_resolved_builtin.

>  {
>    while (*instance && (*instance)->bifid != instance_code)
>      *instance = (*instance)->next;
>  
>    ovlddata *inst = *instance;
>    gcc_assert (inst != NULL);
> +
> +  /* Function only supports nargs equal to 2 or 3.  */
> +  if (!(nargs == 2 || nargs == 3))
> +    gcc_assert(0);

A side comment for the above change (this line isn't needed any more with
the above requested change): it can be read better with
"gcc_assert (nargs == 2 || nargs == 3);"

> +
>    /* It is possible for an instance to require a data type that isn't
>       defined on this target, in which case inst->fntype will be NULL.  */
>    if (!inst->fntype)
> @@ -1688,15 +1693,25 @@ find_instance (bool *unsupported_builtin, ovlddata **instance,
>    tree fntype = rs6000_builtin_info[inst->bifid].fntype;
>    tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
>    tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
> +  tree parmtype2;
> +  int arg2_compatible = true;
> +
> +  if (nargs == 3)
> +    {
> +      parmtype2
> +       = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (TYPE_ARG_TYPES (fntype))));
> +      arg2_compatible = rs6000_builtin_type_compatible (types[2], parmtype2);
> +    }
>  
>    if (rs6000_builtin_type_compatible (types[0], parmtype0)
> -      && rs6000_builtin_type_compatible (types[1], parmtype1))
> +      && rs6000_builtin_type_compatible (types[1], parmtype1)
> +      && arg2_compatible)
>      {
>        if (rs6000_builtin_decl (inst->bifid, false) != error_mark_node
>  	  && rs6000_builtin_is_supported (inst->bifid))
>  	{
>  	  tree ret_type = TREE_TYPE (inst->fntype);
> -	  return altivec_build_resolved_builtin (args, 2, fntype, ret_type,
> +	  return altivec_build_resolved_builtin (args, nargs, fntype, ret_type,
>  						 inst->bifid, fcode);
>  	}
>        else
> @@ -1916,7 +1931,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>  	  instance_code = RS6000_BIF_CMPB_32;
>  
>  	tree call = find_instance (&unsupported_builtin, &instance,
> -				   instance_code, fcode, types, args);
> +				   instance_code, fcode, types, args, nargs);
>  	if (call != error_mark_node)
>  	  return call;
>  	break;
> @@ -1949,7 +1964,44 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>  	  }
>  
>  	tree call = find_instance (&unsupported_builtin, &instance,
> -				   instance_code, fcode, types, args);
> +				   instance_code, fcode, types, args, nargs);
> +	if (call != error_mark_node)
> +	  return call;
> +	break;
> +      }
> +
> +    case RS6000_OVLD_VEC_REPLACE_UN:
> +      {
> +	machine_mode arg2_mode = TYPE_MODE (types[1]);
> +
> +	if (GET_MODE_SIZE(arg2_mode) == 4)
> +	  {
> +	    if (GET_MODE_CLASS(arg2_mode) == MODE_INT)
> +	      /* Signed and unsigned are handled the same.  */
> +	      instance_code = RS6000_BIF_VREPLACE_UN_USI;
> +	    else if (GET_MODE_CLASS(arg2_mode) == MODE_FLOAT)
> +	      instance_code = RS6000_BIF_VREPLACE_UN_SF;
> +	    else
> +	      /* Only have support for int and float.  */
> +		gcc_assert (0);
> +	  }
> +	else if (GET_MODE_SIZE(arg2_mode) == 8)
> +	  {
> +	    if (GET_MODE_CLASS(arg2_mode) == MODE_INT)
> +	      /* Signed and unsigned are handled the same.  */
> +	      instance_code = RS6000_BIF_VREPLACE_UN_UDI;
> +	    else if (GET_MODE_CLASS(arg2_mode) == MODE_FLOAT)
> +	      instance_code = RS6000_BIF_VREPLACE_UN_DF;
> +	    else
> +	      /* Only have support for long long int and double.  */
> +	      gcc_assert (0);
> +	  }

I don't think putting a gcc_assert (0) (or gcc_unreachable()) for
unexpected types is a correct thing to do here, we are looking for
a matched overloaded bif instance here, it's likely that we can't
find a matched, we should emit some error messages instead of ICE.

Could you restructure it like:

    if (arg2_mode == SImode)
       ...
    else if (arg2_mode == DImode)
       ...
    else if (arg2_mode == SFmode)
       ...
    else if (arg2_mode == DFmode)
       ...
    else
       break;

The "break" will fall into the general unexpected cases handling.

> +	else
> +	  /* Only have support for word and double.   */
> +	  gcc_assert (0);
> +
> +	tree call = find_instance (&unsupported_builtin, &instance,
> +				   instance_code, fcode, types, args, nargs);
>  	if (call != error_mark_node)
>  	  return call;
>  	break;
> diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
> index c582490c084..2e33bc97a6c 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -3059,18 +3059,18 @@
>      VREPLACE_ELT_V2DF
>  
>  [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
> -  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
> -    VREPLACE_UN_UV4SI
> -  vuc __builtin_vec_replace_un (vsi, signed int, const int);
> -    VREPLACE_UN_V4SI
> -  vuc __builtin_vec_replace_un (vull, unsigned long long, const int);
> -    VREPLACE_UN_UV2DI
> -  vuc __builtin_vec_replace_un (vsll, signed long long, const int);
> -    VREPLACE_UN_V2DI
> -  vuc __builtin_vec_replace_un (vf, float, const int);
> -    VREPLACE_UN_V4SF
> -  vuc __builtin_vec_replace_un (vd, double, const int);
> -    VREPLACE_UN_V2DF
> +  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
> +    VREPLACE_UN_USI
> +  vuc __builtin_vec_replace_un (vuc, signed int, const int);
> +    VREPLACE_UN_SI
> +  vuc __builtin_vec_replace_un (vuc, unsigned long long, const int);
> +    VREPLACE_UN_UDI
> +  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
> +    VREPLACE_UN_DI
> +  vuc __builtin_vec_replace_un (vuc, float, const int);
> +    VREPLACE_UN_SF
> +  vuc __builtin_vec_replace_un (vuc, double, const int);
> +    VREPLACE_UN_DF
>  
>  [VEC_REVB, vec_revb, __builtin_vec_revb]
>    vss __builtin_vec_revb (vss);
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0a34ceebeb5..50b65a18eac 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -389,6 +389,13 @@ (define_mode_attr REPLACE_ELT_sh [(V4SI "2") (V4SF "2")
>  (define_mode_attr REPLACE_ELT_max [(V4SI "12") (V4SF "12")
>  				   (V2DI  "8") (V2DF "8")])
>  
> +;; vector replace unaligned modes
> +(define_mode_iterator VEC_RU [SI SF DI DF])
> +(define_mode_attr VEC_RU_char [(SI "w")
> +			       (SF "w")
> +			       (DI "d")
> +			       (DF "d")])

I think we can extend the existing REPLACE_ELT by adding the modes of VEC_RU,
guard its current uses with VECTOR_MODE_P (<MODE>mode) and guard your below
define_insn use with !VECTOR_MODE_P (<MODE>mode).

And extend REPLACE_ELT_char as well.

> +
>  ;; Like VM2 in altivec.md, just do char, short, int, long, float and double
>  (define_mode_iterator VM3 [V4SI
>  			   V8HI
> @@ -4191,12 +4198,12 @@ (define_insn "vreplace_elt_<mode>_inst"
>  
>  (define_insn "vreplace_un_<mode>"
>   [(set (match_operand:V16QI 0 "register_operand" "=v")
> -  (unspec:V16QI [(match_operand:REPLACE_ELT 1 "register_operand" "0")
> -                 (match_operand:<VEC_base> 2 "register_operand" "r")
> +  (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0")
> +		 (match_operand:VEC_RU 2 "register_operand" "r")
>  		 (match_operand:QI 3 "const_0_to_12_operand" "n")]
>  		UNSPEC_REPLACE_UN))]
>   "TARGET_POWER10"

... with the above suggestion, this condition can be:
"TARGET_POWER10 && !VECTOR_MODE_P (<MODE>mode)"

> - "vins<REPLACE_ELT_char> %0,%2,%3"
> + "vins<VEC_RU_char> %0,%2,%3"
>   [(set_attr "type" "vecsimple")])
>  
>  ;; VSX_EXTRACT optimizations
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_1.c
> similarity index 54%
> rename from gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
> rename to gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_1.c
> index 27318822871..f5d30a88067 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_1.c
> @@ -20,6 +20,9 @@ main (int argc, char *argv [])
>    unsigned char ch;
>    unsigned int index;
>  
> +  vector unsigned char src_va_uchar;
> +  vector unsigned char expected_vresult_uchar;
> +
>    vector unsigned int vresult_uint;
>    vector unsigned int expected_vresult_uint;
>    vector unsigned int src_va_uint;
> @@ -64,10 +67,10 @@ main (int argc, char *argv [])
>  						 
>    vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
>  
> -  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
> +  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
>  #if DEBUG
>      printf("ERROR, vec_replace_elt (src_vb_uint, src_va_uint, index)\n");
> -    for(i = 0; i < 4; i++)
> +    for (i = 0; i < 4; i++)
>        printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
>  	     i, vresult_uint[i], i, expected_vresult_uint[i]);
>  #else
> @@ -82,10 +85,10 @@ main (int argc, char *argv [])
>  						 
>    vresult_int = vec_replace_elt (src_va_int, src_a_int, 1);
>  
> -  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
> +  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_elt (src_vb_int, src_va_int, index)\n");
> -    for(i = 0; i < 4; i++)
> +    printf("ERROR, vec_replace_elt (src_vb_int, src_a_int, index)\n");
> +    for (i = 0; i < 4; i++)
>        printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
>  	     i, vresult_int[i], i, expected_vresult_int[i]);
>  #else
> @@ -100,10 +103,10 @@ main (int argc, char *argv [])
>  						 
>    vresult_float = vec_replace_elt (src_va_float, src_a_float, 1);
>  
> -  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
> +  if (!vec_all_eq (vresult_float, expected_vresult_float)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_elt (src_vb_float, src_va_float, index)\n");
> -    for(i = 0; i < 4; i++)
> +    printf("ERROR, vec_replace_elt (src_vb_float, src_a_float, index)\n");
> +    for (i = 0; i < 4; i++)
>        printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n",
>  	     i, vresult_float[i], i, expected_vresult_float[i]);
>  #else
> @@ -121,8 +124,8 @@ main (int argc, char *argv [])
>  
>    if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_elt (src_vb_ullint, src_va_ullint, index)\n");
> -    for(i = 0; i < 2; i++)
> +    printf("ERROR, vec_replace_elt (src_vb_ullint, src_a_ullint, index)\n");
> +    for (i = 0; i < 2; i++)
>        printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n",
>  	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
>  #else
> @@ -137,10 +140,10 @@ main (int argc, char *argv [])
>  						 
>    vresult_llint = vec_replace_elt (src_va_llint, src_a_llint, 1);
>  
> -  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
> +  if (!vec_all_eq (vresult_llint, expected_vresult_llint)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_elt (src_vb_llint, src_va_llint, index)\n");
> -    for(i = 0; i < 2; i++)
> +    printf("ERROR, vec_replace_elt (src_vb_llint, src_a_llint, index)\n");
> +    for (i = 0; i < 2; i++)
>        printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n",
>  	     i, vresult_llint[i], i, expected_vresult_llint[i]);
>  #else
> @@ -155,10 +158,10 @@ main (int argc, char *argv [])
>  						 
>    vresult_double = vec_replace_elt (src_va_double, src_a_double, 1);
>  
> -  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
> +  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_elt (src_vb_double, src_va_double, index)\n");
> -    for(i = 0; i < 2; i++)
> +    printf("ERROR, vec_replace_elt (src_vb_double, src_a_double, index)\n");
> +    for (i = 0; i < 2; i++)
>        printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n",
>  	     i, vresult_double[i], i, expected_vresult_double[i]);
>  #else
> @@ -169,60 +172,68 @@ main (int argc, char *argv [])
>  
>    /* Vector replace 32-bit element, unaligned */
>    src_a_uint = 345;
> -  src_va_uint = (vector unsigned int) { 1, 2, 0, 0 };
> -  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
> +  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
> +					  0, 0, 0, 0, 0, 0, 0, 0 };
> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 };
>    /* Byte index 7 will overwrite part of elements 2 and 3 */
> -  expected_vresult_uint = (vector unsigned int) { 1, 2, 345*256, 0 };
> +  expected_vresult_uchar
> +    = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
> +			       0, 0x59, 0x1, 0, 0, 0, 0, 0 };
>  						 
> -  vresult_uchar = vec_replace_unaligned (src_va_uint, src_a_uint, 3);
> -  vresult_uint = (vector unsigned int) vresult_uchar;
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_uint, 3);
>  
> -  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_uint, src_va_uint, index)\n");
> -    for(i = 0; i < 4; i++)
> -      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
> -	     i, vresult_uint[i], i, expected_vresult_uint[i]);
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_uint, index)\n");
> +    for (i = 0; i < 16; i++)
> +      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uint[%d] = 0x%x\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>  #else
>      abort();
>  #endif
>    }
>  
>    src_a_int = 234;
> -  src_va_int = (vector int) { 1, 0, 3, 4 };
> -  vresult_int = (vector int) { 0, 0, 0, 0 };
> +  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 0, 0, 0, 0,
> +					  3, 0, 0, 0, 4, 0, 0, 0 };
> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 };
>    /* Byte index 7 will over write part of elements 1 and 2 */
> -  expected_vresult_int = (vector int) { 1, 234*256, 0, 4 };
> -						 
> -  vresult_uchar = vec_replace_unaligned (src_va_int, src_a_int, 7);
> -  vresult_int = (vector signed int) vresult_uchar;
> +  expected_vresult_uchar = (vector unsigned char) { 1, 0, 0, 0, 0, 0xea, 0, 0,
> +						    0, 0, 0, 0, 4, 0, 0, 0 };
>  
> -  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_int, 7);
> +
> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_int, src_va_int, index)\n");
> -    for(i = 0; i < 4; i++)
> -      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
> -	     i, vresult_int[i], i, expected_vresult_int[i]);
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_int, index)\n");
> +    for (i = 0; i < 16; i++)
> +      printf(" vresult_int[%d] = 0x%x, expected_vresult_int[%d] = 0x%x\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>  #else
>      abort();
>  #endif
>    }
>  
>    src_a_float = 34.0;
> -  src_va_float = (vector float) { 0.0, 10.0, 20.0, 30.0 };
> -  vresult_float = (vector float) { 0.0, 0.0, 0.0, 0.0 };
> -  expected_vresult_float = (vector float) { 0.0, 34.0, 20.0, 30.0 };
> +  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0x41, 0xa0,
> +					  5, 6, 7, 8, 0x41, 0xf0, 0, 0};
> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 };
> +  expected_vresult_uchar
> +    = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 8, 0x42,
> +			       5, 6, 7, 8, 0x41, 0xf0, 0, 0 };
>  						 
> -  vresult_uchar = vec_replace_unaligned (src_va_float, src_a_float, 8);
> -  vresult_float = (vector float) vresult_uchar;
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_float, 8);
>  
> -  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_float, src_va_float, "
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_float, "
>  	   "index)\n");
> -    for(i = 0; i < 4; i++)
> -      printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n",
> -	     i, vresult_float[i], i, expected_vresult_float[i]);
> +    for (i = 0; i < 16; i++)
> +      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uchar[%d] = 0x%x\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>  #else
>      abort();
>  #endif
> @@ -230,72 +241,76 @@ main (int argc, char *argv [])
>  
>    /* Vector replace 64-bit element, unaligned  */
>    src_a_ullint = 456;
> -  src_va_ullint = (vector unsigned long long int) { 0, 0x222 };
> -  vresult_ullint = (vector unsigned long long int) { 0, 0 };
> -  expected_vresult_ullint = (vector unsigned long long int) { 456*256,
> -							      0x200 };
> +  src_va_uchar = (vector unsigned char) { 0, 0xc, 0x1, 0, 0, 0, 0, 0,
> +					  0x22, 0x2, 0, 0, 0, 0, 0, 0 };
> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 };
> +  expected_vresult_uchar
> +    = (vector unsigned char) { 0, 0xc8, 0x1, 0, 0, 0, 0, 0,
> +			       0, 2, 0, 0, 0, 0, 0, 0 };
>  						 
>    /* Byte index 7 will over write least significant byte of  element 0  */
> -  vresult_uchar = vec_replace_unaligned (src_va_ullint, src_a_ullint, 7);
> -  vresult_ullint = (vector unsigned long long) vresult_uchar;
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_ullint, 7);
>  
> -  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_ullint, src_va_ullint, "
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_ullint, "
>  	   "index)\n");
> -    for(i = 0; i < 2; i++)
> -      printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n",
> -	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
> +    for (i = 0; i < 15; i++)
> +      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uchar[%d] = 0x%x\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>  #else
>      abort();
>  #endif
>    }
>  
>    src_a_llint = 678;
> -  src_va_llint = (vector long long int) { 0, 0x101 };
> +  src_va_uchar = (vector unsigned char) { 0, 0xa6, 0x2, 0, 0, 0, 0, 0,
> +					  0x0, 0x1, 0, 0, 0, 0, 0, 0 };
>    vresult_llint = (vector long long int) { 0, 0 };
>    /* Byte index 7 will over write least significant byte of  element 0  */
> -  expected_vresult_llint = (vector long long int) { 678*256, 0x100 };
> +  expected_vresult_uchar
> +    = (vector unsigned char) { 0x0, 0xa6, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0,
> +			       0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0 };
>  						 
> -  vresult_uchar = vec_replace_unaligned (src_va_llint, src_a_llint, 7);
> -  vresult_llint = (vector signed long long) vresult_uchar;
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_llint, 7);
>  
> -  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_llint, src_va_llint, "
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_llint, "
>  	   "index)\n");
> -    for(i = 0; i < 2; i++)
> -      printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n",
> -	     i, vresult_llint[i], i, expected_vresult_llint[i]);
> +    for (i = 0; i < 16; i++)
> +      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>  #else
>      abort();
>  #endif
>    }
>    
>    src_a_double = 678.0;
> -  src_va_double = (vector double) { 0.0, 50.0 };
> +  //  src_va_double = (vector double) { 0.0, 50.0 };
> +  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					  0, 0, 0, 0, 0, 0, 0, 0 };
>    vresult_double = (vector double) { 0.0, 0.0 };
>    expected_vresult_double = (vector double) { 0.0, 678.0 };
>  						 
> -  vresult_uchar = vec_replace_unaligned (src_va_double, src_a_double, 0);
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_double, 0);
>    vresult_double = (vector double) vresult_uchar;
>  
> -  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {> +  if (!vec_all_eq (vresult_double, expected_vresult_double)) {

Nit: The used vector type for vresult* isn't consistent with the others.
Since the underlying data matters but type doesn't, it's fine.  Just feel
weird float and double use different ways. I'd expect either both adopt
floating point vector type, or both use uchar vector type. :) 

>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_double, src_va_double, "
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_double, "
>  	   "index)\n");
> -    for(i = 0; i < 2; i++)
> +    for (i = 0; i < 2; i++)
>        printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n",
>  	     i, vresult_double[i], i, expected_vresult_double[i]);
>  #else
>      abort();
>  #endif
>    }
> -    
> +  
>    return 0;
>  }
>  
>  /* { dg-final { scan-assembler-times {\mvinsw\M} 6 } } */
>  /* { dg-final { scan-assembler-times {\mvinsd\M} 6 } } */
> -
> -
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_2.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_2.c
> new file mode 100644
> index 00000000000..806ff3317e6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable_2.c
> @@ -0,0 +1,50 @@
> +/* { dg-do run { target { power10_hw } } } */
> +/* { dg-do link { target { ! power10_hw } } } */

Why uses link?  compile would be enough?

> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps -flax-vector-conversions" } */

IMHO we don't need "-flax-vector-conversions" here, see below ...

> +
> +#include <altivec.h>
> +
> +#define DEBUG 0
> +
> +#if DEBUG
> +#include <stdio.h>
> +#endif
> +
> +extern void abort (void);
> +
> +int
> +main (int argc, char *argv [])
> +{
> +  int i;
> +  vector unsigned long long int vresult_ullint;
> +  vector unsigned long long int expected_vresult_ullint;
> +  vector unsigned long long int src_va_ullint;
> +  unsigned int long long src_a_ullint;
> +
> +  /* Replace doubleword size chunk specified as a constant that can be
> +     represented by an int.  Should generate a vinsd instruction.  Note, this
> +     test requires command line option -flax-vector-conversions.  */
> +  src_a_ullint = 456;
> +  src_va_ullint = (vector unsigned long long int) { 0, 11 };
> +  vresult_ullint = (vector unsigned long long int) { 0, 2 };
> +  expected_vresult_ullint = (vector unsigned long long int) { 0, 456 };
> +						 
> +  vresult_ullint = vec_replace_unaligned ((vector unsigned char)src_va_ullint,
> +					  src_a_ullint, 0);

... one explicit conversion on the return value of vec_replace_unaligned to
"vector unsigned long long int" can get rid of "-flax-vector-conversions".

I also noticed the expected result would be wrong on BE.

// assuming that [A, B] is the register view, A & B are doubleword.

On LE, src register looks like [11, 0], it will become to [456, 0] after vinsd.

On BE, src register looks like [0, 11], it will become to [456, 11] after vinsd.

So the expected results should be {0, 456} on LE while {456, 11} on BE.

I think the existing vec-replace-word-runnable_1.c doesn't have the correct expected
results for BE either, it isn't exposed as we don't have P10 BE test coverage for
running cases for now.  We can have a separated patch to fix them together later.

BR,
Kewen

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments
  2023-07-13  9:41 ` Kewen.Lin
@ 2023-07-17 18:33   ` Carl Love
  0 siblings, 0 replies; 3+ messages in thread
From: Carl Love @ 2023-07-17 18:33 UTC (permalink / raw)
  To: Kewen.Lin, cel
  Cc: Peter Bergner, Segher Boessenkool, David Edelsohn, gcc-patches

On Thu, 2023-07-13 at 17:41 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/8 04:18, Carl Love wrote:
> > GCC maintainers:
> > 
> > Version 3, added code to altivec_resolve_overloaded_builtin so the
> > correct instruction is selected for the size of the second
> > argument. 
> > This restores the instruction counts to the original values where
> > the
> > correct instructions were originally being generated.  The naming
> > of
> 
> Nice, I have some comments inlined below.
> 
> > the overloaded builtin instances and builtin definitions were
> > changed
> > to reflect the type of the second argument since the type of the
> > first
> > argument is now the same for all overloaded instances.  A new
> > builtin
> > test file was added for the case where the first argument is cast
> > to
> > the unsigned long long type.  This test requires the -flax-vector-
> > conversions gcc command line option.  Since the other tests do not
> > require this option, I felt that the new test needed to be in a
> > separate file.  Finally some formatting fixes were made in the
> > original
> > test file.  Patch has been retested on Power 10 with no
> > regressions.
> > 
> > Version 2, fixed various typos.  Updated the change log body to say
> > the
> > instruction counts were updated.  The instruction counts changed as
> > a
> > result of changing the first argument of the vec_replace_unaligned
> > builtin call from vector unsigned long long (vull) to vector
> > unsigned
> > char (vuc).  When the first argument was vull the builtin call
> > generated the vinsd instruction for the two test cases.  The
> > updated
> > call with vuc as the first argument generates two vinsw
> > instructions
> > instead.  Patch was retested on Power 10 with no regressions.
> > 
> > The following patch fixes the first argument in the builtin
> > definition
> > and the corresponding test cases.  Initially, the builtin
> > specification
> > was wrong due to a cut and past error.  The documentation was fixed
> > in:
> > 
> >    commit ed3fea09b18f67e757b5768b42cb6e816626f1db
> >    Author: Bill Schmidt <wschmidt@linux.ibm.com>
> >    Date:   Fri Feb 4 13:07:17 2022 -0600
> > 
> >        rs6000: Correct function prototypes for
> > vec_replace_unaligned
> > 
> >        Due to a pasto error in the documentation,
> > vec_replace_unaligned
> > was
> >        implemented with the same function prototypes as
> > vec_replace_elt.  It was
> >        intended that vec_replace_unaligned always specify output
> > vectors as having
> >        type vector unsigned char, to emphasize that elements are
> > potentially
> >        misaligned by this built-in function.  This patch corrects
> > the
> >        misimplementation.
> >     ....
> > 
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues
> > are
> > fixed.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable for mainline.  Thanks.
> > 
> >                  Carl 
> > 
> > ------------------
> > rs6000, fix vec_replace_unaligned built-in arguments
> > 
> > The first argument of the vec_replace_unaligned built-in should
> > always be
> > unsigned char, as specified in gcc/doc/extend.texi.
> 
> Maybe "be with type vector unsigned char"?

Changed to 

  The first argument of the vec_replace_unaligned built-in should
always be of type unsigned char, ....

> 
> > This patch fixes the builtin definitions and updates the test cases
> > to use
> > the correct arguments.  The original test file is renamed and a
> > second test
> > file is added for a new test case.
> > 
> > gcc/ChangeLog:
> > 	* config/rs6000/rs6000-builtins.def: Rename
> > 	__builtin_altivec_vreplace_un_uv2di as
> > __builtin_altivec_vreplace_un_udi
> > 	__builtin_altivec_vreplace_un_uv4si as
> > __builtin_altivec_vreplace_un_usi
> > 	__builtin_altivec_vreplace_un_v2df as
> > __builtin_altivec_vreplace_un_df
> > 	__builtin_altivec_vreplace_un_v2di as
> > __builtin_altivec_vreplace_un_di
> > 	__builtin_altivec_vreplace_un_v4sf as
> > __builtin_altivec_vreplace_un_sf
> > 	__builtin_altivec_vreplace_un_v4si as
> > __builtin_altivec_vreplace_un_si.
> > 	Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI
> > as
> > 	VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
> > 	VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
> > 	VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI.
> > 	Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as
> > 	vreplace_un_si, vreplace_un_v2df as vreplace_un_df,
> > 	vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as
> > 	vreplace_un_sf, vreplace_un_v4si as vreplace_un_si.
> > 	* config/rs6000/rs6000-c.cc (find_instance): Add new argument
> > 	nargs.  Add nargs check.  Extend function to handle three
> > arguments.
> > 	(altivec_resolve_overloaded_builtin): Add new argument nargs to
> > 	function calls.  Add case RS6000_OVLD_VEC_REPLACE_UN.
> > 	* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
> > 	Fix first argument type.  Rename VREPLACE_UN_UV4SI as
> > 	VREPLACE_UN_USI, VREPLACE_UN_V4SI as VREPLACE_UN_SI,
> > 	VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_V2DI as
> > 	VREPLACE_UN_DI, VREPLACE_UN_V4SF as VREPLACE_UN_SF,
> > 	VREPLACE_UN_V2DF as VREPLACE_UN_DF.
> > 	* config/rs6000/vsx.md (VEC_RU): New mode iterator.
> > 	(VEC_RU_char): New mode attribute.
> > 	(vreplace_un_<mode>): Change iterator and mode attribute.
> > 
> > gcc/testsuite/ChangeLog:
> > 	* gcc.target/powerpc/vec-replace-word-runnable.c: Renamed
> > 	vec-replace-word-runnable_1.c.
> > 	* gcc.target/powerpc/vec-replace-word-runnable_1.c
> > 	(dg-options): add -flax-vector-conversions.
> > 	(vec_replace_unaligned) Fix first argument type.
> > 	(vresult_uchar): Fix expected results.
> > 	(vec_replace_unaligned): Update for loop to check uchar
> > results.
> > 	Remove extra spaces in if statements. Insert missing spaces in
> > 	for statements.
> > 	* gcc.target/powerpc/vec-replace-word-runnable_2.c: New test
> > file.
> > ---
> >  gcc/config/rs6000/rs6000-builtins.def         |  28 +--
> >  gcc/config/rs6000/rs6000-c.cc                 |  62 ++++++-
> >  gcc/config/rs6000/rs6000-overload.def         |  24 +--
> >  gcc/config/rs6000/vsx.md                      |  13 +-
> >  ...nnable.c => vec-replace-word-runnable_1.c} | 165 ++++++++++--
> > ------
> >  .../powerpc/vec-replace-word-runnable_2.c     |  50 ++++++
> >  6 files changed, 233 insertions(+), 109 deletions(-)
> >  rename gcc/testsuite/gcc.target/powerpc/{vec-replace-word-
> > runnable.c => vec-replace-word-runnable_1.c} (54%)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-replace-
> > word-runnable_2.c
> > 
> > diff --git a/gcc/config/rs6000/rs6000-builtins.def
> > b/gcc/config/rs6000/rs6000-builtins.def
> > index 5ac6af4c6e3..8206cc86014 100644
> > --- a/gcc/config/rs6000/rs6000-builtins.def
> > +++ b/gcc/config/rs6000/rs6000-builtins.def
> > @@ -3388,26 +3388,26 @@
> >    const vull __builtin_altivec_vpextd (vull, vull);
> >      VPEXTD vpextd {}
> >  
> > -  const vuc __builtin_altivec_vreplace_un_uv2di (vull, unsigned
> > long long, \
> > -                                                 const int<4>);
> > -    VREPLACE_UN_UV2DI vreplace_un_v2di {}
> > +  const vuc __builtin_altivec_vreplace_un_udi (vuc, unsigned long
> > long, \
> > +                                               const int<4>);
> > +    VREPLACE_UN_UDI vreplace_un_di {}
> >  
> > -  const vuc __builtin_altivec_vreplace_un_uv4si (vui, unsigned
> > int, \
> > -                                                 const int<4>);
> > -    VREPLACE_UN_UV4SI vreplace_un_v4si {}
> > +  const vuc __builtin_altivec_vreplace_un_usi (vuc, unsigned int,
> > \
> > +                                               const int<4>);
> > +    VREPLACE_UN_USI vreplace_un_si {}
> >  
> > -  const vuc __builtin_altivec_vreplace_un_v2df (vd, double, const
> > int<4>);
> > -    VREPLACE_UN_V2DF vreplace_un_v2df {}
> > +  const vuc __builtin_altivec_vreplace_un_df (vuc, double, const
> > int<4>);
> > +    VREPLACE_UN_DF vreplace_un_df {}
> >  
> > -  const vuc __builtin_altivec_vreplace_un_v2di (vsll, signed long
> > long, \
> > +  const vuc __builtin_altivec_vreplace_un_di (vuc, signed long
> > long, \
> >                                                  const int<4>);
> > -    VREPLACE_UN_V2DI vreplace_un_v2di {}
> > +    VREPLACE_UN_DI vreplace_un_di {}
> >  
> > -  const vuc __builtin_altivec_vreplace_un_v4sf (vf, float, const
> > int<4>);
> > -    VREPLACE_UN_V4SF vreplace_un_v4sf {}
> > +  const vuc __builtin_altivec_vreplace_un_sf (vuc, float, const
> > int<4>);
> > +    VREPLACE_UN_SF vreplace_un_sf {}
> >  
> > -  const vuc __builtin_altivec_vreplace_un_v4si (vsi, signed int,
> > const int<4>);
> > -    VREPLACE_UN_V4SI vreplace_un_v4si {}
> > +  const vuc __builtin_altivec_vreplace_un_si (vuc, signed int,
> > const int<4>);
> > +    VREPLACE_UN_SI vreplace_un_si {}
> >  
> >    const vull __builtin_altivec_vreplace_uv2di (vull, unsigned long
> > long, \
> >                                                 const int<1>);
> > diff --git a/gcc/config/rs6000/rs6000-c.cc
> > b/gcc/config/rs6000/rs6000-c.cc
> > index 8555174d36e..7862ccb0c45 100644
> > --- a/gcc/config/rs6000/rs6000-c.cc
> > +++ b/gcc/config/rs6000/rs6000-c.cc
> > @@ -1674,13 +1674,18 @@ tree
> >  find_instance (bool *unsupported_builtin, ovlddata **instance,
> >  	       rs6000_gen_builtins instance_code,
> >  	       rs6000_gen_builtins fcode,
> > -	       tree *types, tree *args)
> > +	       tree *types, tree *args, int nargs)
> 
> I think we should extend this find_instance to cover any number of
> arguments
> (well, positive number :)), not only for the required three arguments
> here.
> 
> Could you do it with a separated patch instead?
> 
> I think we can go with the loop bounded with the given NARGS, check
> each
> type[i] is rs6000_builtin_type_compatible with the corresponding
> paramtype
> from TYPE_ARG_TYPES chains, if any of them fails, return
> error_mark_node.
> If everything goes well, further check rs6000_builtin_decl and 
> rs6000_builtin_is_supported, if either fails, set
> unsupported_builtin,
> otherwise call altivec_build_resolved_builtin.

Yup, I did a quick and dirty fix when working on the patch.  I didn't
go back and fix it generally.  

I updated the function with a general fix for narg as a separate fix.

> 
> >  {
> >    while (*instance && (*instance)->bifid != instance_code)
> >      *instance = (*instance)->next;
> >  
> >    ovlddata *inst = *instance;
> >    gcc_assert (inst != NULL);
> > +
> > +  /* Function only supports nargs equal to 2 or 3.  */
> > +  if (!(nargs == 2 || nargs == 3))
> > +    gcc_assert(0);
> 
> A side comment for the above change (this line isn't needed any more
> with
> the above requested change): it can be read better with
> "gcc_assert (nargs == 2 || nargs == 3);"

Not needed with the general fix.

> 
> > +
> >    /* It is possible for an instance to require a data type that
> > isn't
> >       defined on this target, in which case inst->fntype will be
> > NULL.  */
> >    if (!inst->fntype)
> > @@ -1688,15 +1693,25 @@ find_instance (bool *unsupported_builtin,
> > ovlddata **instance,
> >    tree fntype = rs6000_builtin_info[inst->bifid].fntype;
> >    tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
> >    tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES
> > (fntype)));
> > +  tree parmtype2;
> > +  int arg2_compatible = true;
> > +
> > +  if (nargs == 3)
> > +    {
> > +      parmtype2
> > +       = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (TYPE_ARG_TYPES
> > (fntype))));
> > +      arg2_compatible = rs6000_builtin_type_compatible (types[2],
> > parmtype2);
> > +    }
> >  
> >    if (rs6000_builtin_type_compatible (types[0], parmtype0)
> > -      && rs6000_builtin_type_compatible (types[1], parmtype1))
> > +      && rs6000_builtin_type_compatible (types[1], parmtype1)
> > +      && arg2_compatible)
> >      {
> >        if (rs6000_builtin_decl (inst->bifid, false) !=
> > error_mark_node
> >  	  && rs6000_builtin_is_supported (inst->bifid))
> >  	{
> >  	  tree ret_type = TREE_TYPE (inst->fntype);
> > -	  return altivec_build_resolved_builtin (args, 2, fntype,
> > ret_type,
> > +	  return altivec_build_resolved_builtin (args, nargs, fntype,
> > ret_type,
> >  						 inst->bifid, fcode);
> >  	}
> >        else
> > @@ -1916,7 +1931,7 @@ altivec_resolve_overloaded_builtin
> > (location_t loc, tree fndecl,
> >  	  instance_code = RS6000_BIF_CMPB_32;
> >  
> >  	tree call = find_instance (&unsupported_builtin, &instance,
> > -				   instance_code, fcode, types, args);
> > +				   instance_code, fcode, types, args,
> > nargs);
> >  	if (call != error_mark_node)
> >  	  return call;
> >  	break;
> > @@ -1949,7 +1964,44 @@ altivec_resolve_overloaded_builtin
> > (location_t loc, tree fndecl,
> >  	  }
> >  
> >  	tree call = find_instance (&unsupported_builtin, &instance,
> > -				   instance_code, fcode, types, args);
> > +				   instance_code, fcode, types, args,
> > nargs);
> > +	if (call != error_mark_node)
> > +	  return call;
> > +	break;
> > +      }
> > +
> > +    case RS6000_OVLD_VEC_REPLACE_UN:
> > +      {
> > +	machine_mode arg2_mode = TYPE_MODE (types[1]);
> > +
> > +	if (GET_MODE_SIZE(arg2_mode) == 4)
> > +	  {
> > +	    if (GET_MODE_CLASS(arg2_mode) == MODE_INT)
> > +	      /* Signed and unsigned are handled the same.  */
> > +	      instance_code = RS6000_BIF_VREPLACE_UN_USI;
> > +	    else if (GET_MODE_CLASS(arg2_mode) == MODE_FLOAT)
> > +	      instance_code = RS6000_BIF_VREPLACE_UN_SF;
> > +	    else
> > +	      /* Only have support for int and float.  */
> > +		gcc_assert (0);
> > +	  }
> > +	else if (GET_MODE_SIZE(arg2_mode) == 8)
> > +	  {
> > +	    if (GET_MODE_CLASS(arg2_mode) == MODE_INT)
> > +	      /* Signed and unsigned are handled the same.  */
> > +	      instance_code = RS6000_BIF_VREPLACE_UN_UDI;
> > +	    else if (GET_MODE_CLASS(arg2_mode) == MODE_FLOAT)
> > +	      instance_code = RS6000_BIF_VREPLACE_UN_DF;
> > +	    else
> > +	      /* Only have support for long long int and double.  */
> > +	      gcc_assert (0);
> > +	  }
> 
> I don't think putting a gcc_assert (0) (or gcc_unreachable()) for
> unexpected types is a correct thing to do here, we are looking for
> a matched overloaded bif instance here, it's likely that we can't
> find a matched, we should emit some error messages instead of ICE.
> 
> Could you restructure it like:
> 
>     if (arg2_mode == SImode)
>        ...
>     else if (arg2_mode == DImode)
>        ...
>     else if (arg2_mode == SFmode)
>        ...
>     else if (arg2_mode == DFmode)
>        ...
>     else
>        break;
> 
> The "break" will fall into the general unexpected cases handling.
> 

I like the suggested if then else a lot better.  Much cleaner then
looking at mode and length.  Changed to use the suggested approach.

> > +	else
> > +	  /* Only have support for word and double.   */
> > +	  gcc_assert (0);
> > +
> > +	tree call = find_instance (&unsupported_builtin, &instance,
> > +				   instance_code, fcode, types, args,
> > nargs);
> >  	if (call != error_mark_node)
> >  	  return call;
> >  	break;
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..2e33bc97a6c 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -3059,18 +3059,18 @@
> >      VREPLACE_ELT_V2DF
> >  
> >  [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
> > -  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
> > -    VREPLACE_UN_UV4SI
> > -  vuc __builtin_vec_replace_un (vsi, signed int, const int);
> > -    VREPLACE_UN_V4SI
> > -  vuc __builtin_vec_replace_un (vull, unsigned long long, const
> > int);
> > -    VREPLACE_UN_UV2DI
> > -  vuc __builtin_vec_replace_un (vsll, signed long long, const
> > int);
> > -    VREPLACE_UN_V2DI
> > -  vuc __builtin_vec_replace_un (vf, float, const int);
> > -    VREPLACE_UN_V4SF
> > -  vuc __builtin_vec_replace_un (vd, double, const int);
> > -    VREPLACE_UN_V2DF
> > +  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
> > +    VREPLACE_UN_USI
> > +  vuc __builtin_vec_replace_un (vuc, signed int, const int);
> > +    VREPLACE_UN_SI
> > +  vuc __builtin_vec_replace_un (vuc, unsigned long long, const
> > int);
> > +    VREPLACE_UN_UDI
> > +  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
> > +    VREPLACE_UN_DI
> > +  vuc __builtin_vec_replace_un (vuc, float, const int);
> > +    VREPLACE_UN_SF
> > +  vuc __builtin_vec_replace_un (vuc, double, const int);
> > +    VREPLACE_UN_DF
> >  
> >  [VEC_REVB, vec_revb, __builtin_vec_revb]
> >    vss __builtin_vec_revb (vss);
> > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> > index 0a34ceebeb5..50b65a18eac 100644
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -389,6 +389,13 @@ (define_mode_attr REPLACE_ELT_sh [(V4SI "2")
> > (V4SF "2")
> >  (define_mode_attr REPLACE_ELT_max [(V4SI "12") (V4SF "12")
> >  				   (V2DI  "8") (V2DF "8")])
> >  
> > +;; vector replace unaligned modes
> > +(define_mode_iterator VEC_RU [SI SF DI DF])
> > +(define_mode_attr VEC_RU_char [(SI "w")
> > +			       (SF "w")
> > +			       (DI "d")
> > +			       (DF "d")])
> 
> I think we can extend the existing REPLACE_ELT by adding the modes of
> VEC_RU,
> guard its current uses with VECTOR_MODE_P (<MODE>mode) and guard your
> below
> define_insn use with !VECTOR_MODE_P (<MODE>mode).
> 
> And extend REPLACE_ELT_char as well.

Extending REPLACE_ELT didn't work as the 
  define_insn "vreplace_elt_<mode> ..." 
has issues because VEC_base doesn't handle the new modes.  

Per private discussion, the following was done

Renamed: REPLACE_ELT  as  REPLACE_ELT_v for the iterator name and
associated define_mode_attrs.  This handles the vector cases.

Renamed VEC_RU to REPLACE_ELT for the iterator name and VEC_RU_char to
REPLACE_ELT_char.
> 
> > +
> >  ;; Like VM2 in altivec.md, just do char, short, int, long, float
> > and double
> >  (define_mode_iterator VM3 [V4SI
> >  			   V8HI
> > @@ -4191,12 +4198,12 @@ (define_insn "vreplace_elt_<mode>_inst"
> >  
> >  (define_insn "vreplace_un_<mode>"
> >   [(set (match_operand:V16QI 0 "register_operand" "=v")
> > -  (unspec:V16QI [(match_operand:REPLACE_ELT 1 "register_operand"
> > "0")
> > -                 (match_operand:<VEC_base> 2 "register_operand"
> > "r")
> > +  (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0")
> > +		 (match_operand:VEC_RU 2 "register_operand" "r")
> >  		 (match_operand:QI 3 "const_0_to_12_operand" "n")]
> >  		UNSPEC_REPLACE_UN))]
> >   "TARGET_POWER10"
> 
> ... with the above suggestion, this condition can be:
> "TARGET_POWER10 && !VECTOR_MODE_P (<MODE>mode)"

Didn't go down this path, see comment above.

> 
> > - "vins<REPLACE_ELT_char> %0,%2,%3"
> > + "vins<VEC_RU_char> %0,%2,%3"
> >   [(set_attr "type" "vecsimple")])
> >  
> >  ;; VSX_EXTRACT optimizations
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable_1.c
> > similarity index 54%
> > rename from gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable.c
> > rename to gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable_1.c
> > index 27318822871..f5d30a88067 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable_1.c
> > @@ -20,6 +20,9 @@ main (int argc, char *argv [])
> >    unsigned char ch;
> >    unsigned int index;
> >  
> > +  vector unsigned char src_va_uchar;
> > +  vector unsigned char expected_vresult_uchar;
> > +
> >    vector unsigned int vresult_uint;
> >    vector unsigned int expected_vresult_uint;
> >    vector unsigned int src_va_uint;
> > @@ -64,10 +67,10 @@ main (int argc, char *argv [])
> >  						 
> >    vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
> >  
> > -  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
> > +  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
> >  #if DEBUG
> >      printf("ERROR, vec_replace_elt (src_vb_uint, src_va_uint,
> > index)\n");
> > -    for(i = 0; i < 4; i++)
> > +    for (i = 0; i < 4; i++)
> >        printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] =
> > %d\n",
> >  	     i, vresult_uint[i], i, expected_vresult_uint[i]);
> >  #else
> > @@ -82,10 +85,10 @@ main (int argc, char *argv [])
> >  						 
> >    vresult_int = vec_replace_elt (src_va_int, src_a_int, 1);
> >  
> > -  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
> > +  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_elt (src_vb_int, src_va_int,
> > index)\n");
> > -    for(i = 0; i < 4; i++)
> > +    printf("ERROR, vec_replace_elt (src_vb_int, src_a_int,
> > index)\n");
> > +    for (i = 0; i < 4; i++)
> >        printf(" vresult_int[%d] = %d, expected_vresult_int[%d] =
> > %d\n",
> >  	     i, vresult_int[i], i, expected_vresult_int[i]);
> >  #else
> > @@ -100,10 +103,10 @@ main (int argc, char *argv [])
> >  						 
> >    vresult_float = vec_replace_elt (src_va_float, src_a_float, 1);
> >  
> > -  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
> > +  if (!vec_all_eq (vresult_float, expected_vresult_float)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_elt (src_vb_float, src_va_float,
> > index)\n");
> > -    for(i = 0; i < 4; i++)
> > +    printf("ERROR, vec_replace_elt (src_vb_float, src_a_float,
> > index)\n");
> > +    for (i = 0; i < 4; i++)
> >        printf(" vresult_float[%d] = %f, expected_vresult_float[%d]
> > = %f\n",
> >  	     i, vresult_float[i], i, expected_vresult_float[i]);
> >  #else
> > @@ -121,8 +124,8 @@ main (int argc, char *argv [])
> >  
> >    if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_elt (src_vb_ullint, src_va_ullint,
> > index)\n");
> > -    for(i = 0; i < 2; i++)
> > +    printf("ERROR, vec_replace_elt (src_vb_ullint, src_a_ullint,
> > index)\n");
> > +    for (i = 0; i < 2; i++)
> >        printf(" vresult_ullint[%d] = %d,
> > expected_vresult_ullint[%d] = %d\n",
> >  	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
> >  #else
> > @@ -137,10 +140,10 @@ main (int argc, char *argv [])
> >  						 
> >    vresult_llint = vec_replace_elt (src_va_llint, src_a_llint, 1);
> >  
> > -  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
> > +  if (!vec_all_eq (vresult_llint, expected_vresult_llint)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_elt (src_vb_llint, src_va_llint,
> > index)\n");
> > -    for(i = 0; i < 2; i++)
> > +    printf("ERROR, vec_replace_elt (src_vb_llint, src_a_llint,
> > index)\n");
> > +    for (i = 0; i < 2; i++)
> >        printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d]
> > = %d\n",
> >  	     i, vresult_llint[i], i, expected_vresult_llint[i]);
> >  #else
> > @@ -155,10 +158,10 @@ main (int argc, char *argv [])
> >  						 
> >    vresult_double = vec_replace_elt (src_va_double, src_a_double,
> > 1);
> >  
> > -  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
> > +  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_elt (src_vb_double, src_va_double,
> > index)\n");
> > -    for(i = 0; i < 2; i++)
> > +    printf("ERROR, vec_replace_elt (src_vb_double, src_a_double,
> > index)\n");
> > +    for (i = 0; i < 2; i++)
> >        printf(" vresult_double[%d] = %f,
> > expected_vresult_double[%d] = %f\n",
> >  	     i, vresult_double[i], i, expected_vresult_double[i]);
> >  #else
> > @@ -169,60 +172,68 @@ main (int argc, char *argv [])
> >  
> >    /* Vector replace 32-bit element, unaligned */
> >    src_a_uint = 345;
> > -  src_va_uint = (vector unsigned int) { 1, 2, 0, 0 };
> > -  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
> > +  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
> > +					  0, 0, 0, 0, 0, 0, 0, 0 };
> > +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> > +					   0, 0, 0, 0, 0, 0, 0, 0 };
> >    /* Byte index 7 will overwrite part of elements 2 and 3 */
> > -  expected_vresult_uint = (vector unsigned int) { 1, 2, 345*256, 0
> > };
> > +  expected_vresult_uchar
> > +    = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
> > +			       0, 0x59, 0x1, 0, 0, 0, 0, 0 };
> >  						 
> > -  vresult_uchar = vec_replace_unaligned (src_va_uint, src_a_uint,
> > 3);
> > -  vresult_uint = (vector unsigned int) vresult_uchar;
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_uint,
> > 3);
> >  
> > -  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
> > +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_uint,
> > src_va_uint, index)\n");
> > -    for(i = 0; i < 4; i++)
> > -      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] =
> > %d\n",
> > -	     i, vresult_uint[i], i, expected_vresult_uint[i]);
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
> > src_a_uint, index)\n");
> > +    for (i = 0; i < 16; i++)
> > +      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uint[%d]
> > = 0x%x\n",
> > +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> >  #else
> >      abort();
> >  #endif
> >    }
> >  
> >    src_a_int = 234;
> > -  src_va_int = (vector int) { 1, 0, 3, 4 };
> > -  vresult_int = (vector int) { 0, 0, 0, 0 };
> > +  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 0, 0, 0, 0,
> > +					  3, 0, 0, 0, 4, 0, 0, 0 };
> > +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> > +					   0, 0, 0, 0, 0, 0, 0, 0 };
> >    /* Byte index 7 will over write part of elements 1 and 2 */
> > -  expected_vresult_int = (vector int) { 1, 234*256, 0, 4 };
> > -						 
> > -  vresult_uchar = vec_replace_unaligned (src_va_int, src_a_int,
> > 7);
> > -  vresult_int = (vector signed int) vresult_uchar;
> > +  expected_vresult_uchar = (vector unsigned char) { 1, 0, 0, 0, 0,
> > 0xea, 0, 0,
> > +						    0, 0, 0, 0, 4, 0,
> > 0, 0 };
> >  
> > -  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_int,
> > 7);
> > +
> > +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_int, src_va_int,
> > index)\n");
> > -    for(i = 0; i < 4; i++)
> > -      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] =
> > %d\n",
> > -	     i, vresult_int[i], i, expected_vresult_int[i]);
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_a_int,
> > index)\n");
> > +    for (i = 0; i < 16; i++)
> > +      printf(" vresult_int[%d] = 0x%x, expected_vresult_int[%d] =
> > 0x%x\n",
> > +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> >  #else
> >      abort();
> >  #endif
> >    }
> >  
> >    src_a_float = 34.0;
> > -  src_va_float = (vector float) { 0.0, 10.0, 20.0, 30.0 };
> > -  vresult_float = (vector float) { 0.0, 0.0, 0.0, 0.0 };
> > -  expected_vresult_float = (vector float) { 0.0, 34.0, 20.0, 30.0
> > };
> > +  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0x41,
> > 0xa0,
> > +					  5, 6, 7, 8, 0x41, 0xf0, 0,
> > 0};
> > +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> > +					   0, 0, 0, 0, 0, 0, 0, 0 };
> > +  expected_vresult_uchar
> > +    = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 8, 0x42,
> > +			       5, 6, 7, 8, 0x41, 0xf0, 0, 0 };
> >  						 
> > -  vresult_uchar = vec_replace_unaligned (src_va_float,
> > src_a_float, 8);
> > -  vresult_float = (vector float) vresult_uchar;
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
> > src_a_float, 8);
> >  
> > -  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
> > +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_float,
> > src_va_float, "
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
> > src_a_float, "
> >  	   "index)\n");
> > -    for(i = 0; i < 4; i++)
> > -      printf(" vresult_float[%d] = %f, expected_vresult_float[%d]
> > = %f\n",
> > -	     i, vresult_float[i], i, expected_vresult_float[i]);
> > +    for (i = 0; i < 16; i++)
> > +      printf(" vresult_uchar[%d] = 0x%x,
> > expected_vresult_uchar[%d] = 0x%x\n",
> > +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> >  #else
> >      abort();
> >  #endif
> > @@ -230,72 +241,76 @@ main (int argc, char *argv [])
> >  
> >    /* Vector replace 64-bit element, unaligned  */
> >    src_a_ullint = 456;
> > -  src_va_ullint = (vector unsigned long long int) { 0, 0x222 };
> > -  vresult_ullint = (vector unsigned long long int) { 0, 0 };
> > -  expected_vresult_ullint = (vector unsigned long long int) {
> > 456*256,
> > -							      0x200 };
> > +  src_va_uchar = (vector unsigned char) { 0, 0xc, 0x1, 0, 0, 0, 0,
> > 0,
> > +					  0x22, 0x2, 0, 0, 0, 0, 0, 0
> > };
> > +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> > +					   0, 0, 0, 0, 0, 0, 0, 0 };
> > +  expected_vresult_uchar
> > +    = (vector unsigned char) { 0, 0xc8, 0x1, 0, 0, 0, 0, 0,
> > +			       0, 2, 0, 0, 0, 0, 0, 0 };
> >  						 
> >    /* Byte index 7 will over write least significant byte
> > of  element 0  */
> > -  vresult_uchar = vec_replace_unaligned (src_va_ullint,
> > src_a_ullint, 7);
> > -  vresult_ullint = (vector unsigned long long) vresult_uchar;
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
> > src_a_ullint, 7);
> >  
> > -  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
> > +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_ullint,
> > src_va_ullint, "
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
> > src_a_ullint, "
> >  	   "index)\n");
> > -    for(i = 0; i < 2; i++)
> > -      printf(" vresult_ullint[%d] = %d,
> > expected_vresult_ullint[%d] = %d\n",
> > -	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
> > +    for (i = 0; i < 15; i++)
> > +      printf(" vresult_uchar[%d] = 0x%x,
> > expected_vresult_uchar[%d] = 0x%x\n",
> > +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> >  #else
> >      abort();
> >  #endif
> >    }
> >  
> >    src_a_llint = 678;
> > -  src_va_llint = (vector long long int) { 0, 0x101 };
> > +  src_va_uchar = (vector unsigned char) { 0, 0xa6, 0x2, 0, 0, 0,
> > 0, 0,
> > +					  0x0, 0x1, 0, 0, 0, 0, 0, 0 };
> >    vresult_llint = (vector long long int) { 0, 0 };
> >    /* Byte index 7 will over write least significant byte
> > of  element 0  */
> > -  expected_vresult_llint = (vector long long int) { 678*256, 0x100
> > };
> > +  expected_vresult_uchar
> > +    = (vector unsigned char) { 0x0, 0xa6, 0x2, 0x0, 0x0, 0x0, 0x0,
> > 0x0,
> > +			       0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0
> > };
> >  						 
> > -  vresult_uchar = vec_replace_unaligned (src_va_llint,
> > src_a_llint, 7);
> > -  vresult_llint = (vector signed long long) vresult_uchar;
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
> > src_a_llint, 7);
> >  
> > -  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
> > +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_llint,
> > src_va_llint, "
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
> > src_a_llint, "
> >  	   "index)\n");
> > -    for(i = 0; i < 2; i++)
> > -      printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d]
> > = %d\n",
> > -	     i, vresult_llint[i], i, expected_vresult_llint[i]);
> > +    for (i = 0; i < 16; i++)
> > +      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d]
> > = %d\n",
> > +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> >  #else
> >      abort();
> >  #endif
> >    }
> >    
> >    src_a_double = 678.0;
> > -  src_va_double = (vector double) { 0.0, 50.0 };
> > +  //  src_va_double = (vector double) { 0.0, 50.0 };
> > +  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> > +					  0, 0, 0, 0, 0, 0, 0, 0 };
> >    vresult_double = (vector double) { 0.0, 0.0 };
> >    expected_vresult_double = (vector double) { 0.0, 678.0 };
> >  						 
> > -  vresult_uchar = vec_replace_unaligned (src_va_double,
> > src_a_double, 0);
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
> > src_a_double, 0);
> >    vresult_double = (vector double) vresult_uchar;
> >  
> > -  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {>
> > +  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
> 
> Nit: The used vector type for vresult* isn't consistent with the
> others.
> Since the underlying data matters but type doesn't, it's fine.  Just
> feel
> weird float and double use different ways. I'd expect either both
> adopt
> floating point vector type, or both use uchar vector type. :) 

OK, fixed the test to use vresult_uchar like the rest of the tests.
> 
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_double,
> > src_va_double, "
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
> > src_a_double, "
> >  	   "index)\n");
> > -    for(i = 0; i < 2; i++)
> > +    for (i = 0; i < 2; i++)
> >        printf(" vresult_double[%d] = %f,
> > expected_vresult_double[%d] = %f\n",
> >  	     i, vresult_double[i], i, expected_vresult_double[i]);
> >  #else
> >      abort();
> >  #endif
> >    }
> > -    
> > +  
> >    return 0;
> >  }
> >  
> >  /* { dg-final { scan-assembler-times {\mvinsw\M} 6 } } */
> >  /* { dg-final { scan-assembler-times {\mvinsd\M} 6 } } */
> > -
> > -
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable_2.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable_2.c
> > new file mode 100644
> > index 00000000000..806ff3317e6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable_2.c
> > @@ -0,0 +1,50 @@
> > +/* { dg-do run { target { power10_hw } } } */
> > +/* { dg-do link { target { ! power10_hw } } } */
> 
> Why uses link?  compile would be enough?

The original test had it.  It was copied from the original test to the
new test when I created the new test.  I don't see any additional value
here or for that matter in vec-replace-word-runnable_1.c.  

Also removed the link statement from vec-replace-word-runnable_1.c.

> 
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -save-temps -flax-vector-
> > conversions" } */
> 
> IMHO we don't need "-flax-vector-conversions" here, see below ...

OK
> 
> > +
> > +#include <altivec.h>
> > +
> > +#define DEBUG 0
> > +
> > +#if DEBUG
> > +#include <stdio.h>
> > +#endif
> > +
> > +extern void abort (void);
> > +
> > +int
> > +main (int argc, char *argv [])
> > +{
> > +  int i;
> > +  vector unsigned long long int vresult_ullint;
> > +  vector unsigned long long int expected_vresult_ullint;
> > +  vector unsigned long long int src_va_ullint;
> > +  unsigned int long long src_a_ullint;
> > +
> > +  /* Replace doubleword size chunk specified as a constant that
> > can be
> > +     represented by an int.  Should generate a vinsd
> > instruction.  Note, this
> > +     test requires command line option -flax-vector-
> > conversions.  */
> > +  src_a_ullint = 456;
> > +  src_va_ullint = (vector unsigned long long int) { 0, 11 };
> > +  vresult_ullint = (vector unsigned long long int) { 0, 2 };
> > +  expected_vresult_ullint = (vector unsigned long long int) { 0,
> > 456 };
> > +						 
> > +  vresult_ullint = vec_replace_unaligned ((vector unsigned
> > char)src_va_ullint,
> > +					  src_a_ullint, 0);
> 
> ... one explicit conversion on the return value of
> vec_replace_unaligned to
> "vector unsigned long long int" can get rid of "-flax-vector-
> conversions".

OK, put the explicit conversion in, removed -flax-vector-conversions.

I guess we could put all the tests back together in a single file if
you prefer.  For the moment, I left it as two separate test files since
this is a "new" test as opposed to a fix for existing tests in vec-
replace-word-runnable_1.c.  

> 
> I also noticed the expected result would be wrong on BE.
> 
> // assuming that [A, B] is the register view, A & B are doubleword.
> 
> On LE, src register looks like [11, 0], it will become to [456, 0]
> after vinsd.
> 
> On BE, src register looks like [0, 11], it will become to [456, 11]
> after vinsd.
> 
> So the expected results should be {0, 456} on LE while {456, 11} on
> BE.
> 
> I think the existing vec-replace-word-runnable_1.c doesn't have the
> correct expected
> results for BE either, it isn't exposed as we don't have P10 BE test
> coverage for
> running cases for now.  We can have a separated patch to fix them
> together later.

Yea, didn't worry about BE results but agree they will be different. 
For now, leaving the BE fix for future work if we every support BE on
P10 or beyond.

                    Carl 


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-07-17 18:33 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-07 20:18 [PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments Carl Love
2023-07-13  9:41 ` Kewen.Lin
2023-07-17 18:33   ` Carl Love

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).