[PATCH] rs6000, vector integer multiply/divide/modulo instructions

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] rs6000, vector integer multiply/divide/modulo instructions
@ 2020-10-30 20:07 Carl Love
  2020-10-30 21:05 ` David Edelsohn
  2020-10-31 13:28 ` David Edelsohn
  0 siblings, 2 replies; 22+ messages in thread
From: Carl Love @ 2020-10-30 20:07 UTC (permalink / raw)
  To: dje.gcc, gcc-patches, Will Schmidt, Segher Boessenkool

GCC maintainers:

The following patch adds new builtins for the vector integer multiply,
divide and modulo operations.  The builtins are:  
vec_mulh(), vec_div(), vec_dive(), vec_mod() for signed and unsigned
integers and long long integers.  Support for signed and unsigned long
long integers the exiting vec_mul() is added.  Note that the existing
support for the vec_div()and vec_mul() builtins emulate the vector
operations with multiple scalar instructions.  This patch adds support
for these builtins to use the new vector instructions.

The patch was compiled and tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

                Carl Love

-------------------------------------------------------------

2020-10-30  Carl Love  <cel@us.ibm.com>

gcc/
	* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
	defines.
	* config/rs6000/altivec.md (VIlong): Move define to file vector.md.
	* config/rs6000/rs6000-builtin.def (VDIVES_V4SI, VDIVES_V2DI,
	VDIVEU_V4SI, VDIVEU_V2DI, VDIVS_V4SI, VDIVS_V2DI, VDIVU_V4SI,
	VDIVU_V2DI, VMODS_V2DI, VMODS_V4SI, VMODU_V2DI, VMODU_V4SI, VMULHS_V2DI,
	VMULHS_V4SI, VMULHU_V2DI, VMULHU_V4SI, VMULLD_V2DI): Add builtin define.
	(VMUL, VMULH, VDIVE, VMOD):  Add new BU_P10_OVERLOAD_2 definitions.
	* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV, P10_BUILTIN_VEC_VDIVE,
	P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH): New overloaded definitions.
	(builtin_function_type)
	[P10V_BUILTIN_VDIVEU_V4SI, P10V_BUILTIN_VDIVEU_V2DI,
	P10V_BUILTIN_VDIVU_V4SI, P10V_BUILTIN_VDIVU_V2DI,
	P10V_BUILTIN_VMODU_V2DI, P10V_BUILTIN_VMODU_V4SI, P10V_BUILTIN_VMULHU_V2DI,
	P10V_BUILTIN_VMULHU_V4SI, P10V_BUILTIN_VMULLD_V2DI]: Add case statement
	for builtins.
	* config/rs6000/vector.md (UNSPEC_VDIVES, UNSPEC_VDIVEU, UNSPEC_VMULHS,
	UNSPEC_VMULHU, UNSPEC_VMULLD): Add enum for UNSPECs.
	(VIlong_char): Add define_mod_attribute.
	(vdives_<mode>, vdiveu_<mode>, vdiv<mode>3, uuvdiv<mode>3, vdivs_<mode>,
	vdivu_<mode>, vmods_<mode>, vmodu_<mode>, vmulhs_<mode>, vmulhu_<mode>,
	mulv2di3): Add define_insn, mode is VIlong.
	config/rs6000/vsx.md (vsx_mul_v2di, vsx_udiv_v2di): Add if (TARGET_POWER10)
	statement.
	* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
	builtin descriptions.

gcc/testsuite/
	* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |   5 +
 gcc/config/rs6000/altivec.md                  |   2 -
 gcc/config/rs6000/rs6000-builtin.def          |  23 ++
 gcc/config/rs6000/rs6000-call.c               |  49 +++
 gcc/config/rs6000/vector.md                   | 104 +++++
 gcc/config/rs6000/vsx.md                      | 118 +++---
 gcc/doc/extend.texi                           | 120 ++++++
 .../powerpc/builtins-1-p10-runnable.c         | 378 ++++++++++++++++++
 8 files changed, 747 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index df10a8c498d..b2803e52d93 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -725,6 +725,11 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a)	__builtin_vec_strir_p (a)
 #define vec_stril_p(a)	__builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh (a, b)
+#define vec_div(a, b) __builtin_vec_div (a, b)
+#define vec_dive(a, b) __builtin_vec_dive (a, b)
+#define vec_mod(a, b) __builtin_vec_mod (a, b)
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 0a2e634d6b0..8e80c681b11 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -192,8 +192,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 5b05da87f4b..706527dcd3a 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2830,6 +2830,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
 BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 
+BU_P10V_AV_2 (VDIVES_V4SI, "vdivesw", CONST, vdives_v4si)
+BU_P10V_AV_2 (VDIVES_V2DI, "vdivesd", CONST, vdives_v2di)
+BU_P10V_AV_2 (VDIVEU_V4SI, "vdiveuw", CONST, vdiveu_v4si)
+BU_P10V_AV_2 (VDIVEU_V2DI, "vdiveud", CONST, vdiveu_v2di)
+BU_P10V_AV_2 (VDIVS_V4SI, "vdivsw", CONST, vdivs_v4si)
+BU_P10V_AV_2 (VDIVS_V2DI, "vdivsd", CONST, vdivs_v2di)
+BU_P10V_AV_2 (VDIVU_V4SI, "vdivuw", CONST, vdivu_v4si)
+BU_P10V_AV_2 (VDIVU_V2DI, "vdivud", CONST, vdivu_v2di)
+BU_P10V_AV_2 (VMODS_V2DI, "vmodsd", CONST, vmods_v2di)
+BU_P10V_AV_2 (VMODS_V4SI, "vmodsw", CONST, vmods_v4si)
+BU_P10V_AV_2 (VMODU_V2DI, "vmodud", CONST, vmodu_v2di)
+BU_P10V_AV_2 (VMODU_V4SI, "vmoduw", CONST, vmodu_v4si)
+BU_P10V_AV_2 (VMULHS_V2DI, "vmulhsd", CONST, vmulhs_v2di)
+BU_P10V_AV_2 (VMULHS_V4SI, "vmulhsw", CONST, vmulhs_v4si)
+BU_P10V_AV_2 (VMULHU_V2DI, "vmulhud", CONST, vmulhu_v2di)
+BU_P10V_AV_2 (VMULHU_V4SI, "vmulhuw", CONST, vmulhu_v4si)
+BU_P10V_AV_2 (VMULLD_V2DI, "vmulld", CONST, mulv2di3)
+
 BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si)
 BU_P10V_VSX_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, xxspltiw_v4sf)
 
@@ -2905,6 +2923,11 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
 BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
 BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
 
+BU_P10_OVERLOAD_2 (VMUL, "mul")
+BU_P10_OVERLOAD_2 (VMULH, "mulh")
+BU_P10_OVERLOAD_2 (VDIVE, "dive")
+BU_P10_OVERLOAD_2 (VMOD, "mod")
+
 \f
 BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
 BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b044778a7ae..be2e6c56632 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -993,6 +993,35 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_VDIVS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_VDIVU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+
+  { P10_BUILTIN_VEC_VDIVE, P10V_BUILTIN_VDIVES_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_VDIVE, P10V_BUILTIN_VDIVEU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_VDIVE, P10V_BUILTIN_VDIVES_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_VDIVE, P10V_BUILTIN_VDIVEU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P10_BUILTIN_VEC_VMOD, P10V_BUILTIN_VMODS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_VMOD, P10V_BUILTIN_VMODU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_VMOD, P10V_BUILTIN_VMODS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_VMOD, P10V_BUILTIN_VMODU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
@@ -1833,6 +1862,17 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P10_BUILTIN_VEC_VMULH, P10V_BUILTIN_VMULHS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_VMULH, P10V_BUILTIN_VMULHU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_VMULH, P10V_BUILTIN_VMULHS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_VMULH, P10V_BUILTIN_VMULHU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
@@ -14325,6 +14365,15 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10V_BUILTIN_XXGENPCVM_V8HI:
     case P10V_BUILTIN_XXGENPCVM_V4SI:
     case P10V_BUILTIN_XXGENPCVM_V2DI:
+    case P10V_BUILTIN_VDIVEU_V4SI:
+    case P10V_BUILTIN_VDIVEU_V2DI:
+    case P10V_BUILTIN_VDIVU_V4SI:
+    case P10V_BUILTIN_VDIVU_V2DI:
+    case P10V_BUILTIN_VMODU_V2DI:
+    case P10V_BUILTIN_VMODU_V4SI:
+    case P10V_BUILTIN_VMULHU_V2DI:
+    case P10V_BUILTIN_VMULHU_V4SI:
+    case P10V_BUILTIN_VMULLD_V2DI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 796345c80d3..670feb4e04e 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -22,6 +22,12 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
+(define_c_enum "unspec"
+  [UNSPEC_VDIVES
+   UNSPEC_VDIVEU
+   UNSPEC_VMULHS
+   UNSPEC_VMULHU
+])
 
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
@@ -64,6 +70,11 @@
 ;; Vector integer modes
 (define_mode_iterator VI [V4SI V8HI V16QI])
 
+;; Longer vec int moeds for Vector Integer Multiply/Divide/Modulo Instructions
+(define_mode_iterator VIlong [V2DI V4SI])
+(define_mode_attr VIlong_char [(V2DI "d")
+			       (V4SI "w")])
+
 ;; Base type from vector mode
 (define_mode_attr VEC_base [(V16QI "QI")
 			    (V8HI  "HI")
@@ -1521,3 +1532,96 @@
     emit_insn (gen_vsx_extract_<VEC_F:mode> (operand0, vec, elt));
     DONE;
   })
+
+(define_insn "vdives_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
+		       UNSPEC_VDIVES))]
+  "TARGET_POWER10"
+  "vdives<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vdiveu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+			 (match_operand:VIlong 2 "vsx_register_operand" "v")]
+			UNSPEC_VDIVEU))]
+  "TARGET_POWER10"
+  "vdiveu<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "div<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivs<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "udiv<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivu<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vdivs_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivs<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vdivu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivu<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vmods_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmods<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vmodu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmodu<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vmulhs_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
+		       UNSPEC_VMULHS))]
+  "TARGET_POWER10"
+  "vmulhs<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vmulhu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
+		       UNSPEC_VMULHU))]
+  "TARGET_POWER10"
+  "vmulhu<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+;; Vector multiply low double word
+(define_insn "mulv2di3"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
+	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
+		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmulld %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index d6347dba149..2b04f19842d 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1623,28 +1623,35 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op5, op3, op4));
-  else
-    {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op5, ret);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op3, op3, op4));
+
+  if (TARGET_POWER10)
+    emit_insn (gen_mulv2di3 (op0, op1, op2) );
+
   else
     {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op3, ret);
+      rtx op3 = gen_reg_rtx (DImode);
+      rtx op4 = gen_reg_rtx (DImode);
+      rtx op5 = gen_reg_rtx (DImode);
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op5, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op5, ret);
+	}
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op3, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op3, ret);
+	}
+      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
     }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }
   [(set_attr "type" "mul")])
@@ -1674,6 +1681,7 @@
   rtx op3 = gen_reg_rtx (DImode);
   rtx op4 = gen_reg_rtx (DImode);
   rtx op5 = gen_reg_rtx (DImode);
+
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
   if (TARGET_POWERPC64)
@@ -1718,37 +1726,47 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op5, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op5, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op5, target);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op3, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op3, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op3, target);
-    }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
-  DONE;
+
+    if (TARGET_POWER10)
+      emit_insn (gen_udivv2di3 (op0, op1, op2) );
+
+    else
+      {
+	rtx op3 = gen_reg_rtx (DImode);
+	rtx op4 = gen_reg_rtx (DImode);
+	rtx op5 = gen_reg_rtx (DImode);
+
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op5, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op5, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op5, target);
+	  }
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op3, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op3, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op3, target);
+	  }
+	emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
+      }
+    DONE;
 }
   [(set_attr "type" "div")])
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5be1cbecf60..7b4293c3db7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21356,6 +21356,126 @@ integer value between 0 and 255 inclusive.
 @exdent vector unsigned int vec_genpcvm (vector unsigned long long int,
                                          const int)
 @end smallexample
+
+Vector Integer Multiply-Divide-Modulo
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mulh (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mulh (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value i from 0 to 3, do the following. The integer value in
+word element i of a is multiplied by the integer value in word
+element i of b. The high-order 32 bits of the 64-bit product are placed into
+word element i of the vector returned.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mulh (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mulh (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value i from 0 to 1, do the following. The integer value in
+doubleword element i of a is multiplied by the integer value in doubleword
+element i of b. The high-order 64 bits of the 128-bit product are placed into
+doubleword element i of the vector returned.
+
+@smallexample
+@exdent vector unsigned long long
+@exdent vec_mul (vector unsigned long long a, vector unsigned long long b)
+@exdent vector signed long long
+@exdent vec_mul (vector signed long long a, vector signed long long b)
+@end smallexample
+
+For each integer value i from 0 to 1, do the following. The integer value in
+doubleword element i of a is multiplied by the integer value in
+doubleword element i of b. The low-order 64 bits of the 128-bit product
+are placed into doubleword element i of the vector returned.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_div (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_div (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value i from 0 to 3, do the following. The integer in word
+element i of a is divided by the integer in word element i of b. The unique
+integer quotient is placed into the word element i of the vector returned. If
+an attempt is made to perform any of the divisions <anything> ÷ 0 then the
+quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_div (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_div (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value i from 0 to 1, do the following. The integer in
+doubleword element i of a is divided by the integer in doubleword
+element i of b. The unique integer quotient is placed into the
+doubleword element i of the vector returned. If an attempt is made to perform
+any of the divisions 0x8000_0000_0000_0000 ÷ -1 or <anything> ÷ 0 then the
+quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_dive (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_dive (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value i from 0 to 3, do the following. The integer in word
+element i of a is shifted left by 32 bits, then divided by the integer
+in word element i of b. The unique integer quotient is placed into the
+word element i of the vector returned. If the quotient cannot be represented
+in 32 bits, or if an attempt is made to perform any of the divisions
+<anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_dive (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_dive (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value i from 0 to 1, do the following. The integer in
+doubleword element i of a is shifted left by 64 bits, then divided by the
+integer in doubleword element i of b. The unique integer quotient is placed
+into the doubleword element i of the vector returned. If the quotient cannot
+be represented in 64 bits, or if an attempt is made to perform <anything> ÷ 0
+then the quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mod (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mod (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value i from 0 to 3, do the following. The integer in word
+element i of a is divided by the integer in word element i of b. The unique
+integer remainder is placed into the word element i of the vector returned.
+If an attempt is made to perform any of the divisions 0x8000_0000 ÷ -1 or
+<anything> ÷ 0 then the remainder is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mod (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mod (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value i from 0 to 1, do the following. The integer in
+doubleword element i of a is divided by the integer in doubleword element i of
+b. The unique integer remainder is placed into the doubleword
+element i of the vector returned. If an attempt is made to perform
+<anything> ÷ 0 then the remainder is undefined.
+
 Generate PCV from specified Mask size, as if implemented by the
 @code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where
 immediate value is either 0, 1, 2 or 3.
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
new file mode 100644
index 00000000000..549bc742d12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -0,0 +1,378 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+/* { dg-final { scan-assembler-times "\mvdivsw\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvdivuw\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvdivsd\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvdivud\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvdivesw\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvdiveuw\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvdivesd\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvdiveud\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvmodsw\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvmoduw\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvmodsd\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvmodud\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvmulhsw\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvmulhuw\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvmulhsd\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvmulhud\M" 1 } } */
+/* { dg-final { scan-assembler-times "\mvmulld\M" 2 } } */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <math.h>
+#include <altivec.h>
+
+#define DEBUG 0
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+void abort (void);
+
+int main()
+  {
+    int i;
+    vector int i_arg1, i_arg2;
+    vector unsigned int u_arg1, u_arg2;
+    vector long long int d_arg1, d_arg2;
+    vector long long unsigned int ud_arg1, ud_arg2;
+   
+    vector int vec_i_expected, vec_i_result;
+    vector unsigned int vec_u_expected, vec_u_result;
+    vector long long int vec_d_expected, vec_d_result;
+    vector long long unsigned int vec_ud_expected, vec_ud_result;
+  
+    /* Signed word divide */
+    i_arg1 = (vector int){ 20, 40, 60, 80};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){10, 20, 30, 40};
+
+    vec_i_result = vec_div (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word divide */
+    u_arg1 = (vector unsigned int){ 20, 40, 60, 80};
+    u_arg2 = (vector unsigned int){ 2, 2, 2, 2};
+    vec_u_expected = (vector unsigned int){10, 20, 30, 40};
+
+    vec_u_result = vec_div (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word divide */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 2, 2};
+    vec_d_expected = (vector long long){12, 34};
+
+    vec_d_result = vec_div (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+	  printf("ERROR vec_div result[%d] = %d != expected[%d] = %d\n",
+		 i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word divide */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 2, 2};
+    vec_ud_expected = (vector unsigned long long){12, 34};
+
+    vec_ud_result = vec_div (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended signed word  result = (arg1 << 32)/arg2 */
+    i_arg1 = (vector int){ 2, 4, 6, 8};
+    i_arg2 = (vector int){ 2048, 2048, 2048, 2048};
+    vec_i_expected = (vector int){4194304, 8388608, 12582912, 16777216};
+
+    vec_i_result = vec_dive (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended unsigned word  result = (arg1 << 32)/arg2 */
+    u_arg1 = (vector unsigned int){ 2, 4, 6, 8};
+    u_arg2 = (vector unsigned int){ 2048, 2048, 2048, 2048};
+    vec_u_expected = (vector unsigned int){4194304, 8388608,
+					   12582912, 16777216};
+
+    vec_u_result = vec_dive (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double signed  esult = (arg1 << 64)/arg2 */
+    d_arg1 = (vector long long int){ 2, 4};
+    d_arg2 = (vector long long int){ 4294967296, 4294967296};
+
+    vec_d_expected = (vector long long int){8589934592, 17179869184};
+
+    vec_d_result = vec_dive (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive result[%d] = %lld != expected[%d] = %lld\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double unsigned result = (arg1 << 64)/arg2 */
+    ud_arg1 = (vector long long unsigned int){ 2, 4};
+    ud_arg2 = (vector long long unsigned int){ 4294967296, 4294967296};
+
+    vec_ud_expected = (vector long long unsigned int){8589934592,
+						      17179869184};
+
+    vec_ud_result = vec_dive (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive result[%d] = %lld != expected[%d] = %lld\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word modulo */
+    i_arg1 = (vector int){ 23, 45, 61, 89};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){1, 1, 1, 1};
+
+    vec_i_result = vec_mod (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word modulo */
+    u_arg1 = (vector unsigned int){ 25, 41, 67, 86};
+    u_arg2 = (vector unsigned int){ 3, 3, 3, 3};
+    vec_u_expected = (vector unsigned int){1, 2, 1, 2};
+
+    vec_u_result = vec_mod (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word modulo */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 7, 7};
+    vec_d_expected = (vector long long){3, 5};
+
+    vec_d_result = vec_mod (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word modulo */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 8, 8};
+    vec_ud_expected = (vector unsigned long long){0, 4};
+
+    vec_ud_result = vec_mod (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vecmod result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word multiply high */
+    i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
+    i_arg2 = (vector int){ 2, 3, 4, 5};
+    vec_i_expected = (vector int){-1, -2, -2, -3};
+
+    vec_i_result = vec_mulh (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word multiply high */
+    u_arg1 = (vector unsigned int){ 2147483648, 2147483648,
+				    2147483648, 2147483648 };
+    u_arg2 = (vector unsigned int){ 4, 5, 6, 7 };
+    vec_u_expected = (vector unsigned int){2, 2, 3, 3 };
+
+    vec_u_result = vec_mulh (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply high */
+    d_arg1 = (vector long long int){  2305843009213693951,
+				      4611686018427387903 };
+    d_arg2 = (vector long long int){ 12, 20 };
+    vec_d_expected = (vector long long int){ 1, 4 };
+
+    vec_d_result = vec_mulh (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply high */
+    ud_arg1 = (vector unsigned long long int){ 2305843009213693951,
+					       4611686018427387903 };
+    ud_arg2 = (vector unsigned long long int){ 32, 10 };
+    vec_ud_expected = (vector unsigned long long int){ 3, 2 };
+
+    vec_ud_result = vec_mulh (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply low */
+    ud_arg1 = (vector unsigned long long int){ 2048, 4096 };
+    ud_arg2 = (vector unsigned long long int){ 2, 4 };
+    vec_ud_expected = (vector unsigned long long int){ 4096, 16384 };
+
+    vec_ud_result = vec_mul (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply low */
+    d_arg1 = (vector signed long long int){ 2048, 4096 };
+    d_arg2 = (vector signed long long int){ 2, 4 };
+    vec_d_expected = (vector signed long long int){ 4096, 16384 };
+
+    vec_d_result = vec_mul (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+  }
-- 
2.17.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
  2020-10-30 20:07 [PATCH] rs6000, vector integer multiply/divide/modulo instructions Carl Love
@ 2020-10-30 21:05 ` David Edelsohn
  2020-10-30 21:33   ` Carl Love
  2020-10-31 13:28 ` David Edelsohn
  1 sibling, 1 reply; 22+ messages in thread
From: David Edelsohn @ 2020-10-30 21:05 UTC (permalink / raw)
  To: Carl Love
  Cc: GCC Patches, Will Schmidt, Segher Boessenkool, Bill Schmidt,
	Peter Bergner

On Fri, Oct 30, 2020 at 4:07 PM Carl Love <cel@us.ibm.com> wrote:

> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> new file mode 100644
> index 00000000000..549bc742d12
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> @@ -0,0 +1,378 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +/* { dg-final { scan-assembler-times "\mvdivsw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdivuw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdivsd\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdivud\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdivesw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdiveuw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdivesd\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdiveud\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmodsw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmoduw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmodsd\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmodud\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmulhsw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmulhuw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmulhsd\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmulhud\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmulld\M" 2 } } */

As Alan mentioned with the other testcases, without an explicit
"-save-temps", dg-do run will not test for the assembler output.  Are
you certain that the assembler output is actually tested and matching?

Thanks, David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
  2020-10-30 21:05 ` David Edelsohn
@ 2020-10-30 21:33   ` Carl Love
  0 siblings, 0 replies; 22+ messages in thread
From: Carl Love @ 2020-10-30 21:33 UTC (permalink / raw)
  To: David Edelsohn
  Cc: GCC Patches, Will Schmidt, Segher Boessenkool, Bill Schmidt,
	Peter Bergner

On Fri, 2020-10-30 at 17:05 -0400, David Edelsohn wrote:
> On Fri, Oct 30, 2020 at 4:07 PM Carl Love <cel@us.ibm.com> wrote:
> 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-
> > runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-
> > runnable.c
> > new file mode 100644
> > index 00000000000..549bc742d12
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> > @@ -0,0 +1,378 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target power10_hw } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> > +/* { dg-final { scan-assembler-times "\mvdivsw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdivuw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdivsd\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdivud\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdivesw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdiveuw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdivesd\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdiveud\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmodsw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmoduw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmodsd\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmodud\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmulhsw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmulhuw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmulhsd\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmulhud\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmulld\M" 2 } } */
> 
> As Alan mentioned with the other testcases, without an explicit
> "-save-temps", dg-do run will not test for the assembler output.  Are
> you certain that the assembler output is actually tested and
> matching?
> 
> Thanks, David

David:

I am just running the binary on Mambo by hand.  I am not running the
GCC regression test on Mambo.  I don't have GCC setup on Mambo.   But
yes, I did miss the -save-temps.  I will fix that.  Thanks.

               Carl 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
  2020-10-30 20:07 [PATCH] rs6000, vector integer multiply/divide/modulo instructions Carl Love
  2020-10-30 21:05 ` David Edelsohn
@ 2020-10-31 13:28 ` David Edelsohn
  2020-11-02 21:06   ` Carl Love
  2020-11-04 16:44   ` Carl Love
  1 sibling, 2 replies; 22+ messages in thread
From: David Edelsohn @ 2020-10-31 13:28 UTC (permalink / raw)
  To: Carl Love
  Cc: GCC Patches, Will Schmidt, Segher Boessenkool, Bill Schmidt,
	Peter Bergner

On Fri, Oct 30, 2020 at 4:07 PM Carl Love <cel@us.ibm.com> wrote:
>
> GCC maintainers:
>
> The following patch adds new builtins for the vector integer multiply,
> divide and modulo operations.  The builtins are:
> vec_mulh(), vec_div(), vec_dive(), vec_mod() for signed and unsigned
> integers and long long integers.  Support for signed and unsigned long
> long integers the exiting vec_mul() is added.  Note that the existing
> support for the vec_div()and vec_mul() builtins emulate the vector
> operations with multiple scalar instructions.  This patch adds support
> for these builtins to use the new vector instructions.
>
> The patch was compiled and tested on:
>
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
>
> with no regressions. Additionally the new test case was compiled and
> executed by hand on Mambo to verify the test case passes.
>
> Please let me know if this patch is acceptable for mainline.  Thanks.
>
>                 Carl Love
>
> -------------------------------------------------------------
>
> 2020-10-30  Carl Love  <cel@us.ibm.com>
>
> gcc/
>         * config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
>         defines.
>         * config/rs6000/altivec.md (VIlong): Move define to file vector.md.
>         * config/rs6000/rs6000-builtin.def (VDIVES_V4SI, VDIVES_V2DI,
>         VDIVEU_V4SI, VDIVEU_V2DI, VDIVS_V4SI, VDIVS_V2DI, VDIVU_V4SI,
>         VDIVU_V2DI, VMODS_V2DI, VMODS_V4SI, VMODU_V2DI, VMODU_V4SI, VMULHS_V2DI,
>         VMULHS_V4SI, VMULHU_V2DI, VMULHU_V4SI, VMULLD_V2DI): Add builtin define.
>         (VMUL, VMULH, VDIVE, VMOD):  Add new BU_P10_OVERLOAD_2 definitions.
>         * config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV, P10_BUILTIN_VEC_VDIVE,
>         P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH): New overloaded definitions.
>         (builtin_function_type)
>         [P10V_BUILTIN_VDIVEU_V4SI, P10V_BUILTIN_VDIVEU_V2DI,
>         P10V_BUILTIN_VDIVU_V4SI, P10V_BUILTIN_VDIVU_V2DI,
>         P10V_BUILTIN_VMODU_V2DI, P10V_BUILTIN_VMODU_V4SI, P10V_BUILTIN_VMULHU_V2DI,
>         P10V_BUILTIN_VMULHU_V4SI, P10V_BUILTIN_VMULLD_V2DI]: Add case statement
>         for builtins.
>         * config/rs6000/vector.md (UNSPEC_VDIVES, UNSPEC_VDIVEU, UNSPEC_VMULHS,
>         UNSPEC_VMULHU, UNSPEC_VMULLD): Add enum for UNSPECs.
>         (VIlong_char): Add define_mod_attribute.
>         (vdives_<mode>, vdiveu_<mode>, vdiv<mode>3, uuvdiv<mode>3, vdivs_<mode>,
>         vdivu_<mode>, vmods_<mode>, vmodu_<mode>, vmulhs_<mode>, vmulhu_<mode>,
>         mulv2di3): Add define_insn, mode is VIlong.
>         config/rs6000/vsx.md (vsx_mul_v2di, vsx_udiv_v2di): Add if (TARGET_POWER10)
>         statement.
>         * doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
>         builtin descriptions.

> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md

Hi, Carl

I thought that vector.md was a transfer vector for the patterns and
instructions were defined in vsx.md.  Why are the new insn patterns
defined in vector.md?

> +(define_insn "div<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +       (div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +                   (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "udiv<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +       (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +                   (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vdivs_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +       (div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +                   (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vdivu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +       (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +                    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

Also, what is the reason to define div<mode>3 and udiv<mode>3, then
repeat the patterns for vdivs_<mode> and vdivu_<mode>?  Is there a
difference between the two patterns that I'm missing?  The new
builtins should be able to invoke the new named standard patterns.  Or
we really want an additional set of patterns that match the builtin
names?

The div<mode>3 and udiv<mode>3 patterns do not seem to be listed in
the ChangeLog.

Thanks, David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
  2020-10-31 13:28 ` David Edelsohn
@ 2020-11-02 21:06   ` Carl Love
  2020-11-04 16:44   ` Carl Love
  1 sibling, 0 replies; 22+ messages in thread
From: Carl Love @ 2020-11-02 21:06 UTC (permalink / raw)
  To: David Edelsohn, cel
  Cc: GCC Patches, Will Schmidt, Segher Boessenkool, Bill Schmidt,
	Peter Bergner


David:
> 
> Hi, Carl
> 
> I thought that vector.md was a transfer vector for the patterns and
> instructions were defined in vsx.md.  Why are the new insn patterns
> defined in vector.md?

I am a bit of a newbie here.  I wasn't aware of any specific guide
lines on the vector instructions.  I put them in vector.md since they
are vector instructions.  Made sense to me.  I can move them to vsx.md
if that is the prefered place, no problem.
> 
> > +(define_insn "div<mode>3"
> > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > +       (div:VIlong (match_operand:VIlong 1 "vsx_register_operand"
> > "v")
> > +                   (match_operand:VIlong 2 "vsx_register_operand"
> > "v")))]
> > +  "TARGET_POWER10"
> > +  "vdivs<VIlong_char> %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> > +
> > +(define_insn "udiv<mode>3"
> > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > +       (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand"
> > "v")
> > +                   (match_operand:VIlong 2 "vsx_register_operand"
> > "v")))]
> > +  "TARGET_POWER10"
> > +  "vdivu<VIlong_char> %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> > +
> > +(define_insn "vdivs_<mode>"
> > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > +       (div:VIlong (match_operand:VIlong 1 "vsx_register_operand"
> > "v")
> > +                   (match_operand:VIlong 2 "vsx_register_operand"
> > "v")))]
> > +  "TARGET_POWER10"
> > +  "vdivs<VIlong_char> %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> > +
> > +(define_insn "vdivu_<mode>"
> > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > +       (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand"
> > "v")
> > +                    (match_operand:VIlong 2 "vsx_register_operand"
> > "v")))]
> > +  "TARGET_POWER10"
> > +  "vdivu<VIlong_char> %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> 
> Also, what is the reason to define div<mode>3 and udiv<mode>3, then
> repeat the patterns for vdivs_<mode> and vdivu_<mode>?  Is there a
> difference between the two patterns that I'm missing?  The new
> builtins should be able to invoke the new named standard
> patterns.  Or
> we really want an additional set of patterns that match the builtin
> names?
> 
> The div<mode>3 and udiv<mode>3 patterns do not seem to be listed in
> the ChangeLog.

I originally added the vector multiply and divide instructions as
vmult_<mode>, vdivs_<mode>, etc.  I couldn't get GCC to generate the
instructions.  Bill pointed out that I hadn't used the default names
div<mode>3.  I thought I changed the original mult and div names to the
default names.  Looks like I didn't get the div stuff all updated in
the patch.  So, yea there should just be the div<mode>3 and udiv<mode>3
definitions.  My bad, sorry.  

I will update the patch, retest and repost.  Thanks for the input.

                          Carl 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
  2020-10-31 13:28 ` David Edelsohn
  2020-11-02 21:06   ` Carl Love
@ 2020-11-04 16:44   ` Carl Love
  2020-11-19  0:42     ` David Edelsohn
                       ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Carl Love @ 2020-11-04 16:44 UTC (permalink / raw)
  To: David Edelsohn
  Cc: GCC Patches, Will Schmidt, Segher Boessenkool, Bill Schmidt,
	Peter Bergner

David:

I have reworked the patch moving the new vector instruction patterns to
vsx.md.  Also, cleaned up the vector division instructions.  The
div<mode>3 pattern definitions are the only ones that should be
defined.  

I have retested the patch on:

   powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

                Carl Love

--------------------------------------------------------------

2020-11-02  Carl Love  <cel@us.ibm.com>

gcc/
	* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
	defines.
	* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
	* config/rs6000/rs6000-builtin.def (VDIVES_V4SI, VDIVES_V2DI,
	VDIVEU_V4SI, VDIVEU_V2DI, VDIVS_V4SI, VDIVS_V2DI, VDIVU_V4SI,
	VDIVU_V2DI, VMODS_V2DI, VMODS_V4SI, VMODU_V2DI, VMODU_V4SI,
	VMULHS_V2DI, VMULHS_V4SI, VMULHU_V2DI, VMULHU_V4SI, VMULLD_V2DI):
	Add builtin define.
	(VMUL, VMULH, VDIVE, VMOD):  Add new BU_P10_OVERLOAD_2 definitions.
	* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
	P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH):
	New overloaded definitions.
	(builtin_function_type) [P10V_BUILTIN_VDIVEU_V4SI,
	P10V_BUILTIN_VDIVEU_V2DI, P10V_BUILTIN_VDIVU_V4SI,
	P10V_BUILTIN_VDIVU_V2DI, P10V_BUILTIN_VMODU_V2DI,
	P10V_BUILTIN_VMODU_V4SI, P10V_BUILTIN_VMULHU_V2DI,
	P10V_BUILTIN_VMULHU_V4SI, P10V_BUILTIN_VMULLD_V2DI]: Add case
	statement for builtins.
	* config/rs6000/vsx.md (VIlong_char): Add define_mod_attribute.
	(UNSPEC_VDIVES, UNSPEC_VDIVEU,
	UNSPEC_VMULHS, UNSPEC_VMULHU, UNSPEC_VMULLD): Add enum for UNSPECs.
	(vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.
	(vdives_<mode>, vdiveu_<mode>, vdiv<mode>3, uuvdiv<mode>3,
	vmods_<mode>, vmodu_<mode>, vmulhs_<mode>, vmulhu_<mode>, mulv2di3):
	Add define_insn, mode is VIlong.
	* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
	builtin descriptions.

gcc/testsuite/
	* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |   5 +
 gcc/config/rs6000/altivec.md                  |   2 -
 gcc/config/rs6000/rs6000-builtin.def          |  23 ++
 gcc/config/rs6000/rs6000-call.c               |  49 +++
 gcc/config/rs6000/vsx.md                      | 205 +++++++---
 gcc/doc/extend.texi                           | 120 ++++++
 .../powerpc/builtins-1-p10-runnable.c         | 378 ++++++++++++++++++
 7 files changed, 730 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e1884f51bd8..d8f1d2cfc55 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,11 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a)	__builtin_vec_strir_p (a)
 #define vec_stril_p(a)	__builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh (a, b)
+#define vec_div(a, b) __builtin_vec_div (a, b)
+#define vec_dive(a, b) __builtin_vec_dive (a, b)
+#define vec_mod(a, b) __builtin_vec_mod (a, b)
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6a6ce0f84ed..f10f1cdd8a7 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -193,8 +193,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index a58102c3785..7663465b755 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2877,6 +2877,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
 BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 
+BU_P10V_AV_2 (VDIVES_V4SI, "vdivesw", CONST, vdives_v4si)
+BU_P10V_AV_2 (VDIVES_V2DI, "vdivesd", CONST, vdives_v2di)
+BU_P10V_AV_2 (VDIVEU_V4SI, "vdiveuw", CONST, vdiveu_v4si)
+BU_P10V_AV_2 (VDIVEU_V2DI, "vdiveud", CONST, vdiveu_v2di)
+BU_P10V_AV_2 (VDIVS_V4SI, "vdivsw", CONST, divv4si3)
+BU_P10V_AV_2 (VDIVS_V2DI, "vdivsd", CONST, divv2di3)
+BU_P10V_AV_2 (VDIVU_V4SI, "vdivuw", CONST, udivv4si3)
+BU_P10V_AV_2 (VDIVU_V2DI, "vdivud", CONST, udivv2di3)
+BU_P10V_AV_2 (VMODS_V2DI, "vmodsd", CONST, vmods_v2di)
+BU_P10V_AV_2 (VMODS_V4SI, "vmodsw", CONST, vmods_v4si)
+BU_P10V_AV_2 (VMODU_V2DI, "vmodud", CONST, vmodu_v2di)
+BU_P10V_AV_2 (VMODU_V4SI, "vmoduw", CONST, vmodu_v4si)
+BU_P10V_AV_2 (VMULHS_V2DI, "vmulhsd", CONST, vmulhs_v2di)
+BU_P10V_AV_2 (VMULHS_V4SI, "vmulhsw", CONST, vmulhs_v4si)
+BU_P10V_AV_2 (VMULHU_V2DI, "vmulhud", CONST, vmulhu_v2di)
+BU_P10V_AV_2 (VMULHU_V4SI, "vmulhuw", CONST, vmulhu_v4si)
+BU_P10V_AV_2 (VMULLD_V2DI, "vmulld", CONST, mulv2di3)
+
 BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si)
 BU_P10V_VSX_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, xxspltiw_v4sf)
 
@@ -2952,6 +2970,11 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
 BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
 BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
 
+BU_P10_OVERLOAD_2 (VMUL, "mul")
+BU_P10_OVERLOAD_2 (VMULH, "mulh")
+BU_P10_OVERLOAD_2 (VDIVE, "dive")
+BU_P10_OVERLOAD_2 (VMOD, "mod")
+
 \f
 BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
 BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 92378e958a9..009df8b15b0 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -1069,6 +1069,35 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_VDIVS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_VDIVU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+
+  { P10_BUILTIN_VEC_VDIVE, P10V_BUILTIN_VDIVES_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_VDIVE, P10V_BUILTIN_VDIVEU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_VDIVE, P10V_BUILTIN_VDIVES_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_VDIVE, P10V_BUILTIN_VDIVEU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P10_BUILTIN_VEC_VMOD, P10V_BUILTIN_VMODS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_VMOD, P10V_BUILTIN_VMODU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_VMOD, P10V_BUILTIN_VMODS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_VMOD, P10V_BUILTIN_VMODU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
@@ -1909,6 +1938,17 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P10_BUILTIN_VEC_VMULH, P10V_BUILTIN_VMULHS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_VMULH, P10V_BUILTIN_VMULHU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_VMULH, P10V_BUILTIN_VMULHS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_VMULH, P10V_BUILTIN_VMULHU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
@@ -14410,6 +14450,15 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10V_BUILTIN_XXGENPCVM_V8HI:
     case P10V_BUILTIN_XXGENPCVM_V4SI:
     case P10V_BUILTIN_XXGENPCVM_V2DI:
+    case P10V_BUILTIN_VDIVEU_V4SI:
+    case P10V_BUILTIN_VDIVEU_V2DI:
+    case P10V_BUILTIN_VDIVU_V4SI:
+    case P10V_BUILTIN_VDIVU_V2DI:
+    case P10V_BUILTIN_VMODU_V2DI:
+    case P10V_BUILTIN_VMODU_V4SI:
+    case P10V_BUILTIN_VMULHU_V2DI:
+    case P10V_BUILTIN_VMULHU_V4SI:
+    case P10V_BUILTIN_VMULLD_V2DI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 947631d83ee..0fd1d275e0e 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -267,6 +267,12 @@
 (define_mode_iterator VSX_MM [V16QI V8HI V4SI V2DI V1TI])
 (define_mode_iterator VSX_MM4 [V16QI V8HI V4SI V2DI])
 
+;; Longer vec int modes for rotate/mask ops
+;; and Vector Integer Multiply/Divide/Modulo Instructions
+(define_mode_iterator VIlong [V2DI V4SI])
+(define_mode_attr VIlong_char [(V2DI "d")
+			       (V4SI "w")])
+
 ;; Constants for creating unspecs
 (define_c_enum "unspec"
   [UNSPEC_VSX_CONCAT
@@ -363,8 +369,13 @@
    UNSPEC_INSERTR
    UNSPEC_REPLACE_ELT
    UNSPEC_REPLACE_UN
+   UNSPEC_VDIVES
+   UNSPEC_VDIVEU
+   UNSPEC_VMULHS
+   UNSPEC_VMULHU
   ])
 
+
 (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
 				 UNSPEC_VSX_XVCVBF16SPN])
 
@@ -1623,28 +1634,35 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op5, op3, op4));
-  else
-    {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op5, ret);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op3, op3, op4));
+
+  if (TARGET_POWER10)
+    emit_insn (gen_mulv2di3 (op0, op1, op2) );
+
   else
     {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op3, ret);
+      rtx op3 = gen_reg_rtx (DImode);
+      rtx op4 = gen_reg_rtx (DImode);
+      rtx op5 = gen_reg_rtx (DImode);
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op5, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op5, ret);
+	}
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op3, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op3, ret);
+	}
+      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
     }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }
   [(set_attr "type" "mul")])
@@ -1718,37 +1736,47 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op5, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op5, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op5, target);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op3, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op3, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op3, target);
-    }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
-  DONE;
+
+    if (TARGET_POWER10)
+      emit_insn (gen_udivv2di3 (op0, op1, op2) );
+
+    else
+      {
+	rtx op3 = gen_reg_rtx (DImode);
+	rtx op4 = gen_reg_rtx (DImode);
+	rtx op5 = gen_reg_rtx (DImode);
+
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op5, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op5, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op5, target);
+	  }
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op3, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op3, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op3, target);
+	  }
+	emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
+      }
+    DONE;
 }
   [(set_attr "type" "div")])
 
@@ -6104,3 +6132,80 @@
   "TARGET_POWER10"
   "vexpand<wd>m %0,%1"
   [(set_attr "type" "vecsimple")])
+
+(define_insn "vdives_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
+		       UNSPEC_VDIVES))]
+  "TARGET_POWER10"
+  "vdives<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vdiveu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+			 (match_operand:VIlong 2 "vsx_register_operand" "v")]
+			UNSPEC_VDIVEU))]
+  "TARGET_POWER10"
+  "vdiveu<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "div<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivs<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "udiv<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivu<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vmods_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmods<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vmodu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmodu<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vmulhs_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
+		       UNSPEC_VMULHS))]
+  "TARGET_POWER10"
+  "vmulhs<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vmulhu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
+		       UNSPEC_VMULHU))]
+  "TARGET_POWER10"
+  "vmulhu<VIlong_char> %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+;; Vector multiply low double word
+(define_insn "mulv2di3"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
+	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
+		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmulld %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7a6ecce6a84..0282ef83154 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21465,6 +21465,126 @@ integer value between 0 and 255 inclusive.
 @exdent vector unsigned int vec_genpcvm (vector unsigned long long int,
                                          const int)
 @end smallexample
+
+Vector Integer Multiply-Divide-Modulo
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mulh (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mulh (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value i from 0 to 3, do the following. The integer value in
+word element i of a is multiplied by the integer value in word
+element i of b. The high-order 32 bits of the 64-bit product are placed into
+word element i of the vector returned.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mulh (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mulh (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value i from 0 to 1, do the following. The integer value in
+doubleword element i of a is multiplied by the integer value in doubleword
+element i of b. The high-order 64 bits of the 128-bit product are placed into
+doubleword element i of the vector returned.
+
+@smallexample
+@exdent vector unsigned long long
+@exdent vec_mul (vector unsigned long long a, vector unsigned long long b)
+@exdent vector signed long long
+@exdent vec_mul (vector signed long long a, vector signed long long b)
+@end smallexample
+
+For each integer value i from 0 to 1, do the following. The integer value in
+doubleword element i of a is multiplied by the integer value in
+doubleword element i of b. The low-order 64 bits of the 128-bit product
+are placed into doubleword element i of the vector returned.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_div (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_div (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value i from 0 to 3, do the following. The integer in word
+element i of a is divided by the integer in word element i of b. The unique
+integer quotient is placed into the word element i of the vector returned. If
+an attempt is made to perform any of the divisions <anything> ÷ 0 then the
+quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_div (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_div (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value i from 0 to 1, do the following. The integer in
+doubleword element i of a is divided by the integer in doubleword
+element i of b. The unique integer quotient is placed into the
+doubleword element i of the vector returned. If an attempt is made to perform
+any of the divisions 0x8000_0000_0000_0000 ÷ -1 or <anything> ÷ 0 then the
+quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_dive (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_dive (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value i from 0 to 3, do the following. The integer in word
+element i of a is shifted left by 32 bits, then divided by the integer
+in word element i of b. The unique integer quotient is placed into the
+word element i of the vector returned. If the quotient cannot be represented
+in 32 bits, or if an attempt is made to perform any of the divisions
+<anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_dive (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_dive (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value i from 0 to 1, do the following. The integer in
+doubleword element i of a is shifted left by 64 bits, then divided by the
+integer in doubleword element i of b. The unique integer quotient is placed
+into the doubleword element i of the vector returned. If the quotient cannot
+be represented in 64 bits, or if an attempt is made to perform <anything> ÷ 0
+then the quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mod (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mod (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value i from 0 to 3, do the following. The integer in word
+element i of a is divided by the integer in word element i of b. The unique
+integer remainder is placed into the word element i of the vector returned.
+If an attempt is made to perform any of the divisions 0x8000_0000 ÷ -1 or
+<anything> ÷ 0 then the remainder is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mod (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mod (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value i from 0 to 1, do the following. The integer in
+doubleword element i of a is divided by the integer in doubleword element i of
+b. The unique integer remainder is placed into the doubleword
+element i of the vector returned. If an attempt is made to perform
+<anything> ÷ 0 then the remainder is undefined.
+
 Generate PCV from specified Mask size, as if implemented by the
 @code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where
 immediate value is either 0, 1, 2 or 3.
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
new file mode 100644
index 00000000000..3e3d63ec8bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -0,0 +1,378 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -save-temps" } */
+/* { dg-final { scan-assembler-times {\mvdivsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmoduw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulld\M} 2 } } */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <math.h>
+#include <altivec.h>
+
+#define DEBUG 0
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+void abort (void);
+
+int main()
+  {
+    int i;
+    vector int i_arg1, i_arg2;
+    vector unsigned int u_arg1, u_arg2;
+    vector long long int d_arg1, d_arg2;
+    vector long long unsigned int ud_arg1, ud_arg2;
+   
+    vector int vec_i_expected, vec_i_result;
+    vector unsigned int vec_u_expected, vec_u_result;
+    vector long long int vec_d_expected, vec_d_result;
+    vector long long unsigned int vec_ud_expected, vec_ud_result;
+  
+    /* Signed word divide */
+    i_arg1 = (vector int){ 20, 40, 60, 80};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){10, 20, 30, 40};
+
+    vec_i_result = vec_div (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word divide */
+    u_arg1 = (vector unsigned int){ 20, 40, 60, 80};
+    u_arg2 = (vector unsigned int){ 2, 2, 2, 2};
+    vec_u_expected = (vector unsigned int){10, 20, 30, 40};
+
+    vec_u_result = vec_div (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word divide */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 2, 2};
+    vec_d_expected = (vector long long){12, 34};
+
+    vec_d_result = vec_div (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+	  printf("ERROR vec_div result[%d] = %d != expected[%d] = %d\n",
+		 i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word divide */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 2, 2};
+    vec_ud_expected = (vector unsigned long long){12, 34};
+
+    vec_ud_result = vec_div (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended signed word  result = (arg1 << 32)/arg2 */
+    i_arg1 = (vector int){ 2, 4, 6, 8};
+    i_arg2 = (vector int){ 2048, 2048, 2048, 2048};
+    vec_i_expected = (vector int){4194304, 8388608, 12582912, 16777216};
+
+    vec_i_result = vec_dive (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended unsigned word  result = (arg1 << 32)/arg2 */
+    u_arg1 = (vector unsigned int){ 2, 4, 6, 8};
+    u_arg2 = (vector unsigned int){ 2048, 2048, 2048, 2048};
+    vec_u_expected = (vector unsigned int){4194304, 8388608,
+					   12582912, 16777216};
+
+    vec_u_result = vec_dive (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double signed  esult = (arg1 << 64)/arg2 */
+    d_arg1 = (vector long long int){ 2, 4};
+    d_arg2 = (vector long long int){ 4294967296, 4294967296};
+
+    vec_d_expected = (vector long long int){8589934592, 17179869184};
+
+    vec_d_result = vec_dive (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive result[%d] = %lld != expected[%d] = %lld\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double unsigned result = (arg1 << 64)/arg2 */
+    ud_arg1 = (vector long long unsigned int){ 2, 4};
+    ud_arg2 = (vector long long unsigned int){ 4294967296, 4294967296};
+
+    vec_ud_expected = (vector long long unsigned int){8589934592,
+						      17179869184};
+
+    vec_ud_result = vec_dive (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive result[%d] = %lld != expected[%d] = %lld\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word modulo */
+    i_arg1 = (vector int){ 23, 45, 61, 89};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){1, 1, 1, 1};
+
+    vec_i_result = vec_mod (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word modulo */
+    u_arg1 = (vector unsigned int){ 25, 41, 67, 86};
+    u_arg2 = (vector unsigned int){ 3, 3, 3, 3};
+    vec_u_expected = (vector unsigned int){1, 2, 1, 2};
+
+    vec_u_result = vec_mod (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word modulo */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 7, 7};
+    vec_d_expected = (vector long long){3, 5};
+
+    vec_d_result = vec_mod (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word modulo */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 8, 8};
+    vec_ud_expected = (vector unsigned long long){0, 4};
+
+    vec_ud_result = vec_mod (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vecmod result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word multiply high */
+    i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
+    i_arg2 = (vector int){ 2, 3, 4, 5};
+    vec_i_expected = (vector int){-1, -2, -2, -3};
+
+    vec_i_result = vec_mulh (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word multiply high */
+    u_arg1 = (vector unsigned int){ 2147483648, 2147483648,
+				    2147483648, 2147483648 };
+    u_arg2 = (vector unsigned int){ 4, 5, 6, 7 };
+    vec_u_expected = (vector unsigned int){2, 2, 3, 3 };
+
+    vec_u_result = vec_mulh (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply high */
+    d_arg1 = (vector long long int){  2305843009213693951,
+				      4611686018427387903 };
+    d_arg2 = (vector long long int){ 12, 20 };
+    vec_d_expected = (vector long long int){ 1, 4 };
+
+    vec_d_result = vec_mulh (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply high */
+    ud_arg1 = (vector unsigned long long int){ 2305843009213693951,
+					       4611686018427387903 };
+    ud_arg2 = (vector unsigned long long int){ 32, 10 };
+    vec_ud_expected = (vector unsigned long long int){ 3, 2 };
+
+    vec_ud_result = vec_mulh (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply low */
+    ud_arg1 = (vector unsigned long long int){ 2048, 4096 };
+    ud_arg2 = (vector unsigned long long int){ 2, 4 };
+    vec_ud_expected = (vector unsigned long long int){ 4096, 16384 };
+
+    vec_ud_result = vec_mul (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply low */
+    d_arg1 = (vector signed long long int){ 2048, 4096 };
+    d_arg2 = (vector signed long long int){ 2, 4 };
+    vec_d_expected = (vector signed long long int){ 4096, 16384 };
+
+    vec_d_result = vec_mul (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul result[%d] = %d != expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+  }
-- 
2.17.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
  2020-11-04 16:44   ` Carl Love
@ 2020-11-19  0:42     ` David Edelsohn
  2020-11-19 17:25     ` Pat Haugen
  2020-11-19 23:26     ` Segher Boessenkool
  2 siblings, 0 replies; 22+ messages in thread
From: David Edelsohn @ 2020-11-19  0:42 UTC (permalink / raw)
  To: Carl Love
  Cc: GCC Patches, Will Schmidt, Segher Boessenkool, Bill Schmidt,
	Peter Bergner

On Wed, Nov 4, 2020 at 11:44 AM Carl Love <cel@us.ibm.com> wrote:
>
> David:
>
> I have reworked the patch moving the new vector instruction patterns to
> vsx.md.  Also, cleaned up the vector division instructions.  The
> div<mode>3 pattern definitions are the only ones that should be
> defined.
>
> I have retested the patch on:
>
>    powerpc64le-unknown-linux-gnu (Power 9 LE)
>
> with no regressions. Additionally the new test case was compiled and
> executed by hand on Mambo to verify the test case passes.
>
> Please let me know if this patch is acceptable for mainline.  Thanks.
>
>                 Carl Love
>
> --------------------------------------------------------------
>
> 2020-11-02  Carl Love  <cel@us.ibm.com>
>
> gcc/
>         * config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
>         defines.
>         * config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
>         * config/rs6000/rs6000-builtin.def (VDIVES_V4SI, VDIVES_V2DI,
>         VDIVEU_V4SI, VDIVEU_V2DI, VDIVS_V4SI, VDIVS_V2DI, VDIVU_V4SI,
>         VDIVU_V2DI, VMODS_V2DI, VMODS_V4SI, VMODU_V2DI, VMODU_V4SI,
>         VMULHS_V2DI, VMULHS_V4SI, VMULHU_V2DI, VMULHU_V4SI, VMULLD_V2DI):
>         Add builtin define.
>         (VMUL, VMULH, VDIVE, VMOD):  Add new BU_P10_OVERLOAD_2 definitions.
>         * config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
>         P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH):
>         New overloaded definitions.
>         (builtin_function_type) [P10V_BUILTIN_VDIVEU_V4SI,
>         P10V_BUILTIN_VDIVEU_V2DI, P10V_BUILTIN_VDIVU_V4SI,
>         P10V_BUILTIN_VDIVU_V2DI, P10V_BUILTIN_VMODU_V2DI,
>         P10V_BUILTIN_VMODU_V4SI, P10V_BUILTIN_VMULHU_V2DI,
>         P10V_BUILTIN_VMULHU_V4SI, P10V_BUILTIN_VMULLD_V2DI]: Add case
>         statement for builtins.
>         * config/rs6000/vsx.md (VIlong_char): Add define_mod_attribute.
>         (UNSPEC_VDIVES, UNSPEC_VDIVEU,
>         UNSPEC_VMULHS, UNSPEC_VMULHU, UNSPEC_VMULLD): Add enum for UNSPECs.
>         (vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.
>         (vdives_<mode>, vdiveu_<mode>, vdiv<mode>3, uuvdiv<mode>3,
>         vmods_<mode>, vmodu_<mode>, vmulhs_<mode>, vmulhu_<mode>, mulv2di3):
>         Add define_insn, mode is VIlong.
>         * doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
>         builtin descriptions.
>
> gcc/testsuite/
>         * gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.

Hi, Carl

Thanks for making the changes.  This looks okay to me now.  I don't
know if Segher has any additional requests.

Thanks, David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
  2020-11-04 16:44   ` Carl Love
  2020-11-19  0:42     ` David Edelsohn
@ 2020-11-19 17:25     ` Pat Haugen
  2020-11-19 20:40       ` Segher Boessenkool
  2020-11-19 23:26     ` Segher Boessenkool
  2 siblings, 1 reply; 22+ messages in thread
From: Pat Haugen @ 2020-11-19 17:25 UTC (permalink / raw)
  To: Carl Love, David Edelsohn; +Cc: GCC Patches, Segher Boessenkool

On 11/4/20 10:44 AM, Carl Love via Gcc-patches wrote:
> +
> +(define_insn "vdives_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +		       UNSPEC_VDIVES))]
> +  "TARGET_POWER10"
> +  "vdives<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vdiveu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +			 (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +			UNSPEC_VDIVEU))]
> +  "TARGET_POWER10"
> +  "vdiveu<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "div<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "udiv<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmods_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmods<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmodu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmodu<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

Since the vdiv.../vmod... instructions execute in the fixed point divide unit, all the above instructions should have a type of "div" instead of "vecsimple".


> +
> +(define_insn "vmulhs_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +		       UNSPEC_VMULHS))]
> +  "TARGET_POWER10"
> +  "vmulhs<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmulhu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +		       UNSPEC_VMULHU))]
> +  "TARGET_POWER10"
> +  "vmulhu<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])> +
> +;; Vector multiply low double word
> +(define_insn "mulv2di3"
> +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> +	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> +		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmulld %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

Similarly, the above 3 insns should have a "mul" instruction type.

-Pat

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
  2020-11-19 17:25     ` Pat Haugen
@ 2020-11-19 20:40       ` Segher Boessenkool
  0 siblings, 0 replies; 22+ messages in thread
From: Segher Boessenkool @ 2020-11-19 20:40 UTC (permalink / raw)
  To: Pat Haugen; +Cc: Carl Love, David Edelsohn, GCC Patches

On Thu, Nov 19, 2020 at 11:25:08AM -0600, Pat Haugen wrote:
> > +(define_insn "vmodu_<mode>"
> > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > +	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> > +		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> > +  "TARGET_POWER10"
> > +  "vmodu<VIlong_char> %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> 
> Since the vdiv.../vmod... instructions execute in the fixed point divide unit,

... on some implementations.  The only one currently, sure, but...

> all the above instructions should have a type of "div" instead of "vecsimple".

... it should use "vecdiv" instead (which already exists).  And set
"size" to a proper value as well, so that the scheduling models can see
the difference with e.g. xsdivqp (which should perhaps not use vecdiv at
all itself, it is a scalar div, but we do not currently have good types
for that).

> > +;; Vector multiply low double word
> > +(define_insn "mulv2di3"
> > +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> > +	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> > +		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
> > +  "TARGET_POWER10"
> > +  "vmulld %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> 
> Similarly, the above 3 insns should have a "mul" instruction type.

The existing AltiVec vmul* are type "veccomplex", because that was the
execution pipe used on original AltiVec...  This needs to be adapted as
well.  Not sure what is best.


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
  2020-11-04 16:44   ` Carl Love
  2020-11-19  0:42     ` David Edelsohn
  2020-11-19 17:25     ` Pat Haugen
@ 2020-11-19 23:26     ` Segher Boessenkool
  2020-11-24 18:59       ` [PATCH v2] " Carl Love
  2 siblings, 1 reply; 22+ messages in thread
From: Segher Boessenkool @ 2020-11-19 23:26 UTC (permalink / raw)
  To: Carl Love
  Cc: David Edelsohn, GCC Patches, Will Schmidt, Bill Schmidt, Peter Bergner

On Wed, Nov 04, 2020 at 08:44:03AM -0800, Carl Love wrote:
> +#define vec_mulh(a, b) __builtin_vec_mulh (a, b)
> +#define vec_div(a, b) __builtin_vec_div (a, b)
> +#define vec_dive(a, b) __builtin_vec_dive (a, b)
> +#define vec_mod(a, b) __builtin_vec_mod (a, b)

This should be

#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))

etc...  I see we have quite a few cases in altivec.h already that do not
get that right.  Something to fix, and apparently not too important in
practice ;-)

>  ;; Short vec int modes
>  (define_mode_iterator VIshort [V8HI V16QI])
> -;; Longer vec int modes for rotate/mask ops
> -(define_mode_iterator VIlong [V2DI V4SI])

Hrm, you move this one to vsx.md, but leave VIshort here (instead of
moving that to altivec.md).  Oh well, something needs to be done about
this split anyway.

> +BU_P10V_AV_2 (VDIVES_V4SI, "vdivesw", CONST, vdives_v4si)
> +BU_P10V_AV_2 (VDIVES_V2DI, "vdivesd", CONST, vdives_v2di)
> +BU_P10V_AV_2 (VDIVEU_V4SI, "vdiveuw", CONST, vdiveu_v4si)
> +BU_P10V_AV_2 (VDIVEU_V2DI, "vdiveud", CONST, vdiveu_v2di)
> +BU_P10V_AV_2 (VDIVS_V4SI, "vdivsw", CONST, divv4si3)
> +BU_P10V_AV_2 (VDIVS_V2DI, "vdivsd", CONST, divv2di3)
> +BU_P10V_AV_2 (VDIVU_V4SI, "vdivuw", CONST, udivv4si3)
> +BU_P10V_AV_2 (VDIVU_V2DI, "vdivud", CONST, udivv2di3)
> +BU_P10V_AV_2 (VMODS_V2DI, "vmodsd", CONST, vmods_v2di)
> +BU_P10V_AV_2 (VMODS_V4SI, "vmodsw", CONST, vmods_v4si)
> +BU_P10V_AV_2 (VMODU_V2DI, "vmodud", CONST, vmodu_v2di)
> +BU_P10V_AV_2 (VMODU_V4SI, "vmoduw", CONST, vmodu_v4si)
> +BU_P10V_AV_2 (VMULHS_V2DI, "vmulhsd", CONST, vmulhs_v2di)
> +BU_P10V_AV_2 (VMULHS_V4SI, "vmulhsw", CONST, vmulhs_v4si)
> +BU_P10V_AV_2 (VMULHU_V2DI, "vmulhud", CONST, vmulhu_v2di)
> +BU_P10V_AV_2 (VMULHU_V4SI, "vmulhuw", CONST, vmulhu_v4si)
> +BU_P10V_AV_2 (VMULLD_V2DI, "vmulld", CONST, mulv2di3)

So I would remove the leading "v" from all these pattern names, since
all of them have a mode in the name already.

> +(define_mode_attr VIlong_char [(V2DI "d")
> +			       (V4SI "w")])

This is just a subset of <wd> -- use that, instead?

; A generic w/d attribute, for things like cmpw/cmpd.
(define_mode_attr wd [(QI    "b")
                      (HI    "h")
                      (SI    "w")
                      (DI    "d")
                      (V16QI "b")
                      (V8HI  "h")
                      (V4SI  "w")
                      (V2DI  "d")
                      (V1TI  "q")
                      (TI    "q")])

(never mind the name, heh -- it still is nice and short ;-) )

> +(define_insn "vmulhs_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +		       UNSPEC_VMULHS))]
> +  "TARGET_POWER10"
> +  "vmulhs<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

The scalar mulh we can describe without unspecs, cannot that be done
here as well?

The type attr is problematic...  At least make it the same as the other
vector int multiplies?  That is veccomplex?

> +Vector Integer Multiply-Divide-Modulo

Use "/" instead of "-" here?  "-" normally is used for things like
"multiply-sum", not to mean "or".

> +For each integer value i from 0 to 3, do the following. The integer value in
> +word element i of a is multiplied by the integer value in word
> +element i of b. The high-order 32 bits of the 64-bit product are placed into
> +word element i of the vector returned.

I think you should quote the "i"?  @code{i} or similar.  I don't think
you need to mark up the digits, phew :-)

Please repost with those things fixed?  Thanks!


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
  2020-11-19 23:26     ` Segher Boessenkool
@ 2020-11-24 18:59       ` Carl Love
  2020-11-25  2:17         ` Pat Haugen
  0 siblings, 1 reply; 22+ messages in thread
From: Carl Love @ 2020-11-24 18:59 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: David Edelsohn, GCC Patches, Will Schmidt, Bill Schmidt, Peter Bergner

Segher:

I have addressed the various issues you and Pat mentioned. 
Specifically:

  - Added parenthesis around the macro arguments in altivec.h.
  - Removed VIlong_char, using <wd> instead.
  - Reimplemented define_insn "vmulhs_<mode>" and define_insn 
    "vmulhs_<mode>" to not use an UNSPEC.  Also changed the type to
    veccomplex.
  - Fixed the header and index value "i" in documentation file 	
    extend.texi as requested.
  - Changed attribute type from vecsimple to vecdiv in dives_<mode>,
    diveu_<mode>, div<mode>3, udiv<mode>3, mods_<mode>, modu_<mode>.
  - Added the size attribute of 128 to mods_<mode> and modu_<mode>.
  - Changed the set attribute to veccomplex for define_insn "mulv2di3.
  - Added "signed" or "unsigned" to the error print statements in the
    test program to clarify what case had failed.

In addition to testing on Power 9, I was able to test the updated patch
on Power 10.

Please let me know if the above changes are all acceptable and if there
are any additional changes needed.  Thanks.

                    Carl 


-----------------------------------------------------------------
GCC maintainers:

The following patch adds new builtins for the vector integer multiply,
divide and modulo operations.  The builtins are:  
vec_mulh(), vec_div(), vec_dive(), vec_mod() for signed and unsigned
integers and long long integers.  Support for signed and unsigned long
long integers the exiting vec_mul() is added.  Note that the existing
support for the vec_div()and vec_mul() builtins emulate the vector
operations with multiple scalar instructions.  This patch adds support
for these builtins to use the new vector instructions.

The patch was compiled and tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE) 
  powerpc64le-unknown-linux-gnu (Power 10 LE)
  
with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

                Carl Love



-------------------------------------------------------


2020-11-23  Carl Love  <cel@us.ibm.com>

gcc/
	* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
	defines.
	* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
	* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
	DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
	DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
	MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
	Add builtin define.
	(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
	* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
	P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH):
	New overloaded definitions.
	(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
	P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
	P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
	P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
	P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
	statement for builtins.
	* config/rs6000/vsx.md (VIlong_char): Add define_mod_attribute.
	(UNSPEC_VDIVES, UNSPEC_VDIVEU): Add enum for UNSPECs.
	(vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.
	(dives_<mode>, diveu_<mode>, div<mode>3, uvdiv<mode>3,
	mods_<mode>, modu_<mode>, mulhs_<mode>, mulhu_<mode>, mulv2di3):
	Add define_insn, mode is VIlong.
	* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
	builtin descriptions.

gcc/testsuite/
	* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |   5 +
 gcc/config/rs6000/altivec.md                  |   2 -
 gcc/config/rs6000/rs6000-builtin.def          |  22 +
 gcc/config/rs6000/rs6000-call.c               |  49 +++
 gcc/config/rs6000/vsx.md                      | 213 +++++++---
 gcc/doc/extend.texi                           | 120 ++++++
 .../powerpc/builtins-1-p10-runnable.c         | 398 ++++++++++++++++++
 7 files changed, 757 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e1884f51bd8..12ccbd2fc2f 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,11 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a)	__builtin_vec_strir_p (a)
 #define vec_stril_p(a)	__builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
+#define vec_div(a, b) __builtin_vec_div ((a), (b))
+#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
+#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6a6ce0f84ed..f10f1cdd8a7 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -193,8 +193,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index a58102c3785..4eaec550a49 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2877,6 +2877,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
 BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 
+BU_P10V_AV_2 (DIVES_V4SI, "vdivesw", CONST, dives_v4si)
+BU_P10V_AV_2 (DIVES_V2DI, "vdivesd", CONST, dives_v2di)
+BU_P10V_AV_2 (DIVEU_V4SI, "vdiveuw", CONST, diveu_v4si)
+BU_P10V_AV_2 (DIVEU_V2DI, "vdiveud", CONST, diveu_v2di)
+BU_P10V_AV_2 (DIVS_V4SI, "vdivsw", CONST, divv4si3)
+BU_P10V_AV_2 (DIVS_V2DI, "vdivsd", CONST, divv2di3)
+BU_P10V_AV_2 (DIVU_V4SI, "vdivuw", CONST, udivv4si3)
+BU_P10V_AV_2 (DIVU_V2DI, "vdivud", CONST, udivv2di3)
+BU_P10V_AV_2 (MODS_V2DI, "vmodsd", CONST, mods_v2di)
+BU_P10V_AV_2 (MODS_V4SI, "vmodsw", CONST, mods_v4si)
+BU_P10V_AV_2 (MODU_V2DI, "vmodud", CONST, modu_v2di)
+BU_P10V_AV_2 (MODU_V4SI, "vmoduw", CONST, modu_v4si)
+BU_P10V_AV_2 (MULHS_V2DI, "vmulhsd", CONST, mulhs_v2di)
+BU_P10V_AV_2 (MULHS_V4SI, "vmulhsw", CONST, mulhs_v4si)
+BU_P10V_AV_2 (MULHU_V2DI, "vmulhud", CONST, mulhu_v2di)
+BU_P10V_AV_2 (MULHU_V4SI, "vmulhuw", CONST, mulhu_v4si)
+BU_P10V_AV_2 (MULLD_V2DI, "vmulld", CONST, mulv2di3)
+
 BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si)
 BU_P10V_VSX_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, xxspltiw_v4sf)
 
@@ -2952,6 +2970,10 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
 BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
 BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
 
+BU_P10_OVERLOAD_2 (MULH, "mulh")
+BU_P10_OVERLOAD_2 (DIVE, "dive")
+BU_P10_OVERLOAD_2 (MOD, "mod")
+
 \f
 BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
 BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 92378e958a9..914656e4d08 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -1069,6 +1069,35 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
@@ -1909,6 +1938,17 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
@@ -14410,6 +14450,15 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10V_BUILTIN_XXGENPCVM_V8HI:
     case P10V_BUILTIN_XXGENPCVM_V4SI:
     case P10V_BUILTIN_XXGENPCVM_V2DI:
+    case P10V_BUILTIN_DIVEU_V4SI:
+    case P10V_BUILTIN_DIVEU_V2DI:
+    case P10V_BUILTIN_DIVU_V4SI:
+    case P10V_BUILTIN_DIVU_V2DI:
+    case P10V_BUILTIN_MODU_V2DI:
+    case P10V_BUILTIN_MODU_V4SI:
+    case P10V_BUILTIN_MULHU_V2DI:
+    case P10V_BUILTIN_MULHU_V4SI:
+    case P10V_BUILTIN_MULLD_V2DI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 947631d83ee..b04b75361a9 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -267,6 +267,10 @@
 (define_mode_iterator VSX_MM [V16QI V8HI V4SI V2DI V1TI])
 (define_mode_iterator VSX_MM4 [V16QI V8HI V4SI V2DI])
 
+;; Longer vec int modes for rotate/mask ops
+;; and Vector Integer Multiply/Divide/Modulo Instructions
+(define_mode_iterator VIlong [V2DI V4SI])
+
 ;; Constants for creating unspecs
 (define_c_enum "unspec"
   [UNSPEC_VSX_CONCAT
@@ -363,8 +367,11 @@
    UNSPEC_INSERTR
    UNSPEC_REPLACE_ELT
    UNSPEC_REPLACE_UN
+   UNSPEC_VDIVES
+   UNSPEC_VDIVEU
   ])
 
+
 (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
 				 UNSPEC_VSX_XVCVBF16SPN])
 
@@ -1623,28 +1630,35 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op5, op3, op4));
-  else
-    {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op5, ret);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op3, op3, op4));
+
+  if (TARGET_POWER10)
+    emit_insn (gen_mulv2di3 (op0, op1, op2) );
+
   else
     {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op3, ret);
+      rtx op3 = gen_reg_rtx (DImode);
+      rtx op4 = gen_reg_rtx (DImode);
+      rtx op5 = gen_reg_rtx (DImode);
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op5, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op5, ret);
+	}
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op3, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op3, ret);
+	}
+      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
     }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }
   [(set_attr "type" "mul")])
@@ -1718,37 +1732,47 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op5, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op5, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op5, target);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op3, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op3, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op3, target);
-    }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
-  DONE;
+
+    if (TARGET_POWER10)
+      emit_insn (gen_udivv2di3 (op0, op1, op2) );
+
+    else
+      {
+	rtx op3 = gen_reg_rtx (DImode);
+	rtx op4 = gen_reg_rtx (DImode);
+	rtx op5 = gen_reg_rtx (DImode);
+
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op5, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op5, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op5, target);
+	  }
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op3, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op3, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op3, target);
+	  }
+	emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
+      }
+    DONE;
 }
   [(set_attr "type" "div")])
 
@@ -6104,3 +6128,92 @@
   "TARGET_POWER10"
   "vexpand<wd>m %0,%1"
   [(set_attr "type" "vecsimple")])
+
+(define_insn "dives_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
+		       UNSPEC_VDIVES))]
+  "TARGET_POWER10"
+  "vdives<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "128")])
+
+(define_insn "diveu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+			 (match_operand:VIlong 2 "vsx_register_operand" "v")]
+			UNSPEC_VDIVEU))]
+  "TARGET_POWER10"
+  "vdiveu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "128")])
+
+(define_insn "div<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivs<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "128")])
+
+(define_insn "udiv<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "128")])
+
+(define_insn "mods_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmods<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "128")])
+
+(define_insn "modu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmodu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "128")])
+
+(define_insn "mulhs_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(mult:VIlong (ashiftrt
+		       (match_operand:VIlong 1 "vsx_register_operand" "v")
+		       (const_int 32))
+		     (ashiftrt
+		       (match_operand:VIlong 2 "vsx_register_operand" "v")
+		       (const_int 32))))]
+  "TARGET_POWER10"
+  "vmulhs<wd> %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+(define_insn "mulhu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(mult:VIlong (ashiftrt
+		       (match_operand:VIlong 1 "vsx_register_operand" "v")
+		       (const_int 32))
+		     (ashiftrt
+		       (match_operand:VIlong 2 "vsx_register_operand" "v")
+		       (const_int 32))))]
+  "TARGET_POWER10"
+  "vmulhu<wd> %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+;; Vector multiply low double word
+(define_insn "mulv2di3"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
+	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
+		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmulld %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7a6ecce6a84..de1a6a0ec57 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21465,6 +21465,126 @@ integer value between 0 and 255 inclusive.
 @exdent vector unsigned int vec_genpcvm (vector unsigned long long int,
                                          const int)
 @end smallexample
+
+Vector Integer Multiply/Divide/Modulo
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mulh (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mulh (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer
+value in word element @code{i} of a is multiplied by the integer value in word
+element @code{i} of b. The high-order 32 bits of the 64-bit product are placed
+into word element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mulh (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mulh (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of a is multiplied by the integer value in
+doubleword element @code{i} of b. The high-order 64 bits of the 128-bit product
+are placed into doubleword element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector unsigned long long
+@exdent vec_mul (vector unsigned long long a, vector unsigned long long b)
+@exdent vector signed long long
+@exdent vec_mul (vector signed long long a, vector signed long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of a is multiplied by the integer value in
+doubleword element @code{i} of b. The low-order 64 bits of the 128-bit product
+are placed into doubleword element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_div (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_div (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is divided by the integer in word element @code{i}
+of b. The unique integer quotient is placed into the word element @code{i} of
+the vector returned. If an attempt is made to perform any of the divisions
+<anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_div (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_div (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is divided by the integer in doubleword
+element @code{i} of b. The unique integer quotient is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform any of the divisions 0x8000_0000_0000_0000 ÷ -1 or <anything> ÷ 0 then
+the quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_dive (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_dive (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is shifted left by 32 bits, then divided by the
+integer in word element @code{i} of b. The unique integer quotient is placed
+into the word element @code{i} of the vector returned. If the quotient cannot
+be represented in 32 bits, or if an attempt is made to perform any of the
+divisions <anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_dive (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_dive (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is shifted left by 64 bits, then divided by
+the integer in doubleword element @code{i} of b. The unique integer quotient is
+placed into the doubleword element @code{i} of the vector returned. If the
+quotient cannot be represented in 64 bits, or if an attempt is made to perform
+<anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mod (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mod (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is divided by the integer in word element @code{i}
+of b. The unique integer remainder is placed into the word element @code{i} of
+the vector returned.  If an attempt is made to perform any of the divisions
+0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mod (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mod (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is divided by the integer in doubleword
+element @code{i} of b. The unique integer remainder is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform <anything> ÷ 0 then the remainder is undefined.
+
 Generate PCV from specified Mask size, as if implemented by the
 @code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where
 immediate value is either 0, 1, 2 or 3.
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
new file mode 100644
index 00000000000..222c8b3a409
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -0,0 +1,398 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+
+/* { dg-final { scan-assembler-times {\mvdivsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmoduw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulld\M} 2 } } */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <math.h>
+#include <altivec.h>
+
+#define DEBUG 0
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+void abort (void);
+
+int main()
+  {
+    int i;
+    vector int i_arg1, i_arg2;
+    vector unsigned int u_arg1, u_arg2;
+    vector long long int d_arg1, d_arg2;
+    vector long long unsigned int ud_arg1, ud_arg2;
+   
+    vector int vec_i_expected, vec_i_result;
+    vector unsigned int vec_u_expected, vec_u_result;
+    vector long long int vec_d_expected, vec_d_result;
+    vector long long unsigned int vec_ud_expected, vec_ud_result;
+  
+    /* Signed word divide */
+    i_arg1 = (vector int){ 20, 40, 60, 80};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){10, 20, 30, 40};
+
+    vec_i_result = vec_div (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word divide */
+    u_arg1 = (vector unsigned int){ 20, 40, 60, 80};
+    u_arg2 = (vector unsigned int){ 2, 2, 2, 2};
+    vec_u_expected = (vector unsigned int){10, 20, 30, 40};
+
+    vec_u_result = vec_div (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word divide */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 2, 2};
+    vec_d_expected = (vector long long){12, 34};
+
+    vec_d_result = vec_div (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+	  printf("ERROR vec_div signed result[%d] = %d != "
+		 "expected[%d] = %d\n",
+		 i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word divide */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 2, 2};
+    vec_ud_expected = (vector unsigned long long){12, 34};
+
+    vec_ud_result = vec_div (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended signed word  result = (arg1 << 32)/arg2 */
+    i_arg1 = (vector int){ 2, 4, 6, 8};
+    i_arg2 = (vector int){ 2048, 2048, 2048, 2048};
+    vec_i_expected = (vector int){4194304, 8388608, 12582912, 16777216};
+
+    vec_i_result = vec_dive (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended unsigned word  result = (arg1 << 32)/arg2 */
+    u_arg1 = (vector unsigned int){ 2, 4, 6, 8};
+    u_arg2 = (vector unsigned int){ 2048, 2048, 2048, 2048};
+    vec_u_expected = (vector unsigned int){4194304, 8388608,
+					   12582912, 16777216};
+
+    vec_u_result = vec_dive (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double signed  esult = (arg1 << 64)/arg2 */
+    d_arg1 = (vector long long int){ 2, 4};
+    d_arg2 = (vector long long int){ 4294967296, 4294967296};
+
+    vec_d_expected = (vector long long int){8589934592, 17179869184};
+
+    vec_d_result = vec_dive (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive signed result[%d] = %lld != "
+		  "expected[%d] = %lld\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double unsigned result = (arg1 << 64)/arg2 */
+    ud_arg1 = (vector long long unsigned int){ 2, 4};
+    ud_arg2 = (vector long long unsigned int){ 4294967296, 4294967296};
+
+    vec_ud_expected = (vector long long unsigned int){8589934592,
+						      17179869184};
+
+    vec_ud_result = vec_dive (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive unsigned result[%d] = %lld != "
+		  "expected[%d] = %lld\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word modulo */
+    i_arg1 = (vector int){ 23, 45, 61, 89};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){1, 1, 1, 1};
+
+    vec_i_result = vec_mod (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word modulo */
+    u_arg1 = (vector unsigned int){ 25, 41, 67, 86};
+    u_arg2 = (vector unsigned int){ 3, 3, 3, 3};
+    vec_u_expected = (vector unsigned int){1, 2, 1, 2};
+
+    vec_u_result = vec_mod (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word modulo */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 7, 7};
+    vec_d_expected = (vector long long){3, 5};
+
+    vec_d_result = vec_mod (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word modulo */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 8, 8};
+    vec_ud_expected = (vector unsigned long long){0, 4};
+
+    vec_ud_result = vec_mod (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vecmod unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word multiply high */
+    i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
+    i_arg2 = (vector int){ 2, 3, 4, 5};
+    //    vec_i_expected = (vector int){-1, -2, -2, -3};
+    vec_i_expected = (vector int){1, -2, -2, -3};
+
+    vec_i_result = vec_mulh (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word multiply high */
+    u_arg1 = (vector unsigned int){ 2147483648, 2147483648,
+				    2147483648, 2147483648 };
+    u_arg2 = (vector unsigned int){ 4, 5, 6, 7 };
+    vec_u_expected = (vector unsigned int){2, 2, 3, 3 };
+
+    vec_u_result = vec_mulh (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply high */
+    d_arg1 = (vector long long int){  2305843009213693951,
+				      4611686018427387903 };
+    d_arg2 = (vector long long int){ 12, 20 };
+    vec_d_expected = (vector long long int){ 1, 4 };
+
+    vec_d_result = vec_mulh (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply high */
+    ud_arg1 = (vector unsigned long long int){ 2305843009213693951,
+					       4611686018427387903 };
+    ud_arg2 = (vector unsigned long long int){ 32, 10 };
+    vec_ud_expected = (vector unsigned long long int){ 3, 2 };
+
+    vec_ud_result = vec_mulh (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply low */
+    ud_arg1 = (vector unsigned long long int){ 2048, 4096 };
+    ud_arg2 = (vector unsigned long long int){ 2, 4 };
+    vec_ud_expected = (vector unsigned long long int){ 4096, 16384 };
+
+    vec_ud_result = vec_mul (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply low */
+    d_arg1 = (vector signed long long int){ 2048, 4096 };
+    d_arg2 = (vector signed long long int){ 2, 4 };
+    vec_d_expected = (vector signed long long int){ 4096, 16384 };
+
+    vec_d_result = vec_mul (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+  }
-- 
2.17.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
  2020-11-24 18:59       ` [PATCH v2] " Carl Love
@ 2020-11-25  2:17         ` Pat Haugen
  2020-11-25  2:34           ` Pat Haugen
  0 siblings, 1 reply; 22+ messages in thread
From: Pat Haugen @ 2020-11-25  2:17 UTC (permalink / raw)
  To: Carl Love, Segher Boessenkool; +Cc: GCC Patches, David Edelsohn

On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote:
> +
> +(define_insn "dives_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +		       UNSPEC_VDIVES))]
> +  "TARGET_POWER10"
> +  "vdives<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "diveu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +			 (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +			UNSPEC_VDIVEU))]
> +  "TARGET_POWER10"
> +  "vdiveu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "div<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "udiv<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "mods_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmods<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "modu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmodu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])

We should only be setting "size" "128" for instructions that operate on scalar 128-bit data items (i.e. 'vdivesq' etc). Since the above insns are either V2DI/V4SI (ala VIlong mode_iterator), they shouldn't be marked as size 128. If you want to set the size based on mode, (set_attr "size" "<bits>") should do the trick I believe.

-Pat

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
  2020-11-25  2:17         ` Pat Haugen
@ 2020-11-25  2:34           ` Pat Haugen
  2020-11-26  2:30             ` Segher Boessenkool
  0 siblings, 1 reply; 22+ messages in thread
From: Pat Haugen @ 2020-11-25  2:34 UTC (permalink / raw)
  To: Carl Love, Segher Boessenkool; +Cc: GCC Patches, David Edelsohn

On 11/24/20 8:17 PM, Pat Haugen via Gcc-patches wrote:
> On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote:
>> +
>> +(define_insn "dives_<mode>"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
>> +		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
>> +		       UNSPEC_VDIVES))]
>> +  "TARGET_POWER10"
>> +  "vdives<wd> %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
>> +
>> +(define_insn "diveu_<mode>"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
>> +			 (match_operand:VIlong 2 "vsx_register_operand" "v")]
>> +			UNSPEC_VDIVEU))]
>> +  "TARGET_POWER10"
>> +  "vdiveu<wd> %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
>> +
>> +(define_insn "div<mode>3"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
>> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
>> +  "TARGET_POWER10"
>> +  "vdivs<wd> %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
>> +
>> +(define_insn "udiv<mode>3"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
>> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
>> +  "TARGET_POWER10"
>> +  "vdivu<wd> %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
>> +
>> +(define_insn "mods_<mode>"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
>> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
>> +  "TARGET_POWER10"
>> +  "vmods<wd> %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
>> +
>> +(define_insn "modu_<mode>"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
>> +		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
>> +  "TARGET_POWER10"
>> +  "vmodu<wd> %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
> 
> We should only be setting "size" "128" for instructions that operate on scalar 128-bit data items (i.e. 'vdivesq' etc). Since the above insns are either V2DI/V4SI (ala VIlong mode_iterator), they shouldn't be marked as size 128. If you want to set the size based on mode, (set_attr "size" "<bits>") should do the trick I believe.

Well, after you update "(define_mode_attr bits" in rs6000.md for V2DI/V4SI.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
  2020-11-25  2:34           ` Pat Haugen
@ 2020-11-26  2:30             ` Segher Boessenkool
  2020-12-01 23:48               ` Carl Love
  0 siblings, 1 reply; 22+ messages in thread
From: Segher Boessenkool @ 2020-11-26  2:30 UTC (permalink / raw)
  To: Pat Haugen; +Cc: Carl Love, GCC Patches, David Edelsohn

On Tue, Nov 24, 2020 at 08:34:51PM -0600, Pat Haugen wrote:
> On 11/24/20 8:17 PM, Pat Haugen via Gcc-patches wrote:
> > On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote:
> >> +(define_insn "modu_<mode>"
> >> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> >> +	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> >> +		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> >> +  "TARGET_POWER10"
> >> +  "vmodu<wd> %0,%1,%2"
> >> +  [(set_attr "type" "vecdiv")
> >> +   (set_attr "size" "128")])
> > 
> > We should only be setting "size" "128" for instructions that operate on scalar 128-bit data items (i.e. 'vdivesq' etc). Since the above insns are either V2DI/V4SI (ala VIlong mode_iterator), they shouldn't be marked as size 128. If you want to set the size based on mode, (set_attr "size" "<bits>") should do the trick I believe.
> 
> Well, after you update "(define_mode_attr bits" in rs6000.md for V2DI/V4SI.

So far, <size> was only used for scalars.  I agree that for vectors it
makes most sense to do the element size (because the vector size always
is 128 bits, and for scheduling the element size can matter).  But, the
definitions of <size> and <bits> now say

;; What data size does this instruction work on?
;; This is used for insert, mul and others as necessary.
(define_attr "size" "8,16,32,64,128" (const_string "32"))

and

;; How many bits in this mode?
(define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
                                           (SF "32") (DF "64")])
so those need a bit of update as well then :-)


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
  2020-11-26  2:30             ` Segher Boessenkool
@ 2020-12-01 23:48               ` Carl Love
  2020-12-02 23:14                 ` will schmidt
  2020-12-03 19:55                 ` will schmidt
  0 siblings, 2 replies; 22+ messages in thread
From: Carl Love @ 2020-12-01 23:48 UTC (permalink / raw)
  To: Segher Boessenkool, Pat Haugen; +Cc: GCC Patches, David Edelsohn, cel

Segher, Pat:

I have updated the patch to address the comments below.

On Wed, 2020-11-25 at 20:30 -0600, Segher Boessenkool wrote:
> On Tue, Nov 24, 2020 at 08:34:51PM -0600, Pat Haugen wrote:
> > On 11/24/20 8:17 PM, Pat Haugen via Gcc-patches wrote:
> > > On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote:
> > > > +(define_insn "modu_<mode>"
> > > > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > > > +	(umod:VIlong (match_operand:VIlong 1
> > > > "vsx_register_operand" "v")
> > > > +		     (match_operand:VIlong 2
> > > > "vsx_register_operand" "v")))]
> > > > +  "TARGET_POWER10"
> > > > +  "vmodu<wd> %0,%1,%2"
> > > > +  [(set_attr "type" "vecdiv")
> > > > +   (set_attr "size" "128")])
> > > 
> > > We should only be setting "size" "128" for instructions that
> > > operate on scalar 128-bit data items (i.e. 'vdivesq' etc). Since
> > > the above insns are either V2DI/V4SI (ala VIlong mode_iterator),
> > > they shouldn't be marked as size 128. If you want to set the size
> > > based on mode, (set_attr "size" "<bits>") should do the trick I
> > > believe.
> > 
> > Well, after you update "(define_mode_attr bits" in rs6000.md for
> > V2DI/V4SI.
> 
> So far, <size> was only used for scalars.  I agree that for vectors
> it
> makes most sense to do the element size (because the vector size
> always
> is 128 bits, and for scheduling the element size can matter).  But,
> the
> definitions of <size> and <bits> now say
> 
> ;; What data size does this instruction work on?
> ;; This is used for insert, mul and others as necessary.
> (define_attr "size" "8,16,32,64,128" (const_string "32"))
> 
> and
> 
> ;; How many bits in this mode?
> (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
>                                            (SF "32") (DF "64")])
> so those need a bit of update as well then :-)

I set the size based on the vector element size, extendeing the
define_mode_attr bits definition.  Please take a look at the updated
patch.  Hopefully I have this all correct.  Thanks.

Note, I retested the updated patch on 

  powerpc64le-unknown-linux-gnu (Power 9 LE)
  powerpc64le-unknown-linux-gnu (Power 10 LE)

Thanks for the help.

                     Carl 

-----------------------------------------------------------------------
rs6000, vector integer multiply/divide/modulo instructions

2020-12-01  Carl Love  <cel@us.ibm.com>

gcc/
	* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
	defines.
	* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
	* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
	DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
	DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
	MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
	Add builtin define.
	(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
	* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
	P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH):
	New overloaded definitions.
	(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
	P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
	P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
	P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
	P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
	statement for builtins.
	* config/rs6000/vsx.md (VIlong_char): Add define_mod_attribute.
	(UNSPEC_VDIVES, UNSPEC_VDIVEU): Add enum for UNSPECs.
	(vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.
	(dives_<mode>, diveu_<mode>, div<mode>3, uvdiv<mode>3,
	mods_<mode>, modu_<mode>, mulhs_<mode>, mulhu_<mode>, mulv2di3):
	Add define_insn, mode is VIlong.
	* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
	builtin descriptions.

gcc/testsuite/
	* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |   5 +
 gcc/config/rs6000/altivec.md                  |   2 -
 gcc/config/rs6000/rs6000-builtin.def          |  22 +
 gcc/config/rs6000/rs6000-call.c               |  49 +++
 gcc/config/rs6000/rs6000.md                   |   3 +-
 gcc/config/rs6000/vsx.md                      | 213 +++++++---
 gcc/doc/extend.texi                           | 120 ++++++
 .../powerpc/builtins-1-p10-runnable.c         | 398 ++++++++++++++++++
 8 files changed, 759 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e1884f51bd8..12ccbd2fc2f 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,11 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a)	__builtin_vec_strir_p (a)
 #define vec_stril_p(a)	__builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
+#define vec_div(a, b) __builtin_vec_div ((a), (b))
+#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
+#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6a6ce0f84ed..f10f1cdd8a7 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -193,8 +193,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 47b1f74e616..e9ea2114615 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2883,6 +2883,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
 BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 
+BU_P10V_AV_2 (DIVES_V4SI, "vdivesw", CONST, dives_v4si)
+BU_P10V_AV_2 (DIVES_V2DI, "vdivesd", CONST, dives_v2di)
+BU_P10V_AV_2 (DIVEU_V4SI, "vdiveuw", CONST, diveu_v4si)
+BU_P10V_AV_2 (DIVEU_V2DI, "vdiveud", CONST, diveu_v2di)
+BU_P10V_AV_2 (DIVS_V4SI, "vdivsw", CONST, divv4si3)
+BU_P10V_AV_2 (DIVS_V2DI, "vdivsd", CONST, divv2di3)
+BU_P10V_AV_2 (DIVU_V4SI, "vdivuw", CONST, udivv4si3)
+BU_P10V_AV_2 (DIVU_V2DI, "vdivud", CONST, udivv2di3)
+BU_P10V_AV_2 (MODS_V2DI, "vmodsd", CONST, mods_v2di)
+BU_P10V_AV_2 (MODS_V4SI, "vmodsw", CONST, mods_v4si)
+BU_P10V_AV_2 (MODU_V2DI, "vmodud", CONST, modu_v2di)
+BU_P10V_AV_2 (MODU_V4SI, "vmoduw", CONST, modu_v4si)
+BU_P10V_AV_2 (MULHS_V2DI, "vmulhsd", CONST, mulhs_v2di)
+BU_P10V_AV_2 (MULHS_V4SI, "vmulhsw", CONST, mulhs_v4si)
+BU_P10V_AV_2 (MULHU_V2DI, "vmulhud", CONST, mulhu_v2di)
+BU_P10V_AV_2 (MULHU_V4SI, "vmulhuw", CONST, mulhu_v4si)
+BU_P10V_AV_2 (MULLD_V2DI, "vmulld", CONST, mulv2di3)
+
 BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si)
 BU_P10V_VSX_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, xxspltiw_v4sf)
 
@@ -2958,6 +2976,10 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
 BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
 BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
 
+BU_P10_OVERLOAD_2 (MULH, "mulh")
+BU_P10_OVERLOAD_2 (DIVE, "dive")
+BU_P10_OVERLOAD_2 (MOD, "mod")
+
 \f
 BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
 BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 45bc048b5c7..5b310ea9039 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -1069,6 +1069,35 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
@@ -1909,6 +1938,17 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
@@ -14438,6 +14478,15 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10V_BUILTIN_XXGENPCVM_V8HI:
     case P10V_BUILTIN_XXGENPCVM_V4SI:
     case P10V_BUILTIN_XXGENPCVM_V2DI:
+    case P10V_BUILTIN_DIVEU_V4SI:
+    case P10V_BUILTIN_DIVEU_V2DI:
+    case P10V_BUILTIN_DIVU_V4SI:
+    case P10V_BUILTIN_DIVU_V2DI:
+    case P10V_BUILTIN_MODU_V2DI:
+    case P10V_BUILTIN_MODU_V4SI:
+    case P10V_BUILTIN_MULHU_V2DI:
+    case P10V_BUILTIN_MULHU_V4SI:
+    case P10V_BUILTIN_MULLD_V2DI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b89990f46bf..1575cf54580 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -670,7 +670,8 @@
 
 ;; How many bits in this mode?
 (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
-					   (SF "32") (DF "64")])
+					   (SF "32") (DF "64")
+					   (V4SI "32") (V2DI "64")])
 
 ; DImode bits
 (define_mode_attr dbits [(QI "56") (HI "48") (SI "32")])
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 947631d83ee..0cc202e7c74 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -267,6 +267,10 @@
 (define_mode_iterator VSX_MM [V16QI V8HI V4SI V2DI V1TI])
 (define_mode_iterator VSX_MM4 [V16QI V8HI V4SI V2DI])
 
+;; Longer vec int modes for rotate/mask ops
+;; and Vector Integer Multiply/Divide/Modulo Instructions
+(define_mode_iterator VIlong [V2DI V4SI])
+
 ;; Constants for creating unspecs
 (define_c_enum "unspec"
   [UNSPEC_VSX_CONCAT
@@ -363,8 +367,11 @@
    UNSPEC_INSERTR
    UNSPEC_REPLACE_ELT
    UNSPEC_REPLACE_UN
+   UNSPEC_VDIVES
+   UNSPEC_VDIVEU
   ])
 
+
 (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
 				 UNSPEC_VSX_XVCVBF16SPN])
 
@@ -1623,28 +1630,35 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op5, op3, op4));
-  else
-    {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op5, ret);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op3, op3, op4));
+
+  if (TARGET_POWER10)
+    emit_insn (gen_mulv2di3 (op0, op1, op2) );
+
   else
     {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op3, ret);
+      rtx op3 = gen_reg_rtx (DImode);
+      rtx op4 = gen_reg_rtx (DImode);
+      rtx op5 = gen_reg_rtx (DImode);
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op5, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op5, ret);
+	}
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op3, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op3, ret);
+	}
+      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
     }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }
   [(set_attr "type" "mul")])
@@ -1718,37 +1732,47 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op5, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op5, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op5, target);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op3, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op3, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op3, target);
-    }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
-  DONE;
+
+    if (TARGET_POWER10)
+      emit_insn (gen_udivv2di3 (op0, op1, op2) );
+
+    else
+      {
+	rtx op3 = gen_reg_rtx (DImode);
+	rtx op4 = gen_reg_rtx (DImode);
+	rtx op5 = gen_reg_rtx (DImode);
+
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op5, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op5, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op5, target);
+	  }
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op3, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op3, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op3, target);
+	  }
+	emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
+      }
+    DONE;
 }
   [(set_attr "type" "div")])
 
@@ -6104,3 +6128,92 @@
   "TARGET_POWER10"
   "vexpand<wd>m %0,%1"
   [(set_attr "type" "vecsimple")])
+
+(define_insn "dives_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
+		       UNSPEC_VDIVES))]
+  "TARGET_POWER10"
+  "vdives<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "diveu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+			 (match_operand:VIlong 2 "vsx_register_operand" "v")]
+			UNSPEC_VDIVEU))]
+  "TARGET_POWER10"
+  "vdiveu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "div<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivs<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "udiv<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "mods_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmods<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "modu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmodu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "mulhs_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(mult:VIlong (ashiftrt
+		       (match_operand:VIlong 1 "vsx_register_operand" "v")
+		       (const_int 32))
+		     (ashiftrt
+		       (match_operand:VIlong 2 "vsx_register_operand" "v")
+		       (const_int 32))))]
+  "TARGET_POWER10"
+  "vmulhs<wd> %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+(define_insn "mulhu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(us_mult:VIlong (ashiftrt
+			  (match_operand:VIlong 1 "vsx_register_operand" "v")
+			  (const_int 32))
+			(ashiftrt
+			  (match_operand:VIlong 2 "vsx_register_operand" "v")
+			  (const_int 32))))]
+  "TARGET_POWER10"
+  "vmulhu<wd> %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+;; Vector multiply low double word
+(define_insn "mulv2di3"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
+	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
+		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmulld %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 23ede966bae..e20abd8f1f5 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21568,6 +21568,126 @@ integer value between 0 and 255 inclusive.
 @exdent vector unsigned int vec_genpcvm (vector unsigned long long int,
                                          const int)
 @end smallexample
+
+Vector Integer Multiply/Divide/Modulo
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mulh (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mulh (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer
+value in word element @code{i} of a is multiplied by the integer value in word
+element @code{i} of b. The high-order 32 bits of the 64-bit product are placed
+into word element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mulh (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mulh (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of a is multiplied by the integer value in
+doubleword element @code{i} of b. The high-order 64 bits of the 128-bit product
+are placed into doubleword element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector unsigned long long
+@exdent vec_mul (vector unsigned long long a, vector unsigned long long b)
+@exdent vector signed long long
+@exdent vec_mul (vector signed long long a, vector signed long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of a is multiplied by the integer value in
+doubleword element @code{i} of b. The low-order 64 bits of the 128-bit product
+are placed into doubleword element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_div (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_div (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is divided by the integer in word element @code{i}
+of b. The unique integer quotient is placed into the word element @code{i} of
+the vector returned. If an attempt is made to perform any of the divisions
+<anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_div (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_div (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is divided by the integer in doubleword
+element @code{i} of b. The unique integer quotient is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform any of the divisions 0x8000_0000_0000_0000 ÷ -1 or <anything> ÷ 0 then
+the quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_dive (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_dive (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is shifted left by 32 bits, then divided by the
+integer in word element @code{i} of b. The unique integer quotient is placed
+into the word element @code{i} of the vector returned. If the quotient cannot
+be represented in 32 bits, or if an attempt is made to perform any of the
+divisions <anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_dive (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_dive (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is shifted left by 64 bits, then divided by
+the integer in doubleword element @code{i} of b. The unique integer quotient is
+placed into the doubleword element @code{i} of the vector returned. If the
+quotient cannot be represented in 64 bits, or if an attempt is made to perform
+<anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mod (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mod (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is divided by the integer in word element @code{i}
+of b. The unique integer remainder is placed into the word element @code{i} of
+the vector returned.  If an attempt is made to perform any of the divisions
+0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mod (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mod (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is divided by the integer in doubleword
+element @code{i} of b. The unique integer remainder is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform <anything> ÷ 0 then the remainder is undefined.
+
 Generate PCV from specified Mask size, as if implemented by the
 @code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where
 immediate value is either 0, 1, 2 or 3.
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
new file mode 100644
index 00000000000..222c8b3a409
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -0,0 +1,398 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+
+/* { dg-final { scan-assembler-times {\mvdivsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmoduw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulld\M} 2 } } */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <math.h>
+#include <altivec.h>
+
+#define DEBUG 0
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+void abort (void);
+
+int main()
+  {
+    int i;
+    vector int i_arg1, i_arg2;
+    vector unsigned int u_arg1, u_arg2;
+    vector long long int d_arg1, d_arg2;
+    vector long long unsigned int ud_arg1, ud_arg2;
+   
+    vector int vec_i_expected, vec_i_result;
+    vector unsigned int vec_u_expected, vec_u_result;
+    vector long long int vec_d_expected, vec_d_result;
+    vector long long unsigned int vec_ud_expected, vec_ud_result;
+  
+    /* Signed word divide */
+    i_arg1 = (vector int){ 20, 40, 60, 80};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){10, 20, 30, 40};
+
+    vec_i_result = vec_div (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word divide */
+    u_arg1 = (vector unsigned int){ 20, 40, 60, 80};
+    u_arg2 = (vector unsigned int){ 2, 2, 2, 2};
+    vec_u_expected = (vector unsigned int){10, 20, 30, 40};
+
+    vec_u_result = vec_div (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word divide */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 2, 2};
+    vec_d_expected = (vector long long){12, 34};
+
+    vec_d_result = vec_div (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+	  printf("ERROR vec_div signed result[%d] = %d != "
+		 "expected[%d] = %d\n",
+		 i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word divide */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 2, 2};
+    vec_ud_expected = (vector unsigned long long){12, 34};
+
+    vec_ud_result = vec_div (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended signed word  result = (arg1 << 32)/arg2 */
+    i_arg1 = (vector int){ 2, 4, 6, 8};
+    i_arg2 = (vector int){ 2048, 2048, 2048, 2048};
+    vec_i_expected = (vector int){4194304, 8388608, 12582912, 16777216};
+
+    vec_i_result = vec_dive (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended unsigned word  result = (arg1 << 32)/arg2 */
+    u_arg1 = (vector unsigned int){ 2, 4, 6, 8};
+    u_arg2 = (vector unsigned int){ 2048, 2048, 2048, 2048};
+    vec_u_expected = (vector unsigned int){4194304, 8388608,
+					   12582912, 16777216};
+
+    vec_u_result = vec_dive (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double signed  esult = (arg1 << 64)/arg2 */
+    d_arg1 = (vector long long int){ 2, 4};
+    d_arg2 = (vector long long int){ 4294967296, 4294967296};
+
+    vec_d_expected = (vector long long int){8589934592, 17179869184};
+
+    vec_d_result = vec_dive (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive signed result[%d] = %lld != "
+		  "expected[%d] = %lld\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double unsigned result = (arg1 << 64)/arg2 */
+    ud_arg1 = (vector long long unsigned int){ 2, 4};
+    ud_arg2 = (vector long long unsigned int){ 4294967296, 4294967296};
+
+    vec_ud_expected = (vector long long unsigned int){8589934592,
+						      17179869184};
+
+    vec_ud_result = vec_dive (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive unsigned result[%d] = %lld != "
+		  "expected[%d] = %lld\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word modulo */
+    i_arg1 = (vector int){ 23, 45, 61, 89};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){1, 1, 1, 1};
+
+    vec_i_result = vec_mod (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word modulo */
+    u_arg1 = (vector unsigned int){ 25, 41, 67, 86};
+    u_arg2 = (vector unsigned int){ 3, 3, 3, 3};
+    vec_u_expected = (vector unsigned int){1, 2, 1, 2};
+
+    vec_u_result = vec_mod (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word modulo */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 7, 7};
+    vec_d_expected = (vector long long){3, 5};
+
+    vec_d_result = vec_mod (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word modulo */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 8, 8};
+    vec_ud_expected = (vector unsigned long long){0, 4};
+
+    vec_ud_result = vec_mod (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vecmod unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word multiply high */
+    i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
+    i_arg2 = (vector int){ 2, 3, 4, 5};
+    //    vec_i_expected = (vector int){-1, -2, -2, -3};
+    vec_i_expected = (vector int){1, -2, -2, -3};
+
+    vec_i_result = vec_mulh (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word multiply high */
+    u_arg1 = (vector unsigned int){ 2147483648, 2147483648,
+				    2147483648, 2147483648 };
+    u_arg2 = (vector unsigned int){ 4, 5, 6, 7 };
+    vec_u_expected = (vector unsigned int){2, 2, 3, 3 };
+
+    vec_u_result = vec_mulh (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply high */
+    d_arg1 = (vector long long int){  2305843009213693951,
+				      4611686018427387903 };
+    d_arg2 = (vector long long int){ 12, 20 };
+    vec_d_expected = (vector long long int){ 1, 4 };
+
+    vec_d_result = vec_mulh (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply high */
+    ud_arg1 = (vector unsigned long long int){ 2305843009213693951,
+					       4611686018427387903 };
+    ud_arg2 = (vector unsigned long long int){ 32, 10 };
+    vec_ud_expected = (vector unsigned long long int){ 3, 2 };
+
+    vec_ud_result = vec_mulh (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply low */
+    ud_arg1 = (vector unsigned long long int){ 2048, 4096 };
+    ud_arg2 = (vector unsigned long long int){ 2, 4 };
+    vec_ud_expected = (vector unsigned long long int){ 4096, 16384 };
+
+    vec_ud_result = vec_mul (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply low */
+    d_arg1 = (vector signed long long int){ 2048, 4096 };
+    d_arg2 = (vector signed long long int){ 2, 4 };
+    vec_d_expected = (vector signed long long int){ 4096, 16384 };
+
+    vec_d_result = vec_mul (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+  }
-- 
2.17.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
  2020-12-01 23:48               ` Carl Love
@ 2020-12-02 23:14                 ` will schmidt
  2020-12-03 19:55                 ` will schmidt
  1 sibling, 0 replies; 22+ messages in thread
From: will schmidt @ 2020-12-02 23:14 UTC (permalink / raw)
  To: Carl Love, Segher Boessenkool, Pat Haugen; +Cc: GCC Patches, David Edelsohn

On Tue, 2020-12-01 at 15:48 -0800, Carl Love via Gcc-patches wrote:
> Segher, Pat:
> 
> I have updated the patch to address the comments below.

In all the excitement, i've lost track of some of the details throughout the thread.  :-) 


Subject: Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions

This is at least now V3.


Given the number of changes, May be worth re-posting as a a clean [v3] version, etc..

> 
> On Wed, 2020-11-25 at 20:30 -0600, Segher Boessenkool wrote:
> > On Tue, Nov 24, 2020 at 08:34:51PM -0600, Pat Haugen wrote:
> > > On 11/24/20 8:17 PM, Pat Haugen via Gcc-patches wrote:
> > > > On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote:
> > > > > +(define_insn "modu_<mode>"
> > > > > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > > > > +	(umod:VIlong (match_operand:VIlong 1
> > > > > "vsx_register_operand" "v")
> > > > > +		     (match_operand:VIlong 2
> > > > > "vsx_register_operand" "v")))]
> > > > > +  "TARGET_POWER10"
> > > > > +  "vmodu<wd> %0,%1,%2"
> > > > > +  [(set_attr "type" "vecdiv")
> > > > > +   (set_attr "size" "128")])
> > > > 
> > > > We should only be setting "size" "128" for instructions that
> > > > operate on scalar 128-bit data items (i.e. 'vdivesq' etc). Since
> > > > the above insns are either V2DI/V4SI (ala VIlong mode_iterator),
> > > > they shouldn't be marked as size 128. If you want to set the size
> > > > based on mode, (set_attr "size" "<bits>") should do the trick I
> > > > believe.
> > > 
> > > Well, after you update "(define_mode_attr bits" in rs6000.md for
> > > V2DI/V4SI.
> > 
> > So far, <size> was only used for scalars.  I agree that for vectors
> > it
> > makes most sense to do the element size (because the vector size
> > always
> > is 128 bits, and for scheduling the element size can matter).  But,
> > the
> > definitions of <size> and <bits> now say
> > 
> > ;; What data size does this instruction work on?
> > ;; This is used for insert, mul and others as necessary.
> > (define_attr "size" "8,16,32,64,128" (const_string "32"))
> > 
> > and
> > 
> > ;; How many bits in this mode?
> > (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
> >                                            (SF "32") (DF "64")])
> > so those need a bit of update as well then :-)
> 
> I set the size based on the vector element size, extendeing the
> define_mode_attr bits definition.  Please take a look at the updated
> patch.  Hopefully I have this all correct.  Thanks.


Would be useful to include the patch descriptionm as a standalone paragraph here.  

I believe the first email in the thread contained this:

> GCC maintainers:
> 
> The following patch adds new builtins for the vector integer multiply,
> divide and modulo operations.  The builtins are:  
> vec_mulh(), vec_div(), vec_dive(), vec_mod() for signed and unsigned
> integers and long long integers.  Support for signed and unsigned long
> long integers the exiting vec_mul() is added.  Note that the existing
> support for the vec_div()and vec_mul() builtins emulate the vector
> operations with multiple scalar instructions.  This patch adds support
> for these builtins to use the new vector instructions.
> 

I don't see an updated in-between version. 
Nit: ".. exiting vec_mul() is added" doesn't read quite right.
First and last sentences there can probably be combined.





> 
> Note, I retested the updated patch on 
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
>   powerpc64le-unknown-linux-gnu (Power 10 LE)
> 
> Thanks for the help.
> 
>                      Carl 
> 
> -----------------------------------------------------------------------
> rs6000, vector integer multiply/divide/modulo instructions
> 
> 2020-12-01  Carl Love  <cel@us.ibm.com>
> 
> gcc/
> 	* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
> 	defines.
> 	* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
> 	* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
> 	DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
> 	DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
> 	MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
> 	Add builtin define.
> 	(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
> 	* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
> 	P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH):
> 	New overloaded definitions.
> 	(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
> 	P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
> 	P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
> 	P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
> 	P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
> 	statement for builtins.
> 	* config/rs6000/vsx.md (VIlong_char): Add define_mod_attribute.
> 	(UNSPEC_VDIVES, UNSPEC_VDIVEU): Add enum for UNSPECs.
> 	(vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.
> 	(dives_<mode>, diveu_<mode>, div<mode>3, uvdiv<mode>3,
> 	mods_<mode>, modu_<mode>, mulhs_<mode>, mulhu_<mode>, mulv2di3):
> 	Add define_insn, mode is VIlong.
> 	* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
> 	builtin descriptions.
> 
> gcc/testsuite/
> 	* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
> ---
>  gcc/config/rs6000/altivec.h                   |   5 +
>  gcc/config/rs6000/altivec.md                  |   2 -
>  gcc/config/rs6000/rs6000-builtin.def          |  22 +
>  gcc/config/rs6000/rs6000-call.c               |  49 +++
>  gcc/config/rs6000/rs6000.md                   |   3 +-
>  gcc/config/rs6000/vsx.md                      | 213 +++++++---
>  gcc/doc/extend.texi                           | 120 ++++++
>  .../powerpc/builtins-1-p10-runnable.c         | 398 ++++++++++++++++++
>  8 files changed, 759 insertions(+), 53 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index e1884f51bd8..12ccbd2fc2f 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -750,6 +750,11 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_strir_p(a)	__builtin_vec_strir_p (a)
>  #define vec_stril_p(a)	__builtin_vec_stril_p (a)
> 
> +#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
> +#define vec_div(a, b) __builtin_vec_div ((a), (b))
> +#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
> +#define vec_mod(a, b) __builtin_vec_mod ((a), (b))


Looking at an otherwise pristine recent upstream tree, we currently
already have at least one vec_div define.  Does that entry need
adjustment?

	#ifdef __VSX__
	/* VSX additions */
	#define vec_div __builtin_vec_div


> +
>  /* VSX Mask Manipulation builtin. */
>  #define vec_genbm __builtin_vec_mtvsrbm
>  #define vec_genhm __builtin_vec_mtvsrhm
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 6a6ce0f84ed..f10f1cdd8a7 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -193,8 +193,6 @@
> 
>  ;; Short vec int modes
>  (define_mode_iterator VIshort [V8HI V16QI])
> -;; Longer vec int modes for rotate/mask ops
> -(define_mode_iterator VIlong [V2DI V4SI])
>  ;; Vec float modes
>  (define_mode_iterator VF [V4SF])
>  ;; Vec modes, pity mode iterators are not composable
ok


this is as far as I can get today, i'll look close at the rest when I
can unless others chime in before I get back..
thanks
-Will


> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> index 47b1f74e616..e9ea2114615 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2883,6 +2883,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
>  BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
>  BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
> 
> +BU_P10V_AV_2 (DIVES_V4SI, "vdivesw", CONST, dives_v4si)
> +BU_P10V_AV_2 (DIVES_V2DI, "vdivesd", CONST, dives_v2di)
> +BU_P10V_AV_2 (DIVEU_V4SI, "vdiveuw", CONST, diveu_v4si)
> +BU_P10V_AV_2 (DIVEU_V2DI, "vdiveud", CONST, diveu_v2di)
> +BU_P10V_AV_2 (DIVS_V4SI, "vdivsw", CONST, divv4si3)
> +BU_P10V_AV_2 (DIVS_V2DI, "vdivsd", CONST, divv2di3)
> +BU_P10V_AV_2 (DIVU_V4SI, "vdivuw", CONST, udivv4si3)
> +BU_P10V_AV_2 (DIVU_V2DI, "vdivud", CONST, udivv2di3)
> +BU_P10V_AV_2 (MODS_V2DI, "vmodsd", CONST, mods_v2di)
> +BU_P10V_AV_2 (MODS_V4SI, "vmodsw", CONST, mods_v4si)
> +BU_P10V_AV_2 (MODU_V2DI, "vmodud", CONST, modu_v2di)
> +BU_P10V_AV_2 (MODU_V4SI, "vmoduw", CONST, modu_v4si)
> +BU_P10V_AV_2 (MULHS_V2DI, "vmulhsd", CONST, mulhs_v2di)
> +BU_P10V_AV_2 (MULHS_V4SI, "vmulhsw", CONST, mulhs_v4si)
> +BU_P10V_AV_2 (MULHU_V2DI, "vmulhud", CONST, mulhu_v2di)
> +BU_P10V_AV_2 (MULHU_V4SI, "vmulhuw", CONST, mulhu_v4si)
> +BU_P10V_AV_2 (MULLD_V2DI, "vmulld", CONST, mulv2di3)
> +
>  BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si)
>  BU_P10V_VSX_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, xxspltiw_v4sf)
> 
> @@ -2958,6 +2976,10 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
> 
> +BU_P10_OVERLOAD_2 (MULH, "mulh")
> +BU_P10_OVERLOAD_2 (DIVE, "dive")
> +BU_P10_OVERLOAD_2 (MOD, "mod")
> +
>  
>  BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
>  BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 45bc048b5c7..5b310ea9039 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -1069,6 +1069,35 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
>    { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
> @@ -1909,6 +1938,17 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
>    { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
>      RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
> @@ -14438,6 +14478,15 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
>      case P10V_BUILTIN_XXGENPCVM_V8HI:
>      case P10V_BUILTIN_XXGENPCVM_V4SI:
>      case P10V_BUILTIN_XXGENPCVM_V2DI:
> +    case P10V_BUILTIN_DIVEU_V4SI:
> +    case P10V_BUILTIN_DIVEU_V2DI:
> +    case P10V_BUILTIN_DIVU_V4SI:
> +    case P10V_BUILTIN_DIVU_V2DI:
> +    case P10V_BUILTIN_MODU_V2DI:
> +    case P10V_BUILTIN_MODU_V4SI:
> +    case P10V_BUILTIN_MULHU_V2DI:
> +    case P10V_BUILTIN_MULHU_V4SI:
> +    case P10V_BUILTIN_MULLD_V2DI:
>        h.uns_p[0] = 1;
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index b89990f46bf..1575cf54580 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -670,7 +670,8 @@
> 
>  ;; How many bits in this mode?
>  (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
> -					   (SF "32") (DF "64")])
> +					   (SF "32") (DF "64")
> +					   (V4SI "32") (V2DI "64")])
> 
>  ; DImode bits
>  (define_mode_attr dbits [(QI "56") (HI "48") (SI "32")])
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 947631d83ee..0cc202e7c74 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -267,6 +267,10 @@
>  (define_mode_iterator VSX_MM [V16QI V8HI V4SI V2DI V1TI])
>  (define_mode_iterator VSX_MM4 [V16QI V8HI V4SI V2DI])
> 
> +;; Longer vec int modes for rotate/mask ops
> +;; and Vector Integer Multiply/Divide/Modulo Instructions
> +(define_mode_iterator VIlong [V2DI V4SI])
> +
>  ;; Constants for creating unspecs
>  (define_c_enum "unspec"
>    [UNSPEC_VSX_CONCAT
> @@ -363,8 +367,11 @@
>     UNSPEC_INSERTR
>     UNSPEC_REPLACE_ELT
>     UNSPEC_REPLACE_UN
> +   UNSPEC_VDIVES
> +   UNSPEC_VDIVEU
>    ])
> 
> +
>  (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
>  				 UNSPEC_VSX_XVCVBF16SPN])
> 
> @@ -1623,28 +1630,35 @@
>    rtx op0 = operands[0];
>    rtx op1 = operands[1];
>    rtx op2 = operands[2];
> -  rtx op3 = gen_reg_rtx (DImode);
> -  rtx op4 = gen_reg_rtx (DImode);
> -  rtx op5 = gen_reg_rtx (DImode);
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_muldi3 (op5, op3, op4));
> -  else
> -    {
> -      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> -      emit_move_insn (op5, ret);
> -    }
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_muldi3 (op3, op3, op4));
> +
> +  if (TARGET_POWER10)
> +    emit_insn (gen_mulv2di3 (op0, op1, op2) );
> +
>    else
>      {
> -      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> -      emit_move_insn (op3, ret);
> +      rtx op3 = gen_reg_rtx (DImode);
> +      rtx op4 = gen_reg_rtx (DImode);
> +      rtx op5 = gen_reg_rtx (DImode);
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> +      if (TARGET_POWERPC64)
> +	emit_insn (gen_muldi3 (op5, op3, op4));
> +      else
> +	{
> +	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> +	  emit_move_insn (op5, ret);
> +	}
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> +      if (TARGET_POWERPC64)
> +	emit_insn (gen_muldi3 (op3, op3, op4));
> +      else
> +	{
> +	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> +	  emit_move_insn (op3, ret);
> +	}
> +      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
>      }
> -  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
>    DONE;
>  }
>    [(set_attr "type" "mul")])
> @@ -1718,37 +1732,47 @@
>    rtx op0 = operands[0];
>    rtx op1 = operands[1];
>    rtx op2 = operands[2];
> -  rtx op3 = gen_reg_rtx (DImode);
> -  rtx op4 = gen_reg_rtx (DImode);
> -  rtx op5 = gen_reg_rtx (DImode);
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_udivdi3 (op5, op3, op4));
> -  else
> -    {
> -      rtx libfunc = optab_libfunc (udiv_optab, DImode);
> -      rtx target = emit_library_call_value (libfunc,
> -					    op5, LCT_NORMAL, DImode,
> -					    op3, DImode,
> -					    op4, DImode);
> -      emit_move_insn (op5, target);
> -    }
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_udivdi3 (op3, op3, op4));
> -  else
> -    {
> -      rtx libfunc = optab_libfunc (udiv_optab, DImode);
> -      rtx target = emit_library_call_value (libfunc,
> -					    op3, LCT_NORMAL, DImode,
> -					    op3, DImode,
> -					    op4, DImode);
> -      emit_move_insn (op3, target);
> -    }
> -  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
> -  DONE;
> +
> +    if (TARGET_POWER10)
> +      emit_insn (gen_udivv2di3 (op0, op1, op2) );
> +
> +    else
> +      {
> +	rtx op3 = gen_reg_rtx (DImode);
> +	rtx op4 = gen_reg_rtx (DImode);
> +	rtx op5 = gen_reg_rtx (DImode);
> +
> +	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> +	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> +
> +	if (TARGET_POWERPC64)
> +	  emit_insn (gen_udivdi3 (op5, op3, op4));
> +	else
> +	  {
> +	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
> +	    rtx target = emit_library_call_value (libfunc,
> +						  op5, LCT_NORMAL, DImode,
> +						  op3, DImode,
> +						  op4, DImode);
> +	    emit_move_insn (op5, target);
> +	  }
> +	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> +	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> +
> +	if (TARGET_POWERPC64)
> +	  emit_insn (gen_udivdi3 (op3, op3, op4));
> +	else
> +	  {
> +	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
> +	    rtx target = emit_library_call_value (libfunc,
> +						  op3, LCT_NORMAL, DImode,
> +						  op3, DImode,
> +						  op4, DImode);
> +	    emit_move_insn (op3, target);
> +	  }
> +	emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
> +      }
> +    DONE;
>  }
>    [(set_attr "type" "div")])
> 
> @@ -6104,3 +6128,92 @@
>    "TARGET_POWER10"
>    "vexpand<wd>m %0,%1"
>    [(set_attr "type" "vecsimple")])
> +
> +(define_insn "dives_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +		       UNSPEC_VDIVES))]
> +  "TARGET_POWER10"
> +  "vdives<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "diveu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +			 (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +			UNSPEC_VDIVEU))]
> +  "TARGET_POWER10"
> +  "vdiveu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "div<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "udiv<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "mods_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmods<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "modu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmodu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "mulhs_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(mult:VIlong (ashiftrt
> +		       (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		       (const_int 32))
> +		     (ashiftrt
> +		       (match_operand:VIlong 2 "vsx_register_operand" "v")
> +		       (const_int 32))))]
> +  "TARGET_POWER10"
> +  "vmulhs<wd> %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +(define_insn "mulhu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(us_mult:VIlong (ashiftrt
> +			  (match_operand:VIlong 1 "vsx_register_operand" "v")
> +			  (const_int 32))
> +			(ashiftrt
> +			  (match_operand:VIlong 2 "vsx_register_operand" "v")
> +			  (const_int 32))))]
> +  "TARGET_POWER10"
> +  "vmulhu<wd> %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +;; Vector multiply low double word
> +(define_insn "mulv2di3"
> +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> +	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> +		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmulld %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 23ede966bae..e20abd8f1f5 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21568,6 +21568,126 @@ integer value between 0 and 255 inclusive.
>  @exdent vector unsigned int vec_genpcvm (vector unsigned long long int,
>                                           const int)
>  @end smallexample
> +
> +Vector Integer Multiply/Divide/Modulo
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_mulh (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_mulh (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer
> +value in word element @code{i} of a is multiplied by the integer value in word
> +element @code{i} of b. The high-order 32 bits of the 64-bit product are placed
> +into word element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_mulh (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_mulh (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer
> +value in doubleword element @code{i} of a is multiplied by the integer value in
> +doubleword element @code{i} of b. The high-order 64 bits of the 128-bit product
> +are placed into doubleword element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector unsigned long long
> +@exdent vec_mul (vector unsigned long long a, vector unsigned long long b)
> +@exdent vector signed long long
> +@exdent vec_mul (vector signed long long a, vector signed long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer
> +value in doubleword element @code{i} of a is multiplied by the integer value in
> +doubleword element @code{i} of b. The low-order 64 bits of the 128-bit product
> +are placed into doubleword element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_div (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_div (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer in
> +word element @code{i} of a is divided by the integer in word element @code{i}
> +of b. The unique integer quotient is placed into the word element @code{i} of
> +the vector returned. If an attempt is made to perform any of the divisions
> +<anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_div (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_div (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer in
> +doubleword element @code{i} of a is divided by the integer in doubleword
> +element @code{i} of b. The unique integer quotient is placed into the
> +doubleword element @code{i} of the vector returned. If an attempt is made to
> +perform any of the divisions 0x8000_0000_0000_0000 ÷ -1 or <anything> ÷ 0 then
> +the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_dive (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_dive (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer in
> +word element @code{i} of a is shifted left by 32 bits, then divided by the
> +integer in word element @code{i} of b. The unique integer quotient is placed
> +into the word element @code{i} of the vector returned. If the quotient cannot
> +be represented in 32 bits, or if an attempt is made to perform any of the
> +divisions <anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_dive (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_dive (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer in
> +doubleword element @code{i} of a is shifted left by 64 bits, then divided by
> +the integer in doubleword element @code{i} of b. The unique integer quotient is
> +placed into the doubleword element @code{i} of the vector returned. If the
> +quotient cannot be represented in 64 bits, or if an attempt is made to perform
> +<anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_mod (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_mod (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer in
> +word element @code{i} of a is divided by the integer in word element @code{i}
> +of b. The unique integer remainder is placed into the word element @code{i} of
> +the vector returned.  If an attempt is made to perform any of the divisions
> +0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_mod (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_mod (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer in
> +doubleword element @code{i} of a is divided by the integer in doubleword
> +element @code{i} of b. The unique integer remainder is placed into the
> +doubleword element @code{i} of the vector returned. If an attempt is made to
> +perform <anything> ÷ 0 then the remainder is undefined.
> +
>  Generate PCV from specified Mask size, as if implemented by the
>  @code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where
>  immediate value is either 0, 1, 2 or 3.
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> new file mode 100644
> index 00000000000..222c8b3a409
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> @@ -0,0 +1,398 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> +
> +/* { dg-final { scan-assembler-times {\mvdivsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivesw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdiveuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivesd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdiveud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmoduw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulld\M} 2 } } */
> +
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <math.h>
> +#include <altivec.h>
> +
> +#define DEBUG 0
> +
> +#ifdef DEBUG
> +#include <stdio.h>
> +#endif
> +
> +void abort (void);
> +
> +int main()
> +  {
> +    int i;
> +    vector int i_arg1, i_arg2;
> +    vector unsigned int u_arg1, u_arg2;
> +    vector long long int d_arg1, d_arg2;
> +    vector long long unsigned int ud_arg1, ud_arg2;
> +   
> +    vector int vec_i_expected, vec_i_result;
> +    vector unsigned int vec_u_expected, vec_u_result;
> +    vector long long int vec_d_expected, vec_d_result;
> +    vector long long unsigned int vec_ud_expected, vec_ud_result;
> +  
> +    /* Signed word divide */
> +    i_arg1 = (vector int){ 20, 40, 60, 80};
> +    i_arg2 = (vector int){ 2, 2, 2, 2};
> +    vec_i_expected = (vector int){10, 20, 30, 40};
> +
> +    vec_i_result = vec_div (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word divide */
> +    u_arg1 = (vector unsigned int){ 20, 40, 60, 80};
> +    u_arg2 = (vector unsigned int){ 2, 2, 2, 2};
> +    vec_u_expected = (vector unsigned int){10, 20, 30, 40};
> +
> +    vec_u_result = vec_div (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word divide */
> +    d_arg1 = (vector long long){ 24, 68};
> +    d_arg2 = (vector long long){ 2, 2};
> +    vec_d_expected = (vector long long){12, 34};
> +
> +    vec_d_result = vec_div (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +	  printf("ERROR vec_div signed result[%d] = %d != "
> +		 "expected[%d] = %d\n",
> +		 i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word divide */
> +    ud_arg1 = (vector unsigned long long){ 24, 68};
> +    ud_arg2 = (vector unsigned long long){ 2, 2};
> +    vec_ud_expected = (vector unsigned long long){12, 34};
> +
> +    vec_ud_result = vec_div (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended signed word  result = (arg1 << 32)/arg2 */
> +    i_arg1 = (vector int){ 2, 4, 6, 8};
> +    i_arg2 = (vector int){ 2048, 2048, 2048, 2048};
> +    vec_i_expected = (vector int){4194304, 8388608, 12582912, 16777216};
> +
> +    vec_i_result = vec_dive (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended unsigned word  result = (arg1 << 32)/arg2 */
> +    u_arg1 = (vector unsigned int){ 2, 4, 6, 8};
> +    u_arg2 = (vector unsigned int){ 2048, 2048, 2048, 2048};
> +    vec_u_expected = (vector unsigned int){4194304, 8388608,
> +					   12582912, 16777216};
> +
> +    vec_u_result = vec_dive (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended double signed  esult = (arg1 << 64)/arg2 */
> +    d_arg1 = (vector long long int){ 2, 4};
> +    d_arg2 = (vector long long int){ 4294967296, 4294967296};
> +
> +    vec_d_expected = (vector long long int){8589934592, 17179869184};
> +
> +    vec_d_result = vec_dive (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive signed result[%d] = %lld != "
> +		  "expected[%d] = %lld\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended double unsigned result = (arg1 << 64)/arg2 */
> +    ud_arg1 = (vector long long unsigned int){ 2, 4};
> +    ud_arg2 = (vector long long unsigned int){ 4294967296, 4294967296};
> +
> +    vec_ud_expected = (vector long long unsigned int){8589934592,
> +						      17179869184};
> +
> +    vec_ud_result = vec_dive (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive unsigned result[%d] = %lld != "
> +		  "expected[%d] = %lld\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed word modulo */
> +    i_arg1 = (vector int){ 23, 45, 61, 89};
> +    i_arg2 = (vector int){ 2, 2, 2, 2};
> +    vec_i_expected = (vector int){1, 1, 1, 1};
> +
> +    vec_i_result = vec_mod (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word modulo */
> +    u_arg1 = (vector unsigned int){ 25, 41, 67, 86};
> +    u_arg2 = (vector unsigned int){ 3, 3, 3, 3};
> +    vec_u_expected = (vector unsigned int){1, 2, 1, 2};
> +
> +    vec_u_result = vec_mod (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word modulo */
> +    d_arg1 = (vector long long){ 24, 68};
> +    d_arg2 = (vector long long){ 7, 7};
> +    vec_d_expected = (vector long long){3, 5};
> +
> +    vec_d_result = vec_mod (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word modulo */
> +    ud_arg1 = (vector unsigned long long){ 24, 68};
> +    ud_arg2 = (vector unsigned long long){ 8, 8};
> +    vec_ud_expected = (vector unsigned long long){0, 4};
> +
> +    vec_ud_result = vec_mod (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vecmod unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed word multiply high */
> +    i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
> +    i_arg2 = (vector int){ 2, 3, 4, 5};
> +    //    vec_i_expected = (vector int){-1, -2, -2, -3};
> +    vec_i_expected = (vector int){1, -2, -2, -3};
> +
> +    vec_i_result = vec_mulh (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word multiply high */
> +    u_arg1 = (vector unsigned int){ 2147483648, 2147483648,
> +				    2147483648, 2147483648 };
> +    u_arg2 = (vector unsigned int){ 4, 5, 6, 7 };
> +    vec_u_expected = (vector unsigned int){2, 2, 3, 3 };
> +
> +    vec_u_result = vec_mulh (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word multiply high */
> +    d_arg1 = (vector long long int){  2305843009213693951,
> +				      4611686018427387903 };
> +    d_arg2 = (vector long long int){ 12, 20 };
> +    vec_d_expected = (vector long long int){ 1, 4 };
> +
> +    vec_d_result = vec_mulh (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word multiply high */
> +    ud_arg1 = (vector unsigned long long int){ 2305843009213693951,
> +					       4611686018427387903 };
> +    ud_arg2 = (vector unsigned long long int){ 32, 10 };
> +    vec_ud_expected = (vector unsigned long long int){ 3, 2 };
> +
> +    vec_ud_result = vec_mulh (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word multiply low */
> +    ud_arg1 = (vector unsigned long long int){ 2048, 4096 };
> +    ud_arg2 = (vector unsigned long long int){ 2, 4 };
> +    vec_ud_expected = (vector unsigned long long int){ 4096, 16384 };
> +
> +    vec_ud_result = vec_mul (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mul unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word multiply low */
> +    d_arg1 = (vector signed long long int){ 2048, 4096 };
> +    d_arg2 = (vector signed long long int){ 2, 4 };
> +    vec_d_expected = (vector signed long long int){ 4096, 16384 };
> +
> +    vec_d_result = vec_mul (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mul signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +  }


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
  2020-12-01 23:48               ` Carl Love
  2020-12-02 23:14                 ` will schmidt
@ 2020-12-03 19:55                 ` will schmidt
  2020-12-08  0:31                   ` [PATCH v4] " Carl Love
  1 sibling, 1 reply; 22+ messages in thread
From: will schmidt @ 2020-12-03 19:55 UTC (permalink / raw)
  To: Carl Love, Segher Boessenkool, Pat Haugen; +Cc: GCC Patches, David Edelsohn

On Tue, 2020-12-01 at 15:48 -0800, Carl Love via Gcc-patches wrote:
> Segher, Pat:
> 
> I have updated the patch to address the comments below.
> 
> On Wed, 2020-11-25 at 20:30 -0600, Segher Boessenkool wrote:
> > On Tue, Nov 24, 2020 at 08:34:51PM -0600, Pat Haugen wrote:
> > > On 11/24/20 8:17 PM, Pat Haugen via Gcc-patches wrote:
> > > > On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote:
> > > > > +(define_insn "modu_<mode>"
> > > > > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > > > > +	(umod:VIlong (match_operand:VIlong 1
> > > > > "vsx_register_operand" "v")
> > > > > +		     (match_operand:VIlong 2
> > > > > "vsx_register_operand" "v")))]
> > > > > +  "TARGET_POWER10"
> > > > > +  "vmodu<wd> %0,%1,%2"
> > > > > +  [(set_attr "type" "vecdiv")
> > > > > +   (set_attr "size" "128")])
> > > > 
> > > > We should only be setting "size" "128" for instructions that
> > > > operate on scalar 128-bit data items (i.e. 'vdivesq' etc).
> > > > Since
> > > > the above insns are either V2DI/V4SI (ala VIlong
> > > > mode_iterator),
> > > > they shouldn't be marked as size 128. If you want to set the
> > > > size
> > > > based on mode, (set_attr "size" "<bits>") should do the trick I
> > > > believe.
> > > 
> > > Well, after you update "(define_mode_attr bits" in rs6000.md for
> > > V2DI/V4SI.
> > 
> > So far, <size> was only used for scalars.  I agree that for vectors
> > it
> > makes most sense to do the element size (because the vector size
> > always
> > is 128 bits, and for scheduling the element size can matter).  But,
> > the
> > definitions of <size> and <bits> now say
> > 
> > ;; What data size does this instruction work on?
> > ;; This is used for insert, mul and others as necessary.
> > (define_attr "size" "8,16,32,64,128" (const_string "32"))
> > 
> > and
> > 
> > ;; How many bits in this mode?
> > (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
> >                                            (SF "32") (DF "64")])
> > so those need a bit of update as well then :-)
> 
> I set the size based on the vector element size, extendeing the
> define_mode_attr bits definition.  Please take a look at the updated
> patch.  Hopefully I have this all correct.  Thanks.
> 
> Note, I retested the updated patch on 
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
>   powerpc64le-unknown-linux-gnu (Power 10 LE)
> 
> Thanks for the help.
> 
>                      Carl 
> 

Continued from yesterday..  
Thanks
-Will

> -------------------------------------------------------------------
> ----
> rs6000, vector integer multiply/divide/modulo instructions
> 
> 2020-12-01  Carl Love  <cel@us.ibm.com>
> 
> gcc/
> 	* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive,
> vec_mod): New
> 	defines.
> 	* config/rs6000/altivec.md (VIlong): Move define to file
> vsx.md.
> 	* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
> 	DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
> 	DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
> 	MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
> 	Add builtin define.
> 	(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
> 	* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
> 	P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, 
> P10_BUILTIN_VEC_VMULH):
No mentions of these three P10_BUILTIN_VEC_* in patch below.


> 	New overloaded definitions.
> 	(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
> 	P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
> 	P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
> 	P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
> 	P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
> 	statement for builtins.
> 	* config/rs6000/vsx.md (VIlong_char): Add define_mod_attribute.

just VIlong 
Maybe s/define_mod_attribute/define_mod_attr /  ? 

> 	(UNSPEC_VDIVES, UNSPEC_VDIVEU): Add enum for UNSPECs.



> 	(vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.

I don't see vsx_mul_v2di or vsx_udiv_v2di in the patch contexts, Looks
OK per a look at trunks vsx.md. 

> 	(dives_<mode>, diveu_<mode>, div<mode>3, uvdiv<mode>3,
> 	mods_<mode>, modu_<mode>, mulhs_<mode>, mulhu_<mode>,
> mulv2di3):
> 	Add define_insn, mode is VIlong.
> 	* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive,
> vec_mod): Add
> 	builtin descriptions.
> 
> gcc/testsuite/
> 	* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
> ---
>  gcc/config/rs6000/altivec.h                   |   5 +
>  gcc/config/rs6000/altivec.md                  |   2 -
>  gcc/config/rs6000/rs6000-builtin.def          |  22 +
>  gcc/config/rs6000/rs6000-call.c               |  49 +++
>  gcc/config/rs6000/rs6000.md                   |   3 +-
>  gcc/config/rs6000/vsx.md                      | 213 +++++++---
>  gcc/doc/extend.texi                           | 120 ++++++
>  .../powerpc/builtins-1-p10-runnable.c         | 398
> ++++++++++++++++++
>  8 files changed, 759 insertions(+), 53 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-
> runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h
> b/gcc/config/rs6000/altivec.h
> index e1884f51bd8..12ccbd2fc2f 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -750,6 +750,11 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_strir_p(a)	__builtin_vec_strir_p (a)
>  #define vec_stril_p(a)	__builtin_vec_stril_p (a)
> 
> +#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
> +#define vec_div(a, b) __builtin_vec_div ((a), (b))
> +#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
> +#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
> +
>  /* VSX Mask Manipulation builtin. */
>  #define vec_genbm __builtin_vec_mtvsrbm
>  #define vec_genhm __builtin_vec_mtvsrhm
> diff --git a/gcc/config/rs6000/altivec.md
> b/gcc/config/rs6000/altivec.md
> index 6a6ce0f84ed..f10f1cdd8a7 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -193,8 +193,6 @@
> 
>  ;; Short vec int modes
>  (define_mode_iterator VIshort [V8HI V16QI])
> -;; Longer vec int modes for rotate/mask ops
> -(define_mode_iterator VIlong [V2DI V4SI])
>  ;; Vec float modes
>  (define_mode_iterator VF [V4SF])
>  ;; Vec modes, pity mode iterators are not composable
> diff --git a/gcc/config/rs6000/rs6000-builtin.def
> b/gcc/config/rs6000/rs6000-builtin.def
> index 47b1f74e616..e9ea2114615 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2883,6 +2883,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST,
> vsrdb_v8hi)
>  BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
>  BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
> 
> +BU_P10V_AV_2 (DIVES_V4SI, "vdivesw", CONST, dives_v4si)
> +BU_P10V_AV_2 (DIVES_V2DI, "vdivesd", CONST, dives_v2di)
> +BU_P10V_AV_2 (DIVEU_V4SI, "vdiveuw", CONST, diveu_v4si)
> +BU_P10V_AV_2 (DIVEU_V2DI, "vdiveud", CONST, diveu_v2di)
> +BU_P10V_AV_2 (DIVS_V4SI, "vdivsw", CONST, divv4si3)
> +BU_P10V_AV_2 (DIVS_V2DI, "vdivsd", CONST, divv2di3)
> +BU_P10V_AV_2 (DIVU_V4SI, "vdivuw", CONST, udivv4si3)
> +BU_P10V_AV_2 (DIVU_V2DI, "vdivud", CONST, udivv2di3)
> +BU_P10V_AV_2 (MODS_V2DI, "vmodsd", CONST, mods_v2di)
> +BU_P10V_AV_2 (MODS_V4SI, "vmodsw", CONST, mods_v4si)
> +BU_P10V_AV_2 (MODU_V2DI, "vmodud", CONST, modu_v2di)
> +BU_P10V_AV_2 (MODU_V4SI, "vmoduw", CONST, modu_v4si)
> +BU_P10V_AV_2 (MULHS_V2DI, "vmulhsd", CONST, mulhs_v2di)
> +BU_P10V_AV_2 (MULHS_V4SI, "vmulhsw", CONST, mulhs_v4si)
> +BU_P10V_AV_2 (MULHU_V2DI, "vmulhud", CONST, mulhu_v2di)
> +BU_P10V_AV_2 (MULHU_V4SI, "vmulhuw", CONST, mulhu_v4si)
> +BU_P10V_AV_2 (MULLD_V2DI, "vmulld", CONST, mulv2di3)
> +
>  BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST,
> xxspltiw_v4si)
>  BU_P10V_VSX_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST,
> xxspltiw_v4sf)
> 
> @@ -2958,6 +2976,10 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
> 
> +BU_P10_OVERLOAD_2 (MULH, "mulh")
> +BU_P10_OVERLOAD_2 (DIVE, "dive")
> +BU_P10_OVERLOAD_2 (MOD, "mod")
> +
>  
>  BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
>  BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
> diff --git a/gcc/config/rs6000/rs6000-call.c
> b/gcc/config/rs6000/rs6000-call.c
> index 45bc048b5c7..5b310ea9039 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -1069,6 +1069,35 @@ const struct altivec_builtin_types
> altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
>    { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> RS6000_BTI_unsigned_V2DI, 0 },
> +
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },

Is there already/should there be entries here for DIVU_V2DI ? (and 
DIVS_V2SI?)   There was/is a case statement added for DIVU_V2DI noted
below.


> +
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
> @@ -1909,6 +1938,17 @@ const struct altivec_builtin_types
> altivec_overloaded_builtins[] = {
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
> RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
> RS6000_BTI_bool_V16QI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
>    { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
>      RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI,
> RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
> @@ -14438,6 +14478,15 @@ builtin_function_type (machine_mode
> mode_ret, machine_mode mode_arg0,
>      case P10V_BUILTIN_XXGENPCVM_V8HI:
>      case P10V_BUILTIN_XXGENPCVM_V4SI:
>      case P10V_BUILTIN_XXGENPCVM_V2DI:
> +    case P10V_BUILTIN_DIVEU_V4SI:
> +    case P10V_BUILTIN_DIVEU_V2DI:
> +    case P10V_BUILTIN_DIVU_V4SI:
> +    case P10V_BUILTIN_DIVU_V2DI:

^ this case statment may be missing a table entry above.  (Or it's part
of a previous/subsequent patch).

> +    case P10V_BUILTIN_MODU_V2DI:
> +    case P10V_BUILTIN_MODU_V4SI:
> +    case P10V_BUILTIN_MULHU_V2DI:
> +    case P10V_BUILTIN_MULHU_V4SI:
> +    case P10V_BUILTIN_MULLD_V2DI:

I didn't notice an entry adding MULLD_V2DI in the tables above.

>        h.uns_p[0] = 1;
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;
> diff --git a/gcc/config/rs6000/rs6000.md
> b/gcc/config/rs6000/rs6000.md
> index b89990f46bf..1575cf54580 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -670,7 +670,8 @@
> 
>  ;; How many bits in this mode?
>  (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
> -					   (SF "32") (DF "64")])
> +					   (SF "32") (DF "64")
> +					   (V4SI "32") (V2DI "64")])
> 
>  ; DImode bits
>  (define_mode_attr dbits [(QI "56") (HI "48") (SI "32")])

ok

> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 947631d83ee..0cc202e7c74 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -267,6 +267,10 @@
>  (define_mode_iterator VSX_MM [V16QI V8HI V4SI V2DI V1TI])
>  (define_mode_iterator VSX_MM4 [V16QI V8HI V4SI V2DI])
> 
> +;; Longer vec int modes for rotate/mask ops
> +;; and Vector Integer Multiply/Divide/Modulo Instructions
> +(define_mode_iterator VIlong [V2DI V4SI])
> +
>  ;; Constants for creating unspecs
>  (define_c_enum "unspec"
>    [UNSPEC_VSX_CONCAT
> @@ -363,8 +367,11 @@
>     UNSPEC_INSERTR
>     UNSPEC_REPLACE_ELT
>     UNSPEC_REPLACE_UN
> +   UNSPEC_VDIVES
> +   UNSPEC_VDIVEU
>    ])
> 
> +
>  (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
>  				 UNSPEC_VSX_XVCVBF16SPN])
> 
> @@ -1623,28 +1630,35 @@
>    rtx op0 = operands[0];
>    rtx op1 = operands[1];
>    rtx op2 = operands[2];
> -  rtx op3 = gen_reg_rtx (DImode);
> -  rtx op4 = gen_reg_rtx (DImode);
> -  rtx op5 = gen_reg_rtx (DImode);
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_muldi3 (op5, op3, op4));
> -  else
> -    {
> -      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> -      emit_move_insn (op5, ret);
> -    }
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_muldi3 (op3, op3, op4));
> +
> +  if (TARGET_POWER10)
> +    emit_insn (gen_mulv2di3 (op0, op1, op2) );
> +
>    else
>      {
> -      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> -      emit_move_insn (op3, ret);
> +      rtx op3 = gen_reg_rtx (DImode);
> +      rtx op4 = gen_reg_rtx (DImode);
> +      rtx op5 = gen_reg_rtx (DImode);
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> +      if (TARGET_POWERPC64)
> +	emit_insn (gen_muldi3 (op5, op3, op4));
> +      else
> +	{
> +	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> +	  emit_move_insn (op5, ret);
> +	}
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> +      if (TARGET_POWERPC64)
> +	emit_insn (gen_muldi3 (op3, op3, op4));
> +      else
> +	{
> +	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> +	  emit_move_insn (op3, ret);
> +	}
> +      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
>      }
> -  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));

ok

>    DONE;
>  }
>    [(set_attr "type" "mul")])
> @@ -1718,37 +1732,47 @@
>    rtx op0 = operands[0];
>    rtx op1 = operands[1];
>    rtx op2 = operands[2];
> -  rtx op3 = gen_reg_rtx (DImode);
> -  rtx op4 = gen_reg_rtx (DImode);
> -  rtx op5 = gen_reg_rtx (DImode);
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_udivdi3 (op5, op3, op4));
> -  else
> -    {
> -      rtx libfunc = optab_libfunc (udiv_optab, DImode);
> -      rtx target = emit_library_call_value (libfunc,
> -					    op5, LCT_NORMAL, DImode,
> -					    op3, DImode,
> -					    op4, DImode);
> -      emit_move_insn (op5, target);
> -    }
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_udivdi3 (op3, op3, op4));
> -  else
> -    {
> -      rtx libfunc = optab_libfunc (udiv_optab, DImode);
> -      rtx target = emit_library_call_value (libfunc,
> -					    op3, LCT_NORMAL, DImode,
> -					    op3, DImode,
> -					    op4, DImode);
> -      emit_move_insn (op3, target);
> -    }
> -  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
> -  DONE;
> +
> +    if (TARGET_POWER10)
> +      emit_insn (gen_udivv2di3 (op0, op1, op2) );
> +

unnecessary blank line ? 

> +    else
> +      {
> +	rtx op3 = gen_reg_rtx (DImode);
> +	rtx op4 = gen_reg_rtx (DImode);
> +	rtx op5 = gen_reg_rtx (DImode);
> +
> +	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> +	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> +
> +	if (TARGET_POWERPC64)
> +	  emit_insn (gen_udivdi3 (op5, op3, op4));
> +	else
> +	  {
> +	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
> +	    rtx target = emit_library_call_value (libfunc,
> +						  op5, LCT_NORMAL, DImode,
> +						  op3, DImode,
> +						  op4, DImode);
> +	    emit_move_insn (op5, target);
> +	  }
> +	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> +	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> +
> +	if (TARGET_POWERPC64)
> +	  emit_insn (gen_udivdi3 (op3, op3, op4));
> +	else
> +	  {
> +	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
> +	    rtx target = emit_library_call_value (libfunc,
> +						  op3, LCT_NORMAL, DImode,
> +						  op3, DImode,
> +						  op4, DImode);
> +	    emit_move_insn (op3, target);
> +	  }
> +	emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
> +      }
> +    DONE;
>  }
>    [(set_attr "type" "div")])
> 

ok.


> @@ -6104,3 +6128,92 @@
>    "TARGET_POWER10"
>    "vexpand<wd>m %0,%1"
>    [(set_attr "type" "vecsimple")])
> +
> +(define_insn "dives_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +		       UNSPEC_VDIVES))]
> +  "TARGET_POWER10"
> +  "vdives<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "diveu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +			 (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +			UNSPEC_VDIVEU))]
> +  "TARGET_POWER10"
> +  "vdiveu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "div<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "udiv<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "mods_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmods<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "modu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmodu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "mulhs_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(mult:VIlong (ashiftrt
> +		       (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		       (const_int 32))
> +		     (ashiftrt
> +		       (match_operand:VIlong 2 "vsx_register_operand" "v")
> +		       (const_int 32))))]
> +  "TARGET_POWER10"
> +  "vmulhs<wd> %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +(define_insn "mulhu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(us_mult:VIlong (ashiftrt
> +			  (match_operand:VIlong 1 "vsx_register_operand" "v")
> +			  (const_int 32))
> +			(ashiftrt
> +			  (match_operand:VIlong 2 "vsx_register_operand" "v")
> +			  (const_int 32))))]
> +  "TARGET_POWER10"
> +  "vmulhu<wd> %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +;; Vector multiply low double word
> +(define_insn "mulv2di3"
> +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> +	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> +		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmulld %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])


I've just skimmed over the rest.  Nothing jumped out at me in the
documentation or testcases below.
Thanks,
-Will


> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 23ede966bae..e20abd8f1f5 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21568,6 +21568,126 @@ integer value between 0 and 255 inclusive.
>  @exdent vector unsigned int vec_genpcvm (vector unsigned long long int,
>                                           const int)
>  @end smallexample
> +
> +Vector Integer Multiply/Divide/Modulo
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_mulh (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_mulh (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer
> +value in word element @code{i} of a is multiplied by the integer value in word
> +element @code{i} of b. The high-order 32 bits of the 64-bit product are placed
> +into word element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_mulh (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_mulh (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer
> +value in doubleword element @code{i} of a is multiplied by the integer value in
> +doubleword element @code{i} of b. The high-order 64 bits of the 128-bit product
> +are placed into doubleword element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector unsigned long long
> +@exdent vec_mul (vector unsigned long long a, vector unsigned long long b)
> +@exdent vector signed long long
> +@exdent vec_mul (vector signed long long a, vector signed long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer
> +value in doubleword element @code{i} of a is multiplied by the integer value in
> +doubleword element @code{i} of b. The low-order 64 bits of the 128-bit product
> +are placed into doubleword element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_div (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_div (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer in
> +word element @code{i} of a is divided by the integer in word element @code{i}
> +of b. The unique integer quotient is placed into the word element @code{i} of
> +the vector returned. If an attempt is made to perform any of the divisions
> +<anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_div (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_div (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer in
> +doubleword element @code{i} of a is divided by the integer in doubleword
> +element @code{i} of b. The unique integer quotient is placed into the
> +doubleword element @code{i} of the vector returned. If an attempt is made to
> +perform any of the divisions 0x8000_0000_0000_0000 ÷ -1 or <anything> ÷ 0 then
> +the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_dive (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_dive (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer in
> +word element @code{i} of a is shifted left by 32 bits, then divided by the
> +integer in word element @code{i} of b. The unique integer quotient is placed
> +into the word element @code{i} of the vector returned. If the quotient cannot
> +be represented in 32 bits, or if an attempt is made to perform any of the
> +divisions <anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_dive (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_dive (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer in
> +doubleword element @code{i} of a is shifted left by 64 bits, then divided by
> +the integer in doubleword element @code{i} of b. The unique integer quotient is
> +placed into the doubleword element @code{i} of the vector returned. If the
> +quotient cannot be represented in 64 bits, or if an attempt is made to perform
> +<anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_mod (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_mod (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer in
> +word element @code{i} of a is divided by the integer in word element @code{i}
> +of b. The unique integer remainder is placed into the word element @code{i} of
> +the vector returned.  If an attempt is made to perform any of the divisions
> +0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_mod (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_mod (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer in
> +doubleword element @code{i} of a is divided by the integer in doubleword
> +element @code{i} of b. The unique integer remainder is placed into the
> +doubleword element @code{i} of the vector returned. If an attempt is made to
> +perform <anything> ÷ 0 then the remainder is undefined.
> +
>  Generate PCV from specified Mask size, as if implemented by the
>  @code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where
>  immediate value is either 0, 1, 2 or 3.
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> new file mode 100644
> index 00000000000..222c8b3a409
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> @@ -0,0 +1,398 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> +
> +/* { dg-final { scan-assembler-times {\mvdivsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivesw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdiveuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivesd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdiveud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmoduw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulld\M} 2 } } */
> +
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <math.h>
> +#include <altivec.h>
> +
> +#define DEBUG 0
> +
> +#ifdef DEBUG
> +#include <stdio.h>
> +#endif
> +
> +void abort (void);
> +
> +int main()
> +  {
> +    int i;
> +    vector int i_arg1, i_arg2;
> +    vector unsigned int u_arg1, u_arg2;
> +    vector long long int d_arg1, d_arg2;
> +    vector long long unsigned int ud_arg1, ud_arg2;
> +   
> +    vector int vec_i_expected, vec_i_result;
> +    vector unsigned int vec_u_expected, vec_u_result;
> +    vector long long int vec_d_expected, vec_d_result;
> +    vector long long unsigned int vec_ud_expected, vec_ud_result;
> +  
> +    /* Signed word divide */
> +    i_arg1 = (vector int){ 20, 40, 60, 80};
> +    i_arg2 = (vector int){ 2, 2, 2, 2};
> +    vec_i_expected = (vector int){10, 20, 30, 40};
> +
> +    vec_i_result = vec_div (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word divide */
> +    u_arg1 = (vector unsigned int){ 20, 40, 60, 80};
> +    u_arg2 = (vector unsigned int){ 2, 2, 2, 2};
> +    vec_u_expected = (vector unsigned int){10, 20, 30, 40};
> +
> +    vec_u_result = vec_div (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word divide */
> +    d_arg1 = (vector long long){ 24, 68};
> +    d_arg2 = (vector long long){ 2, 2};
> +    vec_d_expected = (vector long long){12, 34};
> +
> +    vec_d_result = vec_div (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +	  printf("ERROR vec_div signed result[%d] = %d != "
> +		 "expected[%d] = %d\n",
> +		 i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word divide */
> +    ud_arg1 = (vector unsigned long long){ 24, 68};
> +    ud_arg2 = (vector unsigned long long){ 2, 2};
> +    vec_ud_expected = (vector unsigned long long){12, 34};
> +
> +    vec_ud_result = vec_div (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended signed word  result = (arg1 << 32)/arg2 */
> +    i_arg1 = (vector int){ 2, 4, 6, 8};
> +    i_arg2 = (vector int){ 2048, 2048, 2048, 2048};
> +    vec_i_expected = (vector int){4194304, 8388608, 12582912, 16777216};
> +
> +    vec_i_result = vec_dive (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended unsigned word  result = (arg1 << 32)/arg2 */
> +    u_arg1 = (vector unsigned int){ 2, 4, 6, 8};
> +    u_arg2 = (vector unsigned int){ 2048, 2048, 2048, 2048};
> +    vec_u_expected = (vector unsigned int){4194304, 8388608,
> +					   12582912, 16777216};
> +
> +    vec_u_result = vec_dive (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended double signed  esult = (arg1 << 64)/arg2 */
> +    d_arg1 = (vector long long int){ 2, 4};
> +    d_arg2 = (vector long long int){ 4294967296, 4294967296};
> +
> +    vec_d_expected = (vector long long int){8589934592, 17179869184};
> +
> +    vec_d_result = vec_dive (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive signed result[%d] = %lld != "
> +		  "expected[%d] = %lld\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended double unsigned result = (arg1 << 64)/arg2 */
> +    ud_arg1 = (vector long long unsigned int){ 2, 4};
> +    ud_arg2 = (vector long long unsigned int){ 4294967296, 4294967296};
> +
> +    vec_ud_expected = (vector long long unsigned int){8589934592,
> +						      17179869184};
> +
> +    vec_ud_result = vec_dive (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive unsigned result[%d] = %lld != "
> +		  "expected[%d] = %lld\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed word modulo */
> +    i_arg1 = (vector int){ 23, 45, 61, 89};
> +    i_arg2 = (vector int){ 2, 2, 2, 2};
> +    vec_i_expected = (vector int){1, 1, 1, 1};
> +
> +    vec_i_result = vec_mod (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word modulo */
> +    u_arg1 = (vector unsigned int){ 25, 41, 67, 86};
> +    u_arg2 = (vector unsigned int){ 3, 3, 3, 3};
> +    vec_u_expected = (vector unsigned int){1, 2, 1, 2};
> +
> +    vec_u_result = vec_mod (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word modulo */
> +    d_arg1 = (vector long long){ 24, 68};
> +    d_arg2 = (vector long long){ 7, 7};
> +    vec_d_expected = (vector long long){3, 5};
> +
> +    vec_d_result = vec_mod (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word modulo */
> +    ud_arg1 = (vector unsigned long long){ 24, 68};
> +    ud_arg2 = (vector unsigned long long){ 8, 8};
> +    vec_ud_expected = (vector unsigned long long){0, 4};
> +
> +    vec_ud_result = vec_mod (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vecmod unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed word multiply high */
> +    i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
> +    i_arg2 = (vector int){ 2, 3, 4, 5};
> +    //    vec_i_expected = (vector int){-1, -2, -2, -3};
> +    vec_i_expected = (vector int){1, -2, -2, -3};
> +
> +    vec_i_result = vec_mulh (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word multiply high */
> +    u_arg1 = (vector unsigned int){ 2147483648, 2147483648,
> +				    2147483648, 2147483648 };
> +    u_arg2 = (vector unsigned int){ 4, 5, 6, 7 };
> +    vec_u_expected = (vector unsigned int){2, 2, 3, 3 };
> +
> +    vec_u_result = vec_mulh (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word multiply high */
> +    d_arg1 = (vector long long int){  2305843009213693951,
> +				      4611686018427387903 };
> +    d_arg2 = (vector long long int){ 12, 20 };
> +    vec_d_expected = (vector long long int){ 1, 4 };
> +
> +    vec_d_result = vec_mulh (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word multiply high */
> +    ud_arg1 = (vector unsigned long long int){ 2305843009213693951,
> +					       4611686018427387903 };
> +    ud_arg2 = (vector unsigned long long int){ 32, 10 };
> +    vec_ud_expected = (vector unsigned long long int){ 3, 2 };
> +
> +    vec_ud_result = vec_mulh (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word multiply low */
> +    ud_arg1 = (vector unsigned long long int){ 2048, 4096 };
> +    ud_arg2 = (vector unsigned long long int){ 2, 4 };
> +    vec_ud_expected = (vector unsigned long long int){ 4096, 16384 };
> +
> +    vec_ud_result = vec_mul (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mul unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word multiply low */
> +    d_arg1 = (vector signed long long int){ 2048, 4096 };
> +    d_arg2 = (vector signed long long int){ 2, 4 };
> +    vec_d_expected = (vector signed long long int){ 4096, 16384 };
> +
> +    vec_d_result = vec_mul (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mul signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +  }


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4] rs6000, vector integer multiply/divide/modulo instructions
  2020-12-03 19:55                 ` will schmidt
@ 2020-12-08  0:31                   ` Carl Love
  2021-01-04 16:45                     ` Carl Love
                                       ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Carl Love @ 2020-12-08  0:31 UTC (permalink / raw)
  To: will schmidt, Segher Boessenkool, Pat Haugen; +Cc: GCC Patches, David Edelsohn

Will:

I have addressed you comments with regards to the Change Log entries.  

The extra define vec_div was removed.

Added the missing entries for DIVU_V2DI  DIVS_V2DI in rs6000-call.c.

The extra MULLD_V2DI case statement entry was removed.

Added comment in rs6000.md about size for vector types per discussion
with Pat.

                          Carl
--------------------------------------------------------

GCC maintainers:

The following patch adds new builtins for the vector integer multiply,
divide and modulo operations.  The builtins are: vec_mulh(),
vec_dive(), vec_mod() for signed and unsigned integers and long
longintegers. The existing support for the vec_div()and vec_mul()
builtins emulate the vector operations with multiple scalar
instructions.  This patch adds support for these builtins using the new
vector instructions for Power 10.

The patch was compiled and tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE)
  powerpc64le-unknown-linux-gnu (Power 10 LE)

with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

                Carl Love

-------------------------------------------------

From 15f9c090106c62af83cc405414466ad03d1a4c55 Mon Sep 17 00:00:00 2001
From: Carl Love <cel@us.ibm.com>
Date: Fri, 4 Sep 2020 19:24:22 -0500
Subject: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-12-07  Carl Love  <cel@us.ibm.com>

gcc/
	* config/rs6000/altivec.h (vec_mulh, vec_dive, vec_mod): New	defines.
	* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
	* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
	DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
	DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
	MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
	Add builtin define.
	(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
	* config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Add
	VSX_BUILTIN_VEC_DIV, P10_BUILTIN_VEC_VDIVE,
	P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH
	overloaded definitions.
	(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
	P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
	P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
	P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
	P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
	statements for builtins.
	* config/rs6000/rs6000.md (bits): Add new attribute sizes.
	* config/rs6000/vsx.md (VIlong): New define_mode_iterator.
	(UNSPEC_VDIVES, UNSPEC_VDIVEU): New unspec definitions.
	(vsx_mul_v2di): Add if TARGET_POWER10 statement.
	(vsx_udiv_v2di): Add if TARGET_POWER10 statement.
	(dives_<mode>, diveu_<mode>, div<mode>3, uvdiv<mode>3,
	mods_<mode>, modu_<mode>, mulhs_<mode>, mulhu_<mode>, mulv2di3):
	Add define_insn, mode is VIlong.
	doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
	builtin descriptions.

gcc/testsuite/
	* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |   4 +
 gcc/config/rs6000/altivec.md                  |   2 -
 gcc/config/rs6000/rs6000-builtin.def          |  22 +
 gcc/config/rs6000/rs6000-call.c               |  53 +++
 gcc/config/rs6000/rs6000.md                   |   4 +-
 gcc/config/rs6000/vsx.md                      | 212 +++++++---
 gcc/doc/extend.texi                           | 120 ++++++
 .../powerpc/builtins-1-p10-runnable.c         | 398 ++++++++++++++++++
 8 files changed, 762 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e1884f51bd8..b678e5cf28d 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,10 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a)	__builtin_vec_strir_p (a)
 #define vec_stril_p(a)	__builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
+#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
+#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6a6ce0f84ed..f10f1cdd8a7 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -193,8 +193,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 47b1f74e616..e9ea2114615 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2883,6 +2883,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
 BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 
+BU_P10V_AV_2 (DIVES_V4SI, "vdivesw", CONST, dives_v4si)
+BU_P10V_AV_2 (DIVES_V2DI, "vdivesd", CONST, dives_v2di)
+BU_P10V_AV_2 (DIVEU_V4SI, "vdiveuw", CONST, diveu_v4si)
+BU_P10V_AV_2 (DIVEU_V2DI, "vdiveud", CONST, diveu_v2di)
+BU_P10V_AV_2 (DIVS_V4SI, "vdivsw", CONST, divv4si3)
+BU_P10V_AV_2 (DIVS_V2DI, "vdivsd", CONST, divv2di3)
+BU_P10V_AV_2 (DIVU_V4SI, "vdivuw", CONST, udivv4si3)
+BU_P10V_AV_2 (DIVU_V2DI, "vdivud", CONST, udivv2di3)
+BU_P10V_AV_2 (MODS_V2DI, "vmodsd", CONST, mods_v2di)
+BU_P10V_AV_2 (MODS_V4SI, "vmodsw", CONST, mods_v4si)
+BU_P10V_AV_2 (MODU_V2DI, "vmodud", CONST, modu_v2di)
+BU_P10V_AV_2 (MODU_V4SI, "vmoduw", CONST, modu_v4si)
+BU_P10V_AV_2 (MULHS_V2DI, "vmulhsd", CONST, mulhs_v2di)
+BU_P10V_AV_2 (MULHS_V4SI, "vmulhsw", CONST, mulhs_v4si)
+BU_P10V_AV_2 (MULHU_V2DI, "vmulhud", CONST, mulhu_v2di)
+BU_P10V_AV_2 (MULHU_V4SI, "vmulhuw", CONST, mulhu_v4si)
+BU_P10V_AV_2 (MULLD_V2DI, "vmulld", CONST, mulv2di3)
+
 BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si)
 BU_P10V_VSX_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, xxspltiw_v4sf)
 
@@ -2958,6 +2976,10 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
 BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
 BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
 
+BU_P10_OVERLOAD_2 (MULH, "mulh")
+BU_P10_OVERLOAD_2 (DIVE, "dive")
+BU_P10_OVERLOAD_2 (MOD, "mod")
+
 \f
 BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
 BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 45bc048b5c7..da442e400b9 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -1069,6 +1069,40 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
@@ -1909,6 +1943,17 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
@@ -14438,6 +14483,14 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10V_BUILTIN_XXGENPCVM_V8HI:
     case P10V_BUILTIN_XXGENPCVM_V4SI:
     case P10V_BUILTIN_XXGENPCVM_V2DI:
+    case P10V_BUILTIN_DIVEU_V4SI:
+    case P10V_BUILTIN_DIVEU_V2DI:
+    case P10V_BUILTIN_DIVU_V4SI:
+    case P10V_BUILTIN_DIVU_V2DI:
+    case P10V_BUILTIN_MODU_V2DI:
+    case P10V_BUILTIN_MODU_V4SI:
+    case P10V_BUILTIN_MULHU_V2DI:
+    case P10V_BUILTIN_MULHU_V4SI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b89990f46bf..7dea1dfb1d5 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -669,8 +669,10 @@
 			   (V2DI  "d")])
 
 ;; How many bits in this mode?
+;; For vector type, bits is the size of the elmement
 (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
-					   (SF "32") (DF "64")])
+					   (SF "32") (DF "64")
+					   (V4SI "32") (V2DI "64")])
 
 ; DImode bits
 (define_mode_attr dbits [(QI "56") (HI "48") (SI "32")])
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 947631d83ee..950f655cdb7 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -267,6 +267,10 @@
 (define_mode_iterator VSX_MM [V16QI V8HI V4SI V2DI V1TI])
 (define_mode_iterator VSX_MM4 [V16QI V8HI V4SI V2DI])
 
+;; Longer vec int modes for rotate/mask ops
+;; and Vector Integer Multiply/Divide/Modulo Instructions
+(define_mode_iterator VIlong [V2DI V4SI])
+
 ;; Constants for creating unspecs
 (define_c_enum "unspec"
   [UNSPEC_VSX_CONCAT
@@ -363,8 +367,11 @@
    UNSPEC_INSERTR
    UNSPEC_REPLACE_ELT
    UNSPEC_REPLACE_UN
+   UNSPEC_VDIVES
+   UNSPEC_VDIVEU
   ])
 
+
 (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
 				 UNSPEC_VSX_XVCVBF16SPN])
 
@@ -1623,28 +1630,35 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op5, op3, op4));
-  else
-    {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op5, ret);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op3, op3, op4));
+
+  if (TARGET_POWER10)
+    emit_insn (gen_mulv2di3 (op0, op1, op2) );
+
   else
     {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op3, ret);
+      rtx op3 = gen_reg_rtx (DImode);
+      rtx op4 = gen_reg_rtx (DImode);
+      rtx op5 = gen_reg_rtx (DImode);
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op5, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op5, ret);
+	}
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op3, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op3, ret);
+	}
+      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
     }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }
   [(set_attr "type" "mul")])
@@ -1718,37 +1732,46 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op5, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op5, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op5, target);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op3, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op3, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op3, target);
-    }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
-  DONE;
+
+    if (TARGET_POWER10)
+      emit_insn (gen_udivv2di3 (op0, op1, op2) );
+    else
+      {
+	rtx op3 = gen_reg_rtx (DImode);
+	rtx op4 = gen_reg_rtx (DImode);
+	rtx op5 = gen_reg_rtx (DImode);
+
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op5, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op5, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op5, target);
+	  }
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op3, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op3, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op3, target);
+	  }
+	emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
+      }
+    DONE;
 }
   [(set_attr "type" "div")])
 
@@ -6104,3 +6127,92 @@
   "TARGET_POWER10"
   "vexpand<wd>m %0,%1"
   [(set_attr "type" "vecsimple")])
+
+(define_insn "dives_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
+		       UNSPEC_VDIVES))]
+  "TARGET_POWER10"
+  "vdives<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "diveu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+			 (match_operand:VIlong 2 "vsx_register_operand" "v")]
+			UNSPEC_VDIVEU))]
+  "TARGET_POWER10"
+  "vdiveu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "div<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivs<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "udiv<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "mods_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmods<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "modu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmodu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "mulhs_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(mult:VIlong (ashiftrt
+		       (match_operand:VIlong 1 "vsx_register_operand" "v")
+		       (const_int 32))
+		     (ashiftrt
+		       (match_operand:VIlong 2 "vsx_register_operand" "v")
+		       (const_int 32))))]
+  "TARGET_POWER10"
+  "vmulhs<wd> %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+(define_insn "mulhu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(us_mult:VIlong (ashiftrt
+			  (match_operand:VIlong 1 "vsx_register_operand" "v")
+			  (const_int 32))
+			(ashiftrt
+			  (match_operand:VIlong 2 "vsx_register_operand" "v")
+			  (const_int 32))))]
+  "TARGET_POWER10"
+  "vmulhu<wd> %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+;; Vector multiply low double word
+(define_insn "mulv2di3"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
+	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
+		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmulld %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0c969085d1f..3c2d2fa892f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21642,6 +21642,126 @@ integer value between 0 and 255 inclusive.
 @exdent vector unsigned int vec_genpcvm (vector unsigned long long int,
                                          const int)
 @end smallexample
+
+Vector Integer Multiply/Divide/Modulo
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mulh (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mulh (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer
+value in word element @code{i} of a is multiplied by the integer value in word
+element @code{i} of b. The high-order 32 bits of the 64-bit product are placed
+into word element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mulh (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mulh (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of a is multiplied by the integer value in
+doubleword element @code{i} of b. The high-order 64 bits of the 128-bit product
+are placed into doubleword element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector unsigned long long
+@exdent vec_mul (vector unsigned long long a, vector unsigned long long b)
+@exdent vector signed long long
+@exdent vec_mul (vector signed long long a, vector signed long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of a is multiplied by the integer value in
+doubleword element @code{i} of b. The low-order 64 bits of the 128-bit product
+are placed into doubleword element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_div (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_div (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is divided by the integer in word element @code{i}
+of b. The unique integer quotient is placed into the word element @code{i} of
+the vector returned. If an attempt is made to perform any of the divisions
+<anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_div (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_div (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is divided by the integer in doubleword
+element @code{i} of b. The unique integer quotient is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform any of the divisions 0x8000_0000_0000_0000 ÷ -1 or <anything> ÷ 0 then
+the quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_dive (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_dive (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is shifted left by 32 bits, then divided by the
+integer in word element @code{i} of b. The unique integer quotient is placed
+into the word element @code{i} of the vector returned. If the quotient cannot
+be represented in 32 bits, or if an attempt is made to perform any of the
+divisions <anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_dive (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_dive (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is shifted left by 64 bits, then divided by
+the integer in doubleword element @code{i} of b. The unique integer quotient is
+placed into the doubleword element @code{i} of the vector returned. If the
+quotient cannot be represented in 64 bits, or if an attempt is made to perform
+<anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mod (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mod (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is divided by the integer in word element @code{i}
+of b. The unique integer remainder is placed into the word element @code{i} of
+the vector returned.  If an attempt is made to perform any of the divisions
+0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mod (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mod (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is divided by the integer in doubleword
+element @code{i} of b. The unique integer remainder is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform <anything> ÷ 0 then the remainder is undefined.
+
 Generate PCV from specified Mask size, as if implemented by the
 @code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where
 immediate value is either 0, 1, 2 or 3.
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
new file mode 100644
index 00000000000..222c8b3a409
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -0,0 +1,398 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+
+/* { dg-final { scan-assembler-times {\mvdivsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmoduw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulld\M} 2 } } */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <math.h>
+#include <altivec.h>
+
+#define DEBUG 0
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+void abort (void);
+
+int main()
+  {
+    int i;
+    vector int i_arg1, i_arg2;
+    vector unsigned int u_arg1, u_arg2;
+    vector long long int d_arg1, d_arg2;
+    vector long long unsigned int ud_arg1, ud_arg2;
+   
+    vector int vec_i_expected, vec_i_result;
+    vector unsigned int vec_u_expected, vec_u_result;
+    vector long long int vec_d_expected, vec_d_result;
+    vector long long unsigned int vec_ud_expected, vec_ud_result;
+  
+    /* Signed word divide */
+    i_arg1 = (vector int){ 20, 40, 60, 80};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){10, 20, 30, 40};
+
+    vec_i_result = vec_div (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word divide */
+    u_arg1 = (vector unsigned int){ 20, 40, 60, 80};
+    u_arg2 = (vector unsigned int){ 2, 2, 2, 2};
+    vec_u_expected = (vector unsigned int){10, 20, 30, 40};
+
+    vec_u_result = vec_div (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word divide */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 2, 2};
+    vec_d_expected = (vector long long){12, 34};
+
+    vec_d_result = vec_div (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+	  printf("ERROR vec_div signed result[%d] = %d != "
+		 "expected[%d] = %d\n",
+		 i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word divide */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 2, 2};
+    vec_ud_expected = (vector unsigned long long){12, 34};
+
+    vec_ud_result = vec_div (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended signed word  result = (arg1 << 32)/arg2 */
+    i_arg1 = (vector int){ 2, 4, 6, 8};
+    i_arg2 = (vector int){ 2048, 2048, 2048, 2048};
+    vec_i_expected = (vector int){4194304, 8388608, 12582912, 16777216};
+
+    vec_i_result = vec_dive (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended unsigned word  result = (arg1 << 32)/arg2 */
+    u_arg1 = (vector unsigned int){ 2, 4, 6, 8};
+    u_arg2 = (vector unsigned int){ 2048, 2048, 2048, 2048};
+    vec_u_expected = (vector unsigned int){4194304, 8388608,
+					   12582912, 16777216};
+
+    vec_u_result = vec_dive (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double signed  esult = (arg1 << 64)/arg2 */
+    d_arg1 = (vector long long int){ 2, 4};
+    d_arg2 = (vector long long int){ 4294967296, 4294967296};
+
+    vec_d_expected = (vector long long int){8589934592, 17179869184};
+
+    vec_d_result = vec_dive (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive signed result[%d] = %lld != "
+		  "expected[%d] = %lld\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double unsigned result = (arg1 << 64)/arg2 */
+    ud_arg1 = (vector long long unsigned int){ 2, 4};
+    ud_arg2 = (vector long long unsigned int){ 4294967296, 4294967296};
+
+    vec_ud_expected = (vector long long unsigned int){8589934592,
+						      17179869184};
+
+    vec_ud_result = vec_dive (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive unsigned result[%d] = %lld != "
+		  "expected[%d] = %lld\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word modulo */
+    i_arg1 = (vector int){ 23, 45, 61, 89};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){1, 1, 1, 1};
+
+    vec_i_result = vec_mod (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word modulo */
+    u_arg1 = (vector unsigned int){ 25, 41, 67, 86};
+    u_arg2 = (vector unsigned int){ 3, 3, 3, 3};
+    vec_u_expected = (vector unsigned int){1, 2, 1, 2};
+
+    vec_u_result = vec_mod (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word modulo */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 7, 7};
+    vec_d_expected = (vector long long){3, 5};
+
+    vec_d_result = vec_mod (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word modulo */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 8, 8};
+    vec_ud_expected = (vector unsigned long long){0, 4};
+
+    vec_ud_result = vec_mod (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vecmod unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word multiply high */
+    i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
+    i_arg2 = (vector int){ 2, 3, 4, 5};
+    //    vec_i_expected = (vector int){-1, -2, -2, -3};
+    vec_i_expected = (vector int){1, -2, -2, -3};
+
+    vec_i_result = vec_mulh (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word multiply high */
+    u_arg1 = (vector unsigned int){ 2147483648, 2147483648,
+				    2147483648, 2147483648 };
+    u_arg2 = (vector unsigned int){ 4, 5, 6, 7 };
+    vec_u_expected = (vector unsigned int){2, 2, 3, 3 };
+
+    vec_u_result = vec_mulh (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply high */
+    d_arg1 = (vector long long int){  2305843009213693951,
+				      4611686018427387903 };
+    d_arg2 = (vector long long int){ 12, 20 };
+    vec_d_expected = (vector long long int){ 1, 4 };
+
+    vec_d_result = vec_mulh (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply high */
+    ud_arg1 = (vector unsigned long long int){ 2305843009213693951,
+					       4611686018427387903 };
+    ud_arg2 = (vector unsigned long long int){ 32, 10 };
+    vec_ud_expected = (vector unsigned long long int){ 3, 2 };
+
+    vec_ud_result = vec_mulh (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply low */
+    ud_arg1 = (vector unsigned long long int){ 2048, 4096 };
+    ud_arg2 = (vector unsigned long long int){ 2, 4 };
+    vec_ud_expected = (vector unsigned long long int){ 4096, 16384 };
+
+    vec_ud_result = vec_mul (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply low */
+    d_arg1 = (vector signed long long int){ 2048, 4096 };
+    d_arg2 = (vector signed long long int){ 2, 4 };
+    vec_d_expected = (vector signed long long int){ 4096, 16384 };
+
+    vec_d_result = vec_mul (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+  }
-- 
2.17.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] rs6000, vector integer multiply/divide/modulo instructions
  2020-12-08  0:31                   ` [PATCH v4] " Carl Love
@ 2021-01-04 16:45                     ` Carl Love
  2021-01-11 21:18                     ` will schmidt
  2021-01-13 22:15                     ` [PATCH v5] " Carl Love
  2 siblings, 0 replies; 22+ messages in thread
From: Carl Love @ 2021-01-04 16:45 UTC (permalink / raw)
  To: will schmidt, Segher Boessenkool, Pat Haugen; +Cc: GCC Patches, David Edelsohn

Segher, Will:

Just wanted to ping you both on this patch.  It has been out there for
awhile.

                  Carl

On Mon, 2020-12-07 at 16:31 -0800, Carl Love wrote:
> Will:
> 
> I have addressed you comments with regards to the Change Log
> entries.  
> 
> The extra define vec_div was removed.
> 
> Added the missing entries for DIVU_V2DI  DIVS_V2DI in rs6000-call.c.
> 
> The extra MULLD_V2DI case statement entry was removed.
> 
> Added comment in rs6000.md about size for vector types per discussion
> with Pat.
> 
>                           Carl
> --------------------------------------------------------
> 
> GCC maintainers:
> 
> The following patch adds new builtins for the vector integer
> multiply,
> divide and modulo operations.  The builtins are: vec_mulh(),
> vec_dive(), vec_mod() for signed and unsigned integers and long
> longintegers. The existing support for the vec_div()and vec_mul()
> builtins emulate the vector operations with multiple scalar
> instructions.  This patch adds support for these builtins using the
> new
> vector instructions for Power 10.
> 
> The patch was compiled and tested on:
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
>   powerpc64le-unknown-linux-gnu (Power 10 LE)
> 
> with no regressions. Additionally the new test case was compiled and
> executed by hand on Mambo to verify the test case passes.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>                 Carl Love
> 
> -------------------------------------------------
> 
> From 15f9c090106c62af83cc405414466ad03d1a4c55 Mon Sep 17 00:00:00
> 2001
> From: Carl Love <cel@us.ibm.com>
> Date: Fri, 4 Sep 2020 19:24:22 -0500
> Subject: [PATCH] rs6000, vector integer multiply/divide/modulo
> instructions
> 
> 2020-12-07  Carl Love  <cel@us.ibm.com>
> 
> gcc/
> 	* config/rs6000/altivec.h (vec_mulh, vec_dive, vec_mod): New	
> defines.
> 	* config/rs6000/altivec.md (VIlong): Move define to file
> vsx.md.
> 	* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
> 	DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
> 	DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
> 	MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
> 	Add builtin define.
> 	(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
> 	* config/rs6000/rs6000-call.c (altivec_overloaded_builtins):
> Add
> 	VSX_BUILTIN_VEC_DIV, P10_BUILTIN_VEC_VDIVE,
> 	P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD,
> P10_BUILTIN_VEC_VMULH
> 	overloaded definitions.
> 	(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
> 	P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
> 	P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
> 	P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
> 	P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
> 	statements for builtins.
> 	* config/rs6000/rs6000.md (bits): Add new attribute sizes.
> 	* config/rs6000/vsx.md (VIlong): New define_mode_iterator.
> 	(UNSPEC_VDIVES, UNSPEC_VDIVEU): New unspec definitions.
> 	(vsx_mul_v2di): Add if TARGET_POWER10 statement.
> 	(vsx_udiv_v2di): Add if TARGET_POWER10 statement.
> 	(dives_<mode>, diveu_<mode>, div<mode>3, uvdiv<mode>3,
> 	mods_<mode>, modu_<mode>, mulhs_<mode>, mulhu_<mode>,
> mulv2di3):
> 	Add define_insn, mode is VIlong.
> 	doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive,
> vec_mod): Add
> 	builtin descriptions.
> 
> gcc/testsuite/
> 	* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
> ---
>  gcc/config/rs6000/altivec.h                   |   4 +
>  gcc/config/rs6000/altivec.md                  |   2 -
>  gcc/config/rs6000/rs6000-builtin.def          |  22 +
>  gcc/config/rs6000/rs6000-call.c               |  53 +++
>  gcc/config/rs6000/rs6000.md                   |   4 +-
>  gcc/config/rs6000/vsx.md                      | 212 +++++++---
>  gcc/doc/extend.texi                           | 120 ++++++
>  .../powerpc/builtins-1-p10-runnable.c         | 398
> ++++++++++++++++++
>  8 files changed, 762 insertions(+), 53 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-
> runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h
> b/gcc/config/rs6000/altivec.h
> index e1884f51bd8..b678e5cf28d 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -750,6 +750,10 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_strir_p(a)	__builtin_vec_strir_p (a)
>  #define vec_stril_p(a)	__builtin_vec_stril_p (a)
>  
> +#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
> +#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
> +#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
> +
>  /* VSX Mask Manipulation builtin. */
>  #define vec_genbm __builtin_vec_mtvsrbm
>  #define vec_genhm __builtin_vec_mtvsrhm
> diff --git a/gcc/config/rs6000/altivec.md
> b/gcc/config/rs6000/altivec.md
> index 6a6ce0f84ed..f10f1cdd8a7 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -193,8 +193,6 @@
>  
>  ;; Short vec int modes
>  (define_mode_iterator VIshort [V8HI V16QI])
> -;; Longer vec int modes for rotate/mask ops
> -(define_mode_iterator VIlong [V2DI V4SI])
>  ;; Vec float modes
>  (define_mode_iterator VF [V4SF])
>  ;; Vec modes, pity mode iterators are not composable
> diff --git a/gcc/config/rs6000/rs6000-builtin.def
> b/gcc/config/rs6000/rs6000-builtin.def
> index 47b1f74e616..e9ea2114615 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2883,6 +2883,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST,
> vsrdb_v8hi)
>  BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
>  BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
>  
> +BU_P10V_AV_2 (DIVES_V4SI, "vdivesw", CONST, dives_v4si)
> +BU_P10V_AV_2 (DIVES_V2DI, "vdivesd", CONST, dives_v2di)
> +BU_P10V_AV_2 (DIVEU_V4SI, "vdiveuw", CONST, diveu_v4si)
> +BU_P10V_AV_2 (DIVEU_V2DI, "vdiveud", CONST, diveu_v2di)
> +BU_P10V_AV_2 (DIVS_V4SI, "vdivsw", CONST, divv4si3)
> +BU_P10V_AV_2 (DIVS_V2DI, "vdivsd", CONST, divv2di3)
> +BU_P10V_AV_2 (DIVU_V4SI, "vdivuw", CONST, udivv4si3)
> +BU_P10V_AV_2 (DIVU_V2DI, "vdivud", CONST, udivv2di3)
> +BU_P10V_AV_2 (MODS_V2DI, "vmodsd", CONST, mods_v2di)
> +BU_P10V_AV_2 (MODS_V4SI, "vmodsw", CONST, mods_v4si)
> +BU_P10V_AV_2 (MODU_V2DI, "vmodud", CONST, modu_v2di)
> +BU_P10V_AV_2 (MODU_V4SI, "vmoduw", CONST, modu_v4si)
> +BU_P10V_AV_2 (MULHS_V2DI, "vmulhsd", CONST, mulhs_v2di)
> +BU_P10V_AV_2 (MULHS_V4SI, "vmulhsw", CONST, mulhs_v4si)
> +BU_P10V_AV_2 (MULHU_V2DI, "vmulhud", CONST, mulhu_v2di)
> +BU_P10V_AV_2 (MULHU_V4SI, "vmulhuw", CONST, mulhu_v4si)
> +BU_P10V_AV_2 (MULLD_V2DI, "vmulld", CONST, mulv2di3)
> +
>  BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST,
> xxspltiw_v4si)
>  BU_P10V_VSX_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST,
> xxspltiw_v4sf)
>  
> @@ -2958,6 +2976,10 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
>  
> +BU_P10_OVERLOAD_2 (MULH, "mulh")
> +BU_P10_OVERLOAD_2 (DIVE, "dive")
> +BU_P10_OVERLOAD_2 (MOD, "mod")
> +
>  
>  BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
>  BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
> diff --git a/gcc/config/rs6000/rs6000-call.c
> b/gcc/config/rs6000/rs6000-call.c
> index 45bc048b5c7..da442e400b9 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -1069,6 +1069,40 @@ const struct altivec_builtin_types
> altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
>    { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> RS6000_BTI_unsigned_V2DI, 0 },
> +
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
> @@ -1909,6 +1943,17 @@ const struct altivec_builtin_types
> altivec_overloaded_builtins[] = {
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
> RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
> RS6000_BTI_bool_V16QI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
>    { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
>      RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI,
> RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
> @@ -14438,6 +14483,14 @@ builtin_function_type (machine_mode
> mode_ret, machine_mode mode_arg0,
>      case P10V_BUILTIN_XXGENPCVM_V8HI:
>      case P10V_BUILTIN_XXGENPCVM_V4SI:
>      case P10V_BUILTIN_XXGENPCVM_V2DI:
> +    case P10V_BUILTIN_DIVEU_V4SI:
> +    case P10V_BUILTIN_DIVEU_V2DI:
> +    case P10V_BUILTIN_DIVU_V4SI:
> +    case P10V_BUILTIN_DIVU_V2DI:
> +    case P10V_BUILTIN_MODU_V2DI:
> +    case P10V_BUILTIN_MODU_V4SI:
> +    case P10V_BUILTIN_MULHU_V2DI:
> +    case P10V_BUILTIN_MULHU_V4SI:
>        h.uns_p[0] = 1;
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;
> diff --git a/gcc/config/rs6000/rs6000.md
> b/gcc/config/rs6000/rs6000.md
> index b89990f46bf..7dea1dfb1d5 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -669,8 +669,10 @@
>  			   (V2DI  "d")])
>  
>  ;; How many bits in this mode?
> +;; For vector type, bits is the size of the elmement
>  (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
> -					   (SF "32") (DF "64")])
> +					   (SF "32") (DF "64")
> +					   (V4SI "32") (V2DI "64")])
>  
>  ; DImode bits
>  (define_mode_attr dbits [(QI "56") (HI "48") (SI "32")])
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 947631d83ee..950f655cdb7 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -267,6 +267,10 @@
>  (define_mode_iterator VSX_MM [V16QI V8HI V4SI V2DI V1TI])
>  (define_mode_iterator VSX_MM4 [V16QI V8HI V4SI V2DI])
>  
> +;; Longer vec int modes for rotate/mask ops
> +;; and Vector Integer Multiply/Divide/Modulo Instructions
> +(define_mode_iterator VIlong [V2DI V4SI])
> +
>  ;; Constants for creating unspecs
>  (define_c_enum "unspec"
>    [UNSPEC_VSX_CONCAT
> @@ -363,8 +367,11 @@
>     UNSPEC_INSERTR
>     UNSPEC_REPLACE_ELT
>     UNSPEC_REPLACE_UN
> +   UNSPEC_VDIVES
> +   UNSPEC_VDIVEU
>    ])
>  
> +
>  (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
>  				 UNSPEC_VSX_XVCVBF16SPN])
>  
> @@ -1623,28 +1630,35 @@
>    rtx op0 = operands[0];
>    rtx op1 = operands[1];
>    rtx op2 = operands[2];
> -  rtx op3 = gen_reg_rtx (DImode);
> -  rtx op4 = gen_reg_rtx (DImode);
> -  rtx op5 = gen_reg_rtx (DImode);
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_muldi3 (op5, op3, op4));
> -  else
> -    {
> -      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> -      emit_move_insn (op5, ret);
> -    }
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_muldi3 (op3, op3, op4));
> +
> +  if (TARGET_POWER10)
> +    emit_insn (gen_mulv2di3 (op0, op1, op2) );
> +
>    else
>      {
> -      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> -      emit_move_insn (op3, ret);
> +      rtx op3 = gen_reg_rtx (DImode);
> +      rtx op4 = gen_reg_rtx (DImode);
> +      rtx op5 = gen_reg_rtx (DImode);
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> +      if (TARGET_POWERPC64)
> +	emit_insn (gen_muldi3 (op5, op3, op4));
> +      else
> +	{
> +	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> +	  emit_move_insn (op5, ret);
> +	}
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> +      if (TARGET_POWERPC64)
> +	emit_insn (gen_muldi3 (op3, op3, op4));
> +      else
> +	{
> +	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> +	  emit_move_insn (op3, ret);
> +	}
> +      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
>      }
> -  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
>    DONE;
>  }
>    [(set_attr "type" "mul")])
> @@ -1718,37 +1732,46 @@
>    rtx op0 = operands[0];
>    rtx op1 = operands[1];
>    rtx op2 = operands[2];
> -  rtx op3 = gen_reg_rtx (DImode);
> -  rtx op4 = gen_reg_rtx (DImode);
> -  rtx op5 = gen_reg_rtx (DImode);
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_udivdi3 (op5, op3, op4));
> -  else
> -    {
> -      rtx libfunc = optab_libfunc (udiv_optab, DImode);
> -      rtx target = emit_library_call_value (libfunc,
> -					    op5, LCT_NORMAL, DImode,
> -					    op3, DImode,
> -					    op4, DImode);
> -      emit_move_insn (op5, target);
> -    }
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_udivdi3 (op3, op3, op4));
> -  else
> -    {
> -      rtx libfunc = optab_libfunc (udiv_optab, DImode);
> -      rtx target = emit_library_call_value (libfunc,
> -					    op3, LCT_NORMAL, DImode,
> -					    op3, DImode,
> -					    op4, DImode);
> -      emit_move_insn (op3, target);
> -    }
> -  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
> -  DONE;
> +
> +    if (TARGET_POWER10)
> +      emit_insn (gen_udivv2di3 (op0, op1, op2) );
> +    else
> +      {
> +	rtx op3 = gen_reg_rtx (DImode);
> +	rtx op4 = gen_reg_rtx (DImode);
> +	rtx op5 = gen_reg_rtx (DImode);
> +
> +	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> +	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> +
> +	if (TARGET_POWERPC64)
> +	  emit_insn (gen_udivdi3 (op5, op3, op4));
> +	else
> +	  {
> +	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
> +	    rtx target = emit_library_call_value (libfunc,
> +						  op5, LCT_NORMAL,
> DImode,
> +						  op3, DImode,
> +						  op4, DImode);
> +	    emit_move_insn (op5, target);
> +	  }
> +	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> +	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> +
> +	if (TARGET_POWERPC64)
> +	  emit_insn (gen_udivdi3 (op3, op3, op4));
> +	else
> +	  {
> +	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
> +	    rtx target = emit_library_call_value (libfunc,
> +						  op3, LCT_NORMAL,
> DImode,
> +						  op3, DImode,
> +						  op4, DImode);
> +	    emit_move_insn (op3, target);
> +	  }
> +	emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
> +      }
> +    DONE;
>  }
>    [(set_attr "type" "div")])
>  
> @@ -6104,3 +6127,92 @@
>    "TARGET_POWER10"
>    "vexpand<wd>m %0,%1"
>    [(set_attr "type" "vecsimple")])
> +
> +(define_insn "dives_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec:VIlong [(match_operand:VIlong 1
> "vsx_register_operand" "v")
> +		        (match_operand:VIlong 2 "vsx_register_operand"
> "v")]
> +		       UNSPEC_VDIVES))]
> +  "TARGET_POWER10"
> +  "vdives<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "diveu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec: VIlong [(match_operand:VIlong 1
> "vsx_register_operand" "v")
> +			 (match_operand:VIlong 2 "vsx_register_operand"
> "v")]
> +			UNSPEC_VDIVEU))]
> +  "TARGET_POWER10"
> +  "vdiveu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "div<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand"
> "v")))]
> +  "TARGET_POWER10"
> +  "vdivs<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "udiv<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand"
> "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand"
> "v")))]
> +  "TARGET_POWER10"
> +  "vdivu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "mods_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand"
> "v")))]
> +  "TARGET_POWER10"
> +  "vmods<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "modu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand"
> "v")
> +		     (match_operand:VIlong 2 "vsx_register_operand"
> "v")))]
> +  "TARGET_POWER10"
> +  "vmodu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "mulhs_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(mult:VIlong (ashiftrt
> +		       (match_operand:VIlong 1 "vsx_register_operand"
> "v")
> +		       (const_int 32))
> +		     (ashiftrt
> +		       (match_operand:VIlong 2 "vsx_register_operand"
> "v")
> +		       (const_int 32))))]
> +  "TARGET_POWER10"
> +  "vmulhs<wd> %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +(define_insn "mulhu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(us_mult:VIlong (ashiftrt
> +			  (match_operand:VIlong 1
> "vsx_register_operand" "v")
> +			  (const_int 32))
> +			(ashiftrt
> +			  (match_operand:VIlong 2
> "vsx_register_operand" "v")
> +			  (const_int 32))))]
> +  "TARGET_POWER10"
> +  "vmulhu<wd> %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +;; Vector multiply low double word
> +(define_insn "mulv2di3"
> +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> +	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> +		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmulld %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 0c969085d1f..3c2d2fa892f 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21642,6 +21642,126 @@ integer value between 0 and 255 inclusive.
>  @exdent vector unsigned int vec_genpcvm (vector unsigned long long
> int,
>                                           const int)
>  @end smallexample
> +
> +Vector Integer Multiply/Divide/Modulo
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_mulh (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_mulh (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The
> integer
> +value in word element @code{i} of a is multiplied by the integer
> value in word
> +element @code{i} of b. The high-order 32 bits of the 64-bit product
> are placed
> +into word element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_mulh (vector signed long long a, vector signed long long
> b)
> +@exdent vector unsigned long long
> +@exdent vec_mulh (vector unsigned long long a, vector unsigned long
> long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The
> integer
> +value in doubleword element @code{i} of a is multiplied by the
> integer value in
> +doubleword element @code{i} of b. The high-order 64 bits of the 128-
> bit product
> +are placed into doubleword element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector unsigned long long
> +@exdent vec_mul (vector unsigned long long a, vector unsigned long
> long b)
> +@exdent vector signed long long
> +@exdent vec_mul (vector signed long long a, vector signed long long
> b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The
> integer
> +value in doubleword element @code{i} of a is multiplied by the
> integer value in
> +doubleword element @code{i} of b. The low-order 64 bits of the 128-
> bit product
> +are placed into doubleword element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_div (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_div (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The
> integer in
> +word element @code{i} of a is divided by the integer in word element
> @code{i}
> +of b. The unique integer quotient is placed into the word element
> @code{i} of
> +the vector returned. If an attempt is made to perform any of the
> divisions
> +<anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_div (vector signed long long a, vector signed long long
> b)
> +@exdent vector unsigned long long
> +@exdent vec_div (vector unsigned long long a, vector unsigned long
> long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The
> integer in
> +doubleword element @code{i} of a is divided by the integer in
> doubleword
> +element @code{i} of b. The unique integer quotient is placed into
> the
> +doubleword element @code{i} of the vector returned. If an attempt is
> made to
> +perform any of the divisions 0x8000_0000_0000_0000 ÷ -1 or
> <anything> ÷ 0 then
> +the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_dive (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_dive (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The
> integer in
> +word element @code{i} of a is shifted left by 32 bits, then divided
> by the
> +integer in word element @code{i} of b. The unique integer quotient
> is placed
> +into the word element @code{i} of the vector returned. If the
> quotient cannot
> +be represented in 32 bits, or if an attempt is made to perform any
> of the
> +divisions <anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_dive (vector signed long long a, vector signed long long
> b)
> +@exdent vector unsigned long long
> +@exdent vec_dive (vector unsigned long long a, vector unsigned long
> long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The
> integer in
> +doubleword element @code{i} of a is shifted left by 64 bits, then
> divided by
> +the integer in doubleword element @code{i} of b. The unique integer
> quotient is
> +placed into the doubleword element @code{i} of the vector returned.
> If the
> +quotient cannot be represented in 64 bits, or if an attempt is made
> to perform
> +<anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_mod (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_mod (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The
> integer in
> +word element @code{i} of a is divided by the integer in word element
> @code{i}
> +of b. The unique integer remainder is placed into the word element
> @code{i} of
> +the vector returned.  If an attempt is made to perform any of the
> divisions
> +0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_mod (vector signed long long a, vector signed long long
> b)
> +@exdent vector unsigned long long
> +@exdent vec_mod (vector unsigned long long a, vector unsigned long
> long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The
> integer in
> +doubleword element @code{i} of a is divided by the integer in
> doubleword
> +element @code{i} of b. The unique integer remainder is placed into
> the
> +doubleword element @code{i} of the vector returned. If an attempt is
> made to
> +perform <anything> ÷ 0 then the remainder is undefined.
> +
>  Generate PCV from specified Mask size, as if implemented by the
>  @code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm}
> instructions, where
>  immediate value is either 0, 1, 2 or 3.
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-
> runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-
> runnable.c
> new file mode 100644
> index 00000000000..222c8b3a409
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> @@ -0,0 +1,398 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> +
> +/* { dg-final { scan-assembler-times {\mvdivsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivesw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdiveuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivesd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdiveud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmoduw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulld\M} 2 } } */
> +
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <math.h>
> +#include <altivec.h>
> +
> +#define DEBUG 0
> +
> +#ifdef DEBUG
> +#include <stdio.h>
> +#endif
> +
> +void abort (void);
> +
> +int main()
> +  {
> +    int i;
> +    vector int i_arg1, i_arg2;
> +    vector unsigned int u_arg1, u_arg2;
> +    vector long long int d_arg1, d_arg2;
> +    vector long long unsigned int ud_arg1, ud_arg2;
> +   
> +    vector int vec_i_expected, vec_i_result;
> +    vector unsigned int vec_u_expected, vec_u_result;
> +    vector long long int vec_d_expected, vec_d_result;
> +    vector long long unsigned int vec_ud_expected, vec_ud_result;
> +  
> +    /* Signed word divide */
> +    i_arg1 = (vector int){ 20, 40, 60, 80};
> +    i_arg2 = (vector int){ 2, 2, 2, 2};
> +    vec_i_expected = (vector int){10, 20, 30, 40};
> +
> +    vec_i_result = vec_div (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word divide */
> +    u_arg1 = (vector unsigned int){ 20, 40, 60, 80};
> +    u_arg2 = (vector unsigned int){ 2, 2, 2, 2};
> +    vec_u_expected = (vector unsigned int){10, 20, 30, 40};
> +
> +    vec_u_result = vec_div (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word divide */
> +    d_arg1 = (vector long long){ 24, 68};
> +    d_arg2 = (vector long long){ 2, 2};
> +    vec_d_expected = (vector long long){12, 34};
> +
> +    vec_d_result = vec_div (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +	  printf("ERROR vec_div signed result[%d] = %d != "
> +		 "expected[%d] = %d\n",
> +		 i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word divide */
> +    ud_arg1 = (vector unsigned long long){ 24, 68};
> +    ud_arg2 = (vector unsigned long long){ 2, 2};
> +    vec_ud_expected = (vector unsigned long long){12, 34};
> +
> +    vec_ud_result = vec_div (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended signed word  result = (arg1 << 32)/arg2 */
> +    i_arg1 = (vector int){ 2, 4, 6, 8};
> +    i_arg2 = (vector int){ 2048, 2048, 2048, 2048};
> +    vec_i_expected = (vector int){4194304, 8388608, 12582912,
> 16777216};
> +
> +    vec_i_result = vec_dive (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended unsigned word  result = (arg1 << 32)/arg2 */
> +    u_arg1 = (vector unsigned int){ 2, 4, 6, 8};
> +    u_arg2 = (vector unsigned int){ 2048, 2048, 2048, 2048};
> +    vec_u_expected = (vector unsigned int){4194304, 8388608,
> +					   12582912, 16777216};
> +
> +    vec_u_result = vec_dive (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended double signed  esult = (arg1 << 64)/arg2 */
> +    d_arg1 = (vector long long int){ 2, 4};
> +    d_arg2 = (vector long long int){ 4294967296, 4294967296};
> +
> +    vec_d_expected = (vector long long int){8589934592,
> 17179869184};
> +
> +    vec_d_result = vec_dive (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive signed result[%d] = %lld != "
> +		  "expected[%d] = %lld\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended double unsigned result = (arg1 << 64)/arg2 */
> +    ud_arg1 = (vector long long unsigned int){ 2, 4};
> +    ud_arg2 = (vector long long unsigned int){ 4294967296,
> 4294967296};
> +
> +    vec_ud_expected = (vector long long unsigned int){8589934592,
> +						      17179869184};
> +
> +    vec_ud_result = vec_dive (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive unsigned result[%d] = %lld != "
> +		  "expected[%d] = %lld\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed word modulo */
> +    i_arg1 = (vector int){ 23, 45, 61, 89};
> +    i_arg2 = (vector int){ 2, 2, 2, 2};
> +    vec_i_expected = (vector int){1, 1, 1, 1};
> +
> +    vec_i_result = vec_mod (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word modulo */
> +    u_arg1 = (vector unsigned int){ 25, 41, 67, 86};
> +    u_arg2 = (vector unsigned int){ 3, 3, 3, 3};
> +    vec_u_expected = (vector unsigned int){1, 2, 1, 2};
> +
> +    vec_u_result = vec_mod (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word modulo */
> +    d_arg1 = (vector long long){ 24, 68};
> +    d_arg2 = (vector long long){ 7, 7};
> +    vec_d_expected = (vector long long){3, 5};
> +
> +    vec_d_result = vec_mod (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word modulo */
> +    ud_arg1 = (vector unsigned long long){ 24, 68};
> +    ud_arg2 = (vector unsigned long long){ 8, 8};
> +    vec_ud_expected = (vector unsigned long long){0, 4};
> +
> +    vec_ud_result = vec_mod (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vecmod unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed word multiply high */
> +    i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648,
> 2147483648 };
> +    i_arg2 = (vector int){ 2, 3, 4, 5};
> +    //    vec_i_expected = (vector int){-1, -2, -2, -3};
> +    vec_i_expected = (vector int){1, -2, -2, -3};
> +
> +    vec_i_result = vec_mulh (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word multiply high */
> +    u_arg1 = (vector unsigned int){ 2147483648, 2147483648,
> +				    2147483648, 2147483648 };
> +    u_arg2 = (vector unsigned int){ 4, 5, 6, 7 };
> +    vec_u_expected = (vector unsigned int){2, 2, 3, 3 };
> +
> +    vec_u_result = vec_mulh (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word multiply high */
> +    d_arg1 = (vector long long int){  2305843009213693951,
> +				      4611686018427387903 };
> +    d_arg2 = (vector long long int){ 12, 20 };
> +    vec_d_expected = (vector long long int){ 1, 4 };
> +
> +    vec_d_result = vec_mulh (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word multiply high */
> +    ud_arg1 = (vector unsigned long long int){ 2305843009213693951,
> +					       4611686018427387903 };
> +    ud_arg2 = (vector unsigned long long int){ 32, 10 };
> +    vec_ud_expected = (vector unsigned long long int){ 3, 2 };
> +
> +    vec_ud_result = vec_mulh (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word multiply low */
> +    ud_arg1 = (vector unsigned long long int){ 2048, 4096 };
> +    ud_arg2 = (vector unsigned long long int){ 2, 4 };
> +    vec_ud_expected = (vector unsigned long long int){ 4096, 16384
> };
> +
> +    vec_ud_result = vec_mul (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mul unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word multiply low */
> +    d_arg1 = (vector signed long long int){ 2048, 4096 };
> +    d_arg2 = (vector signed long long int){ 2, 4 };
> +    vec_d_expected = (vector signed long long int){ 4096, 16384 };
> +
> +    vec_d_result = vec_mul (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mul signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +  }


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] rs6000, vector integer multiply/divide/modulo instructions
  2020-12-08  0:31                   ` [PATCH v4] " Carl Love
  2021-01-04 16:45                     ` Carl Love
@ 2021-01-11 21:18                     ` will schmidt
  2021-01-13 22:15                     ` [PATCH v5] " Carl Love
  2 siblings, 0 replies; 22+ messages in thread
From: will schmidt @ 2021-01-11 21:18 UTC (permalink / raw)
  To: Carl Love, Segher Boessenkool, Pat Haugen; +Cc: GCC Patches, David Edelsohn

On Mon, 2020-12-07 at 16:31 -0800, Carl Love wrote:
> Will:
> 
> I have addressed you comments with regards to the Change Log entries.  
> 
> The extra define vec_div was removed.
> 
> Added the missing entries for DIVU_V2DI  DIVS_V2DI in rs6000-call.c.
> 
> The extra MULLD_V2DI case statement entry was removed.
> 
> Added comment in rs6000.md about size for vector types per discussion
> with Pat.
> 
>                           Carl
> --------------------------------------------------------
> 
> GCC maintainers:
> 
> The following patch adds new builtins for the vector integer multiply,
> divide and modulo operations.  The builtins are: vec_mulh(),
> vec_dive(), vec_mod() for signed and unsigned integers and long
> longintegers. The existing support for the vec_div()and vec_mul()
> builtins emulate the vector operations with multiple scalar
> instructions.  This patch adds support for these builtins using the new
> vector instructions for Power 10.

Missing a couple spaces. 
"long integers"
 and 
"vec_div() and".


> 
> The patch was compiled and tested on:
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
>   powerpc64le-unknown-linux-gnu (Power 10 LE)
> 
> with no regressions. Additionally the new test case was compiled and
> executed by hand on Mambo to verify the test case passes.

May also be worth trying on Power8/BE, just for the variety.

> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>                 Carl Love
> 
> -------------------------------------------------
> 
> From 15f9c090106c62af83cc405414466ad03d1a4c55 Mon Sep 17 00:00:00 2001
> From: Carl Love <cel@us.ibm.com>
> Date: Fri, 4 Sep 2020 19:24:22 -0500
> Subject: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
> 
> 2020-12-07  Carl Love  <cel@us.ibm.com>
> 
> gcc/
> 	* config/rs6000/altivec.h (vec_mulh, vec_dive, vec_mod): New	defines.

Embedded tab there.

> 	* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.

> 	* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
> 	DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
> 	DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
> 	MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
> 	Add builtin define.
> 	(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.



> 	* config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Add
> 	VSX_BUILTIN_VEC_DIV, P10_BUILTIN_VEC_VDIVE,
> 	P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH
> 	overloaded definitions.

P10_BUILTIN_VEC_VDIVE is mentioned here twice.
I don't see it in the
patch body at all. 
I don't see P10_BUILTIN_VEC_VMOD either.
Also don't
see P10_BUILTIN_VEC_VMULH.


> 	(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
> 	P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
> 	P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
> 	P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
> 	P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
> 	statements for builtins.

I don't see the P10V_BUILTIN_MULLD_V2DI case statement entry in the
patch below.   A previous review commented that there might have been a
missing altivec_overloaded_builtins entry for the MULLD_V2DI entry. 
Codegen for unsigned mull against v2di was correct?


> 	* config/rs6000/rs6000.md (bits): Add new attribute sizes.

I'd be more verbose here to provide something searchable.
i.e. "Add V4SI,V2DI entries..."

> 	* config/rs6000/vsx.md (VIlong): New define_mode_iterator.

Not new.  'Moved here from altivec.md' or something similar.


> 	(UNSPEC_VDIVES, UNSPEC_VDIVEU): New unspec definitions.
> 	(vsx_mul_v2di): Add if TARGET_POWER10 statement.
> 	(vsx_udiv_v2di): Add if TARGET_POWER10 statement.
> 	(dives_<mode>, diveu_<mode>, div<mode>3, uvdiv<mode>3,
> 	mods_<mode>, modu_<mode>, mulhs_<mode>, mulhu_<mode>, mulv2di3):
> 	Add define_insn, mode is VIlong.
> 	doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
> 	builtin descriptions.
> 
> gcc/testsuite/
> 	* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
> ---
>  gcc/config/rs6000/altivec.h                   |   4 +
>  gcc/config/rs6000/altivec.md                  |   2 -
>  gcc/config/rs6000/rs6000-builtin.def          |  22 +
>  gcc/config/rs6000/rs6000-call.c               |  53 +++
>  gcc/config/rs6000/rs6000.md                   |   4 +-
>  gcc/config/rs6000/vsx.md                      | 212 +++++++---
>  gcc/doc/extend.texi                           | 120 ++++++
>  .../powerpc/builtins-1-p10-runnable.c         | 398 ++++++++++++++++++
>  8 files changed, 762 insertions(+), 53 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index e1884f51bd8..b678e5cf28d 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -750,6 +750,10 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_strir_p(a)	__builtin_vec_strir_p (a)
>  #define vec_stril_p(a)	__builtin_vec_stril_p (a)
> 
> +#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
> +#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
> +#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
> +
>  /* VSX Mask Manipulation builtin. */
>  #define vec_genbm __builtin_vec_mtvsrbm
>  #define vec_genhm __builtin_vec_mtvsrhm

ok


> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 6a6ce0f84ed..f10f1cdd8a7 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -193,8 +193,6 @@
> 
>  ;; Short vec int modes
>  (define_mode_iterator VIshort [V8HI V16QI])
> -;; Longer vec int modes for rotate/mask ops
> -(define_mode_iterator VIlong [V2DI V4SI])
>  ;; Vec float modes
>  (define_mode_iterator VF [V4SF])
>  ;; Vec modes, pity mode iterators are not composable

ok

> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> index 47b1f74e616..e9ea2114615 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2883,6 +2883,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
>  BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
>  BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
> 
> +BU_P10V_AV_2 (DIVES_V4SI, "vdivesw", CONST, dives_v4si)
> +BU_P10V_AV_2 (DIVES_V2DI, "vdivesd", CONST, dives_v2di)
> +BU_P10V_AV_2 (DIVEU_V4SI, "vdiveuw", CONST, diveu_v4si)
> +BU_P10V_AV_2 (DIVEU_V2DI, "vdiveud", CONST, diveu_v2di)
> +BU_P10V_AV_2 (DIVS_V4SI, "vdivsw", CONST, divv4si3)
> +BU_P10V_AV_2 (DIVS_V2DI, "vdivsd", CONST, divv2di3)
> +BU_P10V_AV_2 (DIVU_V4SI, "vdivuw", CONST, udivv4si3)
> +BU_P10V_AV_2 (DIVU_V2DI, "vdivud", CONST, udivv2di3)
> +BU_P10V_AV_2 (MODS_V2DI, "vmodsd", CONST, mods_v2di)
> +BU_P10V_AV_2 (MODS_V4SI, "vmodsw", CONST, mods_v4si)
> +BU_P10V_AV_2 (MODU_V2DI, "vmodud", CONST, modu_v2di)
> +BU_P10V_AV_2 (MODU_V4SI, "vmoduw", CONST, modu_v4si)
> +BU_P10V_AV_2 (MULHS_V2DI, "vmulhsd", CONST, mulhs_v2di)
> +BU_P10V_AV_2 (MULHS_V4SI, "vmulhsw", CONST, mulhs_v4si)
> +BU_P10V_AV_2 (MULHU_V2DI, "vmulhud", CONST, mulhu_v2di)
> +BU_P10V_AV_2 (MULHU_V4SI, "vmulhuw", CONST, mulhu_v4si)
> +BU_P10V_AV_2 (MULLD_V2DI, "vmulld", CONST, mulv2di3)
> +
>  BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si)
>  BU_P10V_VSX_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, xxspltiw_v4sf)
> 
> @@ -2958,6 +2976,10 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
> 
> +BU_P10_OVERLOAD_2 (MULH, "mulh")
> +BU_P10_OVERLOAD_2 (DIVE, "dive")
> +BU_P10_OVERLOAD_2 (MOD, "mod")
> +
>  

blank line unnecessary.
ok.

>  BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
>  BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 45bc048b5c7..da442e400b9 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -1069,6 +1069,40 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },


>    { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },

> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },

I'd be concerned about the order of this entry, versus the existing
VSX_BUILTIN_VEC_DIV with the same parameters (as seen in the diff
context above this hunk).  Presumably this works OK, and doesn't affect
older/non-power10 targets.

ok


> +
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
> @@ -1909,6 +1943,17 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
>    { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
>      RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
> @@ -14438,6 +14483,14 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
>      case P10V_BUILTIN_XXGENPCVM_V8HI:
>      case P10V_BUILTIN_XXGENPCVM_V4SI:
>      case P10V_BUILTIN_XXGENPCVM_V2DI:
> +    case P10V_BUILTIN_DIVEU_V4SI:
> +    case P10V_BUILTIN_DIVEU_V2DI:
> +    case P10V_BUILTIN_DIVU_V4SI:
> +    case P10V_BUILTIN_DIVU_V2DI:
> +    case P10V_BUILTIN_MODU_V2DI:
> +    case P10V_BUILTIN_MODU_V4SI:
> +    case P10V_BUILTIN_MULHU_V2DI:
> +    case P10V_BUILTIN_MULHU_V4SI:
>        h.uns_p[0] = 1;
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index b89990f46bf..7dea1dfb1d5 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -669,8 +669,10 @@
>  			   (V2DI  "d")])
> 
>  ;; How many bits in this mode?
> +;; For vector type, bits is the size of the elmement

types
element

>  (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
> -					   (SF "32") (DF "64")])
> +					   (SF "32") (DF "64")
> +					   (V4SI "32") (V2DI "64")])

Presumably it is safe (no side affects) when adding V4SI and V2DI here,
with respect to other current users of 'bits'.
Is it worth adding the
other modes while we are here? (V1TI, V8HI, V16QI ).


> 
>  ; DImode bits
>  (define_mode_attr dbits [(QI "56") (HI "48") (SI "32")])
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 947631d83ee..950f655cdb7 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -267,6 +267,10 @@
>  (define_mode_iterator VSX_MM [V16QI V8HI V4SI V2DI V1TI])
>  (define_mode_iterator VSX_MM4 [V16QI V8HI V4SI V2DI])
> 
> +;; Longer vec int modes for rotate/mask ops
> +;; and Vector Integer Multiply/Divide/Modulo Instructions
> +(define_mode_iterator VIlong [V2DI V4SI])
> +

ok.


>  ;; Constants for creating unspecs
>  (define_c_enum "unspec"
>    [UNSPEC_VSX_CONCAT
> @@ -363,8 +367,11 @@
>     UNSPEC_INSERTR
>     UNSPEC_REPLACE_ELT
>     UNSPEC_REPLACE_UN
> +   UNSPEC_VDIVES
> +   UNSPEC_VDIVEU
>    ])
> 
> +

unnecessary blank line.
ok


>  (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
>  				 UNSPEC_VSX_XVCVBF16SPN])
> 
> @@ -1623,28 +1630,35 @@
>    rtx op0 = operands[0];
>    rtx op1 = operands[1];
>    rtx op2 = operands[2];
> -  rtx op3 = gen_reg_rtx (DImode);
> -  rtx op4 = gen_reg_rtx (DImode);
> -  rtx op5 = gen_reg_rtx (DImode);
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_muldi3 (op5, op3, op4));
> -  else
> -    {
> -      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> -      emit_move_insn (op5, ret);
> -    }
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_muldi3 (op3, op3, op4));
> +
> +  if (TARGET_POWER10)
> +    emit_insn (gen_mulv2di3 (op0, op1, op2) );
> +
>    else
>      {
> -      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> -      emit_move_insn (op3, ret);
> +      rtx op3 = gen_reg_rtx (DImode);
> +      rtx op4 = gen_reg_rtx (DImode);
> +      rtx op5 = gen_reg_rtx (DImode);
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> +      if (TARGET_POWERPC64)
> +	emit_insn (gen_muldi3 (op5, op3, op4));
> +      else
> +	{
> +	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> +	  emit_move_insn (op5, ret);
> +	}
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> +      if (TARGET_POWERPC64)
> +	emit_insn (gen_muldi3 (op3, op3, op4));
> +      else
> +	{
> +	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> +	  emit_move_insn (op3, ret);
> +	}
> +      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
>      }
> -  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
>    DONE;
>  }
>    [(set_attr "type" "mul")])
> @@ -1718,37 +1732,46 @@
>    rtx op0 = operands[0];
>    rtx op1 = operands[1];
>    rtx op2 = operands[2];
> -  rtx op3 = gen_reg_rtx (DImode);
> -  rtx op4 = gen_reg_rtx (DImode);
> -  rtx op5 = gen_reg_rtx (DImode);
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_udivdi3 (op5, op3, op4));
> -  else
> -    {
> -      rtx libfunc = optab_libfunc (udiv_optab, DImode);
> -      rtx target = emit_library_call_value (libfunc,
> -					    op5, LCT_NORMAL, DImode,
> -					    op3, DImode,
> -					    op4, DImode);
> -      emit_move_insn (op5, target);
> -    }
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_udivdi3 (op3, op3, op4));
> -  else
> -    {
> -      rtx libfunc = optab_libfunc (udiv_optab, DImode);
> -      rtx target = emit_library_call_value (libfunc,
> -					    op3, LCT_NORMAL, DImode,
> -					    op3, DImode,
> -					    op4, DImode);
> -      emit_move_insn (op3, target);
> -    }
> -  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
> -  DONE;
> +
> +    if (TARGET_POWER10)
> +      emit_insn (gen_udivv2di3 (op0, op1, op2) );
> +    else
> +      {
> +	rtx op3 = gen_reg_rtx (DImode);
> +	rtx op4 = gen_reg_rtx (DImode);
> +	rtx op5 = gen_reg_rtx (DImode);
> +
> +	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> +	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> +
> +	if (TARGET_POWERPC64)
> +	  emit_insn (gen_udivdi3 (op5, op3, op4));
> +	else
> +	  {
> +	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
> +	    rtx target = emit_library_call_value (libfunc,
> +						  op5, LCT_NORMAL, DImode,
> +						  op3, DImode,
> +						  op4, DImode);
> +	    emit_move_insn (op5, target);
> +	  }
> +	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> +	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> +
> +	if (TARGET_POWERPC64)
> +	  emit_insn (gen_udivdi3 (op3, op3, op4));
> +	else
> +	  {
> +	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
> +	    rtx target = emit_library_call_value (libfunc,
> +						  op3, LCT_NORMAL, DImode,
> +						  op3, DImode,
> +						  op4, DImode);
> +	    emit_move_insn (op3, target);
> +	  }
> +	emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
> +      }
> +    DONE;
>  }
>    [(set_attr "type" "div")])
> 
> @@ -6104,3 +6127,92 @@
>    "TARGET_POWER10"
>    "vexpand<wd>m %0,%1"
>    [(set_attr "type" "vecsimple")])
> +
> +(define_insn "dives_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +		       UNSPEC_VDIVES))]

8 spaces/tab above (?).


> +  "TARGET_POWER10"
> +  "vdives<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "diveu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +			 (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +			UNSPEC_VDIVEU))]

No space after "unspec:"

> +  "TARGET_POWER10"
> +  "vdiveu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "div<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "udiv<mode>3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "mods_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmods<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "modu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmodu<wd> %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "<bits>")])
> +
> +(define_insn "mulhs_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(mult:VIlong (ashiftrt
> +		       (match_operand:VIlong 1 "vsx_register_operand" "v")
> +		       (const_int 32))
> +		     (ashiftrt
> +		       (match_operand:VIlong 2 "vsx_register_operand" "v")
> +		       (const_int 32))))]
> +  "TARGET_POWER10"
> +  "vmulhs<wd> %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +(define_insn "mulhu_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +	(us_mult:VIlong (ashiftrt
> +			  (match_operand:VIlong 1 "vsx_register_operand" "v")
> +			  (const_int 32))
> +			(ashiftrt
> +			  (match_operand:VIlong 2 "vsx_register_operand" "v")
> +			  (const_int 32))))]
> +  "TARGET_POWER10"
> +  "vmulhu<wd> %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +;; Vector multiply low double word
> +(define_insn "mulv2di3"
> +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> +	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> +		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmulld %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 0c969085d1f..3c2d2fa892f 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21642,6 +21642,126 @@ integer value between 0 and 255 inclusive.
>  @exdent vector unsigned int vec_genpcvm (vector unsigned long long int,
>                                           const int)
>  @end smallexample
> +
> +Vector Integer Multiply/Divide/Modulo
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_mulh (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_mulh (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer
> +value in word element @code{i} of a is multiplied by the integer value in word
> +element @code{i} of b. The high-order 32 bits of the 64-bit product are placed
> +into word element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_mulh (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_mulh (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer
> +value in doubleword element @code{i} of a is multiplied by the integer value in
> +doubleword element @code{i} of b. The high-order 64 bits of the 128-bit product
> +are placed into doubleword element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector unsigned long long
> +@exdent vec_mul (vector unsigned long long a, vector unsigned long long b)
> +@exdent vector signed long long
> +@exdent vec_mul (vector signed long long a, vector signed long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer
> +value in doubleword element @code{i} of a is multiplied by the integer value in
> +doubleword element @code{i} of b. The low-order 64 bits of the 128-bit product
> +are placed into doubleword element @code{i} of the vector returned.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_div (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_div (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer in
> +word element @code{i} of a is divided by the integer in word element @code{i}
> +of b. The unique integer quotient is placed into the word element @code{i} of
> +the vector returned. If an attempt is made to perform any of the divisions
> +<anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_div (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_div (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer in
> +doubleword element @code{i} of a is divided by the integer in doubleword
> +element @code{i} of b. The unique integer quotient is placed into the
> +doubleword element @code{i} of the vector returned. If an attempt is made to
> +perform any of the divisions 0x8000_0000_0000_0000 ÷ -1 or <anything> ÷ 0 then
> +the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_dive (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_dive (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer in
> +word element @code{i} of a is shifted left by 32 bits, then divided by the
> +integer in word element @code{i} of b. The unique integer quotient is placed
> +into the word element @code{i} of the vector returned. If the quotient cannot
> +be represented in 32 bits, or if an attempt is made to perform any of the
> +divisions <anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_dive (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_dive (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer in
> +doubleword element @code{i} of a is shifted left by 64 bits, then divided by
> +the integer in doubleword element @code{i} of b. The unique integer quotient is
> +placed into the doubleword element @code{i} of the vector returned. If the
> +quotient cannot be represented in 64 bits, or if an attempt is made to perform
> +<anything> ÷ 0 then the quotient is undefined.
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_mod (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_mod (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer in
> +word element @code{i} of a is divided by the integer in word element @code{i}
> +of b. The unique integer remainder is placed into the word element @code{i} of
> +the vector returned.  If an attempt is made to perform any of the divisions
> +0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
> +
> +@smallexample
> +@exdent vector signed long long
> +@exdent vec_mod (vector signed long long a, vector signed long long b)
> +@exdent vector unsigned long long
> +@exdent vec_mod (vector unsigned long long a, vector unsigned long long b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 1, do the following. The integer in
> +doubleword element @code{i} of a is divided by the integer in doubleword
> +element @code{i} of b. The unique integer remainder is placed into the
> +doubleword element @code{i} of the vector returned. If an attempt is made to
> +perform <anything> ÷ 0 then the remainder is undefined.
> +
>  Generate PCV from specified Mask size, as if implemented by the
>  @code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where
>  immediate value is either 0, 1, 2 or 3.
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> new file mode 100644
> index 00000000000..222c8b3a409
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> @@ -0,0 +1,398 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> +
> +/* { dg-final { scan-assembler-times {\mvdivsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivesw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdiveuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivesd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdiveud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmoduw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhsw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhuw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhsd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulhud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulld\M} 2 } } */
> +
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <math.h>
> +#include <altivec.h>
> +
> +#define DEBUG 0
> +
> +#ifdef DEBUG
> +#include <stdio.h>
> +#endif
> +
> +void abort (void);
> +
> +int main()
> +  {
> +    int i;
> +    vector int i_arg1, i_arg2;
> +    vector unsigned int u_arg1, u_arg2;
> +    vector long long int d_arg1, d_arg2;
> +    vector long long unsigned int ud_arg1, ud_arg2;
> +   
> +    vector int vec_i_expected, vec_i_result;
> +    vector unsigned int vec_u_expected, vec_u_result;
> +    vector long long int vec_d_expected, vec_d_result;
> +    vector long long unsigned int vec_ud_expected, vec_ud_result;
> +  
> +    /* Signed word divide */
> +    i_arg1 = (vector int){ 20, 40, 60, 80};
> +    i_arg2 = (vector int){ 2, 2, 2, 2};
> +    vec_i_expected = (vector int){10, 20, 30, 40};
> +
> +    vec_i_result = vec_div (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word divide */
> +    u_arg1 = (vector unsigned int){ 20, 40, 60, 80};
> +    u_arg2 = (vector unsigned int){ 2, 2, 2, 2};
> +    vec_u_expected = (vector unsigned int){10, 20, 30, 40};
> +
> +    vec_u_result = vec_div (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word divide */
> +    d_arg1 = (vector long long){ 24, 68};
> +    d_arg2 = (vector long long){ 2, 2};
> +    vec_d_expected = (vector long long){12, 34};
> +
> +    vec_d_result = vec_div (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +	  printf("ERROR vec_div signed result[%d] = %d != "
> +		 "expected[%d] = %d\n",
> +		 i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word divide */
> +    ud_arg1 = (vector unsigned long long){ 24, 68};
> +    ud_arg2 = (vector unsigned long long){ 2, 2};
> +    vec_ud_expected = (vector unsigned long long){12, 34};
> +
> +    vec_ud_result = vec_div (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_div unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended signed word  result = (arg1 << 32)/arg2 */
> +    i_arg1 = (vector int){ 2, 4, 6, 8};
> +    i_arg2 = (vector int){ 2048, 2048, 2048, 2048};
> +    vec_i_expected = (vector int){4194304, 8388608, 12582912, 16777216};
> +
> +    vec_i_result = vec_dive (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended unsigned word  result = (arg1 << 32)/arg2 */
> +    u_arg1 = (vector unsigned int){ 2, 4, 6, 8};
> +    u_arg2 = (vector unsigned int){ 2048, 2048, 2048, 2048};
> +    vec_u_expected = (vector unsigned int){4194304, 8388608,
> +					   12582912, 16777216};
> +
> +    vec_u_result = vec_dive (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended double signed  esult = (arg1 << 64)/arg2 */
> +    d_arg1 = (vector long long int){ 2, 4};
> +    d_arg2 = (vector long long int){ 4294967296, 4294967296};
> +
> +    vec_d_expected = (vector long long int){8589934592, 17179869184};
> +
> +    vec_d_result = vec_dive (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive signed result[%d] = %lld != "
> +		  "expected[%d] = %lld\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Divide Extended double unsigned result = (arg1 << 64)/arg2 */
> +    ud_arg1 = (vector long long unsigned int){ 2, 4};
> +    ud_arg2 = (vector long long unsigned int){ 4294967296, 4294967296};
> +
> +    vec_ud_expected = (vector long long unsigned int){8589934592,
> +						      17179869184};
> +
> +    vec_ud_result = vec_dive (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_dive unsigned result[%d] = %lld != "
> +		  "expected[%d] = %lld\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed word modulo */
> +    i_arg1 = (vector int){ 23, 45, 61, 89};
> +    i_arg2 = (vector int){ 2, 2, 2, 2};
> +    vec_i_expected = (vector int){1, 1, 1, 1};
> +
> +    vec_i_result = vec_mod (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word modulo */
> +    u_arg1 = (vector unsigned int){ 25, 41, 67, 86};
> +    u_arg2 = (vector unsigned int){ 3, 3, 3, 3};
> +    vec_u_expected = (vector unsigned int){1, 2, 1, 2};
> +
> +    vec_u_result = vec_mod (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word modulo */
> +    d_arg1 = (vector long long){ 24, 68};
> +    d_arg2 = (vector long long){ 7, 7};
> +    vec_d_expected = (vector long long){3, 5};
> +
> +    vec_d_result = vec_mod (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mod signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word modulo */
> +    ud_arg1 = (vector unsigned long long){ 24, 68};
> +    ud_arg2 = (vector unsigned long long){ 8, 8};
> +    vec_ud_expected = (vector unsigned long long){0, 4};
> +
> +    vec_ud_result = vec_mod (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vecmod unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed word multiply high */
> +    i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
> +    i_arg2 = (vector int){ 2, 3, 4, 5};
> +    //    vec_i_expected = (vector int){-1, -2, -2, -3};
> +    vec_i_expected = (vector int){1, -2, -2, -3};
> +
> +    vec_i_result = vec_mulh (i_arg1, i_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_i_expected[i] != vec_i_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_i_result[i],  i, vec_i_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned word multiply high */
> +    u_arg1 = (vector unsigned int){ 2147483648, 2147483648,
> +				    2147483648, 2147483648 };
> +    u_arg2 = (vector unsigned int){ 4, 5, 6, 7 };
> +    vec_u_expected = (vector unsigned int){2, 2, 3, 3 };
> +
> +    vec_u_result = vec_mulh (u_arg1, u_arg2);
> +
> +    for (i = 0; i < 4; i++)
> +      {
> +        if (vec_u_expected[i] != vec_u_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_u_result[i],  i, vec_u_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word multiply high */
> +    d_arg1 = (vector long long int){  2305843009213693951,
> +				      4611686018427387903 };
> +    d_arg2 = (vector long long int){ 12, 20 };
> +    vec_d_expected = (vector long long int){ 1, 4 };
> +
> +    vec_d_result = vec_mulh (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word multiply high */
> +    ud_arg1 = (vector unsigned long long int){ 2305843009213693951,
> +					       4611686018427387903 };
> +    ud_arg2 = (vector unsigned long long int){ 32, 10 };
> +    vec_ud_expected = (vector unsigned long long int){ 3, 2 };
> +
> +    vec_ud_result = vec_mulh (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mulh unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Unsigned double word multiply low */
> +    ud_arg1 = (vector unsigned long long int){ 2048, 4096 };
> +    ud_arg2 = (vector unsigned long long int){ 2, 4 };
> +    vec_ud_expected = (vector unsigned long long int){ 4096, 16384 };
> +
> +    vec_ud_result = vec_mul (ud_arg1, ud_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_ud_expected[i] != vec_ud_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mul unsigned result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +
> +    /* Signed double word multiply low */
> +    d_arg1 = (vector signed long long int){ 2048, 4096 };
> +    d_arg2 = (vector signed long long int){ 2, 4 };
> +    vec_d_expected = (vector signed long long int){ 4096, 16384 };
> +
> +    vec_d_result = vec_mul (d_arg1, d_arg2);
> +
> +    for (i = 0; i < 2; i++)
> +      {
> +        if (vec_d_expected[i] != vec_d_result[i])
> +#ifdef DEBUG
> +           printf("ERROR vec_mul signed result[%d] = %d != "
> +		  "expected[%d] = %d\n",
> +		  i, vec_d_result[i],  i, vec_d_expected[i]);
> +#else
> +        abort();
> +#endif
> +      }
> +  }


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v5] rs6000, vector integer multiply/divide/modulo instructions
  2020-12-08  0:31                   ` [PATCH v4] " Carl Love
  2021-01-04 16:45                     ` Carl Love
  2021-01-11 21:18                     ` will schmidt
@ 2021-01-13 22:15                     ` Carl Love
  2021-01-15 18:51                       ` Segher Boessenkool
  2 siblings, 1 reply; 22+ messages in thread
From: Carl Love @ 2021-01-13 22:15 UTC (permalink / raw)
  To: will schmidt, Segher Boessenkool, Pat Haugen
  Cc: GCC Patches, David Edelsohn, cel

Will:

I have addressed the various typos you mentioned in the messages to the
maintainers.

Per your comment I have also tested the updated patch on Power 8 BE.

The patch was compiled and tested on:

   powerpc64le-unknown-linux-gnu (Power 8 BE)
   powerpc64le-unknown-linux-gnu (Power 9 LE)
   powerpc64le-unknown-linux-gnu (Power 10 LE)

I have fixed the change log entries.

I have fixed the formatting/white space issues you mentioned.

With regards to the comment:

> Presumably it is safe (no side affects) when adding V4SI and V2DI here,
> with respect to other current users of 'bits'.
> Is it worth adding the
> other modes while we are here? (V1TI, V8HI, V16QI ).

I did not add the additional modes.  I don't see any reason it would
hurt but feel it is best to only add them when they are needed.

                 Carl 

---------------------------------------------------------------

Will:

I have addressed you comments with regards to the Change Log entries.  

The extra define vec_div was removed.

Added the missing entries for DIVU_V2DI  DIVS_V2DI in rs6000-call.c.

The extra MULLD_V2DI case statement entry was removed.

Added comment in rs6000.md about size for vector types per discussion
with Pat.

                          Carl
--------------------------------------------------------

GCC maintainers:

The following patch adds new builtins for the vector integer multiply,
divide and modulo operations.  The builtins are: vec_mulh(),
vec_dive(), vec_mod() for signed and unsigned integers and long
long integers. The existing support for the vec_div() and vec_mul()
builtins emulate the vector operations with multiple scalar
instructions.  This patch adds support for these builtins using the new
vector instructions for Power 10.

The patch was compiled and tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE)
  powerpc64le-unknown-linux-gnu (Power 10 LE)

with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

                Carl Love

-------------------------------------------------------------------

2021-01-12  Carl Love  <cel@us.ibm.com>

gcc/
	* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
	defines.
	* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
	* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
	DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
	DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
	MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
	Add builtin define.
	(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
	* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
	VSX_BUILTIN_VEC_DIVE, P10_BUILTIN_VEC_MOD, P10_BUILTIN_VEC_MULH):
	New overloaded definitions.
	(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
	P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
	P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
	P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
	P10V_BUILTIN_MULHU_V4SI]: Add case
	statement for builtins.
	* config/rs6000/rs6000.md (bits): Add new attribute sizes V4SI, V2DI.
	* config/rs6000/vsx.md (VIlong): Moved from config/rs6000/altivec.md.
	(UNSPEC_VDIVES, UNSPEC_VDIVEU): New unspec definitions.
	(vsx_mul_v2di): Add if TARGET_POWER10 statement.
	(vsx_udiv_v2di): Add if TARGET_POWER10 statement.
	(dives_<mode>, diveu_<mode>, div<mode>3, uvdiv<mode>3,
	mods_<mode>, modu_<mode>, mulhs_<mode>, mulhu_<mode>, mulv2di3):
	Add define_insn, mode is VIlong.
	doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
	builtin descriptions.

gcc/testsuite/
	* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |   4 +
 gcc/config/rs6000/altivec.md                  |   2 -
 gcc/config/rs6000/rs6000-builtin.def          |  21 +
 gcc/config/rs6000/rs6000-call.c               |  53 +++
 gcc/config/rs6000/rs6000.md                   |   3 +-
 gcc/config/rs6000/vsx.md                      | 211 +++++++---
 gcc/doc/extend.texi                           | 120 ++++++
 .../powerpc/builtins-1-p10-runnable.c         | 398 ++++++++++++++++++
 8 files changed, 759 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 06f0d4d9f14..961621a0841 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,10 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a)	__builtin_vec_strir_p (a)
 #define vec_stril_p(a)	__builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
+#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
+#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index fc19a8fc807..27a269b9e72 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -193,8 +193,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 8aa31ad0a06..058a32abf4c 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2883,6 +2883,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
 BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 
+BU_P10V_AV_2 (DIVES_V4SI, "vdivesw", CONST, dives_v4si)
+BU_P10V_AV_2 (DIVES_V2DI, "vdivesd", CONST, dives_v2di)
+BU_P10V_AV_2 (DIVEU_V4SI, "vdiveuw", CONST, diveu_v4si)
+BU_P10V_AV_2 (DIVEU_V2DI, "vdiveud", CONST, diveu_v2di)
+BU_P10V_AV_2 (DIVS_V4SI, "vdivsw", CONST, divv4si3)
+BU_P10V_AV_2 (DIVS_V2DI, "vdivsd", CONST, divv2di3)
+BU_P10V_AV_2 (DIVU_V4SI, "vdivuw", CONST, udivv4si3)
+BU_P10V_AV_2 (DIVU_V2DI, "vdivud", CONST, udivv2di3)
+BU_P10V_AV_2 (MODS_V2DI, "vmodsd", CONST, mods_v2di)
+BU_P10V_AV_2 (MODS_V4SI, "vmodsw", CONST, mods_v4si)
+BU_P10V_AV_2 (MODU_V2DI, "vmodud", CONST, modu_v2di)
+BU_P10V_AV_2 (MODU_V4SI, "vmoduw", CONST, modu_v4si)
+BU_P10V_AV_2 (MULHS_V2DI, "vmulhsd", CONST, mulhs_v2di)
+BU_P10V_AV_2 (MULHS_V4SI, "vmulhsw", CONST, mulhs_v4si)
+BU_P10V_AV_2 (MULHU_V2DI, "vmulhud", CONST, mulhu_v2di)
+BU_P10V_AV_2 (MULHU_V4SI, "vmulhuw", CONST, mulhu_v4si)
+BU_P10V_AV_2 (MULLD_V2DI, "vmulld", CONST, mulv2di3)
+
 BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si)
 BU_P10V_VSX_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, xxspltiw_v4sf)
 
@@ -2958,6 +2976,9 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
 BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
 BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
 
+BU_P10_OVERLOAD_2 (MULH, "mulh")
+BU_P10_OVERLOAD_2 (DIVE, "dive")
+BU_P10_OVERLOAD_2 (MOD, "mod")
 \f
 BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
 BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 2308cc8b4a2..ae0c761f0a4 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -1069,6 +1069,40 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
@@ -1909,6 +1943,17 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
@@ -14438,6 +14483,14 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10V_BUILTIN_XXGENPCVM_V8HI:
     case P10V_BUILTIN_XXGENPCVM_V4SI:
     case P10V_BUILTIN_XXGENPCVM_V2DI:
+    case P10V_BUILTIN_DIVEU_V4SI:
+    case P10V_BUILTIN_DIVEU_V2DI:
+    case P10V_BUILTIN_DIVU_V4SI:
+    case P10V_BUILTIN_DIVU_V2DI:
+    case P10V_BUILTIN_MODU_V2DI:
+    case P10V_BUILTIN_MODU_V4SI:
+    case P10V_BUILTIN_MULHU_V2DI:
+    case P10V_BUILTIN_MULHU_V4SI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bb9fb42f82a..9113ba5dbc4 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -670,7 +670,8 @@
 
 ;; How many bits in this mode?
 (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
-					   (SF "32") (DF "64")])
+					   (SF "32") (DF "64")
+					   (V4SI "32") (V2DI "64")])
 
 ; DImode bits
 (define_mode_attr dbits [(QI "56") (HI "48") (SI "32")])
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0c1bda522a9..3e0518631df 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -267,6 +267,10 @@
 (define_mode_iterator VSX_MM [V16QI V8HI V4SI V2DI V1TI])
 (define_mode_iterator VSX_MM4 [V16QI V8HI V4SI V2DI])
 
+;; Longer vec int modes for rotate/mask ops
+;; and Vector Integer Multiply/Divide/Modulo Instructions
+(define_mode_iterator VIlong [V2DI V4SI])
+
 ;; Constants for creating unspecs
 (define_c_enum "unspec"
   [UNSPEC_VSX_CONCAT
@@ -363,6 +367,8 @@
    UNSPEC_INSERTR
    UNSPEC_REPLACE_ELT
    UNSPEC_REPLACE_UN
+   UNSPEC_VDIVES
+   UNSPEC_VDIVEU
   ])
 
 (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
@@ -1623,28 +1629,35 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op5, op3, op4));
-  else
-    {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op5, ret);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_muldi3 (op3, op3, op4));
+
+  if (TARGET_POWER10)
+    emit_insn (gen_mulv2di3 (op0, op1, op2) );
+
   else
     {
-      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
-      emit_move_insn (op3, ret);
+      rtx op3 = gen_reg_rtx (DImode);
+      rtx op4 = gen_reg_rtx (DImode);
+      rtx op5 = gen_reg_rtx (DImode);
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op5, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op5, ret);
+	}
+      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+      if (TARGET_POWERPC64)
+	emit_insn (gen_muldi3 (op3, op3, op4));
+      else
+	{
+	  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+	  emit_move_insn (op3, ret);
+	}
+      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
     }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }
   [(set_attr "type" "mul")])
@@ -1718,37 +1731,46 @@
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op3 = gen_reg_rtx (DImode);
-  rtx op4 = gen_reg_rtx (DImode);
-  rtx op5 = gen_reg_rtx (DImode);
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op5, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op5, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op5, target);
-    }
-  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
-  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  if (TARGET_POWERPC64)
-    emit_insn (gen_udivdi3 (op3, op3, op4));
-  else
-    {
-      rtx libfunc = optab_libfunc (udiv_optab, DImode);
-      rtx target = emit_library_call_value (libfunc,
-					    op3, LCT_NORMAL, DImode,
-					    op3, DImode,
-					    op4, DImode);
-      emit_move_insn (op3, target);
-    }
-  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
-  DONE;
+
+    if (TARGET_POWER10)
+      emit_insn (gen_udivv2di3 (op0, op1, op2) );
+    else
+      {
+	rtx op3 = gen_reg_rtx (DImode);
+	rtx op4 = gen_reg_rtx (DImode);
+	rtx op5 = gen_reg_rtx (DImode);
+
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op5, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op5, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op5, target);
+	  }
+	emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
+	emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
+
+	if (TARGET_POWERPC64)
+	  emit_insn (gen_udivdi3 (op3, op3, op4));
+	else
+	  {
+	    rtx libfunc = optab_libfunc (udiv_optab, DImode);
+	    rtx target = emit_library_call_value (libfunc,
+						  op3, LCT_NORMAL, DImode,
+						  op3, DImode,
+						  op4, DImode);
+	    emit_move_insn (op3, target);
+	  }
+	emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
+      }
+    DONE;
 }
   [(set_attr "type" "div")])
 
@@ -6104,3 +6126,92 @@
   "TARGET_POWER10"
   "vexpand<wd>m %0,%1"
   [(set_attr "type" "vecsimple")])
+
+(define_insn "dives_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+		        (match_operand:VIlong 2 "vsx_register_operand" "v")]
+        UNSPEC_VDIVES))]
+  "TARGET_POWER10"
+  "vdives<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "diveu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
+			(match_operand:VIlong 2 "vsx_register_operand" "v")]
+	UNSPEC_VDIVEU))]
+  "TARGET_POWER10"
+  "vdiveu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "div<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivs<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "udiv<mode>3"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vdivu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "mods_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmods<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "modu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
+		     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmodu<wd> %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "<bits>")])
+
+(define_insn "mulhs_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(mult:VIlong (ashiftrt
+		       (match_operand:VIlong 1 "vsx_register_operand" "v")
+		       (const_int 32))
+		     (ashiftrt
+		       (match_operand:VIlong 2 "vsx_register_operand" "v")
+		       (const_int 32))))]
+  "TARGET_POWER10"
+  "vmulhs<wd> %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+(define_insn "mulhu_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(us_mult:VIlong (ashiftrt
+			  (match_operand:VIlong 1 "vsx_register_operand" "v")
+			  (const_int 32))
+			(ashiftrt
+			  (match_operand:VIlong 2 "vsx_register_operand" "v")
+			  (const_int 32))))]
+  "TARGET_POWER10"
+  "vmulhu<wd> %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+;; Vector multiply low double word
+(define_insn "mulv2di3"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
+	(mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
+		   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmulld %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 2748e98462e..c5b1faff60b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21642,6 +21642,126 @@ integer value between 0 and 255 inclusive.
 @exdent vector unsigned int vec_genpcvm (vector unsigned long long int,
                                          const int)
 @end smallexample
+
+Vector Integer Multiply/Divide/Modulo
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mulh (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mulh (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer
+value in word element @code{i} of a is multiplied by the integer value in word
+element @code{i} of b. The high-order 32 bits of the 64-bit product are placed
+into word element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mulh (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mulh (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of a is multiplied by the integer value in
+doubleword element @code{i} of b. The high-order 64 bits of the 128-bit product
+are placed into doubleword element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector unsigned long long
+@exdent vec_mul (vector unsigned long long a, vector unsigned long long b)
+@exdent vector signed long long
+@exdent vec_mul (vector signed long long a, vector signed long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of a is multiplied by the integer value in
+doubleword element @code{i} of b. The low-order 64 bits of the 128-bit product
+are placed into doubleword element @code{i} of the vector returned.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_div (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_div (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is divided by the integer in word element @code{i}
+of b. The unique integer quotient is placed into the word element @code{i} of
+the vector returned. If an attempt is made to perform any of the divisions
+<anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_div (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_div (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is divided by the integer in doubleword
+element @code{i} of b. The unique integer quotient is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform any of the divisions 0x8000_0000_0000_0000 ÷ -1 or <anything> ÷ 0 then
+the quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_dive (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_dive (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is shifted left by 32 bits, then divided by the
+integer in word element @code{i} of b. The unique integer quotient is placed
+into the word element @code{i} of the vector returned. If the quotient cannot
+be represented in 32 bits, or if an attempt is made to perform any of the
+divisions <anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_dive (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_dive (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is shifted left by 64 bits, then divided by
+the integer in doubleword element @code{i} of b. The unique integer quotient is
+placed into the doubleword element @code{i} of the vector returned. If the
+quotient cannot be represented in 64 bits, or if an attempt is made to perform
+<anything> ÷ 0 then the quotient is undefined.
+
+@smallexample
+@exdent vector signed int
+@exdent vec_mod (vector signed int a, vector signed int b)
+@exdent vector unsigned int
+@exdent vec_mod (vector unsigned int a, vector unsigned int b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is divided by the integer in word element @code{i}
+of b. The unique integer remainder is placed into the word element @code{i} of
+the vector returned.  If an attempt is made to perform any of the divisions
+0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
+
+@smallexample
+@exdent vector signed long long
+@exdent vec_mod (vector signed long long a, vector signed long long b)
+@exdent vector unsigned long long
+@exdent vec_mod (vector unsigned long long a, vector unsigned long long b)
+@end smallexample
+
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is divided by the integer in doubleword
+element @code{i} of b. The unique integer remainder is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform <anything> ÷ 0 then the remainder is undefined.
+
 Generate PCV from specified Mask size, as if implemented by the
 @code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where
 immediate value is either 0, 1, 2 or 3.
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
new file mode 100644
index 00000000000..222c8b3a409
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -0,0 +1,398 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+
+/* { dg-final { scan-assembler-times {\mvdivsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmoduw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhuw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulld\M} 2 } } */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <math.h>
+#include <altivec.h>
+
+#define DEBUG 0
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+void abort (void);
+
+int main()
+  {
+    int i;
+    vector int i_arg1, i_arg2;
+    vector unsigned int u_arg1, u_arg2;
+    vector long long int d_arg1, d_arg2;
+    vector long long unsigned int ud_arg1, ud_arg2;
+   
+    vector int vec_i_expected, vec_i_result;
+    vector unsigned int vec_u_expected, vec_u_result;
+    vector long long int vec_d_expected, vec_d_result;
+    vector long long unsigned int vec_ud_expected, vec_ud_result;
+  
+    /* Signed word divide */
+    i_arg1 = (vector int){ 20, 40, 60, 80};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){10, 20, 30, 40};
+
+    vec_i_result = vec_div (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word divide */
+    u_arg1 = (vector unsigned int){ 20, 40, 60, 80};
+    u_arg2 = (vector unsigned int){ 2, 2, 2, 2};
+    vec_u_expected = (vector unsigned int){10, 20, 30, 40};
+
+    vec_u_result = vec_div (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word divide */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 2, 2};
+    vec_d_expected = (vector long long){12, 34};
+
+    vec_d_result = vec_div (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+	  printf("ERROR vec_div signed result[%d] = %d != "
+		 "expected[%d] = %d\n",
+		 i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word divide */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 2, 2};
+    vec_ud_expected = (vector unsigned long long){12, 34};
+
+    vec_ud_result = vec_div (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_div unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended signed word  result = (arg1 << 32)/arg2 */
+    i_arg1 = (vector int){ 2, 4, 6, 8};
+    i_arg2 = (vector int){ 2048, 2048, 2048, 2048};
+    vec_i_expected = (vector int){4194304, 8388608, 12582912, 16777216};
+
+    vec_i_result = vec_dive (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended unsigned word  result = (arg1 << 32)/arg2 */
+    u_arg1 = (vector unsigned int){ 2, 4, 6, 8};
+    u_arg2 = (vector unsigned int){ 2048, 2048, 2048, 2048};
+    vec_u_expected = (vector unsigned int){4194304, 8388608,
+					   12582912, 16777216};
+
+    vec_u_result = vec_dive (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double signed  esult = (arg1 << 64)/arg2 */
+    d_arg1 = (vector long long int){ 2, 4};
+    d_arg2 = (vector long long int){ 4294967296, 4294967296};
+
+    vec_d_expected = (vector long long int){8589934592, 17179869184};
+
+    vec_d_result = vec_dive (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive signed result[%d] = %lld != "
+		  "expected[%d] = %lld\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Divide Extended double unsigned result = (arg1 << 64)/arg2 */
+    ud_arg1 = (vector long long unsigned int){ 2, 4};
+    ud_arg2 = (vector long long unsigned int){ 4294967296, 4294967296};
+
+    vec_ud_expected = (vector long long unsigned int){8589934592,
+						      17179869184};
+
+    vec_ud_result = vec_dive (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_dive unsigned result[%d] = %lld != "
+		  "expected[%d] = %lld\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word modulo */
+    i_arg1 = (vector int){ 23, 45, 61, 89};
+    i_arg2 = (vector int){ 2, 2, 2, 2};
+    vec_i_expected = (vector int){1, 1, 1, 1};
+
+    vec_i_result = vec_mod (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word modulo */
+    u_arg1 = (vector unsigned int){ 25, 41, 67, 86};
+    u_arg2 = (vector unsigned int){ 3, 3, 3, 3};
+    vec_u_expected = (vector unsigned int){1, 2, 1, 2};
+
+    vec_u_result = vec_mod (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word modulo */
+    d_arg1 = (vector long long){ 24, 68};
+    d_arg2 = (vector long long){ 7, 7};
+    vec_d_expected = (vector long long){3, 5};
+
+    vec_d_result = vec_mod (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mod signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word modulo */
+    ud_arg1 = (vector unsigned long long){ 24, 68};
+    ud_arg2 = (vector unsigned long long){ 8, 8};
+    vec_ud_expected = (vector unsigned long long){0, 4};
+
+    vec_ud_result = vec_mod (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vecmod unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed word multiply high */
+    i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
+    i_arg2 = (vector int){ 2, 3, 4, 5};
+    //    vec_i_expected = (vector int){-1, -2, -2, -3};
+    vec_i_expected = (vector int){1, -2, -2, -3};
+
+    vec_i_result = vec_mulh (i_arg1, i_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_i_expected[i] != vec_i_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_i_result[i],  i, vec_i_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned word multiply high */
+    u_arg1 = (vector unsigned int){ 2147483648, 2147483648,
+				    2147483648, 2147483648 };
+    u_arg2 = (vector unsigned int){ 4, 5, 6, 7 };
+    vec_u_expected = (vector unsigned int){2, 2, 3, 3 };
+
+    vec_u_result = vec_mulh (u_arg1, u_arg2);
+
+    for (i = 0; i < 4; i++)
+      {
+        if (vec_u_expected[i] != vec_u_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_u_result[i],  i, vec_u_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply high */
+    d_arg1 = (vector long long int){  2305843009213693951,
+				      4611686018427387903 };
+    d_arg2 = (vector long long int){ 12, 20 };
+    vec_d_expected = (vector long long int){ 1, 4 };
+
+    vec_d_result = vec_mulh (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply high */
+    ud_arg1 = (vector unsigned long long int){ 2305843009213693951,
+					       4611686018427387903 };
+    ud_arg2 = (vector unsigned long long int){ 32, 10 };
+    vec_ud_expected = (vector unsigned long long int){ 3, 2 };
+
+    vec_ud_result = vec_mulh (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mulh unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Unsigned double word multiply low */
+    ud_arg1 = (vector unsigned long long int){ 2048, 4096 };
+    ud_arg2 = (vector unsigned long long int){ 2, 4 };
+    vec_ud_expected = (vector unsigned long long int){ 4096, 16384 };
+
+    vec_ud_result = vec_mul (ud_arg1, ud_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_ud_expected[i] != vec_ud_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul unsigned result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_ud_result[i],  i, vec_ud_expected[i]);
+#else
+        abort();
+#endif
+      }
+
+    /* Signed double word multiply low */
+    d_arg1 = (vector signed long long int){ 2048, 4096 };
+    d_arg2 = (vector signed long long int){ 2, 4 };
+    vec_d_expected = (vector signed long long int){ 4096, 16384 };
+
+    vec_d_result = vec_mul (d_arg1, d_arg2);
+
+    for (i = 0; i < 2; i++)
+      {
+        if (vec_d_expected[i] != vec_d_result[i])
+#ifdef DEBUG
+           printf("ERROR vec_mul signed result[%d] = %d != "
+		  "expected[%d] = %d\n",
+		  i, vec_d_result[i],  i, vec_d_expected[i]);
+#else
+        abort();
+#endif
+      }
+  }
-- 
2.27.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5] rs6000, vector integer multiply/divide/modulo instructions
  2021-01-13 22:15                     ` [PATCH v5] " Carl Love
@ 2021-01-15 18:51                       ` Segher Boessenkool
  0 siblings, 0 replies; 22+ messages in thread
From: Segher Boessenkool @ 2021-01-15 18:51 UTC (permalink / raw)
  To: Carl Love; +Cc: will schmidt, Pat Haugen, GCC Patches, David Edelsohn

Hi!

On Wed, Jan 13, 2021 at 02:15:04PM -0800, Carl Love wrote:
> The patch was compiled and tested on:
> 
>    powerpc64le-unknown-linux-gnu (Power 8 BE)

(I assume you mean powerpc64-linux instead?)

> > Presumably it is safe (no side affects) when adding V4SI and V2DI here,
> > with respect to other current users of 'bits'.
> > Is it worth adding the
> > other modes while we are here? (V1TI, V8HI, V16QI ).
> 
> I did not add the additional modes.  I don't see any reason it would
> hurt but feel it is best to only add them when they are needed.

Either works, sure.  Having all but one vector modes covered is silly
(and can give people the idea something is special with that one mode),
but you have only two so far, so :-)

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -670,7 +670,8 @@
>  
>  ;; How many bits in this mode?
>  (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
> -					   (SF "32") (DF "64")])
> +					   (SF "32") (DF "64")
> +					   (V4SI "32") (V2DI "64")])

The comment needs a clarification what this means for vector modes.
"How many bits (per element) in this mode?" perhaps?  Does that sound
good?

The patch is okay for trunk with such a tweak.  Thank you!


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-01-15 18:52 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-30 20:07 [PATCH] rs6000, vector integer multiply/divide/modulo instructions Carl Love
2020-10-30 21:05 ` David Edelsohn
2020-10-30 21:33   ` Carl Love
2020-10-31 13:28 ` David Edelsohn
2020-11-02 21:06   ` Carl Love
2020-11-04 16:44   ` Carl Love
2020-11-19  0:42     ` David Edelsohn
2020-11-19 17:25     ` Pat Haugen
2020-11-19 20:40       ` Segher Boessenkool
2020-11-19 23:26     ` Segher Boessenkool
2020-11-24 18:59       ` [PATCH v2] " Carl Love
2020-11-25  2:17         ` Pat Haugen
2020-11-25  2:34           ` Pat Haugen
2020-11-26  2:30             ` Segher Boessenkool
2020-12-01 23:48               ` Carl Love
2020-12-02 23:14                 ` will schmidt
2020-12-03 19:55                 ` will schmidt
2020-12-08  0:31                   ` [PATCH v4] " Carl Love
2021-01-04 16:45                     ` Carl Love
2021-01-11 21:18                     ` will schmidt
2021-01-13 22:15                     ` [PATCH v5] " Carl Love
2021-01-15 18:51                       ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).