public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch 0/5] rs6000, 128-bit Binary Integer Operations
@ 2020-09-21 21:17 Carl Love
  2020-09-21 23:56 ` [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
                   ` (4 more replies)
  0 siblings, 5 replies; 41+ messages in thread
From: Carl Love @ 2020-09-21 21:17 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt

Will, Segher:

The following is the updated patch set for the 128-bit Binary Integer
Operation.  I am reposting the entire set for completeness.  I have
noted in each patch the changes made since the previous version.  

The patches have been tested on Power 8 and Power 9 to ensure there are
no regression errors.  The new tests have been manually compiled and
run on mambo to ensure they work correctly.

Please review the patches and let me know if they are acceptable for
mainline.  Thanks.

                       Carl Love


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations
  2020-09-21 21:17 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
@ 2020-09-21 23:56 ` Carl Love
  2020-09-24 18:20   ` will schmidt
  2020-09-21 23:56 ` [PATCH 2/5] RS6000 add 128-bit Integer Operations Carl Love
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-09-21 23:56 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt

Segher, Will:

Patch 1, adds the 128-bit sign extension instruction support and
corresponding builtin support.

No changes from the previous version.

The patch has been tested on 

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regression errors.

Fixed the issues in the ChangeLog noted by Will.

             Carl Love

---------------------------------------------------

gcc/ChangeLog

2020-09-21  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
	for new builtins.
	* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
	overloaded builtin definitions.
	(VSIGNEXTSB2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D): Add builtin
	expansions.
	* config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
	P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
	* config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
	visible.
	* doc/extend.texi:  Add documentation for the vec_signexti and
	vec_signextll builtins.

gcc/testsuite/ChangeLog

2020-09-21  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
---
 gcc/config/rs6000/altivec.h                   |   3 +
 gcc/config/rs6000/rs6000-builtin.def          |   9 ++
 gcc/config/rs6000/rs6000-call.c               |  13 ++
 gcc/config/rs6000/vsx.md                      |   2 +-
 gcc/doc/extend.texi                           |  15 ++
 .../powerpc/p9-sign_extend-runnable.c         | 128 ++++++++++++++++++
 6 files changed, 169 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 8a2dcda0144..acc365612be 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -494,6 +494,9 @@
 
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
+#define vec_signexti  __builtin_vec_vsignexti
+#define vec_signextll __builtin_vec_vsignextll
+
 #endif
 
 /* Predicates.
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index e91a48ddf5f..4c2e9460949 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2715,6 +2715,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,	"vprtybd")
 BU_P9V_OVERLOAD_1 (VPRTYBQ,	"vprtybq")
 BU_P9V_OVERLOAD_1 (VPRTYBW,	"vprtybw")
 BU_P9V_OVERLOAD_1 (VPARITY_LSBB,	"vparity_lsbb")
+BU_P9V_OVERLOAD_1 (VSIGNEXTI,	"vsignexti")
+BU_P9V_OVERLOAD_1 (VSIGNEXTLL,	"vsignextll")
 
 /* 2 argument functions added in ISA 3.0 (power9).  */
 BU_P9_2 (CMPRB,	"byte_in_range",	CONST,	cmprb)
@@ -2726,6 +2728,13 @@ BU_P9_OVERLOAD_2 (CMPRB,	"byte_in_range")
 BU_P9_OVERLOAD_2 (CMPRB2,	"byte_in_either_range")
 BU_P9_OVERLOAD_2 (CMPEQB,	"byte_in_set")
 \f
+/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1.  */
+BU_P9V_AV_1 (VSIGNEXTSB2W,	"vsignextsb2w",		CONST,  vsx_sign_extend_qi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSH2W,	"vsignextsh2w",		CONST,  vsx_sign_extend_hi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSB2D,	"vsignextsb2d",		CONST,  vsx_sign_extend_qi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
+
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
 BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index a8b520834c7..9e514a01012 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5527,6 +5527,19 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI },
 
+  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 */
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+
   /* Overloaded built-in functions for ISA3.1 (power10). */
   { P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 4ff52455fd3..31fcffe8f33 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4787,7 +4787,7 @@
   "vextsh2<wd> %0,%1"
   [(set_attr "type" "vecexts")])
 
-(define_insn "*vsx_sign_extend_si_v2di"
+(define_insn "vsx_sign_extend_si_v2di"
   [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
 	(unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
 		     UNSPEC_VSX_SIGN_EXTEND))]
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5571c4f2ff2..c1c2c9f9bf7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20800,6 +20800,21 @@ void vec_xst (vector unsigned char, int, vector unsigned char *);
 void vec_xst (vector unsigned char, int, unsigned char *);
 @end smallexample
 
+uThe following sign extension builtins are provided.
+
+@smallexample
+vector signed int vec_signexti (vector signed char a)
+vector signed long long vec_signextll (vector signed char a)
+vector signed int vec_signexti (vector signed short a)
+vector signed long long vec_signextll (vector signed short a)
+vector signed long long vec_signextll (vector signed int a)
+@end smallexample
+
+Each element of the result is produced by sign-extending the element of the
+input vector that would fall in the least significant portion of the result
+element. For example, a sign-extension of a vector signed char to a vector
+signed long long will sign extend the rightmost byte of each doubleword.
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 3.1
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1
 
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
new file mode 100644
index 00000000000..7bf979c6fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
@@ -0,0 +1,128 @@
+/* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
+
+/* These builtins were not defined until ISA 3.1 but only require ISA 3.0
+   support.  */
+
+/* { dg-final { scan-assembler-times {\mvextsb2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsw2d\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector signed char vec_arg_qi, vec_result_qi;
+  vector signed short int vec_arg_hi, vec_result_hi, vec_expected_hi;
+  vector signed int vec_arg_wi, vec_result_wi, vec_expected_wi;
+  vector signed long long vec_result_di, vec_expected_di;
+
+  /* test sign extend byte to word */
+  vec_arg_qi = (vector signed char) {1, 2, 3, 4, 5, 6, 7, 8,
+				     -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_wi = (vector signed int) {1, 5, -1, -5};
+
+  vec_result_wi = vec_signexti (vec_arg_qi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(char, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n",
+	     i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend byte to double */
+  vec_arg_qi = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
+				    -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_qi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(byte, long long int):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to word */
+  vec_arg_hi = (vector signed short int){1, 2, 3, 4, -1, -2, -3, -4};
+  vec_expected_wi = (vector signed int){1, 3, -1, -3};
+
+  vec_result_wi = vec_signexti(vec_arg_hi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(short, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n", i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to double word */
+  vec_arg_hi = (vector signed short int ){1, 3, 5, 7,  -1, -3, -5, -7};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_hi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(short, double):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend word to double word */
+  vec_arg_wi = (vector signed int ){1, 3, -1, -3};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_wi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(word, double):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  return 0;
+}
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/5] RS6000 add 128-bit Integer Operations
  2020-09-21 21:17 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
  2020-09-21 23:56 ` [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
@ 2020-09-21 23:56 ` Carl Love
  2020-09-24 18:21   ` will schmidt
  2020-09-21 23:56 ` [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support Carl Love
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-09-21 23:56 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt


Will, Segher:

Add support for divide, modulo, shift, compare of 128-bit
integers instructions and builtin support.

The following are the changes from the previous version of the patch.

The TARGET_TI_VECTOR_OPS was removed per comments for patch 3.  Just
using TARGET_POWER10.

Removed extra comment.

Note the change

-#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
+#define
vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))

is a bug fix. Added missing comment to ChangeLog.

Removed vector_eqv1ti, used eqvv1ti3 instead.

Test case, put ppc_native_128bit in the dg-require command.

The patch has been tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regression errors.

The P10 test was run by hand on Mambo.


                    Carl Love


-------------------------------------------------------

gcc/ChangeLog

	2020-09-21  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
	for new builtins.
	(vec_rlnm): Fix bug in argument generation.
	* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
	UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
	(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
	altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
	altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
	altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
	altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
	define_insn.
	(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
	vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
	altivec_vrlqnm): New define_expands.
	* config/rs6000/rs6000-builtin.def (BU_P10_P, BU_P10_128BIT_1,
	BU_P10_128BIT_2, BU_P10_128BIT_3): New macro definitions.
	(VCMPEQUT_P, VCMPGTST_P, VCMPGTUT_P): Add macro expansions.
	(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
	CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
	VCMPAET_P): New macro expansions.
	(VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ, VSLQ,
	VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
	MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
	(VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
	P10_BUILTIN_CMPGE_1TI, P10_BUILTIN_CMPGE_U1TI,
	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPGTST,
	P10_BUILTIN_CMPLE_1TI, P10_BUILTIN_VCMPLE_U1TI,
	P10_BUILTIN_128BIT_DIV_V1TI, P10_BUILTIN_128BIT_UDIV_V1TI,
	P10_BUILTIN_128BIT_VMULESD, P10_BUILTIN_128BIT_VMULEUD,
	P10_BUILTIN_128BIT_VMULOSD, P10_BUILTIN_128BIT_VMULOUD,
	P10_BUILTIN_VNOR_V1TI, P10_BUILTIN_VNOR_V1TI_UNS,
	P10_BUILTIN_128BIT_VRLQ, P10_BUILTIN_128BIT_VRLQMI,
	P10_BUILTIN_128BIT_VRLQNM, P10_BUILTIN_128BIT_VSLQ,
	P10_BUILTIN_128BIT_VSRQ, P10_BUILTIN_128BIT_VSRAQ,
	P10_BUILTIN_VCMPGTUT_P, P10_BUILTIN_VCMPGTST_P,
	P10_BUILTIN_VCMPEQUT_P, P10_BUILTIN_VCMPGTUT_P,
	P10_BUILTIN_VCMPGTST_P, P10_BUILTIN_CMPNET,
	P10_BUILTIN_VCMPNET_P, P10_BUILTIN_VCMPAET_P,
	P10_BUILTIN_128BIT_VSIGNEXTSD2Q, P10_BUILTIN_128BIT_DIVES_V1TI,
	P10_BUILTIN_128BIT_MODS_V1TI, P10_BUILTIN_128BIT_MODU_V1TI):
	New overloaded definitions.
	(int_ftype_int_v1ti_v1ti) [P10_BUILTIN_VCMPEQUT,
	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
	P10_BUILTIN_CMPLE_U1TI, E_V1TImode]: New case statements.
	(int_ftype_int_v1ti_v1ti) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
	New assignments.
	(altivec_init_builtins): New E_V1TImode case statement.
	(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
	P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
	P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.
	* config/rs6000/r6000.c (rs6000_option_override_internal): Add if
	TARGET_POWER10 statement.
	(rs6000_handle_altivec_attribute)[ E_TImode, E_V1TImode]: New case
	statements.
	(rs6000_opt_masks): Add ti-vector-ops entry.
	* config/rs6000/r6000.h (RS6000_BTM_P10_128BIT, bool_V1TI_type_node):
	New defines.
	(rs6000_builtin_type_index): New enum value RS6000_BTI_bool_V1TI.
	* config/rs6000/rs6000.opt: New mti-vector-ops entry.
	* config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti,
	vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti,
	vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
	vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
	vlshrv1ti3, vashrv1ti3): New define_expands.
	* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
	UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
	UNSPEC_VSX_MODUQ, UNSPEC_XXSWAPD_V1TI): New unspecs.
	(vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti, vsx_diveu_v1ti,
	vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti, vsx_sign_extend_v2di_v1ti):
	New define_insns.
	(vcmpnet): New define_expand.
	* gcc/doc/extend.texi: Add documentation for the new builtins vec_rl,
	vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
	vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,
	vec_cmpge, vec_cmple, vec_all_eq, vec_all_ne, vec_all_gt, vec_all_lt,
	vec_all_ge, vec_all_le, vec_any_eq, vec_any_ne, vec_any_gt, vec_any_lt,
	vec_any_ge, vec_any_le.

gcc/testsuite/ChangeLog

2020-09-21  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |    6 +-
 gcc/config/rs6000/altivec.md                  |  240 ++
 gcc/config/rs6000/rs6000-builtin.def          |   84 +
 gcc/config/rs6000/rs6000-call.c               |  138 +-
 gcc/config/rs6000/rs6000.c                    |    1 +
 gcc/config/rs6000/rs6000.h                    |    5 +-
 gcc/config/rs6000/vector.md                   |  191 ++
 gcc/config/rs6000/vsx.md                      |   97 +
 gcc/doc/extend.texi                           |  174 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 2254 +++++++++++++++++
 10 files changed, 3187 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index acc365612be..fc67073f79c 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -183,7 +183,7 @@
 #define vec_recipdiv __builtin_vec_recipdiv
 #define vec_rlmi __builtin_vec_rlmi
 #define vec_vrlnm __builtin_vec_rlnm
-#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
+#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
 #define vec_rsqrt __builtin_vec_rsqrt
 #define vec_rsqrte __builtin_vec_rsqrte
 #define vec_signed __builtin_vec_vsigned
@@ -690,6 +690,10 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
 
 #ifdef _ARCH_PWR10
+#define vec_signextq  __builtin_vec_vsignextq
+#define vec_dive __builtin_vec_dive
+#define vec_mod  __builtin_vec_mod
+
 /* May modify these macro definitions if future capabilities overload
    with support for different vector argument and result types.  */
 #define vec_cntlzm(a, b)	__builtin_altivec_vclzdm (a, b)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 0a2e634d6b0..34a4731342a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -39,12 +39,16 @@
    UNSPEC_VMULESH
    UNSPEC_VMULEUW
    UNSPEC_VMULESW
+   UNSPEC_VMULEUD
+   UNSPEC_VMULESD
    UNSPEC_VMULOUB
    UNSPEC_VMULOSB
    UNSPEC_VMULOUH
    UNSPEC_VMULOSH
    UNSPEC_VMULOUW
    UNSPEC_VMULOSW
+   UNSPEC_VMULOUD
+   UNSPEC_VMULOSD
    UNSPEC_VPKPX
    UNSPEC_VPACK_SIGN_SIGN_SAT
    UNSPEC_VPACK_SIGN_UNS_SAT
@@ -628,6 +632,14 @@
   "vcmpequ<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_eqv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpequq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gt<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gt:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -636,6 +648,14 @@
   "vcmpgts<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gt:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpgtsq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gtu<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gtu:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -644,6 +664,14 @@
   "vcmpgtu<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		  (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpgtuq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_eqv4sf"
   [(set (match_operand:V4SF 0 "altivec_register_operand" "=v")
 	(eq:V4SF (match_operand:V4SF 1 "altivec_register_operand" "v")
@@ -1687,6 +1715,19 @@
  DONE;
 })
 
+(define_expand "vec_widen_umult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
 (define_expand "vec_widen_smult_even_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1700,6 +1741,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+ else
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_umult_odd_v16qi"
   [(use (match_operand:V8HI 0 "register_operand"))
    (use (match_operand:V16QI 1 "register_operand"))
@@ -1765,6 +1819,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_umult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_smult_odd_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1778,6 +1845,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "altivec_vmuleub"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
@@ -1859,6 +1939,15 @@
   "vmuleuw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuleud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULEUD))]
+  "TARGET_POWER10"
+  "vmuleud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulouw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1868,6 +1957,15 @@
   "vmulouw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuloud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOUD))]
+  "TARGET_POWER10"
+  "vmuloud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulesw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1877,6 +1975,15 @@
   "vmulesw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulesd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULESD))]
+  "TARGET_POWER10"
+  "vmulesd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulosw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1886,6 +1993,15 @@
   "vmulosw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulosd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOSD))]
+  "TARGET_POWER10"
+  "vmulosd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 ;; Vector pack/unpack
 (define_insn "altivec_vpkpx"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
@@ -1979,6 +2095,15 @@
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vrlq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+;; rotate amount in needs to be in bits[57:63] of operand2.
+  "vrlq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
@@ -1989,6 +2114,33 @@
   "vrl<VI_char>mi %0,%2,%3"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqmi"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")
+		      (match_operand:V1TI 3 "vsx_register_operand")]
+		     UNSPEC_VRLMI))]
+  "TARGET_POWER10"
+{
+  /* Mask bit begin, end fields need to be in bits [41:55] of 128-bit operand2. */
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[3]));
+  emit_insn(gen_altivec_vrlqmi_inst (operands[0], operands[1], operands[2], tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqmi_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "0")
+		      (match_operand:V1TI 3 "vsx_register_operand" "v")]
+		     UNSPEC_VRLMI))]
+  "TARGET_POWER10"
+  "vrlqmi %0,%1,%3"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vrl<VI_char>nm"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
@@ -1998,6 +2150,31 @@
   "vrl<VI_char>nm %0,%1,%2"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqnm"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")]
+		     UNSPEC_VRLNM))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_altivec_vrlqnm_inst (operands[0], operands[1], tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqnm_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+		     UNSPEC_VRLNM))]
+  "TARGET_POWER10"
+  ;; rotate and mask bits need to be in upper 64-bits of operand2.
+  "vrlqnm %0,%1,%2"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vsl"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2042,6 +2219,15 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vslq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vslq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsr<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (lshiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2050,6 +2236,15 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsrq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vsrq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsra<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (ashiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2058,6 +2253,15 @@
   "vsra<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsraq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vsraq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vsr"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2618,6 +2822,18 @@
   "vcmpequ<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_vcmpequt_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v")
+			   (match_operand:V1TI 2 "altivec_register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpequq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgts<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2630,6 +2846,18 @@
   "vcmpgts<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtst_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gt:CC (match_operand:V1TI 1 "register_operand" "v")
+			   (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gt:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpgtsq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgtu<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gtu:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2642,6 +2870,18 @@
   "vcmpgtu<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtut_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gtu:CC (match_operand:V1TI 1 "register_operand" "v")
+			    (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gtu:V1TI (match_dup 1)
+		  (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpgtuq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpeqfp_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(eq:CC (match_operand:V4SF 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 4c2e9460949..980bed55230 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1061,6 +1061,14 @@
 		     | RS6000_BTC_QUATERNARY),				\
 		    CODE_FOR_ ## ICODE)			/* ICODE */
 
+#define BU_P10_P(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_P (P10_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P10_128BIT,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_PREDICATE),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
 #define BU_P10_OVERLOAD_1(ENUM, NAME)					\
   RS6000_BUILTIN_1 (P10_BUILTIN_VEC_ ## ENUM,		/* ENUM */	\
 		    "__builtin_vec_" NAME,		/* NAME */	\
@@ -1179,6 +1187,38 @@
 		     | RS6000_BTC_TERNARY),				\
 		    CODE_FOR_ ## ICODE)			/* ICODE */
 
+#define BU_P10_P(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_P (P10_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P10_128BIT,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_PREDICATE),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_P10_128BIT_1(ENUM, NAME, ATTR, ICODE)			\
+  RS6000_BUILTIN_1 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P10_128BIT,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_UNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_P10_128BIT_2(ENUM, NAME, ATTR, ICODE)			\
+  RS6000_BUILTIN_2 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P10_128BIT,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_BINARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_P10_128BIT_3(ENUM, NAME, ATTR, ICODE)			\
+  RS6000_BUILTIN_3 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P10_128BIT,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_TERNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
 \f
 /* Insure 0 is not a legitimate index.  */
 BU_SPECIAL_X (RS6000_BUILTIN_NONE, NULL, 0, RS6000_BTC_MISC)
@@ -2736,6 +2776,10 @@ BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
 BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
 
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
+BU_P10_P (VCMPEQUT_P,		"vcmpequt_p",	CONST,	vector_eq_v1ti_p)
+BU_P10_P (VCMPGTST_P,		"vcmpgtst_p",	CONST,	vector_gt_v1ti_p)
+BU_P10_P (VCMPGTUT_P,		"vcmpgtut_p",	CONST,	vector_gtu_v1ti_p)
+
 BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
 BU_P10_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
@@ -2756,6 +2800,38 @@ BU_P10V_VSX_2 (XXGENPCVM_V16QI, "xxgenpcvm_v16qi", CONST, xxgenpcvm_v16qi)
 BU_P10V_VSX_2 (XXGENPCVM_V8HI, "xxgenpcvm_v8hi", CONST, xxgenpcvm_v8hi)
 BU_P10V_VSX_2 (XXGENPCVM_V4SI, "xxgenpcvm_v4si", CONST, xxgenpcvm_v4si)
 BU_P10V_VSX_2 (XXGENPCVM_V2DI, "xxgenpcvm_v2di", CONST, xxgenpcvm_v2di)
+BU_P10V_AV_2 (VCMPGTUT,		"vcmpgtut",	CONST,	vector_gtuv1ti)
+BU_P10V_AV_2 (VCMPGTST,		"vcmpgtst",	CONST,	vector_gtv1ti)
+BU_P10V_AV_2 (VCMPEQUT,		"vcmpequt",	CONST,	eqvv1ti3)
+BU_P10V_AV_2 (CMPNET,		"vcmpnet",	CONST,	vcmpnet)
+BU_P10V_AV_2 (CMPGE_1TI,	"cmpge_1ti",    CONST,  vector_nltv1ti)
+BU_P10V_AV_2 (CMPGE_U1TI,	"cmpge_u1ti",   CONST,  vector_nltuv1ti)
+BU_P10V_AV_2 (CMPLE_1TI,	"cmple_1ti",    CONST,  vector_ngtv1ti)
+BU_P10V_AV_2 (CMPLE_U1TI,	"cmple_u1ti",   CONST,  vector_ngtuv1ti)
+BU_P10V_AV_2 (VNOR_V1TI_UNS,	"vnor_v1ti_uns",CONST,	norv1ti3)
+BU_P10V_AV_2 (VNOR_V1TI,	"vnor_v1ti",	CONST,	norv1ti3)
+BU_P10V_AV_2 (VCMPNET_P,	"vcmpnet_p",	CONST,	vector_ne_v1ti_p)
+BU_P10V_AV_2 (VCMPAET_P,	"vcmpaet_p",	CONST,	vector_ae_v1ti_p)
+
+BU_P10_128BIT_1 (VSIGNEXTSD2Q,	"vsignext",     CONST,  vsx_sign_extend_v2di_v1ti)
+
+BU_P10_128BIT_2 (VMULEUD,	"vmuleud",	CONST,	vec_widen_umult_even_v2di)
+BU_P10_128BIT_2 (VMULESD,	"vmulesd",	CONST,	vec_widen_smult_even_v2di)
+BU_P10_128BIT_2 (VMULOUD,	"vmuloud",	CONST,	vec_widen_umult_odd_v2di)
+BU_P10_128BIT_2 (VMULOSD,	"vmulosd",	CONST,	vec_widen_smult_odd_v2di)
+BU_P10_128BIT_2 (VRLQ,		"vrlq",		CONST,	vrotlv1ti3)
+BU_P10_128BIT_2 (VSLQ,		"vslq",		CONST,	vashlv1ti3)
+BU_P10_128BIT_2 (VSRQ,		"vsrq",		CONST,	vlshrv1ti3)
+BU_P10_128BIT_2 (VSRAQ,		"vsraq",	CONST,	vashrv1ti3)
+BU_P10_128BIT_2 (VRLQNM,	"vrlqnm",	CONST,	altivec_vrlqnm)
+BU_P10_128BIT_2 (DIV_V1TI,	"div_1ti",      CONST,  vsx_div_v1ti)
+BU_P10_128BIT_2 (UDIV_V1TI,	"udiv_1ti",     CONST,  vsx_udiv_v1ti)
+BU_P10_128BIT_2 (DIVES_V1TI,	"dives",	CONST,	vsx_dives_v1ti)
+BU_P10_128BIT_2 (DIVEU_V1TI,	"diveu",	CONST,	vsx_diveu_v1ti)
+BU_P10_128BIT_2 (MODS_V1TI,	"mods",		CONST,	vsx_mods_v1ti)
+BU_P10_128BIT_2 (MODU_V1TI,	"modu",		CONST,	vsx_modu_v1ti)
+
+BU_P10_128BIT_3 (VRLQMI,	"vrlqmi",	CONST,	altivec_vrlqmi)
 
 BU_P10V_AV_3 (VEXTRACTBL, "vextdubvlx", CONST, vextractlv16qi)
 BU_P10V_AV_3 (VEXTRACTHL, "vextduhvlx", CONST, vextractlv8hi)
@@ -2863,6 +2939,12 @@ BU_P10_OVERLOAD_2 (CLRR, "clrr")
 BU_P10_OVERLOAD_2 (GNB, "gnb")
 BU_P10_OVERLOAD_4 (XXEVAL, "xxeval")
 BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
+BU_P10_OVERLOAD_2 (VRLQ, "vrlq")
+BU_P10_OVERLOAD_2 (VSLQ, "vslq")
+BU_P10_OVERLOAD_2 (VSRQ, "vsrq")
+BU_P10_OVERLOAD_2 (VSRAQ, "vsraq")
+BU_P10_OVERLOAD_2 (DIVE, "dive")
+BU_P10_OVERLOAD_2 (MOD, "mod")
 
 BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
 BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
@@ -2882,6 +2964,8 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
 BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
 BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
 
+BU_P10_OVERLOAD_1 (SIGNEXT, "vsignextq")
+
 \f
 BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
 BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 9e514a01012..e1d9c2e8729 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -839,6 +839,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, VSX_BUILTIN_XVCMPEQDP,
@@ -885,6 +889,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -899,8 +909,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTSD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTST,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, VSX_BUILTIN_XVCMPGTDP,
@@ -943,6 +957,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_CMPLE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -995,6 +1014,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10_BUILTIN_128BIT_DIV_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10_BUILTIN_128BIT_UDIV_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
@@ -1789,6 +1814,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULE, P8V_BUILTIN_VMULEUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10_BUILTIN_128BIT_VMULESD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10_BUILTIN_128BIT_VMULEUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULEUB, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULESB, ALTIVEC_BUILTIN_VMULESB,
@@ -1812,6 +1842,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULO, P8V_BUILTIN_VMULOUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10_BUILTIN_128BIT_VMULOSD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10_BUILTIN_128BIT_VMULOUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULO, ALTIVEC_BUILTIN_VMULOSH,
     RS6000_BTI_V4SI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULOSH, ALTIVEC_BUILTIN_VMULOSH,
@@ -1854,6 +1889,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
@@ -2115,6 +2160,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_RL, P8V_BUILTIN_VRLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10_BUILTIN_128BIT_VRLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10_BUILTIN_128BIT_VRLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
@@ -2133,12 +2183,23 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_RLMI, P9V_BUILTIN_VRLDMI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { P9V_BUILTIN_VEC_RLMI, P10_BUILTIN_128BIT_VRLQMI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI },
+  { P9V_BUILTIN_VEC_RLMI, P10_BUILTIN_128BIT_VRLQMI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLWNM,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLDNM,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10_BUILTIN_128BIT_VRLQNM,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10_BUILTIN_128BIT_VRLQNM,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
@@ -2155,6 +2216,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, P8V_BUILTIN_VSLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10_BUILTIN_128BIT_VSLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10_BUILTIN_128BIT_VSLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTSP,
@@ -2351,6 +2417,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SR, P8V_BUILTIN_VSRD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10_BUILTIN_128BIT_VSRQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10_BUILTIN_128BIT_VSRQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
@@ -2379,6 +2450,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SRA, P8V_BUILTIN_VSRAD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10_BUILTIN_128BIT_VSRAQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10_BUILTIN_128BIT_VSRAQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
@@ -3996,12 +4072,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, ALTIVEC_BUILTIN_VCMPGTFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, VSX_BUILTIN_XVCMPGTDP_P,
@@ -4066,6 +4146,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, ALTIVEC_BUILTIN_VCMPEQFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, VSX_BUILTIN_XVCMPEQDP_P,
@@ -4117,12 +4201,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, ALTIVEC_BUILTIN_VCMPGEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
@@ -4771,6 +4859,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEW,
     RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
 
   /* The following 2 entries have been deprecated.  */
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEB_P,
@@ -4871,6 +4965,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
 
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
@@ -4976,7 +5072,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
-
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
@@ -5903,6 +6000,21 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
  { P10_BUILTIN_VEC_XVTLSBB_ONES, P10V_BUILTIN_XVTLSBB_ONES,
     RS6000_BTI_INTSI, RS6000_BTI_unsigned_V16QI, 0, 0 },
 
+ { P10_BUILTIN_VEC_SIGNEXT, P10_BUILTIN_128BIT_VSIGNEXTSD2Q,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, 0, 0 },
+
+ { P10_BUILTIN_VEC_DIVE, P10_BUILTIN_128BIT_DIVES_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10_BUILTIN_128BIT_DIVEU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
+  { P10_BUILTIN_VEC_MOD, P10_BUILTIN_128BIT_MODS_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10_BUILTIN_128BIT_MODU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
   { RS6000_BUILTIN_NONE, RS6000_BUILTIN_NONE, 0, 0, 0, 0 }
 };
 \f
@@ -12256,12 +12368,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPEQUH:
     case ALTIVEC_BUILTIN_VCMPEQUW:
     case P8V_BUILTIN_VCMPEQUD:
+    case P10V_BUILTIN_VCMPEQUT:
       fold_compare_helper (gsi, EQ_EXPR, stmt);
       return true;
 
     case P9V_BUILTIN_CMPNEB:
     case P9V_BUILTIN_CMPNEH:
     case P9V_BUILTIN_CMPNEW:
+    case P10V_BUILTIN_CMPNET:
       fold_compare_helper (gsi, NE_EXPR, stmt);
       return true;
 
@@ -12273,6 +12387,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_2DI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10V_BUILTIN_CMPGE_1TI:
+    case P10V_BUILTIN_CMPGE_U1TI:
       fold_compare_helper (gsi, GE_EXPR, stmt);
       return true;
 
@@ -12284,6 +12400,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
     case P8V_BUILTIN_VCMPGTSD:
+    case P10V_BUILTIN_VCMPGTUT:
+    case P10V_BUILTIN_VCMPGTST:
       fold_compare_helper (gsi, GT_EXPR, stmt);
       return true;
 
@@ -12295,6 +12413,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPLE_U4SI:
     case VSX_BUILTIN_CMPLE_2DI:
     case VSX_BUILTIN_CMPLE_U2DI:
+    case P10V_BUILTIN_CMPLE_1TI:
+    case P10V_BUILTIN_CMPLE_U1TI:
       fold_compare_helper (gsi, LE_EXPR, stmt);
       return true;
 
@@ -13000,6 +13120,8 @@ rs6000_init_builtins (void)
 					    ? "__vector __bool long"
 					    : "__vector __bool long long",
 					    bool_long_long_type_node, 2);
+  bool_V1TI_type_node = rs6000_vector_type ("__vector __bool __int128",
+					    intTI_type_node, 1);
   pixel_V8HI_type_node = rs6000_vector_type ("__vector __pixel",
 					     pixel_type_node, 8);
 
@@ -13185,6 +13307,10 @@ altivec_init_builtins (void)
     = build_function_type_list (integer_type_node,
 				integer_type_node, V2DI_type_node,
 				V2DI_type_node, NULL_TREE);
+  tree int_ftype_int_v1ti_v1ti
+    = build_function_type_list (integer_type_node,
+				integer_type_node, V1TI_type_node,
+				V1TI_type_node, NULL_TREE);
   tree void_ftype_v4si
     = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
   tree v8hi_ftype_void
@@ -13537,6 +13663,9 @@ altivec_init_builtins (void)
 	case E_VOIDmode:
 	  type = int_ftype_int_opaque_opaque;
 	  break;
+	case E_V1TImode:
+	  type = int_ftype_int_v1ti_v1ti;
+	  break;
 	case E_V2DImode:
 	  type = int_ftype_int_v2di_v2di;
 	  break;
@@ -14136,6 +14265,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10V_BUILTIN_XXGENPCVM_V8HI:
     case P10V_BUILTIN_XXGENPCVM_V4SI:
     case P10V_BUILTIN_XXGENPCVM_V2DI:
+    case P10_BUILTIN_128BIT_VMULEUD:
+    case P10_BUILTIN_128BIT_VMULOUD:
+    case P10_BUILTIN_128BIT_DIVEU_V1TI:
+    case P10_BUILTIN_128BIT_MODU_V1TI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
@@ -14235,10 +14368,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case VSX_BUILTIN_CMPGE_U8HI:
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10V_BUILTIN_CMPGE_U1TI:
     case ALTIVEC_BUILTIN_VCMPGTUB:
     case ALTIVEC_BUILTIN_VCMPGTUH:
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
+    case P10V_BUILTIN_VCMPGTUT:
+    case P10V_BUILTIN_VCMPEQUT:
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
       break;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6f204ca202a..cc629f2d938 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -19546,6 +19546,7 @@ rs6000_handle_altivec_attribute (tree *node,
     case 'b':
       switch (mode)
 	{
+	case E_TImode: case E_V1TImode: result = bool_V1TI_type_node; break;
 	case E_DImode: case E_V2DImode: result = bool_V2DI_type_node; break;
 	case E_SImode: case E_V4SImode: result = bool_V4SI_type_node; break;
 	case E_HImode: case E_V8HImode: result = bool_V8HI_type_node; break;
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index bbd8060e143..0e5a9617b36 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2305,6 +2305,7 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_P8_VECTOR	MASK_P8_VECTOR	/* ISA 2.07 vector.  */
 #define RS6000_BTM_P9_VECTOR	MASK_P9_VECTOR	/* ISA 3.0 vector.  */
 #define RS6000_BTM_P9_MISC	MASK_P9_MISC	/* ISA 3.0 misc. non-vector */
+#define RS6000_BTM_P10_128BIT   MASK_POWER10    /* ISA 3.1 128-bit vector.  */
 #define RS6000_BTM_CRYPTO	MASK_CRYPTO	/* crypto funcs.  */
 #define RS6000_BTM_HTM		MASK_HTM	/* hardware TM funcs.  */
 #define RS6000_BTM_FRE		MASK_POPCNTB	/* FRE instruction.  */
@@ -2322,7 +2323,7 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_FLOAT128_HW	MASK_FLOAT128_HW /* IEEE 128-bit float h/w.  */
 #define RS6000_BTM_MMA		MASK_MMA	/* ISA 3.1 MMA.  */
 #define RS6000_BTM_P10		MASK_POWER10
-
+#define RS6000_BTM_TI_VECTOR_OPS MASK_TI_VECTOR_OPS /* 128-bit integer support */
 
 #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
 				 | RS6000_BTM_VSX			\
@@ -2436,6 +2437,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_bool_V8HI,          /* __vector __bool short */
   RS6000_BTI_bool_V4SI,          /* __vector __bool int */
   RS6000_BTI_bool_V2DI,          /* __vector __bool long */
+  RS6000_BTI_bool_V1TI,          /* __vector __bool 128-bit */
   RS6000_BTI_pixel_V8HI,         /* __vector __pixel */
   RS6000_BTI_long,	         /* long_integer_type_node */
   RS6000_BTI_unsigned_long,      /* long_unsigned_type_node */
@@ -2489,6 +2491,7 @@ enum rs6000_builtin_type_index
 #define bool_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V8HI])
 #define bool_V4SI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V4SI])
 #define bool_V2DI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V2DI])
+#define bool_V1TI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V1TI])
 #define pixel_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_pixel_V8HI])
 
 #define long_long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long_long])
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 796345c80d3..0cca4232619 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -685,6 +685,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtv1ti"
+  [(set (match_operand:V1TI 0 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))]
+  "TARGET_POWER10"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nlt<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -697,6 +704,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		 (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_gtu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -704,6 +722,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		  (match_operand:V1TI 2 "altivec_register_operand")))]
+  "TARGET_POWER10"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nltu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -716,6 +741,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		  (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+	(not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_geu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(geu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -735,6 +771,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_ngtu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vlogical_operand")
@@ -746,6 +793,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		  (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 ; There are 14 possible vector FP comparison operators, gt and eq of them have
 ; been expanded above, so just support 12 remaining operators here.
 
@@ -894,6 +952,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_eq_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 ;; This expansion handles the V16QI, V8HI, and V4SI modes in the
 ;; implementation of the vec_all_ne built-in functions on Power9.
 (define_expand "vector_ne_<mode>_p"
@@ -976,6 +1046,23 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+(define_expand "vector_ne_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V2DI mode in the implementation of the
 ;; vec_any_eq built-in function on Power9.
 ;;
@@ -1002,6 +1089,26 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+(define_expand "vector_ae_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))
+   (set (match_dup 0)
+	(xor:SI (match_dup 0)
+		(const_int 1)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V4SF and V2DF modes in the Power9
 ;; implementation of the vec_all_ne built-in functions.  Note that the
 ;; expansions for this pattern with these modes makes no use of power9-
@@ -1061,6 +1168,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gt_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gt:CC (match_operand:V1TI 1 "vlogical_operand")
+			     (match_operand:V1TI 2 "vlogical_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (gt:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 (define_expand "vector_ge_<mode>_p"
   [(parallel
     [(set (reg:CC CR6_REGNO)
@@ -1085,6 +1204,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtu_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gtu:CC (match_operand:V1TI 1 "altivec_register_operand")
+			      (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "altivec_register_operand")
+	  (gtu:V1TI (match_dup 1)
+		    (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 ;; AltiVec/VSX predicates.
 
 ;; This expansion is triggered during expansion of predicate built-in
@@ -1460,6 +1591,20 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vrotlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_altivec_vrlq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for rotatert to make use of vrotl
 (define_expand "vrotr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1481,6 +1626,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vashlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for logical shift right on each vector element
 (define_expand "vlshr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1489,6 +1649,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vlshrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_altivec_vsrq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for arithmetic shift right on each vector element
 (define_expand "vashr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1496,6 +1671,22 @@
 			(match_operand:VEC_I 2 "vint_operand")))]
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
+
+;; No immediate version of this 128-bit instruction
+(define_expand "vashrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_altivec_vsraq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 \f
 ;; Vector reduction expanders for VSX
 ; The (VEC_reduc:...
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 31fcffe8f33..5b6a0bd728a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -298,6 +298,12 @@
    UNSPEC_VSX_XXSPLTD
    UNSPEC_VSX_DIVSD
    UNSPEC_VSX_DIVUD
+   UNSPEC_VSX_DIVSQ
+   UNSPEC_VSX_DIVUQ
+   UNSPEC_VSX_DIVESQ
+   UNSPEC_VSX_DIVEUQ
+   UNSPEC_VSX_MODSQ
+   UNSPEC_VSX_MODUQ
    UNSPEC_VSX_MULSD
    UNSPEC_VSX_SIGN_EXTEND
    UNSPEC_VSX_XVCVBF16SPN
@@ -1732,6 +1738,60 @@
 }
   [(set_attr "type" "div")])
 
+(define_insn "vsx_div_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVSQ))]
+  "TARGET_POWER10"
+  "vdivsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_udiv_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVUQ))]
+  "TARGET_POWER10"
+  "vdivuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_dives_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVESQ))]
+  "TARGET_POWER10"
+  "vdivesq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_diveu_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVEUQ))]
+  "TARGET_POWER10"
+  "vdiveuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_mods_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODSQ))]
+  "TARGET_POWER10"
+  "vmodsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_modu_v1ti"
+ [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODUQ))]
+  "TARGET_POWER10"
+  "vmoduq %0,%1,%2"
+  [(set_attr "type" "div")])
+
 ;; *tdiv* instruction returning the FG flag
 (define_expand "vsx_tdiv<mode>3_fg"
   [(set (match_dup 3)
@@ -3083,6 +3143,21 @@
   "xxpermdi %x0,%x1,%x1,2"
   [(set_attr "type" "vecperm")])
 
+;; Swap upper/lower 64-bit values in a 128-bit vector
+(define_insn "xxswapd_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(subreg:V1TI
+           (vec_select:V2DI
+             (subreg:V2DI
+                (match_operand:V1TI 1 "vsx_register_operand" "v") 0 )
+	     (parallel [(const_int 1)(const_int 0)]))
+           0))]
+  "TARGET_POWER10"
+;; AIX does not support extended mnemonic xxswapd.  Use the basic
+;; mnemonic xxpermdi instead.
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "xxgenpcvm_<mode>_internal"
   [(set (match_operand:VSX_EXTRACT_I4 0 "altivec_register_operand" "=wa")
 	(unspec:VSX_EXTRACT_I4
@@ -4767,6 +4842,15 @@
    (set_attr "type" "vecload")])
 
 \f
+;; ISA 3.1 vector extend sign support
+(define_insn "vsx_sign_extend_v2di_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
+		     UNSPEC_VSX_SIGN_EXTEND))]
+  "TARGET_POWER10"
+  "vextsd2q %0,%1"
+  [(set_attr "type" "vecexts")])
+
 ;; ISA 3.0 vector extend sign support
 
 (define_insn "vsx_sign_extend_qi_<mode>"
@@ -5451,6 +5535,19 @@
   "vcmpneb %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Vector Compare Not Equal v1ti (specified/not+eq:)
+(define_expand "vcmpnet"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(not:V1TI
+	  (eq:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		   (match_operand:V1TI 2 "altivec_register_operand"))))]
+   "TARGET_POWER10"
+{
+  emit_insn (gen_eqvv1ti3 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_one_cmplv1ti2 (operands[0], operands[0]));
+  DONE;
+})
+
 ;; Vector Compare Not Equal or Zero Byte
 (define_insn "vcmpnezb"
   [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index c1c2c9f9bf7..0a42f1f63e6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21316,6 +21316,180 @@ Generate PCV from specified Mask size, as if implemented by the
 immediate value is either 0, 1, 2 or 3.
 @findex vec_genpcvm
 
+@smallexample
+@exdent vector unsigned __int128 vec_rl (vector unsigned __int128,
+                                         vector unsigned __int128);
+@exdent vector signed __int128 vec_rl (vector signed __int128,
+                                       vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input left by the number of bits
+specified in the most significant quad word of the second input truncated to
+7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlmi (vector signed __int128,
+                                         vector signed __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and inserting it under mask into the
+second input. The first bit in the mask, the last bit in the mask are obtained from the
+two 7-bit fields bits [108:115] and bits [117:123] respectively of the second input.
+The shift is obtained from the third input in the 7-bit field [125:131] where all bits
+counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlnm (vector signed __int128,
+                                         vector unsigned __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and ANDing it with a mask. The
+first bit in the mask and the last bit in the mask are obtained from the two
+7-bit fields bits [117:123] and bits [125:131] respectively of the second
+input.  The shift is obtained from the third input in the 7-bit field bits
+[125:131] where all bits counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_sl(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sl(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of shifting the first input left by the number of bits
+specified in the most significant bits of the second input truncated to
+7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_sr(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sr(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of performing a logical right shift of the first argument
+by the number of bits specified in the most significant double word of the
+second input truncated to 7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_sra(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sra(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of performing arithmetic right shift of the first argument
+by the number of bits specified in the most significant bits of the
+second input truncated to 7 bits (bits [125:131]).
+
+
+@smallexample
+@exdent vector unsigned __int128 vec_mule (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mule (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the even doubleword
+elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mulo (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mulo (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the odd doubleword
+elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_div (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_div (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+Returns the result of dividing the first operand by the second operand. An attempt to
+divide any value by zero or to divide the most negative signed 128-bit integer by
+negative one results in an undefined value.
+
+@smallexample
+@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_dive (vector signed __int128,
+                                         vector signed __int128);
+@end smallexample
+
+The result is produced by shifting the first input left by 128 bits and dividing by the
+second. If an attempt is made to divide by zero or the result is larger than 128 bits,
+the result is undefined.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_mod (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+The result is the modulo result of dividing the first input  by the second input.
+
+
+The following builtins perform 128-bit vector comparisons.  The @code{vec_all_xx},
+@code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is one of the operations
+@code{eq, ne, gt, lt, ge, le} perform pairwise comparisons between the elements
+at the same positions within their two vector arguments.   The @code{vec_all_xx}
+function returns a non-zero value if and only if all pairwise comparisons are true.  The
+@code{vec_any_xx} function returns a non-zero value if and only if at least one pairwise
+comparison is true.  The @code{vec_cmpxx}function returns a vector of the same type as its
+two arguments, within which each element consists of all ones to denote that specified
+logical comparison of the corresponding elements was true.  Otherwise, the element of the
+returned vector contains all zeros.
+
+@smallexample
+vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
+
+int vec_all_eq (vector signed __int128, vector signed __int128);
+int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ne (vector signed __int128, vector signed __int128);
+int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_all_gt (vector signed __int128, vector signed __int128);
+int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_lt (vector signed __int128, vector signed __int128);
+int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ge (vector signed __int128, vector signed __int128);
+int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_all_le (vector signed __int128, vector signed __int128);
+int vec_all_le (vector unsigned __int128, vector unsigned __int128);
+
+int vec_any_eq (vector signed __int128, vector signed __int128);
+int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ne (vector signed __int128, vector signed __int128);
+int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_any_gt (vector signed __int128, vector signed __int128);
+int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_lt (vector signed __int128, vector signed __int128);
+int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ge (vector signed __int128, vector signed __int128);
+int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_any_le (vector signed __int128, vector signed __int128);
+int vec_any_le (vector unsigned __int128, vector unsigned __int128);
+@end smallexample
+
+
 @node PowerPC Hardware Transactional Memory Built-in Functions
 @subsection PowerPC Hardware Transactional Memory Built-in Functions
 GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
new file mode 100644
index 00000000000..85ad544e22b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -0,0 +1,2254 @@
+/* { dg-do run } */
+/* { dg-options "-mcpu=power10 -O2 -save-temps" } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-require-effective-target ppc_native_128bit } */
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mvextsd2q\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsrq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsraq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvcmpuq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpsq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvmuleud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivsq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivuq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveuq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmoduq\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+
+
+void print_i128(__int128_t val)
+{
+  printf(" %lld %llu (0x%llx %llx)",
+	 (signed long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF),
+	 (unsigned long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
+}
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i, result_int;
+
+  __int128_t arg1, result;
+  __uint128_t uarg2;
+
+  vector signed long long int vec_arg1_di, vec_arg2_di;
+  vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
+  vector unsigned long long int vec_uresult_di;
+  vector unsigned long long int vec_uexpected_result_di;
+  
+  __int128_t expected_result;
+  __uint128_t uexpected_result;
+
+  vector __int128_t vec_arg1, vec_arg2, vec_result;
+  vector __uint128_t vec_uarg1, vec_uarg2, vec_uarg3, vec_uresult;
+  vector bool __int128  vec_result_bool;
+
+  /* sign extend double to 128-bit integer  */
+  vec_arg1_di[0] = 1000;
+  vec_arg1_di[1] = -123456;
+
+  expected_result = 1000;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1_di[0] = -123456;
+  vec_arg1_di[1] = 1000;
+
+  expected_result = -123456;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  /* test shift 128-bit integers.
+     Note, shift amount is given by the lower 7-bits of the shift amount. */
+  vec_arg1[0] = 3;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]*4;
+
+  vec_result = vec_sl (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 3;
+  uarg2 = 4;
+  expected_result = arg1*16;
+
+  result = arg1 << uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 << uint128):  ");
+    print_i128(arg1);
+    printf(" << %lld", uarg2 & 0xFF);
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 3;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]*4;
+  
+  vec_uresult = vec_sl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]/4;
+
+  vec_result = vec_sr (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 48;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]/4;
+  
+  vec_uresult = vec_sr (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 48;
+  uarg2 = 4;
+  expected_result = arg1/16;
+
+  result = arg1 >> uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 >> uint128:  ");
+    print_i128(arg1);
+    printf(" >> %lld", uarg2 & 0xFF);
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x0000000012345678ULL;
+  expected_result = (expected_result << 64) | 0x90ABCDEFAABBCCDDULL;
+
+  vec_result = vec_sra (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0xFFFFFFFFFFFFAABBLL;
+  uexpected_result = (uexpected_result << 64) | 0xCCDDEEFF11221234ULL;
+
+  vec_uresult = vec_sra (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x90ABCDEFAABBCCDDULL;
+  expected_result = (expected_result << 64) | 0xEEFF112212345678ULL;
+
+  vec_result = vec_rl (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0x11221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEEFFULL;
+
+  vec_uresult = vec_rl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = (32 << 8) | 95;
+  vec_uarg3[0] = 32;
+  expected_result = 0xaabbccddULL;
+  expected_result = (expected_result << 64) | 0xeeff112200000000ULL;
+
+  vec_result = vec_rlnm (vec_arg1, vec_uarg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(int128, uint128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = (8 << 8) | 119;
+  vec_uarg3[0] = 48;
+
+  uexpected_result = 0x00221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEE00ULL;
+
+  vec_uresult = vec_rlnm (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(uint128, uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_arg2[0] = 0x000000000000DEADULL;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 0x0000BEEF00000000ULL;
+  vec_uarg3[0] = 96 << 16 | 127 << 8 | 32;
+  expected_result = 0x000000000000DEADULL;
+  expected_result = (expected_result << 64) | 0x0000BEEF12345678ULL;
+
+  vec_result = vec_rlmi (vec_arg1, vec_arg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(int128, int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", vec_uarg2_di[1] & 0xFF);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 0xDEAD000000000000ULL;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 0x000000000000BEEFULL;
+  vec_uarg3[0] = 16 << 16 | 111 << 8 | 48;
+  uexpected_result = 0xDEAD1234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDBEEFULL;
+
+  vec_uresult = vec_rlmi (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(uint128, unit128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", vec_uarg3[1] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* 128-bit compare tests, result is all 1's if true */
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1[0] = 2468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  uexpected_result = 0xFFFFFFFFFFFFFFFFULL;
+  uexpected_result = (uexpected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: unsigned vec_cmpgt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed vec_cmpgt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:not equal signed vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  not equal vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpeq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  not equal vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:not equal signed vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+   
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector multiply Even and Odd tests */
+  vec_arg1_di[0] = 200;
+  vec_arg1_di[1] = 400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[0] * vec_arg2_di[0];
+
+  vec_result = vec_mule (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (signed, signed) failed.\n");
+    printf(" vec_arg1_di[0] = %lld\n", vec_arg1_di[0]);
+    printf(" vec_arg2_di[0] = %lld\n", vec_arg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1_di[0] = -200;
+  vec_arg1_di[1] = -400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[1] * vec_arg2_di[1];
+
+  vec_result = vec_mulo (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (signed, signed) failed.\n");
+    printf(" vec_arg1_di[1] = %lld\n", vec_arg1_di[1]);
+    printf(" vec_arg2_di[1] = %lld\n", vec_arg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[0] * vec_uarg2_di[0];
+
+  vec_uresult = vec_mule (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[1] = %lld\n", vec_uarg1_di[1]);
+    printf(" vec_uarg2_di[1] = %lld\n", vec_uarg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[1] * vec_uarg2_di[1];
+
+  vec_uresult = vec_mulo (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[0] = %lld\n", vec_uarg1_di[0]);
+    printf(" vec_uarg2_di[0] = %lld\n", vec_uarg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Divide Quadword */
+  vec_arg1[0] = -12345678;
+  vec_arg2[0] = 2;
+  expected_result = -6172839;
+
+  vec_result = vec_div (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24680;
+  vec_uarg2[0] = 4;
+  uexpected_result = 6170;
+
+  vec_uresult = vec_div (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Divide Extended Quadword */
+  vec_arg1[0] = -20;        // has 128-bit of zero concatenated onto it
+  vec_arg2[0] = 0x2000000000000000;
+  vec_arg2[0] = vec_arg2[0] << 64;
+  expected_result = -160;
+
+  vec_result = vec_dive (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 20;        // has 128-bit of zero concatenated onto it
+  vec_uarg2[0] = 0x4000000000000000;
+  vec_uarg2[0] = vec_uarg2[0] << 64;
+  uexpected_result = 80;
+
+  vec_uresult = vec_dive (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector modulo quad word  */
+  vec_arg1[0] = -12345675;
+  vec_arg2[0] = 2;
+  expected_result = -1;
+
+  vec_result = vec_mod (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24685;
+  vec_uarg2[0] = 4;
+  uexpected_result = 1;
+
+  vec_uresult = vec_mod (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  return 0;
+}
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support
  2020-09-21 21:17 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
  2020-09-21 23:56 ` [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
  2020-09-21 23:56 ` [PATCH 2/5] RS6000 add 128-bit Integer Operations Carl Love
@ 2020-09-21 23:56 ` Carl Love
  2020-09-24 18:21   ` will schmidt
  2020-09-21 23:56 ` [PATCH 4/5] Test 128-bit shifts for just the int128 type Carl Love
  2020-09-21 23:57 ` [PATCH 5/5] Conversions between 128-bit integer and floating point values Carl Love
  4 siblings, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-09-21 23:56 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt

Segher, Will:

Add support for converting to/from 128-bit integers and 128-bit 
decimal floating point formats.

The updates from the previous version of the patch:

Removed stray ";; carll" comment.  

Removed #if 1 and #endif in the test case.
 
Replaced TARGET_TI_VECTOR_OPS with POWER10.

The patch has been tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regression errors.

The P10 test was run by hand on Mambo.


             Carl Love


-------------------------------------------------------------------

gcc/ChangeLog

2020-09-21  Carl Love  <cel@us.ibm.com>
	* config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.

gcc/testsuite/ChangeLog

2020-09-21  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c:  Update test.
---
 gcc/config/rs6000/dfp.md                      | 14 +++++
 gcc/config/rs6000/rs6000-call.c               |  4 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 62 +++++++++++++++++++
 3 files changed, 80 insertions(+)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 8f822732bac..0e82e315fee 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -222,6 +222,13 @@
   "dcffixq %0,%1"
   [(set_attr "type" "dfp")])
 
+(define_insn "floattitd2"
+  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
+	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
+  "TARGET_POWER10"
+  "dcffixqq %0,%1"
+  [(set_attr "type" "dfp")])
+
 ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
 ;; This is the first stage of converting it to an integer type.
 
@@ -241,6 +248,13 @@
   "TARGET_DFP"
   "dctfix<q> %0,%1"
   [(set_attr "type" "dfp")])
+
+(define_insn "fixtdti2"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
+	(fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
+  "TARGET_POWER10"
+  "dctfixqq %0,%1"
+  [(set_attr "type" "dfp")])
 
 ;; Decimal builtin support
 
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index e1d9c2e8729..9c50cd3c5a7 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -4967,6 +4967,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, 0 },
   { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
     RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
 
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
@@ -5074,6 +5076,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
     RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 85ad544e22b..ec3dcf3dff1 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -38,6 +38,7 @@
 #if DEBUG
 #include <stdio.h>
 #include <stdlib.h>
+#include <math.h>
 
 
 void print_i128(__int128_t val)
@@ -59,6 +60,13 @@ int main ()
   __int128_t arg1, result;
   __uint128_t uarg2;
 
+  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
+
+  struct conv_t {
+    __uint128_t u128;
+    _Decimal128 d128;
+  } conv, conv2;
+
   vector signed long long int vec_arg1_di, vec_arg2_di;
   vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
   vector unsigned long long int vec_uresult_di;
@@ -2249,6 +2257,60 @@ int main ()
     abort();
 #endif
   }
+  
+  /* DFP to __int128 and __int128 to DFP conversions */
+  /* Can't get printing of DFP values to work.  Print the DFP value as an
+     unsigned int so we can see the bit patterns.  */
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
+  expected_result_dfp128 = conv.d128;
 
+  arg1 = 4;
+
+  conv.d128 = (_Decimal128) arg1;
+
+  result_dfp128 = (_Decimal128) arg1;
+  if (((conv.u128 >>64) != 0x2208000000000000ULL) &&
+      ((conv.u128 & 0xFFFFFFFFFFFFFFFF) != 0x4ULL)) {
+#if DEBUG
+    printf("ERROR:  convert int128 value ");
+    print_i128 (arg1);
+    conv.d128 = result_dfp128;
+    printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)((conv.u128) >>64),
+	   (unsigned long long)((conv.u128) & 0xFFFFFFFFFFFFFFFF));
+
+    conv.d128 = expected_result_dfp128;
+    printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
+	   (unsigned long long) (conv.u128>>64),
+	   (unsigned long long) (conv.u128 & 0xFFFFFFFFFFFFFFFF));
+#else
+    abort();
+#endif
+  }
+
+  expected_result = 4;
+
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
+  arg1_dfp128 = conv.d128;
+
+  result = (__int128_t) arg1_dfp128;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR:  convert DFP value ");
+    printf("0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)(conv.u128>>64),
+	   (unsigned long long)(conv.u128 & 0xFFFFFFFFFFFFFFFF));
+    printf("to __int128 value = ");
+    print_i128 (result);
+    printf("\ndoes not match expected_result = ");
+    print_i128 (expected_result);
+    printf("\n");
+#else
+    abort();
+#endif
+  }
   return 0;
 }
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 4/5] Test 128-bit shifts for just the int128 type.
  2020-09-21 21:17 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
                   ` (2 preceding siblings ...)
  2020-09-21 23:56 ` [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support Carl Love
@ 2020-09-21 23:56 ` Carl Love
  2020-09-24 18:21   ` will schmidt
  2020-09-21 23:57 ` [PATCH 5/5] Conversions between 128-bit integer and floating point values Carl Love
  4 siblings, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-09-21 23:56 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt


Segher, Will:

Patch 4 adds the vector 128-bit integer shift instruction support for
the V1TI type.

The following changes were made from the previous version.

Renamed VSX_TI to VEC_TI, put def in vector.md.  Didn't get it
separated into a different patch.

Reworked the XXSWAPD_V1TI to not use UNSPEC.

Test suite program cleanups, removed "//" comments that were not
needed.

The patch has been tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regression errors.

The P10 test was run by hand on Mambo.


                    Carl Love

------------------------------------------------------


gcc/ChangeLog

2020-09-21  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
	Rename altivec_vslq_<mode>, altivec_vsrq_<mode>, mode VEC_TI.
	* config/rs6000/vector.md (VEC_TI): New mode iterator.
	(vashlv1ti3): Change to vashl<mode>3, mode VEC_TI.
	(vlshrv1ti3): Change to vlshr<mode>3, mode VEC_TI.
	* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator.
	(VSX_TI): Renamed VEC_TI.

gcc/testsuite/ChangeLog

2020-09-21  Carl Love  <cel@us.ibm.com>
	gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
	tests.
---
 gcc/config/rs6000/altivec.md                  | 16 ++++-----
 gcc/config/rs6000/vector.md                   | 27 ++++++++-------
 gcc/config/rs6000/vsx.md                      | 33 +++++++++----------
 .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++++++--
 4 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 34a4731342a..5db3de3cc9f 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2219,10 +2219,10 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vslq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vslq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+		     (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vslq %0,%1,%2"
@@ -2236,10 +2236,10 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vsrq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vsrq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+			   (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vsrq %0,%1,%2"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 0cca4232619..3ea3a91845a 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,6 +26,9 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
 
+;; 128-bit int modes
+(define_mode_iterator VEC_TI [V1TI TI])
+
 ;; Vector int modes for parity
 (define_mode_iterator VEC_IP [V8HI
 			      V4SI
@@ -1627,17 +1630,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vashlv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vashl<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+			 (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
   emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vslq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
@@ -1650,17 +1653,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vlshrv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vlshr<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+			   (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
   emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn(gen_altivec_vsrq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vsrq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 5b6a0bd728a..87f96ffcc4c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -37,9 +37,6 @@
 				  TI
 				  V1TI])
 
-;; Iterator for 128-bit integer types that go in a single vector register.
-(define_mode_iterator VSX_TI [TI V1TI])
-
 ;; Iterator for the 2 32-bit vector types
 (define_mode_iterator VSX_W [V4SF V4SI])
 
@@ -944,9 +941,9 @@
 ;; special V1TI container class, which it is not appropriate to use vec_select
 ;; for the type.
 (define_insn "*vsx_le_permute_<mode>"
-  [(set (match_operand:VSX_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
-	(rotate:VSX_TI
-	 (match_operand:VSX_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
+  [(set (match_operand:VEC_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
+	(rotate:VEC_TI
+	 (match_operand:VEC_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
 	 (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "@
@@ -960,10 +957,10 @@
    (set_attr "type" "vecperm,vecload,vecstore,*,load,store")])
 
 (define_insn_and_split "*vsx_le_undo_permute_<mode>"
-  [(set (match_operand:VSX_TI 0 "vsx_register_operand" "=wa,wa")
-	(rotate:VSX_TI
-	 (rotate:VSX_TI
-	  (match_operand:VSX_TI 1 "vsx_register_operand" "0,wa")
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=wa,wa")
+	(rotate:VEC_TI
+	 (rotate:VEC_TI
+	  (match_operand:VEC_TI 1 "vsx_register_operand" "0,wa")
 	  (const_int 64))
 	 (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
@@ -1031,11 +1028,11 @@
 ;; Peepholes to catch loads and stores for TImode if TImode landed in
 ;; GPR registers on a little endian system.
 (define_peephole2
-  [(set (match_operand:VSX_TI 0 "int_reg_operand")
-	(rotate:VSX_TI (match_operand:VSX_TI 1 "memory_operand")
+  [(set (match_operand:VEC_TI 0 "int_reg_operand")
+	(rotate:VEC_TI (match_operand:VEC_TI 1 "memory_operand")
 		       (const_int 64)))
-   (set (match_operand:VSX_TI 2 "int_reg_operand")
-	(rotate:VSX_TI (match_dup 0)
+   (set (match_operand:VEC_TI 2 "int_reg_operand")
+	(rotate:VEC_TI (match_dup 0)
 		       (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
    && (rtx_equal_p (operands[0], operands[2])
@@ -1043,11 +1040,11 @@
    [(set (match_dup 2) (match_dup 1))])
 
 (define_peephole2
-  [(set (match_operand:VSX_TI 0 "int_reg_operand")
-	(rotate:VSX_TI (match_operand:VSX_TI 1 "int_reg_operand")
+  [(set (match_operand:VEC_TI 0 "int_reg_operand")
+	(rotate:VEC_TI (match_operand:VEC_TI 1 "int_reg_operand")
 		       (const_int 64)))
-   (set (match_operand:VSX_TI 2 "memory_operand")
-	(rotate:VSX_TI (match_dup 0)
+   (set (match_operand:VEC_TI 2 "memory_operand")
+	(rotate:VEC_TI (match_dup 0)
 		       (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
    && peep2_reg_dead_p (2, operands[0])"
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index ec3dcf3dff1..25e2c9d1af4 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -53,6 +53,18 @@ void print_i128(__int128_t val)
 
 void abort (void);
 
+__attribute__((noinline))
+__int128_t shift_right (__int128_t a, __uint128_t b)
+{
+  return a >> b;
+}
+
+__attribute__((noinline))
+__int128_t shift_left (__int128_t a, __uint128_t b)
+{
+  return a << b;
+}
+
 int main ()
 {
   int i, result_int;
@@ -141,7 +153,7 @@ int main ()
 #endif
   }
 
-  arg1 = 3;
+  arg1 = vec_result[0];
   uarg2 = 4;
   expected_result = arg1*16;
 
@@ -225,7 +237,7 @@ int main ()
 #endif
   }
 
-  arg1 = 48;
+  arg1 = vec_uresult[0];
   uarg2 = 4;
   expected_result = arg1/16;
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 5/5] Conversions between 128-bit integer and floating point values.
  2020-09-21 21:17 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
                   ` (3 preceding siblings ...)
  2020-09-21 23:56 ` [PATCH 4/5] Test 128-bit shifts for just the int128 type Carl Love
@ 2020-09-21 23:57 ` Carl Love
  2020-09-24 18:22   ` will schmidt
  4 siblings, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-09-21 23:57 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt


Segher, Will:

Patch 5 adds the 128-bit integer to/from 128-floating point
conversions.  This patch has to invoke the routines to use the 128-bit
hardware instructions if on Power 10 or use software routines if
running on a pre Power 10 system via the resolve function.

Add ifunc resolves for __fixkfti, __floatuntikf_sw, __fixkfti_swn,
__fixunskfti_sw.

The following changes were made to the previous version of the patch: 

Fixed typos in ChangeLog noted by Will.

Turned off debug in test case.

Removed extra blank lines, fixed spacing of #else in the test case.

Added comment to fixunskfti-sw.c about changes made from the original
file fixunskfti.c.

The patch has been tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regression errors.

The P10 tests were run by hand on Mambo.

                    Carl Love
---------------------------------------------------------

gcc/ChangeLog

2020-09-21  Carl Love  <cel@us.ibm.com>
	config/rs6000/rs6000.md (floatti<mode>2, floatunsti<mode>2,
	fix_trunc<mode>ti2, fixuns_trunc<mode>ti2): Add
	define_insn for mode IEEE 128.
	libgcc/config/rs6000/fixkfi-sw.c: New file.
	libgcc/config/rs6000/fixkfi.c: Remove file.
	libgcc/config/rs6000/fixunskfti-sw.c: New file.
	libgcc/config/rs6000/fixunskfti.c: Remove file.
	libgcc/config/rs6000/float128-hw.c (__floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw):
	New functions.
	libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1):
	New macro.
	(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
	__fixunskfti_resolve): Add resolve functions.
	(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
	functions.
	libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
	__fixtfti, __fixunstfti): Add editor commands to change
	names.
	libgcc/config/rs6000/float128-sed-hw (__floattitf,
	__floatuntitf, __fixtfti, __fixunstfti): Add editor commands
	to change names.
	libgcc/config/rs6000/floattikf-sw.c: New file.
	libgcc/config/rs6000/floattikf.c: Remove file.
	libgcc/config/rs6000/floatuntikf-sw.c: New file.
	libgcc/config/rs6000/floatuntikf.c: Remove file.
	libgcc/config/rs6000/floatuntikf-sw.c: New file.
	libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,
	__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
	__floatuntikf, __fixkfti, __fixunskfti): New extern declarations.
	libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
	fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
	(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
	file names to fp128_ppc_funcs.

gcc/testsuite/ChangeLog

2020-09-21  Carl Love  <cel@us.ibm.com>
	gcc.target/powerpc/fl128_conversions.c: New file.
---
 gcc/config/rs6000/rs6000.md                   |  36 +++
 .../gcc.target/powerpc/fp128_conversions.c    | 286 ++++++++++++++++++
 .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
 .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   7 +-
 libgcc/config/rs6000/float128-hw.c            |  24 ++
 libgcc/config/rs6000/float128-ifunc.c         |  44 ++-
 libgcc/config/rs6000/float128-sed             |   4 +
 libgcc/config/rs6000/float128-sed-hw          |   4 +
 .../rs6000/{floattikf.c => floattikf-sw.c}    |   4 +-
 .../{floatuntikf.c => floatuntikf-sw.c}       |   4 +-
 libgcc/config/rs6000/quad-float128.h          |  17 +-
 libgcc/config/rs6000/t-float128               |   3 +-
 12 files changed, 417 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
 rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
 rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (90%)
 rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
 rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 694ff70635e..5db5d0b4505 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6390,6 +6390,42 @@
    xscvsxddp %x0,%x1"
   [(set_attr "type" "fp")])
 
+(define_insn "floatti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvsqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "floatunsti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvuqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fix_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpsqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fixuns_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpuqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
 ; Allow the combiner to merge source memory operands to the conversion so that
 ; the optimizer/register allocator doesn't try to load the value too early in a
 ; GPR and then use store/load to move it to a FPR and suffer from a store-load
diff --git a/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
new file mode 100644
index 00000000000..dff24e1e2b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
@@ -0,0 +1,286 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mxscvsqqp\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mxscvuqqp\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mxscvqpsqz\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mxscvqpuqz\M} 1 { target { ppc_native_128bit } } } } */
+
+#include <stdio.h>
+#include <math.h>
+#include <fenv.h>
+#include <stdlib.h>
+#include <wchar.h>
+
+#define DEBUG 0
+
+void abort (void);
+
+float conv_i_2_fp( long long int a)
+{
+  return (float) a;
+}
+
+double conv_i_2_fpd( long long int a)
+{
+  return (double) a;
+}
+
+double conv_ui_2_fpd( unsigned long long int a)
+{
+  return (double) a;
+}
+
+__float128 conv_i128_2_fp128 (__int128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__float128 conv_ui128_2_fp128 (__uint128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__int128_t conv_fp128_2_i128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__int128_t) a;
+}
+
+__uint128_t conv_fp128_2_ui128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__uint128_t) a;
+}
+
+long double conv_i128_2_ld (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (long double) a;
+}
+
+__ibm128 conv_i128_2_ibm128 (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble, message uses IBM long double, no binary output
+  return (__ibm128) a;
+}
+
+int main()
+{
+	float a, expected_result_float;
+	double b, expected_result_double;
+	long long int c, expected_result_llint;
+	unsigned long long int u;
+	__int128_t d;
+	__uint128_t u128;
+	unsigned long long expected_result_uint128[2] ;
+	__float128 e;
+	long double ld;     // another 128-bit float version
+
+	union conv_t {
+		float a;
+		double b;
+		long long int c;
+		long long int128[2] ;
+		unsigned long long uint128[2] ;
+		unsigned long long int u;
+		__int128_t d;
+		__uint128_t u128;
+		__float128 e;
+		long double ld;     // another 128-bit float version
+	} conv, conv_result;
+
+	c = 20;
+	expected_result_llint = 20.00000;
+	a = conv_i_2_fp (c);
+
+	if (a != expected_result_llint) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fp(%lld) = %10.5f\n", c, a);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_llint);
+#else
+		abort();
+#endif
+	}
+
+	c = 20;
+	expected_result_double = 20.00000;
+	b = conv_i_2_fpd (c);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fpd(%lld) = %10.5f\n", d, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+	u = 20;
+	expected_result_double = 20.00000;
+	b = conv_ui_2_fpd (u);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_ui_2_fpd(%llu) = %10.5f\n", u, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+  /* Currently printing 128-bit float does not work correctly  */
+  d = -3210;
+  d = (d * 10000000000) + 9876543210;
+  conv_result.e = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0xc02bd2f9068d1160;
+  expected_result_uint128[0] = 0x0;
+  
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(-32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  d = 123;
+  d = (d * 10000000000) + 1234567890;
+  conv_result.ld = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x4271eab4c8ed2000;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(1231234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  /* Currently printing 128-bit float does not work correctly  */
+  u128 = 8760;
+  u128 = (u128 * 10000000000) + 1234567890;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+  expected_result_uint128[1] = 0x402d3eb101df8b48;
+  expected_result_uint128[0] = 0x0;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(87601234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  /* Currently printing 128-bit float does not work correctly  */
+  u128 = 3210;
+  u128 = (u128 * 10000000000) + 9876543210;
+  expected_result_uint128[1] = 0x402bd3429c8feea0;
+  expected_result_uint128[0] = 0x0;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 12345.6789;
+  expected_result_uint128[1] = 0x1407374883526960;
+  expected_result_uint128[0] = 0x3039;
+
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) =  ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = -6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0xffffffffffffe57b;
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x1a85;
+  conv_result.d = conv_fp128_2_ui128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_ui128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+	  
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  return 0;
+}
diff --git a/libgcc/config/rs6000/fixkfti.c b/libgcc/config/rs6000/fixkfti-sw.c
similarity index 96%
rename from libgcc/config/rs6000/fixkfti.c
rename to libgcc/config/rs6000/fixkfti-sw.c
index a22286228aa..d6bbbf889b7 100644
--- a/libgcc/config/rs6000/fixkfti.c
+++ b/libgcc/config/rs6000/fixkfti-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TItype
-__fixkfti (TFtype a)
+__fixkfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/fixunskfti.c b/libgcc/config/rs6000/fixunskfti-sw.c
similarity index 90%
rename from libgcc/config/rs6000/fixunskfti.c
rename to libgcc/config/rs6000/fixunskfti-sw.c
index ab232d92d24..5c8b94b8862 100644
--- a/libgcc/config/rs6000/fixunskfti.c
+++ b/libgcc/config/rs6000/fixunskfti-sw.c
@@ -5,7 +5,10 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
+
+   The fixunskfti.c file renamed fixunskfti-sw.c.  Updated the
+   source file names and made whitespace fixes
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +38,7 @@
 #include "quad-float128.h"
 
 UTItype
-__fixunskfti (TFtype a)
+__fixunskfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/float128-hw.c b/libgcc/config/rs6000/float128-hw.c
index 8705b53e22a..be8bd07e853 100644
--- a/libgcc/config/rs6000/float128-hw.c
+++ b/libgcc/config/rs6000/float128-hw.c
@@ -86,6 +86,30 @@ __floatdikf_hw (DItype_ppc a)
   return (TFtype) a;
 }
 
+TFtype
+__floattikf_hw (TItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TFtype
+__floatuntikf_hw (UTItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TItype_ppc
+__fixkfti_hw (TFtype a)
+{
+  return (TItype_ppc) a;
+}
+
+UTItype_ppc
+__fixunskfti_hw (TFtype a)
+{
+  return (UTItype_ppc) a;
+}
+
 TFtype
 __floatundikf_hw (UDItype_ppc a)
 {
diff --git a/libgcc/config/rs6000/float128-ifunc.c b/libgcc/config/rs6000/float128-ifunc.c
index c2f65912a74..c221be2c864 100644
--- a/libgcc/config/rs6000/float128-ifunc.c
+++ b/libgcc/config/rs6000/float128-ifunc.c
@@ -46,14 +46,9 @@
 #endif
 
 #define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW : SW)
+#define SW_OR_HW_ISA3_1(SW, HW) (__builtin_cpu_supports ("arch_3_1") ? HW : SW)
 
 /* Resolvers.  */
-
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 static __typeof__ (__addkf3_sw) *
 __addkf3_resolve (void)
 {
@@ -102,6 +97,18 @@ __floatdikf_resolve (void)
   return SW_OR_HW (__floatdikf_sw, __floatdikf_hw);
 }
 
+static __typeof__ (__floattikf_sw) *
+__floattikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floattikf_sw, __floattikf_hw);
+}
+
+static __typeof__ (__floatuntikf_sw) *
+__floatuntikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floatuntikf_sw, __floatuntikf_hw);
+}
+
 static __typeof__ (__floatunsikf_sw) *
 __floatunsikf_resolve (void)
 {
@@ -114,6 +121,19 @@ __floatundikf_resolve (void)
   return SW_OR_HW (__floatundikf_sw, __floatundikf_hw);
 }
 
+
+static __typeof__ (__fixkfti_sw) *
+__fixkfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixkfti_sw, __fixkfti_hw);
+}
+
+static __typeof__ (__fixunskfti_sw) *
+__fixunskfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixunskfti_sw, __fixunskfti_hw);
+}
+
 static __typeof__ (__fixkfsi_sw) *
 __fixkfsi_resolve (void)
 {
@@ -303,6 +323,18 @@ TFtype __floatsikf (SItype_ppc)
 TFtype __floatdikf (DItype_ppc)
   __attribute__ ((__ifunc__ ("__floatdikf_resolve")));
 
+TFtype __floattikf (TItype_ppc)
+  __attribute__ ((__ifunc__ ("__floattikf_resolve")));
+
+TFtype __floatuntikf (UTItype_ppc)
+  __attribute__ ((__ifunc__ ("__floatuntikf_resolve")));
+
+TItype_ppc __fixkfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixkfti_resolve")));
+
+UTItype_ppc __fixunskfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixunskfti_resolve")));
+
 TFtype __floatunsikf (USItype_ppc)
   __attribute__ ((__ifunc__ ("__floatunsikf_resolve")));
 
diff --git a/libgcc/config/rs6000/float128-sed b/libgcc/config/rs6000/float128-sed
index d9a089ff9ba..c0fcddb1959 100644
--- a/libgcc/config/rs6000/float128-sed
+++ b/libgcc/config/rs6000/float128-sed
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi/g
 s/__fixunstfdi/__fixunskfdi/g
 s/__fixunstfsi/__fixunskfsi/g
 s/__floatditf/__floatdikf/g
+s/__floattitf/__floattikf/g
+s/__floatuntitf/__floatuntikf/g
+s/__fixtfti/__fixkfti/g
+s/__fixunstfti/__fixunskfti/g
 s/__floatsitf/__floatsikf/g
 s/__floatunditf/__floatundikf/g
 s/__floatunsitf/__floatunsikf/g
diff --git a/libgcc/config/rs6000/float128-sed-hw b/libgcc/config/rs6000/float128-sed-hw
index acf36b0c17d..3d2bf556da1 100644
--- a/libgcc/config/rs6000/float128-sed-hw
+++ b/libgcc/config/rs6000/float128-sed-hw
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi_sw/g
 s/__fixunstfdi/__fixunskfdi_sw/g
 s/__fixunstfsi/__fixunskfsi_sw/g
 s/__floatditf/__floatdikf_sw/g
+s/__floattitf/__floattikf_sw/g
+s/__floatuntitf/__floatuntikf_sw/g
+s/__fixtfti/__fixkfti_sw/g
+s/__fixunstfti/__fixunskfti_sw/g
 s/__floatsitf/__floatsikf_sw/g
 s/__floatunditf/__floatundikf_sw/g
 s/__floatunsitf/__floatunsikf_sw/g
diff --git a/libgcc/config/rs6000/floattikf.c b/libgcc/config/rs6000/floattikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floattikf.c
rename to libgcc/config/rs6000/floattikf-sw.c
index 4e8c40cfbe4..110706352bb 100644
--- a/libgcc/config/rs6000/floattikf.c
+++ b/libgcc/config/rs6000/floattikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floattikf (TItype i)
+__floattikf_sw (TItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/floatuntikf.c b/libgcc/config/rs6000/floatuntikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floatuntikf.c
rename to libgcc/config/rs6000/floatuntikf-sw.c
index 8bfba4267d4..5e712a67e26 100644
--- a/libgcc/config/rs6000/floatuntikf.c
+++ b/libgcc/config/rs6000/floatuntikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floatuntikf (UTItype i)
+__floatuntikf_sw (UTItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/quad-float128.h b/libgcc/config/rs6000/quad-float128.h
index 32ef328a8ea..24712b9277f 100644
--- a/libgcc/config/rs6000/quad-float128.h
+++ b/libgcc/config/rs6000/quad-float128.h
@@ -87,19 +87,18 @@ extern USItype_ppc __fixunskfsi_sw (TFtype);
 extern UDItype_ppc __fixunskfdi_sw (TFtype);
 extern TFtype __floatsikf_sw (SItype_ppc);
 extern TFtype __floatdikf_sw (DItype_ppc);
+extern TFtype __floattikf_sw (TItype_ppc);
 extern TFtype __floatunsikf_sw (USItype_ppc);
 extern TFtype __floatundikf_sw (UDItype_ppc);
+extern TFtype __floatuntikf_sw (UTItype_ppc);
+extern TItype_ppc __fixkfti_sw (TFtype);
+extern UTItype_ppc __fixunskfti_sw (TFtype);
 extern IBM128_TYPE __extendkftf2_sw (TFtype);
 extern TFtype __trunctfkf2_sw (IBM128_TYPE);
 extern TCtype __mulkc3_sw (TFtype, TFtype, TFtype, TFtype);
 extern TCtype __divkc3_sw (TFtype, TFtype, TFtype, TFtype);
 
 #ifdef _ARCH_PPC64
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 extern TItype_ppc __fixkfti (TFtype);
 extern UTItype_ppc __fixunskfti (TFtype);
 extern TFtype __floattikf (TItype_ppc);
@@ -130,8 +129,12 @@ extern USItype_ppc __fixunskfsi_hw (TFtype);
 extern UDItype_ppc __fixunskfdi_hw (TFtype);
 extern TFtype __floatsikf_hw (SItype_ppc);
 extern TFtype __floatdikf_hw (DItype_ppc);
+extern TFtype __floattikf_hw (TItype_ppc);
 extern TFtype __floatunsikf_hw (USItype_ppc);
 extern TFtype __floatundikf_hw (UDItype_ppc);
+extern TFtype __floatuntikf_hw (UTItype_ppc);
+extern TItype_ppc __fixkfti_hw (TFtype);
+extern UTItype_ppc __fixunskfti_hw (TFtype);
 extern IBM128_TYPE __extendkftf2_hw (TFtype);
 extern TFtype __trunctfkf2_hw (IBM128_TYPE);
 extern TCtype __mulkc3_hw (TFtype, TFtype, TFtype, TFtype);
@@ -162,8 +165,12 @@ extern USItype_ppc __fixunskfsi (TFtype);
 extern UDItype_ppc __fixunskfdi (TFtype);
 extern TFtype __floatsikf (SItype_ppc);
 extern TFtype __floatdikf (DItype_ppc);
+extern TFtype __floattikf (TItype_ppc);
 extern TFtype __floatunsikf (USItype_ppc);
 extern TFtype __floatundikf (UDItype_ppc);
+extern TFtype __floatuntikf (UTItype_ppc);
+extern TItype_ppc __fixkfti (TFtype);
+extern UTItype_ppc __fixunskfti (TFtype);
 extern IBM128_TYPE __extendkftf2 (TFtype);
 extern TFtype __trunctfkf2 (IBM128_TYPE);
 
diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
index d5413445189..325b22fd49e 100644
--- a/libgcc/config/rs6000/t-float128
+++ b/libgcc/config/rs6000/t-float128
@@ -23,7 +23,8 @@ fp128_softfp_shared_obj	= $(addsuffix -sw_s$(objext),$(fp128_softfp_funcs))
 fp128_softfp_obj	= $(fp128_softfp_static_obj) $(fp128_softfp_shared_obj)
 
 # New functions for software emulation
-fp128_ppc_funcs		= floattikf floatuntikf fixkfti fixunskfti \
+fp128_ppc_funcs		= floattikf-sw floatuntikf-sw \
+			  fixkfti-sw fixunskfti-sw \
 			  extendkftf2-sw trunctfkf2-sw \
 			  sfp-exceptions _mulkc3 _divkc3 _powikf2
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations
  2020-09-21 23:56 ` [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
@ 2020-09-24 18:20   ` will schmidt
  2020-10-05 18:51     ` Carl Love
  0 siblings, 1 reply; 41+ messages in thread
From: will schmidt @ 2020-09-24 18:20 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 1, adds the 128-bit sign extension instruction support and
> corresponding builtin support.
> 
> No changes from the previous version.
> 
> The patch has been tested on 
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> Fixed the issues in the ChangeLog noted by Will.
> 
>              Carl Love
> 
> ---------------------------------------------------
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
> 	for new builtins.
> 	* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
> 	overloaded builtin definitions.
> 	(VSIGNEXTSB2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D): Add builtin
> 	expansions.

+VSIGNEXTSH2W


> 	* config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
> 	P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
> 	* config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
> 	visible.
> 	* doc/extend.texi:  Add documentation for the vec_signexti and
> 	vec_signextll builtins.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
> ---
>  gcc/config/rs6000/altivec.h                   |   3 +
>  gcc/config/rs6000/rs6000-builtin.def          |   9 ++
>  gcc/config/rs6000/rs6000-call.c               |  13 ++
>  gcc/config/rs6000/vsx.md                      |   2 +-
>  gcc/doc/extend.texi                           |  15 ++
>  .../powerpc/p9-sign_extend-runnable.c         | 128 ++++++++++++++++++
>  6 files changed, 169 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index 8a2dcda0144..acc365612be 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -494,6 +494,9 @@
> 
>  #define vec_xlx __builtin_vec_vextulx
>  #define vec_xrx __builtin_vec_vexturx
> +#define vec_signexti  __builtin_vec_vsignexti
> +#define vec_signextll __builtin_vec_vsignextll
> +
>  #endif
> 
>  /* Predicates.
> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> index e91a48ddf5f..4c2e9460949 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2715,6 +2715,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,	"vprtybd")
>  BU_P9V_OVERLOAD_1 (VPRTYBQ,	"vprtybq")
>  BU_P9V_OVERLOAD_1 (VPRTYBW,	"vprtybw")
>  BU_P9V_OVERLOAD_1 (VPARITY_LSBB,	"vparity_lsbb")
> +BU_P9V_OVERLOAD_1 (VSIGNEXTI,	"vsignexti")
> +BU_P9V_OVERLOAD_1 (VSIGNEXTLL,	"vsignextll")
> 
>  /* 2 argument functions added in ISA 3.0 (power9).  */
>  BU_P9_2 (CMPRB,	"byte_in_range",	CONST,	cmprb)
> @@ -2726,6 +2728,13 @@ BU_P9_OVERLOAD_2 (CMPRB,	"byte_in_range")
>  BU_P9_OVERLOAD_2 (CMPRB2,	"byte_in_either_range")
>  BU_P9_OVERLOAD_2 (CMPEQB,	"byte_in_set")
>  
> +/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1.  */
> +BU_P9V_AV_1 (VSIGNEXTSB2W,	"vsignextsb2w",		CONST,  vsx_sign_extend_qi_v4si)
> +BU_P9V_AV_1 (VSIGNEXTSH2W,	"vsignextsh2w",		CONST,  vsx_sign_extend_hi_v4si)
> +BU_P9V_AV_1 (VSIGNEXTSB2D,	"vsignextsb2d",		CONST,  vsx_sign_extend_qi_v2di)
> +BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
> +BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
> +
>  /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
>  BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
>  BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index a8b520834c7..9e514a01012 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -5527,6 +5527,19 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI },
> 
> +  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 */
> +  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
> +    RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
> +    RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
> +
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
> +    RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
> +    RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
> +    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
> +
>    /* Overloaded built-in functions for ISA3.1 (power10). */
>    { P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB,
>      RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 4ff52455fd3..31fcffe8f33 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4787,7 +4787,7 @@
>    "vextsh2<wd> %0,%1"
>    [(set_attr "type" "vecexts")])
> 
> -(define_insn "*vsx_sign_extend_si_v2di"
> +(define_insn "vsx_sign_extend_si_v2di"
>    [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
>  	(unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
>  		     UNSPEC_VSX_SIGN_EXTEND))]
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 5571c4f2ff2..c1c2c9f9bf7 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -20800,6 +20800,21 @@ void vec_xst (vector unsigned char, int, vector unsigned char *);
>  void vec_xst (vector unsigned char, int, unsigned char *);
>  @end smallexample
> 
> +uThe following sign extension builtins are provided.
> +
> +@smallexample
> +vector signed int vec_signexti (vector signed char a)
> +vector signed long long vec_signextll (vector signed char a)
> +vector signed int vec_signexti (vector signed short a)
> +vector signed long long vec_signextll (vector signed short a)
> +vector signed long long vec_signextll (vector signed int a)

I'd sort by return type, versus parameter type, but I'd defer.. I don't
see that existing documentation is entirely consistent in the ordering.



Nothing else jumps out at me.

this one lgtm.
thanks
-Will




> +@end smallexample
> +
> +Each element of the result is produced by sign-extending the element of the
> +input vector that would fall in the least significant portion of the result
> +element. For example, a sign-extension of a vector signed char to a vector
> +signed long long will sign extend the rightmost byte of each doubleword.
> +
>  @node PowerPC AltiVec Built-in Functions Available on ISA 3.1
>  @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
> new file mode 100644
> index 00000000000..7bf979c6fd4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
> @@ -0,0 +1,128 @@
> +/* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
> +
> +/* These builtins were not defined until ISA 3.1 but only require ISA 3.0
> +   support.  */
> +
> +/* { dg-final { scan-assembler-times {\mvextsb2w\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvextsh2w\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvextsh2d\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvextsw2d\M} 1 } } */
> +
> +#include <altivec.h>
> +
> +#define DEBUG 0
> +
> +#if DEBUG
> +#include <stdio.h>
> +#include <stdlib.h>
> +#endif
> +
> +void abort (void);
> +
> +int main ()
> +{
> +  int i;
> +
> +  vector signed char vec_arg_qi, vec_result_qi;
> +  vector signed short int vec_arg_hi, vec_result_hi, vec_expected_hi;
> +  vector signed int vec_arg_wi, vec_result_wi, vec_expected_wi;
> +  vector signed long long vec_result_di, vec_expected_di;
> +
> +  /* test sign extend byte to word */
> +  vec_arg_qi = (vector signed char) {1, 2, 3, 4, 5, 6, 7, 8,
> +				     -1, -2, -3, -4, -5, -6, -7, -8};
> +  vec_expected_wi = (vector signed int) {1, 5, -1, -5};
> +
> +  vec_result_wi = vec_signexti (vec_arg_qi);
> +
> +  for (i = 0; i < 4; i++)
> +    if (vec_result_wi[i] != vec_expected_wi[i]) {
> +#if DEBUG
> +      printf("ERROR: vec_signexti(char, int):  ");
> +      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n",
> +	     i, i);
> +      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
> +      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  /* test sign extend byte to double */
> +  vec_arg_qi = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
> +				    -1, -2, -3, -4, -5, -6, -7, -8};
> +  vec_expected_di = (vector signed long long int){1, -1};
> +
> +  vec_result_di = vec_signextll(vec_arg_qi);
> +
> +  for (i = 0; i < 2; i++)
> +    if (vec_result_di[i] != vec_expected_di[i]) {
> +#if DEBUG
> +      printf("ERROR: vec_signextll(byte, long long int):  ");
> +      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
> +      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
> +      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  /* test sign extend short to word */
> +  vec_arg_hi = (vector signed short int){1, 2, 3, 4, -1, -2, -3, -4};
> +  vec_expected_wi = (vector signed int){1, 3, -1, -3};
> +
> +  vec_result_wi = vec_signexti(vec_arg_hi);
> +
> +  for (i = 0; i < 4; i++)
> +    if (vec_result_wi[i] != vec_expected_wi[i]) {
> +#if DEBUG
> +      printf("ERROR: vec_signexti(short, int):  ");
> +      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n", i, i);
> +      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
> +      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  /* test sign extend short to double word */
> +  vec_arg_hi = (vector signed short int ){1, 3, 5, 7,  -1, -3, -5, -7};
> +  vec_expected_di = (vector signed long long int){1, -1};
> +
> +  vec_result_di = vec_signextll(vec_arg_hi);
> +
> +  for (i = 0; i < 2; i++)
> +    if (vec_result_di[i] != vec_expected_di[i]) {
> +#if DEBUG
> +      printf("ERROR: vec_signextll(short, double):  ");
> +      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
> +      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
> +      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  /* test sign extend word to double word */
> +  vec_arg_wi = (vector signed int ){1, 3, -1, -3};
> +  vec_expected_di = (vector signed long long int){1, -1};
> +
> +  vec_result_di = vec_signextll(vec_arg_wi);
> +
> +  for (i = 0; i < 2; i++)
> +    if (vec_result_di[i] != vec_expected_di[i]) {
> +#if DEBUG
> +      printf("ERROR: vec_signextll(word, double):  ");
> +      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
> +      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
> +      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  return 0;
> +}


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/5] RS6000 add 128-bit Integer Operations
  2020-09-21 23:56 ` [PATCH 2/5] RS6000 add 128-bit Integer Operations Carl Love
@ 2020-09-24 18:21   ` will schmidt
  2020-10-05 18:52     ` [PATCH 2a/5] rs6000, vec_rlnm builtin fix arguments Carl Love
  2020-10-05 18:52     ` [PATCH 2b/5] RS6000 add 128-bit Integer Operations Carl Love
  0 siblings, 2 replies; 41+ messages in thread
From: will schmidt @ 2020-09-24 18:21 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Will, Segher:
> 
> Add support for divide, modulo, shift, compare of 128-bit
> integers instructions and builtin support.
> 
> The following are the changes from the previous version of the patch.
> 
> The TARGET_TI_VECTOR_OPS was removed per comments for patch 3.  Just
> using TARGET_POWER10.
> 
> Removed extra comment.
> 
> Note the change
> 
> -#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
> +#define
> vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
> 
> is a bug fix. Added missing comment to ChangeLog.
> 
> Removed vector_eqv1ti, used eqvv1ti3 instead.
> 
> Test case, put ppc_native_128bit in the dg-require command.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 test was run by hand on Mambo.
> 
> 
>                     Carl Love
> 
> 
> -------------------------------------------------------
> 
> gcc/ChangeLog
> 
> 	2020-09-21  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
> 	for new builtins.
> 	(vec_rlnm): Fix bug in argument generation.


If there is delay, this bugfix could (and probably should) be broken out into it's own patch.


> 	* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
> 	UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
> 	(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
> 	altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
> 	altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
> 	altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
> 	altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
> 	define_insn.

altivec_vrlqnm, altivec_vrlqmi should be in the define_expand list.


> 	(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
> 	vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
> 	altivec_vrlqnm): New define_expands.

Actually, they are here... so just need to be removed from the
define_insn list.  :-)


> 	* config/rs6000/rs6000-builtin.def (BU_P10_P, BU_P10_128BIT_1,
> 	BU_P10_128BIT_2, BU_P10_128BIT_3): New macro definitions.

Question below.

> 	(VCMPEQUT_P, VCMPGTST_P, VCMPGTUT_P): Add macro expansions.
> 	(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
> 	CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
> 	VCMPAET_P): New macro expansions.
> 	(VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ, VSLQ,
> 	VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
> 	MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
> 	(VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions.
> 	* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
> 	P10_BUILTIN_CMPGE_1TI, P10_BUILTIN_CMPGE_U1TI,
> 	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPGTST,
> 	P10_BUILTIN_CMPLE_1TI, P10_BUILTIN_VCMPLE_U1TI,
> 	P10_BUILTIN_128BIT_DIV_V1TI, P10_BUILTIN_128BIT_UDIV_V1TI,
> 	P10_BUILTIN_128BIT_VMULESD, P10_BUILTIN_128BIT_VMULEUD,
> 	P10_BUILTIN_128BIT_VMULOSD, P10_BUILTIN_128BIT_VMULOUD,
> 	P10_BUILTIN_VNOR_V1TI, P10_BUILTIN_VNOR_V1TI_UNS,
> 	P10_BUILTIN_128BIT_VRLQ, P10_BUILTIN_128BIT_VRLQMI,
> 	P10_BUILTIN_128BIT_VRLQNM, P10_BUILTIN_128BIT_VSLQ,
> 	P10_BUILTIN_128BIT_VSRQ, P10_BUILTIN_128BIT_VSRAQ,
> 	P10_BUILTIN_VCMPGTUT_P, P10_BUILTIN_VCMPGTST_P,
> 	P10_BUILTIN_VCMPEQUT_P, P10_BUILTIN_VCMPGTUT_P,
> 	P10_BUILTIN_VCMPGTST_P, P10_BUILTIN_CMPNET,
> 	P10_BUILTIN_VCMPNET_P, P10_BUILTIN_VCMPAET_P,
> 	P10_BUILTIN_128BIT_VSIGNEXTSD2Q, P10_BUILTIN_128BIT_DIVES_V1TI,
> 	P10_BUILTIN_128BIT_MODS_V1TI, P10_BUILTIN_128BIT_MODU_V1TI):
> 	New overloaded definitions.

Looks like those should (all?) now be P10V_BUILTIN_<foo>.


> 	(int_ftype_int_v1ti_v1ti) [P10_BUILTIN_VCMPEQUT,

?  That appeasr to be the (rs6000_gimple_fold_builtin) function.  

> 	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
> 	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
> 	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
> 	P10_BUILTIN_CMPLE_U1TI, E_V1TImode]: New case statements.

Also should be P10V_BUILTIN_<foo>  ?

I see both P10_BUILTIN_CMPNET and P10V_BUILTIN_CMPNET references in the
patch.  

> 	(int_ftype_int_v1ti_v1ti) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
> 	New assignments.

Thats in the (rs6000_init_builtins) function.

> 	(altivec_init_builtins): New E_V1TImode case statement.
> 	(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
> 	P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
> 	P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
> 	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.

May need a refresh with respect to the P10_BUILTIN_<foo> vs
P10V_BUILTIN_<foo> ... 


> 	* config/rs6000/r6000.c (rs6000_option_override_internal): Add if
> 	TARGET_POWER10 statement.

Don't see this in patch.

> 	(rs6000_handle_altivec_attribute)[ E_TImode, E_V1TImode]: New case
> 	statements.
ok
> 	(rs6000_opt_masks): Add ti-vector-ops entry.

Don't see this.

> 	* config/rs6000/r6000.h (RS6000_BTM_P10_128BIT, bool_V1TI_type_node):
> 	New defines.
> 	(rs6000_builtin_type_index): New enum value RS6000_BTI_bool_V1TI.
> 	* config/rs6000/rs6000.opt: New mti-vector-ops entry.
> 	* config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti,
> 	vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti,
> 	vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
> 	vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
> 	vlshrv1ti3, vashrv1ti3): New define_expands.
> 	* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
> 	UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
> 	UNSPEC_VSX_MODUQ, UNSPEC_XXSWAPD_V1TI): New unspecs.
> 	(vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti, vsx_diveu_v1ti,
> 	vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti, vsx_sign_extend_v2di_v1ti):
> 	New define_insns.
> 	(vcmpnet): New define_expand.
> 	* gcc/doc/extend.texi: Add documentation for the new builtins vec_rl,
> 	vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
> 	vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,
> 	vec_cmpge, vec_cmple, vec_all_eq, vec_all_ne, vec_all_gt, vec_all_lt,
> 	vec_all_ge, vec_all_le, vec_any_eq, vec_any_ne, vec_any_gt, vec_any_lt,
> 	vec_any_ge, vec_any_le.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/int_128bit-runnable.c: New test file.
> ---
>  gcc/config/rs6000/altivec.h                   |    6 +-
>  gcc/config/rs6000/altivec.md                  |  240 ++
>  gcc/config/rs6000/rs6000-builtin.def          |   84 +
>  gcc/config/rs6000/rs6000-call.c               |  138 +-
>  gcc/config/rs6000/rs6000.c                    |    1 +
>  gcc/config/rs6000/rs6000.h                    |    5 +-
>  gcc/config/rs6000/vector.md                   |  191 ++
>  gcc/config/rs6000/vsx.md                      |   97 +
>  gcc/doc/extend.texi                           |  174 ++
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 2254 +++++++++++++++++
>  10 files changed, 3187 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index acc365612be..fc67073f79c 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -183,7 +183,7 @@
>  #define vec_recipdiv __builtin_vec_recipdiv
>  #define vec_rlmi __builtin_vec_rlmi
>  #define vec_vrlnm __builtin_vec_rlnm
> -#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
> +#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
>  #define vec_rsqrt __builtin_vec_rsqrt
>  #define vec_rsqrte __builtin_vec_rsqrte
>  #define vec_signed __builtin_vec_vsigned
> @@ -690,6 +690,10 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
> 
>  #ifdef _ARCH_PWR10
> +#define vec_signextq  __builtin_vec_vsignextq
> +#define vec_dive __builtin_vec_dive
> +#define vec_mod  __builtin_vec_mod
> +
>  /* May modify these macro definitions if future capabilities overload
>     with support for different vector argument and result types.  */
>  #define vec_cntlzm(a, b)	__builtin_altivec_vclzdm (a, b)
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 0a2e634d6b0..34a4731342a 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -39,12 +39,16 @@
>     UNSPEC_VMULESH
>     UNSPEC_VMULEUW
>     UNSPEC_VMULESW
> +   UNSPEC_VMULEUD
> +   UNSPEC_VMULESD
>     UNSPEC_VMULOUB
>     UNSPEC_VMULOSB
>     UNSPEC_VMULOUH
>     UNSPEC_VMULOSH
>     UNSPEC_VMULOUW
>     UNSPEC_VMULOSW
> +   UNSPEC_VMULOUD
> +   UNSPEC_VMULOSD
>     UNSPEC_VPKPX
>     UNSPEC_VPACK_SIGN_SIGN_SAT
>     UNSPEC_VPACK_SIGN_UNS_SAT
> @@ -628,6 +632,14 @@
>    "vcmpequ<VI_char> %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "altivec_eqv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(eq:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
> +		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vcmpequq %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_gt<mode>"
>    [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
>  	(gt:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
> @@ -636,6 +648,14 @@
>    "vcmpgts<VI_char> %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_gtv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(gt:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
> +		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vcmpgtsq %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_gtu<mode>"
>    [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
>  	(gtu:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
> @@ -644,6 +664,14 @@
>    "vcmpgtu<VI_char> %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_gtuv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
> +		  (match_operand:V1TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vcmpgtuq %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_eqv4sf"
>    [(set (match_operand:V4SF 0 "altivec_register_operand" "=v")
>  	(eq:V4SF (match_operand:V4SF 1 "altivec_register_operand" "v")
> @@ -1687,6 +1715,19 @@
>   DONE;
>  })
> 
> +(define_expand "vec_widen_umult_even_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
> + DONE;
> +})
> +
>  (define_expand "vec_widen_smult_even_v4si"
>    [(use (match_operand:V2DI 0 "register_operand"))
>     (use (match_operand:V4SI 1 "register_operand"))
> @@ -1700,6 +1741,19 @@
>    DONE;
>  })
> 
> +(define_expand "vec_widen_smult_even_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
> + else
> +    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
>  (define_expand "vec_widen_umult_odd_v16qi"
>    [(use (match_operand:V8HI 0 "register_operand"))
>     (use (match_operand:V16QI 1 "register_operand"))
> @@ -1765,6 +1819,19 @@
>    DONE;
>  })
> 
> +(define_expand "vec_widen_umult_odd_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
>  (define_expand "vec_widen_smult_odd_v4si"
>    [(use (match_operand:V2DI 0 "register_operand"))
>     (use (match_operand:V4SI 1 "register_operand"))
> @@ -1778,6 +1845,19 @@
>    DONE;
>  })
> 
> +(define_expand "vec_widen_smult_odd_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
>  (define_insn "altivec_vmuleub"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
> @@ -1859,6 +1939,15 @@
>    "vmuleuw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmuleud"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULEUD))]
> +  "TARGET_POWER10"
> +  "vmuleud %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  (define_insn "altivec_vmulouw"
>    [(set (match_operand:V2DI 0 "register_operand" "=v")
>         (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -1868,6 +1957,15 @@
>    "vmulouw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmuloud"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULOUD))]
> +  "TARGET_POWER10"
> +  "vmuloud %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  (define_insn "altivec_vmulesw"
>    [(set (match_operand:V2DI 0 "register_operand" "=v")
>         (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -1877,6 +1975,15 @@
>    "vmulesw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmulesd"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULESD))]
> +  "TARGET_POWER10"
> +  "vmulesd %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  (define_insn "altivec_vmulosw"
>    [(set (match_operand:V2DI 0 "register_operand" "=v")
>         (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -1886,6 +1993,15 @@
>    "vmulosw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmulosd"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULOSD))]
> +  "TARGET_POWER10"
> +  "vmulosd %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  ;; Vector pack/unpack
>  (define_insn "altivec_vpkpx"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
> @@ -1979,6 +2095,15 @@
>    "vrl<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vrlq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +;; rotate amount in needs to be in bits[57:63] of operand2.
> +  "vrlq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_insn "altivec_vrl<VI_char>mi"
>    [(set (match_operand:VIlong 0 "register_operand" "=v")
>          (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
> @@ -1989,6 +2114,33 @@
>    "vrl<VI_char>mi %0,%2,%3"
>    [(set_attr "type" "veclogical")])
> 
> +(define_expand "altivec_vrlqmi"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
> +		      (match_operand:V1TI 2 "vsx_register_operand")
> +		      (match_operand:V1TI 3 "vsx_register_operand")]
> +		     UNSPEC_VRLMI))]
> +  "TARGET_POWER10"
> +{
> +  /* Mask bit begin, end fields need to be in bits [41:55] of 128-bit operand2. */
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[3]));
> +  emit_insn(gen_altivec_vrlqmi_inst (operands[0], operands[1], operands[2], tmp));
> +  DONE;
> +})
> +
> +(define_insn "altivec_vrlqmi_inst"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +		      (match_operand:V1TI 2 "vsx_register_operand" "0")
> +		      (match_operand:V1TI 3 "vsx_register_operand" "v")]
> +		     UNSPEC_VRLMI))]
> +  "TARGET_POWER10"
> +  "vrlqmi %0,%1,%3"
> +  [(set_attr "type" "veclogical")])
> +
>  (define_insn "altivec_vrl<VI_char>nm"
>    [(set (match_operand:VIlong 0 "register_operand" "=v")
>          (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
> @@ -1998,6 +2150,31 @@
>    "vrl<VI_char>nm %0,%1,%2"
>    [(set_attr "type" "veclogical")])
> 
> +(define_expand "altivec_vrlqnm"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
> +		      (match_operand:V1TI 2 "vsx_register_operand")]
> +		     UNSPEC_VRLNM))]
> +  "TARGET_POWER10"
> +{
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vrlqnm_inst (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
> +(define_insn "altivec_vrlqnm_inst"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +		      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +		     UNSPEC_VRLNM))]
> +  "TARGET_POWER10"
> +  ;; rotate and mask bits need to be in upper 64-bits of operand2.
> +  "vrlqnm %0,%1,%2"
> +  [(set_attr "type" "veclogical")])
> +
>  (define_insn "altivec_vsl"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -2042,6 +2219,15 @@
>    "vsl<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vslq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
> +  "vslq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_insn "*altivec_vsr<VI_char>"
>    [(set (match_operand:VI2 0 "register_operand" "=v")
>          (lshiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
> @@ -2050,6 +2236,15 @@
>    "vsr<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vsrq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
> +  "vsrq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_insn "*altivec_vsra<VI_char>"
>    [(set (match_operand:VI2 0 "register_operand" "=v")
>          (ashiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
> @@ -2058,6 +2253,15 @@
>    "vsra<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vsraq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
> +  "vsraq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_insn "altivec_vsr"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -2618,6 +2822,18 @@
>    "vcmpequ<VI_char>. %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "altivec_vcmpequt_p"
> +  [(set (reg:CC CR6_REGNO)
> +	(unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v")
> +			   (match_operand:V1TI 2 "altivec_register_operand" "v"))]
> +		   UNSPEC_PREDICATE))
> +   (set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(eq:V1TI (match_dup 1)
> +		 (match_dup 2)))]
> +  "TARGET_POWER10"
> +  "vcmpequq. %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_vcmpgts<VI_char>_p"
>    [(set (reg:CC CR6_REGNO)
>  	(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
> @@ -2630,6 +2846,18 @@
>    "vcmpgts<VI_char>. %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_vcmpgtst_p"
> +  [(set (reg:CC CR6_REGNO)
> +	(unspec:CC [(gt:CC (match_operand:V1TI 1 "register_operand" "v")
> +			   (match_operand:V1TI 2 "register_operand" "v"))]
> +		   UNSPEC_PREDICATE))
> +   (set (match_operand:V1TI 0 "register_operand" "=v")
> +	(gt:V1TI (match_dup 1)
> +		 (match_dup 2)))]
> +  "TARGET_POWER10"
> +  "vcmpgtsq. %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_vcmpgtu<VI_char>_p"
>    [(set (reg:CC CR6_REGNO)
>  	(unspec:CC [(gtu:CC (match_operand:VI2 1 "register_operand" "v")
> @@ -2642,6 +2870,18 @@
>    "vcmpgtu<VI_char>. %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_vcmpgtut_p"
> +  [(set (reg:CC CR6_REGNO)
> +	(unspec:CC [(gtu:CC (match_operand:V1TI 1 "register_operand" "v")
> +			    (match_operand:V1TI 2 "register_operand" "v"))]
> +		   UNSPEC_PREDICATE))
> +   (set (match_operand:V1TI 0 "register_operand" "=v")
> +	(gtu:V1TI (match_dup 1)
> +		  (match_dup 2)))]
> +  "TARGET_POWER10"
> +  "vcmpgtuq. %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_vcmpeqfp_p"
>    [(set (reg:CC CR6_REGNO)
>  	(unspec:CC [(eq:CC (match_operand:V4SF 1 "register_operand" "v")
> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> index 4c2e9460949..980bed55230 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -1061,6 +1061,14 @@
>  		     | RS6000_BTC_QUATERNARY),				\
>  		    CODE_FOR_ ## ICODE)			/* ICODE */
> 
> +#define BU_P10_P(ENUM, NAME, ATTR, ICODE)				\
> +  RS6000_BUILTIN_P (P10_BUILTIN_ ## ENUM,		/* ENUM */	\
> +		    "__builtin_altivec_" NAME,		/* NAME */	\
> +		    RS6000_BTM_P10_128BIT,		/* MASK */	\
> +		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
> +		     | RS6000_BTC_PREDICATE),				\
> +		    CODE_FOR_ ## ICODE)			/* ICODE */
> +

The notation doesn't quite fit with the other macros nearby.
Should that instead be BU_P10_128BIT_AV_P  ?



>  #define BU_P10_OVERLOAD_1(ENUM, NAME)					\
>    RS6000_BUILTIN_1 (P10_BUILTIN_VEC_ ## ENUM,		/* ENUM */	\
>  		    "__builtin_vec_" NAME,		/* NAME */	\
> @@ -1179,6 +1187,38 @@
>  		     | RS6000_BTC_TERNARY),				\
>  		    CODE_FOR_ ## ICODE)			/* ICODE */
> 
> +#define BU_P10_P(ENUM, NAME, ATTR, ICODE)				\
> +  RS6000_BUILTIN_P (P10_BUILTIN_ ## ENUM,		/* ENUM */	\
> +		    "__builtin_altivec_" NAME,		/* NAME */	\
> +		    RS6000_BTM_P10_128BIT,		/* MASK */	\
> +		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
> +		     | RS6000_BTC_PREDICATE),				\
> +		    CODE_FOR_ ## ICODE)			/* ICODE */
> +
> +#define BU_P10_128BIT_1(ENUM, NAME, ATTR, ICODE)			\
> +  RS6000_BUILTIN_1 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
> +		    "__builtin_altivec_" NAME,		/* NAME */	\
> +		    RS6000_BTM_P10_128BIT,		/* MASK */	\
> +		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
> +		     | RS6000_BTC_UNARY),				\
> +		    CODE_FOR_ ## ICODE)			/* ICODE */
> +
> +#define BU_P10_128BIT_2(ENUM, NAME, ATTR, ICODE)			\
> +  RS6000_BUILTIN_2 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
> +		    "__builtin_altivec_" NAME,		/* NAME */	\
> +		    RS6000_BTM_P10_128BIT,		/* MASK */	\
> +		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
> +		     | RS6000_BTC_BINARY),				\
> +		    CODE_FOR_ ## ICODE)			/* ICODE */
> +
> +#define BU_P10_128BIT_3(ENUM, NAME, ATTR, ICODE)			\
> +  RS6000_BUILTIN_3 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
> +		    "__builtin_altivec_" NAME,		/* NAME */	\
> +		    RS6000_BTM_P10_128BIT,		/* MASK */	\
> +		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
> +		     | RS6000_BTC_TERNARY),				\
> +		    CODE_FOR_ ## ICODE)			/* ICODE */
> +
>  
>  /* Insure 0 is not a legitimate index.  */
>  BU_SPECIAL_X (RS6000_BUILTIN_NONE, NULL, 0, RS6000_BTC_MISC)
> @@ -2736,6 +2776,10 @@ BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
>  BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
> 
>  /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
> +BU_P10_P (VCMPEQUT_P,		"vcmpequt_p",	CONST,	vector_eq_v1ti_p)
> +BU_P10_P (VCMPGTST_P,		"vcmpgtst_p",	CONST,	vector_gt_v1ti_p)
> +BU_P10_P (VCMPGTUT_P,		"vcmpgtut_p",	CONST,	vector_gtu_v1ti_p)
> +
>  BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
>  BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
>  BU_P10_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
> @@ -2756,6 +2800,38 @@ BU_P10V_VSX_2 (XXGENPCVM_V16QI, "xxgenpcvm_v16qi", CONST, xxgenpcvm_v16qi)
>  BU_P10V_VSX_2 (XXGENPCVM_V8HI, "xxgenpcvm_v8hi", CONST, xxgenpcvm_v8hi)
>  BU_P10V_VSX_2 (XXGENPCVM_V4SI, "xxgenpcvm_v4si", CONST, xxgenpcvm_v4si)
>  BU_P10V_VSX_2 (XXGENPCVM_V2DI, "xxgenpcvm_v2di", CONST, xxgenpcvm_v2di)
> +BU_P10V_AV_2 (VCMPGTUT,		"vcmpgtut",	CONST,	vector_gtuv1ti)
> +BU_P10V_AV_2 (VCMPGTST,		"vcmpgtst",	CONST,	vector_gtv1ti)
> +BU_P10V_AV_2 (VCMPEQUT,		"vcmpequt",	CONST,	eqvv1ti3)
> +BU_P10V_AV_2 (CMPNET,		"vcmpnet",	CONST,	vcmpnet)
> +BU_P10V_AV_2 (CMPGE_1TI,	"cmpge_1ti",    CONST,  vector_nltv1ti)
> +BU_P10V_AV_2 (CMPGE_U1TI,	"cmpge_u1ti",   CONST,  vector_nltuv1ti)
> +BU_P10V_AV_2 (CMPLE_1TI,	"cmple_1ti",    CONST,  vector_ngtv1ti)
> +BU_P10V_AV_2 (CMPLE_U1TI,	"cmple_u1ti",   CONST,  vector_ngtuv1ti)
> +BU_P10V_AV_2 (VNOR_V1TI_UNS,	"vnor_v1ti_uns",CONST,	norv1ti3)
> +BU_P10V_AV_2 (VNOR_V1TI,	"vnor_v1ti",	CONST,	norv1ti3)
> +BU_P10V_AV_2 (VCMPNET_P,	"vcmpnet_p",	CONST,	vector_ne_v1ti_p)
> +BU_P10V_AV_2 (VCMPAET_P,	"vcmpaet_p",	CONST,	vector_ae_v1ti_p)
> +
> +BU_P10_128BIT_1 (VSIGNEXTSD2Q,	"vsignext",     CONST,  vsx_sign_extend_v2di_v1ti)
> +
> +BU_P10_128BIT_2 (VMULEUD,	"vmuleud",	CONST,	vec_widen_umult_even_v2di)
> +BU_P10_128BIT_2 (VMULESD,	"vmulesd",	CONST,	vec_widen_smult_even_v2di)
> +BU_P10_128BIT_2 (VMULOUD,	"vmuloud",	CONST,	vec_widen_umult_odd_v2di)
> +BU_P10_128BIT_2 (VMULOSD,	"vmulosd",	CONST,	vec_widen_smult_odd_v2di)
> +BU_P10_128BIT_2 (VRLQ,		"vrlq",		CONST,	vrotlv1ti3)
> +BU_P10_128BIT_2 (VSLQ,		"vslq",		CONST,	vashlv1ti3)
> +BU_P10_128BIT_2 (VSRQ,		"vsrq",		CONST,	vlshrv1ti3)
> +BU_P10_128BIT_2 (VSRAQ,		"vsraq",	CONST,	vashrv1ti3)
> +BU_P10_128BIT_2 (VRLQNM,	"vrlqnm",	CONST,	altivec_vrlqnm)
> +BU_P10_128BIT_2 (DIV_V1TI,	"div_1ti",      CONST,  vsx_div_v1ti)
> +BU_P10_128BIT_2 (UDIV_V1TI,	"udiv_1ti",     CONST,  vsx_udiv_v1ti)
> +BU_P10_128BIT_2 (DIVES_V1TI,	"dives",	CONST,	vsx_dives_v1ti)
> +BU_P10_128BIT_2 (DIVEU_V1TI,	"diveu",	CONST,	vsx_diveu_v1ti)
> +BU_P10_128BIT_2 (MODS_V1TI,	"mods",		CONST,	vsx_mods_v1ti)
> +BU_P10_128BIT_2 (MODU_V1TI,	"modu",		CONST,	vsx_modu_v1ti)
> +
> +BU_P10_128BIT_3 (VRLQMI,	"vrlqmi",	CONST,	altivec_vrlqmi)
> 
>  BU_P10V_AV_3 (VEXTRACTBL, "vextdubvlx", CONST, vextractlv16qi)
>  BU_P10V_AV_3 (VEXTRACTHL, "vextduhvlx", CONST, vextractlv8hi)
> @@ -2863,6 +2939,12 @@ BU_P10_OVERLOAD_2 (CLRR, "clrr")
>  BU_P10_OVERLOAD_2 (GNB, "gnb")
>  BU_P10_OVERLOAD_4 (XXEVAL, "xxeval")
>  BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
> +BU_P10_OVERLOAD_2 (VRLQ, "vrlq")
> +BU_P10_OVERLOAD_2 (VSLQ, "vslq")
> +BU_P10_OVERLOAD_2 (VSRQ, "vsrq")
> +BU_P10_OVERLOAD_2 (VSRAQ, "vsraq")
> +BU_P10_OVERLOAD_2 (DIVE, "dive")
> +BU_P10_OVERLOAD_2 (MOD, "mod")
> 
>  BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
>  BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
> @@ -2882,6 +2964,8 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
> 
> +BU_P10_OVERLOAD_1 (SIGNEXT, "vsignextq")
> +
>  
>  BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
>  BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 9e514a01012..e1d9c2e8729 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -839,6 +839,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQFP,
>      RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPEQ, VSX_BUILTIN_XVCMPEQDP,
> @@ -885,6 +889,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U2DI,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, 0},
> +
> +  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
> +  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_U1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0},
>    { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTUB,
>      RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTSB,
> @@ -899,8 +909,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTUD,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTUT,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTSD,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTST,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTFP,
>      RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, VSX_BUILTIN_XVCMPGTDP,
> @@ -943,6 +957,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_CMPLE_U2DI,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, 0},
> +  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
> +  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_U1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0},
>    { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTUB,
>      RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTSB,
> @@ -995,6 +1014,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
>    { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10_BUILTIN_128BIT_DIV_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10_BUILTIN_128BIT_UDIV_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
> @@ -1789,6 +1814,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_MULE, P8V_BUILTIN_VMULEUW,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULE, P10_BUILTIN_128BIT_VMULESD,
> +    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULE, P10_BUILTIN_128BIT_VMULEUD,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMULEUB, ALTIVEC_BUILTIN_VMULEUB,
>      RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMULESB, ALTIVEC_BUILTIN_VMULESB,
> @@ -1812,6 +1842,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_MULO, P8V_BUILTIN_VMULOUW,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULO, P10_BUILTIN_128BIT_VMULOSD,
> +    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULO, P10_BUILTIN_128BIT_VMULOUD,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_MULO, ALTIVEC_BUILTIN_VMULOSH,
>      RS6000_BTI_V4SI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMULOSH, ALTIVEC_BUILTIN_VMULOSH,
> @@ -1854,6 +1889,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI,
>      RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
> @@ -2115,6 +2160,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_RL, P8V_BUILTIN_VRLD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_RL, P10_BUILTIN_128BIT_VRLQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_RL, P10_BUILTIN_128BIT_VRLQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
>      RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
> @@ -2133,12 +2183,23 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { P9V_BUILTIN_VEC_RLMI, P9V_BUILTIN_VRLDMI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
> +  { P9V_BUILTIN_VEC_RLMI, P10_BUILTIN_128BIT_VRLQMI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI },
> +  { P9V_BUILTIN_VEC_RLMI, P10_BUILTIN_128BIT_VRLQMI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLWNM,
>      RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
>    { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLDNM,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, 0 },
> +  { P9V_BUILTIN_VEC_RLNM, P10_BUILTIN_128BIT_VRLQNM,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +  { P9V_BUILTIN_VEC_RLNM, P10_BUILTIN_128BIT_VRLQNM,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
>      RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
> @@ -2155,6 +2216,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SL, P8V_BUILTIN_VSLD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SL, P10_BUILTIN_128BIT_VSLQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SL, P10_BUILTIN_128BIT_VSLQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTDP,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
>    { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTSP,
> @@ -2351,6 +2417,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SR, P8V_BUILTIN_VSRD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SR, P10_BUILTIN_128BIT_VSRQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SR, P10_BUILTIN_128BIT_VSRQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
>      RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
> @@ -2379,6 +2450,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SRA, P8V_BUILTIN_VSRAD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SRA, P10_BUILTIN_128BIT_VSRAQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SRA, P10_BUILTIN_128BIT_VSRAQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
>      RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
> @@ -3996,12 +4072,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTUD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10_BUILTIN_VCMPGTUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10_BUILTIN_VCMPGTST_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, ALTIVEC_BUILTIN_VCMPGTFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, VSX_BUILTIN_XVCMPGTDP_P,
> @@ -4066,6 +4146,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10_BUILTIN_VCMPEQUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10_BUILTIN_VCMPEQUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, ALTIVEC_BUILTIN_VCMPEQFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
>    { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, VSX_BUILTIN_XVCMPEQDP_P,
> @@ -4117,12 +4201,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10_BUILTIN_VCMPGTUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10_BUILTIN_VCMPGTST_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, ALTIVEC_BUILTIN_VCMPGEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
> @@ -4771,6 +4859,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEW,
>      RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI,
> +    RS6000_BTI_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> 
>    /* The following 2 entries have been deprecated.  */
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEB_P,
> @@ -4871,6 +4965,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
>      RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
>      RS6000_BTI_bool_V2DI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> 
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
> @@ -4976,7 +5072,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
>      RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
>      RS6000_BTI_bool_V2DI, 0 },
> -
> +  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
> @@ -5903,6 +6000,21 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>   { P10_BUILTIN_VEC_XVTLSBB_ONES, P10V_BUILTIN_XVTLSBB_ONES,
>      RS6000_BTI_INTSI, RS6000_BTI_unsigned_V16QI, 0, 0 },
> 
> + { P10_BUILTIN_VEC_SIGNEXT, P10_BUILTIN_128BIT_VSIGNEXTSD2Q,
> +    RS6000_BTI_V1TI, RS6000_BTI_V2DI, 0, 0 },
> +
> + { P10_BUILTIN_VEC_DIVE, P10_BUILTIN_128BIT_DIVES_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10_BUILTIN_128BIT_DIVEU_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +
> +  { P10_BUILTIN_VEC_MOD, P10_BUILTIN_128BIT_MODS_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10_BUILTIN_128BIT_MODU_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +
>    { RS6000_BUILTIN_NONE, RS6000_BUILTIN_NONE, 0, 0, 0, 0 }
>  };
>  
> @@ -12256,12 +12368,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case ALTIVEC_BUILTIN_VCMPEQUH:
>      case ALTIVEC_BUILTIN_VCMPEQUW:
>      case P8V_BUILTIN_VCMPEQUD:
> +    case P10V_BUILTIN_VCMPEQUT:
>        fold_compare_helper (gsi, EQ_EXPR, stmt);
>        return true;
> 
>      case P9V_BUILTIN_CMPNEB:
>      case P9V_BUILTIN_CMPNEH:
>      case P9V_BUILTIN_CMPNEW:
> +    case P10V_BUILTIN_CMPNET:
>        fold_compare_helper (gsi, NE_EXPR, stmt);
>        return true;
> 
> @@ -12273,6 +12387,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case VSX_BUILTIN_CMPGE_U4SI:
>      case VSX_BUILTIN_CMPGE_2DI:
>      case VSX_BUILTIN_CMPGE_U2DI:
> +    case P10V_BUILTIN_CMPGE_1TI:
> +    case P10V_BUILTIN_CMPGE_U1TI:
>        fold_compare_helper (gsi, GE_EXPR, stmt);
>        return true;
> 
> @@ -12284,6 +12400,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case ALTIVEC_BUILTIN_VCMPGTUW:
>      case P8V_BUILTIN_VCMPGTUD:
>      case P8V_BUILTIN_VCMPGTSD:
> +    case P10V_BUILTIN_VCMPGTUT:
> +    case P10V_BUILTIN_VCMPGTST:
>        fold_compare_helper (gsi, GT_EXPR, stmt);
>        return true;
> 
> @@ -12295,6 +12413,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case VSX_BUILTIN_CMPLE_U4SI:
>      case VSX_BUILTIN_CMPLE_2DI:
>      case VSX_BUILTIN_CMPLE_U2DI:
> +    case P10V_BUILTIN_CMPLE_1TI:
> +    case P10V_BUILTIN_CMPLE_U1TI:
>        fold_compare_helper (gsi, LE_EXPR, stmt);
>        return true;
> 
> @@ -13000,6 +13120,8 @@ rs6000_init_builtins (void)
>  					    ? "__vector __bool long"
>  					    : "__vector __bool long long",
>  					    bool_long_long_type_node, 2);
> +  bool_V1TI_type_node = rs6000_vector_type ("__vector __bool __int128",
> +					    intTI_type_node, 1);
>    pixel_V8HI_type_node = rs6000_vector_type ("__vector __pixel",
>  					     pixel_type_node, 8);
> 
> @@ -13185,6 +13307,10 @@ altivec_init_builtins (void)
>      = build_function_type_list (integer_type_node,
>  				integer_type_node, V2DI_type_node,
>  				V2DI_type_node, NULL_TREE);
> +  tree int_ftype_int_v1ti_v1ti
> +    = build_function_type_list (integer_type_node,
> +				integer_type_node, V1TI_type_node,
> +				V1TI_type_node, NULL_TREE);
>    tree void_ftype_v4si
>      = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
>    tree v8hi_ftype_void
> @@ -13537,6 +13663,9 @@ altivec_init_builtins (void)
>  	case E_VOIDmode:
>  	  type = int_ftype_int_opaque_opaque;
>  	  break;
> +	case E_V1TImode:
> +	  type = int_ftype_int_v1ti_v1ti;
> +	  break;
>  	case E_V2DImode:
>  	  type = int_ftype_int_v2di_v2di;
>  	  break;
> @@ -14136,6 +14265,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
>      case P10V_BUILTIN_XXGENPCVM_V8HI:
>      case P10V_BUILTIN_XXGENPCVM_V4SI:
>      case P10V_BUILTIN_XXGENPCVM_V2DI:
> +    case P10_BUILTIN_128BIT_VMULEUD:
> +    case P10_BUILTIN_128BIT_VMULOUD:
> +    case P10_BUILTIN_128BIT_DIVEU_V1TI:
> +    case P10_BUILTIN_128BIT_MODU_V1TI:
>        h.uns_p[0] = 1;
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;
> @@ -14235,10 +14368,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
>      case VSX_BUILTIN_CMPGE_U8HI:
>      case VSX_BUILTIN_CMPGE_U4SI:
>      case VSX_BUILTIN_CMPGE_U2DI:
> +    case P10V_BUILTIN_CMPGE_U1TI:
>      case ALTIVEC_BUILTIN_VCMPGTUB:
>      case ALTIVEC_BUILTIN_VCMPGTUH:
>      case ALTIVEC_BUILTIN_VCMPGTUW:
>      case P8V_BUILTIN_VCMPGTUD:
> +    case P10V_BUILTIN_VCMPGTUT:
> +    case P10V_BUILTIN_VCMPEQUT:
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;
>        break;
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 6f204ca202a..cc629f2d938 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -19546,6 +19546,7 @@ rs6000_handle_altivec_attribute (tree *node,
>      case 'b':
>        switch (mode)
>  	{
> +	case E_TImode: case E_V1TImode: result = bool_V1TI_type_node; break;
>  	case E_DImode: case E_V2DImode: result = bool_V2DI_type_node; break;
>  	case E_SImode: case E_V4SImode: result = bool_V4SI_type_node; break;
>  	case E_HImode: case E_V8HImode: result = bool_V8HI_type_node; break;
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index bbd8060e143..0e5a9617b36 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2305,6 +2305,7 @@ extern int frame_pointer_needed;
>  #define RS6000_BTM_P8_VECTOR	MASK_P8_VECTOR	/* ISA 2.07 vector.  */
>  #define RS6000_BTM_P9_VECTOR	MASK_P9_VECTOR	/* ISA 3.0 vector.  */
>  #define RS6000_BTM_P9_MISC	MASK_P9_MISC	/* ISA 3.0 misc. non-vector */
> +#define RS6000_BTM_P10_128BIT   MASK_POWER10    /* ISA 3.1 128-bit vector.  */

Can the RS6000_BTM_P10_128BIT then be collapsed into the RS6000_BTM_P10
as seen a few lines below?  (Possibly more in subsequent patch?)


>  #define RS6000_BTM_CRYPTO	MASK_CRYPTO	/* crypto funcs.  */
>  #define RS6000_BTM_HTM		MASK_HTM	/* hardware TM funcs.  */
>  #define RS6000_BTM_FRE		MASK_POPCNTB	/* FRE instruction.  */
> @@ -2322,7 +2323,7 @@ extern int frame_pointer_needed;
>  #define RS6000_BTM_FLOAT128_HW	MASK_FLOAT128_HW /* IEEE 128-bit float h/w.  */
>  #define RS6000_BTM_MMA		MASK_MMA	/* ISA 3.1 MMA.  */
>  #define RS6000_BTM_P10		MASK_POWER10
> -
> +#define RS6000_BTM_TI_VECTOR_OPS MASK_TI_VECTOR_OPS /* 128-bit integer support */
> 
>  #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
>  				 | RS6000_BTM_VSX			\
> @@ -2436,6 +2437,7 @@ enum rs6000_builtin_type_index
>    RS6000_BTI_bool_V8HI,          /* __vector __bool short */
>    RS6000_BTI_bool_V4SI,          /* __vector __bool int */
>    RS6000_BTI_bool_V2DI,          /* __vector __bool long */
> +  RS6000_BTI_bool_V1TI,          /* __vector __bool 128-bit */
>    RS6000_BTI_pixel_V8HI,         /* __vector __pixel */
>    RS6000_BTI_long,	         /* long_integer_type_node */
>    RS6000_BTI_unsigned_long,      /* long_unsigned_type_node */
> @@ -2489,6 +2491,7 @@ enum rs6000_builtin_type_index
>  #define bool_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V8HI])
>  #define bool_V4SI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V4SI])
>  #define bool_V2DI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V2DI])
> +#define bool_V1TI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V1TI])
>  #define pixel_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_pixel_V8HI])
> 
>  #define long_long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long_long])
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 796345c80d3..0cca4232619 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -685,6 +685,13 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gtv1ti"
> +  [(set (match_operand:V1TI 0 "vlogical_operand")
> +	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
> +		 (match_operand:V1TI 2 "vlogical_operand")))]
> +  "TARGET_POWER10"
> +  "")
> +
>  ; >= for integer vectors: swap operands and apply not-greater-than
>  (define_expand "vector_nlt<mode>"
>    [(set (match_operand:VEC_I 3 "vlogical_operand")
> @@ -697,6 +704,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_nltv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gt:V1TI (match_operand:V1TI 2 "vlogical_operand")
> +		 (match_operand:V1TI 1 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +        (not:V1TI (match_dup 3)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
>  (define_expand "vector_gtu<mode>"
>    [(set (match_operand:VEC_I 0 "vint_operand")
>  	(gtu:VEC_I (match_operand:VEC_I 1 "vint_operand")
> @@ -704,6 +722,13 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gtuv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand")
> +	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand")
> +		  (match_operand:V1TI 2 "altivec_register_operand")))]
> +  "TARGET_POWER10"
> +  "")
> +
>  ; >= for integer vectors: swap operands and apply not-greater-than
>  (define_expand "vector_nltu<mode>"
>    [(set (match_operand:VEC_I 3 "vlogical_operand")
> @@ -716,6 +741,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_nltuv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gtu:V1TI (match_operand:V1TI 2 "vlogical_operand")
> +		  (match_operand:V1TI 1 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +	(not:V1TI (match_dup 3)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
>  (define_expand "vector_geu<mode>"
>    [(set (match_operand:VEC_I 0 "vint_operand")
>  	(geu:VEC_I (match_operand:VEC_I 1 "vint_operand")
> @@ -735,6 +771,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_ngtv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
> +		 (match_operand:V1TI 2 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +        (not:V1TI (match_dup 3)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
>  (define_expand "vector_ngtu<mode>"
>    [(set (match_operand:VEC_I 3 "vlogical_operand")
>  	(gtu:VEC_I (match_operand:VEC_I 1 "vlogical_operand")
> @@ -746,6 +793,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_ngtuv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gtu:V1TI (match_operand:V1TI 1 "vlogical_operand")
> +		  (match_operand:V1TI 2 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +        (not:V1TI (match_dup 3)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
>  ; There are 14 possible vector FP comparison operators, gt and eq of them have
>  ; been expanded above, so just support 12 remaining operators here.
> 
> @@ -894,6 +952,18 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_eq_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			     (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_operand:V1TI 0 "vlogical_operand")
> +	  (eq:V1TI (match_dup 1)
> +		   (match_dup 2)))])]
> +  "TARGET_POWER10"
> +  "")
> +
>  ;; This expansion handles the V16QI, V8HI, and V4SI modes in the
>  ;; implementation of the vec_all_ne built-in functions on Power9.
>  (define_expand "vector_ne_<mode>_p"
> @@ -976,6 +1046,23 @@
>    operands[3] = gen_reg_rtx (V2DImode);
>  })
> 
> +(define_expand "vector_ne_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			     (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_dup 3)
> +	  (eq:V1TI (match_dup 1)
> +		   (match_dup 2)))])
> +   (set (match_operand:SI 0 "register_operand" "=r")
> +	(eq:SI (reg:CC CR6_REGNO)
> +	       (const_int 0)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx (V1TImode);
> +})
> +
>  ;; This expansion handles the V2DI mode in the implementation of the
>  ;; vec_any_eq built-in function on Power9.
>  ;;
> @@ -1002,6 +1089,26 @@
>    operands[3] = gen_reg_rtx (V2DImode);
>  })
> 
> +(define_expand "vector_ae_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			     (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_dup 3)
> +	  (eq:V1TI (match_dup 1)
> +		   (match_dup 2)))])
> +   (set (match_operand:SI 0 "register_operand" "=r")
> +	(eq:SI (reg:CC CR6_REGNO)
> +	       (const_int 0)))
> +   (set (match_dup 0)
> +	(xor:SI (match_dup 0)
> +		(const_int 1)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx (V1TImode);
> +})
> +
>  ;; This expansion handles the V4SF and V2DF modes in the Power9
>  ;; implementation of the vec_all_ne built-in functions.  Note that the
>  ;; expansions for this pattern with these modes makes no use of power9-
> @@ -1061,6 +1168,18 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gt_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(gt:CC (match_operand:V1TI 1 "vlogical_operand")
> +			     (match_operand:V1TI 2 "vlogical_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_operand:V1TI 0 "vlogical_operand")
> +	  (gt:V1TI (match_dup 1)
> +		   (match_dup 2)))])]
> +  "TARGET_POWER10"
> +  "")
> +
>  (define_expand "vector_ge_<mode>_p"
>    [(parallel
>      [(set (reg:CC CR6_REGNO)
> @@ -1085,6 +1204,18 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gtu_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(gtu:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			      (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_operand:V1TI 0 "altivec_register_operand")
> +	  (gtu:V1TI (match_dup 1)
> +		    (match_dup 2)))])]
> +  "TARGET_POWER10"
> +  "")
> +
>  ;; AltiVec/VSX predicates.
> 
>  ;; This expansion is triggered during expansion of predicate built-in
> @@ -1460,6 +1591,20 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vrotlv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vrlq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
>  ;; Expanders for rotatert to make use of vrotl
>  (define_expand "vrotr<mode>3"
>    [(set (match_operand:VEC_I 0 "vint_operand")
> @@ -1481,6 +1626,21 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +;; No immediate version of this 128-bit instruction
> +(define_expand "vashlv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
>  ;; Expanders for logical shift right on each vector element
>  (define_expand "vlshr<mode>3"
>    [(set (match_operand:VEC_I 0 "vint_operand")
> @@ -1489,6 +1649,21 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +;; No immediate version of this 128-bit instruction
> +(define_expand "vlshrv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vsrq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
>  ;; Expanders for arithmetic shift right on each vector element
>  (define_expand "vashr<mode>3"
>    [(set (match_operand:VEC_I 0 "vint_operand")
> @@ -1496,6 +1671,22 @@
>  			(match_operand:VEC_I 2 "vint_operand")))]
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> +
> +;; No immediate version of this 128-bit instruction
> +(define_expand "vashrv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vsraq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
>  
>  ;; Vector reduction expanders for VSX
>  ; The (VEC_reduc:...
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 31fcffe8f33..5b6a0bd728a 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -298,6 +298,12 @@
>     UNSPEC_VSX_XXSPLTD
>     UNSPEC_VSX_DIVSD
>     UNSPEC_VSX_DIVUD
> +   UNSPEC_VSX_DIVSQ
> +   UNSPEC_VSX_DIVUQ
> +   UNSPEC_VSX_DIVESQ
> +   UNSPEC_VSX_DIVEUQ
> +   UNSPEC_VSX_MODSQ
> +   UNSPEC_VSX_MODUQ
>     UNSPEC_VSX_MULSD
>     UNSPEC_VSX_SIGN_EXTEND
>     UNSPEC_VSX_XVCVBF16SPN
> @@ -1732,6 +1738,60 @@
>  }
>    [(set_attr "type" "div")])
> 
> +(define_insn "vsx_div_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVSQ))]
> +  "TARGET_POWER10"
> +  "vdivsq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_udiv_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVUQ))]
> +  "TARGET_POWER10"
> +  "vdivuq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_dives_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVESQ))]
> +  "TARGET_POWER10"
> +  "vdivesq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_diveu_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVEUQ))]
> +  "TARGET_POWER10"
> +  "vdiveuq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_mods_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_MODSQ))]
> +  "TARGET_POWER10"
> +  "vmodsq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_modu_v1ti"
> + [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_MODUQ))]
> +  "TARGET_POWER10"
> +  "vmoduq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
>  ;; *tdiv* instruction returning the FG flag
>  (define_expand "vsx_tdiv<mode>3_fg"
>    [(set (match_dup 3)
> @@ -3083,6 +3143,21 @@
>    "xxpermdi %x0,%x1,%x1,2"
>    [(set_attr "type" "vecperm")])
> 
> +;; Swap upper/lower 64-bit values in a 128-bit vector
> +(define_insn "xxswapd_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(subreg:V1TI
> +           (vec_select:V2DI
> +             (subreg:V2DI
> +                (match_operand:V1TI 1 "vsx_register_operand" "v") 0 )
> +	     (parallel [(const_int 1)(const_int 0)]))
> +           0))]
> +  "TARGET_POWER10"
> +;; AIX does not support extended mnemonic xxswapd.  Use the basic
> +;; mnemonic xxpermdi instead.
> +  "xxpermdi %x0,%x1,%x1,2"
> +  [(set_attr "type" "vecperm")])
> +
>  (define_insn "xxgenpcvm_<mode>_internal"
>    [(set (match_operand:VSX_EXTRACT_I4 0 "altivec_register_operand" "=wa")
>  	(unspec:VSX_EXTRACT_I4
> @@ -4767,6 +4842,15 @@
>     (set_attr "type" "vecload")])
> 
>  
> +;; ISA 3.1 vector extend sign support
> +(define_insn "vsx_sign_extend_v2di_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
> +		     UNSPEC_VSX_SIGN_EXTEND))]
> +  "TARGET_POWER10"
> +  "vextsd2q %0,%1"
> +  [(set_attr "type" "vecexts")])
> +
>  ;; ISA 3.0 vector extend sign support
> 
>  (define_insn "vsx_sign_extend_qi_<mode>"
> @@ -5451,6 +5535,19 @@
>    "vcmpneb %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +;; Vector Compare Not Equal v1ti (specified/not+eq:)
> +(define_expand "vcmpnet"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand")
> +	(not:V1TI
> +	  (eq:V1TI (match_operand:V1TI 1 "altivec_register_operand")
> +		   (match_operand:V1TI 2 "altivec_register_operand"))))]
> +   "TARGET_POWER10"
> +{
> +  emit_insn (gen_eqvv1ti3 (operands[0], operands[1], operands[2]));
> +  emit_insn (gen_one_cmplv1ti2 (operands[0], operands[0]));
> +  DONE;
> +})
> +
>  ;; Vector Compare Not Equal or Zero Byte
>  (define_insn "vcmpnezb"
>    [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c1c2c9f9bf7..0a42f1f63e6 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21316,6 +21316,180 @@ Generate PCV from specified Mask size, as if implemented by the
>  immediate value is either 0, 1, 2 or 3.
>  @findex vec_genpcvm
> 
> +@smallexample
> +@exdent vector unsigned __int128 vec_rl (vector unsigned __int128,
> +                                         vector unsigned __int128);
> +@exdent vector signed __int128 vec_rl (vector signed __int128,
> +                                       vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of rotating the first input left by the number of bits
> +specified in the most significant quad word of the second input truncated to
> +7 bits (bits [125:131]).
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
> +                                           vector unsigned __int128,
> +                                           vector unsigned __int128);
> +@exdent vector signed __int128 vec_rlmi (vector signed __int128,
> +                                         vector signed __int128,
> +                                         vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of rotating the first input and inserting it under mask into the
> +second input. The first bit in the mask, the last bit in the mask are obtained from the
> +two 7-bit fields bits [108:115] and bits [117:123] respectively of the second input.
> +The shift is obtained from the third input in the 7-bit field [125:131] where all bits
> +counted from zero at the left.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
> +                                           vector unsigned __int128,
> +                                           vector unsigned __int128);
> +@exdent vector signed __int128 vec_rlnm (vector signed __int128,
> +                                         vector unsigned __int128,
> +                                         vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of rotating the first input and ANDing it with a mask. The
> +first bit in the mask and the last bit in the mask are obtained from the two
> +7-bit fields bits [117:123] and bits [125:131] respectively of the second
> +input.  The shift is obtained from the third input in the 7-bit field bits
> +[125:131] where all bits counted from zero at the left.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_sl(vector unsigned __int128, vector unsigned __int128);
> +@exdent vector signed __int128 vec_sl(vector signed __int128, vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of shifting the first input left by the number of bits
> +specified in the most significant bits of the second input truncated to
> +7 bits (bits [125:131]).
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_sr(vector unsigned __int128, vector unsigned __int128);
> +@exdent vector signed __int128 vec_sr(vector signed __int128, vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of performing a logical right shift of the first argument
> +by the number of bits specified in the most significant double word of the
> +second input truncated to 7 bits (bits [125:131]).
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_sra(vector unsigned __int128, vector unsigned __int128);
> +@exdent vector signed __int128 vec_sra(vector signed __int128, vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of performing arithmetic right shift of the first argument
> +by the number of bits specified in the most significant bits of the
> +second input truncated to 7 bits (bits [125:131]).
> +
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_mule (vector unsigned long long,
> +                                           vector unsigned long long);
> +@exdent vector signed __int128 vec_mule (vector signed long long,
> +                                         vector signed long long);
> +@end smallexample
> +
> +Returns a vector containing a 128-bit integer result of multiplying the even doubleword
> +elements of the two inputs.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_mulo (vector unsigned long long,
> +                                           vector unsigned long long);
> +@exdent vector signed __int128 vec_mulo (vector signed long long,
> +                                         vector signed long long);
> +@end smallexample
> +
> +Returns a vector containing a 128-bit integer result of multiplying the odd doubleword
> +elements of the two inputs.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_div (vector unsigned __int128,
> +                                          vector unsigned __int128);
> +@exdent vector signed __int128 vec_div (vector signed __int128,
> +                                        vector signed __int128);
> +@end smallexample
> +
> +Returns the result of dividing the first operand by the second operand. An attempt to
> +divide any value by zero or to divide the most negative signed 128-bit integer by
> +negative one results in an undefined value.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
> +                                           vector unsigned __int128);
> +@exdent vector signed __int128 vec_dive (vector signed __int128,
> +                                         vector signed __int128);
> +@end smallexample
> +
> +The result is produced by shifting the first input left by 128 bits and dividing by the
> +second. If an attempt is made to divide by zero or the result is larger than 128 bits,
> +the result is undefined.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
> +                                          vector unsigned __int128);
> +@exdent vector signed __int128 vec_mod (vector signed __int128,
> +                                        vector signed __int128);
> +@end smallexample
> +
> +The result is the modulo result of dividing the first input  by the second input.
> +
> +
> +The following builtins perform 128-bit vector comparisons.  The @code{vec_all_xx},
> +@code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is one of the operations
> +@code{eq, ne, gt, lt, ge, le} perform pairwise comparisons between the elements
> +at the same positions within their two vector arguments.   The @code{vec_all_xx}
> +function returns a non-zero value if and only if all pairwise comparisons are true.  The
> +@code{vec_any_xx} function returns a non-zero value if and only if at least one pairwise
> +comparison is true.  The @code{vec_cmpxx}function returns a vector of the same type as its
> +two arguments, within which each element consists of all ones to denote that specified
> +logical comparison of the corresponding elements was true.  Otherwise, the element of the
> +returned vector contains all zeros.

Just nits, Two spaces after period.   A few cases where there are
multiple new lines after a paragraph.

Nothing else jumped out at me below.

thanks
-Will



> +
> +@smallexample
> +vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
> +
> +int vec_all_eq (vector signed __int128, vector signed __int128);
> +int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_ne (vector signed __int128, vector signed __int128);
> +int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_gt (vector signed __int128, vector signed __int128);
> +int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_lt (vector signed __int128, vector signed __int128);
> +int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_ge (vector signed __int128, vector signed __int128);
> +int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_le (vector signed __int128, vector signed __int128);
> +int vec_all_le (vector unsigned __int128, vector unsigned __int128);
> +
> +int vec_any_eq (vector signed __int128, vector signed __int128);
> +int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_ne (vector signed __int128, vector signed __int128);
> +int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_gt (vector signed __int128, vector signed __int128);
> +int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_lt (vector signed __int128, vector signed __int128);
> +int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_ge (vector signed __int128, vector signed __int128);
> +int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_le (vector signed __int128, vector signed __int128);
> +int vec_any_le (vector unsigned __int128, vector unsigned __int128);
> +@end smallexample
> +
> +
>  @node PowerPC Hardware Transactional Memory Built-in Functions
>  @subsection PowerPC Hardware Transactional Memory Built-in Functions
>  GCC provides two interfaces for accessing the Hardware Transactional
> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> new file mode 100644
> index 00000000000..85ad544e22b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -0,0 +1,2254 @@
> +/* { dg-do run } */
> +/* { dg-options "-mcpu=power10 -O2 -save-temps" } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-require-effective-target ppc_native_128bit } */
> +
> +/* Check that the expected 128-bit instructions are generated if the processor
> +   supports the 128-bit integer instructions. */
> +/* { dg-final { scan-assembler-times {\mvextsd2q\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvslq\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvsrq\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvsraq\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpuq\M} 0 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpsq\M} 0 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpequq\M} 0 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpequq.\M} 16 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 0 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtsq.\M} 16 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 0 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtuq.\M} 16 } } */
> +/* { dg-final { scan-assembler-times {\mvmuleud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivsq\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivuq\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivesq\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdiveuq\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodsq\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmoduq\M} 1 } } */
> +
> +#include <altivec.h>
> +
> +#define DEBUG 0
> +
> +#if DEBUG
> +#include <stdio.h>
> +#include <stdlib.h>
> +
> +
> +void print_i128(__int128_t val)
> +{
> +  printf(" %lld %llu (0x%llx %llx)",
> +	 (signed long long)(val >> 64),
> +	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF),
> +	 (unsigned long long)(val >> 64),
> +	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
> +}
> +#endif
> +
> +void abort (void);
> +
> +int main ()
> +{
> +  int i, result_int;
> +
> +  __int128_t arg1, result;
> +  __uint128_t uarg2;
> +
> +  vector signed long long int vec_arg1_di, vec_arg2_di;
> +  vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
> +  vector unsigned long long int vec_uresult_di;
> +  vector unsigned long long int vec_uexpected_result_di;
> +  
> +  __int128_t expected_result;
> +  __uint128_t uexpected_result;
> +
> +  vector __int128_t vec_arg1, vec_arg2, vec_result;
> +  vector __uint128_t vec_uarg1, vec_uarg2, vec_uarg3, vec_uresult;
> +  vector bool __int128  vec_result_bool;
> +
> +  /* sign extend double to 128-bit integer  */
> +  vec_arg1_di[0] = 1000;
> +  vec_arg1_di[1] = -123456;
> +
> +  expected_result = 1000;
> +
> +  vec_result = vec_signextq (vec_arg1_di);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1_di[0] = -123456;
> +  vec_arg1_di[1] = 1000;
> +
> +  expected_result = -123456;
> +
> +  vec_result = vec_signextq (vec_arg1_di);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  /* test shift 128-bit integers.
> +     Note, shift amount is given by the lower 7-bits of the shift amount. */
> +  vec_arg1[0] = 3;
> +  vec_uarg2[0] = 2;
> +  expected_result = vec_arg1[0]*4;
> +
> +  vec_result = vec_sl (vec_arg1, vec_uarg2);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sl(int128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" << %lld", vec_uarg2[0] & 0xFF);
> +    printf(" = ");
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  arg1 = 3;
> +  uarg2 = 4;
> +  expected_result = arg1*16;
> +
> +  result = arg1 << uarg2;
> +
> +  if (result != expected_result) {
> +#if DEBUG
> +    printf("ERROR: int128 << uint128):  ");
> +    print_i128(arg1);
> +    printf(" << %lld", uarg2 & 0xFF);
> +    printf(" = ");
> +    print_i128(result);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 3;
> +  vec_uarg2[0] = 2;
> +  uexpected_result = vec_uarg1[0]*4;
> +  
> +  vec_uresult = vec_sl (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sl(uint128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" << %lld", vec_uarg2[0] & 0xFF);
> +    printf(" = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12;
> +  vec_uarg2[0] = 2;
> +  expected_result = vec_arg1[0]/4;
> +
> +  vec_result = vec_sr (vec_arg1, vec_uarg2);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sr(int128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" >> %lld", vec_uarg2[0] & 0xFF);
> +    printf(" = ");
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 48;
> +  vec_uarg2[0] = 2;
> +  uexpected_result = vec_uarg1[0]/4;
> +  
> +  vec_uresult = vec_sr (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sr(uint128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" >> %lld", vec_uarg2[0] & 0xFF);
> +    printf(" = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  arg1 = 48;
> +  uarg2 = 4;
> +  expected_result = arg1/16;
> +
> +  result = arg1 >> uarg2;
> +
> +  if (result != expected_result) {
> +#if DEBUG
> +    printf("ERROR: int128 >> uint128:  ");
> +    print_i128(arg1);
> +    printf(" >> %lld", uarg2 & 0xFF);
> +    printf(" = ");
> +    print_i128(result);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 0x1234567890ABCDEFULL;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg2[0] = 32;
> +  expected_result = 0x0000000012345678ULL;
> +  expected_result = (expected_result << 64) | 0x90ABCDEFAABBCCDDULL;
> +
> +  vec_result = vec_sra (vec_arg1, vec_uarg2);
> +  
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sra(int128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" >> %lld = \n", vec_uarg2[0]);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
> +  vec_uarg2[0] = 48;
> +  uexpected_result = 0xFFFFFFFFFFFFAABBLL;
> +  uexpected_result = (uexpected_result << 64) | 0xCCDDEEFF11221234ULL;
> +
> +  vec_uresult = vec_sra (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sra(uint128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" >> %lld = \n", vec_uarg2[0] & 0xFF);
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 0x1234567890ABCDEFULL;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg2[0] = 32;
> +  expected_result = 0x90ABCDEFAABBCCDDULL;
> +  expected_result = (expected_result << 64) | 0xEEFF112212345678ULL;
> +
> +  vec_result = vec_rl (vec_arg1, vec_uarg2);
> +  
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rl(int128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" >> %lld = \n", vec_uarg2[0]);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
> +  vec_uarg2[0] = 48;
> +  uexpected_result = 0x11221234567890ABULL;
> +  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEEFFULL;
> +
> +  vec_uresult = vec_rl (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rl(uint128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" >> %lld = \n", vec_uarg2[0]);
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 0x1234567890ABCDEFULL;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg2[0] = (32 << 8) | 95;
> +  vec_uarg3[0] = 32;
> +  expected_result = 0xaabbccddULL;
> +  expected_result = (expected_result << 64) | 0xeeff112200000000ULL;
> +
> +  vec_result = vec_rlnm (vec_arg1, vec_uarg2, vec_uarg3);
> +  
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rlnm(int128, uint128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
> +  vec_uarg2[0] = (8 << 8) | 119;
> +  vec_uarg3[0] = 48;
> +
> +  uexpected_result = 0x00221234567890ABULL;
> +  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEE00ULL;
> +
> +  vec_uresult = vec_rlnm (vec_uarg1, vec_uarg2, vec_uarg3);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rlnm(uint128, uint128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 0x1234567890ABCDEFULL;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
> +  vec_arg2[0] = 0x000000000000DEADULL;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 0x0000BEEF00000000ULL;
> +  vec_uarg3[0] = 96 << 16 | 127 << 8 | 32;
> +  expected_result = 0x000000000000DEADULL;
> +  expected_result = (expected_result << 64) | 0x0000BEEF12345678ULL;
> +
> +  vec_result = vec_rlmi (vec_arg1, vec_arg2, vec_uarg3);
> +  
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rlmi(int128, int128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" << %lld = \n", vec_uarg2_di[1] & 0xFF);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
> +  vec_uarg2[0] = 0xDEAD000000000000ULL;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 0x000000000000BEEFULL;
> +  vec_uarg3[0] = 16 << 16 | 111 << 8 | 48;
> +  uexpected_result = 0xDEAD1234567890ABULL;
> +  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDBEEFULL;
> +
> +  vec_uresult = vec_rlmi (vec_uarg1, vec_uarg2, vec_uarg3);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rlmi(uint128, unit128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" << %lld = \n", vec_uarg3[1] & 0xFF);
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  /* 128-bit compare tests, result is all 1's if true */
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1[0] = 2468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  uexpected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  uexpected_result = (uexpected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpgt (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned vec_cmpgt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpgt (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed vec_cmpgt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = -1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR:not equal signed vec_cmpeq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed equal vec_cmpeq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 12468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  not equal vec_cmpeq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: equal unsigned vec_cmpeq ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 12468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  not equal vec_cmpne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: equal unsigned vec_cmpne ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = -1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR:not equal signed vec_cmpne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed equal vec_cmpne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 12468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 > arg2 vec_cmplt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 1234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 12468;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 < arg2 vec_cmplt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR:  unsigned arg1 = arg2 vec_cmplt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = -1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 > arg2 vec_cmplt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = -1234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 12468;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 < arg2 vec_cmplt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_cmplt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +   
> +  vec_uarg1[0] = 12468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 > arg2 vec_cmple ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 1234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 12468;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 < arg2 vec_cmple ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR:  unsigned arg1 = arg2 vec_cmple ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = -1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 > arg2 vec_cmple ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = -1234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 12468;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 < arg2 vec_cmple ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_cmple ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 12468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 > arg2 vec_cmpge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 1234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 12468;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 < arg2 vec_cmpge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR:  unsigned arg1 = arg2 vec_cmpge ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = -1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 > arg2 vec_cmpge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = -1234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 12468;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 < arg2 vec_cmpge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_cmpge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_eq (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_eq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_eq (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_eq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_eq ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_eq ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_ne (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_ne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_ne (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_ne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ne ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ne ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_lt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_lt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_lt (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_lt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_lt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_lt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_le (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_le ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_le (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_le ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_le (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_le ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_le (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_le ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_gt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_gt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_gt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_gt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_gt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_gt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_ge (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_ge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_ge (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_ge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ge ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ge ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_eq (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_eq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_eq (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_eq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_eq ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_eq ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_ne (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_ne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_ne (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_ne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ne ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_ne ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_lt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_lt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_lt (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_lt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_lt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_lt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_gt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_gt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_gt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_gt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_gt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_le (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_le ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_le (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_le ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_le (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_le ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_le (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_le ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_ge (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_ge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_ge (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_ge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ge ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  /* Vector multiply Even and Odd tests */
> +  vec_arg1_di[0] = 200;
> +  vec_arg1_di[1] = 400;
> +  vec_arg2_di[0] = 1234;
> +  vec_arg2_di[1] = 4567;
> +  expected_result = vec_arg1_di[0] * vec_arg2_di[0];
> +
> +  vec_result = vec_mule (vec_arg1_di, vec_arg2_di);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mule (signed, signed) failed.\n");
> +    printf(" vec_arg1_di[0] = %lld\n", vec_arg1_di[0]);
> +    printf(" vec_arg2_di[0] = %lld\n", vec_arg2_di[0]);
> +    printf("Result = ");
> +    print_i128(vec_result[0]);
> +    printf("\nExpected Result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1_di[0] = -200;
> +  vec_arg1_di[1] = -400;
> +  vec_arg2_di[0] = 1234;
> +  vec_arg2_di[1] = 4567;
> +  expected_result = vec_arg1_di[1] * vec_arg2_di[1];
> +
> +  vec_result = vec_mulo (vec_arg1_di, vec_arg2_di);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mulo (signed, signed) failed.\n");
> +    printf(" vec_arg1_di[1] = %lld\n", vec_arg1_di[1]);
> +    printf(" vec_arg2_di[1] = %lld\n", vec_arg2_di[1]);
> +    printf("Result = ");
> +    print_i128(vec_result[0]);
> +    printf("\nExpected Result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1_di[0] = 200;
> +  vec_uarg1_di[1] = 400;
> +  vec_uarg2_di[0] = 1234;
> +  vec_uarg2_di[1] = 4567;
> +  uexpected_result = vec_uarg1_di[0] * vec_uarg2_di[0];
> +
> +  vec_uresult = vec_mule (vec_uarg1_di, vec_uarg2_di);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mule (unsigned, unsigned) failed.\n");
> +    printf(" vec_uarg1_di[1] = %lld\n", vec_uarg1_di[1]);
> +    printf(" vec_uarg2_di[1] = %lld\n", vec_uarg2_di[1]);
> +    printf("Result = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\nExpected Result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1_di[0] = 200;
> +  vec_uarg1_di[1] = 400;
> +  vec_uarg2_di[0] = 1234;
> +  vec_uarg2_di[1] = 4567;
> +  uexpected_result = vec_uarg1_di[1] * vec_uarg2_di[1];
> +
> +  vec_uresult = vec_mulo (vec_uarg1_di, vec_uarg2_di);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mulo (unsigned, unsigned) failed.\n");
> +    printf(" vec_uarg1_di[0] = %lld\n", vec_uarg1_di[0]);
> +    printf(" vec_uarg2_di[0] = %lld\n", vec_uarg2_di[0]);
> +    printf("Result = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\nExpected Result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  /* Vector Divide Quadword */
> +  vec_arg1[0] = -12345678;
> +  vec_arg2[0] = 2;
> +  expected_result = -6172839;
> +
> +  vec_result = vec_div (vec_arg1, vec_arg2);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_div (signed, signed) failed.\n");
> +    printf("vec_arg1[0] = ");
> +    print_i128(vec_arg1[0]);
> +    printf("\nvec_arg2[0] = ");
> +    print_i128(vec_arg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_result[0]);
> +    printf("\nExpected result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 24680;
> +  vec_uarg2[0] = 4;
> +  uexpected_result = 6170;
> +
> +  vec_uresult = vec_div (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_div (unsigned, unsigned) failed.\n");
> +    printf("vec_uarg1[0] = ");
> +    print_i128(vec_uarg1[0]);
> +    printf("\nvec_uarg2[0] = ");
> +    print_i128(vec_uarg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\nExpected result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  /* Vector Divide Extended Quadword */
> +  vec_arg1[0] = -20;        // has 128-bit of zero concatenated onto it
> +  vec_arg2[0] = 0x2000000000000000;
> +  vec_arg2[0] = vec_arg2[0] << 64;
> +  expected_result = -160;
> +
> +  vec_result = vec_dive (vec_arg1, vec_arg2);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_dive (signed, signed) failed.\n");
> +    printf("vec_arg1[0] = ");
> +    print_i128(vec_arg1[0]);
> +    printf("\nvec_arg2[0] = ");
> +    print_i128(vec_arg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_result[0]);
> +    printf("\nExpected result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 20;        // has 128-bit of zero concatenated onto it
> +  vec_uarg2[0] = 0x4000000000000000;
> +  vec_uarg2[0] = vec_uarg2[0] << 64;
> +  uexpected_result = 80;
> +
> +  vec_uresult = vec_dive (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_dive (unsigned, unsigned) failed.\n");
> +    printf("vec_uarg1[0] = ");
> +    print_i128(vec_uarg1[0]);
> +    printf("\nvec_uarg2[0] = ");
> +    print_i128(vec_uarg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\nExpected result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  /* Vector modulo quad word  */
> +  vec_arg1[0] = -12345675;
> +  vec_arg2[0] = 2;
> +  expected_result = -1;
> +
> +  vec_result = vec_mod (vec_arg1, vec_arg2);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mod (signed, signed) failed.\n");
> +    printf("vec_arg1[0] = ");
> +    print_i128(vec_arg1[0]);
> +    printf("\nvec_arg2[0] = ");
> +    print_i128(vec_arg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_result[0]);
> +    printf("\nExpected result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 24685;
> +  vec_uarg2[0] = 4;
> +  uexpected_result = 1;
> +
> +  vec_uresult = vec_mod (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mod (unsigned, unsigned) failed.\n");
> +    printf("vec_uarg1[0] = ");
> +    print_i128(vec_uarg1[0]);
> +    printf("\nvec_uarg2[0] = ");
> +    print_i128(vec_uarg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\nExpected result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  return 0;
> +}


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support
  2020-09-21 23:56 ` [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support Carl Love
@ 2020-09-24 18:21   ` will schmidt
  2020-10-05 18:52     ` Carl Love
  0 siblings, 1 reply; 41+ messages in thread
From: will schmidt @ 2020-09-24 18:21 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Segher, Will:
> 
> Add support for converting to/from 128-bit integers and 128-bit 
> decimal floating point formats.

A more wordy blurb here clarifying what the patch does would be useful.

i.e. this adds support for dcffixqq and dctfixqq instructons..

> 
> The updates from the previous version of the patch:
> 
> Removed stray ";; carll" comment.  
> 
> Removed #if 1 and #endif in the test case.
> 
> Replaced TARGET_TI_VECTOR_OPS with POWER10.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 test was run by hand on Mambo.
> 
> 
>              Carl Love
> 
> 
> -------------------------------------------------------------------
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
> 

ok.


Need changelog blurb to reflect the rs6000-call changes.
(this may have leaked in from previous or subsequent patch?)


> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/int_128bit-runnable.c:  Update test.


ok.


> ---


>  gcc/config/rs6000/dfp.md                      | 14 +++++
>  gcc/config/rs6000/rs6000-call.c               |  4 ++
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 62 +++++++++++++++++++
>  3 files changed, 80 insertions(+)
> 
> diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
> index 8f822732bac..0e82e315fee 100644
> --- a/gcc/config/rs6000/dfp.md
> +++ b/gcc/config/rs6000/dfp.md
> @@ -222,6 +222,13 @@
>    "dcffixq %0,%1"
>    [(set_attr "type" "dfp")])
> 
> +(define_insn "floattitd2"
> +  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
> +	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
> +  "TARGET_POWER10"
> +  "dcffixqq %0,%1"
> +  [(set_attr "type" "dfp")])
> +
>  ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
>  ;; This is the first stage of converting it to an integer type.
> 
> @@ -241,6 +248,13 @@
>    "TARGET_DFP"
>    "dctfix<q> %0,%1"
>    [(set_attr "type" "dfp")])
> +
> +(define_insn "fixtdti2"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> +	(fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
> +  "TARGET_POWER10"
> +  "dctfixqq %0,%1"
> +  [(set_attr "type" "dfp")])
> 
>  ;; Decimal builtin support
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index e1d9c2e8729..9c50cd3c5a7 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -4967,6 +4967,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V2DI, 0 },
>    { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> 
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
> @@ -5074,6 +5076,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V2DI, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> index 85ad544e22b..ec3dcf3dff1 100644
> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -38,6 +38,7 @@
>  #if DEBUG
>  #include <stdio.h>
>  #include <stdlib.h>
> +#include <math.h>
> 
> 
>  void print_i128(__int128_t val)
> @@ -59,6 +60,13 @@ int main ()
>    __int128_t arg1, result;
>    __uint128_t uarg2;
> 
> +  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
> +
> +  struct conv_t {
> +    __uint128_t u128;
> +    _Decimal128 d128;
> +  } conv, conv2;
> +
>    vector signed long long int vec_arg1_di, vec_arg2_di;
>    vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
>    vector unsigned long long int vec_uresult_di;
> @@ -2249,6 +2257,60 @@ int main ()
>      abort();
>  #endif
>    }
> +  
> +  /* DFP to __int128 and __int128 to DFP conversions */
> +  /* Can't get printing of DFP values to work.  Print the DFP value as an
> +     unsigned int so we can see the bit patterns.  */
> +  conv.u128 = 0x2208000000000000ULL;
> +  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
> +  expected_result_dfp128 = conv.d128;
> 
> +  arg1 = 4;
> +
> +  conv.d128 = (_Decimal128) arg1;
> +
> +  result_dfp128 = (_Decimal128) arg1;
> +  if (((conv.u128 >>64) != 0x2208000000000000ULL) &&
> +      ((conv.u128 & 0xFFFFFFFFFFFFFFFF) != 0x4ULL)) {
> +#if DEBUG
> +    printf("ERROR:  convert int128 value ");
> +    print_i128 (arg1);
> +    conv.d128 = result_dfp128;
> +    printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
> +	   (unsigned long long)((conv.u128) >>64),
> +	   (unsigned long long)((conv.u128) & 0xFFFFFFFFFFFFFFFF));
> +
> +    conv.d128 = expected_result_dfp128;
> +    printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
> +	   (unsigned long long) (conv.u128>>64),
> +	   (unsigned long long) (conv.u128 & 0xFFFFFFFFFFFFFFFF));
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  expected_result = 4;
> +
> +  conv.u128 = 0x2208000000000000ULL;
> +  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
> +  arg1_dfp128 = conv.d128;
> +
> +  result = (__int128_t) arg1_dfp128;
> +
> +  if (result != expected_result) {
> +#if DEBUG
> +    printf("ERROR:  convert DFP value ");
> +    printf("0x%llx %llx (printed as hex bit string) ",
> +	   (unsigned long long)(conv.u128>>64),
> +	   (unsigned long long)(conv.u128 & 0xFFFFFFFFFFFFFFFF));
> +    printf("to __int128 value = ");
> +    print_i128 (result);
> +    printf("\ndoes not match expected_result = ");
> +    print_i128 (expected_result);
> +    printf("\n");
> +#else
> +    abort();
> +#endif
> +  }
>    return 0;
>  }


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/5] Test 128-bit shifts for just the int128 type.
  2020-09-21 23:56 ` [PATCH 4/5] Test 128-bit shifts for just the int128 type Carl Love
@ 2020-09-24 18:21   ` will schmidt
  2020-10-05 18:52     ` Carl Love
  0 siblings, 1 reply; 41+ messages in thread
From: will schmidt @ 2020-09-24 18:21 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 4 adds the vector 128-bit integer shift instruction support for
> the V1TI type.
> 
> The following changes were made from the previous version.
> 
> Renamed VSX_TI to VEC_TI, put def in vector.md.  Didn't get it
> separated into a different patch.
> 
> Reworked the XXSWAPD_V1TI to not use UNSPEC.
> 
> Test suite program cleanups, removed "//" comments that were not
> needed.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 test was run by hand on Mambo.
> 
> 
>                     Carl Love
> 
> ------------------------------------------------------
> 
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
> 	Rename altivec_vslq_<mode>, altivec_vsrq_<mode>, mode VEC_TI.

Nit "Rename to"

> 	* config/rs6000/vector.md (VEC_TI): New mode iterator.
> 	(vashlv1ti3): Change to vashl<mode>3, mode VEC_TI.
> 	(vlshrv1ti3): Change to vlshr<mode>3, mode VEC_TI.
s/Change/Rename to/

'New' isn't quite right for the mode iterator, since it's renamed from
the VSX_TI iterator.
perhaps something like

	* config/rs6000/vector.md (VEC_TI): New name for VSX_TI 
	iterator from vsx.md.

> 	* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator.
> 	(VSX_TI): Renamed VEC_TI.


Just the Remove.  VEC_TI doesn't exist in vsx.md. 



> 
> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  <cel@us.ibm.com>
> 	gcc.target/powerpc/int_128bit-runnable.c: Add shift_right,
> shift_left
> 	tests.
> ---
>  gcc/config/rs6000/altivec.md                  | 16 ++++-----
>  gcc/config/rs6000/vector.md                   | 27 ++++++++-------
>  gcc/config/rs6000/vsx.md                      | 33 +++++++++------
> ----
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++++++--
>  4 files changed, 52 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.md
> b/gcc/config/rs6000/altivec.md
> index 34a4731342a..5db3de3cc9f 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2219,10 +2219,10 @@
>    "vsl<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vslq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		     (match_operand:V1TI 2 "vsx_register_operand"
> "v")))]
> +(define_insn "altivec_vslq_<mode>"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand"
> "v")
> +		     (match_operand:VEC_TI 2 "vsx_register_operand"
> "v")))]
>    "TARGET_POWER10"
>    /* Shift amount in needs to be in bits[57:63] of 128-bit operand.
> */
>    "vslq %0,%1,%2"
> @@ -2236,10 +2236,10 @@
>    "vsr<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vsrq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand"
> "v")
> -		       (match_operand:V1TI 2 "vsx_register_operand"
> "v")))]
> +(define_insn "altivec_vsrq_<mode>"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand"
> "v")
> +			   (match_operand:VEC_TI 2
> "vsx_register_operand" "v")))]
>    "TARGET_POWER10"
>    /* Shift amount in needs to be in bits[57:63] of 128-bit operand.
> */
>    "vsrq %0,%1,%2"
> diff --git a/gcc/config/rs6000/vector.md
> b/gcc/config/rs6000/vector.md
> index 0cca4232619..3ea3a91845a 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; 128-bit int modes
> +(define_mode_iterator VEC_TI [V1TI TI])
> +
>  ;; Vector int modes for parity
>  (define_mode_iterator VEC_IP [V8HI
>  			      V4SI
> @@ -1627,17 +1630,17 @@
>    "")
> 
>  ;; No immediate version of this 128-bit instruction
> -(define_expand "vashlv1ti3"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		     (match_operand:V1TI 2 "vsx_register_operand"
> "v")))]
> +(define_expand "vashl<mode>3"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
> +			 (match_operand:VEC_TI 2
> "vsx_register_operand")))]
>    "TARGET_POWER10"
>  {
>    /* Shift amount in needs to be put in bits[57:63] of 128-bit
> operand2. */
> -  rtx tmp = gen_reg_rtx (V1TImode);
> +  rtx tmp = gen_reg_rtx (<MODE>mode);
> 
>    emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> -  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
> +  emit_insn(gen_altivec_vslq_<mode> (operands[0], operands[1],
> tmp));
>    DONE;
>  })
> 
> @@ -1650,17 +1653,17 @@
>    "")
> 
>  ;; No immediate version of this 128-bit instruction
> -(define_expand "vlshrv1ti3"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand"
> "v")
> -		       (match_operand:V1TI 2 "vsx_register_operand"
> "v")))]
> +(define_expand "vlshr<mode>3"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:VEC_TI (match_operand:VEC_TI 1
> "vsx_register_operand")
> +			   (match_operand:VEC_TI 2
> "vsx_register_operand")))]
>    "TARGET_POWER10"
>  {
>    /* Shift amount in needs to be put into bits[57:63] of 128-bit
> operand2. */
> -  rtx tmp = gen_reg_rtx (V1TImode);
> +  rtx tmp = gen_reg_rtx (<MODE>mode);
> 
>    emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> -  emit_insn(gen_altivec_vsrq (operands[0], operands[1], tmp));
> +  emit_insn(gen_altivec_vsrq_<mode> (operands[0], operands[1],
> tmp));
>    DONE;
>  })
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 5b6a0bd728a..87f96ffcc4c 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -37,9 +37,6 @@
>  				  TI
>  				  V1TI])
> 
> -;; Iterator for 128-bit integer types that go in a single vector
> register.
> -(define_mode_iterator VSX_TI [TI V1TI])
> -
>  ;; Iterator for the 2 32-bit vector types
>  (define_mode_iterator VSX_W [V4SF V4SI])
> 
> @@ -944,9 +941,9 @@
>  ;; special V1TI container class, which it is not appropriate to use
> vec_select
>  ;; for the type.
>  (define_insn "*vsx_le_permute_<mode>"
> -  [(set (match_operand:VSX_TI 0 "nonimmediate_operand"
> "=wa,wa,Z,&r,&r,Q")
> -	(rotate:VSX_TI
> -	 (match_operand:VSX_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
> +  [(set (match_operand:VEC_TI 0 "nonimmediate_operand"
> "=wa,wa,Z,&r,&r,Q")
> +	(rotate:VEC_TI
> +	 (match_operand:VEC_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
>  	 (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
>    "@
> @@ -960,10 +957,10 @@
>     (set_attr "type" "vecperm,vecload,vecstore,*,load,store")])
> 
>  (define_insn_and_split "*vsx_le_undo_permute_<mode>"
> -  [(set (match_operand:VSX_TI 0 "vsx_register_operand" "=wa,wa")
> -	(rotate:VSX_TI
> -	 (rotate:VSX_TI
> -	  (match_operand:VSX_TI 1 "vsx_register_operand" "0,wa")
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=wa,wa")
> +	(rotate:VEC_TI
> +	 (rotate:VEC_TI
> +	  (match_operand:VEC_TI 1 "vsx_register_operand" "0,wa")
>  	  (const_int 64))
>  	 (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX"
> @@ -1031,11 +1028,11 @@
>  ;; Peepholes to catch loads and stores for TImode if TImode landed
> in
>  ;; GPR registers on a little endian system.
>  (define_peephole2
> -  [(set (match_operand:VSX_TI 0 "int_reg_operand")
> -	(rotate:VSX_TI (match_operand:VSX_TI 1 "memory_operand")
> +  [(set (match_operand:VEC_TI 0 "int_reg_operand")
> +	(rotate:VEC_TI (match_operand:VEC_TI 1 "memory_operand")
>  		       (const_int 64)))
> -   (set (match_operand:VSX_TI 2 "int_reg_operand")
> -	(rotate:VSX_TI (match_dup 0)
> +   (set (match_operand:VEC_TI 2 "int_reg_operand")
> +	(rotate:VEC_TI (match_dup 0)
>  		       (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
>     && (rtx_equal_p (operands[0], operands[2])
> @@ -1043,11 +1040,11 @@
>     [(set (match_dup 2) (match_dup 1))])
> 
>  (define_peephole2
> -  [(set (match_operand:VSX_TI 0 "int_reg_operand")
> -	(rotate:VSX_TI (match_operand:VSX_TI 1 "int_reg_operand")
> +  [(set (match_operand:VEC_TI 0 "int_reg_operand")
> +	(rotate:VEC_TI (match_operand:VEC_TI 1 "int_reg_operand")
>  		       (const_int 64)))
> -   (set (match_operand:VSX_TI 2 "memory_operand")
> -	(rotate:VSX_TI (match_dup 0)
> +   (set (match_operand:VEC_TI 2 "memory_operand")
> +	(rotate:VEC_TI (match_dup 0)
>  		       (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
>     && peep2_reg_dead_p (2, operands[0])"
> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> index ec3dcf3dff1..25e2c9d1af4 100644
> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -53,6 +53,18 @@ void print_i128(__int128_t val)
> 
>  void abort (void);
> 
> +__attribute__((noinline))
> +__int128_t shift_right (__int128_t a, __uint128_t b)
> +{
> +  return a >> b;
> +}
> +
> +__attribute__((noinline))
> +__int128_t shift_left (__int128_t a, __uint128_t b)
> +{
> +  return a << b;
> +}
> +
>  int main ()
>  {
>    int i, result_int;
> @@ -141,7 +153,7 @@ int main ()
>  #endif
>    }
> 
> -  arg1 = 3;
> +  arg1 = vec_result[0];
>    uarg2 = 4;
>    expected_result = arg1*16;
> 
> @@ -225,7 +237,7 @@ int main ()
>  #endif
>    }
> 
> -  arg1 = 48;
> +  arg1 = vec_uresult[0];
>    uarg2 = 4;
>    expected_result = arg1/16;
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/5] Conversions between 128-bit integer and floating point values.
  2020-09-21 23:57 ` [PATCH 5/5] Conversions between 128-bit integer and floating point values Carl Love
@ 2020-09-24 18:22   ` will schmidt
  2020-10-05 18:52     ` Carl Love
  0 siblings, 1 reply; 41+ messages in thread
From: will schmidt @ 2020-09-24 18:22 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Mon, 2020-09-21 at 16:57 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 5 adds the 128-bit integer to/from 128-floating point
> conversions.  This patch has to invoke the routines to use the 128-
> bit
> hardware instructions if on Power 10 or use software routines if
> running on a pre Power 10 system via the resolve function.
> 
> Add ifunc resolves for __fixkfti, __floatuntikf_sw, __fixkfti_swn,
> __fixunskfti_sw.
> 
> The following changes were made to the previous version of the
> patch: 
> 
> Fixed typos in ChangeLog noted by Will.
> 
> Turned off debug in test case.
> 
> Removed extra blank lines, fixed spacing of #else in the test case.
> 
> Added comment to fixunskfti-sw.c about changes made from the original
> file fixunskfti.c.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 tests were run by hand on Mambo.
> 
>                     Carl Love
> ---------------------------------------------------------
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  <cel@us.ibm.com>
> 	config/rs6000/rs6000.md (floatti<mode>2, floatunsti<mode>2,
> 	fix_trunc<mode>ti2, fixuns_trunc<mode>ti2): Add
> 	define_insn for mode IEEE 128.
ok

> 	libgcc/config/rs6000/fixkfi-sw.c: New file.
> 	libgcc/config/rs6000/fixkfi.c: Remove file.

Should that be fixkfti-sw.c (missing t).

Adjust to indicate this is a rename
        libgcc/config/rs6000/fixkfti.c: Rename to
        libgcc/config/rs6000/fixkfti-sw.c


> 	libgcc/config/rs6000/fixunskfti-sw.c: New file.
> 	libgcc/config/rs6000/fixunskfti.c: Remove file.
> 	libgcc/config/rs6000/float128-hw.c (__floattikf_hw,
> 	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw):
> 	New functions.
> 	libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1):
> 	New macro.
> 	(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
> 	__fixunskfti_resolve): Add resolve functions.
> 	(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
> 	functions.
> 	libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
> 	__fixtfti, __fixunstfti): Add editor commands to change
> 	names.
> 	libgcc/config/rs6000/float128-sed-hw (__floattitf,
> 	__floatuntitf, __fixtfti, __fixunstfti): Add editor commands
> 	to change names.
> 	libgcc/config/rs6000/floattikf-sw.c: New file.
> 	libgcc/config/rs6000/floattikf.c: Remove file.
> 	libgcc/config/rs6000/floatuntikf-sw.c: New file.
> 	libgcc/config/rs6000/floatuntikf.c: Remove file.
> 	libgcc/config/rs6000/floatuntikf-sw.c: New file.
> 	libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,
> 	__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw,
> __floattikf_hw,
> 	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
> 	__floatuntikf, __fixkfti, __fixunskfti): New extern
> declarations.
> 	libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
> 	fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
> 	(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
> 	file names to fp128_ppc_funcs.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  <cel@us.ibm.com>
> 	gcc.target/powerpc/fl128_conversions.c: New file.
> ---
>  gcc/config/rs6000/rs6000.md                   |  36 +++
>  .../gcc.target/powerpc/fp128_conversions.c    | 286
> ++++++++++++++++++
>  .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
>  .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   7 +-
>  libgcc/config/rs6000/float128-hw.c            |  24 ++
>  libgcc/config/rs6000/float128-ifunc.c         |  44 ++-
>  libgcc/config/rs6000/float128-sed             |   4 +
>  libgcc/config/rs6000/float128-sed-hw          |   4 +
>  .../rs6000/{floattikf.c => floattikf-sw.c}    |   4 +-
>  .../{floatuntikf.c => floatuntikf-sw.c}       |   4 +-
>  libgcc/config/rs6000/quad-float128.h          |  17 +-
>  libgcc/config/rs6000/t-float128               |   3 +-
>  12 files changed, 417 insertions(+), 20 deletions(-)
>  create mode 100644
> gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
>  rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
>  rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (90%)
>  rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
>  rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c}
> (96%)
> 
> diff --git a/gcc/config/rs6000/rs6000.md
> b/gcc/config/rs6000/rs6000.md
> index 694ff70635e..5db5d0b4505 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6390,6 +6390,42 @@
>     xscvsxddp %x0,%x1"
>    [(set_attr "type" "fp")])
> 
> +(define_insn "floatti<mode>2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +       (float:IEEE128 (match_operand:TI 1 "vsx_register_operand"
> "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvsqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "floatunsti<mode>2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +       (unsigned_float:IEEE128 (match_operand:TI 1
> "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvuqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "fix_trunc<mode>ti2"
> +  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
> +       (fix:TI (match_operand:IEEE128 1 "vsx_register_operand"
> "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvqpsqz %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "fixuns_trunc<mode>ti2"
> +  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
> +       (unsigned_fix:TI (match_operand:IEEE128 1
> "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvqpuqz %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
>  ; Allow the combiner to merge source memory operands to the
> conversion so that
>  ; the optimizer/register allocator doesn't try to load the value too
> early in a
>  ; GPR and then use store/load to move it to a FPR and suffer from a
> store-load
> diff --git a/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
> b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
> new file mode 100644
> index 00000000000..dff24e1e2b4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
> @@ -0,0 +1,286 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10" } */
> +
> +/* Check that the expected 128-bit instructions are generated if the
> processor
> +   supports the 128-bit integer instructions. */
> +/* { dg-final { scan-assembler-times {\mxscvsqqp\M} 1 { target {
> ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvuqqp\M} 1 { target {
> ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvqpsqz\M} 1 { target {
> ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvqpuqz\M} 1 { target {
> ppc_native_128bit } } } } */
> +
> +#include <stdio.h>
> +#include <math.h>
> +#include <fenv.h>
> +#include <stdlib.h>
> +#include <wchar.h>
> +
> +#define DEBUG 0
> +
> +void abort (void);
> +
> +float conv_i_2_fp( long long int a)
> +{
> +  return (float) a;
> +}
> +
> +double conv_i_2_fpd( long long int a)
> +{
> +  return (double) a;
> +}
> +
> +double conv_ui_2_fpd( unsigned long long int a)
> +{
> +  return (double) a;
> +}
> +
> +__float128 conv_i128_2_fp128 (__int128_t a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__float128) a;
> +}
> +
> +__float128 conv_ui128_2_fp128 (__uint128_t a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__float128) a;
> +}
> +
> +__int128_t conv_fp128_2_i128 (__float128 a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__int128_t) a;
> +}
> +
> +__uint128_t conv_fp128_2_ui128 (__float128 a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__uint128_t) a;
> +}
> +
> +long double conv_i128_2_ld (__int128_t a)
> +{
> +  // default, gen call __floattitf
> +  // -mabi=ibmlongdouble, gen call __floattitf
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (long double) a;
> +}
> +
> +__ibm128 conv_i128_2_ibm128 (__int128_t a)
> +{
> +  // default, gen call __floattitf
> +  // -mabi=ibmlongdouble, gen call __floattitf
> +  // -mabi=ieeelongdouble, message uses IBM long double, no binary
> output
> +  return (__ibm128) a;
> +}
> +
> +int main()
> +{
> +	float a, expected_result_float;
> +	double b, expected_result_double;
> +	long long int c, expected_result_llint;
> +	unsigned long long int u;
> +	__int128_t d;
> +	__uint128_t u128;
> +	unsigned long long expected_result_uint128[2] ;
> +	__float128 e;
> +	long double ld;     // another 128-bit float version
> +
> +	union conv_t {
> +		float a;
> +		double b;
> +		long long int c;
> +		long long int128[2] ;
> +		unsigned long long uint128[2] ;
> +		unsigned long long int u;
> +		__int128_t d;
> +		__uint128_t u128;
> +		__float128 e;
> +		long double ld;     // another 128-bit float version
> +	} conv, conv_result;
> +
> +	c = 20;
> +	expected_result_llint = 20.00000;
> +	a = conv_i_2_fp (c);
> +
> +	if (a != expected_result_llint) {
> +#if DEBUG
> +		printf("ERROR: conv_i_2_fp(%lld) = %10.5f\n", c, a);
> +		printf("\n does not match expected_result =
> %10.5f\n\n",
> +				 expected_result_llint);
> +#else
> +		abort();
> +#endif
> +	}
> +
> +	c = 20;
> +	expected_result_double = 20.00000;
> +	b = conv_i_2_fpd (c);
> +
> +	if (b != expected_result_double) {
> +#if DEBUG
> +		printf("ERROR: conv_i_2_fpd(%lld) = %10.5f\n", d, b);
> +		printf("\n does not match expected_result =
> %10.5f\n\n",
> +				 expected_result_double);
> + #else
> +		abort();
> +#endif
> +	}
> +
> +	u = 20;
> +	expected_result_double = 20.00000;
> +	b = conv_ui_2_fpd (u);
> +
> +	if (b != expected_result_double) {
> +#if DEBUG
> +		printf("ERROR: conv_ui_2_fpd(%llu) = %10.5f\n", u, b);
> +		printf("\n does not match expected_result =
> %10.5f\n\n",
> +				 expected_result_double);
> + #else
> +		abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */
> +  d = -3210;
> +  d = (d * 10000000000) + 9876543210;
> +  conv_result.e = conv_i128_2_fp128 (d);
> +  expected_result_uint128[1] = 0xc02bd2f9068d1160;
> +  expected_result_uint128[0] = 0x0;
> +  
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] !=
> expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_i128_2_fp128(-32109876543210) = (result
> in hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex)
> 0x%llx %llx\n\n",
> +				expected_result_uint128[1],
> expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  d = 123;
> +  d = (d * 10000000000) + 1234567890;
> +  conv_result.ld = conv_i128_2_fp128 (d);
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0x4271eab4c8ed2000;
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] !=
> expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_i128_2_fp128(1231234567890) = (result in
> hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex)
> 0x%llx %llx\n\n",
> +				expected_result_uint128[1],
> expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */
> +  u128 = 8760;
> +  u128 = (u128 * 10000000000) + 1234567890;
> +  conv_result.e = conv_ui128_2_fp128 (u128);
> +  expected_result_uint128[1] = 0x402d3eb101df8b48;
> +  expected_result_uint128[0] = 0x0;
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] !=
> expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_ui128_2_fp128(87601234567890) = (result
> in hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex)
> 0x%llx %llx\n\n",
> +				expected_result_uint128[1],
> expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */
> +  u128 = 3210;
> +  u128 = (u128 * 10000000000) + 9876543210;
> +  expected_result_uint128[1] = 0x402bd3429c8feea0;
> +  expected_result_uint128[0] = 0x0;
> +  conv_result.e = conv_ui128_2_fp128 (u128);
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] !=
> expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_ui128_2_fp128(32109876543210) = (result
> in hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex)
> 0x%llx %llx\n\n",
> +				expected_result_uint128[1],
> expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = 12345.6789;
> +  expected_result_uint128[1] = 0x1407374883526960;
> +  expected_result_uint128[0] = 0x3039;
> +
> +  conv_result.d = conv_fp128_2_i128 (conv.e);
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] !=
> expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) =  ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1],
> conv_result.uint128[0]);
> +
> +	  printf("\n does not match expected_result = (result in hex)
> 0x%llx %llx\n\n",
> +				expected_result_uint128[1],
> expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = -6789.12345;
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0xffffffffffffe57b;
> +  conv_result.d = conv_fp128_2_i128 (conv.e);
> + 
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] !=
> expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) = ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1],
> conv_result.uint128[0]);
> +
> +	  printf("\n does not match expected_result = (result in hex)
> 0x%llx %llx\n\n",
> +				expected_result_uint128[1],
> expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = 6789.12345;
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0x1a85;
> +  conv_result.d = conv_fp128_2_ui128 (conv.e);
> + 
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] !=
> expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_ui128(0x%llx %llx) = ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1],
> conv_result.uint128[0]);
> +	  
> +	  printf("\n does not match expected_result = (result in hex)
> 0x%llx %llx\n\n",
> +				expected_result_uint128[1],
> expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  return 0;
> +}
> diff --git a/libgcc/config/rs6000/fixkfti.c
> b/libgcc/config/rs6000/fixkfti-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/fixkfti.c
> rename to libgcc/config/rs6000/fixkfti-sw.c
> index a22286228aa..d6bbbf889b7 100644
> --- a/libgcc/config/rs6000/fixkfti.c
> +++ b/libgcc/config/rs6000/fixkfti-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it
> and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TItype
> -__fixkfti (TFtype a)
> +__fixkfti_sw (TFtype a)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/fixunskfti.c
> b/libgcc/config/rs6000/fixunskfti-sw.c
> similarity index 90%
> rename from libgcc/config/rs6000/fixunskfti.c
> rename to libgcc/config/rs6000/fixunskfti-sw.c
> index ab232d92d24..5c8b94b8862 100644
> --- a/libgcc/config/rs6000/fixunskfti.c
> +++ b/libgcc/config/rs6000/fixunskfti-sw.c
> @@ -5,7 +5,10 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> +
> +   The fixunskfti.c file renamed fixunskfti-sw.c.  Updated the
> +   source file names and made whitespace fixes
> 


^ That should be in the patch description or embedded into the
changelog.



Nothing else jumps out at me below,

Thanks
-Will


>     The GNU C Library is free software; you can redistribute it
> and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +38,7 @@
>  #include "quad-float128.h"
> 
>  UTItype
> -__fixunskfti (TFtype a)
> +__fixunskfti_sw (TFtype a)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/float128-hw.c
> b/libgcc/config/rs6000/float128-hw.c
> index 8705b53e22a..be8bd07e853 100644
> --- a/libgcc/config/rs6000/float128-hw.c
> +++ b/libgcc/config/rs6000/float128-hw.c
> @@ -86,6 +86,30 @@ __floatdikf_hw (DItype_ppc a)
>    return (TFtype) a;
>  }
> 
> +TFtype
> +__floattikf_hw (TItype_ppc a)
> +{
> +  return (TFtype) a;
> +}
> +
> +TFtype
> +__floatuntikf_hw (UTItype_ppc a)
> +{
> +  return (TFtype) a;
> +}
> +
> +TItype_ppc
> +__fixkfti_hw (TFtype a)
> +{
> +  return (TItype_ppc) a;
> +}
> +
> +UTItype_ppc
> +__fixunskfti_hw (TFtype a)
> +{
> +  return (UTItype_ppc) a;
> +}
> +
>  TFtype
>  __floatundikf_hw (UDItype_ppc a)
>  {
> diff --git a/libgcc/config/rs6000/float128-ifunc.c
> b/libgcc/config/rs6000/float128-ifunc.c
> index c2f65912a74..c221be2c864 100644
> --- a/libgcc/config/rs6000/float128-ifunc.c
> +++ b/libgcc/config/rs6000/float128-ifunc.c
> @@ -46,14 +46,9 @@
>  #endif
> 
>  #define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW :
> SW)
> +#define SW_OR_HW_ISA3_1(SW, HW) (__builtin_cpu_supports ("arch_3_1")
> ? HW : SW)
> 
>  /* Resolvers.  */
> -
> -/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti,
> __floattikf,
> -   and __floatuntikf.  There is no ISA 3.0 instruction that converts
> between
> -   128-bit integer types and 128-bit IEEE floating point, or vice
> versa.  So
> -   use the emulator functions for these conversions.  */

This would be good to turn into part of the changelog.

"Provide ifunc resolvers for 128-bit floating point conversions..."


> -
>  static __typeof__ (__addkf3_sw) *
>  __addkf3_resolve (void)
>  {
> @@ -102,6 +97,18 @@ __floatdikf_resolve (void)
>    return SW_OR_HW (__floatdikf_sw, __floatdikf_hw);
>  }
> 
> +static __typeof__ (__floattikf_sw) *
> +__floattikf_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__floattikf_sw, __floattikf_hw);
> +}
> +
> +static __typeof__ (__floatuntikf_sw) *
> +__floatuntikf_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__floatuntikf_sw, __floatuntikf_hw);
> +}
> +
>  static __typeof__ (__floatunsikf_sw) *
>  __floatunsikf_resolve (void)
>  {
> @@ -114,6 +121,19 @@ __floatundikf_resolve (void)
>    return SW_OR_HW (__floatundikf_sw, __floatundikf_hw);
>  }
> 
> +
> +static __typeof__ (__fixkfti_sw) *
> +__fixkfti_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__fixkfti_sw, __fixkfti_hw);
> +}
> +
> +static __typeof__ (__fixunskfti_sw) *
> +__fixunskfti_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__fixunskfti_sw, __fixunskfti_hw);
> +}
> +
>  static __typeof__ (__fixkfsi_sw) *
>  __fixkfsi_resolve (void)
>  {
> @@ -303,6 +323,18 @@ TFtype __floatsikf (SItype_ppc)
>  TFtype __floatdikf (DItype_ppc)
>    __attribute__ ((__ifunc__ ("__floatdikf_resolve")));
> 
> +TFtype __floattikf (TItype_ppc)
> +  __attribute__ ((__ifunc__ ("__floattikf_resolve")));
> +
> +TFtype __floatuntikf (UTItype_ppc)
> +  __attribute__ ((__ifunc__ ("__floatuntikf_resolve")));
> +
> +TItype_ppc __fixkfti (TFtype)
> +  __attribute__ ((__ifunc__ ("__fixkfti_resolve")));
> +
> +UTItype_ppc __fixunskfti (TFtype)
> +  __attribute__ ((__ifunc__ ("__fixunskfti_resolve")));
> +
>  TFtype __floatunsikf (USItype_ppc)
>    __attribute__ ((__ifunc__ ("__floatunsikf_resolve")));
> 
> diff --git a/libgcc/config/rs6000/float128-sed
> b/libgcc/config/rs6000/float128-sed
> index d9a089ff9ba..c0fcddb1959 100644
> --- a/libgcc/config/rs6000/float128-sed
> +++ b/libgcc/config/rs6000/float128-sed
> @@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi/g
>  s/__fixunstfdi/__fixunskfdi/g
>  s/__fixunstfsi/__fixunskfsi/g
>  s/__floatditf/__floatdikf/g
> +s/__floattitf/__floattikf/g
> +s/__floatuntitf/__floatuntikf/g
> +s/__fixtfti/__fixkfti/g
> +s/__fixunstfti/__fixunskfti/g
>  s/__floatsitf/__floatsikf/g
>  s/__floatunditf/__floatundikf/g
>  s/__floatunsitf/__floatunsikf/g
> diff --git a/libgcc/config/rs6000/float128-sed-hw
> b/libgcc/config/rs6000/float128-sed-hw
> index acf36b0c17d..3d2bf556da1 100644
> --- a/libgcc/config/rs6000/float128-sed-hw
> +++ b/libgcc/config/rs6000/float128-sed-hw
> @@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi_sw/g
>  s/__fixunstfdi/__fixunskfdi_sw/g
>  s/__fixunstfsi/__fixunskfsi_sw/g
>  s/__floatditf/__floatdikf_sw/g
> +s/__floattitf/__floattikf_sw/g
> +s/__floatuntitf/__floatuntikf_sw/g
> +s/__fixtfti/__fixkfti_sw/g
> +s/__fixunstfti/__fixunskfti_sw/g
>  s/__floatsitf/__floatsikf_sw/g
>  s/__floatunditf/__floatundikf_sw/g
>  s/__floatunsitf/__floatunsikf_sw/g
> diff --git a/libgcc/config/rs6000/floattikf.c
> b/libgcc/config/rs6000/floattikf-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/floattikf.c
> rename to libgcc/config/rs6000/floattikf-sw.c
> index 4e8c40cfbe4..110706352bb 100644
> --- a/libgcc/config/rs6000/floattikf.c
> +++ b/libgcc/config/rs6000/floattikf-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it
> and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TFtype
> -__floattikf (TItype i)
> +__floattikf_sw (TItype i)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/floatuntikf.c
> b/libgcc/config/rs6000/floatuntikf-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/floatuntikf.c
> rename to libgcc/config/rs6000/floatuntikf-sw.c
> index 8bfba4267d4..5e712a67e26 100644
> --- a/libgcc/config/rs6000/floatuntikf.c
> +++ b/libgcc/config/rs6000/floatuntikf-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it
> and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TFtype
> -__floatuntikf (UTItype i)
> +__floatuntikf_sw (UTItype i)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/quad-float128.h
> b/libgcc/config/rs6000/quad-float128.h
> index 32ef328a8ea..24712b9277f 100644
> --- a/libgcc/config/rs6000/quad-float128.h
> +++ b/libgcc/config/rs6000/quad-float128.h
> @@ -87,19 +87,18 @@ extern USItype_ppc __fixunskfsi_sw (TFtype);
>  extern UDItype_ppc __fixunskfdi_sw (TFtype);
>  extern TFtype __floatsikf_sw (SItype_ppc);
>  extern TFtype __floatdikf_sw (DItype_ppc);
> +extern TFtype __floattikf_sw (TItype_ppc);
>  extern TFtype __floatunsikf_sw (USItype_ppc);
>  extern TFtype __floatundikf_sw (UDItype_ppc);
> +extern TFtype __floatuntikf_sw (UTItype_ppc);
> +extern TItype_ppc __fixkfti_sw (TFtype);
> +extern UTItype_ppc __fixunskfti_sw (TFtype);
>  extern IBM128_TYPE __extendkftf2_sw (TFtype);
>  extern TFtype __trunctfkf2_sw (IBM128_TYPE);
>  extern TCtype __mulkc3_sw (TFtype, TFtype, TFtype, TFtype);
>  extern TCtype __divkc3_sw (TFtype, TFtype, TFtype, TFtype);
> 
>  #ifdef _ARCH_PPC64
> -/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti,
> __floattikf,
> -   and __floatuntikf.  There is no ISA 3.0 instruction that converts
> between
> -   128-bit integer types and 128-bit IEEE floating point, or vice
> versa.  So
> -   use the emulator functions for these conversions.  */
> -
>  extern TItype_ppc __fixkfti (TFtype);
>  extern UTItype_ppc __fixunskfti (TFtype);
>  extern TFtype __floattikf (TItype_ppc);
> @@ -130,8 +129,12 @@ extern USItype_ppc __fixunskfsi_hw (TFtype);
>  extern UDItype_ppc __fixunskfdi_hw (TFtype);
>  extern TFtype __floatsikf_hw (SItype_ppc);
>  extern TFtype __floatdikf_hw (DItype_ppc);
> +extern TFtype __floattikf_hw (TItype_ppc);
>  extern TFtype __floatunsikf_hw (USItype_ppc);
>  extern TFtype __floatundikf_hw (UDItype_ppc);
> +extern TFtype __floatuntikf_hw (UTItype_ppc);
> +extern TItype_ppc __fixkfti_hw (TFtype);
> +extern UTItype_ppc __fixunskfti_hw (TFtype);
>  extern IBM128_TYPE __extendkftf2_hw (TFtype);
>  extern TFtype __trunctfkf2_hw (IBM128_TYPE);
>  extern TCtype __mulkc3_hw (TFtype, TFtype, TFtype, TFtype);
> @@ -162,8 +165,12 @@ extern USItype_ppc __fixunskfsi (TFtype);
>  extern UDItype_ppc __fixunskfdi (TFtype);
>  extern TFtype __floatsikf (SItype_ppc);
>  extern TFtype __floatdikf (DItype_ppc);
> +extern TFtype __floattikf (TItype_ppc);
>  extern TFtype __floatunsikf (USItype_ppc);
>  extern TFtype __floatundikf (UDItype_ppc);
> +extern TFtype __floatuntikf (UTItype_ppc);
> +extern TItype_ppc __fixkfti (TFtype);
> +extern UTItype_ppc __fixunskfti (TFtype);
>  extern IBM128_TYPE __extendkftf2 (TFtype);
>  extern TFtype __trunctfkf2 (IBM128_TYPE);
> 
> diff --git a/libgcc/config/rs6000/t-float128
> b/libgcc/config/rs6000/t-float128
> index d5413445189..325b22fd49e 100644
> --- a/libgcc/config/rs6000/t-float128
> +++ b/libgcc/config/rs6000/t-float128
> @@ -23,7 +23,8 @@ fp128_softfp_shared_obj	= $(addsuffix
> -sw_s$(objext),$(fp128_softfp_funcs))
>  fp128_softfp_obj	= $(fp128_softfp_static_obj)
> $(fp128_softfp_shared_obj)
> 
>  # New functions for software emulation
> -fp128_ppc_funcs		= floattikf floatuntikf fixkfti
> fixunskfti \
> +fp128_ppc_funcs		= floattikf-sw floatuntikf-sw \
> +			  fixkfti-sw fixunskfti-sw \
>  			  extendkftf2-sw trunctfkf2-sw \
>  			  sfp-exceptions _mulkc3 _divkc3 _powikf2
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations
  2020-09-24 18:20   ` will schmidt
@ 2020-10-05 18:51     ` Carl Love
  2020-10-07 21:08       ` will schmidt
  0 siblings, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-10-05 18:51 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will, Segher:

Patch 1, adds the 128-bit sign extension instruction support and
corresponding builtin support.

I updated the change log per the comments from Will.

Patch has been retested on Power 9 LE.

Pet me know if it is ready to commit to mainline.

                     Carl 

-----------------------------------------------------------


gcc/ChangeLog

2020-10-05  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
	for new builtins.
	* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
	overloaded builtin definitions.
	(VSIGNEXTSB2W, VSIGNEXTSH2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D):
	Add builtin expansions.
	* config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
	P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
	* config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
	visible.
	* doc/extend.texi:  Add documentation for the vec_signexti and
	vec_signextll builtins.

gcc/testsuite/ChangeLog

2020-10-05  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
---
 gcc/config/rs6000/altivec.h                   |   3 +
 gcc/config/rs6000/rs6000-builtin.def          |   9 ++
 gcc/config/rs6000/rs6000-call.c               |  13 ++
 gcc/config/rs6000/vsx.md                      |   2 +-
 gcc/doc/extend.texi                           |  15 ++
 .../powerpc/p9-sign_extend-runnable.c         | 128 ++++++++++++++++++
 6 files changed, 169 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index f7720d136c9..cfa5eda4cd5 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -494,6 +494,9 @@
 
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
+#define vec_signexti  __builtin_vec_vsignexti
+#define vec_signextll __builtin_vec_vsignextll
+
 #endif
 
 /* Predicates.
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index e91a48ddf5f..4c2e9460949 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2715,6 +2715,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,	"vprtybd")
 BU_P9V_OVERLOAD_1 (VPRTYBQ,	"vprtybq")
 BU_P9V_OVERLOAD_1 (VPRTYBW,	"vprtybw")
 BU_P9V_OVERLOAD_1 (VPARITY_LSBB,	"vparity_lsbb")
+BU_P9V_OVERLOAD_1 (VSIGNEXTI,	"vsignexti")
+BU_P9V_OVERLOAD_1 (VSIGNEXTLL,	"vsignextll")
 
 /* 2 argument functions added in ISA 3.0 (power9).  */
 BU_P9_2 (CMPRB,	"byte_in_range",	CONST,	cmprb)
@@ -2726,6 +2728,13 @@ BU_P9_OVERLOAD_2 (CMPRB,	"byte_in_range")
 BU_P9_OVERLOAD_2 (CMPRB2,	"byte_in_either_range")
 BU_P9_OVERLOAD_2 (CMPEQB,	"byte_in_set")
 \f
+/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1.  */
+BU_P9V_AV_1 (VSIGNEXTSB2W,	"vsignextsb2w",		CONST,  vsx_sign_extend_qi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSH2W,	"vsignextsh2w",		CONST,  vsx_sign_extend_hi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSB2D,	"vsignextsb2d",		CONST,  vsx_sign_extend_qi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
+
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
 BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index a8b520834c7..9e514a01012 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5527,6 +5527,19 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI },
 
+  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 */
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+
   /* Overloaded built-in functions for ISA3.1 (power10). */
   { P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 4ff52455fd3..31fcffe8f33 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4787,7 +4787,7 @@
   "vextsh2<wd> %0,%1"
   [(set_attr "type" "vecexts")])
 
-(define_insn "*vsx_sign_extend_si_v2di"
+(define_insn "vsx_sign_extend_si_v2di"
   [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
 	(unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
 		     UNSPEC_VSX_SIGN_EXTEND))]
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5571c4f2ff2..c1c2c9f9bf7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20800,6 +20800,21 @@ void vec_xst (vector unsigned char, int, vector unsigned char *);
 void vec_xst (vector unsigned char, int, unsigned char *);
 @end smallexample
 
+uThe following sign extension builtins are provided.
+
+@smallexample
+vector signed int vec_signexti (vector signed char a)
+vector signed long long vec_signextll (vector signed char a)
+vector signed int vec_signexti (vector signed short a)
+vector signed long long vec_signextll (vector signed short a)
+vector signed long long vec_signextll (vector signed int a)
+@end smallexample
+
+Each element of the result is produced by sign-extending the element of the
+input vector that would fall in the least significant portion of the result
+element. For example, a sign-extension of a vector signed char to a vector
+signed long long will sign extend the rightmost byte of each doubleword.
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 3.1
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1
 
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
new file mode 100644
index 00000000000..7bf979c6fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
@@ -0,0 +1,128 @@
+/* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
+
+/* These builtins were not defined until ISA 3.1 but only require ISA 3.0
+   support.  */
+
+/* { dg-final { scan-assembler-times {\mvextsb2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsw2d\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector signed char vec_arg_qi, vec_result_qi;
+  vector signed short int vec_arg_hi, vec_result_hi, vec_expected_hi;
+  vector signed int vec_arg_wi, vec_result_wi, vec_expected_wi;
+  vector signed long long vec_result_di, vec_expected_di;
+
+  /* test sign extend byte to word */
+  vec_arg_qi = (vector signed char) {1, 2, 3, 4, 5, 6, 7, 8,
+				     -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_wi = (vector signed int) {1, 5, -1, -5};
+
+  vec_result_wi = vec_signexti (vec_arg_qi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(char, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n",
+	     i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend byte to double */
+  vec_arg_qi = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
+				    -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_qi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(byte, long long int):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to word */
+  vec_arg_hi = (vector signed short int){1, 2, 3, 4, -1, -2, -3, -4};
+  vec_expected_wi = (vector signed int){1, 3, -1, -3};
+
+  vec_result_wi = vec_signexti(vec_arg_hi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(short, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n", i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to double word */
+  vec_arg_hi = (vector signed short int ){1, 3, 5, 7,  -1, -3, -5, -7};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_hi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(short, double):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend word to double word */
+  vec_arg_wi = (vector signed int ){1, 3, -1, -3};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_wi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(word, double):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  return 0;
+}
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2a/5] rs6000, vec_rlnm builtin fix arguments
  2020-09-24 18:21   ` will schmidt
@ 2020-10-05 18:52     ` Carl Love
  2020-10-07 21:08       ` will schmidt
  2020-10-05 18:52     ` [PATCH 2b/5] RS6000 add 128-bit Integer Operations Carl Love
  1 sibling, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-10-05 18:52 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will, Segher:



The following changes were made from the previous version:

Per Will's comments, I split the bug fix from patch 2 into a separate
patch.  This patch is the bug fix for the vec_rlnm builtin.

Regression tests reran on Power 9 LE with no regression errors.

Please let me know if it looks OK to commit to mainline.

                  Carl 

----------------------------------------------------------


gcc/ChangeLog

2020-10-05  Carl Love  <cel@us.ibm.com>

	* config/rs6000/altivec.h (vec_rlnm): Fix bug in argument generation.
---
 gcc/config/rs6000/altivec.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 8a2dcda0144..f7720d136c9 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -183,7 +183,7 @@
 #define vec_recipdiv __builtin_vec_recipdiv
 #define vec_rlmi __builtin_vec_rlmi
 #define vec_vrlnm __builtin_vec_rlnm
-#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
+#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
 #define vec_rsqrt __builtin_vec_rsqrt
 #define vec_rsqrte __builtin_vec_rsqrte
 #define vec_signed __builtin_vec_vsigned
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2b/5] RS6000 add 128-bit Integer Operations
  2020-09-24 18:21   ` will schmidt
  2020-10-05 18:52     ` [PATCH 2a/5] rs6000, vec_rlnm builtin fix arguments Carl Love
@ 2020-10-05 18:52     ` Carl Love
  2020-10-07 21:53       ` will schmidt
  1 sibling, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-10-05 18:52 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will and Segher:

This is the rest of the second patch which adds the 128-bit integer
support for divide, modulo, shift, compare of 128-bit
integers instructions and builtin support.

In the last round of changes, the flag for the 128-bit operations was
removed.  Per Will's comments, the  BU_P10_128BIT_* builtin definitions
can be removed.  Instead we can just use P10V_BUILTIN. Similarly for
the BU_P10_P builtin definition.  The commit log was updated to reflect
the change.  There were a few change log entries for the 128-bit
operations flag that needed removing.  As well as other fixes noted by
Will.

The changes are all name changes not functional changes.  

No regression failures were found when run on a P9.

Please let me know if this is ready for mainline.  

                           Carl

--------------------------------------------------------

gcc/ChangeLog

	2020-10/05  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
	for new builtins.
	* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
	UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
	(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
	altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
	altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
	altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
	altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
	define_insn.
	(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
	vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
	altivec_vrlqnm): New define_expands.
	* config/rs6000/rs6000-builtin.def (VCMPEQUT_P, VCMPGTST_P,
	VCMPGTUT_P): Add macro expansions.
	(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
	CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
	VCMPAET_P, VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ,
	VSLQ, VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
	MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
	(VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
	P10V_BUILTIN_CMPGE_1TI, P10V_BUILTIN_CMPGE_U1TI,
	P10V_BUILTIN_VCMPGTUT, P10V_BUILTIN_VCMPGTST,
	P10V_BUILTIN_CMPLE_1TI, P10V_BUILTIN_VCMPLE_U1TI,
	P10V_BUILTIN_128BIT_DIV_V1TI, P10V_BUILTIN_128BIT_UDIV_V1TI,
	P10V_BUILTIN_128BIT_VMULESD, P10V_BUILTIN_128BIT_VMULEUD,
	P10V_BUILTIN_128BIT_VMULOSD, P10V_BUILTIN_128BIT_VMULOUD,
	P10V_BUILTIN_VNOR_V1TI, P10V_BUILTIN_VNOR_V1TI_UNS,
	P10V_BUILTIN_128BIT_VRLQ, P10V_BUILTIN_128BIT_VRLQMI,
	P10V_BUILTIN_128BIT_VRLQNM, P10V_BUILTIN_128BIT_VSLQ,
	P10V_BUILTIN_128BIT_VSRQ, P10V_BUILTIN_128BIT_VSRAQ,
	P10V_BUILTIN_VCMPGTUT_P, P10V_BUILTIN_VCMPGTST_P,
	P10V_BUILTIN_VCMPEQUT_P, P10V_BUILTIN_VCMPGTUT_P,
	P10V_BUILTIN_VCMPGTST_P, P10V_BUILTIN_CMPNET,
	P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P,
	P10V_BUILTIN_128BIT_VSIGNEXTSD2Q, P10V_BUILTIN_128BIT_DIVES_V1TI,
	P10V_BUILTIN_128BIT_MODS_V1TI, P10V_BUILTIN_128BIT_MODU_V1TI):
	New overloaded definitions.
	(rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT,
	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
	P10_BUILTIN_CMPLE_U1TI]: New case statements.
	(rs6000_init_builtins) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
	New assignments.
	(altivec_init_builtins): New E_V1TImode case statement.
	(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
	P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
	P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.
	* config/rs6000/r6000.c (rs6000_handle_altivec_attribute)[E_TImode,
	E_V1TImode]: New case statements.
	* config/rs6000/r6000.h (RS6000_BTM_TI_VECTOR_OPS): New defines.
	(rs6000_builtin_type_index): New enum value RS6000_BTI_bool_V1TI.
	* config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti,
	vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti,
	vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
	vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
	vlshrv1ti3, vashrv1ti3): New define_expands.
	* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
	UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
	UNSPEC_VSX_MODUQ, UNSPEC_XXSWAPD_V1TI): New unspecs.
	(vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti, vsx_diveu_v1ti,
	vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti, vsx_sign_extend_v2di_v1ti):
	New define_insns.
	(vcmpnet): New define_expand.
	* gcc/doc/extend.texi: Add documentation for the new builtins vec_rl,
	vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
	vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,
	vec_cmpge, vec_cmple, vec_all_eq, vec_all_ne, vec_all_gt, vec_all_lt,
	vec_all_ge, vec_all_le, vec_any_eq, vec_any_ne, vec_any_gt, vec_any_lt,
	vec_any_ge, vec_any_le.

gcc/testsuite/ChangeLog

2020-10-05 Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |    4 +
 gcc/config/rs6000/altivec.md                  |  240 ++
 gcc/config/rs6000/rs6000-builtin.def          |   46 +-
 gcc/config/rs6000/rs6000-call.c               |  138 +-
 gcc/config/rs6000/rs6000.c                    |    1 +
 gcc/config/rs6000/rs6000.h                    |    4 +-
 gcc/config/rs6000/vector.md                   |  191 ++
 gcc/config/rs6000/vsx.md                      |   97 +
 gcc/doc/extend.texi                           |  174 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 2254 +++++++++++++++++
 10 files changed, 3145 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index cfa5eda4cd5..fc67073f79c 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -690,6 +690,10 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
 
 #ifdef _ARCH_PWR10
+#define vec_signextq  __builtin_vec_vsignextq
+#define vec_dive __builtin_vec_dive
+#define vec_mod  __builtin_vec_mod
+
 /* May modify these macro definitions if future capabilities overload
    with support for different vector argument and result types.  */
 #define vec_cntlzm(a, b)	__builtin_altivec_vclzdm (a, b)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 0a2e634d6b0..34a4731342a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -39,12 +39,16 @@
    UNSPEC_VMULESH
    UNSPEC_VMULEUW
    UNSPEC_VMULESW
+   UNSPEC_VMULEUD
+   UNSPEC_VMULESD
    UNSPEC_VMULOUB
    UNSPEC_VMULOSB
    UNSPEC_VMULOUH
    UNSPEC_VMULOSH
    UNSPEC_VMULOUW
    UNSPEC_VMULOSW
+   UNSPEC_VMULOUD
+   UNSPEC_VMULOSD
    UNSPEC_VPKPX
    UNSPEC_VPACK_SIGN_SIGN_SAT
    UNSPEC_VPACK_SIGN_UNS_SAT
@@ -628,6 +632,14 @@
   "vcmpequ<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_eqv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpequq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gt<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gt:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -636,6 +648,14 @@
   "vcmpgts<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gt:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpgtsq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gtu<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gtu:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -644,6 +664,14 @@
   "vcmpgtu<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		  (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpgtuq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_eqv4sf"
   [(set (match_operand:V4SF 0 "altivec_register_operand" "=v")
 	(eq:V4SF (match_operand:V4SF 1 "altivec_register_operand" "v")
@@ -1687,6 +1715,19 @@
  DONE;
 })
 
+(define_expand "vec_widen_umult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
 (define_expand "vec_widen_smult_even_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1700,6 +1741,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+ else
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_umult_odd_v16qi"
   [(use (match_operand:V8HI 0 "register_operand"))
    (use (match_operand:V16QI 1 "register_operand"))
@@ -1765,6 +1819,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_umult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_smult_odd_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1778,6 +1845,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "altivec_vmuleub"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
@@ -1859,6 +1939,15 @@
   "vmuleuw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuleud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULEUD))]
+  "TARGET_POWER10"
+  "vmuleud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulouw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1868,6 +1957,15 @@
   "vmulouw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuloud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOUD))]
+  "TARGET_POWER10"
+  "vmuloud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulesw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1877,6 +1975,15 @@
   "vmulesw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulesd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULESD))]
+  "TARGET_POWER10"
+  "vmulesd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulosw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1886,6 +1993,15 @@
   "vmulosw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulosd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOSD))]
+  "TARGET_POWER10"
+  "vmulosd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 ;; Vector pack/unpack
 (define_insn "altivec_vpkpx"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
@@ -1979,6 +2095,15 @@
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vrlq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+;; rotate amount in needs to be in bits[57:63] of operand2.
+  "vrlq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
@@ -1989,6 +2114,33 @@
   "vrl<VI_char>mi %0,%2,%3"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqmi"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")
+		      (match_operand:V1TI 3 "vsx_register_operand")]
+		     UNSPEC_VRLMI))]
+  "TARGET_POWER10"
+{
+  /* Mask bit begin, end fields need to be in bits [41:55] of 128-bit operand2. */
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[3]));
+  emit_insn(gen_altivec_vrlqmi_inst (operands[0], operands[1], operands[2], tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqmi_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "0")
+		      (match_operand:V1TI 3 "vsx_register_operand" "v")]
+		     UNSPEC_VRLMI))]
+  "TARGET_POWER10"
+  "vrlqmi %0,%1,%3"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vrl<VI_char>nm"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
@@ -1998,6 +2150,31 @@
   "vrl<VI_char>nm %0,%1,%2"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqnm"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")]
+		     UNSPEC_VRLNM))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_altivec_vrlqnm_inst (operands[0], operands[1], tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqnm_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+		     UNSPEC_VRLNM))]
+  "TARGET_POWER10"
+  ;; rotate and mask bits need to be in upper 64-bits of operand2.
+  "vrlqnm %0,%1,%2"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vsl"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2042,6 +2219,15 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vslq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vslq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsr<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (lshiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2050,6 +2236,15 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsrq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vsrq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsra<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (ashiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2058,6 +2253,15 @@
   "vsra<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsraq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vsraq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vsr"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2618,6 +2822,18 @@
   "vcmpequ<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_vcmpequt_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v")
+			   (match_operand:V1TI 2 "altivec_register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpequq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgts<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2630,6 +2846,18 @@
   "vcmpgts<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtst_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gt:CC (match_operand:V1TI 1 "register_operand" "v")
+			   (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gt:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpgtsq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgtu<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gtu:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2642,6 +2870,18 @@
   "vcmpgtu<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtut_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gtu:CC (match_operand:V1TI 1 "register_operand" "v")
+			    (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gtu:V1TI (match_dup 1)
+		  (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpgtuq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpeqfp_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(eq:CC (match_operand:V4SF 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 4c2e9460949..14345f9ea9d 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1178,7 +1178,6 @@
 		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
 		     | RS6000_BTC_TERNARY),				\
 		    CODE_FOR_ ## ICODE)			/* ICODE */
-
 
 /* Insure 0 is not a legitimate index.  */
 BU_SPECIAL_X (RS6000_BUILTIN_NONE, NULL, 0, RS6000_BTC_MISC)
@@ -2736,6 +2735,10 @@ BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
 BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
 
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
+BU_P10V_AV_2 (VCMPEQUT_P,	"vcmpequt_p",	CONST,	vector_eq_v1ti_p)
+BU_P10V_AV_2 (VCMPGTST_P,	"vcmpgtst_p",	CONST,	vector_gt_v1ti_p)
+BU_P10V_AV_2 (VCMPGTUT_P,	"vcmpgtut_p",	CONST,	vector_gtu_v1ti_p)
+
 BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
 BU_P10_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
@@ -2756,7 +2759,38 @@ BU_P10V_VSX_2 (XXGENPCVM_V16QI, "xxgenpcvm_v16qi", CONST, xxgenpcvm_v16qi)
 BU_P10V_VSX_2 (XXGENPCVM_V8HI, "xxgenpcvm_v8hi", CONST, xxgenpcvm_v8hi)
 BU_P10V_VSX_2 (XXGENPCVM_V4SI, "xxgenpcvm_v4si", CONST, xxgenpcvm_v4si)
 BU_P10V_VSX_2 (XXGENPCVM_V2DI, "xxgenpcvm_v2di", CONST, xxgenpcvm_v2di)
-
+BU_P10V_AV_2 (VCMPGTUT,		"vcmpgtut",	CONST,	vector_gtuv1ti)
+BU_P10V_AV_2 (VCMPGTST,		"vcmpgtst",	CONST,	vector_gtv1ti)
+BU_P10V_AV_2 (VCMPEQUT,		"vcmpequt",	CONST,	eqvv1ti3)
+BU_P10V_AV_2 (CMPNET,		"vcmpnet",	CONST,	vcmpnet)
+BU_P10V_AV_2 (CMPGE_1TI,	"cmpge_1ti",    CONST,  vector_nltv1ti)
+BU_P10V_AV_2 (CMPGE_U1TI,	"cmpge_u1ti",   CONST,  vector_nltuv1ti)
+BU_P10V_AV_2 (CMPLE_1TI,	"cmple_1ti",    CONST,  vector_ngtv1ti)
+BU_P10V_AV_2 (CMPLE_U1TI,	"cmple_u1ti",   CONST,  vector_ngtuv1ti)
+BU_P10V_AV_2 (VNOR_V1TI_UNS,	"vnor_v1ti_uns",CONST,	norv1ti3)
+BU_P10V_AV_2 (VNOR_V1TI,	"vnor_v1ti",	CONST,	norv1ti3)
+BU_P10V_AV_2 (VCMPNET_P,	"vcmpnet_p",	CONST,	vector_ne_v1ti_p)
+BU_P10V_AV_2 (VCMPAET_P,	"vcmpaet_p",	CONST,	vector_ae_v1ti_p)
+
+BU_P10V_AV_1 (VSIGNEXTSD2Q,	"vsignext",     CONST,  vsx_sign_extend_v2di_v1ti)
+
+BU_P10V_AV_2 (VMULEUD,	"vmuleud",	CONST,	vec_widen_umult_even_v2di)
+BU_P10V_AV_2 (VMULESD,	"vmulesd",	CONST,	vec_widen_smult_even_v2di)
+BU_P10V_AV_2 (VMULOUD,	"vmuloud",	CONST,	vec_widen_umult_odd_v2di)
+BU_P10V_AV_2 (VMULOSD,	"vmulosd",	CONST,	vec_widen_smult_odd_v2di)
+BU_P10V_AV_2 (VRLQ,		"vrlq",		CONST,	vrotlv1ti3)
+BU_P10V_AV_2 (VSLQ,		"vslq",		CONST,	vashlv1ti3)
+BU_P10V_AV_2 (VSRQ,		"vsrq",		CONST,	vlshrv1ti3)
+BU_P10V_AV_2 (VSRAQ,		"vsraq",	CONST,	vashrv1ti3)
+BU_P10V_AV_2 (VRLQNM,	"vrlqnm",	CONST,	altivec_vrlqnm)
+BU_P10V_AV_2 (DIV_V1TI,	"div_1ti",      CONST,  vsx_div_v1ti)
+BU_P10V_AV_2 (UDIV_V1TI,	"udiv_1ti",     CONST,  vsx_udiv_v1ti)
+BU_P10V_AV_2 (DIVES_V1TI,	"dives",	CONST,	vsx_dives_v1ti)
+BU_P10V_AV_2 (DIVEU_V1TI,	"diveu",	CONST,	vsx_diveu_v1ti)
+BU_P10V_AV_2 (MODS_V1TI,	"mods",		CONST,	vsx_mods_v1ti)
+BU_P10V_AV_2 (MODU_V1TI,	"modu",		CONST,	vsx_modu_v1ti)
+
+BU_P10V_AV_3 (VRLQMI,	"vrlqmi",	CONST,	altivec_vrlqmi)
 BU_P10V_AV_3 (VEXTRACTBL, "vextdubvlx", CONST, vextractlv16qi)
 BU_P10V_AV_3 (VEXTRACTHL, "vextduhvlx", CONST, vextractlv8hi)
 BU_P10V_AV_3 (VEXTRACTWL, "vextduwvlx", CONST, vextractlv4si)
@@ -2863,6 +2897,12 @@ BU_P10_OVERLOAD_2 (CLRR, "clrr")
 BU_P10_OVERLOAD_2 (GNB, "gnb")
 BU_P10_OVERLOAD_4 (XXEVAL, "xxeval")
 BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
+BU_P10_OVERLOAD_2 (VRLQ, "vrlq")
+BU_P10_OVERLOAD_2 (VSLQ, "vslq")
+BU_P10_OVERLOAD_2 (VSRQ, "vsrq")
+BU_P10_OVERLOAD_2 (VSRAQ, "vsraq")
+BU_P10_OVERLOAD_2 (DIVE, "dive")
+BU_P10_OVERLOAD_2 (MOD, "mod")
 
 BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
 BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
@@ -2882,6 +2922,8 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
 BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
 BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
 
+BU_P10_OVERLOAD_1 (SIGNEXT, "vsignextq")
+
 
 BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
 BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 9e514a01012..87fff5c1c80 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -839,6 +839,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, VSX_BUILTIN_XVCMPEQDP,
@@ -885,6 +889,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -899,8 +909,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTSD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTST,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, VSX_BUILTIN_XVCMPGTDP,
@@ -943,6 +957,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_CMPLE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -995,6 +1014,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIV_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_UDIV_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
@@ -1789,6 +1814,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULE, P8V_BUILTIN_VMULEUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10V_BUILTIN_VMULESD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10V_BUILTIN_VMULEUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULEUB, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULESB, ALTIVEC_BUILTIN_VMULESB,
@@ -1812,6 +1842,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULO, P8V_BUILTIN_VMULOUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10V_BUILTIN_VMULOSD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10V_BUILTIN_VMULOUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULO, ALTIVEC_BUILTIN_VMULOSH,
     RS6000_BTI_V4SI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULOSH, ALTIVEC_BUILTIN_VMULOSH,
@@ -1854,6 +1889,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
@@ -2115,6 +2160,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_RL, P8V_BUILTIN_VRLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10V_BUILTIN_VRLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10V_BUILTIN_VRLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
@@ -2133,12 +2183,23 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_RLMI, P9V_BUILTIN_VRLDMI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { P9V_BUILTIN_VEC_RLMI, P10V_BUILTIN_VRLQMI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI },
+  { P9V_BUILTIN_VEC_RLMI, P10V_BUILTIN_VRLQMI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLWNM,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLDNM,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10V_BUILTIN_VRLQNM,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10V_BUILTIN_VRLQNM,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
@@ -2155,6 +2216,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, P8V_BUILTIN_VSLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10V_BUILTIN_VSLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10V_BUILTIN_VSLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTSP,
@@ -2351,6 +2417,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SR, P8V_BUILTIN_VSRD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10V_BUILTIN_VSRQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10V_BUILTIN_VSRQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
@@ -2379,6 +2450,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SRA, P8V_BUILTIN_VSRAD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10V_BUILTIN_VSRAQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10V_BUILTIN_VSRAQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
@@ -3996,12 +4072,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10V_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10V_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, ALTIVEC_BUILTIN_VCMPGTFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, VSX_BUILTIN_XVCMPGTDP_P,
@@ -4066,6 +4146,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10V_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10V_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, ALTIVEC_BUILTIN_VCMPEQFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, VSX_BUILTIN_XVCMPEQDP_P,
@@ -4117,12 +4201,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10V_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10V_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, ALTIVEC_BUILTIN_VCMPGEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
@@ -4771,6 +4859,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEW,
     RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
 
   /* The following 2 entries have been deprecated.  */
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEB_P,
@@ -4871,6 +4965,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
 
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
@@ -4976,7 +5072,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
-
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
@@ -5903,6 +6000,21 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
  { P10_BUILTIN_VEC_XVTLSBB_ONES, P10V_BUILTIN_XVTLSBB_ONES,
     RS6000_BTI_INTSI, RS6000_BTI_unsigned_V16QI, 0, 0 },
 
+ { P10_BUILTIN_VEC_SIGNEXT, P10V_BUILTIN_VSIGNEXTSD2Q,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, 0, 0 },
+
+ { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
   { RS6000_BUILTIN_NONE, RS6000_BUILTIN_NONE, 0, 0, 0, 0 }
 };
 
@@ -12256,12 +12368,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPEQUH:
     case ALTIVEC_BUILTIN_VCMPEQUW:
     case P8V_BUILTIN_VCMPEQUD:
+    case P10V_BUILTIN_VCMPEQUT:
       fold_compare_helper (gsi, EQ_EXPR, stmt);
       return true;
 
     case P9V_BUILTIN_CMPNEB:
     case P9V_BUILTIN_CMPNEH:
     case P9V_BUILTIN_CMPNEW:
+    case P10V_BUILTIN_CMPNET:
       fold_compare_helper (gsi, NE_EXPR, stmt);
       return true;
 
@@ -12273,6 +12387,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_2DI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10V_BUILTIN_CMPGE_1TI:
+    case P10V_BUILTIN_CMPGE_U1TI:
       fold_compare_helper (gsi, GE_EXPR, stmt);
       return true;
 
@@ -12284,6 +12400,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
     case P8V_BUILTIN_VCMPGTSD:
+    case P10V_BUILTIN_VCMPGTUT:
+    case P10V_BUILTIN_VCMPGTST:
       fold_compare_helper (gsi, GT_EXPR, stmt);
       return true;
 
@@ -12295,6 +12413,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPLE_U4SI:
     case VSX_BUILTIN_CMPLE_2DI:
     case VSX_BUILTIN_CMPLE_U2DI:
+    case P10V_BUILTIN_CMPLE_1TI:
+    case P10V_BUILTIN_CMPLE_U1TI:
       fold_compare_helper (gsi, LE_EXPR, stmt);
       return true;
 
@@ -13000,6 +13120,8 @@ rs6000_init_builtins (void)
 					    ? "__vector __bool long"
 					    : "__vector __bool long long",
 					    bool_long_long_type_node, 2);
+  bool_V1TI_type_node = rs6000_vector_type ("__vector __bool __int128",
+					    intTI_type_node, 1);
   pixel_V8HI_type_node = rs6000_vector_type ("__vector __pixel",
 					     pixel_type_node, 8);
 
@@ -13185,6 +13307,10 @@ altivec_init_builtins (void)
     = build_function_type_list (integer_type_node,
 				integer_type_node, V2DI_type_node,
 				V2DI_type_node, NULL_TREE);
+  tree int_ftype_int_v1ti_v1ti
+    = build_function_type_list (integer_type_node,
+				integer_type_node, V1TI_type_node,
+				V1TI_type_node, NULL_TREE);
   tree void_ftype_v4si
     = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
   tree v8hi_ftype_void
@@ -13537,6 +13663,9 @@ altivec_init_builtins (void)
 	case E_VOIDmode:
 	  type = int_ftype_int_opaque_opaque;
 	  break;
+	case E_V1TImode:
+	  type = int_ftype_int_v1ti_v1ti;
+	  break;
 	case E_V2DImode:
 	  type = int_ftype_int_v2di_v2di;
 	  break;
@@ -14136,6 +14265,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10V_BUILTIN_XXGENPCVM_V8HI:
     case P10V_BUILTIN_XXGENPCVM_V4SI:
     case P10V_BUILTIN_XXGENPCVM_V2DI:
+    case P10V_BUILTIN_VMULEUD:
+    case P10V_BUILTIN_VMULOUD:
+    case P10V_BUILTIN_DIVEU_V1TI:
+    case P10V_BUILTIN_MODU_V1TI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
@@ -14235,10 +14368,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case VSX_BUILTIN_CMPGE_U8HI:
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10V_BUILTIN_CMPGE_U1TI:
     case ALTIVEC_BUILTIN_VCMPGTUB:
     case ALTIVEC_BUILTIN_VCMPGTUH:
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
+    case P10V_BUILTIN_VCMPGTUT:
+    case P10V_BUILTIN_VCMPEQUT:
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
       break;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6f204ca202a..cc629f2d938 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -19546,6 +19546,7 @@ rs6000_handle_altivec_attribute (tree *node,
     case 'b':
       switch (mode)
 	{
+	case E_TImode: case E_V1TImode: result = bool_V1TI_type_node; break;
 	case E_DImode: case E_V2DImode: result = bool_V2DI_type_node; break;
 	case E_SImode: case E_V4SImode: result = bool_V4SI_type_node; break;
 	case E_HImode: case E_V8HImode: result = bool_V8HI_type_node; break;
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index bbd8060e143..32ed95cc813 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2322,7 +2322,7 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_FLOAT128_HW	MASK_FLOAT128_HW /* IEEE 128-bit float h/w.  */
 #define RS6000_BTM_MMA		MASK_MMA	/* ISA 3.1 MMA.  */
 #define RS6000_BTM_P10		MASK_POWER10
-
+#define RS6000_BTM_TI_VECTOR_OPS MASK_TI_VECTOR_OPS /* 128-bit integer support */
 
 #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
 				 | RS6000_BTM_VSX			\
@@ -2436,6 +2436,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_bool_V8HI,          /* __vector __bool short */
   RS6000_BTI_bool_V4SI,          /* __vector __bool int */
   RS6000_BTI_bool_V2DI,          /* __vector __bool long */
+  RS6000_BTI_bool_V1TI,          /* __vector __bool 128-bit */
   RS6000_BTI_pixel_V8HI,         /* __vector __pixel */
   RS6000_BTI_long,	         /* long_integer_type_node */
   RS6000_BTI_unsigned_long,      /* long_unsigned_type_node */
@@ -2489,6 +2490,7 @@ enum rs6000_builtin_type_index
 #define bool_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V8HI])
 #define bool_V4SI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V4SI])
 #define bool_V2DI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V2DI])
+#define bool_V1TI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V1TI])
 #define pixel_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_pixel_V8HI])
 
 #define long_long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long_long])
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 796345c80d3..0cca4232619 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -685,6 +685,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtv1ti"
+  [(set (match_operand:V1TI 0 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))]
+  "TARGET_POWER10"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nlt<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -697,6 +704,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		 (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_gtu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -704,6 +722,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		  (match_operand:V1TI 2 "altivec_register_operand")))]
+  "TARGET_POWER10"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nltu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -716,6 +741,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		  (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+	(not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_geu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(geu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -735,6 +771,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_ngtu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vlogical_operand")
@@ -746,6 +793,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		  (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 ; There are 14 possible vector FP comparison operators, gt and eq of them have
 ; been expanded above, so just support 12 remaining operators here.
 
@@ -894,6 +952,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_eq_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 ;; This expansion handles the V16QI, V8HI, and V4SI modes in the
 ;; implementation of the vec_all_ne built-in functions on Power9.
 (define_expand "vector_ne_<mode>_p"
@@ -976,6 +1046,23 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+(define_expand "vector_ne_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V2DI mode in the implementation of the
 ;; vec_any_eq built-in function on Power9.
 ;;
@@ -1002,6 +1089,26 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+(define_expand "vector_ae_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))
+   (set (match_dup 0)
+	(xor:SI (match_dup 0)
+		(const_int 1)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V4SF and V2DF modes in the Power9
 ;; implementation of the vec_all_ne built-in functions.  Note that the
 ;; expansions for this pattern with these modes makes no use of power9-
@@ -1061,6 +1168,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gt_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gt:CC (match_operand:V1TI 1 "vlogical_operand")
+			     (match_operand:V1TI 2 "vlogical_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (gt:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 (define_expand "vector_ge_<mode>_p"
   [(parallel
     [(set (reg:CC CR6_REGNO)
@@ -1085,6 +1204,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtu_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gtu:CC (match_operand:V1TI 1 "altivec_register_operand")
+			      (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "altivec_register_operand")
+	  (gtu:V1TI (match_dup 1)
+		    (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 ;; AltiVec/VSX predicates.
 
 ;; This expansion is triggered during expansion of predicate built-in
@@ -1460,6 +1591,20 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vrotlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_altivec_vrlq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for rotatert to make use of vrotl
 (define_expand "vrotr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1481,6 +1626,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vashlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for logical shift right on each vector element
 (define_expand "vlshr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1489,6 +1649,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vlshrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_altivec_vsrq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for arithmetic shift right on each vector element
 (define_expand "vashr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1496,6 +1671,22 @@
 			(match_operand:VEC_I 2 "vint_operand")))]
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
+
+;; No immediate version of this 128-bit instruction
+(define_expand "vashrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_altivec_vsraq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 
 ;; Vector reduction expanders for VSX
 ; The (VEC_reduc:...
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 31fcffe8f33..5b6a0bd728a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -298,6 +298,12 @@
    UNSPEC_VSX_XXSPLTD
    UNSPEC_VSX_DIVSD
    UNSPEC_VSX_DIVUD
+   UNSPEC_VSX_DIVSQ
+   UNSPEC_VSX_DIVUQ
+   UNSPEC_VSX_DIVESQ
+   UNSPEC_VSX_DIVEUQ
+   UNSPEC_VSX_MODSQ
+   UNSPEC_VSX_MODUQ
    UNSPEC_VSX_MULSD
    UNSPEC_VSX_SIGN_EXTEND
    UNSPEC_VSX_XVCVBF16SPN
@@ -1732,6 +1738,60 @@
 }
   [(set_attr "type" "div")])
 
+(define_insn "vsx_div_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVSQ))]
+  "TARGET_POWER10"
+  "vdivsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_udiv_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVUQ))]
+  "TARGET_POWER10"
+  "vdivuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_dives_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVESQ))]
+  "TARGET_POWER10"
+  "vdivesq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_diveu_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVEUQ))]
+  "TARGET_POWER10"
+  "vdiveuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_mods_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODSQ))]
+  "TARGET_POWER10"
+  "vmodsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_modu_v1ti"
+ [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODUQ))]
+  "TARGET_POWER10"
+  "vmoduq %0,%1,%2"
+  [(set_attr "type" "div")])
+
 ;; *tdiv* instruction returning the FG flag
 (define_expand "vsx_tdiv<mode>3_fg"
   [(set (match_dup 3)
@@ -3083,6 +3143,21 @@
   "xxpermdi %x0,%x1,%x1,2"
   [(set_attr "type" "vecperm")])
 
+;; Swap upper/lower 64-bit values in a 128-bit vector
+(define_insn "xxswapd_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(subreg:V1TI
+           (vec_select:V2DI
+             (subreg:V2DI
+                (match_operand:V1TI 1 "vsx_register_operand" "v") 0 )
+	     (parallel [(const_int 1)(const_int 0)]))
+           0))]
+  "TARGET_POWER10"
+;; AIX does not support extended mnemonic xxswapd.  Use the basic
+;; mnemonic xxpermdi instead.
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "xxgenpcvm_<mode>_internal"
   [(set (match_operand:VSX_EXTRACT_I4 0 "altivec_register_operand" "=wa")
 	(unspec:VSX_EXTRACT_I4
@@ -4767,6 +4842,15 @@
    (set_attr "type" "vecload")])
 
 
+;; ISA 3.1 vector extend sign support
+(define_insn "vsx_sign_extend_v2di_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
+		     UNSPEC_VSX_SIGN_EXTEND))]
+  "TARGET_POWER10"
+  "vextsd2q %0,%1"
+  [(set_attr "type" "vecexts")])
+
 ;; ISA 3.0 vector extend sign support
 
 (define_insn "vsx_sign_extend_qi_<mode>"
@@ -5451,6 +5535,19 @@
   "vcmpneb %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Vector Compare Not Equal v1ti (specified/not+eq:)
+(define_expand "vcmpnet"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(not:V1TI
+	  (eq:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		   (match_operand:V1TI 2 "altivec_register_operand"))))]
+   "TARGET_POWER10"
+{
+  emit_insn (gen_eqvv1ti3 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_one_cmplv1ti2 (operands[0], operands[0]));
+  DONE;
+})
+
 ;; Vector Compare Not Equal or Zero Byte
 (define_insn "vcmpnezb"
   [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index c1c2c9f9bf7..99cc053acbe 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21316,6 +21316,180 @@ Generate PCV from specified Mask size, as if implemented by the
 immediate value is either 0, 1, 2 or 3.
 @findex vec_genpcvm
 
+@smallexample
+@exdent vector unsigned __int128 vec_rl (vector unsigned __int128,
+                                         vector unsigned __int128);
+@exdent vector signed __int128 vec_rl (vector signed __int128,
+                                       vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input left by the number of bits
+specified in the most significant quad word of the second input truncated to
+7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlmi (vector signed __int128,
+                                         vector signed __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and inserting it under mask
+into the second input.  The first bit in the mask, the last bit in the mask are
+obtained from the two 7-bit fields bits [108:115] and bits [117:123]
+respectively of the second input.  The shift is obtained from the third input
+in the 7-bit field [125:131] where all bits counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlnm (vector signed __int128,
+                                         vector unsigned __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and ANDing it with a mask.  The
+first bit in the mask and the last bit in the mask are obtained from the two
+7-bit fields bits [117:123] and bits [125:131] respectively of the second
+input.  The shift is obtained from the third input in the 7-bit field bits
+[125:131] where all bits counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_sl(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sl(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of shifting the first input left by the number of bits
+specified in the most significant bits of the second input truncated to
+7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_sr(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sr(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of performing a logical right shift of the first argument
+by the number of bits specified in the most significant double word of the
+second input truncated to 7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_sra(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sra(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of performing arithmetic right shift of the first argument
+by the number of bits specified in the most significant bits of the
+second input truncated to 7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_mule (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mule (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the even
+doubleword elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mulo (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mulo (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the odd
+doubleword elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_div (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_div (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+Returns the result of dividing the first operand by the second operand. An
+attempt to divide any value by zero or to divide the most negative signed
+128-bit integer by negative one results in an undefined value.
+
+@smallexample
+@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_dive (vector signed __int128,
+                                         vector signed __int128);
+@end smallexample
+
+The result is produced by shifting the first input left by 128 bits and
+dividing by the second.  If an attempt is made to divide by zero or the result
+is larger than 128 bits, the result is undefined.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_mod (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+The result is the modulo result of dividing the first input  by the second
+input.
+
+The following builtins perform 128-bit vector comparisons.  The
+@code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is
+one of the operations @code{eq, ne, gt, lt, ge, le} perform pairwise
+comparisons between the elements at the same positions within their two vector
+arguments.  The @code{vec_all_xx}function returns a non-zero value if and only
+if all pairwise comparisons are true.  The @code{vec_any_xx} function returns
+a non-zero value if and only if at least one pairwise comparison is true.  The
+@code{vec_cmpxx}function returns a vector of the same type as its two
+arguments, within which each element consists of all ones to denote that
+specified logical comparison of the corresponding elements was true.
+Otherwise, the element of the returned vector contains all zeros.
+
+@smallexample
+vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
+
+int vec_all_eq (vector signed __int128, vector signed __int128);
+int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ne (vector signed __int128, vector signed __int128);
+int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_all_gt (vector signed __int128, vector signed __int128);
+int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_lt (vector signed __int128, vector signed __int128);
+int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ge (vector signed __int128, vector signed __int128);
+int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_all_le (vector signed __int128, vector signed __int128);
+int vec_all_le (vector unsigned __int128, vector unsigned __int128);
+
+int vec_any_eq (vector signed __int128, vector signed __int128);
+int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ne (vector signed __int128, vector signed __int128);
+int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_any_gt (vector signed __int128, vector signed __int128);
+int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_lt (vector signed __int128, vector signed __int128);
+int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ge (vector signed __int128, vector signed __int128);
+int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_any_le (vector signed __int128, vector signed __int128);
+int vec_any_le (vector unsigned __int128, vector unsigned __int128);
+@end smallexample
+
+
 @node PowerPC Hardware Transactional Memory Built-in Functions
 @subsection PowerPC Hardware Transactional Memory Built-in Functions
 GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
new file mode 100644
index 00000000000..85ad544e22b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -0,0 +1,2254 @@
+/* { dg-do run } */
+/* { dg-options "-mcpu=power10 -O2 -save-temps" } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-require-effective-target ppc_native_128bit } */
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mvextsd2q\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsrq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsraq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvcmpuq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpsq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvmuleud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivsq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivuq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveuq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmoduq\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+
+
+void print_i128(__int128_t val)
+{
+  printf(" %lld %llu (0x%llx %llx)",
+	 (signed long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF),
+	 (unsigned long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
+}
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i, result_int;
+
+  __int128_t arg1, result;
+  __uint128_t uarg2;
+
+  vector signed long long int vec_arg1_di, vec_arg2_di;
+  vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
+  vector unsigned long long int vec_uresult_di;
+  vector unsigned long long int vec_uexpected_result_di;
+  
+  __int128_t expected_result;
+  __uint128_t uexpected_result;
+
+  vector __int128_t vec_arg1, vec_arg2, vec_result;
+  vector __uint128_t vec_uarg1, vec_uarg2, vec_uarg3, vec_uresult;
+  vector bool __int128  vec_result_bool;
+
+  /* sign extend double to 128-bit integer  */
+  vec_arg1_di[0] = 1000;
+  vec_arg1_di[1] = -123456;
+
+  expected_result = 1000;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1_di[0] = -123456;
+  vec_arg1_di[1] = 1000;
+
+  expected_result = -123456;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  /* test shift 128-bit integers.
+     Note, shift amount is given by the lower 7-bits of the shift amount. */
+  vec_arg1[0] = 3;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]*4;
+
+  vec_result = vec_sl (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 3;
+  uarg2 = 4;
+  expected_result = arg1*16;
+
+  result = arg1 << uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 << uint128):  ");
+    print_i128(arg1);
+    printf(" << %lld", uarg2 & 0xFF);
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 3;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]*4;
+  
+  vec_uresult = vec_sl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]/4;
+
+  vec_result = vec_sr (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 48;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]/4;
+  
+  vec_uresult = vec_sr (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 48;
+  uarg2 = 4;
+  expected_result = arg1/16;
+
+  result = arg1 >> uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 >> uint128:  ");
+    print_i128(arg1);
+    printf(" >> %lld", uarg2 & 0xFF);
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x0000000012345678ULL;
+  expected_result = (expected_result << 64) | 0x90ABCDEFAABBCCDDULL;
+
+  vec_result = vec_sra (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0xFFFFFFFFFFFFAABBLL;
+  uexpected_result = (uexpected_result << 64) | 0xCCDDEEFF11221234ULL;
+
+  vec_uresult = vec_sra (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x90ABCDEFAABBCCDDULL;
+  expected_result = (expected_result << 64) | 0xEEFF112212345678ULL;
+
+  vec_result = vec_rl (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0x11221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEEFFULL;
+
+  vec_uresult = vec_rl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = (32 << 8) | 95;
+  vec_uarg3[0] = 32;
+  expected_result = 0xaabbccddULL;
+  expected_result = (expected_result << 64) | 0xeeff112200000000ULL;
+
+  vec_result = vec_rlnm (vec_arg1, vec_uarg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(int128, uint128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = (8 << 8) | 119;
+  vec_uarg3[0] = 48;
+
+  uexpected_result = 0x00221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEE00ULL;
+
+  vec_uresult = vec_rlnm (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(uint128, uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_arg2[0] = 0x000000000000DEADULL;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 0x0000BEEF00000000ULL;
+  vec_uarg3[0] = 96 << 16 | 127 << 8 | 32;
+  expected_result = 0x000000000000DEADULL;
+  expected_result = (expected_result << 64) | 0x0000BEEF12345678ULL;
+
+  vec_result = vec_rlmi (vec_arg1, vec_arg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(int128, int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", vec_uarg2_di[1] & 0xFF);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 0xDEAD000000000000ULL;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 0x000000000000BEEFULL;
+  vec_uarg3[0] = 16 << 16 | 111 << 8 | 48;
+  uexpected_result = 0xDEAD1234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDBEEFULL;
+
+  vec_uresult = vec_rlmi (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(uint128, unit128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", vec_uarg3[1] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* 128-bit compare tests, result is all 1's if true */
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1[0] = 2468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  uexpected_result = 0xFFFFFFFFFFFFFFFFULL;
+  uexpected_result = (uexpected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: unsigned vec_cmpgt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed vec_cmpgt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:not equal signed vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  not equal vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpeq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  not equal vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:not equal signed vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+   
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector multiply Even and Odd tests */
+  vec_arg1_di[0] = 200;
+  vec_arg1_di[1] = 400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[0] * vec_arg2_di[0];
+
+  vec_result = vec_mule (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (signed, signed) failed.\n");
+    printf(" vec_arg1_di[0] = %lld\n", vec_arg1_di[0]);
+    printf(" vec_arg2_di[0] = %lld\n", vec_arg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1_di[0] = -200;
+  vec_arg1_di[1] = -400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[1] * vec_arg2_di[1];
+
+  vec_result = vec_mulo (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (signed, signed) failed.\n");
+    printf(" vec_arg1_di[1] = %lld\n", vec_arg1_di[1]);
+    printf(" vec_arg2_di[1] = %lld\n", vec_arg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[0] * vec_uarg2_di[0];
+
+  vec_uresult = vec_mule (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[1] = %lld\n", vec_uarg1_di[1]);
+    printf(" vec_uarg2_di[1] = %lld\n", vec_uarg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[1] * vec_uarg2_di[1];
+
+  vec_uresult = vec_mulo (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[0] = %lld\n", vec_uarg1_di[0]);
+    printf(" vec_uarg2_di[0] = %lld\n", vec_uarg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Divide Quadword */
+  vec_arg1[0] = -12345678;
+  vec_arg2[0] = 2;
+  expected_result = -6172839;
+
+  vec_result = vec_div (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24680;
+  vec_uarg2[0] = 4;
+  uexpected_result = 6170;
+
+  vec_uresult = vec_div (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Divide Extended Quadword */
+  vec_arg1[0] = -20;        // has 128-bit of zero concatenated onto it
+  vec_arg2[0] = 0x2000000000000000;
+  vec_arg2[0] = vec_arg2[0] << 64;
+  expected_result = -160;
+
+  vec_result = vec_dive (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 20;        // has 128-bit of zero concatenated onto it
+  vec_uarg2[0] = 0x4000000000000000;
+  vec_uarg2[0] = vec_uarg2[0] << 64;
+  uexpected_result = 80;
+
+  vec_uresult = vec_dive (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector modulo quad word  */
+  vec_arg1[0] = -12345675;
+  vec_arg2[0] = 2;
+  expected_result = -1;
+
+  vec_result = vec_mod (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24685;
+  vec_uarg2[0] = 4;
+  uexpected_result = 1;
+
+  vec_uresult = vec_mod (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  return 0;
+}
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support
  2020-09-24 18:21   ` will schmidt
@ 2020-10-05 18:52     ` Carl Love
  2020-10-08 14:22       ` will schmidt
  0 siblings, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-10-05 18:52 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will, Segher:

Add support for converting to/from 128-bit integers and 128-bit 
decimal floating point formats.

The updates from the previous version of the patch:

Just a fix for the change log per Will's comments.

No regression failures were found when run on a P9.

Please let me know if this is ready for mainline. 

               Carl

--------------------------------------------------------------


gcc/ChangeLog

2020-10-05  Carl Love  <cel@us.ibm.com>
	* config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
	* config/rs6000/rs6000-call.c (P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P):
	New overloaded definitions.

gcc/testsuite/ChangeLog

2020-10-05  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c:  Update test.
---
 gcc/config/rs6000/dfp.md                      | 14 +++++
 gcc/config/rs6000/rs6000-call.c               |  4 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 62 +++++++++++++++++++
 3 files changed, 80 insertions(+)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 8f822732bac..0e82e315fee 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -222,6 +222,13 @@
   "dcffixq %0,%1"
   [(set_attr "type" "dfp")])
 
+(define_insn "floattitd2"
+  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
+	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
+  "TARGET_POWER10"
+  "dcffixqq %0,%1"
+  [(set_attr "type" "dfp")])
+
 ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
 ;; This is the first stage of converting it to an integer type.
 
@@ -241,6 +248,13 @@
   "TARGET_DFP"
   "dctfix<q> %0,%1"
   [(set_attr "type" "dfp")])
+
+(define_insn "fixtdti2"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
+	(fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
+  "TARGET_POWER10"
+  "dctfixqq %0,%1"
+  [(set_attr "type" "dfp")])
 
 ;; Decimal builtin support
 
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 87fff5c1c80..8d00a25d806 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -4967,6 +4967,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, 0 },
   { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
     RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
 
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
@@ -5074,6 +5076,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
     RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 85ad544e22b..ec3dcf3dff1 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -38,6 +38,7 @@
 #if DEBUG
 #include <stdio.h>
 #include <stdlib.h>
+#include <math.h>
 
 
 void print_i128(__int128_t val)
@@ -59,6 +60,13 @@ int main ()
   __int128_t arg1, result;
   __uint128_t uarg2;
 
+  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
+
+  struct conv_t {
+    __uint128_t u128;
+    _Decimal128 d128;
+  } conv, conv2;
+
   vector signed long long int vec_arg1_di, vec_arg2_di;
   vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
   vector unsigned long long int vec_uresult_di;
@@ -2249,6 +2257,60 @@ int main ()
     abort();
 #endif
   }
+  
+  /* DFP to __int128 and __int128 to DFP conversions */
+  /* Can't get printing of DFP values to work.  Print the DFP value as an
+     unsigned int so we can see the bit patterns.  */
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
+  expected_result_dfp128 = conv.d128;
 
+  arg1 = 4;
+
+  conv.d128 = (_Decimal128) arg1;
+
+  result_dfp128 = (_Decimal128) arg1;
+  if (((conv.u128 >>64) != 0x2208000000000000ULL) &&
+      ((conv.u128 & 0xFFFFFFFFFFFFFFFF) != 0x4ULL)) {
+#if DEBUG
+    printf("ERROR:  convert int128 value ");
+    print_i128 (arg1);
+    conv.d128 = result_dfp128;
+    printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)((conv.u128) >>64),
+	   (unsigned long long)((conv.u128) & 0xFFFFFFFFFFFFFFFF));
+
+    conv.d128 = expected_result_dfp128;
+    printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
+	   (unsigned long long) (conv.u128>>64),
+	   (unsigned long long) (conv.u128 & 0xFFFFFFFFFFFFFFFF));
+#else
+    abort();
+#endif
+  }
+
+  expected_result = 4;
+
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
+  arg1_dfp128 = conv.d128;
+
+  result = (__int128_t) arg1_dfp128;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR:  convert DFP value ");
+    printf("0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)(conv.u128>>64),
+	   (unsigned long long)(conv.u128 & 0xFFFFFFFFFFFFFFFF));
+    printf("to __int128 value = ");
+    print_i128 (result);
+    printf("\ndoes not match expected_result = ");
+    print_i128 (expected_result);
+    printf("\n");
+#else
+    abort();
+#endif
+  }
   return 0;
 }
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/5] Test 128-bit shifts for just the int128 type.
  2020-09-24 18:21   ` will schmidt
@ 2020-10-05 18:52     ` Carl Love
  2020-10-08 14:50       ` will schmidt
  0 siblings, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-10-05 18:52 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will, Segher:

Patch 4 adds the vector 128-bit integer shift instruction support for
the V1TI type.

The changes from the previous version include:

Fixed up the change log entry issues noted by Will.

Regression tests reran on Power 9 LE with no regression errors.

Please let me know if it looks OK to commit to mainline.

                  Carl 
---------------------------------------------

gcc/ChangeLog

2020-10-05  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
	Rename to altivec_vslq_<mode>, altivec_vsrq_<mode>, mode VEC_TI.
	* config/rs6000/vector.md (VEC_TI): New name for VSX_TI iterator.
	(vashlv1ti3): Change to vashl<mode>3, mode VEC_TI.
	(vlshrv1ti3): Change to vlshr<mode>3, mode VEC_TI.
	* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator.

gcc/testsuite/ChangeLog

2020-10-05  Carl Love  <cel@us.ibm.com>
	gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
	tests.
---
 gcc/config/rs6000/altivec.md                  | 16 ++++-----
 gcc/config/rs6000/vector.md                   | 27 ++++++++-------
 gcc/config/rs6000/vsx.md                      | 33 +++++++++----------
 .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++++++--
 4 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 34a4731342a..5db3de3cc9f 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2219,10 +2219,10 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vslq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vslq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+		     (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vslq %0,%1,%2"
@@ -2236,10 +2236,10 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vsrq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vsrq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+			   (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vsrq %0,%1,%2"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 0cca4232619..3ea3a91845a 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,6 +26,9 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
 
+;; 128-bit int modes
+(define_mode_iterator VEC_TI [V1TI TI])
+
 ;; Vector int modes for parity
 (define_mode_iterator VEC_IP [V8HI
 			      V4SI
@@ -1627,17 +1630,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vashlv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vashl<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+			 (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
   emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vslq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
@@ -1650,17 +1653,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vlshrv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vlshr<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+			   (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
   emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn(gen_altivec_vsrq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vsrq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 5b6a0bd728a..87f96ffcc4c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -37,9 +37,6 @@
 				  TI
 				  V1TI])
 
-;; Iterator for 128-bit integer types that go in a single vector register.
-(define_mode_iterator VSX_TI [TI V1TI])
-
 ;; Iterator for the 2 32-bit vector types
 (define_mode_iterator VSX_W [V4SF V4SI])
 
@@ -944,9 +941,9 @@
 ;; special V1TI container class, which it is not appropriate to use vec_select
 ;; for the type.
 (define_insn "*vsx_le_permute_<mode>"
-  [(set (match_operand:VSX_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
-	(rotate:VSX_TI
-	 (match_operand:VSX_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
+  [(set (match_operand:VEC_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
+	(rotate:VEC_TI
+	 (match_operand:VEC_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
 	 (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "@
@@ -960,10 +957,10 @@
    (set_attr "type" "vecperm,vecload,vecstore,*,load,store")])
 
 (define_insn_and_split "*vsx_le_undo_permute_<mode>"
-  [(set (match_operand:VSX_TI 0 "vsx_register_operand" "=wa,wa")
-	(rotate:VSX_TI
-	 (rotate:VSX_TI
-	  (match_operand:VSX_TI 1 "vsx_register_operand" "0,wa")
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=wa,wa")
+	(rotate:VEC_TI
+	 (rotate:VEC_TI
+	  (match_operand:VEC_TI 1 "vsx_register_operand" "0,wa")
 	  (const_int 64))
 	 (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
@@ -1031,11 +1028,11 @@
 ;; Peepholes to catch loads and stores for TImode if TImode landed in
 ;; GPR registers on a little endian system.
 (define_peephole2
-  [(set (match_operand:VSX_TI 0 "int_reg_operand")
-	(rotate:VSX_TI (match_operand:VSX_TI 1 "memory_operand")
+  [(set (match_operand:VEC_TI 0 "int_reg_operand")
+	(rotate:VEC_TI (match_operand:VEC_TI 1 "memory_operand")
 		       (const_int 64)))
-   (set (match_operand:VSX_TI 2 "int_reg_operand")
-	(rotate:VSX_TI (match_dup 0)
+   (set (match_operand:VEC_TI 2 "int_reg_operand")
+	(rotate:VEC_TI (match_dup 0)
 		       (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
    && (rtx_equal_p (operands[0], operands[2])
@@ -1043,11 +1040,11 @@
    [(set (match_dup 2) (match_dup 1))])
 
 (define_peephole2
-  [(set (match_operand:VSX_TI 0 "int_reg_operand")
-	(rotate:VSX_TI (match_operand:VSX_TI 1 "int_reg_operand")
+  [(set (match_operand:VEC_TI 0 "int_reg_operand")
+	(rotate:VEC_TI (match_operand:VEC_TI 1 "int_reg_operand")
 		       (const_int 64)))
-   (set (match_operand:VSX_TI 2 "memory_operand")
-	(rotate:VSX_TI (match_dup 0)
+   (set (match_operand:VEC_TI 2 "memory_operand")
+	(rotate:VEC_TI (match_dup 0)
 		       (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
    && peep2_reg_dead_p (2, operands[0])"
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index ec3dcf3dff1..25e2c9d1af4 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -53,6 +53,18 @@ void print_i128(__int128_t val)
 
 void abort (void);
 
+__attribute__((noinline))
+__int128_t shift_right (__int128_t a, __uint128_t b)
+{
+  return a >> b;
+}
+
+__attribute__((noinline))
+__int128_t shift_left (__int128_t a, __uint128_t b)
+{
+  return a << b;
+}
+
 int main ()
 {
   int i, result_int;
@@ -141,7 +153,7 @@ int main ()
 #endif
   }
 
-  arg1 = 3;
+  arg1 = vec_result[0];
   uarg2 = 4;
   expected_result = arg1*16;
 
@@ -225,7 +237,7 @@ int main ()
 #endif
   }
 
-  arg1 = 48;
+  arg1 = vec_uresult[0];
   uarg2 = 4;
   expected_result = arg1/16;
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/5] Conversions between 128-bit integer and floating point values.
  2020-09-24 18:22   ` will schmidt
@ 2020-10-05 18:52     ` Carl Love
  2020-10-08 15:35       ` will schmidt
  0 siblings, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-10-05 18:52 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will, Segher:

This patch adds support for converting to/from 128-bit integers and
128-bit decimal floating point formats using the new P10 instructions
dcffixqq and dctfixqq.  The new instructions are only used on P10 HW,
otherwise the conversions continue to use the existing SW routines.

The changes from the previous version include:

Fixed up the change log entry issues noted by Will.

Regression tests reran on Power 9 LE with no regression errors.

Please let me know if it looks OK to commit to mainline.

                  Carl 
---------------------------------------------

gcc/ChangeLog

2020-10-05  Carl Love  <cel@us.ibm.com>
	* config/rs6000/rs6000.md (floatti<mode>2, floatunsti<mode>2,
	fix_trunc<mode>ti2, fixuns_trunc<mode>ti2): Add
	define_insn for mode IEEE 128.
	* libgcc/config/rs6000/fixkfti.c: Renamed to fixkfti-sw.c.
	Update source function name.  White space fixes.
	* libgcc/config/rs6000/fixunskfti.c: Renamed to fixunskfti-sw.c.
	Update source function name.  White space fixes.
	* libgcc/config/rs6000/float128-hw.c (__floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw):
	New functions.
	* libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1):
	New macro.
	(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
	__fixunskfti_resolve): Add resolve functions.
	(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
	functions.
	* libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
	__fixtfti, __fixunstfti): Add editor commands to change
	names.
	* libgcc/config/rs6000/float128-sed-hw (__floattitf,
	__floatuntitf, __fixtfti, __fixunstfti): Add editor commands
	to change names.
	* libgcc/config/rs6000/floattikf.c: Renamed to floattikf-sw.c.
	* libgcc/config/rs6000/floatuntikf.c: Renamed to floatuntikf-sw.c.
	* libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,
	__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
	__floatuntikf, __fixkfti, __fixunskfti): New extern declarations.
	* libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
	fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
	(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
	file names to fp128_ppc_funcs.

gcc/testsuite/ChangeLog

2020-10-05  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/fl128_conversions.c: New file.
---
 gcc/config/rs6000/rs6000.md                   |  36 +++
 .../gcc.target/powerpc/fp128_conversions.c    | 286 ++++++++++++++++++
 .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
 .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   7 +-
 libgcc/config/rs6000/float128-hw.c            |  24 ++
 libgcc/config/rs6000/float128-ifunc.c         |  44 ++-
 libgcc/config/rs6000/float128-sed             |   4 +
 libgcc/config/rs6000/float128-sed-hw          |   4 +
 .../rs6000/{floattikf.c => floattikf-sw.c}    |   4 +-
 .../{floatuntikf.c => floatuntikf-sw.c}       |   4 +-
 libgcc/config/rs6000/quad-float128.h          |  17 +-
 libgcc/config/rs6000/t-float128               |   3 +-
 12 files changed, 417 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
 rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
 rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (90%)
 rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
 rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 694ff70635e..5db5d0b4505 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6390,6 +6390,42 @@
    xscvsxddp %x0,%x1"
   [(set_attr "type" "fp")])
 
+(define_insn "floatti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvsqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "floatunsti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvuqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fix_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpsqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fixuns_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpuqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
 ; Allow the combiner to merge source memory operands to the conversion so that
 ; the optimizer/register allocator doesn't try to load the value too early in a
 ; GPR and then use store/load to move it to a FPR and suffer from a store-load
diff --git a/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
new file mode 100644
index 00000000000..dff24e1e2b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
@@ -0,0 +1,286 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mxscvsqqp\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mxscvuqqp\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mxscvqpsqz\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mxscvqpuqz\M} 1 { target { ppc_native_128bit } } } } */
+
+#include <stdio.h>
+#include <math.h>
+#include <fenv.h>
+#include <stdlib.h>
+#include <wchar.h>
+
+#define DEBUG 0
+
+void abort (void);
+
+float conv_i_2_fp( long long int a)
+{
+  return (float) a;
+}
+
+double conv_i_2_fpd( long long int a)
+{
+  return (double) a;
+}
+
+double conv_ui_2_fpd( unsigned long long int a)
+{
+  return (double) a;
+}
+
+__float128 conv_i128_2_fp128 (__int128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__float128 conv_ui128_2_fp128 (__uint128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__int128_t conv_fp128_2_i128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__int128_t) a;
+}
+
+__uint128_t conv_fp128_2_ui128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__uint128_t) a;
+}
+
+long double conv_i128_2_ld (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (long double) a;
+}
+
+__ibm128 conv_i128_2_ibm128 (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble, message uses IBM long double, no binary output
+  return (__ibm128) a;
+}
+
+int main()
+{
+	float a, expected_result_float;
+	double b, expected_result_double;
+	long long int c, expected_result_llint;
+	unsigned long long int u;
+	__int128_t d;
+	__uint128_t u128;
+	unsigned long long expected_result_uint128[2] ;
+	__float128 e;
+	long double ld;     // another 128-bit float version
+
+	union conv_t {
+		float a;
+		double b;
+		long long int c;
+		long long int128[2] ;
+		unsigned long long uint128[2] ;
+		unsigned long long int u;
+		__int128_t d;
+		__uint128_t u128;
+		__float128 e;
+		long double ld;     // another 128-bit float version
+	} conv, conv_result;
+
+	c = 20;
+	expected_result_llint = 20.00000;
+	a = conv_i_2_fp (c);
+
+	if (a != expected_result_llint) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fp(%lld) = %10.5f\n", c, a);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_llint);
+#else
+		abort();
+#endif
+	}
+
+	c = 20;
+	expected_result_double = 20.00000;
+	b = conv_i_2_fpd (c);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fpd(%lld) = %10.5f\n", d, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+	u = 20;
+	expected_result_double = 20.00000;
+	b = conv_ui_2_fpd (u);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_ui_2_fpd(%llu) = %10.5f\n", u, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+  /* Currently printing 128-bit float does not work correctly  */
+  d = -3210;
+  d = (d * 10000000000) + 9876543210;
+  conv_result.e = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0xc02bd2f9068d1160;
+  expected_result_uint128[0] = 0x0;
+  
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(-32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  d = 123;
+  d = (d * 10000000000) + 1234567890;
+  conv_result.ld = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x4271eab4c8ed2000;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(1231234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  /* Currently printing 128-bit float does not work correctly  */
+  u128 = 8760;
+  u128 = (u128 * 10000000000) + 1234567890;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+  expected_result_uint128[1] = 0x402d3eb101df8b48;
+  expected_result_uint128[0] = 0x0;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(87601234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  /* Currently printing 128-bit float does not work correctly  */
+  u128 = 3210;
+  u128 = (u128 * 10000000000) + 9876543210;
+  expected_result_uint128[1] = 0x402bd3429c8feea0;
+  expected_result_uint128[0] = 0x0;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 12345.6789;
+  expected_result_uint128[1] = 0x1407374883526960;
+  expected_result_uint128[0] = 0x3039;
+
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) =  ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = -6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0xffffffffffffe57b;
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x1a85;
+  conv_result.d = conv_fp128_2_ui128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_ui128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+	  
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  return 0;
+}
diff --git a/libgcc/config/rs6000/fixkfti.c b/libgcc/config/rs6000/fixkfti-sw.c
similarity index 96%
rename from libgcc/config/rs6000/fixkfti.c
rename to libgcc/config/rs6000/fixkfti-sw.c
index a22286228aa..d6bbbf889b7 100644
--- a/libgcc/config/rs6000/fixkfti.c
+++ b/libgcc/config/rs6000/fixkfti-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TItype
-__fixkfti (TFtype a)
+__fixkfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/fixunskfti.c b/libgcc/config/rs6000/fixunskfti-sw.c
similarity index 90%
rename from libgcc/config/rs6000/fixunskfti.c
rename to libgcc/config/rs6000/fixunskfti-sw.c
index ab232d92d24..5c8b94b8862 100644
--- a/libgcc/config/rs6000/fixunskfti.c
+++ b/libgcc/config/rs6000/fixunskfti-sw.c
@@ -5,7 +5,10 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
+
+   The fixunskfti.c file renamed fixunskfti-sw.c.  Updated the
+   source file names and made whitespace fixes
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +38,7 @@
 #include "quad-float128.h"
 
 UTItype
-__fixunskfti (TFtype a)
+__fixunskfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/float128-hw.c b/libgcc/config/rs6000/float128-hw.c
index 8705b53e22a..be8bd07e853 100644
--- a/libgcc/config/rs6000/float128-hw.c
+++ b/libgcc/config/rs6000/float128-hw.c
@@ -86,6 +86,30 @@ __floatdikf_hw (DItype_ppc a)
   return (TFtype) a;
 }
 
+TFtype
+__floattikf_hw (TItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TFtype
+__floatuntikf_hw (UTItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TItype_ppc
+__fixkfti_hw (TFtype a)
+{
+  return (TItype_ppc) a;
+}
+
+UTItype_ppc
+__fixunskfti_hw (TFtype a)
+{
+  return (UTItype_ppc) a;
+}
+
 TFtype
 __floatundikf_hw (UDItype_ppc a)
 {
diff --git a/libgcc/config/rs6000/float128-ifunc.c b/libgcc/config/rs6000/float128-ifunc.c
index c2f65912a74..c221be2c864 100644
--- a/libgcc/config/rs6000/float128-ifunc.c
+++ b/libgcc/config/rs6000/float128-ifunc.c
@@ -46,14 +46,9 @@
 #endif
 
 #define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW : SW)
+#define SW_OR_HW_ISA3_1(SW, HW) (__builtin_cpu_supports ("arch_3_1") ? HW : SW)
 
 /* Resolvers.  */
-
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 static __typeof__ (__addkf3_sw) *
 __addkf3_resolve (void)
 {
@@ -102,6 +97,18 @@ __floatdikf_resolve (void)
   return SW_OR_HW (__floatdikf_sw, __floatdikf_hw);
 }
 
+static __typeof__ (__floattikf_sw) *
+__floattikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floattikf_sw, __floattikf_hw);
+}
+
+static __typeof__ (__floatuntikf_sw) *
+__floatuntikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floatuntikf_sw, __floatuntikf_hw);
+}
+
 static __typeof__ (__floatunsikf_sw) *
 __floatunsikf_resolve (void)
 {
@@ -114,6 +121,19 @@ __floatundikf_resolve (void)
   return SW_OR_HW (__floatundikf_sw, __floatundikf_hw);
 }
 
+
+static __typeof__ (__fixkfti_sw) *
+__fixkfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixkfti_sw, __fixkfti_hw);
+}
+
+static __typeof__ (__fixunskfti_sw) *
+__fixunskfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixunskfti_sw, __fixunskfti_hw);
+}
+
 static __typeof__ (__fixkfsi_sw) *
 __fixkfsi_resolve (void)
 {
@@ -303,6 +323,18 @@ TFtype __floatsikf (SItype_ppc)
 TFtype __floatdikf (DItype_ppc)
   __attribute__ ((__ifunc__ ("__floatdikf_resolve")));
 
+TFtype __floattikf (TItype_ppc)
+  __attribute__ ((__ifunc__ ("__floattikf_resolve")));
+
+TFtype __floatuntikf (UTItype_ppc)
+  __attribute__ ((__ifunc__ ("__floatuntikf_resolve")));
+
+TItype_ppc __fixkfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixkfti_resolve")));
+
+UTItype_ppc __fixunskfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixunskfti_resolve")));
+
 TFtype __floatunsikf (USItype_ppc)
   __attribute__ ((__ifunc__ ("__floatunsikf_resolve")));
 
diff --git a/libgcc/config/rs6000/float128-sed b/libgcc/config/rs6000/float128-sed
index d9a089ff9ba..c0fcddb1959 100644
--- a/libgcc/config/rs6000/float128-sed
+++ b/libgcc/config/rs6000/float128-sed
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi/g
 s/__fixunstfdi/__fixunskfdi/g
 s/__fixunstfsi/__fixunskfsi/g
 s/__floatditf/__floatdikf/g
+s/__floattitf/__floattikf/g
+s/__floatuntitf/__floatuntikf/g
+s/__fixtfti/__fixkfti/g
+s/__fixunstfti/__fixunskfti/g
 s/__floatsitf/__floatsikf/g
 s/__floatunditf/__floatundikf/g
 s/__floatunsitf/__floatunsikf/g
diff --git a/libgcc/config/rs6000/float128-sed-hw b/libgcc/config/rs6000/float128-sed-hw
index acf36b0c17d..3d2bf556da1 100644
--- a/libgcc/config/rs6000/float128-sed-hw
+++ b/libgcc/config/rs6000/float128-sed-hw
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi_sw/g
 s/__fixunstfdi/__fixunskfdi_sw/g
 s/__fixunstfsi/__fixunskfsi_sw/g
 s/__floatditf/__floatdikf_sw/g
+s/__floattitf/__floattikf_sw/g
+s/__floatuntitf/__floatuntikf_sw/g
+s/__fixtfti/__fixkfti_sw/g
+s/__fixunstfti/__fixunskfti_sw/g
 s/__floatsitf/__floatsikf_sw/g
 s/__floatunditf/__floatundikf_sw/g
 s/__floatunsitf/__floatunsikf_sw/g
diff --git a/libgcc/config/rs6000/floattikf.c b/libgcc/config/rs6000/floattikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floattikf.c
rename to libgcc/config/rs6000/floattikf-sw.c
index 4e8c40cfbe4..110706352bb 100644
--- a/libgcc/config/rs6000/floattikf.c
+++ b/libgcc/config/rs6000/floattikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floattikf (TItype i)
+__floattikf_sw (TItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/floatuntikf.c b/libgcc/config/rs6000/floatuntikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floatuntikf.c
rename to libgcc/config/rs6000/floatuntikf-sw.c
index 8bfba4267d4..5e712a67e26 100644
--- a/libgcc/config/rs6000/floatuntikf.c
+++ b/libgcc/config/rs6000/floatuntikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floatuntikf (UTItype i)
+__floatuntikf_sw (UTItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/quad-float128.h b/libgcc/config/rs6000/quad-float128.h
index 32ef328a8ea..24712b9277f 100644
--- a/libgcc/config/rs6000/quad-float128.h
+++ b/libgcc/config/rs6000/quad-float128.h
@@ -87,19 +87,18 @@ extern USItype_ppc __fixunskfsi_sw (TFtype);
 extern UDItype_ppc __fixunskfdi_sw (TFtype);
 extern TFtype __floatsikf_sw (SItype_ppc);
 extern TFtype __floatdikf_sw (DItype_ppc);
+extern TFtype __floattikf_sw (TItype_ppc);
 extern TFtype __floatunsikf_sw (USItype_ppc);
 extern TFtype __floatundikf_sw (UDItype_ppc);
+extern TFtype __floatuntikf_sw (UTItype_ppc);
+extern TItype_ppc __fixkfti_sw (TFtype);
+extern UTItype_ppc __fixunskfti_sw (TFtype);
 extern IBM128_TYPE __extendkftf2_sw (TFtype);
 extern TFtype __trunctfkf2_sw (IBM128_TYPE);
 extern TCtype __mulkc3_sw (TFtype, TFtype, TFtype, TFtype);
 extern TCtype __divkc3_sw (TFtype, TFtype, TFtype, TFtype);
 
 #ifdef _ARCH_PPC64
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 extern TItype_ppc __fixkfti (TFtype);
 extern UTItype_ppc __fixunskfti (TFtype);
 extern TFtype __floattikf (TItype_ppc);
@@ -130,8 +129,12 @@ extern USItype_ppc __fixunskfsi_hw (TFtype);
 extern UDItype_ppc __fixunskfdi_hw (TFtype);
 extern TFtype __floatsikf_hw (SItype_ppc);
 extern TFtype __floatdikf_hw (DItype_ppc);
+extern TFtype __floattikf_hw (TItype_ppc);
 extern TFtype __floatunsikf_hw (USItype_ppc);
 extern TFtype __floatundikf_hw (UDItype_ppc);
+extern TFtype __floatuntikf_hw (UTItype_ppc);
+extern TItype_ppc __fixkfti_hw (TFtype);
+extern UTItype_ppc __fixunskfti_hw (TFtype);
 extern IBM128_TYPE __extendkftf2_hw (TFtype);
 extern TFtype __trunctfkf2_hw (IBM128_TYPE);
 extern TCtype __mulkc3_hw (TFtype, TFtype, TFtype, TFtype);
@@ -162,8 +165,12 @@ extern USItype_ppc __fixunskfsi (TFtype);
 extern UDItype_ppc __fixunskfdi (TFtype);
 extern TFtype __floatsikf (SItype_ppc);
 extern TFtype __floatdikf (DItype_ppc);
+extern TFtype __floattikf (TItype_ppc);
 extern TFtype __floatunsikf (USItype_ppc);
 extern TFtype __floatundikf (UDItype_ppc);
+extern TFtype __floatuntikf (UTItype_ppc);
+extern TItype_ppc __fixkfti (TFtype);
+extern UTItype_ppc __fixunskfti (TFtype);
 extern IBM128_TYPE __extendkftf2 (TFtype);
 extern TFtype __trunctfkf2 (IBM128_TYPE);
 
diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
index d5413445189..325b22fd49e 100644
--- a/libgcc/config/rs6000/t-float128
+++ b/libgcc/config/rs6000/t-float128
@@ -23,7 +23,8 @@ fp128_softfp_shared_obj	= $(addsuffix -sw_s$(objext),$(fp128_softfp_funcs))
 fp128_softfp_obj	= $(fp128_softfp_static_obj) $(fp128_softfp_shared_obj)
 
 # New functions for software emulation
-fp128_ppc_funcs		= floattikf floatuntikf fixkfti fixunskfti \
+fp128_ppc_funcs		= floattikf-sw floatuntikf-sw \
+			  fixkfti-sw fixunskfti-sw \
 			  extendkftf2-sw trunctfkf2-sw \
 			  sfp-exceptions _mulkc3 _divkc3 _powikf2
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations
  2020-10-05 18:51     ` Carl Love
@ 2020-10-07 21:08       ` will schmidt
  2020-10-12 20:15         ` Carl Love
  2020-10-12 20:43         ` Segher Boessenkool
  0 siblings, 2 replies; 41+ messages in thread
From: will schmidt @ 2020-10-07 21:08 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Mon, 2020-10-05 at 11:51 -0700, Carl Love wrote:
> Will, Segher:
> 
> Patch 1, adds the 128-bit sign extension instruction support and
> corresponding builtin support.


> 
> I updated the change log per the comments from Will.
> 
> Patch has been retested on Power 9 LE.
> 
> Pet me know if it is ready to commit to mainline.
> 
>                      Carl 
> 
> -----------------------------------------------------------
> 
> 
> gcc/ChangeLog
> 
> 2020-10-05  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
> 	for new builtins.
> 	* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
> 	overloaded builtin definitions.
> 	(VSIGNEXTSB2W, VSIGNEXTSH2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D):
> 	Add builtin expansions.
> 	* config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
> 	P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
> 	* config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
> 	visible.
> 	* doc/extend.texi:  Add documentation for the vec_signexti and
> 	vec_signextll builtins.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-10-05  Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
> ---
>  gcc/config/rs6000/altivec.h                   |   3 +
>  gcc/config/rs6000/rs6000-builtin.def          |   9 ++
>  gcc/config/rs6000/rs6000-call.c               |  13 ++
>  gcc/config/rs6000/vsx.md                      |   2 +-
>  gcc/doc/extend.texi                           |  15 ++
>  .../powerpc/p9-sign_extend-runnable.c         | 128 ++++++++++++++++++
>  6 files changed, 169 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index f7720d136c9..cfa5eda4cd5 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -494,6 +494,9 @@
> 
>  #define vec_xlx __builtin_vec_vextulx
>  #define vec_xrx __builtin_vec_vexturx
> +#define vec_signexti  __builtin_vec_vsignexti
> +#define vec_signextll __builtin_vec_vsignextll
> +
>  #endif

Can probably drop that blank line.


> 
>  /* Predicates.
> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> index e91a48ddf5f..4c2e9460949 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2715,6 +2715,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,	"vprtybd")
>  BU_P9V_OVERLOAD_1 (VPRTYBQ,	"vprtybq")
>  BU_P9V_OVERLOAD_1 (VPRTYBW,	"vprtybw")
>  BU_P9V_OVERLOAD_1 (VPARITY_LSBB,	"vparity_lsbb")
> +BU_P9V_OVERLOAD_1 (VSIGNEXTI,	"vsignexti")
> +BU_P9V_OVERLOAD_1 (VSIGNEXTLL,	"vsignextll")
> 
>  /* 2 argument functions added in ISA 3.0 (power9).  */
>  BU_P9_2 (CMPRB,	"byte_in_range",	CONST,	cmprb)
> @@ -2726,6 +2728,13 @@ BU_P9_OVERLOAD_2 (CMPRB,	"byte_in_range")
>  BU_P9_OVERLOAD_2 (CMPRB2,	"byte_in_either_range")
>  BU_P9_OVERLOAD_2 (CMPEQB,	"byte_in_set")
>  
> +/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1.  */

I have mixed feelings about straddling the ISA 3.0 and 3.1 ; but not
sure how to properly improve.  (I defer).
The rest LGTM, 
Thanks
-Will




> +BU_P9V_AV_1 (VSIGNEXTSB2W,	"vsignextsb2w",		CONST,  vsx_sign_extend_qi_v4si)
> +BU_P9V_AV_1 (VSIGNEXTSH2W,	"vsignextsh2w",		CONST,  vsx_sign_extend_hi_v4si)
> +BU_P9V_AV_1 (VSIGNEXTSB2D,	"vsignextsb2d",		CONST,  vsx_sign_extend_qi_v2di)
> +BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
> +BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
> +
>  /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
>  BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
>  BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index a8b520834c7..9e514a01012 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -5527,6 +5527,19 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI },
> 
> +  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 */
> +  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
> +    RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
> +    RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
> +
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
> +    RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
> +    RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
> +    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
> +
>    /* Overloaded built-in functions for ISA3.1 (power10). */
>    { P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB,
>      RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 4ff52455fd3..31fcffe8f33 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4787,7 +4787,7 @@
>    "vextsh2<wd> %0,%1"
>    [(set_attr "type" "vecexts")])
> 
> -(define_insn "*vsx_sign_extend_si_v2di"
> +(define_insn "vsx_sign_extend_si_v2di"
>    [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
>  	(unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
>  		     UNSPEC_VSX_SIGN_EXTEND))]
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 5571c4f2ff2..c1c2c9f9bf7 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -20800,6 +20800,21 @@ void vec_xst (vector unsigned char, int, vector unsigned char *);
>  void vec_xst (vector unsigned char, int, unsigned char *);
>  @end smallexample
> 
> +uThe following sign extension builtins are provided.
> +
> +@smallexample
> +vector signed int vec_signexti (vector signed char a)
> +vector signed long long vec_signextll (vector signed char a)
> +vector signed int vec_signexti (vector signed short a)
> +vector signed long long vec_signextll (vector signed short a)
> +vector signed long long vec_signextll (vector signed int a)
> +@end smallexample
> +
> +Each element of the result is produced by sign-extending the element of the
> +input vector that would fall in the least significant portion of the result
> +element. For example, a sign-extension of a vector signed char to a vector
> +signed long long will sign extend the rightmost byte of each doubleword.
> +
>  @node PowerPC AltiVec Built-in Functions Available on ISA 3.1
>  @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
> new file mode 100644
> index 00000000000..7bf979c6fd4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
> @@ -0,0 +1,128 @@
> +/* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
> +
> +/* These builtins were not defined until ISA 3.1 but only require ISA 3.0
> +   support.  */
> +
> +/* { dg-final { scan-assembler-times {\mvextsb2w\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvextsh2w\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvextsh2d\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvextsw2d\M} 1 } } */
> +
> +#include <altivec.h>
> +
> +#define DEBUG 0
> +
> +#if DEBUG
> +#include <stdio.h>
> +#include <stdlib.h>
> +#endif
> +
> +void abort (void);
> +
> +int main ()
> +{
> +  int i;
> +
> +  vector signed char vec_arg_qi, vec_result_qi;
> +  vector signed short int vec_arg_hi, vec_result_hi, vec_expected_hi;
> +  vector signed int vec_arg_wi, vec_result_wi, vec_expected_wi;
> +  vector signed long long vec_result_di, vec_expected_di;
> +
> +  /* test sign extend byte to word */
> +  vec_arg_qi = (vector signed char) {1, 2, 3, 4, 5, 6, 7, 8,
> +				     -1, -2, -3, -4, -5, -6, -7, -8};
> +  vec_expected_wi = (vector signed int) {1, 5, -1, -5};
> +
> +  vec_result_wi = vec_signexti (vec_arg_qi);
> +
> +  for (i = 0; i < 4; i++)
> +    if (vec_result_wi[i] != vec_expected_wi[i]) {
> +#if DEBUG
> +      printf("ERROR: vec_signexti(char, int):  ");
> +      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n",
> +	     i, i);
> +      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
> +      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  /* test sign extend byte to double */
> +  vec_arg_qi = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
> +				    -1, -2, -3, -4, -5, -6, -7, -8};
> +  vec_expected_di = (vector signed long long int){1, -1};
> +
> +  vec_result_di = vec_signextll(vec_arg_qi);
> +
> +  for (i = 0; i < 2; i++)
> +    if (vec_result_di[i] != vec_expected_di[i]) {
> +#if DEBUG
> +      printf("ERROR: vec_signextll(byte, long long int):  ");
> +      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
> +      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
> +      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  /* test sign extend short to word */
> +  vec_arg_hi = (vector signed short int){1, 2, 3, 4, -1, -2, -3, -4};
> +  vec_expected_wi = (vector signed int){1, 3, -1, -3};
> +
> +  vec_result_wi = vec_signexti(vec_arg_hi);
> +
> +  for (i = 0; i < 4; i++)
> +    if (vec_result_wi[i] != vec_expected_wi[i]) {
> +#if DEBUG
> +      printf("ERROR: vec_signexti(short, int):  ");
> +      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n", i, i);
> +      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
> +      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  /* test sign extend short to double word */
> +  vec_arg_hi = (vector signed short int ){1, 3, 5, 7,  -1, -3, -5, -7};
> +  vec_expected_di = (vector signed long long int){1, -1};
> +
> +  vec_result_di = vec_signextll(vec_arg_hi);
> +
> +  for (i = 0; i < 2; i++)
> +    if (vec_result_di[i] != vec_expected_di[i]) {
> +#if DEBUG
> +      printf("ERROR: vec_signextll(short, double):  ");
> +      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
> +      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
> +      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  /* test sign extend word to double word */
> +  vec_arg_wi = (vector signed int ){1, 3, -1, -3};
> +  vec_expected_di = (vector signed long long int){1, -1};
> +
> +  vec_result_di = vec_signextll(vec_arg_wi);
> +
> +  for (i = 0; i < 2; i++)
> +    if (vec_result_di[i] != vec_expected_di[i]) {
> +#if DEBUG
> +      printf("ERROR: vec_signextll(word, double):  ");
> +      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
> +      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
> +      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  return 0;
> +}


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2a/5] rs6000, vec_rlnm builtin fix arguments
  2020-10-05 18:52     ` [PATCH 2a/5] rs6000, vec_rlnm builtin fix arguments Carl Love
@ 2020-10-07 21:08       ` will schmidt
  2020-10-12 20:15         ` Carl Love
  0 siblings, 1 reply; 41+ messages in thread
From: will schmidt @ 2020-10-07 21:08 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Mon, 2020-10-05 at 11:52 -0700, Carl Love wrote:
> Will, Segher:
> 
> 
> 
> The following changes were made from the previous version:
> 
> Per Will's comments, I split the bug fix from patch 2 into a separate
> patch.  This patch is the bug fix for the vec_rlnm builtin.

I recommend trying to keep a clean paragraph appropriate for the commit
log, separate from the history of the patch.  i.e. 

"This patch fixes an error in how the vec_rlnm() builtin parameters
were handled."




> 
> Regression tests reran on Power 9 LE with no regression errors.
> 
> Please let me know if it looks OK to commit to mainline.
> 
>                   Carl 
> 
> ----------------------------------------------------------
> 
> 
> gcc/ChangeLog
> 
> 2020-10-05  Carl Love  <cel@us.ibm.com>
> 
> 	* config/rs6000/altivec.h (vec_rlnm): Fix bug in argument generation.


Is there any testcase impact?  (If this doesn't fix an existing test,
may be worth adding one..)

LGTM,
Thanks
-Will

> ---
>  gcc/config/rs6000/altivec.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index 8a2dcda0144..f7720d136c9 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -183,7 +183,7 @@
>  #define vec_recipdiv __builtin_vec_recipdiv
>  #define vec_rlmi __builtin_vec_rlmi
>  #define vec_vrlnm __builtin_vec_rlnm
> -#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
> +#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
>  #define vec_rsqrt __builtin_vec_rsqrt
>  #define vec_rsqrte __builtin_vec_rsqrte
>  #define vec_signed __builtin_vec_vsigned


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2b/5] RS6000 add 128-bit Integer Operations
  2020-10-05 18:52     ` [PATCH 2b/5] RS6000 add 128-bit Integer Operations Carl Love
@ 2020-10-07 21:53       ` will schmidt
  2020-10-12 20:16         ` Carl Love
  2020-10-13  0:23         ` Segher Boessenkool
  0 siblings, 2 replies; 41+ messages in thread
From: will schmidt @ 2020-10-07 21:53 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Mon, 2020-10-05 at 11:52 -0700, Carl Love wrote:
> Will and Segher:
> 
> This is the rest of the second patch which adds the 128-bit integer
> support for divide, modulo, shift, compare of 128-bit
> integers instructions and builtin support.
> 
> In the last round of changes, the flag for the 128-bit operations was
> removed.  Per Will's comments, the  BU_P10_128BIT_* builtin definitions
> can be removed.  Instead we can just use P10V_BUILTIN. Similarly for
> the BU_P10_P builtin definition.  The commit log was updated to reflect
> the change.  There were a few change log entries for the 128-bit
> operations flag that needed removing.  As well as other fixes noted by
> Will.
> 
> The changes are all name changes not functional changes.  
> 
> No regression failures were found when run on a P9.
> 
> Please let me know if this is ready for mainline.  
> 
>                            Carl
> 
> --------------------------------------------------------
> 
> gcc/ChangeLog
> 
> 	2020-10/05  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
> 	for new builtins.
> 	* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
> 	UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
> 	(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
> 	altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
> 	altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
> 	altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
> 	altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
> 	define_insn.
> 	(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
> 	vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
> 	altivec_vrlqnm): New define_expands.
> 	* config/rs6000/rs6000-builtin.def (VCMPEQUT_P, VCMPGTST_P,
> 	VCMPGTUT_P): Add macro expansions.
> 	(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
> 	CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
> 	VCMPAET_P, VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ,
> 	VSLQ, VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
> 	MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
> 	(VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions.
> 	* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
> 	P10V_BUILTIN_CMPGE_1TI, P10V_BUILTIN_CMPGE_U1TI,
> 	P10V_BUILTIN_VCMPGTUT, P10V_BUILTIN_VCMPGTST,
> 	P10V_BUILTIN_CMPLE_1TI, P10V_BUILTIN_VCMPLE_U1TI,
> 	P10V_BUILTIN_128BIT_DIV_V1TI, P10V_BUILTIN_128BIT_UDIV_V1TI,
> 	P10V_BUILTIN_128BIT_VMULESD, P10V_BUILTIN_128BIT_VMULEUD,
> 	P10V_BUILTIN_128BIT_VMULOSD, P10V_BUILTIN_128BIT_VMULOUD,

Just sniff-checked a few.  Don't see the P10V_BUILTIN_128BIT_* entries
below.

> 	P10V_BUILTIN_VNOR_V1TI, P10V_BUILTIN_VNOR_V1TI_UNS,
> 	P10V_BUILTIN_128BIT_VRLQ, P10V_BUILTIN_128BIT_VRLQMI,
> 	P10V_BUILTIN_128BIT_VRLQNM, P10V_BUILTIN_128BIT_VSLQ,
> 	P10V_BUILTIN_128BIT_VSRQ, P10V_BUILTIN_128BIT_VSRAQ,
> 	P10V_BUILTIN_VCMPGTUT_P, P10V_BUILTIN_VCMPGTST_P,
> 	P10V_BUILTIN_VCMPEQUT_P, P10V_BUILTIN_VCMPGTUT_P,
> 	P10V_BUILTIN_VCMPGTST_P, P10V_BUILTIN_CMPNET,
> 	P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P,
> 	P10V_BUILTIN_128BIT_VSIGNEXTSD2Q, P10V_BUILTIN_128BIT_DIVES_V1TI,
> 	P10V_BUILTIN_128BIT_MODS_V1TI, P10V_BUILTIN_128BIT_MODU_V1TI):
> 	New overloaded definitions.
> 	(rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT,
> 	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
> 	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
> 	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
> 	P10_BUILTIN_CMPLE_U1TI]: New case statements.
> 	(rs6000_init_builtins) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
> 	New assignments.
> 	(altivec_init_builtins): New E_V1TImode case statement.
> 	(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
> 	P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
> 	P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
> 	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.

I don't see these keywords below.  Possibly P10V_*



> 	* config/rs6000/r6000.c (rs6000_handle_altivec_attribute)[E_TImode,
> 	E_V1TImode]: New case statements.
> 	* config/rs6000/r6000.h (RS6000_BTM_TI_VECTOR_OPS): New defines.
> 	(rs6000_builtin_type_index): New enum value RS6000_BTI_bool_V1TI.
> 	* config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti,
> 	vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti,
> 	vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
> 	vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
> 	vlshrv1ti3, vashrv1ti3): New define_expands.
> 	* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
> 	UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
> 	UNSPEC_VSX_MODUQ, UNSPEC_XXSWAPD_V1TI): New unspecs.
> 	(vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti, vsx_diveu_v1ti,
> 	vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti, vsx_sign_extend_v2di_v1ti):
> 	New define_insns.
> 	(vcmpnet): New define_expand.
> 	* gcc/doc/extend.texi: Add documentation for the new builtins vec_rl,
> 	vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
> 	vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,
> 	vec_cmpge, vec_cmple, vec_all_eq, vec_all_ne, vec_all_gt, vec_all_lt,
> 	vec_all_ge, vec_all_le, vec_any_eq, vec_any_ne, vec_any_gt, vec_any_lt,
> 	vec_any_ge, vec_any_le.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-10-05 Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/int_128bit-runnable.c: New test file.
> ---
>  gcc/config/rs6000/altivec.h                   |    4 +
>  gcc/config/rs6000/altivec.md                  |  240 ++
>  gcc/config/rs6000/rs6000-builtin.def          |   46 +-
>  gcc/config/rs6000/rs6000-call.c               |  138 +-
>  gcc/config/rs6000/rs6000.c                    |    1 +
>  gcc/config/rs6000/rs6000.h                    |    4 +-
>  gcc/config/rs6000/vector.md                   |  191 ++
>  gcc/config/rs6000/vsx.md                      |   97 +
>  gcc/doc/extend.texi                           |  174 ++
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 2254 +++++++++++++++++
>  10 files changed, 3145 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index cfa5eda4cd5..fc67073f79c 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -690,6 +690,10 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
> 
>  #ifdef _ARCH_PWR10
> +#define vec_signextq  __builtin_vec_vsignextq
> +#define vec_dive __builtin_vec_dive
> +#define vec_mod  __builtin_vec_mod
> +
>  /* May modify these macro definitions if future capabilities overload
>     with support for different vector argument and result types.  */


ok

>  #define vec_cntlzm(a, b)	__builtin_altivec_vclzdm (a, b)
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 0a2e634d6b0..34a4731342a 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -39,12 +39,16 @@
>     UNSPEC_VMULESH
>     UNSPEC_VMULEUW
>     UNSPEC_VMULESW
> +   UNSPEC_VMULEUD
> +   UNSPEC_VMULESD
>     UNSPEC_VMULOUB
>     UNSPEC_VMULOSB
>     UNSPEC_VMULOUH
>     UNSPEC_VMULOSH
>     UNSPEC_VMULOUW
>     UNSPEC_VMULOSW
> +   UNSPEC_VMULOUD
> +   UNSPEC_VMULOSD
>     UNSPEC_VPKPX
>     UNSPEC_VPACK_SIGN_SIGN_SAT
>     UNSPEC_VPACK_SIGN_UNS_SAT

ok


> @@ -628,6 +632,14 @@
>    "vcmpequ<VI_char> %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "altivec_eqv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(eq:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
> +		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vcmpequq %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_gt<mode>"
>    [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
>  	(gt:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
> @@ -636,6 +648,14 @@
>    "vcmpgts<VI_char> %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_gtv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(gt:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
> +		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vcmpgtsq %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_gtu<mode>"
>    [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
>  	(gtu:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
> @@ -644,6 +664,14 @@
>    "vcmpgtu<VI_char> %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_gtuv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
> +		  (match_operand:V1TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vcmpgtuq %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_eqv4sf"
>    [(set (match_operand:V4SF 0 "altivec_register_operand" "=v")
>  	(eq:V4SF (match_operand:V4SF 1 "altivec_register_operand" "v")
> @@ -1687,6 +1715,19 @@
>   DONE;
>  })
> 
> +(define_expand "vec_widen_umult_even_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
> + DONE;
> +})
> +
>  (define_expand "vec_widen_smult_even_v4si"
>    [(use (match_operand:V2DI 0 "register_operand"))
>     (use (match_operand:V4SI 1 "register_operand"))
> @@ -1700,6 +1741,19 @@
>    DONE;
>  })
> 
> +(define_expand "vec_widen_smult_even_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
> + else
> +    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
>  (define_expand "vec_widen_umult_odd_v16qi"
>    [(use (match_operand:V8HI 0 "register_operand"))
>     (use (match_operand:V16QI 1 "register_operand"))
> @@ -1765,6 +1819,19 @@
>    DONE;
>  })
> 
> +(define_expand "vec_widen_umult_odd_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +

ok


>  (define_expand "vec_widen_smult_odd_v4si"
>    [(use (match_operand:V2DI 0 "register_operand"))
>     (use (match_operand:V4SI 1 "register_operand"))
> @@ -1778,6 +1845,19 @@
>    DONE;
>  })
> 
> +(define_expand "vec_widen_smult_odd_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
>  (define_insn "altivec_vmuleub"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
> @@ -1859,6 +1939,15 @@
>    "vmuleuw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmuleud"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULEUD))]
> +  "TARGET_POWER10"
> +  "vmuleud %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  (define_insn "altivec_vmulouw"
>    [(set (match_operand:V2DI 0 "register_operand" "=v")
>         (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -1868,6 +1957,15 @@
>    "vmulouw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmuloud"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULOUD))]
> +  "TARGET_POWER10"
> +  "vmuloud %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  (define_insn "altivec_vmulesw"
>    [(set (match_operand:V2DI 0 "register_operand" "=v")
>         (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -1877,6 +1975,15 @@
>    "vmulesw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmulesd"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULESD))]
> +  "TARGET_POWER10"
> +  "vmulesd %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  (define_insn "altivec_vmulosw"
>    [(set (match_operand:V2DI 0 "register_operand" "=v")
>         (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -1886,6 +1993,15 @@
>    "vmulosw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmulosd"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULOSD))]
> +  "TARGET_POWER10"
> +  "vmulosd %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  ;; Vector pack/unpack
>  (define_insn "altivec_vpkpx"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
> @@ -1979,6 +2095,15 @@
>    "vrl<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vrlq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +;; rotate amount in needs to be in bits[57:63] of operand2.
> +  "vrlq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_insn "altivec_vrl<VI_char>mi"
>    [(set (match_operand:VIlong 0 "register_operand" "=v")
>          (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
> @@ -1989,6 +2114,33 @@
>    "vrl<VI_char>mi %0,%2,%3"
>    [(set_attr "type" "veclogical")])
> 
> +(define_expand "altivec_vrlqmi"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
> +		      (match_operand:V1TI 2 "vsx_register_operand")
> +		      (match_operand:V1TI 3 "vsx_register_operand")]
> +		     UNSPEC_VRLMI))]
> +  "TARGET_POWER10"
> +{
> +  /* Mask bit begin, end fields need to be in bits [41:55] of 128-bit operand2. */
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[3]));
> +  emit_insn(gen_altivec_vrlqmi_inst (operands[0], operands[1], operands[2], tmp));
> +  DONE;
> +})
> +
> +(define_insn "altivec_vrlqmi_inst"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +		      (match_operand:V1TI 2 "vsx_register_operand" "0")
> +		      (match_operand:V1TI 3 "vsx_register_operand" "v")]
> +		     UNSPEC_VRLMI))]
> +  "TARGET_POWER10"
> +  "vrlqmi %0,%1,%3"
> +  [(set_attr "type" "veclogical")])
> +
>  (define_insn "altivec_vrl<VI_char>nm"
>    [(set (match_operand:VIlong 0 "register_operand" "=v")
>          (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
> @@ -1998,6 +2150,31 @@
>    "vrl<VI_char>nm %0,%1,%2"
>    [(set_attr "type" "veclogical")])
> 
> +(define_expand "altivec_vrlqnm"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
> +		      (match_operand:V1TI 2 "vsx_register_operand")]
> +		     UNSPEC_VRLNM))]
> +  "TARGET_POWER10"
> +{
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vrlqnm_inst (operands[0], operands[1], tmp));
> +  DONE;
> +})

needs space on "emit_insn ("


> +
> +(define_insn "altivec_vrlqnm_inst"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +		      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +		     UNSPEC_VRLNM))]
> +  "TARGET_POWER10"
> +  ;; rotate and mask bits need to be in upper 64-bits of operand2.
> +  "vrlqnm %0,%1,%2"
> +  [(set_attr "type" "veclogical")])
> +
>  (define_insn "altivec_vsl"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -2042,6 +2219,15 @@
>    "vsl<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vslq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
> +  "vslq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_insn "*altivec_vsr<VI_char>"
>    [(set (match_operand:VI2 0 "register_operand" "=v")
>          (lshiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
> @@ -2050,6 +2236,15 @@
>    "vsr<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vsrq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
> +  "vsrq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_insn "*altivec_vsra<VI_char>"
>    [(set (match_operand:VI2 0 "register_operand" "=v")
>          (ashiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
> @@ -2058,6 +2253,15 @@
>    "vsra<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vsraq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
> +  "vsraq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_insn "altivec_vsr"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -2618,6 +2822,18 @@
>    "vcmpequ<VI_char>. %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "altivec_vcmpequt_p"
> +  [(set (reg:CC CR6_REGNO)
> +	(unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v")
> +			   (match_operand:V1TI 2 "altivec_register_operand" "v"))]
> +		   UNSPEC_PREDICATE))
> +   (set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(eq:V1TI (match_dup 1)
> +		 (match_dup 2)))]
> +  "TARGET_POWER10"
> +  "vcmpequq. %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_vcmpgts<VI_char>_p"
>    [(set (reg:CC CR6_REGNO)
>  	(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
> @@ -2630,6 +2846,18 @@
>    "vcmpgts<VI_char>. %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_vcmpgtst_p"
> +  [(set (reg:CC CR6_REGNO)
> +	(unspec:CC [(gt:CC (match_operand:V1TI 1 "register_operand" "v")
> +			   (match_operand:V1TI 2 "register_operand" "v"))]
> +		   UNSPEC_PREDICATE))
> +   (set (match_operand:V1TI 0 "register_operand" "=v")
> +	(gt:V1TI (match_dup 1)
> +		 (match_dup 2)))]
> +  "TARGET_POWER10"
> +  "vcmpgtsq. %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_vcmpgtu<VI_char>_p"
>    [(set (reg:CC CR6_REGNO)
>  	(unspec:CC [(gtu:CC (match_operand:VI2 1 "register_operand" "v")
> @@ -2642,6 +2870,18 @@
>    "vcmpgtu<VI_char>. %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_vcmpgtut_p"
> +  [(set (reg:CC CR6_REGNO)
> +	(unspec:CC [(gtu:CC (match_operand:V1TI 1 "register_operand" "v")
> +			    (match_operand:V1TI 2 "register_operand" "v"))]
> +		   UNSPEC_PREDICATE))
> +   (set (match_operand:V1TI 0 "register_operand" "=v")
> +	(gtu:V1TI (match_dup 1)
> +		  (match_dup 2)))]
> +  "TARGET_POWER10"
> +  "vcmpgtuq. %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_vcmpeqfp_p"
>    [(set (reg:CC CR6_REGNO)
>  	(unspec:CC [(eq:CC (match_operand:V4SF 1 "register_operand" "v")

ok

> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> index 4c2e9460949..14345f9ea9d 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -1178,7 +1178,6 @@
>  		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
>  		     | RS6000_BTC_TERNARY),				\
>  		    CODE_FOR_ ## ICODE)			/* ICODE */
> -
> 

spurious whitespace drop.  (unrelated, but a good change).


>  /* Insure 0 is not a legitimate index.  */
>  BU_SPECIAL_X (RS6000_BUILTIN_NONE, NULL, 0, RS6000_BTC_MISC)
> @@ -2736,6 +2735,10 @@ BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
>  BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
> 
>  /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
> +BU_P10V_AV_2 (VCMPEQUT_P,	"vcmpequt_p",	CONST,	vector_eq_v1ti_p)
> +BU_P10V_AV_2 (VCMPGTST_P,	"vcmpgtst_p",	CONST,	vector_gt_v1ti_p)
> +BU_P10V_AV_2 (VCMPGTUT_P,	"vcmpgtut_p",	CONST,	vector_gtu_v1ti_p)
> +
>  BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
>  BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
>  BU_P10_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
> @@ -2756,7 +2759,38 @@ BU_P10V_VSX_2 (XXGENPCVM_V16QI, "xxgenpcvm_v16qi", CONST, xxgenpcvm_v16qi)
>  BU_P10V_VSX_2 (XXGENPCVM_V8HI, "xxgenpcvm_v8hi", CONST, xxgenpcvm_v8hi)
>  BU_P10V_VSX_2 (XXGENPCVM_V4SI, "xxgenpcvm_v4si", CONST, xxgenpcvm_v4si)
>  BU_P10V_VSX_2 (XXGENPCVM_V2DI, "xxgenpcvm_v2di", CONST, xxgenpcvm_v2di)
> -
> +BU_P10V_AV_2 (VCMPGTUT,		"vcmpgtut",	CONST,	vector_gtuv1ti)
> +BU_P10V_AV_2 (VCMPGTST,		"vcmpgtst",	CONST,	vector_gtv1ti)
> +BU_P10V_AV_2 (VCMPEQUT,		"vcmpequt",	CONST,	eqvv1ti3)
> +BU_P10V_AV_2 (CMPNET,		"vcmpnet",	CONST,	vcmpnet)
> +BU_P10V_AV_2 (CMPGE_1TI,	"cmpge_1ti",    CONST,  vector_nltv1ti)
> +BU_P10V_AV_2 (CMPGE_U1TI,	"cmpge_u1ti",   CONST,  vector_nltuv1ti)
> +BU_P10V_AV_2 (CMPLE_1TI,	"cmple_1ti",    CONST,  vector_ngtv1ti)
> +BU_P10V_AV_2 (CMPLE_U1TI,	"cmple_u1ti",   CONST,  vector_ngtuv1ti)
> +BU_P10V_AV_2 (VNOR_V1TI_UNS,	"vnor_v1ti_uns",CONST,	norv1ti3)
> +BU_P10V_AV_2 (VNOR_V1TI,	"vnor_v1ti",	CONST,	norv1ti3)
> +BU_P10V_AV_2 (VCMPNET_P,	"vcmpnet_p",	CONST,	vector_ne_v1ti_p)
> +BU_P10V_AV_2 (VCMPAET_P,	"vcmpaet_p",	CONST,	vector_ae_v1ti_p)
> +
> +BU_P10V_AV_1 (VSIGNEXTSD2Q,	"vsignext",     CONST,  vsx_sign_extend_v2di_v1ti)
> +
> +BU_P10V_AV_2 (VMULEUD,	"vmuleud",	CONST,	vec_widen_umult_even_v2di)
> +BU_P10V_AV_2 (VMULESD,	"vmulesd",	CONST,	vec_widen_smult_even_v2di)
> +BU_P10V_AV_2 (VMULOUD,	"vmuloud",	CONST,	vec_widen_umult_odd_v2di)
> +BU_P10V_AV_2 (VMULOSD,	"vmulosd",	CONST,	vec_widen_smult_odd_v2di)
> +BU_P10V_AV_2 (VRLQ,		"vrlq",		CONST,	vrotlv1ti3)
> +BU_P10V_AV_2 (VSLQ,		"vslq",		CONST,	vashlv1ti3)
> +BU_P10V_AV_2 (VSRQ,		"vsrq",		CONST,	vlshrv1ti3)
> +BU_P10V_AV_2 (VSRAQ,		"vsraq",	CONST,	vashrv1ti3)
> +BU_P10V_AV_2 (VRLQNM,	"vrlqnm",	CONST,	altivec_vrlqnm)
> +BU_P10V_AV_2 (DIV_V1TI,	"div_1ti",      CONST,  vsx_div_v1ti)
> +BU_P10V_AV_2 (UDIV_V1TI,	"udiv_1ti",     CONST,  vsx_udiv_v1ti)
> +BU_P10V_AV_2 (DIVES_V1TI,	"dives",	CONST,	vsx_dives_v1ti)
> +BU_P10V_AV_2 (DIVEU_V1TI,	"diveu",	CONST,	vsx_diveu_v1ti)
> +BU_P10V_AV_2 (MODS_V1TI,	"mods",		CONST,	vsx_mods_v1ti)
> +BU_P10V_AV_2 (MODU_V1TI,	"modu",		CONST,	vsx_modu_v1ti)
> +
> +BU_P10V_AV_3 (VRLQMI,	"vrlqmi",	CONST,	altivec_vrlqmi)
>  BU_P10V_AV_3 (VEXTRACTBL, "vextdubvlx", CONST, vextractlv16qi)
>  BU_P10V_AV_3 (VEXTRACTHL, "vextduhvlx", CONST, vextractlv8hi)
>  BU_P10V_AV_3 (VEXTRACTWL, "vextduwvlx", CONST, vextractlv4si)
> @@ -2863,6 +2897,12 @@ BU_P10_OVERLOAD_2 (CLRR, "clrr")
>  BU_P10_OVERLOAD_2 (GNB, "gnb")
>  BU_P10_OVERLOAD_4 (XXEVAL, "xxeval")
>  BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
> +BU_P10_OVERLOAD_2 (VRLQ, "vrlq")
> +BU_P10_OVERLOAD_2 (VSLQ, "vslq")
> +BU_P10_OVERLOAD_2 (VSRQ, "vsrq")
> +BU_P10_OVERLOAD_2 (VSRAQ, "vsraq")
> +BU_P10_OVERLOAD_2 (DIVE, "dive")
> +BU_P10_OVERLOAD_2 (MOD, "mod")
> 
>  BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
>  BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
> @@ -2882,6 +2922,8 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
> 
> +BU_P10_OVERLOAD_1 (SIGNEXT, "vsignextq")
> +
> 
Can probably lose the blank line(s) there.


>  BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
>  BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 9e514a01012..87fff5c1c80 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -839,6 +839,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQFP,
>      RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPEQ, VSX_BUILTIN_XVCMPEQDP,
> @@ -885,6 +889,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U2DI,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, 0},
> +
> +  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
> +  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_U1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0},
>    { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTUB,
>      RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTSB,
> @@ -899,8 +909,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTUD,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTUT,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTSD,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTST,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTFP,
>      RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, VSX_BUILTIN_XVCMPGTDP,
> @@ -943,6 +957,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_CMPLE_U2DI,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, 0},
> +  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
> +  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_U1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0},
>    { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTUB,
>      RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTSB,
> @@ -995,6 +1014,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
>    { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIV_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_UDIV_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
> @@ -1789,6 +1814,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_MULE, P8V_BUILTIN_VMULEUW,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULE, P10V_BUILTIN_VMULESD,
> +    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULE, P10V_BUILTIN_VMULEUD,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMULEUB, ALTIVEC_BUILTIN_VMULEUB,
>      RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMULESB, ALTIVEC_BUILTIN_VMULESB,
> @@ -1812,6 +1842,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_MULO, P8V_BUILTIN_VMULOUW,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULO, P10V_BUILTIN_VMULOSD,
> +    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULO, P10V_BUILTIN_VMULOUD,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_MULO, ALTIVEC_BUILTIN_VMULOSH,
>      RS6000_BTI_V4SI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMULOSH, ALTIVEC_BUILTIN_VMULOSH,
> @@ -1854,6 +1889,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI,
>      RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
> @@ -2115,6 +2160,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_RL, P8V_BUILTIN_VRLD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_RL, P10V_BUILTIN_VRLQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_RL, P10V_BUILTIN_VRLQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
>      RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
> @@ -2133,12 +2183,23 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { P9V_BUILTIN_VEC_RLMI, P9V_BUILTIN_VRLDMI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
> +  { P9V_BUILTIN_VEC_RLMI, P10V_BUILTIN_VRLQMI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI },
> +  { P9V_BUILTIN_VEC_RLMI, P10V_BUILTIN_VRLQMI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLWNM,
>      RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
>    { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLDNM,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, 0 },
> +  { P9V_BUILTIN_VEC_RLNM, P10V_BUILTIN_VRLQNM,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +  { P9V_BUILTIN_VEC_RLNM, P10V_BUILTIN_VRLQNM,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
>      RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
> @@ -2155,6 +2216,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SL, P8V_BUILTIN_VSLD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SL, P10V_BUILTIN_VSLQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SL, P10V_BUILTIN_VSLQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTDP,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
>    { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTSP,
> @@ -2351,6 +2417,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SR, P8V_BUILTIN_VSRD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SR, P10V_BUILTIN_VSRQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SR, P10V_BUILTIN_VSRQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
>      RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
> @@ -2379,6 +2450,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SRA, P8V_BUILTIN_VSRAD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SRA, P10V_BUILTIN_VSRAQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SRA, P10V_BUILTIN_VSRAQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
>      RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
> @@ -3996,12 +4072,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTUD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10V_BUILTIN_VCMPGTUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10V_BUILTIN_VCMPGTST_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, ALTIVEC_BUILTIN_VCMPGTFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, VSX_BUILTIN_XVCMPGTDP_P,
> @@ -4066,6 +4146,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10V_BUILTIN_VCMPEQUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10V_BUILTIN_VCMPEQUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, ALTIVEC_BUILTIN_VCMPEQFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
>    { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, VSX_BUILTIN_XVCMPEQDP_P,
> @@ -4117,12 +4201,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10V_BUILTIN_VCMPGTUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10V_BUILTIN_VCMPGTST_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, ALTIVEC_BUILTIN_VCMPGEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
> @@ -4771,6 +4859,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEW,
>      RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI,
> +    RS6000_BTI_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> 
>    /* The following 2 entries have been deprecated.  */
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEB_P,
> @@ -4871,6 +4965,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
>      RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
>      RS6000_BTI_bool_V2DI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> 
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
> @@ -4976,7 +5072,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
>      RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
>      RS6000_BTI_bool_V2DI, 0 },
> -
> +  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
> @@ -5903,6 +6000,21 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>   { P10_BUILTIN_VEC_XVTLSBB_ONES, P10V_BUILTIN_XVTLSBB_ONES,
>      RS6000_BTI_INTSI, RS6000_BTI_unsigned_V16QI, 0, 0 },
> 
> + { P10_BUILTIN_VEC_SIGNEXT, P10V_BUILTIN_VSIGNEXTSD2Q,
> +    RS6000_BTI_V1TI, RS6000_BTI_V2DI, 0, 0 },
> +
> + { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +
>    { RS6000_BUILTIN_NONE, RS6000_BUILTIN_NONE, 0, 0, 0, 0 }
>  };
> 
> @@ -12256,12 +12368,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case ALTIVEC_BUILTIN_VCMPEQUH:
>      case ALTIVEC_BUILTIN_VCMPEQUW:
>      case P8V_BUILTIN_VCMPEQUD:
> +    case P10V_BUILTIN_VCMPEQUT:
>        fold_compare_helper (gsi, EQ_EXPR, stmt);
>        return true;
> 
>      case P9V_BUILTIN_CMPNEB:
>      case P9V_BUILTIN_CMPNEH:
>      case P9V_BUILTIN_CMPNEW:
> +    case P10V_BUILTIN_CMPNET:
>        fold_compare_helper (gsi, NE_EXPR, stmt);
>        return true;
> 
> @@ -12273,6 +12387,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case VSX_BUILTIN_CMPGE_U4SI:
>      case VSX_BUILTIN_CMPGE_2DI:
>      case VSX_BUILTIN_CMPGE_U2DI:
> +    case P10V_BUILTIN_CMPGE_1TI:
> +    case P10V_BUILTIN_CMPGE_U1TI:
>        fold_compare_helper (gsi, GE_EXPR, stmt);
>        return true;
> 
> @@ -12284,6 +12400,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case ALTIVEC_BUILTIN_VCMPGTUW:
>      case P8V_BUILTIN_VCMPGTUD:
>      case P8V_BUILTIN_VCMPGTSD:
> +    case P10V_BUILTIN_VCMPGTUT:
> +    case P10V_BUILTIN_VCMPGTST:
>        fold_compare_helper (gsi, GT_EXPR, stmt);
>        return true;
> 
> @@ -12295,6 +12413,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case VSX_BUILTIN_CMPLE_U4SI:
>      case VSX_BUILTIN_CMPLE_2DI:
>      case VSX_BUILTIN_CMPLE_U2DI:
> +    case P10V_BUILTIN_CMPLE_1TI:
> +    case P10V_BUILTIN_CMPLE_U1TI:
>        fold_compare_helper (gsi, LE_EXPR, stmt);
>        return true;
> 
> @@ -13000,6 +13120,8 @@ rs6000_init_builtins (void)
>  					    ? "__vector __bool long"
>  					    : "__vector __bool long long",
>  					    bool_long_long_type_node, 2);
> +  bool_V1TI_type_node = rs6000_vector_type ("__vector __bool __int128",
> +					    intTI_type_node, 1);
>    pixel_V8HI_type_node = rs6000_vector_type ("__vector __pixel",
>  					     pixel_type_node, 8);
> 
> @@ -13185,6 +13307,10 @@ altivec_init_builtins (void)
>      = build_function_type_list (integer_type_node,
>  				integer_type_node, V2DI_type_node,
>  				V2DI_type_node, NULL_TREE);
> +  tree int_ftype_int_v1ti_v1ti
> +    = build_function_type_list (integer_type_node,
> +				integer_type_node, V1TI_type_node,
> +				V1TI_type_node, NULL_TREE);
>    tree void_ftype_v4si
>      = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
>    tree v8hi_ftype_void
> @@ -13537,6 +13663,9 @@ altivec_init_builtins (void)
>  	case E_VOIDmode:
>  	  type = int_ftype_int_opaque_opaque;
>  	  break;
> +	case E_V1TImode:
> +	  type = int_ftype_int_v1ti_v1ti;
> +	  break;
>  	case E_V2DImode:
>  	  type = int_ftype_int_v2di_v2di;
>  	  break;
> @@ -14136,6 +14265,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
>      case P10V_BUILTIN_XXGENPCVM_V8HI:
>      case P10V_BUILTIN_XXGENPCVM_V4SI:
>      case P10V_BUILTIN_XXGENPCVM_V2DI:
> +    case P10V_BUILTIN_VMULEUD:
> +    case P10V_BUILTIN_VMULOUD:
> +    case P10V_BUILTIN_DIVEU_V1TI:
> +    case P10V_BUILTIN_MODU_V1TI:
>        h.uns_p[0] = 1;
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;
> @@ -14235,10 +14368,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
>      case VSX_BUILTIN_CMPGE_U8HI:
>      case VSX_BUILTIN_CMPGE_U4SI:
>      case VSX_BUILTIN_CMPGE_U2DI:
> +    case P10V_BUILTIN_CMPGE_U1TI:
>      case ALTIVEC_BUILTIN_VCMPGTUB:
>      case ALTIVEC_BUILTIN_VCMPGTUH:
>      case ALTIVEC_BUILTIN_VCMPGTUW:
>      case P8V_BUILTIN_VCMPGTUD:
> +    case P10V_BUILTIN_VCMPGTUT:
> +    case P10V_BUILTIN_VCMPEQUT:
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;
>        break;
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 6f204ca202a..cc629f2d938 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -19546,6 +19546,7 @@ rs6000_handle_altivec_attribute (tree *node,
>      case 'b':
>        switch (mode)
>  	{
> +	case E_TImode: case E_V1TImode: result = bool_V1TI_type_node; break;
>  	case E_DImode: case E_V2DImode: result = bool_V2DI_type_node; break;
>  	case E_SImode: case E_V4SImode: result = bool_V4SI_type_node; break;
>  	case E_HImode: case E_V8HImode: result = bool_V8HI_type_node; break;

ok

> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index bbd8060e143..32ed95cc813 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2322,7 +2322,7 @@ extern int frame_pointer_needed;
>  #define RS6000_BTM_FLOAT128_HW	MASK_FLOAT128_HW /* IEEE 128-bit float h/w.  */
>  #define RS6000_BTM_MMA		MASK_MMA	/* ISA 3.1 MMA.  */
>  #define RS6000_BTM_P10		MASK_POWER10
> -
> +#define RS6000_BTM_TI_VECTOR_OPS MASK_TI_VECTOR_OPS /* 128-bit integer support */

No other uses of BTM_TI_VECTOR_OPS in this patch.  possibly later in
series ? 

> 
>  #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
>  				 | RS6000_BTM_VSX			\
> @@ -2436,6 +2436,7 @@ enum rs6000_builtin_type_index
>    RS6000_BTI_bool_V8HI,          /* __vector __bool short */
>    RS6000_BTI_bool_V4SI,          /* __vector __bool int */
>    RS6000_BTI_bool_V2DI,          /* __vector __bool long */
> +  RS6000_BTI_bool_V1TI,          /* __vector __bool 128-bit */
>    RS6000_BTI_pixel_V8HI,         /* __vector __pixel */
>    RS6000_BTI_long,	         /* long_integer_type_node */
>    RS6000_BTI_unsigned_long,      /* long_unsigned_type_node */
> @@ -2489,6 +2490,7 @@ enum rs6000_builtin_type_index
>  #define bool_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V8HI])
>  #define bool_V4SI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V4SI])
>  #define bool_V2DI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V2DI])
> +#define bool_V1TI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V1TI])
>  #define pixel_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_pixel_V8HI])
> 
>  #define long_long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long_long])
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 796345c80d3..0cca4232619 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -685,6 +685,13 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gtv1ti"
> +  [(set (match_operand:V1TI 0 "vlogical_operand")
> +	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
> +		 (match_operand:V1TI 2 "vlogical_operand")))]
> +  "TARGET_POWER10"
> +  "")
> +
>  ; >= for integer vectors: swap operands and apply not-greater-than
>  (define_expand "vector_nlt<mode>"
>    [(set (match_operand:VEC_I 3 "vlogical_operand")
> @@ -697,6 +704,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_nltv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gt:V1TI (match_operand:V1TI 2 "vlogical_operand")
> +		 (match_operand:V1TI 1 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +        (not:V1TI (match_dup 3)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
>  (define_expand "vector_gtu<mode>"
>    [(set (match_operand:VEC_I 0 "vint_operand")
>  	(gtu:VEC_I (match_operand:VEC_I 1 "vint_operand")
> @@ -704,6 +722,13 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gtuv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand")
> +	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand")
> +		  (match_operand:V1TI 2 "altivec_register_operand")))]
> +  "TARGET_POWER10"
> +  "")
> +
>  ; >= for integer vectors: swap operands and apply not-greater-than
>  (define_expand "vector_nltu<mode>"
>    [(set (match_operand:VEC_I 3 "vlogical_operand")
> @@ -716,6 +741,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_nltuv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gtu:V1TI (match_operand:V1TI 2 "vlogical_operand")
> +		  (match_operand:V1TI 1 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +	(not:V1TI (match_dup 3)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
>  (define_expand "vector_geu<mode>"
>    [(set (match_operand:VEC_I 0 "vint_operand")
>  	(geu:VEC_I (match_operand:VEC_I 1 "vint_operand")
> @@ -735,6 +771,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_ngtv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
> +		 (match_operand:V1TI 2 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +        (not:V1TI (match_dup 3)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
>  (define_expand "vector_ngtu<mode>"
>    [(set (match_operand:VEC_I 3 "vlogical_operand")
>  	(gtu:VEC_I (match_operand:VEC_I 1 "vlogical_operand")
> @@ -746,6 +793,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_ngtuv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gtu:V1TI (match_operand:V1TI 1 "vlogical_operand")
> +		  (match_operand:V1TI 2 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +        (not:V1TI (match_dup 3)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
>  ; There are 14 possible vector FP comparison operators, gt and eq of them have
>  ; been expanded above, so just support 12 remaining operators here.
> 
> @@ -894,6 +952,18 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_eq_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			     (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_operand:V1TI 0 "vlogical_operand")
> +	  (eq:V1TI (match_dup 1)
> +		   (match_dup 2)))])]
> +  "TARGET_POWER10"
> +  "")
> +
>  ;; This expansion handles the V16QI, V8HI, and V4SI modes in the
>  ;; implementation of the vec_all_ne built-in functions on Power9.
>  (define_expand "vector_ne_<mode>_p"
> @@ -976,6 +1046,23 @@
>    operands[3] = gen_reg_rtx (V2DImode);
>  })
> 
> +(define_expand "vector_ne_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			     (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_dup 3)
> +	  (eq:V1TI (match_dup 1)
> +		   (match_dup 2)))])
> +   (set (match_operand:SI 0 "register_operand" "=r")
> +	(eq:SI (reg:CC CR6_REGNO)
> +	       (const_int 0)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx (V1TImode);
> +})
> +
>  ;; This expansion handles the V2DI mode in the implementation of the
>  ;; vec_any_eq built-in function on Power9.
>  ;;
> @@ -1002,6 +1089,26 @@
>    operands[3] = gen_reg_rtx (V2DImode);
>  })
> 
> +(define_expand "vector_ae_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			     (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_dup 3)
> +	  (eq:V1TI (match_dup 1)
> +		   (match_dup 2)))])
> +   (set (match_operand:SI 0 "register_operand" "=r")
> +	(eq:SI (reg:CC CR6_REGNO)
> +	       (const_int 0)))
> +   (set (match_dup 0)
> +	(xor:SI (match_dup 0)
> +		(const_int 1)))]
> +  "TARGET_POWER10"
> +{
> +  operands[3] = gen_reg_rtx (V1TImode);
> +})
> +
>  ;; This expansion handles the V4SF and V2DF modes in the Power9
>  ;; implementation of the vec_all_ne built-in functions.  Note that the
>  ;; expansions for this pattern with these modes makes no use of power9-
> @@ -1061,6 +1168,18 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gt_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(gt:CC (match_operand:V1TI 1 "vlogical_operand")
> +			     (match_operand:V1TI 2 "vlogical_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_operand:V1TI 0 "vlogical_operand")
> +	  (gt:V1TI (match_dup 1)
> +		   (match_dup 2)))])]
> +  "TARGET_POWER10"
> +  "")
> +
>  (define_expand "vector_ge_<mode>_p"
>    [(parallel
>      [(set (reg:CC CR6_REGNO)
> @@ -1085,6 +1204,18 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gtu_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(gtu:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			      (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_operand:V1TI 0 "altivec_register_operand")
> +	  (gtu:V1TI (match_dup 1)
> +		    (match_dup 2)))])]
> +  "TARGET_POWER10"
> +  "")
> +
>  ;; AltiVec/VSX predicates.
> 
>  ;; This expansion is triggered during expansion of predicate built-in
> @@ -1460,6 +1591,20 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vrotlv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vrlq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
>  ;; Expanders for rotatert to make use of vrotl
>  (define_expand "vrotr<mode>3"
>    [(set (match_operand:VEC_I 0 "vint_operand")
> @@ -1481,6 +1626,21 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +;; No immediate version of this 128-bit instruction

I'm not sure the value of that comment.  (also below).

> +(define_expand "vashlv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
>  ;; Expanders for logical shift right on each vector element
>  (define_expand "vlshr<mode>3"
>    [(set (match_operand:VEC_I 0 "vint_operand")
> @@ -1489,6 +1649,21 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +;; No immediate version of this 128-bit instruction
> +(define_expand "vlshrv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vsrq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
>  ;; Expanders for arithmetic shift right on each vector element
>  (define_expand "vashr<mode>3"
>    [(set (match_operand:VEC_I 0 "vint_operand")
> @@ -1496,6 +1671,22 @@
>  			(match_operand:VEC_I 2 "vint_operand")))]
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> +
> +;; No immediate version of this 128-bit instruction
> +(define_expand "vashrv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vsraq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
> 

ok


>  ;; Vector reduction expanders for VSX
>  ; The (VEC_reduc:...
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 31fcffe8f33..5b6a0bd728a 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -298,6 +298,12 @@
>     UNSPEC_VSX_XXSPLTD
>     UNSPEC_VSX_DIVSD
>     UNSPEC_VSX_DIVUD
> +   UNSPEC_VSX_DIVSQ
> +   UNSPEC_VSX_DIVUQ
> +   UNSPEC_VSX_DIVESQ
> +   UNSPEC_VSX_DIVEUQ
> +   UNSPEC_VSX_MODSQ
> +   UNSPEC_VSX_MODUQ
>     UNSPEC_VSX_MULSD
>     UNSPEC_VSX_SIGN_EXTEND
>     UNSPEC_VSX_XVCVBF16SPN
> @@ -1732,6 +1738,60 @@
>  }
>    [(set_attr "type" "div")])
> 
> +(define_insn "vsx_div_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVSQ))]
> +  "TARGET_POWER10"
> +  "vdivsq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_udiv_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVUQ))]
> +  "TARGET_POWER10"
> +  "vdivuq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_dives_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVESQ))]
> +  "TARGET_POWER10"
> +  "vdivesq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_diveu_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVEUQ))]
> +  "TARGET_POWER10"
> +  "vdiveuq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_mods_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_MODSQ))]
> +  "TARGET_POWER10"
> +  "vmodsq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_modu_v1ti"
> + [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_MODUQ))]
> +  "TARGET_POWER10"
> +  "vmoduq %0,%1,%2"
> +  [(set_attr "type" "div")])

ok


> +
>  ;; *tdiv* instruction returning the FG flag
>  (define_expand "vsx_tdiv<mode>3_fg"
>    [(set (match_dup 3)
> @@ -3083,6 +3143,21 @@
>    "xxpermdi %x0,%x1,%x1,2"
>    [(set_attr "type" "vecperm")])
> 
> +;; Swap upper/lower 64-bit values in a 128-bit vector
> +(define_insn "xxswapd_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(subreg:V1TI
> +           (vec_select:V2DI
> +             (subreg:V2DI
> +                (match_operand:V1TI 1 "vsx_register_operand" "v") 0 )
> +	     (parallel [(const_int 1)(const_int 0)]))
> +           0))]
> +  "TARGET_POWER10"
> +;; AIX does not support extended mnemonic xxswapd.  Use the basic
> +;; mnemonic xxpermdi instead.

I'd wonder if there can be additional logic using ( DEFAULT_ABI ==
ABI_AIX ) sort of check to resolve this.  It looks like this same
comment exists in multiple places througout our *.md files, so not
something that needs to be solved here today. 


> +  "xxpermdi %x0,%x1,%x1,2"
> +  [(set_attr "type" "vecperm")])
> +
>  (define_insn "xxgenpcvm_<mode>_internal"
>    [(set (match_operand:VSX_EXTRACT_I4 0 "altivec_register_operand" "=wa")
>  	(unspec:VSX_EXTRACT_I4
> @@ -4767,6 +4842,15 @@
>     (set_attr "type" "vecload")])
> 
> 
> +;; ISA 3.1 vector extend sign support
> +(define_insn "vsx_sign_extend_v2di_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
> +		     UNSPEC_VSX_SIGN_EXTEND))]
> +  "TARGET_POWER10"
> +  "vextsd2q %0,%1"
> +  [(set_attr "type" "vecexts")])
> +
>  ;; ISA 3.0 vector extend sign support
> 
>  (define_insn "vsx_sign_extend_qi_<mode>"
> @@ -5451,6 +5535,19 @@
>    "vcmpneb %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +;; Vector Compare Not Equal v1ti (specified/not+eq:)
> +(define_expand "vcmpnet"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand")
> +	(not:V1TI
> +	  (eq:V1TI (match_operand:V1TI 1 "altivec_register_operand")
> +		   (match_operand:V1TI 2 "altivec_register_operand"))))]
> +   "TARGET_POWER10"
> +{
> +  emit_insn (gen_eqvv1ti3 (operands[0], operands[1], operands[2]));
> +  emit_insn (gen_one_cmplv1ti2 (operands[0], operands[0]));
> +  DONE;
> +})
> +
>  ;; Vector Compare Not Equal or Zero Byte
>  (define_insn "vcmpnezb"
>    [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c1c2c9f9bf7..99cc053acbe 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21316,6 +21316,180 @@ Generate PCV from specified Mask size, as if implemented by the
>  immediate value is either 0, 1, 2 or 3.
>  @findex vec_genpcvm
> 
> +@smallexample
> +@exdent vector unsigned __int128 vec_rl (vector unsigned __int128,
> +                                         vector unsigned __int128);
> +@exdent vector signed __int128 vec_rl (vector signed __int128,
> +                                       vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of rotating the first input left by the number of bits
> +specified in the most significant quad word of the second input truncated to
> +7 bits (bits [125:131]).
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
> +                                           vector unsigned __int128,
> +                                           vector unsigned __int128);
> +@exdent vector signed __int128 vec_rlmi (vector signed __int128,
> +                                         vector signed __int128,
> +                                         vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of rotating the first input and inserting it under mask
> +into the second input.  The first bit in the mask, the last bit in the mask are
> +obtained from the two 7-bit fields bits [108:115] and bits [117:123]
> +respectively of the second input.  The shift is obtained from the third input
> +in the 7-bit field [125:131] where all bits counted from zero at the left.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
> +                                           vector unsigned __int128,
> +                                           vector unsigned __int128);
> +@exdent vector signed __int128 vec_rlnm (vector signed __int128,
> +                                         vector unsigned __int128,
> +                                         vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of rotating the first input and ANDing it with a mask.  The
> +first bit in the mask and the last bit in the mask are obtained from the two
> +7-bit fields bits [117:123] and bits [125:131] respectively of the second
> +input.  The shift is obtained from the third input in the 7-bit field bits
> +[125:131] where all bits counted from zero at the left.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_sl(vector unsigned __int128, vector unsigned __int128);
> +@exdent vector signed __int128 vec_sl(vector signed __int128, vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of shifting the first input left by the number of bits
> +specified in the most significant bits of the second input truncated to
> +7 bits (bits [125:131]).
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_sr(vector unsigned __int128, vector unsigned __int128);
> +@exdent vector signed __int128 vec_sr(vector signed __int128, vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of performing a logical right shift of the first argument
> +by the number of bits specified in the most significant double word of the
> +second input truncated to 7 bits (bits [125:131]).
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_sra(vector unsigned __int128, vector unsigned __int128);
> +@exdent vector signed __int128 vec_sra(vector signed __int128, vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of performing arithmetic right shift of the first argument
> +by the number of bits specified in the most significant bits of the
> +second input truncated to 7 bits (bits [125:131]).
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_mule (vector unsigned long long,
> +                                           vector unsigned long long);
> +@exdent vector signed __int128 vec_mule (vector signed long long,
> +                                         vector signed long long);
> +@end smallexample
> +
> +Returns a vector containing a 128-bit integer result of multiplying the even
> +doubleword elements of the two inputs.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_mulo (vector unsigned long long,
> +                                           vector unsigned long long);
> +@exdent vector signed __int128 vec_mulo (vector signed long long,
> +                                         vector signed long long);
> +@end smallexample
> +
> +Returns a vector containing a 128-bit integer result of multiplying the odd
> +doubleword elements of the two inputs.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_div (vector unsigned __int128,
> +                                          vector unsigned __int128);
> +@exdent vector signed __int128 vec_div (vector signed __int128,
> +                                        vector signed __int128);
> +@end smallexample
> +
> +Returns the result of dividing the first operand by the second operand. An
> +attempt to divide any value by zero or to divide the most negative signed
> +128-bit integer by negative one results in an undefined value.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
> +                                           vector unsigned __int128);
> +@exdent vector signed __int128 vec_dive (vector signed __int128,
> +                                         vector signed __int128);
> +@end smallexample
> +
> +The result is produced by shifting the first input left by 128 bits and
> +dividing by the second.  If an attempt is made to divide by zero or the result
> +is larger than 128 bits, the result is undefined.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
> +                                          vector unsigned __int128);
> +@exdent vector signed __int128 vec_mod (vector signed __int128,
> +                                        vector signed __int128);
> +@end smallexample
> +
> +The result is the modulo result of dividing the first input  by the second
> +input.
> +
> +The following builtins perform 128-bit vector comparisons.  The
> +@code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is
> +one of the operations @code{eq, ne, gt, lt, ge, le} perform pairwise
> +comparisons between the elements at the same positions within their two vector
> +arguments.  The @code{vec_all_xx}function returns a non-zero value if and only
> +if all pairwise comparisons are true.  The @code{vec_any_xx} function returns
> +a non-zero value if and only if at least one pairwise comparison is true.  The
> +@code{vec_cmpxx}function returns a vector of the same type as its two
> +arguments, within which each element consists of all ones to denote that
> +specified logical comparison of the corresponding elements was true.
> +Otherwise, the element of the returned vector contains all zeros.
> +
> +@smallexample
> +vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
> +
> +int vec_all_eq (vector signed __int128, vector signed __int128);
> +int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_ne (vector signed __int128, vector signed __int128);
> +int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_gt (vector signed __int128, vector signed __int128);
> +int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_lt (vector signed __int128, vector signed __int128);
> +int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_ge (vector signed __int128, vector signed __int128);
> +int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_le (vector signed __int128, vector signed __int128);
> +int vec_all_le (vector unsigned __int128, vector unsigned __int128);
> +
> +int vec_any_eq (vector signed __int128, vector signed __int128);
> +int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_ne (vector signed __int128, vector signed __int128);
> +int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_gt (vector signed __int128, vector signed __int128);
> +int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_lt (vector signed __int128, vector signed __int128);
> +int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_ge (vector signed __int128, vector signed __int128);
> +int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_le (vector signed __int128, vector signed __int128);
> +int vec_any_le (vector unsigned __int128, vector unsigned __int128);
> +@end smallexample
> +
> +
>  @node PowerPC Hardware Transactional Memory Built-in Functions
>  @subsection PowerPC Hardware Transactional Memory Built-in Functions
>  GCC provides two interfaces for accessing the Hardware Transactional
> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> new file mode 100644
> index 00000000000..85ad544e22b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -0,0 +1,2254 @@
> +/* { dg-do run } */
> +/* { dg-options "-mcpu=power10 -O2 -save-temps" } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-require-effective-target ppc_native_128bit } */

I don't see any other uses of the target option for ppc_native_128bit
in my tree ?


> +
> +/* Check that the expected 128-bit instructions are generated if the processor
> +   supports the 128-bit integer instructions. */
> +/* { dg-final { scan-assembler-times {\mvextsd2q\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvslq\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvsrq\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvsraq\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpuq\M} 0 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpsq\M} 0 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpequq\M} 0 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpequq.\M} 16 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 0 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtsq.\M} 16 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 0 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtuq.\M} 16 } } */
> +/* { dg-final { scan-assembler-times {\mvmuleud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivsq\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivuq\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdivesq\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvdiveuq\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmodsq\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvmoduq\M} 1 } } */


Probably OK.  I'd be tempted to split this up just because.


No further comments, 
lgtm
Thanks
-Will



> +
> +#include <altivec.h>
> +
> +#define DEBUG 0
> +
> +#if DEBUG
> +#include <stdio.h>
> +#include <stdlib.h>
> +
> +
> +void print_i128(__int128_t val)
> +{
> +  printf(" %lld %llu (0x%llx %llx)",
> +	 (signed long long)(val >> 64),
> +	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF),
> +	 (unsigned long long)(val >> 64),
> +	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
> +}
> +#endif
> +
> +void abort (void);
> +
> +int main ()
> +{
> +  int i, result_int;
> +
> +  __int128_t arg1, result;
> +  __uint128_t uarg2;
> +
> +  vector signed long long int vec_arg1_di, vec_arg2_di;
> +  vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
> +  vector unsigned long long int vec_uresult_di;
> +  vector unsigned long long int vec_uexpected_result_di;
> +  
> +  __int128_t expected_result;
> +  __uint128_t uexpected_result;
> +
> +  vector __int128_t vec_arg1, vec_arg2, vec_result;
> +  vector __uint128_t vec_uarg1, vec_uarg2, vec_uarg3, vec_uresult;
> +  vector bool __int128  vec_result_bool;
> +
> +  /* sign extend double to 128-bit integer  */
> +  vec_arg1_di[0] = 1000;
> +  vec_arg1_di[1] = -123456;
> +
> +  expected_result = 1000;
> +
> +  vec_result = vec_signextq (vec_arg1_di);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1_di[0] = -123456;
> +  vec_arg1_di[1] = 1000;
> +
> +  expected_result = -123456;
> +
> +  vec_result = vec_signextq (vec_arg1_di);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  /* test shift 128-bit integers.
> +     Note, shift amount is given by the lower 7-bits of the shift amount. */
> +  vec_arg1[0] = 3;
> +  vec_uarg2[0] = 2;
> +  expected_result = vec_arg1[0]*4;
> +
> +  vec_result = vec_sl (vec_arg1, vec_uarg2);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sl(int128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" << %lld", vec_uarg2[0] & 0xFF);
> +    printf(" = ");
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  arg1 = 3;
> +  uarg2 = 4;
> +  expected_result = arg1*16;
> +
> +  result = arg1 << uarg2;
> +
> +  if (result != expected_result) {
> +#if DEBUG
> +    printf("ERROR: int128 << uint128):  ");
> +    print_i128(arg1);
> +    printf(" << %lld", uarg2 & 0xFF);
> +    printf(" = ");
> +    print_i128(result);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 3;
> +  vec_uarg2[0] = 2;
> +  uexpected_result = vec_uarg1[0]*4;
> +  
> +  vec_uresult = vec_sl (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sl(uint128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" << %lld", vec_uarg2[0] & 0xFF);
> +    printf(" = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12;
> +  vec_uarg2[0] = 2;
> +  expected_result = vec_arg1[0]/4;
> +
> +  vec_result = vec_sr (vec_arg1, vec_uarg2);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sr(int128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" >> %lld", vec_uarg2[0] & 0xFF);
> +    printf(" = ");
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 48;
> +  vec_uarg2[0] = 2;
> +  uexpected_result = vec_uarg1[0]/4;
> +  
> +  vec_uresult = vec_sr (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sr(uint128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" >> %lld", vec_uarg2[0] & 0xFF);
> +    printf(" = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  arg1 = 48;
> +  uarg2 = 4;
> +  expected_result = arg1/16;
> +
> +  result = arg1 >> uarg2;
> +
> +  if (result != expected_result) {
> +#if DEBUG
> +    printf("ERROR: int128 >> uint128:  ");
> +    print_i128(arg1);
> +    printf(" >> %lld", uarg2 & 0xFF);
> +    printf(" = ");
> +    print_i128(result);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 0x1234567890ABCDEFULL;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg2[0] = 32;
> +  expected_result = 0x0000000012345678ULL;
> +  expected_result = (expected_result << 64) | 0x90ABCDEFAABBCCDDULL;
> +
> +  vec_result = vec_sra (vec_arg1, vec_uarg2);
> +  
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sra(int128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" >> %lld = \n", vec_uarg2[0]);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
> +  vec_uarg2[0] = 48;
> +  uexpected_result = 0xFFFFFFFFFFFFAABBLL;
> +  uexpected_result = (uexpected_result << 64) | 0xCCDDEEFF11221234ULL;
> +
> +  vec_uresult = vec_sra (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_sra(uint128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" >> %lld = \n", vec_uarg2[0] & 0xFF);
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 0x1234567890ABCDEFULL;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg2[0] = 32;
> +  expected_result = 0x90ABCDEFAABBCCDDULL;
> +  expected_result = (expected_result << 64) | 0xEEFF112212345678ULL;
> +
> +  vec_result = vec_rl (vec_arg1, vec_uarg2);
> +  
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rl(int128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" >> %lld = \n", vec_uarg2[0]);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
> +  vec_uarg2[0] = 48;
> +  uexpected_result = 0x11221234567890ABULL;
> +  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEEFFULL;
> +
> +  vec_uresult = vec_rl (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rl(uint128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" >> %lld = \n", vec_uarg2[0]);
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 0x1234567890ABCDEFULL;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg2[0] = (32 << 8) | 95;
> +  vec_uarg3[0] = 32;
> +  expected_result = 0xaabbccddULL;
> +  expected_result = (expected_result << 64) | 0xeeff112200000000ULL;
> +
> +  vec_result = vec_rlnm (vec_arg1, vec_uarg2, vec_uarg3);
> +  
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rlnm(int128, uint128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
> +  vec_uarg2[0] = (8 << 8) | 119;
> +  vec_uarg3[0] = 48;
> +
> +  uexpected_result = 0x00221234567890ABULL;
> +  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEE00ULL;
> +
> +  vec_uresult = vec_rlnm (vec_uarg1, vec_uarg2, vec_uarg3);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rlnm(uint128, uint128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 0x1234567890ABCDEFULL;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
> +  vec_arg2[0] = 0x000000000000DEADULL;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 0x0000BEEF00000000ULL;
> +  vec_uarg3[0] = 96 << 16 | 127 << 8 | 32;
> +  expected_result = 0x000000000000DEADULL;
> +  expected_result = (expected_result << 64) | 0x0000BEEF12345678ULL;
> +
> +  vec_result = vec_rlmi (vec_arg1, vec_arg2, vec_uarg3);
> +  
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rlmi(int128, int128, uint128):  ");
> +    print_i128(vec_arg1[0]);
> +    printf(" << %lld = \n", vec_uarg2_di[1] & 0xFF);
> +    print_i128(vec_result[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
> +  vec_uarg2[0] = 0xDEAD000000000000ULL;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 0x000000000000BEEFULL;
> +  vec_uarg3[0] = 16 << 16 | 111 << 8 | 48;
> +  uexpected_result = 0xDEAD1234567890ABULL;
> +  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDBEEFULL;
> +
> +  vec_uresult = vec_rlmi (vec_uarg1, vec_uarg2, vec_uarg3);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_rlmi(uint128, unit128, uint128):  ");
> +    print_i128(vec_uarg1[0]);
> +    printf(" << %lld = \n", vec_uarg3[1] & 0xFF);
> +    print_i128(vec_uresult[0]);
> +    printf("\n does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  /* 128-bit compare tests, result is all 1's if true */
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1[0] = 2468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  uexpected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  uexpected_result = (uexpected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpgt (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned vec_cmpgt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpgt (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed vec_cmpgt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = -1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR:not equal signed vec_cmpeq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed equal vec_cmpeq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 12468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  not equal vec_cmpeq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: equal unsigned vec_cmpeq ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 12468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  not equal vec_cmpne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: equal unsigned vec_cmpne ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = -1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR:not equal signed vec_cmpne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed equal vec_cmpne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 12468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 > arg2 vec_cmplt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 1234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 12468;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 < arg2 vec_cmplt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR:  unsigned arg1 = arg2 vec_cmplt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = -1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 > arg2 vec_cmplt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = -1234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 12468;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 < arg2 vec_cmplt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +  expected_result = 0x0ULL;
> +
> +  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_cmplt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +   
> +  vec_uarg1[0] = 12468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 > arg2 vec_cmple ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 1234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 12468;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 < arg2 vec_cmple ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR:  unsigned arg1 = arg2 vec_cmple ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = -1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 > arg2 vec_cmple ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = -1234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 12468;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 < arg2 vec_cmple ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_cmple ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 12468;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 > arg2 vec_cmpge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 1234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 12468;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: unsigned  arg1 < arg2 vec_cmpge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR:  unsigned arg1 = arg2 vec_cmpge ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = 12468;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = -1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 > arg2 vec_cmpge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg1[0] = -1234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 12468;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  expected_result = 0x0;
> +
> +  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed  arg1 < arg2 vec_cmpge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +  expected_result = 0xFFFFFFFFFFFFFFFFULL;
> +  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
> +
> +  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
> +
> +  if (vec_result_bool[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_cmpge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.");
> +    print_i128(vec_result_bool[0]);
> +    printf("\n Result does not match expected_result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_eq (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_eq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_eq (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_eq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_eq ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_eq ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_ne (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_ne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_ne (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_ne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ne ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ne ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_lt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_lt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_lt (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_lt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_lt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_lt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_le (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_le ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_le (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_le ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_le (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_le ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_le (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_le ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_gt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_gt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_gt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_gt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_gt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_gt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_all_ge (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_all_ge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_ge (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_all_ge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ge ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ge ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_eq (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_eq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_eq (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_eq ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_eq ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_eq ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_ne (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_ne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_ne (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_ne ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ne ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_ne ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_lt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_lt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_lt (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_lt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_lt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_lt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_gt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_gt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_gt (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_gt ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_gt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_le (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_le ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_le (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_le ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_le (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_le ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_le (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_le ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +  vec_arg1 = vec_arg2;
> +
> +  result_int = vec_any_ge (vec_arg1, vec_arg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 = arg2 vec_any_ge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1[0] = -234;
> +  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
> +  vec_arg2[0] = 1234;
> +  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_ge (vec_arg1, vec_arg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: signed arg1 != arg2 vec_any_ge ( ");
> +    print_i128(vec_arg1[0]);
> +    printf(", ");
> +    print_i128(vec_arg2[0]);
> +    printf(") failed.\n\n");
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +  vec_uarg1 = vec_uarg2;
> +
> +  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
> +
> +  if (!result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ge ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1[0] = 234;
> +  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
> +  vec_uarg2[0] = 1234;
> +  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
> +
> +  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
> +
> +  if (result_int) {
> +#if DEBUG
> +    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
> +    print_i128(vec_uarg1[0]);
> +    printf(", ");
> +    print_i128(vec_uarg2[0]);
> +    printf(") failed.\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  /* Vector multiply Even and Odd tests */
> +  vec_arg1_di[0] = 200;
> +  vec_arg1_di[1] = 400;
> +  vec_arg2_di[0] = 1234;
> +  vec_arg2_di[1] = 4567;
> +  expected_result = vec_arg1_di[0] * vec_arg2_di[0];
> +
> +  vec_result = vec_mule (vec_arg1_di, vec_arg2_di);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mule (signed, signed) failed.\n");
> +    printf(" vec_arg1_di[0] = %lld\n", vec_arg1_di[0]);
> +    printf(" vec_arg2_di[0] = %lld\n", vec_arg2_di[0]);
> +    printf("Result = ");
> +    print_i128(vec_result[0]);
> +    printf("\nExpected Result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_arg1_di[0] = -200;
> +  vec_arg1_di[1] = -400;
> +  vec_arg2_di[0] = 1234;
> +  vec_arg2_di[1] = 4567;
> +  expected_result = vec_arg1_di[1] * vec_arg2_di[1];
> +
> +  vec_result = vec_mulo (vec_arg1_di, vec_arg2_di);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mulo (signed, signed) failed.\n");
> +    printf(" vec_arg1_di[1] = %lld\n", vec_arg1_di[1]);
> +    printf(" vec_arg2_di[1] = %lld\n", vec_arg2_di[1]);
> +    printf("Result = ");
> +    print_i128(vec_result[0]);
> +    printf("\nExpected Result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1_di[0] = 200;
> +  vec_uarg1_di[1] = 400;
> +  vec_uarg2_di[0] = 1234;
> +  vec_uarg2_di[1] = 4567;
> +  uexpected_result = vec_uarg1_di[0] * vec_uarg2_di[0];
> +
> +  vec_uresult = vec_mule (vec_uarg1_di, vec_uarg2_di);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mule (unsigned, unsigned) failed.\n");
> +    printf(" vec_uarg1_di[1] = %lld\n", vec_uarg1_di[1]);
> +    printf(" vec_uarg2_di[1] = %lld\n", vec_uarg2_di[1]);
> +    printf("Result = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\nExpected Result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +  
> +  vec_uarg1_di[0] = 200;
> +  vec_uarg1_di[1] = 400;
> +  vec_uarg2_di[0] = 1234;
> +  vec_uarg2_di[1] = 4567;
> +  uexpected_result = vec_uarg1_di[1] * vec_uarg2_di[1];
> +
> +  vec_uresult = vec_mulo (vec_uarg1_di, vec_uarg2_di);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mulo (unsigned, unsigned) failed.\n");
> +    printf(" vec_uarg1_di[0] = %lld\n", vec_uarg1_di[0]);
> +    printf(" vec_uarg2_di[0] = %lld\n", vec_uarg2_di[0]);
> +    printf("Result = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\nExpected Result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  /* Vector Divide Quadword */
> +  vec_arg1[0] = -12345678;
> +  vec_arg2[0] = 2;
> +  expected_result = -6172839;
> +
> +  vec_result = vec_div (vec_arg1, vec_arg2);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_div (signed, signed) failed.\n");
> +    printf("vec_arg1[0] = ");
> +    print_i128(vec_arg1[0]);
> +    printf("\nvec_arg2[0] = ");
> +    print_i128(vec_arg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_result[0]);
> +    printf("\nExpected result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 24680;
> +  vec_uarg2[0] = 4;
> +  uexpected_result = 6170;
> +
> +  vec_uresult = vec_div (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_div (unsigned, unsigned) failed.\n");
> +    printf("vec_uarg1[0] = ");
> +    print_i128(vec_uarg1[0]);
> +    printf("\nvec_uarg2[0] = ");
> +    print_i128(vec_uarg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\nExpected result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  /* Vector Divide Extended Quadword */
> +  vec_arg1[0] = -20;        // has 128-bit of zero concatenated onto it
> +  vec_arg2[0] = 0x2000000000000000;
> +  vec_arg2[0] = vec_arg2[0] << 64;
> +  expected_result = -160;
> +
> +  vec_result = vec_dive (vec_arg1, vec_arg2);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_dive (signed, signed) failed.\n");
> +    printf("vec_arg1[0] = ");
> +    print_i128(vec_arg1[0]);
> +    printf("\nvec_arg2[0] = ");
> +    print_i128(vec_arg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_result[0]);
> +    printf("\nExpected result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 20;        // has 128-bit of zero concatenated onto it
> +  vec_uarg2[0] = 0x4000000000000000;
> +  vec_uarg2[0] = vec_uarg2[0] << 64;
> +  uexpected_result = 80;
> +
> +  vec_uresult = vec_dive (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_dive (unsigned, unsigned) failed.\n");
> +    printf("vec_uarg1[0] = ");
> +    print_i128(vec_uarg1[0]);
> +    printf("\nvec_uarg2[0] = ");
> +    print_i128(vec_uarg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\nExpected result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  /* Vector modulo quad word  */
> +  vec_arg1[0] = -12345675;
> +  vec_arg2[0] = 2;
> +  expected_result = -1;
> +
> +  vec_result = vec_mod (vec_arg1, vec_arg2);
> +
> +  if (vec_result[0] != expected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mod (signed, signed) failed.\n");
> +    printf("vec_arg1[0] = ");
> +    print_i128(vec_arg1[0]);
> +    printf("\nvec_arg2[0] = ");
> +    print_i128(vec_arg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_result[0]);
> +    printf("\nExpected result = ");
> +    print_i128(expected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  vec_uarg1[0] = 24685;
> +  vec_uarg2[0] = 4;
> +  uexpected_result = 1;
> +
> +  vec_uresult = vec_mod (vec_uarg1, vec_uarg2);
> +
> +  if (vec_uresult[0] != uexpected_result) {
> +#if DEBUG
> +    printf("ERROR: vec_mod (unsigned, unsigned) failed.\n");
> +    printf("vec_uarg1[0] = ");
> +    print_i128(vec_uarg1[0]);
> +    printf("\nvec_uarg2[0] = ");
> +    print_i128(vec_uarg2[0]);
> +    printf("\nResult = ");
> +    print_i128(vec_uresult[0]);
> +    printf("\nExpected result = ");
> +    print_i128(uexpected_result);
> +    printf("\n\n");
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  return 0;
> +}


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support
  2020-10-05 18:52     ` Carl Love
@ 2020-10-08 14:22       ` will schmidt
  2020-10-12 20:15         ` Carl Love
  0 siblings, 1 reply; 41+ messages in thread
From: will schmidt @ 2020-10-08 14:22 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Mon, 2020-10-05 at 11:52 -0700, Carl Love wrote:
> Will, Segher:
> 
> Add support for converting to/from 128-bit integers and 128-bit 
> decimal floating point formats.
> 
> The updates from the previous version of the patch:
> 
> Just a fix for the change log per Will's comments.
> 
> No regression failures were found when run on a P9.
> 
> Please let me know if this is ready for mainline. 
> 
>                Carl
> 
> --------------------------------------------------------------
> 
> 
> gcc/ChangeLog
> 
> 2020-10-05  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
> 	* config/rs6000/rs6000-call.c (P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P):
> 	New overloaded definitions.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-10-05  Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/int_128bit-runnable.c:  Update test.


Maybe 'Add 128-bit DFP conversion tests' to give it better meaning.


> ---
>  gcc/config/rs6000/dfp.md                      | 14 +++++
>  gcc/config/rs6000/rs6000-call.c               |  4 ++
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 62 +++++++++++++++++++
>  3 files changed, 80 insertions(+)
> 
> diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
> index 8f822732bac..0e82e315fee 100644
> --- a/gcc/config/rs6000/dfp.md
> +++ b/gcc/config/rs6000/dfp.md
> @@ -222,6 +222,13 @@
>    "dcffixq %0,%1"
>    [(set_attr "type" "dfp")])
> 
> +(define_insn "floattitd2"
> +  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
> +	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
> +  "TARGET_POWER10"
> +  "dcffixqq %0,%1"
> +  [(set_attr "type" "dfp")])
> +
>  ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
>  ;; This is the first stage of converting it to an integer type.
> 
> @@ -241,6 +248,13 @@
>    "TARGET_DFP"
>    "dctfix<q> %0,%1"
>    [(set_attr "type" "dfp")])
> +
> +(define_insn "fixtdti2"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> +	(fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
> +  "TARGET_POWER10"
> +  "dctfixqq %0,%1"
> +  [(set_attr "type" "dfp")])
> 
>  ;; Decimal builtin support


ok

> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 87fff5c1c80..8d00a25d806 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -4967,6 +4967,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V2DI, 0 },
>    { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> 
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
> @@ -5074,6 +5076,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V2DI, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,

ok

> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> index 85ad544e22b..ec3dcf3dff1 100644
> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -38,6 +38,7 @@
>  #if DEBUG
>  #include <stdio.h>
>  #include <stdlib.h>
> +#include <math.h>
> 
> 
>  void print_i128(__int128_t val)
> @@ -59,6 +60,13 @@ int main ()
>    __int128_t arg1, result;
>    __uint128_t uarg2;
> 
> +  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
> +
> +  struct conv_t {
> +    __uint128_t u128;
> +    _Decimal128 d128;
> +  } conv, conv2;
> +
>    vector signed long long int vec_arg1_di, vec_arg2_di;
>    vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
>    vector unsigned long long int vec_uresult_di;
> @@ -2249,6 +2257,60 @@ int main ()
>      abort();
>  #endif
>    }
> +  
> +  /* DFP to __int128 and __int128 to DFP conversions */
> +  /* Can't get printing of DFP values to work.  Print the DFP value as an
> +     unsigned int so we can see the bit patterns.  */

Drop 'Can't get ...', just 'Print the DFP...' should be sufficient.

> +  conv.u128 = 0x2208000000000000ULL;
> +  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
> +  expected_result_dfp128 = conv.d128;
> 
> +  arg1 = 4;
> +
> +  conv.d128 = (_Decimal128) arg1;
> +
> +  result_dfp128 = (_Decimal128) arg1;
> +  if (((conv.u128 >>64) != 0x2208000000000000ULL) &&
> +      ((conv.u128 & 0xFFFFFFFFFFFFFFFF) != 0x4ULL)) {
> +#if DEBUG
> +    printf("ERROR:  convert int128 value ");
> +    print_i128 (arg1);
> +    conv.d128 = result_dfp128;
> +    printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
> +	   (unsigned long long)((conv.u128) >>64),
> +	   (unsigned long long)((conv.u128) & 0xFFFFFFFFFFFFFFFF));
> +
> +    conv.d128 = expected_result_dfp128;
> +    printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
> +	   (unsigned long long) (conv.u128>>64),
> +	   (unsigned long long) (conv.u128 & 0xFFFFFFFFFFFFFFFF));
> +#else
> +    abort();
> +#endif
> +  }
> +
> +  expected_result = 4;
> +
> +  conv.u128 = 0x2208000000000000ULL;
> +  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
> +  arg1_dfp128 = conv.d128;
> +
> +  result = (__int128_t) arg1_dfp128;
> +
> +  if (result != expected_result) {
> +#if DEBUG
> +    printf("ERROR:  convert DFP value ");
> +    printf("0x%llx %llx (printed as hex bit string) ",
> +	   (unsigned long long)(conv.u128>>64),
> +	   (unsigned long long)(conv.u128 & 0xFFFFFFFFFFFFFFFF));
> +    printf("to __int128 value = ");
> +    print_i128 (result);
> +    printf("\ndoes not match expected_result = ");
> +    print_i128 (expected_result);
> +    printf("\n");
> +#else
> +    abort();
> +#endif
> +  }
>    return 0;
>  }


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/5] Test 128-bit shifts for just the int128 type.
  2020-10-05 18:52     ` Carl Love
@ 2020-10-08 14:50       ` will schmidt
  2020-10-12 20:15         ` Carl Love
  0 siblings, 1 reply; 41+ messages in thread
From: will schmidt @ 2020-10-08 14:50 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Mon, 2020-10-05 at 11:52 -0700, Carl Love wrote:
> Will, Segher:
> 
> Patch 4 adds the vector 128-bit integer shift instruction support for
> the V1TI type.
> 
> The changes from the previous version include:
> 
> Fixed up the change log entry issues noted by Will.
> 
> Regression tests reran on Power 9 LE with no regression errors.
> 
> Please let me know if it looks OK to commit to mainline.
> 
>                   Carl 
> ---------------------------------------------
> 
> gcc/ChangeLog
> 
> 2020-10-05  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
> 	Rename to altivec_vslq_<mode>, altivec_vsrq_<mode>, mode VEC_TI.
> 	* config/rs6000/vector.md (VEC_TI): New name for VSX_TI iterator.

What was the old name?   (Maybe just 'New iterator' ?)
Ok, back from below.  this is new name and location for what was
previously named VSX_TI in vsx.md.
Wouldn't hurt to have a statement in the description to clarify that.
"This patch renames the VSX_TI iterator to VEC_TI, and updates the
users." 


> 	(vashlv1ti3): Change to vashl<mode>3, mode VEC_TI.
> 	(vlshrv1ti3): Change to vlshr<mode>3, mode VEC_TI.
> 	* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator.

> 
> gcc/testsuite/ChangeLog
> 
> 2020-10-05  Carl Love  <cel@us.ibm.com>
> 	gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
> 	tests.
> ---
>  gcc/config/rs6000/altivec.md                  | 16 ++++-----
>  gcc/config/rs6000/vector.md                   | 27 ++++++++-------
>  gcc/config/rs6000/vsx.md                      | 33 +++++++++----------
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++++++--
>  4 files changed, 52 insertions(+), 40 deletions(-)
> 

ok

> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 34a4731342a..5db3de3cc9f 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2219,10 +2219,10 @@
>    "vsl<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vslq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_insn "altivec_vslq_<mode>"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
> +		     (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
>    "TARGET_POWER10"
>    /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
>    "vslq %0,%1,%2"
> @@ -2236,10 +2236,10 @@
>    "vsr<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vsrq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_insn "altivec_vsrq_<mode>"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
> +			   (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
>    "TARGET_POWER10"
>    /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
>    "vsrq %0,%1,%2"

ok


> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 0cca4232619..3ea3a91845a 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; 128-bit int modes
> +(define_mode_iterator VEC_TI [V1TI TI])
> +
>  ;; Vector int modes for parity
>  (define_mode_iterator VEC_IP [V8HI
>  			      V4SI
> @@ -1627,17 +1630,17 @@
>    "")
> 
>  ;; No immediate version of this 128-bit instruction
> -(define_expand "vashlv1ti3"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_expand "vashl<mode>3"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
> +			 (match_operand:VEC_TI 2 "vsx_register_operand")))]
>    "TARGET_POWER10"
>  {
>    /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> -  rtx tmp = gen_reg_rtx (V1TImode);
> +  rtx tmp = gen_reg_rtx (<MODE>mode);
> 
>    emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> -  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
> +  emit_insn(gen_altivec_vslq_<mode> (operands[0], operands[1], tmp));
>    DONE;
>  })
> 
> @@ -1650,17 +1653,17 @@
>    "")
> 
>  ;; No immediate version of this 128-bit instruction
> -(define_expand "vlshrv1ti3"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_expand "vlshr<mode>3"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
> +			   (match_operand:VEC_TI 2 "vsx_register_operand")))]
>    "TARGET_POWER10"
>  {
>    /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
> -  rtx tmp = gen_reg_rtx (V1TImode);
> +  rtx tmp = gen_reg_rtx (<MODE>mode);
> 
>    emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> -  emit_insn(gen_altivec_vsrq (operands[0], operands[1], tmp));
> +  emit_insn(gen_altivec_vsrq_<mode> (operands[0], operands[1], tmp));
>    DONE;
>  })
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 5b6a0bd728a..87f96ffcc4c 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -37,9 +37,6 @@
>  				  TI
>  				  V1TI])
> 
> -;; Iterator for 128-bit integer types that go in a single vector register.
> -(define_mode_iterator VSX_TI [TI V1TI])
> -
>  ;; Iterator for the 2 32-bit vector types
>  (define_mode_iterator VSX_W [V4SF V4SI])
> 
> @@ -944,9 +941,9 @@
>  ;; special V1TI container class, which it is not appropriate to use vec_select
>  ;; for the type.
>  (define_insn "*vsx_le_permute_<mode>"
> -  [(set (match_operand:VSX_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
> -	(rotate:VSX_TI
> -	 (match_operand:VSX_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
> +  [(set (match_operand:VEC_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
> +	(rotate:VEC_TI
> +	 (match_operand:VEC_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
>  	 (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
>    "@
> @@ -960,10 +957,10 @@
>     (set_attr "type" "vecperm,vecload,vecstore,*,load,store")])
> 
>  (define_insn_and_split "*vsx_le_undo_permute_<mode>"
> -  [(set (match_operand:VSX_TI 0 "vsx_register_operand" "=wa,wa")
> -	(rotate:VSX_TI
> -	 (rotate:VSX_TI
> -	  (match_operand:VSX_TI 1 "vsx_register_operand" "0,wa")
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=wa,wa")
> +	(rotate:VEC_TI
> +	 (rotate:VEC_TI
> +	  (match_operand:VEC_TI 1 "vsx_register_operand" "0,wa")
>  	  (const_int 64))
>  	 (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX"
> @@ -1031,11 +1028,11 @@
>  ;; Peepholes to catch loads and stores for TImode if TImode landed in
>  ;; GPR registers on a little endian system.
>  (define_peephole2
> -  [(set (match_operand:VSX_TI 0 "int_reg_operand")
> -	(rotate:VSX_TI (match_operand:VSX_TI 1 "memory_operand")
> +  [(set (match_operand:VEC_TI 0 "int_reg_operand")
> +	(rotate:VEC_TI (match_operand:VEC_TI 1 "memory_operand")
>  		       (const_int 64)))
> -   (set (match_operand:VSX_TI 2 "int_reg_operand")
> -	(rotate:VSX_TI (match_dup 0)
> +   (set (match_operand:VEC_TI 2 "int_reg_operand")
> +	(rotate:VEC_TI (match_dup 0)
>  		       (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
>     && (rtx_equal_p (operands[0], operands[2])
> @@ -1043,11 +1040,11 @@
>     [(set (match_dup 2) (match_dup 1))])
> 
>  (define_peephole2
> -  [(set (match_operand:VSX_TI 0 "int_reg_operand")
> -	(rotate:VSX_TI (match_operand:VSX_TI 1 "int_reg_operand")
> +  [(set (match_operand:VEC_TI 0 "int_reg_operand")
> +	(rotate:VEC_TI (match_operand:VEC_TI 1 "int_reg_operand")
>  		       (const_int 64)))
> -   (set (match_operand:VSX_TI 2 "memory_operand")
> -	(rotate:VSX_TI (match_dup 0)
> +   (set (match_operand:VEC_TI 2 "memory_operand")
> +	(rotate:VEC_TI (match_dup 0)
>  		       (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
>     && peep2_reg_dead_p (2, operands[0])"
> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> index ec3dcf3dff1..25e2c9d1af4 100644
> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -53,6 +53,18 @@ void print_i128(__int128_t val)
> 
>  void abort (void);
> 
> +__attribute__((noinline))
> +__int128_t shift_right (__int128_t a, __uint128_t b)
> +{
> +  return a >> b;
> +}
> +
> +__attribute__((noinline))
> +__int128_t shift_left (__int128_t a, __uint128_t b)
> +{
> +  return a << b;
> +}
> +
>  int main ()
>  {
>    int i, result_int;
> @@ -141,7 +153,7 @@ int main ()
>  #endif
>    }
> 
> -  arg1 = 3;
> +  arg1 = vec_result[0];
>    uarg2 = 4;
>    expected_result = arg1*16;
> 
> @@ -225,7 +237,7 @@ int main ()
>  #endif
>    }
> 
> -  arg1 = 48;
> +  arg1 = vec_uresult[0];
>    uarg2 = 4;
>    expected_result = arg1/16;

Presumably the shift_right and shift_left functions get called with
__int128 types elsewhere in the series?

ok.


lgtm, thanks
-Will

> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/5] Conversions between 128-bit integer and floating point values.
  2020-10-05 18:52     ` Carl Love
@ 2020-10-08 15:35       ` will schmidt
  2020-10-12 20:16         ` Carl Love
  0 siblings, 1 reply; 41+ messages in thread
From: will schmidt @ 2020-10-08 15:35 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Mon, 2020-10-05 at 11:52 -0700, Carl Love wrote:
> Will, Segher:
> 
> This patch adds support for converting to/from 128-bit integers and
> 128-bit decimal floating point formats using the new P10 instructions
> dcffixqq and dctfixqq.  The new instructions are only used on P10 HW,
> otherwise the conversions continue to use the existing SW routines.
> 
> The changes from the previous version include:
> 
> Fixed up the change log entry issues noted by Will.
> 
> Regression tests reran on Power 9 LE with no regression errors.
> 
> Please let me know if it looks OK to commit to mainline.
> 
>                   Carl 
> ---------------------------------------------
> 
> gcc/ChangeLog
> 
> 2020-10-05  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/rs6000.md (floatti<mode>2, floatunsti<mode>2,
> 	fix_trunc<mode>ti2, fixuns_trunc<mode>ti2): Add
> 	define_insn for mode IEEE 128.
> 	* libgcc/config/rs6000/fixkfti.c: Renamed to fixkfti-sw.c.
> 	Update source function name.  White space fixes.

I'd move the generic 'Update source ... White space...' bits to the
patch description.  In addition to the 'Rename' statement, some form of
'Change calls of __fixkfti to __fixkfti_sw` would be more useful here.

> 	* libgcc/config/rs6000/fixunskfti.c: Renamed to fixunskfti-sw.c.
> 	Update source function name.  White space fixes.
> 	* libgcc/config/rs6000/float128-hw.c (__floattikf_hw,
> 	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw):
> 	New functions.
ok
> 	* libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1):
> 	New macro.
> 	(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
> 	__fixunskfti_resolve): Add resolve functions.
> 	(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
> 	functions.
ok
> 	* libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
> 	__fixtfti, __fixunstfti): Add editor commands to change
> 	names.
> 	* libgcc/config/rs6000/float128-sed-hw (__floattitf,
> 	__floatuntitf, __fixtfti, __fixunstfti): Add editor commands
> 	to change names.
ok

> 	* libgcc/config/rs6000/floattikf.c: Renamed to floattikf-sw.c.
> 	* libgcc/config/rs6000/floatuntikf.c: Renamed to floatuntikf-sw.c.
> 	* libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,
> 	__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
> 	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
> 	__floatuntikf, __fixkfti, __fixunskfti): New extern declarations.
> 	* libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
> 	fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
> 	(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
> 	file names to fp128_ppc_funcs.

> 
> gcc/testsuite/ChangeLog
> 
> 2020-10-05  Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/fl128_conversions.c: New file.

fp128_conversions.c



> ---
>  gcc/config/rs6000/rs6000.md                   |  36 +++
>  .../gcc.target/powerpc/fp128_conversions.c    | 286 ++++++++++++++++++
>  .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
>  .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   7 +-
>  libgcc/config/rs6000/float128-hw.c            |  24 ++
>  libgcc/config/rs6000/float128-ifunc.c         |  44 ++-
>  libgcc/config/rs6000/float128-sed             |   4 +
>  libgcc/config/rs6000/float128-sed-hw          |   4 +
>  .../rs6000/{floattikf.c => floattikf-sw.c}    |   4 +-
>  .../{floatuntikf.c => floatuntikf-sw.c}       |   4 +-
>  libgcc/config/rs6000/quad-float128.h          |  17 +-
>  libgcc/config/rs6000/t-float128               |   3 +-
>  12 files changed, 417 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
>  rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
>  rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (90%)
>  rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
>  rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%)
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 694ff70635e..5db5d0b4505 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6390,6 +6390,42 @@
>     xscvsxddp %x0,%x1"
>    [(set_attr "type" "fp")])
> 
> +(define_insn "floatti<mode>2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +       (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvsqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "floatunsti<mode>2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +       (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvuqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "fix_trunc<mode>ti2"
> +  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
> +       (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvqpsqz %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "fixuns_trunc<mode>ti2"
> +  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
> +       (unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvqpuqz %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
>  ; Allow the combiner to merge source memory operands to the conversion so that
>  ; the optimizer/register allocator doesn't try to load the value too early in a
>  ; GPR and then use store/load to move it to a FPR and suffer from a store-load

ok

> diff --git a/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
> new file mode 100644
> index 00000000000..dff24e1e2b4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
> @@ -0,0 +1,286 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10" } */
> +
> +/* Check that the expected 128-bit instructions are generated if the processor
> +   supports the 128-bit integer instructions. */
> +/* { dg-final { scan-assembler-times {\mxscvsqqp\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvuqqp\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvqpsqz\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvqpuqz\M} 1 { target { ppc_native_128bit } } } } */

I don't see any other uses or references to ppc_native_128bit.  
Nothing in testsuite/lib/target-supports.exp looks close.  ? 

> +
> +#include <stdio.h>
> +#include <math.h>
> +#include <fenv.h>
> +#include <stdlib.h>
> +#include <wchar.h>
> +
> +#define DEBUG 0
> +
> +void abort (void);
> +
> +float conv_i_2_fp( long long int a)
> +{
> +  return (float) a;
> +}
> +
> +double conv_i_2_fpd( long long int a)
> +{
> +  return (double) a;
> +}
> +
> +double conv_ui_2_fpd( unsigned long long int a)
> +{
> +  return (double) a;
> +}
> +
> +__float128 conv_i128_2_fp128 (__int128_t a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__float128) a;
> +}
> +
> +__float128 conv_ui128_2_fp128 (__uint128_t a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__float128) a;
> +}
> +
> +__int128_t conv_fp128_2_i128 (__float128 a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__int128_t) a;
> +}
> +
> +__uint128_t conv_fp128_2_ui128 (__float128 a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__uint128_t) a;
> +}
> +
> +long double conv_i128_2_ld (__int128_t a)
> +{
> +  // default, gen call __floattitf
> +  // -mabi=ibmlongdouble, gen call __floattitf
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (long double) a;
> +}
> +
> +__ibm128 conv_i128_2_ibm128 (__int128_t a)
> +{
> +  // default, gen call __floattitf
> +  // -mabi=ibmlongdouble, gen call __floattitf
> +  // -mabi=ieeelongdouble, message uses IBM long double, no binary output
> +  return (__ibm128) a;
> +}
> +
> +int main()
> +{
> +	float a, expected_result_float;
> +	double b, expected_result_double;
> +	long long int c, expected_result_llint;
> +	unsigned long long int u;
> +	__int128_t d;
> +	__uint128_t u128;
> +	unsigned long long expected_result_uint128[2] ;
> +	__float128 e;
> +	long double ld;     // another 128-bit float version
> +
> +	union conv_t {
> +		float a;
> +		double b;
> +		long long int c;
> +		long long int128[2] ;
> +		unsigned long long uint128[2] ;
> +		unsigned long long int u;
> +		__int128_t d;
> +		__uint128_t u128;
> +		__float128 e;
> +		long double ld;     // another 128-bit float version
> +	} conv, conv_result;
> +
> +	c = 20;
> +	expected_result_llint = 20.00000;
> +	a = conv_i_2_fp (c);
> +
> +	if (a != expected_result_llint) {
> +#if DEBUG
> +		printf("ERROR: conv_i_2_fp(%lld) = %10.5f\n", c, a);
> +		printf("\n does not match expected_result = %10.5f\n\n",
> +				 expected_result_llint);
> +#else
> +		abort();
> +#endif
> +	}
> +
> +	c = 20;
> +	expected_result_double = 20.00000;
> +	b = conv_i_2_fpd (c);
> +
> +	if (b != expected_result_double) {
> +#if DEBUG
> +		printf("ERROR: conv_i_2_fpd(%lld) = %10.5f\n", d, b);
> +		printf("\n does not match expected_result = %10.5f\n\n",
> +				 expected_result_double);
> + #else
> +		abort();
> +#endif
> +	}
> +
> +	u = 20;
> +	expected_result_double = 20.00000;
> +	b = conv_ui_2_fpd (u);
> +
> +	if (b != expected_result_double) {
> +#if DEBUG
> +		printf("ERROR: conv_ui_2_fpd(%llu) = %10.5f\n", u, b);
> +		printf("\n does not match expected_result = %10.5f\n\n",
> +				 expected_result_double);
> + #else
> +		abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */

limitation or pr ? 

> +  d = -3210;
> +  d = (d * 10000000000) + 9876543210;
> +  conv_result.e = conv_i128_2_fp128 (d);
> +  expected_result_uint128[1] = 0xc02bd2f9068d1160;
> +  expected_result_uint128[0] = 0x0;
> +  
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_i128_2_fp128(-32109876543210) = (result in hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  d = 123;
> +  d = (d * 10000000000) + 1234567890;
> +  conv_result.ld = conv_i128_2_fp128 (d);
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0x4271eab4c8ed2000;
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_i128_2_fp128(1231234567890) = (result in hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */
> +  u128 = 8760;
> +  u128 = (u128 * 10000000000) + 1234567890;
> +  conv_result.e = conv_ui128_2_fp128 (u128);
> +  expected_result_uint128[1] = 0x402d3eb101df8b48;
> +  expected_result_uint128[0] = 0x0;
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_ui128_2_fp128(87601234567890) = (result in hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */
> +  u128 = 3210;
> +  u128 = (u128 * 10000000000) + 9876543210;
> +  expected_result_uint128[1] = 0x402bd3429c8feea0;
> +  expected_result_uint128[0] = 0x0;
> +  conv_result.e = conv_ui128_2_fp128 (u128);
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_ui128_2_fp128(32109876543210) = (result in hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = 12345.6789;
> +  expected_result_uint128[1] = 0x1407374883526960;
> +  expected_result_uint128[0] = 0x3039;
> +
> +  conv_result.d = conv_fp128_2_i128 (conv.e);
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) =  ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
> +
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = -6789.12345;
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0xffffffffffffe57b;
> +  conv_result.d = conv_fp128_2_i128 (conv.e);
> + 
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) = ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
> +
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = 6789.12345;
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0x1a85;
> +  conv_result.d = conv_fp128_2_ui128 (conv.e);
> + 
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_ui128(0x%llx %llx) = ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
> +	  
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  return 0;
> +}
> diff --git a/libgcc/config/rs6000/fixkfti.c b/libgcc/config/rs6000/fixkfti-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/fixkfti.c
> rename to libgcc/config/rs6000/fixkfti-sw.c
> index a22286228aa..d6bbbf889b7 100644
> --- a/libgcc/config/rs6000/fixkfti.c
> +++ b/libgcc/config/rs6000/fixkfti-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TItype
> -__fixkfti (TFtype a)
> +__fixkfti_sw (TFtype a)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/fixunskfti.c b/libgcc/config/rs6000/fixunskfti-sw.c
> similarity index 90%
> rename from libgcc/config/rs6000/fixunskfti.c
> rename to libgcc/config/rs6000/fixunskfti-sw.c
> index ab232d92d24..5c8b94b8862 100644
> --- a/libgcc/config/rs6000/fixunskfti.c
> +++ b/libgcc/config/rs6000/fixunskfti-sw.c
> @@ -5,7 +5,10 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> +
> +   The fixunskfti.c file renamed fixunskfti-sw.c.  Updated the
> +   source file names and made whitespace fixes

This would be better in patch description.  Not sure it is useful here.


> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +38,7 @@
>  #include "quad-float128.h"
> 
>  UTItype
> -__fixunskfti (TFtype a)
> +__fixunskfti_sw (TFtype a)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/float128-hw.c b/libgcc/config/rs6000/float128-hw.c
> index 8705b53e22a..be8bd07e853 100644
> --- a/libgcc/config/rs6000/float128-hw.c
> +++ b/libgcc/config/rs6000/float128-hw.c
> @@ -86,6 +86,30 @@ __floatdikf_hw (DItype_ppc a)
>    return (TFtype) a;
>  }
> 
> +TFtype
> +__floattikf_hw (TItype_ppc a)
> +{
> +  return (TFtype) a;
> +}
> +
> +TFtype
> +__floatuntikf_hw (UTItype_ppc a)
> +{
> +  return (TFtype) a;
> +}
> +
> +TItype_ppc
> +__fixkfti_hw (TFtype a)
> +{
> +  return (TItype_ppc) a;
> +}
> +
> +UTItype_ppc
> +__fixunskfti_hw (TFtype a)
> +{
> +  return (UTItype_ppc) a;
> +}
> +
>  TFtype
>  __floatundikf_hw (UDItype_ppc a)
>  {
> diff --git a/libgcc/config/rs6000/float128-ifunc.c b/libgcc/config/rs6000/float128-ifunc.c
> index c2f65912a74..c221be2c864 100644
> --- a/libgcc/config/rs6000/float128-ifunc.c
> +++ b/libgcc/config/rs6000/float128-ifunc.c
> @@ -46,14 +46,9 @@
>  #endif
> 
>  #define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW : SW)
> +#define SW_OR_HW_ISA3_1(SW, HW) (__builtin_cpu_supports ("arch_3_1") ? HW : SW)
> 
>  /* Resolvers.  */
> -
> -/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
> -   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
> -   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
> -   use the emulator functions for these conversions.  */
> -

good.  :-)

>  static __typeof__ (__addkf3_sw) *
>  __addkf3_resolve (void)
>  {
> @@ -102,6 +97,18 @@ __floatdikf_resolve (void)
>    return SW_OR_HW (__floatdikf_sw, __floatdikf_hw);
>  }
> 
> +static __typeof__ (__floattikf_sw) *
> +__floattikf_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__floattikf_sw, __floattikf_hw);
> +}
> +
> +static __typeof__ (__floatuntikf_sw) *
> +__floatuntikf_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__floatuntikf_sw, __floatuntikf_hw);
> +}
> +
>  static __typeof__ (__floatunsikf_sw) *
>  __floatunsikf_resolve (void)
>  {
> @@ -114,6 +121,19 @@ __floatundikf_resolve (void)
>    return SW_OR_HW (__floatundikf_sw, __floatundikf_hw);
>  }
> 
> +
> +static __typeof__ (__fixkfti_sw) *
> +__fixkfti_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__fixkfti_sw, __fixkfti_hw);
> +}
> +
> +static __typeof__ (__fixunskfti_sw) *
> +__fixunskfti_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__fixunskfti_sw, __fixunskfti_hw);
> +}
> +
>  static __typeof__ (__fixkfsi_sw) *
>  __fixkfsi_resolve (void)
>  {
> @@ -303,6 +323,18 @@ TFtype __floatsikf (SItype_ppc)
>  TFtype __floatdikf (DItype_ppc)
>    __attribute__ ((__ifunc__ ("__floatdikf_resolve")));
> 
> +TFtype __floattikf (TItype_ppc)
> +  __attribute__ ((__ifunc__ ("__floattikf_resolve")));
> +
> +TFtype __floatuntikf (UTItype_ppc)
> +  __attribute__ ((__ifunc__ ("__floatuntikf_resolve")));
> +
> +TItype_ppc __fixkfti (TFtype)
> +  __attribute__ ((__ifunc__ ("__fixkfti_resolve")));
> +
> +UTItype_ppc __fixunskfti (TFtype)
> +  __attribute__ ((__ifunc__ ("__fixunskfti_resolve")));
> +
>  TFtype __floatunsikf (USItype_ppc)
>    __attribute__ ((__ifunc__ ("__floatunsikf_resolve")));
> 
> diff --git a/libgcc/config/rs6000/float128-sed b/libgcc/config/rs6000/float128-sed
> index d9a089ff9ba..c0fcddb1959 100644
> --- a/libgcc/config/rs6000/float128-sed
> +++ b/libgcc/config/rs6000/float128-sed
> @@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi/g
>  s/__fixunstfdi/__fixunskfdi/g
>  s/__fixunstfsi/__fixunskfsi/g
>  s/__floatditf/__floatdikf/g
> +s/__floattitf/__floattikf/g
> +s/__floatuntitf/__floatuntikf/g
> +s/__fixtfti/__fixkfti/g
> +s/__fixunstfti/__fixunskfti/g
>  s/__floatsitf/__floatsikf/g
>  s/__floatunditf/__floatundikf/g
>  s/__floatunsitf/__floatunsikf/g
> diff --git a/libgcc/config/rs6000/float128-sed-hw b/libgcc/config/rs6000/float128-sed-hw
> index acf36b0c17d..3d2bf556da1 100644
> --- a/libgcc/config/rs6000/float128-sed-hw
> +++ b/libgcc/config/rs6000/float128-sed-hw
> @@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi_sw/g
>  s/__fixunstfdi/__fixunskfdi_sw/g
>  s/__fixunstfsi/__fixunskfsi_sw/g
>  s/__floatditf/__floatdikf_sw/g
> +s/__floattitf/__floattikf_sw/g
> +s/__floatuntitf/__floatuntikf_sw/g
> +s/__fixtfti/__fixkfti_sw/g
> +s/__fixunstfti/__fixunskfti_sw/g
>  s/__floatsitf/__floatsikf_sw/g
>  s/__floatunditf/__floatundikf_sw/g
>  s/__floatunsitf/__floatunsikf_sw/g
> diff --git a/libgcc/config/rs6000/floattikf.c b/libgcc/config/rs6000/floattikf-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/floattikf.c
> rename to libgcc/config/rs6000/floattikf-sw.c
> index 4e8c40cfbe4..110706352bb 100644
> --- a/libgcc/config/rs6000/floattikf.c
> +++ b/libgcc/config/rs6000/floattikf-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TFtype
> -__floattikf (TItype i)
> +__floattikf_sw (TItype i)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/floatuntikf.c b/libgcc/config/rs6000/floatuntikf-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/floatuntikf.c
> rename to libgcc/config/rs6000/floatuntikf-sw.c
> index 8bfba4267d4..5e712a67e26 100644
> --- a/libgcc/config/rs6000/floatuntikf.c
> +++ b/libgcc/config/rs6000/floatuntikf-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TFtype
> -__floatuntikf (UTItype i)
> +__floatuntikf_sw (UTItype i)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/quad-float128.h b/libgcc/config/rs6000/quad-float128.h
> index 32ef328a8ea..24712b9277f 100644
> --- a/libgcc/config/rs6000/quad-float128.h
> +++ b/libgcc/config/rs6000/quad-float128.h
> @@ -87,19 +87,18 @@ extern USItype_ppc __fixunskfsi_sw (TFtype);
>  extern UDItype_ppc __fixunskfdi_sw (TFtype);
>  extern TFtype __floatsikf_sw (SItype_ppc);
>  extern TFtype __floatdikf_sw (DItype_ppc);
> +extern TFtype __floattikf_sw (TItype_ppc);
>  extern TFtype __floatunsikf_sw (USItype_ppc);
>  extern TFtype __floatundikf_sw (UDItype_ppc);
> +extern TFtype __floatuntikf_sw (UTItype_ppc);
> +extern TItype_ppc __fixkfti_sw (TFtype);
> +extern UTItype_ppc __fixunskfti_sw (TFtype);
>  extern IBM128_TYPE __extendkftf2_sw (TFtype);
>  extern TFtype __trunctfkf2_sw (IBM128_TYPE);
>  extern TCtype __mulkc3_sw (TFtype, TFtype, TFtype, TFtype);
>  extern TCtype __divkc3_sw (TFtype, TFtype, TFtype, TFtype);
> 
>  #ifdef _ARCH_PPC64
> -/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
> -   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
> -   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
> -   use the emulator functions for these conversions.  */
> -
>  extern TItype_ppc __fixkfti (TFtype);
>  extern UTItype_ppc __fixunskfti (TFtype);
>  extern TFtype __floattikf (TItype_ppc);
> @@ -130,8 +129,12 @@ extern USItype_ppc __fixunskfsi_hw (TFtype);
>  extern UDItype_ppc __fixunskfdi_hw (TFtype);
>  extern TFtype __floatsikf_hw (SItype_ppc);
>  extern TFtype __floatdikf_hw (DItype_ppc);
> +extern TFtype __floattikf_hw (TItype_ppc);
>  extern TFtype __floatunsikf_hw (USItype_ppc);
>  extern TFtype __floatundikf_hw (UDItype_ppc);
> +extern TFtype __floatuntikf_hw (UTItype_ppc);
> +extern TItype_ppc __fixkfti_hw (TFtype);
> +extern UTItype_ppc __fixunskfti_hw (TFtype);
>  extern IBM128_TYPE __extendkftf2_hw (TFtype);
>  extern TFtype __trunctfkf2_hw (IBM128_TYPE);
>  extern TCtype __mulkc3_hw (TFtype, TFtype, TFtype, TFtype);
> @@ -162,8 +165,12 @@ extern USItype_ppc __fixunskfsi (TFtype);
>  extern UDItype_ppc __fixunskfdi (TFtype);
>  extern TFtype __floatsikf (SItype_ppc);
>  extern TFtype __floatdikf (DItype_ppc);
> +extern TFtype __floattikf (TItype_ppc);
>  extern TFtype __floatunsikf (USItype_ppc);
>  extern TFtype __floatundikf (UDItype_ppc);
> +extern TFtype __floatuntikf (UTItype_ppc);
> +extern TItype_ppc __fixkfti (TFtype);
> +extern UTItype_ppc __fixunskfti (TFtype);
>  extern IBM128_TYPE __extendkftf2 (TFtype);
>  extern TFtype __trunctfkf2 (IBM128_TYPE);
> 
> diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
> index d5413445189..325b22fd49e 100644
> --- a/libgcc/config/rs6000/t-float128
> +++ b/libgcc/config/rs6000/t-float128
> @@ -23,7 +23,8 @@ fp128_softfp_shared_obj	= $(addsuffix -sw_s$(objext),$(fp128_softfp_funcs))
>  fp128_softfp_obj	= $(fp128_softfp_static_obj) $(fp128_softfp_shared_obj)
> 
>  # New functions for software emulation
> -fp128_ppc_funcs		= floattikf floatuntikf fixkfti fixunskfti \
> +fp128_ppc_funcs		= floattikf-sw floatuntikf-sw \
> +			  fixkfti-sw fixunskfti-sw \
>  			  extendkftf2-sw trunctfkf2-sw \
>  			  sfp-exceptions _mulkc3 _divkc3 _powikf2
> 
ok.


lgtm,
thanks
-Will


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations
  2020-10-07 21:08       ` will schmidt
@ 2020-10-12 20:15         ` Carl Love
  2020-10-12 20:51           ` Segher Boessenkool
  2020-10-12 20:43         ` Segher Boessenkool
  1 sibling, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-10-12 20:15 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will, Segher:

Patch 1, adds the 128-bit sign extension instruction support and
corresponding builtin support.

Removed the blank line per Will's latest feedback.

Retested the patch on Power 9 with no regression errors.

                    Carl
------------------------------------------------------

gcc/ChangeLog

2020-10-08  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
	for new builtins.
	* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
	overloaded builtin definitions.
	(VSIGNEXTSB2W, VSIGNEXTSH2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D):
	Add builtin expansions.
	* config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
	P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
	* config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
	visible.
	* doc/extend.texi:  Add documentation for the vec_signexti and
	vec_signextll builtins.

gcc/testsuite/ChangeLog

2020-10-08  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
---
 gcc/config/rs6000/altivec.h                   |   2 +
 gcc/config/rs6000/rs6000-builtin.def          |   9 ++
 gcc/config/rs6000/rs6000-call.c               |  13 ++
 gcc/config/rs6000/vsx.md                      |   2 +-
 gcc/doc/extend.texi                           |  15 ++
 .../powerpc/p9-sign_extend-runnable.c         | 128 ++++++++++++++++++
 6 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index f7720d136c9..562c5273f71 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -494,6 +494,8 @@
 
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
+#define vec_signexti  __builtin_vec_vsignexti
+#define vec_signextll __builtin_vec_vsignextll
 #endif
 
 /* Predicates.
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index e91a48ddf5f..4c2e9460949 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2715,6 +2715,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,	"vprtybd")
 BU_P9V_OVERLOAD_1 (VPRTYBQ,	"vprtybq")
 BU_P9V_OVERLOAD_1 (VPRTYBW,	"vprtybw")
 BU_P9V_OVERLOAD_1 (VPARITY_LSBB,	"vparity_lsbb")
+BU_P9V_OVERLOAD_1 (VSIGNEXTI,	"vsignexti")
+BU_P9V_OVERLOAD_1 (VSIGNEXTLL,	"vsignextll")
 
 /* 2 argument functions added in ISA 3.0 (power9).  */
 BU_P9_2 (CMPRB,	"byte_in_range",	CONST,	cmprb)
@@ -2726,6 +2728,13 @@ BU_P9_OVERLOAD_2 (CMPRB,	"byte_in_range")
 BU_P9_OVERLOAD_2 (CMPRB2,	"byte_in_either_range")
 BU_P9_OVERLOAD_2 (CMPEQB,	"byte_in_set")
 \f
+/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1.  */
+BU_P9V_AV_1 (VSIGNEXTSB2W,	"vsignextsb2w",		CONST,  vsx_sign_extend_qi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSH2W,	"vsignextsh2w",		CONST,  vsx_sign_extend_hi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSB2D,	"vsignextsb2d",		CONST,  vsx_sign_extend_qi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
+
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
 BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index a8b520834c7..9e514a01012 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5527,6 +5527,19 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI },
 
+  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 */
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+
   /* Overloaded built-in functions for ISA3.1 (power10). */
   { P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 4ff52455fd3..31fcffe8f33 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4787,7 +4787,7 @@
   "vextsh2<wd> %0,%1"
   [(set_attr "type" "vecexts")])
 
-(define_insn "*vsx_sign_extend_si_v2di"
+(define_insn "vsx_sign_extend_si_v2di"
   [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
 	(unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
 		     UNSPEC_VSX_SIGN_EXTEND))]
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5571c4f2ff2..c1c2c9f9bf7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20800,6 +20800,21 @@ void vec_xst (vector unsigned char, int, vector unsigned char *);
 void vec_xst (vector unsigned char, int, unsigned char *);
 @end smallexample
 
+uThe following sign extension builtins are provided.
+
+@smallexample
+vector signed int vec_signexti (vector signed char a)
+vector signed long long vec_signextll (vector signed char a)
+vector signed int vec_signexti (vector signed short a)
+vector signed long long vec_signextll (vector signed short a)
+vector signed long long vec_signextll (vector signed int a)
+@end smallexample
+
+Each element of the result is produced by sign-extending the element of the
+input vector that would fall in the least significant portion of the result
+element. For example, a sign-extension of a vector signed char to a vector
+signed long long will sign extend the rightmost byte of each doubleword.
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 3.1
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1
 
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
new file mode 100644
index 00000000000..7bf979c6fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
@@ -0,0 +1,128 @@
+/* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
+
+/* These builtins were not defined until ISA 3.1 but only require ISA 3.0
+   support.  */
+
+/* { dg-final { scan-assembler-times {\mvextsb2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsw2d\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector signed char vec_arg_qi, vec_result_qi;
+  vector signed short int vec_arg_hi, vec_result_hi, vec_expected_hi;
+  vector signed int vec_arg_wi, vec_result_wi, vec_expected_wi;
+  vector signed long long vec_result_di, vec_expected_di;
+
+  /* test sign extend byte to word */
+  vec_arg_qi = (vector signed char) {1, 2, 3, 4, 5, 6, 7, 8,
+				     -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_wi = (vector signed int) {1, 5, -1, -5};
+
+  vec_result_wi = vec_signexti (vec_arg_qi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(char, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n",
+	     i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend byte to double */
+  vec_arg_qi = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
+				    -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_qi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(byte, long long int):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to word */
+  vec_arg_hi = (vector signed short int){1, 2, 3, 4, -1, -2, -3, -4};
+  vec_expected_wi = (vector signed int){1, 3, -1, -3};
+
+  vec_result_wi = vec_signexti(vec_arg_hi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(short, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n", i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to double word */
+  vec_arg_hi = (vector signed short int ){1, 3, 5, 7,  -1, -3, -5, -7};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_hi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(short, double):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend word to double word */
+  vec_arg_wi = (vector signed int ){1, 3, -1, -3};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_wi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(word, double):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  return 0;
+}
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2a/5] rs6000, vec_rlnm builtin fix arguments
  2020-10-07 21:08       ` will schmidt
@ 2020-10-12 20:15         ` Carl Love
  2020-10-12 23:37           ` Segher Boessenkool
  0 siblings, 1 reply; 41+ messages in thread
From: Carl Love @ 2020-10-12 20:15 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will, Segher:

This patch fixes an error in how the vec_rlnm() builtin parameters are
handled.  The current test for this builtin are compile only.  The
issue was found in the path that adds the 128-bit operands to the
vec_rlnm() builtin.  The new test for the 128-bit operands is a compile
and run test.

Re-tested the patch on Power 9 with no regression errors.

                    Carl

-----------------------------------------------------------------

gcc/ChangeLog

2020-10-08  Carl Love  <cel@us.ibm.com>

	* config/rs6000/altivec.h (vec_rlnm): Fix bug in argument generation.
---
 gcc/config/rs6000/altivec.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 8a2dcda0144..f7720d136c9 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -183,7 +183,7 @@
 #define vec_recipdiv __builtin_vec_recipdiv
 #define vec_rlmi __builtin_vec_rlmi
 #define vec_vrlnm __builtin_vec_rlnm
-#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
+#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
 #define vec_rsqrt __builtin_vec_rsqrt
 #define vec_rsqrte __builtin_vec_rsqrte
 #define vec_signed __builtin_vec_vsigned
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support
  2020-10-08 14:22       ` will schmidt
@ 2020-10-12 20:15         ` Carl Love
  0 siblings, 0 replies; 41+ messages in thread
From: Carl Love @ 2020-10-12 20:15 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will, Segher:
 
This patch adds support for converting to/from 128-bit integers and
128-bit decimal floating point formats.
 
Updated ChangeLog comments.  Fixed up comments in the test program.

Re-tested the patch on Power 9 with no regression errors.
                   
                        Carl

-----------------------------------------------------------

gcc/ChangeLog

2020-10-12  Carl Love  <cel@us.ibm.com>
	* config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
	* config/rs6000/rs6000-call.c (P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P):
	New overloaded definitions.

gcc/testsuite/ChangeLog

2020-10-12  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c: Add 128-bit DFP
	conversion tests.
---
 gcc/config/rs6000/dfp.md                      | 14 +++++
 gcc/config/rs6000/rs6000-call.c               |  4 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 61 +++++++++++++++++++
 3 files changed, 79 insertions(+)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 8f822732bac..0e82e315fee 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -222,6 +222,13 @@
   "dcffixq %0,%1"
   [(set_attr "type" "dfp")])
 
+(define_insn "floattitd2"
+  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
+	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
+  "TARGET_POWER10"
+  "dcffixqq %0,%1"
+  [(set_attr "type" "dfp")])
+
 ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
 ;; This is the first stage of converting it to an integer type.
 
@@ -241,6 +248,13 @@
   "TARGET_DFP"
   "dctfix<q> %0,%1"
   [(set_attr "type" "dfp")])
+
+(define_insn "fixtdti2"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
+	(fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
+  "TARGET_POWER10"
+  "dctfixqq %0,%1"
+  [(set_attr "type" "dfp")])
 \f
 ;; Decimal builtin support
 
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 87fff5c1c80..8d00a25d806 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -4967,6 +4967,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, 0 },
   { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
     RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
 
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
@@ -5074,6 +5076,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
     RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 85ad544e22b..9d281850ee3 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -38,6 +38,7 @@
 #if DEBUG
 #include <stdio.h>
 #include <stdlib.h>
+#include <math.h>
 
 
 void print_i128(__int128_t val)
@@ -59,6 +60,13 @@ int main ()
   __int128_t arg1, result;
   __uint128_t uarg2;
 
+  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
+
+  struct conv_t {
+    __uint128_t u128;
+    _Decimal128 d128;
+  } conv, conv2;
+
   vector signed long long int vec_arg1_di, vec_arg2_di;
   vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
   vector unsigned long long int vec_uresult_di;
@@ -2249,6 +2257,59 @@ int main ()
     abort();
 #endif
   }
+  
+  /* DFP to __int128 and __int128 to DFP conversions */
+  /* Print the DFP value as an unsigned int so we can see the bit patterns.  */
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
+  expected_result_dfp128 = conv.d128;
 
+  arg1 = 4;
+
+  conv.d128 = (_Decimal128) arg1;
+
+  result_dfp128 = (_Decimal128) arg1;
+  if (((conv.u128 >>64) != 0x2208000000000000ULL) &&
+      ((conv.u128 & 0xFFFFFFFFFFFFFFFF) != 0x4ULL)) {
+#if DEBUG
+    printf("ERROR:  convert int128 value ");
+    print_i128 (arg1);
+    conv.d128 = result_dfp128;
+    printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)((conv.u128) >>64),
+	   (unsigned long long)((conv.u128) & 0xFFFFFFFFFFFFFFFF));
+
+    conv.d128 = expected_result_dfp128;
+    printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
+	   (unsigned long long) (conv.u128>>64),
+	   (unsigned long long) (conv.u128 & 0xFFFFFFFFFFFFFFFF));
+#else
+    abort();
+#endif
+  }
+
+  expected_result = 4;
+
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
+  arg1_dfp128 = conv.d128;
+
+  result = (__int128_t) arg1_dfp128;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR:  convert DFP value ");
+    printf("0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)(conv.u128>>64),
+	   (unsigned long long)(conv.u128 & 0xFFFFFFFFFFFFFFFF));
+    printf("to __int128 value = ");
+    print_i128 (result);
+    printf("\ndoes not match expected_result = ");
+    print_i128 (expected_result);
+    printf("\n");
+#else
+    abort();
+#endif
+  }
   return 0;
 }
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/5] Test 128-bit shifts for just the int128 type.
  2020-10-08 14:50       ` will schmidt
@ 2020-10-12 20:15         ` Carl Love
  0 siblings, 0 replies; 41+ messages in thread
From: Carl Love @ 2020-10-12 20:15 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will, Segher:

Patch 4 adds the vector 128-bit integer shift instruction support for
the V1TI type.

This patch also renames and moves the VSX_TI iterator from vsx.md to
VEC_TI in vector.md.  The uses of VEC_TI are also updated.

Re-tested the patch on Power 9 with no regression errors.

                    Carl

------------------------------------------------

gcc/ChangeLog

2020-10-12  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
	Rename to altivec_vslq_<mode>, altivec_vsrq_<mode>, mode VEC_TI.
	* config/rs6000/vector.md (VEC_TI): Was named VSX_TI in vsx.md.
	(vashlv1ti3): Change to vashl<mode>3, mode VEC_TI.
	(vlshrv1ti3): Change to vlshr<mode>3, mode VEC_TI.
	* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator. Update
	uses of VSX_TI to VEC_TI.

gcc/testsuite/ChangeLog

2020-10-12  Carl Love  <cel@us.ibm.com>
	gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
	tests.
---
 gcc/config/rs6000/altivec.md                  | 16 ++++-----
 gcc/config/rs6000/vector.md                   | 27 ++++++++-------
 gcc/config/rs6000/vsx.md                      | 33 +++++++++----------
 .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++++++--
 4 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e9623bc3285..9b70830ae00 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2220,10 +2220,10 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vslq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vslq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+		     (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vslq %0,%1,%2"
@@ -2237,10 +2237,10 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vsrq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vsrq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+			   (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vsrq %0,%1,%2"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index c2ae74fbe92..b2f17063ac9 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,6 +26,9 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
 
+;; 128-bit int modes
+(define_mode_iterator VEC_TI [V1TI TI])
+
 ;; Vector int modes for parity
 (define_mode_iterator VEC_IP [V8HI
 			      V4SI
@@ -1627,17 +1630,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vashlv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vashl<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+			 (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
   emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn (gen_altivec_vslq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vslq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
@@ -1650,17 +1653,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vlshrv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vlshr<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+			   (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
   emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn (gen_altivec_vsrq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vsrq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 5b6a0bd728a..87f96ffcc4c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -37,9 +37,6 @@
 				  TI
 				  V1TI])
 
-;; Iterator for 128-bit integer types that go in a single vector register.
-(define_mode_iterator VSX_TI [TI V1TI])
-
 ;; Iterator for the 2 32-bit vector types
 (define_mode_iterator VSX_W [V4SF V4SI])
 
@@ -944,9 +941,9 @@
 ;; special V1TI container class, which it is not appropriate to use vec_select
 ;; for the type.
 (define_insn "*vsx_le_permute_<mode>"
-  [(set (match_operand:VSX_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
-	(rotate:VSX_TI
-	 (match_operand:VSX_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
+  [(set (match_operand:VEC_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
+	(rotate:VEC_TI
+	 (match_operand:VEC_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
 	 (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "@
@@ -960,10 +957,10 @@
    (set_attr "type" "vecperm,vecload,vecstore,*,load,store")])
 
 (define_insn_and_split "*vsx_le_undo_permute_<mode>"
-  [(set (match_operand:VSX_TI 0 "vsx_register_operand" "=wa,wa")
-	(rotate:VSX_TI
-	 (rotate:VSX_TI
-	  (match_operand:VSX_TI 1 "vsx_register_operand" "0,wa")
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=wa,wa")
+	(rotate:VEC_TI
+	 (rotate:VEC_TI
+	  (match_operand:VEC_TI 1 "vsx_register_operand" "0,wa")
 	  (const_int 64))
 	 (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
@@ -1031,11 +1028,11 @@
 ;; Peepholes to catch loads and stores for TImode if TImode landed in
 ;; GPR registers on a little endian system.
 (define_peephole2
-  [(set (match_operand:VSX_TI 0 "int_reg_operand")
-	(rotate:VSX_TI (match_operand:VSX_TI 1 "memory_operand")
+  [(set (match_operand:VEC_TI 0 "int_reg_operand")
+	(rotate:VEC_TI (match_operand:VEC_TI 1 "memory_operand")
 		       (const_int 64)))
-   (set (match_operand:VSX_TI 2 "int_reg_operand")
-	(rotate:VSX_TI (match_dup 0)
+   (set (match_operand:VEC_TI 2 "int_reg_operand")
+	(rotate:VEC_TI (match_dup 0)
 		       (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
    && (rtx_equal_p (operands[0], operands[2])
@@ -1043,11 +1040,11 @@
    [(set (match_dup 2) (match_dup 1))])
 
 (define_peephole2
-  [(set (match_operand:VSX_TI 0 "int_reg_operand")
-	(rotate:VSX_TI (match_operand:VSX_TI 1 "int_reg_operand")
+  [(set (match_operand:VEC_TI 0 "int_reg_operand")
+	(rotate:VEC_TI (match_operand:VEC_TI 1 "int_reg_operand")
 		       (const_int 64)))
-   (set (match_operand:VSX_TI 2 "memory_operand")
-	(rotate:VSX_TI (match_dup 0)
+   (set (match_operand:VEC_TI 2 "memory_operand")
+	(rotate:VEC_TI (match_dup 0)
 		       (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
    && peep2_reg_dead_p (2, operands[0])"
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 9d281850ee3..15b4fbf0d95 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -53,6 +53,18 @@ void print_i128(__int128_t val)
 
 void abort (void);
 
+__attribute__((noinline))
+__int128_t shift_right (__int128_t a, __uint128_t b)
+{
+  return a >> b;
+}
+
+__attribute__((noinline))
+__int128_t shift_left (__int128_t a, __uint128_t b)
+{
+  return a << b;
+}
+
 int main ()
 {
   int i, result_int;
@@ -141,7 +153,7 @@ int main ()
 #endif
   }
 
-  arg1 = 3;
+  arg1 = vec_result[0];
   uarg2 = 4;
   expected_result = arg1*16;
 
@@ -225,7 +237,7 @@ int main ()
 #endif
   }
 
-  arg1 = 48;
+  arg1 = vec_uresult[0];
   uarg2 = 4;
   expected_result = arg1/16;
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2b/5] RS6000 add 128-bit Integer Operations
  2020-10-07 21:53       ` will schmidt
@ 2020-10-12 20:16         ` Carl Love
  2020-10-13  0:23         ` Segher Boessenkool
  1 sibling, 0 replies; 41+ messages in thread
From: Carl Love @ 2020-10-12 20:16 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will, Segher:

This patch adds the 128-bit integer support for divide, modulo, shift,
compare of 128-bit integers instructions and builtin support.

Fixed the references to 128-bit in ChangeLog that got missed in the
last go round.

Fixed missing spaces in emit_insn calls.

Re-tested the patch on Power 9 with no regression errors.

                    Carl
--------------------------------------------------


gcc/ChangeLog

	2020-10-08  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
	for new builtins.
	* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
	UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
	(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
	altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
	altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
	altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
	altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
	define_insn.
	(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
	vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
	altivec_vrlqnm): New define_expands.
	* config/rs6000/rs6000-builtin.def (VCMPEQUT_P, VCMPGTST_P,
	VCMPGTUT_P): Add macro expansions.
	(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
	CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
	VCMPAET_P, VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ,
	VSLQ, VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
	MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
	(VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
	P10V_BUILTIN_CMPGE_1TI, P10V_BUILTIN_CMPGE_U1TI,
	P10V_BUILTIN_VCMPGTUT, P10V_BUILTIN_VCMPGTST,
	P10V_BUILTIN_CMPLE_1TI, P10V_BUILTIN_VCMPLE_U1TI,
	P10V_BUILTIN_DIV_V1TI, P10V_BUILTIN_UDIV_V1TI,
	P10V_BUILTIN_VMULESD, P10V_BUILTIN_VMULEUD,
	P10V_BUILTIN_VMULOSD, P10V_BUILTIN_VMULOUD,
	P10V_BUILTIN_VNOR_V1TI, P10V_BUILTIN_VNOR_V1TI_UNS,
	P10V_BUILTIN_VRLQ, P10V_BUILTIN_VRLQMI,
	P10V_BUILTIN_VRLQNM, P10V_BUILTIN_VSLQ,
	P10V_BUILTIN_VSRQ, P10V_BUILTIN_VSRAQ,
	P10V_BUILTIN_VCMPGTUT_P, P10V_BUILTIN_VCMPGTST_P,
	P10V_BUILTIN_VCMPEQUT_P, P10V_BUILTIN_VCMPGTUT_P,
	P10V_BUILTIN_VCMPGTST_P, P10V_BUILTIN_CMPNET,
	P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P,
	P10V_BUILTIN_VSIGNEXTSD2Q, P10V_BUILTIN_DIVES_V1TI,
	P10V_BUILTIN_MODS_V1TI, P10V_BUILTIN_MODU_V1TI):
	New overloaded definitions.
	(rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT,
	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
	P10_BUILTIN_CMPLE_U1TI]: New case statements.
	(rs6000_init_builtins) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
	New assignments.
	(altivec_init_builtins): New E_V1TImode case statement.
	(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
	P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
	P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.
	* config/rs6000/r6000.c (rs6000_handle_altivec_attribute)[E_TImode,
	E_V1TImode]: New case statements.
	* config/rs6000/r6000.h (rs6000_builtin_type_index): New enum
	value RS6000_BTI_bool_V1TI.
	* config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti,
	vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti,
	vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
	vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
	vlshrv1ti3, vashrv1ti3): New define_expands.
	* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
	UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
	UNSPEC_VSX_MODUQ): New unspecs.
	(vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti, vsx_diveu_v1ti,
	vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti, vsx_sign_extend_v2di_v1ti):
	New define_insns.
	(vcmpnet): New define_expand.
	* gcc/doc/extend.texi: Add documentation for the new builtins vec_rl,
	vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
	vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,
	vec_cmpge, vec_cmple, vec_all_eq, vec_all_ne, vec_all_gt, vec_all_lt,
	vec_all_ge, vec_all_le, vec_any_eq, vec_any_ne, vec_any_gt, vec_any_lt,
	vec_any_ge, vec_any_le.

gcc/testsuite/ChangeLog

2020-10-08 Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |    4 +
 gcc/config/rs6000/altivec.md                  |  241 ++
 gcc/config/rs6000/rs6000-builtin.def          |   45 +-
 gcc/config/rs6000/rs6000-call.c               |  138 +-
 gcc/config/rs6000/rs6000.c                    |    1 +
 gcc/config/rs6000/rs6000.h                    |    3 +-
 gcc/config/rs6000/vector.md                   |  191 ++
 gcc/config/rs6000/vsx.md                      |   97 +
 gcc/doc/extend.texi                           |  174 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 2254 +++++++++++++++++
 10 files changed, 3144 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 562c5273f71..7700434f0fe 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -689,6 +689,10 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
 
 #ifdef _ARCH_PWR10
+#define vec_signextq  __builtin_vec_vsignextq
+#define vec_dive __builtin_vec_dive
+#define vec_mod  __builtin_vec_mod
+
 /* May modify these macro definitions if future capabilities overload
    with support for different vector argument and result types.  */
 #define vec_cntlzm(a, b)	__builtin_altivec_vclzdm (a, b)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 0a2e634d6b0..e9623bc3285 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -39,12 +39,16 @@
    UNSPEC_VMULESH
    UNSPEC_VMULEUW
    UNSPEC_VMULESW
+   UNSPEC_VMULEUD
+   UNSPEC_VMULESD
    UNSPEC_VMULOUB
    UNSPEC_VMULOSB
    UNSPEC_VMULOUH
    UNSPEC_VMULOSH
    UNSPEC_VMULOUW
    UNSPEC_VMULOSW
+   UNSPEC_VMULOUD
+   UNSPEC_VMULOSD
    UNSPEC_VPKPX
    UNSPEC_VPACK_SIGN_SIGN_SAT
    UNSPEC_VPACK_SIGN_UNS_SAT
@@ -628,6 +632,14 @@
   "vcmpequ<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_eqv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpequq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gt<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gt:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -636,6 +648,14 @@
   "vcmpgts<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gt:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpgtsq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gtu<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gtu:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -644,6 +664,14 @@
   "vcmpgtu<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		  (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpgtuq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_eqv4sf"
   [(set (match_operand:V4SF 0 "altivec_register_operand" "=v")
 	(eq:V4SF (match_operand:V4SF 1 "altivec_register_operand" "v")
@@ -1687,6 +1715,19 @@
  DONE;
 })
 
+(define_expand "vec_widen_umult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
 (define_expand "vec_widen_smult_even_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1700,6 +1741,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+ else
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_umult_odd_v16qi"
   [(use (match_operand:V8HI 0 "register_operand"))
    (use (match_operand:V16QI 1 "register_operand"))
@@ -1765,6 +1819,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_umult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_smult_odd_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1778,6 +1845,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "altivec_vmuleub"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
@@ -1859,6 +1939,15 @@
   "vmuleuw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuleud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULEUD))]
+  "TARGET_POWER10"
+  "vmuleud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulouw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1868,6 +1957,15 @@
   "vmulouw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuloud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOUD))]
+  "TARGET_POWER10"
+  "vmuloud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulesw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1877,6 +1975,15 @@
   "vmulesw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulesd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULESD))]
+  "TARGET_POWER10"
+  "vmulesd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulosw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1886,6 +1993,15 @@
   "vmulosw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulosd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOSD))]
+  "TARGET_POWER10"
+  "vmulosd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 ;; Vector pack/unpack
 (define_insn "altivec_vpkpx"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
@@ -1979,6 +2095,15 @@
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vrlq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+;; rotate amount in needs to be in bits[57:63] of operand2.
+  "vrlq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
@@ -1989,6 +2114,34 @@
   "vrl<VI_char>mi %0,%2,%3"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqmi"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")
+		      (match_operand:V1TI 3 "vsx_register_operand")]
+		     UNSPEC_VRLMI))]
+  "TARGET_POWER10"
+{
+  /* Mask bit begin, end fields need to be in bits [41:55] of 128-bit operand2.
+     Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[3]));
+  emit_insn (gen_altivec_vrlqmi_inst (operands[0], operands[1], operands[2],
+				      tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqmi_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "0")
+		      (match_operand:V1TI 3 "vsx_register_operand" "v")]
+		     UNSPEC_VRLMI))]
+  "TARGET_POWER10"
+  "vrlqmi %0,%1,%3"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vrl<VI_char>nm"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
@@ -1998,6 +2151,31 @@
   "vrl<VI_char>nm %0,%1,%2"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqnm"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")]
+		     UNSPEC_VRLNM))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vrlqnm_inst (operands[0], operands[1], tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqnm_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+		     UNSPEC_VRLNM))]
+  "TARGET_POWER10"
+  ;; rotate and mask bits need to be in upper 64-bits of operand2.
+  "vrlqnm %0,%1,%2"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vsl"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2042,6 +2220,15 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vslq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vslq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsr<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (lshiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2050,6 +2237,15 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsrq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vsrq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsra<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (ashiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2058,6 +2254,15 @@
   "vsra<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsraq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vsraq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vsr"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2618,6 +2823,18 @@
   "vcmpequ<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_vcmpequt_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v")
+			   (match_operand:V1TI 2 "altivec_register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpequq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgts<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2630,6 +2847,18 @@
   "vcmpgts<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtst_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gt:CC (match_operand:V1TI 1 "register_operand" "v")
+			   (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gt:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpgtsq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgtu<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gtu:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2642,6 +2871,18 @@
   "vcmpgtu<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtut_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gtu:CC (match_operand:V1TI 1 "register_operand" "v")
+			    (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gtu:V1TI (match_dup 1)
+		  (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpgtuq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpeqfp_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(eq:CC (match_operand:V4SF 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 4c2e9460949..57d69b4dc87 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1178,7 +1178,6 @@
 		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
 		     | RS6000_BTC_TERNARY),				\
 		    CODE_FOR_ ## ICODE)			/* ICODE */
-
 \f
 /* Insure 0 is not a legitimate index.  */
 BU_SPECIAL_X (RS6000_BUILTIN_NONE, NULL, 0, RS6000_BTC_MISC)
@@ -2736,6 +2735,10 @@ BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
 BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
 
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
+BU_P10V_AV_2 (VCMPEQUT_P,	"vcmpequt_p",	CONST,	vector_eq_v1ti_p)
+BU_P10V_AV_2 (VCMPGTST_P,	"vcmpgtst_p",	CONST,	vector_gt_v1ti_p)
+BU_P10V_AV_2 (VCMPGTUT_P,	"vcmpgtut_p",	CONST,	vector_gtu_v1ti_p)
+
 BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
 BU_P10_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
@@ -2756,7 +2759,38 @@ BU_P10V_VSX_2 (XXGENPCVM_V16QI, "xxgenpcvm_v16qi", CONST, xxgenpcvm_v16qi)
 BU_P10V_VSX_2 (XXGENPCVM_V8HI, "xxgenpcvm_v8hi", CONST, xxgenpcvm_v8hi)
 BU_P10V_VSX_2 (XXGENPCVM_V4SI, "xxgenpcvm_v4si", CONST, xxgenpcvm_v4si)
 BU_P10V_VSX_2 (XXGENPCVM_V2DI, "xxgenpcvm_v2di", CONST, xxgenpcvm_v2di)
-
+BU_P10V_AV_2 (VCMPGTUT,		"vcmpgtut",	CONST,	vector_gtuv1ti)
+BU_P10V_AV_2 (VCMPGTST,		"vcmpgtst",	CONST,	vector_gtv1ti)
+BU_P10V_AV_2 (VCMPEQUT,		"vcmpequt",	CONST,	eqvv1ti3)
+BU_P10V_AV_2 (CMPNET,		"vcmpnet",	CONST,	vcmpnet)
+BU_P10V_AV_2 (CMPGE_1TI,	"cmpge_1ti",    CONST,  vector_nltv1ti)
+BU_P10V_AV_2 (CMPGE_U1TI,	"cmpge_u1ti",   CONST,  vector_nltuv1ti)
+BU_P10V_AV_2 (CMPLE_1TI,	"cmple_1ti",    CONST,  vector_ngtv1ti)
+BU_P10V_AV_2 (CMPLE_U1TI,	"cmple_u1ti",   CONST,  vector_ngtuv1ti)
+BU_P10V_AV_2 (VNOR_V1TI_UNS,	"vnor_v1ti_uns",CONST,	norv1ti3)
+BU_P10V_AV_2 (VNOR_V1TI,	"vnor_v1ti",	CONST,	norv1ti3)
+BU_P10V_AV_2 (VCMPNET_P,	"vcmpnet_p",	CONST,	vector_ne_v1ti_p)
+BU_P10V_AV_2 (VCMPAET_P,	"vcmpaet_p",	CONST,	vector_ae_v1ti_p)
+
+BU_P10V_AV_1 (VSIGNEXTSD2Q,	"vsignext",     CONST,  vsx_sign_extend_v2di_v1ti)
+
+BU_P10V_AV_2 (VMULEUD,	"vmuleud",	CONST,	vec_widen_umult_even_v2di)
+BU_P10V_AV_2 (VMULESD,	"vmulesd",	CONST,	vec_widen_smult_even_v2di)
+BU_P10V_AV_2 (VMULOUD,	"vmuloud",	CONST,	vec_widen_umult_odd_v2di)
+BU_P10V_AV_2 (VMULOSD,	"vmulosd",	CONST,	vec_widen_smult_odd_v2di)
+BU_P10V_AV_2 (VRLQ,		"vrlq",		CONST,	vrotlv1ti3)
+BU_P10V_AV_2 (VSLQ,		"vslq",		CONST,	vashlv1ti3)
+BU_P10V_AV_2 (VSRQ,		"vsrq",		CONST,	vlshrv1ti3)
+BU_P10V_AV_2 (VSRAQ,		"vsraq",	CONST,	vashrv1ti3)
+BU_P10V_AV_2 (VRLQNM,	"vrlqnm",	CONST,	altivec_vrlqnm)
+BU_P10V_AV_2 (DIV_V1TI,	"div_1ti",      CONST,  vsx_div_v1ti)
+BU_P10V_AV_2 (UDIV_V1TI,	"udiv_1ti",     CONST,  vsx_udiv_v1ti)
+BU_P10V_AV_2 (DIVES_V1TI,	"dives",	CONST,	vsx_dives_v1ti)
+BU_P10V_AV_2 (DIVEU_V1TI,	"diveu",	CONST,	vsx_diveu_v1ti)
+BU_P10V_AV_2 (MODS_V1TI,	"mods",		CONST,	vsx_mods_v1ti)
+BU_P10V_AV_2 (MODU_V1TI,	"modu",		CONST,	vsx_modu_v1ti)
+
+BU_P10V_AV_3 (VRLQMI,	"vrlqmi",	CONST,	altivec_vrlqmi)
 BU_P10V_AV_3 (VEXTRACTBL, "vextdubvlx", CONST, vextractlv16qi)
 BU_P10V_AV_3 (VEXTRACTHL, "vextduhvlx", CONST, vextractlv8hi)
 BU_P10V_AV_3 (VEXTRACTWL, "vextduwvlx", CONST, vextractlv4si)
@@ -2863,6 +2897,12 @@ BU_P10_OVERLOAD_2 (CLRR, "clrr")
 BU_P10_OVERLOAD_2 (GNB, "gnb")
 BU_P10_OVERLOAD_4 (XXEVAL, "xxeval")
 BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
+BU_P10_OVERLOAD_2 (VRLQ, "vrlq")
+BU_P10_OVERLOAD_2 (VSLQ, "vslq")
+BU_P10_OVERLOAD_2 (VSRQ, "vsrq")
+BU_P10_OVERLOAD_2 (VSRAQ, "vsraq")
+BU_P10_OVERLOAD_2 (DIVE, "dive")
+BU_P10_OVERLOAD_2 (MOD, "mod")
 
 BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
 BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
@@ -2882,6 +2922,7 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
 BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
 BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
 
+BU_P10_OVERLOAD_1 (SIGNEXT, "vsignextq")
 \f
 BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
 BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 9e514a01012..87fff5c1c80 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -839,6 +839,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, VSX_BUILTIN_XVCMPEQDP,
@@ -885,6 +889,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -899,8 +909,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTSD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTST,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, VSX_BUILTIN_XVCMPGTDP,
@@ -943,6 +957,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_CMPLE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -995,6 +1014,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIV_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_UDIV_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
@@ -1789,6 +1814,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULE, P8V_BUILTIN_VMULEUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10V_BUILTIN_VMULESD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10V_BUILTIN_VMULEUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULEUB, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULESB, ALTIVEC_BUILTIN_VMULESB,
@@ -1812,6 +1842,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULO, P8V_BUILTIN_VMULOUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10V_BUILTIN_VMULOSD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10V_BUILTIN_VMULOUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULO, ALTIVEC_BUILTIN_VMULOSH,
     RS6000_BTI_V4SI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULOSH, ALTIVEC_BUILTIN_VMULOSH,
@@ -1854,6 +1889,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
@@ -2115,6 +2160,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_RL, P8V_BUILTIN_VRLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10V_BUILTIN_VRLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10V_BUILTIN_VRLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
@@ -2133,12 +2183,23 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_RLMI, P9V_BUILTIN_VRLDMI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { P9V_BUILTIN_VEC_RLMI, P10V_BUILTIN_VRLQMI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI },
+  { P9V_BUILTIN_VEC_RLMI, P10V_BUILTIN_VRLQMI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLWNM,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLDNM,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10V_BUILTIN_VRLQNM,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10V_BUILTIN_VRLQNM,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
@@ -2155,6 +2216,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, P8V_BUILTIN_VSLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10V_BUILTIN_VSLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10V_BUILTIN_VSLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTSP,
@@ -2351,6 +2417,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SR, P8V_BUILTIN_VSRD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10V_BUILTIN_VSRQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10V_BUILTIN_VSRQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
@@ -2379,6 +2450,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SRA, P8V_BUILTIN_VSRAD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10V_BUILTIN_VSRAQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10V_BUILTIN_VSRAQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
@@ -3996,12 +4072,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10V_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10V_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, ALTIVEC_BUILTIN_VCMPGTFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, VSX_BUILTIN_XVCMPGTDP_P,
@@ -4066,6 +4146,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10V_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10V_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, ALTIVEC_BUILTIN_VCMPEQFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, VSX_BUILTIN_XVCMPEQDP_P,
@@ -4117,12 +4201,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10V_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10V_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, ALTIVEC_BUILTIN_VCMPGEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
@@ -4771,6 +4859,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEW,
     RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
 
   /* The following 2 entries have been deprecated.  */
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEB_P,
@@ -4871,6 +4965,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
 
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
@@ -4976,7 +5072,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
-
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
@@ -5903,6 +6000,21 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
  { P10_BUILTIN_VEC_XVTLSBB_ONES, P10V_BUILTIN_XVTLSBB_ONES,
     RS6000_BTI_INTSI, RS6000_BTI_unsigned_V16QI, 0, 0 },
 
+ { P10_BUILTIN_VEC_SIGNEXT, P10V_BUILTIN_VSIGNEXTSD2Q,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, 0, 0 },
+
+ { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
   { RS6000_BUILTIN_NONE, RS6000_BUILTIN_NONE, 0, 0, 0, 0 }
 };
 \f
@@ -12256,12 +12368,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPEQUH:
     case ALTIVEC_BUILTIN_VCMPEQUW:
     case P8V_BUILTIN_VCMPEQUD:
+    case P10V_BUILTIN_VCMPEQUT:
       fold_compare_helper (gsi, EQ_EXPR, stmt);
       return true;
 
     case P9V_BUILTIN_CMPNEB:
     case P9V_BUILTIN_CMPNEH:
     case P9V_BUILTIN_CMPNEW:
+    case P10V_BUILTIN_CMPNET:
       fold_compare_helper (gsi, NE_EXPR, stmt);
       return true;
 
@@ -12273,6 +12387,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_2DI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10V_BUILTIN_CMPGE_1TI:
+    case P10V_BUILTIN_CMPGE_U1TI:
       fold_compare_helper (gsi, GE_EXPR, stmt);
       return true;
 
@@ -12284,6 +12400,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
     case P8V_BUILTIN_VCMPGTSD:
+    case P10V_BUILTIN_VCMPGTUT:
+    case P10V_BUILTIN_VCMPGTST:
       fold_compare_helper (gsi, GT_EXPR, stmt);
       return true;
 
@@ -12295,6 +12413,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPLE_U4SI:
     case VSX_BUILTIN_CMPLE_2DI:
     case VSX_BUILTIN_CMPLE_U2DI:
+    case P10V_BUILTIN_CMPLE_1TI:
+    case P10V_BUILTIN_CMPLE_U1TI:
       fold_compare_helper (gsi, LE_EXPR, stmt);
       return true;
 
@@ -13000,6 +13120,8 @@ rs6000_init_builtins (void)
 					    ? "__vector __bool long"
 					    : "__vector __bool long long",
 					    bool_long_long_type_node, 2);
+  bool_V1TI_type_node = rs6000_vector_type ("__vector __bool __int128",
+					    intTI_type_node, 1);
   pixel_V8HI_type_node = rs6000_vector_type ("__vector __pixel",
 					     pixel_type_node, 8);
 
@@ -13185,6 +13307,10 @@ altivec_init_builtins (void)
     = build_function_type_list (integer_type_node,
 				integer_type_node, V2DI_type_node,
 				V2DI_type_node, NULL_TREE);
+  tree int_ftype_int_v1ti_v1ti
+    = build_function_type_list (integer_type_node,
+				integer_type_node, V1TI_type_node,
+				V1TI_type_node, NULL_TREE);
   tree void_ftype_v4si
     = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
   tree v8hi_ftype_void
@@ -13537,6 +13663,9 @@ altivec_init_builtins (void)
 	case E_VOIDmode:
 	  type = int_ftype_int_opaque_opaque;
 	  break;
+	case E_V1TImode:
+	  type = int_ftype_int_v1ti_v1ti;
+	  break;
 	case E_V2DImode:
 	  type = int_ftype_int_v2di_v2di;
 	  break;
@@ -14136,6 +14265,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10V_BUILTIN_XXGENPCVM_V8HI:
     case P10V_BUILTIN_XXGENPCVM_V4SI:
     case P10V_BUILTIN_XXGENPCVM_V2DI:
+    case P10V_BUILTIN_VMULEUD:
+    case P10V_BUILTIN_VMULOUD:
+    case P10V_BUILTIN_DIVEU_V1TI:
+    case P10V_BUILTIN_MODU_V1TI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
@@ -14235,10 +14368,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case VSX_BUILTIN_CMPGE_U8HI:
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10V_BUILTIN_CMPGE_U1TI:
     case ALTIVEC_BUILTIN_VCMPGTUB:
     case ALTIVEC_BUILTIN_VCMPGTUH:
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
+    case P10V_BUILTIN_VCMPGTUT:
+    case P10V_BUILTIN_VCMPEQUT:
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
       break;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6f204ca202a..cc629f2d938 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -19546,6 +19546,7 @@ rs6000_handle_altivec_attribute (tree *node,
     case 'b':
       switch (mode)
 	{
+	case E_TImode: case E_V1TImode: result = bool_V1TI_type_node; break;
 	case E_DImode: case E_V2DImode: result = bool_V2DI_type_node; break;
 	case E_SImode: case E_V4SImode: result = bool_V4SI_type_node; break;
 	case E_HImode: case E_V8HImode: result = bool_V8HI_type_node; break;
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index bbd8060e143..ce0543e38fe 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2323,7 +2323,6 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_MMA		MASK_MMA	/* ISA 3.1 MMA.  */
 #define RS6000_BTM_P10		MASK_POWER10
 
-
 #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
 				 | RS6000_BTM_VSX			\
 				 | RS6000_BTM_P8_VECTOR			\
@@ -2436,6 +2435,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_bool_V8HI,          /* __vector __bool short */
   RS6000_BTI_bool_V4SI,          /* __vector __bool int */
   RS6000_BTI_bool_V2DI,          /* __vector __bool long */
+  RS6000_BTI_bool_V1TI,          /* __vector __bool 128-bit */
   RS6000_BTI_pixel_V8HI,         /* __vector __pixel */
   RS6000_BTI_long,	         /* long_integer_type_node */
   RS6000_BTI_unsigned_long,      /* long_unsigned_type_node */
@@ -2489,6 +2489,7 @@ enum rs6000_builtin_type_index
 #define bool_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V8HI])
 #define bool_V4SI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V4SI])
 #define bool_V2DI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V2DI])
+#define bool_V1TI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V1TI])
 #define pixel_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_pixel_V8HI])
 
 #define long_long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long_long])
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 796345c80d3..c2ae74fbe92 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -685,6 +685,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtv1ti"
+  [(set (match_operand:V1TI 0 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))]
+  "TARGET_POWER10"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nlt<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -697,6 +704,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		 (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_gtu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -704,6 +722,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		  (match_operand:V1TI 2 "altivec_register_operand")))]
+  "TARGET_POWER10"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nltu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -716,6 +741,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		  (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+	(not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_geu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(geu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -735,6 +771,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_ngtu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vlogical_operand")
@@ -746,6 +793,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		  (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 ; There are 14 possible vector FP comparison operators, gt and eq of them have
 ; been expanded above, so just support 12 remaining operators here.
 
@@ -894,6 +952,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_eq_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 ;; This expansion handles the V16QI, V8HI, and V4SI modes in the
 ;; implementation of the vec_all_ne built-in functions on Power9.
 (define_expand "vector_ne_<mode>_p"
@@ -976,6 +1046,23 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+(define_expand "vector_ne_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V2DI mode in the implementation of the
 ;; vec_any_eq built-in function on Power9.
 ;;
@@ -1002,6 +1089,26 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+(define_expand "vector_ae_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))
+   (set (match_dup 0)
+	(xor:SI (match_dup 0)
+		(const_int 1)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V4SF and V2DF modes in the Power9
 ;; implementation of the vec_all_ne built-in functions.  Note that the
 ;; expansions for this pattern with these modes makes no use of power9-
@@ -1061,6 +1168,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gt_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gt:CC (match_operand:V1TI 1 "vlogical_operand")
+			     (match_operand:V1TI 2 "vlogical_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (gt:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 (define_expand "vector_ge_<mode>_p"
   [(parallel
     [(set (reg:CC CR6_REGNO)
@@ -1085,6 +1204,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtu_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gtu:CC (match_operand:V1TI 1 "altivec_register_operand")
+			      (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "altivec_register_operand")
+	  (gtu:V1TI (match_dup 1)
+		    (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 ;; AltiVec/VSX predicates.
 
 ;; This expansion is triggered during expansion of predicate built-in
@@ -1460,6 +1591,20 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vrotlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vrlq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for rotatert to make use of vrotl
 (define_expand "vrotr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1481,6 +1626,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vashlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vslq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for logical shift right on each vector element
 (define_expand "vlshr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1489,6 +1649,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vlshrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vsrq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for arithmetic shift right on each vector element
 (define_expand "vashr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1496,6 +1671,22 @@
 			(match_operand:VEC_I 2 "vint_operand")))]
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
+
+;; No immediate version of this 128-bit instruction
+(define_expand "vashrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vsraq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 \f
 ;; Vector reduction expanders for VSX
 ; The (VEC_reduc:...
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 31fcffe8f33..5b6a0bd728a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -298,6 +298,12 @@
    UNSPEC_VSX_XXSPLTD
    UNSPEC_VSX_DIVSD
    UNSPEC_VSX_DIVUD
+   UNSPEC_VSX_DIVSQ
+   UNSPEC_VSX_DIVUQ
+   UNSPEC_VSX_DIVESQ
+   UNSPEC_VSX_DIVEUQ
+   UNSPEC_VSX_MODSQ
+   UNSPEC_VSX_MODUQ
    UNSPEC_VSX_MULSD
    UNSPEC_VSX_SIGN_EXTEND
    UNSPEC_VSX_XVCVBF16SPN
@@ -1732,6 +1738,60 @@
 }
   [(set_attr "type" "div")])
 
+(define_insn "vsx_div_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVSQ))]
+  "TARGET_POWER10"
+  "vdivsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_udiv_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVUQ))]
+  "TARGET_POWER10"
+  "vdivuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_dives_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVESQ))]
+  "TARGET_POWER10"
+  "vdivesq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_diveu_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVEUQ))]
+  "TARGET_POWER10"
+  "vdiveuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_mods_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODSQ))]
+  "TARGET_POWER10"
+  "vmodsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_modu_v1ti"
+ [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODUQ))]
+  "TARGET_POWER10"
+  "vmoduq %0,%1,%2"
+  [(set_attr "type" "div")])
+
 ;; *tdiv* instruction returning the FG flag
 (define_expand "vsx_tdiv<mode>3_fg"
   [(set (match_dup 3)
@@ -3083,6 +3143,21 @@
   "xxpermdi %x0,%x1,%x1,2"
   [(set_attr "type" "vecperm")])
 
+;; Swap upper/lower 64-bit values in a 128-bit vector
+(define_insn "xxswapd_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(subreg:V1TI
+           (vec_select:V2DI
+             (subreg:V2DI
+                (match_operand:V1TI 1 "vsx_register_operand" "v") 0 )
+	     (parallel [(const_int 1)(const_int 0)]))
+           0))]
+  "TARGET_POWER10"
+;; AIX does not support extended mnemonic xxswapd.  Use the basic
+;; mnemonic xxpermdi instead.
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "xxgenpcvm_<mode>_internal"
   [(set (match_operand:VSX_EXTRACT_I4 0 "altivec_register_operand" "=wa")
 	(unspec:VSX_EXTRACT_I4
@@ -4767,6 +4842,15 @@
    (set_attr "type" "vecload")])
 
 \f
+;; ISA 3.1 vector extend sign support
+(define_insn "vsx_sign_extend_v2di_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
+		     UNSPEC_VSX_SIGN_EXTEND))]
+  "TARGET_POWER10"
+  "vextsd2q %0,%1"
+  [(set_attr "type" "vecexts")])
+
 ;; ISA 3.0 vector extend sign support
 
 (define_insn "vsx_sign_extend_qi_<mode>"
@@ -5451,6 +5535,19 @@
   "vcmpneb %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Vector Compare Not Equal v1ti (specified/not+eq:)
+(define_expand "vcmpnet"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(not:V1TI
+	  (eq:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		   (match_operand:V1TI 2 "altivec_register_operand"))))]
+   "TARGET_POWER10"
+{
+  emit_insn (gen_eqvv1ti3 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_one_cmplv1ti2 (operands[0], operands[0]));
+  DONE;
+})
+
 ;; Vector Compare Not Equal or Zero Byte
 (define_insn "vcmpnezb"
   [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index c1c2c9f9bf7..99cc053acbe 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21316,6 +21316,180 @@ Generate PCV from specified Mask size, as if implemented by the
 immediate value is either 0, 1, 2 or 3.
 @findex vec_genpcvm
 
+@smallexample
+@exdent vector unsigned __int128 vec_rl (vector unsigned __int128,
+                                         vector unsigned __int128);
+@exdent vector signed __int128 vec_rl (vector signed __int128,
+                                       vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input left by the number of bits
+specified in the most significant quad word of the second input truncated to
+7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlmi (vector signed __int128,
+                                         vector signed __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and inserting it under mask
+into the second input.  The first bit in the mask, the last bit in the mask are
+obtained from the two 7-bit fields bits [108:115] and bits [117:123]
+respectively of the second input.  The shift is obtained from the third input
+in the 7-bit field [125:131] where all bits counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlnm (vector signed __int128,
+                                         vector unsigned __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and ANDing it with a mask.  The
+first bit in the mask and the last bit in the mask are obtained from the two
+7-bit fields bits [117:123] and bits [125:131] respectively of the second
+input.  The shift is obtained from the third input in the 7-bit field bits
+[125:131] where all bits counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_sl(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sl(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of shifting the first input left by the number of bits
+specified in the most significant bits of the second input truncated to
+7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_sr(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sr(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of performing a logical right shift of the first argument
+by the number of bits specified in the most significant double word of the
+second input truncated to 7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_sra(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sra(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of performing arithmetic right shift of the first argument
+by the number of bits specified in the most significant bits of the
+second input truncated to 7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_mule (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mule (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the even
+doubleword elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mulo (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mulo (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the odd
+doubleword elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_div (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_div (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+Returns the result of dividing the first operand by the second operand. An
+attempt to divide any value by zero or to divide the most negative signed
+128-bit integer by negative one results in an undefined value.
+
+@smallexample
+@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_dive (vector signed __int128,
+                                         vector signed __int128);
+@end smallexample
+
+The result is produced by shifting the first input left by 128 bits and
+dividing by the second.  If an attempt is made to divide by zero or the result
+is larger than 128 bits, the result is undefined.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_mod (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+The result is the modulo result of dividing the first input  by the second
+input.
+
+The following builtins perform 128-bit vector comparisons.  The
+@code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is
+one of the operations @code{eq, ne, gt, lt, ge, le} perform pairwise
+comparisons between the elements at the same positions within their two vector
+arguments.  The @code{vec_all_xx}function returns a non-zero value if and only
+if all pairwise comparisons are true.  The @code{vec_any_xx} function returns
+a non-zero value if and only if at least one pairwise comparison is true.  The
+@code{vec_cmpxx}function returns a vector of the same type as its two
+arguments, within which each element consists of all ones to denote that
+specified logical comparison of the corresponding elements was true.
+Otherwise, the element of the returned vector contains all zeros.
+
+@smallexample
+vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
+
+int vec_all_eq (vector signed __int128, vector signed __int128);
+int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ne (vector signed __int128, vector signed __int128);
+int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_all_gt (vector signed __int128, vector signed __int128);
+int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_lt (vector signed __int128, vector signed __int128);
+int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ge (vector signed __int128, vector signed __int128);
+int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_all_le (vector signed __int128, vector signed __int128);
+int vec_all_le (vector unsigned __int128, vector unsigned __int128);
+
+int vec_any_eq (vector signed __int128, vector signed __int128);
+int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ne (vector signed __int128, vector signed __int128);
+int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_any_gt (vector signed __int128, vector signed __int128);
+int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_lt (vector signed __int128, vector signed __int128);
+int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ge (vector signed __int128, vector signed __int128);
+int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_any_le (vector signed __int128, vector signed __int128);
+int vec_any_le (vector unsigned __int128, vector unsigned __int128);
+@end smallexample
+
+
 @node PowerPC Hardware Transactional Memory Built-in Functions
 @subsection PowerPC Hardware Transactional Memory Built-in Functions
 GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
new file mode 100644
index 00000000000..85ad544e22b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -0,0 +1,2254 @@
+/* { dg-do run } */
+/* { dg-options "-mcpu=power10 -O2 -save-temps" } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-require-effective-target ppc_native_128bit } */
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mvextsd2q\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsrq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsraq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvcmpuq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpsq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvmuleud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivsq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivuq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveuq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmoduq\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+
+
+void print_i128(__int128_t val)
+{
+  printf(" %lld %llu (0x%llx %llx)",
+	 (signed long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF),
+	 (unsigned long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
+}
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i, result_int;
+
+  __int128_t arg1, result;
+  __uint128_t uarg2;
+
+  vector signed long long int vec_arg1_di, vec_arg2_di;
+  vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
+  vector unsigned long long int vec_uresult_di;
+  vector unsigned long long int vec_uexpected_result_di;
+  
+  __int128_t expected_result;
+  __uint128_t uexpected_result;
+
+  vector __int128_t vec_arg1, vec_arg2, vec_result;
+  vector __uint128_t vec_uarg1, vec_uarg2, vec_uarg3, vec_uresult;
+  vector bool __int128  vec_result_bool;
+
+  /* sign extend double to 128-bit integer  */
+  vec_arg1_di[0] = 1000;
+  vec_arg1_di[1] = -123456;
+
+  expected_result = 1000;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1_di[0] = -123456;
+  vec_arg1_di[1] = 1000;
+
+  expected_result = -123456;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  /* test shift 128-bit integers.
+     Note, shift amount is given by the lower 7-bits of the shift amount. */
+  vec_arg1[0] = 3;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]*4;
+
+  vec_result = vec_sl (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 3;
+  uarg2 = 4;
+  expected_result = arg1*16;
+
+  result = arg1 << uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 << uint128):  ");
+    print_i128(arg1);
+    printf(" << %lld", uarg2 & 0xFF);
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 3;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]*4;
+  
+  vec_uresult = vec_sl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]/4;
+
+  vec_result = vec_sr (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 48;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]/4;
+  
+  vec_uresult = vec_sr (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 48;
+  uarg2 = 4;
+  expected_result = arg1/16;
+
+  result = arg1 >> uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 >> uint128:  ");
+    print_i128(arg1);
+    printf(" >> %lld", uarg2 & 0xFF);
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x0000000012345678ULL;
+  expected_result = (expected_result << 64) | 0x90ABCDEFAABBCCDDULL;
+
+  vec_result = vec_sra (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0xFFFFFFFFFFFFAABBLL;
+  uexpected_result = (uexpected_result << 64) | 0xCCDDEEFF11221234ULL;
+
+  vec_uresult = vec_sra (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x90ABCDEFAABBCCDDULL;
+  expected_result = (expected_result << 64) | 0xEEFF112212345678ULL;
+
+  vec_result = vec_rl (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0x11221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEEFFULL;
+
+  vec_uresult = vec_rl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = (32 << 8) | 95;
+  vec_uarg3[0] = 32;
+  expected_result = 0xaabbccddULL;
+  expected_result = (expected_result << 64) | 0xeeff112200000000ULL;
+
+  vec_result = vec_rlnm (vec_arg1, vec_uarg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(int128, uint128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = (8 << 8) | 119;
+  vec_uarg3[0] = 48;
+
+  uexpected_result = 0x00221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEE00ULL;
+
+  vec_uresult = vec_rlnm (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(uint128, uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_arg2[0] = 0x000000000000DEADULL;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 0x0000BEEF00000000ULL;
+  vec_uarg3[0] = 96 << 16 | 127 << 8 | 32;
+  expected_result = 0x000000000000DEADULL;
+  expected_result = (expected_result << 64) | 0x0000BEEF12345678ULL;
+
+  vec_result = vec_rlmi (vec_arg1, vec_arg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(int128, int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", vec_uarg2_di[1] & 0xFF);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 0xDEAD000000000000ULL;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 0x000000000000BEEFULL;
+  vec_uarg3[0] = 16 << 16 | 111 << 8 | 48;
+  uexpected_result = 0xDEAD1234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDBEEFULL;
+
+  vec_uresult = vec_rlmi (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(uint128, unit128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", vec_uarg3[1] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* 128-bit compare tests, result is all 1's if true */
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1[0] = 2468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  uexpected_result = 0xFFFFFFFFFFFFFFFFULL;
+  uexpected_result = (uexpected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: unsigned vec_cmpgt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed vec_cmpgt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:not equal signed vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  not equal vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpeq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  not equal vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:not equal signed vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+   
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector multiply Even and Odd tests */
+  vec_arg1_di[0] = 200;
+  vec_arg1_di[1] = 400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[0] * vec_arg2_di[0];
+
+  vec_result = vec_mule (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (signed, signed) failed.\n");
+    printf(" vec_arg1_di[0] = %lld\n", vec_arg1_di[0]);
+    printf(" vec_arg2_di[0] = %lld\n", vec_arg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1_di[0] = -200;
+  vec_arg1_di[1] = -400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[1] * vec_arg2_di[1];
+
+  vec_result = vec_mulo (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (signed, signed) failed.\n");
+    printf(" vec_arg1_di[1] = %lld\n", vec_arg1_di[1]);
+    printf(" vec_arg2_di[1] = %lld\n", vec_arg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[0] * vec_uarg2_di[0];
+
+  vec_uresult = vec_mule (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[1] = %lld\n", vec_uarg1_di[1]);
+    printf(" vec_uarg2_di[1] = %lld\n", vec_uarg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[1] * vec_uarg2_di[1];
+
+  vec_uresult = vec_mulo (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[0] = %lld\n", vec_uarg1_di[0]);
+    printf(" vec_uarg2_di[0] = %lld\n", vec_uarg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Divide Quadword */
+  vec_arg1[0] = -12345678;
+  vec_arg2[0] = 2;
+  expected_result = -6172839;
+
+  vec_result = vec_div (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24680;
+  vec_uarg2[0] = 4;
+  uexpected_result = 6170;
+
+  vec_uresult = vec_div (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Divide Extended Quadword */
+  vec_arg1[0] = -20;        // has 128-bit of zero concatenated onto it
+  vec_arg2[0] = 0x2000000000000000;
+  vec_arg2[0] = vec_arg2[0] << 64;
+  expected_result = -160;
+
+  vec_result = vec_dive (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 20;        // has 128-bit of zero concatenated onto it
+  vec_uarg2[0] = 0x4000000000000000;
+  vec_uarg2[0] = vec_uarg2[0] << 64;
+  uexpected_result = 80;
+
+  vec_uresult = vec_dive (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector modulo quad word  */
+  vec_arg1[0] = -12345675;
+  vec_arg2[0] = 2;
+  expected_result = -1;
+
+  vec_result = vec_mod (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24685;
+  vec_uarg2[0] = 4;
+  uexpected_result = 1;
+
+  vec_uresult = vec_mod (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  return 0;
+}
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/5] Conversions between 128-bit integer and floating point values.
  2020-10-08 15:35       ` will schmidt
@ 2020-10-12 20:16         ` Carl Love
  0 siblings, 0 replies; 41+ messages in thread
From: Carl Love @ 2020-10-12 20:16 UTC (permalink / raw)
  To: will schmidt, segher, dje.gcc, gcc-patches

Will, Segher:
 
This patch adds support for converting to/from 128-bit integers and
128-bit decimal floating point formats using the new P10 instructions
dcffixqq and dctfixqq.  The new instructions are only used on P10 HW,
otherwise the conversions continue to use the existing SW routines.

The files fixkfti-sw.c and fixunskfti-sw.c are renamed versions of
fixkfti.c and fixunskfti.c respectively.  The function names in the
files were updated with the rename as well as some white spaces fixes.

Fixed a typo in the ChangeLog noted by Will.

Removed the target ppc_native_128bit from the test case as we no longer
have the 128-bit flag.

Re-tested the patch on Power 9 with no regression errors.
                    Carl

--------------------------------------------------------------



gcc/ChangeLog

2020-10-12  Carl Love  <cel@us.ibm.com>
	* config/rs6000/rs6000.md (floatti<mode>2, floatunsti<mode>2,
	fix_trunc<mode>ti2, fixuns_trunc<mode>ti2): Add
	define_insn for mode IEEE 128.
	* libgcc/config/rs6000/fixkfti.c: Renamed to fixkfti-sw.c.
	Change calls of __fixkfti to __fixkfti_sw.
	* libgcc/config/rs6000/fixunskfti.c: Renamed to fixunskfti-sw.c.
	Change calls of __fixunskfti to __fixunskfti_sw.
	* libgcc/config/rs6000/float128-hw.c (__floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw):
	New functions.
	* libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1):
	New macro.
	(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
	__fixunskfti_resolve): Add resolve functions.
	(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
	functions.
	* libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
	__fixtfti, __fixunstfti): Add editor commands to change
	names.
	* libgcc/config/rs6000/float128-sed-hw (__floattitf,
	__floatuntitf, __fixtfti, __fixunstfti): Add editor commands
	to change names.
	* libgcc/config/rs6000/floattikf.c: Renamed to floattikf-sw.c.
	* libgcc/config/rs6000/floatuntikf.c: Renamed to floatuntikf-sw.c.
	* libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,
	__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
	__floatuntikf, __fixkfti, __fixunskfti): New extern declarations.
	* libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
	fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
	(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
	file names to fp128_ppc_funcs.

gcc/testsuite/ChangeLog

2020-10-12  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/fp128_conversions.c: New file.
---
 gcc/config/rs6000/rs6000.md                   |  36 +++
 .../gcc.target/powerpc/fp128_conversions.c    | 283 ++++++++++++++++++
 .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
 .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   4 +-
 libgcc/config/rs6000/float128-hw.c            |  24 ++
 libgcc/config/rs6000/float128-ifunc.c         |  44 ++-
 libgcc/config/rs6000/float128-sed             |   4 +
 libgcc/config/rs6000/float128-sed-hw          |   4 +
 .../rs6000/{floattikf.c => floattikf-sw.c}    |   4 +-
 .../{floatuntikf.c => floatuntikf-sw.c}       |   4 +-
 libgcc/config/rs6000/quad-float128.h          |  17 +-
 libgcc/config/rs6000/t-float128               |   3 +-
 12 files changed, 411 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
 rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
 rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (96%)
 rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
 rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 694ff70635e..5db5d0b4505 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6390,6 +6390,42 @@
    xscvsxddp %x0,%x1"
   [(set_attr "type" "fp")])
 
+(define_insn "floatti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvsqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "floatunsti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvuqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fix_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpsqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fixuns_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpuqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
 ; Allow the combiner to merge source memory operands to the conversion so that
 ; the optimizer/register allocator doesn't try to load the value too early in a
 ; GPR and then use store/load to move it to a FPR and suffer from a store-load
diff --git a/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
new file mode 100644
index 00000000000..1269232a6cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
@@ -0,0 +1,283 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mxscvsqqp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxscvuqqp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxscvqpsqz\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxscvqpuqz\M} 1 } } */
+
+#include <stdio.h>
+#include <math.h>
+#include <fenv.h>
+#include <stdlib.h>
+#include <wchar.h>
+
+#define DEBUG 0
+
+void abort (void);
+
+float conv_i_2_fp( long long int a)
+{
+  return (float) a;
+}
+
+double conv_i_2_fpd( long long int a)
+{
+  return (double) a;
+}
+
+double conv_ui_2_fpd( unsigned long long int a)
+{
+  return (double) a;
+}
+
+__float128 conv_i128_2_fp128 (__int128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__float128 conv_ui128_2_fp128 (__uint128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__int128_t conv_fp128_2_i128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__int128_t) a;
+}
+
+__uint128_t conv_fp128_2_ui128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__uint128_t) a;
+}
+
+long double conv_i128_2_ld (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (long double) a;
+}
+
+__ibm128 conv_i128_2_ibm128 (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble, message uses IBM long double, no binary output
+  return (__ibm128) a;
+}
+
+int main()
+{
+	float a, expected_result_float;
+	double b, expected_result_double;
+	long long int c, expected_result_llint;
+	unsigned long long int u;
+	__int128_t d;
+	__uint128_t u128;
+	unsigned long long expected_result_uint128[2] ;
+	__float128 e;
+	long double ld;     // another 128-bit float version
+
+	union conv_t {
+		float a;
+		double b;
+		long long int c;
+		long long int128[2] ;
+		unsigned long long uint128[2] ;
+		unsigned long long int u;
+		__int128_t d;
+		__uint128_t u128;
+		__float128 e;
+		long double ld;     // another 128-bit float version
+	} conv, conv_result;
+
+	c = 20;
+	expected_result_llint = 20.00000;
+	a = conv_i_2_fp (c);
+
+	if (a != expected_result_llint) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fp(%lld) = %10.5f\n", c, a);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_llint);
+#else
+		abort();
+#endif
+	}
+
+	c = 20;
+	expected_result_double = 20.00000;
+	b = conv_i_2_fpd (c);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fpd(%lld) = %10.5f\n", d, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+	u = 20;
+	expected_result_double = 20.00000;
+	b = conv_ui_2_fpd (u);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_ui_2_fpd(%llu) = %10.5f\n", u, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+  d = -3210;
+  d = (d * 10000000000) + 9876543210;
+  conv_result.e = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0xc02bd2f9068d1160;
+  expected_result_uint128[0] = 0x0;
+  
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(-32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  d = 123;
+  d = (d * 10000000000) + 1234567890;
+  conv_result.ld = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x4271eab4c8ed2000;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(1231234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  u128 = 8760;
+  u128 = (u128 * 10000000000) + 1234567890;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+  expected_result_uint128[1] = 0x402d3eb101df8b48;
+  expected_result_uint128[0] = 0x0;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(87601234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  u128 = 3210;
+  u128 = (u128 * 10000000000) + 9876543210;
+  expected_result_uint128[1] = 0x402bd3429c8feea0;
+  expected_result_uint128[0] = 0x0;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 12345.6789;
+  expected_result_uint128[1] = 0x1407374883526960;
+  expected_result_uint128[0] = 0x3039;
+
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) =  ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = -6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0xffffffffffffe57b;
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x1a85;
+  conv_result.d = conv_fp128_2_ui128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_ui128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+	  
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  return 0;
+}
diff --git a/libgcc/config/rs6000/fixkfti.c b/libgcc/config/rs6000/fixkfti-sw.c
similarity index 96%
rename from libgcc/config/rs6000/fixkfti.c
rename to libgcc/config/rs6000/fixkfti-sw.c
index a22286228aa..d6bbbf889b7 100644
--- a/libgcc/config/rs6000/fixkfti.c
+++ b/libgcc/config/rs6000/fixkfti-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TItype
-__fixkfti (TFtype a)
+__fixkfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/fixunskfti.c b/libgcc/config/rs6000/fixunskfti-sw.c
similarity index 96%
rename from libgcc/config/rs6000/fixunskfti.c
rename to libgcc/config/rs6000/fixunskfti-sw.c
index ab232d92d24..d803936e48a 100644
--- a/libgcc/config/rs6000/fixunskfti.c
+++ b/libgcc/config/rs6000/fixunskfti-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 UTItype
-__fixunskfti (TFtype a)
+__fixunskfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/float128-hw.c b/libgcc/config/rs6000/float128-hw.c
index 8705b53e22a..be8bd07e853 100644
--- a/libgcc/config/rs6000/float128-hw.c
+++ b/libgcc/config/rs6000/float128-hw.c
@@ -86,6 +86,30 @@ __floatdikf_hw (DItype_ppc a)
   return (TFtype) a;
 }
 
+TFtype
+__floattikf_hw (TItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TFtype
+__floatuntikf_hw (UTItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TItype_ppc
+__fixkfti_hw (TFtype a)
+{
+  return (TItype_ppc) a;
+}
+
+UTItype_ppc
+__fixunskfti_hw (TFtype a)
+{
+  return (UTItype_ppc) a;
+}
+
 TFtype
 __floatundikf_hw (UDItype_ppc a)
 {
diff --git a/libgcc/config/rs6000/float128-ifunc.c b/libgcc/config/rs6000/float128-ifunc.c
index c2f65912a74..c221be2c864 100644
--- a/libgcc/config/rs6000/float128-ifunc.c
+++ b/libgcc/config/rs6000/float128-ifunc.c
@@ -46,14 +46,9 @@
 #endif
 
 #define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW : SW)
+#define SW_OR_HW_ISA3_1(SW, HW) (__builtin_cpu_supports ("arch_3_1") ? HW : SW)
 
 /* Resolvers.  */
-
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 static __typeof__ (__addkf3_sw) *
 __addkf3_resolve (void)
 {
@@ -102,6 +97,18 @@ __floatdikf_resolve (void)
   return SW_OR_HW (__floatdikf_sw, __floatdikf_hw);
 }
 
+static __typeof__ (__floattikf_sw) *
+__floattikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floattikf_sw, __floattikf_hw);
+}
+
+static __typeof__ (__floatuntikf_sw) *
+__floatuntikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floatuntikf_sw, __floatuntikf_hw);
+}
+
 static __typeof__ (__floatunsikf_sw) *
 __floatunsikf_resolve (void)
 {
@@ -114,6 +121,19 @@ __floatundikf_resolve (void)
   return SW_OR_HW (__floatundikf_sw, __floatundikf_hw);
 }
 
+
+static __typeof__ (__fixkfti_sw) *
+__fixkfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixkfti_sw, __fixkfti_hw);
+}
+
+static __typeof__ (__fixunskfti_sw) *
+__fixunskfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixunskfti_sw, __fixunskfti_hw);
+}
+
 static __typeof__ (__fixkfsi_sw) *
 __fixkfsi_resolve (void)
 {
@@ -303,6 +323,18 @@ TFtype __floatsikf (SItype_ppc)
 TFtype __floatdikf (DItype_ppc)
   __attribute__ ((__ifunc__ ("__floatdikf_resolve")));
 
+TFtype __floattikf (TItype_ppc)
+  __attribute__ ((__ifunc__ ("__floattikf_resolve")));
+
+TFtype __floatuntikf (UTItype_ppc)
+  __attribute__ ((__ifunc__ ("__floatuntikf_resolve")));
+
+TItype_ppc __fixkfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixkfti_resolve")));
+
+UTItype_ppc __fixunskfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixunskfti_resolve")));
+
 TFtype __floatunsikf (USItype_ppc)
   __attribute__ ((__ifunc__ ("__floatunsikf_resolve")));
 
diff --git a/libgcc/config/rs6000/float128-sed b/libgcc/config/rs6000/float128-sed
index d9a089ff9ba..c0fcddb1959 100644
--- a/libgcc/config/rs6000/float128-sed
+++ b/libgcc/config/rs6000/float128-sed
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi/g
 s/__fixunstfdi/__fixunskfdi/g
 s/__fixunstfsi/__fixunskfsi/g
 s/__floatditf/__floatdikf/g
+s/__floattitf/__floattikf/g
+s/__floatuntitf/__floatuntikf/g
+s/__fixtfti/__fixkfti/g
+s/__fixunstfti/__fixunskfti/g
 s/__floatsitf/__floatsikf/g
 s/__floatunditf/__floatundikf/g
 s/__floatunsitf/__floatunsikf/g
diff --git a/libgcc/config/rs6000/float128-sed-hw b/libgcc/config/rs6000/float128-sed-hw
index acf36b0c17d..3d2bf556da1 100644
--- a/libgcc/config/rs6000/float128-sed-hw
+++ b/libgcc/config/rs6000/float128-sed-hw
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi_sw/g
 s/__fixunstfdi/__fixunskfdi_sw/g
 s/__fixunstfsi/__fixunskfsi_sw/g
 s/__floatditf/__floatdikf_sw/g
+s/__floattitf/__floattikf_sw/g
+s/__floatuntitf/__floatuntikf_sw/g
+s/__fixtfti/__fixkfti_sw/g
+s/__fixunstfti/__fixunskfti_sw/g
 s/__floatsitf/__floatsikf_sw/g
 s/__floatunditf/__floatundikf_sw/g
 s/__floatunsitf/__floatunsikf_sw/g
diff --git a/libgcc/config/rs6000/floattikf.c b/libgcc/config/rs6000/floattikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floattikf.c
rename to libgcc/config/rs6000/floattikf-sw.c
index 4e8c40cfbe4..110706352bb 100644
--- a/libgcc/config/rs6000/floattikf.c
+++ b/libgcc/config/rs6000/floattikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floattikf (TItype i)
+__floattikf_sw (TItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/floatuntikf.c b/libgcc/config/rs6000/floatuntikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floatuntikf.c
rename to libgcc/config/rs6000/floatuntikf-sw.c
index 8bfba4267d4..5e712a67e26 100644
--- a/libgcc/config/rs6000/floatuntikf.c
+++ b/libgcc/config/rs6000/floatuntikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floatuntikf (UTItype i)
+__floatuntikf_sw (UTItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/quad-float128.h b/libgcc/config/rs6000/quad-float128.h
index 32ef328a8ea..24712b9277f 100644
--- a/libgcc/config/rs6000/quad-float128.h
+++ b/libgcc/config/rs6000/quad-float128.h
@@ -87,19 +87,18 @@ extern USItype_ppc __fixunskfsi_sw (TFtype);
 extern UDItype_ppc __fixunskfdi_sw (TFtype);
 extern TFtype __floatsikf_sw (SItype_ppc);
 extern TFtype __floatdikf_sw (DItype_ppc);
+extern TFtype __floattikf_sw (TItype_ppc);
 extern TFtype __floatunsikf_sw (USItype_ppc);
 extern TFtype __floatundikf_sw (UDItype_ppc);
+extern TFtype __floatuntikf_sw (UTItype_ppc);
+extern TItype_ppc __fixkfti_sw (TFtype);
+extern UTItype_ppc __fixunskfti_sw (TFtype);
 extern IBM128_TYPE __extendkftf2_sw (TFtype);
 extern TFtype __trunctfkf2_sw (IBM128_TYPE);
 extern TCtype __mulkc3_sw (TFtype, TFtype, TFtype, TFtype);
 extern TCtype __divkc3_sw (TFtype, TFtype, TFtype, TFtype);
 
 #ifdef _ARCH_PPC64
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 extern TItype_ppc __fixkfti (TFtype);
 extern UTItype_ppc __fixunskfti (TFtype);
 extern TFtype __floattikf (TItype_ppc);
@@ -130,8 +129,12 @@ extern USItype_ppc __fixunskfsi_hw (TFtype);
 extern UDItype_ppc __fixunskfdi_hw (TFtype);
 extern TFtype __floatsikf_hw (SItype_ppc);
 extern TFtype __floatdikf_hw (DItype_ppc);
+extern TFtype __floattikf_hw (TItype_ppc);
 extern TFtype __floatunsikf_hw (USItype_ppc);
 extern TFtype __floatundikf_hw (UDItype_ppc);
+extern TFtype __floatuntikf_hw (UTItype_ppc);
+extern TItype_ppc __fixkfti_hw (TFtype);
+extern UTItype_ppc __fixunskfti_hw (TFtype);
 extern IBM128_TYPE __extendkftf2_hw (TFtype);
 extern TFtype __trunctfkf2_hw (IBM128_TYPE);
 extern TCtype __mulkc3_hw (TFtype, TFtype, TFtype, TFtype);
@@ -162,8 +165,12 @@ extern USItype_ppc __fixunskfsi (TFtype);
 extern UDItype_ppc __fixunskfdi (TFtype);
 extern TFtype __floatsikf (SItype_ppc);
 extern TFtype __floatdikf (DItype_ppc);
+extern TFtype __floattikf (TItype_ppc);
 extern TFtype __floatunsikf (USItype_ppc);
 extern TFtype __floatundikf (UDItype_ppc);
+extern TFtype __floatuntikf (UTItype_ppc);
+extern TItype_ppc __fixkfti (TFtype);
+extern UTItype_ppc __fixunskfti (TFtype);
 extern IBM128_TYPE __extendkftf2 (TFtype);
 extern TFtype __trunctfkf2 (IBM128_TYPE);
 
diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
index d5413445189..325b22fd49e 100644
--- a/libgcc/config/rs6000/t-float128
+++ b/libgcc/config/rs6000/t-float128
@@ -23,7 +23,8 @@ fp128_softfp_shared_obj	= $(addsuffix -sw_s$(objext),$(fp128_softfp_funcs))
 fp128_softfp_obj	= $(fp128_softfp_static_obj) $(fp128_softfp_shared_obj)
 
 # New functions for software emulation
-fp128_ppc_funcs		= floattikf floatuntikf fixkfti fixunskfti \
+fp128_ppc_funcs		= floattikf-sw floatuntikf-sw \
+			  fixkfti-sw fixunskfti-sw \
 			  extendkftf2-sw trunctfkf2-sw \
 			  sfp-exceptions _mulkc3 _divkc3 _powikf2
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations
  2020-10-07 21:08       ` will schmidt
  2020-10-12 20:15         ` Carl Love
@ 2020-10-12 20:43         ` Segher Boessenkool
  2020-10-12 21:06           ` Carl Love
  1 sibling, 1 reply; 41+ messages in thread
From: Segher Boessenkool @ 2020-10-12 20:43 UTC (permalink / raw)
  To: will schmidt; +Cc: Carl Love, dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Hi!

On Wed, Oct 07, 2020 at 04:08:12PM -0500, will schmidt wrote:
> On Mon, 2020-10-05 at 11:51 -0700, Carl Love wrote:
> > +/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1.  */
> 
> I have mixed feelings about straddling the ISA 3.0 and 3.1 ; but not
> sure how to properly improve.  (I defer).

The builtins are not defined in the ISA.  The instructions generated by
these builtins are ISA 3.0 insns, but the builtins themselves were only
defined contemporary with ISA 3.1.

I don't know how to write that comment more clearly.  Well, maybe we
have to write it out, not everything is best explained in few words?


Segher

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations
  2020-10-12 20:15         ` Carl Love
@ 2020-10-12 20:51           ` Segher Boessenkool
  0 siblings, 0 replies; 41+ messages in thread
From: Segher Boessenkool @ 2020-10-12 20:51 UTC (permalink / raw)
  To: Carl Love; +Cc: will schmidt, dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Hi!

On Mon, Oct 12, 2020 at 01:15:32PM -0700, Carl Love wrote:
> Patch 1, adds the 128-bit sign extension instruction support and
> corresponding builtin support.

> 	* config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
> 	for new builtins.
> 	* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
> 	overloaded builtin definitions.
> 	(VSIGNEXTSB2W, VSIGNEXTSH2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D):
> 	Add builtin expansions.
> 	* config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
> 	P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
> 	* config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
> 	visible.
> 	* doc/extend.texi:  Add documentation for the vec_signexti and
> 	vec_signextll builtins.

> +uThe following sign extension builtins are provided.

Typo ("uThe").  Probably should be a colon at the end, while you're at it.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
> @@ -0,0 +1,128 @@
> +/* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } */

Why only on Linux?  (And everything in gcc.target/powerpc/ is powerpc*
always, so could just be *-*-linux*).

Looks good otherwise.


Segher

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations
  2020-10-12 20:43         ` Segher Boessenkool
@ 2020-10-12 21:06           ` Carl Love
  0 siblings, 0 replies; 41+ messages in thread
From: Carl Love @ 2020-10-12 21:06 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

On Mon, 2020-10-12 at 15:43 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Oct 07, 2020 at 04:08:12PM -0500, will schmidt wrote:
> > On Mon, 2020-10-05 at 11:51 -0700, Carl Love wrote:
> > > +/* Sign extend builtins that work on ISA 3.0, but not defined
> > > until ISA 3.1.  */
> > 
> > I have mixed feelings about straddling the ISA 3.0 and 3.1 ; but
> > not
> > sure how to properly improve.  (I defer).
> 
> The builtins are not defined in the ISA.  The instructions generated
> by
> these builtins are ISA 3.0 insns, but the builtins themselves were
> only
> defined contemporary with ISA 3.1.
> 
> I don't know how to write that comment more clearly.  Well, maybe we
> have to write it out, not everything is best explained in few words?
> 

OK, we can just drop the comment all together.

             Carl 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2a/5] rs6000, vec_rlnm builtin fix arguments
  2020-10-12 20:15         ` Carl Love
@ 2020-10-12 23:37           ` Segher Boessenkool
  0 siblings, 0 replies; 41+ messages in thread
From: Segher Boessenkool @ 2020-10-12 23:37 UTC (permalink / raw)
  To: Carl Love; +Cc: will schmidt, dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

On Mon, Oct 12, 2020 at 01:15:39PM -0700, Carl Love wrote:
> This patch fixes an error in how the vec_rlnm() builtin parameters are
> handled.  The current test for this builtin are compile only.  The
> issue was found in the path that adds the 128-bit operands to the
> vec_rlnm() builtin.  The new test for the 128-bit operands is a compile
> and run test.

> 	* config/rs6000/altivec.h (vec_rlnm): Fix bug in argument generation.

> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index 8a2dcda0144..f7720d136c9 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -183,7 +183,7 @@
>  #define vec_recipdiv __builtin_vec_recipdiv
>  #define vec_rlmi __builtin_vec_rlmi
>  #define vec_vrlnm __builtin_vec_rlnm
> -#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
> +#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
>  #define vec_rsqrt __builtin_vec_rsqrt
>  #define vec_rsqrte __builtin_vec_rsqrte
>  #define vec_signed __builtin_vec_vsigned

That patch is fine of course, thanks!  Is there some testcase that trips
over the old definition?  That would have been good to have.


Segher

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2b/5] RS6000 add 128-bit Integer Operations
  2020-10-07 21:53       ` will schmidt
  2020-10-12 20:16         ` Carl Love
@ 2020-10-13  0:23         ` Segher Boessenkool
  2021-01-19 22:33           ` [PATCH 0/6 ver3] " Carl Love
                             ` (6 more replies)
  1 sibling, 7 replies; 41+ messages in thread
From: Segher Boessenkool @ 2020-10-13  0:23 UTC (permalink / raw)
  To: will schmidt; +Cc: Carl Love, dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Hi!

On Wed, Oct 07, 2020 at 04:53:11PM -0500, will schmidt wrote:
> > +;; AIX does not support extended mnemonic xxswapd.  Use the basic
> > +;; mnemonic xxpermdi instead.
> 
> I'd wonder if there can be additional logic using ( DEFAULT_ABI ==
> ABI_AIX ) sort of check to resolve this.  It looks like this same
> comment exists in multiple places througout our *.md files, so not
> something that needs to be solved here today. 

ABI_AIX just tests the *ABI*, whether we have function descriptors
mostly.  But there is TARGET_AIX to test if we are running on AIX.

The problem with generating different assembler code on AIX is that we
then have to do more testing as well, have more opportunities to get
things wrong.  But it might be worth it for xxpermdi, the extended
mnemonics improve readability a lot.  On the other hand, once we start
this, where will it end :-)

> > new file mode 100644
> > index 00000000000..85ad544e22b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> > @@ -0,0 +1,2254 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-mcpu=power10 -O2 -save-temps" } */
> > +/* { dg-require-effective-target power10_hw } */
> > +/* { dg-require-effective-target ppc_native_128bit } */
> 
> I don't see any other uses of the target option for ppc_native_128bit
> in my tree ?

I don't see where it is defined, even?


Segher

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 0/6 ver3] RS6000 add 128-bit Integer Operations
  2020-10-13  0:23         ` Segher Boessenkool
@ 2021-01-19 22:33           ` Carl Love
  2021-01-19 22:33           ` [PATCH 1/6 ver 3] rs6000, Fix arguments in altivec_vrlwmi and altivec_rlwdi builtins Carl Love
                             ` (5 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Carl Love @ 2021-01-19 22:33 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Segher, Will:

The following patch set is adds the 128-bit integer operation support
and fixes a bug found in the existing support.  This is the third
version of the patch set.  The first five patches have minor updates
based on previous reviews.  The last patch has a number of functional
changes to get the 128-bit conversion support to use the new hardware
instrucitons on Power 10.  The existing software support is used for
Power 9 and earlier platforms.

                      Carl Love


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/6 ver 3] rs6000, Fix arguments in altivec_vrlwmi and altivec_rlwdi builtins
  2020-10-13  0:23         ` Segher Boessenkool
  2021-01-19 22:33           ` [PATCH 0/6 ver3] " Carl Love
@ 2021-01-19 22:33           ` Carl Love
  2021-01-19 22:33           ` [PATCH 2/6 ver 3] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
                             ` (4 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Carl Love @ 2021-01-19 22:33 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Will, Segher:

This patch fixes the order of the argument in the vec_rlmi and
vec_rlnm builtins.  The patch also adds a new test cases to verify
the fix.

The patch has been tested on
    powerpc64-linux instead (Power 8 BE)
    powerpc64-linux instead (Power 9 LE)
    powerpc64-linux instead (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

                   Carl Love

----------------------------------------------------------------------

gcc/ChangeLog

2021-01-12  Carl Love  <cel@us.ibm.com>

gcc/
	* config/rs6000/altivec.md (altivec_vrl<VI_char>mi): Fix
	bug in argument generation.

gcc/testsuite/
	gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c:
	New runnable test case.
	gcc.target/powerpc/vec-rlmi-rlnm.c: Update scan assembler times
	for xxlor instruction.
---
 gcc/config/rs6000/altivec.md                  |   6 +-
 .../powerpc/check-builtin-vec_rlnm-runnable.c | 233 ++++++++++++++++++
 .../gcc.target/powerpc/vec-rlmi-rlnm.c        |   2 +-
 3 files changed, 237 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index fc19a8fc807..4d08cca2228 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1982,12 +1982,12 @@
 
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
-        (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
-	                (match_operand:VIlong 2 "register_operand" "v")
+        (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
+	                (match_operand:VIlong 2 "register_operand" "0")
 		        (match_operand:VIlong 3 "register_operand" "v")]
 		       UNSPEC_VRLMI))]
   "TARGET_P9_VECTOR"
-  "vrl<VI_char>mi %0,%2,%3"
+  "vrl<VI_char>mi %0,%1,%3"
   [(set_attr "type" "veclogical")])
 
 (define_insn "altivec_vrl<VI_char>nm"
diff --git a/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
new file mode 100644
index 00000000000..b97bc519c87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
@@ -0,0 +1,233 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
+
+/* Verify the vec_rlm and vec_rlmi builtins works correctly.  */
+/* { dg-final { scan-assembler-times {\mvrldmi\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 1
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector unsigned int vec_arg1_int, vec_arg2_int, vec_arg3_int;
+  vector unsigned int vec_result_int, vec_expected_result_int;
+  
+  vector unsigned long long int vec_arg1_di, vec_arg2_di, vec_arg3_di;
+  vector unsigned long long int vec_result_di, vec_expected_result_di;
+
+  unsigned int mask_begin, mask_end, shift;
+  unsigned long long int mask;
+
+/* Check vec int version of vec_rlmi builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x80000000ULL >> i;
+
+  for (i = 0; i < 4; i++) {
+    vec_arg1_int[i] = 0x12345678 + i*0x11111111;
+    vec_arg2_int[i] = 0xA1B1CDEF;
+    vec_arg3_int[i] = mask_begin << 16 | mask_end << 8 | shift;
+
+    /* do rotate */
+    vec_expected_result_int[i] =  ( vec_arg2_int[i] & ~mask) 
+      | ((vec_arg1_int[i] << shift) | (vec_arg1_int[i] >> (32-shift))) & mask;
+      
+  }
+
+  /* vec_rlmi(arg1, arg2, arg3)
+     result - rotate each element of arg1 left and inserting it into arg2 
+       element of arg2 based on the mask specified in arg3.  The shift, mask
+       start and end is specified in arg3.  */
+  vec_result_int = vec_rlmi (vec_arg1_int, vec_arg2_int, vec_arg3_int);
+
+  for (i = 0; i < 4; i++) {
+    if (vec_result_int[i] != vec_expected_result_int[i])
+#if DEBUG
+      printf("ERROR: i = %d, vec_rlmi int result 0x%x, does not match "
+	     "expected result 0x%x\n", i, vec_result_int[i],
+	     vec_expected_result_int[i]);
+#else
+      abort();
+#endif
+    }
+
+/* Check vec long long int version of vec_rlmi builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x80000000ULL >> i;
+
+  for (i = 0; i < 2; i++) {
+    vec_arg1_di[i] = 0x1234567800000000 + i*0x11111111;
+    vec_arg2_di[i] = 0xA1B1C1D1E1F12345;
+    vec_arg3_di[i] = mask_begin << 16 | mask_end << 8 | shift;
+
+    /* do rotate */
+    vec_expected_result_di[i] =  ( vec_arg2_di[i] & ~mask) 
+      | ((vec_arg1_di[i] << shift) | (vec_arg1_di[i] >> (64-shift))) & mask;
+  }
+
+  /* vec_rlmi(arg1, arg2, arg3)
+     result - rotate each element of arg1 left and inserting it into arg2 
+       element of arg2 based on the mask specified in arg3.  The shift, mask
+       start and end is specified in arg3.  */
+  vec_result_di = vec_rlmi (vec_arg1_di, vec_arg2_di, vec_arg3_di);
+
+  for (i = 0; i < 2; i++) {
+    if (vec_result_di[i] != vec_expected_result_di[i])
+#if DEBUG
+      printf("ERROR: i = %d, vec_rlmi int result 0x%x, does not match "
+	     "expected result 0x%x\n", i, vec_result_di[i],
+	     vec_expected_result_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* Check vec int version of vec_rlnm builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x80000000ULL >> i;
+
+  for (i = 0; i < 4; i++) {
+    vec_arg1_int[i] = 0x12345678 + i*0x11111111;
+    vec_arg2_int[i] = shift;
+    vec_arg3_int[i] = mask_begin << 8 | mask_end;
+    vec_expected_result_int[i] = (vec_arg1_int[i] << shift) & mask;
+  }
+
+  /* vec_rlnm(arg1, arg2, arg3)
+     result - rotate each element of arg1 left by shift in element of arg2.
+       Then AND with mask whose  start/stop bits are specified in element of
+       arg3.  */
+  vec_result_int = vec_rlnm (vec_arg1_int, vec_arg2_int, vec_arg3_int);
+  for (i = 0; i < 4; i++) {
+    if (vec_result_int[i] != vec_expected_result_int[i])
+#if DEBUG
+      printf("ERROR: vec_rlnm, i = %d, int result 0x%x does not match "
+	     "expected result 0x%x\n", i, vec_result_int[i],
+	     vec_expected_result_int[i]);
+#else
+      abort();
+#endif
+    }
+
+/* Check vec long int version of builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 20;
+
+  for (i = 0; i < 63; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x8000000000000000ULL >> i;
+  
+  for (i = 0; i < 2; i++) {
+    vec_arg1_di[i] = 0x123456789ABCDE00ULL + i*0x1111111111111111ULL;
+    vec_arg2_di[i] = shift;
+    vec_arg3_di[i] = mask_begin << 8 | mask_end;
+    vec_expected_result_di[i] = (vec_arg1_di[i] << shift) & mask;
+  }
+
+  vec_result_di = vec_rlnm (vec_arg1_di, vec_arg2_di, vec_arg3_di);
+
+  for (i = 0; i < 2; i++) {
+    if (vec_result_di[i] != vec_expected_result_di[i])
+#if DEBUG
+      printf("ERROR: vec_rlnm, i = %d, long long int result 0x%llx does not "
+	     "match expected result 0x%llx\n", i, vec_result_di[i],
+	     vec_expected_result_di[i]);
+#else
+      abort();
+#endif
+    }
+
+    /* Check vec int version of vec_vrlnm builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x80000000ULL >> i;
+
+  for (i = 0; i < 4; i++) {
+    vec_arg1_int[i] = 0x12345678 + i*0x11111111;
+    vec_arg2_int[i] = mask_begin << 16 | mask_end << 8 | shift;
+    vec_expected_result_int[i] = (vec_arg1_int[i] << shift) & mask;
+  }
+
+  /* vec_vrlnm(arg1, arg2, arg3)
+     result - rotate each element of arg1 left then AND with mask.  The mask
+       start, stop bits is specified in the second argument.  The shift amount
+       is also specified in the second argument.  */
+  vec_result_int = vec_vrlnm (vec_arg1_int, vec_arg2_int);
+
+  for (i = 0; i < 4; i++) {
+    if (vec_result_int[i] != vec_expected_result_int[i])
+#if DEBUG
+      printf("ERROR: vec_vrlnm, i = %d, int result 0x%x does not match "
+	     "expected result 0x%x\n", i, vec_result_int[i],
+	     vec_expected_result_int[i]);
+#else
+      abort();
+#endif
+    }
+
+/* Check vec long int version of vec_vrlnm builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 20;
+
+  for (i = 0; i < 63; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x8000000000000000ULL >> i;
+  
+  for (i = 0; i < 2; i++) {
+    vec_arg1_di[i] = 0x123456789ABCDE00ULL + i*0x1111111111111111ULL;
+    vec_arg2_di[i] = mask_begin << 16 | mask_end << 8 | shift;
+    vec_expected_result_di[i] = (vec_arg1_di[i] << shift) & mask;
+  }
+
+  vec_result_di = vec_vrlnm (vec_arg1_di, vec_arg2_di);
+
+  for (i = 0; i < 2; i++) {
+    if (vec_result_di[i] != vec_expected_result_di[i])
+#if DEBUG
+      printf("ERROR: vec_vrlnm, i = %d, long long int result 0x%llx does not "
+	     "match expected result 0x%llx\n", i, vec_result_di[i],
+	     vec_expected_result_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
index 1e7d7390c5b..b0f26c8f4cb 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
@@ -62,6 +62,6 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned long long y,
 /* { dg-final { scan-assembler-times "vextsb2d" 1 } } */
 /* { dg-final { scan-assembler-times "vslw" 1 } } */
 /* { dg-final { scan-assembler-times "vsld" 1 } } */
-/* { dg-final { scan-assembler-times "xxlor" 3 } } */
+/* { dg-final { scan-assembler-times "xxlor" 5 } } */
 /* { dg-final { scan-assembler-times "vrlwnm" 2 } } */
 /* { dg-final { scan-assembler-times "vrldnm" 2 } } */
-- 
2.27.0



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/6 ver 3] RS6000 Add 128-bit Binary Integer sign extend operations
  2020-10-13  0:23         ` Segher Boessenkool
  2021-01-19 22:33           ` [PATCH 0/6 ver3] " Carl Love
  2021-01-19 22:33           ` [PATCH 1/6 ver 3] rs6000, Fix arguments in altivec_vrlwmi and altivec_rlwdi builtins Carl Love
@ 2021-01-19 22:33           ` Carl Love
  2021-01-19 22:33           ` [PATCH 3/6 ver 3] RS6000 add 128-bit Integer Operations part 1 Carl Love
                             ` (3 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Carl Love @ 2021-01-19 22:33 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Will, Segher:

Patch 1, adds the 128-bit sign extension instruction support and
corresponding builtin support.

version 3:

  doc/extend.texi:  Fixed the "uThe" typo and added the colon at the
 		    end of the line.

  p9-sign_extend-runnable.c: Changed the dg-do run to  *-*-linux 
			     instead of powerpc*-*-linux.

  Tested on Power 8BE, Power9, Power10.

version 2:

  Removed the blank line per Will's latest feedback.

  Retested the patch on Power 9 with no regression errors.

                    Carl Love

----------------------------------------------------------

gcc/ChangeLog

2021-01-12  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
	for new builtins.
	* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
	overloaded builtin definitions.
	(VSIGNEXTSB2W, VSIGNEXTSH2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D):
	Add builtin expansions.
	* config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
	P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
	* config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
	visible.
	* doc/extend.texi:  Add documentation for the vec_signexti and
	vec_signextll builtins.

gcc/testsuite/ChangeLog

2021-01-12  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
---
 gcc/config/rs6000/altivec.h                   |   2 +
 gcc/config/rs6000/rs6000-builtin.def          |   9 ++
 gcc/config/rs6000/rs6000-call.c               |  13 ++
 gcc/config/rs6000/vsx.md                      |   2 +-
 gcc/doc/extend.texi                           |  15 ++
 .../powerpc/p9-sign_extend-runnable.c         | 128 ++++++++++++++++++
 6 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 06f0d4d9f14..460310a5132 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -497,6 +497,8 @@
 
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
+#define vec_signexti  __builtin_vec_vsignexti
+#define vec_signextll __builtin_vec_vsignextll
 
 #endif
 
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 8aa31ad0a06..842f07196de 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2800,6 +2800,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,	"vprtybd")
 BU_P9V_OVERLOAD_1 (VPRTYBQ,	"vprtybq")
 BU_P9V_OVERLOAD_1 (VPRTYBW,	"vprtybw")
 BU_P9V_OVERLOAD_1 (VPARITY_LSBB,	"vparity_lsbb")
+BU_P9V_OVERLOAD_1 (VSIGNEXTI,	"vsignexti")
+BU_P9V_OVERLOAD_1 (VSIGNEXTLL,	"vsignextll")
 
 /* 2 argument functions added in ISA 3.0 (power9).  */
 BU_P9_2 (CMPRB,	"byte_in_range",	CONST,	cmprb)
@@ -2811,6 +2813,13 @@ BU_P9_OVERLOAD_2 (CMPRB,	"byte_in_range")
 BU_P9_OVERLOAD_2 (CMPRB2,	"byte_in_either_range")
 BU_P9_OVERLOAD_2 (CMPEQB,	"byte_in_set")
 \f
+
+BU_P9V_AV_1 (VSIGNEXTSB2W,	"vsignextsb2w",		CONST,  vsx_sign_extend_qi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSH2W,	"vsignextsh2w",		CONST,  vsx_sign_extend_hi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSB2D,	"vsignextsb2d",		CONST,  vsx_sign_extend_qi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
+
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
 BU_P10_POWERPC64_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_POWERPC64_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 2308cc8b4a2..3af325317a1 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5660,6 +5660,19 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI },
 
+  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 */
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+
   /* Overloaded built-in functions for ISA3.1 (power10). */
   { P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0c1bda522a9..e17b9c556d4 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4807,7 +4807,7 @@
   "vextsh2<wd> %0,%1"
   [(set_attr "type" "vecexts")])
 
-(define_insn "*vsx_sign_extend_si_v2di"
+(define_insn "vsx_sign_extend_si_v2di"
   [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
 	(unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
 		     UNSPEC_VSX_SIGN_EXTEND))]
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 2748e98462e..feaa4929697 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21146,6 +21146,21 @@ void vec_xst (vector unsigned char, int, vector unsigned char *);
 void vec_xst (vector unsigned char, int, unsigned char *);
 @end smallexample
 
+The following sign extension builtins are provided:
+
+@smallexample
+vector signed int vec_signexti (vector signed char a)
+vector signed long long vec_signextll (vector signed char a)
+vector signed int vec_signexti (vector signed short a)
+vector signed long long vec_signextll (vector signed short a)
+vector signed long long vec_signextll (vector signed int a)
+@end smallexample
+
+Each element of the result is produced by sign-extending the element of the
+input vector that would fall in the least significant portion of the result
+element. For example, a sign-extension of a vector signed char to a vector
+signed long long will sign extend the rightmost byte of each doubleword.
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 3.1
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1
 
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
new file mode 100644
index 00000000000..fdcad019b96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
@@ -0,0 +1,128 @@
+/* { dg-do run { target { *-*-linux* && { lp64 && p9vector_hw } } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
+
+/* These builtins were not defined until ISA 3.1 but only require ISA 3.0
+   support.  */
+
+/* { dg-final { scan-assembler-times {\mvextsb2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsw2d\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector signed char vec_arg_qi, vec_result_qi;
+  vector signed short int vec_arg_hi, vec_result_hi, vec_expected_hi;
+  vector signed int vec_arg_wi, vec_result_wi, vec_expected_wi;
+  vector signed long long vec_result_di, vec_expected_di;
+
+  /* test sign extend byte to word */
+  vec_arg_qi = (vector signed char) {1, 2, 3, 4, 5, 6, 7, 8,
+				     -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_wi = (vector signed int) {1, 5, -1, -5};
+
+  vec_result_wi = vec_signexti (vec_arg_qi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(char, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n",
+	     i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend byte to double */
+  vec_arg_qi = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
+				    -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_qi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(byte, long long int):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to word */
+  vec_arg_hi = (vector signed short int){1, 2, 3, 4, -1, -2, -3, -4};
+  vec_expected_wi = (vector signed int){1, 3, -1, -3};
+
+  vec_result_wi = vec_signexti(vec_arg_hi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(short, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n", i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to double word */
+  vec_arg_hi = (vector signed short int ){1, 3, 5, 7,  -1, -3, -5, -7};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_hi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(short, double):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend word to double word */
+  vec_arg_wi = (vector signed int ){1, 3, -1, -3};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_wi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(word, double):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  return 0;
+}
-- 
2.27.0







^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 3/6 ver 3] RS6000 add 128-bit Integer Operations part 1
  2020-10-13  0:23         ` Segher Boessenkool
                             ` (2 preceding siblings ...)
  2021-01-19 22:33           ` [PATCH 2/6 ver 3] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
@ 2021-01-19 22:33           ` Carl Love
  2021-01-19 22:33           ` [PATCH 4/6 ver 3] Add TI to TD (128-bit DFP) and TD to TI support Carl Love
                             ` (2 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Carl Love @ 2021-01-19 22:33 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Will, Segher:

This patch adds the 128-bit integer support for divide, modulo, shift,
compare of 128-bit integers instructions and builtin support.

version 3:

  int_128bit-runnable.c: Removed ppc_native_128bit from
 		         dg-require-effective-target.  Was missed from 
			 an earlier cleanup.
  Tested on Power 8BE, Power9, Power10.
		       
version 2:

  Fixed the references to 128-bit in ChangeLog that got missed in the
  last go round.

  Fixed missing spaces in emit_insn calls.

  Re-tested the patch on Power 9 with no regression errors.

                    Carl Love

----------------------------------------------------------------------

gcc/ChangeLog

2021-01-12  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
	for new builtins.
	* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
	UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
	(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
	altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
	altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
	altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
	altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
	define_insn.
	(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
	vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
	altivec_vrlqnm): New define_expands.
	* config/rs6000/rs6000-builtin.def (VCMPEQUT_P, VCMPGTST_P,
	VCMPGTUT_P): Add macro expansions.
	(BU_P10V_AV_P): Add builtin predicate definition.
	(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
	CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
	VCMPAET_P, VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ,
	VSLQ, VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
	MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
	(VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
	P10V_BUILTIN_CMPGE_1TI, P10V_BUILTIN_CMPGE_U1TI,
	P10V_BUILTIN_VCMPGTUT, P10V_BUILTIN_VCMPGTST,
	P10V_BUILTIN_CMPLE_1TI, P10V_BUILTIN_VCMPLE_U1TI,
	P10V_BUILTIN_DIV_V1TI, P10V_BUILTIN_UDIV_V1TI,
	P10V_BUILTIN_VMULESD, P10V_BUILTIN_VMULEUD,
	P10V_BUILTIN_VMULOSD, P10V_BUILTIN_VMULOUD,
	P10V_BUILTIN_VNOR_V1TI, P10V_BUILTIN_VNOR_V1TI_UNS,
	P10V_BUILTIN_VRLQ, P10V_BUILTIN_VRLQMI,
	P10V_BUILTIN_VRLQNM, P10V_BUILTIN_VSLQ,
	P10V_BUILTIN_VSRQ, P10V_BUILTIN_VSRAQ,
	P10V_BUILTIN_VCMPGTUT_P, P10V_BUILTIN_VCMPGTST_P,
	P10V_BUILTIN_VCMPEQUT_P, P10V_BUILTIN_VCMPGTUT_P,
	P10V_BUILTIN_VCMPGTST_P, P10V_BUILTIN_CMPNET,
	P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P,
	P10V_BUILTIN_VSIGNEXTSD2Q, P10V_BUILTIN_DIVES_V1TI,
	P10V_BUILTIN_MODS_V1TI, P10V_BUILTIN_MODU_V1TI):
	New overloaded definitions.
	(rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT,
	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
	P10_BUILTIN_CMPLE_U1TI]: New case statements.
	(rs6000_init_builtins) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
	New assignments.
	(altivec_init_builtins): New E_V1TImode case statement.
	(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
	P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
	P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.
	* config/rs6000/r6000.c (rs6000_handle_altivec_attribute)[E_TImode,
	E_V1TImode]: New case statements.
	* config/rs6000/r6000.h (rs6000_builtin_type_index): New enum
	value RS6000_BTI_bool_V1TI.
	* config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti,
	vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti,
	vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
	vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
	vlshrv1ti3, vashrv1ti3): New define_expands.
	* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
	UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
	UNSPEC_VSX_MODUQ): New unspecs.
	(mulv2di3, vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti,
	vsx_diveu_v1ti,	vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti,
	vsx_sign_extend_v2di_v1ti): New define_insns.
	(vcmpnet): New define_expand.
	* gcc/doc/extend.texi: Add documentation for the new builtins vec_rl,
	vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
	vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,
	vec_cmpge, vec_cmple, vec_all_eq, vec_all_ne, vec_all_gt, vec_all_lt,
	vec_all_ge, vec_all_le, vec_any_eq, vec_any_ne, vec_any_gt, vec_any_lt,
	vec_any_ge, vec_any_le.

gcc/testsuite/ChangeLog

2021-01-12 Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |    4 +
 gcc/config/rs6000/altivec.md                  |  241 ++
 gcc/config/rs6000/rs6000-builtin.def          |   53 +-
 gcc/config/rs6000/rs6000-call.c               |  142 +-
 gcc/config/rs6000/rs6000.c                    |    1 +
 gcc/config/rs6000/rs6000.h                    |    3 +-
 gcc/config/rs6000/vector.md                   |  191 ++
 gcc/config/rs6000/vsx.md                      |  107 +
 gcc/doc/extend.texi                           |  174 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 2301 +++++++++++++++++
 10 files changed, 3214 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 460310a5132..3dedccca189 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -717,6 +717,10 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
 
 #ifdef _ARCH_PWR10
+#define vec_signextq  __builtin_vec_vsignextq
+#define vec_dive __builtin_vec_dive
+#define vec_mod  __builtin_vec_mod
+
 /* May modify these macro definitions if future capabilities overload
    with support for different vector argument and result types.  */
 #define vec_cntlzm(a, b)	__builtin_altivec_vclzdm (a, b)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 4d08cca2228..cb83c5ce012 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -39,12 +39,16 @@
    UNSPEC_VMULESH
    UNSPEC_VMULEUW
    UNSPEC_VMULESW
+   UNSPEC_VMULEUD
+   UNSPEC_VMULESD
    UNSPEC_VMULOUB
    UNSPEC_VMULOSB
    UNSPEC_VMULOUH
    UNSPEC_VMULOSH
    UNSPEC_VMULOUW
    UNSPEC_VMULOSW
+   UNSPEC_VMULOUD
+   UNSPEC_VMULOSD
    UNSPEC_VPKPX
    UNSPEC_VPACK_SIGN_SIGN_SAT
    UNSPEC_VPACK_SIGN_UNS_SAT
@@ -629,6 +633,14 @@
   "vcmpequ<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_eqv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpequq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gt<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gt:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -637,6 +649,14 @@
   "vcmpgts<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gt:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpgtsq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gtu<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gtu:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -645,6 +665,14 @@
   "vcmpgtu<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		  (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpgtuq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_eqv4sf"
   [(set (match_operand:V4SF 0 "altivec_register_operand" "=v")
 	(eq:V4SF (match_operand:V4SF 1 "altivec_register_operand" "v")
@@ -1688,6 +1716,19 @@
  DONE;
 })
 
+(define_expand "vec_widen_umult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
 (define_expand "vec_widen_smult_even_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1701,6 +1742,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+ else
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_umult_odd_v16qi"
   [(use (match_operand:V8HI 0 "register_operand"))
    (use (match_operand:V16QI 1 "register_operand"))
@@ -1766,6 +1820,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_umult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_smult_odd_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1779,6 +1846,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "altivec_vmuleub"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
@@ -1860,6 +1940,15 @@
   "vmuleuw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuleud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULEUD))]
+  "TARGET_POWER10"
+  "vmuleud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulouw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1869,6 +1958,15 @@
   "vmulouw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuloud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOUD))]
+  "TARGET_POWER10"
+  "vmuloud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulesw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1878,6 +1976,15 @@
   "vmulesw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulesd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULESD))]
+  "TARGET_POWER10"
+  "vmulesd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulosw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1887,6 +1994,15 @@
   "vmulosw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulosd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOSD))]
+  "TARGET_POWER10"
+  "vmulosd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 ;; Vector pack/unpack
 (define_insn "altivec_vpkpx"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
@@ -1980,6 +2096,15 @@
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vrlq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+;; rotate amount in needs to be in bits[57:63] of operand2.
+  "vrlq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
@@ -1990,6 +2115,34 @@
   "vrl<VI_char>mi %0,%1,%3"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqmi"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")
+		      (match_operand:V1TI 3 "vsx_register_operand")]
+		     UNSPEC_VRLMI))]
+  "TARGET_POWER10"
+{
+  /* Mask bit begin, end fields need to be in bits [41:55] of 128-bit operand2.
+     Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[3]));
+  emit_insn (gen_altivec_vrlqmi_inst (operands[0], operands[1], operands[2],
+				      tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqmi_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "0")
+		      (match_operand:V1TI 3 "vsx_register_operand" "v")]
+		     UNSPEC_VRLMI))]
+  "TARGET_POWER10"
+  "vrlqmi %0,%1,%3"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vrl<VI_char>nm"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
@@ -1999,6 +2152,31 @@
   "vrl<VI_char>nm %0,%1,%2"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqnm"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")]
+		     UNSPEC_VRLNM))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vrlqnm_inst (operands[0], operands[1], tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqnm_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+		     UNSPEC_VRLNM))]
+  "TARGET_POWER10"
+  ;; rotate and mask bits need to be in upper 64-bits of operand2.
+  "vrlqnm %0,%1,%2"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vsl"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2043,6 +2221,15 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vslq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vslq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsr<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (lshiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2051,6 +2238,15 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsrq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vsrq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsra<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (ashiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2059,6 +2255,15 @@
   "vsra<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsraq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vsraq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vsr"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2619,6 +2824,18 @@
   "vcmpequ<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_vcmpequt_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v")
+			   (match_operand:V1TI 2 "altivec_register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpequq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgts<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2631,6 +2848,18 @@
   "vcmpgts<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtst_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gt:CC (match_operand:V1TI 1 "register_operand" "v")
+			   (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gt:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpgtsq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgtu<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gtu:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2643,6 +2872,18 @@
   "vcmpgtu<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtut_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gtu:CC (match_operand:V1TI 1 "register_operand" "v")
+			    (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gtu:V1TI (match_dup 1)
+		  (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpgtuq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpeqfp_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(eq:CC (match_operand:V4SF 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 842f07196de..623907216af 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1201,6 +1201,15 @@
 		     | RS6000_BTC_TERNARY),				\
 		    CODE_FOR_ ## ICODE)			/* ICODE */
 
+/* See the comment on BU_ALTIVEC_P.  */
+#define BU_P10V_AV_P(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_P (P10V_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P10,	 		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_PREDICATE),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
 #define BU_P10V_AV_X(ENUM, NAME, ATTR)					\
   RS6000_BUILTIN_X (P10_BUILTIN_ ## ENUM,		/* ENUM */	\
 		    "__builtin_altivec_" NAME,		/* NAME */	\
@@ -2821,6 +2830,10 @@ BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
 BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
 
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
+BU_P10V_AV_P (VCMPEQUT_P,	"vcmpequt_p",	CONST,	vector_eq_v1ti_p)
+BU_P10V_AV_P (VCMPGTST_P,	"vcmpgtst_p",	CONST,	vector_gt_v1ti_p)
+BU_P10V_AV_P (VCMPGTUT_P,	"vcmpgtut_p",	CONST,	vector_gtu_v1ti_p)
+
 BU_P10_POWERPC64_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_POWERPC64_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
 BU_P10_POWERPC64_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
@@ -2841,7 +2854,38 @@ BU_P10V_VSX_2 (XXGENPCVM_V16QI, "xxgenpcvm_v16qi", CONST, xxgenpcvm_v16qi)
 BU_P10V_VSX_2 (XXGENPCVM_V8HI, "xxgenpcvm_v8hi", CONST, xxgenpcvm_v8hi)
 BU_P10V_VSX_2 (XXGENPCVM_V4SI, "xxgenpcvm_v4si", CONST, xxgenpcvm_v4si)
 BU_P10V_VSX_2 (XXGENPCVM_V2DI, "xxgenpcvm_v2di", CONST, xxgenpcvm_v2di)
-
+BU_P10V_AV_2 (VCMPGTUT,		"vcmpgtut",	CONST,	vector_gtuv1ti)
+BU_P10V_AV_2 (VCMPGTST,		"vcmpgtst",	CONST,	vector_gtv1ti)
+BU_P10V_AV_2 (VCMPEQUT,		"vcmpequt",	CONST,	eqvv1ti3)
+BU_P10V_AV_2 (CMPNET,		"vcmpnet",	CONST,	vcmpnet)
+BU_P10V_AV_2 (CMPGE_1TI,	"cmpge_1ti",    CONST,  vector_nltv1ti)
+BU_P10V_AV_2 (CMPGE_U1TI,	"cmpge_u1ti",   CONST,  vector_nltuv1ti)
+BU_P10V_AV_2 (CMPLE_1TI,	"cmple_1ti",    CONST,  vector_ngtv1ti)
+BU_P10V_AV_2 (CMPLE_U1TI,	"cmple_u1ti",   CONST,  vector_ngtuv1ti)
+BU_P10V_AV_2 (VNOR_V1TI_UNS,	"vnor_v1ti_uns",CONST,	norv1ti3)
+BU_P10V_AV_2 (VNOR_V1TI,	"vnor_v1ti",	CONST,	norv1ti3)
+BU_P10V_AV_2 (VCMPNET_P,	"vcmpnet_p",	CONST,	vector_ne_v1ti_p)
+BU_P10V_AV_2 (VCMPAET_P,	"vcmpaet_p",	CONST,	vector_ae_v1ti_p)
+
+BU_P10V_AV_1 (VSIGNEXTSD2Q,	"vsignext",     CONST,  vsx_sign_extend_v2di_v1ti)
+
+BU_P10V_AV_2 (VMULEUD,	"vmuleud",	CONST,	vec_widen_umult_even_v2di)
+BU_P10V_AV_2 (VMULESD,	"vmulesd",	CONST,	vec_widen_smult_even_v2di)
+BU_P10V_AV_2 (VMULOUD,	"vmuloud",	CONST,	vec_widen_umult_odd_v2di)
+BU_P10V_AV_2 (VMULOSD,	"vmulosd",	CONST,	vec_widen_smult_odd_v2di)
+BU_P10V_AV_2 (VRLQ,		"vrlq",		CONST,	vrotlv1ti3)
+BU_P10V_AV_2 (VSLQ,		"vslq",		CONST,	vashlv1ti3)
+BU_P10V_AV_2 (VSRQ,		"vsrq",		CONST,	vlshrv1ti3)
+BU_P10V_AV_2 (VSRAQ,		"vsraq",	CONST,	vashrv1ti3)
+BU_P10V_AV_2 (VRLQNM,	"vrlqnm",	CONST,	altivec_vrlqnm)
+BU_P10V_AV_2 (DIV_V1TI,	"div_1ti",      CONST,  vsx_div_v1ti)
+BU_P10V_AV_2 (UDIV_V1TI,	"udiv_1ti",     CONST,  vsx_udiv_v1ti)
+BU_P10V_AV_2 (DIVES_V1TI,	"dives",	CONST,	vsx_dives_v1ti)
+BU_P10V_AV_2 (DIVEU_V1TI,	"diveu",	CONST,	vsx_diveu_v1ti)
+BU_P10V_AV_2 (MODS_V1TI,	"mods",		CONST,	vsx_mods_v1ti)
+BU_P10V_AV_2 (MODU_V1TI,	"modu",		CONST,	vsx_modu_v1ti)
+
+BU_P10V_AV_3 (VRLQMI,	"vrlqmi",	CONST,	altivec_vrlqmi)
 BU_P10V_AV_3 (VEXTRACTBL, "vextdubvlx", CONST, vextractlv16qi)
 BU_P10V_AV_3 (VEXTRACTHL, "vextduhvlx", CONST, vextractlv8hi)
 BU_P10V_AV_3 (VEXTRACTWL, "vextduwvlx", CONST, vextractlv4si)
@@ -2948,6 +2992,12 @@ BU_P10_OVERLOAD_2 (CLRR, "clrr")
 BU_P10_OVERLOAD_2 (GNB, "gnb")
 BU_P10_OVERLOAD_4 (XXEVAL, "xxeval")
 BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
+BU_P10_OVERLOAD_2 (VRLQ, "vrlq")
+BU_P10_OVERLOAD_2 (VSLQ, "vslq")
+BU_P10_OVERLOAD_2 (VSRQ, "vsrq")
+BU_P10_OVERLOAD_2 (VSRAQ, "vsraq")
+BU_P10_OVERLOAD_2 (DIVE, "dive")
+BU_P10_OVERLOAD_2 (MOD, "mod")
 
 BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
 BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
@@ -2967,6 +3017,7 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
 BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
 BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
 
+BU_P10_OVERLOAD_1 (SIGNEXT, "vsignextq")
 \f
 BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
 BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 3af325317a1..e9ba4751cd4 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -837,6 +837,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, VSX_BUILTIN_XVCMPEQDP,
@@ -883,6 +887,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -897,8 +907,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTSD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTST,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, VSX_BUILTIN_XVCMPGTDP,
@@ -941,6 +955,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_CMPLE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -1069,6 +1088,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIV_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_UDIV_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
@@ -1922,6 +1947,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULE, P8V_BUILTIN_VMULEUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10V_BUILTIN_VMULESD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10V_BUILTIN_VMULEUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULEUB, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULESB, ALTIVEC_BUILTIN_VMULESB,
@@ -1945,6 +1975,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULO, P8V_BUILTIN_VMULOUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10V_BUILTIN_VMULOSD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10V_BUILTIN_VMULOUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULO, ALTIVEC_BUILTIN_VMULOSH,
     RS6000_BTI_V4SI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULOSH, ALTIVEC_BUILTIN_VMULOSH,
@@ -1987,6 +2022,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
@@ -2248,6 +2293,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_RL, P8V_BUILTIN_VRLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10V_BUILTIN_VRLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10V_BUILTIN_VRLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
@@ -2266,12 +2316,23 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_RLMI, P9V_BUILTIN_VRLDMI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { P9V_BUILTIN_VEC_RLMI, P10V_BUILTIN_VRLQMI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI },
+  { P9V_BUILTIN_VEC_RLMI, P10V_BUILTIN_VRLQMI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLWNM,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLDNM,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10V_BUILTIN_VRLQNM,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10V_BUILTIN_VRLQNM,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
@@ -2288,6 +2349,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, P8V_BUILTIN_VSLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10V_BUILTIN_VSLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10V_BUILTIN_VSLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTSP,
@@ -2484,6 +2550,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SR, P8V_BUILTIN_VSRD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10V_BUILTIN_VSRQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10V_BUILTIN_VSRQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
@@ -2512,6 +2583,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SRA, P8V_BUILTIN_VSRAD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10V_BUILTIN_VSRAQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10V_BUILTIN_VSRAQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
@@ -4129,12 +4205,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10V_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10V_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, ALTIVEC_BUILTIN_VCMPGTFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, VSX_BUILTIN_XVCMPGTDP_P,
@@ -4199,6 +4279,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10V_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10V_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, ALTIVEC_BUILTIN_VCMPEQFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, VSX_BUILTIN_XVCMPEQDP_P,
@@ -4250,12 +4334,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10V_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10V_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, ALTIVEC_BUILTIN_VCMPGEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
@@ -4904,6 +4992,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEW,
     RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
 
   /* The following 2 entries have been deprecated.  */
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEB_P,
@@ -5004,6 +5098,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
 
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
@@ -5109,7 +5207,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
-
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
@@ -6036,6 +6137,21 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
  { P10_BUILTIN_VEC_XVTLSBB_ONES, P10V_BUILTIN_XVTLSBB_ONES,
     RS6000_BTI_INTSI, RS6000_BTI_unsigned_V16QI, 0, 0 },
 
+ { P10_BUILTIN_VEC_SIGNEXT, P10V_BUILTIN_VSIGNEXTSD2Q,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, 0, 0 },
+
+ { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
   { RS6000_BUILTIN_NONE, RS6000_BUILTIN_NONE, 0, 0, 0, 0 }
 };
 \f
@@ -12530,12 +12646,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPEQUH:
     case ALTIVEC_BUILTIN_VCMPEQUW:
     case P8V_BUILTIN_VCMPEQUD:
+    case P10V_BUILTIN_VCMPEQUT:
       fold_compare_helper (gsi, EQ_EXPR, stmt);
       return true;
 
     case P9V_BUILTIN_CMPNEB:
     case P9V_BUILTIN_CMPNEH:
     case P9V_BUILTIN_CMPNEW:
+    case P10V_BUILTIN_CMPNET:
       fold_compare_helper (gsi, NE_EXPR, stmt);
       return true;
 
@@ -12547,6 +12665,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_2DI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10V_BUILTIN_CMPGE_1TI:
+    case P10V_BUILTIN_CMPGE_U1TI:
       fold_compare_helper (gsi, GE_EXPR, stmt);
       return true;
 
@@ -12558,6 +12678,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
     case P8V_BUILTIN_VCMPGTSD:
+    case P10V_BUILTIN_VCMPGTUT:
+    case P10V_BUILTIN_VCMPGTST:
       fold_compare_helper (gsi, GT_EXPR, stmt);
       return true;
 
@@ -12569,6 +12691,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPLE_U4SI:
     case VSX_BUILTIN_CMPLE_2DI:
     case VSX_BUILTIN_CMPLE_U2DI:
+    case P10V_BUILTIN_CMPLE_1TI:
+    case P10V_BUILTIN_CMPLE_U1TI:
       fold_compare_helper (gsi, LE_EXPR, stmt);
       return true;
 
@@ -13296,6 +13420,8 @@ rs6000_init_builtins (void)
 					    ? "__vector __bool long"
 					    : "__vector __bool long long",
 					    bool_long_long_type_node, 2);
+  bool_V1TI_type_node = rs6000_vector_type ("__vector __bool __int128",
+					    intTI_type_node, 1);
   pixel_V8HI_type_node = rs6000_vector_type ("__vector __pixel",
 					     pixel_type_node, 8);
 
@@ -13481,6 +13607,10 @@ altivec_init_builtins (void)
     = build_function_type_list (integer_type_node,
 				integer_type_node, V2DI_type_node,
 				V2DI_type_node, NULL_TREE);
+  tree int_ftype_int_v1ti_v1ti
+    = build_function_type_list (integer_type_node,
+				integer_type_node, V1TI_type_node,
+				V1TI_type_node, NULL_TREE);
   tree void_ftype_v4si
     = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
   tree v8hi_ftype_void
@@ -13848,6 +13978,9 @@ altivec_init_builtins (void)
 	case E_VOIDmode:
 	  type = int_ftype_int_opaque_opaque;
 	  break;
+	case E_V1TImode:
+	  type = int_ftype_int_v1ti_v1ti;
+	  break;
 	case E_V2DImode:
 	  type = int_ftype_int_v2di_v2di;
 	  break;
@@ -14451,6 +14584,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10V_BUILTIN_XXGENPCVM_V8HI:
     case P10V_BUILTIN_XXGENPCVM_V4SI:
     case P10V_BUILTIN_XXGENPCVM_V2DI:
+    case P10V_BUILTIN_VMULEUD:
+    case P10V_BUILTIN_VMULOUD:
+    case P10V_BUILTIN_DIVEU_V1TI:
+    case P10V_BUILTIN_MODU_V1TI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
@@ -14550,10 +14687,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case VSX_BUILTIN_CMPGE_U8HI:
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10V_BUILTIN_CMPGE_U1TI:
     case ALTIVEC_BUILTIN_VCMPGTUB:
     case ALTIVEC_BUILTIN_VCMPGTUH:
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
+    case P10V_BUILTIN_VCMPGTUT:
+    case P10V_BUILTIN_VCMPEQUT:
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
       break;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 67681d18150..0c51491cbe7 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -19855,6 +19855,7 @@ rs6000_handle_altivec_attribute (tree *node,
     case 'b':
       switch (mode)
 	{
+	case E_TImode: case E_V1TImode: result = bool_V1TI_type_node; break;
 	case E_DImode: case E_V2DImode: result = bool_V2DI_type_node; break;
 	case E_SImode: case E_V4SImode: result = bool_V4SI_type_node; break;
 	case E_HImode: case E_V8HImode: result = bool_V8HI_type_node; break;
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index b05dd827b13..d6c3ba040be 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2330,7 +2330,6 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_MMA		MASK_MMA	/* ISA 3.1 MMA.  */
 #define RS6000_BTM_P10		MASK_POWER10
 
-
 #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
 				 | RS6000_BTM_VSX			\
 				 | RS6000_BTM_P8_VECTOR			\
@@ -2443,6 +2442,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_bool_V8HI,          /* __vector __bool short */
   RS6000_BTI_bool_V4SI,          /* __vector __bool int */
   RS6000_BTI_bool_V2DI,          /* __vector __bool long */
+  RS6000_BTI_bool_V1TI,          /* __vector __bool 128-bit */
   RS6000_BTI_pixel_V8HI,         /* __vector __pixel */
   RS6000_BTI_long,	         /* long_integer_type_node */
   RS6000_BTI_unsigned_long,      /* long_unsigned_type_node */
@@ -2496,6 +2496,7 @@ enum rs6000_builtin_type_index
 #define bool_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V8HI])
 #define bool_V4SI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V4SI])
 #define bool_V2DI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V2DI])
+#define bool_V1TI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V1TI])
 #define pixel_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_pixel_V8HI])
 
 #define long_long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long_long])
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index e5191bd1424..0f252c915b0 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -685,6 +685,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtv1ti"
+  [(set (match_operand:V1TI 0 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))]
+  "TARGET_POWER10"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nlt<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -697,6 +704,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		 (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_gtu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -704,6 +722,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		  (match_operand:V1TI 2 "altivec_register_operand")))]
+  "TARGET_POWER10"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nltu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -716,6 +741,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		  (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+	(not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_geu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(geu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -735,6 +771,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_ngtu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vlogical_operand")
@@ -746,6 +793,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		  (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 ; There are 14 possible vector FP comparison operators, gt and eq of them have
 ; been expanded above, so just support 12 remaining operators here.
 
@@ -894,6 +952,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_eq_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 ;; This expansion handles the V16QI, V8HI, and V4SI modes in the
 ;; implementation of the vec_all_ne built-in functions on Power9.
 (define_expand "vector_ne_<mode>_p"
@@ -976,6 +1046,23 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+(define_expand "vector_ne_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V2DI mode in the implementation of the
 ;; vec_any_eq built-in function on Power9.
 ;;
@@ -1002,6 +1089,26 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+(define_expand "vector_ae_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))
+   (set (match_dup 0)
+	(xor:SI (match_dup 0)
+		(const_int 1)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V4SF and V2DF modes in the Power9
 ;; implementation of the vec_all_ne built-in functions.  Note that the
 ;; expansions for this pattern with these modes makes no use of power9-
@@ -1061,6 +1168,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gt_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gt:CC (match_operand:V1TI 1 "vlogical_operand")
+			     (match_operand:V1TI 2 "vlogical_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (gt:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 (define_expand "vector_ge_<mode>_p"
   [(parallel
     [(set (reg:CC CR6_REGNO)
@@ -1085,6 +1204,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtu_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gtu:CC (match_operand:V1TI 1 "altivec_register_operand")
+			      (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "altivec_register_operand")
+	  (gtu:V1TI (match_dup 1)
+		    (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 ;; AltiVec/VSX predicates.
 
 ;; This expansion is triggered during expansion of predicate built-in
@@ -1460,6 +1591,20 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vrotlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vrlq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for rotatert to make use of vrotl
 (define_expand "vrotr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1481,6 +1626,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vashlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vslq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for logical shift right on each vector element
 (define_expand "vlshr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1489,6 +1649,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vlshrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vsrq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for arithmetic shift right on each vector element
 (define_expand "vashr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1496,6 +1671,22 @@
 			(match_operand:VEC_I 2 "vint_operand")))]
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
+
+;; No immediate version of this 128-bit instruction
+(define_expand "vashrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vsraq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 \f
 ;; Vector reduction expanders for VSX
 ; The (VEC_reduc:...
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e17b9c556d4..fd779435390 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -298,6 +298,12 @@
    UNSPEC_VSX_XXSPLTD
    UNSPEC_VSX_DIVSD
    UNSPEC_VSX_DIVUD
+   UNSPEC_VSX_DIVSQ
+   UNSPEC_VSX_DIVUQ
+   UNSPEC_VSX_DIVESQ
+   UNSPEC_VSX_DIVEUQ
+   UNSPEC_VSX_MODSQ
+   UNSPEC_VSX_MODUQ
    UNSPEC_VSX_MULSD
    UNSPEC_VSX_SIGN_EXTEND
    UNSPEC_VSX_XVCVBF16SPN
@@ -1752,6 +1758,70 @@
 }
   [(set_attr "type" "div")])
 
+;; 64-bit multiply
+(define_insn "mulv2di3"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+	(mult:V2DI (match_operand:V2DI 1 "register_operand" "v")
+		   (match_operand:V2DI 2 "register_operand" "v")))]
+  "TARGET_POWER10"
+  "vmulld %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+;; Vector integer signed/unsigned divide
+(define_insn "vsx_div_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVSQ))]
+  "TARGET_POWER10"
+  "vdivsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_udiv_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVUQ))]
+  "TARGET_POWER10"
+  "vdivuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_dives_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVESQ))]
+  "TARGET_POWER10"
+  "vdivesq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_diveu_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVEUQ))]
+  "TARGET_POWER10"
+  "vdiveuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_mods_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODSQ))]
+  "TARGET_POWER10"
+  "vmodsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_modu_v1ti"
+ [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODUQ))]
+  "TARGET_POWER10"
+  "vmoduq %0,%1,%2"
+  [(set_attr "type" "div")])
+
 ;; *tdiv* instruction returning the FG flag
 (define_expand "vsx_tdiv<mode>3_fg"
   [(set (match_dup 3)
@@ -3103,6 +3173,21 @@
   "xxpermdi %x0,%x1,%x1,2"
   [(set_attr "type" "vecperm")])
 
+;; Swap upper/lower 64-bit values in a 128-bit vector
+(define_insn "xxswapd_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(subreg:V1TI
+           (vec_select:V2DI
+             (subreg:V2DI
+                (match_operand:V1TI 1 "vsx_register_operand" "v") 0 )
+	     (parallel [(const_int 1)(const_int 0)]))
+           0))]
+  "TARGET_POWER10"
+;; AIX does not support extended mnemonic xxswapd.  Use the basic
+;; mnemonic xxpermdi instead.
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "xxgenpcvm_<mode>_internal"
   [(set (match_operand:VSX_EXTRACT_I4 0 "altivec_register_operand" "=wa")
 	(unspec:VSX_EXTRACT_I4
@@ -4787,6 +4872,15 @@
    (set_attr "type" "vecload")])
 
 \f
+;; ISA 3.1 vector extend sign support
+(define_insn "vsx_sign_extend_v2di_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
+		     UNSPEC_VSX_SIGN_EXTEND))]
+  "TARGET_POWER10"
+  "vextsd2q %0,%1"
+  [(set_attr "type" "vecexts")])
+
 ;; ISA 3.0 vector extend sign support
 
 (define_insn "vsx_sign_extend_qi_<mode>"
@@ -5502,6 +5596,19 @@
   "vcmpneb %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Vector Compare Not Equal v1ti (specified/not+eq:)
+(define_expand "vcmpnet"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(not:V1TI
+	  (eq:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		   (match_operand:V1TI 2 "altivec_register_operand"))))]
+   "TARGET_POWER10"
+{
+  emit_insn (gen_eqvv1ti3 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_one_cmplv1ti2 (operands[0], operands[0]));
+  DONE;
+})
+
 ;; Vector Compare Not Equal or Zero Byte
 (define_insn "vcmpnezb"
   [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index feaa4929697..f2efd73e12d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21662,6 +21662,180 @@ Generate PCV from specified Mask size, as if implemented by the
 immediate value is either 0, 1, 2 or 3.
 @findex vec_genpcvm
 
+@smallexample
+@exdent vector unsigned __int128 vec_rl (vector unsigned __int128,
+                                         vector unsigned __int128);
+@exdent vector signed __int128 vec_rl (vector signed __int128,
+                                       vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input left by the number of bits
+specified in the most significant quad word of the second input truncated to
+7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlmi (vector signed __int128,
+                                         vector signed __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and inserting it under mask
+into the second input.  The first bit in the mask, the last bit in the mask are
+obtained from the two 7-bit fields bits [108:115] and bits [117:123]
+respectively of the second input.  The shift is obtained from the third input
+in the 7-bit field [125:131] where all bits counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlnm (vector signed __int128,
+                                         vector unsigned __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and ANDing it with a mask.  The
+first bit in the mask and the last bit in the mask are obtained from the two
+7-bit fields bits [117:123] and bits [125:131] respectively of the second
+input.  The shift is obtained from the third input in the 7-bit field bits
+[125:131] where all bits counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_sl(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sl(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of shifting the first input left by the number of bits
+specified in the most significant bits of the second input truncated to
+7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_sr(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sr(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of performing a logical right shift of the first argument
+by the number of bits specified in the most significant double word of the
+second input truncated to 7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_sra(vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sra(vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of performing arithmetic right shift of the first argument
+by the number of bits specified in the most significant bits of the
+second input truncated to 7 bits (bits [125:131]).
+
+@smallexample
+@exdent vector unsigned __int128 vec_mule (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mule (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the even
+doubleword elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mulo (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mulo (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the odd
+doubleword elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_div (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_div (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+Returns the result of dividing the first operand by the second operand. An
+attempt to divide any value by zero or to divide the most negative signed
+128-bit integer by negative one results in an undefined value.
+
+@smallexample
+@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_dive (vector signed __int128,
+                                         vector signed __int128);
+@end smallexample
+
+The result is produced by shifting the first input left by 128 bits and
+dividing by the second.  If an attempt is made to divide by zero or the result
+is larger than 128 bits, the result is undefined.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_mod (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+The result is the modulo result of dividing the first input  by the second
+input.
+
+The following builtins perform 128-bit vector comparisons.  The
+@code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is
+one of the operations @code{eq, ne, gt, lt, ge, le} perform pairwise
+comparisons between the elements at the same positions within their two vector
+arguments.  The @code{vec_all_xx}function returns a non-zero value if and only
+if all pairwise comparisons are true.  The @code{vec_any_xx} function returns
+a non-zero value if and only if at least one pairwise comparison is true.  The
+@code{vec_cmpxx}function returns a vector of the same type as its two
+arguments, within which each element consists of all ones to denote that
+specified logical comparison of the corresponding elements was true.
+Otherwise, the element of the returned vector contains all zeros.
+
+@smallexample
+vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
+
+int vec_all_eq (vector signed __int128, vector signed __int128);
+int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ne (vector signed __int128, vector signed __int128);
+int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_all_gt (vector signed __int128, vector signed __int128);
+int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_lt (vector signed __int128, vector signed __int128);
+int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ge (vector signed __int128, vector signed __int128);
+int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_all_le (vector signed __int128, vector signed __int128);
+int vec_all_le (vector unsigned __int128, vector unsigned __int128);
+
+int vec_any_eq (vector signed __int128, vector signed __int128);
+int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ne (vector signed __int128, vector signed __int128);
+int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_any_gt (vector signed __int128, vector signed __int128);
+int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_lt (vector signed __int128, vector signed __int128);
+int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ge (vector signed __int128, vector signed __int128);
+int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_any_le (vector signed __int128, vector signed __int128);
+int vec_any_le (vector unsigned __int128, vector unsigned __int128);
+@end smallexample
+
+
 @node PowerPC Hardware Transactional Memory Built-in Functions
 @subsection PowerPC Hardware Transactional Memory Built-in Functions
 GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
new file mode 100644
index 00000000000..3f8892b39d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -0,0 +1,2301 @@
+/* { dg-do run } */
+/* { dg-options "-mcpu=power10 -save-temps" } */
+/* { dg-require-effective-target power10_hw } */
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mvextsd2q\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsrq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsraq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvcmpuq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpsq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvmuleud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulld\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivsq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivuq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveuq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmoduq\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+
+
+void print_i128(__int128_t val)
+{
+  printf(" %lld %llu (0x%llx %llx)",
+	 (signed long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF),
+	 (unsigned long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
+}
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i, result_int;
+
+  __int128_t arg1, result;
+  __uint128_t uarg2;
+
+  vector signed long long int vec_arg1_di, vec_arg2_di;
+  vector signed long long int vec_result_di, vec_expected_result_di;
+  vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
+  vector unsigned long long int vec_uresult_di;
+  vector unsigned long long int vec_uexpected_result_di;
+  
+  __int128_t expected_result;
+  __uint128_t uexpected_result;
+
+  vector __int128 vec_arg1, vec_arg2, vec_result;
+  vector unsigned __int128 vec_uarg1, vec_uarg2, vec_uarg3, vec_uresult;
+  vector bool __int128  vec_result_bool;
+
+  /* sign extend double to 128-bit integer  */
+  vec_arg1_di[0] = 1000;
+  vec_arg1_di[1] = -123456;
+
+  expected_result = 1000;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1_di[0] = -123456;
+  vec_arg1_di[1] = 1000;
+
+  expected_result = -123456;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  /* test shift 128-bit integers.
+     Note, shift amount is given by the lower 7-bits of the shift amount. */
+  vec_arg1[0] = 3;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]*4;
+
+  vec_result = vec_sl (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 3;
+  uarg2 = 4;
+  expected_result = arg1*16;
+
+  result = arg1 << uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 << uint128):  ");
+    print_i128(arg1);
+    printf(" << %lld", uarg2 & 0xFF);
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 3;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]*4;
+  
+  vec_uresult = vec_sl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]/4;
+
+  vec_result = vec_sr (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 48;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]/4;
+  
+  vec_uresult = vec_sr (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 48;
+  uarg2 = 4;
+  expected_result = arg1/16;
+
+  result = arg1 >> uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 >> uint128:  ");
+    print_i128(arg1);
+    printf(" >> %lld", uarg2 & 0xFF);
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x0000000012345678ULL;
+  expected_result = (expected_result << 64) | 0x90ABCDEFAABBCCDDULL;
+
+  vec_result = vec_sra (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0xFFFFFFFFFFFFAABBLL;
+  uexpected_result = (uexpected_result << 64) | 0xCCDDEEFF11221234ULL;
+
+  vec_uresult = vec_sra (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x90ABCDEFAABBCCDDULL;
+  expected_result = (expected_result << 64) | 0xEEFF112212345678ULL;
+
+  vec_result = vec_rl (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0x11221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEEFFULL;
+
+  vec_uresult = vec_rl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* vec_rlnm(arg1, arg2, arg3)
+     result - rotate each element of arg1 left by shift in element of arg2.
+       Then AND with mask whose  start/stop bits are specified in element of
+       arg3.  */
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  vec_uarg3[0] = (32 << 8) | 95;
+  expected_result = 0xaabbccddULL;
+  expected_result = (expected_result << 64) | 0xeeff112200000000ULL;
+
+  vec_result = vec_rlnm (vec_arg1, vec_uarg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(int128, uint128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  
+
+  /* vec_rlnm(arg1, arg2, arg3)
+     result - rotate each element of arg1 left by shift in element of arg2;
+       then AND with mask whose  start/stop bits are specified in element of
+       arg3.  */
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  vec_uarg3[0] = (8 << 8) | 119;
+
+  uexpected_result = 0x00221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEE00ULL;
+
+  vec_uresult = vec_rlnm (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(uint128, uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* vec_rlmi(arg1, arg2, arg3)
+     result - rotate each element of arg1 left and inserting it into arg2 
+       ement of arg2 based on the mask specified in arg3.  The shift, mask
+       start and end is specified in arg3.  */
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_arg2[0] = 0x000000000000DEADULL;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 0x0000BEEF00000000ULL;
+  vec_uarg3[0] = 96 << 16 | 127 << 8 | 32;
+  expected_result = 0x000000000000DEADULL;
+  expected_result = (expected_result << 64) | 0x0000BEEF12345678ULL;
+
+  vec_result = vec_rlmi (vec_arg1, vec_arg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(int128, int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", vec_uarg2_di[1] & 0xFF);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* vec_rlmi(arg1, arg2, arg3)
+     result - rotate each element of arg1 left and inserting it into arg2 
+       ement of arg2 based on the mask specified in arg3.  The shift, mask
+       start and end is specified in arg3.  */
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 0xDEAD000000000000ULL;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 0x000000000000BEEFULL;
+  vec_uarg3[0] = 16 << 16 | 111 << 8 | 48;
+  uexpected_result = 0xDEAD1234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDBEEFULL;
+
+  vec_uresult = vec_rlmi (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(uint128, unit128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", vec_uarg3[1] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* 128-bit compare tests, result is all 1's if true */
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1[0] = 2468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  uexpected_result = 0xFFFFFFFFFFFFFFFFULL;
+  uexpected_result = (uexpected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: unsigned vec_cmpgt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed vec_cmpgt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:not equal signed vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  not equal vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpeq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  not equal vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:not equal signed vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+   
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+#if 1
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+#endif
+
+  /* Vector multiply Even and Odd tests */
+  vec_arg1_di[0] = 200;
+  vec_arg1_di[1] = 400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[0] * vec_arg2_di[0];
+
+  vec_result = vec_mule (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (signed, signed) failed.\n");
+    printf(" vec_arg1_di[0] = %lld\n", vec_arg1_di[0]);
+    printf(" vec_arg2_di[0] = %lld\n", vec_arg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1_di[0] = -200;
+  vec_arg1_di[1] = -400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[1] * vec_arg2_di[1];
+
+  vec_result = vec_mulo (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (signed, signed) failed.\n");
+    printf(" vec_arg1_di[1] = %lld\n", vec_arg1_di[1]);
+    printf(" vec_arg2_di[1] = %lld\n", vec_arg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[0] * vec_uarg2_di[0];
+
+  vec_uresult = vec_mule (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[1] = %lld\n", vec_uarg1_di[1]);
+    printf(" vec_uarg2_di[1] = %lld\n", vec_uarg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[1] * vec_uarg2_di[1];
+
+  vec_uresult = vec_mulo (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[0] = %lld\n", vec_uarg1_di[0]);
+    printf(" vec_uarg2_di[0] = %lld\n", vec_uarg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Multiply Longword */
+  vec_arg1_di[0] = 100;
+  vec_arg1_di[1] = -123456;
+
+  vec_arg2_di[0] = 123;
+  vec_arg2_di[1] = 1000;
+
+  vec_expected_result_di[0] = 12300;
+  vec_expected_result_di[1] = -123456000;
+
+  vec_result_di = vec_arg1_di * vec_arg2_di;
+
+  for (i = 0; i<2; i++) {
+    if (vec_result_di[i] != vec_expected_result_di[i]) {
+#if DEBUG
+      printf("ERROR: vector multipy [%d] ((long long) %lld) =  ", i,
+	     vec_result_di[i]);
+      printf("\n does not match expected_result [%d] = ((long long) %lld)", i,
+	     vec_expected_result_di[i]);
+      printf("\n\n");
+#else
+      abort();
+#endif
+    }
+  }
+
+  /* Vector Divide Quadword */
+  vec_arg1[0] = -12345678;
+  vec_arg2[0] = 2;
+  expected_result = -6172839;
+
+  vec_result = vec_div (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24680;
+  vec_uarg2[0] = 4;
+  uexpected_result = 6170;
+
+  vec_uresult = vec_div (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Divide Extended Quadword */
+  vec_arg1[0] = -20;        // has 128-bit of zero concatenated onto it
+  vec_arg2[0] = 0x2000000000000000;
+  vec_arg2[0] = vec_arg2[0] << 64;
+  expected_result = -160;
+
+  vec_result = vec_dive (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 20;        // has 128-bit of zero concatenated onto it
+  vec_uarg2[0] = 0x4000000000000000;
+  vec_uarg2[0] = vec_uarg2[0] << 64;
+  uexpected_result = 80;
+
+  vec_uresult = vec_dive (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector modulo quad word  */
+  vec_arg1[0] = -12345675;
+  vec_arg2[0] = 2;
+  expected_result = -1;
+
+  vec_result = vec_mod (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24685;
+  vec_uarg2[0] = 4;
+  uexpected_result = 1;
+
+  vec_uresult = vec_mod (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  return 0;
+}
-- 
2.27.0



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 4/6 ver 3] Add TI to TD (128-bit DFP) and TD to TI support
  2020-10-13  0:23         ` Segher Boessenkool
                             ` (3 preceding siblings ...)
  2021-01-19 22:33           ` [PATCH 3/6 ver 3] RS6000 add 128-bit Integer Operations part 1 Carl Love
@ 2021-01-19 22:33           ` Carl Love
  2021-01-19 22:33           ` [PATCH 5/6 ver 3] rs6000, Add test 128-bit shifts for just the int128 type Carl Love
  2021-01-19 22:33           ` [PATCH 6/6 ver 3] Conversions between 128-bit integer and floating point values Carl Love
  6 siblings, 0 replies; 41+ messages in thread
From: Carl Love @ 2021-01-19 22:33 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Will, Segher:
 
This patch adds support for converting to/from 128-bit integers and
128-bit decimal floating point formats.

Version 3:

  No functional changes.
  Tested on Power 8BE, Power9, Power10.

Version 2:
  Updated ChangeLog comments.  Fixed up comments in the test program.

  Re-tested the patch on Power 9 with no regression errors.
                   
                        Carl

-----------------------------------------------------------

gcc/ChangeLog

2021-01-12  Carl Love  <cel@us.ibm.com>
        * config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
        * config/rs6000/rs6000-call.c (P10V_BUILTIN_VCMPNET_P,
	P10V_BUILTIN_VCMPAET_P): New overloaded definitions.

gcc/testsuite/ChangeLog

2021-01-12  Carl Love  <cel@us.ibm.com>
        * gcc.target/powerpc/int_128bit-runnable.c: Add 128-bit DFP
        conversion tests.
---
 gcc/config/rs6000/dfp.md                      | 14 +++++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 61 +++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index c8cdb645865..876ab2ed682 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -222,6 +222,13 @@
   "dcffixq %0,%1"
   [(set_attr "type" "dfp")])
 
+(define_insn "floattitd2"
+  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
+	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
+  "TARGET_POWER10"
+  "dcffixqq %0,%1"
+  [(set_attr "type" "dfp")])
+
 ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
 ;; This is the first stage of converting it to an integer type.
 
@@ -241,6 +248,13 @@
   "TARGET_DFP"
   "dctfix<q> %0,%1"
   [(set_attr "type" "dfp")])
+
+(define_insn "fixtdti2"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
+	(fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
+  "TARGET_POWER10"
+  "dctfixqq %0,%1"
+  [(set_attr "type" "dfp")])
 \f
 ;; Decimal builtin support
 
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 3f8892b39d6..42cb91c7ba9 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -38,6 +38,7 @@
 #if DEBUG
 #include <stdio.h>
 #include <stdlib.h>
+#include <math.h>
 
 
 void print_i128(__int128_t val)
@@ -59,6 +60,13 @@ int main ()
   __int128_t arg1, result;
   __uint128_t uarg2;
 
+  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
+
+  struct conv_t {
+    __uint128_t u128;
+    _Decimal128 d128;
+  } conv, conv2;
+
   vector signed long long int vec_arg1_di, vec_arg2_di;
   vector signed long long int vec_result_di, vec_expected_result_di;
   vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
@@ -2296,6 +2304,59 @@ int main ()
     abort();
 #endif
   }
+  
+  /* DFP to __int128 and __int128 to DFP conversions */
+  /* Print the DFP value as an unsigned int so we can see the bit patterns.  */
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
+  expected_result_dfp128 = conv.d128;
 
+  arg1 = 4;
+
+  conv.d128 = (_Decimal128) arg1;
+
+  result_dfp128 = (_Decimal128) arg1;
+  if (((conv.u128 >>64) != 0x2208000000000000ULL) &&
+      ((conv.u128 & 0xFFFFFFFFFFFFFFFF) != 0x4ULL)) {
+#if DEBUG
+    printf("ERROR:  convert int128 value ");
+    print_i128 (arg1);
+    conv.d128 = result_dfp128;
+    printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)((conv.u128) >>64),
+	   (unsigned long long)((conv.u128) & 0xFFFFFFFFFFFFFFFF));
+
+    conv.d128 = expected_result_dfp128;
+    printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
+	   (unsigned long long) (conv.u128>>64),
+	   (unsigned long long) (conv.u128 & 0xFFFFFFFFFFFFFFFF));
+#else
+    abort();
+#endif
+  }
+
+  expected_result = 4;
+
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
+  arg1_dfp128 = conv.d128;
+
+  result = (__int128_t) arg1_dfp128;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR:  convert DFP value ");
+    printf("0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)(conv.u128>>64),
+	   (unsigned long long)(conv.u128 & 0xFFFFFFFFFFFFFFFF));
+    printf("to __int128 value = ");
+    print_i128 (result);
+    printf("\ndoes not match expected_result = ");
+    print_i128 (expected_result);
+    printf("\n");
+#else
+    abort();
+#endif
+  }
   return 0;
 }
-- 
2.27.0




^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 5/6 ver 3] rs6000, Add test 128-bit shifts for just the int128 type.
  2020-10-13  0:23         ` Segher Boessenkool
                             ` (4 preceding siblings ...)
  2021-01-19 22:33           ` [PATCH 4/6 ver 3] Add TI to TD (128-bit DFP) and TD to TI support Carl Love
@ 2021-01-19 22:33           ` Carl Love
  2021-01-19 22:33           ` [PATCH 6/6 ver 3] Conversions between 128-bit integer and floating point values Carl Love
  6 siblings, 0 replies; 41+ messages in thread
From: Carl Love @ 2021-01-19 22:33 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Will, Segher:

Patch 4 adds the vector 128-bit integer shift instruction support for
the V1TI type.  This patch also renames and moves the VSX_TI iterator
from vsx.md to VEC_TI in vector.md.  The uses of VEC_TI are also
updated.

This patch also renames and moves the VSX_TI iterator from vsx.md to
VEC_TI in vector.md.  The uses of VEC_TI are also updated.

version 3:
  No additional functional changes.
  Tested on Power 8BE, Power 9, Power 10.
  
version 2:
  Re-tested the patch on Power 9 with no regression errors.

                    Carl Love

--------------------------------------------------------

gcc/ChangeLog

2021-01-12  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
	Rename to altivec_vslq_<mode>, altivec_vsrq_<mode>, mode VEC_TI.
	* config/rs6000/vector.md (VEC_TI): Was named VSX_TI in vsx.md.
	(vashlv1ti3): Change to vashl<mode>3, mode VEC_TI.
	(vlshrv1ti3): Change to vlshr<mode>3, mode VEC_TI.
	* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator. Update
	uses of VSX_TI to VEC_TI.

gcc/testsuite/ChangeLog

2021-01-12  Carl Love  <cel@us.ibm.com>
	gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
	tests.
---
 gcc/config/rs6000/altivec.md                  | 16 ++++-----
 gcc/config/rs6000/vector.md                   | 27 ++++++++-------
 gcc/config/rs6000/vsx.md                      | 33 +++++++++----------
 .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++++++--
 4 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index cb83c5ce012..61ab5c9afb6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2221,10 +2221,10 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vslq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vslq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+		     (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vslq %0,%1,%2"
@@ -2238,10 +2238,10 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vsrq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vsrq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+			   (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vsrq %0,%1,%2"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 0f252c915b0..6a4cd69d866 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,6 +26,9 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
 
+;; 128-bit int modes
+(define_mode_iterator VEC_TI [V1TI TI])
+
 ;; Vector int modes for parity
 (define_mode_iterator VEC_IP [V8HI
 			      V4SI
@@ -1627,17 +1630,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vashlv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vashl<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+			 (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
   emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn (gen_altivec_vslq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vslq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
@@ -1650,17 +1653,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vlshrv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vlshr<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+			   (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
   emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn (gen_altivec_vsrq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vsrq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index fd779435390..e5db5793488 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -37,9 +37,6 @@
 				  TI
 				  V1TI])
 
-;; Iterator for 128-bit integer types that go in a single vector register.
-(define_mode_iterator VSX_TI [TI V1TI])
-
 ;; Iterator for the 2 32-bit vector types
 (define_mode_iterator VSX_W [V4SF V4SI])
 
@@ -946,9 +943,9 @@
 ;; special V1TI container class, which it is not appropriate to use vec_select
 ;; for the type.
 (define_insn "*vsx_le_permute_<mode>"
-  [(set (match_operand:VSX_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
-	(rotate:VSX_TI
-	 (match_operand:VSX_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
+  [(set (match_operand:VEC_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
+	(rotate:VEC_TI
+	 (match_operand:VEC_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
 	 (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "@
@@ -962,10 +959,10 @@
    (set_attr "type" "vecperm,vecload,vecstore,*,load,store")])
 
 (define_insn_and_split "*vsx_le_undo_permute_<mode>"
-  [(set (match_operand:VSX_TI 0 "vsx_register_operand" "=wa,wa")
-	(rotate:VSX_TI
-	 (rotate:VSX_TI
-	  (match_operand:VSX_TI 1 "vsx_register_operand" "0,wa")
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=wa,wa")
+	(rotate:VEC_TI
+	 (rotate:VEC_TI
+	  (match_operand:VEC_TI 1 "vsx_register_operand" "0,wa")
 	  (const_int 64))
 	 (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
@@ -1033,11 +1030,11 @@
 ;; Peepholes to catch loads and stores for TImode if TImode landed in
 ;; GPR registers on a little endian system.
 (define_peephole2
-  [(set (match_operand:VSX_TI 0 "int_reg_operand")
-	(rotate:VSX_TI (match_operand:VSX_TI 1 "memory_operand")
+  [(set (match_operand:VEC_TI 0 "int_reg_operand")
+	(rotate:VEC_TI (match_operand:VEC_TI 1 "memory_operand")
 		       (const_int 64)))
-   (set (match_operand:VSX_TI 2 "int_reg_operand")
-	(rotate:VSX_TI (match_dup 0)
+   (set (match_operand:VEC_TI 2 "int_reg_operand")
+	(rotate:VEC_TI (match_dup 0)
 		       (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
    && (rtx_equal_p (operands[0], operands[2])
@@ -1045,11 +1042,11 @@
    [(set (match_dup 2) (match_dup 1))])
 
 (define_peephole2
-  [(set (match_operand:VSX_TI 0 "int_reg_operand")
-	(rotate:VSX_TI (match_operand:VSX_TI 1 "int_reg_operand")
+  [(set (match_operand:VEC_TI 0 "int_reg_operand")
+	(rotate:VEC_TI (match_operand:VEC_TI 1 "int_reg_operand")
 		       (const_int 64)))
-   (set (match_operand:VSX_TI 2 "memory_operand")
-	(rotate:VSX_TI (match_dup 0)
+   (set (match_operand:VEC_TI 2 "memory_operand")
+	(rotate:VEC_TI (match_dup 0)
 		       (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
    && peep2_reg_dead_p (2, operands[0])"
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 42cb91c7ba9..953c23ec046 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -53,6 +53,18 @@ void print_i128(__int128_t val)
 
 void abort (void);
 
+__attribute__((noinline))
+__int128_t shift_right (__int128_t a, __uint128_t b)
+{
+  return a >> b;
+}
+
+__attribute__((noinline))
+__int128_t shift_left (__int128_t a, __uint128_t b)
+{
+  return a << b;
+}
+
 int main ()
 {
   int i, result_int;
@@ -142,7 +154,7 @@ int main ()
 #endif
   }
 
-  arg1 = 3;
+  arg1 = vec_result[0];
   uarg2 = 4;
   expected_result = arg1*16;
 
@@ -226,7 +238,7 @@ int main ()
 #endif
   }
 
-  arg1 = 48;
+  arg1 = vec_uresult[0];
   uarg2 = 4;
   expected_result = arg1/16;
 
-- 
2.27.0



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 6/6 ver 3] Conversions between 128-bit integer and floating point values.
  2020-10-13  0:23         ` Segher Boessenkool
                             ` (5 preceding siblings ...)
  2021-01-19 22:33           ` [PATCH 5/6 ver 3] rs6000, Add test 128-bit shifts for just the int128 type Carl Love
@ 2021-01-19 22:33           ` Carl Love
  6 siblings, 0 replies; 41+ messages in thread
From: Carl Love @ 2021-01-19 22:33 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel, Michael Meissner
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Will, Segher:
 
This patch adds support for converting to/from 128-bit integers and
128-bit decimal floating point formats using the new P10 instructions
dcffixqq and dctfixqq.  The new instructions are only used on P10 HW,
otherwise the conversions continue to use the existing SW routines.

The files fixkfti-sw.c and fixunskfti-sw.c are renamed versions of
fixkfti.c and fixunskfti.c respectively.  The function names in the
files were updated with the rename as well as some white spaces fixes.

version 3:  Numerous changes with help/input from Michael Meissner

  Add assembler checks for the 128-bit conversion  instructions, see
  configure and configure.ac.

  Add the libgcc resolvers to select sw or hw support for the
conversions.

  Rename, rewrite the existing conversion files (fixkfti.c,
fixunskfti.c,
  floattikf.c, floatuntikf.c) to create the sw conversion files.

  Tested on Power 8BE, Power9, Power10.

version 2:

  Fixed a typo in the ChangeLog noted by Will.

  Removed the target ppc_native_128bit from the test case as we no
  longer have the 128-bit flag.

  Re-tested the patch on Power 9 with no regression errors.

                Carl Love

----------------------------------------------------

gcc/ChangeLog

2021-01-15  Carl Love  <cel@us.ibm.com>
	* config/rs6000/rs6000.md (floatti<mode>2, floatunsti<mode>2,
	fix_trunc<mode>ti2, fixuns_trunc<mode>ti2): Add
	define_insn for mode IEEE 128.

gcc/testsuite/ChangeLog

2021-01-15  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/fp128_conversions.c: New file.
	* gcc.target/powerpc/int_128bit-runnable.c(vextsd2ppc_native_128bitq,
	vcmpuq, vcmpsq, vcmpequq, vcmpequq., vcmpgtsq, vcmpgtsq.
	vcmpgtuq, vcmpgtuq.): Update scan-assembler-times.
	(ppc_native_128bit): Remove dg-require-effective-target.

libgcc/ChangeLog
2021-01-15  Carl Love  <cel@us.ibm.com>
	* config.host: Add if test and set for
	libgcc_cv_powerpc_3_1_float128_hw.
	* libgcc/config/rs6000/fixkfti.c: Renamed to fixkfti-sw.c.
	Change calls of __fixkfti to __fixkfti_sw.
	* libgcc/config/rs6000/fixunskfti.c: Renamed to fixunskfti-sw.c.
	Change calls of __fixunskfti to __fixunskfti_sw.
	* libgcc/config/rs6000/float128-p10.c (__floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw): New file.
	* libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): New macro.
	(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
	__fixunskfti_resolve): Add resolve functions.
	(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New functions.
	* libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
	__fixtfti, __fixunstfti): Add editor commands to change names.
	* libgcc/config/rs6000/float128-sed-hw (__floattitf,
	__floatuntitf, __fixtfti, __fixunstfti): Add editor commands to
	change names.
	* libgcc/config/rs6000/floattikf.c: Renamed to floattikf-sw.c.
	* libgcc/config/rs6000/floatuntikf.c: Renamed to floatuntikf-sw.c.
	* libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,
	__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
	__floatuntikf, __fixkfti, __fixunskfti): New extern declarations.
	* libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
	fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
	(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
	file names to fp128_ppc_funcs.
	* libgcc/config/rs6000/t-float128-hw(fp128_3_1_hw_funcs,
	fp128_3_1_hw_src, fp128_3_1_hw_static_obj, fp128_3_1_hw_shared_obj,
	fp128_3_1_hw_obj): Add variables for ISA 3.1 support.
	* libgcc/config/rs6000/t-float128-p10-hw: New file.
	* configure: Update script for isa 3.1 128-bit float support.
	* configure.ac: Add check for 128-bit float hardware support.
---
 gcc/config/rs6000/rs6000.md                   |  36 +++
 .../gcc.target/powerpc/fp128_conversions.c    | 294 ++++++++++++++++++
 .../gcc.target/powerpc/int_128bit-runnable.c  |  14 +-
 libgcc/config.host                            |   4 +
 .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
 .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   4 +-
 libgcc/config/rs6000/float128-ifunc.c         |  44 ++-
 libgcc/config/rs6000/float128-p10.c           |  71 +++++
 libgcc/config/rs6000/float128-sed             |   4 +
 libgcc/config/rs6000/float128-sed-hw          |   4 +
 .../rs6000/{floattikf.c => floattikf-sw.c}    |   4 +-
 .../{floatuntikf.c => floatuntikf-sw.c}       |   4 +-
 libgcc/config/rs6000/quad-float128.h          |  17 +-
 libgcc/config/rs6000/t-float128               |  12 +-
 libgcc/config/rs6000/t-float128-hw            |  16 +
 libgcc/config/rs6000/t-float128-p10-hw        |  24 ++
 libgcc/configure                              |  39 ++-
 libgcc/configure.ac                           |  25 ++
 18 files changed, 586 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
 rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
 rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (96%)
 create mode 100644 libgcc/config/rs6000/float128-p10.c
 rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
 rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%)
 create mode 100644 libgcc/config/rs6000/t-float128-p10-hw

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bb9fb42f82a..096c874f6e4 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6390,6 +6390,42 @@
    xscvsxddp %x0,%x1"
   [(set_attr "type" "fp")])
 
+(define_insn "floatti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvsqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "floatunsti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvuqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fix_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpsqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fixuns_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpuqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
 ; Allow the combiner to merge source memory operands to the conversion so that
 ; the optimizer/register allocator doesn't try to load the value too early in a
 ; GPR and then use store/load to move it to a FPR and suffer from a store-load
diff --git a/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
new file mode 100644
index 00000000000..c20282fa0e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
@@ -0,0 +1,294 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mxscvsqqp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxscvuqqp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxscvqpsqz\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxscvqpuqz\M} 1 } } */
+
+#include <stdio.h>
+#include <math.h>
+#include <fenv.h>
+#include <stdlib.h>
+#include <wchar.h>
+
+#define DEBUG 0
+
+void
+abort (void);
+
+float
+conv_i_2_fp( long long int a)
+{
+  return (float) a;
+}
+
+double
+conv_i_2_fpd( long long int a)
+{
+  return (double) a;
+}
+
+double
+conv_ui_2_fpd( unsigned long long int a)
+{
+  return (double) a;
+}
+
+__float128
+conv_i128_2_fp128 (__int128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__float128
+conv_ui128_2_fp128 (__uint128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__int128_t
+conv_fp128_2_i128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__int128_t) a;
+}
+
+__uint128_t
+conv_fp128_2_ui128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__uint128_t) a;
+}
+
+long double
+conv_i128_2_ld (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (long double) a;
+}
+
+__ibm128
+conv_i128_2_ibm128 (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble, message uses IBM long double, no binary output
+  return (__ibm128) a;
+}
+
+int
+main()
+{
+	float a, expected_result_float;
+	double b, expected_result_double;
+	long long int c, expected_result_llint;
+	unsigned long long int u;
+	__int128_t d;
+	__uint128_t u128;
+	unsigned long long expected_result_uint128[2] ;
+	__float128 e;
+	long double ld;     // another 128-bit float version
+
+	union conv_t {
+		float a;
+		double b;
+		long long int c;
+		long long int128[2] ;
+		unsigned long long uint128[2] ;
+		unsigned long long int u;
+		__int128_t d;
+		__uint128_t u128;
+		__float128 e;
+		long double ld;     // another 128-bit float version
+	} conv, conv_result;
+
+	c = 20;
+	expected_result_llint = 20.00000;
+	a = conv_i_2_fp (c);
+
+	if (a != expected_result_llint) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fp(%lld) = %10.5f\n", c, a);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_llint);
+#else
+		abort();
+#endif
+	}
+
+	c = 20;
+	expected_result_double = 20.00000;
+	b = conv_i_2_fpd (c);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fpd(%lld) = %10.5f\n", d, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+	u = 20;
+	expected_result_double = 20.00000;
+	b = conv_ui_2_fpd (u);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_ui_2_fpd(%llu) = %10.5f\n", u, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+  d = -3210;
+  d = (d * 10000000000) + 9876543210;
+  conv_result.e = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0xc02bd2f9068d1160;
+  expected_result_uint128[0] = 0x0;
+  
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(-32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  d = 123;
+  d = (d * 10000000000) + 1234567890;
+  conv_result.ld = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x4271eab4c8ed2000;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(1231234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  u128 = 8760;
+  u128 = (u128 * 10000000000) + 1234567890;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+  expected_result_uint128[1] = 0x402d3eb101df8b48;
+  expected_result_uint128[0] = 0x0;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(87601234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  u128 = 3210;
+  u128 = (u128 * 10000000000) + 9876543210;
+  expected_result_uint128[1] = 0x402bd3429c8feea0;
+  expected_result_uint128[0] = 0x0;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 12345.6789;
+  expected_result_uint128[1] = 0x1407374883526960;
+  expected_result_uint128[0] = 0x3039;
+
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) =  ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = -6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0xffffffffffffe57b;
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x1a85;
+  conv_result.d = conv_fp128_2_ui128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_ui128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+	  
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 953c23ec046..5c3aa4fa5ca 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -4,22 +4,16 @@
 
 /* Check that the expected 128-bit instructions are generated if the processor
    supports the 128-bit integer instructions. */
-/* { dg-final { scan-assembler-times {\mvextsd2q\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvextsd2q\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mvslq\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvsrq\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvsraq\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mvcmpuq\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mvcmpsq\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mvcmpequq\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mvcmpequq.\M} 16 } } */
-/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mvcmpgtsq.\M} 16 } } */
-/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mvcmpgtuq.\M} 16 } } */
-/* { dg-final { scan-assembler-times {\mvmuleud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 16 } } */
 /* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
diff --git a/libgcc/config.host b/libgcc/config.host
index f808b61be70..50f00062232 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1224,6 +1224,10 @@ powerpc*-*-linux*)
 		tmake_file="${tmake_file} rs6000/t-float128-hw"
 	fi
 
+	if test $libgcc_cv_powerpc_3_1_float128_hw = yes; then
+		tmake_file="${tmake_file} rs6000/t-float128-p10-hw"
+	fi
+
 	extra_parts="$extra_parts ecrti.o ecrtn.o ncrti.o ncrtn.o"
 	md_unwind_header=rs6000/linux-unwind.h
 	;;
diff --git a/libgcc/config/rs6000/fixkfti.c b/libgcc/config/rs6000/fixkfti-sw.c
similarity index 96%
rename from libgcc/config/rs6000/fixkfti.c
rename to libgcc/config/rs6000/fixkfti-sw.c
index 0d965bc6253..cc000fca0f8 100644
--- a/libgcc/config/rs6000/fixkfti.c
+++ b/libgcc/config/rs6000/fixkfti-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TItype
-__fixkfti (TFtype a)
+__fixkfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/fixunskfti.c b/libgcc/config/rs6000/fixunskfti-sw.c
similarity index 96%
rename from libgcc/config/rs6000/fixunskfti.c
rename to libgcc/config/rs6000/fixunskfti-sw.c
index f285b4e3fbd..7a04d1a489a 100644
--- a/libgcc/config/rs6000/fixunskfti.c
+++ b/libgcc/config/rs6000/fixunskfti-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 UTItype
-__fixunskfti (TFtype a)
+__fixunskfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/float128-ifunc.c b/libgcc/config/rs6000/float128-ifunc.c
index 85380471c5e..57545dd7edb 100644
--- a/libgcc/config/rs6000/float128-ifunc.c
+++ b/libgcc/config/rs6000/float128-ifunc.c
@@ -46,14 +46,9 @@
 #endif
 
 #define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW : SW)
+#define SW_OR_HW_ISA3_1(SW, HW) (__builtin_cpu_supports ("arch_3_1") ? HW : SW)
 
 /* Resolvers.  */
-
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 static __typeof__ (__addkf3_sw) *
 __addkf3_resolve (void)
 {
@@ -102,6 +97,18 @@ __floatdikf_resolve (void)
   return SW_OR_HW (__floatdikf_sw, __floatdikf_hw);
 }
 
+static __typeof__ (__floattikf_sw) *
+__floattikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floattikf_sw, __floattikf_hw);
+}
+
+static __typeof__ (__floatuntikf_sw) *
+__floatuntikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floatuntikf_sw, __floatuntikf_hw);
+}
+
 static __typeof__ (__floatunsikf_sw) *
 __floatunsikf_resolve (void)
 {
@@ -114,6 +121,19 @@ __floatundikf_resolve (void)
   return SW_OR_HW (__floatundikf_sw, __floatundikf_hw);
 }
 
+
+static __typeof__ (__fixkfti_sw) *
+__fixkfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixkfti_sw, __fixkfti_hw);
+}
+
+static __typeof__ (__fixunskfti_sw) *
+__fixunskfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixunskfti_sw, __fixunskfti_hw);
+}
+
 static __typeof__ (__fixkfsi_sw) *
 __fixkfsi_resolve (void)
 {
@@ -303,6 +323,18 @@ TFtype __floatsikf (SItype_ppc)
 TFtype __floatdikf (DItype_ppc)
   __attribute__ ((__ifunc__ ("__floatdikf_resolve")));
 
+TFtype __floattikf (TItype_ppc)
+  __attribute__ ((__ifunc__ ("__floattikf_resolve")));
+
+TFtype __floatuntikf (UTItype_ppc)
+  __attribute__ ((__ifunc__ ("__floatuntikf_resolve")));
+
+TItype_ppc __fixkfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixkfti_resolve")));
+
+UTItype_ppc __fixunskfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixunskfti_resolve")));
+
 TFtype __floatunsikf (USItype_ppc)
   __attribute__ ((__ifunc__ ("__floatunsikf_resolve")));
 
diff --git a/libgcc/config/rs6000/float128-p10.c b/libgcc/config/rs6000/float128-p10.c
new file mode 100644
index 00000000000..7f5d317631a
--- /dev/null
+++ b/libgcc/config/rs6000/float128-p10.c
@@ -0,0 +1,71 @@
+/* Automatic switching between software and hardware IEEE 128-bit
+   ISA 3.1 floating-point emulation for PowerPC.
+
+   Copyright (C) 2016-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Carl Love (cel@us.ibm.com)
+   Code is based on the main soft-fp library written by:
+	Richard Henderson (rth@cygnus.com) and
+	Jakub Jelinek (jj@ultra.linux.cz).
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Note, the hardware conversion instructions for 128-bit integers are
+   supported for ISA 3.1 and later.  Only compile this file with -mcpu=power10
+   or newer support.  */
+
+#include <soft-fp.h>
+#include <quad-float128.h>
+
+#ifndef __FLOAT128_HARDWARE__
+#error "This module must be compiled with IEEE 128-bit hardware support"
+#endif
+
+#ifndef _ARCH_PWR10
+#error "This module must be compiled for Power 10 support"
+#endif
+
+TFtype
+__floattikf_hw (TItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TFtype
+__floatuntikf_hw (UTItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TItype_ppc
+__fixkfti_hw (TFtype a)
+{
+  return (TItype_ppc) a;
+}
+
+UTItype_ppc
+__fixunskfti_hw (TFtype a)
+{
+  return (UTItype_ppc) a;
+}
diff --git a/libgcc/config/rs6000/float128-sed b/libgcc/config/rs6000/float128-sed
index d9a089ff9ba..c0fcddb1959 100644
--- a/libgcc/config/rs6000/float128-sed
+++ b/libgcc/config/rs6000/float128-sed
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi/g
 s/__fixunstfdi/__fixunskfdi/g
 s/__fixunstfsi/__fixunskfsi/g
 s/__floatditf/__floatdikf/g
+s/__floattitf/__floattikf/g
+s/__floatuntitf/__floatuntikf/g
+s/__fixtfti/__fixkfti/g
+s/__fixunstfti/__fixunskfti/g
 s/__floatsitf/__floatsikf/g
 s/__floatunditf/__floatundikf/g
 s/__floatunsitf/__floatunsikf/g
diff --git a/libgcc/config/rs6000/float128-sed-hw b/libgcc/config/rs6000/float128-sed-hw
index acf36b0c17d..3d2bf556da1 100644
--- a/libgcc/config/rs6000/float128-sed-hw
+++ b/libgcc/config/rs6000/float128-sed-hw
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi_sw/g
 s/__fixunstfdi/__fixunskfdi_sw/g
 s/__fixunstfsi/__fixunskfsi_sw/g
 s/__floatditf/__floatdikf_sw/g
+s/__floattitf/__floattikf_sw/g
+s/__floatuntitf/__floatuntikf_sw/g
+s/__fixtfti/__fixkfti_sw/g
+s/__fixunstfti/__fixunskfti_sw/g
 s/__floatsitf/__floatsikf_sw/g
 s/__floatunditf/__floatundikf_sw/g
 s/__floatunsitf/__floatunsikf_sw/g
diff --git a/libgcc/config/rs6000/floattikf.c b/libgcc/config/rs6000/floattikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floattikf.c
rename to libgcc/config/rs6000/floattikf-sw.c
index cc5c7ca0fd0..4e1786cd229 100644
--- a/libgcc/config/rs6000/floattikf.c
+++ b/libgcc/config/rs6000/floattikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floattikf (TItype i)
+__floattikf_sw (TItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/floatuntikf.c b/libgcc/config/rs6000/floatuntikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floatuntikf.c
rename to libgcc/config/rs6000/floatuntikf-sw.c
index 96f2d3bdcb8..c4b814ddd68 100644
--- a/libgcc/config/rs6000/floatuntikf.c
+++ b/libgcc/config/rs6000/floatuntikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floatuntikf (UTItype i)
+__floatuntikf_sw (UTItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/quad-float128.h b/libgcc/config/rs6000/quad-float128.h
index 0eb1d34691f..bbb7fcb422e 100644
--- a/libgcc/config/rs6000/quad-float128.h
+++ b/libgcc/config/rs6000/quad-float128.h
@@ -87,19 +87,18 @@ extern USItype_ppc __fixunskfsi_sw (TFtype);
 extern UDItype_ppc __fixunskfdi_sw (TFtype);
 extern TFtype __floatsikf_sw (SItype_ppc);
 extern TFtype __floatdikf_sw (DItype_ppc);
+extern TFtype __floattikf_sw (TItype_ppc);
 extern TFtype __floatunsikf_sw (USItype_ppc);
 extern TFtype __floatundikf_sw (UDItype_ppc);
+extern TFtype __floatuntikf_sw (UTItype_ppc);
+extern TItype_ppc __fixkfti_sw (TFtype);
+extern UTItype_ppc __fixunskfti_sw (TFtype);
 extern IBM128_TYPE __extendkftf2_sw (TFtype);
 extern TFtype __trunctfkf2_sw (IBM128_TYPE);
 extern TCtype __mulkc3_sw (TFtype, TFtype, TFtype, TFtype);
 extern TCtype __divkc3_sw (TFtype, TFtype, TFtype, TFtype);
 
 #ifdef _ARCH_PPC64
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 extern TItype_ppc __fixkfti (TFtype);
 extern UTItype_ppc __fixunskfti (TFtype);
 extern TFtype __floattikf (TItype_ppc);
@@ -130,8 +129,12 @@ extern USItype_ppc __fixunskfsi_hw (TFtype);
 extern UDItype_ppc __fixunskfdi_hw (TFtype);
 extern TFtype __floatsikf_hw (SItype_ppc);
 extern TFtype __floatdikf_hw (DItype_ppc);
+extern TFtype __floattikf_hw (TItype_ppc);
 extern TFtype __floatunsikf_hw (USItype_ppc);
 extern TFtype __floatundikf_hw (UDItype_ppc);
+extern TFtype __floatuntikf_hw (UTItype_ppc);
+extern TItype_ppc __fixkfti_hw (TFtype);
+extern UTItype_ppc __fixunskfti_hw (TFtype);
 extern IBM128_TYPE __extendkftf2_hw (TFtype);
 extern TFtype __trunctfkf2_hw (IBM128_TYPE);
 extern TCtype __mulkc3_hw (TFtype, TFtype, TFtype, TFtype);
@@ -162,8 +165,12 @@ extern USItype_ppc __fixunskfsi (TFtype);
 extern UDItype_ppc __fixunskfdi (TFtype);
 extern TFtype __floatsikf (SItype_ppc);
 extern TFtype __floatdikf (DItype_ppc);
+extern TFtype __floattikf (TItype_ppc);
 extern TFtype __floatunsikf (USItype_ppc);
 extern TFtype __floatundikf (UDItype_ppc);
+extern TFtype __floatuntikf (UTItype_ppc);
+extern TItype_ppc __fixkfti (TFtype);
+extern UTItype_ppc __fixunskfti (TFtype);
 extern IBM128_TYPE __extendkftf2 (TFtype);
 extern TFtype __trunctfkf2 (IBM128_TYPE);
 
diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
index d5413445189..fe67b831888 100644
--- a/libgcc/config/rs6000/t-float128
+++ b/libgcc/config/rs6000/t-float128
@@ -23,7 +23,8 @@ fp128_softfp_shared_obj	= $(addsuffix -sw_s$(objext),$(fp128_softfp_funcs))
 fp128_softfp_obj	= $(fp128_softfp_static_obj) $(fp128_softfp_shared_obj)
 
 # New functions for software emulation
-fp128_ppc_funcs		= floattikf floatuntikf fixkfti fixunskfti \
+fp128_ppc_funcs		= floattikf-sw floatuntikf-sw \
+			  fixkfti-sw fixunskfti-sw \
 			  extendkftf2-sw trunctfkf2-sw \
 			  sfp-exceptions _mulkc3 _divkc3 _powikf2
 
@@ -35,13 +36,16 @@ fp128_ppc_obj		= $(fp128_ppc_static_obj) $(fp128_ppc_shared_obj)
 
 # All functions
 fp128_funcs		= $(fp128_softfp_funcs) $(fp128_ppc_funcs) \
-			  $(fp128_hw_funcs) $(fp128_ifunc_funcs)
+			  $(fp128_hw_funcs) $(fp128_ifunc_funcs) \
+			  $(fp128_3_1_hw_funcs)
 
 fp128_src		= $(fp128_softfp_src) $(fp128_ppc_src) \
-			  $(fp128_hw_src) $(fp128_ifunc_src)
+			  $(fp128_hw_src) $(fp128_ifunc_src) \
+			  $(fp128_3_1_hw_src)
 
 fp128_obj		= $(fp128_softfp_obj) $(fp128_ppc_obj) \
-			  $(fp128_hw_obj) $(fp128_ifunc_obj)
+			  $(fp128_hw_obj) $(fp128_ifunc_obj) \
+			  $(fp128_3_1_hw_obj)
 
 fp128_sed		= $(srcdir)/config/rs6000/float128-sed$(fp128_sed_hw)
 fp128_dep		= $(fp128_sed) $(srcdir)/config/rs6000/t-float128
diff --git a/libgcc/config/rs6000/t-float128-hw b/libgcc/config/rs6000/t-float128-hw
index d64ca4dd694..c0827366cc4 100644
--- a/libgcc/config/rs6000/t-float128-hw
+++ b/libgcc/config/rs6000/t-float128-hw
@@ -13,6 +13,13 @@ fp128_hw_static_obj	= $(addsuffix $(objext),$(fp128_hw_funcs))
 fp128_hw_shared_obj	= $(addsuffix _s$(objext),$(fp128_hw_funcs))
 fp128_hw_obj		= $(fp128_hw_static_obj) $(fp128_hw_shared_obj)
 
+# New functions for ISA 3.1 hardware support
+fp128_3_1_hw_funcs	= float128-p10
+fp128_3_1_hw_src	= $(srcdir)/config/rs6000/float128-p10.c
+fp128_3_1_hw_static_obj	= $(addsuffix $(objext),$(fp128_3_1_hw_funcs))
+fp128_3_1_hw_shared_obj	= $(addsuffix _s$(objext),$(fp128_3_1_hw_funcs))
+fp128_3_1_hw_obj	= $(fp128_3_1_hw_static_obj) $(fp128_3_1_hw_shared_obj)
+
 fp128_ifunc_funcs	= float128-ifunc
 fp128_ifunc_src		= $(srcdir)/config/rs6000/float128-ifunc.c
 fp128_ifunc_static_obj	= float128-ifunc$(objext)
@@ -30,9 +37,18 @@ FP128_CFLAGS_HW		 = -Wno-type-limits -mvsx -mfloat128 \
 			   -I$(srcdir)/config/rs6000 \
 			   $(FLOAT128_HW_INSNS)
 
+FP128_3_1_CFLAGS_HW	 = -Wno-type-limits -mvsx -mcpu=power10 \
+			   -mfloat128-hardware -mno-gnu-attribute \
+			   -I$(srcdir)/soft-fp \
+			   -I$(srcdir)/config/rs6000 \
+			   $(FLOAT128_HW_INSNS)
+
 $(fp128_hw_obj)		 : INTERNAL_CFLAGS += $(FP128_CFLAGS_HW)
 $(fp128_hw_obj)		 : $(srcdir)/config/rs6000/t-float128-hw
 
+$(fp128_3_1_hw_obj)	 : INTERNAL_CFLAGS += $(FP128_3_1_CFLAGS_HW)
+$(fp128_3_1_hw_obj)	 : $(srcdir)/config/rs6000/t-float128-p10-hw
+
 $(fp128_ifunc_obj)	 : INTERNAL_CFLAGS += $(FP128_CFLAGS_SW)
 $(fp128_ifunc_obj)	 : $(srcdir)/config/rs6000/t-float128-hw
 
diff --git a/libgcc/config/rs6000/t-float128-p10-hw b/libgcc/config/rs6000/t-float128-p10-hw
new file mode 100644
index 00000000000..de36227c3d1
--- /dev/null
+++ b/libgcc/config/rs6000/t-float128-p10-hw
@@ -0,0 +1,24 @@
+# Support for adding __float128 hardware support to the powerpc.
+# Tell the float128 functions that the ISA 3.1 hardware support can
+# be compiled it to be selected via IFUNC functions.
+
+FLOAT128_HW_INSNS	= -DFLOAT128_HW_INSNS
+
+# New functions for hardware support
+
+fp128_3_1_hw_funcs	= float128-p10
+fp128_3_1_hw_src	= $(srcdir)/config/rs6000/float128-p10.c
+fp128_3_1_hw_static_obj	= $(addsuffix $(objext),$(fp128_3_1_hw_funcs))
+fp128_3_1_hw_shared_obj	= $(addsuffix _s$(objext),$(fp128_3_1_hw_funcs))
+fp128_3_1_hw_obj	= $(fp128_3_1_hw_static_obj) $(fp128_3_1_hw_shared_obj)
+
+# Build the hardware support functions with appropriate hardware support
+FP128_3_1_CFLAGS_HW	 = -Wno-type-limits -mvsx -mfloat128 \
+			   -mpower10 \
+			   -mfloat128-hardware -mno-gnu-attribute \
+			   -I$(srcdir)/soft-fp \
+			   -I$(srcdir)/config/rs6000 \
+			   $(FLOAT128_HW_INSNS)
+
+$(fp128_3_1_hw_obj)		 : INTERNAL_CFLAGS += $(FP128_3_1_CFLAGS_HW)
+$(fp128_3_1_hw_obj)		 : $(srcdir)/config/rs6000/t-float128-p10-hw
diff --git a/libgcc/configure b/libgcc/configure
index 78fc22a5784..db00c9ac7ca 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -4913,7 +4913,7 @@ case "$host" in
     case "$enable_cet" in
       auto)
 	# Check if target supports multi-byte NOPs
-	# and if assembler supports CET insn.
+	# and if compiler and assembler support CET insn.
 	cet_save_CFLAGS="$CFLAGS"
 	CFLAGS="$CFLAGS -fcf-protection"
 	cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -5261,6 +5261,43 @@ fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_powerpc_float128_hw" >&5
+$as_echo "$libgcc_cv_powerpc_float128_hw" >&6; }
+  CFLAGS="$saved_CFLAGS"
+
+  saved_CFLAGS="$CFLAGS"
+  CFLAGS="$CFLAGS -mpower10 -mfloat128-hardware"
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PowerPC ISA 3.1 to build hardware __float128 libraries" >&5
+$as_echo_n "checking for PowerPC ISA 3.1 to build hardware __float128 libraries... " >&6; }
+if ${libgcc_cv_powerpc_float128_hw+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <sys/auxv.h>
+     #ifndef AT_PLATFORM
+     #error "AT_PLATFORM is not defined"
+     #endif
+     #ifndef __BUILTIN_CPU_SUPPORTS__
+     #error "__builtin_cpu_supports is not available"
+     #endif
+     vector unsigned char add (vector unsigned char a, vector unsigned char b)
+     {
+       vector unsigned char ret;
+       __asm__ ("xscvsqqp %0,%1,%2" : "=v" (ret) : "v" (a), "v" (b));
+       return ret;
+     }
+     void *add_resolver (void) { return (void *) add; }
+     __float128 add_ifunc (__float128, __float128)
+	__attribute__ ((__ifunc__ ("add_resolver")));
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  libgcc_cv_powerpc_3_1_float128_hw=yes
+else
+  libgcc_cv_powerpc_3_1_float128_hw=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_powerpc_float128_hw" >&5
 $as_echo "$libgcc_cv_powerpc_float128_hw" >&6; }
   CFLAGS="$saved_CFLAGS"
 esac
diff --git a/libgcc/configure.ac b/libgcc/configure.ac
index ed50c0e9b49..b0088a16372 100644
--- a/libgcc/configure.ac
+++ b/libgcc/configure.ac
@@ -458,6 +458,31 @@ powerpc*-*-linux*)
     [libgcc_cv_powerpc_float128_hw=yes],
     [libgcc_cv_powerpc_float128_hw=no])])
   CFLAGS="$saved_CFLAGS"
+
+  saved_CFLAGS="$CFLAGS"
+  CFLAGS="$CFLAGS -mpower10 -mfloat128-hardware"
+  AC_CACHE_CHECK([for PowerPC ISA 3.1 to build hardware __float128 libraries],
+		 [libgcc_cv_powerpc_float128_hw],
+		 [AC_COMPILE_IFELSE(
+    [AC_LANG_SOURCE([#include <sys/auxv.h>
+     #ifndef AT_PLATFORM
+     #error "AT_PLATFORM is not defined"
+     #endif
+     #ifndef __BUILTIN_CPU_SUPPORTS__
+     #error "__builtin_cpu_supports is not available"
+     #endif
+     vector unsigned char add (vector unsigned char a, vector unsigned char b)
+     {
+       vector unsigned char ret;
+       __asm__ ("xscvsqqp %0,%1,%2" : "=v" (ret) : "v" (a), "v" (b));
+       return ret;
+     }
+     void *add_resolver (void) { return (void *) add; }
+     __float128 add_ifunc (__float128, __float128)
+	__attribute__ ((__ifunc__ ("add_resolver")));])],
+    [libgcc_cv_powerpc_3_1_float128_hw=yes],
+    [libgcc_cv_powerpc_3_1_float128_hw=no])])
+  CFLAGS="$saved_CFLAGS"
 esac
 
 # Collect host-machine-specific information.
-- 
2.27.0



^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2021-01-19 22:38 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-21 21:17 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
2020-09-21 23:56 ` [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
2020-09-24 18:20   ` will schmidt
2020-10-05 18:51     ` Carl Love
2020-10-07 21:08       ` will schmidt
2020-10-12 20:15         ` Carl Love
2020-10-12 20:51           ` Segher Boessenkool
2020-10-12 20:43         ` Segher Boessenkool
2020-10-12 21:06           ` Carl Love
2020-09-21 23:56 ` [PATCH 2/5] RS6000 add 128-bit Integer Operations Carl Love
2020-09-24 18:21   ` will schmidt
2020-10-05 18:52     ` [PATCH 2a/5] rs6000, vec_rlnm builtin fix arguments Carl Love
2020-10-07 21:08       ` will schmidt
2020-10-12 20:15         ` Carl Love
2020-10-12 23:37           ` Segher Boessenkool
2020-10-05 18:52     ` [PATCH 2b/5] RS6000 add 128-bit Integer Operations Carl Love
2020-10-07 21:53       ` will schmidt
2020-10-12 20:16         ` Carl Love
2020-10-13  0:23         ` Segher Boessenkool
2021-01-19 22:33           ` [PATCH 0/6 ver3] " Carl Love
2021-01-19 22:33           ` [PATCH 1/6 ver 3] rs6000, Fix arguments in altivec_vrlwmi and altivec_rlwdi builtins Carl Love
2021-01-19 22:33           ` [PATCH 2/6 ver 3] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
2021-01-19 22:33           ` [PATCH 3/6 ver 3] RS6000 add 128-bit Integer Operations part 1 Carl Love
2021-01-19 22:33           ` [PATCH 4/6 ver 3] Add TI to TD (128-bit DFP) and TD to TI support Carl Love
2021-01-19 22:33           ` [PATCH 5/6 ver 3] rs6000, Add test 128-bit shifts for just the int128 type Carl Love
2021-01-19 22:33           ` [PATCH 6/6 ver 3] Conversions between 128-bit integer and floating point values Carl Love
2020-09-21 23:56 ` [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support Carl Love
2020-09-24 18:21   ` will schmidt
2020-10-05 18:52     ` Carl Love
2020-10-08 14:22       ` will schmidt
2020-10-12 20:15         ` Carl Love
2020-09-21 23:56 ` [PATCH 4/5] Test 128-bit shifts for just the int128 type Carl Love
2020-09-24 18:21   ` will schmidt
2020-10-05 18:52     ` Carl Love
2020-10-08 14:50       ` will schmidt
2020-10-12 20:15         ` Carl Love
2020-09-21 23:57 ` [PATCH 5/5] Conversions between 128-bit integer and floating point values Carl Love
2020-09-24 18:22   ` will schmidt
2020-10-05 18:52     ` Carl Love
2020-10-08 15:35       ` will schmidt
2020-10-12 20:16         ` Carl Love

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).