public inbox for gcc-patches@gcc.gnu.org
* [Patch 0/5] rs6000, 128-bit Binary Integer Operations
@ 2020-08-11 19:01 Carl Love
  2020-08-11 19:22 ` [Patch 1/5] rs6000, Add 128-bit sign extension support Carl Love
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Carl Love @ 2020-08-11 19:01 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt

Segher:

The following is a five-patch series for the 128-bit Binary Integer
Operations (RFC 2608).

The last patch adds the conversions from 128-bit integer to 128-bit
floating point and back.  The patch has been reviewed by Michael Meissner
to make sure the 128-bit floating-point mode handling is correct.

The patches have been tested on Power 8 and Power 9 to ensure there are
no regressions.  The new tests have been manually compiled and
run on mambo to ensure they work correctly.

Please review the patches and let me know if they are acceptable for
mainline.  Thanks.

                       Carl Love


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Patch 1/5] rs6000, Add 128-bit sign extension support
  2020-08-11 19:01 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
@ 2020-08-11 19:22 ` Carl Love
  2020-08-13 17:36   ` Segher Boessenkool
  2020-08-11 19:22 ` [Patch 2/5] rs6000, 128-bit multiply, divide, modulo, shift, compare Carl Love
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Carl Love @ 2020-08-11 19:22 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt; +Cc: Bill Schmidt, cel

Segher, Will:

Patch 1 adds the sign extension instruction support and the corresponding
builtins.
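To make the element selection concrete, here is a rough scalar model of
the new builtins.  It assumes little-endian element numbering, which is
the numbering the expected values in p9-sign_extend-runnable.c follow:
each result element is produced from the narrow input element that sits
in its least significant part.  The model_* functions are hypothetical
illustrations, not the builtin implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar models of vec_signexti / vec_signextll, assuming
   little-endian element numbering.  Assigning a narrow signed value to a
   wider signed type sign-extends it.  */

/* vector signed char -> vector signed int: uses bytes 0, 4, 8, 12.  */
static void model_signexti_qi (const int8_t in[16], int32_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = in[4 * i];
}

/* vector signed short -> vector signed int: uses halfwords 0, 2, 4, 6.  */
static void model_signexti_hi (const int16_t in[8], int32_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = in[2 * i];
}

/* vector signed char -> vector signed long long: uses bytes 0 and 8.  */
static void model_signextll_qi (const int8_t in[16], int64_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = in[8 * i];
}

/* vector signed short -> vector signed long long: uses halfwords 0 and 4.  */
static void model_signextll_hi (const int16_t in[8], int64_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = in[4 * i];
}

/* vector signed int -> vector signed long long: uses words 0 and 2.  */
static void model_signextll_si (const int32_t in[4], int64_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = in[2 * i];
}
```

Fed the byte vector {1, ..., 8, -1, ..., -8} from the test case, the
word model gives {1, 5, -1, -5}, matching vec_expected_wi.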

             Carl Love

---------------------------------------------------------------------
RS6000 Add 128-bit sign extension support

gcc/ChangeLog

2020-08-10  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
	for new builtins.
	* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
	overloaded builtin definitions.
	(VSIGNEXTSB2W, VSIGNEXTSH2W, VSIGNEXTSB2D, VSIGNEXTSH2D,
	VSIGNEXTSW2D): Add builtin expansions.
	* config/rs6000/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
	P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
	* config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
	visible.
	* doc/extend.texi:  Add documentation for the vec_signexti and
	vec_signextll builtins.

gcc/testsuite/ChangeLog

2020-08-10  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
---
 gcc/config/rs6000/altivec.h                   |   3 +
 gcc/config/rs6000/rs6000-builtin.def          |   9 ++
 gcc/config/rs6000/rs6000-call.c               |  13 ++
 gcc/config/rs6000/vsx.md                      |   2 +-
 gcc/doc/extend.texi                           |  15 ++
 .../powerpc/p9-sign_extend-runnable.c         | 128 ++++++++++++++++++
 6 files changed, 169 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index bf2240f16a2..09320df14ca 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -498,6 +498,9 @@
 
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
+#define vec_signexti  __builtin_vec_vsignexti
+#define vec_signextll __builtin_vec_vsignextll
+
 #endif
 
 /* Predicates.
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index f9f0fece549..667c2450d41 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2691,6 +2691,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,	"vprtybd")
 BU_P9V_OVERLOAD_1 (VPRTYBQ,	"vprtybq")
 BU_P9V_OVERLOAD_1 (VPRTYBW,	"vprtybw")
 BU_P9V_OVERLOAD_1 (VPARITY_LSBB,	"vparity_lsbb")
+BU_P9V_OVERLOAD_1 (VSIGNEXTI,	"vsignexti")
+BU_P9V_OVERLOAD_1 (VSIGNEXTLL,	"vsignextll")
 
 /* 2 argument functions added in ISA 3.0 (power9).  */
 BU_P9_2 (CMPRB,	"byte_in_range",	CONST,	cmprb)
@@ -2702,6 +2704,13 @@ BU_P9_OVERLOAD_2 (CMPRB,	"byte_in_range")
 BU_P9_OVERLOAD_2 (CMPRB2,	"byte_in_either_range")
 BU_P9_OVERLOAD_2 (CMPEQB,	"byte_in_set")
 \f
+/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1.  */
+BU_P9V_AV_1 (VSIGNEXTSB2W,	"vsignextsb2w",		CONST,  vsx_sign_extend_qi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSH2W,	"vsignextsh2w",		CONST,  vsx_sign_extend_hi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSB2D,	"vsignextsb2d",		CONST,  vsx_sign_extend_qi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
+
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
 BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 189497efb45..87699be8a07 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5527,6 +5527,19 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI },
 
+  /* Sign extend builtins that work on ISA 3.0, but not added until ISA 3.1.  */
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+
   /* Overloaded built-in functions for ISA3.1 (power10). */
   { P10_BUILTIN_VEC_CLRL, P10_BUILTIN_VCLRLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index dd750210758..1153a01b4ef 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4787,7 +4787,7 @@
   "vextsh2<wd> %0,%1"
   [(set_attr "type" "vecexts")])
 
-(define_insn "*vsx_sign_extend_si_v2di"
+(define_insn "vsx_sign_extend_si_v2di"
   [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
 	(unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
 		     UNSPEC_VSX_SIGN_EXTEND))]
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 79833171c5a..cb501ab2d75 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20754,6 +20754,21 @@ void vec_xst (vector unsigned char, int, vector unsigned char *);
 void vec_xst (vector unsigned char, int, unsigned char *);
 @end smallexample
 
+The following sign extension builtins are provided.
+
+@smallexample
+vector signed int vec_signexti (vector signed char a)
+vector signed long long vec_signextll (vector signed char a)
+vector signed int vec_signexti (vector signed short a)
+vector signed long long vec_signextll (vector signed short a)
+vector signed long long vec_signextll (vector signed int a)
+@end smallexample
+
+Each element of the result is produced by sign-extending the element of the
+input vector that would fall in the least significant portion of the result
+element. For example, a sign-extension of a vector signed char to a vector
+signed long long will sign extend the rightmost byte of each doubleword.
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 3.1
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1
 
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
new file mode 100644
index 00000000000..7bf979c6fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
@@ -0,0 +1,128 @@
+/* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
+
+/* These builtins were not defined until ISA 3.1 but only require ISA 3.0
+   support.  */
+
+/* { dg-final { scan-assembler-times {\mvextsb2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsw2d\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector signed char vec_arg_qi, vec_result_qi;
+  vector signed short int vec_arg_hi, vec_result_hi, vec_expected_hi;
+  vector signed int vec_arg_wi, vec_result_wi, vec_expected_wi;
+  vector signed long long vec_result_di, vec_expected_di;
+
+  /* test sign extend byte to word */
+  vec_arg_qi = (vector signed char) {1, 2, 3, 4, 5, 6, 7, 8,
+				     -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_wi = (vector signed int) {1, 5, -1, -5};
+
+  vec_result_wi = vec_signexti (vec_arg_qi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(char, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n",
+	     i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend byte to doubleword */
+  vec_arg_qi = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
+				    -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_qi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(byte, long long int):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to word */
+  vec_arg_hi = (vector signed short int){1, 2, 3, 4, -1, -2, -3, -4};
+  vec_expected_wi = (vector signed int){1, 3, -1, -3};
+
+  vec_result_wi = vec_signexti(vec_arg_hi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(short, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n", i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to double word */
+  vec_arg_hi = (vector signed short int ){1, 3, 5, 7,  -1, -3, -5, -7};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_hi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(short, long long int):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend word to double word */
+  vec_arg_wi = (vector signed int ){1, 3, -1, -3};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_wi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(word, long long int):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  return 0;
+}
-- 
2.25.1




* [Patch 2/5] rs6000, 128-bit multiply, divide, modulo, shift, compare
  2020-08-11 19:01 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
  2020-08-11 19:22 ` [Patch 1/5] rs6000, Add 128-bit sign extension support Carl Love
@ 2020-08-11 19:22 ` Carl Love
  2020-08-13 23:46   ` will schmidt
  2020-08-11 19:22 ` [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support Carl Love
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Carl Love @ 2020-08-11 19:22 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt; +Cc: Bill Schmidt, cel

Segher, Will:

Patch 2 adds support for multiply, divide, modulo, shift, rotate, and
compare of 128-bit integers.  The patch adds both the instruction and
the builtin support.
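As a rough illustration of the intended semantics, the sketch below models
a few of the new operations with GCC's 128-bit integer extension.  It
assumes GCC element numbering for the even/odd multiplies (element 0 is
"even", which matches the vec_widen_*_even_v2di expanders emitting
vmuloud on little-endian), and it takes the quadword shift/rotate count
from bits 57:63 (IBM numbering) of the second operand, as the comments in
altivec.md describe.  It models the instructions, not necessarily the
builtin interfaces layered on top of them; the model_* and
quad_shift_count names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;
typedef __int128 s128;

/* Even/odd doubleword multiply producing a 128-bit product, using GCC
   element numbering (element 0 is even, element 1 is odd).  */
static u128 model_mule_u64 (const uint64_t a[2], const uint64_t b[2])
{
  return (u128) a[0] * b[0];
}

static u128 model_mulo_u64 (const uint64_t a[2], const uint64_t b[2])
{
  return (u128) a[1] * b[1];
}

static s128 model_mule_s64 (const int64_t a[2], const int64_t b[2])
{
  return (s128) a[0] * b[0];
}

/* Bits 57:63 in IBM numbering (bit 0 = MSB) of a 128-bit register are
   bits 64..70 counting up from the least significant bit.  */
static unsigned quad_shift_count (u128 b)
{
  return (unsigned) (b >> 64) & 0x7F;
}

static u128 model_vslq (u128 a, u128 b) { return a << quad_shift_count (b); }
static u128 model_vsrq (u128 a, u128 b) { return a >> quad_shift_count (b); }

/* GCC defines >> on a negative signed integer as an arithmetic shift.  */
static s128 model_vsraq (s128 a, u128 b) { return a >> quad_shift_count (b); }

static u128 model_vrlq (u128 a, u128 b)
{
  unsigned sh = quad_shift_count (b);
  /* Avoid the undefined shift by 128 when the count is zero.  */
  return sh == 0 ? a : (a << sh) | (a >> (128 - sh));
}
```

For example, a count of 4 placed in bits 64..70 shifts left by 4, while
the same value placed in the low doubleword gives a count of 0.  This is
why the vrlqmi/vrlqnm expanders above swap doublewords before emitting
the instruction.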

             Carl Love


-------------------------------------------------------
rs6000, 128-bit multiply, divide, modulo, shift, compare

gcc/ChangeLog

2020-08-10  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
	for new builtins.
	* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
	UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
	(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
	altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
	altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
	altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
	altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
	define_insn.
	(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
	vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
	altivec_vrlqnm): New define_expands.
	* config/rs6000/rs6000-builtin.def (BU_P10_P, BU_P10_128BIT_1,
	BU_P10_128BIT_2, BU_P10_128BIT_3): New macro definitions.
	(VCMPEQUT_P, VCMPGTST_P, VCMPGTUT_P): Add macro expansions.
	(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
	CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
	VCMPAET_P): New macro expansions.
	(VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ, VSLQ,
	VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
	MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
	(VRLQ, VSLQ, VSRQ, VSRAQ, SIGNEXT): New overload expansions.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
	P10_BUILTIN_CMPGE_1TI,
	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
	P10_BUILTIN_128BIT_DIV_V1TI, P10_BUILTIN_128BIT_UDIV_V1TI,
	P10_BUILTIN_128BIT_VMULESD, P10_BUILTIN_128BIT_VMULEUD,
	P10_BUILTIN_128BIT_VMULOSD, P10_BUILTIN_128BIT_VMULOUD,
	P10_BUILTIN_VNOR_V1TI, P10_BUILTIN_VNOR_V1TI_UNS,
	P10_BUILTIN_128BIT_VRLQ, P10_BUILTIN_128BIT_VRLQMI,
	P10_BUILTIN_128BIT_VRLQNM, P10_BUILTIN_128BIT_VSLQ,
	P10_BUILTIN_128BIT_VSRQ, P10_BUILTIN_128BIT_VSRAQ,
	P10_BUILTIN_VCMPGTUT_P, P10_BUILTIN_VCMPGTST_P,
	P10_BUILTIN_VCMPEQUT_P, P10_BUILTIN_CMPNET,
	P10_BUILTIN_VCMPNET_P, P10_BUILTIN_VCMPAET_P,
	P10_BUILTIN_128BIT_VSIGNEXTSD2Q, P10_BUILTIN_128BIT_DIVES_V1TI,
	P10_BUILTIN_128BIT_MODS_V1TI, P10_BUILTIN_128BIT_MODU_V1TI):
	New overloaded definitions.
	(int_ftype_int_v1ti_v1ti) [P10_BUILTIN_VCMPEQUT,
	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
	P10_BUILTIN_CMPLE_U1TI, E_V1TImode]: New case statements.
	(int_ftype_int_v1ti_v1ti) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
	New assignments.
	(int_ftype_int_v1ti_v1ti)[P10_BUILTIN_128BIT_VMULEUD,
	P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
	P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.
	* config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): New
	TARGET_TI_VECTOR_OPS definition.
	(rs6000_option_override_internal): Add if TARGET_POWER10 statement.
	(rs6000_handle_altivec_attribute)[ E_TImode, E_V1TImode]: New case
	statements.
	(rs6000_opt_masks): Add ti-vector-ops entry.
	* config/rs6000/rs6000.h (MASK_TI_VECTOR_OPS, RS6000_BTM_P10_128BIT,
	RS6000_BTM_TI_VECTOR_OPS, bool_V1TI_type_node): New defines.
	(rs6000_builtin_type_index): New enum value RS6000_BTI_bool_V1TI.
	* config/rs6000/rs6000.opt: New mti-vector-ops entry.
	* config/rs6000/vector.md (vector_eqv1ti, vector_gtv1ti,
	vector_nltv1ti, vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti,
	vector_ngtuv1ti, vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
	vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
	vlshrv1ti3, vashrv1ti3): New define_expands.
	* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
	UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
	UNSPEC_VSX_MODUQ, UNSPEC_XXSWAPD_V1TI): New unspecs.
	(vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti, vsx_diveu_v1ti,
	vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti, vsx_sign_extend_v2di_v1ti):
	New define_insns.
	(vcmpnet): New define_expand.
	* doc/extend.texi: Add documentation for the new builtins vec_rl,
	vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
	vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,
	vec_cmpge, vec_cmple, vec_all_eq, vec_all_ne, vec_all_gt, vec_all_lt,
	vec_all_ge, vec_all_le, vec_any_eq, vec_any_ne, vec_any_gt, vec_any_lt,
	vec_any_ge, vec_any_le.

gcc/testsuite/ChangeLog

2020-08-10  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |    6 +-
 gcc/config/rs6000/altivec.md                  |  242 +-
 gcc/config/rs6000/rs6000-builtin.def          |   77 +
 gcc/config/rs6000/rs6000-call.c               |  150 +-
 gcc/config/rs6000/rs6000.c                    |   17 +-
 gcc/config/rs6000/rs6000.h                    |    6 +-
 gcc/config/rs6000/rs6000.opt                  |    4 +
 gcc/config/rs6000/vector.md                   |  199 ++
 gcc/config/rs6000/vsx.md                      |   99 +-
 gcc/doc/extend.texi                           |  174 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 2254 +++++++++++++++++
 11 files changed, 3217 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 09320df14ca..a121004b3af 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -183,7 +183,7 @@
 #define vec_recipdiv __builtin_vec_recipdiv
 #define vec_rlmi __builtin_vec_rlmi
 #define vec_vrlnm __builtin_vec_rlnm
-#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
+#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
 #define vec_rsqrt __builtin_vec_rsqrt
 #define vec_rsqrte __builtin_vec_rsqrte
 #define vec_signed __builtin_vec_vsigned
@@ -694,6 +694,10 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
 
 #ifdef _ARCH_PWR10
+#define vec_signextq  __builtin_vec_vsignextq
+#define vec_dive __builtin_vec_dive
+#define vec_mod  __builtin_vec_mod
+
 /* May modify these macro definitions if future capabilities overload
    with support for different vector argument and result types.  */
 #define vec_cntlzm(a, b)	__builtin_altivec_vclzdm (a, b)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 0a2e634d6b0..2763d920828 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -39,12 +39,16 @@
    UNSPEC_VMULESH
    UNSPEC_VMULEUW
    UNSPEC_VMULESW
+   UNSPEC_VMULEUD
+   UNSPEC_VMULESD
    UNSPEC_VMULOUB
    UNSPEC_VMULOSB
    UNSPEC_VMULOUH
    UNSPEC_VMULOSH
    UNSPEC_VMULOUW
    UNSPEC_VMULOSW
+   UNSPEC_VMULOUD
+   UNSPEC_VMULOSD
    UNSPEC_VPKPX
    UNSPEC_VPACK_SIGN_SIGN_SAT
    UNSPEC_VPACK_SIGN_UNS_SAT
@@ -628,6 +632,14 @@
   "vcmpequ<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_eqv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+  "vcmpequq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gt<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gt:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -636,6 +648,14 @@
   "vcmpgts<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gt:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+  "vcmpgtsq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gtu<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gtu:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -644,6 +664,14 @@
   "vcmpgtu<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		  (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+  "vcmpgtuq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_eqv4sf"
   [(set (match_operand:V4SF 0 "altivec_register_operand" "=v")
 	(eq:V4SF (match_operand:V4SF 1 "altivec_register_operand" "v")
@@ -1687,6 +1715,19 @@
  DONE;
 })
 
+(define_expand "vec_widen_umult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_smult_even_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1695,11 +1736,24 @@
 {
   if (BYTES_BIG_ENDIAN)
     emit_insn (gen_altivec_vmulesw (operands[0], operands[1], operands[2]));
- else
+  else
     emit_insn (gen_altivec_vmulosw (operands[0], operands[1], operands[2]));
   DONE;
 })
 
+(define_expand "vec_widen_smult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_umult_odd_v16qi"
   [(use (match_operand:V8HI 0 "register_operand"))
    (use (match_operand:V16QI 1 "register_operand"))
@@ -1765,6 +1819,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_umult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_smult_odd_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1778,6 +1845,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "altivec_vmuleub"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
@@ -1859,6 +1939,15 @@
   "vmuleuw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuleud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULEUD))]
+  "TARGET_TI_VECTOR_OPS"
+  "vmuleud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulouw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1868,6 +1957,15 @@
   "vmulouw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuloud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOUD))]
+  "TARGET_TI_VECTOR_OPS"
+  "vmuloud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulesw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1877,6 +1975,15 @@
   "vmulesw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulesd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULESD))]
+  "TARGET_TI_VECTOR_OPS"
+  "vmulesd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulosw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1886,6 +1993,15 @@
   "vmulosw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulosd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOSD))]
+  "TARGET_TI_VECTOR_OPS"
+  "vmulosd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 ;; Vector pack/unpack
 (define_insn "altivec_vpkpx"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
@@ -1979,6 +2095,15 @@
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vrlq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+;; The rotate amount needs to be in bits[57:63] of operand2.
+  "vrlq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
@@ -1989,6 +2114,33 @@
   "vrl<VI_char>mi %0,%2,%3"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqmi"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")
+		      (match_operand:V1TI 3 "vsx_register_operand")]
+		     UNSPEC_VRLMI))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  /* Mask bit begin, end fields need to be in bits [41:55] of 128-bit operand2. */
+  /* The shift amount needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[3]));
+  emit_insn (gen_altivec_vrlqmi_inst (operands[0], operands[1], operands[2], tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqmi_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "0")
+		      (match_operand:V1TI 3 "vsx_register_operand" "v")]
+		     UNSPEC_VRLMI))]
+  "TARGET_TI_VECTOR_OPS"
+  "vrlqmi %0,%1,%3"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vrl<VI_char>nm"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
@@ -1998,6 +2150,31 @@
   "vrl<VI_char>nm %0,%1,%2"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqnm"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")]
+		     UNSPEC_VRLNM))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  /* The shift amount needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vrlqnm_inst (operands[0], operands[1], tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqnm_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+		     UNSPEC_VRLNM))]
+  "TARGET_TI_VECTOR_OPS"
+  ;; rotate and mask bits need to be in upper 64-bits of operand2.
+  "vrlqnm %0,%1,%2"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vsl"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2042,6 +2219,15 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vslq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+  /* The shift amount needs to be in bits[57:63] of 128-bit operand. */
+  "vslq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsr<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (lshiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2050,6 +2236,15 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsrq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+  /* The shift amount needs to be in bits[57:63] of 128-bit operand. */
+  "vsrq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsra<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (ashiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2058,6 +2253,15 @@
   "vsra<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsraq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+  /* The shift amount needs to be in bits[57:63] of 128-bit operand. */
+  "vsraq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vsr"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2618,6 +2822,18 @@
   "vcmpequ<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_vcmpequt_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v")
+			   (match_operand:V1TI 2 "altivec_register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_TI_VECTOR_OPS"
+  "vcmpequq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgts<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2630,6 +2846,18 @@
   "vcmpgts<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtst_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gt:CC (match_operand:V1TI 1 "register_operand" "v")
+			   (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gt:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_TI_VECTOR_OPS"
+  "vcmpgtsq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgtu<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gtu:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2642,6 +2870,18 @@
   "vcmpgtu<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtut_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gtu:CC (match_operand:V1TI 1 "register_operand" "v")
+			    (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gtu:V1TI (match_dup 1)
+		  (match_dup 2)))]
+  "TARGET_TI_VECTOR_OPS"
+  "vcmpgtuq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpeqfp_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(eq:CC (match_operand:V4SF 1 "register_operand" "v")
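For review purposes, here is a portable C sketch of what the three new 128-bit shift patterns (vslq, vsrq, vsraq) are meant to compute. It is a reference model, not the GCC implementation, and it rests on two assumptions: the host compiler supports `__int128` (GCC and Clang do on 64-bit targets), and the shift count comes from bits [57:63] of the second operand, as the pattern comments state. Power ISA numbers bits from the most significant end, so those bits are the low 7 bits of the high-order doubleword. All `model_*` names are hypothetical.

```c
#include <assert.h>

typedef unsigned __int128 u128;
typedef __int128 s128;

/* Hypothetical helper: extract the 7-bit shift count from big-endian
   bits [57:63] of B, i.e. bits 70..64 counting from the LSB.  */
unsigned
model_shift_count (u128 b)
{
  return (unsigned) ((b >> 64) & 0x7f);
}

/* vslq: logical shift left by 0..127 bits.  */
u128
model_vslq (u128 a, u128 b)
{
  return a << model_shift_count (b);
}

/* vsrq: logical shift right by 0..127 bits.  */
u128
model_vsrq (u128 a, u128 b)
{
  return a >> model_shift_count (b);
}

/* vsraq: arithmetic shift right.  Right-shifting a negative signed value
   is implementation-defined in C; GCC and Clang sign-fill it.  */
s128
model_vsraq (s128 a, u128 b)
{
  return a >> model_shift_count (b);
}

/* Worked examples; every assert holds under the model above.  */
void
model_shift_examples (void)
{
  u128 one = 1;
  assert (model_vslq (one, (u128) 5 << 64) == 32);
  assert (model_vslq (one, 5) == one);               /* low dword ignored */
  assert (model_vslq (one, (u128) 129 << 64) == 2);  /* only 7 bits used */
  assert (model_vsrq (one << 127, (u128) 127 << 64) == 1);
  assert (model_vsraq (-128, (u128) 3 << 64) == -16);
}
```

Under this model a shift count placed in the low-order doubleword has no effect, which is the detail the comments in the patterns are pointing out.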
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 667c2450d41..871da6c4cf7 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1070,6 +1070,15 @@
 		     | RS6000_BTC_UNARY),				\
 		    CODE_FOR_ ## ICODE)			/* ICODE */
 
+
+#define BU_P10_P(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_P (P10_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P10_128BIT,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_PREDICATE),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
 #define BU_P10_OVERLOAD_1(ENUM, NAME)					\
   RS6000_BUILTIN_1 (P10_BUILTIN_VEC_ ## ENUM,		/* ENUM */	\
 		    "__builtin_vec_" NAME,		/* NAME */	\
@@ -1152,6 +1161,30 @@
 		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
 		     | RS6000_BTC_BINARY),				\
 		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_P10_128BIT_1(ENUM, NAME, ATTR, ICODE)			\
+  RS6000_BUILTIN_1 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P10_128BIT,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_UNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_P10_128BIT_2(ENUM, NAME, ATTR, ICODE)			\
+  RS6000_BUILTIN_2 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P10_128BIT,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_BINARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_P10_128BIT_3(ENUM, NAME, ATTR, ICODE)			\
+  RS6000_BUILTIN_3 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P10_128BIT,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_TERNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
 #endif
 
 \f
@@ -2712,6 +2745,10 @@ BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
 BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
 
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
+BU_P10_P (VCMPEQUT_P,		"vcmpequt_p",	CONST,	vector_eq_v1ti_p)
+BU_P10_P (VCMPGTST_P,		"vcmpgtst_p",	CONST,	vector_gt_v1ti_p)
+BU_P10_P (VCMPGTUT_P,		"vcmpgtut_p",	CONST,	vector_gtu_v1ti_p)
+
 BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
 BU_P10_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
@@ -2733,6 +2770,39 @@ BU_P10V_2 (XXGENPCVM_V8HI, "xxgenpcvm_v8hi", CONST, xxgenpcvm_v8hi)
 BU_P10V_2 (XXGENPCVM_V4SI, "xxgenpcvm_v4si", CONST, xxgenpcvm_v4si)
 BU_P10V_2 (XXGENPCVM_V2DI, "xxgenpcvm_v2di", CONST, xxgenpcvm_v2di)
 
+BU_P10V_2 (VCMPGTUT,		"vcmpgtut",	CONST,	vector_gtuv1ti)
+BU_P10V_2 (VCMPGTST,		"vcmpgtst",	CONST,	vector_gtv1ti)
+BU_P10V_2 (VCMPEQUT,		"vcmpequt",	CONST,	vector_eqv1ti)
+BU_P10V_2 (CMPNET,		"vcmpnet",	CONST,	vcmpnet)
+BU_P10V_2 (CMPGE_1TI,		"cmpge_1ti",    CONST,  vector_nltv1ti)
+BU_P10V_2 (CMPGE_U1TI,		"cmpge_u1ti",   CONST,  vector_nltuv1ti)
+BU_P10V_2 (CMPLE_1TI,		"cmple_1ti",    CONST,  vector_ngtv1ti)
+BU_P10V_2 (CMPLE_U1TI,		"cmple_u1ti",   CONST,  vector_ngtuv1ti)
+BU_P10V_2 (VNOR_V1TI_UNS,	"vnor_v1ti_uns", CONST,	norv1ti3)
+BU_P10V_2 (VNOR_V1TI,		"vnor_v1ti",	CONST,	norv1ti3)
+BU_P10V_2 (VCMPNET_P,		"vcmpnet_p",	CONST,	vector_ne_v1ti_p)
+BU_P10V_2 (VCMPAET_P,		"vcmpaet_p",	CONST,	vector_ae_v1ti_p)
+
+BU_P10_128BIT_1 (VSIGNEXTSD2Q,	"vsignext",     CONST,  vsx_sign_extend_v2di_v1ti)
+
+BU_P10_128BIT_2 (VMULEUD,	"vmuleud",	CONST,	vec_widen_umult_even_v2di)
+BU_P10_128BIT_2 (VMULESD,	"vmulesd",	CONST,	vec_widen_smult_even_v2di)
+BU_P10_128BIT_2 (VMULOUD,	"vmuloud",	CONST,	vec_widen_umult_odd_v2di)
+BU_P10_128BIT_2 (VMULOSD,	"vmulosd",	CONST,	vec_widen_smult_odd_v2di)
+BU_P10_128BIT_2 (VRLQ,		"vrlq",		CONST,	vrotlv1ti3)
+BU_P10_128BIT_2 (VSLQ,		"vslq",		CONST,	vashlv1ti3)
+BU_P10_128BIT_2 (VSRQ,		"vsrq",		CONST,	vlshrv1ti3)
+BU_P10_128BIT_2 (VSRAQ,		"vsraq",	CONST,	vashrv1ti3)
+BU_P10_128BIT_2 (VRLQNM,	"vrlqnm",	CONST,	altivec_vrlqnm)
+BU_P10_128BIT_2 (DIV_V1TI,	"div_1ti",      CONST,  vsx_div_v1ti)
+BU_P10_128BIT_2 (UDIV_V1TI,	"udiv_1ti",     CONST,  vsx_udiv_v1ti)
+BU_P10_128BIT_2 (DIVES_V1TI,	"dives",	CONST,	vsx_dives_v1ti)
+BU_P10_128BIT_2 (DIVEU_V1TI,	"diveu",	CONST,	vsx_diveu_v1ti)
+BU_P10_128BIT_2 (MODS_V1TI,	"mods",		CONST,	vsx_mods_v1ti)
+BU_P10_128BIT_2 (MODU_V1TI,	"modu",		CONST,	vsx_modu_v1ti)
+
+BU_P10_128BIT_3 (VRLQMI,	"vrlqmi",	CONST,	altivec_vrlqmi)
+
 BU_P10V_3 (VEXTRACTBL, "vextdubvlx", CONST, vextractlv16qi)
 BU_P10V_3 (VEXTRACTHL, "vextduhvlx", CONST, vextractlv8hi)
 BU_P10V_3 (VEXTRACTWL, "vextduwvlx", CONST, vextractlv4si)
@@ -2839,6 +2909,12 @@ BU_P10_OVERLOAD_2 (CLRR, "clrr")
 BU_P10_OVERLOAD_2 (GNB, "gnb")
 BU_P10_OVERLOAD_4 (XXEVAL, "xxeval")
 BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
+BU_P10_OVERLOAD_2 (VRLQ, "vrlq")
+BU_P10_OVERLOAD_2 (VSLQ, "vslq")
+BU_P10_OVERLOAD_2 (VSRQ, "vsrq")
+BU_P10_OVERLOAD_2 (VSRAQ, "vsraq")
+BU_P10_OVERLOAD_2 (DIVE, "dive")
+BU_P10_OVERLOAD_2 (MOD, "mod")
 
 BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
 BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
@@ -2854,6 +2930,7 @@ BU_P10_OVERLOAD_1 (VSTRIL, "stril")
 
 BU_P10_OVERLOAD_1 (VSTRIR_P, "strir_p")
 BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
+BU_P10_OVERLOAD_1 (SIGNEXT, "vsignextq")
 
 BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
 BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
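A similar sketch for the sign-extension and widening-multiply builtins defined above (vsignext, vmuleud, vmulesd, vmuloud, vmulosd). It views a vector register as two doublewords in big-endian element order, so "even" means doubleword 0 (the high-order half) and "odd" means doubleword 1. That vextsd2q sign-extends doubleword 1 is our reading of ISA 3.1 and should be checked against the ISA text. How vec_mule and vec_mulo map onto these instructions on little-endian targets is deliberately not modeled. The `vr_dwords` type and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;
typedef __int128 s128;

/* Hypothetical view of a VR as doublewords in big-endian element order:
   dw[0] is the high-order doubleword, dw[1] the low-order one.  */
typedef struct
{
  uint64_t dw[2];
} vr_dwords;

/* vextsd2q: sign-extend doubleword 1 to a full quadword.  Converting an
   out-of-range uint64_t to int64_t is implementation-defined in C;
   GCC and Clang wrap modulo 2^64.  */
s128
model_vextsd2q (vr_dwords b)
{
  return (s128) (int64_t) b.dw[1];
}

/* vmuleud / vmuloud: unsigned 64 x 64 -> 128-bit product of the even (0)
   or odd (1) doublewords.  */
u128
model_vmuleud (vr_dwords a, vr_dwords b)
{
  return (u128) a.dw[0] * b.dw[0];
}

u128
model_vmuloud (vr_dwords a, vr_dwords b)
{
  return (u128) a.dw[1] * b.dw[1];
}

/* vmulesd / vmulosd: the signed counterparts.  */
s128
model_vmulesd (vr_dwords a, vr_dwords b)
{
  return (s128) (int64_t) a.dw[0] * (int64_t) b.dw[0];
}

s128
model_vmulosd (vr_dwords a, vr_dwords b)
{
  return (s128) (int64_t) a.dw[1] * (int64_t) b.dw[1];
}

/* Worked examples; every assert holds under the model above.  */
void
model_widen_examples (void)
{
  vr_dwords a = { { UINT64_MAX, 3 } };
  vr_dwords b = { { UINT64_MAX, 7 } };
  assert (model_vextsd2q (a) == 3);
  /* (2^64 - 1)^2 = 2^128 - 2^65 + 1, so the high half is 2^64 - 2.  */
  assert (model_vmuleud (a, b) >> 64 == UINT64_MAX - 1);
  assert (model_vmuloud (a, b) == 21);
  assert (model_vmulesd (a, b) == 1);   /* (-1) * (-1) when signed */
  assert (model_vmulosd (a, b) == 21);
}
```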
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 87699be8a07..2bd6412a502 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -839,6 +839,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, VSX_BUILTIN_XVCMPEQDP,
@@ -885,6 +889,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10_BUILTIN_CMPGE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10_BUILTIN_CMPGE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -899,8 +908,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10_BUILTIN_VCMPGTUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTSD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10_BUILTIN_VCMPGTST,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, VSX_BUILTIN_XVCMPGTDP,
@@ -943,6 +956,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_CMPLE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10_BUILTIN_CMPLE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10_BUILTIN_CMPLE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -995,6 +1013,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10_BUILTIN_128BIT_DIV_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10_BUILTIN_128BIT_UDIV_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
@@ -1789,6 +1813,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULE, P8V_BUILTIN_VMULEUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10_BUILTIN_128BIT_VMULESD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10_BUILTIN_128BIT_VMULEUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
   { ALTIVEC_BUILTIN_VEC_VMULEUB, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULESB, ALTIVEC_BUILTIN_VMULESB,
@@ -1812,6 +1842,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULO, P8V_BUILTIN_VMULOUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10_BUILTIN_128BIT_VMULOSD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10_BUILTIN_128BIT_VMULOUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULO, ALTIVEC_BUILTIN_VMULOSH,
     RS6000_BTI_V4SI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULOSH, ALTIVEC_BUILTIN_VMULOSH,
@@ -1860,6 +1895,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
     RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V4SI,
@@ -2115,6 +2160,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_RL, P8V_BUILTIN_VRLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10_BUILTIN_128BIT_VRLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10_BUILTIN_128BIT_VRLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
@@ -2133,12 +2183,23 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_RLMI, P9V_BUILTIN_VRLDMI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { P9V_BUILTIN_VEC_RLMI, P10_BUILTIN_128BIT_VRLQMI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI },
+  { P9V_BUILTIN_VEC_RLMI, P10_BUILTIN_128BIT_VRLQMI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLWNM,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLDNM,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10_BUILTIN_128BIT_VRLQNM,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10_BUILTIN_128BIT_VRLQNM,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
@@ -2155,6 +2216,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, P8V_BUILTIN_VSLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10_BUILTIN_128BIT_VSLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10_BUILTIN_128BIT_VSLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTSP,
@@ -2351,6 +2417,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SR, P8V_BUILTIN_VSRD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10_BUILTIN_128BIT_VSRQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10_BUILTIN_128BIT_VSRQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
@@ -2379,6 +2450,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SRA, P8V_BUILTIN_VSRAD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10_BUILTIN_128BIT_VSRAQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10_BUILTIN_128BIT_VSRAQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
@@ -3996,12 +4072,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, ALTIVEC_BUILTIN_VCMPGTFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, VSX_BUILTIN_XVCMPGTDP_P,
@@ -4066,6 +4146,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, ALTIVEC_BUILTIN_VCMPEQFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, VSX_BUILTIN_XVCMPEQDP_P,
@@ -4117,12 +4201,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, ALTIVEC_BUILTIN_VCMPGEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
@@ -4771,6 +4859,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEW,
     RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
 
   /* The following 2 entries have been deprecated.  */
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEB_P,
@@ -4856,8 +4950,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, 0 },
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
     RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI,
-    RS6000_BTI_unsigned_V2DI, 0
-  },
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
 
   /* The following 2 entries have been deprecated.  */
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
@@ -4871,6 +4967,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
 
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
@@ -4961,8 +5059,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
     RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI,
-    RS6000_BTI_unsigned_V2DI, 0
-  },
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
 
   /* The following 2 entries have been deprecated.  */
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
@@ -4976,7 +5076,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
-
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
@@ -5903,6 +6004,21 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
  { P10_BUILTIN_VEC_XVTLSBB_ONES, P10_BUILTIN_XVTLSBB_ONES,
     RS6000_BTI_INTSI, RS6000_BTI_unsigned_V16QI, 0, 0 },
 
+  { P10_BUILTIN_VEC_SIGNEXT, P10_BUILTIN_128BIT_VSIGNEXTSD2Q,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, 0, 0 },
+
+  { P10_BUILTIN_VEC_DIVE, P10_BUILTIN_128BIT_DIVES_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10_BUILTIN_128BIT_DIVEU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
+  { P10_BUILTIN_VEC_MOD, P10_BUILTIN_128BIT_MODS_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10_BUILTIN_128BIT_MODU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+
   { RS6000_BUILTIN_NONE, RS6000_BUILTIN_NONE, 0, 0, 0, 0 }
 };
 \f
@@ -12228,12 +12344,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPEQUH:
     case ALTIVEC_BUILTIN_VCMPEQUW:
     case P8V_BUILTIN_VCMPEQUD:
+    case P10_BUILTIN_VCMPEQUT:
       fold_compare_helper (gsi, EQ_EXPR, stmt);
       return true;
 
     case P9V_BUILTIN_CMPNEB:
     case P9V_BUILTIN_CMPNEH:
     case P9V_BUILTIN_CMPNEW:
+    case P10_BUILTIN_CMPNET:
       fold_compare_helper (gsi, NE_EXPR, stmt);
       return true;
 
@@ -12245,6 +12363,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_2DI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10_BUILTIN_CMPGE_1TI:
+    case P10_BUILTIN_CMPGE_U1TI:
       fold_compare_helper (gsi, GE_EXPR, stmt);
       return true;
 
@@ -12256,6 +12376,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
     case P8V_BUILTIN_VCMPGTSD:
+    case P10_BUILTIN_VCMPGTUT:
+    case P10_BUILTIN_VCMPGTST:
       fold_compare_helper (gsi, GT_EXPR, stmt);
       return true;
 
@@ -12267,6 +12389,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPLE_U4SI:
     case VSX_BUILTIN_CMPLE_2DI:
     case VSX_BUILTIN_CMPLE_U2DI:
+    case P10_BUILTIN_CMPLE_1TI:
+    case P10_BUILTIN_CMPLE_U1TI:
       fold_compare_helper (gsi, LE_EXPR, stmt);
       return true;
 
@@ -12978,6 +13102,8 @@ rs6000_init_builtins (void)
 					    ? "__vector __bool long"
 					    : "__vector __bool long long",
 					    bool_long_long_type_node, 2);
+  bool_V1TI_type_node = rs6000_vector_type ("__vector __bool __int128",
+					    intTI_type_node, 1);
   pixel_V8HI_type_node = rs6000_vector_type ("__vector __pixel",
 					     pixel_type_node, 8);
 
@@ -13163,6 +13289,10 @@ altivec_init_builtins (void)
     = build_function_type_list (integer_type_node,
 				integer_type_node, V2DI_type_node,
 				V2DI_type_node, NULL_TREE);
+  tree int_ftype_int_v1ti_v1ti
+    = build_function_type_list (integer_type_node,
+				integer_type_node, V1TI_type_node,
+				V1TI_type_node, NULL_TREE);
   tree void_ftype_v4si
     = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
   tree v8hi_ftype_void
@@ -13515,6 +13645,9 @@ altivec_init_builtins (void)
 	case E_VOIDmode:
 	  type = int_ftype_int_opaque_opaque;
 	  break;
+	case E_V1TImode:
+	  type = int_ftype_int_v1ti_v1ti;
+	  break;
 	case E_V2DImode:
 	  type = int_ftype_int_v2di_v2di;
 	  break;
@@ -14114,6 +14247,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10_BUILTIN_XXGENPCVM_V8HI:
     case P10_BUILTIN_XXGENPCVM_V4SI:
     case P10_BUILTIN_XXGENPCVM_V2DI:
+    case P10_BUILTIN_128BIT_VMULEUD:
+    case P10_BUILTIN_128BIT_VMULOUD:
+    case P10_BUILTIN_128BIT_DIVEU_V1TI:
+    case P10_BUILTIN_128BIT_MODU_V1TI:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
@@ -14213,10 +14350,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case VSX_BUILTIN_CMPGE_U8HI:
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10_BUILTIN_CMPGE_U1TI:
     case ALTIVEC_BUILTIN_VCMPGTUB:
     case ALTIVEC_BUILTIN_VCMPGTUH:
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
+    case P10_BUILTIN_VCMPGTUT:
+    case P10_BUILTIN_VCMPEQUT:
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
       break;
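The overload entries above route vec_dive and vec_mod to the new V1TI divide patterns. The sketch below models the unsigned extended divide and both modulo forms in portable C. It assumes, by analogy with the scalar divdeu instruction, that vdiveuq divides the 256-bit value (a << 128) by b and that the result is undefined when the quotient does not fit in 128 bits, that is, unless a < b. It also assumes that vmodsq truncates like the scalar modsd, matching C's `%`. The restoring long division is only a reference model, the signed vdivesq is not modeled, and all names are hypothetical.

```c
#include <assert.h>

typedef unsigned __int128 u128;
typedef __int128 s128;

/* Model of vdiveuq: floor ((A << 128) / B), valid only when A < B.  */
u128
model_vdiveuq (u128 a, u128 b)
{
  assert (a < b);  /* otherwise the quotient needs more than 128 bits */
  u128 r = a, q = 0;
  /* Restoring division over the 128 zero bits of the low half of the
     dividend.  R < B on entry to each step, so 2R fits in 129 bits;
     CARRY holds that 129th bit.  */
  for (int i = 0; i < 128; i++)
    {
      int carry = (int) (r >> 127);
      r <<= 1;
      q <<= 1;
      if (carry || r >= b)
	{
	  r -= b;  /* modular wrap gives the right value when CARRY is set */
	  q |= 1;
	}
    }
  return q;
}

/* vmoduq / vmodsq: remainder with C semantics (the sign follows the
   dividend).  B must be nonzero, and the most negative value % -1 must
   be avoided because it is undefined in C.  */
u128
model_vmoduq (u128 a, u128 b)
{
  return a % b;
}

s128
model_vmodsq (s128 a, s128 b)
{
  return a % b;
}

/* Worked examples; every assert holds under the model above.  */
void
model_div_examples (void)
{
  assert (model_vdiveuq (1, 2) == (u128) 1 << 127);
  assert (model_vdiveuq (1, 3) == ~(u128) 0 / 3);  /* 0x5555...5555 */
  assert (model_vmoduq (17, 5) == 2);
  assert (model_vmodsq (-17, 5) == -2);
}
```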
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 40ee0a695f1..1fa4a527f12 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3401,7 +3401,9 @@ rs6000_builtin_mask_calculate (void)
 	  | ((TARGET_FLOAT128_TYPE)	    ? RS6000_BTM_FLOAT128  : 0)
 	  | ((TARGET_FLOAT128_HW)	    ? RS6000_BTM_FLOAT128_HW : 0)
 	  | ((TARGET_MMA)		    ? RS6000_BTM_MMA	   : 0)
-	  | ((TARGET_POWER10)               ? RS6000_BTM_P10       : 0));
+	  | ((TARGET_POWER10)               ? RS6000_BTM_P10       : 0)
+	  | ((TARGET_TI_VECTOR_OPS)         ? RS6000_BTM_TI_VECTOR_OPS : 0));
+
 }
 
 /* Implement TARGET_MD_ASM_ADJUST.  All asm statements are considered
@@ -3732,6 +3734,17 @@ rs6000_option_override_internal (bool global_init_p)
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "before defaults", rs6000_isa_flags);
 
+  /* The 128-bit integer vector instructions (-mti-vector-ops) require
+     ISA 3.1 and AltiVec.  Enable them by default on power10 unless AltiVec
+     has been explicitly disabled.  */
+  if (TARGET_POWER10)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_ALTIVEC) && !TARGET_ALTIVEC)
+	warning (0, "%<-mno-altivec%> disables %<-mti-vector-ops%> (128-bit integer vector register operations)");
+      else
+	rs6000_isa_flags |= OPTION_MASK_TI_VECTOR_OPS;
+    }
+
   /* Handle explicit -mno-{altivec,vsx,power8-vector,power9-vector} and turn
      off all of the options that depend on those flags.  */
   ignore_masks = rs6000_disable_incompatible_switches ();
@@ -19489,6 +19502,7 @@ rs6000_handle_altivec_attribute (tree *node,
     case 'b':
       switch (mode)
 	{
+	case E_TImode: case E_V1TImode: result = bool_V1TI_type_node; break;
 	case E_DImode: case E_V2DImode: result = bool_V2DI_type_node; break;
 	case E_SImode: case E_V4SImode: result = bool_V4SI_type_node; break;
 	case E_HImode: case E_V8HImode: result = bool_V8HI_type_node; break;
@@ -23218,6 +23232,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "float128-hardware",	OPTION_MASK_FLOAT128_HW,	false, true  },
   { "fprnd",			OPTION_MASK_FPRND,		false, true  },
   { "power10",			OPTION_MASK_POWER10,		false, true  },
+  { "ti-vector-ops",		OPTION_MASK_TI_VECTOR_OPS,      false, true  },
   { "hard-dfp",			OPTION_MASK_DFP,		false, true  },
   { "htm",			OPTION_MASK_HTM,		false, true  },
   { "isel",			OPTION_MASK_ISEL,		false, true  },
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index bbd8060e143..da84abde671 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -539,6 +539,7 @@ extern int rs6000_vector_align[];
 #define MASK_UPDATE			OPTION_MASK_UPDATE
 #define MASK_VSX			OPTION_MASK_VSX
 #define MASK_POWER10			OPTION_MASK_POWER10
+#define MASK_TI_VECTOR_OPS		OPTION_MASK_TI_VECTOR_OPS
 
 #ifndef IN_LIBGCC2
 #define MASK_POWERPC64			OPTION_MASK_POWERPC64
@@ -2305,6 +2306,7 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_P8_VECTOR	MASK_P8_VECTOR	/* ISA 2.07 vector.  */
 #define RS6000_BTM_P9_VECTOR	MASK_P9_VECTOR	/* ISA 3.0 vector.  */
 #define RS6000_BTM_P9_MISC	MASK_P9_MISC	/* ISA 3.0 misc. non-vector */
+#define RS6000_BTM_P10_128BIT	MASK_POWER10	/* ISA 3.1 128-bit integer ops.  */
 #define RS6000_BTM_CRYPTO	MASK_CRYPTO	/* crypto funcs.  */
 #define RS6000_BTM_HTM		MASK_HTM	/* hardware TM funcs.  */
 #define RS6000_BTM_FRE		MASK_POPCNTB	/* FRE instruction.  */
@@ -2322,7 +2324,7 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_FLOAT128_HW	MASK_FLOAT128_HW /* IEEE 128-bit float h/w.  */
 #define RS6000_BTM_MMA		MASK_MMA	/* ISA 3.1 MMA.  */
 #define RS6000_BTM_P10		MASK_POWER10
-
+#define RS6000_BTM_TI_VECTOR_OPS MASK_TI_VECTOR_OPS /* 128-bit integer vector ops.  */
 
 #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
 				 | RS6000_BTM_VSX			\
@@ -2436,6 +2438,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_bool_V8HI,          /* __vector __bool short */
   RS6000_BTI_bool_V4SI,          /* __vector __bool int */
   RS6000_BTI_bool_V2DI,          /* __vector __bool long */
+  RS6000_BTI_bool_V1TI,          /* __vector __bool __int128 */
   RS6000_BTI_pixel_V8HI,         /* __vector __pixel */
   RS6000_BTI_long,	         /* long_integer_type_node */
   RS6000_BTI_unsigned_long,      /* long_unsigned_type_node */
@@ -2489,6 +2492,7 @@ enum rs6000_builtin_type_index
 #define bool_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V8HI])
 #define bool_V4SI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V4SI])
 #define bool_V2DI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V2DI])
+#define bool_V1TI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V1TI])
 #define pixel_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_pixel_V8HI])
 
 #define long_long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long_long])
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d3e740e930..67d667bf1fd 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -585,3 +585,7 @@ Generate (do not generate) pc-relative memory addressing.
 mmma
 Target Report Mask(MMA) Var(rs6000_isa_flags)
 Generate (do not generate) MMA instructions.
+
+mti-vector-ops
+Target Report Mask(TI_VECTOR_OPS) Var(rs6000_isa_flags)
+Generate (do not generate) the ISA 3.1 (Power10) 128-bit integer vector instructions.
\ No newline at end of file
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 796345c80d3..2deff282076 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -678,6 +678,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_eqv1ti"
+  [(set (match_operand:V1TI 0 "vlogical_operand")
+	(eq:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))]
+  "TARGET_TI_VECTOR_OPS"
+  "")
+
 (define_expand "vector_gt<mode>"
   [(set (match_operand:VEC_C 0 "vlogical_operand")
 	(gt:VEC_C (match_operand:VEC_C 1 "vlogical_operand")
@@ -685,6 +692,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtv1ti"
+  [(set (match_operand:V1TI 0 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))]
+  "TARGET_TI_VECTOR_OPS"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nlt<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -697,6 +711,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		 (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_gtu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -704,6 +729,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		  (match_operand:V1TI 2 "altivec_register_operand")))]
+  "TARGET_TI_VECTOR_OPS"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nltu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -716,6 +748,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		  (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+	(not:V1TI (match_dup 3)))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_geu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(geu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -735,6 +778,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_ngtu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vlogical_operand")
@@ -746,6 +800,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		  (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 ; There are 14 possible vector FP comparison operators, gt and eq of them have
 ; been expanded above, so just support 12 remaining operators here.
 
@@ -894,6 +959,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_eq_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_TI_VECTOR_OPS"
+  "")
+
 ;; This expansion handles the V16QI, V8HI, and V4SI modes in the
 ;; implementation of the vec_all_ne built-in functions on Power9.
 (define_expand "vector_ne_<mode>_p"
@@ -976,6 +1053,23 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+(define_expand "vector_ne_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V2DI mode in the implementation of the
 ;; vec_any_eq built-in function on Power9.
 ;;
@@ -1002,6 +1096,27 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+;; Handle V1TI mode for the vec_any_eq built-in function on Power10.
+(define_expand "vector_ae_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))
+   (set (match_dup 0)
+	(xor:SI (match_dup 0)
+		(const_int 1)))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V4SF and V2DF modes in the Power9
 ;; implementation of the vec_all_ne built-in functions.  Note that the
 ;; expansions for this pattern with these modes makes no use of power9-
@@ -1061,6 +1176,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gt_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gt:CC (match_operand:V1TI 1 "vlogical_operand")
+			     (match_operand:V1TI 2 "vlogical_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (gt:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_TI_VECTOR_OPS"
+  "")
+
 (define_expand "vector_ge_<mode>_p"
   [(parallel
     [(set (reg:CC CR6_REGNO)
@@ -1085,6 +1212,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtu_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gtu:CC (match_operand:V1TI 1 "altivec_register_operand")
+			      (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "altivec_register_operand")
+	  (gtu:V1TI (match_dup 1)
+		    (match_dup 2)))])]
+  "TARGET_TI_VECTOR_OPS"
+  "")
+
 ;; AltiVec/VSX predicates.
 
 ;; This expansion is triggered during expansion of predicate built-in
@@ -1460,6 +1599,20 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vrotlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  /* The rotate count must be in bits [57:63] of operand 2; swap doublewords.  */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vrlq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for rotatert to make use of vrotl
 (define_expand "vrotr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1481,6 +1634,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vashlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  /* The shift count must be in bits [57:63] of operand 2; swap doublewords.  */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vslq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for logical shift right on each vector element
 (define_expand "vlshr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1489,6 +1657,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vlshrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  /* The shift count must be in bits [57:63] of operand 2; swap doublewords.  */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vsrq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for arithmetic shift right on each vector element
 (define_expand "vashr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1496,6 +1679,22 @@
 			(match_operand:VEC_I 2 "vint_operand")))]
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
+
+;; No immediate version of this 128-bit instruction
+(define_expand "vashrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  /* The shift count must be in bits [57:63] of operand 2; swap doublewords.  */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vsraq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 \f
 ;; Vector reduction expanders for VSX
 ; The (VEC_reduc:...
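The rotate and shift expanders in vector.md above all swap the doublewords of operand 2 before issuing the quadword instruction, because, as their comments state, the instruction reads its count from bits [57:63]. The C sketch below models that under stated assumptions: the register is represented as two doublewords in big-endian bit order (the `vsr128` type is hypothetical), and the `model_*` functions are illustrative stand-ins, not GCC or ISA interfaces. It shows why, without the swap, a count held in the low-order 7 bits of operand 2 would be read as zero.

```c
#include <stdint.h>

/* Hypothetical big-endian view of a 128-bit vector register:
   dw[0] holds bits 0:63 (most significant), dw[1] holds bits 64:127.  */
typedef struct
{
  uint64_t dw[2];
} vsr128;

static vsr128
from_u128 (unsigned __int128 v)
{
  vsr128 r = { { (uint64_t) (v >> 64), (uint64_t) v } };
  return r;
}

/* Model of xxswapd (xxpermdi x,x,x,2): exchange the two doublewords.  */
static vsr128
model_xxswapd (vsr128 r)
{
  vsr128 s = { { r.dw[1], r.dw[0] } };
  return s;
}

/* Bits 57:63 in big-endian numbering are the low 7 bits of dw[0].  */
static unsigned
count_field (vsr128 r)
{
  return (unsigned) (r.dw[0] & 0x7f);
}

/* Assumed models of vslq, vsrq, vsraq and vrlq that take the count from
   bits 57:63 of the second operand, as the expander comments describe.  */
static unsigned __int128
model_vslq (unsigned __int128 a, vsr128 b)
{
  return a << count_field (b);
}

static unsigned __int128
model_vsrq (unsigned __int128 a, vsr128 b)
{
  return a >> count_field (b);
}

static __int128
model_vsraq (__int128 a, vsr128 b)
{
  /* GCC implements >> on negative signed values as an arithmetic shift.  */
  return a >> count_field (b);
}

static unsigned __int128
model_vrlq (unsigned __int128 a, vsr128 b)
{
  unsigned n = count_field (b);
  /* Avoid the undefined shift by 128 when n == 0.  */
  return n == 0 ? a : (a << n) | (a >> (128 - n));
}

/* What the vashlv1ti3 expander emits: swap operand 2, then shift.  */
static unsigned __int128
model_vashlv1ti3 (unsigned __int128 a, unsigned __int128 count)
{
  return model_vslq (a, model_xxswapd (from_u128 (count)));
}
```

For example, `model_vashlv1ti3 (3, 2)` yields 12, matching the `vec_sl` check in int_128bit-runnable.c, whereas `model_vslq (3, from_u128 (2))` shifts by 0 because bits 57:63 of the unswapped operand are zero.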
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 1153a01b4ef..998af3908ad 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -298,6 +298,12 @@
    UNSPEC_VSX_XXSPLTD
    UNSPEC_VSX_DIVSD
    UNSPEC_VSX_DIVUD
+   UNSPEC_VSX_DIVSQ
+   UNSPEC_VSX_DIVUQ
+   UNSPEC_VSX_DIVESQ
+   UNSPEC_VSX_DIVEUQ
+   UNSPEC_VSX_MODSQ
+   UNSPEC_VSX_MODUQ
    UNSPEC_VSX_MULSD
    UNSPEC_VSX_SIGN_EXTEND
    UNSPEC_VSX_XVCVBF16SP
@@ -361,6 +367,7 @@
    UNSPEC_INSERTR
    UNSPEC_REPLACE_ELT
    UNSPEC_REPLACE_UN
+   UNSPEC_XXSWAPD_V1TI
   ])
 
 (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
@@ -1732,7 +1739,61 @@
 }
   [(set_attr "type" "div")])
 
-;; *tdiv* instruction returning the FG flag
+(define_insn "vsx_div_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVSQ))]
+  "TARGET_TI_VECTOR_OPS"
+  "vdivsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_udiv_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVUQ))]
+  "TARGET_TI_VECTOR_OPS"
+  "vdivuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_dives_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVESQ))]
+  "TARGET_TI_VECTOR_OPS"
+  "vdivesq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_diveu_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVEUQ))]
+  "TARGET_TI_VECTOR_OPS"
+  "vdiveuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_mods_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODSQ))]
+  "TARGET_TI_VECTOR_OPS"
+  "vmodsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_modu_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODUQ))]
+  "TARGET_TI_VECTOR_OPS"
+  "vmoduq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+;; *tdiv* instruction returning the FG flag
 (define_expand "vsx_tdiv<mode>3_fg"
   [(set (match_dup 3)
 	(unspec:CCFP [(match_operand:VSX_B 1 "vsx_register_operand")
@@ -3083,6 +3144,18 @@
   "xxpermdi %x0,%x1,%x1,2"
   [(set_attr "type" "vecperm")])
 
+;; Swap upper/lower 64-bit values in a 128-bit vector
+(define_insn "xxswapd_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (parallel [(const_int 0)(const_int 1)])]
+                     UNSPEC_XXSWAPD_V1TI))]
+  "TARGET_POWER10"
+;; AIX does not support extended mnemonic xxswapd.  Use the basic
+;; mnemonic xxpermdi instead.
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "xxgenpcvm_<mode>_internal"
   [(set (match_operand:VSX_EXTRACT_I4 0 "altivec_register_operand" "=wa")
 	(unspec:VSX_EXTRACT_I4
@@ -4767,8 +4840,16 @@
    (set_attr "type" "vecload")])
 
 \f
-;; ISA 3.0 vector extend sign support
+;; ISA 3.1 vector extend sign support
+(define_insn "vsx_sign_extend_v2di_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
+		     UNSPEC_VSX_SIGN_EXTEND))]
+  "TARGET_TI_VECTOR_OPS"
+  "vextsd2q %0,%1"
+  [(set_attr "type" "vecexts")])
 
+;; ISA 3.0 vector extend sign support
 (define_insn "vsx_sign_extend_qi_<mode>"
   [(set (match_operand:VSINT_84 0 "vsx_register_operand" "=v")
 	(unspec:VSINT_84
@@ -5508,6 +5589,20 @@
   "vcmpnew %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Vector Compare Not Equal Quadword (V1TI), expanded as not (eq).
+(define_expand "vcmpnet"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(not:V1TI
+	  (eq:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		   (match_operand:V1TI 2 "altivec_register_operand"))))]
+  "TARGET_TI_VECTOR_OPS"
+{
+  emit_insn (gen_vector_eqv1ti (operands[0], operands[1], operands[2]));
+  emit_insn (gen_one_cmplv1ti2 (operands[0], operands[0]));
+  DONE;
+})
+
+
 ;; Vector Compare Not Equal or Zero Word
 (define_insn "vcmpnezw"
   [(set (match_operand:V4SI 0 "altivec_register_operand" "=v")
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cb501ab2d75..346885de545 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21270,6 +21270,180 @@ Generate PCV from specified Mask size, as if implemented by the
 immediate value is either 0, 1, 2 or 3.
 @findex vec_genpcvm
 
+@smallexample
+@exdent vector unsigned __int128 vec_rl (vector unsigned __int128,
+                                         vector unsigned __int128);
+@exdent vector signed __int128 vec_rl (vector signed __int128,
+                                       vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input left by the number of bits
+specified in the low-order 7 bits of the second input, that is, the second
+input modulo 128.
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlmi (vector signed __int128,
+                                         vector signed __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and inserting it under a mask
+into the second input.  The first and last bits of the mask are obtained from
+the two 7-bit fields bits [108:115] and bits [117:123], respectively, of the
+second input.  The shift amount is obtained from the 7-bit field bits [125:131]
+of the third input, where all bits are counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlnm (vector signed __int128,
+                                         vector unsigned __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and ANDing it with a mask.  The
+first and last bits of the mask are obtained from the two 7-bit fields bits
+[117:123] and bits [125:131], respectively, of the second input.  The shift
+amount is obtained from the 7-bit field bits [125:131] of the third input,
+where all bits are counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_sl (vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sl (vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of shifting the first input left by the number of bits
+specified in the low-order 7 bits of the second input, that is, the second
+input modulo 128.
+
+@smallexample
+@exdent vector unsigned __int128 vec_sr (vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sr (vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of performing a logical right shift of the first argument
+by the number of bits specified in the low-order 7 bits of the second input,
+that is, the second input modulo 128.
+
+@smallexample
+@exdent vector unsigned __int128 vec_sra (vector unsigned __int128, vector unsigned __int128);
+@exdent vector signed __int128 vec_sra (vector signed __int128, vector unsigned __int128);
+@end smallexample
+
+Returns the result of performing an arithmetic right shift of the first
+argument by the number of bits specified in the low-order 7 bits of the
+second input, that is, the second input modulo 128.
+
+
+@smallexample
+@exdent vector unsigned __int128 vec_mule (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mule (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the even doubleword
+elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mulo (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mulo (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the odd doubleword
+elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_div (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_div (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+Returns the result of dividing the first operand by the second operand. An attempt to
+divide any value by zero or to divide the most negative signed 128-bit integer by
+negative one results in an undefined value.
+
+@smallexample
+@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_dive (vector signed __int128,
+                                         vector signed __int128);
+@end smallexample
+
+Returns the result of shifting the first input left by 128 bits and dividing
+it by the second input.  If the second input is zero or the quotient does not
+fit in 128 bits, the result is undefined.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_mod (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+Returns the remainder of dividing the first input by the second input.
+
+
+The following built-in functions perform 128-bit vector comparisons.  The
+@code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx} functions, where
+@code{xx} is one of @code{eq}, @code{ne}, @code{gt}, @code{lt}, @code{ge} or
+@code{le}, perform pairwise comparisons between the elements at the same
+positions within their two vector arguments.  The @code{vec_all_xx} function
+returns a non-zero value if and only if all pairwise comparisons are true.  The
+@code{vec_any_xx} function returns a non-zero value if and only if at least one
+pairwise comparison is true.  The @code{vec_cmpxx} function returns a
+@code{vector bool __int128} whose element is all ones if the specified
+comparison of the corresponding elements is true, and all zeros otherwise.
+
+@smallexample
+vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
+
+int vec_all_eq (vector signed __int128, vector signed __int128);
+int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ne (vector signed __int128, vector signed __int128);
+int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_all_gt (vector signed __int128, vector signed __int128);
+int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_lt (vector signed __int128, vector signed __int128);
+int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ge (vector signed __int128, vector signed __int128);
+int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_all_le (vector signed __int128, vector signed __int128);
+int vec_all_le (vector unsigned __int128, vector unsigned __int128);
+
+int vec_any_eq (vector signed __int128, vector signed __int128);
+int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ne (vector signed __int128, vector signed __int128);
+int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_any_gt (vector signed __int128, vector signed __int128);
+int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_lt (vector signed __int128, vector signed __int128);
+int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ge (vector signed __int128, vector signed __int128);
+int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_any_le (vector signed __int128, vector signed __int128);
+int vec_any_le (vector unsigned __int128, vector unsigned __int128);
+@end smallexample
+
+
 @node PowerPC Hardware Transactional Memory Built-in Functions
 @subsection PowerPC Hardware Transactional Memory Built-in Functions
 GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
new file mode 100644
index 00000000000..c84494fc28d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -0,0 +1,2254 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mvextsd2q\M} 2 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvslq\M} 2 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvsrq\M} 2 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvsraq\M} 2 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvrlq\M} 2 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvcmpuq\M} 0 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvcmpsq\M} 0 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq\M} 0 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq.\M} 16 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 0 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq.\M} 16 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 0 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq.\M} 16 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvmuleud\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvmuloud\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvmulesd\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvmulosd\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvdivsq\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvdivuq\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvdivesq\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvdiveuq\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvmodsq\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mvmoduq\M} 1 { target { ppc_native_128bit } } } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+
+
+void print_i128(__int128_t val)
+{
+  printf(" %lld %llu (0x%llx %llx)",
+	 (signed long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF),
+	 (unsigned long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
+}
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i, result_int;
+
+  __int128_t arg1, result;
+  __uint128_t uarg2;
+
+  vector signed long long int vec_arg1_di, vec_arg2_di;
+  vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
+  vector unsigned long long int vec_uresult_di;
+  vector unsigned long long int vec_uexpected_result_di;
+
+  __int128_t expected_result;
+  __uint128_t uexpected_result;
+
+  vector __int128_t vec_arg1, vec_arg2, vec_result;
+  vector __uint128_t vec_uarg1, vec_uarg2, vec_uarg3, vec_uresult;
+  vector bool __int128  vec_result_bool;
+
+  /* Sign-extend a doubleword to a 128-bit integer.  */
+  vec_arg1_di[0] = 1000;
+  vec_arg1_di[1] = -123456;
+
+  expected_result = 1000;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1_di[0] = -123456;
+  vec_arg1_di[1] = 1000;
+
+  expected_result = -123456;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Test shifts of 128-bit integers.  Note, the shift amount is
+     given by the low-order 7 bits of the second operand.  */
+  vec_arg1[0] = 3;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]*4;
+
+  vec_result = vec_sl (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %llu", (unsigned long long) (vec_uarg2[0] & 0xFF));
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 3;
+  uarg2 = 4;
+  expected_result = arg1*16;
+
+  result = arg1 << uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 << uint128:  ");
+    print_i128(arg1);
+    printf(" << %llu", (unsigned long long) (uarg2 & 0xFF));
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 3;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]*4;
+  
+  vec_uresult = vec_sl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld", (long long) (vec_uarg2[0] & 0xFF));
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]/4;
+
+  vec_result = vec_sr (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld", (long long) (vec_uarg2[0] & 0xFF));
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 48;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]/4;
+  
+  vec_uresult = vec_sr (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld", (long long) (vec_uarg2[0] & 0xFF));
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 48;
+  uarg2 = 4;
+  expected_result = arg1/16;
+
+  result = arg1 >> uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 >> uint128:  ");
+    print_i128(arg1);
+    printf(" >> %lld", (long long) (uarg2 & 0xFF));
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x0000000012345678ULL;
+  expected_result = (expected_result << 64) | 0x90ABCDEFAABBCCDDULL;
+
+  vec_result = vec_sra (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld = \n", (long long) vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0xFFFFFFFFFFFFAABBULL;
+  uexpected_result = (uexpected_result << 64) | 0xCCDDEEFF11221234ULL;
+
+  vec_uresult = vec_sra (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld = \n", (long long) (vec_uarg2[0] & 0xFF));
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
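+
+  /* Sketch of a scalar model for the unsigned vec_sra case above,
+     assuming GCC's documented behavior that >> on a negative signed
+     __int128 shifts in copies of the sign bit.  As the expected result
+     shows, vec_sra propagates the most significant bit even for an
+     unsigned operand.  */
+  if ((__uint128_t) ((__int128) vec_uarg1[0] >> 48) != uexpected_result) {
+#if DEBUG
+    printf("ERROR: scalar model of vec_sra(uint128, uint128) differs\n\n");
+#else
+    abort();
+#endif
+  }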
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x90ABCDEFAABBCCDDULL;
+  expected_result = (expected_result << 64) | 0xEEFF112212345678ULL;
+
+  vec_result = vec_rl (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" rotated left by %lld = \n", (long long) vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
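+
+  /* Sketch of a scalar model for the vec_rl case above: a 128-bit
+     rotate left by 32 can be written as (x << 32) | (x >> 96) on the
+     value reinterpreted as unsigned.  */
+  {
+    __uint128_t x = (__uint128_t) vec_arg1[0];
+    __uint128_t rot = (x << 32) | (x >> 96);
+
+    if (rot != (__uint128_t) expected_result) {
+#if DEBUG
+      printf("ERROR: scalar model of vec_rl(int128, uint128) differs\n\n");
+#else
+      abort();
+#endif
+    }
+  }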
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0x11221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEEFFULL;
+
+  vec_uresult = vec_rl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" rotated left by %lld = \n", (long long) vec_uarg2[0]);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32 << (63-55) | 95 << (63-63);
+  vec_uarg3[0] = 32;
+  expected_result = 0xAABBCCDDULL;
+  expected_result = (expected_result << 64) | 0xEEFF112200000000ULL;
+
+  vec_result = vec_rlnm (vec_arg1, vec_uarg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(int128, uint128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", (long long) (vec_uarg2[0] & 0xFF));
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
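+
+  /* Sketch of a scalar model for the vec_rlnm case above, assuming the
+     mask covers bits 32 through 95 in IBM numbering (bit 0 is the MSB):
+     rotate left by 32, then AND with that mask.  Clearing the top mb
+     bits and the low (127 - me) bits builds the mask.  */
+  {
+    __uint128_t ones = ~(__uint128_t) 0;
+    __uint128_t x = (__uint128_t) vec_arg1[0];
+    __uint128_t rot = (x << 32) | (x >> 96);
+    __uint128_t mask = (ones >> 32) & (ones << (127 - 95));
+
+    if ((rot & mask) != (__uint128_t) expected_result) {
+#if DEBUG
+      printf("ERROR: scalar model of vec_rlnm(int128, ...) differs\n\n");
+#else
+      abort();
+#endif
+    }
+  }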
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 8 << (63-55) | 119 << (63-63);
+  vec_uarg3[0] = 48;
+
+  uexpected_result = 0x00221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEE00ULL;
+
+  vec_uresult = vec_rlnm (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(uint128, uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", (long long) (vec_uarg2[0] & 0xFF));
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_arg2[0] = 0x000000000000DEADULL;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 0x0000BEEF00000000ULL;
+  vec_uarg3[0] = 96 << 16 | 127 << 8 | 32;
+  expected_result = 0x000000000000DEADULL;
+  expected_result = (expected_result << 64) | 0x0000BEEF12345678ULL;
+
+  vec_result = vec_rlmi (vec_arg1, vec_arg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(int128, int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", (long long) (vec_uarg3[0] & 0xFF));
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
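+
+  /* Sketch of a scalar model for the vec_rlmi case above, reading the
+     control word 96 << 16 | 127 << 8 | 32 as mask begin 96, mask end 127
+     (IBM numbering) and a shift of 32: rotate vec_arg1 left, then insert
+     the masked bits into vec_arg2, keeping vec_arg2 elsewhere.  */
+  {
+    __uint128_t x = (__uint128_t) vec_arg1[0];
+    __uint128_t rot = (x << 32) | (x >> 96);
+    __uint128_t mask = ~(__uint128_t) 0 >> 96;	/* IBM bits 96..127.  */
+    __uint128_t ref = (rot & mask) | ((__uint128_t) vec_arg2[0] & ~mask);
+
+    if (ref != (__uint128_t) expected_result) {
+#if DEBUG
+      printf("ERROR: scalar model of vec_rlmi(int128, ...) differs\n\n");
+#else
+      abort();
+#endif
+    }
+  }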
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 0xDEAD000000000000ULL;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 0x000000000000BEEFULL;
+  vec_uarg3[0] = 16 << 16 | 111 << 8 | 48;
+  uexpected_result = 0xDEAD1234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDBEEFULL;
+
+  vec_uresult = vec_rlmi (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(uint128, uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", (long long) (vec_uarg3[0] & 0xFF));
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* 128-bit compare tests; the result is all ones if true.  */
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1[0] = 2468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  uexpected_result = 0xFFFFFFFFFFFFFFFFULL;
+  uexpected_result = (uexpected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: unsigned vec_cmpgt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed vec_cmpgt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: not equal signed vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned not equal vec_cmpeq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpeq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned not equal vec_cmpne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: not equal signed vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+   
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector multiply Even and Odd tests */
+  vec_arg1_di[0] = 200;
+  vec_arg1_di[1] = 400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[0] * vec_arg2_di[0];
+
+  vec_result = vec_mule (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (signed, signed) failed.\n");
+    printf(" vec_arg1_di[0] = %lld\n", vec_arg1_di[0]);
+    printf(" vec_arg2_di[0] = %lld\n", vec_arg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1_di[0] = -200;
+  vec_arg1_di[1] = -400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[1] * vec_arg2_di[1];
+
+  vec_result = vec_mulo (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (signed, signed) failed.\n");
+    printf(" vec_arg1_di[1] = %lld\n", vec_arg1_di[1]);
+    printf(" vec_arg2_di[1] = %lld\n", vec_arg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[0] * vec_uarg2_di[0];
+
+  vec_uresult = vec_mule (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[0] = %llu\n", vec_uarg1_di[0]);
+    printf(" vec_uarg2_di[0] = %llu\n", vec_uarg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[1] * vec_uarg2_di[1];
+
+  vec_uresult = vec_mulo (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[1] = %llu\n", vec_uarg1_di[1]);
+    printf(" vec_uarg2_di[1] = %llu\n", vec_uarg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Divide Quadword */
+  vec_arg1[0] = -12345678;
+  vec_arg2[0] = 2;
+  expected_result = -6172839;
+
+  vec_result = vec_div (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24680;
+  vec_uarg2[0] = 4;
+  uexpected_result = 6170;
+
+  vec_uresult = vec_div (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Divide Extended Quadword */
+  vec_arg1[0] = -20;        // 128 zero bits are appended to form the dividend
+  vec_arg2[0] = 0x2000000000000000;
+  vec_arg2[0] = vec_arg2[0] << 64;
+  expected_result = -160;
+
+  vec_result = vec_dive (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 20;        // 128 zero bits are appended to form the dividend
+  vec_uarg2[0] = 0x4000000000000000;
+  vec_uarg2[0] = vec_uarg2[0] << 64;
+  uexpected_result = 80;
+
+  vec_uresult = vec_dive (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector modulo quad word  */
+  vec_arg1[0] = -12345675;
+  vec_arg2[0] = 2;
+  expected_result = -1;
+
+  vec_result = vec_mod (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24685;
+  vec_uarg2[0] = 4;
+  uexpected_result = 1;
+
+  vec_uresult = vec_mod (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  return 0;
+}
-- 
2.25.1




* [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support
  2020-08-11 19:01 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
  2020-08-11 19:22 ` [Patch 1/5] rs6000, Add 128-bit sign extension support Carl Love
  2020-08-11 19:22 ` [Patch 2/5] rs6000, 128-bit multiply, divide, modulo, shift, compare Carl Love
@ 2020-08-11 19:22 ` Carl Love
  2020-08-14 17:13   ` will schmidt
  2020-08-20  1:29   ` Segher Boessenkool
  2020-08-11 19:23 ` [Patch 4/5] rs6000, Test 128-bit shifts for just the int128 type Carl Love
  2020-08-11 19:23 ` [Patch 5/5] rs6000, Conversions between 128-bit integer and floating point values Carl Love
  4 siblings, 2 replies; 27+ messages in thread
From: Carl Love @ 2020-08-11 19:22 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt; +Cc: Bill Schmidt, cel

Segher, Will:

Patch 3 adds support for converting between 128-bit integers and the
128-bit decimal floating-point format.

                  Carl Love


----------------------------------------------------------------
Add TI to TD (128-bit DFP) and TD to TI support

gcc/ChangeLog

2020-08-10  Carl Love  <cel@us.ibm.com>
	* config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.

gcc/testsuite/ChangeLog

2020-08-10  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c:  Add tests.
---
 gcc/config/rs6000/dfp.md                      | 15 +++++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 64 +++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 8f822732bac..ac9fe189f3e 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -222,6 +222,13 @@
   "dcffixq %0,%1"
   [(set_attr "type" "dfp")])
 
+(define_insn "floattitd2"
+  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
+	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
+  "TARGET_TI_VECTOR_OPS"
+  "dcffixqq %0,%1"
+  [(set_attr "type" "dfp")])
+
 ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
 ;; This is the first stage of converting it to an integer type.
 
@@ -241,6 +248,14 @@
   "TARGET_DFP"
   "dctfix<q> %0,%1"
   [(set_attr "type" "dfp")])
+
+;; Convert a decimal128 value to a signed 128-bit integer.
+(define_insn "fixtdti2"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
+	(fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
+  "TARGET_TI_VECTOR_OPS"
+  "dctfixqq %0,%1"
+  [(set_attr "type" "dfp")])
 \f
 ;; Decimal builtin support
 
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index c84494fc28d..d1e69cea021 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -38,6 +38,7 @@
 #if DEBUG
 #include <stdio.h>
 #include <stdlib.h>
+#include <math.h>
 
 
 void print_i128(__int128_t val)
@@ -59,6 +60,13 @@ int main ()
   __int128_t arg1, result;
   __uint128_t uarg2;
 
+  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
+
+  struct conv_t {
+    __uint128_t u128;
+    _Decimal128 d128;
+  } conv, conv2;
+
   vector signed long long int vec_arg1_di, vec_arg2_di;
   vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
   vector unsigned long long int vec_uresult_di;
@@ -2249,6 +2257,62 @@ int main ()
     abort();
 #endif
   }
+  
+  /* DFP to __int128 and __int128 to DFP conversions */
+  /* printf cannot print DFP values directly, so print each DFP value as
+     an unsigned 128-bit integer to show its bit pattern.  */
+#if 1
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
+  expected_result_dfp128 = conv.d128;
+
+  arg1 = 4;
+
+  conv.d128 = (_Decimal128) arg1;
+
+  result_dfp128 = (_Decimal128) arg1;
+  if (((conv.u128 >> 64) != 0x2208000000000000ULL) ||
+      ((conv.u128 & 0xFFFFFFFFFFFFFFFF) != 0x4ULL)) {
+#if DEBUG
+    printf("ERROR:  convert int128 value ");
+    print_i128 (arg1);
+    conv.d128 = result_dfp128;
+    printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)((conv.u128) >>64),
+	   (unsigned long long)((conv.u128) & 0xFFFFFFFFFFFFFFFF));
+
+    conv.d128 = expected_result_dfp128;
+    printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
+	   (unsigned long long) (conv.u128>>64),
+	   (unsigned long long) (conv.u128 & 0xFFFFFFFFFFFFFFFF));
+#else
+    abort();
+#endif
+  }
+#endif
+
+  expected_result = 4;
 
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
+  arg1_dfp128 = conv.d128;
+
+  result = (__int128_t) arg1_dfp128;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR:  convert DFP value ");
+    printf("0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)(conv.u128>>64),
+	   (unsigned long long)(conv.u128 & 0xFFFFFFFFFFFFFFFF));
+    printf("to __int128 value = ");
+    print_i128 (result);
+    printf("\ndoes not match expected_result = ");
+    print_i128 (expected_result);
+    printf("\n");
+#else
+    abort();
+#endif
+  }
   return 0;
 }
-- 
2.25.1




* [Patch 4/5] rs6000,  Test 128-bit shifts for just the int128 type.
  2020-08-11 19:01 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
                   ` (2 preceding siblings ...)
  2020-08-11 19:22 ` [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support Carl Love
@ 2020-08-11 19:23 ` Carl Love
  2020-08-14 17:35   ` will schmidt
  2020-08-20 21:50   ` Segher Boessenkool
  2020-08-11 19:23 ` [Patch 5/5] rs6000, Conversions between 128-bit integer and floating point values Carl Love
  4 siblings, 2 replies; 27+ messages in thread
From: Carl Love @ 2020-08-11 19:23 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt; +Cc: Bill Schmidt, cel

Segher, Will:

Patch 4 extends the 128-bit shift patterns from V1TI to the TI mode so
that they can also be used for the scalar __int128 type.

                 Carl Love

---------------------------------------------------------
Test 128-bit shifts for just the int128 type.

gcc/ChangeLog

2020-08-10  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq): Add mode
	VEC_I128.
	* config/rs6000/vector.md (VEC_I128): New mode iterator.
	(vashlv1ti3): Change to vashl<mode>3, mode VEC_I128.
	(vlshrv1ti3): Change to vlshr<mode>3, mode VEC_I128.
	* config/rs6000/vsx.md (UNSPEC_XXSWAPD_V1TI): Change to
	UNSPEC_XXSWAPD_VEC_I128.
	(xxswapd_v1ti): Change to xxswapd_<mode>, mode VEC_I128.

gcc/testsuite/ChangeLog

2020-08-10  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
	tests.
---
 gcc/config/rs6000/altivec.md                  | 16 +++++------
 gcc/config/rs6000/vector.md                   | 27 ++++++++++---------
 gcc/config/rs6000/vsx.md                      | 14 +++++-----
 .../gcc.target/powerpc/int_128bit-runnable.c  | 24 +++++++++++++++--
 4 files changed, 52 insertions(+), 29 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2763d920828..cba39852070 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2219,10 +2219,10 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vslq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vslq_<mode>"
+  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
+	(ashift:VEC_I128 (match_operand:VEC_I128 1 "vsx_register_operand" "v")
+		     (match_operand:VEC_I128 2 "vsx_register_operand" "v")))]
   "TARGET_TI_VECTOR_OPS"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vslq %0,%1,%2"
@@ -2236,10 +2236,10 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vsrq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vsrq_<mode>"
+  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_I128 (match_operand:VEC_I128 1 "vsx_register_operand" "v")
+			   (match_operand:VEC_I128 2 "vsx_register_operand" "v")))]
   "TARGET_TI_VECTOR_OPS"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vsrq %0,%1,%2"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 2deff282076..682aabc4657 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,6 +26,9 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
 
+;; 128-bit int modes
+(define_mode_iterator VEC_I128 [V1TI TI])
+
 ;; Vector int modes for parity
 (define_mode_iterator VEC_IP [V8HI
 			      V4SI
@@ -1635,17 +1638,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vashlv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vashl<mode>3"
+  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
+	(ashift:VEC_I128 (match_operand:VEC_I128 1 "vsx_register_operand")
+			 (match_operand:VEC_I128 2 "vsx_register_operand")))]
   "TARGET_TI_VECTOR_OPS"
 {
   /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
-  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_xxswapd_<mode> (tmp, operands[2]));
-  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vslq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
@@ -1658,17 +1661,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vlshrv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vlshr<mode>3"
+  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_I128 (match_operand:VEC_I128 1 "vsx_register_operand")
+			   (match_operand:VEC_I128 2 "vsx_register_operand")))]
   "TARGET_TI_VECTOR_OPS"
 {
   /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
-  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn(gen_xxswapd_<mode> (tmp, operands[2]));
-  emit_insn(gen_altivec_vsrq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vsrq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 998af3908ad..5be535808b3 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -367,7 +367,7 @@
    UNSPEC_INSERTR
    UNSPEC_REPLACE_ELT
    UNSPEC_REPLACE_UN
-	UNSPEC_XXSWAPD_V1TI
+	UNSPEC_XXSWAPD_VEC_I128
   ])
 
 (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
@@ -3144,12 +3144,12 @@
   "xxpermdi %x0,%x1,%x1,2"
   [(set_attr "type" "vecperm")])
 
-;; Swap upper/lower 64-bit values in a 128-bit vector
-(define_insn "xxswapd_v1ti"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
-		      (parallel [(const_int 0)(const_int 1)])]
-                     UNSPEC_XXSWAPD_V1TI))]
+;; Swap upper/lower 64-bit values in V1TI or TI type
+(define_insn "xxswapd_<mode>"
+  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
+	(unspec:VEC_I128 [(match_operand:VEC_I128 1 "vsx_register_operand" "v")
+			  (parallel [(const_int 0)(const_int 1)])]
+                     UNSPEC_XXSWAPD_VEC_I128))]
   "TARGET_POWER10"
 ;; AIX does not support extended mnemonic xxswapd.  Use the basic
 ;; mnemonic xxpermdi instead.
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index d1e69cea021..b074d83bd68 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -53,6 +53,18 @@ void print_i128(__int128_t val)
 
 void abort (void);
 
+__attribute__((noinline))
+__int128_t shift_right (__int128_t a, __uint128_t b)
+{
+  return a >> b;
+}
+
+__attribute__((noinline))
+__int128_t shift_left (__int128_t a, __uint128_t b)
+{
+  return a << b;
+}
+
 int main ()
 {
   int i, result_int;
@@ -141,10 +153,12 @@ int main ()
 #endif
   }
 
-  arg1 = 3;
+  //  arg1 = 3;
+  arg1 = vec_result[0];
   uarg2 = 4;
   expected_result = arg1*16;
 
+  //  result = shift_left(arg1, uarg2);
   result = arg1 << uarg2;
 
   if (result != expected_result) {
@@ -225,10 +239,16 @@ int main ()
 #endif
   }
 
-  arg1 = 48;
+  //  arg1 = 48;
+
+  // use the previous result to try and keep gcc from doing the shift
+  // at compile time
+  arg1 = vec_uresult[0];
   uarg2 = 4;
   expected_result = arg1/16;
 
+  //Not getting 128-bit shift inst generated
+  //  result = shift_right (arg1, uarg2);
   result = arg1 >> uarg2;
 
   if (result != expected_result) {
-- 
2.25.1



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Patch 5/5] rs6000,  Conversions between 128-bit integer and floating point values.
  2020-08-11 19:01 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
                   ` (3 preceding siblings ...)
  2020-08-11 19:23 ` [Patch 4/5] rs6000, Test 128-bit shifts for just the int128 type Carl Love
@ 2020-08-11 19:23 ` Carl Love
  2020-08-14 18:50   ` will schmidt
                     ` (2 more replies)
  4 siblings, 3 replies; 27+ messages in thread
From: Carl Love @ 2020-08-11 19:23 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt; +Cc: Bill Schmidt, cel

Segher, Will:

Patch 5 adds the conversions between 128-bit integer and 128-bit
floating point values.  libgcc selects the implementation at run time
through ifunc resolve functions: on Power 10 the conversions use the
new 128-bit hardware instructions, and on pre-Power 10 systems they
fall back to the software routines.
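The resolver pattern described above can be sketched in portable C.  This is a minimal sketch under stated assumptions, not the libgcc code: `fix_to_ti_sw`, `fix_to_ti_hw`, `cpu_has_isa_3_1` and `fix_to_ti` are hypothetical names, `double` stands in for the IEEE 128-bit KFmode type, and a lazily initialised function pointer stands in for the ELF ifunc that libgcc binds once at load time using `__builtin_cpu_supports ("arch_3_1")`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for __fixkfti_sw and __fixkfti_hw.  In libgcc the
   _sw version is soft-fp emulation and the _hw version compiles to
   xscvqpsqz on Power 10; here both use the compiler's own conversion,
   which truncates toward zero.  */
static __int128 fix_to_ti_sw (double a) { return (__int128) a; }
static __int128 fix_to_ti_hw (double a) { return (__int128) a; }

/* Hypothetical capability probe; libgcc asks
   __builtin_cpu_supports ("arch_3_1") instead.  */
static bool cpu_has_isa_3_1 (void) { return false; }

typedef __int128 fix_to_ti_fn (double);

/* Resolver: picks the hardware or software implementation.  */
static fix_to_ti_fn *fix_to_ti_resolve (void)
{
  return cpu_has_isa_3_1 () ? fix_to_ti_hw : fix_to_ti_sw;
}

/* Public entry point.  The lazy, non-thread-safe initialisation is a
   simplification; a real ifunc is bound by the dynamic loader.  */
__int128 fix_to_ti (double a)
{
  static fix_to_ti_fn *impl;
  if (!impl)
    impl = fix_to_ti_resolve ();
  return impl (a);
}
```

Because the conversion yields a full 128-bit two's-complement value, a negative result such as -6789 has an all-ones high doubleword, which matters when a test checks the result as two 64-bit halves.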

                          Carl 

-----------------------------------------------------------
Conversions between 128-bit integer and floating point values.

gcc/ChangeLog

2020-08-10  Carl Love  <cel@us.ibm.com>
	* config/rs6000/rs6000.md (floatti<mode>2, floatunsti<mode>2,
	fix_trunc<mode>ti2, fixuns_trunc<mode>ti2): New define_insns
	for the IEEE128 modes.

gcc/testsuite/ChangeLog

2020-08-10  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/fp128_conversions.c: New test.

libgcc/ChangeLog

2020-08-10  Carl Love  <cel@us.ibm.com>
	* config/rs6000/fixkfti.c: Rename to...
	* config/rs6000/fixkfti-sw.c: ...this.
	(__fixkfti): Rename to __fixkfti_sw.
	* config/rs6000/fixunskfti.c: Rename to...
	* config/rs6000/fixunskfti-sw.c: ...this.
	(__fixunskfti): Rename to __fixunskfti_sw.
	* config/rs6000/floattikf.c: Rename to...
	* config/rs6000/floattikf-sw.c: ...this.
	(__floattikf): Rename to __floattikf_sw.
	* config/rs6000/floatuntikf.c: Rename to...
	* config/rs6000/floatuntikf-sw.c: ...this.
	(__floatuntikf): Rename to __floatuntikf_sw.
	* config/rs6000/float128-hw.c (__floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw): New functions.
	* config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): New macro.
	Remove stale comment about missing ISA 3.0 instructions.
	(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
	__fixunskfti_resolve): New resolve functions.
	(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
	ifunc declarations.
	* config/rs6000/float128-sed (__floattitf, __floatuntitf,
	__fixtfti, __fixunstfti): Add sed commands to rename them to the
	KF names.
	* config/rs6000/float128-sed-hw (__floattitf, __floatuntitf,
	__fixtfti, __fixunstfti): Add sed commands to rename them to the
	_sw KF names.
	* config/rs6000/quad-float128.h (__floattikf_sw,
	__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
	__floatuntikf, __fixkfti, __fixunskfti): New extern declarations.
	Remove stale comment about missing ISA 3.0 instructions.
	* config/rs6000/t-float128 (fp128_ppc_funcs): Replace floattikf,
	floatuntikf, fixkfti and fixunskfti with floattikf-sw,
	floatuntikf-sw, fixkfti-sw and fixunskfti-sw.
---
 gcc/config/rs6000/rs6000.md                   |  36 +++
 .../gcc.target/powerpc/fp128_conversions.c    | 287 ++++++++++++++++++
 .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
 .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   4 +-
 libgcc/config/rs6000/float128-hw.c            |  24 ++
 libgcc/config/rs6000/float128-ifunc.c         |  44 ++-
 libgcc/config/rs6000/float128-sed             |   4 +
 libgcc/config/rs6000/float128-sed-hw          |   4 +
 .../rs6000/{floattikf.c => floattikf-sw.c}    |   4 +-
 .../{floatuntikf.c => floatuntikf-sw.c}       |   4 +-
 libgcc/config/rs6000/quad-float128.h          |  17 +-
 libgcc/config/rs6000/t-float128               |   3 +-
 12 files changed, 415 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
 rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
 rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (96%)
 rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
 rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 43b620ae1c0..3853ebd4195 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6390,6 +6390,42 @@
    xscvsxddp %x0,%x1"
   [(set_attr "type" "fp")])
 
+(define_insn "floatti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvsqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "floatunsti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvuqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fix_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpsqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fixuns_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpuqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
 ; Allow the combiner to merge source memory operands to the conversion so that
 ; the optimizer/register allocator doesn't try to load the value too early in a
 ; GPR and then use store/load to move it to a FPR and suffer from a store-load
diff --git a/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
new file mode 100644
index 00000000000..f0336e6f1fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
@@ -0,0 +1,287 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10" } */
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mxscvsqqp\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mxscvuqqp\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mxscvqpsqz\M} 1 { target { ppc_native_128bit } } } } */
+/* { dg-final { scan-assembler-times {\mxscvqpuqz\M} 1 { target { ppc_native_128bit } } } } */
+
+#include <stdio.h>
+#include <math.h>
+#include <fenv.h>
+#include <stdlib.h>
+#include <wchar.h>
+
+#define DEBUG 1
+
+void abort (void);
+
+float conv_i_2_fp( long long int a)
+{
+  return (float) a;
+}
+
+double conv_i_2_fpd( long long int a)
+{
+  return (double) a;
+}
+
+double conv_ui_2_fpd( unsigned long long int a)
+{
+  return (double) a;
+}
+
+__float128 conv_i128_2_fp128 (__int128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__float128 conv_ui128_2_fp128 (__uint128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__int128_t conv_fp128_2_i128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__int128_t) a;
+}
+
+__uint128_t conv_fp128_2_ui128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__uint128_t) a;
+}
+
+long double conv_i128_2_ld (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (long double) a;
+}
+
+__ibm128 conv_i128_2_ibm128 (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble, messages about uses IBM long double, no binary output
+  return (__ibm128) a;
+}
+
+int main()
+{
+	float a, expected_result_float;
+	double b, expected_result_double;
+	long long int c, expected_result_llint;
+	unsigned long long int u;
+	__int128_t d;
+	__uint128_t u128;
+	unsigned long long expected_result_uint128[2] ;
+	__float128 e;
+	long double ld;     // another 128-bit float version
+
+	union conv_t {
+		float a;
+		double b;
+		long long int c;
+		long long int128[2] ;
+		unsigned long long uint128[2] ;
+		unsigned long long int u;
+		__int128_t d;
+		__uint128_t u128;
+		__float128 e;
+		long double ld;     // another 128-bit float version
+	} conv, conv_result;
+
+ 
+	c = 20;
+	expected_result_llint = 20.00000;
+	a = conv_i_2_fp (c);
+
+	if (a != expected_result_llint) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fp(%lld) = %10.5f\n", c, a);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_llint);
+ #else
+		abort();
+#endif
+	}
+
+	c = 20;
+	expected_result_double = 20.00000;
+	b = conv_i_2_fpd (c);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fpd(%lld) = %10.5f\n", d, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+	u = 20;
+	expected_result_double = 20.00000;
+	b = conv_ui_2_fpd (u);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_ui_2_fpd(%llu) = %10.5f\n", u, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+  /* Currently printing 128-bit float does not work correctly  */
+  d = -3210;
+  d = (d * 10000000000) + 9876543210;
+  conv_result.e = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0xc02bd2f9068d1160;
+  expected_result_uint128[0] = 0x0;
+  
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(-32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  d = 123;
+  d = (d * 10000000000) + 1234567890;
+  conv_result.ld = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x4271eab4c8ed2000;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(1231234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  /* Currently printing 128-bit float does not work correctly  */
+  u128 = 8760;
+  u128 = (u128 * 10000000000) + 1234567890;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+  expected_result_uint128[1] = 0x402d3eb101df8b48;
+  expected_result_uint128[0] = 0x0;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(87601234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  /* Currently printing 128-bit float does not work correctly  */
+  u128 = 3210;
+  u128 = (u128 * 10000000000) + 9876543210;
+  expected_result_uint128[1] = 0x402bd3429c8feea0;
+  expected_result_uint128[0] = 0x0;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 12345.6789;
+  expected_result_uint128[1] = 0x1407374883526960;
+  expected_result_uint128[0] = 0x3039;
+
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) =  ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = -6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0xffffffffffffe57b;
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x1a85;
+  conv_result.d = conv_fp128_2_ui128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_ui128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+	  
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  return 0;
+}
diff --git a/libgcc/config/rs6000/fixkfti.c b/libgcc/config/rs6000/fixkfti-sw.c
similarity index 96%
rename from libgcc/config/rs6000/fixkfti.c
rename to libgcc/config/rs6000/fixkfti-sw.c
index a22286228aa..d6bbbf889b7 100644
--- a/libgcc/config/rs6000/fixkfti.c
+++ b/libgcc/config/rs6000/fixkfti-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TItype
-__fixkfti (TFtype a)
+__fixkfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/fixunskfti.c b/libgcc/config/rs6000/fixunskfti-sw.c
similarity index 96%
rename from libgcc/config/rs6000/fixunskfti.c
rename to libgcc/config/rs6000/fixunskfti-sw.c
index ab232d92d24..d803936e48a 100644
--- a/libgcc/config/rs6000/fixunskfti.c
+++ b/libgcc/config/rs6000/fixunskfti-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 UTItype
-__fixunskfti (TFtype a)
+__fixunskfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/float128-hw.c b/libgcc/config/rs6000/float128-hw.c
index 8705b53e22a..be8bd07e853 100644
--- a/libgcc/config/rs6000/float128-hw.c
+++ b/libgcc/config/rs6000/float128-hw.c
@@ -86,6 +86,30 @@ __floatdikf_hw (DItype_ppc a)
   return (TFtype) a;
 }
 
+TFtype
+__floattikf_hw (TItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TFtype
+__floatuntikf_hw (UTItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TItype_ppc
+__fixkfti_hw (TFtype a)
+{
+  return (TItype_ppc) a;
+}
+
+UTItype_ppc
+__fixunskfti_hw (TFtype a)
+{
+  return (UTItype_ppc) a;
+}
+
 TFtype
 __floatundikf_hw (UDItype_ppc a)
 {
diff --git a/libgcc/config/rs6000/float128-ifunc.c b/libgcc/config/rs6000/float128-ifunc.c
index c2f65912a74..c221be2c864 100644
--- a/libgcc/config/rs6000/float128-ifunc.c
+++ b/libgcc/config/rs6000/float128-ifunc.c
@@ -46,14 +46,9 @@
 #endif
 
 #define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW : SW)
+#define SW_OR_HW_ISA3_1(SW, HW) (__builtin_cpu_supports ("arch_3_1") ? HW : SW)
 
 /* Resolvers.  */
-
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 static __typeof__ (__addkf3_sw) *
 __addkf3_resolve (void)
 {
@@ -102,6 +97,18 @@ __floatdikf_resolve (void)
   return SW_OR_HW (__floatdikf_sw, __floatdikf_hw);
 }
 
+static __typeof__ (__floattikf_sw) *
+__floattikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floattikf_sw, __floattikf_hw);
+}
+
+static __typeof__ (__floatuntikf_sw) *
+__floatuntikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floatuntikf_sw, __floatuntikf_hw);
+}
+
 static __typeof__ (__floatunsikf_sw) *
 __floatunsikf_resolve (void)
 {
@@ -114,6 +121,19 @@ __floatundikf_resolve (void)
   return SW_OR_HW (__floatundikf_sw, __floatundikf_hw);
 }
 
+
+static __typeof__ (__fixkfti_sw) *
+__fixkfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixkfti_sw, __fixkfti_hw);
+}
+
+static __typeof__ (__fixunskfti_sw) *
+__fixunskfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixunskfti_sw, __fixunskfti_hw);
+}
+
 static __typeof__ (__fixkfsi_sw) *
 __fixkfsi_resolve (void)
 {
@@ -303,6 +323,18 @@ TFtype __floatsikf (SItype_ppc)
 TFtype __floatdikf (DItype_ppc)
   __attribute__ ((__ifunc__ ("__floatdikf_resolve")));
 
+TFtype __floattikf (TItype_ppc)
+  __attribute__ ((__ifunc__ ("__floattikf_resolve")));
+
+TFtype __floatuntikf (UTItype_ppc)
+  __attribute__ ((__ifunc__ ("__floatuntikf_resolve")));
+
+TItype_ppc __fixkfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixkfti_resolve")));
+
+UTItype_ppc __fixunskfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixunskfti_resolve")));
+
 TFtype __floatunsikf (USItype_ppc)
   __attribute__ ((__ifunc__ ("__floatunsikf_resolve")));
 
diff --git a/libgcc/config/rs6000/float128-sed b/libgcc/config/rs6000/float128-sed
index d9a089ff9ba..c0fcddb1959 100644
--- a/libgcc/config/rs6000/float128-sed
+++ b/libgcc/config/rs6000/float128-sed
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi/g
 s/__fixunstfdi/__fixunskfdi/g
 s/__fixunstfsi/__fixunskfsi/g
 s/__floatditf/__floatdikf/g
+s/__floattitf/__floattikf/g
+s/__floatuntitf/__floatuntikf/g
+s/__fixtfti/__fixkfti/g
+s/__fixunstfti/__fixunskfti/g
 s/__floatsitf/__floatsikf/g
 s/__floatunditf/__floatundikf/g
 s/__floatunsitf/__floatunsikf/g
diff --git a/libgcc/config/rs6000/float128-sed-hw b/libgcc/config/rs6000/float128-sed-hw
index acf36b0c17d..3d2bf556da1 100644
--- a/libgcc/config/rs6000/float128-sed-hw
+++ b/libgcc/config/rs6000/float128-sed-hw
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi_sw/g
 s/__fixunstfdi/__fixunskfdi_sw/g
 s/__fixunstfsi/__fixunskfsi_sw/g
 s/__floatditf/__floatdikf_sw/g
+s/__floattitf/__floattikf_sw/g
+s/__floatuntitf/__floatuntikf_sw/g
+s/__fixtfti/__fixkfti_sw/g
+s/__fixunstfti/__fixunskfti_sw/g
 s/__floatsitf/__floatsikf_sw/g
 s/__floatunditf/__floatundikf_sw/g
 s/__floatunsitf/__floatunsikf_sw/g
diff --git a/libgcc/config/rs6000/floattikf.c b/libgcc/config/rs6000/floattikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floattikf.c
rename to libgcc/config/rs6000/floattikf-sw.c
index 4e8c40cfbe4..110706352bb 100644
--- a/libgcc/config/rs6000/floattikf.c
+++ b/libgcc/config/rs6000/floattikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floattikf (TItype i)
+__floattikf_sw (TItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/floatuntikf.c b/libgcc/config/rs6000/floatuntikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floatuntikf.c
rename to libgcc/config/rs6000/floatuntikf-sw.c
index 8bfba4267d4..5e712a67e26 100644
--- a/libgcc/config/rs6000/floatuntikf.c
+++ b/libgcc/config/rs6000/floatuntikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floatuntikf (UTItype i)
+__floatuntikf_sw (UTItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/quad-float128.h b/libgcc/config/rs6000/quad-float128.h
index 32ef328a8ea..24712b9277f 100644
--- a/libgcc/config/rs6000/quad-float128.h
+++ b/libgcc/config/rs6000/quad-float128.h
@@ -87,19 +87,18 @@ extern USItype_ppc __fixunskfsi_sw (TFtype);
 extern UDItype_ppc __fixunskfdi_sw (TFtype);
 extern TFtype __floatsikf_sw (SItype_ppc);
 extern TFtype __floatdikf_sw (DItype_ppc);
+extern TFtype __floattikf_sw (TItype_ppc);
 extern TFtype __floatunsikf_sw (USItype_ppc);
 extern TFtype __floatundikf_sw (UDItype_ppc);
+extern TFtype __floatuntikf_sw (UTItype_ppc);
+extern TItype_ppc __fixkfti_sw (TFtype);
+extern UTItype_ppc __fixunskfti_sw (TFtype);
 extern IBM128_TYPE __extendkftf2_sw (TFtype);
 extern TFtype __trunctfkf2_sw (IBM128_TYPE);
 extern TCtype __mulkc3_sw (TFtype, TFtype, TFtype, TFtype);
 extern TCtype __divkc3_sw (TFtype, TFtype, TFtype, TFtype);
 
 #ifdef _ARCH_PPC64
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 extern TItype_ppc __fixkfti (TFtype);
 extern UTItype_ppc __fixunskfti (TFtype);
 extern TFtype __floattikf (TItype_ppc);
@@ -130,8 +129,12 @@ extern USItype_ppc __fixunskfsi_hw (TFtype);
 extern UDItype_ppc __fixunskfdi_hw (TFtype);
 extern TFtype __floatsikf_hw (SItype_ppc);
 extern TFtype __floatdikf_hw (DItype_ppc);
+extern TFtype __floattikf_hw (TItype_ppc);
 extern TFtype __floatunsikf_hw (USItype_ppc);
 extern TFtype __floatundikf_hw (UDItype_ppc);
+extern TFtype __floatuntikf_hw (UTItype_ppc);
+extern TItype_ppc __fixkfti_hw (TFtype);
+extern UTItype_ppc __fixunskfti_hw (TFtype);
 extern IBM128_TYPE __extendkftf2_hw (TFtype);
 extern TFtype __trunctfkf2_hw (IBM128_TYPE);
 extern TCtype __mulkc3_hw (TFtype, TFtype, TFtype, TFtype);
@@ -162,8 +165,12 @@ extern USItype_ppc __fixunskfsi (TFtype);
 extern UDItype_ppc __fixunskfdi (TFtype);
 extern TFtype __floatsikf (SItype_ppc);
 extern TFtype __floatdikf (DItype_ppc);
+extern TFtype __floattikf (TItype_ppc);
 extern TFtype __floatunsikf (USItype_ppc);
 extern TFtype __floatundikf (UDItype_ppc);
+extern TFtype __floatuntikf (UTItype_ppc);
+extern TItype_ppc __fixkfti (TFtype);
+extern UTItype_ppc __fixunskfti (TFtype);
 extern IBM128_TYPE __extendkftf2 (TFtype);
 extern TFtype __trunctfkf2 (IBM128_TYPE);
 
diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
index d5413445189..325b22fd49e 100644
--- a/libgcc/config/rs6000/t-float128
+++ b/libgcc/config/rs6000/t-float128
@@ -23,7 +23,8 @@ fp128_softfp_shared_obj	= $(addsuffix -sw_s$(objext),$(fp128_softfp_funcs))
 fp128_softfp_obj	= $(fp128_softfp_static_obj) $(fp128_softfp_shared_obj)
 
 # New functions for software emulation
-fp128_ppc_funcs		= floattikf floatuntikf fixkfti fixunskfti \
+fp128_ppc_funcs		= floattikf-sw floatuntikf-sw \
+			  fixkfti-sw fixunskfti-sw \
 			  extendkftf2-sw trunctfkf2-sw \
 			  sfp-exceptions _mulkc3 _divkc3 _powikf2
 
-- 
2.25.1



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 1/5] rs6000, Add 128-bit sign extension support
  2020-08-11 19:22 ` [Patch 1/5] rs6000, Add 128-bit sign extension support Carl Love
@ 2020-08-13 17:36   ` Segher Boessenkool
  2020-08-13 18:09     ` Carl Love
  0 siblings, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2020-08-13 17:36 UTC (permalink / raw)
  To: Carl Love; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt, cel

Hi!

On Tue, Aug 11, 2020 at 12:22:37PM -0700, Carl Love wrote:
> +/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1.  */

What does this mean?  Not defined in GCC before now?  Does it need
backporting?  Not defined in older versions of the ELFv2 ABI (or vector
doc) and we do not want a backport?

> +  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 */

Same (also "work work").

> +uThe following sign extension builtins are provided.

(stray "u")

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
> @@ -0,0 +1,128 @@
> +/* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } */

/* { dg-do run { target { lp64 && p9vector_hw } } } */

or such; or do you require Linux actually?

> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */

Is -save-temps needed?  Not for the scan-assembler at least.

Okay for trunk with those details taken care of.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [Patch 1/5] rs6000, Add 128-bit sign extension support
  2020-08-13 17:36   ` Segher Boessenkool
@ 2020-08-13 18:09     ` Carl Love
  2020-08-13 18:29       ` Segher Boessenkool
  0 siblings, 1 reply; 27+ messages in thread
From: Carl Love @ 2020-08-13 18:09 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt, cel

Segher:

On Thu, 2020-08-13 at 12:36 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 11, 2020 at 12:22:37PM -0700, Carl Love wrote:
> > +/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1.  */
> 
> What does this mean?  Not defined in GCC before now?  Does it need
> backporting?  Not defined in older versions of the ELFv2 ABI (or vector
> doc) and we do not want a backport?
> 
> > +  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 */

The builtins

vector signed int vec_signexti (vector signed char a)
vector signed long long vec_signextll (vector signed char a)
vector signed int vec_signexti (vector signed short a)
vector signed long long vec_signextll (vector signed short a)
vector signed long long vec_signextll (vector signed int a)

were defined in the function prototypes directory in Box, in the
document "RFC 2608 - 128-bit Binary Integer Operations".  That
document describes the new P10 builtins.  However, this subset of the
new P10 builtins can be implemented with existing Power 9
instructions, which is what the comment was trying to say.  That is
probably more detail than the GCC code comment needs, so it is best to
change the comment to something like "ISA 3.0 sign extend builtins".

My thought for calling it out is that they could be back ported to an
earlier GCC version since they use Power 9 instructions but it is
probably not worth the effort unless there is an explicit request for
them. 
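For reference, the per-element behaviour of the sign extension builtins listed above can be modelled in scalar C.  This sketch assumes the Power ISA 3.0 semantics of vextsb2w, vextsh2w, vextsb2d, vextsh2d and vextsw2d: each result element is the sign extension of the least-significant byte, halfword or word of the corresponding input element.  The helper names are hypothetical, and elements are passed as integer values so that byte-order questions do not arise.

```c
#include <stdint.h>

/* Sign-extend the low BITS bits of V, for 1 <= BITS <= 63.  The
   xor/subtract idiom avoids implementation-defined narrowing casts.  */
static int64_t sext_low (uint64_t v, unsigned bits)
{
  uint64_t sign = UINT64_C (1) << (bits - 1);
  v &= (sign << 1) - 1;              /* keep only the low BITS bits */
  return (int64_t) (v ^ sign) - (int64_t) sign;
}

/* One 32-bit element of vec_signexti: BITS is 8 for a vector signed
   char argument (vextsb2w) and 16 for vector signed short (vextsh2w).  */
static int32_t signexti_lane (uint32_t lane, unsigned bits)
{
  return (int32_t) sext_low (lane, bits);
}

/* One 64-bit element of vec_signextll: BITS is 8, 16 or 32 for vector
   signed char, short or int (vextsb2d, vextsh2d, vextsw2d).  */
static int64_t signextll_lane (uint64_t lane, unsigned bits)
{
  return sext_low (lane, bits);
}
```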

                 Carl 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 1/5] rs6000, Add 128-bit sign extension support
  2020-08-13 18:09     ` Carl Love
@ 2020-08-13 18:29       ` Segher Boessenkool
  2020-08-13 22:11         ` [EXTERNAL] " will schmidt
  0 siblings, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2020-08-13 18:29 UTC (permalink / raw)
  To: Carl Love; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt

On Thu, Aug 13, 2020 at 11:09:10AM -0700, Carl Love wrote:
> The builtins
> 
> vector signed int vec_signexti (vector signed char a)
> vector signed long long vec_signextll (vector signed char a)
> vector signed int vec_signexti (vector signed short a)
> vector signed long long vec_signextll (vector signed short a)
> vector signed long long vec_signextll (vector signed int a)
> 
> were defined in the function prototypes directory in box called "RFC
> 2608 - 128-bit Binary Integer Operations".  The document the new P10
> builtins.  However, this subset of the newly defined builtins for P10
> can be implemented with existing Power 9 instructions.  That was the
> point of the comment.

Ah, I see :-)

> That is probably a level of detail that is not
> really needed in the GCC code comment.  Probably best to just change
> the comment to read something like "ISA 3.0 sign extend builtins". 

Sounds good.

> My thought for calling it out is that they could be backported to an
> earlier GCC version, since they use Power 9 instructions, but it is
> probably not worth the effort unless there is an explicit request for
> them.

Yeah.  Thanks for the explanation!


Segher


* Re: [EXTERNAL] Re: [Patch 1/5] rs6000, Add 128-bit sign extension support
  2020-08-13 18:29       ` Segher Boessenkool
@ 2020-08-13 22:11         ` will schmidt
  2020-08-13 22:55           ` Segher Boessenkool
  0 siblings, 1 reply; 27+ messages in thread
From: will schmidt @ 2020-08-13 22:11 UTC (permalink / raw)
  To: Segher Boessenkool, Carl Love; +Cc: dje.gcc, gcc-patches, Bill Schmidt

On Thu, 2020-08-13 at 13:29 -0500, Segher Boessenkool wrote:
> On Thu, Aug 13, 2020 at 11:09:10AM -0700, Carl Love wrote:
> > The builtins
> > 
> > vector signed int vec_signexti (vector signed char a)
> > vector signed long long vec_signextll (vector signed char a)
> > vector signed int vec_signexti (vector signed short a)
> > vector signed long long vec_signextll (vector signed short a)
> > vector signed long long vec_signextll (vector signed int a)
> > 
> > were defined in the function prototypes directory in Box, in the
> > document "RFC 2608 - 128-bit Binary Integer Operations".  That
> > document describes the new P10 builtins.  However, this subset of the
> > newly defined builtins for P10 can be implemented with existing
> > Power 9 instructions.  That was the point of the comment.
> 
> Ah, I see :-)
> 
> > That is probably a level of detail that is not really needed in the
> > GCC code comment.  Probably best to just change the comment to read
> > something like "ISA 3.0 sign extend builtins".
> 
> Sounds good.

As long as there are no issues defining the builtins for 3.0 here.
AFAIK they are not documented in ISA 3.0.  It is a happy accident
that these ISA 3.1 builtins can be implemented with existing support.

> 
> > My thought for calling it out is that they could be backported to an
> > earlier GCC version, since they use Power 9 instructions, but it is
> > probably not worth the effort unless there is an explicit request for
> > them.
> 
> Yeah.  Thanks for the explanation!
> 
> 
> Segher



* Re: [Patch 1/5] rs6000, Add 128-bit sign extension support
  2020-08-13 22:11         ` [EXTERNAL] " will schmidt
@ 2020-08-13 22:55           ` Segher Boessenkool
  2020-08-13 23:53             ` [EXTERNAL] " will schmidt
  0 siblings, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2020-08-13 22:55 UTC (permalink / raw)
  To: will schmidt; +Cc: Carl Love, dje.gcc, gcc-patches, Bill Schmidt

Hi!

On Thu, Aug 13, 2020 at 05:11:11PM -0500, will schmidt wrote:
> > > That is probably a level of detail that is not really needed in
> > > the GCC code comment.  Probably best to just change the comment to
> > > read something like "ISA 3.0 sign extend builtins".
> > 
> > Sounds good.
> 
> As long as there are no issues defining the builtins for 3.0 here.
> AFAIK they are not documented in ISA 3.0.  It is a happy accident
> that these ISA 3.1 builtins can be implemented with existing support.

There are *no* builtins defined in the ISA!  The insns are just ISA 3.0
instructions.
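The following is a scalar C sketch of what those ISA 3.0 instructions compute. It is not GCC code, and the model_* names are hypothetical. Each function models one instruction on a single containing element: the least significant byte, halfword or word is sign-extended to fill the element. The sketch deliberately says nothing about which vector element numbers the builtins expose. The mapping in the comment follows the vsx_sign_extend_* patterns named in rs6000-builtin.def.

```c
#include <stdint.h>

/* Scalar models of the ISA 3.0 (Power 9) sign-extension instructions.
   Assumed builtin mapping:
     vec_signexti  (vector signed char)  -> vextsb2w
     vec_signextll (vector signed char)  -> vextsb2d
     vec_signexti  (vector signed short) -> vextsh2w
     vec_signextll (vector signed short) -> vextsh2d
     vec_signextll (vector signed int)   -> vextsw2d
   The narrowing casts rely on GCC's modulo conversion to signed types.  */

static int32_t model_vextsb2w (uint32_t w) { return (int8_t) (w & 0xff); }
static int32_t model_vextsh2w (uint32_t w) { return (int16_t) (w & 0xffff); }
static int64_t model_vextsb2d (uint64_t d) { return (int8_t) (d & 0xff); }
static int64_t model_vextsh2d (uint64_t d) { return (int16_t) (d & 0xffff); }
static int64_t model_vextsw2d (uint64_t d)
{
  return (int32_t) (d & 0xffffffffu);
}
```

Only the low part of each element matters. For example, 0x12345680 becomes -128 because its low byte is 0x80.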


Segher


* Re: [Patch 2/5] rs6000, 128-bit multiply, divide, modulo, shift, compare
  2020-08-11 19:22 ` [Patch 2/5] rs6000, 128-bit multiply, divide, modulo, shift, compare Carl Love
@ 2020-08-13 23:46   ` will schmidt
  2020-08-20  1:06     ` Segher Boessenkool
  0 siblings, 1 reply; 27+ messages in thread
From: will schmidt @ 2020-08-13 23:46 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches; +Cc: Bill Schmidt, cel

On Tue, 2020-08-11 at 12:22 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 2 adds support for multiply, divide, modulo, shift, and compare
> of 128-bit integers, covering both the instructions and the builtins.
> 
>              Carl Love
> 
> 
> -------------------------------------------------------
> rs6000, 128-bit multiply, divide, shift, compare
> 
> gcc/ChangeLog
> 
> 2020-08-10  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
> 	for new builtins.

Looks like there is also a change to the parameters for vec_rlnm(a,b,c)
here.  

> 	* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
> 	UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
ok

> 	(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
> 	altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
> 	altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
> 	altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
> 	altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
> 	define_insn.
> 	(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
> 	vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
> 	altivec_vrlqnm): New define_expands.

Also a whitespace fix in there.
ok.

> 	* config/rs6000/rs6000-builtin.def (BU_P10_P, BU_P10_128BIT_1,
> 	BU_P10_128BIT_2, BU_P10_128BIT_3): New macro definitions.


Is this consistent with the other recent changes that reworked some of
those macro definition names?


> 	(VCMPEQUT_P, VCMPGTST_P, VCMPGTUT_P): Add macro expansions.

> 	(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
> 	CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
> 	VCMPAET_P): New macro expansions.

> 	(VSIGNEXTSD2Q,VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ, VSLQ,

Missing space after the comma.

> 	VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,


> 	MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.

> 	(VRLQ, VSLQ, VSRQ, VSRAQ, SIGNEXT): New overload expansions.


DIVE and MOD are missing.



> 	* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
> 	P10_BUILTIN_VCMPEQUT, P10_BUILTIN_CMPGE_1TI,

P10_BUILTIN_VCMPEQUT is listed twice.

> 	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
> 	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,

Missing P10_BUILTIN_CMPLE_U1TI.

> 	P10_BUILTIN_128BIT_DIV_V1TI, P10_BUILTIN_128BIT_UDIV_V1TI,
> 	P10_BUILTIN_128BIT_VMULESD, P10_BUILTIN_128BIT_VMULEUD,
> 	P10_BUILTIN_128BIT_VMULOSD, P10_BUILTIN_128BIT_VMULOUD,

> 	P10_BUILTIN_VNOR_V1TI, P10_BUILTIN_VNOR_V1TI_UNS,

> 	P10_BUILTIN_128BIT_VRLQ, P10_BUILTIN_128BIT_VRLQMI,
> 	P10_BUILTIN_128BIT_VRLQNM, P10_BUILTIN_128BIT_VSLQ,

> 	P10_BUILTIN_128BIT_VSRQ, P10_BUILTIN_128BIT_VSRAQ,

> 	P10_BUILTIN_VCMPGTUT_P, P10_BUILTIN_VCMPGTST_P,
> 	P10_BUILTIN_VCMPEQUT_P, P10_BUILTIN_VCMPGTUT_P,
> 	P10_BUILTIN_VCMPGTST_P, P10_BUILTIN_CMPNET,

> 	P10_BUILTIN_VCMPNET_P, P10_BUILTIN_VCMPAET_P,
> 	P10_BUILTIN_128BIT_VSIGNEXTSD2Q, P10_BUILTIN_128BIT_DIVES_V1TI,
> 	P10_BUILTIN_128BIT_MODS_V1TI, P10_BUILTIN_128BIT_MODU_V1TI):
> 	New overloaded definitions.


> 	(int_ftype_int_v1ti_v1ti) [P10_BUILTIN_VCMPEQUT,
> 	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
> 	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
> 	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
> 	P10_BUILTIN_CMPLE_U1TI, E_V1TImode]: New case statements.

Those are part of (rs6000_gimple_fold_builtin). 

It may also be worth a sniff check of the generated code to ensure the
folding behaves properly.


> 	(int_ftype_int_v1ti_v1ti) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
> 	New assignments.

ok.


missing (altivec_init_builtins): Add E_V1TImode case.


> 	(int_ftype_int_v1ti_v1ti)[P10_BUILTIN_128BIT_VMULEUD,
> 	P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
> 	P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
> 	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.

Those are part of (builtin_function_type).


> 	* config/rs6000/r6000.c (rs6000_builtin_mask_calculate): New
> 	TARGET_TI_VECTOR_OPS definition.

> 	(rs6000_option_override_internal): Add if TARGET_POWER10 statement.

comment below.


> 	(rs6000_handle_altivec_attribute)[ E_TImode, E_V1TImode]: New case
> 	statements.
> 	(rs6000_opt_masks): Add ti-vector-ops entry.

ok.

> 	* config/rs6000/r6000.h (MASK_TI_VECTOR_OPS, RS6000_BTM_P10_128BIT,
> 	RS6000_BTM_TI_VECTOR_OPS, bool_V1TI_type_node): New defines.

> 	(rs6000_builtin_type_index): New enum value RS6000_BTI_bool_V1TI.

> 	* config/rs6000/rs6000.opt: New mti-vector-ops entry.

comment below.

> 	* config/rs6000/vector.md (vector_eqv1ti, vector_gtv1ti,
> 	vector_nltv1ti, vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti,
> 	vector_ngtuv1ti, vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
> 	vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
> 	vlshrv1ti3, vashrv1ti3): New define_expands.

ok

> 	* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
> 	UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
> 	UNSPEC_VSX_MODUQ, UNSPEC_XXSWAPD_V1TI): New unspecs.

comment below.

> 	(vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti, vsx_diveu_v1ti,
> 	vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti, vsx_sign_extend_v2di_v1ti):
> 	New define_insns.

> 	(vcmpnet): New define_expand.

> 	* gcc/doc/extend.texi: Add documentation for the new builtins vec_rl,
> 	vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
> 	vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,
> 	vec_cmpge, vec_cmple, vec_all_eq, vec_all_ne, vec_all_gt, vec_all_lt,
> 	vec_all_ge, vec_all_le, vec_any_eq, vec_any_ne, vec_any_gt, vec_any_lt,
> 	vec_any_ge, vec_any_le.

comment below.

> 
> gcc/testsuite/ChangeLog
> 
> 2020-08-10  Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/int_128bit-runnable.c: New test file.
> ---
>  gcc/config/rs6000/altivec.h                   |    6 +-
>  gcc/config/rs6000/altivec.md                  |  242 +-
>  gcc/config/rs6000/rs6000-builtin.def          |   77 +
>  gcc/config/rs6000/rs6000-call.c               |  150 +-
>  gcc/config/rs6000/rs6000.c                    |   17 +-
>  gcc/config/rs6000/rs6000.h                    |    6 +-
>  gcc/config/rs6000/rs6000.opt                  |    4 +
>  gcc/config/rs6000/vector.md                   |  199 ++
>  gcc/config/rs6000/vsx.md                      |   99 +-
>  gcc/doc/extend.texi                           |  174 ++
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 2254 +++++++++++++++++


The path into the testsuite subdir looks strange there.

>  11 files changed, 3217 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index 09320df14ca..a121004b3af 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -183,7 +183,7 @@
>  #define vec_recipdiv __builtin_vec_recipdiv
>  #define vec_rlmi __builtin_vec_rlmi
>  #define vec_vrlnm __builtin_vec_rlnm
> -#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
> +#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))

Per above, I don't see this change called out in the ChangeLog.
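The change is not cosmetic. The two macro bodies put b and c in different bit positions of the second __builtin_vec_rlnm operand, so every existing vec_rlnm (a, b, c) caller would have its last two arguments reinterpreted. The following is a minimal C sketch. The rlnm_operand_* helper names are hypothetical, and each helper only copies one of the two macro bodies.

```c
#include <stdint.h>

/* Second operand passed to __builtin_vec_rlnm by the old and new
   altivec.h definitions of vec_rlnm (a, b, c).  */
static uint32_t rlnm_operand_old (uint32_t b, uint32_t c)
{
  return (c << 8) | b;   /* old: ((c)<<8)|(b) */
}

static uint32_t rlnm_operand_new (uint32_t b, uint32_t c)
{
  return (b << 8) | c;   /* new: ((b)<<8)|(c) */
}
```

The new form with (b, c) equals the old form with (c, b). Unless the old ordering was itself a bug, this deserves its own ChangeLog entry and probably its own patch.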


>  #define vec_rsqrt __builtin_vec_rsqrt
>  #define vec_rsqrte __builtin_vec_rsqrte
>  #define vec_signed __builtin_vec_vsigned
> @@ -694,6 +694,10 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
> 
>  #ifdef _ARCH_PWR10
> +#define vec_signextq  __builtin_vec_vsignextq
> +#define vec_dive __builtin_vec_dive
> +#define vec_mod  __builtin_vec_mod
> +
>  /* May modify these macro definitions if future capabilities overload
>     with support for different vector argument and result types.  */
>  #define vec_cntlzm(a, b)	__builtin_altivec_vclzdm (a, b)
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 0a2e634d6b0..2763d920828 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -39,12 +39,16 @@
>     UNSPEC_VMULESH
>     UNSPEC_VMULEUW
>     UNSPEC_VMULESW
> +   UNSPEC_VMULEUD
> +   UNSPEC_VMULESD
>     UNSPEC_VMULOUB
>     UNSPEC_VMULOSB
>     UNSPEC_VMULOUH
>     UNSPEC_VMULOSH
>     UNSPEC_VMULOUW
>     UNSPEC_VMULOSW
> +   UNSPEC_VMULOUD
> +   UNSPEC_VMULOSD
>     UNSPEC_VPKPX
>     UNSPEC_VPACK_SIGN_SIGN_SAT
>     UNSPEC_VPACK_SIGN_UNS_SAT
> @@ -628,6 +632,14 @@
>    "vcmpequ<VI_char> %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "altivec_eqv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(eq:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
> +		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vcmpequq %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_gt<mode>"
>    [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
>  	(gt:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
> @@ -636,6 +648,14 @@
>    "vcmpgts<VI_char> %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_gtv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(gt:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
> +		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vcmpgtsq %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_gtu<mode>"
>    [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
>  	(gtu:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
> @@ -644,6 +664,14 @@
>    "vcmpgtu<VI_char> %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_gtuv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
> +		  (match_operand:V1TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vcmpgtuq %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_eqv4sf"
>    [(set (match_operand:V4SF 0 "altivec_register_operand" "=v")
>  	(eq:V4SF (match_operand:V4SF 1 "altivec_register_operand" "v")
> @@ -1687,6 +1715,19 @@
>   DONE;
>  })
> 
> +(define_expand "vec_widen_umult_even_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
> + DONE;
> +})
> +
>  (define_expand "vec_widen_smult_even_v4si"
>    [(use (match_operand:V2DI 0 "register_operand"))
>     (use (match_operand:V4SI 1 "register_operand"))
> @@ -1695,11 +1736,24 @@
>  {
>    if (BYTES_BIG_ENDIAN)
>      emit_insn (gen_altivec_vmulesw (operands[0], operands[1], operands[2]));
> - else
> +  else
>      emit_insn (gen_altivec_vmulosw (operands[0], operands[1], operands[2]));
>    DONE;
>  })
> 
> +(define_expand "vec_widen_smult_even_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
> + else
> +    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
>  (define_expand "vec_widen_umult_odd_v16qi"
>    [(use (match_operand:V8HI 0 "register_operand"))
>     (use (match_operand:V16QI 1 "register_operand"))
> @@ -1765,6 +1819,19 @@
>    DONE;
>  })
> 
> +(define_expand "vec_widen_umult_odd_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
>  (define_expand "vec_widen_smult_odd_v4si"
>    [(use (match_operand:V2DI 0 "register_operand"))
>     (use (match_operand:V4SI 1 "register_operand"))
> @@ -1778,6 +1845,19 @@
>    DONE;
>  })
> 
> +(define_expand "vec_widen_smult_odd_v2di"
> +  [(use (match_operand:V1TI 0 "register_operand"))
> +   (use (match_operand:V2DI 1 "register_operand"))
> +   (use (match_operand:V2DI 2 "register_operand"))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
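The following scalar C sketch shows why the v2di even/odd expanders above choose vmuleud or vmuloud based on BYTES_BIG_ENDIAN. It assumes the usual little-endian element reversal, where programmer element 0 is hardware doubleword 1. It uses GCC's unsigned __int128 for the 128-bit product, and all names are hypothetical.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical V2DI register model: hw[0] is doubleword 0 in the
   hardware's big-endian numbering (the "even" one), hw[1] the "odd" one.  */
typedef struct { uint64_t hw[2]; } v2du_model;

/* vmuleud / vmuloud: 64x64 -> 128-bit unsigned product of the even
   (respectively odd) doublewords.  */
static u128 model_vmuleud (v2du_model a, v2du_model b)
{
  return (u128) a.hw[0] * b.hw[0];
}

static u128 model_vmuloud (v2du_model a, v2du_model b)
{
  return (u128) a.hw[1] * b.hw[1];
}

/* Build a register from programmer-visible elements e0, e1; on
   little-endian the element order is reversed in the register.  */
static v2du_model from_elements (uint64_t e0, uint64_t e1, int big_endian)
{
  v2du_model v;
  v.hw[0] = big_endian ? e0 : e1;
  v.hw[1] = big_endian ? e1 : e0;
  return v;
}

/* Mirrors vec_widen_umult_even_v2di: multiply programmer element 0 of
   each operand, whichever hardware doubleword it lives in.  */
static u128 widen_umult_even (v2du_model a, v2du_model b, int big_endian)
{
  return big_endian ? model_vmuleud (a, b) : model_vmuloud (a, b);
}
```

With the swap, the builtin returns the product of element 0 on both endiannesses. The odd and signed variants follow the same pattern.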
>  (define_insn "altivec_vmuleub"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
> @@ -1859,6 +1939,15 @@
>    "vmuleuw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmuleud"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULEUD))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vmuleud %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  (define_insn "altivec_vmulouw"
>    [(set (match_operand:V2DI 0 "register_operand" "=v")
>         (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -1868,6 +1957,15 @@
>    "vmulouw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmuloud"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULOUD))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vmuloud %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  (define_insn "altivec_vmulesw"
>    [(set (match_operand:V2DI 0 "register_operand" "=v")
>         (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -1877,6 +1975,15 @@
>    "vmulesw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmulesd"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULESD))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vmulesd %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  (define_insn "altivec_vmulosw"
>    [(set (match_operand:V2DI 0 "register_operand" "=v")
>         (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -1886,6 +1993,15 @@
>    "vmulosw %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
> 
> +(define_insn "altivec_vmulosd"
> +  [(set (match_operand:V1TI 0 "register_operand" "=v")
> +       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> +                     (match_operand:V2DI 2 "register_operand" "v")]
> +                    UNSPEC_VMULOSD))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vmulosd %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
>  ;; Vector pack/unpack
>  (define_insn "altivec_vpkpx"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
> @@ -1979,6 +2095,15 @@
>    "vrl<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vrlq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +;; rotate amount in needs to be in bits[57:63] of operand2.
> +  "vrlq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_insn "altivec_vrl<VI_char>mi"
>    [(set (match_operand:VIlong 0 "register_operand" "=v")
>          (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
> @@ -1989,6 +2114,33 @@
>    "vrl<VI_char>mi %0,%2,%3"
>    [(set_attr "type" "veclogical")])
> 
> +(define_expand "altivec_vrlqmi"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
> +		      (match_operand:V1TI 2 "vsx_register_operand")
> +		      (match_operand:V1TI 3 "vsx_register_operand")]
> +		     UNSPEC_VRLMI))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  /* Mask bit begin, end fields need to be in bits [41:55] of 128-bit operand2. */
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[3]));
> +  emit_insn(gen_altivec_vrlqmi_inst (operands[0], operands[1], operands[2], tmp));
> +  DONE;
> +})
> +
> +(define_insn "altivec_vrlqmi_inst"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +		      (match_operand:V1TI 2 "vsx_register_operand" "0")
> +		      (match_operand:V1TI 3 "vsx_register_operand" "v")]
> +		     UNSPEC_VRLMI))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vrlqmi %0,%1,%3"
> +  [(set_attr "type" "veclogical")])
> +
>  (define_insn "altivec_vrl<VI_char>nm"
>    [(set (match_operand:VIlong 0 "register_operand" "=v")
>          (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
> @@ -1998,6 +2150,31 @@
>    "vrl<VI_char>nm %0,%1,%2"
>    [(set_attr "type" "veclogical")])
> 
> +(define_expand "altivec_vrlqnm"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
> +		      (match_operand:V1TI 2 "vsx_register_operand")]
> +		     UNSPEC_VRLNM))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vrlqnm_inst (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
> +(define_insn "altivec_vrlqnm_inst"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +		      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +		     UNSPEC_VRLNM))]
> +  "TARGET_TI_VECTOR_OPS"
> +  ;; rotate and mask bits need to be in upper 64-bits of operand2.
> +  "vrlqnm %0,%1,%2"
> +  [(set_attr "type" "veclogical")])
> +
>  (define_insn "altivec_vsl"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -2042,6 +2219,15 @@
>    "vsl<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vslq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
> +  "vslq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_insn "*altivec_vsr<VI_char>"
>    [(set (match_operand:VI2 0 "register_operand" "=v")
>          (lshiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
> @@ -2050,6 +2236,15 @@
>    "vsr<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vsrq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
> +  "vsrq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_insn "*altivec_vsra<VI_char>"
>    [(set (match_operand:VI2 0 "register_operand" "=v")
>          (ashiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
> @@ -2058,6 +2253,15 @@
>    "vsra<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +(define_insn "altivec_vsraq"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
> +  "vsraq %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
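The comments in these patterns say the shift or rotate amount comes from bits 57:63 of the 128-bit operand 2. Assuming big-endian bit numbering (bit 0 is the most significant), those are the low 7 bits of the most significant doubleword. That is presumably why the vrlqmi/vrlqnm expanders apply xxswapd first: it moves an amount held in the other doubleword into place. The following is a scalar sketch using GCC's __int128. The model_* names are hypothetical, and this is not the GCC implementation.

```c
typedef unsigned __int128 u128;
typedef __int128 s128;

/* Bits 57:63 (big-endian numbering) of the shift-amount operand:
   the low 7 bits of its most significant doubleword.  */
static unsigned shift_amount (u128 vrb)
{
  return (unsigned) ((vrb >> 64) & 0x7f);
}

/* xxswapd: exchange the two doublewords.  */
static u128 model_xxswapd (u128 x)
{
  return (x << 64) | (x >> 64);
}

static u128 model_vslq (u128 a, u128 vrb) { return a << shift_amount (vrb); }
static u128 model_vsrq (u128 a, u128 vrb) { return a >> shift_amount (vrb); }

/* GCC implements >> on negative signed integers as an arithmetic shift.  */
static s128 model_vsraq (s128 a, u128 vrb) { return a >> shift_amount (vrb); }

static u128 model_vrlq (u128 a, u128 vrb)
{
  unsigned n = shift_amount (vrb);
  return n == 0 ? a : (a << n) | (a >> (128 - n));
}
```

Note that an amount of 5 held in the low doubleword shifts by 0. After model_xxswapd, the same value shifts by 5.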
>  (define_insn "altivec_vsr"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> @@ -2618,6 +2822,18 @@
>    "vcmpequ<VI_char>. %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "altivec_vcmpequt_p"
> +  [(set (reg:CC CR6_REGNO)
> +	(unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v")
> +			   (match_operand:V1TI 2 "altivec_register_operand" "v"))]
> +		   UNSPEC_PREDICATE))
> +   (set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(eq:V1TI (match_dup 1)
> +		 (match_dup 2)))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vcmpequq. %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_vcmpgts<VI_char>_p"
>    [(set (reg:CC CR6_REGNO)
>  	(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
> @@ -2630,6 +2846,18 @@
>    "vcmpgts<VI_char>. %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_vcmpgtst_p"
> +  [(set (reg:CC CR6_REGNO)
> +	(unspec:CC [(gt:CC (match_operand:V1TI 1 "register_operand" "v")
> +			   (match_operand:V1TI 2 "register_operand" "v"))]
> +		   UNSPEC_PREDICATE))
> +   (set (match_operand:V1TI 0 "register_operand" "=v")
> +	(gt:V1TI (match_dup 1)
> +		 (match_dup 2)))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vcmpgtsq. %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_vcmpgtu<VI_char>_p"
>    [(set (reg:CC CR6_REGNO)
>  	(unspec:CC [(gtu:CC (match_operand:VI2 1 "register_operand" "v")
> @@ -2642,6 +2870,18 @@
>    "vcmpgtu<VI_char>. %0,%1,%2"
>    [(set_attr "type" "veccmpfx")])
> 
> +(define_insn "*altivec_vcmpgtut_p"
> +  [(set (reg:CC CR6_REGNO)
> +	(unspec:CC [(gtu:CC (match_operand:V1TI 1 "register_operand" "v")
> +			    (match_operand:V1TI 2 "register_operand" "v"))]
> +		   UNSPEC_PREDICATE))
> +   (set (match_operand:V1TI 0 "register_operand" "=v")
> +	(gtu:V1TI (match_dup 1)
> +		  (match_dup 2)))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vcmpgtuq. %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])
> +
>  (define_insn "*altivec_vcmpeqfp_p"
>    [(set (reg:CC CR6_REGNO)
>  	(unspec:CC [(eq:CC (match_operand:V4SF 1 "register_operand" "v")
> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> index 667c2450d41..871da6c4cf7 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -1070,6 +1070,15 @@
>  		     | RS6000_BTC_UNARY),				\
>  		    CODE_FOR_ ## ICODE)			/* ICODE */
> 
> +
> +#define BU_P10_P(ENUM, NAME, ATTR, ICODE)				\
> +  RS6000_BUILTIN_P (P10_BUILTIN_ ## ENUM,		/* ENUM */	\
> +		    "__builtin_altivec_" NAME,		/* NAME */	\
> +		    RS6000_BTM_P10_128BIT,		/* MASK */	\
> +		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
> +		     | RS6000_BTC_PREDICATE),				\
> +		    CODE_FOR_ ## ICODE)			/* ICODE */
> +
>  #define BU_P10_OVERLOAD_1(ENUM, NAME)					\
>    RS6000_BUILTIN_1 (P10_BUILTIN_VEC_ ## ENUM,		/* ENUM */	\
>  		    "__builtin_vec_" NAME,		/* NAME */	\
> @@ -1152,6 +1161,30 @@
>  		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
>  		     | RS6000_BTC_BINARY),				\
>  		    CODE_FOR_ ## ICODE)			/* ICODE */
> +
> +#define BU_P10_128BIT_1(ENUM, NAME, ATTR, ICODE)			\
> +  RS6000_BUILTIN_1 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
> +		    "__builtin_altivec_" NAME,		/* NAME */	\
> +		    RS6000_BTM_P10_128BIT,		/* MASK */	\
> +		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
> +		     | RS6000_BTC_UNARY),				\
> +		    CODE_FOR_ ## ICODE)			/* ICODE */
> +
> +#define BU_P10_128BIT_2(ENUM, NAME, ATTR, ICODE)			\
> +  RS6000_BUILTIN_2 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
> +		    "__builtin_altivec_" NAME,		/* NAME */	\
> +		    RS6000_BTM_P10_128BIT,		/* MASK */	\
> +		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
> +		     | RS6000_BTC_BINARY),				\
> +		    CODE_FOR_ ## ICODE)			/* ICODE */
> +
> +#define BU_P10_128BIT_3(ENUM, NAME, ATTR, ICODE)			\
> +  RS6000_BUILTIN_3 (P10_BUILTIN_128BIT_ ## ENUM,	/* ENUM */	\
> +		    "__builtin_altivec_" NAME,		/* NAME */	\
> +		    RS6000_BTM_P10_128BIT,		/* MASK */	\
> +		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
> +		     | RS6000_BTC_TERNARY),				\
> +		    CODE_FOR_ ## ICODE)			/* ICODE */
>  #endif
> 
>  
> @@ -2712,6 +2745,10 @@ BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsx_sign_extend_hi_v2di)
>  BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsx_sign_extend_si_v2di)
> 
>  /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
> +BU_P10_P (VCMPEQUT_P,		"vcmpequt_p",	CONST,	vector_eq_v1ti_p)
> +BU_P10_P (VCMPGTST_P,		"vcmpgtst_p",	CONST,	vector_gt_v1ti_p)
> +BU_P10_P (VCMPGTUT_P,		"vcmpgtut_p",	CONST,	vector_gtu_v1ti_p)
> +
>  BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
>  BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
>  BU_P10_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
> @@ -2733,6 +2770,39 @@ BU_P10V_2 (XXGENPCVM_V8HI, "xxgenpcvm_v8hi", CONST, xxgenpcvm_v8hi)
>  BU_P10V_2 (XXGENPCVM_V4SI, "xxgenpcvm_v4si", CONST, xxgenpcvm_v4si)
>  BU_P10V_2 (XXGENPCVM_V2DI, "xxgenpcvm_v2di", CONST, xxgenpcvm_v2di)
> 
> +BU_P10V_2 (VCMPGTUT,		"vcmpgtut",	CONST,	vector_gtuv1ti)
> +BU_P10V_2 (VCMPGTST,		"vcmpgtst",	CONST,	vector_gtv1ti)
> +BU_P10V_2 (VCMPEQUT,		"vcmpequt",	CONST,	vector_eqv1ti)
> +BU_P10V_2 (CMPNET,		"vcmpnet",	CONST,	vcmpnet)
> +BU_P10V_2 (CMPGE_1TI,		"cmpge_1ti",    CONST,  vector_nltv1ti)
> +BU_P10V_2 (CMPGE_U1TI,		"cmpge_u1ti",   CONST,  vector_nltuv1ti)
> +BU_P10V_2 (CMPLE_1TI,		"cmple_1ti",    CONST,  vector_ngtv1ti)
> +BU_P10V_2 (CMPLE_U1TI,		"cmple_u1ti",   CONST,  vector_ngtuv1ti)
> +BU_P10V_2 (VNOR_V1TI_UNS,	"vnor_v1ti_uns",CONST,	norv1ti3)
> +BU_P10V_2 (VNOR_V1TI,		"vnor_v1ti",	CONST,	norv1ti3)
> +BU_P10V_2 (VCMPNET_P,		"vcmpnet_p",	CONST,	vector_ne_v1ti_p)
> +BU_P10V_2 (VCMPAET_P,		"vcmpaet_p",	CONST,	vector_ae_v1ti_p)
> +
> +BU_P10_128BIT_1 (VSIGNEXTSD2Q,	"vsignext",     CONST,  vsx_sign_extend_v2di_v1ti)
> +
> +BU_P10_128BIT_2 (VMULEUD,	"vmuleud",	CONST,	vec_widen_umult_even_v2di)
> +BU_P10_128BIT_2 (VMULESD,	"vmulesd",	CONST,	vec_widen_smult_even_v2di)
> +BU_P10_128BIT_2 (VMULOUD,	"vmuloud",	CONST,	vec_widen_umult_odd_v2di)
> +BU_P10_128BIT_2 (VMULOSD,	"vmulosd",	CONST,	vec_widen_smult_odd_v2di)
> +BU_P10_128BIT_2 (VRLQ,		"vrlq",		CONST,	vrotlv1ti3)
> +BU_P10_128BIT_2 (VSLQ,		"vslq",		CONST,	vashlv1ti3)
> +BU_P10_128BIT_2 (VSRQ,		"vsrq",		CONST,	vlshrv1ti3)
> +BU_P10_128BIT_2 (VSRAQ,		"vsraq",	CONST,	vashrv1ti3)
> +BU_P10_128BIT_2 (VRLQNM,	"vrlqnm",	CONST,	altivec_vrlqnm)
> +BU_P10_128BIT_2 (DIV_V1TI,	"div_1ti",      CONST,  vsx_div_v1ti)
> +BU_P10_128BIT_2 (UDIV_V1TI,	"udiv_1ti",     CONST,  vsx_udiv_v1ti)
> +BU_P10_128BIT_2 (DIVES_V1TI,	"dives",	CONST,	vsx_dives_v1ti)
> +BU_P10_128BIT_2 (DIVEU_V1TI,	"diveu",	CONST,	vsx_diveu_v1ti)
> +BU_P10_128BIT_2 (MODS_V1TI,	"mods",		CONST,	vsx_mods_v1ti)
> +BU_P10_128BIT_2 (MODU_V1TI,	"modu",		CONST,	vsx_modu_v1ti)
> +
> +BU_P10_128BIT_3 (VRLQMI,	"vrlqmi",	CONST,	altivec_vrlqmi)
> +
>  BU_P10V_3 (VEXTRACTBL, "vextdubvlx", CONST, vextractlv16qi)
>  BU_P10V_3 (VEXTRACTHL, "vextduhvlx", CONST, vextractlv8hi)
>  BU_P10V_3 (VEXTRACTWL, "vextduwvlx", CONST, vextractlv4si)
> @@ -2839,6 +2909,12 @@ BU_P10_OVERLOAD_2 (CLRR, "clrr")
>  BU_P10_OVERLOAD_2 (GNB, "gnb")
>  BU_P10_OVERLOAD_4 (XXEVAL, "xxeval")
>  BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
> +BU_P10_OVERLOAD_2 (VRLQ, "vrlq")
> +BU_P10_OVERLOAD_2 (VSLQ, "vslq")
> +BU_P10_OVERLOAD_2 (VSRQ, "vsrq")
> +BU_P10_OVERLOAD_2 (VSRAQ, "vsraq")
> +BU_P10_OVERLOAD_2 (DIVE,  "dive")
> +BU_P10_OVERLOAD_2 (MOD,  "mod")
> 
>  BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
>  BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
> @@ -2854,6 +2930,7 @@ BU_P10_OVERLOAD_1 (VSTRIL, "stril")
> 
>  BU_P10_OVERLOAD_1 (VSTRIR_P, "strir_p")
>  BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
> +BU_P10_OVERLOAD_1 (SIGNEXT, "vsignextq")
> 
>  BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 87699be8a07..2bd6412a502 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -839,6 +839,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10_BUILTIN_VCMPEQUT,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10_BUILTIN_VCMPEQUT,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQFP,
>      RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPEQ, VSX_BUILTIN_XVCMPEQDP,
> @@ -885,6 +889,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U2DI,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, 0},
> +  { ALTIVEC_BUILTIN_VEC_CMPGE, P10_BUILTIN_CMPGE_1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
> +  { ALTIVEC_BUILTIN_VEC_CMPGE, P10_BUILTIN_CMPGE_U1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0},
>    { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTUB,
>      RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTSB,
> @@ -899,8 +908,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTUD,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPGT, P10_BUILTIN_VCMPGTUT,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTSD,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPGT, P10_BUILTIN_VCMPGTST,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTFP,
>      RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPGT, VSX_BUILTIN_XVCMPGTDP,
> @@ -943,6 +956,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_CMPLE_U2DI,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, 0},
> +  { ALTIVEC_BUILTIN_VEC_CMPLE, P10_BUILTIN_CMPLE_1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
> +  { ALTIVEC_BUILTIN_VEC_CMPLE, P10_BUILTIN_CMPLE_U1TI,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0},
>    { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTUB,
>      RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTSB,
> @@ -995,6 +1013,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
>    { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10_BUILTIN_128BIT_DIV_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10_BUILTIN_128BIT_UDIV_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
> @@ -1789,6 +1813,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_MULE, P8V_BUILTIN_VMULEUW,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULE, P10_BUILTIN_128BIT_VMULESD,
> +    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULE, P10_BUILTIN_128BIT_VMULEUD,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
>    { ALTIVEC_BUILTIN_VEC_VMULEUB, ALTIVEC_BUILTIN_VMULEUB,
>      RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMULESB, ALTIVEC_BUILTIN_VMULESB,
> @@ -1812,6 +1842,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_MULO, P8V_BUILTIN_VMULOUW,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULO, P10_BUILTIN_128BIT_VMULOSD,
> +    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_MULO, P10_BUILTIN_128BIT_VMULOUD,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_MULO, ALTIVEC_BUILTIN_VMULOSH,
>      RS6000_BTI_V4SI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMULOSH, ALTIVEC_BUILTIN_VMULOSH,
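The new MULE/MULO overloads return a V1TI, which holds the full 128-bit product of two 64-bit doublewords. A minimal scalar sketch of one such product follows. It assumes __int128 support, the function names are hypothetical, and it does not model which doubleword counts as "even" or "odd".

```c
#include <stdint.h>

typedef __int128 s128;
typedef unsigned __int128 u128;

/* Hypothetical model of one signed lane (cf. VMULESD/VMULOSD):
   64 x 64 -> 128-bit product, nothing truncated.  */
static s128
mul_s64_wide (int64_t a, int64_t b)
{
  return (s128) a * (s128) b;
}

/* Hypothetical model of one unsigned lane (cf. VMULEUD/VMULOUD).  */
static u128
mul_u64_wide (uint64_t a, uint64_t b)
{
  return (u128) a * (u128) b;
}
```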
> @@ -1860,6 +1895,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10_BUILTIN_VNOR_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10_BUILTIN_VNOR_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10_BUILTIN_VNOR_V1TI_UNS,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10_BUILTIN_VNOR_V1TI_UNS,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_NOR, P10_BUILTIN_VNOR_V1TI_UNS,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
>      RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V4SI,
> @@ -2115,6 +2160,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_RL, P8V_BUILTIN_VRLD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_RL, P10_BUILTIN_128BIT_VRLQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_RL, P10_BUILTIN_128BIT_VRLQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
>      RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
> @@ -2133,12 +2183,23 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { P9V_BUILTIN_VEC_RLMI, P9V_BUILTIN_VRLDMI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
> +  { P9V_BUILTIN_VEC_RLMI, P10_BUILTIN_128BIT_VRLQMI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI },
> +  { P9V_BUILTIN_VEC_RLMI, P10_BUILTIN_128BIT_VRLQMI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLWNM,
>      RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
>    { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLDNM,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, 0 },
> +  { P9V_BUILTIN_VEC_RLNM, P10_BUILTIN_128BIT_VRLQNM,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +  { P9V_BUILTIN_VEC_RLNM, P10_BUILTIN_128BIT_VRLQNM,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
>      RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
> @@ -2155,6 +2216,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SL, P8V_BUILTIN_VSLD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SL, P10_BUILTIN_128BIT_VSLQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SL, P10_BUILTIN_128BIT_VSLQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTDP,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
>    { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTSP,
> @@ -2351,6 +2417,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SR, P8V_BUILTIN_VSRD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SR, P10_BUILTIN_128BIT_VSRQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SR, P10_BUILTIN_128BIT_VSRQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
>      RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
> @@ -2379,6 +2450,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>    { ALTIVEC_BUILTIN_VEC_SRA, P8V_BUILTIN_VSRAD,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SRA, P10_BUILTIN_128BIT_VSRAQ,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_SRA, P10_BUILTIN_128BIT_VSRAQ,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
>      RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
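The SL/SR/SRA overloads added above map onto VSLQ, VSRQ and VSRAQ. Here is a scalar sketch of the three shifts on one 128-bit element. It assumes the shift count is reduced modulo 128, and the function names are hypothetical. The arithmetic shift relies on GCC's documented behaviour that ">>" on a negative signed integer shifts in copies of the sign bit.

```c
typedef __int128 s128;
typedef unsigned __int128 u128;

/* Hypothetical models; ASSUMPTION: count taken modulo 128.  */
static u128 sl128 (u128 x, unsigned int n) { return x << (n & 127); }  /* logical left   */
static u128 sr128 (u128 x, unsigned int n) { return x >> (n & 127); }  /* logical right  */
static s128 sra128 (s128 x, unsigned int n) { return x >> (n & 127); } /* arithmetic right */
```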
> @@ -3996,12 +4072,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTUD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10_BUILTIN_VCMPGTUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10_BUILTIN_VCMPGTST_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, ALTIVEC_BUILTIN_VCMPGTFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
>    { ALTIVEC_BUILTIN_VEC_VCMPGT_P, VSX_BUILTIN_XVCMPGTDP_P,
> @@ -4066,6 +4146,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10_BUILTIN_VCMPEQUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10_BUILTIN_VCMPEQUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, ALTIVEC_BUILTIN_VCMPEQFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
>    { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, VSX_BUILTIN_XVCMPEQDP_P,
> @@ -4117,12 +4201,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10_BUILTIN_VCMPGTUT_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
> +  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10_BUILTIN_VCMPGTST_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, ALTIVEC_BUILTIN_VCMPGEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
>    { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
> @@ -4771,6 +4859,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEW,
>      RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI,
>      RS6000_BTI_unsigned_V4SI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPNE, P10_BUILTIN_CMPNET,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI,
> +    RS6000_BTI_V1TI, 0 },
> +  { ALTIVEC_BUILTIN_VEC_CMPNE, P10_BUILTIN_CMPNET,
> +    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> 
>    /* The following 2 entries have been deprecated.  */
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEB_P,
> @@ -4856,8 +4950,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V2DI, 0 },
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
>      RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI,
> -    RS6000_BTI_unsigned_V2DI, 0
> -  },
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPNE_P, P10_BUILTIN_VCMPNET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> 
>    /* The following 2 entries have been deprecated.  */
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
> @@ -4871,6 +4967,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
>      RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
>      RS6000_BTI_bool_V2DI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPNE_P, P10_BUILTIN_VCMPNET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> 
>    { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
> @@ -4961,8 +5059,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_bool_V2DI, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
>      RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI,
> -    RS6000_BTI_unsigned_V2DI, 0
> -  },
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPAE_P, P10_BUILTIN_VCMPAET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> 
>    /* The following 2 entries have been deprecated.  */
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
> @@ -4976,7 +5076,8 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
>      RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
>      RS6000_BTI_bool_V2DI, 0 },
> -
> +  { P9V_BUILTIN_VEC_VCMPAE_P, P10_BUILTIN_VCMPAET_P,
> +    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
>      RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>    { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
> @@ -5903,6 +6004,21 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>   { P10_BUILTIN_VEC_XVTLSBB_ONES, P10_BUILTIN_XVTLSBB_ONES,
>      RS6000_BTI_INTSI, RS6000_BTI_unsigned_V16QI, 0, 0 },
> 
> +  { P10_BUILTIN_VEC_SIGNEXT, P10_BUILTIN_128BIT_VSIGNEXTSD2Q,
> +    RS6000_BTI_V1TI, RS6000_BTI_V2DI, 0, 0 },
> +
> +  { P10_BUILTIN_VEC_DIVE, P10_BUILTIN_128BIT_DIVES_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10_BUILTIN_128BIT_DIVEU_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +
> +  { P10_BUILTIN_VEC_MOD, P10_BUILTIN_128BIT_MODS_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10_BUILTIN_128BIT_MODU_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> +    RS6000_BTI_unsigned_V1TI, 0 },
> +
>    { RS6000_BUILTIN_NONE, RS6000_BUILTIN_NONE, 0, 0, 0, 0 }
>  };
>  
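The SIGNEXT overload (P10_BUILTIN_128BIT_VSIGNEXTSD2Q) takes a V2DI and returns a V1TI. In scalar terms, a 64-bit doubleword is sign-extended to a 128-bit quadword. The sketch below assumes __int128 support and uses a hypothetical name. It does not model which doubleword of the source vector is used.

```c
#include <stdint.h>

typedef __int128 s128;

/* Hypothetical model: sign-extend a doubleword to a quadword.  */
static s128
signext_d2q (int64_t d)
{
  return (s128) d;
}
```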
> @@ -12228,12 +12344,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case ALTIVEC_BUILTIN_VCMPEQUH:
>      case ALTIVEC_BUILTIN_VCMPEQUW:
>      case P8V_BUILTIN_VCMPEQUD:
> +    case P10_BUILTIN_VCMPEQUT:
>        fold_compare_helper (gsi, EQ_EXPR, stmt);
>        return true;
> 
>      case P9V_BUILTIN_CMPNEB:
>      case P9V_BUILTIN_CMPNEH:
>      case P9V_BUILTIN_CMPNEW:
> +    case P10_BUILTIN_CMPNET:
>        fold_compare_helper (gsi, NE_EXPR, stmt);
>        return true;
> 
> @@ -12245,6 +12363,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case VSX_BUILTIN_CMPGE_U4SI:
>      case VSX_BUILTIN_CMPGE_2DI:
>      case VSX_BUILTIN_CMPGE_U2DI:
> +    case P10_BUILTIN_CMPGE_1TI:
> +    case P10_BUILTIN_CMPGE_U1TI:
>        fold_compare_helper (gsi, GE_EXPR, stmt);
>        return true;
> 
> @@ -12256,6 +12376,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case ALTIVEC_BUILTIN_VCMPGTUW:
>      case P8V_BUILTIN_VCMPGTUD:
>      case P8V_BUILTIN_VCMPGTSD:
> +    case P10_BUILTIN_VCMPGTUT:
> +    case P10_BUILTIN_VCMPGTST:
>        fold_compare_helper (gsi, GT_EXPR, stmt);
>        return true;
> 
> @@ -12267,6 +12389,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>      case VSX_BUILTIN_CMPLE_U4SI:
>      case VSX_BUILTIN_CMPLE_2DI:
>      case VSX_BUILTIN_CMPLE_U2DI:
> +    case P10_BUILTIN_CMPLE_1TI:
> +    case P10_BUILTIN_CMPLE_U1TI:
>        fold_compare_helper (gsi, LE_EXPR, stmt);
>        return true;
> 
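The new folds treat the V1TI compares like the existing ones. Each result element is all ones when the relation holds and all zeros otherwise. The same 128 bits can compare differently as signed and as unsigned values, which is why separate signed and unsigned entries such as P10_BUILTIN_VCMPGTST and P10_BUILTIN_VCMPGTUT are needed. Here is a scalar sketch of one-element compares. It assumes __int128 support, and all function names are hypothetical.

```c
#include <stdbool.h>

typedef __int128 s128;
typedef unsigned __int128 u128;

/* All-ones / all-zeros mask, as produced by a vector compare.  */
static u128 mask_from (bool c) { return c ? ~(u128) 0 : 0; }

/* Hypothetical one-element models of the signed/unsigned GT and EQ folds.  */
static u128 cmpgt_s (s128 a, s128 b) { return mask_from (a > b); }
static u128 cmpgt_u (u128 a, u128 b) { return mask_from (a > b); }
static u128 cmpeq (u128 a, u128 b) { return mask_from (a == b); }
```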
> @@ -12978,6 +13102,8 @@ rs6000_init_builtins (void)
>  					    ? "__vector __bool long"
>  					    : "__vector __bool long long",
>  					    bool_long_long_type_node, 2);
> +  bool_V1TI_type_node = rs6000_vector_type ("__vector __bool __int128",
> +					    intTI_type_node, 1);
>    pixel_V8HI_type_node = rs6000_vector_type ("__vector __pixel",
>  					     pixel_type_node, 8);
> 
> @@ -13163,6 +13289,10 @@ altivec_init_builtins (void)
>      = build_function_type_list (integer_type_node,
>  				integer_type_node, V2DI_type_node,
>  				V2DI_type_node, NULL_TREE);
> +  tree int_ftype_int_v1ti_v1ti
> +    = build_function_type_list (integer_type_node,
> +				integer_type_node, V1TI_type_node,
> +				V1TI_type_node, NULL_TREE);
>    tree void_ftype_v4si
>      = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
>    tree v8hi_ftype_void
> @@ -13515,6 +13645,9 @@ altivec_init_builtins (void)
>  	case E_VOIDmode:
>  	  type = int_ftype_int_opaque_opaque;
>  	  break;
> +	case E_V1TImode:
> +	  type = int_ftype_int_v1ti_v1ti;
> +	  break;
>  	case E_V2DImode:
>  	  type = int_ftype_int_v2di_v2di;
>  	  break;
> @@ -14114,6 +14247,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
>      case P10_BUILTIN_XXGENPCVM_V8HI:
>      case P10_BUILTIN_XXGENPCVM_V4SI:
>      case P10_BUILTIN_XXGENPCVM_V2DI:
> +    case P10_BUILTIN_128BIT_VMULEUD:
> +    case P10_BUILTIN_128BIT_VMULOUD:
> +    case P10_BUILTIN_128BIT_DIVEU_V1TI:
> +    case P10_BUILTIN_128BIT_MODU_V1TI:
>        h.uns_p[0] = 1;
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;
> @@ -14213,10 +14350,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
>      case VSX_BUILTIN_CMPGE_U8HI:
>      case VSX_BUILTIN_CMPGE_U4SI:
>      case VSX_BUILTIN_CMPGE_U2DI:
> +    case P10_BUILTIN_CMPGE_U1TI:
>      case ALTIVEC_BUILTIN_VCMPGTUB:
>      case ALTIVEC_BUILTIN_VCMPGTUH:
>      case ALTIVEC_BUILTIN_VCMPGTUW:
>      case P8V_BUILTIN_VCMPGTUD:
> +    case P10_BUILTIN_VCMPGTUT:
> +    case P10_BUILTIN_VCMPEQUT:
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;
>        break;
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 40ee0a695f1..1fa4a527f12 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -3401,7 +3401,9 @@ rs6000_builtin_mask_calculate (void)
>  	  | ((TARGET_FLOAT128_TYPE)	    ? RS6000_BTM_FLOAT128  : 0)
>  	  | ((TARGET_FLOAT128_HW)	    ? RS6000_BTM_FLOAT128_HW : 0)
>  	  | ((TARGET_MMA)		    ? RS6000_BTM_MMA	   : 0)
> -	  | ((TARGET_POWER10)               ? RS6000_BTM_P10       : 0));
> +	  | ((TARGET_POWER10)               ? RS6000_BTM_P10       : 0)
> +	  | ((TARGET_TI_VECTOR_OPS)         ? RS6000_BTM_TI_VECTOR_OPS : 0));
> +
>  }
> 
>  /* Implement TARGET_MD_ASM_ADJUST.  All asm statements are considered
> @@ -3732,6 +3734,17 @@ rs6000_option_override_internal (bool global_init_p)
>    if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
>      rs6000_print_isa_options (stderr, 0, "before defaults", rs6000_isa_flags);
> 
> +  /* The -mti-vector-ops option requires ISA 3.1 support and -maltivec for
> +     the 128-bit instructions.  Currently, TARGET_POWER10 is sufficient to
> +     enable it by default.  */
> +  if (TARGET_POWER10)
> +    {
> +      if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
> +	warning(0, ("%<-mno-altivec%> disables -mti-vector-ops (128-bit integer vector register operations)."));
> +      else
> +	rs6000_isa_flags |= OPTION_MASK_TI_VECTOR_OPS;
> +    }


It seems odd that -maltivec is explicitly called out here; it has been
on by default for quite a while at this point.
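To make the control flow of that hunk concrete, here is a small C model of it. The MODEL_* bit values and model_ti_vector_ops are hypothetical stand-ins for the OPTION_MASK_* values and rs6000_option_override_internal. rs6000_isa_flags_explicit records options given explicitly in either direction. As a result, the warning branch is taken for an explicit -mvsx as well as for -mno-vsx.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; the real
   OPTION_MASK_* values are generated from rs6000.opt.  */
#define MODEL_MASK_VSX            (UINT64_C (1) << 0)
#define MODEL_MASK_TI_VECTOR_OPS  (UINT64_C (1) << 1)

/* Mirrors the hunk: on Power10, enable TI_VECTOR_OPS unless VSX is in
   the explicit mask, in which case warn instead.  Returns true when
   the warning would be issued.  */
static bool
model_ti_vector_ops (bool power10, uint64_t explicit_flags, uint64_t *flags)
{
  if (!power10)
    return false;
  if (explicit_flags & MODEL_MASK_VSX)
    return true;		/* warning (); flag left unchanged.  */
  *flags |= MODEL_MASK_TI_VECTOR_OPS;
  return false;
}
```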


> +
>    /* Handle explicit -mno-{altivec,vsx,power8-vector,power9-vector} and turn
>       off all of the options that depend on those flags.  */
>    ignore_masks = rs6000_disable_incompatible_switches ();
> @@ -19489,6 +19502,7 @@ rs6000_handle_altivec_attribute (tree *node,
>      case 'b':
>        switch (mode)
>  	{
> +	case E_TImode: case E_V1TImode: result = bool_V1TI_type_node; break;
>  	case E_DImode: case E_V2DImode: result = bool_V2DI_type_node; break;
>  	case E_SImode: case E_V4SImode: result = bool_V4SI_type_node; break;
>  	case E_HImode: case E_V8HImode: result = bool_V8HI_type_node; break;
> @@ -23218,6 +23232,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
>    { "float128-hardware",	OPTION_MASK_FLOAT128_HW,	false, true  },
>    { "fprnd",			OPTION_MASK_FPRND,		false, true  },
>    { "power10",			OPTION_MASK_POWER10,		false, true  },
> +  { "ti-vector-ops",		OPTION_MASK_TI_VECTOR_OPS,      false, true  },
>    { "hard-dfp",			OPTION_MASK_DFP,		false, true  },
>    { "htm",			OPTION_MASK_HTM,		false, true  },
>    { "isel",			OPTION_MASK_ISEL,		false, true  },
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index bbd8060e143..da84abde671 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -539,6 +539,7 @@ extern int rs6000_vector_align[];
>  #define MASK_UPDATE			OPTION_MASK_UPDATE
>  #define MASK_VSX			OPTION_MASK_VSX
>  #define MASK_POWER10			OPTION_MASK_POWER10
> +#define MASK_TI_VECTOR_OPS		OPTION_MASK_TI_VECTOR_OPS
> 
>  #ifndef IN_LIBGCC2
>  #define MASK_POWERPC64			OPTION_MASK_POWERPC64
> @@ -2305,6 +2306,7 @@ extern int frame_pointer_needed;
>  #define RS6000_BTM_P8_VECTOR	MASK_P8_VECTOR	/* ISA 2.07 vector.  */
>  #define RS6000_BTM_P9_VECTOR	MASK_P9_VECTOR	/* ISA 3.0 vector.  */
>  #define RS6000_BTM_P9_MISC	MASK_P9_MISC	/* ISA 3.0 misc. non-vector */
> +#define RS6000_BTM_P10_128BIT   MASK_POWER10    /* ISA P10 vector.  */

Should the comment mention 128-bit integers (not just "P10 vector")?

>  #define RS6000_BTM_CRYPTO	MASK_CRYPTO	/* crypto funcs.  */
>  #define RS6000_BTM_HTM		MASK_HTM	/* hardware TM funcs.  */
>  #define RS6000_BTM_FRE		MASK_POPCNTB	/* FRE instruction.  */
> @@ -2322,7 +2324,7 @@ extern int frame_pointer_needed;
>  #define RS6000_BTM_FLOAT128_HW	MASK_FLOAT128_HW /* IEEE 128-bit float h/w.  */
>  #define RS6000_BTM_MMA		MASK_MMA	/* ISA 3.1 MMA.  */
>  #define RS6000_BTM_P10		MASK_POWER10
> -
> +#define RS6000_BTM_TI_VECTOR_OPS MASK_TI_VECTOR_OPS /* 128-bit integer support */
> 
>  #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
>  				 | RS6000_BTM_VSX			\
> @@ -2436,6 +2438,7 @@ enum rs6000_builtin_type_index
>    RS6000_BTI_bool_V8HI,          /* __vector __bool short */
>    RS6000_BTI_bool_V4SI,          /* __vector __bool int */
>    RS6000_BTI_bool_V2DI,          /* __vector __bool long */
> +  RS6000_BTI_bool_V1TI,          /* __vector __bool long */

Fix comment?  It still says __vector __bool long; this should be
__vector __bool __int128.


>    RS6000_BTI_pixel_V8HI,         /* __vector __pixel */
>    RS6000_BTI_long,	         /* long_integer_type_node */
>    RS6000_BTI_unsigned_long,      /* long_unsigned_type_node */
> @@ -2489,6 +2492,7 @@ enum rs6000_builtin_type_index
>  #define bool_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V8HI])
>  #define bool_V4SI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V4SI])
>  #define bool_V2DI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V2DI])
> +#define bool_V1TI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V1TI])
>  #define pixel_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_pixel_V8HI])
> 
>  #define long_long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long_long])
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 9d3e740e930..67d667bf1fd 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -585,3 +585,7 @@ Generate (do not generate) pc-relative memory addressing.
>  mmma
>  Target Report Mask(MMA) Var(rs6000_isa_flags)
>  Generate (do not generate) MMA instructions.
> +
> +mti-vector-ops
> +Target Report Mask(TI_VECTOR_OPS) Var(rs6000_isa_flags)
> +Use integer 128-bit instructions for a future architecture.


The "future architecture" wording can probably be adjusted now that this
targets Power10.


> \ No newline at end of file

Diff error?  The patch leaves rs6000.opt without a trailing newline.



> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 796345c80d3..2deff282076 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -678,6 +678,13 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_eqv1ti"
> +  [(set (match_operand:V1TI 0 "vlogical_operand")
> +	(eq:V1TI (match_operand:V1TI 1 "vlogical_operand")
> +		 (match_operand:V1TI 2 "vlogical_operand")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "")
> +
>  (define_expand "vector_gt<mode>"
>    [(set (match_operand:VEC_C 0 "vlogical_operand")
>  	(gt:VEC_C (match_operand:VEC_C 1 "vlogical_operand")
> @@ -685,6 +692,13 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gtv1ti"
> +  [(set (match_operand:V1TI 0 "vlogical_operand")
> +	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
> +		 (match_operand:V1TI 2 "vlogical_operand")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "")
> +
>  ; >= for integer vectors: swap operands and apply not-greater-than
>  (define_expand "vector_nlt<mode>"
>    [(set (match_operand:VEC_I 3 "vlogical_operand")
> @@ -697,6 +711,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_nltv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gt:V1TI (match_operand:V1TI 2 "vlogical_operand")
> +		 (match_operand:V1TI 1 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +        (not:V1TI (match_dup 3)))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
>  (define_expand "vector_gtu<mode>"
>    [(set (match_operand:VEC_I 0 "vint_operand")
>  	(gtu:VEC_I (match_operand:VEC_I 1 "vint_operand")
> @@ -704,6 +729,13 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gtuv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand")
> +	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand")
> +		  (match_operand:V1TI 2 "altivec_register_operand")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "")
> +
>  ; >= for integer vectors: swap operands and apply not-greater-than
>  (define_expand "vector_nltu<mode>"
>    [(set (match_operand:VEC_I 3 "vlogical_operand")
> @@ -716,6 +748,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_nltuv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gtu:V1TI (match_operand:V1TI 2 "vlogical_operand")
> +		  (match_operand:V1TI 1 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +	(not:V1TI (match_dup 3)))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
>  (define_expand "vector_geu<mode>"
>    [(set (match_operand:VEC_I 0 "vint_operand")
>  	(geu:VEC_I (match_operand:VEC_I 1 "vint_operand")
> @@ -735,6 +778,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_ngtv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
> +		 (match_operand:V1TI 2 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +        (not:V1TI (match_dup 3)))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
>  (define_expand "vector_ngtu<mode>"
>    [(set (match_operand:VEC_I 3 "vlogical_operand")
>  	(gtu:VEC_I (match_operand:VEC_I 1 "vlogical_operand")
> @@ -746,6 +800,17 @@
>    operands[3] = gen_reg_rtx_and_attrs (operands[0]);
>  })
> 
> +(define_expand "vector_ngtuv1ti"
> +  [(set (match_operand:V1TI 3 "vlogical_operand")
> +	(gtu:V1TI (match_operand:V1TI 1 "vlogical_operand")
> +		  (match_operand:V1TI 2 "vlogical_operand")))
> +   (set (match_operand:V1TI 0 "vlogical_operand")
> +        (not:V1TI (match_dup 3)))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
> +})
> +
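These expanders follow the "; >= for integer vectors: swap operands and apply not-greater-than" comment above. Only "greater than" is a primitive compare, and the other orderings are derived from it. Here is a scalar sketch of that derivation for one 128-bit element. It assumes __int128 support, and all function names are hypothetical.

```c
typedef __int128 s128;
typedef unsigned __int128 u128;

/* Primitive compares: all-ones mask if a > b, else zero.  */
static u128 gt_s (s128 a, s128 b) { return a > b ? ~(u128) 0 : 0; }
static u128 gt_u (u128 a, u128 b) { return a > b ? ~(u128) 0 : 0; }

/* a >= b == NOT (b > a): swap operands, invert (cf. vector_nlt*v1ti).  */
static u128 ge_s (s128 a, s128 b) { return ~gt_s (b, a); }
static u128 ge_u (u128 a, u128 b) { return ~gt_u (b, a); }

/* a <= b == NOT (a > b): invert only (cf. vector_ngt*v1ti).  */
static u128 le_s (s128 a, s128 b) { return ~gt_s (a, b); }
static u128 le_u (u128 a, u128 b) { return ~gt_u (a, b); }
```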
>  ; There are 14 possible vector FP comparison operators, gt and eq of them have
>  ; been expanded above, so just support 12 remaining operators here.
> 
> @@ -894,6 +959,18 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_eq_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			     (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_operand:V1TI 0 "vlogical_operand")
> +	  (eq:V1TI (match_dup 1)
> +		   (match_dup 2)))])]
> +  "TARGET_TI_VECTOR_OPS"
> +  "")
> +
>  ;; This expansion handles the V16QI, V8HI, and V4SI modes in the
>  ;; implementation of the vec_all_ne built-in functions on Power9.
>  (define_expand "vector_ne_<mode>_p"
> @@ -976,6 +1053,23 @@
>    operands[3] = gen_reg_rtx (V2DImode);
>  })
> 
> +(define_expand "vector_ne_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			     (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_dup 3)
> +	  (eq:V1TI (match_dup 1)
> +		   (match_dup 2)))])
> +   (set (match_operand:SI 0 "register_operand" "=r")
> +	(eq:SI (reg:CC CR6_REGNO)
> +	       (const_int 0)))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  operands[3] = gen_reg_rtx (V1TImode);
> +})
> +
>  ;; This expansion handles the V2DI mode in the implementation of the
>  ;; vec_any_eq built-in function on Power9.
>  ;;
> @@ -1002,6 +1096,27 @@
>    operands[3] = gen_reg_rtx (V2DImode);
>  })
> 
> +;; Power 10

Meaningful comment?

> +(define_expand "vector_ae_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			     (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_dup 3)
> +	  (eq:V1TI (match_dup 1)
> +		   (match_dup 2)))])
> +   (set (match_operand:SI 0 "register_operand" "=r")
> +	(eq:SI (reg:CC CR6_REGNO)
> +	       (const_int 0)))
> +   (set (match_dup 0)
> +	(xor:SI (match_dup 0)
> +		(const_int 1)))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  operands[3] = gen_reg_rtx (V1TImode);
> +})
> +
>  ;; This expansion handles the V4SF and V2DF modes in the Power9
>  ;; implementation of the vec_all_ne built-in functions.  Note that the
>  ;; expansions for this pattern with these modes makes no use of power9-
> @@ -1061,6 +1176,18 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gt_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(gt:CC (match_operand:V1TI 1 "vlogical_operand")
> +			     (match_operand:V1TI 2 "vlogical_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_operand:V1TI 0 "vlogical_operand")
> +	  (gt:V1TI (match_dup 1)
> +		   (match_dup 2)))])]
> +  "TARGET_TI_VECTOR_OPS"
> +  "")
> +
>  (define_expand "vector_ge_<mode>_p"
>    [(parallel
>      [(set (reg:CC CR6_REGNO)
> @@ -1085,6 +1212,18 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vector_gtu_v1ti_p"
> +  [(parallel
> +    [(set (reg:CC CR6_REGNO)
> +	  (unspec:CC [(gtu:CC (match_operand:V1TI 1 "altivec_register_operand")
> +			      (match_operand:V1TI 2 "altivec_register_operand"))]
> +		     UNSPEC_PREDICATE))
> +     (set (match_operand:V1TI 0 "altivec_register_operand")
> +	  (gtu:V1TI (match_dup 1)
> +		    (match_dup 2)))])]
> +  "TARGET_TI_VECTOR_OPS"
> +  "")
> +
>  ;; AltiVec/VSX predicates.
> 
>  ;; This expansion is triggered during expansion of predicate built-in
> @@ -1460,6 +1599,20 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +(define_expand "vrotlv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vrlq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
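A minimal C sketch of why the 128-bit rotate and shift expanders emit xxswapd first. It assumes a simplified two-doubleword view of the register with big-endian doubleword numbering. Per the expander comments, vrlq (like vslq, vsrq and vsraq) reads its count from bits 57:63 of the second operand, which are the low 7 bits of doubleword 0. The count of a 128-bit integer, however, sits in the low bits of doubleword 1, so swapping the doublewords moves it into place. The names vreg, swapd, vrlq_model and rotl_v1ti are hypothetical. They only model the instructions.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical register model: dw[0] holds bits 0:63 (high half of the
   integer), dw[1] holds bits 64:127 (low half).  */
typedef struct { uint64_t dw[2]; } vreg;

static vreg from_u128 (u128 x) {
  vreg r = { { (uint64_t) (x >> 64), (uint64_t) x } };
  return r;
}

static u128 to_u128 (vreg r) {
  return ((u128) r.dw[0] << 64) | r.dw[1];
}

/* Model of xxpermdi x,y,y,2 (xxswapd): exchange the two doublewords.  */
static vreg swapd (vreg r) {
  vreg s = { { r.dw[1], r.dw[0] } };
  return s;
}

/* Model of vrlq: the rotate count is bits 57:63 of the second operand,
   i.e. the low 7 bits of doubleword 0.  */
static vreg vrlq_model (vreg a, vreg b) {
  unsigned n = (unsigned) (b.dw[0] & 127);
  u128 x = to_u128 (a);
  u128 r = n ? (x << n) | (x >> (128 - n)) : x;
  return from_u128 (r);
}

/* What vrotlv1ti3 computes: swap first, so that the count held in the
   low bits of the 128-bit integer lands in bits 57:63.  */
static u128 rotl_v1ti (u128 x, u128 count) {
  return to_u128 (vrlq_model (from_u128 (x), swapd (from_u128 (count))));
}
```

Without the swap, vrlq_model (from_u128 (1), from_u128 (4)) would read a count of 0 from the high doubleword and return 1 unchanged.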
>  ;; Expanders for rotatert to make use of vrotl
>  (define_expand "vrotr<mode>3"
>    [(set (match_operand:VEC_I 0 "vint_operand")
> @@ -1481,6 +1634,21 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +;; No immediate version of this 128-bit instruction
> +(define_expand "vashlv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
>  ;; Expanders for logical shift right on each vector element
>  (define_expand "vlshr<mode>3"
>    [(set (match_operand:VEC_I 0 "vint_operand")
> @@ -1489,6 +1657,21 @@
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> 
> +;; No immediate version of this 128-bit instruction
> +(define_expand "vlshrv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vsrq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
>  ;; Expanders for arithmetic shift right on each vector element
>  (define_expand "vashr<mode>3"
>    [(set (match_operand:VEC_I 0 "vint_operand")
> @@ -1496,6 +1679,22 @@
>  			(match_operand:VEC_I 2 "vint_operand")))]
>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
>    "")
> +
> +;; No immediate version of this 128-bit instruction
> +(define_expand "vashrv1ti3"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> +		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +{
> +  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +
> +  emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> +  emit_insn(gen_altivec_vsraq (operands[0], operands[1], tmp));
> +  DONE;
> +})
> +
>  
>  ;; Vector reduction expanders for VSX
>  ; The (VEC_reduc:...
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 1153a01b4ef..998af3908ad 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -298,6 +298,12 @@
>     UNSPEC_VSX_XXSPLTD
>     UNSPEC_VSX_DIVSD
>     UNSPEC_VSX_DIVUD
> +   UNSPEC_VSX_DIVSQ
> +   UNSPEC_VSX_DIVUQ
> +   UNSPEC_VSX_DIVESQ
> +   UNSPEC_VSX_DIVEUQ
> +   UNSPEC_VSX_MODSQ
> +   UNSPEC_VSX_MODUQ
>     UNSPEC_VSX_MULSD
>     UNSPEC_VSX_SIGN_EXTEND
>     UNSPEC_VSX_XVCVBF16SP
> @@ -361,6 +367,7 @@
>     UNSPEC_INSERTR
>     UNSPEC_REPLACE_ELT
>     UNSPEC_REPLACE_UN
> +	UNSPEC_XXSWAPD_V1TI
>    ])
> 
>  (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
> @@ -1732,7 +1739,61 @@
>  }
>    [(set_attr "type" "div")])
> 
> -;; *tdiv* instruction returning the FG flag
> +(define_insn "vsx_div_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVSQ))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vdivsq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_udiv_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVUQ))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vdivuq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_dives_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVESQ))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vdivesq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_diveu_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_DIVEUQ))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vdiveuq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_mods_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_MODSQ))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vmodsq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> +(define_insn "vsx_modu_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
> +                     UNSPEC_VSX_MODUQ))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vmoduq %0,%1,%2"
> +  [(set_attr "type" "div")])
> +
> + ;; *tdiv* instruction returning the FG flag
>  (define_expand "vsx_tdiv<mode>3_fg"
>    [(set (match_dup 3)
>  	(unspec:CCFP [(match_operand:VSX_B 1 "vsx_register_operand")
> @@ -3083,6 +3144,18 @@
>    "xxpermdi %x0,%x1,%x1,2"
>    [(set_attr "type" "vecperm")])
> 
> +;; Swap upper/lower 64-bit values in a 128-bit vector
> +(define_insn "xxswapd_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> +		      (parallel [(const_int 0)(const_int 1)])]
> +                     UNSPEC_XXSWAPD_V1TI))]
> +  "TARGET_POWER10"
> +;; AIX does not support extended mnemonic xxswapd.  Use the basic
> +;; mnemonic xxpermdi instead.
> +  "xxpermdi %x0,%x1,%x1,2"
> +  [(set_attr "type" "vecperm")])
> +
>  (define_insn "xxgenpcvm_<mode>_internal"
>    [(set (match_operand:VSX_EXTRACT_I4 0 "altivec_register_operand" "=wa")
>  	(unspec:VSX_EXTRACT_I4
> @@ -4767,8 +4840,16 @@
>     (set_attr "type" "vecload")])
> 
>  
> -;; ISA 3.0 vector extend sign support
> +;; ISA 3.1 vector extend sign support
> +(define_insn "vsx_sign_extend_v2di_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
> +		     UNSPEC_VSX_SIGN_EXTEND))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "vextsd2q %0,%1"
> +  [(set_attr "type" "vecexts")])
> 
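A small C model of what vsx_sign_extend_v2di_v1ti (vextsd2q) computes. It rests on an assumption that should be checked against the ISA 3.1 text: the source is doubleword element 1, the low-order half in big-endian numbering. The function name is hypothetical.

```c
#include <stdint.h>

typedef __int128 i128;

/* Hypothetical model of vextsd2q: sign-extend one signed doubleword of a
   V2DI value to a 128-bit quadword.  ASSUMPTION: element 1 is the source.  */
static i128 vextsd2q_model (const int64_t dw[2]) {
  return (i128) dw[1];   /* int64_t -> __int128 conversion sign-extends */
}
```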
> +;; ISA 3.0 vector extend sign support
>  (define_insn "vsx_sign_extend_qi_<mode>"
>    [(set (match_operand:VSINT_84 0 "vsx_register_operand" "=v")
>  	(unspec:VSINT_84
> @@ -5508,6 +5589,20 @@
>    "vcmpnew %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> +;; Vector Compare Not Equal v1ti (specified/not+eq:)
> +(define_expand "vcmpnet"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand")
> +	(not:V1TI
> +	  (eq:V1TI (match_operand:V1TI 1 "altivec_register_operand")
> +		   (match_operand:V1TI 2 "altivec_register_operand"))))]
> +   "TARGET_TI_VECTOR_OPS"
> +{
> +  emit_insn (gen_vector_eqv1ti (operands[0], operands[1], operands[2]));
> +  emit_insn (gen_one_cmplv1ti2 (operands[0], operands[0]));
> +  DONE;
> +})
> +
> +

nit: extra line.


>  ;; Vector Compare Not Equal or Zero Word
>  (define_insn "vcmpnezw"
>    [(set (match_operand:V4SI 0 "altivec_register_operand" "=v")
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index cb501ab2d75..346885de545 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21270,6 +21270,180 @@ Generate PCV from specified Mask size, as if implemented by the
>  immediate value is either 0, 1, 2 or 3.
>  @findex vec_genpcvm
> 
> +@smallexample
> +@exdent vector unsigned __int128 vec_rl (vector unsigned __int128,
> +                                         vector unsigned __int128);
> +@exdent vector signed __int128 vec_rl (vector signed __int128,
> +                                       vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of rotating the first input left by the number of bits
> +specified in the most significant quad word of the second input truncated to
> +7 bits (bits [125:131]).
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
> +                                           vector unsigned __int128,
> +                                           vector unsigned __int128);
> +@exdent vector signed __int128 vec_rlmi (vector signed __int128,
> +                                         vector signed __int128,
> +                                         vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of rotating the first input and inserting it under mask into the
> +second input. The first bit in the mask, the last bit in the mask are obtained from the
> +two 7-bit fields bits [108:115] and bits [117:123] respectively of the second input.
> +The shift is obtained from the third input in the 7-bit field [125:131] where all bits
> +counted from zero at the left.

I initially had a comment here, but after a re-read I think this is OK.


> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
> +                                           vector unsigned __int128,
> +                                           vector unsigned __int128);
> +@exdent vector signed __int128 vec_rlnm (vector signed __int128,
> +                                         vector unsigned __int128,
> +                                         vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of rotating the first input and ANDing it with a mask. The first
> +bit in the mask, the last bit in the mask and the shift amount are obtained from the two
> +7-bit fields bits [117:123] and bits [125:131] respectively of the second input.
> +The shift is obtained from the third input in the 7-bit field bits [125:131] where all
> +bits counted from zero at the left.

The shift-amount reference in the second sentence reads clunky; it should be
adjusted with respect to the third sentence.
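For reference while rewording, here is a C sketch of the vec_rlnm and vec_rlmi semantics on unsigned __int128. It takes the shift, mask begin (mb) and mask end (me) as plain arguments rather than decoding them from vector fields. Those field positions are exactly what the texinfo text is trying to pin down, so they are not modeled here. Bits are numbered 0..127 from the left, as in the documentation. A mask with mb > me is assumed to wrap around, as it does for the scalar rotate-and-mask instructions. All function names are hypothetical.

```c
typedef unsigned __int128 u128;

/* Rotate left; only the low 7 bits of n are used.  */
static u128 rotl128 (u128 x, unsigned n) {
  n &= 127;
  return n ? (x << n) | (x >> (128 - n)) : x;
}

/* Mask with bits mb..me set, bits numbered 0..127 from the left.
   ASSUMPTION: when mb > me the mask wraps (bits mb..127 and 0..me).  */
static u128 mask128 (unsigned mb, unsigned me) {
  mb &= 127;
  me &= 127;
  u128 all = ~(u128) 0;
  u128 from_mb = all >> mb;          /* bits mb..127 set */
  u128 to_me = all << (127 - me);    /* bits 0..me set */
  return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* vec_rlnm-style: rotate, then AND with the mask.  */
static u128 rlnm_model (u128 x, unsigned n, unsigned mb, unsigned me) {
  return rotl128 (x, n) & mask128 (mb, me);
}

/* vec_rlmi-style: rotate x, then insert it under the mask into dst.  */
static u128 rlmi_model (u128 x, u128 dst, unsigned n, unsigned mb, unsigned me) {
  u128 m = mask128 (mb, me);
  return (rotl128 (x, n) & m) | (dst & ~m);
}
```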


> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_sl(vector unsigned __int128, vector unsigned __int128);
> +@exdent vector signed __int128 vec_sl(vector signed __int128, vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of shifting the first input left by the number of bits
> +specified in the most significant bits of the second input truncated to
> +7 bits (bits [125:131]).
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_sr(vector unsigned __int128, vector unsigned __int128);
> +@exdent vector signed __int128 vec_sr(vector signed __int128, vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of performing a logical right shift of the first argument
> +by the number of bits specified in the most significant double word of the
> +second input truncated to 7 bits (bits [125:131]).
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_sra(vector unsigned __int128, vector unsigned __int128);
> +@exdent vector signed __int128 vec_sra(vector signed __int128, vector unsigned __int128);
> +@end smallexample
> +
> +Returns the result of performing arithmetic right shift of the first argument
> +by the number of bits specified in the most significant bits of the
> +second input truncated to 7 bits (bits [125:131]).
> +
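A C sketch of the 128-bit vec_sl, vec_sr and vec_sra semantics described above. It assumes the count is simply the low 7 bits of the second input's 128-bit value, which is what the xxswapd-based expanders appear to arrange, so counts are taken modulo 128. The function names are hypothetical.

```c
typedef unsigned __int128 u128;
typedef __int128 i128;

/* Only the low 7 bits of the count are used (count modulo 128).  */
static u128 sl128 (u128 x, u128 count) { return x << (unsigned) (count & 127); }
static u128 sr128 (u128 x, u128 count) { return x >> (unsigned) (count & 127); }

/* Arithmetic right shift that avoids the implementation-defined
   behavior of >> on negative values: for x < 0, ~x is non-negative.  */
static i128 sra128 (i128 x, u128 count) {
  unsigned n = (unsigned) (count & 127);
  return x < 0 ? ~(~x >> n) : x >> n;
}
```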
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_mule (vector unsigned long long,
> +                                           vector unsigned long long);
> +@exdent vector signed __int128 vec_mule (vector signed long long,
> +                                         vector signed long long);
> +@end smallexample
> +
> +Returns a vector containing a 128-bit integer result of multiplying the even doubleword
> +elements of the two inputs.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_mulo (vector unsigned long long,
> +                                           vector unsigned long long);
> +@exdent vector signed __int128 vec_mulo (vector signed long long,
> +                                         vector signed long long);
> +@end smallexample
> +
> +Returns a vector containing a 128-bit integer result of multiplying the odd doubleword
> +elements of the two inputs.
> +
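A C sketch of the doubleword vec_mule / vec_mulo semantics: multiply one pair of 64-bit elements and keep the full 128-bit product. The element indices here simply follow array order. How "even" and "odd" map to register doublewords on little-endian targets is not modeled. The function names are hypothetical.

```c
#include <stdint.h>

typedef unsigned __int128 u128;
typedef __int128 i128;

/* Even element (index 0) and odd element (index 1), widened to 128 bits
   before multiplying so no bits of the product are lost.  */
static u128 mule_u (const uint64_t a[2], const uint64_t b[2]) { return (u128) a[0] * b[0]; }
static u128 mulo_u (const uint64_t a[2], const uint64_t b[2]) { return (u128) a[1] * b[1]; }
static i128 mule_s (const int64_t a[2], const int64_t b[2]) { return (i128) a[0] * b[0]; }
static i128 mulo_s (const int64_t a[2], const int64_t b[2]) { return (i128) a[1] * b[1]; }
```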
> +@smallexample
> +@exdent vector unsigned __int128 vec_div (vector unsigned __int128,
> +                                          vector unsigned __int128);
> +@exdent vector signed __int128 vec_div (vector signed __int128,
> +                                        vector signed __int128);
> +@end smallexample
> +
> +Returns the result of dividing the first operand by the second operand. An attempt to
> +divide any value by zero or to divide the most negative signed 128-bit integer by
> +negative one results in an undefined value.
> +
> +@smallexample
> +@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
> +                                           vector unsigned __int128);
> +@exdent vector signed __int128 vec_dive (vector signed __int128,
> +                                         vector signed __int128);
> +@end smallexample
> +
> +The result is produced by shifting the first input left by 128 bits and dividing by the
> +second. If an attempt is made to divide by zero or the result is larger than 128 bits,
> +the result is undefined.
> +
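A C sketch of unsigned vec_dive on one element. It models the operation as a restoring long division of the first input shifted left by 128 bits. The quotient (a * 2^128) / b fits in 128 bits exactly when a < b. The model therefore reports failure in the cases the text calls undefined (b == 0, or a >= b). The function name is hypothetical.

```c
typedef unsigned __int128 u128;

/* Returns 1 and stores (a * 2^128) / b in *q, or returns 0 where the
   hardware result is undefined.  */
static int diveu128 (u128 a, u128 b, u128 *q) {
  if (b == 0 || a >= b)
    return 0;
  u128 r = a, quo = 0;               /* invariant: r < b */
  for (int i = 0; i < 128; i++) {
    int carry = (int) (r >> 127);    /* bit pushed out by r <<= 1 */
    r <<= 1;                         /* next dividend bit is 0 */
    quo <<= 1;
    if (carry || r >= b) {           /* true remainder >= b */
      r -= b;                        /* wraps to the right value on carry */
      quo |= 1;
    }
  }
  *q = quo;
  return 1;
}
```

A signed version would also need the sign handling and overflow rules of vdivesq, which this sketch does not attempt.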
> +@smallexample
> +@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
> +                                          vector unsigned __int128);
> +@exdent vector signed __int128 vec_mod (vector signed __int128,
> +                                        vector signed __int128);
> +@end smallexample
> +
> +The result is the modulo result of dividing the first input by the second input.
> +
> +
> +The following builtins perform 128-bit vector comparisons.  The @code{vec_all_xx},
> +@code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is one of the operations
> +@code{eq, ne, gt, lt, ge, le} perform pairwise comparisons between the elements
> +at the same positions within their two vector arguments.   The @code{vec_all_xx}
> +function returns a non-zero value if and only if all pairwise comparisons are true.  The
> +@code{vec_any_xx} function returns a non-zero value if and only if at least one pairwise
> +comparison is true.  The @code{vec_cmpxx} function returns a @code{vector bool __int128}
> +value, within which each element consists of all ones to denote that the specified
> +logical comparison of the corresponding elements was true.  Otherwise, the element of the
> +returned vector contains all zeros.
> +
> +@smallexample
> +vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
> +vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
> +
> +int vec_all_eq (vector signed __int128, vector signed __int128);
> +int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_ne (vector signed __int128, vector signed __int128);
> +int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_gt (vector signed __int128, vector signed __int128);
> +int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_lt (vector signed __int128, vector signed __int128);
> +int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_ge (vector signed __int128, vector signed __int128);
> +int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
> +int vec_all_le (vector signed __int128, vector signed __int128);
> +int vec_all_le (vector unsigned __int128, vector unsigned __int128);
> +
> +int vec_any_eq (vector signed __int128, vector signed __int128);
> +int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_ne (vector signed __int128, vector signed __int128);
> +int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_gt (vector signed __int128, vector signed __int128);
> +int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_lt (vector signed __int128, vector signed __int128);
> +int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_ge (vector signed __int128, vector signed __int128);
> +int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
> +int vec_any_le (vector signed __int128, vector signed __int128);
> +int vec_any_le (vector unsigned __int128, vector unsigned __int128);
> +@end smallexample
> +
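A C sketch of the comparison semantics for a single 128-bit element. vec_cmpne is modeled as the complement of vec_cmpeq, matching the eq-then-complement sequence in the quoted vcmpnet expander. Because these vectors hold exactly one element, vec_all_xx and vec_any_xx always agree. The function names are hypothetical.

```c
typedef unsigned __int128 u128;
typedef __int128 i128;

/* Result element: all ones when the comparison holds, else all zeros.  */
static u128 cmp_mask (int holds) { return holds ? ~(u128) 0 : 0; }

static u128 cmpgt_s (i128 a, i128 b) { return cmp_mask (a > b); }
static u128 cmpgt_u (u128 a, u128 b) { return cmp_mask (a > b); }
static u128 cmpeq_u (u128 a, u128 b) { return cmp_mask (a == b); }

/* cmpne as NOT (cmpeq).  */
static u128 cmpne_u (u128 a, u128 b) { return ~cmpeq_u (a, b); }

/* With one element, "all" and "any" test the same element.  */
static int all_gt_s (i128 a, i128 b) { return cmpgt_s (a, b) == ~(u128) 0; }
static int any_gt_s (i128 a, i128 b) { return cmpgt_s (a, b) != 0; }
```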
> +
>  @node PowerPC Hardware Transactional Memory Built-in Functions
>  @subsection PowerPC Hardware Transactional Memory Built-in Functions
>  GCC provides two interfaces for accessing the Hardware Transactional
> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> new file mode 100644
> index 00000000000..c84494fc28d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -0,0 +1,2254 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10" } */
> +
> +
> +/* Check that the expected 128-bit instructions are generated if the processor
> +   supports the 128-bit integer instructions. */
> +/* { dg-final { scan-assembler-times {\mvextsd2q\M} 2 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvslq\M} 2 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvsrq\M} 2 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvsraq\M} 2 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvrlq\M} 2 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvcmpuq\M} 0 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvcmpsq\M} 0 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvcmpequq\M} 0 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvcmpequq.\M} 16 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 0 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtsq.\M} 16 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 0 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtuq.\M} 16 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvmuleud\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvmuloud\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvmulesd\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvmulosd\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvdivsq\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvdivuq\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvdivesq\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvdiveuq\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvmodsq\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mvmoduq\M} 1 { target { ppc_native_128bit } } } } */


Since it's on all of the clauses, maybe adjust the dg-require to
include ppc_native_128bit for the whole test, unless there is more to
follow.


No other comments.
Thanks
-Will





* Re: [EXTERNAL] Re: [Patch 1/5] rs6000, Add 128-bit sign extension support
  2020-08-13 22:55           ` Segher Boessenkool
@ 2020-08-13 23:53             ` will schmidt
  2020-08-18 21:50               ` Segher Boessenkool
  0 siblings, 1 reply; 27+ messages in thread
From: will schmidt @ 2020-08-13 23:53 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Carl Love, dje.gcc, gcc-patches, Bill Schmidt

On Thu, 2020-08-13 at 17:55 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Aug 13, 2020 at 05:11:11PM -0500, will schmidt wrote:
> > > > That is probably a level of detail that is not really needed in the
> > > > GCC code comment.  Probably best to just change the comment to read
> > > > something like "ISA 3.0 sign extend builtins".
> > > 
> > > Sounds good.
> > 
> > As long as there are no issues defining the builtins for 3.0 here.
> > AFAIK they are not documented in ISA 3.0.  This is a happy accident
> > that these ISA 3.1 builtins can be implemented with existing
> > support.
> 
> There are *no* builtins defined in the ISA!  The insns are just ISA 3.0
> instructions.
> 

Ok. 

So then maybe just "Sign extend builtins" and leave off the ISA
reference altogether.

:-)

thanks
-Will

> 
> Segher



* Re: [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support
  2020-08-11 19:22 ` [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support Carl Love
@ 2020-08-14 17:13   ` will schmidt
  2020-08-20  1:29   ` Segher Boessenkool
  1 sibling, 0 replies; 27+ messages in thread
From: will schmidt @ 2020-08-14 17:13 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches

On Tue, 2020-08-11 at 12:22 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 3 adds support for converting to/from 128-bit integers and 128-bit 
> decimal floating point formats.  
> 
>                   Carl Love
> 

Some cosmetic comments below.  Overall lgtm. 

Thanks, 
-Will


> 
> ----------------------------------------------------------------
> Add TI to TD (128-bit DFP) and TD to TI support
> 
> gcc/ChangeLog
> 
> 2020-08-10  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-08-10  Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/int_128bit-runnable.c:  Add tests.


Update test.  (This test already exists).


> ---
>  gcc/config/rs6000/dfp.md                      | 15 +++++
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 64 +++++++++++++++++++

nit - Path to testcase looks strange?

>  2 files changed, 79 insertions(+)
> 
> diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
> index 8f822732bac..ac9fe189f3e 100644
> --- a/gcc/config/rs6000/dfp.md
> +++ b/gcc/config/rs6000/dfp.md
> @@ -222,6 +222,13 @@
>    "dcffixq %0,%1"
>    [(set_attr "type" "dfp")])
> 
> +(define_insn "floattitd2"
> +  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
> +	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "dcffixqq %0,%1"
> +  [(set_attr "type" "dfp")])
> +
>  ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
>  ;; This is the first stage of converting it to an integer type.
> 

Compared to some existing define_insn entries, this matches the style,
looks reasonable.


> @@ -241,6 +248,14 @@
>    "TARGET_DFP"
>    "dctfix<q> %0,%1"
>    [(set_attr "type" "dfp")])
> +
> +  ;; carll

Fix comment.

> +(define_insn "fixtdti2"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> +	(fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "dctfixqq %0,%1"
> +  [(set_attr "type" "dfp")])
>  

looks reasonable.


>  ;; Decimal builtin support
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> index c84494fc28d..d1e69cea021 100644
> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -38,6 +38,7 @@
>  #if DEBUG
>  #include <stdio.h>
>  #include <stdlib.h>
> +#include <math.h>
> 
> 
>  void print_i128(__int128_t val)
> @@ -59,6 +60,13 @@ int main ()
>    __int128_t arg1, result;
>    __uint128_t uarg2;
> 
> +  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
> +
> +  struct conv_t {
> +    __uint128_t u128;
> +    _Decimal128 d128;
> +  } conv, conv2;
> +
>    vector signed long long int vec_arg1_di, vec_arg2_di;
>    vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
>    vector unsigned long long int vec_uresult_di;
> @@ -2249,6 +2257,62 @@ int main ()
>      abort();
>  #endif
>    }
> +  
> +  /* DFP to __int128 and __int128 to DFP conversions */
> +  /* Can't get printing of DFP values to work.  Print the DFP value as an
> +     unsigned int so we can see the bit patterns.  */
> +#if 1


I'd recommend dropping the #if 1 and matching #endif.

> +  conv.u128 = 0x2208000000000000ULL;
> +  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
> +  expected_result_dfp128 = conv.d128;
> +
> +  arg1 = 4;
> +
> +  conv.d128 = (_Decimal128) arg1;
> +
> +  result_dfp128 = (_Decimal128) arg1;
> +  if (((conv.u128 >>64) != 0x2208000000000000ULL) &&
> +      ((conv.u128 & 0xFFFFFFFFFFFFFFFF) != 0x4ULL)) {
> +#if DEBUG
> +    printf("ERROR:  convert int128 value ");
> +    print_i128 (arg1);
> +    conv.d128 = result_dfp128;
> +    printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
> +	   (unsigned long long)((conv.u128) >>64),
> +	   (unsigned long long)((conv.u128) & 0xFFFFFFFFFFFFFFFF));
> +
> +    conv.d128 = expected_result_dfp128;
> +    printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
> +	   (unsigned long long) (conv.u128>>64),
> +	   (unsigned long long) (conv.u128 & 0xFFFFFFFFFFFFFFFF));
> +#else
> +    abort();
> +#endif
> +  }
> +#endif
> +
> +  expected_result = 4;
> 
> +  conv.u128 = 0x2208000000000000ULL;
> +  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
> +  arg1_dfp128 = conv.d128;
> +
> +  result = (__int128_t) arg1_dfp128;
> +
> +  if (result != expected_result) {
> +#if DEBUG
> +    printf("ERROR:  convert DFP value ");
> +    printf("0x%llx %llx (printed as hex bit string) ",
> +	   (unsigned long long)(conv.u128>>64),
> +	   (unsigned long long)(conv.u128 & 0xFFFFFFFFFFFFFFFF));
> +    printf("to __int128 value = ");
> +    print_i128 (result);
> +    printf("\ndoes not match expected_result = ");
> +    print_i128 (expected_result);
> +    printf("\n");
> +#else
> +    abort();
> +#endif
> +  }
>    return 0;
>  }
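The magic constant in the quoted test is easier to review with its fields spelled out. This C sketch assumes the DPD encoding of decimal128 used by the Power DFP unit: 1 sign bit, a 17-bit combination field, and 110 bits of trailing significand in 10-bit declets, with exponent bias 6176. Under that encoding, the high doubleword 0x2208000000000000 decodes as sign 0, unbiased exponent 0 and leading digit 0. For digits 0-7, a DPD declet is just the three digits in 3-bit fields with bit 3 clear, so a low declet of 0x004 makes the value 4. The struct and function names are hypothetical, and only the combination-field form whose two leading bits are not 11 is handled.

Two details of the quoted test also look worth a second look. First, conv_t is declared as a struct, so its u128 and d128 members occupy separate storage and writing one does not change the other; a union seems to be what is intended. Second, the first mismatch check joins the two half comparisons with && where || would catch a mismatch in either half.

```c
#include <stdint.h>

/* Decoded fields of a DPD decimal128 value given as two 64-bit halves.  */
struct dpd128_fields {
  int sign;
  int exponent;          /* unbiased: biased exponent minus 6176 */
  unsigned msd;          /* leading significand digit (combination field) */
  unsigned last_declet;  /* rightmost 10-bit declet (last three digits) */
};

/* Returns 0 for the forms not modeled (leading combination bits 11:
   large leading digit, infinity, NaN).  */
static int decode_dpd128 (uint64_t hi, uint64_t lo, struct dpd128_fields *f) {
  unsigned comb = (unsigned) (hi >> 46) & 0x1FFFF;   /* 17-bit field */
  if ((comb >> 15) == 3)
    return 0;
  f->sign = (int) (hi >> 63);
  f->msd = (comb >> 12) & 7;
  /* Exponent: 2 leading combination bits, then its last 12 bits.  */
  unsigned biased = ((comb >> 15) << 12) | (comb & 0xFFF);
  f->exponent = (int) biased - 6176;
  f->last_declet = (unsigned) (lo & 0x3FF);
  return 1;
}
```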



* Re: [Patch 4/5] rs6000,  Test 128-bit shifts for just the int128 type.
  2020-08-11 19:23 ` [Patch 4/5] rs6000, Test 128-bit shifts for just the int128 type Carl Love
@ 2020-08-14 17:35   ` will schmidt
  2020-08-20 21:50   ` Segher Boessenkool
  1 sibling, 0 replies; 27+ messages in thread
From: will schmidt @ 2020-08-14 17:35 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches; +Cc: Bill Schmidt, cel

On Tue, 2020-08-11 at 12:23 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 4 adds 128-bit integer shift instruction support.


I suggest adding a few more words here to better describe what this
patch does, e.g.:
This adds the VEC_I128 mode iterator, which contains the V1TI and TI
modes, and updates the existing define_insn and define_expand patterns
to use it.
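For reference while reviewing the shift patterns, here is a portable C
sketch of what a 128-bit logical shift computes on a value held as two
64-bit doublewords.  It is illustrative only (the patterns emit a single
vslq/vsrq); the u128_pair type and the shl128/lshr128 names are
hypothetical.  Following the patch comments (count in bits [57:63]),
only the low 7 bits of the count are used, i.e. the count is taken
modulo 128.

```c
#include <stdint.h>

/* Hypothetical representation: a 128-bit value as two 64-bit
   doublewords.  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} u128_pair;

/* Logical left shift by COUNT, taken modulo 128 (low 7 bits only).  */
static u128_pair
shl128 (u128_pair x, unsigned int count)
{
  u128_pair r;

  count &= 127;
  if (count == 0)
    return x;
  if (count < 64)
    {
      /* Bits shifted out of the low doubleword move into the high one.  */
      r.hi = (x.hi << count) | (x.lo >> (64 - count));
      r.lo = x.lo << count;
    }
  else
    {
      r.hi = x.lo << (count - 64);
      r.lo = 0;
    }
  return r;
}

/* Logical right shift by COUNT, also taken modulo 128.  */
static u128_pair
lshr128 (u128_pair x, unsigned int count)
{
  u128_pair r;

  count &= 127;
  if (count == 0)
    return x;
  if (count < 64)
    {
      /* Bits shifted out of the high doubleword move into the low one.  */
      r.lo = (x.lo >> count) | (x.hi << (64 - count));
      r.hi = x.hi >> count;
    }
  else
    {
      r.lo = x.hi >> (count - 64);
      r.hi = 0;
    }
  return r;
}
```

The explicit count == 0 case avoids a shift by 64, which is undefined
in C for a 64-bit operand.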


> 
>                  Carl Love
> 
> ---------------------------------------------------------
> Test 128-bit shifts for just the int128 type.
> 
> gcc/ChangeLog
> 
> 2020-08-10  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq): Add mode
> 	VEC_I128.


Maybe also note the rename to altivec_vslq_<mode> and altivec_vsrq_<mode>.


> 	* config/rs6000/vector.md (VEC_I128): New mode iterator.

ok

> 	(vashlv1ti3): Change to vashl<mode>3, mode VEC_I128.
> 	(vlshrv1ti3): Change to vlshr<mode>3, mode VEC_I128.


> 	* config/rs6000/vsx.md (UNSPEC_XXSWAPD_V1TI): Change to
> 	UNSPEC_XXSWAPD_VEC_I128.

s/Change/Rename/


> 	(xxswapd_v1ti): Change to xxswapd_<mode>, mode VEC_I128.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-08-10  Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
> 	tests.
> ---
>  gcc/config/rs6000/altivec.md                  | 16 +++++------
>  gcc/config/rs6000/vector.md                   | 27 ++++++++++---------
>  gcc/config/rs6000/vsx.md                      | 14 +++++-----
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 24 +++++++++++++++--
>  4 files changed, 52 insertions(+), 29 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2763d920828..cba39852070 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2219,10 +2219,10 @@
>    "vsl<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vslq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_insn "altivec_vslq_<mode>"
> +  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
> +	(ashift:VEC_I128 (match_operand:VEC_I128 1 "vsx_register_operand" "v")
> +		     (match_operand:VEC_I128 2 "vsx_register_operand" "v")))]
>    "TARGET_TI_VECTOR_OPS"
>    /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
>    "vslq %0,%1,%2"

ok

> @@ -2236,10 +2236,10 @@
>    "vsr<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vsrq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_insn "altivec_vsrq_<mode>"
> +  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
> +	(lshiftrt:VEC_I128 (match_operand:VEC_I128 1 "vsx_register_operand" "v")
> +			   (match_operand:VEC_I128 2 "vsx_register_operand" "v")))]
>    "TARGET_TI_VECTOR_OPS"
>    /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
>    "vsrq %0,%1,%2"

ok

> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 2deff282076..682aabc4657 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; 128-bit int modes
> +(define_mode_iterator VEC_I128 [V1TI TI])
> +
>  ;; Vector int modes for parity
>  (define_mode_iterator VEC_IP [V8HI
>  			      V4SI
> @@ -1635,17 +1638,17 @@
>    "")
> 
>  ;; No immediate version of this 128-bit instruction
> -(define_expand "vashlv1ti3"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_expand "vashl<mode>3"
> +  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
> +	(ashift:VEC_I128 (match_operand:VEC_I128 1 "vsx_register_operand")
> +			 (match_operand:VEC_I128 2 "vsx_register_operand")))]
>    "TARGET_TI_VECTOR_OPS"
>  {
>    /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> -  rtx tmp = gen_reg_rtx (V1TImode);
> +  rtx tmp = gen_reg_rtx (<MODE>mode);
> 
>    emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> -  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
> +  emit_insn(gen_altivec_vslq_<mode> (operands[0], operands[1], tmp));
>    DONE;
>  })
> 
> @@ -1658,17 +1661,17 @@
>    "")
> 
>  ;; No immediate version of this 128-bit instruction
> -(define_expand "vlshrv1ti3"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_expand "vlshr<mode>3"
> +  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
> +	(lshiftrt:VEC_I128 (match_operand:VEC_I128 1 "vsx_register_operand")
> +			   (match_operand:VEC_I128 2 "vsx_register_operand")))]
>    "TARGET_TI_VECTOR_OPS"
>  {
>    /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
> -  rtx tmp = gen_reg_rtx (V1TImode);
> +  rtx tmp = gen_reg_rtx (<MODE>mode);
> 
>    emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> -  emit_insn(gen_altivec_vsrq (operands[0], operands[1], tmp));
> +  emit_insn(gen_altivec_vsrq_<mode> (operands[0], operands[1], tmp));
>    DONE;
>  })
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 998af3908ad..5be535808b3 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -367,7 +367,7 @@
>     UNSPEC_INSERTR
>     UNSPEC_REPLACE_ELT
>     UNSPEC_REPLACE_UN
> -	UNSPEC_XXSWAPD_V1TI
> +	UNSPEC_XXSWAPD_VEC_I128
>    ])


Double-check the whitespace indentation; the new line uses a tab where
its neighbors use spaces.


> 
>  (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
> @@ -3144,12 +3144,12 @@
>    "xxpermdi %x0,%x1,%x1,2"
>    [(set_attr "type" "vecperm")])
> 
> -;; Swap upper/lower 64-bit values in a 128-bit vector
> -(define_insn "xxswapd_v1ti"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
> -		      (parallel [(const_int 0)(const_int 1)])]
> -                     UNSPEC_XXSWAPD_V1TI))]
> +;; Swap upper/lower 64-bit values in V1TI or TI type
> +(define_insn "xxswapd_<mode>"
> +  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
> +	(unspec:VEC_I128 [(match_operand:VEC_I128 1 "vsx_register_operand" "v")
> +			  (parallel [(const_int 0)(const_int 1)])]
> +                     UNSPEC_XXSWAPD_VEC_I128))]
>    "TARGET_POWER10"
>  ;; AIX does not support extended mnemonic xxswapd.  Use the basic
>  ;; mnemonic xxpermdi instead.
> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> index d1e69cea021..b074d83bd68 100644
> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -53,6 +53,18 @@ void print_i128(__int128_t val)
> 
>  void abort (void);
> 
> +__attribute__((noinline))
> +__int128_t shift_right (__int128_t a, __uint128_t b)
> +{
> +  return a >> b;
> +}
> +
> +__attribute__((noinline))
> +__int128_t shift_left (__int128_t a, __uint128_t b)
> +{
> +  return a << b;
> +}
> +
>  int main ()
>  {
>    int i, result_int;
> @@ -141,10 +153,12 @@ int main ()
>  #endif
>    }
> 
> -  arg1 = 3;
> +  //  arg1 = 3;
> +  arg1 = vec_result[0];
>    uarg2 = 4;
>    expected_result = arg1*16;

Just drop the lines that are commented out with "//", here and below.

> 
> +  //  result = shift_left(arg1, uarg2);
>    result = arg1 << uarg2;
> 
>    if (result != expected_result) {
> @@ -225,10 +239,16 @@ int main ()
>  #endif
>    }
> 
> -  arg1 = 48;
> +  //  arg1 = 48;
> +
> +  // use the previous result to try and keep gcc from doing the shift
> +  // at compile time
> +  arg1 = vec_uresult[0];
>    uarg2 = 4;
>    expected_result = arg1/16;
> 
> +  //Not getting 128-bit shift inst generated
> +  //  result = shift_right (arg1, uarg2);
>    result = arg1 >> uarg2;
> 
>    if (result != expected_result) {
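On the comments in this hunk about keeping GCC from doing the shift at
compile time: one conventional approach, sketched here as an assumption
rather than as what the test must do, is to read the operands through
volatile objects so their values cannot be propagated as constants.  The
variable and function names below are hypothetical.

```c
typedef __int128 i128;

/* Every access to a volatile object must be performed at run time, so
   the compiler cannot substitute these initial values and fold the
   shifts below.  */
static volatile long long shift_operand = 48;
static volatile unsigned int shift_amount = 4;

i128
runtime_shift_left (void)
{
  i128 a = shift_operand;
  return a << shift_amount;
}

i128
runtime_shift_right (void)
{
  i128 a = shift_operand;
  return a >> shift_amount;
}
```

This only defeats constant folding.  Whether the vector shift
instructions are then selected for TImode operands is a separate
question, which seems to be what the "Not getting 128-bit shift inst
generated" comment is about.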



* Re: [Patch 5/5] rs6000,  Conversions between 128-bit integer and floating point values.
  2020-08-11 19:23 ` [Patch 5/5] rs6000, Conversions between 128-bit integer and floating point values Carl Love
@ 2020-08-14 18:50   ` will schmidt
  2020-08-20 22:36   ` Segher Boessenkool
  2020-09-19  0:25   ` will schmidt
  2 siblings, 0 replies; 27+ messages in thread
From: will schmidt @ 2020-08-14 18:50 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches; +Cc: Bill Schmidt, cel

On Tue, 2020-08-11 at 12:23 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 5 adds the 128-bit integer to/from 128-floating point
> conversions.  This patch has to invoke the routines to use the 128-bit
> hardware instructions if on Power 10 or use software routines if
> running on a pre Power 10 system via the resolve function.  
> 
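The resolver approach described here picks an implementation once
instead of testing the CPU on every call.  Below is a portable sketch of
the same selection logic; the *_sketch names are hypothetical, the
feature test is passed in as a flag (the patch uses
__builtin_cpu_supports ("arch_3_1"), which is only available on
PowerPC), and both bodies are placeholders for the libgcc _sw and _hw
routines.  It assumes a host compiler with __int128 and __float128
support.

```c
#include <stdbool.h>

/* Signature shared by both implementations, as with the _sw/_hw
   pairs in libgcc.  */
typedef __float128 (*floattikf_fn) (__int128);

/* Placeholder for the soft-fp routine (__floattikf_sw).  */
static __float128
floattikf_sw_sketch (__int128 a)
{
  return (__float128) a;
}

/* Placeholder for the hardware routine (__floattikf_hw); with the new
   floatti<mode>2 pattern and ISA 3.1 enabled, this cast would be a
   single xscvsqqp.  */
static __float128
floattikf_hw_sketch (__int128 a)
{
  return (__float128) a;
}

/* Mirrors SW_OR_HW_ISA3_1 (SW, HW): return the hardware version only
   when the ISA 3.1 feature is present.  */
static floattikf_fn
floattikf_resolve_sketch (bool have_isa_3_1)
{
  return have_isa_3_1 ? floattikf_hw_sketch : floattikf_sw_sketch;
}
```

With a real ifunc, the dynamic loader calls the resolver once during
relocation and binds __floattikf to whichever pointer it returns.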




>                           Carl 
> 
> -----------------------------------------------------------
> Conversions between 128-bit integer and floating point values.
> 
> gcc/ChangeLog
> 
> 2020-08-10  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/rs6000.md (floatunsti<mode>2,
> 	fix_trunc<mode>ti2, fixuns_trunc<mode>ti2): Add
> 	define_insn for mode IEEE 128.

s/Add/Update/

This is also missing floatti<mode>2.



> 	* libgcc/config/rs6000/fixkfti-sw.c: New file.
> 	* libgcc/config/rs6000/fixkfti.c: Remove file.
> 	* libgcc/config/rs6000/fixunskfti-sw.c: New file.
> 	* libgcc/config/rs6000/fixunskfti.c: Remove file.

... rename to ... ?


> 	* libgcc/config/rs6000/float128-hw.c (__floattikf_hw,
> 	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw):
> 	New functions.
> 	* libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1):
> 	New macro.
> 	(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
> 	__fixunskfti_resolve): Add resolve functions.
> 	(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
> 	functions.
> 	* libgcc/config/rs6000/float128-sed (__floattitf, __floatuntitf,
> 	__fixtfti, __fixunstfti): Add editor commands to change
> 	names.
> 	* libgcc/config/rs6000/float128-sed-hw (__floattitf,
> 	__floatuntitf, __fixtfti, __fixunstfti): Add editor commands
> 	to change names.


> 	* libgcc/config/rs6000/floattikf-sw.c: New file.
> 	* libgcc/config/rs6000/floattikf.c: Remove file.
> 	* libgcc/config/rs6000/floatuntikf-sw.c: New file.
> 	* libgcc/config/rs6000/floatuntikf.c: Remove file.
> 	* libgcc/config/rs6000/floatuntikf-sw.c: New file.

floatuntikf-sw was so good, it was added twice.
... rename to ... ? 


> 	* libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,

One 'a' in quad.

> 	__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
> 	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
> 	__floatuntikf, __fixkfti, __fixunskfti):	New extern declarations.

There is a tab in there that should not be.


> 	* libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
> 	fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
> 	(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
> 	file names to fp128_ppc_funcs.


Perhaps:
	* libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
	fixkfti, fixunskfti): Rename to floattikf-sw, floatuntikf-sw,
	fixkfti-sw, fixunskfti-sw.



> 
> gcc/testsuite/ChangeLog
> 
> 2020-08-10  Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/fp128_conversions.c: New file.

s/New file./New test./, or just "New."


> ---
>  gcc/config/rs6000/rs6000.md                   |  36 +++
>  .../gcc.target/powerpc/fp128_conversions.c    | 287 ++++++++++++++++++
>  .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
>  .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   4 +-
>  libgcc/config/rs6000/float128-hw.c            |  24 ++
>  libgcc/config/rs6000/float128-ifunc.c         |  44 ++-
>  libgcc/config/rs6000/float128-sed             |   4 +
>  libgcc/config/rs6000/float128-sed-hw          |   4 +
>  .../rs6000/{floattikf.c => floattikf-sw.c}    |   4 +-
>  .../{floatuntikf.c => floatuntikf-sw.c}       |   4 +-
>  libgcc/config/rs6000/quad-float128.h          |  17 +-
>  libgcc/config/rs6000/t-float128               |   3 +-
>  12 files changed, 415 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
>  rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
>  rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (96%)
>  rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
>  rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%)
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 43b620ae1c0..3853ebd4195 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6390,6 +6390,42 @@
>     xscvsxddp %x0,%x1"
>    [(set_attr "type" "fp")])
> 
> +(define_insn "floatti<mode>2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +       (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvsqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "floatunsti<mode>2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +       (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvuqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "fix_trunc<mode>ti2"
> +  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
> +       (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvqpsqz %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "fixuns_trunc<mode>ti2"
> +  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
> +       (unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvqpuqz %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
>  ; Allow the combiner to merge source memory operands to the conversion so that
>  ; the optimizer/register allocator doesn't try to load the value too early in a
>  ; GPR and then use store/load to move it to a FPR and suffer from a store-load

ok.

> diff --git a/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
> new file mode 100644
> index 00000000000..f0336e6f1fc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
> @@ -0,0 +1,287 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10" } */
> +
> +/* Check that the expected 128-bit instructions are generated if the processor
> +   supports the 128-bit integer instructions. */
> +/* { dg-final { scan-assembler-times {\mxscvsqqp\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvuqqp\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvqpsqz\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvqpuqz\M} 1 { target { ppc_native_128bit } } } } */
> +
> +#include <stdio.h>
> +#include <math.h>
> +#include <fenv.h>
> +#include <stdlib.h>
> +#include <wchar.h>
> +
> +#define DEBUG 1

Turn off DEBUG.


> +
> +void abort (void);
> +
> +float conv_i_2_fp( long long int a)
> +{
> +  return (float) a;
> +}
> +
> +double conv_i_2_fpd( long long int a)
> +{
> +  return (double) a;
> +}
> +
> +double conv_ui_2_fpd( unsigned long long int a)
> +{
> +  return (double) a;
> +}
> +
> +__float128 conv_i128_2_fp128 (__int128_t a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__float128) a;


So... should this test be duplicated and updated to test both of those
-mabi=<foo> options, and the default?
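If so, one common pattern in gcc.target/powerpc is a small wrapper test
that includes this file with different options.  A sketch, assuming a
hypothetical file name.  As far as I know, DejaGnu reads dg- directives
only from the top-level test file, so any dg-final scans (whose counts
may differ under -mabi=ieeelongdouble) would have to be repeated in the
wrapper.  -Wno-psabi is commonly paired with -mabi=ieeelongdouble in
existing tests to silence the ABI-change warning.

```c
/* fp128_conversions_ieeelongdouble.c: hypothetical wrapper test.  */
/* { dg-do run } */
/* { dg-require-effective-target power10_hw } */
/* { dg-options "-mdejagnu-cpu=power10 -mabi=ieeelongdouble -Wno-psabi" } */

#include "fp128_conversions.c"
```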


> +}
> +
> +__float128 conv_ui128_2_fp128 (__uint128_t a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__float128) a;
> +}
> +
> +__int128_t conv_fp128_2_i128 (__float128 a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__int128_t) a;
> +}
> +
> +__uint128_t conv_fp128_2_ui128 (__float128 a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__uint128_t) a;
> +}
> +
> +long double conv_i128_2_ld (__int128_t a)
> +{
> +  // default, gen call __floattitf
> +  // -mabi=ibmlongdouble, gen call __floattitf
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (long double) a;
> +}
> +
> +__ibm128 conv_i128_2_ibm128 (__int128_t a)
> +{
> +  // default, gen call __floattitf
> +  // -mabi=ibmlongdouble, gen call __floattitf
> +  // -mabi=ieeelongdouble, messages about uses IBM long double, no binary output

What does that mean?  What messages?  Clarify.

> +  return (__ibm128) a;
> +}
> +
> +int main()
> +{
> +	float a, expected_result_float;
> +	double b, expected_result_double;
> +	long long int c, expected_result_llint;
> +	unsigned long long int u;
> +	__int128_t d;
> +	__uint128_t u128;
> +	unsigned long long expected_result_uint128[2] ;
> +	__float128 e;
> +	long double ld;     // another 128-bit float version
> +
> +	union conv_t {
> +		float a;
> +		double b;
> +		long long int c;
> +		long long int128[2] ;
> +		unsigned long long uint128[2] ;
> +		unsigned long long int u;
> +		__int128_t d;
> +		__uint128_t u128;
> +		__float128 e;
> +		long double ld;     // another 128-bit float version
> +	} conv, conv_result;
> +
> + 
> +	c = 20;
> +	expected_result_float = 20.00000;
> +	a = conv_i_2_fp (c);
> +
> +	if (a != expected_result_float) {
> +#if DEBUG
> +		printf("ERROR: conv_i_2_fp(%lld) = %10.5f\n", c, a);
> +		printf("\n does not match expected_result = %10.5f\n\n",
> +				 expected_result_float);
> + #else
> +		abort();
> +#endif
> +	}
> +
> +	c = 20;
> +	expected_result_double = 20.00000;
> +	b = conv_i_2_fpd (c);
> +
> +	if (b != expected_result_double) {
> +#if DEBUG
> +		printf("ERROR: conv_i_2_fpd(%lld) = %10.5f\n", c, b);
> +		printf("\n does not match expected_result = %10.5f\n\n",
> +				 expected_result_double);
> + #else
> +		abort();
> +#endif
> +	}
> +
> +	u = 20;
> +	expected_result_double = 20.00000;
> +	b = conv_ui_2_fpd (u);
> +
> +	if (b != expected_result_double) {
> +#if DEBUG
> +		printf("ERROR: conv_ui_2_fpd(%llu) = %10.5f\n", u, b);
> +		printf("\n does not match expected_result = %10.5f\n\n",
> +				 expected_result_double);
> + #else
> +		abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */

What does that mean?  (A limitation in printing the type?  Perhaps just
add a comment at the printf clarifying why the values are printed in
hex.)  Here and elsewhere.
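If the problem is that printf has no conversion for __float128
(libquadmath's quadmath_snprintf can format it, plain printf cannot), a
short comment saying so would help.  A minimal sketch of a helper that
prints the raw bits instead; f128_to_hex is a hypothetical name, and it
assumes a little-endian target such as powerpc64le, so that word [1]
holds the sign, exponent and top 48 fraction bits (the same layout the
test's union relies on).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format the raw bits of a __float128 as "0x<high> <low>", matching
   the layout of the test's error messages.  memcpy avoids type-punning
   issues; word [1] is the high doubleword on little-endian targets.  */
static void
f128_to_hex (__float128 x, char *buf, size_t len)
{
  uint64_t w[2];

  memcpy (w, &x, sizeof w);
  snprintf (buf, len, "0x%016llx %016llx",
	    (unsigned long long) w[1], (unsigned long long) w[0]);
}
```

Using %016llx for both halves keeps the leading zeros, so a short low
word cannot be misread the way "0x%llx %llx" output can.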


> +  d = -3210;
> +  d = (d * 10000000000) + 9876543210;
> +  conv_result.e = conv_i128_2_fp128 (d);
> +  expected_result_uint128[1] = 0xc02bd2f9068d1160;
> +  expected_result_uint128[0] = 0x0;
> +  
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		|| (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_i128_2_fp128(-32090123456790) = (result in hex) 0x%llx %llx\n",
> +				conv_result.uint128[1], conv_result.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  d = 123;
> +  d = (d * 10000000000) + 1234567890;
> +  conv_result.ld = conv_i128_2_fp128 (d);
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0x4271eab4c8ed2000;
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		|| (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_i128_2_fp128(1231234567890) = (result in hex) 0x%llx %llx\n",
> +				conv_result.uint128[1], conv_result.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */
> +  u128 = 8760;
> +  u128 = (u128 * 10000000000) + 1234567890;
> +  conv_result.e = conv_ui128_2_fp128 (u128);
> +  expected_result_uint128[1] = 0x402d3eb101df8b48;
> +  expected_result_uint128[0] = 0x0;
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		|| (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_ui128_2_fp128(87601234567890) = (result in hex) 0x%llx %llx\n",
> +				conv_result.uint128[1], conv_result.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */
> +  u128 = 3210;
> +  u128 = (u128 * 10000000000) + 9876543210;
> +  expected_result_uint128[1] = 0x402bd3429c8feea0;
> +  expected_result_uint128[0] = 0x0;
> +  conv_result.e = conv_ui128_2_fp128 (u128);
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		|| (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_ui128_2_fp128(32109876543210) = (result in hex) 0x%llx %llx\n",
> +				conv_result.uint128[1], conv_result.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = 12345.6789;
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0x3039;
> +
> +  conv_result.d = conv_fp128_2_i128 (conv.e);
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		|| (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) =  ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
> +
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = -6789.12345;
> +  expected_result_uint128[1] = 0xffffffffffffffff;
> +  expected_result_uint128[0] = 0xffffffffffffe57b;
> +  conv_result.d = conv_fp128_2_i128 (conv.e);
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		|| (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) = ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
> +
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = 6789.12345;
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0x1a85;
> +  conv_result.d = conv_fp128_2_ui128 (conv.e);
> + 
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		|| (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_ui128(0x%llx %llx) = ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
> +	  
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  return 0;
> +}

ok


> diff --git a/libgcc/config/rs6000/fixkfti.c b/libgcc/config/rs6000/fixkfti-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/fixkfti.c
> rename to libgcc/config/rs6000/fixkfti-sw.c
> index a22286228aa..d6bbbf889b7 100644
> --- a/libgcc/config/rs6000/fixkfti.c
> +++ b/libgcc/config/rs6000/fixkfti-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TItype
> -__fixkfti (TFtype a)
> +__fixkfti_sw (TFtype a)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
ok

> diff --git a/libgcc/config/rs6000/fixunskfti.c b/libgcc/config/rs6000/fixunskfti-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/fixunskfti.c
> rename to libgcc/config/rs6000/fixunskfti-sw.c
> index ab232d92d24..d803936e48a 100644
> --- a/libgcc/config/rs6000/fixunskfti.c
> +++ b/libgcc/config/rs6000/fixunskfti-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  UTItype
> -__fixunskfti (TFtype a)
> +__fixunskfti_sw (TFtype a)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);

ok

> diff --git a/libgcc/config/rs6000/float128-hw.c b/libgcc/config/rs6000/float128-hw.c
> index 8705b53e22a..be8bd07e853 100644
> --- a/libgcc/config/rs6000/float128-hw.c
> +++ b/libgcc/config/rs6000/float128-hw.c
> @@ -86,6 +86,30 @@ __floatdikf_hw (DItype_ppc a)
>    return (TFtype) a;
>  }
> 
> +TFtype
> +__floattikf_hw (TItype_ppc a)
> +{
> +  return (TFtype) a;
> +}
> +
> +TFtype
> +__floatuntikf_hw (UTItype_ppc a)
> +{
> +  return (TFtype) a;
> +}
> +
> +TItype_ppc
> +__fixkfti_hw (TFtype a)
> +{
> +  return (TItype_ppc) a;
> +}
> +
> +UTItype_ppc
> +__fixunskfti_hw (TFtype a)
> +{
> +  return (UTItype_ppc) a;
> +}
> +
>  TFtype
>  __floatundikf_hw (UDItype_ppc a)
>  {

ok.

> diff --git a/libgcc/config/rs6000/float128-ifunc.c b/libgcc/config/rs6000/float128-ifunc.c
> index c2f65912a74..c221be2c864 100644
> --- a/libgcc/config/rs6000/float128-ifunc.c
> +++ b/libgcc/config/rs6000/float128-ifunc.c
> @@ -46,14 +46,9 @@
>  #endif
> 
>  #define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW : SW)
> +#define SW_OR_HW_ISA3_1(SW, HW) (__builtin_cpu_supports ("arch_3_1") ? HW : SW)
> 
>  /* Resolvers.  */
> -
> -/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
> -   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
> -   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
> -   use the emulator functions for these conversions.  */
> -
>  static __typeof__ (__addkf3_sw) *
>  __addkf3_resolve (void)
>  {
> @@ -102,6 +97,18 @@ __floatdikf_resolve (void)
>    return SW_OR_HW (__floatdikf_sw, __floatdikf_hw);
>  }
> 
> +static __typeof__ (__floattikf_sw) *
> +__floattikf_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__floattikf_sw, __floattikf_hw);
> +}
> +
> +static __typeof__ (__floatuntikf_sw) *
> +__floatuntikf_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__floatuntikf_sw, __floatuntikf_hw);
> +}
> +
>  static __typeof__ (__floatunsikf_sw) *
>  __floatunsikf_resolve (void)
>  {
> @@ -114,6 +121,19 @@ __floatundikf_resolve (void)
>    return SW_OR_HW (__floatundikf_sw, __floatundikf_hw);
>  }
> 
> +
> +static __typeof__ (__fixkfti_sw) *
> +__fixkfti_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__fixkfti_sw, __fixkfti_hw);
> +}
> +
> +static __typeof__ (__fixunskfti_sw) *
> +__fixunskfti_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__fixunskfti_sw, __fixunskfti_hw);
> +}
> +
>  static __typeof__ (__fixkfsi_sw) *
>  __fixkfsi_resolve (void)
>  {
> @@ -303,6 +323,18 @@ TFtype __floatsikf (SItype_ppc)
>  TFtype __floatdikf (DItype_ppc)
>    __attribute__ ((__ifunc__ ("__floatdikf_resolve")));
> 
> +TFtype __floattikf (TItype_ppc)
> +  __attribute__ ((__ifunc__ ("__floattikf_resolve")));
> +
> +TFtype __floatuntikf (UTItype_ppc)
> +  __attribute__ ((__ifunc__ ("__floatuntikf_resolve")));
> +
> +TItype_ppc __fixkfti (TFtype)
> +  __attribute__ ((__ifunc__ ("__fixkfti_resolve")));
> +
> +UTItype_ppc __fixunskfti (TFtype)
> +  __attribute__ ((__ifunc__ ("__fixunskfti_resolve")));
> +
>  TFtype __floatunsikf (USItype_ppc)
>    __attribute__ ((__ifunc__ ("__floatunsikf_resolve")));
> 

ok

> diff --git a/libgcc/config/rs6000/float128-sed b/libgcc/config/rs6000/float128-sed
> index d9a089ff9ba..c0fcddb1959 100644
> --- a/libgcc/config/rs6000/float128-sed
> +++ b/libgcc/config/rs6000/float128-sed
> @@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi/g
>  s/__fixunstfdi/__fixunskfdi/g
>  s/__fixunstfsi/__fixunskfsi/g
>  s/__floatditf/__floatdikf/g
> +s/__floattitf/__floattikf/g
> +s/__floatuntitf/__floatuntikf/g
> +s/__fixtfti/__fixkfti/g
> +s/__fixunstfti/__fixunskfti/g
>  s/__floatsitf/__floatsikf/g
>  s/__floatunditf/__floatundikf/g
>  s/__floatunsitf/__floatunsikf/g
> diff --git a/libgcc/config/rs6000/float128-sed-hw b/libgcc/config/rs6000/float128-sed-hw
> index acf36b0c17d..3d2bf556da1 100644
> --- a/libgcc/config/rs6000/float128-sed-hw
> +++ b/libgcc/config/rs6000/float128-sed-hw
> @@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi_sw/g
>  s/__fixunstfdi/__fixunskfdi_sw/g
>  s/__fixunstfsi/__fixunskfsi_sw/g
>  s/__floatditf/__floatdikf_sw/g
> +s/__floattitf/__floattikf_sw/g
> +s/__floatuntitf/__floatuntikf_sw/g
> +s/__fixtfti/__fixkfti_sw/g
> +s/__fixunstfti/__fixunskfti_sw/g
>  s/__floatsitf/__floatsikf_sw/g
>  s/__floatunditf/__floatundikf_sw/g
>  s/__floatunsitf/__floatunsikf_sw/g

ok

> diff --git a/libgcc/config/rs6000/floattikf.c b/libgcc/config/rs6000/floattikf-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/floattikf.c
> rename to libgcc/config/rs6000/floattikf-sw.c
> index 4e8c40cfbe4..110706352bb 100644
> --- a/libgcc/config/rs6000/floattikf.c
> +++ b/libgcc/config/rs6000/floattikf-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TFtype
> -__floattikf (TItype i)
> +__floattikf_sw (TItype i)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/floatuntikf.c b/libgcc/config/rs6000/floatuntikf-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/floatuntikf.c
> rename to libgcc/config/rs6000/floatuntikf-sw.c
> index 8bfba4267d4..5e712a67e26 100644
> --- a/libgcc/config/rs6000/floatuntikf.c
> +++ b/libgcc/config/rs6000/floatuntikf-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TFtype
> -__floatuntikf (UTItype i)
> +__floatuntikf_sw (UTItype i)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/quad-float128.h b/libgcc/config/rs6000/quad-float128.h
> index 32ef328a8ea..24712b9277f 100644
> --- a/libgcc/config/rs6000/quad-float128.h
> +++ b/libgcc/config/rs6000/quad-float128.h
> @@ -87,19 +87,18 @@ extern USItype_ppc __fixunskfsi_sw (TFtype);
>  extern UDItype_ppc __fixunskfdi_sw (TFtype);
>  extern TFtype __floatsikf_sw (SItype_ppc);
>  extern TFtype __floatdikf_sw (DItype_ppc);
> +extern TFtype __floattikf_sw (TItype_ppc);
>  extern TFtype __floatunsikf_sw (USItype_ppc);
>  extern TFtype __floatundikf_sw (UDItype_ppc);
> +extern TFtype __floatuntikf_sw (UTItype_ppc);
> +extern TItype_ppc __fixkfti_sw (TFtype);
> +extern UTItype_ppc __fixunskfti_sw (TFtype);
>  extern IBM128_TYPE __extendkftf2_sw (TFtype);
>  extern TFtype __trunctfkf2_sw (IBM128_TYPE);
>  extern TCtype __mulkc3_sw (TFtype, TFtype, TFtype, TFtype);
>  extern TCtype __divkc3_sw (TFtype, TFtype, TFtype, TFtype);
> 
>  #ifdef _ARCH_PPC64
> -/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
> -   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
> -   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
> -   use the emulator functions for these conversions.  */
> -
>  extern TItype_ppc __fixkfti (TFtype);
>  extern UTItype_ppc __fixunskfti (TFtype);
>  extern TFtype __floattikf (TItype_ppc);
> @@ -130,8 +129,12 @@ extern USItype_ppc __fixunskfsi_hw (TFtype);
>  extern UDItype_ppc __fixunskfdi_hw (TFtype);
>  extern TFtype __floatsikf_hw (SItype_ppc);
>  extern TFtype __floatdikf_hw (DItype_ppc);
> +extern TFtype __floattikf_hw (TItype_ppc);
>  extern TFtype __floatunsikf_hw (USItype_ppc);
>  extern TFtype __floatundikf_hw (UDItype_ppc);
> +extern TFtype __floatuntikf_hw (UTItype_ppc);
> +extern TItype_ppc __fixkfti_hw (TFtype);
> +extern UTItype_ppc __fixunskfti_hw (TFtype);
>  extern IBM128_TYPE __extendkftf2_hw (TFtype);
>  extern TFtype __trunctfkf2_hw (IBM128_TYPE);
>  extern TCtype __mulkc3_hw (TFtype, TFtype, TFtype, TFtype);
> @@ -162,8 +165,12 @@ extern USItype_ppc __fixunskfsi (TFtype);
>  extern UDItype_ppc __fixunskfdi (TFtype);
>  extern TFtype __floatsikf (SItype_ppc);
>  extern TFtype __floatdikf (DItype_ppc);
> +extern TFtype __floattikf (TItype_ppc);
>  extern TFtype __floatunsikf (USItype_ppc);
>  extern TFtype __floatundikf (UDItype_ppc);
> +extern TFtype __floatuntikf (UTItype_ppc);
> +extern TItype_ppc __fixkfti (TFtype);
> +extern UTItype_ppc __fixunskfti (TFtype);
>  extern IBM128_TYPE __extendkftf2 (TFtype);
>  extern TFtype __trunctfkf2 (IBM128_TYPE);

ok

> diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
> index d5413445189..325b22fd49e 100644
> --- a/libgcc/config/rs6000/t-float128
> +++ b/libgcc/config/rs6000/t-float128
> @@ -23,7 +23,8 @@ fp128_softfp_shared_obj	= $(addsuffix -sw_s$(objext),$(fp128_softfp_funcs))
>  fp128_softfp_obj	= $(fp128_softfp_static_obj) $(fp128_softfp_shared_obj)
> 
>  # New functions for software emulation
> -fp128_ppc_funcs		= floattikf floatuntikf fixkfti fixunskfti \
> +fp128_ppc_funcs		= floattikf-sw floatuntikf-sw \
> +			  fixkfti-sw fixunskfti-sw \
>  			  extendkftf2-sw trunctfkf2-sw \
>  			  sfp-exceptions _mulkc3 _divkc3 _powikf2


ok.



* Re: [EXTERNAL] Re: [Patch 1/5] rs6000, Add 128-bit sign extension support
  2020-08-13 23:53             ` [EXTERNAL] " will schmidt
@ 2020-08-18 21:50               ` Segher Boessenkool
  0 siblings, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2020-08-18 21:50 UTC (permalink / raw)
  To: will schmidt; +Cc: Carl Love, dje.gcc, gcc-patches, Bill Schmidt

On Thu, Aug 13, 2020 at 06:53:56PM -0500, will schmidt wrote:
> On Thu, 2020-08-13 at 17:55 -0500, Segher Boessenkool wrote:
> > > As long as there are no issues defining the builtins for 3.0 here.
> > > AFAIK they are not documented in ISA 3.0.  This is a happy accident
> > > that these ISA 3.1 builtins can be implemented with existing
> > > support.
> > 
> > There are *no* builtins defined in the ISA!  The insns are just ISA
> > 3.0
> > instructions.
> 
> Ok. 
> 
> So then maybe just "Sign extend builtins" and leave off the ISA
> reference all together.   

Sure.  Or you can say "builtins for the instructions introduced in
Power ISA 3.1" or such.

If we ever get the builtins documentation updated quickly (and kept
updated), it should go on https://gcc.gnu.org/readings.html, and life will
be good.


Segher


* Re: [Patch 2/5] rs6000, 128-bit multiply, divide, modulo, shift, compare
  2020-08-13 23:46   ` will schmidt
@ 2020-08-20  1:06     ` Segher Boessenkool
  0 siblings, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2020-08-20  1:06 UTC (permalink / raw)
  To: will schmidt; +Cc: Carl Love, dje.gcc, gcc-patches, Bill Schmidt, cel

On Thu, Aug 13, 2020 at 06:46:05PM -0500, will schmidt wrote:
> >  .../gcc.target/powerpc/int_128bit-runnable.c  | 2254 +++++++++++++++++
> 
> The path into the testsuite subdir looks strange there.

Git abbreviated this.  It is autogenerated (git diffstat), so there is
nothing much you can do about it.  The abbreviation is quite helpful
often (for moves for example), but not always, yup.

> > --- a/gcc/config/rs6000/altivec.h
> > +++ b/gcc/config/rs6000/altivec.h
> > @@ -183,7 +183,7 @@
> >  #define vec_recipdiv __builtin_vec_recipdiv
> >  #define vec_rlmi __builtin_vec_rlmi
> >  #define vec_vrlnm __builtin_vec_rlnm
> > -#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
> > +#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
> 
> per above.   I don't see this change called out.

It looks like an accidental revert.  Good catch :-)

commit e97929e20b2f52e6cfc046c1302324d1b24d95e3
Author: Carl Love <carll@us.ibm.com>
Date:   Wed Mar 25 18:33:37 2020 -0500

> > +  /* The -mti-vector-ops option requires ISA 3.1 support and -maltivec for
> > +     the 128-bit instructions.  Currently, TARGET_POWER10 is sufficient to
> > +     enable it by default.  */
> > +  if (TARGET_POWER10)
> > +    {
> > +      if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
> > +	warning(0, ("%<-mno-altivec%> disables -mti-vector-ops (128-bit integer vector register operations)."));
> > +      else
> > +	rs6000_isa_flags |= OPTION_MASK_TI_VECTOR_OPS;
> > +    }
> 
> It seems odd here that -maltivec is explicitly called out here.  That
> should be default on for quite a while at this point.

And the actual check is for -mvsx anyway?  Not sure I follow what this
does at all.

> > @@ -2305,6 +2306,7 @@ extern int frame_pointer_needed;
> >  #define RS6000_BTM_P8_VECTOR	MASK_P8_VECTOR	/* ISA 2.07 vector.  */
> >  #define RS6000_BTM_P9_VECTOR	MASK_P9_VECTOR	/* ISA 3.0 vector.  */
> >  #define RS6000_BTM_P9_MISC	MASK_P9_MISC	/* ISA 3.0 misc. non-vector */
> > +#define RS6000_BTM_P10_128BIT   MASK_POWER10    /* ISA P10 vector.  */
> 
> Should comment be 128-bit something?  (not just P10 vector).

Yeah, or it should be called P10_VECTOR (and it is called ISA 3.1).

> > @@ -2436,6 +2438,7 @@ enum rs6000_builtin_type_index
> >    RS6000_BTI_bool_V8HI,          /* __vector __bool short */
> >    RS6000_BTI_bool_V4SI,          /* __vector __bool int */
> >    RS6000_BTI_bool_V2DI,          /* __vector __bool long */
> > +  RS6000_BTI_bool_V1TI,          /* __vector __bool long */
> 
> Fix comment?

I wonder if the V2DI is correct even (should be "long long"?).

> > +mti-vector-ops
> > +Target Report Mask(TI_VECTOR_OPS) Var(rs6000_isa_flags)
> > +Use integer 128-bit instructions for a future architecture.
> 
> 'future' can probably be adjusted.

Yes :-)

> > \ No newline at end of file
> 
> diff error?

No, the file really should have a newline at the end.  Not all editors
enforce that by default :-(

> > +(define_expand "vector_eqv1ti"
> > +  [(set (match_operand:V1TI 0 "vlogical_operand")
> > +	(eq:V1TI (match_operand:V1TI 1 "vlogical_operand")
> > +		 (match_operand:V1TI 2 "vlogical_operand")))]
> > +  "TARGET_TI_VECTOR_OPS"
> > +  "")

All the rest of this is in rs6000.md, won't "eqvv1ti3" work already?

> Since it's on all of the clauses, Maybe adjust the dg-require to
> include ppc_native_128bit for the whole test, unless there is more to
> follow.

Good plan :-)

Thanks for all the comments Will!  Carl, could you fix things and resend
please?  It's a rather big patch, we'll have to do it in stages :-/


Segher


* Re: [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support
  2020-08-11 19:22 ` [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support Carl Love
  2020-08-14 17:13   ` will schmidt
@ 2020-08-20  1:29   ` Segher Boessenkool
  2020-08-26 18:23     ` Carl Love
  1 sibling, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2020-08-20  1:29 UTC (permalink / raw)
  To: Carl Love; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt, cel

Hi!

On Tue, Aug 11, 2020 at 12:22:59PM -0700, Carl Love wrote:
> +(define_insn "floattitd2"
> +  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
> +	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "dcffixqq %0,%1"
> +  [(set_attr "type" "dfp")])

I wonder if this should just be TARGET_POWER10 now?  That goes for the
whole series of course.

> +  ;; carll

I don't think we need this comment on trunk ;-)

Looks fine otherwise.  Okay for trunk, modulo whatever we do with
TARGET_TI_VECTOR_OPS.  Thanks!


Segher


* Re: [Patch 4/5] rs6000, Test 128-bit shifts for just the int128 type.
  2020-08-11 19:23 ` [Patch 4/5] rs6000, Test 128-bit shifts for just the int128 type Carl Love
  2020-08-14 17:35   ` will schmidt
@ 2020-08-20 21:50   ` Segher Boessenkool
  2020-08-26 20:27     ` Carl Love
  1 sibling, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2020-08-20 21:50 UTC (permalink / raw)
  To: Carl Love; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt, cel

Hi!

On Tue, Aug 11, 2020 at 12:23:05PM -0700, Carl Love wrote:
> +;; 128-bit int modes
> +(define_mode_iterator VEC_I128 [V1TI TI])

We already have VSX_TI for this (in vsx.md).  Rename that to something
without VSX, and move it to vector.md or such?  Maybe name it VEC_TI
or anyTI.

Do that renaming as a separate patch before this one?  It is logically
separate, and it is boring stuff, so putting it in a separate patch
makes the non-boring stuff stand out more.

(It would be better if we could just get rid of V1TI, but that isn't
going to happen soon).

> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -367,7 +367,7 @@
>     UNSPEC_INSERTR
>     UNSPEC_REPLACE_ELT
>     UNSPEC_REPLACE_UN
> -	UNSPEC_XXSWAPD_V1TI
> +	UNSPEC_XXSWAPD_VEC_I128

Why not just UNSPEC_XXSWAPD?  And, why an unspec at all?


Segher


* Re: [Patch 5/5] rs6000, Conversions between 128-bit integer and floating point values.
  2020-08-11 19:23 ` [Patch 5/5] rs6000, Conversions between 128-bit integer and floating point values Carl Love
  2020-08-14 18:50   ` will schmidt
@ 2020-08-20 22:36   ` Segher Boessenkool
  2020-09-19  0:25   ` will schmidt
  2 siblings, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2020-08-20 22:36 UTC (permalink / raw)
  To: Carl Love; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt, cel

Hi!

On Tue, Aug 11, 2020 at 12:23:13PM -0700, Carl Love wrote:
[ Perfect stuff, or I don't see anything anyway! ]

Okay for trunk.  Thank you!


Segher


* RE: [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support
  2020-08-20  1:29   ` Segher Boessenkool
@ 2020-08-26 18:23     ` Carl Love
  2020-09-10 17:36       ` Segher Boessenkool
  0 siblings, 1 reply; 27+ messages in thread
From: Carl Love @ 2020-08-26 18:23 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt, cel

Segher:

On Wed, 2020-08-19 at 20:29 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 11, 2020 at 12:22:59PM -0700, Carl Love wrote:
> > +(define_insn "floattitd2"
> > +  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
> > +	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
> > +  "TARGET_TI_VECTOR_OPS"
> > +  "dcffixqq %0,%1"
> > +  [(set_attr "type" "dfp")])
> 
> I wonder if this should just be TARGET_POWER10 now?  That goes for
> the
> whole series of course.
> 
<snip>
> 
> Looks fine otherwise.  Okay for trunk, modulo whatever we do with
> TARGET_TI_VECTOR_OPS.  Thanks!
> 
> 
> Segher

You commented on the TARGET_TI_VECTOR_OPS in patch 2 as well, i.e.

   > > +  /* The -mti-vector-ops option requires ISA 3.1 support and -maltivec for
   > > +     the 128-bit instructions.  Currently, TARGET_POWER10 is sufficient to
   > > +     enable it by default.  */
   > > +  if (TARGET_POWER10)
   > > +    {
   > > +      if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
   > > +   warning(0, ("%<-mno-altivec%> disables -mti-vector-ops (128-bit integer vector register operations)."));
   > > +      else
   > > +   rs6000_isa_flags |= OPTION_MASK_TI_VECTOR_OPS;
   > > +    }
   > 
   > It seems odd here that -maltivec is explicitly called out here.  That
   > should be default on for quite a while at this point.

   And the actual check it for -mvsx anyway?  Not sure I follow what this
   does at all.

This all goes way back; the following is from a discussion back in
November 2019.  As I recall, we added the option thinking it would be
good to be able to enable/disable the 128-bit vector support.  I have to
admit that I am struggling a bit to remember all the details as we
discussed them back then.

The following is from an old email.


   Hi Carl,

   On Mon, Nov 18, 2019 at 12:18:48PM -0800, Carl Love wrote:
   > Per your other note, I change the test suite check, with spelling fix,
   > to:
   >  
   > # Return 1 if we can generate the 128-bit integer operations in ISA 3.1.
   > # That means the following 128-bit instructions can be generated:
   > # vadduqm, vsubuqm, vmsumcud, vdivsq, vdivuq, vmodsq, vmoduq, vcmpuq, vcmpsq,
   > # vrlq, vslq, vsrq.  The -mti-vector-ops flag enables the needed support.
   > # The -mti-vector-ops is enabled by default for -mcpu=future.  Sufficient
   > # to test if the vadduqm instruction is generated for ISA 3.1 support.
   > proc check_effective_target_ppc_native_128bit { } {                             
   >     return [check_no_messages_and_pattern_nocache \
   >                 ppc_native_128bit {\mvadduqm\M} assembly {
   >                     __int128_t test_add (__int128_t a, __int128_t b)
   >                     { return a + b; }           
   >                 } {-mti-vector-ops} ]
   > }

   Ah, very good, this will work fine :-)

   But without that -mti-vector-ops option?

   > > (We also need to test what happens with -mno-altivec, etc. --
   > > shouldn't
   > > be hard, should just do the same thing as it does on older CPUs).
   > 
   > So I tried compiling the test case with -mno-altivec and -mcpu=future
   > and I get a GCC crash.  :-(

   Think of it this way: it is good if developers get ICEs.  That way, users
   get fewer.  :-)

   > I updated the setting for rs6000_isa_flags in rs6000.c to:
   > 
   >   /* The -mti-vector-ops option requires ISA 3.1 support and -maltivec for
   >      the 128-bit instructions.  Currently, TARGET_FUTURE is sufficient to
   >      enable it by default.  */                                 
   >   if (TARGET_FUTURE)                                                            
   >     {                                                                           
   >       if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
   >         warning(0, ("%<-mno-altivec%> disables -mti-vector-ops (128-bit integer vector register operations)."));                            
   >       else                                                                      
   >         rs6000_isa_flags |= OPTION_MASK_TI_VECTOR_OPS;                          
   >     }
   >  
   > Now, I get:
   >  
   > $GCC_INSTALL/bin/gcc -g -mcpu=future -mno-altivec  int_128bit-runnable.c -o int_128bit-runnable
   > cc1: warning: ‘-mno-altivec’ disables vsx
   > cc1: warning: ‘-mno-altivec’ disables -mti-vector-ops (128-bit integer vector register operations).
   > 
   > without -mno-altivec it compiles with no warning or errors.  The object
   > dump has the expected vadduqm and vsubuqm instructions.

   That will do the trick.  Maybe you want the message to be quiet, (or
   quieter anyway), if you get testsuite fallout?  You will find out.

   > > > +mti-vector-ops
   > > > +Target Report Mask(TI_VECTOR_OPS) Var(rs6000_isa_flags)
   > > > +Use integer 128-bit instructions for a future architecture.
   > > 
   > > For upstream we should make this an internal option (so
   > > Undocumented), but
   > > it may be handy to keep the option more visible for now, sure.
   > 
   > I was kinda thinking it would probably be made internal down the road.
   > 
   > So with these changes, I think we have a good base patch to build on. 

   Yes!  Thank you, this looks like it will be much less problematic than I
   feared for :-)


   Segher

So, do we want to drop the option OPTION_MASK_TI_VECTOR_OPS at this
point and go with just TARGET_POWER10?  
 

                        Carl 



* RE: [Patch 4/5] rs6000, Test 128-bit shifts for just the int128 type.
  2020-08-20 21:50   ` Segher Boessenkool
@ 2020-08-26 20:27     ` Carl Love
  2020-09-10 17:52       ` Segher Boessenkool
  0 siblings, 1 reply; 27+ messages in thread
From: Carl Love @ 2020-08-26 20:27 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt, cel

Segher:

On Thu, 2020-08-20 at 16:50 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 11, 2020 at 12:23:05PM -0700, Carl Love wrote:
> > +;; 128-bit int modes
> > +(define_mode_iterator VEC_I128 [V1TI TI])
> 
> We already have VSX_TI for this (in vsx.md).  Rename that to
> something
> without VSX, and move it to vector.md or such?  Maybe name it VEC_TI
> or anyTI.
> 
> Do that renaming as a separate patch before this one?  It is
> logically
> separate, and it is boring stuff, so putting it in a separate patch
> makes the non-boring stuff stand out more.
> 
> (It would be better if we could just get rid of V1TI, but that isn't
> going to happen soon).
> 
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -367,7 +367,7 @@
> >     UNSPEC_INSERTR
> >     UNSPEC_REPLACE_ELT
> >     UNSPEC_REPLACE_UN
> > -	UNSPEC_XXSWAPD_V1TI
> > +	UNSPEC_XXSWAPD_VEC_I128
> 
> Why not just UNSPEC_XXSWAPD?  And, why an unspec at all?

I am trying to figure out how to specify this without using an unspec,
per your last comment.  I changed the definition to:

;; Swap upper/lower 64-bit values in V1TI or TI type                                   
(define_insn "xxswapd_<mode>"                                                          
  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")                         
        (vec_select:VEC_I128                                                           
          (match_operand:VEC_I128 1 "vsx_register_operand" "v")                        
          (parallel [(const_int 0)])))]                                                
  "TARGET_POWER10"                                                                     
;; AIX does not support extended mnemonic xxswapd.  Use the basic                      
;; mnemonic xxpermdi instead.                                                          
  "xxpermdi %x0,%x1,%x1,2"                                                             
  [(set_attr "type" "vecperm")])


All of the swap definitions that I can see are based on using
vec_select, which seems to be the issue here.  I don't see any way to do
this without using an unspec.  Any thoughts?

                      Carl 



* Re: [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support
  2020-08-26 18:23     ` Carl Love
@ 2020-09-10 17:36       ` Segher Boessenkool
  0 siblings, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2020-09-10 17:36 UTC (permalink / raw)
  To: Carl Love; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt, cel

(Long ago...)

On Wed, Aug 26, 2020 at 11:23:45AM -0700, Carl Love wrote:
(Lots of context, thanks!)

> So, do we want to drop the option OPTION_MASK_TI_VECTOR_OPS at this
> point and go with just TARGET_POWER10?  

It has no value anymore now, as far as I can see?  So deleting it would
be good, yes.


Segher


* Re: [Patch 4/5] rs6000, Test 128-bit shifts for just the int128 type.
  2020-08-26 20:27     ` Carl Love
@ 2020-09-10 17:52       ` Segher Boessenkool
  0 siblings, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2020-09-10 17:52 UTC (permalink / raw)
  To: Carl Love; +Cc: dje.gcc, gcc-patches, Will Schmidt, Bill Schmidt

Hi!

On Wed, Aug 26, 2020 at 01:27:44PM -0700, Carl Love wrote:
> > > @@ -367,7 +367,7 @@
> > >     UNSPEC_INSERTR
> > >     UNSPEC_REPLACE_ELT
> > >     UNSPEC_REPLACE_UN
> > > -	UNSPEC_XXSWAPD_V1TI
> > > +	UNSPEC_XXSWAPD_VEC_I128
> > 
> > Why not just UNSPEC_XXSWAPD?  And, why an unspec at all?
> 
> I am trying to figure out how to specify this without using an unspec
> per your last comment.  I changed the definition to:
> 
> ;; Swap upper/lower 64-bit values in V1TI or TI type
> (define_insn "xxswapd_<mode>"
>   [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
>         (vec_select:VEC_I128
>           (match_operand:VEC_I128 1 "vsx_register_operand" "v")
>           (parallel [(const_int 0)])))]
>   "TARGET_POWER10"
> ;; AIX does not support extended mnemonic xxswapd.  Use the basic
> ;; mnemonic xxpermdi instead.
>   "xxpermdi %x0,%x1,%x1,2"
>   [(set_attr "type" "vecperm")])

(define_insn "xxswapd_<mode>"
  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")
        (subreg:VEC_I128
          (vec_select:V2DI
            (match_operand:V2DI 1 "vsx_register_operand" "v")
            (parallel [(const_int 1) (const_int 0)]))
          0))]

or similar (i.e., just cast it to the type you want -- in hardware, all
vectors are just an opaque 128 bits, but in RTL they have a type).

(You probably want to cast operands[1] as well).


Segher


* Re: [Patch 5/5] rs6000,  Conversions between 128-bit integer and floating point values.
  2020-08-11 19:23 ` [Patch 5/5] rs6000, Conversions between 128-bit integer and floating point values Carl Love
  2020-08-14 18:50   ` will schmidt
  2020-08-20 22:36   ` Segher Boessenkool
@ 2020-09-19  0:25   ` will schmidt
  2 siblings, 0 replies; 27+ messages in thread
From: will schmidt @ 2020-09-19  0:25 UTC (permalink / raw)
  To: Carl Love, segher, dje.gcc, gcc-patches; +Cc: Bill Schmidt, cel

On Tue, 2020-08-11 at 12:23 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 5 adds the 128-bit integer to/from 128-bit floating point
> conversions.  Via the ifunc resolve functions, it invokes the routines
> that use the 128-bit hardware instructions when running on Power 10,
> and the software routines when running on a pre-Power 10 system.
> 
>                           Carl 


Some mostly cosmetic bits below.
Thanks
-Will


> 
> -----------------------------------------------------------
> Conversions between 128-bit integer and floating point values.
> 
> gcc/ChangeLog
> 
> 2020-08-10  Carl Love  <cel@us.ibm.com>
> 	config/rs6000/rs6000.md (floatunsti<mode>2,
> 	fix_trunc<mode>ti2, fixuns_trunc<mode>ti2): Add
> 	define_insn for mode IEEE 128.

also floatti<mode>2



> 	libgcc/config/rs6000/fixkfi-sw.c: New file.
> 	libgcc/config/rs6000/fixkfi.c: Remove file.

Should that be fixkfti-sw.c  (missing t)?

Adjust to indicate this is a rename
	libgcc/config/rs6000/fixkfti.c: Rename to
	libgcc/config/rs6000/fixkfti-sw.c


> 	libgcc/config/rs6000/fixunskfi-sw.c: New file.
> 	libgcc/config/rs6000/fixunskfi.c: Remove file.
> 	libgcc/config/rs6000/float128-hw.c (__floattikf_hw,
> 	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw):
> 	New functions.

> 	libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1):
> 	New macro.
> 	(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
> 	__fixunskfti_resolve): Add resolve functions.
> 	(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
> 	functions.
> 	libgcc/config/rs6000/float128-sed (__floattitf, __floatuntitf,
> 	__fixtfti, __fixunstfti): Add editor commands to change
> 	names.
> 	libgcc/config/rs6000/float128-sed-hw (__floattitf,
> 	__floatuntitf, __fixtfti, __fixunstfti): Add editor commands
> 	to change names.
> 	libgcc/config/rs6000/floattikf-sw.c: New file.
> 	libgcc/config/rs6000/floattikf.c: Remove file.
> 	libgcc/config/rs6000/floatuntikf-sw.c: New file.
> 	libgcc/config/rs6000/floatuntikf.c: Remove file.
> 	libgcc/config/rs6000/floatuntikf-sw.c: New file.
> 	libgcc/config/rs6000/quad-float128.h (__floattikf_sw,
> 	__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
> 	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
> 	__floatuntikf, __fixkfti, __fixunskfti):	New extern declarations.

no tab.

> 	libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
> 	fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
> 	(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
> 	file names to fp128_ppc_funcs.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-08-10  Carl Love  <cel@us.ibm.com>
> 	gcc.target/powerpc/fp128_conversions.c: New file.
> ---
>  gcc/config/rs6000/rs6000.md                   |  36 +++
>  .../gcc.target/powerpc/fp128_conversions.c    | 287 ++++++++++++++++++
>  .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
>  .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   4 +-
>  libgcc/config/rs6000/float128-hw.c            |  24 ++
>  libgcc/config/rs6000/float128-ifunc.c         |  44 ++-
>  libgcc/config/rs6000/float128-sed             |   4 +
>  libgcc/config/rs6000/float128-sed-hw          |   4 +
>  .../rs6000/{floattikf.c => floattikf-sw.c}    |   4 +-
>  .../{floatuntikf.c => floatuntikf-sw.c}       |   4 +-
>  libgcc/config/rs6000/quad-float128.h          |  17 +-
>  libgcc/config/rs6000/t-float128               |   3 +-
>  12 files changed, 415 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
>  rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
>  rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (96%)
>  rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
>  rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%)
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 43b620ae1c0..3853ebd4195 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6390,6 +6390,42 @@
>     xscvsxddp %x0,%x1"
>    [(set_attr "type" "fp")])
> 
> +(define_insn "floatti<mode>2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +       (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvsqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "floatunsti<mode>2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +       (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvuqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "fix_trunc<mode>ti2"
> +  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
> +       (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvqpsqz %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "fixuns_trunc<mode>ti2"
> +  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
> +       (unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvqpuqz %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
>  ; Allow the combiner to merge source memory operands to the conversion so that
>  ; the optimizer/register allocator doesn't try to load the value too early in a
>  ; GPR and then use store/load to move it to a FPR and suffer from a store-load
> diff --git a/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
> new file mode 100644
> index 00000000000..f0336e6f1fc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
> @@ -0,0 +1,287 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10" } */
> +
> +/* Check that the expected 128-bit instructions are generated if the processor
> +   supports the 128-bit integer instructions. */
> +/* { dg-final { scan-assembler-times {\mxscvsqqp\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvuqqp\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvqpsqz\M} 1 { target { ppc_native_128bit } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvqpuqz\M} 1 { target { ppc_native_128bit } } } } */
> +
> +#include <stdio.h>
> +#include <math.h>
> +#include <fenv.h>
> +#include <stdlib.h>
> +#include <wchar.h>
> +
> +#define DEBUG 1
> +

Probably turn off the DEBUG.



> +void abort (void);
> +
> +float conv_i_2_fp( long long int a)
> +{
> +  return (float) a;
> +}
> +
> +double conv_i_2_fpd( long long int a)
> +{
> +  return (double) a;
> +}
> +
> +double conv_ui_2_fpd( unsigned long long int a)
> +{
> +  return (double) a;
> +}
> +
> +__float128 conv_i128_2_fp128 (__int128_t a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__float128) a;
> +}
> +
> +__float128 conv_ui128_2_fp128 (__uint128_t a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__float128) a;
> +}
> +
> +__int128_t conv_fp128_2_i128 (__float128 a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__int128_t) a;
> +}
> +
> +__uint128_t conv_fp128_2_ui128 (__float128 a)
> +{
> +  // default, gen inst KF mode
> +  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (__uint128_t) a;
> +}
> +
> +long double conv_i128_2_ld (__int128_t a)
> +{
> +  // default, gen call __floattitf
> +  // -mabi=ibmlongdouble, gen call __floattitf
> +  // -mabi=ieeelongdouble gen inst floattiieee TF mode
> +  return (long double) a;
> +}
> +
> +__ibm128 conv_i128_2_ibm128 (__int128_t a)
> +{
> +  // default, gen call __floattitf
> +  // -mabi=ibmlongdouble, gen call __floattitf
> +  // -mabi=ieeelongdouble, messages about uses IBM long double, no binary output

Could use a few more words.  What messages?


> +  return (__ibm128) a;
> +}
> +
> +int main()
> +{
> +	float a, expected_result_float;
> +	double b, expected_result_double;
> +	long long int c, expected_result_llint;
> +	unsigned long long int u;
> +	__int128_t d;
> +	__uint128_t u128;
> +	unsigned long long expected_result_uint128[2] ;
> +	__float128 e;
> +	long double ld;     // another 128-bit float version
> +
> +	union conv_t {
> +		float a;
> +		double b;
> +		long long int c;
> +		long long int128[2] ;
> +		unsigned long long uint128[2] ;
> +		unsigned long long int u;
> +		__int128_t d;
> +		__uint128_t u128;
> +		__float128 e;
> +		long double ld;     // another 128-bit float version
> +	} conv, conv_result;
> +
> + 
> +	c = 20;

Extra blank line, with a trailing space, above the "c = 20".

> +	expected_result_llint = 20.00000;
> +	a = conv_i_2_fp (c);
> +
> +	if (a != expected_result_llint) {
> +#if DEBUG
> +		printf("ERROR: conv_i_2_fp(%lld) = %10.5f\n", c, a);
> +		printf("\n does not match expected_result = %10.5f\n\n",
> +				 expected_result_llint);
> + #else

The indent for #else should match the #if and #endif.
Same elsewhere.

> +		abort();
> +#endif
> +	}
> +
> +	c = 20;
> +	expected_result_double = 20.00000;
> +	b = conv_i_2_fpd (c);
> +
> +	if (b != expected_result_double) {
> +#if DEBUG
> +		printf("ERROR: conv_i_2_fpd(%lld) = %10.5f\n", d, b);
> +		printf("\n does not match expected_result = %10.5f\n\n",
> +				 expected_result_double);
> + #else
> +		abort();
> +#endif
> +	}
> +
> +	u = 20;
> +	expected_result_double = 20.00000;
> +	b = conv_ui_2_fpd (u);
> +
> +	if (b != expected_result_double) {
> +#if DEBUG
> +		printf("ERROR: conv_ui_2_fpd(%llu) = %10.5f\n", u, b);
> +		printf("\n does not match expected_result = %10.5f\n\n",
> +				 expected_result_double);
> + #else
> +		abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */
> +  d = -3210;
> +  d = (d * 10000000000) + 9876543210;
> +  conv_result.e = conv_i128_2_fp128 (d);
> +  expected_result_uint128[1] = 0xc02bd2f9068d1160;
> +  expected_result_uint128[0] = 0x0;
> +  
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_i128_2_fp128(-32109876543210) = (result in hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  d = 123;
> +  d = (d * 10000000000) + 1234567890;
> +  conv_result.ld = conv_i128_2_fp128 (d);
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0x4271eab4c8ed2000;
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_i128_2_fp128(1231234567890) = (result in hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */
> +  u128 = 8760;
> +  u128 = (u128 * 10000000000) + 1234567890;
> +  conv_result.e = conv_ui128_2_fp128 (u128);
> +  expected_result_uint128[1] = 0x402d3eb101df8b48;
> +  expected_result_uint128[0] = 0x0;
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_ui128_2_fp128(87601234567890) = (result in hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  /* Currently printing 128-bit float does not work correctly  */
> +  u128 = 3210;
> +  u128 = (u128 * 10000000000) + 9876543210;
> +  expected_result_uint128[1] = 0x402bd3429c8feea0;
> +  expected_result_uint128[0] = 0x0;
> +  conv_result.e = conv_ui128_2_fp128 (u128);
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_ui128_2_fp128(32109876543210) = (result in hex) 0x%llx %llx\n",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = 12345.6789;
> +  expected_result_uint128[1] = 0x1407374883526960;
> +  expected_result_uint128[0] = 0x3039;
> +
> +  conv_result.d = conv_fp128_2_i128 (conv.e);
> +
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) =  ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
> +
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = -6789.12345;
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0xffffffffffffe57b;
> +  conv_result.d = conv_fp128_2_i128 (conv.e);
> + 
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) = ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
> +
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  conv.e = 6789.12345;
> +  expected_result_uint128[1] = 0x0;
> +  expected_result_uint128[0] = 0x1a85;
> +  conv_result.d = conv_fp128_2_ui128 (conv.e);
> + 
> +  if ((conv_result.uint128[1] != expected_result_uint128[1])
> +		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
> +#if DEBUG
> +	  printf("ERROR: conv_fp128_2_ui128(0x%llx %llx) = ",
> +				conv.uint128[1], conv.uint128[0]);
> +	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
> +	  
> +	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
> +				expected_result_uint128[1], expected_result_uint128[0]);
> + #else
> +	  abort();
> +#endif
> +	}
> +
> +  return 0;
> +}
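Note that the result checks in this test combine the two 64-bit half comparisons with "&&", so an error is only reported when both halves differ.  Below is a minimal sketch of a helper that flags a difference in either half and reads the halves by value rather than by array index.  The names mismatch_u128 and f128_bits are hypothetical, and the sketch assumes a GCC-compatible compiler with unsigned __int128 and __float128 support (e.g. 64-bit Linux).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if RESULT differs from the expected high
   and low 64-bit halves.  Either half differing is a failure, so the two
   comparisons are combined with || rather than &&.  */
static int
mismatch_u128 (unsigned __int128 result, uint64_t expected_hi,
               uint64_t expected_lo)
{
  uint64_t hi = (uint64_t) (result >> 64);
  uint64_t lo = (uint64_t) result;
  return hi != expected_hi || lo != expected_lo;
}

/* Hypothetical helper: the raw bit pattern of a __float128, read as a
   128-bit integer (assumes float and integer byte orders match).  */
static unsigned __int128
f128_bits (__float128 x)
{
  unsigned __int128 bits;
  memcpy (&bits, &x, sizeof bits);
  return bits;
}
```

For example, mismatch_u128 (f128_bits (conv_result.e), 0xc02bd2f9068d1160ULL, 0x0) would check the first fp128 result without depending on which uint128[] index holds the high half on a given endianness.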
> diff --git a/libgcc/config/rs6000/fixkfti.c b/libgcc/config/rs6000/fixkfti-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/fixkfti.c
> rename to libgcc/config/rs6000/fixkfti-sw.c
> index a22286228aa..d6bbbf889b7 100644
> --- a/libgcc/config/rs6000/fixkfti.c
> +++ b/libgcc/config/rs6000/fixkfti-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TItype
> -__fixkfti (TFtype a)
> +__fixkfti_sw (TFtype a)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/fixunskfti.c b/libgcc/config/rs6000/fixunskfti-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/fixunskfti.c
> rename to libgcc/config/rs6000/fixunskfti-sw.c
> index ab232d92d24..d803936e48a 100644
> --- a/libgcc/config/rs6000/fixunskfti.c
> +++ b/libgcc/config/rs6000/fixunskfti-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).


Probably OK.  I'd recommend adding a line to the summary paragraph
clarifying that you are renaming some of the fix* source files and
making whitespace touch-ups to them.


> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  UTItype
> -__fixunskfti (TFtype a)
> +__fixunskfti_sw (TFtype a)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/float128-hw.c b/libgcc/config/rs6000/float128-hw.c
> index 8705b53e22a..be8bd07e853 100644
> --- a/libgcc/config/rs6000/float128-hw.c
> +++ b/libgcc/config/rs6000/float128-hw.c
> @@ -86,6 +86,30 @@ __floatdikf_hw (DItype_ppc a)
>    return (TFtype) a;
>  }
> 
> +TFtype
> +__floattikf_hw (TItype_ppc a)
> +{
> +  return (TFtype) a;
> +}
> +
> +TFtype
> +__floatuntikf_hw (UTItype_ppc a)
> +{
> +  return (TFtype) a;
> +}
> +
> +TItype_ppc
> +__fixkfti_hw (TFtype a)
> +{
> +  return (TItype_ppc) a;
> +}
> +
> +UTItype_ppc
> +__fixunskfti_hw (TFtype a)
> +{
> +  return (UTItype_ppc) a;
> +}
> +
>  TFtype
>  __floatundikf_hw (UDItype_ppc a)
>  {
> diff --git a/libgcc/config/rs6000/float128-ifunc.c b/libgcc/config/rs6000/float128-ifunc.c
> index c2f65912a74..c221be2c864 100644
> --- a/libgcc/config/rs6000/float128-ifunc.c
> +++ b/libgcc/config/rs6000/float128-ifunc.c
> @@ -46,14 +46,9 @@
>  #endif
> 
>  #define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW : SW)
> +#define SW_OR_HW_ISA3_1(SW, HW) (__builtin_cpu_supports ("arch_3_1") ? HW : SW)
> 
>  /* Resolvers.  */
> -
> -/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
> -   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
> -   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
> -   use the emulator functions for these conversions.  */


Could add a line in the patch description, e.g. "Add ifunc resolvers for
__fixkfti,...".
Same as appropriate below.
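The new resolvers follow the existing pattern.  SW_OR_HW_ISA3_1 returns the _hw variant when __builtin_cpu_supports ("arch_3_1") is true and the _sw emulation otherwise, and the __ifunc__ attributes on __floattikf, __fixkfti and friends have the dynamic loader call the resolver and bind each symbol to the implementation it returns.  Below is a portable sketch of the same selection logic.  The feature test is passed in as a parameter instead of querying the CPU, and convert_sw, convert_hw, convert_resolve and SW_OR_HW_IF are hypothetical names, not libgcc code.

```c
#include <stdbool.h>

/* Hypothetical "software" path: split A into exact pieces
   (a == hi * 2^32 + lo), so only the final addition rounds.  */
static double
convert_sw (long long a)
{
  long long hi = a / 4294967296LL;
  long long lo = a % 4294967296LL;
  return (double) hi * 4294967296.0 + (double) lo;
}

/* Hypothetical "hardware" path: a single native conversion.  */
static double
convert_hw (long long a)
{
  return (double) a;
}

typedef double (*convert_fn) (long long);

/* Same shape as SW_OR_HW_ISA3_1, but the feature test is a parameter
   rather than __builtin_cpu_supports ("arch_3_1").  */
#define SW_OR_HW_IF(HAVE_HW, SW, HW) ((HAVE_HW) ? (HW) : (SW))

/* Resolver: returns the implementation callers should be bound to.
   With the ifunc attribute the loader makes this call; here it is
   an ordinary function.  */
static convert_fn
convert_resolve (bool have_isa_3_1)
{
  return SW_OR_HW_IF (have_isa_3_1, convert_sw, convert_hw);
}
```

Both paths must produce identical results, which is why the _sw and _hw functions can be swapped transparently behind one public symbol.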

> -
>  static __typeof__ (__addkf3_sw) *
>  __addkf3_resolve (void)
>  {
> @@ -102,6 +97,18 @@ __floatdikf_resolve (void)
>    return SW_OR_HW (__floatdikf_sw, __floatdikf_hw);
>  }
> 
> +static __typeof__ (__floattikf_sw) *
> +__floattikf_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__floattikf_sw, __floattikf_hw);
> +}
> +
> +static __typeof__ (__floatuntikf_sw) *
> +__floatuntikf_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__floatuntikf_sw, __floatuntikf_hw);
> +}
> +
>  static __typeof__ (__floatunsikf_sw) *
>  __floatunsikf_resolve (void)
>  {
> @@ -114,6 +121,19 @@ __floatundikf_resolve (void)
>    return SW_OR_HW (__floatundikf_sw, __floatundikf_hw);
>  }
> 
> +

extra blank line.

> +static __typeof__ (__fixkfti_sw) *
> +__fixkfti_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__fixkfti_sw, __fixkfti_hw);
> +}
> +
> +static __typeof__ (__fixunskfti_sw) *
> +__fixunskfti_resolve (void)
> +{
> +  return SW_OR_HW_ISA3_1 (__fixunskfti_sw, __fixunskfti_hw);
> +}
> +
>  static __typeof__ (__fixkfsi_sw) *
>  __fixkfsi_resolve (void)
>  {
> @@ -303,6 +323,18 @@ TFtype __floatsikf (SItype_ppc)
>  TFtype __floatdikf (DItype_ppc)
>    __attribute__ ((__ifunc__ ("__floatdikf_resolve")));
> 
> +TFtype __floattikf (TItype_ppc)
> +  __attribute__ ((__ifunc__ ("__floattikf_resolve")));
> +
> +TFtype __floatuntikf (UTItype_ppc)
> +  __attribute__ ((__ifunc__ ("__floatuntikf_resolve")));
> +
> +TItype_ppc __fixkfti (TFtype)
> +  __attribute__ ((__ifunc__ ("__fixkfti_resolve")));
> +
> +UTItype_ppc __fixunskfti (TFtype)
> +  __attribute__ ((__ifunc__ ("__fixunskfti_resolve")));
> +
>  TFtype __floatunsikf (USItype_ppc)
>    __attribute__ ((__ifunc__ ("__floatunsikf_resolve")));
> 
> diff --git a/libgcc/config/rs6000/float128-sed b/libgcc/config/rs6000/float128-sed
> index d9a089ff9ba..c0fcddb1959 100644
> --- a/libgcc/config/rs6000/float128-sed
> +++ b/libgcc/config/rs6000/float128-sed
> @@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi/g
>  s/__fixunstfdi/__fixunskfdi/g
>  s/__fixunstfsi/__fixunskfsi/g
>  s/__floatditf/__floatdikf/g
> +s/__floattitf/__floattikf/g
> +s/__floatuntitf/__floatuntikf/g
> +s/__fixtfti/__fixkfti/g
> +s/__fixunstfti/__fixunskfti/g
>  s/__floatsitf/__floatsikf/g
>  s/__floatunditf/__floatundikf/g
>  s/__floatunsitf/__floatunsikf/g
> diff --git a/libgcc/config/rs6000/float128-sed-hw b/libgcc/config/rs6000/float128-sed-hw
> index acf36b0c17d..3d2bf556da1 100644
> --- a/libgcc/config/rs6000/float128-sed-hw
> +++ b/libgcc/config/rs6000/float128-sed-hw
> @@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi_sw/g
>  s/__fixunstfdi/__fixunskfdi_sw/g
>  s/__fixunstfsi/__fixunskfsi_sw/g
>  s/__floatditf/__floatdikf_sw/g
> +s/__floattitf/__floattikf_sw/g
> +s/__floatuntitf/__floatuntikf_sw/g
> +s/__fixtfti/__fixkfti_sw/g
> +s/__fixunstfti/__fixunskfti_sw/g
>  s/__floatsitf/__floatsikf_sw/g
>  s/__floatunditf/__floatundikf_sw/g
>  s/__floatunsitf/__floatunsikf_sw/g

ok

> diff --git a/libgcc/config/rs6000/floattikf.c b/libgcc/config/rs6000/floattikf-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/floattikf.c
> rename to libgcc/config/rs6000/floattikf-sw.c
> index 4e8c40cfbe4..110706352bb 100644
> --- a/libgcc/config/rs6000/floattikf.c
> +++ b/libgcc/config/rs6000/floattikf-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TFtype
> -__floattikf (TItype i)
> +__floattikf_sw (TItype i)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/floatuntikf.c b/libgcc/config/rs6000/floatuntikf-sw.c
> similarity index 96%
> rename from libgcc/config/rs6000/floatuntikf.c
> rename to libgcc/config/rs6000/floatuntikf-sw.c
> index 8bfba4267d4..5e712a67e26 100644
> --- a/libgcc/config/rs6000/floatuntikf.c
> +++ b/libgcc/config/rs6000/floatuntikf-sw.c
> @@ -5,7 +5,7 @@
>     This file is part of the GNU C Library.
>     Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
>     Code is based on the main soft-fp library written by:
> -   	   Uros Bizjak (ubizjak@gmail.com).
> +	   Uros Bizjak (ubizjak@gmail.com).
> 
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -35,7 +35,7 @@
>  #include "quad-float128.h"
> 
>  TFtype
> -__floatuntikf (UTItype i)
> +__floatuntikf_sw (UTItype i)
>  {
>    FP_DECL_EX;
>    FP_DECL_Q (A);
> diff --git a/libgcc/config/rs6000/quad-float128.h b/libgcc/config/rs6000/quad-float128.h
> index 32ef328a8ea..24712b9277f 100644
> --- a/libgcc/config/rs6000/quad-float128.h
> +++ b/libgcc/config/rs6000/quad-float128.h
> @@ -87,19 +87,18 @@ extern USItype_ppc __fixunskfsi_sw (TFtype);
>  extern UDItype_ppc __fixunskfdi_sw (TFtype);
>  extern TFtype __floatsikf_sw (SItype_ppc);
>  extern TFtype __floatdikf_sw (DItype_ppc);
> +extern TFtype __floattikf_sw (TItype_ppc);
>  extern TFtype __floatunsikf_sw (USItype_ppc);
>  extern TFtype __floatundikf_sw (UDItype_ppc);
> +extern TFtype __floatuntikf_sw (UTItype_ppc);
> +extern TItype_ppc __fixkfti_sw (TFtype);
> +extern UTItype_ppc __fixunskfti_sw (TFtype);
>  extern IBM128_TYPE __extendkftf2_sw (TFtype);
>  extern TFtype __trunctfkf2_sw (IBM128_TYPE);
>  extern TCtype __mulkc3_sw (TFtype, TFtype, TFtype, TFtype);
>  extern TCtype __divkc3_sw (TFtype, TFtype, TFtype, TFtype);
> 
>  #ifdef _ARCH_PPC64
> -/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
> -   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
> -   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
> -   use the emulator functions for these conversions.  */
> -
>  extern TItype_ppc __fixkfti (TFtype);
>  extern UTItype_ppc __fixunskfti (TFtype);
>  extern TFtype __floattikf (TItype_ppc);
> @@ -130,8 +129,12 @@ extern USItype_ppc __fixunskfsi_hw (TFtype);
>  extern UDItype_ppc __fixunskfdi_hw (TFtype);
>  extern TFtype __floatsikf_hw (SItype_ppc);
>  extern TFtype __floatdikf_hw (DItype_ppc);
> +extern TFtype __floattikf_hw (TItype_ppc);
>  extern TFtype __floatunsikf_hw (USItype_ppc);
>  extern TFtype __floatundikf_hw (UDItype_ppc);
> +extern TFtype __floatuntikf_hw (UTItype_ppc);
> +extern TItype_ppc __fixkfti_hw (TFtype);
> +extern UTItype_ppc __fixunskfti_hw (TFtype);
>  extern IBM128_TYPE __extendkftf2_hw (TFtype);
>  extern TFtype __trunctfkf2_hw (IBM128_TYPE);
>  extern TCtype __mulkc3_hw (TFtype, TFtype, TFtype, TFtype);
> @@ -162,8 +165,12 @@ extern USItype_ppc __fixunskfsi (TFtype);
>  extern UDItype_ppc __fixunskfdi (TFtype);
>  extern TFtype __floatsikf (SItype_ppc);
>  extern TFtype __floatdikf (DItype_ppc);
> +extern TFtype __floattikf (TItype_ppc);
>  extern TFtype __floatunsikf (USItype_ppc);
>  extern TFtype __floatundikf (UDItype_ppc);
> +extern TFtype __floatuntikf (UTItype_ppc);
> +extern TItype_ppc __fixkfti (TFtype);
> +extern UTItype_ppc __fixunskfti (TFtype);
>  extern IBM128_TYPE __extendkftf2 (TFtype);
>  extern TFtype __trunctfkf2 (IBM128_TYPE);
> 
> diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
> index d5413445189..325b22fd49e 100644
> --- a/libgcc/config/rs6000/t-float128
> +++ b/libgcc/config/rs6000/t-float128
> @@ -23,7 +23,8 @@ fp128_softfp_shared_obj	= $(addsuffix -sw_s$(objext),$(fp128_softfp_funcs))
>  fp128_softfp_obj	= $(fp128_softfp_static_obj) $(fp128_softfp_shared_obj)
> 
>  # New functions for software emulation
> -fp128_ppc_funcs		= floattikf floatuntikf fixkfti fixunskfti \
> +fp128_ppc_funcs		= floattikf-sw floatuntikf-sw \
> +			  fixkfti-sw fixunskfti-sw \
>  			  extendkftf2-sw trunctfkf2-sw \
>  			  sfp-exceptions _mulkc3 _divkc3 _powikf2

ok
> 



* [Patch 0/5] rs6000, 128-bit Binary Integer Operations
@ 2020-09-21 21:17 Carl Love
  0 siblings, 0 replies; 27+ messages in thread
From: Carl Love @ 2020-09-21 21:17 UTC (permalink / raw)
  To: segher, dje.gcc, gcc-patches, Will Schmidt

Will, Segher:

The following is the updated patch set for the 128-bit Binary Integer
Operations.  I am reposting the entire set for completeness.  I have
noted in each patch the changes made since the previous version.

The patches have been tested on Power 8 and Power 9 to ensure there are
no regression errors.  The new tests have been manually compiled and
run on mambo to ensure they work correctly.

Please review the patches and let me know if they are acceptable for
mainline.  Thanks.

                       Carl Love



Thread overview: 27+ messages
2020-08-11 19:01 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
2020-08-11 19:22 ` [Patch 1/5] rs6000, Add 128-bit sign extension support Carl Love
2020-08-13 17:36   ` Segher Boessenkool
2020-08-13 18:09     ` Carl Love
2020-08-13 18:29       ` Segher Boessenkool
2020-08-13 22:11         ` [EXTERNAL] " will schmidt
2020-08-13 22:55           ` Segher Boessenkool
2020-08-13 23:53             ` [EXTERNAL] " will schmidt
2020-08-18 21:50               ` Segher Boessenkool
2020-08-11 19:22 ` [Patch 2/5] rs6000, 128-bit multiply, divide, modulo, shift, compare Carl Love
2020-08-13 23:46   ` will schmidt
2020-08-20  1:06     ` Segher Boessenkool
2020-08-11 19:22 ` [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support Carl Love
2020-08-14 17:13   ` will schmidt
2020-08-20  1:29   ` Segher Boessenkool
2020-08-26 18:23     ` Carl Love
2020-09-10 17:36       ` Segher Boessenkool
2020-08-11 19:23 ` [Patch 4/5] rs6000, Test 128-bit shifts for just the int128 type Carl Love
2020-08-14 17:35   ` will schmidt
2020-08-20 21:50   ` Segher Boessenkool
2020-08-26 20:27     ` Carl Love
2020-09-10 17:52       ` Segher Boessenkool
2020-08-11 19:23 ` [Patch 5/5] rs6000, Conversions between 128-bit integer and floating point values Carl Love
2020-08-14 18:50   ` will schmidt
2020-08-20 22:36   ` Segher Boessenkool
2020-09-19  0:25   ` will schmidt
2020-09-21 21:17 [Patch 0/5] rs6000, 128-bit Binary Integer Operations Carl Love
