[PATCH 0/5 ver4] RS6000: Add 128-bit Integer Operations

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH 0/5 ver4] RS6000: Add 128-bit Integer Operations
       [not found] <d660f874049b7fdb338ff33478c56b2259828ab1.camel@us.ibm.com>
@ 2021-04-26 16:35 ` Carl Love
  2021-04-26 16:35 ` [PATCH 1/5 " Carl Love
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Carl Love @ 2021-04-26 16:35 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Segher, Will:

Bill asked that I refresh thes 128-bit integer patch set and get it
reposted.  I have rebased the patches onto the latest mainline
respository.  I have also retested the patches on Power 8 BE, Power 9
LE and Power 10 LE hardware.

A request has been made last week to change the functionality of the
128-bit sign extension builtins.  I pulled the support for these
builtins out of the previous patch 2 and 3 and moved to a single last
patch.  That leaves five of the six patches ready for review. I will
update and post the last, sixth, patch later when we have full
agreement on what the new functionality will be. 

The following five patches have minor fixes to the builtin
descriptions, typo fixes, etc. from the previous iteration.  The patch
series numbers have changed.

                      Carl Love

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5 ver4] RS6000: Add 128-bit Integer Operations
       [not found] <d660f874049b7fdb338ff33478c56b2259828ab1.camel@us.ibm.com>
  2021-04-26 16:35 ` [PATCH 0/5 ver4] RS6000: Add 128-bit Integer Operations Carl Love
@ 2021-04-26 16:35 ` Carl Love
  2021-04-27 23:46   ` will schmidt
  2021-05-26 21:03   ` Segher Boessenkool
  2021-04-26 16:36 ` [PATCH 2/5 " Carl Love
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: Carl Love @ 2021-04-26 16:35 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Will, Segher:

This patch fixes the order of the argument in the vec_rlmi and
vec_rlnm builtins.  The patch also adds a new test cases to verify
the fix.

The patch has been tested on
    powerpc64-linux instead (Power 8 BE)
    powerpc64-linux instead (Power 9 LE)
    powerpc64-linux instead (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

                   Carl Love

------------------------------------------------------------

2021-04-26  Carl Love  <cel@us.ibm.com>

gcc/
	* config/rs6000/altivec.md (altivec_vrl<VI_char>mi): Fix
	bug in argument generation.

gcc/testsuite/
	gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c:
	New runnable test case.
	gcc.target/powerpc/vec-rlmi-rlnm.c: Update scan assembler times
	for xxlor instruction.
---
 gcc/config/rs6000/altivec.md                  |   6 +-
 .../powerpc/check-builtin-vec_rlnm-runnable.c | 231 ++++++++++++++++++
 .../gcc.target/powerpc/vec-rlmi-rlnm.c        |   2 +-
 3 files changed, 235 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 1351dafbc41..97dc9d2bda9 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1987,12 +1987,12 @@
 
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
-        (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
-	                (match_operand:VIlong 2 "register_operand" "v")
+        (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
+	                (match_operand:VIlong 2 "register_operand" "0")
 		        (match_operand:VIlong 3 "register_operand" "v")]
 		       UNSPEC_VRLMI))]
   "TARGET_P9_VECTOR"
-  "vrl<VI_char>mi %0,%2,%3"
+  "vrl<VI_char>mi %0,%1,%3"
   [(set_attr "type" "veclogical")])
 
 (define_insn "altivec_vrl<VI_char>nm"
diff --git a/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
new file mode 100644
index 00000000000..be8f82d8a06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
@@ -0,0 +1,231 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
+
+/* Verify the vec_rlm and vec_rlmi builtins works correctly.  */
+/* { dg-final { scan-assembler-times {\mvrldmi\M} 1 } } */
+
+#include <altivec.h>
+
+#ifdef DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector unsigned int vec_arg1_int, vec_arg2_int, vec_arg3_int;
+  vector unsigned int vec_result_int, vec_expected_result_int;
+  
+  vector unsigned long long int vec_arg1_di, vec_arg2_di, vec_arg3_di;
+  vector unsigned long long int vec_result_di, vec_expected_result_di;
+
+  unsigned int mask_begin, mask_end, shift;
+  unsigned long long int mask;
+
+/* Check vec int version of vec_rlmi builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x80000000ULL >> i;
+
+  for (i = 0; i < 4; i++) {
+    vec_arg1_int[i] = 0x12345678 + i*0x11111111;
+    vec_arg2_int[i] = 0xA1B1CDEF;
+    vec_arg3_int[i] = mask_begin << 16 | mask_end << 8 | shift;
+
+    /* do rotate */
+    vec_expected_result_int[i] =  ( vec_arg2_int[i] & ~mask) 
+      | ((vec_arg1_int[i] << shift) | (vec_arg1_int[i] >> (32-shift))) & mask;
+      
+  }
+
+  /* vec_rlmi(arg1, arg2, arg3)
+     result - rotate each element of arg2 left and inserting it into arg1 
+       element based on the mask specified in arg3.  The shift, mask
+       start and end is specified in arg3.  */
+  vec_result_int = vec_rlmi (vec_arg1_int, vec_arg2_int, vec_arg3_int);
+
+  for (i = 0; i < 4; i++) {
+    if (vec_result_int[i] != vec_expected_result_int[i])
+#ifdef DEBUG
+      printf("ERROR: i = %d, vec_rlmi int result 0x%x, does not match "
+	     "expected result 0x%x\n", i, vec_result_int[i],
+	     vec_expected_result_int[i]);
+#else
+      abort();
+#endif
+    }
+
+/* Check vec long long int version of vec_rlmi builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x8000000000000000ULL >> i;
+
+  for (i = 0; i < 2; i++) {
+    vec_arg1_di[i] = 0x1234567800000000 + i*0x11111111;
+    vec_arg2_di[i] = 0xA1B1C1D1E1F12345;
+    vec_arg3_di[i] = mask_begin << 16 | mask_end << 8 | shift;
+
+    /* do rotate */
+    vec_expected_result_di[i] =  ( vec_arg2_di[i] & ~mask) 
+      | ((vec_arg1_di[i] << shift) | (vec_arg1_di[i] >> (64-shift))) & mask;
+  }
+
+  /* vec_rlmi(arg1, arg2, arg3)
+     result - rotate each element of arg1 left and inserting it into arg2 
+       element of arg2 based on the mask specified in arg3.  The shift, mask
+       start and end is specified in arg3.  */
+  vec_result_di = vec_rlmi (vec_arg1_di, vec_arg2_di, vec_arg3_di);
+
+  for (i = 0; i < 2; i++) {
+    if (vec_result_di[i] != vec_expected_result_di[i])
+#ifdef DEBUG
+      printf("ERROR: i = %d, vec_rlmi int long long result 0x%llx, does not match "
+	     "expected result 0x%llx\n", i, vec_result_di[i],
+	     vec_expected_result_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* Check vec int version of vec_rlnm builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x80000000ULL >> i;
+
+  for (i = 0; i < 4; i++) {
+    vec_arg1_int[i] = 0x12345678 + i*0x11111111;
+    vec_arg2_int[i] = shift;
+    vec_arg3_int[i] = mask_begin << 8 | mask_end;
+    vec_expected_result_int[i] = (vec_arg1_int[i] << shift) & mask;
+  }
+
+  /* vec_rlnm(arg1, arg2, arg3)
+     result - rotate each element of arg1 left by shift in element of arg2.
+       Then AND with mask whose  start/stop bits are specified in element of
+       arg3.  */
+  vec_result_int = vec_rlnm (vec_arg1_int, vec_arg2_int, vec_arg3_int);
+  for (i = 0; i < 4; i++) {
+    if (vec_result_int[i] != vec_expected_result_int[i])
+#ifdef DEBUG
+      printf("ERROR: vec_rlnm, i = %d, int result 0x%x does not match "
+	     "expected result 0x%x\n", i, vec_result_int[i],
+	     vec_expected_result_int[i]);
+#else
+      abort();
+#endif
+    }
+
+/* Check vec long int version of builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 20;
+
+  for (i = 0; i < 63; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x8000000000000000ULL >> i;
+  
+  for (i = 0; i < 2; i++) {
+    vec_arg1_di[i] = 0x123456789ABCDE00ULL + i*0x1111111111111111ULL;
+    vec_arg2_di[i] = shift;
+    vec_arg3_di[i] = mask_begin << 8 | mask_end;
+    vec_expected_result_di[i] = (vec_arg1_di[i] << shift) & mask;
+  }
+
+  vec_result_di = vec_rlnm (vec_arg1_di, vec_arg2_di, vec_arg3_di);
+
+  for (i = 0; i < 2; i++) {
+    if (vec_result_di[i] != vec_expected_result_di[i])
+#ifdef DEBUG
+      printf("ERROR: vec_rlnm, i = %d, long long int result 0x%llx does not "
+	     "match expected result 0x%llx\n", i, vec_result_di[i],
+	     vec_expected_result_di[i]);
+#else
+      abort();
+#endif
+    }
+
+    /* Check vec int version of vec_vrlnm builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x80000000ULL >> i;
+
+  for (i = 0; i < 4; i++) {
+    vec_arg1_int[i] = 0x12345678 + i*0x11111111;
+    vec_arg2_int[i] = mask_begin << 16 | mask_end << 8 | shift;
+    vec_expected_result_int[i] = (vec_arg1_int[i] << shift) & mask;
+  }
+
+  /* vec_vrlnm(arg1, arg2, arg3)
+     result - rotate each element of arg1 left then AND with mask.  The mask
+       start, stop bits is specified in the second argument.  The shift amount
+       is also specified in the second argument.  */
+  vec_result_int = vec_vrlnm (vec_arg1_int, vec_arg2_int);
+
+  for (i = 0; i < 4; i++) {
+    if (vec_result_int[i] != vec_expected_result_int[i])
+#ifdef DEBUG
+      printf("ERROR: vec_vrlnm, i = %d, int result 0x%x does not match "
+	     "expected result 0x%x\n", i, vec_result_int[i],
+	     vec_expected_result_int[i]);
+#else
+      abort();
+#endif
+    }
+
+/* Check vec long int version of vec_vrlnm builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 20;
+
+  for (i = 0; i < 63; i++)
+    if ((i >= mask_begin) && (i <= mask_end))
+      mask |= 0x8000000000000000ULL >> i;
+  
+  for (i = 0; i < 2; i++) {
+    vec_arg1_di[i] = 0x123456789ABCDE00ULL + i*0x1111111111111111ULL;
+    vec_arg2_di[i] = mask_begin << 16 | mask_end << 8 | shift;
+    vec_expected_result_di[i] = (vec_arg1_di[i] << shift) & mask;
+  }
+
+  vec_result_di = vec_vrlnm (vec_arg1_di, vec_arg2_di);
+
+  for (i = 0; i < 2; i++) {
+    if (vec_result_di[i] != vec_expected_result_di[i])
+#ifdef DEBUG
+      printf("ERROR: vec_vrlnm, i = %d, long long int result 0x%llx does not "
+	     "match expected result 0x%llx\n", i, vec_result_di[i],
+	     vec_expected_result_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
index 1e7d7390c5b..b0f26c8f4cb 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
@@ -62,6 +62,6 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned long long y,
 /* { dg-final { scan-assembler-times "vextsb2d" 1 } } */
 /* { dg-final { scan-assembler-times "vslw" 1 } } */
 /* { dg-final { scan-assembler-times "vsld" 1 } } */
-/* { dg-final { scan-assembler-times "xxlor" 3 } } */
+/* { dg-final { scan-assembler-times "xxlor" 5 } } */
 /* { dg-final { scan-assembler-times "vrlwnm" 2 } } */
 /* { dg-final { scan-assembler-times "vrldnm" 2 } } */
-- 
2.27.0



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/5 ver4] RS6000: Add 128-bit Integer Operations
       [not found] <d660f874049b7fdb338ff33478c56b2259828ab1.camel@us.ibm.com>
  2021-04-26 16:35 ` [PATCH 0/5 ver4] RS6000: Add 128-bit Integer Operations Carl Love
  2021-04-26 16:35 ` [PATCH 1/5 " Carl Love
@ 2021-04-26 16:36 ` Carl Love
  2021-04-27 23:46   ` will schmidt
  2021-06-04 23:14   ` Segher Boessenkool
  2021-04-26 16:36 ` [PATCH 3/5 ver4] RS6000: Add TI to TD (128-bit DFP) and TD to TI support Carl Love
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: Carl Love @ 2021-04-26 16:36 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Will, Segher:

This patch adds the 128-bit integer support for divide, modulo, shift,
compare of 128-bit integers instructions and builtin support.

The patch has been tested on
    powerpc64-linux instead (Power 8 BE)
    powerpc64-linux instead (Power 9 LE)
    powerpc64-linux instead (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

                   Carl Love

-----------------------------------------------------------

gcc/ChangeLog

2021-04-26  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_dive, vec_mod): Add define for new
	builtins.
	* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
	UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
	(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
	altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
	altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
	altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
	altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
	define_insn.
	(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
	vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
	altivec_vrlqnm): New define_expands.
	* config/rs6000/rs6000-builtin.def (VCMPEQUT_P, VCMPGTST_P,
	VCMPGTUT_P): Add macro expansions.
	(BU_P10V_AV_P): Add builtin predicate definition.
	(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
	CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
	VCMPAET_P, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ,
	VSLQ, VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
	MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
	(VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD): New overload expansions.
	* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
	P10V_BUILTIN_CMPGE_1TI, P10V_BUILTIN_CMPGE_U1TI,
	P10V_BUILTIN_VCMPGTUT, P10V_BUILTIN_VCMPGTST,
	P10V_BUILTIN_CMPLE_1TI, P10V_BUILTIN_VCMPLE_U1TI,
	P10V_BUILTIN_DIV_V1TI, P10V_BUILTIN_UDIV_V1TI,
	P10V_BUILTIN_VMULESD, P10V_BUILTIN_VMULEUD,
	P10V_BUILTIN_VMULOSD, P10V_BUILTIN_VMULOUD,
	P10V_BUILTIN_VNOR_V1TI, P10V_BUILTIN_VNOR_V1TI_UNS,
	P10V_BUILTIN_VRLQ, P10V_BUILTIN_VRLQMI,
	P10V_BUILTIN_VRLQNM, P10V_BUILTIN_VSLQ,
	P10V_BUILTIN_VSRQ, P10V_BUILTIN_VSRAQ,
	P10V_BUILTIN_VCMPGTUT_P, P10V_BUILTIN_VCMPGTST_P,
	P10V_BUILTIN_VCMPEQUT_P, P10V_BUILTIN_VCMPGTUT_P,
	P10V_BUILTIN_VCMPGTST_P, P10V_BUILTIN_CMPNET,
	P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P,
	P10V_BUILTIN_DIVES_V1TI, P10V_BUILTIN_MODS_V1TI,
	P10V_BUILTIN_MODU_V1TI):
	New overloaded definitions.
	(rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT,
	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
	P10_BUILTIN_CMPLE_U1TI]: New case statements.
	(rs6000_init_builtins) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
	New assignments.
	(altivec_init_builtins): New E_V1TImode case statement.
	(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
	P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
	P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.
	* config/rs6000/r6000.c (rs6000_handle_altivec_attribute)[E_TImode,
	E_V1TImode]: New case statements.
	* config/rs6000/r6000.h (rs6000_builtin_type_index): New enum
	value RS6000_BTI_bool_V1TI.
	* config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti,
	vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti,
	vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
	vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
	vlshrv1ti3, vashrv1ti3): New define_expands.
	* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
	UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
	UNSPEC_VSX_MODUQ): New unspecs.
	(mulv2di3, vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti,
	vsx_diveu_v1ti,	vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti): New
	define_insns.
	(vcmpnet): New define_expand.
	* gcc/doc/extend.texi: Add documentation for the new builtins vec_rl,
	vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
	vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,
	vec_cmpge, vec_cmple, vec_all_eq, vec_all_ne, vec_all_gt, vec_all_lt,
	vec_all_ge, vec_all_le, vec_any_eq, vec_any_ne, vec_any_gt, vec_any_lt,
	vec_any_ge, vec_any_le.

gcc/testsuite/ChangeLog

2021-04-26 Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h                   |    3 +
 gcc/config/rs6000/altivec.md                  |  241 ++
 gcc/config/rs6000/rs6000-builtin.def          |   48 +-
 gcc/config/rs6000/rs6000-call.c               |  136 +-
 gcc/config/rs6000/rs6000.c                    |    1 +
 gcc/config/rs6000/rs6000.h                    |    3 +-
 gcc/config/rs6000/vector.md                   |  191 ++
 gcc/config/rs6000/vsx.md                      |   89 +
 gcc/doc/extend.texi                           |  171 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 2263 +++++++++++++++++
 10 files changed, 3143 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 961621a0841..314695a43ca 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -715,6 +715,9 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
 
 #ifdef _ARCH_PWR10
+#define vec_dive __builtin_vec_dive
+#define vec_mod  __builtin_vec_mod
+
 /* May modify these macro definitions if future capabilities overload
    with support for different vector argument and result types.  */
 #define vec_cntlzm(a, b)	__builtin_altivec_vclzdm (a, b)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 97dc9d2bda9..c4c82b33f8d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -39,12 +39,16 @@
    UNSPEC_VMULESH
    UNSPEC_VMULEUW
    UNSPEC_VMULESW
+   UNSPEC_VMULEUD
+   UNSPEC_VMULESD
    UNSPEC_VMULOUB
    UNSPEC_VMULOSB
    UNSPEC_VMULOUH
    UNSPEC_VMULOSH
    UNSPEC_VMULOUW
    UNSPEC_VMULOSW
+   UNSPEC_VMULOUD
+   UNSPEC_VMULOSD
    UNSPEC_VPKPX
    UNSPEC_VPACK_SIGN_SIGN_SAT
    UNSPEC_VPACK_SIGN_UNS_SAT
@@ -627,6 +631,14 @@
   "vcmpequ<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_eqv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpequq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gt<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gt:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -635,6 +647,14 @@
   "vcmpgts<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gt:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpgtsq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_gtu<mode>"
   [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
 	(gtu:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
@@ -643,6 +663,14 @@
   "vcmpgtu<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
+		  (match_operand:V1TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "vcmpgtuq %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_eqv4sf"
   [(set (match_operand:V4SF 0 "altivec_register_operand" "=v")
 	(eq:V4SF (match_operand:V4SF 1 "altivec_register_operand" "v")
@@ -1693,6 +1721,19 @@
  DONE;
 })
 
+(define_expand "vec_widen_umult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
 (define_expand "vec_widen_smult_even_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1706,6 +1747,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_even_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+ else
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_umult_odd_v16qi"
   [(use (match_operand:V8HI 0 "register_operand"))
    (use (match_operand:V16QI 1 "register_operand"))
@@ -1771,6 +1825,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_umult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuloud (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuleud (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_expand "vec_widen_smult_odd_v4si"
   [(use (match_operand:V2DI 0 "register_operand"))
    (use (match_operand:V4SI 1 "register_operand"))
@@ -1784,6 +1851,19 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_odd_v2di"
+  [(use (match_operand:V1TI 0 "register_operand"))
+   (use (match_operand:V2DI 1 "register_operand"))
+   (use (match_operand:V2DI 2 "register_operand"))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulosd (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulesd (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "altivec_vmuleub"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
@@ -1865,6 +1945,15 @@
   "vmuleuw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuleud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULEUD))]
+  "TARGET_POWER10"
+  "vmuleud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulouw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1874,6 +1963,15 @@
   "vmulouw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmuloud"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOUD))]
+  "TARGET_POWER10"
+  "vmuloud %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulesw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1883,6 +1981,15 @@
   "vmulesw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulesd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULESD))]
+  "TARGET_POWER10"
+  "vmulesd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 (define_insn "altivec_vmulosw"
   [(set (match_operand:V2DI 0 "register_operand" "=v")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1892,6 +1999,15 @@
   "vmulosw %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
+(define_insn "altivec_vmulosd"
+  [(set (match_operand:V1TI 0 "register_operand" "=v")
+       (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+                     (match_operand:V2DI 2 "register_operand" "v")]
+                    UNSPEC_VMULOSD))]
+  "TARGET_POWER10"
+  "vmulosd %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
 ;; Vector pack/unpack
 (define_insn "altivec_vpkpx"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
@@ -1985,6 +2101,15 @@
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vrlq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+;; rotate amount in needs to be in bits[57:63] of operand2.
+  "vrlq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
@@ -1995,6 +2120,34 @@
   "vrl<VI_char>mi %0,%1,%3"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqmi"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")
+		      (match_operand:V1TI 3 "vsx_register_operand")]
+		     UNSPEC_VRLMI))]
+  "TARGET_POWER10"
+{
+  /* Mask bit begin, end fields need to be in bits [41:55] of 128-bit operand2.
+     Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[3]));
+  emit_insn (gen_altivec_vrlqmi_inst (operands[0], operands[1], operands[2],
+				      tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqmi_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "0")
+		      (match_operand:V1TI 3 "vsx_register_operand" "v")]
+		     UNSPEC_VRLMI))]
+  "TARGET_POWER10"
+  "vrlqmi %0,%1,%3"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vrl<VI_char>nm"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
@@ -2004,6 +2157,31 @@
   "vrl<VI_char>nm %0,%1,%2"
   [(set_attr "type" "veclogical")])
 
+(define_expand "altivec_vrlqnm"
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand")
+		      (match_operand:V1TI 2 "vsx_register_operand")]
+		     UNSPEC_VRLNM))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vrlqnm_inst (operands[0], operands[1], tmp));
+  DONE;
+})
+
+(define_insn "altivec_vrlqnm_inst"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+		      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+		     UNSPEC_VRLNM))]
+  "TARGET_POWER10"
+  ;; rotate and mask bits need to be in upper 64-bits of operand2.
+  "vrlqnm %0,%1,%2"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vsl"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2048,6 +2226,15 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vslq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vslq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsr<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (lshiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2056,6 +2243,15 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsrq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vsrq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsra<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (ashiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
@@ -2064,6 +2260,15 @@
   "vsra<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vsraq"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+  /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
+  "vsraq %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vsr"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
@@ -2624,6 +2829,18 @@
   "vcmpequ<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "altivec_vcmpequt_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v")
+			   (match_operand:V1TI 2 "altivec_register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "altivec_register_operand" "=v")
+	(eq:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpequq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgts<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2636,6 +2853,18 @@
   "vcmpgts<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtst_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gt:CC (match_operand:V1TI 1 "register_operand" "v")
+			   (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gt:V1TI (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpgtsq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpgtu<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(gtu:CC (match_operand:VI2 1 "register_operand" "v")
@@ -2648,6 +2877,18 @@
   "vcmpgtu<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+(define_insn "*altivec_vcmpgtut_p"
+  [(set (reg:CC CR6_REGNO)
+	(unspec:CC [(gtu:CC (match_operand:V1TI 1 "register_operand" "v")
+			    (match_operand:V1TI 2 "register_operand" "v"))]
+		   UNSPEC_PREDICATE))
+   (set (match_operand:V1TI 0 "register_operand" "=v")
+	(gtu:V1TI (match_dup 1)
+		  (match_dup 2)))]
+  "TARGET_POWER10"
+  "vcmpgtuq. %0,%1,%2"
+  [(set_attr "type" "veccmpfx")])
+
 (define_insn "*altivec_vcmpeqfp_p"
   [(set (reg:CC CR6_REGNO)
 	(unspec:CC [(eq:CC (match_operand:V4SF 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 609bebdfd74..dba22825b79 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1269,6 +1269,15 @@
 		     | RS6000_BTC_TERNARY),				\
 		    CODE_FOR_ ## ICODE)			/* ICODE */
 
+/* See the comment on BU_ALTIVEC_P.  */
+#define BU_P10V_AV_P(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_P (P10V_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P10,	 		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_PREDICATE),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
 #define BU_P10V_AV_X(ENUM, NAME, ATTR)					\
   RS6000_BUILTIN_X (P10_BUILTIN_ ## ENUM,		/* ENUM */	\
 		    "__builtin_altivec_" NAME,		/* NAME */	\
@@ -2880,6 +2889,10 @@ BU_P9_OVERLOAD_2 (CMPRB2,	"byte_in_either_range")
 BU_P9_OVERLOAD_2 (CMPEQB,	"byte_in_set")
 \f
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
+BU_P10V_AV_P (VCMPEQUT_P,	"vcmpequt_p",	CONST,	vector_eq_v1ti_p)
+BU_P10V_AV_P (VCMPGTST_P,	"vcmpgtst_p",	CONST,	vector_gt_v1ti_p)
+BU_P10V_AV_P (VCMPGTUT_P,	"vcmpgtut_p",	CONST,	vector_gtu_v1ti_p)
+
 BU_P10_POWERPC64_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_POWERPC64_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
 BU_P10_POWERPC64_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
@@ -2900,7 +2913,36 @@ BU_P10V_VSX_2 (XXGENPCVM_V16QI, "xxgenpcvm_v16qi", CONST, xxgenpcvm_v16qi)
 BU_P10V_VSX_2 (XXGENPCVM_V8HI, "xxgenpcvm_v8hi", CONST, xxgenpcvm_v8hi)
 BU_P10V_VSX_2 (XXGENPCVM_V4SI, "xxgenpcvm_v4si", CONST, xxgenpcvm_v4si)
 BU_P10V_VSX_2 (XXGENPCVM_V2DI, "xxgenpcvm_v2di", CONST, xxgenpcvm_v2di)
-
+BU_P10V_AV_2 (VCMPGTUT,		"vcmpgtut",	CONST,	vector_gtuv1ti)
+BU_P10V_AV_2 (VCMPGTST,		"vcmpgtst",	CONST,	vector_gtv1ti)
+BU_P10V_AV_2 (VCMPEQUT,		"vcmpequt",	CONST,	eqvv1ti3)
+BU_P10V_AV_2 (CMPNET,		"vcmpnet",	CONST,	vcmpnet)
+BU_P10V_AV_2 (CMPGE_1TI,	"cmpge_1ti",    CONST,  vector_nltv1ti)
+BU_P10V_AV_2 (CMPGE_U1TI,	"cmpge_u1ti",   CONST,  vector_nltuv1ti)
+BU_P10V_AV_2 (CMPLE_1TI,	"cmple_1ti",    CONST,  vector_ngtv1ti)
+BU_P10V_AV_2 (CMPLE_U1TI,	"cmple_u1ti",   CONST,  vector_ngtuv1ti)
+BU_P10V_AV_2 (VNOR_V1TI_UNS,	"vnor_v1ti_uns",CONST,	norv1ti3)
+BU_P10V_AV_2 (VNOR_V1TI,	"vnor_v1ti",	CONST,	norv1ti3)
+BU_P10V_AV_2 (VCMPNET_P,	"vcmpnet_p",	CONST,	vector_ne_v1ti_p)
+BU_P10V_AV_2 (VCMPAET_P,	"vcmpaet_p",	CONST,	vector_ae_v1ti_p)
+
+BU_P10V_AV_2 (VMULEUD,	"vmuleud",	CONST,	vec_widen_umult_even_v2di)
+BU_P10V_AV_2 (VMULESD,	"vmulesd",	CONST,	vec_widen_smult_even_v2di)
+BU_P10V_AV_2 (VMULOUD,	"vmuloud",	CONST,	vec_widen_umult_odd_v2di)
+BU_P10V_AV_2 (VMULOSD,	"vmulosd",	CONST,	vec_widen_smult_odd_v2di)
+BU_P10V_AV_2 (VRLQ,		"vrlq",		CONST,	vrotlv1ti3)
+BU_P10V_AV_2 (VSLQ,		"vslq",		CONST,	vashlv1ti3)
+BU_P10V_AV_2 (VSRQ,		"vsrq",		CONST,	vlshrv1ti3)
+BU_P10V_AV_2 (VSRAQ,		"vsraq",	CONST,	vashrv1ti3)
+BU_P10V_AV_2 (VRLQNM,	"vrlqnm",	CONST,	altivec_vrlqnm)
+BU_P10V_AV_2 (DIV_V1TI,	"div_1ti",      CONST,  vsx_div_v1ti)
+BU_P10V_AV_2 (UDIV_V1TI,	"udiv_1ti",     CONST,  vsx_udiv_v1ti)
+BU_P10V_AV_2 (DIVES_V1TI,	"dives",	CONST,	vsx_dives_v1ti)
+BU_P10V_AV_2 (DIVEU_V1TI,	"diveu",	CONST,	vsx_diveu_v1ti)
+BU_P10V_AV_2 (MODS_V1TI,	"mods",		CONST,	vsx_mods_v1ti)
+BU_P10V_AV_2 (MODU_V1TI,	"modu",		CONST,	vsx_modu_v1ti)
+
+BU_P10V_AV_3 (VRLQMI,	"vrlqmi",	CONST,	altivec_vrlqmi)
 BU_P10V_AV_3 (VEXTRACTBL, "vextdubvlx", CONST, vextractlv16qi)
 BU_P10V_AV_3 (VEXTRACTHL, "vextduhvlx", CONST, vextractlv8hi)
 BU_P10V_AV_3 (VEXTRACTWL, "vextduwvlx", CONST, vextractlv4si)
@@ -3025,6 +3067,10 @@ BU_P10_OVERLOAD_2 (CLRR, "clrr")
 BU_P10_OVERLOAD_2 (GNB, "gnb")
 BU_P10_OVERLOAD_4 (XXEVAL, "xxeval")
 BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
+BU_P10_OVERLOAD_2 (VRLQ, "vrlq")
+BU_P10_OVERLOAD_2 (VSLQ, "vslq")
+BU_P10_OVERLOAD_2 (VSRQ, "vsrq")
+BU_P10_OVERLOAD_2 (VSRAQ, "vsraq")
 
 BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
 BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index f5676255387..32a8af92458 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -843,6 +843,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P10V_BUILTIN_VCMPEQUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, VSX_BUILTIN_XVCMPEQDP,
@@ -889,6 +893,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPGE, P10V_BUILTIN_CMPGE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -903,8 +913,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTUD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTUT,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTSD,
     RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P10V_BUILTIN_VCMPGTST,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, VSX_BUILTIN_XVCMPGTDP,
@@ -947,6 +961,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_CMPLE_U2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0},
+  { ALTIVEC_BUILTIN_VEC_CMPLE, P10V_BUILTIN_CMPLE_U1TI,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0},
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTUB,
     RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTSB,
@@ -1086,6 +1105,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIV_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_UDIV_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
 
   { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V4SI,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
@@ -1097,6 +1121,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
 
   { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V4SI,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
@@ -1108,6 +1137,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
 
   { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
@@ -1973,6 +2007,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULE, P8V_BUILTIN_VMULEUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10V_BUILTIN_VMULESD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULE, P10V_BUILTIN_VMULEUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULEUB, ALTIVEC_BUILTIN_VMULEUB,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULESB, ALTIVEC_BUILTIN_VMULESB,
@@ -1996,6 +2035,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_MULO, P8V_BUILTIN_VMULOUW,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10V_BUILTIN_VMULOSD,
+    RS6000_BTI_V1TI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MULO, P10V_BUILTIN_VMULOUD,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULO, ALTIVEC_BUILTIN_VMULOSH,
     RS6000_BTI_V4SI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMULOSH, ALTIVEC_BUILTIN_VMULOSH,
@@ -2038,6 +2082,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_NOR, P10V_BUILTIN_VNOR_V1TI_UNS,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_NOR, ALTIVEC_BUILTIN_VNOR_V2DI_UNS,
@@ -2299,6 +2353,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_RL, P8V_BUILTIN_VRLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10V_BUILTIN_VRLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P10V_BUILTIN_VRLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
@@ -2317,12 +2376,23 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_RLMI, P9V_BUILTIN_VRLDMI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { P9V_BUILTIN_VEC_RLMI, P10V_BUILTIN_VRLQMI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI },
+  { P9V_BUILTIN_VEC_RLMI, P10V_BUILTIN_VRLQMI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLWNM,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
   { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLDNM,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10V_BUILTIN_VRLQNM,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P10V_BUILTIN_VRLQNM,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
@@ -2339,6 +2409,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, P8V_BUILTIN_VSLD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10V_BUILTIN_VSLQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P10V_BUILTIN_VSLQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTSP,
@@ -2535,6 +2610,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SR, P8V_BUILTIN_VSRD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10V_BUILTIN_VSRQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P10V_BUILTIN_VSRQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
@@ -2563,6 +2643,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SRA, P8V_BUILTIN_VSRAD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10V_BUILTIN_VSRAQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P10V_BUILTIN_VSRAQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
@@ -4180,12 +4265,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10V_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGT_P, P10V_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, ALTIVEC_BUILTIN_VCMPGTFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGT_P, VSX_BUILTIN_XVCMPGTDP_P,
@@ -4250,6 +4339,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10V_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P10V_BUILTIN_VCMPEQUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, ALTIVEC_BUILTIN_VCMPEQFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, VSX_BUILTIN_XVCMPEQDP_P,
@@ -4301,12 +4394,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10V_BUILTIN_VCMPGTUT_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P10V_BUILTIN_VCMPGTST_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, ALTIVEC_BUILTIN_VCMPGEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
@@ -4955,6 +5052,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEW,
     RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_V1TI,
+    RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P10V_BUILTIN_CMPNET,
+    RS6000_BTI_bool_V1TI, RS6000_BTI_unsigned_V1TI,
+    RS6000_BTI_unsigned_V1TI, 0 },
 
   /* The following 2 entries have been deprecated.  */
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEB_P,
@@ -5055,6 +5158,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
 
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
@@ -5160,7 +5267,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAED_P,
     RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI,
     RS6000_BTI_bool_V2DI, 0 },
-
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
+    RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
@@ -12577,12 +12687,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPEQUH:
     case ALTIVEC_BUILTIN_VCMPEQUW:
     case P8V_BUILTIN_VCMPEQUD:
+    case P10V_BUILTIN_VCMPEQUT:
       fold_compare_helper (gsi, EQ_EXPR, stmt);
       return true;
 
     case P9V_BUILTIN_CMPNEB:
     case P9V_BUILTIN_CMPNEH:
     case P9V_BUILTIN_CMPNEW:
+    case P10V_BUILTIN_CMPNET:
       fold_compare_helper (gsi, NE_EXPR, stmt);
       return true;
 
@@ -12594,6 +12706,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_2DI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10V_BUILTIN_CMPGE_1TI:
+    case P10V_BUILTIN_CMPGE_U1TI:
       fold_compare_helper (gsi, GE_EXPR, stmt);
       return true;
 
@@ -12605,6 +12719,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
     case P8V_BUILTIN_VCMPGTSD:
+    case P10V_BUILTIN_VCMPGTUT:
+    case P10V_BUILTIN_VCMPGTST:
       fold_compare_helper (gsi, GT_EXPR, stmt);
       return true;
 
@@ -12616,6 +12732,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     case VSX_BUILTIN_CMPLE_U4SI:
     case VSX_BUILTIN_CMPLE_2DI:
     case VSX_BUILTIN_CMPLE_U2DI:
+    case P10V_BUILTIN_CMPLE_1TI:
+    case P10V_BUILTIN_CMPLE_U1TI:
       fold_compare_helper (gsi, LE_EXPR, stmt);
       return true;
 
@@ -13343,6 +13461,8 @@ rs6000_init_builtins (void)
 					    ? "__vector __bool long"
 					    : "__vector __bool long long",
 					    bool_long_long_type_node, 2);
+  bool_V1TI_type_node = rs6000_vector_type ("__vector __bool __int128",
+					    intTI_type_node, 1);
   pixel_V8HI_type_node = rs6000_vector_type ("__vector __pixel",
 					     pixel_type_node, 8);
 
@@ -13540,6 +13660,10 @@ altivec_init_builtins (void)
     = build_function_type_list (integer_type_node,
 				integer_type_node, V2DI_type_node,
 				V2DI_type_node, NULL_TREE);
+  tree int_ftype_int_v1ti_v1ti
+    = build_function_type_list (integer_type_node,
+				integer_type_node, V1TI_type_node,
+				V1TI_type_node, NULL_TREE);
   tree void_ftype_v4si
     = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
   tree v8hi_ftype_void
@@ -13907,6 +14031,9 @@ altivec_init_builtins (void)
 	case E_VOIDmode:
 	  type = int_ftype_int_opaque_opaque;
 	  break;
+	case E_V1TImode:
+	  type = int_ftype_int_v1ti_v1ti;
+	  break;
 	case E_V2DImode:
 	  type = int_ftype_int_v2di_v2di;
 	  break;
@@ -14512,12 +14639,16 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P10V_BUILTIN_XXGENPCVM_V2DI:
     case P10V_BUILTIN_DIVEU_V4SI:
     case P10V_BUILTIN_DIVEU_V2DI:
+    case P10V_BUILTIN_DIVEU_V1TI:
     case P10V_BUILTIN_DIVU_V4SI:
     case P10V_BUILTIN_DIVU_V2DI:
+    case P10V_BUILTIN_MODU_V1TI:
     case P10V_BUILTIN_MODU_V2DI:
     case P10V_BUILTIN_MODU_V4SI:
     case P10V_BUILTIN_MULHU_V2DI:
     case P10V_BUILTIN_MULHU_V4SI:
+    case P10V_BUILTIN_VMULEUD:
+    case P10V_BUILTIN_VMULOUD:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
@@ -14617,10 +14748,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case VSX_BUILTIN_CMPGE_U8HI:
     case VSX_BUILTIN_CMPGE_U4SI:
     case VSX_BUILTIN_CMPGE_U2DI:
+    case P10V_BUILTIN_CMPGE_U1TI:
     case ALTIVEC_BUILTIN_VCMPGTUB:
     case ALTIVEC_BUILTIN_VCMPGTUH:
     case ALTIVEC_BUILTIN_VCMPGTUW:
     case P8V_BUILTIN_VCMPGTUD:
+    case P10V_BUILTIN_VCMPGTUT:
+    case P10V_BUILTIN_VCMPEQUT:
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
       break;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 844fee88cf3..531a9c87243 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -20225,6 +20225,7 @@ rs6000_handle_altivec_attribute (tree *node,
     case 'b':
       switch (mode)
 	{
+	case E_TImode: case E_V1TImode: result = bool_V1TI_type_node; break;
 	case E_DImode: case E_V2DImode: result = bool_V2DI_type_node; break;
 	case E_SImode: case E_V4SImode: result = bool_V4SI_type_node; break;
 	case E_HImode: case E_V8HImode: result = bool_V8HI_type_node; break;
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 164d359b724..a7953fad965 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2332,7 +2332,6 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_MMA		MASK_MMA	/* ISA 3.1 MMA.  */
 #define RS6000_BTM_P10		MASK_POWER10
 
-
 #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
 				 | RS6000_BTM_VSX			\
 				 | RS6000_BTM_P8_VECTOR			\
@@ -2445,6 +2444,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_bool_V8HI,          /* __vector __bool short */
   RS6000_BTI_bool_V4SI,          /* __vector __bool int */
   RS6000_BTI_bool_V2DI,          /* __vector __bool long */
+  RS6000_BTI_bool_V1TI,          /* __vector __bool 128-bit */
   RS6000_BTI_pixel_V8HI,         /* __vector __pixel */
   RS6000_BTI_long,	         /* long_integer_type_node */
   RS6000_BTI_unsigned_long,      /* long_unsigned_type_node */
@@ -2498,6 +2498,7 @@ enum rs6000_builtin_type_index
 #define bool_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V8HI])
 #define bool_V4SI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V4SI])
 #define bool_V2DI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V2DI])
+#define bool_V1TI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V1TI])
 #define pixel_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_pixel_V8HI])
 
 #define long_long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long_long])
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 3446b03d40d..55bbaa9c32f 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -685,6 +685,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtv1ti"
+  [(set (match_operand:V1TI 0 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))]
+  "TARGET_POWER10"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nlt<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -697,6 +704,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		 (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_gtu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -704,6 +722,13 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtuv1ti"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		  (match_operand:V1TI 2 "altivec_register_operand")))]
+  "TARGET_POWER10"
+  "")
+
 ; >= for integer vectors: swap operands and apply not-greater-than
 (define_expand "vector_nltu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
@@ -716,6 +741,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_nltuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 2 "vlogical_operand")
+		  (match_operand:V1TI 1 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+	(not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_geu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand")
 	(geu:VEC_I (match_operand:VEC_I 1 "vint_operand")
@@ -735,6 +771,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		 (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 (define_expand "vector_ngtu<mode>"
   [(set (match_operand:VEC_I 3 "vlogical_operand")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vlogical_operand")
@@ -746,6 +793,17 @@
   operands[3] = gen_reg_rtx_and_attrs (operands[0]);
 })
 
+(define_expand "vector_ngtuv1ti"
+  [(set (match_operand:V1TI 3 "vlogical_operand")
+	(gtu:V1TI (match_operand:V1TI 1 "vlogical_operand")
+		  (match_operand:V1TI 2 "vlogical_operand")))
+   (set (match_operand:V1TI 0 "vlogical_operand")
+        (not:V1TI (match_dup 3)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx_and_attrs (operands[0]);
+})
+
 ; There are 14 possible vector FP comparison operators, gt and eq of them have
 ; been expanded above, so just support 12 remaining operators here.
 
@@ -894,6 +952,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_eq_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 ;; This expansion handles the V16QI, V8HI, and V4SI modes in the
 ;; implementation of the vec_all_ne built-in functions on Power9.
 (define_expand "vector_ne_<mode>_p"
@@ -976,6 +1046,23 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+(define_expand "vector_ne_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V2DI mode in the implementation of the
 ;; vec_any_eq built-in function on Power9.
 ;;
@@ -1002,6 +1089,26 @@
   operands[3] = gen_reg_rtx (V2DImode);
 })
 
+(define_expand "vector_ae_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand")
+			     (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_dup 3)
+	  (eq:V1TI (match_dup 1)
+		   (match_dup 2)))])
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(eq:SI (reg:CC CR6_REGNO)
+	       (const_int 0)))
+   (set (match_dup 0)
+	(xor:SI (match_dup 0)
+		(const_int 1)))]
+  "TARGET_POWER10"
+{
+  operands[3] = gen_reg_rtx (V1TImode);
+})
+
 ;; This expansion handles the V4SF and V2DF modes in the Power9
 ;; implementation of the vec_all_ne built-in functions.  Note that the
 ;; expansions for this pattern with these modes makes no use of power9-
@@ -1061,6 +1168,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gt_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gt:CC (match_operand:V1TI 1 "vlogical_operand")
+			     (match_operand:V1TI 2 "vlogical_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "vlogical_operand")
+	  (gt:V1TI (match_dup 1)
+		   (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 (define_expand "vector_ge_<mode>_p"
   [(parallel
     [(set (reg:CC CR6_REGNO)
@@ -1085,6 +1204,18 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vector_gtu_v1ti_p"
+  [(parallel
+    [(set (reg:CC CR6_REGNO)
+	  (unspec:CC [(gtu:CC (match_operand:V1TI 1 "altivec_register_operand")
+			      (match_operand:V1TI 2 "altivec_register_operand"))]
+		     UNSPEC_PREDICATE))
+     (set (match_operand:V1TI 0 "altivec_register_operand")
+	  (gtu:V1TI (match_dup 1)
+		    (match_dup 2)))])]
+  "TARGET_POWER10"
+  "")
+
 ;; AltiVec/VSX predicates.
 
 ;; This expansion is triggered during expansion of predicate built-in
@@ -1460,6 +1591,20 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+(define_expand "vrotlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (rotate:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+                     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vrlq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for rotatert to make use of vrotl
 (define_expand "vrotr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1481,6 +1626,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vashlv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vslq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for logical shift right on each vector element
 (define_expand "vlshr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1489,6 +1649,21 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; No immediate version of this 128-bit instruction
+(define_expand "vlshrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vsrq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 ;; Expanders for arithmetic shift right on each vector element
 (define_expand "vashr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
@@ -1496,6 +1671,22 @@
 			(match_operand:VEC_I 2 "vint_operand")))]
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
+
+;; No immediate version of this 128-bit instruction
+(define_expand "vashrv1ti3"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(ashiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
+		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
+  rtx tmp = gen_reg_rtx (V1TImode);
+
+  emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
+  emit_insn (gen_altivec_vsraq (operands[0], operands[1], tmp));
+  DONE;
+})
+
 \f
 ;; Vector reduction expanders for VSX
 ; The (VEC_reduc:...
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bcb92be2f5c..ba539549024 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -302,6 +302,12 @@
    UNSPEC_VSX_XXSPLTD
    UNSPEC_VSX_DIVSD
    UNSPEC_VSX_DIVUD
+   UNSPEC_VSX_DIVSQ
+   UNSPEC_VSX_DIVUQ
+   UNSPEC_VSX_DIVESQ
+   UNSPEC_VSX_DIVEUQ
+   UNSPEC_VSX_MODSQ
+   UNSPEC_VSX_MODUQ
    UNSPEC_VSX_MULSD
    UNSPEC_VSX_SIGN_EXTEND
    UNSPEC_VSX_XVCVBF16SPN
@@ -1781,6 +1787,61 @@
 }
   [(set_attr "type" "div")])
 
+;; Vector integer signed/unsigned divide
+(define_insn "vsx_div_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVSQ))]
+  "TARGET_POWER10"
+  "vdivsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_udiv_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVUQ))]
+  "TARGET_POWER10"
+  "vdivuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_dives_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVESQ))]
+  "TARGET_POWER10"
+  "vdivesq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_diveu_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_DIVEUQ))]
+  "TARGET_POWER10"
+  "vdiveuq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_mods_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODSQ))]
+  "TARGET_POWER10"
+  "vmodsq %0,%1,%2"
+  [(set_attr "type" "div")])
+
+(define_insn "vsx_modu_v1ti"
+ [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+        (unspec:V1TI [(match_operand:V1TI 1 "vsx_register_operand" "v")
+                      (match_operand:V1TI 2 "vsx_register_operand" "v")]
+                     UNSPEC_VSX_MODUQ))]
+  "TARGET_POWER10"
+  "vmoduq %0,%1,%2"
+  [(set_attr "type" "div")])
+
 ;; *tdiv* instruction returning the FG flag
 (define_expand "vsx_tdiv<mode>3_fg"
   [(set (match_dup 3)
@@ -3126,6 +3187,21 @@
   "xxpermdi %x0,%x1,%x1,2"
   [(set_attr "type" "vecperm")])
 
+;; Swap upper/lower 64-bit values in a 128-bit vector
+(define_insn "xxswapd_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(subreg:V1TI
+           (vec_select:V2DI
+             (subreg:V2DI
+                (match_operand:V1TI 1 "vsx_register_operand" "v") 0 )
+	     (parallel [(const_int 1)(const_int 0)]))
+           0))]
+  "TARGET_POWER10"
+;; AIX does not support extended mnemonic xxswapd.  Use the basic
+;; mnemonic xxpermdi instead.
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "xxgenpcvm_<mode>_internal"
   [(set (match_operand:VSX_EXTRACT_I4 0 "altivec_register_operand" "=wa")
 	(unspec:VSX_EXTRACT_I4
@@ -5525,6 +5601,19 @@
   "vcmpneb %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Vector Compare Not Equal v1ti (specified/not+eq:)
+(define_expand "vcmpnet"
+  [(set (match_operand:V1TI 0 "altivec_register_operand")
+	(not:V1TI
+	  (eq:V1TI (match_operand:V1TI 1 "altivec_register_operand")
+		   (match_operand:V1TI 2 "altivec_register_operand"))))]
+   "TARGET_POWER10"
+{
+  emit_insn (gen_eqvv1ti3 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_one_cmplv1ti2 (operands[0], operands[0]));
+  DONE;
+})
+
 ;; Vector Compare Not Equal or Zero Byte
 (define_insn "vcmpnezb"
   [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 3260f0639d2..d1c56edbaa8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20131,6 +20131,177 @@ Generate PCV from specified Mask size, as if implemented by the
 immediate value is either 0, 1, 2 or 3.
 @findex vec_genpcvm
 
+@smallexample
+@exdent vector unsigned __int128 vec_rl (vector unsigned __int128 A,
+                                         vector unsigned __int128 B);
+@exdent vector signed __int128 vec_rl (vector signed __int128 A,
+                                       vector unsigned __int128 B);
+@end smallexample
+
+Result value: Each element of R is obtained by rotating the corresponding element
+of A left by the number of bits specified by the corresponding element of B.
+
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlmi (vector signed __int128,
+                                         vector signed __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and inserting it under mask
+into the second input.  The first bit in the mask, the last bit in the mask are
+obtained from the two 7-bit fields bits [108:115] and bits [117:123]
+respectively of the second input.  The shift is obtained from the third input
+in the 7-bit field [125:131] where all bits counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlnm (vector signed __int128,
+                                         vector unsigned __int128,
+                                         vector unsigned __int128);
+@end smallexample
+
+Returns the result of rotating the first input and ANDing it with a mask.  The
+first bit in the mask and the last bit in the mask are obtained from the two
+7-bit fields bits [117:123] and bits [125:131] respectively of the second
+input.  The shift is obtained from the third input in the 7-bit field bits
+[125:131] where all bits counted from zero at the left.
+
+@smallexample
+@exdent vector unsigned __int128 vec_sl(vector unsigned __int128 A, vector unsigned __int128 B);
+@exdent vector signed __int128 vec_sl(vector signed __int128 A, vector unsigned __int128 B);
+@end smallexample
+
+Result value: Each element of R is obtained by shifting the corresponding element of
+A left by the number of bits specified by the corresponding element of B.
+
+@smallexample
+@exdent vector unsigned __int128 vec_sr(vector unsigned __int128 A, vector unsigned __int128 B);
+@exdent vector signed __int128 vec_sr(vector signed __int128 A, vector unsigned __int128 B);
+@end smallexample
+
+Result value: Each element of R is obtained by shifting the corresponding element of
+A right by the number of bits specified by the corresponding element of B.
+
+@smallexample
+@exdent vector unsigned __int128 vec_sra(vector unsigned __int128 A, vector unsigned __int128 B);
+@exdent vector signed __int128 vec_sra(vector signed __int128 A, vector unsigned __int128 B);
+@end smallexample
+
+Result value: Each element of R is obtained by arithmetic shifting the corresponding
+element of A right by the number of bits specified by the corresponding element of B.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mule (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mule (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the even
+doubleword elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mulo (vector unsigned long long,
+                                           vector unsigned long long);
+@exdent vector signed __int128 vec_mulo (vector signed long long,
+                                         vector signed long long);
+@end smallexample
+
+Returns a vector containing a 128-bit integer result of multiplying the odd
+doubleword elements of the two inputs.
+
+@smallexample
+@exdent vector unsigned __int128 vec_div (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_div (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+Returns the result of dividing the first operand by the second operand. An
+attempt to divide any value by zero or to divide the most negative signed
+128-bit integer by negative one results in an undefined value.
+
+@smallexample
+@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_dive (vector signed __int128,
+                                         vector signed __int128);
+@end smallexample
+
+The result is produced by shifting the first input left by 128 bits and
+dividing by the second.  If an attempt is made to divide by zero or the result
+is larger than 128 bits, the result is undefined.
+
+@smallexample
+@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_mod (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
+
+The result is the modulo result of dividing the first input  by the second
+input.
+
+The following builtins perform 128-bit vector comparisons.  The
+@code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is
+one of the operations @code{eq, ne, gt, lt, ge, le} perform pairwise
+comparisons between the elements at the same positions within their two vector
+arguments.  The @code{vec_all_xx}function returns a non-zero value if and only
+if all pairwise comparisons are true.  The @code{vec_any_xx} function returns
+a non-zero value if and only if at least one pairwise comparison is true.  The
+@code{vec_cmpxx}function returns a vector of the same type as its two
+arguments, within which each element consists of all ones to denote that
+specified logical comparison of the corresponding elements was true.
+Otherwise, the element of the returned vector contains all zeros.
+
+@smallexample
+vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
+
+int vec_all_eq (vector signed __int128, vector signed __int128);
+int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ne (vector signed __int128, vector signed __int128);
+int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_all_gt (vector signed __int128, vector signed __int128);
+int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_lt (vector signed __int128, vector signed __int128);
+int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ge (vector signed __int128, vector signed __int128);
+int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_all_le (vector signed __int128, vector signed __int128);
+int vec_all_le (vector unsigned __int128, vector unsigned __int128);
+
+int vec_any_eq (vector signed __int128, vector signed __int128);
+int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ne (vector signed __int128, vector signed __int128);
+int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_any_gt (vector signed __int128, vector signed __int128);
+int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_lt (vector signed __int128, vector signed __int128);
+int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ge (vector signed __int128, vector signed __int128);
+int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_any_le (vector signed __int128, vector signed __int128);
+int vec_any_le (vector unsigned __int128, vector unsigned __int128);
+@end smallexample
+
+
 @node PowerPC Hardware Transactional Memory Built-in Functions
 @subsection PowerPC Hardware Transactional Memory Built-in Functions
 GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
new file mode 100644
index 00000000000..042758c8684
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -0,0 +1,2263 @@
+/* { dg-do run } */
+/* { dg-options "-mcpu=power10 -save-temps" } */
+/* { dg-require-effective-target power10_hw } */
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mvslq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsrq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsraq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvcmpuq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpsq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq.\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvmuleud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulld\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivsq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivuq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdivesq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveuq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmodsq\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmoduq\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+
+
+void print_i128(__int128_t val)
+{
+  printf(" %lld %llu (0x%llx %llx)",
+	 (signed long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF),
+	 (unsigned long long)(val >> 64),
+	 (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
+}
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i, result_int;
+
+  __int128_t arg1, result;
+  __uint128_t uarg2;
+
+  vector signed long long int vec_arg1_di, vec_arg2_di;
+  vector signed long long int vec_result_di, vec_expected_result_di;
+  vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
+  vector unsigned long long int vec_uresult_di;
+  vector unsigned long long int vec_uexpected_result_di;
+  
+  __int128_t expected_result;
+  __uint128_t uexpected_result;
+
+  vector __int128 vec_arg1, vec_arg2, vec_result;
+  vector unsigned __int128 vec_uarg1, vec_uarg2, vec_uarg3, vec_uresult;
+  vector bool __int128  vec_result_bool;
+  
+  /* test shift 128-bit integers.
+     Note, shift amount is given by the lower 7-bits of the shift amount. */
+  vec_arg1[0] = 3;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]*4;
+
+  vec_result = vec_sl (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 3;
+  uarg2 = 4;
+  expected_result = arg1*16;
+
+  result = arg1 << uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 << uint128):  ");
+    print_i128(arg1);
+    printf(" << %lld", uarg2 & 0xFF);
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 3;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]*4;
+  
+  vec_uresult = vec_sl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12;
+  vec_uarg2[0] = 2;
+  expected_result = vec_arg1[0]/4;
+
+  vec_result = vec_sr (vec_arg1, vec_uarg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 48;
+  vec_uarg2[0] = 2;
+  uexpected_result = vec_uarg1[0]/4;
+  
+  vec_uresult = vec_sr (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sr(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld", vec_uarg2[0] & 0xFF);
+    printf(" = ");
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  arg1 = 48;
+  uarg2 = 4;
+  expected_result = arg1/16;
+
+  result = arg1 >> uarg2;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR: int128 >> uint128:  ");
+    print_i128(arg1);
+    printf(" >> %lld", uarg2 & 0xFF);
+    printf(" = ");
+    print_i128(result);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x0000000012345678ULL;
+  expected_result = (expected_result << 64) | 0x90ABCDEFAABBCCDDULL;
+
+  vec_result = vec_sra (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0xFFFFFFFFFFFFAABBLL;
+  uexpected_result = (uexpected_result << 64) | 0xCCDDEEFF11221234ULL;
+
+  vec_uresult = vec_sra (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_sra(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  expected_result = 0x90ABCDEFAABBCCDDULL;
+  expected_result = (expected_result << 64) | 0xEEFF112212345678ULL;
+
+  vec_result = vec_rl (vec_arg1, vec_uarg2);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  uexpected_result = 0x11221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEEFFULL;
+
+  vec_uresult = vec_rl (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rl(uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" >> %lld = \n", vec_uarg2[0]);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* vec_rlnm(arg1, arg2, arg3)
+     result - rotate each element of arg1 left by shift in element of arg2.
+       Then AND with mask whose  start/stop bits are specified in element of
+       arg3.  */
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_uarg2[0] = 32;
+  vec_uarg3[0] = (32 << 8) | 95;
+  expected_result = 0xaabbccddULL;
+  expected_result = (expected_result << 64) | 0xeeff112200000000ULL;
+
+  vec_result = vec_rlnm (vec_arg1, vec_uarg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(int128, uint128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  
+
+  /* vec_rlnm(arg1, arg2, arg3)
+     result - rotate each element of arg1 left by shift in element of arg2;
+     then AND with mask whose  start/stop bits are specified in element of
+     arg3.  */
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 48;
+  vec_uarg3[0] = (8 << 8) | 119;
+
+  uexpected_result = 0x00221234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDEE00ULL;
+
+  vec_uresult = vec_rlnm (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlnm(uint128, uint128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", vec_uarg3[0] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /*  vec_rlmi(R, A, B)
+      Result value: Each element of R is obtained by rotating the corresponding
+      element of A left by the number of bits specified by the corresponding element
+      of B.  */
+
+  vec_arg1[0] = 0x1234567890ABCDEFULL;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 0xAABBCCDDEEFF1122ULL;
+  vec_arg2[0] = 0x000000000000DEADULL;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 0x0000BEEF00000000ULL;
+  vec_uarg3[0] = 96 << 16 | 127 << 8 | 32;
+  expected_result = 0x000000000000DEADULL;
+  expected_result = (expected_result << 64) | 0x0000BEEF12345678ULL;
+
+  vec_result = vec_rlmi (vec_arg1, vec_arg2, vec_uarg3);
+  
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(int128, int128, uint128):  ");
+    print_i128(vec_arg1[0]);
+    printf(" << %lld = \n", vec_uarg2_di[1] & 0xFF);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* vec_rlmi(R, A, B)
+     Result value: Each element of R is obtained by rotating the corresponding
+     element of A left by the number of bits specified by the corresponding element
+     of B.  */
+
+  vec_uarg1[0] = 0xAABBCCDDEEFF1122ULL;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 0x1234567890ABCDEFULL;
+  vec_uarg2[0] = 0xDEAD000000000000ULL;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 0x000000000000BEEFULL;
+  vec_uarg3[0] = 16 << 16 | 111 << 8 | 48;
+  uexpected_result = 0xDEAD1234567890ABULL;
+  uexpected_result = (uexpected_result << 64) | 0xCDEFAABBCCDDBEEFULL;
+
+  vec_uresult = vec_rlmi (vec_uarg1, vec_uarg2, vec_uarg3);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_rlmi(uint128, unit128, uint128):  ");
+    print_i128(vec_uarg1[0]);
+    printf(" << %lld = \n", vec_uarg3[1] & 0xFF);
+    print_i128(vec_uresult[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* 128-bit compare tests, result is all 1's if true */
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1[0] = 2468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  uexpected_result = 0xFFFFFFFFFFFFFFFFULL;
+  uexpected_result = (uexpected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: unsigned vec_cmpgt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpgt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed vec_cmpgt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:not equal signed vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  not equal vec_cmpeq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpeq (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpeq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  not equal vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: equal unsigned vec_cmpne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:not equal signed vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmpne (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed equal vec_cmpne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0x0ULL;
+
+  vec_result_bool = vec_cmplt (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmplt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+   
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmple (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmple ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 12468;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 1234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 12468;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: unsigned  arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_uarg1, vec_uarg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR:  unsigned arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = 12468;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = -1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 > arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1[0] = -1234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 12468;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  expected_result = 0x0;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed  arg1 < arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+  expected_result = 0xFFFFFFFFFFFFFFFFULL;
+  expected_result = (expected_result << 64) | 0xFFFFFFFFFFFFFFFFULL;
+
+  vec_result_bool = vec_cmpge (vec_arg1, vec_arg2);
+
+  if (vec_result_bool[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_cmpge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.");
+    print_i128(vec_result_bool[0]);
+    printf("\n Result does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+#if 1
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_all_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_all_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_all_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_eq ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_eq (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_eq ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ne ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ne (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_ne ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_lt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_lt (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_lt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_gt ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_gt (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_le ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_le (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_le ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+  vec_arg1 = vec_arg2;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 = arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1[0] = -234;
+  vec_arg1[0] = (vec_arg1[0] << 64) | 4567;
+  vec_arg2[0] = 1234;
+  vec_arg2[0] = (vec_arg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_arg1, vec_arg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: signed arg1 != arg2 vec_any_ge ( ");
+    print_i128(vec_arg1[0]);
+    printf(", ");
+    print_i128(vec_arg2[0]);
+    printf(") failed.\n\n");
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+  vec_uarg1 = vec_uarg2;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (!result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 = uarg2 vec_any_ge ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1[0] = 234;
+  vec_uarg1[0] = (vec_uarg1[0] << 64) | 4567;
+  vec_uarg2[0] = 1234;
+  vec_uarg2[0] = (vec_uarg2[0] << 64) | 4567;
+
+  result_int = vec_any_ge (vec_uarg1, vec_uarg2);
+
+  if (result_int) {
+#if DEBUG
+    printf("ERROR: unsigned uarg1 != uarg2 vec_any_gt ( ");
+    print_i128(vec_uarg1[0]);
+    printf(", ");
+    print_i128(vec_uarg2[0]);
+    printf(") failed.\n\n");
+#else
+    abort();
+#endif
+  }
+#endif
+
+  /* Vector multiply Even and Odd tests */
+  vec_arg1_di[0] = 200;
+  vec_arg1_di[1] = 400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[0] * vec_arg2_di[0];
+
+  vec_result = vec_mule (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (signed, signed) failed.\n");
+    printf(" vec_arg1_di[0] = %lld\n", vec_arg1_di[0]);
+    printf(" vec_arg2_di[0] = %lld\n", vec_arg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_arg1_di[0] = -200;
+  vec_arg1_di[1] = -400;
+  vec_arg2_di[0] = 1234;
+  vec_arg2_di[1] = 4567;
+  expected_result = vec_arg1_di[1] * vec_arg2_di[1];
+
+  vec_result = vec_mulo (vec_arg1_di, vec_arg2_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (signed, signed) failed.\n");
+    printf(" vec_arg1_di[1] = %lld\n", vec_arg1_di[1]);
+    printf(" vec_arg2_di[1] = %lld\n", vec_arg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected Result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[0] * vec_uarg2_di[0];
+
+  vec_uresult = vec_mule (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mule (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[1] = %lld\n", vec_uarg1_di[1]);
+    printf(" vec_uarg2_di[1] = %lld\n", vec_uarg2_di[1]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+  
+  vec_uarg1_di[0] = 200;
+  vec_uarg1_di[1] = 400;
+  vec_uarg2_di[0] = 1234;
+  vec_uarg2_di[1] = 4567;
+  uexpected_result = vec_uarg1_di[1] * vec_uarg2_di[1];
+
+  vec_uresult = vec_mulo (vec_uarg1_di, vec_uarg2_di);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mulo (unsigned, unsigned) failed.\n");
+    printf(" vec_uarg1_di[0] = %lld\n", vec_uarg1_di[0]);
+    printf(" vec_uarg2_di[0] = %lld\n", vec_uarg2_di[0]);
+    printf("Result = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected Result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Multiply Longword */
+  vec_arg1_di[0] = 100;
+  vec_arg1_di[1] = -123456;
+
+  vec_arg2_di[0] = 123;
+  vec_arg2_di[1] = 1000;
+
+  vec_expected_result_di[0] = 12300;
+  vec_expected_result_di[1] = -123456000;
+
+  vec_result_di = vec_arg1_di * vec_arg2_di;
+
+  for (i = 0; i<2; i++) {
+    if (vec_result_di[i] != vec_expected_result_di[i]) {
+#if DEBUG
+      printf("ERROR: vector multipy [%d] ((long long) %lld) =  ", i,
+	     vec_result_di[i]);
+      printf("\n does not match expected_result [%d] = ((long long) %lld)", i,
+	     vec_expected_result_di[i]);
+      printf("\n\n");
+#else
+      abort();
+#endif
+    }
+  }
+
+  /* Vector Divide Quadword */
+  vec_arg1[0] = -12345678;
+  vec_arg2[0] = 2;
+  expected_result = -6172839;
+
+  vec_result = vec_div (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24680;
+  vec_uarg2[0] = 4;
+  uexpected_result = 6170;
+
+  vec_uresult = vec_div (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_div (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector Divide Extended Quadword */
+  vec_arg1[0] = -20;        // has 128-bit of zero concatenated onto it
+  vec_arg2[0] = 0x2000000000000000;
+  vec_arg2[0] = vec_arg2[0] << 64;
+  expected_result = -160;
+
+  vec_result = vec_dive (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 20;        // has 128-bit of zero concatenated onto it
+  vec_uarg2[0] = 0x4000000000000000;
+  vec_uarg2[0] = vec_uarg2[0] << 64;
+  uexpected_result = 80;
+
+  vec_uresult = vec_dive (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_dive (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  /* Vector modulo quad word  */
+  vec_arg1[0] = -12345675;
+  vec_arg2[0] = 2;
+  expected_result = -1;
+
+  vec_result = vec_mod (vec_arg1, vec_arg2);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (signed, signed) failed.\n");
+    printf("vec_arg1[0] = ");
+    print_i128(vec_arg1[0]);
+    printf("\nvec_arg2[0] = ");
+    print_i128(vec_arg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_result[0]);
+    printf("\nExpected result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_uarg1[0] = 24685;
+  vec_uarg2[0] = 4;
+  uexpected_result = 1;
+
+  vec_uresult = vec_mod (vec_uarg1, vec_uarg2);
+
+  if (vec_uresult[0] != uexpected_result) {
+#if DEBUG
+    printf("ERROR: vec_mod (unsigned, unsigned) failed.\n");
+    printf("vec_uarg1[0] = ");
+    print_i128(vec_uarg1[0]);
+    printf("\nvec_uarg2[0] = ");
+    print_i128(vec_uarg2[0]);
+    printf("\nResult = ");
+    print_i128(vec_uresult[0]);
+    printf("\nExpected result = ");
+    print_i128(uexpected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  return 0;
+}
-- 
2.27.0



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/5 ver4]   RS6000: Add TI to TD (128-bit DFP) and TD to TI support
       [not found] <d660f874049b7fdb338ff33478c56b2259828ab1.camel@us.ibm.com>
                   ` (2 preceding siblings ...)
  2021-04-26 16:36 ` [PATCH 2/5 " Carl Love
@ 2021-04-26 16:36 ` Carl Love
  2021-04-27 23:46   ` will schmidt
  2021-06-04 23:55   ` Segher Boessenkool
  2021-04-26 16:36 ` [PATCH 4/5 ver4] RS6000, Add test 128-bit shifts for just the int128 type Carl Love
  2021-04-26 16:36 ` [PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values Carl Love
  5 siblings, 2 replies; 20+ messages in thread
From: Carl Love @ 2021-04-26 16:36 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Will, Segher:

This patch adds support for converting to/from 128-bit integers and
128-bit decimal floating point formats.

The patch has been tested on
    powerpc64-linux instead (Power 8 BE)
    powerpc64-linux instead (Power 9 LE)
    powerpc64-linux instead (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

                   Carl Love
------------------------------------------------

gcc/ChangeLog
dje.gcc@gmail.com, gcc-patches@gcc.gnu.org, Bill Schmidt <wschmidt@linux.vnet.ibm.com>, Peter Bergner <bergner@vnet.ibm.com>,  
2021-04-26  Carl Love  <cel@us.ibm.com>
        * config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
        * config/rs6000/rs6000-call.c (P10V_BUILTIN_VCMPNET_P,
	P10V_BUILTIN_VCMPAET_P): New overloaded definitions.

gcc/testsuite/ChangeLog

2021-04-26  Carl Love  <cel@us.ibm.com>
        * gcc.target/powerpc/int_128bit-runnable.c: Add 128-bit DFP
        conversion tests.
---
 gcc/config/rs6000/dfp.md                      | 14 +++++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 61 +++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 026be5d48a6..b89d5ecc91d 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -226,6 +226,13 @@
   [(set_attr "type" "dfp")
    (set_attr "size" "128")])
 
+(define_insn "floattitd2"
+  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
+	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
+  "TARGET_POWER10"
+  "dcffixqq %0,%1"
+  [(set_attr "type" "dfp")])
+
 ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
 ;; This is the first stage of converting it to an integer type.
 
@@ -247,6 +254,13 @@
   "dctfix<q> %0,%1"
   [(set_attr "type" "dfp")
    (set_attr "size" "<bits>")])
+
+(define_insn "fixtdti2"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
+	(fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
+  "TARGET_POWER10"
+  "dctfixqq %0,%1"
+  [(set_attr "type" "dfp")])
 \f
 ;; Decimal builtin support
 
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 042758c8684..625b3869118 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -37,6 +37,7 @@
 #if DEBUG
 #include <stdio.h>
 #include <stdlib.h>
+#include <math.h>
 
 
 void print_i128(__int128_t val)
@@ -58,6 +59,13 @@ int main ()
   __int128_t arg1, result;
   __uint128_t uarg2;
 
+  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
+
+  struct conv_t {
+    __uint128_t u128;
+    _Decimal128 d128;
+  } conv, conv2;
+
   vector signed long long int vec_arg1_di, vec_arg2_di;
   vector signed long long int vec_result_di, vec_expected_result_di;
   vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
@@ -2258,6 +2266,59 @@ int main ()
     abort();
 #endif
   }
+  
+  /* DFP to __int128 and __int128 to DFP conversions */
+  /* Print the DFP value as an unsigned int so we can see the bit patterns.  */
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
+  expected_result_dfp128 = conv.d128;
 
+  arg1 = 4;
+
+  conv.d128 = (_Decimal128) arg1;
+
+  result_dfp128 = (_Decimal128) arg1;
+  if (((conv.u128 >>64) != 0x2208000000000000ULL) &&
+      ((conv.u128 & 0xFFFFFFFFFFFFFFFF) != 0x4ULL)) {
+#if DEBUG
+    printf("ERROR:  convert int128 value ");
+    print_i128 (arg1);
+    conv.d128 = result_dfp128;
+    printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)((conv.u128) >>64),
+	   (unsigned long long)((conv.u128) & 0xFFFFFFFFFFFFFFFF));
+
+    conv.d128 = expected_result_dfp128;
+    printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
+	   (unsigned long long) (conv.u128>>64),
+	   (unsigned long long) (conv.u128 & 0xFFFFFFFFFFFFFFFF));
+#else
+    abort();
+#endif
+  }
+
+  expected_result = 4;
+
+  conv.u128 = 0x2208000000000000ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
+  arg1_dfp128 = conv.d128;
+
+  result = (__int128_t) arg1_dfp128;
+
+  if (result != expected_result) {
+#if DEBUG
+    printf("ERROR:  convert DFP value ");
+    printf("0x%llx %llx (printed as hex bit string) ",
+	   (unsigned long long)(conv.u128>>64),
+	   (unsigned long long)(conv.u128 & 0xFFFFFFFFFFFFFFFF));
+    printf("to __int128 value = ");
+    print_i128 (result);
+    printf("\ndoes not match expected_result = ");
+    print_i128 (expected_result);
+    printf("\n");
+#else
+    abort();
+#endif
+  }
   return 0;
 }
-- 
2.27.0



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/5 ver4]  RS6000, Add test 128-bit shifts for just the int128 type.
       [not found] <d660f874049b7fdb338ff33478c56b2259828ab1.camel@us.ibm.com>
                   ` (3 preceding siblings ...)
  2021-04-26 16:36 ` [PATCH 3/5 ver4] RS6000: Add TI to TD (128-bit DFP) and TD to TI support Carl Love
@ 2021-04-26 16:36 ` Carl Love
  2021-04-27 23:46   ` will schmidt
  2021-06-05  0:03   ` Segher Boessenkool
  2021-04-26 16:36 ` [PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values Carl Love
  5 siblings, 2 replies; 20+ messages in thread
From: Carl Love @ 2021-04-26 16:36 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Will, Segher:

The previous patch added the vector 128-bit integer shift instruction
support for the V1TI type.  This patch renames and moves the VSX_TI
iterator from vsx.md to VEC_TI in vector.md.  The uses of VEC_TI are
also updated.

The patch has been tested on
    powerpc64-linux instead (Power 8 BE)
    powerpc64-linux instead (Power 9 LE)
    powerpc64-linux instead (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

                   Carl Love
------------------------------------------------

gcc/ChangeLog

2021-04-26  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
	Rename to altivec_vslq_<mode>, altivec_vsrq_<mode>, mode VEC_TI.
	* config/rs6000/vector.md (VEC_TI): Was named VSX_TI in vsx.md.
	(vashlv1ti3): Change to vashl<mode>3, mode VEC_TI.
	(vlshrv1ti3): Change to vlshr<mode>3, mode VEC_TI.
	* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator. Update
	uses of VSX_TI to VEC_TI.

gcc/testsuite/ChangeLog

2021-04-26  Carl Love  <cel@us.ibm.com>
	gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
	tests.
---
 gcc/config/rs6000/altivec.md                  | 16 ++++-----
 gcc/config/rs6000/vector.md                   | 27 ++++++++-------
 gcc/config/rs6000/vsx.md                      | 33 +++++++++----------
 .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++++++--
 4 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index c4c82b33f8d..c7d2cd0aa88 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2226,10 +2226,10 @@
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vslq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vslq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+		     (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vslq %0,%1,%2"
@@ -2243,10 +2243,10 @@
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vsrq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vsrq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+			   (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vsrq %0,%1,%2"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 55bbaa9c32f..5695154e316 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,6 +26,9 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
 
+;; 128-bit int modes
+(define_mode_iterator VEC_TI [V1TI TI])
+
 ;; Vector int modes for parity
 (define_mode_iterator VEC_IP [V8HI
 			      V4SI
@@ -1627,17 +1630,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vashlv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vashl<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+			 (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
   emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn (gen_altivec_vslq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vslq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
@@ -1650,17 +1653,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vlshrv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vlshr<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+			   (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
   emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn (gen_altivec_vsrq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vsrq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index ba539549024..587011081e1 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -37,9 +37,6 @@
 				  TI
 				  V1TI])
 
-;; Iterator for 128-bit integer types that go in a single vector register.
-(define_mode_iterator VSX_TI [TI V1TI])
-
 ;; Iterator for the 2 32-bit vector types
 (define_mode_iterator VSX_W [V4SF V4SI])
 
@@ -952,9 +949,9 @@
 ;; special V1TI container class, which it is not appropriate to use vec_select
 ;; for the type.
 (define_insn "*vsx_le_permute_<mode>"
-  [(set (match_operand:VSX_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
-	(rotate:VSX_TI
-	 (match_operand:VSX_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
+  [(set (match_operand:VEC_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
+	(rotate:VEC_TI
+	 (match_operand:VEC_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
 	 (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "@
@@ -968,10 +965,10 @@
    (set_attr "type" "vecperm,vecload,vecstore,*,load,store")])
 
 (define_insn_and_split "*vsx_le_undo_permute_<mode>"
-  [(set (match_operand:VSX_TI 0 "vsx_register_operand" "=wa,wa")
-	(rotate:VSX_TI
-	 (rotate:VSX_TI
-	  (match_operand:VSX_TI 1 "vsx_register_operand" "0,wa")
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=wa,wa")
+	(rotate:VEC_TI
+	 (rotate:VEC_TI
+	  (match_operand:VEC_TI 1 "vsx_register_operand" "0,wa")
 	  (const_int 64))
 	 (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
@@ -1043,11 +1040,11 @@
 ;; Peepholes to catch loads and stores for TImode if TImode landed in
 ;; GPR registers on a little endian system.
 (define_peephole2
-  [(set (match_operand:VSX_TI 0 "int_reg_operand")
-	(rotate:VSX_TI (match_operand:VSX_TI 1 "memory_operand")
+  [(set (match_operand:VEC_TI 0 "int_reg_operand")
+	(rotate:VEC_TI (match_operand:VEC_TI 1 "memory_operand")
 		       (const_int 64)))
-   (set (match_operand:VSX_TI 2 "int_reg_operand")
-	(rotate:VSX_TI (match_dup 0)
+   (set (match_operand:VEC_TI 2 "int_reg_operand")
+	(rotate:VEC_TI (match_dup 0)
 		       (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
    && (rtx_equal_p (operands[0], operands[2])
@@ -1055,11 +1052,11 @@
    [(set (match_dup 2) (match_dup 1))])
 
 (define_peephole2
-  [(set (match_operand:VSX_TI 0 "int_reg_operand")
-	(rotate:VSX_TI (match_operand:VSX_TI 1 "int_reg_operand")
+  [(set (match_operand:VEC_TI 0 "int_reg_operand")
+	(rotate:VEC_TI (match_operand:VEC_TI 1 "int_reg_operand")
 		       (const_int 64)))
-   (set (match_operand:VSX_TI 2 "memory_operand")
-	(rotate:VSX_TI (match_dup 0)
+   (set (match_operand:VEC_TI 2 "memory_operand")
+	(rotate:VEC_TI (match_dup 0)
 		       (const_int 64)))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
    && peep2_reg_dead_p (2, operands[0])"
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 625b3869118..9f7dbc6cc75 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -52,6 +52,18 @@ void print_i128(__int128_t val)
 
 void abort (void);
 
+__attribute__((noinline))
+__int128_t shift_right (__int128_t a, __uint128_t b)
+{
+  return a >> b;
+}
+
+__attribute__((noinline))
+__int128_t shift_left (__int128_t a, __uint128_t b)
+{
+  return a << b;
+}
+
 int main ()
 {
   int i, result_int;
@@ -102,7 +114,7 @@ int main ()
 #endif
   }
 
-  arg1 = 3;
+  arg1 = vec_result[0];
   uarg2 = 4;
   expected_result = arg1*16;
 
@@ -186,7 +198,7 @@ int main ()
 #endif
   }
 
-  arg1 = 48;
+  arg1 = vec_uresult[0];
   uarg2 = 4;
   expected_result = arg1/16;
 
-- 
2.27.0



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 5/5 ver4]  RS6000: Conversions between 128-bit integer and floating point values.
       [not found] <d660f874049b7fdb338ff33478c56b2259828ab1.camel@us.ibm.com>
                   ` (4 preceding siblings ...)
  2021-04-26 16:36 ` [PATCH 4/5 ver4] RS6000, Add test 128-bit shifts for just the int128 type Carl Love
@ 2021-04-26 16:36 ` Carl Love
  2021-04-27 23:46   ` will schmidt
                     ` (2 more replies)
  5 siblings, 3 replies; 20+ messages in thread
From: Carl Love @ 2021-04-26 16:36 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Will, Segher:

This patch adds support for converting to/from 128-bit integers and
128-bit decimal floating point formats using the new P10 instructions
dcffixqq and dctfixqq.  The new instructions are only used on P10 HW,
otherwise the conversions continue to use the existing SW routines.

The files fixkfti-sw.c and fixunskfti-sw.c are renamed versions of
fixkfti.c and fixunskfti.c respectively.  The function names in the
files were updated with the rename as well as some white spaces fixes.

The patch has been tested on
    powerpc64-linux instead (Power 8 BE)
    powerpc64-linux instead (Power 9 LE)
    powerpc64-linux instead (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

                   Carl Love
------------------------------------------------

gcc/ChangeLog

2021-04-26  Carl Love  <cel@us.ibm.com>
	* config/rs6000/rs6000.md (floatti<mode>2, floatunsti<mode>2,
	fix_trunc<mode>ti2, fixuns_trunc<mode>ti2): Add
	define_insn for mode IEEE 128.

gcc/testsuite/ChangeLog

2021-04-26  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/fp128_conversions.c: New file.
	* gcc.target/powerpc/int_128bit-runnable.c(vextsd2ppc_native_128bitq,
	vcmpuq, vcmpsq, vcmpequq, vcmpequq., vcmpgtsq, vcmpgtsq.
	vcmpgtuq, vcmpgtuq.): Update scan-assembler-times.
	(ppc_native_128bit): Remove dg-require-effective-target.

libgcc/ChangeLog
2021-04-26  Carl Love  <cel@us.ibm.com>
	* config.host: Add if test and set for
	libgcc_cv_powerpc_3_1_float128_hw.
	* libgcc/config/rs6000/fixkfti.c: Renamed to fixkfti-sw.c.
	Change calls of __fixkfti to __fixkfti_sw.
	* libgcc/config/rs6000/fixunskfti.c: Renamed to fixunskfti-sw.c.
	Change calls of __fixunskfti to __fixunskfti_sw.
	* libgcc/config/rs6000/float128-p10.c (__floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw): New file.
	* libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): New macro.
	(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
	__fixunskfti_resolve): Add resolve functions.
	(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New functions.
	* libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
	__fixtfti, __fixunstfti): Add editor commands to change names.
	* libgcc/config/rs6000/float128-sed-hw (__floattitf,
	__floatuntitf, __fixtfti, __fixunstfti): Add editor commands to
	change names.
	* libgcc/config/rs6000/floattikf.c: Renamed to floattikf-sw.c.
	* libgcc/config/rs6000/floatuntikf.c: Renamed to floatuntikf-sw.c.
	* libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,
	__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
	__floatuntikf, __fixkfti, __fixunskfti): New extern declarations.
	* libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
	fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
	(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
	file names to fp128_ppc_funcs.
	* libgcc/config/rs6000/t-float128-hw(fp128_3_1_hw_funcs,
	fp128_3_1_hw_src, fp128_3_1_hw_static_obj, fp128_3_1_hw_shared_obj,
	fp128_3_1_hw_obj): Add variables for ISA 3.1 support.
	* libgcc/config/rs6000/t-float128-p10-hw: New file.
	* configure: Update script for isa 3.1 128-bit float support.
	* configure.ac: Add check for 128-bit float hardware support.
---
 gcc/config/rs6000/rs6000.c                    |   8 +-
 gcc/config/rs6000/rs6000.md                   |  36 +++
 .../gcc.target/powerpc/fp128_conversions.c    | 294 ++++++++++++++++++
 .../gcc.target/powerpc/int_128bit-runnable.c  |  13 +-
 libgcc/config.host                            |   4 +
 .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
 .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   4 +-
 libgcc/config/rs6000/float128-ifunc.c         |  44 ++-
 libgcc/config/rs6000/float128-p10.c           |  71 +++++
 libgcc/config/rs6000/float128-sed             |   4 +
 libgcc/config/rs6000/float128-sed-hw          |   4 +
 .../rs6000/{floattikf.c => floattikf-sw.c}    |   4 +-
 .../{floatuntikf.c => floatuntikf-sw.c}       |   4 +-
 libgcc/config/rs6000/quad-float128.h          |  17 +-
 libgcc/config/rs6000/t-float128               |  12 +-
 libgcc/config/rs6000/t-float128-hw            |  16 +
 libgcc/config/rs6000/t-float128-p10-hw        |  24 ++
 libgcc/configure                              |  37 +++
 libgcc/configure.ac                           |  25 ++
 19 files changed, 589 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
 rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
 rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (96%)
 create mode 100644 libgcc/config/rs6000/float128-p10.c
 rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
 rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%)
 create mode 100644 libgcc/config/rs6000/t-float128-p10-hw

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 531a9c87243..c7a28044f68 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -11022,10 +11022,10 @@ init_float128_ieee (machine_mode mode)
 
       if (TARGET_POWERPC64)
 	{
-	  set_conv_libfunc (sfix_optab, TImode, mode, "__fixkfti");
-	  set_conv_libfunc (ufix_optab, TImode, mode, "__fixunskfti");
-	  set_conv_libfunc (sfloat_optab, mode, TImode, "__floattikf");
-	  set_conv_libfunc (ufloat_optab, mode, TImode, "__floatuntikf");
+	  set_conv_libfunc (sfix_optab, TImode, mode, "__fixkfti_sw");
+	  set_conv_libfunc (ufix_optab, TImode, mode, "__fixunskfti_sw");
+	  set_conv_libfunc (sfloat_optab, mode, TImode, "__floattikf_sw");
+	  set_conv_libfunc (ufloat_optab, mode, TImode, "__floatuntikf_sw");
 	}
     }
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index c8cdc42533c..674c4432661 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6421,6 +6421,42 @@
    xscvsxddp %x0,%x1"
   [(set_attr "type" "fp")])
 
+(define_insn "floatti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvsqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "floatunsti<mode>2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+       (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvuqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fix_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpsqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fixuns_trunc<mode>ti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+       (unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpuqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
 ; Allow the combiner to merge source memory operands to the conversion so that
 ; the optimizer/register allocator doesn't try to load the value too early in a
 ; GPR and then use store/load to move it to a FPR and suffer from a store-load
diff --git a/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
new file mode 100644
index 00000000000..c20282fa0e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
@@ -0,0 +1,294 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+
+/* Check that the expected 128-bit instructions are generated if the processor
+   supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mxscvsqqp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxscvuqqp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxscvqpsqz\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxscvqpuqz\M} 1 } } */
+
+#include <stdio.h>
+#include <math.h>
+#include <fenv.h>
+#include <stdlib.h>
+#include <wchar.h>
+
+#define DEBUG 0
+
+void
+abort (void);
+
+float
+conv_i_2_fp( long long int a)
+{
+  return (float) a;
+}
+
+double
+conv_i_2_fpd( long long int a)
+{
+  return (double) a;
+}
+
+double
+conv_ui_2_fpd( unsigned long long int a)
+{
+  return (double) a;
+}
+
+__float128
+conv_i128_2_fp128 (__int128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__float128
+conv_ui128_2_fp128 (__uint128_t a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__float128) a;
+}
+
+__int128_t
+conv_fp128_2_i128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__int128_t) a;
+}
+
+__uint128_t
+conv_fp128_2_ui128 (__float128 a)
+{
+  // default, gen inst KF mode
+  // -mabi=ibmlongdouble, gen inst floattiieee KF mode
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (__uint128_t) a;
+}
+
+long double
+conv_i128_2_ld (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble gen inst floattiieee TF mode
+  return (long double) a;
+}
+
+__ibm128
+conv_i128_2_ibm128 (__int128_t a)
+{
+  // default, gen call __floattitf
+  // -mabi=ibmlongdouble, gen call __floattitf
+  // -mabi=ieeelongdouble, message uses IBM long double, no binary output
+  return (__ibm128) a;
+}
+
+int
+main()
+{
+	float a, expected_result_float;
+	double b, expected_result_double;
+	long long int c, expected_result_llint;
+	unsigned long long int u;
+	__int128_t d;
+	__uint128_t u128;
+	unsigned long long expected_result_uint128[2] ;
+	__float128 e;
+	long double ld;     // another 128-bit float version
+
+	union conv_t {
+		float a;
+		double b;
+		long long int c;
+		long long int128[2] ;
+		unsigned long long uint128[2] ;
+		unsigned long long int u;
+		__int128_t d;
+		__uint128_t u128;
+		__float128 e;
+		long double ld;     // another 128-bit float version
+	} conv, conv_result;
+
+	c = 20;
+	expected_result_llint = 20.00000;
+	a = conv_i_2_fp (c);
+
+	if (a != expected_result_llint) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fp(%lld) = %10.5f\n", c, a);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_llint);
+#else
+		abort();
+#endif
+	}
+
+	c = 20;
+	expected_result_double = 20.00000;
+	b = conv_i_2_fpd (c);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_i_2_fpd(%lld) = %10.5f\n", d, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+	u = 20;
+	expected_result_double = 20.00000;
+	b = conv_ui_2_fpd (u);
+
+	if (b != expected_result_double) {
+#if DEBUG
+		printf("ERROR: conv_ui_2_fpd(%llu) = %10.5f\n", u, b);
+		printf("\n does not match expected_result = %10.5f\n\n",
+				 expected_result_double);
+ #else
+		abort();
+#endif
+	}
+
+  d = -3210;
+  d = (d * 10000000000) + 9876543210;
+  conv_result.e = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0xc02bd2f9068d1160;
+  expected_result_uint128[0] = 0x0;
+  
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(-32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  d = 123;
+  d = (d * 10000000000) + 1234567890;
+  conv_result.ld = conv_i128_2_fp128 (d);
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x4271eab4c8ed2000;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_i128_2_fp128(1231234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  u128 = 8760;
+  u128 = (u128 * 10000000000) + 1234567890;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+  expected_result_uint128[1] = 0x402d3eb101df8b48;
+  expected_result_uint128[0] = 0x0;
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(87601234567890) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  u128 = 3210;
+  u128 = (u128 * 10000000000) + 9876543210;
+  expected_result_uint128[1] = 0x402bd3429c8feea0;
+  expected_result_uint128[0] = 0x0;
+  conv_result.e = conv_ui128_2_fp128 (u128);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_ui128_2_fp128(32109876543210) = (result in hex) 0x%llx %llx\n",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 12345.6789;
+  expected_result_uint128[1] = 0x1407374883526960;
+  expected_result_uint128[0] = 0x3039;
+
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) =  ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = -6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0xffffffffffffe57b;
+  conv_result.d = conv_fp128_2_i128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_i128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  conv.e = 6789.12345;
+  expected_result_uint128[1] = 0x0;
+  expected_result_uint128[0] = 0x1a85;
+  conv_result.d = conv_fp128_2_ui128 (conv.e);
+ 
+  if ((conv_result.uint128[1] != expected_result_uint128[1])
+		&& (conv_result.uint128[0] != expected_result_uint128[0])) {
+#if DEBUG
+	  printf("ERROR: conv_fp128_2_ui128(0x%llx %llx) = ",
+				conv.uint128[1], conv.uint128[0]);
+	  printf("0x%llx %llx\n", conv_result.uint128[1], conv_result.uint128[0]);
+	  
+	  printf("\n does not match expected_result = (result in hex) 0x%llx %llx\n\n",
+				expected_result_uint128[1], expected_result_uint128[0]);
+ #else
+	  abort();
+#endif
+	}
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 9f7dbc6cc75..94dbd2b6230 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -4,21 +4,16 @@
 
 /* Check that the expected 128-bit instructions are generated if the processor
    supports the 128-bit integer instructions. */
+/* { dg-final { scan-assembler-times {\mvextsd2q\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mvslq\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvsrq\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvsraq\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mvcmpuq\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mvcmpsq\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mvcmpequq\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mvcmpequq.\M} 16 } } */
-/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mvcmpgtsq.\M} 16 } } */
-/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mvcmpgtuq.\M} 16 } } */
-/* { dg-final { scan-assembler-times {\mvmuleud\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 16 } } */
 /* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
diff --git a/libgcc/config.host b/libgcc/config.host
index f808b61be70..50f00062232 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1224,6 +1224,10 @@ powerpc*-*-linux*)
 		tmake_file="${tmake_file} rs6000/t-float128-hw"
 	fi
 
+	if test $libgcc_cv_powerpc_3_1_float128_hw = yes; then
+		tmake_file="${tmake_file} rs6000/t-float128-p10-hw"
+	fi
+
 	extra_parts="$extra_parts ecrti.o ecrtn.o ncrti.o ncrtn.o"
 	md_unwind_header=rs6000/linux-unwind.h
 	;;
diff --git a/libgcc/config/rs6000/fixkfti.c b/libgcc/config/rs6000/fixkfti-sw.c
similarity index 96%
rename from libgcc/config/rs6000/fixkfti.c
rename to libgcc/config/rs6000/fixkfti-sw.c
index 0d965bc6253..cc000fca0f8 100644
--- a/libgcc/config/rs6000/fixkfti.c
+++ b/libgcc/config/rs6000/fixkfti-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TItype
-__fixkfti (TFtype a)
+__fixkfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/fixunskfti.c b/libgcc/config/rs6000/fixunskfti-sw.c
similarity index 96%
rename from libgcc/config/rs6000/fixunskfti.c
rename to libgcc/config/rs6000/fixunskfti-sw.c
index f285b4e3fbd..7a04d1a489a 100644
--- a/libgcc/config/rs6000/fixunskfti.c
+++ b/libgcc/config/rs6000/fixunskfti-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 UTItype
-__fixunskfti (TFtype a)
+__fixunskfti_sw (TFtype a)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/float128-ifunc.c b/libgcc/config/rs6000/float128-ifunc.c
index 85380471c5e..57545dd7edb 100644
--- a/libgcc/config/rs6000/float128-ifunc.c
+++ b/libgcc/config/rs6000/float128-ifunc.c
@@ -46,14 +46,9 @@
 #endif
 
 #define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW : SW)
+#define SW_OR_HW_ISA3_1(SW, HW) (__builtin_cpu_supports ("arch_3_1") ? HW : SW)
 
 /* Resolvers.  */
-
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 static __typeof__ (__addkf3_sw) *
 __addkf3_resolve (void)
 {
@@ -102,6 +97,18 @@ __floatdikf_resolve (void)
   return SW_OR_HW (__floatdikf_sw, __floatdikf_hw);
 }
 
+static __typeof__ (__floattikf_sw) *
+__floattikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floattikf_sw, __floattikf_hw);
+}
+
+static __typeof__ (__floatuntikf_sw) *
+__floatuntikf_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__floatuntikf_sw, __floatuntikf_hw);
+}
+
 static __typeof__ (__floatunsikf_sw) *
 __floatunsikf_resolve (void)
 {
@@ -114,6 +121,19 @@ __floatundikf_resolve (void)
   return SW_OR_HW (__floatundikf_sw, __floatundikf_hw);
 }
 
+
+static __typeof__ (__fixkfti_sw) *
+__fixkfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixkfti_sw, __fixkfti_hw);
+}
+
+static __typeof__ (__fixunskfti_sw) *
+__fixunskfti_resolve (void)
+{
+  return SW_OR_HW_ISA3_1 (__fixunskfti_sw, __fixunskfti_hw);
+}
+
 static __typeof__ (__fixkfsi_sw) *
 __fixkfsi_resolve (void)
 {
@@ -303,6 +323,18 @@ TFtype __floatsikf (SItype_ppc)
 TFtype __floatdikf (DItype_ppc)
   __attribute__ ((__ifunc__ ("__floatdikf_resolve")));
 
+TFtype __floattikf (TItype_ppc)
+  __attribute__ ((__ifunc__ ("__floattikf_resolve")));
+
+TFtype __floatuntikf (UTItype_ppc)
+  __attribute__ ((__ifunc__ ("__floatuntikf_resolve")));
+
+TItype_ppc __fixkfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixkfti_resolve")));
+
+UTItype_ppc __fixunskfti (TFtype)
+  __attribute__ ((__ifunc__ ("__fixunskfti_resolve")));
+
 TFtype __floatunsikf (USItype_ppc)
   __attribute__ ((__ifunc__ ("__floatunsikf_resolve")));
 
diff --git a/libgcc/config/rs6000/float128-p10.c b/libgcc/config/rs6000/float128-p10.c
new file mode 100644
index 00000000000..7f5d317631a
--- /dev/null
+++ b/libgcc/config/rs6000/float128-p10.c
@@ -0,0 +1,71 @@
+/* Automatic switching between software and hardware IEEE 128-bit
+   ISA 3.1 floating-point emulation for PowerPC.
+
+   Copyright (C) 2016-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Carl Love (cel@us.ibm.com)
+   Code is based on the main soft-fp library written by:
+	Richard Henderson (rth@cygnus.com) and
+	Jakub Jelinek (jj@ultra.linux.cz).
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Note, the hardware conversion instructions for 128-bit integers are
+   supported for ISA 3.1 and later.  Only compile this file with -mcpu=power10
+   or newer support.  */
+
+#include <soft-fp.h>
+#include <quad-float128.h>
+
+#ifndef __FLOAT128_HARDWARE__
+#error "This module must be compiled with IEEE 128-bit hardware support"
+#endif
+
+#ifndef _ARCH_PWR10
+#error "This module must be compiled for Power 10 support"
+#endif
+
+TFtype
+__floattikf_hw (TItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TFtype
+__floatuntikf_hw (UTItype_ppc a)
+{
+  return (TFtype) a;
+}
+
+TItype_ppc
+__fixkfti_hw (TFtype a)
+{
+  return (TItype_ppc) a;
+}
+
+UTItype_ppc
+__fixunskfti_hw (TFtype a)
+{
+  return (UTItype_ppc) a;
+}
diff --git a/libgcc/config/rs6000/float128-sed b/libgcc/config/rs6000/float128-sed
index d9a089ff9ba..c0fcddb1959 100644
--- a/libgcc/config/rs6000/float128-sed
+++ b/libgcc/config/rs6000/float128-sed
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi/g
 s/__fixunstfdi/__fixunskfdi/g
 s/__fixunstfsi/__fixunskfsi/g
 s/__floatditf/__floatdikf/g
+s/__floattitf/__floattikf/g
+s/__floatuntitf/__floatuntikf/g
+s/__fixtfti/__fixkfti/g
+s/__fixunstfti/__fixunskfti/g
 s/__floatsitf/__floatsikf/g
 s/__floatunditf/__floatundikf/g
 s/__floatunsitf/__floatunsikf/g
diff --git a/libgcc/config/rs6000/float128-sed-hw b/libgcc/config/rs6000/float128-sed-hw
index acf36b0c17d..3d2bf556da1 100644
--- a/libgcc/config/rs6000/float128-sed-hw
+++ b/libgcc/config/rs6000/float128-sed-hw
@@ -8,6 +8,10 @@ s/__fixtfsi/__fixkfsi_sw/g
 s/__fixunstfdi/__fixunskfdi_sw/g
 s/__fixunstfsi/__fixunskfsi_sw/g
 s/__floatditf/__floatdikf_sw/g
+s/__floattitf/__floattikf_sw/g
+s/__floatuntitf/__floatuntikf_sw/g
+s/__fixtfti/__fixkfti_sw/g
+s/__fixunstfti/__fixunskfti_sw/g
 s/__floatsitf/__floatsikf_sw/g
 s/__floatunditf/__floatundikf_sw/g
 s/__floatunsitf/__floatunsikf_sw/g
diff --git a/libgcc/config/rs6000/floattikf.c b/libgcc/config/rs6000/floattikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floattikf.c
rename to libgcc/config/rs6000/floattikf-sw.c
index cc5c7ca0fd0..4e1786cd229 100644
--- a/libgcc/config/rs6000/floattikf.c
+++ b/libgcc/config/rs6000/floattikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floattikf (TItype i)
+__floattikf_sw (TItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/floatuntikf.c b/libgcc/config/rs6000/floatuntikf-sw.c
similarity index 96%
rename from libgcc/config/rs6000/floatuntikf.c
rename to libgcc/config/rs6000/floatuntikf-sw.c
index 96f2d3bdcb8..c4b814ddd68 100644
--- a/libgcc/config/rs6000/floatuntikf.c
+++ b/libgcc/config/rs6000/floatuntikf-sw.c
@@ -5,7 +5,7 @@
    This file is part of the GNU C Library.
    Contributed by Steven Munroe (munroesj@linux.vnet.ibm.com)
    Code is based on the main soft-fp library written by:
-   	   Uros Bizjak (ubizjak@gmail.com).
+	   Uros Bizjak (ubizjak@gmail.com).
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -35,7 +35,7 @@
 #include "quad-float128.h"
 
 TFtype
-__floatuntikf (UTItype i)
+__floatuntikf_sw (UTItype i)
 {
   FP_DECL_EX;
   FP_DECL_Q (A);
diff --git a/libgcc/config/rs6000/quad-float128.h b/libgcc/config/rs6000/quad-float128.h
index 5beb1531d2b..c4d775b4ad3 100644
--- a/libgcc/config/rs6000/quad-float128.h
+++ b/libgcc/config/rs6000/quad-float128.h
@@ -88,19 +88,18 @@ extern USItype_ppc __fixunskfsi_sw (TFtype);
 extern UDItype_ppc __fixunskfdi_sw (TFtype);
 extern TFtype __floatsikf_sw (SItype_ppc);
 extern TFtype __floatdikf_sw (DItype_ppc);
+extern TFtype __floattikf_sw (TItype_ppc);
 extern TFtype __floatunsikf_sw (USItype_ppc);
 extern TFtype __floatundikf_sw (UDItype_ppc);
+extern TFtype __floatuntikf_sw (UTItype_ppc);
+extern TItype_ppc __fixkfti_sw (TFtype);
+extern UTItype_ppc __fixunskfti_sw (TFtype);
 extern IBM128_TYPE __extendkftf2_sw (TFtype);
 extern TFtype __trunctfkf2_sw (IBM128_TYPE);
 extern TCtype __mulkc3_sw (TFtype, TFtype, TFtype, TFtype);
 extern TCtype __divkc3_sw (TFtype, TFtype, TFtype, TFtype);
 
 #ifdef _ARCH_PPC64
-/* We do not provide ifunc resolvers for __fixkfti, __fixunskfti, __floattikf,
-   and __floatuntikf.  There is no ISA 3.0 instruction that converts between
-   128-bit integer types and 128-bit IEEE floating point, or vice versa.  So
-   use the emulator functions for these conversions.  */
-
 extern TItype_ppc __fixkfti (TFtype);
 extern UTItype_ppc __fixunskfti (TFtype);
 extern TFtype __floattikf (TItype_ppc);
@@ -131,8 +130,12 @@ extern USItype_ppc __fixunskfsi_hw (TFtype);
 extern UDItype_ppc __fixunskfdi_hw (TFtype);
 extern TFtype __floatsikf_hw (SItype_ppc);
 extern TFtype __floatdikf_hw (DItype_ppc);
+extern TFtype __floattikf_hw (TItype_ppc);
 extern TFtype __floatunsikf_hw (USItype_ppc);
 extern TFtype __floatundikf_hw (UDItype_ppc);
+extern TFtype __floatuntikf_hw (UTItype_ppc);
+extern TItype_ppc __fixkfti_hw (TFtype);
+extern UTItype_ppc __fixunskfti_hw (TFtype);
 extern IBM128_TYPE __extendkftf2_hw (TFtype);
 extern TFtype __trunctfkf2_hw (IBM128_TYPE);
 extern TCtype __mulkc3_hw (TFtype, TFtype, TFtype, TFtype);
@@ -163,8 +166,12 @@ extern USItype_ppc __fixunskfsi (TFtype);
 extern UDItype_ppc __fixunskfdi (TFtype);
 extern TFtype __floatsikf (SItype_ppc);
 extern TFtype __floatdikf (DItype_ppc);
+extern TFtype __floattikf (TItype_ppc);
 extern TFtype __floatunsikf (USItype_ppc);
 extern TFtype __floatundikf (UDItype_ppc);
+extern TFtype __floatuntikf (UTItype_ppc);
+extern TItype_ppc __fixkfti (TFtype);
+extern UTItype_ppc __fixunskfti (TFtype);
 extern IBM128_TYPE __extendkftf2 (TFtype);
 extern TFtype __trunctfkf2 (IBM128_TYPE);
 
diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
index d745f0d82e1..b09b5664af0 100644
--- a/libgcc/config/rs6000/t-float128
+++ b/libgcc/config/rs6000/t-float128
@@ -31,7 +31,8 @@ ibm128_dec_funcs	= _tf_to_sd _tf_to_dd _tf_to_td \
 			  _sd_to_tf _dd_to_tf _td_to_tf
 
 # New functions for software emulation
-fp128_ppc_funcs		= floattikf floatuntikf fixkfti fixunskfti \
+fp128_ppc_funcs		= floattikf-sw floatuntikf-sw \
+			  fixkfti-sw fixunskfti-sw \
 			  extendkftf2-sw trunctfkf2-sw \
 			  sfp-exceptions _mulkc3 _divkc3 _powikf2
 
@@ -47,13 +48,16 @@ fp128_ppc_obj		= $(fp128_ppc_static_obj) $(fp128_ppc_shared_obj)
 
 # All functions
 fp128_funcs		= $(fp128_softfp_funcs) $(fp128_ppc_funcs) \
-			  $(fp128_hw_funcs) $(fp128_ifunc_funcs)
+			  $(fp128_hw_funcs) $(fp128_ifunc_funcs) \
+			  $(fp128_3_1_hw_funcs)
 
 fp128_src		= $(fp128_softfp_src) $(fp128_ppc_src) \
-			  $(fp128_hw_src) $(fp128_ifunc_src)
+			  $(fp128_hw_src) $(fp128_ifunc_src) \
+			  $(fp128_3_1_hw_src)
 
 fp128_obj		= $(fp128_softfp_obj) $(fp128_ppc_obj) \
-			  $(fp128_hw_obj) $(fp128_ifunc_obj)
+			  $(fp128_hw_obj) $(fp128_ifunc_obj) \
+			  $(fp128_3_1_hw_obj)
 
 fp128_sed		= $(srcdir)/config/rs6000/float128-sed$(fp128_sed_hw)
 fp128_dep		= $(fp128_sed) $(srcdir)/config/rs6000/t-float128
diff --git a/libgcc/config/rs6000/t-float128-hw b/libgcc/config/rs6000/t-float128-hw
index d64ca4dd694..c0827366cc4 100644
--- a/libgcc/config/rs6000/t-float128-hw
+++ b/libgcc/config/rs6000/t-float128-hw
@@ -13,6 +13,13 @@ fp128_hw_static_obj	= $(addsuffix $(objext),$(fp128_hw_funcs))
 fp128_hw_shared_obj	= $(addsuffix _s$(objext),$(fp128_hw_funcs))
 fp128_hw_obj		= $(fp128_hw_static_obj) $(fp128_hw_shared_obj)
 
+# New functions for ISA 3.1 hardware support
+fp128_3_1_hw_funcs	= float128-p10
+fp128_3_1_hw_src	= $(srcdir)/config/rs6000/float128-p10.c
+fp128_3_1_hw_static_obj	= $(addsuffix $(objext),$(fp128_3_1_hw_funcs))
+fp128_3_1_hw_shared_obj	= $(addsuffix _s$(objext),$(fp128_3_1_hw_funcs))
+fp128_3_1_hw_obj	= $(fp128_3_1_hw_static_obj) $(fp128_3_1_hw_shared_obj)
+
 fp128_ifunc_funcs	= float128-ifunc
 fp128_ifunc_src		= $(srcdir)/config/rs6000/float128-ifunc.c
 fp128_ifunc_static_obj	= float128-ifunc$(objext)
@@ -30,9 +37,18 @@ FP128_CFLAGS_HW		 = -Wno-type-limits -mvsx -mfloat128 \
 			   -I$(srcdir)/config/rs6000 \
 			   $(FLOAT128_HW_INSNS)
 
+FP128_3_1_CFLAGS_HW	 = -Wno-type-limits -mvsx -mcpu=power10 \
+			   -mfloat128-hardware -mno-gnu-attribute \
+			   -I$(srcdir)/soft-fp \
+			   -I$(srcdir)/config/rs6000 \
+			   $(FLOAT128_HW_INSNS)
+
 $(fp128_hw_obj)		 : INTERNAL_CFLAGS += $(FP128_CFLAGS_HW)
 $(fp128_hw_obj)		 : $(srcdir)/config/rs6000/t-float128-hw
 
+$(fp128_3_1_hw_obj)	 : INTERNAL_CFLAGS += $(FP128_3_1_CFLAGS_HW)
+$(fp128_3_1_hw_obj)	 : $(srcdir)/config/rs6000/t-float128-p10-hw
+
 $(fp128_ifunc_obj)	 : INTERNAL_CFLAGS += $(FP128_CFLAGS_SW)
 $(fp128_ifunc_obj)	 : $(srcdir)/config/rs6000/t-float128-hw
 
diff --git a/libgcc/config/rs6000/t-float128-p10-hw b/libgcc/config/rs6000/t-float128-p10-hw
new file mode 100644
index 00000000000..de36227c3d1
--- /dev/null
+++ b/libgcc/config/rs6000/t-float128-p10-hw
@@ -0,0 +1,24 @@
+# Support for adding __float128 hardware support to the powerpc.
+# Tell the float128 functions that the ISA 3.1 hardware support can
+# be compiled it to be selected via IFUNC functions.
+
+FLOAT128_HW_INSNS	= -DFLOAT128_HW_INSNS
+
+# New functions for hardware support
+
+fp128_3_1_hw_funcs	= float128-p10
+fp128_3_1_hw_src	= $(srcdir)/config/rs6000/float128-p10.c
+fp128_3_1_hw_static_obj	= $(addsuffix $(objext),$(fp128_3_1_hw_funcs))
+fp128_3_1_hw_shared_obj	= $(addsuffix _s$(objext),$(fp128_3_1_hw_funcs))
+fp128_3_1_hw_obj	= $(fp128_3_1_hw_static_obj) $(fp128_3_1_hw_shared_obj)
+
+# Build the hardware support functions with appropriate hardware support
+FP128_3_1_CFLAGS_HW	 = -Wno-type-limits -mvsx -mfloat128 \
+			   -mpower10 \
+			   -mfloat128-hardware -mno-gnu-attribute \
+			   -I$(srcdir)/soft-fp \
+			   -I$(srcdir)/config/rs6000 \
+			   $(FLOAT128_HW_INSNS)
+
+$(fp128_3_1_hw_obj)		 : INTERNAL_CFLAGS += $(FP128_3_1_CFLAGS_HW)
+$(fp128_3_1_hw_obj)		 : $(srcdir)/config/rs6000/t-float128-p10-hw
diff --git a/libgcc/configure b/libgcc/configure
index dd3afb2c957..ce05e0dd48b 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -5263,6 +5263,43 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_powerpc_float128_hw" >&5
 $as_echo "$libgcc_cv_powerpc_float128_hw" >&6; }
   CFLAGS="$saved_CFLAGS"
+
+  saved_CFLAGS="$CFLAGS"
+  CFLAGS="$CFLAGS -mpower10 -mfloat128-hardware"
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PowerPC ISA 3.1 to build hardware __float128 libraries" >&5
+$as_echo_n "checking for PowerPC ISA 3.1 to build hardware __float128 libraries... " >&6; }
+if ${libgcc_cv_powerpc_float128_hw+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <sys/auxv.h>
+     #ifndef AT_PLATFORM
+     #error "AT_PLATFORM is not defined"
+     #endif
+     #ifndef __BUILTIN_CPU_SUPPORTS__
+     #error "__builtin_cpu_supports is not available"
+     #endif
+     vector unsigned char add (vector unsigned char a, vector unsigned char b)
+     {
+       vector unsigned char ret;
+       __asm__ ("xscvsqqp %0,%1,%2" : "=v" (ret) : "v" (a), "v" (b));
+       return ret;
+     }
+     void *add_resolver (void) { return (void *) add; }
+     __float128 add_ifunc (__float128, __float128)
+	__attribute__ ((__ifunc__ ("add_resolver")));
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  libgcc_cv_powerpc_3_1_float128_hw=yes
+else
+  libgcc_cv_powerpc_3_1_float128_hw=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_powerpc_float128_hw" >&5
+  $as_echo "$libgcc_cv_powerpc_float128_hw" >&6; }
+  CFLAGS="$saved_CFLAGS"
 esac
 
 # Collect host-machine-specific information.
diff --git a/libgcc/configure.ac b/libgcc/configure.ac
index 10ffb046415..bc315dec7e4 100644
--- a/libgcc/configure.ac
+++ b/libgcc/configure.ac
@@ -458,6 +458,31 @@ powerpc*-*-linux*)
     [libgcc_cv_powerpc_float128_hw=yes],
     [libgcc_cv_powerpc_float128_hw=no])])
   CFLAGS="$saved_CFLAGS"
+
+  saved_CFLAGS="$CFLAGS"
+  CFLAGS="$CFLAGS -mpower10 -mfloat128-hardware"
+  AC_CACHE_CHECK([for PowerPC ISA 3.1 to build hardware __float128 libraries],
+		 [libgcc_cv_powerpc_float128_hw],
+		 [AC_COMPILE_IFELSE(
+    [AC_LANG_SOURCE([#include <sys/auxv.h>
+     #ifndef AT_PLATFORM
+     #error "AT_PLATFORM is not defined"
+     #endif
+     #ifndef __BUILTIN_CPU_SUPPORTS__
+     #error "__builtin_cpu_supports is not available"
+     #endif
+     vector unsigned char add (vector unsigned char a, vector unsigned char b)
+     {
+       vector unsigned char ret;
+       __asm__ ("xscvsqqp %0,%1,%2" : "=v" (ret) : "v" (a), "v" (b));
+       return ret;
+     }
+     void *add_resolver (void) { return (void *) add; }
+     __float128 add_ifunc (__float128, __float128)
+	__attribute__ ((__ifunc__ ("add_resolver")));])],
+    [libgcc_cv_powerpc_3_1_float128_hw=yes],
+    [libgcc_cv_powerpc_3_1_float128_hw=no])])
+  CFLAGS="$saved_CFLAGS"
 esac
 
 # Collect host-machine-specific information.
-- 
2.27.0



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5 ver4]  RS6000: Conversions between 128-bit integer and floating point values.
  2021-04-26 16:36 ` [PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values Carl Love
@ 2021-04-27 23:46   ` will schmidt
  2021-04-28 17:39   ` [PATCH ] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
  2021-06-07 18:49   ` [PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values Segher Boessenkool
  2 siblings, 0 replies; 20+ messages in thread
From: will schmidt @ 2021-04-27 23:46 UTC (permalink / raw)
  To: Carl Love, Segher Boessenkool
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

On Mon, 2021-04-26 at 09:36 -0700, Carl Love wrote:
> Will, Segher:
> 
> This patch adds support for converting to/from 128-bit integers and
> 128-bit decimal floating point formats using the new P10 instructions
> dcffixqq and dctfixqq.  The new instructions are only used on P10 HW,
> otherwise the conversions continue to use the existing SW routines.
> 
> The files fixkfti-sw.c and fixunskfti-sw.c are renamed versions of
> fixkfti.c and fixunskfti.c respectively.  The function names in the
> files were updated with the rename as well as some white spaces fixes.
> 
> The patch has been tested on
>     powerpc64-linux instead (Power 8 BE)
>     powerpc64-linux instead (Power 9 LE)
>     powerpc64-linux instead (Power 10 LE)
> 
> Please let me know if the patch is acceptable for mainline.
> 
>                    Carl Love
> ------------------------------------------------
> 
> gcc/ChangeLog
> 
> 2021-04-26  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/rs6000.md (floatti<mode>2, floatunsti<mode>2,
> 	fix_trunc<mode>ti2, fixuns_trunc<mode>ti2): Add
> 	define_insn for mode IEEE 128.
> 
> gcc/testsuite/ChangeLog
> 
> 2021-04-26  Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/fp128_conversions.c: New file.
> 	* gcc.target/powerpc/int_128bit-runnable.c(vextsd2ppc_native_128bitq,

^ vextsd2ppc_native_128bitq ?   

> 	vcmpuq, vcmpsq, vcmpequq, vcmpequq., vcmpgtsq, vcmpgtsq.
> 	vcmpgtuq, vcmpgtuq.): Update scan-assembler-times.

> 	(ppc_native_128bit): Remove dg-require-effective-target.
> 
> libgcc/ChangeLog
> 2021-04-26  Carl Love  <cel@us.ibm.com>
> 	* config.host: Add if test and set for
> 	libgcc_cv_powerpc_3_1_float128_hw.
> 	* libgcc/config/rs6000/fixkfti.c: Renamed to fixkfti-sw.c.
> 	Change calls of __fixkfti to __fixkfti_sw.
> 	* libgcc/config/rs6000/fixunskfti.c: Renamed to fixunskfti-sw.c.
> 	Change calls of __fixunskfti to __fixunskfti_sw.
> 	* libgcc/config/rs6000/float128-p10.c (__floattikf_hw,
> 	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw): New file.
> 	* libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): New macro.
> 	(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
> 	__fixunskfti_resolve): Add resolve functions.
> 	(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New functions.
> 	* libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
> 	__fixtfti, __fixunstfti): Add editor commands to change names.
> 	* libgcc/config/rs6000/float128-sed-hw (__floattitf,
> 	__floatuntitf, __fixtfti, __fixunstfti): Add editor commands to
> 	change names.
> 	* libgcc/config/rs6000/floattikf.c: Renamed to floattikf-sw.c.
> 	* libgcc/config/rs6000/floatuntikf.c: Renamed to floatuntikf-sw.c.
> 	* libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,
> 	__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
> 	__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
> 	__floatuntikf, __fixkfti, __fixunskfti): New extern declarations.
> 	* libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
> 	fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
> 	(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
> 	file names to fp128_ppc_funcs.
> 	* libgcc/config/rs6000/t-float128-hw(fp128_3_1_hw_funcs,
> 	fp128_3_1_hw_src, fp128_3_1_hw_static_obj, fp128_3_1_hw_shared_obj,
> 	fp128_3_1_hw_obj): Add variables for ISA 3.1 support.
> 	* libgcc/config/rs6000/t-float128-p10-hw: New file.
> 	* configure: Update script for isa 3.1 128-bit float support.
> 	* configure.ac: Add check for 128-bit float hardware support.


Ok.

Only skimmed, nothing else jumped out at me from below.

thanks
-Will




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5 ver4] RS6000: Add 128-bit Integer Operations
  2021-04-26 16:36 ` [PATCH 2/5 " Carl Love
@ 2021-04-27 23:46   ` will schmidt
  2021-06-04 20:20     ` Segher Boessenkool
  2021-06-04 23:14   ` Segher Boessenkool
  1 sibling, 1 reply; 20+ messages in thread
From: will schmidt @ 2021-04-27 23:46 UTC (permalink / raw)
  To: Carl Love, Segher Boessenkool
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

On Mon, 2021-04-26 at 09:36 -0700, Carl Love wrote:
> Will, Segher:
> 
> This patch adds the 128-bit integer support for divide, modulo, shift,
> compare of 128-bit integers instructions and builtin support.

Hi,

> 
> The patch has been tested on
>     powerpc64-linux instead (Power 8 BE)
>     powerpc64-linux instead (Power 9 LE)
>     powerpc64-linux instead (Power 10 LE)
> 
> Please let me know if the patch is acceptable for mainline.
> 
>                    Carl Love
> 
> -----------------------------------------------------------
> 
> gcc/ChangeLog
> 
> 2021-04-26  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/altivec.h (vec_dive, vec_mod): Add define for new
> 	builtins.
> 	* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
> 	UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
> 	(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
> 	altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
> 	altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
> 	altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
> 	altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
> 	define_insn.
> 	(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
> 	vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
> 	altivec_vrlqnm): New define_expands.
> 	* config/rs6000/rs6000-builtin.def (VCMPEQUT_P, VCMPGTST_P,
> 	VCMPGTUT_P): Add macro expansions.
> 	(BU_P10V_AV_P): Add builtin predicate definition.
> 	(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
> 	CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
> 	VCMPAET_P, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ,
> 	VSLQ, VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
> 	MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
> 	(VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD): New overload expansions.
> 	* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
> 	P10V_BUILTIN_CMPGE_1TI, P10V_BUILTIN_CMPGE_U1TI,
> 	P10V_BUILTIN_VCMPGTUT, P10V_BUILTIN_VCMPGTST,
> 	P10V_BUILTIN_CMPLE_1TI, P10V_BUILTIN_VCMPLE_U1TI,
> 	P10V_BUILTIN_DIV_V1TI, P10V_BUILTIN_UDIV_V1TI,
> 	P10V_BUILTIN_VMULESD, P10V_BUILTIN_VMULEUD,
> 	P10V_BUILTIN_VMULOSD, P10V_BUILTIN_VMULOUD,
> 	P10V_BUILTIN_VNOR_V1TI, P10V_BUILTIN_VNOR_V1TI_UNS,
> 	P10V_BUILTIN_VRLQ, P10V_BUILTIN_VRLQMI,
> 	P10V_BUILTIN_VRLQNM, P10V_BUILTIN_VSLQ,
> 	P10V_BUILTIN_VSRQ, P10V_BUILTIN_VSRAQ,
> 	P10V_BUILTIN_VCMPGTUT_P, P10V_BUILTIN_VCMPGTST_P,
> 	P10V_BUILTIN_VCMPEQUT_P, P10V_BUILTIN_VCMPGTUT_P,
> 	P10V_BUILTIN_VCMPGTST_P, P10V_BUILTIN_CMPNET,
> 	P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P,
> 	P10V_BUILTIN_DIVES_V1TI, P10V_BUILTIN_MODS_V1TI,
> 	P10V_BUILTIN_MODU_V1TI):
> 	New overloaded definitions.
> 	(rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT,
> 	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
> 	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
> 	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
> 	P10_BUILTIN_CMPLE_U1TI]: New case statements.

No signs of P10_BUILTIN_CMPNET below.  possibly P10V_BUILTIN_CMPNET?  
S
ame through at least P10_BUILTIN_CMPLE_U1TI.


> 	(rs6000_init_builtins) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
> 	New assignments.
> 	(altivec_init_builtins): New E_V1TImode case statement.
> 	(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
> 	P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
> 	P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
> 	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.
> 	* config/rs6000/r6000.c (rs6000_handle_altivec_attribute)[E_TImode,
> 	E_V1TImode]: New case statements.
> 	* config/rs6000/r6000.h (rs6000_builtin_type_index): New enum
> 	value RS6000_BTI_bool_V1TI.
> 	* config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti,
> 	vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti,
> 	vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
> 	vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
> 	vlshrv1ti3, vashrv1ti3): New define_expands.
> 	* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
> 	UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
> 	UNSPEC_VSX_MODUQ): New unspecs.
> 	(mulv2di3, vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti,
> 	vsx_diveu_v1ti,	vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti): New
> 	define_insns.
> 	(vcmpnet): New define_expand.
> 	* gcc/doc/extend.texi: Add documentation for the new builtins vec_rl,
> 	vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
> 	vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,
> 	vec_cmpge, vec_cmple, vec_all_eq, vec_all_ne, vec_all_gt, vec_all_lt,
> 	vec_all_ge, vec_all_le, vec_any_eq, vec_any_ne, vec_any_gt, vec_any_lt,
> 	vec_any_ge, vec_any_le.
> 
> gcc/testsuite/ChangeLog
> 
> 2021-04-26 Carl Love  <cel@us.ibm.com>
> 	* gcc.target/powerpc/int_128bit-runnable.c: New test file.


ok.




> ---
>  gcc/config/rs6000/altivec.h                   |    3 +
>  gcc/config/rs6000/altivec.md                  |  241 ++
>  gcc/config/rs6000/rs6000-builtin.def          |   48 +-
>  gcc/config/rs6000/rs6000-call.c               |  136 +-
>  gcc/config/rs6000/rs6000.c                    |    1 +
>  gcc/config/rs6000/rs6000.h                    |    3 +-
>  gcc/config/rs6000/vector.md                   |  191 ++
>  gcc/config/rs6000/vsx.md                      |   89 +

I've skimmed those changes, nothing jumped out at me there.

>  gcc/doc/extend.texi                           |  171 ++

Only recommendation for extend.texi content is to spell out "result"
instead of just "R".  

>  .../gcc.target/powerpc/int_128bit-runnable.c  | 2263 +++++++++++++++++
ok.


Thanks
-Will


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5 ver4]  RS6000, Add test 128-bit shifts for just the int128 type.
  2021-04-26 16:36 ` [PATCH 4/5 ver4] RS6000, Add test 128-bit shifts for just the int128 type Carl Love
@ 2021-04-27 23:46   ` will schmidt
  2021-06-05  0:03   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: will schmidt @ 2021-04-27 23:46 UTC (permalink / raw)
  To: Carl Love, Segher Boessenkool
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

On Mon, 2021-04-26 at 09:36 -0700, Carl Love wrote:
> Will, Segher:
> 
> The previous patch added the vector 128-bit integer shift instruction
> support for the V1TI type.  This patch renames and moves the VSX_TI
> iterator from vsx.md to VEC_TI in vector.md.  The uses of VEC_TI are
> also updated.
> 
> The patch has been tested on
>     powerpc64-linux instead (Power 8 BE)
>     powerpc64-linux instead (Power 9 LE)
>     powerpc64-linux instead (Power 10 LE)
> 
> Please let me know if the patch is acceptable for mainline.
> 
>                    Carl Love
> ------------------------------------------------
> 
> gcc/ChangeLog
> 
> 2021-04-26  Carl Love  <cel@us.ibm.com>
> 	* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
> 	Rename to altivec_vslq_<mode>, altivec_vsrq_<mode>, mode VEC_TI.
> 	* config/rs6000/vector.md (VEC_TI): Was named VSX_TI in vsx.md.
> 	(vashlv1ti3): Change to vashl<mode>3, mode VEC_TI.
> 	(vlshrv1ti3): Change to vlshr<mode>3, mode VEC_TI.
> 	* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator. Update
> 	uses of VSX_TI to VEC_TI.
> 
> gcc/testsuite/ChangeLog
> 
> 2021-04-26  Carl Love  <cel@us.ibm.com>
> 	gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
> 	tests.
> ---
>  gcc/config/rs6000/altivec.md                  | 16 ++++-----
>  gcc/config/rs6000/vector.md                   | 27 ++++++++-------
>  gcc/config/rs6000/vsx.md                      | 33 +++++++++----------
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++++++--
>  4 files changed, 52 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index c4c82b33f8d..c7d2cd0aa88 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2226,10 +2226,10 @@
>    "vsl<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vslq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_insn "altivec_vslq_<mode>"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
> +		     (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
>    "TARGET_POWER10"
>    /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
>    "vslq %0,%1,%2"
> @@ -2243,10 +2243,10 @@
>    "vsr<VI_char> %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vsrq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_insn "altivec_vsrq_<mode>"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
> +			   (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
>    "TARGET_POWER10"
>    /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
>    "vsrq %0,%1,%2"

ok

> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 55bbaa9c32f..5695154e316 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; 128-bit int modes
> +(define_mode_iterator VEC_TI [V1TI TI])
> +
>  ;; Vector int modes for parity
>  (define_mode_iterator VEC_IP [V8HI
>  			      V4SI
> @@ -1627,17 +1630,17 @@
>    "")
> 
>  ;; No immediate version of this 128-bit instruction
> -(define_expand "vashlv1ti3"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		     (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_expand "vashl<mode>3"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
> +			 (match_operand:VEC_TI 2 "vsx_register_operand")))]
>    "TARGET_POWER10"
>  {
>    /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
> -  rtx tmp = gen_reg_rtx (V1TImode);
> +  rtx tmp = gen_reg_rtx (<MODE>mode);
> 
>    emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
> -  emit_insn (gen_altivec_vslq (operands[0], operands[1], tmp));
> +  emit_insn(gen_altivec_vslq_<mode> (operands[0], operands[1], tmp));
>    DONE;
>  })
> 
> @@ -1650,17 +1653,17 @@
>    "")
> 
>  ;; No immediate version of this 128-bit instruction
> -(define_expand "vlshrv1ti3"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> -	(lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -		       (match_operand:V1TI 2 "vsx_register_operand" "v")))]
> +(define_expand "vlshr<mode>3"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> +	(lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
> +			   (match_operand:VEC_TI 2 "vsx_register_operand")))]
>    "TARGET_POWER10"
>  {
>    /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
> -  rtx tmp = gen_reg_rtx (V1TImode);
> +  rtx tmp = gen_reg_rtx (<MODE>mode);
> 
>    emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
> -  emit_insn (gen_altivec_vsrq (operands[0], operands[1], tmp));
> +  emit_insn(gen_altivec_vsrq_<mode> (operands[0], operands[1], tmp));
>    DONE;
>  })
> 

ok


> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index ba539549024..587011081e1 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -37,9 +37,6 @@
>  				  TI
>  				  V1TI])
> 
> -;; Iterator for 128-bit integer types that go in a single vector register.
> -(define_mode_iterator VSX_TI [TI V1TI])
> -
>  ;; Iterator for the 2 32-bit vector types
>  (define_mode_iterator VSX_W [V4SF V4SI])
> 
> @@ -952,9 +949,9 @@
>  ;; special V1TI container class, which it is not appropriate to use vec_select
>  ;; for the type.
>  (define_insn "*vsx_le_permute_<mode>"
> -  [(set (match_operand:VSX_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
> -	(rotate:VSX_TI
> -	 (match_operand:VSX_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
> +  [(set (match_operand:VEC_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
> +	(rotate:VEC_TI
> +	 (match_operand:VEC_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
>  	 (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
>    "@
> @@ -968,10 +965,10 @@
>     (set_attr "type" "vecperm,vecload,vecstore,*,load,store")])
> 
>  (define_insn_and_split "*vsx_le_undo_permute_<mode>"
> -  [(set (match_operand:VSX_TI 0 "vsx_register_operand" "=wa,wa")
> -	(rotate:VSX_TI
> -	 (rotate:VSX_TI
> -	  (match_operand:VSX_TI 1 "vsx_register_operand" "0,wa")
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=wa,wa")
> +	(rotate:VEC_TI
> +	 (rotate:VEC_TI
> +	  (match_operand:VEC_TI 1 "vsx_register_operand" "0,wa")
>  	  (const_int 64))
>  	 (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX"
> @@ -1043,11 +1040,11 @@
>  ;; Peepholes to catch loads and stores for TImode if TImode landed in
>  ;; GPR registers on a little endian system.
>  (define_peephole2
> -  [(set (match_operand:VSX_TI 0 "int_reg_operand")
> -	(rotate:VSX_TI (match_operand:VSX_TI 1 "memory_operand")
> +  [(set (match_operand:VEC_TI 0 "int_reg_operand")
> +	(rotate:VEC_TI (match_operand:VEC_TI 1 "memory_operand")
>  		       (const_int 64)))
> -   (set (match_operand:VSX_TI 2 "int_reg_operand")
> -	(rotate:VSX_TI (match_dup 0)
> +   (set (match_operand:VEC_TI 2 "int_reg_operand")
> +	(rotate:VEC_TI (match_dup 0)
>  		       (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
>     && (rtx_equal_p (operands[0], operands[2])
> @@ -1055,11 +1052,11 @@
>     [(set (match_dup 2) (match_dup 1))])
> 
>  (define_peephole2
> -  [(set (match_operand:VSX_TI 0 "int_reg_operand")
> -	(rotate:VSX_TI (match_operand:VSX_TI 1 "int_reg_operand")
> +  [(set (match_operand:VEC_TI 0 "int_reg_operand")
> +	(rotate:VEC_TI (match_operand:VEC_TI 1 "int_reg_operand")
>  		       (const_int 64)))
> -   (set (match_operand:VSX_TI 2 "memory_operand")
> -	(rotate:VSX_TI (match_dup 0)
> +   (set (match_operand:VEC_TI 2 "memory_operand")
> +	(rotate:VEC_TI (match_dup 0)
>  		       (const_int 64)))]
>    "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
>     && peep2_reg_dead_p (2, operands[0])"


ok

> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> index 625b3869118..9f7dbc6cc75 100644
> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -52,6 +52,18 @@ void print_i128(__int128_t val)
> 
>  void abort (void);
> 
> +__attribute__((noinline))
> +__int128_t shift_right (__int128_t a, __uint128_t b)
> +{
> +  return a >> b;
> +}
> +
> +__attribute__((noinline))
> +__int128_t shift_left (__int128_t a, __uint128_t b)
> +{
> +  return a << b;
> +}
> +
>  int main ()
>  {
>    int i, result_int;
> @@ -102,7 +114,7 @@ int main ()
>  #endif
>    }
> 
> -  arg1 = 3;
> +  arg1 = vec_result[0];
>    uarg2 = 4;
>    expected_result = arg1*16;
> 
> @@ -186,7 +198,7 @@ int main ()
>  #endif
>    }
> 
> -  arg1 = 48;
> +  arg1 = vec_uresult[0];
>    uarg2 = 4;
>    expected_result = arg1/16;

ok

> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/5 ver4]   RS6000: Add TI to TD (128-bit DFP) and TD to TI support
  2021-04-26 16:36 ` [PATCH 3/5 ver4] RS6000: Add TI to TD (128-bit DFP) and TD to TI support Carl Love
@ 2021-04-27 23:46   ` will schmidt
  2021-06-04 23:55   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: will schmidt @ 2021-04-27 23:46 UTC (permalink / raw)
  To: Carl Love, Segher Boessenkool
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

On Mon, 2021-04-26 at 09:36 -0700, Carl Love wrote:
> Will, Segher:
> 


Hi,


> This patch adds support for converting to/from 128-bit integers and
> 128-bit decimal floating point formats.


You reference TI,TD in the subject, would be helpful to elaborate a bit in your description.


> 
> The patch has been tested on
>     powerpc64-linux instead (Power 8 BE)
>     powerpc64-linux instead (Power 9 LE)
>     powerpc64-linux instead (Power 10 LE)
> 
> Please let me know if the patch is acceptable for mainline.
> 
>                    Carl Love
> ------------------------------------------------
> 
> gcc/ChangeLog
> dje.gcc@gmail.com, gcc-patches@gcc.gnu.org, Bill Schmidt <wschmidt@linux.vnet.ibm.com>, Peter Bergner <bergner@vnet.ibm.com>,  
> 2021-04-26  Carl Love  <cel@us.ibm.com>


Something there seems wrong.

>         * config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.


>         * config/rs6000/rs6000-call.c (P10V_BUILTIN_VCMPNET_P,
> 	P10V_BUILTIN_VCMPAET_P): New overloaded definitions.

I don't see this below.

> 
> gcc/testsuite/ChangeLog
> 
> 2021-04-26  Carl Love  <cel@us.ibm.com>
>         * gcc.target/powerpc/int_128bit-runnable.c: Add 128-bit DFP
>         conversion tests.

ok

> ---
>  gcc/config/rs6000/dfp.md                      | 14 +++++
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 61 

> +++++++++++++++++++
>  2 files changed, 75 insertions(+)
> 
> diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
> index 026be5d48a6..b89d5ecc91d 100644
> --- a/gcc/config/rs6000/dfp.md
> +++ b/gcc/config/rs6000/dfp.md
> @@ -226,6 +226,13 @@
>    [(set_attr "type" "dfp")
>     (set_attr "size" "128")])
> 
> +(define_insn "floattitd2"
> +  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
> +	(float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
> +  "TARGET_POWER10"
> +  "dcffixqq %0,%1"
> +  [(set_attr "type" "dfp")])
> +
>  ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
>  ;; This is the first stage of converting it to an integer type.
> 
> @@ -247,6 +254,13 @@
>    "dctfix<q> %0,%1"
>    [(set_attr "type" "dfp")
>     (set_attr "size" "<bits>")])
> +
> +(define_insn "fixtdti2"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> +	(fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
> +  "TARGET_POWER10"
> +  "dctfixqq %0,%1"
> +  [(set_attr "type" "dfp")])

ok


>  
>  ;; Decimal builtin support
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> index 042758c8684..625b3869118 100644
> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -37,6 +37,7 @@
>  #if DEBUG
>  #include <stdio.h>
>  #include <stdlib.h>
> +#include <math.h>
> 
> 
>  void print_i128(__int128_t val)
> @@ -58,6 +59,13 @@ int main ()
>    __int128_t arg1, result;
>    __uint128_t uarg2;
> 
> +  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
> +
> +  struct conv_t {
> +    __uint128_t u128;
> +    _Decimal128 d128;
> +  } conv, conv2;
> +
>    vector signed long long int vec_arg1_di, vec_arg2_di;
>    vector signed long long int vec_result_di, vec_expected_result_di;
>    vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
> @@ -2258,6 +2266,59 @@ int main ()
>      abort();
>  #endif
>    }
> +  
> +  /* DFP to __int128 and __int128 to DFP conversions */
> +  /* Print the DFP value as an unsigned int so we can see the bit patterns.  */
> +  conv.u128 = 0x2208000000000000ULL;
> +  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
> +  expected_result_dfp128 = conv.d128;
> 
> +  arg1 = 4;
> +
> +  conv.d128 = (_Decimal128) arg1;
> +
> +  result_dfp128 = (_Decimal128) arg1;
> +  if (((conv.u128 >>64) != 0x2208000000000000ULL) &&
> +      ((conv.u128 & 0xFFFFFFFFFFFFFFFF) != 0x4ULL)) {
> +#if DEBUG
> +    printf("ERROR:  convert int128 value ");
> +    print_i128 (arg1);
> +    conv.d128 = result_dfp128;
> +    printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
> +	   (unsigned long long)((conv.u128) >>64),
> +	   (unsigned long long)((conv.u128) & 0xFFFFFFFFFFFFFFFF));
> +
> +    conv.d128 = expected_result_dfp128;
> +    printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
> +	   (unsigned long long) (conv.u128>>64),
> +	   (unsigned long long) (conv.u128 & 0xFFFFFFFFFFFFFFFF));
> +#else

> +    abort();
> +#endif
> +  }
> +
> +  expected_result = 4;
> +
> +  conv.u128 = 0x2208000000000000ULL;
> +  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
> +  arg1_dfp128 = conv.d128;
> +
> +  result = (__int128_t) arg1_dfp128;
> +
> +  if (result != expected_result) {
> +#if DEBUG
> +    printf("ERROR:  convert DFP value ");
> +    printf("0x%llx %llx (printed as hex bit string) ",
> +	   (unsigned long long)(conv.u128>>64),
> +	   (unsigned long long)(conv.u128 & 0xFFFFFFFFFFFFFFFF));
> +    printf("to __int128 value = ");
> +    print_i128 (result);
> +    printf("\ndoes not match expected_result = ");
> +    print_i128 (expected_result);
> +    printf("\n");
> +#else
> +    abort();
> +#endif
> +  }
>    return 0;
>  }


ok




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5 ver4] RS6000: Add 128-bit Integer Operations
  2021-04-26 16:35 ` [PATCH 1/5 " Carl Love
@ 2021-04-27 23:46   ` will schmidt
  2021-04-28 17:58     ` Carl Love
  2021-05-26 21:03   ` Segher Boessenkool
  1 sibling, 1 reply; 20+ messages in thread
From: will schmidt @ 2021-04-27 23:46 UTC (permalink / raw)
  To: Carl Love, Segher Boessenkool
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

On Mon, 2021-04-26 at 09:35 -0700, Carl Love wrote:
> Will, Segher:
> 
> This patch fixes the order of the argument in the vec_rlmi and
> vec_rlnm builtins.  The patch also adds a new test cases to verify
> the fix.
> 
> The patch has been tested on
>     powerpc64-linux instead (Power 8 BE)
>     powerpc64-linux instead (Power 9 LE)
>     powerpc64-linux instead (Power 10 LE)
> 
> Please let me know if the patch is acceptable for mainline.
> 
>                    Carl Love

Hi,

Is there an existing PR for this one? 

The subject line does not reflect this patch.
subject: Re: [PATCH 1/5 ver4] RS6000: Add 128-bit Integer Operations


> 
> ------------------------------------------------------------
> 
> 2021-04-26  Carl Love  <cel@us.ibm.com>
> 
> gcc/
> 	* config/rs6000/altivec.md (altivec_vrl<VI_char>mi): Fix
> 	bug in argument generation.
> 
> gcc/testsuite/
> 	gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c:
> 	New runnable test case.
> 	gcc.target/powerpc/vec-rlmi-rlnm.c: Update scan assembler times
> 	for xxlor instruction.

Need leading "*" on the file names above.


> ---
>  gcc/config/rs6000/altivec.md                  |   6 +-
>  .../powerpc/check-builtin-vec_rlnm-runnable.c | 231 ++++++++++++++++++
>  .../gcc.target/powerpc/vec-rlmi-rlnm.c        |   2 +-
>  3 files changed, 235 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 1351dafbc41..97dc9d2bda9 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1987,12 +1987,12 @@
> 
>  (define_insn "altivec_vrl<VI_char>mi"
>    [(set (match_operand:VIlong 0 "register_operand" "=v")
> -        (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
> -	                (match_operand:VIlong 2 "register_operand" "v")
> +        (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
> +	                (match_operand:VIlong 2 "register_operand" "0")
>  		        (match_operand:VIlong 3 "register_operand" "v")]
>  		       UNSPEC_VRLMI))]
>    "TARGET_P9_VECTOR"
> -  "vrl<VI_char>mi %0,%2,%3"
> +  "vrl<VI_char>mi %0,%1,%3"
>    [(set_attr "type" "veclogical")])

ok

> 
>  (define_insn "altivec_vrl<VI_char>nm"
> diff --git a/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
> new file mode 100644
> index 00000000000..be8f82d8a06
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
> @@ -0,0 +1,231 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
> +
> +/* Verify the vec_rlm and vec_rlmi builtins works correctly.  */
> +/* { dg-final { scan-assembler-times {\mvrldmi\M} 1 } } */
> +
> +#include <altivec.h>
> +
> +#ifdef DEBUG
> +#include <stdio.h>
> +#include <stdlib.h>
> +#endif
> +
> +void abort (void);
> +
> +int main ()
> +{
> +  int i;
> +
> +  vector unsigned int vec_arg1_int, vec_arg2_int, vec_arg3_int;
> +  vector unsigned int vec_result_int, vec_expected_result_int;
> +  
> +  vector unsigned long long int vec_arg1_di, vec_arg2_di, vec_arg3_di;
> +  vector unsigned long long int vec_result_di, vec_expected_result_di;
> +
> +  unsigned int mask_begin, mask_end, shift;
> +  unsigned long long int mask;
> +
> +/* Check vec int version of vec_rlmi builtin */
> +  mask = 0;
> +  mask_begin = 0;
> +  mask_end   = 4;
> +  shift = 16;
> +
> +  for (i = 0; i < 31; i++)
> +    if ((i >= mask_begin) && (i <= mask_end))
> +      mask |= 0x80000000ULL >> i;
> +
> +  for (i = 0; i < 4; i++) {
> +    vec_arg1_int[i] = 0x12345678 + i*0x11111111;
> +    vec_arg2_int[i] = 0xA1B1CDEF;
> +    vec_arg3_int[i] = mask_begin << 16 | mask_end << 8 | shift;
> +
> +    /* do rotate */
> +    vec_expected_result_int[i] =  ( vec_arg2_int[i] & ~mask) 
> +      | ((vec_arg1_int[i] << shift) | (vec_arg1_int[i] >> (32-shift))) & mask;
> +      
> +  }
> +
> +  /* vec_rlmi(arg1, arg2, arg3)
> +     result - rotate each element of arg2 left and inserting it into arg1 
> +       element based on the mask specified in arg3.  The shift, mask
> +       start and end is specified in arg3.  */
> +  vec_result_int = vec_rlmi (vec_arg1_int, vec_arg2_int, vec_arg3_int);

s/inserting /inserts /


> +
> +  for (i = 0; i < 4; i++) {
> +    if (vec_result_int[i] != vec_expected_result_int[i])
> +#ifdef DEBUG
> +      printf("ERROR: i = %d, vec_rlmi int result 0x%x, does not match "
> +	     "expected result 0x%x\n", i, vec_result_int[i],
> +	     vec_expected_result_int[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +/* Check vec long long int version of vec_rlmi builtin */
> +  mask = 0;
> +  mask_begin = 0;
> +  mask_end   = 4;
> +  shift = 16;
> +
> +  for (i = 0; i < 31; i++)
> +    if ((i >= mask_begin) && (i <= mask_end))
> +      mask |= 0x8000000000000000ULL >> i;
> +
> +  for (i = 0; i < 2; i++) {
> +    vec_arg1_di[i] = 0x1234567800000000 + i*0x11111111;
> +    vec_arg2_di[i] = 0xA1B1C1D1E1F12345;
> +    vec_arg3_di[i] = mask_begin << 16 | mask_end << 8 | shift;
> +
> +    /* do rotate */
> +    vec_expected_result_di[i] =  ( vec_arg2_di[i] & ~mask) 
> +      | ((vec_arg1_di[i] << shift) | (vec_arg1_di[i] >> (64-shift))) & mask;
> +  }
> +
> +  /* vec_rlmi(arg1, arg2, arg3)
> +     result - rotate each element of arg1 left and inserting it into arg2 
> +       element of arg2 based on the mask specified in arg3.  The shift, mask
> +       start and end is specified in arg3.  */

"insert into arg2 element of arg2" doesn't seem right.

> +  vec_result_di = vec_rlmi (vec_arg1_di, vec_arg2_di, vec_arg3_di);
> +
> +  for (i = 0; i < 2; i++) {
> +    if (vec_result_di[i] != vec_expected_result_di[i])
> +#ifdef DEBUG
> +      printf("ERROR: i = %d, vec_rlmi int long long result 0x%llx, does not match "
> +	     "expected result 0x%llx\n", i, vec_result_di[i],
> +	     vec_expected_result_di[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  /* Check vec int version of vec_rlnm builtin */
> +  mask = 0;
> +  mask_begin = 0;
> +  mask_end   = 4;
> +  shift = 16;
> +
> +  for (i = 0; i < 31; i++)
> +    if ((i >= mask_begin) && (i <= mask_end))
> +      mask |= 0x80000000ULL >> i;
> +
> +  for (i = 0; i < 4; i++) {
> +    vec_arg1_int[i] = 0x12345678 + i*0x11111111;
> +    vec_arg2_int[i] = shift;
> +    vec_arg3_int[i] = mask_begin << 8 | mask_end;
> +    vec_expected_result_int[i] = (vec_arg1_int[i] << shift) & mask;
> +  }
> +
> +  /* vec_rlnm(arg1, arg2, arg3)
> +     result - rotate each element of arg1 left by shift in element of arg2.
> +       Then AND with mask whose  start/stop bits are specified in element of
> +       arg3.  */
> +  vec_result_int = vec_rlnm (vec_arg1_int, vec_arg2_int, vec_arg3_int);
> +  for (i = 0; i < 4; i++) {
> +    if (vec_result_int[i] != vec_expected_result_int[i])
> +#ifdef DEBUG
> +      printf("ERROR: vec_rlnm, i = %d, int result 0x%x does not match "
> +	     "expected result 0x%x\n", i, vec_result_int[i],
> +	     vec_expected_result_int[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +/* Check vec long int version of builtin */
> +  mask = 0;
> +  mask_begin = 0;
> +  mask_end   = 4;
> +  shift = 20;
> +
> +  for (i = 0; i < 63; i++)
> +    if ((i >= mask_begin) && (i <= mask_end))
> +      mask |= 0x8000000000000000ULL >> i;
> +  
> +  for (i = 0; i < 2; i++) {
> +    vec_arg1_di[i] = 0x123456789ABCDE00ULL + i*0x1111111111111111ULL;
> +    vec_arg2_di[i] = shift;
> +    vec_arg3_di[i] = mask_begin << 8 | mask_end;
> +    vec_expected_result_di[i] = (vec_arg1_di[i] << shift) & mask;
> +  }
> +
> +  vec_result_di = vec_rlnm (vec_arg1_di, vec_arg2_di, vec_arg3_di);
> +
> +  for (i = 0; i < 2; i++) {
> +    if (vec_result_di[i] != vec_expected_result_di[i])
> +#ifdef DEBUG
> +      printf("ERROR: vec_rlnm, i = %d, long long int result 0x%llx does not "
> +	     "match expected result 0x%llx\n", i, vec_result_di[i],
> +	     vec_expected_result_di[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +    /* Check vec int version of vec_vrlnm builtin */




> +  mask = 0;
> +  mask_begin = 0;
> +  mask_end   = 4;
> +  shift = 16;
> +
> +  for (i = 0; i < 31; i++)
> +    if ((i >= mask_begin) && (i <= mask_end))
> +      mask |= 0x80000000ULL >> i;
> +
> +  for (i = 0; i < 4; i++) {
> +    vec_arg1_int[i] = 0x12345678 + i*0x11111111;
> +    vec_arg2_int[i] = mask_begin << 16 | mask_end << 8 | shift;
> +    vec_expected_result_int[i] = (vec_arg1_int[i] << shift) & mask;
> +  }
> +
> +  /* vec_vrlnm(arg1, arg2, arg3)
> +     result - rotate each element of arg1 left then AND with mask.  The mask
> +       start, stop bits is specified in the second argument.  The shift amount
> +       is also specified in the second argument.  */
> +  vec_result_int = vec_vrlnm (vec_arg1_int, vec_arg2_int);
> +
> +  for (i = 0; i < 4; i++) {
> +    if (vec_result_int[i] != vec_expected_result_int[i])
> +#ifdef DEBUG
> +      printf("ERROR: vec_vrlnm, i = %d, int result 0x%x does not match "
> +	     "expected result 0x%x\n", i, vec_result_int[i],
> +	     vec_expected_result_int[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +/* Check vec long int version of vec_vrlnm builtin */
> +  mask = 0;
> +  mask_begin = 0;
> +  mask_end   = 4;
> +  shift = 20;
> +
> +  for (i = 0; i < 63; i++)
> +    if ((i >= mask_begin) && (i <= mask_end))
> +      mask |= 0x8000000000000000ULL >> i;
> +  
> +  for (i = 0; i < 2; i++) {
> +    vec_arg1_di[i] = 0x123456789ABCDE00ULL + i*0x1111111111111111ULL;
> +    vec_arg2_di[i] = mask_begin << 16 | mask_end << 8 | shift;
> +    vec_expected_result_di[i] = (vec_arg1_di[i] << shift) & mask;
> +  }
> +
> +  vec_result_di = vec_vrlnm (vec_arg1_di, vec_arg2_di);
> +
> +  for (i = 0; i < 2; i++) {
> +    if (vec_result_di[i] != vec_expected_result_di[i])
> +#ifdef DEBUG
> +      printf("ERROR: vec_vrlnm, i = %d, long long int result 0x%llx does not "
> +	     "match expected result 0x%llx\n", i, vec_result_di[i],
> +	     vec_expected_result_di[i]);
> +#else
> +      abort();
> +#endif
> +    }
> +
> +  return 0;
> +}

Some comment cosmetic nits, but not a big deal for tests. 
ok.


> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
> index 1e7d7390c5b..b0f26c8f4cb 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
> @@ -62,6 +62,6 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned long long y,
>  /* { dg-final { scan-assembler-times "vextsb2d" 1 } } */
>  /* { dg-final { scan-assembler-times "vslw" 1 } } */
>  /* { dg-final { scan-assembler-times "vsld" 1 } } */
> -/* { dg-final { scan-assembler-times "xxlor" 3 } } */
> +/* { dg-final { scan-assembler-times "xxlor" 5 } } */
>  /* { dg-final { scan-assembler-times "vrlwnm" 2 } } */
>  /* { dg-final { scan-assembler-times "vrldnm" 2 } } */

ok.




^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH ] RS6000 Add 128-bit Binary Integer sign extend operations
  2021-04-26 16:36 ` [PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values Carl Love
  2021-04-27 23:46   ` will schmidt
@ 2021-04-28 17:39   ` Carl Love
  2021-06-07 19:03     ` Segher Boessenkool
  2021-06-07 18:49   ` [PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values Segher Boessenkool
  2 siblings, 1 reply; 20+ messages in thread
From: Carl Love @ 2021-04-28 17:39 UTC (permalink / raw)
  To: Segher Boessenkool, will schmidt, cel
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Segher, Will:

The agreement for the sign extension builtin was to just make it Endian
aware rather then go with a more complex definition.  The prior patch
has been updated with this new functionality.

This patch adds support for the 128-bit extension instruction and
corresponding builtin support for the various sign extensions.

This was originally part of the Add 128-bit Integer operations patch
series.  The patch logically goes with the earlier 5 patch series.

The LE support testing was done on a Power 10.  The regression testing
for LE passes with no regressions.  

The BE support testing was done by generating the BE code sequence and
then manually using gdb and visual inspection to make sure the elements
were correctly reversed and the expected elements were sign extended.

Please let me know if the patch is acceptable for mainline.

                   Carl Love

-------------------------------------------

gcc/ChangeLog

2021-04-28  Carl Love  <cel@us.ibm.com>
	* config/rs6000/altivec.h (vec_signextll, vec_signexti, vec_signextq):
	Add define for new builtins.
	* config/rs6000/altivec.md(altivec_vreveti2): Add define_expand.
	* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
	overloaded builtin definitions.
	(VSIGNEXTSB2W, VSIGNEXTSH2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D,
	VSIGNEXTSD2Q):	Add builtin expansions.
	(SIGNEXT): Add P10 overload definition.
	* config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VEC_VSIGNEXTLL,
	P10_BUILTIN_VEC_SIGNEXT): Add overloaded argument definitions.
	* config/rs6000/vsx.md (vsx_sign_extend_v2di_v1ti): Add define_insn.
	(vsignextend_v2di_v1ti, vsignextend_qi_<mode>, vsignextend_hi_<mode>,
	vsignextend_si_v2di)[VIlong]: Add define_expand.
	Make define_insn vsx_sign_extend_si_v2di visible.
	* doc/extend.texi:  Add documentation for the vec_signexti,
	vec_signextll builtins and vec_signextq.

gcc/testsuite/ChangeLog

2021-04-28  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/int_128bit-runnable.c (extsd2q): Update expected
	count.
	Add tests for vec_signextq.
	* gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
---
 gcc/config/rs6000/altivec.h                   |   3 +
 gcc/config/rs6000/altivec.md                  |  24 ++++
 gcc/config/rs6000/rs6000-builtin.def          |  12 ++
 gcc/config/rs6000/rs6000-call.c               |  16 +++
 gcc/config/rs6000/vsx.md                      |  83 +++++++++++-
 gcc/doc/extend.texi                           |  16 +++
 .../gcc.target/powerpc/int_128bit-runnable.c  |  41 +++++-
 .../powerpc/p9-sign_extend-runnable.c         | 128 ++++++++++++++++++
 8 files changed, 321 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 314695a43ca..5b631c7ebaf 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -497,6 +497,8 @@
 
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
+#define vec_signexti  __builtin_vec_vsignexti
+#define vec_signextll __builtin_vec_vsignextll
 
 #endif
 
@@ -715,6 +717,7 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
 
 #ifdef _ARCH_PWR10
+#define vec_signextq  __builtin_vec_vsignextq
 #define vec_dive __builtin_vec_dive
 #define vec_mod  __builtin_vec_mod
 
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index c7d2cd0aa88..61a0905789f 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -4291,6 +4291,30 @@
 })
 
 ;; Vector reverse elements
+(define_expand "altivec_vreveti2"
+  [(set (match_operand:TI 0 "register_operand" "=v")
+	(unspec:TI [(match_operand:TI 1 "register_operand" "v")]
+		      UNSPEC_VREVEV))]
+  "TARGET_ALTIVEC"
+{
+  int i, j, size, num_elements;
+  rtvec v = rtvec_alloc (16);
+  rtx mask = gen_reg_rtx (V16QImode);
+
+  size = GET_MODE_UNIT_SIZE (TImode);
+  num_elements = GET_MODE_NUNITS (TImode);
+
+  for (j = 0; j < num_elements; j++)
+    for (i = 0; i < size; i++)
+      RTVEC_ELT (v, i + j * size)
+	= GEN_INT (i + (num_elements - 1 - j) * size);
+
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
+	     operands[1], mask));
+  DONE;
+})
+
 (define_expand "altivec_vreve<mode>2"
   [(set (match_operand:VEC_A 0 "register_operand" "=v")
 	(unspec:VEC_A [(match_operand:VEC_A 1 "register_operand" "v")]
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index dba22825b79..d55095b01bb 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2877,6 +2877,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,	"vprtybd")
 BU_P9V_OVERLOAD_1 (VPRTYBQ,	"vprtybq")
 BU_P9V_OVERLOAD_1 (VPRTYBW,	"vprtybw")
 BU_P9V_OVERLOAD_1 (VPARITY_LSBB,	"vparity_lsbb")
+BU_P9V_OVERLOAD_1 (VSIGNEXTI,	"vsignexti")
+BU_P9V_OVERLOAD_1 (VSIGNEXTLL,	"vsignextll")
 
 /* 2 argument functions added in ISA 3.0 (power9).  */
 BU_P9_2 (CMPRB,	"byte_in_range",	CONST,	cmprb)
@@ -2888,6 +2890,13 @@ BU_P9_OVERLOAD_2 (CMPRB,	"byte_in_range")
 BU_P9_OVERLOAD_2 (CMPRB2,	"byte_in_either_range")
 BU_P9_OVERLOAD_2 (CMPEQB,	"byte_in_set")
 \f
+
+BU_P9V_AV_1 (VSIGNEXTSB2W,	"vsignextsb2w",		CONST,  vsignextend_qi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSH2W,	"vsignextsh2w",		CONST,  vsignextend_hi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSB2D,	"vsignextsb2d",		CONST,  vsignextend_qi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSH2D,	"vsignextsh2d",		CONST,  vsignextend_hi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSW2D,	"vsignextsw2d",		CONST,  vsignextend_si_v2di)
+
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
 BU_P10V_AV_P (VCMPEQUT_P,	"vcmpequt_p",	CONST,	vector_eq_v1ti_p)
 BU_P10V_AV_P (VCMPGTST_P,	"vcmpgtst_p",	CONST,	vector_gt_v1ti_p)
@@ -2926,6 +2935,8 @@ BU_P10V_AV_2 (VNOR_V1TI,	"vnor_v1ti",	CONST,	norv1ti3)
 BU_P10V_AV_2 (VCMPNET_P,	"vcmpnet_p",	CONST,	vector_ne_v1ti_p)
 BU_P10V_AV_2 (VCMPAET_P,	"vcmpaet_p",	CONST,	vector_ae_v1ti_p)
 
+BU_P10V_AV_1 (VSIGNEXTSD2Q,	"vsignext",     CONST,  vsignextend_v2di_v1ti)
+
 BU_P10V_AV_2 (VMULEUD,	"vmuleud",	CONST,	vec_widen_umult_even_v2di)
 BU_P10V_AV_2 (VMULESD,	"vmulesd",	CONST,	vec_widen_smult_even_v2di)
 BU_P10V_AV_2 (VMULOUD,	"vmuloud",	CONST,	vec_widen_umult_odd_v2di)
@@ -3145,6 +3156,7 @@ BU_CRYPTO_OVERLOAD_2A (VPMSUM,	 "vpmsum")
 BU_CRYPTO_OVERLOAD_3A (VPERMXOR,	 "vpermxor")
 BU_CRYPTO_OVERLOAD_3 (VSHASIGMA, "vshasigma")
 
+BU_P10_OVERLOAD_1 (SIGNEXT, "vsignextq")
 \f
 /* HTM functions.  */
 BU_HTM_1  (TABORT,	"tabort",	CR,	tabort)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 32a8af92458..25beed5f0e4 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5821,6 +5821,19 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI },
 
+  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 */
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
+    RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+
   /* Overloaded built-in functions for ISA3.1 (power10). */
   { P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
@@ -6184,6 +6197,9 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
  { P10_BUILTIN_VEC_XVTLSBB_ONES, P10V_BUILTIN_XVTLSBB_ONES,
     RS6000_BTI_INTSI, RS6000_BTI_unsigned_V16QI, 0, 0 },
 
+  { P10_BUILTIN_VEC_SIGNEXT, P10V_BUILTIN_VSIGNEXTSD2Q,
+     RS6000_BTI_V1TI, RS6000_BTI_V2DI, 0, 0 },
+
   { RS6000_BUILTIN_NONE, RS6000_BUILTIN_NONE, 0, 0, 0, 0 }
 };
 \f
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 587011081e1..c99f80e83dd 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4883,6 +4883,33 @@
    (set_attr "type" "vecload")])
 
 \f
+;; ISA 3.1 vector extend sign support
+(define_insn "vsx_sign_extend_v2di_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
+		     UNSPEC_VSX_SIGN_EXTEND))]
+  "TARGET_POWER10"
+ "vextsd2q %0,%1"
+[(set_attr "type" "vecexts")])
+
+(define_expand "vsignextend_v2di_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
+	(unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
+		     UNSPEC_VSX_SIGN_EXTEND))]
+  "TARGET_POWER10"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+       rtx tmp = gen_reg_rtx (V2DImode);
+
+       emit_insn (gen_altivec_vrevev2di2(tmp, operands[1]));
+       emit_insn (gen_vsx_sign_extend_v2di_v1ti(operands[0], tmp));
+     }
+  else
+    emit_insn (gen_vsx_sign_extend_v2di_v1ti(operands[0], operands[1]));
+  DONE;
+})
+
 ;; ISA 3.0 vector extend sign support
 
 (define_insn "vsx_sign_extend_qi_<mode>"
@@ -4894,6 +4921,24 @@
   "vextsb2<wd> %0,%1"
   [(set_attr "type" "vecexts")])
 
+(define_expand "vsignextend_qi_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(unspec:VIlong
+	 [(match_operand:V16QI 1 "vsx_register_operand" "v")]
+	 UNSPEC_VSX_SIGN_EXTEND))]
+  "TARGET_P9_VECTOR"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_insn (gen_altivec_vrevev16qi2(tmp, operands[1]));
+      emit_insn (gen_vsx_sign_extend_qi_<mode>(operands[0], tmp));
+    }
+  else
+    emit_insn (gen_vsx_sign_extend_qi_<mode>(operands[0], operands[1]));
+  DONE;
+})
+
 (define_insn "vsx_sign_extend_hi_<mode>"
   [(set (match_operand:VSINT_84 0 "vsx_register_operand" "=v")
 	(unspec:VSINT_84
@@ -4903,7 +4948,25 @@
   "vextsh2<wd> %0,%1"
   [(set_attr "type" "vecexts")])
 
-(define_insn "*vsx_sign_extend_si_v2di"
+(define_expand "vsignextend_hi_<mode>"
+  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
+	(unspec:VIlong
+	 [(match_operand:V8HI 1 "vsx_register_operand" "v")]
+	 UNSPEC_VSX_SIGN_EXTEND))]
+  "TARGET_P9_VECTOR"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      rtx tmp = gen_reg_rtx (V8HImode);
+      emit_insn (gen_altivec_vrevev8hi2(tmp, operands[1]));
+      emit_insn (gen_vsx_sign_extend_hi_<mode>(operands[0], tmp));
+    }
+  else
+     emit_insn (gen_vsx_sign_extend_hi_<mode>(operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "vsx_sign_extend_si_v2di"
   [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
 	(unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
 		     UNSPEC_VSX_SIGN_EXTEND))]
@@ -4911,6 +4974,24 @@
   "vextsw2d %0,%1"
   [(set_attr "type" "vecexts")])
 
+(define_expand "vsignextend_si_v2di"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
+	(unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
+		     UNSPEC_VSX_SIGN_EXTEND))]
+  "TARGET_P9_VECTOR"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+       rtx tmp = gen_reg_rtx (V4SImode);
+
+       emit_insn (gen_altivec_vrevev4si2(tmp, operands[1]));
+       emit_insn (gen_vsx_sign_extend_si_v2di(operands[0], tmp));
+    }
+  else
+     emit_insn (gen_vsx_sign_extend_si_v2di(operands[0], operands[1]));
+  DONE;
+})
+
 ;; ISA 3.1 vector sign extend
 ;; Move DI value from GPR to TI mode in VSX register, word 1.
 (define_insn "mtvsrdd_diti_w1"
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d1c56edbaa8..3f6956276b3 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -19510,6 +19510,22 @@ The second argument to @var{__builtin_crypto_vshasigmad} and
 integer that is 0 or 1.  The third argument to these built-in functions
 must be a constant integer in the range of 0 to 15.
 
+The following sign extension builtins are provided:
+
+@smallexample
+vector signed int vec_signexti (vector signed char a)
+vector signed long long vec_signextll (vector signed char a)
+vector signed int vec_signexti (vector signed short a)
+vector signed long long vec_signextll (vector signed short a)
+vector signed long long vec_signextll (vector signed int a)
+vector signed long long vec_signextq (vector signed long long a)
+@end smallexample
+
+Each element of the result is produced by sign-extending the element of the
+input vector that would fall in the least significant portion of the result
+element. For example, a sign-extension of a vector signed char to a vector
+signed long long will sign extend the rightmost byte of each doubleword.
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 3.1
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1
 
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 94dbd2b6230..1255ee9f0ab 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -4,7 +4,7 @@
 
 /* Check that the expected 128-bit instructions are generated if the processor
    supports the 128-bit integer instructions. */
-/* { dg-final { scan-assembler-times {\mvextsd2q\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvextsd2q\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mvslq\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvsrq\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvsraq\M} 2 } } */
@@ -85,6 +85,45 @@ int main ()
   vector __int128 vec_arg1, vec_arg2, vec_result;
   vector unsigned __int128 vec_uarg1, vec_uarg2, vec_uarg3, vec_uresult;
   vector bool __int128  vec_result_bool;
+
+  /* sign extend double to 128-bit integer  */
+  vec_arg1_di[0] = 1000;
+  vec_arg1_di[1] = -123456;
+
+  expected_result = 1000;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
+
+  vec_arg1_di[0] = -123456;
+  vec_arg1_di[1] = 1000;
+
+  expected_result = -123456;
+
+  vec_result = vec_signextq (vec_arg1_di);
+
+  if (vec_result[0] != expected_result) {
+#if DEBUG
+    printf("ERROR: vec_signextq ((long long) %lld) =  ",  vec_arg1_di[0]);
+    print_i128(vec_result[0]);
+    printf("\n does not match expected_result = ");
+    print_i128(expected_result);
+    printf("\n\n");
+#else
+    abort();
+#endif
+  }
   
   /* test shift 128-bit integers.
      Note, shift amount is given by the lower 7-bits of the shift amount. */
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
new file mode 100644
index 00000000000..fdcad019b96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
@@ -0,0 +1,128 @@
+/* { dg-do run { target { *-*-linux* && { lp64 && p9vector_hw } } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
+
+/* These builtins were not defined until ISA 3.1 but only require ISA 3.0
+   support.  */
+
+/* { dg-final { scan-assembler-times {\mvextsb2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2w\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsw2d\M} 1 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#include <stdlib.h>
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector signed char vec_arg_qi, vec_result_qi;
+  vector signed short int vec_arg_hi, vec_result_hi, vec_expected_hi;
+  vector signed int vec_arg_wi, vec_result_wi, vec_expected_wi;
+  vector signed long long vec_result_di, vec_expected_di;
+
+  /* test sign extend byte to word */
+  vec_arg_qi = (vector signed char) {1, 2, 3, 4, 5, 6, 7, 8,
+				     -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_wi = (vector signed int) {1, 5, -1, -5};
+
+  vec_result_wi = vec_signexti (vec_arg_qi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(char, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n",
+	     i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend byte to double */
+  vec_arg_qi = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
+				    -1, -2, -3, -4, -5, -6, -7, -8};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_qi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(byte, long long int):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to word */
+  vec_arg_hi = (vector signed short int){1, 2, 3, 4, -1, -2, -3, -4};
+  vec_expected_wi = (vector signed int){1, 3, -1, -3};
+
+  vec_result_wi = vec_signexti(vec_arg_hi);
+
+  for (i = 0; i < 4; i++)
+    if (vec_result_wi[i] != vec_expected_wi[i]) {
+#if DEBUG
+      printf("ERROR: vec_signexti(short, int):  ");
+      printf("vec_result_wi[%d] != vec_expected_wi[%d]\n", i, i);
+      printf("vec_result_wi[%d] = %d\n", i, vec_result_wi[i]);
+      printf("vec_expected_wi[%d] = %d\n", i, vec_expected_wi[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend short to double word */
+  vec_arg_hi = (vector signed short int ){1, 3, 5, 7,  -1, -3, -5, -7};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_hi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(short, double):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  /* test sign extend word to double word */
+  vec_arg_wi = (vector signed int ){1, 3, -1, -3};
+  vec_expected_di = (vector signed long long int){1, -1};
+
+  vec_result_di = vec_signextll(vec_arg_wi);
+
+  for (i = 0; i < 2; i++)
+    if (vec_result_di[i] != vec_expected_di[i]) {
+#if DEBUG
+      printf("ERROR: vec_signextll(word, double):  ");
+      printf("vec_result_di[%d] != vec_expected_di[%d]\n", i, i);
+      printf("vec_result_di[%d] = %lld\n", i, vec_result_di[i]);
+      printf("vec_expected_di[%d] = %lld\n", i, vec_expected_di[i]);
+#else
+      abort();
+#endif
+    }
+
+  return 0;
+}
-- 
2.27.0



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5 ver4] RS6000: Add 128-bit Integer Operations
  2021-04-27 23:46   ` will schmidt
@ 2021-04-28 17:58     ` Carl Love
  0 siblings, 0 replies; 20+ messages in thread
From: Carl Love @ 2021-04-28 17:58 UTC (permalink / raw)
  To: will schmidt, Segher Boessenkool
  Cc: dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

On Tue, 2021-04-27 at 18:46 -0500, will schmidt wrote:
> On Mon, 2021-04-26 at 09:35 -0700, Carl Love wrote:
> > Will, Segher:
> > 
> > This patch fixes the order of the argument in the vec_rlmi and
> > vec_rlnm builtins.  The patch also adds a new test cases to verify
> > the fix.
> > 
> > The patch has been tested on
> >      powerpc64-linux instead (Power 8 BE)
> >      powerpc64-linux instead (Power 9 LE)
> >      powerpc64-linux instead (Power 10 LE)
> > 
> > Please let me know if the patch is acceptable for mainline.
> > 
> >                     Carl Love
> 
> Hi,
> 
> Is there an existing PR for this one? 

Will:

Not that I am aware of.  I found the issue as part of doing the
patches.  

                 Carl 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5 ver4] RS6000: Add 128-bit Integer Operations
  2021-04-26 16:35 ` [PATCH 1/5 " Carl Love
  2021-04-27 23:46   ` will schmidt
@ 2021-05-26 21:03   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-05-26 21:03 UTC (permalink / raw)
  To: Carl Love; +Cc: will schmidt, dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Hi!

On Mon, Apr 26, 2021 at 09:35:58AM -0700, Carl Love wrote:
> This patch fixes the order of the argument in the vec_rlmi and
> vec_rlnm builtins.  The patch also adds a new test cases to verify
> the fix.
> 
> The patch has been tested on
>     powerpc64-linux instead (Power 8 BE)
>     powerpc64-linux instead (Power 9 LE)
>     powerpc64-linux instead (Power 10 LE)

Those last two are powerpc64le-linux, right?

> gcc/testsuite/
> 	gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c:
> 	New runnable test case.
> 	gcc.target/powerpc/vec-rlmi-rlnm.c: Update scan assembler times
> 	for xxlor instruction.

These need a leading *, as Will pointed out.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
> @@ -0,0 +1,231 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */

It's a run test, so it needs p9vector_hw (no powerpc_ prefix on that
one).

> --- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
> @@ -62,6 +62,6 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned long long y,

> -/* { dg-final { scan-assembler-times "xxlor" 3 } } */
> +/* { dg-final { scan-assembler-times "xxlor" 5 } } */

This was expected, right?

Okay for trunk, and backports after a while.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5 ver4] RS6000: Add 128-bit Integer Operations
  2021-04-27 23:46   ` will schmidt
@ 2021-06-04 20:20     ` Segher Boessenkool
  0 siblings, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-06-04 20:20 UTC (permalink / raw)
  To: will schmidt; +Cc: Carl Love, dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

On Tue, Apr 27, 2021 at 06:46:16PM -0500, will schmidt wrote:
> On Mon, 2021-04-26 at 09:36 -0700, Carl Love wrote:
> > 	(rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT,
> > 	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
> > 	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
> > 	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
> > 	P10_BUILTIN_CMPLE_U1TI]: New case statements.
> 
> No signs of P10_BUILTIN_CMPNET below.  possibly P10V_BUILTIN_CMPNET?  
> S
> ame through at least P10_BUILTIN_CMPLE_U1TI.

It is P10V_BUILTIN_CMPNET (note the V).


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5 ver4] RS6000: Add 128-bit Integer Operations
  2021-04-26 16:36 ` [PATCH 2/5 " Carl Love
  2021-04-27 23:46   ` will schmidt
@ 2021-06-04 23:14   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-06-04 23:14 UTC (permalink / raw)
  To: Carl Love; +Cc: will schmidt, dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Hi!

On Mon, Apr 26, 2021 at 09:36:12AM -0700, Carl Love wrote:
> This patch adds the 128-bit integer support for divide, modulo, shift,
> compare of 128-bit integers instructions and builtin support.

> 	(rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT,
> 	P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
> 	P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
> 	P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
> 	P10_BUILTIN_CMPLE_U1TI]: New case statements.

> 	(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
> 	P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
> 	P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
> 	P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.

All P10_ here should be P10V_.

> 	* config/rs6000/r6000.c (rs6000_handle_altivec_attribute)[E_TImode,
> 	E_V1TImode]: New case statements.

Space between ) and [.

> +(define_insn "altivec_eqv1ti"
> +  [(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
> +	(eq:V1TI (match_operand:V1TI 1 "altivec_register_operand" "v")
> +		 (match_operand:V1TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vcmpequq %0,%1,%2"
> +  [(set_attr "type" "veccmpfx")])

There already is

(define_insn "altivec_eq<mode>"
  [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
	(eq:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
		(match_operand:VI2 2 "altivec_register_operand" "v")))]
  "<VI_unit>"
  "vcmpequ<VI_char> %0,%1,%2"
  [(set_attr "type" "veccmpfx")])

so changing the iterator VI2 to also include V1TI (or making a new
iterator VI3 or something that includes it) will make this much easier.
(You also need the add something to VI_char; VI_unit already has it).

There are multiple iterators that can be used already.  Especially since
we have the "isa" and "enabled" attributes now :-)  So maybe we can come
up with a good logical name for it.

This can be done later.  But in the future, have an eye out if you can
just use existing patterns, it saves work :-)

The shift type things have the amount to shift by in the wrong place, so
that is a bit of a kink.


All the builtin stuff...  I didn't see mistakes, but I am so glad it
will all be rewritten soon, that is soo hard to look at :-)

> +(define_expand "vector_gtv1ti"
> +  [(set (match_operand:V1TI 0 "vlogical_operand")
> +	(gt:V1TI (match_operand:V1TI 1 "vlogical_operand")
> +		 (match_operand:V1TI 2 "vlogical_operand")))]
> +  "TARGET_POWER10"
> +  "")

So this will require extending VEC_C a bit if we merge patterns.

> +;; Swap upper/lower 64-bit values in a 128-bit vector
> +(define_insn "xxswapd_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(subreg:V1TI
> +           (vec_select:V2DI
> +             (subreg:V2DI
> +                (match_operand:V1TI 1 "vsx_register_operand" "v") 0 )
> +	     (parallel [(const_int 1)(const_int 0)]))
> +           0))]

There are spaces instead of tabs on most of these lines.


Okay for trunk with the trivialities fixed.  Also okay for GCC 11 once
it has been tested on all CPUs and OSes (all we care about that it :-) )
Thanks!


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/5 ver4] RS6000: Add TI to TD (128-bit DFP) and TD to TI support
  2021-04-26 16:36 ` [PATCH 3/5 ver4] RS6000: Add TI to TD (128-bit DFP) and TD to TI support Carl Love
  2021-04-27 23:46   ` will schmidt
@ 2021-06-04 23:55   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-06-04 23:55 UTC (permalink / raw)
  To: Carl Love; +Cc: will schmidt, dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Hi!

Maybe use "Add floattitd2 and fixtdti2" or similar as title?

On Mon, Apr 26, 2021 at 09:36:19AM -0700, Carl Love wrote:
> gcc/ChangeLog
> dje.gcc@gmail.com, gcc-patches@gcc.gnu.org, Bill Schmidt <wschmidt@linux.vnet.ibm.com>, Peter Bergner <bergner@vnet.ibm.com>,  

What Will said here.

> 2021-04-26  Carl Love  <cel@us.ibm.com>
>         * config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
>         * config/rs6000/rs6000-call.c (P10V_BUILTIN_VCMPNET_P,
> 	P10V_BUILTIN_VCMPAET_P): New overloaded definitions.

That last line is just spurious?

Okay for trunk.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5 ver4]  RS6000, Add test 128-bit shifts for just the int128 type.
  2021-04-26 16:36 ` [PATCH 4/5 ver4] RS6000, Add test 128-bit shifts for just the int128 type Carl Love
  2021-04-27 23:46   ` will schmidt
@ 2021-06-05  0:03   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-06-05  0:03 UTC (permalink / raw)
  To: Carl Love; +Cc: will schmidt, dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Hi!

On Mon, Apr 26, 2021 at 09:36:26AM -0700, Carl Love wrote:
> The previous patch added the vector 128-bit integer shift instruction
> support for the V1TI type.  This patch renames and moves the VSX_TI
> iterator from vsx.md to VEC_TI in vector.md.  The uses of VEC_TI are
> also updated.

Okay for trunk.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values.
  2021-04-26 16:36 ` [PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values Carl Love
  2021-04-27 23:46   ` will schmidt
  2021-04-28 17:39   ` [PATCH ] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
@ 2021-06-07 18:49   ` Segher Boessenkool
  2 siblings, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-06-07 18:49 UTC (permalink / raw)
  To: Carl Love; +Cc: will schmidt, dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

On Mon, Apr 26, 2021 at 09:36:33AM -0700, Carl Love wrote:
> This patch adds support for converting to/from 128-bit integers and
> 128-bit decimal floating point formats using the new P10 instructions
> dcffixqq and dctfixqq.  The new instructions are only used on P10 HW,
> otherwise the conversions continue to use the existing SW routines.
> 
> The files fixkfti-sw.c and fixunskfti-sw.c are renamed versions of
> fixkfti.c and fixunskfti.c respectively.  The function names in the
> files were updated with the rename as well as some white spaces fixes.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6421,6 +6421,42 @@
>     xscvsxddp %x0,%x1"
>    [(set_attr "type" "fp")])
>  
> +(define_insn "floatti<mode>2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +       (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]

Broken indent?  It should be indented by 8 spaces, thus, a tab.  (More of
this further on, please fix all).


It isn't clear to me why you need the separate *_sw and *_hw names, or
if it is just to make it clearer, or maybe something else?  Some words
here would have helped :-)


Okay for trunk.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH ] RS6000 Add 128-bit Binary Integer sign extend operations
  2021-04-28 17:39   ` [PATCH ] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
@ 2021-06-07 19:03     ` Segher Boessenkool
  0 siblings, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-06-07 19:03 UTC (permalink / raw)
  To: Carl Love; +Cc: will schmidt, dje.gcc, gcc-patches, Bill Schmidt, Peter Bergner

Hi Carl,

On Wed, Apr 28, 2021 at 10:39:14AM -0700, Carl Love wrote:
> The agreement for the sign extension builtin was to just make it Endian
> aware rather then go with a more complex definition.  The prior patch
> has been updated with this new functionality.
> 
> This patch adds support for the 128-bit extension instruction and
> corresponding builtin support for the various sign extensions.
> 
> This was originally part of the Add 128-bit Integer operations patch
> series.  The patch logically goes with the earlier 5 patch series.

But it has nothing to do with patch 5/5, so it confused me that you
posted it as a reply to that.

> +(define_expand "vsignextend_v2di_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> +	(unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
> +		     UNSPEC_VSX_SIGN_EXTEND))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    {
> +       rtx tmp = gen_reg_rtx (V2DImode);
> +
> +       emit_insn (gen_altivec_vrevev2di2(tmp, operands[1]));
> +       emit_insn (gen_vsx_sign_extend_v2di_v1ti(operands[0], tmp));
> +     }
> +  else
> +    emit_insn (gen_vsx_sign_extend_v2di_v1ti(operands[0], operands[1]));
> +  DONE;
> +})

The indentation is broken -- everything should go by two columns.

In cases like this where the pattern in the expand agrees with a
define_insn you have, you can write just

{
  if (BYTES_BIG_ENDIAN)
    {
      ...
      DONE;
    }
}

This is the normal way of writing it.  When a define_expand does not
call DONE or FAIL, the pattern is inserted in the insn stream.


Okay for trunk with the indents fixed, and maybe the expand thing.  Also
okay for GCC 11 if it survived testing on all targets and OSes.  That
goes for all patches in this series btw.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-06-07 19:04 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <d660f874049b7fdb338ff33478c56b2259828ab1.camel@us.ibm.com>
2021-04-26 16:35 ` [PATCH 0/5 ver4] RS6000: Add 128-bit Integer Operations Carl Love
2021-04-26 16:35 ` [PATCH 1/5 " Carl Love
2021-04-27 23:46   ` will schmidt
2021-04-28 17:58     ` Carl Love
2021-05-26 21:03   ` Segher Boessenkool
2021-04-26 16:36 ` [PATCH 2/5 " Carl Love
2021-04-27 23:46   ` will schmidt
2021-06-04 20:20     ` Segher Boessenkool
2021-06-04 23:14   ` Segher Boessenkool
2021-04-26 16:36 ` [PATCH 3/5 ver4] RS6000: Add TI to TD (128-bit DFP) and TD to TI support Carl Love
2021-04-27 23:46   ` will schmidt
2021-06-04 23:55   ` Segher Boessenkool
2021-04-26 16:36 ` [PATCH 4/5 ver4] RS6000, Add test 128-bit shifts for just the int128 type Carl Love
2021-04-27 23:46   ` will schmidt
2021-06-05  0:03   ` Segher Boessenkool
2021-04-26 16:36 ` [PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values Carl Love
2021-04-27 23:46   ` will schmidt
2021-04-28 17:39   ` [PATCH ] RS6000 Add 128-bit Binary Integer sign extend operations Carl Love
2021-06-07 19:03     ` Segher Boessenkool
2021-06-07 18:49   ` [PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).