public inbox for gcc-patches@gcc.gnu.org
* [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction
@ 2014-08-19 10:44 Alan Lawrence
  2014-08-19 13:43 ` [PATCH AArch64 2/2] Remove vector compare/tst __builtins Alan Lawrence
  2014-09-02 15:17 ` [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction Marcus Shawcroft
  0 siblings, 2 replies; 8+ messages in thread
From: Alan Lawrence @ 2014-08-19 10:44 UTC (permalink / raw)
  To: gcc-patches


Vector comparisons are sometimes generated with needless 'not' instructions, and 
'cmtst' is generally not output at all. This patch makes 
gen_aarch64_vcond_internal more intelligent about swapping the operands to both 
the comparison and the conditional move, such that 'not' is avoided where 
possible. Also update the 'tst' pattern to reflect that RTX (ne ...) is no 
longer generated [and (not (neg (eq ...))) is simplify_rtx'd to 
(plus (eq ...) -1), since NOT m == -m - 1 elementwise].
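
As a concrete (and hypothetical - it is not part of the patch) illustration 
of the intended effect, a gcc-vector-extension bit test such as the following 
should now compile to a single cmtst, where previously it produced cmeq 
followed by not:

#include <arm_neon.h>

/* Hypothetical example: the (x & y) != 0 comparison is expanded through
   aarch64_vcond_internal and should now be matched by the rewritten
   cmtst pattern.  */
uint32x4_t
bit_test (uint32x4_t x, uint32x4_t y)
{
  return (uint32x4_t) ((x & y) != 0);
}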

New tests are written in terms of the Neon intrinsics - so not 100% 
exhaustive - but the second patch will rewrite the Neon intrinsics in terms 
of a more comprehensive set of gcc-vector-extension comparisons.
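
For reference, each test wrapper is a thin shim around a single intrinsic 
call; e.g. in int_comparisons.x below, OP2 (32x4, tst, uint, q_u32, 
DONT_FORCE) expands to roughly:

uint32x4_t
test_vtstq_u32 (uint32x4_t a, uint32x4_t b)
{
  uint32x4_t res;
  res = vtstq_u32 (a, b);
  return res;
}

(DONT_FORCE expands to nothing; FORCE_SIMD instead inserts inline asm to 
keep 64-bit scalar operands in SIMD registers.)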

Bootstrapped on aarch64-none-linux-gnu and cross-tested check-gcc on 
aarch64-none-elf and aarch64_be-none-elf.

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.c (aarch64_types_cmtst_qualifiers,
	TYPES_TST): Define.
	(aarch64_fold_builtin): Update pattern for cmtst.

	* config/aarch64/aarch64-protos.h (aarch64_const_vec_all_same_int_p):
	Declare.

	* config/aarch64/aarch64-simd-builtins.def (cmtst): Update qualifiers.

	* config/aarch64/aarch64-simd.md (aarch64_vcond_internal<mode><mode>):
	Switch operands, separate out more cases, refactor.

	(aarch64_cmtst<mode>): Rewrite pattern to match (plus ... -1).

	* config/aarch64/aarch64.c (aarch64_const_vec_all_same_int_p): Take
	single argument; rename old version to...
	(aarch64_const_vec_all_same_in_range_p): ...this.
	(aarch64_print_operand, aarch64_simd_shift_imm_p): Follow renaming.

	* config/aarch64/predicates.md (aarch64_simd_imm_minus_one): Define.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/int_comparisons.x: New file.
	* gcc.target/aarch64/simd/int_comparisons_1.c: New test.
	* gcc.target/aarch64/simd/int_comparisons_2.c: Ditto.

[-- Attachment #2: vcond_pats.patch --]

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 5217f4a..4fb8ec0 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -146,6 +146,11 @@ aarch64_types_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_maybe_immediate };
 #define TYPES_BINOP (aarch64_types_binop_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_cmtst_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none,
+      qualifier_internal, qualifier_internal };
+#define TYPES_TST (aarch64_types_cmtst_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_binopv_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_none, qualifier_none };
 #define TYPES_BINOPV (aarch64_types_binopv_qualifiers)
@@ -1297,7 +1302,7 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args,
       BUILTIN_VALLDI (BINOP, cmeq, 0)
 	return fold_build2 (EQ_EXPR, type, args[0], args[1]);
 	break;
-      BUILTIN_VSDQ_I_DI (BINOP, cmtst, 0)
+      BUILTIN_VSDQ_I_DI (TST, cmtst, 0)
 	{
 	  tree and_node = fold_build2 (BIT_AND_EXPR, type, args[0], args[1]);
 	  tree vec_zero_node = build_zero_cst (type);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index cca3bc9..5c8013d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -179,6 +179,7 @@ bool aarch64_cannot_change_mode_class (enum machine_mode,
 				       enum reg_class);
 enum aarch64_symbol_type
 aarch64_classify_symbolic_expression (rtx, enum aarch64_symbol_context);
+bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
 bool aarch64_constant_address_p (rtx);
 bool aarch64_expand_movmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 4f3bd12..6aa45b6 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -246,7 +246,7 @@
   /* Implemented by aarch64_cm<cmp><mode>.  */
   BUILTIN_VSDQ_I_DI (BINOP, cmgeu, 0)
   BUILTIN_VSDQ_I_DI (BINOP, cmgtu, 0)
-  BUILTIN_VSDQ_I_DI (BINOP, cmtst, 0)
+  BUILTIN_VSDQ_I_DI (TST, cmtst, 0)
 
   /* Implemented by reduc_<sur>plus_<mode>.  */
   BUILTIN_VALL (UNOP, reduc_splus_, 10)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f5fa4ae..4d5d840 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1871,58 +1871,94 @@
 	  (match_operand:VDQ 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  int inverse = 0, has_zero_imm_form = 0;
   rtx op1 = operands[1];
   rtx op2 = operands[2];
   rtx mask = gen_reg_rtx (<MODE>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  /* Switching OP1 and OP2 is necessary for NE (to output a cmeq insn),
+     and desirable for other comparisons if it results in FOO ? -1 : 0
+     (this allows direct use of the comparison result without a bsl).  */
+  if (code == NE
+      || (code != EQ
+	  && op1 == CONST0_RTX (<V_cmp_result>mode)
+	  && op2 == CONSTM1_RTX (<V_cmp_result>mode)))
+    {
+      op1 = operands[2];
+      op2 = operands[1];
+      switch (code)
+        {
+        case LE: code = GT; break;
+        case LT: code = GE; break;
+        case GE: code = LT; break;
+        case GT: code = LE; break;
+        /* No case EQ.  */
+        case NE: code = EQ; break;
+        case LTU: code = GEU; break;
+        case LEU: code = GTU; break;
+        case GTU: code = LEU; break;
+        case GEU: code = LTU; break;
+        default: gcc_unreachable ();
+        }
+    }
 
-  switch (GET_CODE (operands[3]))
+  /* Make sure we can handle the last operand.  */
+  switch (code)
     {
+    case NE:
+      /* Normalized to EQ above.  */
+      gcc_unreachable ();
+
     case LE:
     case LT:
-    case NE:
-      inverse = 1;
-      /* Fall through.  */
     case GE:
     case GT:
     case EQ:
-      has_zero_imm_form = 1;
-      break;
-    case LEU:
-    case LTU:
-      inverse = 1;
-      break;
+      /* These instructions have a form taking an immediate zero.  */
+      if (operands[5] == CONST0_RTX (<MODE>mode))
+        break;
+      /* Fall through, as may need to load into register.  */
     default:
+      if (!REG_P (operands[5]))
+        operands[5] = force_reg (<MODE>mode, operands[5]);
       break;
     }
 
-  if (!REG_P (operands[5])
-      && (operands[5] != CONST0_RTX (<MODE>mode) || !has_zero_imm_form))
-    operands[5] = force_reg (<MODE>mode, operands[5]);
-
-  switch (GET_CODE (operands[3]))
+  switch (code)
     {
     case LT:
+      emit_insn (gen_aarch64_cmlt<mode> (mask, operands[4], operands[5]));
+      break;
+
     case GE:
       emit_insn (gen_aarch64_cmge<mode> (mask, operands[4], operands[5]));
       break;
 
     case LE:
+      emit_insn (gen_aarch64_cmle<mode> (mask, operands[4], operands[5]));
+      break;
+
     case GT:
       emit_insn (gen_aarch64_cmgt<mode> (mask, operands[4], operands[5]));
       break;
 
     case LTU:
+      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[5], operands[4]));
+      break;
+
     case GEU:
       emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[4], operands[5]));
       break;
 
     case LEU:
+      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[5], operands[4]));
+      break;
+
     case GTU:
       emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[4], operands[5]));
       break;
 
-    case NE:
+    /* NE has been normalized to EQ above.  */
     case EQ:
       emit_insn (gen_aarch64_cmeq<mode> (mask, operands[4], operands[5]));
       break;
@@ -1931,12 +1967,6 @@
       gcc_unreachable ();
     }
 
-  if (inverse)
-    {
-      op1 = operands[2];
-      op2 = operands[1];
-    }
-
     /* If we have (a = (b CMP c) ? -1 : 0);
        Then we can simply move the generated mask.  */
 
@@ -3891,14 +3921,22 @@
 
 ;; cmtst
 
+;; Although neg (ne (and x y) 0) is the natural way of expressing a cmtst,
+;; we don't have any insns using ne, and aarch64_vcond_internal outputs
+;; not (neg (eq (and x y) 0))
+;; which is rewritten by simplify_rtx as
+;; plus (eq (and x y) 0) -1.
+
 (define_insn "aarch64_cmtst<mode>"
   [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w")
-	(neg:<V_cmp_result>
-	  (ne:<V_cmp_result>
+	(plus:<V_cmp_result>
+	  (eq:<V_cmp_result>
 	    (and:VDQ
 	      (match_operand:VDQ 1 "register_operand" "w")
 	      (match_operand:VDQ 2 "register_operand" "w"))
-	    (vec_duplicate:<V_cmp_result> (const_int 0)))))]
+	    (match_operand:VDQ 3 "aarch64_simd_imm_zero"))
+	  (match_operand:<V_cmp_result> 4 "aarch64_simd_imm_minus_one")))
+  ]
   "TARGET_SIMD"
   "cmtst\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
   [(set_attr "type" "neon_tst<q>")]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e7946fc..6a877c2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -137,9 +137,6 @@ static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED;
 static void aarch64_override_options_after_change (void);
 static bool aarch64_vector_mode_supported_p (enum machine_mode);
 static unsigned bit_count (unsigned HOST_WIDE_INT);
-static bool aarch64_const_vec_all_same_int_p (rtx,
-					      HOST_WIDE_INT, HOST_WIDE_INT);
-
 static bool aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode,
 						 const unsigned char *sel);
 static int aarch64_address_cost (rtx, enum machine_mode, addr_space_t, bool);
@@ -3679,6 +3676,36 @@ aarch64_get_condition_code (rtx x)
     }
 }
 
+bool
+aarch64_const_vec_all_same_in_range_p (rtx x,
+				  HOST_WIDE_INT minval,
+				  HOST_WIDE_INT maxval)
+{
+  HOST_WIDE_INT firstval;
+  int count, i;
+
+  if (GET_CODE (x) != CONST_VECTOR
+      || GET_MODE_CLASS (GET_MODE (x)) != MODE_VECTOR_INT)
+    return false;
+
+  firstval = INTVAL (CONST_VECTOR_ELT (x, 0));
+  if (firstval < minval || firstval > maxval)
+    return false;
+
+  count = CONST_VECTOR_NUNITS (x);
+  for (i = 1; i < count; i++)
+    if (INTVAL (CONST_VECTOR_ELT (x, i)) != firstval)
+      return false;
+
+  return true;
+}
+
+bool
+aarch64_const_vec_all_same_int_p (rtx x, HOST_WIDE_INT val)
+{
+  return aarch64_const_vec_all_same_in_range_p (x, val, val);
+}
+
 static unsigned
 bit_count (unsigned HOST_WIDE_INT value)
 {
@@ -3921,9 +3948,10 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	case CONST_VECTOR:
 	  if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
 	    {
-	      gcc_assert (aarch64_const_vec_all_same_int_p (x,
-							    HOST_WIDE_INT_MIN,
-							    HOST_WIDE_INT_MAX));
+	      gcc_assert (
+		  aarch64_const_vec_all_same_in_range_p (x,
+							 HOST_WIDE_INT_MIN,
+							 HOST_WIDE_INT_MAX));
 	      asm_fprintf (f, "%wd", INTVAL (CONST_VECTOR_ELT (x, 0)));
 	    }
 	  else if (aarch64_simd_imm_zero_p (x, GET_MODE (x)))
@@ -7826,39 +7854,15 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse,
 #undef CHECK
 }
 
-static bool
-aarch64_const_vec_all_same_int_p (rtx x,
-				  HOST_WIDE_INT minval,
-				  HOST_WIDE_INT maxval)
-{
-  HOST_WIDE_INT firstval;
-  int count, i;
-
-  if (GET_CODE (x) != CONST_VECTOR
-      || GET_MODE_CLASS (GET_MODE (x)) != MODE_VECTOR_INT)
-    return false;
-
-  firstval = INTVAL (CONST_VECTOR_ELT (x, 0));
-  if (firstval < minval || firstval > maxval)
-    return false;
-
-  count = CONST_VECTOR_NUNITS (x);
-  for (i = 1; i < count; i++)
-    if (INTVAL (CONST_VECTOR_ELT (x, i)) != firstval)
-      return false;
-
-  return true;
-}
-
 /* Check of immediate shift constants are within range.  */
 bool
 aarch64_simd_shift_imm_p (rtx x, enum machine_mode mode, bool left)
 {
   int bit_width = GET_MODE_UNIT_SIZE (mode) * BITS_PER_UNIT;
   if (left)
-    return aarch64_const_vec_all_same_int_p (x, 0, bit_width - 1);
+    return aarch64_const_vec_all_same_in_range_p (x, 0, bit_width - 1);
   else
-    return aarch64_const_vec_all_same_int_p (x, 1, bit_width);
+    return aarch64_const_vec_all_same_in_range_p (x, 1, bit_width);
 }
 
 /* Return true if X is a uniform vector where all elements
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 3dd83ca..18133eb 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -261,3 +261,9 @@
 {
   return aarch64_simd_imm_zero_p (op, mode);
 })
+
+(define_special_predicate "aarch64_simd_imm_minus_one"
+  (match_code "const_vector")
+{
+  return aarch64_const_vec_all_same_int_p (op, -1);
+})
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons.x b/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons.x
new file mode 100644
index 0000000..3b468eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons.x
@@ -0,0 +1,68 @@
+/*  test_vcXXX wrappers for all the vcXXX (vector compare) and vtst intrinsics
+    in arm_neon.h (excluding the 64x1 variants as these generally produce scalar
+    not vector ops).  */
+#include "arm_neon.h"
+
+#define DONT_FORCE(X)
+
+#define FORCE_SIMD(V1)   asm volatile ("mov %d0, %1.d[0]"       \
+           : "=w"(V1)                                           \
+           : "w"(V1)                                            \
+           : /* No clobbers */);
+
+#define OP1(SIZE, OP, BASETYPE, SUFFIX, FORCE) uint##SIZE##_t	\
+test_v##OP##SUFFIX (BASETYPE##SIZE##_t a)			\
+{								\
+  uint##SIZE##_t res;						\
+  FORCE (a);							\
+  res = v##OP##SUFFIX (a);					\
+  FORCE (res);							\
+  return res;							\
+}
+
+#define OP2(SIZE, OP, BASETYPE, SUFFIX, FORCE) uint##SIZE##_t	\
+test_v##OP##SUFFIX (BASETYPE##SIZE##_t a, BASETYPE##SIZE##_t b) \
+{								\
+  uint##SIZE##_t res;						\
+  FORCE (a);							\
+  FORCE (b);							\
+  res = v##OP##SUFFIX (a, b);					\
+  FORCE (res);							\
+  return res;							\
+}
+
+#define UNSIGNED_OPS(SIZE, BASETYPE, SUFFIX, FORCE) \
+OP2 (SIZE, tst, BASETYPE, SUFFIX, FORCE) \
+OP1 (SIZE, ceqz, BASETYPE, SUFFIX, FORCE) \
+OP2 (SIZE, ceq, BASETYPE, SUFFIX, FORCE) \
+OP2 (SIZE, cge, BASETYPE, SUFFIX, FORCE) \
+OP2 (SIZE, cgt, BASETYPE, SUFFIX, FORCE) \
+OP2 (SIZE, cle, BASETYPE, SUFFIX, FORCE) \
+OP2 (SIZE, clt, BASETYPE, SUFFIX, FORCE)
+
+#define ALL_OPS(SIZE, BASETYPE, SUFFIX, FORCE) \
+OP1 (SIZE, cgez, BASETYPE, SUFFIX, FORCE) \
+OP1 (SIZE, cgtz, BASETYPE, SUFFIX, FORCE) \
+OP1 (SIZE, clez, BASETYPE, SUFFIX, FORCE) \
+OP1 (SIZE, cltz, BASETYPE, SUFFIX, FORCE) \
+UNSIGNED_OPS (SIZE, BASETYPE, SUFFIX, FORCE)
+
+ALL_OPS (8x8, int, _s8, DONT_FORCE)
+ALL_OPS (16x4, int, _s16, DONT_FORCE)
+ALL_OPS (32x2, int, _s32, DONT_FORCE)
+ALL_OPS (64x1, int, _s64, DONT_FORCE)
+ALL_OPS (64, int, d_s64, FORCE_SIMD)
+ALL_OPS (8x16, int, q_s8, DONT_FORCE)
+ALL_OPS (16x8, int, q_s16, DONT_FORCE)
+ALL_OPS (32x4, int, q_s32, DONT_FORCE)
+ALL_OPS (64x2, int, q_s64, DONT_FORCE)
+UNSIGNED_OPS (8x8, uint, _u8, DONT_FORCE)
+UNSIGNED_OPS (16x4, uint, _u16, DONT_FORCE)
+UNSIGNED_OPS (32x2, uint, _u32, DONT_FORCE)
+UNSIGNED_OPS (64x1, uint, _u64, DONT_FORCE)
+UNSIGNED_OPS (64, uint, d_u64, FORCE_SIMD)
+UNSIGNED_OPS (8x16, uint, q_u8, DONT_FORCE)
+UNSIGNED_OPS (16x8, uint, q_u16, DONT_FORCE)
+UNSIGNED_OPS (32x4, uint, q_u32, DONT_FORCE)
+UNSIGNED_OPS (64x2, uint, q_u64, DONT_FORCE)
+
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_1.c b/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_1.c
new file mode 100644
index 0000000..86c6ed2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_1.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-inline" } */
+
+/* Scan-assembler test, so, incorporate as little other code as possible.  */
+
+#include "arm_neon.h"
+#include "int_comparisons.x"
+
+/* Operations on all 18 integer types:  (q?)_[su](8|16|32|64), d_[su]64.
+   (d?)_[us]64 generate regs of form 'd0' rather than e.g. 'v0.2d'.  */
+/* { dg-final { scan-assembler-times "\[ \t\]cmeq\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*#?0" 14 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmeq\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?0" 4 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmeq\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\]" 14 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmeq\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]+d\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmtst\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\]" 14 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmtst\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]+d\[0-9\]+" 4 } } */
+
+/* vcge + vcle both implemented with cmge (signed) or cmhs (unsigned).  */
+/* { dg-final { scan-assembler-times "\[ \t\]cmge\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\]" 14 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmge\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]+d\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmhs\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\]" 14 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmhs\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]+d\[0-9\]+" 4 } } */
+
+/* vcgt + vclt both implemented with cmgt (signed) or cmhi (unsigned).  */
+/* { dg-final { scan-assembler-times "\[ \t\]cmgt\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\]" 14 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmgt\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]+d\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmhi\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\]" 14 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmhi\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]+d\[0-9\]+" 4 } } */
+
+/* Comparisons against immediate zero, on the 8 signed integer types only.  */
+
+/* { dg-final { scan-assembler-times "\[ \t\]cmge\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*#?0" 7 } } */
+/*  For int64_t and int64x1_t, combine_simplify_rtx failure of
+    https://gcc.gnu.org/ml/gcc/2014-06/msg00253.html
+    prevents generation of cmge....#0, instead producing mvn + sshr.  */
+/* { #dg-final { scan-assembler-times "\[ \t\]cmge\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?0" 2 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmgt\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*#?0" 7 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmgt\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?0" 2 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmle\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*#?0" 7 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmle\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?0" 2 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmlt\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*#?0" 7 } } */
+/* For int64_t and int64x1_t, cmlt ... #0 and sshr ... #63 are equivalent,
+   so allow either.  cmgez issue above results in extra 2 * sshr....63.  */
+/* { dg-final { scan-assembler-times "\[ \t\](?:cmlt|sshr)\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?(?:0|63)" 4 } } */
+
+// All should have been compiled into single insns without inverting result:
+/* { dg-final { scan-assembler-not "not" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_2.c b/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_2.c
new file mode 100644
index 0000000..3588231
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_2.c
@@ -0,0 +1,131 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline" } */
+/* Stops the test_xxx methods being inlined into main, thus preventing constant
+   propagation.  */
+
+#include "int_comparisons.x"
+
+extern void abort (void);
+
+#define CHECK2(R0, R1) if (res[0] != R0 || res[1] != R1) abort ()
+
+#define TEST2(BASETYPE, SUFFIX, RESTYPE, ST1_SUFFIX) {			\
+  BASETYPE##_t _a[2] = {2, 3};						\
+  BASETYPE##x2_t a = vld1##SUFFIX (_a);					\
+  BASETYPE##_t _b[2] = {1, 3};						\
+  BASETYPE##x2_t b = vld1##SUFFIX (_b);					\
+  RESTYPE res[2];							\
+  vst1##ST1_SUFFIX (res, test_vclt##SUFFIX (a, b)); CHECK2 (0, 0);	\
+  vst1##ST1_SUFFIX (res, test_vclt##SUFFIX (b, a)); CHECK2 (-1, 0);	\
+  vst1##ST1_SUFFIX (res, test_vcle##SUFFIX (a, b)); CHECK2 (0, -1);	\
+  vst1##ST1_SUFFIX (res, test_vcle##SUFFIX (b, a)); CHECK2 (-1, -1);	\
+  vst1##ST1_SUFFIX (res, test_vceq##SUFFIX (a, b)); CHECK2 (0, -1);	\
+  vst1##ST1_SUFFIX (res, test_vcge##SUFFIX (a, b)); CHECK2 (-1, -1);	\
+  vst1##ST1_SUFFIX (res, test_vcge##SUFFIX (b, a)); CHECK2 (0, -1);	\
+  vst1##ST1_SUFFIX (res, test_vcgt##SUFFIX (a, b)); CHECK2 (-1, 0);	\
+  vst1##ST1_SUFFIX (res, test_vcgt##SUFFIX (b, a)); CHECK2 (0, 0);	\
+  vst1##ST1_SUFFIX (res, test_vtst##SUFFIX (a, b)); CHECK2 (0, -1);	\
+  vst1##ST1_SUFFIX (res, test_vtst##SUFFIX (a + 1, b)); CHECK2 (-1, 0); \
+}
+
+#define CHECK4(T, R0, R1, R2, R3)		\
+  if (res[0] != (T)R0 || res[1] != (T)R1	\
+      || res[2] != (T)R2 || res[3] != (T)R3) abort ()
+
+#define TEST4(BASETYPE, SUFFIX, RESTYPE, ST1_SUFFIX) {	\
+  BASETYPE##_t _a[4] = {1, 2, 3, 4};			\
+  BASETYPE##x4_t a = vld1##SUFFIX (_a);			\
+  BASETYPE##_t _b[4] = {4, 2, 1, 3};			\
+  BASETYPE##x4_t b = vld1##SUFFIX (_b);			\
+  RESTYPE res[4];					\
+  vst1##ST1_SUFFIX (res, test_vclt##SUFFIX (a, b));	\
+  CHECK4 (RESTYPE, -1, 0, 0, 0);			\
+  vst1##ST1_SUFFIX (res, test_vcle##SUFFIX (a, b));	\
+  CHECK4 (RESTYPE, -1, -1, 0, 0);			\
+  vst1##ST1_SUFFIX (res, test_vceq##SUFFIX (a, b));	\
+  CHECK4 (RESTYPE, 0, -1, 0, 0);			\
+  vst1##ST1_SUFFIX (res, test_vcge##SUFFIX (a, b));	\
+  CHECK4 (RESTYPE, 0, -1, -1, -1);			\
+  vst1##ST1_SUFFIX (res, test_vcgt##SUFFIX (a, b));	\
+  CHECK4 (RESTYPE, 0, 0, -1, -1);			\
+  vst1##ST1_SUFFIX (res, test_vtst##SUFFIX (a, b));	\
+  CHECK4 (RESTYPE, 0, -1, -1, 0);			\
+}
+
+#define CHECK8(T, R0, R1, R2, R3, R4, R5, R6, R7)			       \
+  if (res[0] != (T)R0 || res[1] != (T)R1 || res[2] != (T)R2 || res[3] != (T)R3 \
+      || res[4] != (T)R4 || res[5] != (T)R5 || res[6] != (T)R6		       \
+      || res[7] != (T)R7) abort ()
+
+#define TEST8(BASETYPE, SUFFIX, RESTYPE, ST1_SUFFIX) {	\
+  BASETYPE##_t _a[8] = {1, 2, 3, 4, 5, 6, 7, 8};	\
+  BASETYPE##x8_t a = vld1##SUFFIX (_a);			\
+  BASETYPE##_t _b[8] = {4, 2, 1, 3, 2, 6, 8, 9};	\
+  BASETYPE##x8_t b = vld1##SUFFIX (_b);			\
+  RESTYPE res[8];					\
+  vst1##ST1_SUFFIX (res, test_vclt##SUFFIX (a, b));	\
+  CHECK8 (RESTYPE, -1, 0, 0, 0, 0, 0, -1, -1);		\
+  vst1##ST1_SUFFIX (res, test_vcle##SUFFIX (a, b));	\
+  CHECK8 (RESTYPE, -1, -1, 0, 0, 0, -1, -1, -1);	\
+  vst1##ST1_SUFFIX (res, test_vceq##SUFFIX (a, b));	\
+  CHECK8 (RESTYPE, 0, -1, 0, 0, 0, -1, 0, 0);		\
+  vst1##ST1_SUFFIX (res, test_vcge##SUFFIX (a, b));	\
+  CHECK8 (RESTYPE, 0, -1, -1, -1, -1, -1, 0, 0);	\
+  vst1##ST1_SUFFIX (res, test_vcgt##SUFFIX (a, b));	\
+  CHECK8 (RESTYPE, 0, 0, -1, -1, -1, 0, 0, 0);		\
+  vst1##ST1_SUFFIX (res, test_vtst##SUFFIX (a, b));	\
+  CHECK8 (RESTYPE, 0, -1, -1, 0, 0, -1, 0, -1);		\
+}
+
+/* 16-way tests use same 8 values twice.  */
+#define CHECK16(T, R0, R1, R2, R3, R4, R5, R6, R7)			       \
+  if (res[0] != (T)R0 || res[1] != (T)R1 || res[2] != (T)R2 || res[3] != (T)R3 \
+      || res[4] != (T)R4 || res[5] != (T)R5 || res[6] != (T)R6		       \
+      || res[7] != (T)R7 || res[8] != (T)R0 || res[9] != (T)R1		       \
+      || res[10] != (T)R2 || res[11] != (T)R3 || res[12] != (T)R4	       \
+      || res[13] != (T)R5 || res[14] != (T)R6 || res[15] != (T)R7) abort ()
+
+#define TEST16(BASETYPE, SUFFIX, RESTYPE, ST1_SUFFIX) {			  \
+  BASETYPE##_t _a[16] = {1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8}; \
+  BASETYPE##x16_t a = vld1##SUFFIX (_a);				  \
+  BASETYPE##_t _b[16] = {4, 2, 1, 3, 2, 6, 8, 9, 4, 2, 1, 3, 2, 6, 8, 9}; \
+  BASETYPE##x16_t b = vld1##SUFFIX (_b);				  \
+  RESTYPE res[16];							  \
+  vst1##ST1_SUFFIX (res, test_vclt##SUFFIX (a, b));			  \
+  CHECK16 (RESTYPE, -1, 0, 0, 0, 0, 0, -1, -1);				  \
+  vst1##ST1_SUFFIX (res, test_vcle##SUFFIX (a, b));			  \
+  CHECK16 (RESTYPE, -1, -1, 0, 0, 0, -1, -1, -1);			  \
+  vst1##ST1_SUFFIX (res, test_vceq##SUFFIX (a, b));			  \
+  CHECK16 (RESTYPE, 0, -1, 0, 0, 0, -1, 0, 0);				  \
+  vst1##ST1_SUFFIX (res, test_vcge##SUFFIX (a, b));			  \
+  CHECK16 (RESTYPE, 0, -1, -1, -1, -1, -1, 0, 0);			  \
+  vst1##ST1_SUFFIX (res, test_vcgt##SUFFIX (a, b));			  \
+  CHECK16 (RESTYPE, 0, 0, -1, -1, -1, 0, 0, 0);				  \
+  vst1##ST1_SUFFIX (res, test_vtst##SUFFIX (a, b));			  \
+  CHECK16 (RESTYPE, 0, -1, -1, 0, 0, -1, 0, -1);			  \
+}
+
+int
+main (int argc, char **argv)
+{
+  TEST2 (int32, _s32, uint32_t, _u32);
+  TEST2 (uint32, _u32, uint32_t, _u32);
+  TEST2 (int64, q_s64, uint64_t, q_u64);
+  TEST2 (uint64, q_u64, uint64_t, q_u64);
+
+  TEST4 (int16, _s16, uint16_t, _u16);
+  TEST4 (uint16, _u16, uint16_t, _u16);
+  TEST4 (int32, q_s32, uint32_t, q_u32);
+  TEST4 (uint32, q_u32, uint32_t, q_u32);
+
+  TEST8 (int8, _s8, uint8_t, _u8);
+  TEST8 (uint8, _u8, uint8_t, _u8);
+  TEST8 (int16, q_s16, uint16_t, q_u16);
+  TEST8 (uint16, q_u16, uint16_t, q_u16);
+
+  TEST16 (int8, q_s8, uint8_t, q_u8);
+  TEST16 (uint8, q_u8, uint8_t, q_u8);
+
+  return 0;
+}
+


* [PATCH AArch64 2/2] Remove vector compare/tst __builtins
  2014-08-19 10:44 [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction Alan Lawrence
@ 2014-08-19 13:43 ` Alan Lawrence
  2014-09-02 15:19   ` Marcus Shawcroft
  2014-09-02 15:17 ` [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction Marcus Shawcroft
  1 sibling, 1 reply; 8+ messages in thread
From: Alan Lawrence @ 2014-08-19 13:43 UTC (permalink / raw)
  To: gcc-patches


The vector compare intrinsics (vc[gl][et]z, vceqz, vtst) were written using 
__builtin functions because (IIUC) at the time the gcc vector extensions did 
not support comparison ops across both the C and C++ frontends. The vector 
extensions have since gained that support.
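
For example (a minimal hypothetical sketch, mirroring the arm_neon.h changes 
below): a gcc vector comparison yields an element-wise mask of all-ones 
(true) or all-zeros (false), so each intrinsic reduces to a comparison plus 
a cast to its unsigned result type:

#include <arm_neon.h>

/* Hypothetical sketch of the style the rewritten intrinsics use.  */
uint16x4_t
my_vceq_s16 (int16x4_t a, int16x4_t b)
{
  return (uint16x4_t) (a == b);
}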

Following the first patch, we now get equal or better code generation from 
using gcc vector extensions (specifically, TST instructions are generated 
again, and all NOTs are eliminated), so we can remove a bunch of code and 
builtins :).

Tested with check-gcc and check-g++ on aarch64-none-elf, aarch64.exp+simd.exp on 
aarch64_be-none-elf.

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.c (aarch64_fold_builtin): Remove code
	handling cmge, cmgt, cmeq, cmtst.

	* config/aarch64/aarch64-simd-builtins.def (cmeq, cmge, cmgt, cmle,
	cmlt, cmgeu, cmgtu, cmtst): Remove.

	* config/aarch64/arm_neon.h (vceq_*, vceqq_*, vceqz_*, vceqzq_*,
	vcge_*, vcgeq_*, vcgez_*, vcgezq_*, vcgt_*, vcgtq_*, vcgtz_*,
	vcgtzq_*, vcle_*, vcleq_*, vclez_*, vclezq_*, vclt_*, vcltq_*,
	vcltz_*, vcltzq_*, vtst_*, vtstq_*): Use gcc vector extensions.

[-- Attachment #2: rm_cmtst.patch --]

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index c3df73e..aa2c40c 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1215,22 +1215,6 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args,
       BUILTIN_VALLDI (UNOP, abs, 2)
 	return fold_build1 (ABS_EXPR, type, args[0]);
 	break;
-      BUILTIN_VALLDI (BINOP, cmge, 0)
-	return fold_build2 (GE_EXPR, type, args[0], args[1]);
-	break;
-      BUILTIN_VALLDI (BINOP, cmgt, 0)
-	return fold_build2 (GT_EXPR, type, args[0], args[1]);
-	break;
-      BUILTIN_VALLDI (BINOP, cmeq, 0)
-	return fold_build2 (EQ_EXPR, type, args[0], args[1]);
-	break;
-      BUILTIN_VSDQ_I_DI (TST, cmtst, 0)
-	{
-	  tree and_node = fold_build2 (BIT_AND_EXPR, type, args[0], args[1]);
-	  tree vec_zero_node = build_zero_cst (type);
-	  return fold_build2 (NE_EXPR, type, and_node, vec_zero_node);
-	  break;
-	}
       VAR1 (REINTERP_SS, reinterpretdi, 0, df)
       VAR1 (REINTERP_SS, reinterpretv8qi, 0, df)
       VAR1 (REINTERP_SS, reinterpretv4hi, 0, df)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index ae52469..9320e99 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -240,17 +240,6 @@
   BUILTIN_VSDQ_I (SHIFTIMM, sqshl_n, 0)
   BUILTIN_VSDQ_I (SHIFTIMM, uqshl_n, 0)
 
-  /* Implemented by aarch64_cm<cmp><mode>.  */
-  BUILTIN_VALLDI (BINOP, cmeq, 0)
-  BUILTIN_VALLDI (BINOP, cmge, 0)
-  BUILTIN_VALLDI (BINOP, cmgt, 0)
-  BUILTIN_VALLDI (BINOP, cmle, 0)
-  BUILTIN_VALLDI (BINOP, cmlt, 0)
-  /* Implemented by aarch64_cm<cmp><mode>.  */
-  BUILTIN_VSDQ_I_DI (BINOP, cmgeu, 0)
-  BUILTIN_VSDQ_I_DI (BINOP, cmgtu, 0)
-  BUILTIN_VSDQ_I_DI (TST, cmtst, 0)
-
   /* Implemented by reduc_<sur>plus_<mode>.  */
   BUILTIN_VALL (UNOP, reduc_splus_, 10)
   BUILTIN_VDQ (UNOP, reduc_uplus_, 10)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e7485f0..ea56b82 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -14632,7 +14632,7 @@ vcaltq_f64 (float64x2_t __a, float64x2_t __b)
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vceq_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmeqv2sf (__a, __b);
+  return (uint32x2_t) (__a == __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -14644,26 +14644,25 @@ vceq_f64 (float64x1_t __a, float64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vceq_p8 (poly8x8_t __a, poly8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmeqv8qi ((int8x8_t) __a,
-						 (int8x8_t) __b);
+  return (uint8x8_t) (__a == __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vceq_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmeqv8qi (__a, __b);
+  return (uint8x8_t) (__a == __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vceq_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmeqv4hi (__a, __b);
+  return (uint16x4_t) (__a == __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vceq_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmeqv2si (__a, __b);
+  return (uint32x2_t) (__a == __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -14675,22 +14674,19 @@ vceq_s64 (int64x1_t __a, int64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vceq_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmeqv8qi ((int8x8_t) __a,
-						 (int8x8_t) __b);
+  return (__a == __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vceq_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmeqv4hi ((int16x4_t) __a,
-						  (int16x4_t) __b);
+  return (__a == __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vceq_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmeqv2si ((int32x2_t) __a,
-						  (int32x2_t) __b);
+  return (__a == __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -14702,72 +14698,67 @@ vceq_u64 (uint64x1_t __a, uint64x1_t __b)
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vceqq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmeqv4sf (__a, __b);
+  return (uint32x4_t) (__a == __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vceqq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmeqv2df (__a, __b);
+  return (uint64x2_t) (__a == __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vceqq_p8 (poly8x16_t __a, poly8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmeqv16qi ((int8x16_t) __a,
-						   (int8x16_t) __b);
+  return (uint8x16_t) (__a == __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vceqq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmeqv16qi (__a, __b);
+  return (uint8x16_t) (__a == __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vceqq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmeqv8hi (__a, __b);
+  return (uint16x8_t) (__a == __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vceqq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmeqv4si (__a, __b);
+  return (uint32x4_t) (__a == __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vceqq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmeqv2di (__a, __b);
+  return (uint64x2_t) (__a == __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vceqq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmeqv16qi ((int8x16_t) __a,
-						   (int8x16_t) __b);
+  return (__a == __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vceqq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmeqv8hi ((int16x8_t) __a,
-						  (int16x8_t) __b);
+  return (__a == __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vceqq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmeqv4si ((int32x4_t) __a,
-						  (int32x4_t) __b);
+  return (__a == __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vceqq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmeqv2di ((int64x2_t) __a,
-						  (int64x2_t) __b);
+  return (__a == __b);
 }
 
 /* vceq - scalar.  */
@@ -14801,8 +14792,7 @@ vceqd_f64 (float64_t __a, float64_t __b)
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vceqz_f32 (float32x2_t __a)
 {
-  float32x2_t __b = {0.0f, 0.0f};
-  return (uint32x2_t) __builtin_aarch64_cmeqv2sf (__a, __b);
+  return (uint32x2_t) (__a == 0.0f);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -14814,30 +14804,25 @@ vceqz_f64 (float64x1_t __a)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vceqz_p8 (poly8x8_t __a)
 {
-  poly8x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x8_t) __builtin_aarch64_cmeqv8qi ((int8x8_t) __a,
-						 (int8x8_t) __b);
+  return (uint8x8_t) (__a == 0);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vceqz_s8 (int8x8_t __a)
 {
-  int8x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x8_t) __builtin_aarch64_cmeqv8qi (__a, __b);
+  return (uint8x8_t) (__a == 0);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vceqz_s16 (int16x4_t __a)
 {
-  int16x4_t __b = {0, 0, 0, 0};
-  return (uint16x4_t) __builtin_aarch64_cmeqv4hi (__a, __b);
+  return (uint16x4_t) (__a == 0);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vceqz_s32 (int32x2_t __a)
 {
-  int32x2_t __b = {0, 0};
-  return (uint32x2_t) __builtin_aarch64_cmeqv2si (__a, __b);
+  return (uint32x2_t) (__a == 0);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -14849,25 +14834,19 @@ vceqz_s64 (int64x1_t __a)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vceqz_u8 (uint8x8_t __a)
 {
-  uint8x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x8_t) __builtin_aarch64_cmeqv8qi ((int8x8_t) __a,
-						 (int8x8_t) __b);
+  return (__a == 0);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vceqz_u16 (uint16x4_t __a)
 {
-  uint16x4_t __b = {0, 0, 0, 0};
-  return (uint16x4_t) __builtin_aarch64_cmeqv4hi ((int16x4_t) __a,
-						  (int16x4_t) __b);
+  return (__a == 0);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vceqz_u32 (uint32x2_t __a)
 {
-  uint32x2_t __b = {0, 0};
-  return (uint32x2_t) __builtin_aarch64_cmeqv2si ((int32x2_t) __a,
-						  (int32x2_t) __b);
+  return (__a == 0);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -14879,86 +14858,67 @@ vceqz_u64 (uint64x1_t __a)
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vceqzq_f32 (float32x4_t __a)
 {
-  float32x4_t __b = {0.0f, 0.0f, 0.0f, 0.0f};
-  return (uint32x4_t) __builtin_aarch64_cmeqv4sf (__a, __b);
+  return (uint32x4_t) (__a == 0.0f);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vceqzq_f64 (float64x2_t __a)
 {
-  float64x2_t __b = {0.0, 0.0};
-  return (uint64x2_t) __builtin_aarch64_cmeqv2df (__a, __b);
+  return (uint64x2_t) (__a == 0.0f);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vceqzq_p8 (poly8x16_t __a)
 {
-  poly8x16_t __b = {0, 0, 0, 0, 0, 0, 0, 0,
-		    0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x16_t) __builtin_aarch64_cmeqv16qi ((int8x16_t) __a,
-						   (int8x16_t) __b);
+  return (uint8x16_t) (__a == 0);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vceqzq_s8 (int8x16_t __a)
 {
-  int8x16_t __b = {0, 0, 0, 0, 0, 0, 0, 0,
-		   0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x16_t) __builtin_aarch64_cmeqv16qi (__a, __b);
+  return (uint8x16_t) (__a == 0);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vceqzq_s16 (int16x8_t __a)
 {
-  int16x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint16x8_t) __builtin_aarch64_cmeqv8hi (__a, __b);
+  return (uint16x8_t) (__a == 0);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vceqzq_s32 (int32x4_t __a)
 {
-  int32x4_t __b = {0, 0, 0, 0};
-  return (uint32x4_t) __builtin_aarch64_cmeqv4si (__a, __b);
+  return (uint32x4_t) (__a == 0);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vceqzq_s64 (int64x2_t __a)
 {
-  int64x2_t __b = {0, 0};
-  return (uint64x2_t) __builtin_aarch64_cmeqv2di (__a, __b);
+  return (uint64x2_t) (__a == __AARCH64_INT64_C (0));
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vceqzq_u8 (uint8x16_t __a)
 {
-  uint8x16_t __b = {0, 0, 0, 0, 0, 0, 0, 0,
-		    0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x16_t) __builtin_aarch64_cmeqv16qi ((int8x16_t) __a,
-						   (int8x16_t) __b);
+  return (__a == 0);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vceqzq_u16 (uint16x8_t __a)
 {
-  uint16x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint16x8_t) __builtin_aarch64_cmeqv8hi ((int16x8_t) __a,
-						  (int16x8_t) __b);
+  return (__a == 0);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vceqzq_u32 (uint32x4_t __a)
 {
-  uint32x4_t __b = {0, 0, 0, 0};
-  return (uint32x4_t) __builtin_aarch64_cmeqv4si ((int32x4_t) __a,
-						  (int32x4_t) __b);
+  return (__a == 0);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vceqzq_u64 (uint64x2_t __a)
 {
-  uint64x2_t __b = {0, 0};
-  return (uint64x2_t) __builtin_aarch64_cmeqv2di ((int64x2_t) __a,
-						  (int64x2_t) __b);
+  return (__a == __AARCH64_UINT64_C (0));
 }
 
 /* vceqz - scalar.  */
@@ -14992,7 +14952,7 @@ vceqzd_f64 (float64_t __a)
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcge_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgev2sf (__a, __b);
+  return (uint32x2_t) (__a >= __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15004,19 +14964,19 @@ vcge_f64 (float64x1_t __a, float64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcge_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmgev8qi (__a, __b);
+  return (uint8x8_t) (__a >= __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcge_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmgev4hi (__a, __b);
+  return (uint16x4_t) (__a >= __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcge_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgev2si (__a, __b);
+  return (uint32x2_t) (__a >= __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15028,22 +14988,19 @@ vcge_s64 (int64x1_t __a, int64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcge_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmgeuv8qi ((int8x8_t) __a,
-						 (int8x8_t) __b);
+  return (__a >= __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcge_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmgeuv4hi ((int16x4_t) __a,
-						  (int16x4_t) __b);
+  return (__a >= __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcge_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgeuv2si ((int32x2_t) __a,
-						  (int32x2_t) __b);
+  return (__a >= __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15055,65 +15012,61 @@ vcge_u64 (uint64x1_t __a, uint64x1_t __b)
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgeq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgev4sf (__a, __b);
+  return (uint32x4_t) (__a >= __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcgeq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgev2df (__a, __b);
+  return (uint64x2_t) (__a >= __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgeq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmgev16qi (__a, __b);
+  return (uint8x16_t) (__a >= __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcgeq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmgev8hi (__a, __b);
+  return (uint16x8_t) (__a >= __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgeq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgev4si (__a, __b);
+  return (uint32x4_t) (__a >= __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcgeq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgev2di (__a, __b);
+  return (uint64x2_t) (__a >= __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgeq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmgeuv16qi ((int8x16_t) __a,
-						   (int8x16_t) __b);
+  return (__a >= __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcgeq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmgeuv8hi ((int16x8_t) __a,
-						  (int16x8_t) __b);
+  return (__a >= __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgeq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgeuv4si ((int32x4_t) __a,
-						  (int32x4_t) __b);
+  return (__a >= __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcgeq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgeuv2di ((int64x2_t) __a,
-						  (int64x2_t) __b);
+  return (__a >= __b);
 }
 
 /* vcge - scalar.  */
@@ -15147,8 +15100,7 @@ vcged_f64 (float64_t __a, float64_t __b)
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcgez_f32 (float32x2_t __a)
 {
-  float32x2_t __b = {0.0f, 0.0f};
-  return (uint32x2_t) __builtin_aarch64_cmgev2sf (__a, __b);
+  return (uint32x2_t) (__a >= 0.0f);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15160,22 +15112,19 @@ vcgez_f64 (float64x1_t __a)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcgez_s8 (int8x8_t __a)
 {
-  int8x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x8_t) __builtin_aarch64_cmgev8qi (__a, __b);
+  return (uint8x8_t) (__a >= 0);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcgez_s16 (int16x4_t __a)
 {
-  int16x4_t __b = {0, 0, 0, 0};
-  return (uint16x4_t) __builtin_aarch64_cmgev4hi (__a, __b);
+  return (uint16x4_t) (__a >= 0);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcgez_s32 (int32x2_t __a)
 {
-  int32x2_t __b = {0, 0};
-  return (uint32x2_t) __builtin_aarch64_cmgev2si (__a, __b);
+  return (uint32x2_t) (__a >= 0);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15187,44 +15136,37 @@ vcgez_s64 (int64x1_t __a)
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgezq_f32 (float32x4_t __a)
 {
-  float32x4_t __b = {0.0f, 0.0f, 0.0f, 0.0f};
-  return (uint32x4_t) __builtin_aarch64_cmgev4sf (__a, __b);
+  return (uint32x4_t) (__a >= 0.0f);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcgezq_f64 (float64x2_t __a)
 {
-  float64x2_t __b = {0.0, 0.0};
-  return (uint64x2_t) __builtin_aarch64_cmgev2df (__a, __b);
+  return (uint64x2_t) (__a >= 0.0);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgezq_s8 (int8x16_t __a)
 {
-  int8x16_t __b = {0, 0, 0, 0, 0, 0, 0, 0,
-		   0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x16_t) __builtin_aarch64_cmgev16qi (__a, __b);
+  return (uint8x16_t) (__a >= 0);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcgezq_s16 (int16x8_t __a)
 {
-  int16x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint16x8_t) __builtin_aarch64_cmgev8hi (__a, __b);
+  return (uint16x8_t) (__a >= 0);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgezq_s32 (int32x4_t __a)
 {
-  int32x4_t __b = {0, 0, 0, 0};
-  return (uint32x4_t) __builtin_aarch64_cmgev4si (__a, __b);
+  return (uint32x4_t) (__a >= 0);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcgezq_s64 (int64x2_t __a)
 {
-  int64x2_t __b = {0, 0};
-  return (uint64x2_t) __builtin_aarch64_cmgev2di (__a, __b);
+  return (uint64x2_t) (__a >= __AARCH64_INT64_C (0));
 }
 
 /* vcgez - scalar.  */
@@ -15252,7 +15194,7 @@ vcgezd_f64 (float64_t __a)
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcgt_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgtv2sf (__a, __b);
+  return (uint32x2_t) (__a > __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15264,19 +15206,19 @@ vcgt_f64 (float64x1_t __a, float64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcgt_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmgtv8qi (__a, __b);
+  return (uint8x8_t) (__a > __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcgt_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmgtv4hi (__a, __b);
+  return (uint16x4_t) (__a > __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcgt_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgtv2si (__a, __b);
+  return (uint32x2_t) (__a > __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15288,22 +15230,19 @@ vcgt_s64 (int64x1_t __a, int64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcgt_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmgtuv8qi ((int8x8_t) __a,
-						 (int8x8_t) __b);
+  return (__a > __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcgt_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmgtuv4hi ((int16x4_t) __a,
-						  (int16x4_t) __b);
+  return (__a > __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcgt_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgtuv2si ((int32x2_t) __a,
-						  (int32x2_t) __b);
+  return (__a > __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15315,65 +15254,61 @@ vcgt_u64 (uint64x1_t __a, uint64x1_t __b)
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgtq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgtv4sf (__a, __b);
+  return (uint32x4_t) (__a > __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcgtq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgtv2df (__a, __b);
+  return (uint64x2_t) (__a > __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgtq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmgtv16qi (__a, __b);
+  return (uint8x16_t) (__a > __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcgtq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmgtv8hi (__a, __b);
+  return (uint16x8_t) (__a > __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgtq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgtv4si (__a, __b);
+  return (uint32x4_t) (__a > __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcgtq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgtv2di (__a, __b);
+  return (uint64x2_t) (__a > __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgtq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmgtuv16qi ((int8x16_t) __a,
-						   (int8x16_t) __b);
+  return (__a > __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcgtq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmgtuv8hi ((int16x8_t) __a,
-						  (int16x8_t) __b);
+  return (__a > __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgtq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgtuv4si ((int32x4_t) __a,
-						  (int32x4_t) __b);
+  return (__a > __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcgtq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgtuv2di ((int64x2_t) __a,
-						  (int64x2_t) __b);
+  return (__a > __b);
 }
 
 /* vcgt - scalar.  */
@@ -15407,8 +15342,7 @@ vcgtd_f64 (float64_t __a, float64_t __b)
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcgtz_f32 (float32x2_t __a)
 {
-  float32x2_t __b = {0.0f, 0.0f};
-  return (uint32x2_t) __builtin_aarch64_cmgtv2sf (__a, __b);
+  return (uint32x2_t) (__a > 0.0f);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15420,22 +15354,19 @@ vcgtz_f64 (float64x1_t __a)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcgtz_s8 (int8x8_t __a)
 {
-  int8x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x8_t) __builtin_aarch64_cmgtv8qi (__a, __b);
+  return (uint8x8_t) (__a > 0);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcgtz_s16 (int16x4_t __a)
 {
-  int16x4_t __b = {0, 0, 0, 0};
-  return (uint16x4_t) __builtin_aarch64_cmgtv4hi (__a, __b);
+  return (uint16x4_t) (__a > 0);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcgtz_s32 (int32x2_t __a)
 {
-  int32x2_t __b = {0, 0};
-  return (uint32x2_t) __builtin_aarch64_cmgtv2si (__a, __b);
+  return (uint32x2_t) (__a > 0);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15447,44 +15378,37 @@ vcgtz_s64 (int64x1_t __a)
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgtzq_f32 (float32x4_t __a)
 {
-  float32x4_t __b = {0.0f, 0.0f, 0.0f, 0.0f};
-  return (uint32x4_t) __builtin_aarch64_cmgtv4sf (__a, __b);
+  return (uint32x4_t) (__a > 0.0f);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcgtzq_f64 (float64x2_t __a)
 {
-  float64x2_t __b = {0.0, 0.0};
-  return (uint64x2_t) __builtin_aarch64_cmgtv2df (__a, __b);
+    return (uint64x2_t) (__a > 0.0);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgtzq_s8 (int8x16_t __a)
 {
-  int8x16_t __b = {0, 0, 0, 0, 0, 0, 0, 0,
-		   0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x16_t) __builtin_aarch64_cmgtv16qi (__a, __b);
+  return (uint8x16_t) (__a > 0);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcgtzq_s16 (int16x8_t __a)
 {
-  int16x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint16x8_t) __builtin_aarch64_cmgtv8hi (__a, __b);
+  return (uint16x8_t) (__a > 0);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgtzq_s32 (int32x4_t __a)
 {
-  int32x4_t __b = {0, 0, 0, 0};
-  return (uint32x4_t) __builtin_aarch64_cmgtv4si (__a, __b);
+  return (uint32x4_t) (__a > 0);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcgtzq_s64 (int64x2_t __a)
 {
-  int64x2_t __b = {0, 0};
-  return (uint64x2_t) __builtin_aarch64_cmgtv2di (__a, __b);
+  return (uint64x2_t) (__a > __AARCH64_INT64_C (0));
 }
 
 /* vcgtz - scalar.  */
@@ -15512,7 +15436,7 @@ vcgtzd_f64 (float64_t __a)
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcle_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgev2sf (__b, __a);
+  return (uint32x2_t) (__a <= __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15524,19 +15448,19 @@ vcle_f64 (float64x1_t __a, float64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcle_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmgev8qi (__b, __a);
+  return (uint8x8_t) (__a <= __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcle_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmgev4hi (__b, __a);
+  return (uint16x4_t) (__a <= __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcle_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgev2si (__b, __a);
+  return (uint32x2_t) (__a <= __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15548,22 +15472,19 @@ vcle_s64 (int64x1_t __a, int64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcle_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmgeuv8qi ((int8x8_t) __b,
-						 (int8x8_t) __a);
+  return (__a <= __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcle_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmgeuv4hi ((int16x4_t) __b,
-						  (int16x4_t) __a);
+  return (__a <= __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcle_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgeuv2si ((int32x2_t) __b,
-						  (int32x2_t) __a);
+  return (__a <= __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15575,65 +15496,61 @@ vcle_u64 (uint64x1_t __a, uint64x1_t __b)
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcleq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgev4sf (__b, __a);
+  return (uint32x4_t) (__a <= __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcleq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgev2df (__b, __a);
+  return (uint64x2_t) (__a <= __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcleq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmgev16qi (__b, __a);
+  return (uint8x16_t) (__a <= __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcleq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmgev8hi (__b, __a);
+  return (uint16x8_t) (__a <= __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcleq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgev4si (__b, __a);
+  return (uint32x4_t) (__a <= __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcleq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgev2di (__b, __a);
+  return (uint64x2_t) (__a <= __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcleq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmgeuv16qi ((int8x16_t) __b,
-						   (int8x16_t) __a);
+  return (__a <= __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcleq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmgeuv8hi ((int16x8_t) __b,
-						  (int16x8_t) __a);
+  return (__a <= __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcleq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgeuv4si ((int32x4_t) __b,
-						  (int32x4_t) __a);
+  return (__a <= __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcleq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgeuv2di ((int64x2_t) __b,
-						  (int64x2_t) __a);
+  return (__a <= __b);
 }
 
 /* vcle - scalar.  */
@@ -15667,8 +15584,7 @@ vcled_f64 (float64_t __a, float64_t __b)
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vclez_f32 (float32x2_t __a)
 {
-  float32x2_t __b = {0.0f, 0.0f};
-  return (uint32x2_t) __builtin_aarch64_cmlev2sf (__a, __b);
+  return (uint32x2_t) (__a <= 0.0f);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15680,22 +15596,19 @@ vclez_f64 (float64x1_t __a)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vclez_s8 (int8x8_t __a)
 {
-  int8x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x8_t) __builtin_aarch64_cmlev8qi (__a, __b);
+  return (uint8x8_t) (__a <= 0);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vclez_s16 (int16x4_t __a)
 {
-  int16x4_t __b = {0, 0, 0, 0};
-  return (uint16x4_t) __builtin_aarch64_cmlev4hi (__a, __b);
+  return (uint16x4_t) (__a <= 0);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vclez_s32 (int32x2_t __a)
 {
-  int32x2_t __b = {0, 0};
-  return (uint32x2_t) __builtin_aarch64_cmlev2si (__a, __b);
+  return (uint32x2_t) (__a <= 0);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15707,44 +15620,37 @@ vclez_s64 (int64x1_t __a)
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vclezq_f32 (float32x4_t __a)
 {
-  float32x4_t __b = {0.0f, 0.0f, 0.0f, 0.0f};
-  return (uint32x4_t) __builtin_aarch64_cmlev4sf (__a, __b);
+  return (uint32x4_t) (__a <= 0.0f);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vclezq_f64 (float64x2_t __a)
 {
-  float64x2_t __b = {0.0, 0.0};
-  return (uint64x2_t) __builtin_aarch64_cmlev2df (__a, __b);
+  return (uint64x2_t) (__a <= 0.0);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vclezq_s8 (int8x16_t __a)
 {
-  int8x16_t __b = {0, 0, 0, 0, 0, 0, 0, 0,
-		   0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x16_t) __builtin_aarch64_cmlev16qi (__a, __b);
+  return (uint8x16_t) (__a <= 0);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vclezq_s16 (int16x8_t __a)
 {
-  int16x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint16x8_t) __builtin_aarch64_cmlev8hi (__a, __b);
+  return (uint16x8_t) (__a <= 0);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vclezq_s32 (int32x4_t __a)
 {
-  int32x4_t __b = {0, 0, 0, 0};
-  return (uint32x4_t) __builtin_aarch64_cmlev4si (__a, __b);
+  return (uint32x4_t) (__a <= 0);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vclezq_s64 (int64x2_t __a)
 {
-  int64x2_t __b = {0, 0};
-  return (uint64x2_t) __builtin_aarch64_cmlev2di (__a, __b);
+  return (uint64x2_t) (__a <= __AARCH64_INT64_C (0));
 }
 
 /* vclez - scalar.  */
@@ -15772,7 +15678,7 @@ vclezd_f64 (float64_t __a)
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vclt_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgtv2sf (__b, __a);
+  return (uint32x2_t) (__a < __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15784,19 +15690,19 @@ vclt_f64 (float64x1_t __a, float64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vclt_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmgtv8qi (__b, __a);
+  return (uint8x8_t) (__a < __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vclt_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmgtv4hi (__b, __a);
+  return (uint16x4_t) (__a < __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vclt_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgtv2si (__b, __a);
+  return (uint32x2_t) (__a < __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15808,22 +15714,19 @@ vclt_s64 (int64x1_t __a, int64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vclt_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmgtuv8qi ((int8x8_t) __b,
-						 (int8x8_t) __a);
+  return (__a < __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vclt_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmgtuv4hi ((int16x4_t) __b,
-						  (int16x4_t) __a);
+  return (__a < __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vclt_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmgtuv2si ((int32x2_t) __b,
-						  (int32x2_t) __a);
+  return (__a < __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15835,65 +15738,61 @@ vclt_u64 (uint64x1_t __a, uint64x1_t __b)
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcltq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgtv4sf (__b, __a);
+  return (uint32x4_t) (__a < __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcltq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgtv2df (__b, __a);
+  return (uint64x2_t) (__a < __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcltq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmgtv16qi (__b, __a);
+  return (uint8x16_t) (__a < __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcltq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmgtv8hi (__b, __a);
+  return (uint16x8_t) (__a < __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcltq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgtv4si (__b, __a);
+  return (uint32x4_t) (__a < __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcltq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgtv2di (__b, __a);
+  return (uint64x2_t) (__a < __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcltq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmgtuv16qi ((int8x16_t) __b,
-						   (int8x16_t) __a);
+  return (__a < __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcltq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmgtuv8hi ((int16x8_t) __b,
-						  (int16x8_t) __a);
+  return (__a < __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcltq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmgtuv4si ((int32x4_t) __b,
-						  (int32x4_t) __a);
+  return (__a < __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcltq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmgtuv2di ((int64x2_t) __b,
-						  (int64x2_t) __a);
+  return (__a < __b);
 }
 
 /* vclt - scalar.  */
@@ -15927,8 +15826,7 @@ vcltd_f64 (float64_t __a, float64_t __b)
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcltz_f32 (float32x2_t __a)
 {
-  float32x2_t __b = {0.0f, 0.0f};
-  return (uint32x2_t) __builtin_aarch64_cmltv2sf (__a, __b);
+  return (uint32x2_t) (__a < 0.0f);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15940,22 +15838,19 @@ vcltz_f64 (float64x1_t __a)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcltz_s8 (int8x8_t __a)
 {
-  int8x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x8_t) __builtin_aarch64_cmltv8qi (__a, __b);
+  return (uint8x8_t) (__a < 0);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcltz_s16 (int16x4_t __a)
 {
-  int16x4_t __b = {0, 0, 0, 0};
-  return (uint16x4_t) __builtin_aarch64_cmltv4hi (__a, __b);
+  return (uint16x4_t) (__a < 0);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcltz_s32 (int32x2_t __a)
 {
-  int32x2_t __b = {0, 0};
-  return (uint32x2_t) __builtin_aarch64_cmltv2si (__a, __b);
+  return (uint32x2_t) (__a < 0);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -15967,44 +15862,37 @@ vcltz_s64 (int64x1_t __a)
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcltzq_f32 (float32x4_t __a)
 {
-  float32x4_t __b = {0.0f, 0.0f, 0.0f, 0.0f};
-  return (uint32x4_t) __builtin_aarch64_cmltv4sf (__a, __b);
+  return (uint32x4_t) (__a < 0.0f);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcltzq_f64 (float64x2_t __a)
 {
-  float64x2_t __b = {0.0, 0.0};
-  return (uint64x2_t) __builtin_aarch64_cmltv2df (__a, __b);
+  return (uint64x2_t) (__a < 0.0);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcltzq_s8 (int8x16_t __a)
 {
-  int8x16_t __b = {0, 0, 0, 0, 0, 0, 0, 0,
-		   0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x16_t) __builtin_aarch64_cmltv16qi (__a, __b);
+  return (uint8x16_t) (__a < 0);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcltzq_s16 (int16x8_t __a)
 {
-  int16x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint16x8_t) __builtin_aarch64_cmltv8hi (__a, __b);
+  return (uint16x8_t) (__a < 0);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcltzq_s32 (int32x4_t __a)
 {
-  int32x4_t __b = {0, 0, 0, 0};
-  return (uint32x4_t) __builtin_aarch64_cmltv4si (__a, __b);
+  return (uint32x4_t) (__a < 0);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vcltzq_s64 (int64x2_t __a)
 {
-  int64x2_t __b = {0, 0};
-  return (uint64x2_t) __builtin_aarch64_cmltv2di (__a, __b);
+  return (uint64x2_t) (__a < __AARCH64_INT64_C (0));
 }
 
 /* vcltz - scalar.  */
@@ -24222,19 +24110,19 @@ vtrnq_u32 (uint32x4_t a, uint32x4_t b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vtst_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmtstv8qi (__a, __b);
+  return (uint8x8_t) ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vtst_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmtstv4hi (__a, __b);
+  return (uint16x4_t) ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vtst_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmtstv2si (__a, __b);
+  return (uint32x2_t) ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -24246,22 +24134,19 @@ vtst_s64 (int64x1_t __a, int64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vtst_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_cmtstv8qi ((int8x8_t) __a,
-						 (int8x8_t) __b);
+  return ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vtst_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_cmtstv4hi ((int16x4_t) __a,
-						  (int16x4_t) __b);
+  return ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vtst_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_cmtstv2si ((int32x2_t) __a,
-						  (int32x2_t) __b);
+  return ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
@@ -24273,53 +24158,49 @@ vtst_u64 (uint64x1_t __a, uint64x1_t __b)
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vtstq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmtstv16qi (__a, __b);
+  return (uint8x16_t) ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vtstq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmtstv8hi (__a, __b);
+  return (uint16x8_t) ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vtstq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmtstv4si (__a, __b);
+  return (uint32x4_t) ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vtstq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmtstv2di (__a, __b);
+  return (uint64x2_t) ((__a & __b) != __AARCH64_INT64_C (0));
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vtstq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t) __builtin_aarch64_cmtstv16qi ((int8x16_t) __a,
-						   (int8x16_t) __b);
+  return ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vtstq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t) __builtin_aarch64_cmtstv8hi ((int16x8_t) __a,
-						  (int16x8_t) __b);
+  return ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vtstq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t) __builtin_aarch64_cmtstv4si ((int32x4_t) __a,
-						  (int32x4_t) __b);
+  return ((__a & __b) != 0);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vtstq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t) __builtin_aarch64_cmtstv2di ((int64x2_t) __a,
-						  (int64x2_t) __b);
+  return ((__a & __b) != __AARCH64_UINT64_C (0));
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
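
(An aside, not part of the patch: a minimal standalone sketch of the
gcc-vector-extension semantics the rewritten intrinsics rely on.  A lane that
compares true becomes all-ones and one that compares false becomes all-zeros,
exactly the Neon cmXX/cmtst result; the instruction and registers named in the
comment are illustrative only.)

#include <arm_neon.h>

/* vtst_s8 is now the (a & b) != 0 idiom: lanes sharing a set bit become
   0xff, the rest 0x00.  Together with patch 1/2, the comparison should
   compile to a single cmtst (e.g. "cmtst v0.8b, v0.8b, v1.8b") with no
   trailing not.  */
int
main (void)
{
  int8x8_t a = vdup_n_s8 (0x0f);
  int8x8_t b = vdup_n_s8 (0x11);	/* shares bit 0 with a */
  uint8x8_t r = vtst_s8 (a, b);
  return (vget_lane_u8 (r, 0) == 0xff) ? 0 : 1;	/* expect exit status 0 */
}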

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction
  2014-08-19 10:44 [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction Alan Lawrence
  2014-08-19 13:43 ` [PATCH AArch64 2/2] Remove vector compare/tst __builtins Alan Lawrence
@ 2014-09-02 15:17 ` Marcus Shawcroft
  2014-09-08 12:52   ` Christophe Lyon
  1 sibling, 1 reply; 8+ messages in thread
From: Marcus Shawcroft @ 2014-09-02 15:17 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On 19 August 2014 11:44, Alan Lawrence <alan.lawrence@arm.com> wrote:

> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-builtins.c (aarch64_types_cmtst_qualifiers,
>         TYPES_TST): Define.
>         (aarch64_fold_builtin): Update pattern for cmtst.
>
>         * config/aarch64/aarch64-protos.h
> (aarch64_const_vec_all_same_int_p):
>         Declare.
>
>         * config/aarch64/aarch64-simd-builtins.def (cmtst): Update
> qualifiers.
>
>         * config/aarch64/aarch64-simd.md
> (aarch64_vcond_internal<mode><mode>):
>         Switch operands, separate out more cases, refactor.
>
>         (aarch64_cmtst<mode>): Rewrite pattern to match (plus ... -1).
>
>         * config/aarch64.c (aarch64_const_vec_all_same_int_p): Take single
>         argument; rename old version to...
>         (aarch64_const_vec_all_same_in_range_p): ...this.
>         (aarch64_print_operand, aarch64_simd_shift_imm_p): Follow renaming.
>
>         * config/aarch64/predicates.md (aarch64_simd_imm_minus_one): Define.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/simd/int_comparisons.x: New file.
>         * gcc.target/aarch64/simd/int_comparisons_1.c: New test.
>         * gcc.target/aarch64/simd/int_comparisons_2.c: Ditto.

OK /Marcus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH AArch64 2/2] Remove vector compare/tst __builtins
  2014-08-19 13:43 ` [PATCH AArch64 2/2] Remove vector compare/tst __builtins Alan Lawrence
@ 2014-09-02 15:19   ` Marcus Shawcroft
  0 siblings, 0 replies; 8+ messages in thread
From: Marcus Shawcroft @ 2014-09-02 15:19 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On 19 August 2014 14:43, Alan Lawrence <alan.lawrence@arm.com> wrote:

> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-builtins.c (aarch64_fold_builtin): Remove
> code
>         handling cmge, cmgt, cmeq, cmtst.
>
>         * config/aarch64/aarch64-simd-builtins.def (cmeq, cmge, cmgt, cmle,
>         cmlt, cmgeu, cmgtu, cmtst): Remove.
>
>         * config/aarch64/arm_neon.h (vceq_*, vceqq_*, vceqz_*, vceqzq_*,
>         vcge_*, vcgeq_*, vcgez_*, vcgezq_*, vcgt_*, vcgtq_*, vcgtz_*,
>         vcgtzq_*, vcle_*, vcleq_*, vclez_*, vclezq_*, vclt_*, vcltq_*,
>         vcltz_*, vcltzq_*, vtst_*, vtstq_*): Use gcc vector extensions.

OK /Marcus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction
  2014-09-02 15:17 ` [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction Marcus Shawcroft
@ 2014-09-08 12:52   ` Christophe Lyon
  2014-09-08 16:12     ` Alan Lawrence
  2014-09-09 10:20     ` [PATCH][AArch64 Testsuite]Fix scan-assembler test false alarm on aarch64-linux-gnu Alan Lawrence
  0 siblings, 2 replies; 8+ messages in thread
From: Christophe Lyon @ 2014-09-08 12:52 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

Hi Alan,

In my cross-testing I've noticed that your new test:
gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-not not
is PASS for targets aarch64-none-elf and aarch64_be-none-elf, but
FAIL for aarch64-none-linux-gnu.

It seems this is not what you saw in your own validations?

Christophe.



On 2 September 2014 17:17, Marcus Shawcroft <marcus.shawcroft@gmail.com> wrote:
> On 19 August 2014 11:44, Alan Lawrence <alan.lawrence@arm.com> wrote:
>
>> gcc/ChangeLog:
>>
>>         * config/aarch64/aarch64-builtins.c (aarch64_types_cmtst_qualifiers,
>>         TYPES_TST): Define.
>>         (aarch64_fold_builtin): Update pattern for cmtst.
>>
>>         * config/aarch64/aarch64-protos.h
>> (aarch64_const_vec_all_same_int_p):
>>         Declare.
>>
>>         * config/aarch64/aarch64-simd-builtins.def (cmtst): Update
>> qualifiers.
>>
>>         * config/aarch64/aarch64-simd.md
>> (aarch64_vcond_internal<mode><mode>):
>>         Switch operands, separate out more cases, refactor.
>>
>>         (aarch64_cmtst<mode>): Rewrite pattern to match (plus ... -1).
>>
>>         * config/aarch64.c (aarch64_const_vec_all_same_int_p): Take single
>>         argument; rename old version to...
>>         (aarch64_const_vec_all_same_in_range_p): ...this.
>>         (aarch64_print_operand, aarch64_simd_shift_imm_p): Follow renaming.
>>
>>         * config/aarch64/predicates.md (aarch64_simd_imm_minus_one): Define.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.target/aarch64/simd/int_comparisons.x: New file.
>>         * gcc.target/aarch64/simd/int_comparisons_1.c: New test.
>>         * gcc.target/aarch64/simd/int_comparisons_2.c: Ditto.
>
> OK /Marcus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction
  2014-09-08 12:52   ` Christophe Lyon
@ 2014-09-08 16:12     ` Alan Lawrence
  2014-09-09 10:20     ` [PATCH][AArch64 Testsuite]Fix scan-assembler test false alarm on aarch64-linux-gnu Alan Lawrence
  1 sibling, 0 replies; 8+ messages in thread
From: Alan Lawrence @ 2014-09-08 16:12 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

Hmmm, thanks for the heads-up. Now reproduced. Looks like a TCL regexp issue,
should have a fix shortly.

Cheers,
--Alan

Christophe Lyon wrote:
> Hi Alan,
> 
> In my cross-testing I've noticed that your new test:
> gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-not not
> is PASS for targets aarch64-none-elf and aarch64_be-none-elf, but
> FAIL for aarch64-none-linux-gnu.
> 
> It seems this is not what you saw in your own validations?
> 
> Christophe.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH][AArch64 Testsuite]Fix scan-assembler test false alarm on aarch64-linux-gnu
  2014-09-08 12:52   ` Christophe Lyon
  2014-09-08 16:12     ` Alan Lawrence
@ 2014-09-09 10:20     ` Alan Lawrence
  2014-09-09 10:49       ` Marcus Shawcroft
  1 sibling, 1 reply; 8+ messages in thread
From: Alan Lawrence @ 2014-09-09 10:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: Christophe Lyon

[-- Attachment #1: Type: text/plain, Size: 2323 bytes --]

The 'scan-assembler-not not' test in gcc.target/aarch64/simd/int_comparisons_1.c 
fails on aarch64-linux-gnu because the compiler appends a ".note" section 
directive at the end of the .s file on that target. This patch tightens the 
regex to match only a "not" with surrounding whitespace. (I've verified the 
tightened pattern still catches a stray "not" if, e.g., the changes to 
vcond_internal are reverted.)
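
For illustration, the tail of the .s file on this target typically carries a 
GNU-stack marker of the form (exact attributes may vary by configuration)

	.section	.note.GNU-stack,"",%progbits

and it is the "not" inside ".note" that the bare pattern picked up.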

Test now passing on aarch64-none-elf and aarch64-none-linux-gnu.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/int_comparisons_1.c: Tighten regexp.

Christophe Lyon wrote:
> Hi Alan,
> 
> In my cross-testing I've noticed that your new test:
> gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-not not
> is PASS for targets aarch64-none-elf and aarch64_be-none-elf, but
> FAIL for aarch64-none-linux-gnu.
> 
> It seems this is not what you saw in your own validations?
> 
> Christophe.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: fix_not_not.patch --]
[-- Type: text/x-patch; name=fix_not_not.patch, Size: 697 bytes --]

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_1.c b/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_1.c
index 86c6ed28538affcc4c3ef6cacd74d002e32b0931..cb0f4a04c0fb5f2c93064c47a141556f7fd0f89a 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_1.c
@@ -44,4 +44,4 @@
 /* { dg-final { scan-assembler-times "\[ \t\](?:cmlt|sshr)\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?(?:0|63)" 4 } } */
 
 // All should have been compiled into single insns without inverting result:
-/* { dg-final { scan-assembler-not "not" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]not\[ \t\]" } } */
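
(For the record, the anchored form still catches a genuine inverted compare: 
gcc prints mnemonics tab-delimited, so a regression would emit a line of the 
shape

	not	v0.8b, v1.8b

with whitespace on both sides of the mnemonic; the register operands here are 
made up.)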

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH][AArch64 Testsuite]Fix scan-assembler test false alarm on aarch64-linux-gnu
  2014-09-09 10:20     ` [PATCH][AArch64 Testsuite]Fix scan-assembler test false alarm on aarch64-linux-gnu Alan Lawrence
@ 2014-09-09 10:49       ` Marcus Shawcroft
  0 siblings, 0 replies; 8+ messages in thread
From: Marcus Shawcroft @ 2014-09-09 10:49 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On 9 September 2014 11:20, Alan Lawrence <alan.lawrence@arm.com> wrote:
> The 'scan-assembler-not not' test in
> gcc.target/aarch64/simd/int_comparisons_1.c fails on aarch64-linux-gnu
> because the compiler appends a ".note" section directive at the end of
> the .s file on that target. This patch tightens the regex to match only
> a "not" with surrounding whitespace. (I've verified the tightened
> pattern still catches a stray "not" if, e.g., the changes to
> vcond_internal are reverted.)
>
> Test now passing on aarch64-none-elf and aarch64-none-linux-gnu.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/simd/int_comparisons_1.c: Tighten regexp.

OK /Marcus

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-09-09 10:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-19 10:44 [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction Alan Lawrence
2014-08-19 13:43 ` [PATCH AArch64 2/2] Remove vector compare/tst __builtins Alan Lawrence
2014-09-02 15:19   ` Marcus Shawcroft
2014-09-02 15:17 ` [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction Marcus Shawcroft
2014-09-08 12:52   ` Christophe Lyon
2014-09-08 16:12     ` Alan Lawrence
2014-09-09 10:20     ` [PATCH][AArch64 Testsuite]Fix scan-assembler test false alarm on aarch64-linux-gnu Alan Lawrence
2014-09-09 10:49       ` Marcus Shawcroft
