public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p
@ 2021-09-07  9:16 Christophe Lyon
  2021-09-07  9:16 ` [PATCH 06/13] arm: Fix mve_vmvnq_n_<supf><mode> argument mode Christophe Lyon
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Christophe Lyon @ 2021-09-07  9:16 UTC (permalink / raw)
  To: gcc-patches

VPR_REG is the only register in its class, so it should be handled by
TARGET_CLASS_LIKELY_SPILLED_P.  No test fails without this patch, but
the hook should nonetheless cover this class.

2021-09-01  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	* config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 11dafc70067..1222cb0d0fe 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29307,6 +29307,9 @@ arm_class_likely_spilled_p (reg_class_t rclass)
       || rclass  == CC_REG)
     return true;
 
+  if (TARGET_HAVE_MVE && (rclass == VPR_REG))
+    return true;
+
   return false;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 06/13] arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  2021-09-07  9:16 [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p Christophe Lyon
@ 2021-09-07  9:16 ` Christophe Lyon
  2021-09-28 11:24   ` Kyrylo Tkachov
  2021-10-11 13:42   ` Richard Sandiford
  2021-09-07  9:16 ` [PATCH 07/13] arm: Implement MVE predicates as vectors of booleans Christophe Lyon
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 13+ messages in thread
From: Christophe Lyon @ 2021-09-07  9:16 UTC (permalink / raw)
  To: gcc-patches

The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use the
<V_elem> mode attribute instead of HI in mve_vmvnq_n_<supf><mode>.

2021-09-03  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	* config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
	for operand 1.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index e393518ea88..14d17060290 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_<supf><mode>"
 (define_insn "mve_vmvnq_n_<supf><mode>"
   [
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
-	(unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
+	(unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")]
 	 VMVNQ_N))
   ]
   "TARGET_HAVE_MVE"
-- 
2.25.1



* [PATCH 07/13] arm: Implement MVE predicates as vectors of booleans
  2021-09-07  9:16 [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p Christophe Lyon
  2021-09-07  9:16 ` [PATCH 06/13] arm: Fix mve_vmvnq_n_<supf><mode> argument mode Christophe Lyon
@ 2021-09-07  9:16 ` Christophe Lyon
  2021-10-11 13:50   ` Richard Sandiford
  2021-09-07  9:16 ` [PATCH 08/13] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates Christophe Lyon
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Christophe Lyon @ 2021-09-07  9:16 UTC (permalink / raw)
  To: gcc-patches

This patch implements support for vectors of booleans to represent
MVE predicates, instead of HImode.  Since the ABI mandates pred16_t
(aka uint16_t) to represent predicates in intrinsics prototypes, we
introduce a new "predicate" type qualifier so that we can map the
relevant builtins' HImode arguments and return values to the
appropriate vector of booleans (VxBI).
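For readers less familiar with MVE, the relation between pred16_t and
the boolean vectors can be sketched with a small standalone C model
(not GCC code; the per-lane bit layout is our assumption based on how
VPR.P0 allocates 16/nlanes bits per lane):

```c
#include <stdint.h>

/* Standalone model (not GCC code) of the ABI-level view: a pred16_t
   value packs the lanes of a V16BI/V8BI/V4BI predicate into 16 bits,
   each lane owning 16/NLANES bits, as in MVE's VPR.P0.  The layout
   is stated here as an assumption, not taken from this patch.  */
static int
pred_lane_set (uint16_t pred, unsigned nlanes, unsigned lane)
{
  unsigned bits_per_lane = 16 / nlanes;   /* 1, 2 or 4 */
  return (pred >> (lane * bits_per_lane)) & 1;
}
```

For instance, with 4 lanes (a V4SI comparison) the value 0x00F0 has
only lane 1 set, since each lane owns 4 predicate bits.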

We also have to update test_vector_ops_duplicate, because it iterates
using an offset in bytes, whereas vectors of booleans would need an
offset in bits: we simply stop iterating when we reach the end of
such a vector.
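The byte-vs-bit mismatch can be sketched with a hypothetical
standalone helper (not the GCC function itself):

```c
/* Standalone sketch (hypothetical helper, not the GCC function): the
   self-test iterates lane by lane using a byte offset i * elem_bytes.
   For a vector of booleans the lanes are narrower than a byte, so the
   byte offset overshoots; we stop as soon as it reaches the total
   vector size, like the early break added to
   test_vector_ops_duplicate.  */
static unsigned
lanes_testable (unsigned nlanes, unsigned elem_bytes, unsigned vec_bytes,
                int is_bool_vector)
{
  unsigned i;
  for (i = 0; i < nlanes; i++)
    {
      unsigned offset = i * elem_bytes;
      /* Mirrors: MODE_VECTOR_BOOL && maybe_ge (offset, size) -> break.  */
      if (is_bool_vector && offset >= vec_bytes)
        break;
    }
  return i;
}
```

With V16BI (16 lanes, a 1-byte minimum element size, a 2-byte vector)
only the first two byte offsets fit, whereas a V4SI vector is fully
covered.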

2021-09-01  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.c (arm_type_qualifiers): Add qualifier_predicate.
	(arm_init_simd_builtin_types): Add new simd types.
	(arm_init_builtin): Map predicate vector arguments to HImode.
	(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
	rtx. Move return value to HImode rtx.
	* config/arm/arm-modes.def (V16BI, V8BI, V4BI): New modes.
	* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
	Pred2x8_t, Pred4x4_t): New.
	* simplify-rtx.c (test_vector_ops_duplicate): Avoid going past the
	end of the test vector.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 3a9ff8f26b8..771759f0cdd 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -92,7 +92,9 @@ enum arm_type_qualifiers
   qualifier_lane_pair_index = 0x1000,
   /* Lane indices selected in quadtuplets - must be within range of previous
      argument = a vector.  */
-  qualifier_lane_quadtup_index = 0x2000
+  qualifier_lane_quadtup_index = 0x2000,
+  /* MVE vector predicates.  */
+  qualifier_predicate = 0x4000
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -1633,6 +1635,13 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
   arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
 
+  if (TARGET_HAVE_MVE)
+    {
+      arm_simd_types[Pred1x16_t].eltype = unsigned_intHI_type_node;
+      arm_simd_types[Pred2x8_t].eltype = unsigned_intHI_type_node;
+      arm_simd_types[Pred4x4_t].eltype = unsigned_intHI_type_node;
+    }
+
   for (i = 0; i < nelts; i++)
     {
       tree eltype = arm_simd_types[i].eltype;
@@ -1780,6 +1789,11 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
       if (qualifiers & qualifier_map_mode)
 	op_mode = d->mode;
 
+      /* MVE Predicates use HImode as mandated by the ABI: pred16_t is unsigned
+	 short.  */
+      if (qualifiers & qualifier_predicate)
+	op_mode = HImode;
+
       /* For pointers, we want a pointer to the basic type
 	 of the vector.  */
       if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
@@ -3024,6 +3038,11 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
 	    case ARG_BUILTIN_COPY_TO_REG:
 	      if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
 		op[argc] = convert_memory_address (Pmode, op[argc]);
+
+	      /* MVE uses mve_pred16_t (aka HImode) for vectors of predicates.  */
+	      if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
+		op[argc] = gen_lowpart (mode[argc], op[argc]);
+
 	      /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
 	      if (!(*insn_data[icode].operand[opno].predicate)
 		  (op[argc], mode[argc]))
@@ -3229,6 +3248,13 @@ constant_arg:
   else
     emit_insn (insn);
 
+  if (GET_MODE_CLASS (tmode) == MODE_VECTOR_BOOL)
+    {
+      rtx HItarget = gen_reg_rtx (HImode);
+      emit_move_insn (HItarget, gen_lowpart (HImode, target));
+      return HItarget;
+    }
+
   return target;
 }
 
diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index a5e74ba3943..b414a709a62 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -84,6 +84,11 @@ VECTOR_MODE (FLOAT, BF, 2);   /*                 V2BF.  */
 VECTOR_MODE (FLOAT, BF, 4);   /*		 V4BF.  */
 VECTOR_MODE (FLOAT, BF, 8);   /*		 V8BF.  */
 
+/* Predicates for MVE.  */
+VECTOR_BOOL_MODE (V16BI, 16, 2);
+VECTOR_BOOL_MODE (V8BI, 8, 2);
+VECTOR_BOOL_MODE (V4BI, 4, 2);
+
 /* Fraction and accumulator vector modes.  */
 VECTOR_MODES (FRACT, 4);      /* V4QQ  V2HQ */
 VECTOR_MODES (UFRACT, 4);     /* V4UQQ V2UHQ */
diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
index c19a1b6e3eb..d3987985b4c 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -51,3 +51,7 @@
   ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
   ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
   ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
+
+  ENTRY (Pred1x16_t, V16BI, unsigned, 16, uint16, 21)
+  ENTRY (Pred2x8_t, V8BI, unsigned, 8, uint16, 21)
+  ENTRY (Pred4x4_t, V4BI, unsigned, 4, uint16, 21)
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index a719f57870f..1453f984f99 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -7642,6 +7642,13 @@ test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
 	  rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
 	  rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
 	  poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
+
+	  /* OFFSET is in bytes, so stop testing when we go past the end of a
+	     vector of booleans, where we would need an offset in bits.  */
+	  if ((GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+	      && (maybe_ge (offset, GET_MODE_SIZE (mode))))
+	    break;
+
 	  ASSERT_RTX_EQ (scalar_reg,
 			 simplify_gen_subreg (inner_mode, vm,
 					      mode, offset));
-- 
2.25.1



* [PATCH 08/13] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates
  2021-09-07  9:16 [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p Christophe Lyon
  2021-09-07  9:16 ` [PATCH 06/13] arm: Fix mve_vmvnq_n_<supf><mode> argument mode Christophe Lyon
  2021-09-07  9:16 ` [PATCH 07/13] arm: Implement MVE predicates as vectors of booleans Christophe Lyon
@ 2021-09-07  9:16 ` Christophe Lyon
  2021-10-11 13:59   ` Richard Sandiford
  2021-09-07  9:17 ` [PATCH 09/13] arm: Fix vcond_mask expander for MVE (PR target/100757) Christophe Lyon
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Christophe Lyon @ 2021-09-07  9:16 UTC (permalink / raw)
  To: gcc-patches

We make use of qualifier_predicate to describe MVE builtin
prototypes, restricting ourselves to the auto-vectorizable vcmp* and
vpsel builtins, as they are exercised by the tests added earlier in
the series.

Special handling is needed for mve_vpselq because it has a v2di
variant, which has no natural VPR.P0 representation: we keep HImode
for it.

The vector_compare expansion code is updated to use the right VxBI
mode instead of HI for the result.
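The VxBI mode is chosen purely from the lane count of the compared
vectors; a standalone sketch of that mapping (mirroring the
arm_mode_to_pred_mode function added below, with mode names returned
as strings purely for illustration):

```c
/* Sketch of the lane-count -> predicate-mode mapping introduced by
   this series (mirrors arm_mode_to_pred_mode; the string return type
   is for illustration only, the real function returns a
   machine_mode).  */
static const char *
pred_mode_name (unsigned nunits)
{
  switch (nunits)
    {
    case 16: return "V16BI";
    case 8:  return "V8BI";
    case 4:  return "V4BI";
    default: return 0;   /* gcc_unreachable () in the real code */
    }
}
```

Note the deliberate absence of a 2-lane case, which is why the v2di
vpsel variant keeps HImode.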

New mov patterns are introduced to handle the new modes.

2021-09-01  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.c (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
	(BINOP_PRED_NONE_NONE_QUALIFIERS)
	(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
	(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
	* config/arm/arm.c (arm_hard_regno_mode_ok): Handle new VxBI
	modes.
	(arm_mode_to_pred_mode): New.
	(arm_expand_vector_compare): Use the right VxBI mode instead of
	HI.
	(arm_expand_vcond): Likewise.
	* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
	(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
	(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
	(vpselq_s, vpselq_f): Use new predicated qualifiers.
	* config/arm/iterators.md (MVE_7): New mode iterator.
	(MVE_VPRED, MVE_vpred): New attribute iterators.
	* config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>)
	(@mve_vcmp<mve_cmp_op>q_f<mode>, @mve_vpselq_<supf><mode>)
	(@mve_vpselq_f<mode>): Use MVE_VPRED instead of HI.
	(@mve_vpselq_<supf>v2di): Define separately.
	(mov<mode>): New expander for VxBI modes.
	(mve_mov<mode>): New insn for VxBI modes.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 771759f0cdd..6e3638869f1 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -469,6 +469,12 @@ arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_binop_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_PRED_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_pred_unone_unone_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_none, qualifier_immediate };
@@ -487,6 +493,12 @@ arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_NONE_NONE_QUALIFIERS \
   (arm_binop_unone_none_none_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none };
+#define BINOP_PRED_NONE_NONE_QUALIFIERS \
+  (arm_binop_pred_none_none_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
@@ -558,6 +570,12 @@ arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
   (arm_ternop_none_none_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
+#define TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_none_none_none_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_immediate, qualifier_unsigned };
@@ -577,6 +595,13 @@ arm_ternop_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_ternop_unone_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+    qualifier_predicate };
+#define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_unone_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1222cb0d0fe..5f6637d9a5f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25304,7 +25304,7 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
     return false;
 
   if (IS_VPR_REGNUM (regno))
-    return mode == HImode;
+    return mode == HImode || mode == V16BImode || mode == V8BImode || mode == V4BImode;
 
   if (TARGET_THUMB1)
     /* For the Thumb we only allow values bigger than SImode in
@@ -30994,6 +30994,19 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
     arm_post_atomic_barrier (model);
 }
 \f
+/* Return the mode for the MVE vector of predicates corresponding to MODE.  */
+machine_mode
+arm_mode_to_pred_mode (machine_mode mode)
+{
+  switch (GET_MODE_NUNITS (mode))
+    {
+    case 16: return V16BImode;
+    case 8: return V8BImode;
+    case 4: return V4BImode;
+    }
+  gcc_unreachable ();
+}
+
 /* Expand code to compare vectors OP0 and OP1 using condition CODE.
    If CAN_INVERT, store either the result or its inverse in TARGET
    and return true if TARGET contains the inverse.  If !CAN_INVERT,
@@ -31077,7 +31090,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
 	  if (vcond_mve)
 	    vpr_p0 = target;
 	  else
-	    vpr_p0 = gen_reg_rtx (HImode);
+	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
 
 	  switch (GET_MODE_CLASS (cmp_mode))
 	    {
@@ -31119,7 +31132,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
 	  if (vcond_mve)
 	    vpr_p0 = target;
 	  else
-	    vpr_p0 = gen_reg_rtx (HImode);
+	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
 
 	  emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
 	  if (!vcond_mve)
@@ -31146,7 +31159,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
 	  if (vcond_mve)
 	    vpr_p0 = target;
 	  else
-	    vpr_p0 = gen_reg_rtx (HImode);
+	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
 
 	  emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0));
 	  if (!vcond_mve)
@@ -31199,7 +31212,7 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
   if (TARGET_HAVE_MVE)
     {
       vcond_mve=true;
-      mask = gen_reg_rtx (HImode);
+      mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode));
     }
   else
     mask = gen_reg_rtx (cmp_result_mode);
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index e9b5b28f506..58a05e61bd9 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -89,7 +89,7 @@ VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
 VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
 VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vsubq_u, v16qi, v8hi, v4si)
@@ -117,9 +117,9 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vhsubq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si)
@@ -143,15 +143,15 @@ VAR3 (BINOP_UNONE_UNONE_IMM, vshlq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vrshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vqshlq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpeqq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_IMM, vqshluq_n_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_UNONE, vaddvq_p_s, v16qi, v8hi, v4si)
@@ -219,17 +219,17 @@ VAR2 (BINOP_UNONE_UNONE_IMM, vshllbq_n_u, v16qi, v8hi)
 VAR2 (BINOP_UNONE_UNONE_IMM, vorrq_n_u, v8hi, v4si)
 VAR2 (BINOP_UNONE_UNONE_IMM, vbicq_n_u, v8hi, v4si)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpneq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpneq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpneq_f, v8hf, v4sf)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpltq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpltq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpltq_f, v8hf, v4sf)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpleq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpleq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpleq_f, v8hf, v4sf)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpgtq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpgtq_f, v8hf, v4sf)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpgeq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpgeq_f, v8hf, v4sf)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpeqq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpeqq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpeqq_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vsubq_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vqmovntq_s, v8hi, v4si)
 VAR2 (BINOP_NONE_NONE_NONE, vqmovnbq_s, v8hi, v4si)
@@ -295,8 +295,8 @@ VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE, vcvtaq_m_u, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtaq_m_s, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vshlcq_vec_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_UNONE_IMM, vshlcq_vec_s, v16qi, v8hi, v4si)
-VAR4 (TERNOP_UNONE_UNONE_UNONE_UNONE, vpselq_u, v16qi, v8hi, v4si, v2di)
-VAR4 (TERNOP_NONE_NONE_NONE_UNONE, vpselq_s, v16qi, v8hi, v4si, v2di)
+VAR4 (TERNOP_UNONE_UNONE_UNONE_PRED, vpselq_u, v16qi, v8hi, v4si, v2di)
+VAR4 (TERNOP_NONE_NONE_NONE_PRED, vpselq_s, v16qi, v8hi, v4si, v2di)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrev64q_m_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmvnq_m_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmlasq_n_u, v16qi, v8hi, v4si)
@@ -426,7 +426,7 @@ VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrev64q_m_f, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrev32q_m_s, v16qi, v8hi)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vqmovntq_m_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vqmovnbq_m_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vpselq_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vpselq_f, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vnegq_m_f, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovntq_m_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovnbq_m_s, v8hi, v4si)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index fafbd2f94b8..df5d15e08b8 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -272,6 +272,7 @@ (define_mode_iterator MVE_3 [V16QI V8HI])
 (define_mode_iterator MVE_2 [V16QI V8HI V4SI])
 (define_mode_iterator MVE_5 [V8HI V4SI])
 (define_mode_iterator MVE_6 [V8HI V4SI])
+(define_mode_iterator MVE_7 [V16BI V8BI V4BI])
 
 ;;----------------------------------------------------------------------------
 ;; Code iterators
@@ -946,6 +947,10 @@ (define_mode_attr V_extr_elem [(V16QI "u8") (V8HI "u16") (V4SI "32")
 			       (V8HF "u16") (V4SF "32")])
 (define_mode_attr earlyclobber_32 [(V16QI "=w") (V8HI "=w") (V4SI "=&w")
 						(V8HF "=w") (V4SF "=&w")])
+(define_mode_attr MVE_VPRED [(V16QI "V16BI") (V8HI "V8BI") (V4SI "V4BI")
+                             (V8HF "V8BI")   (V4SF "V4BI")])
+(define_mode_attr MVE_vpred [(V16QI "v16bi") (V8HI "v8bi") (V4SI "v4bi")
+                             (V8HF "v8bi")   (V4SF "v4bi")])
 
 ;;----------------------------------------------------------------------------
 ;; Code attributes
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 14d17060290..c9c8e2c13fe 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -839,8 +839,8 @@ (define_insn "mve_vaddlvq_p_<supf>v4si"
 ;;
 (define_insn "@mve_vcmp<mve_cmp_op>q_<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(MVE_COMPARISONS:HI (match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(MVE_COMPARISONS:<MVE_VPRED> (match_operand:MVE_2 1 "s_register_operand" "w")
 		    (match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
@@ -1929,8 +1929,8 @@ (define_insn "mve_vcaddq<mve_rot><mode>"
 ;;
 (define_insn "@mve_vcmp<mve_cmp_op>q_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(MVE_FP_COMPARISONS:<MVE_VPRED> (match_operand:MVE_0 1 "s_register_operand" "w")
 			       (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3321,9 +3321,21 @@ (define_insn "mve_vnegq_m_s<mode>"
 ;;
 (define_insn "@mve_vpselq_<supf><mode>"
   [
-   (set (match_operand:MVE_1 0 "s_register_operand" "=w")
-	(unspec:MVE_1 [(match_operand:MVE_1 1 "s_register_operand" "w")
-		       (match_operand:MVE_1 2 "s_register_operand" "w")
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
+	 VPSELQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpsel %q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+(define_insn "@mve_vpselq_<supf>v2di"
+  [
+   (set (match_operand:V2DI 0 "s_register_operand" "=w")
+	(unspec:V2DI [(match_operand:V2DI 1 "s_register_operand" "w")
+		       (match_operand:V2DI 2 "s_register_operand" "w")
 		       (match_operand:HI 3 "vpr_register_operand" "Up")]
 	 VPSELQ))
   ]
@@ -4419,7 +4431,7 @@ (define_insn "@mve_vpselq_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VPSELQ_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -10516,3 +10528,25 @@ (define_insn "*movmisalign<mode>_mve_load"
   "vldr<V_sz_elem1>.<V_sz_elem>\t%q0, %E1"
   [(set_attr "type" "mve_load")]
 )
+
+(define_expand "mov<mode>"
+  [(set (match_operand:MVE_7 0 "nonimmediate_operand")
+        (match_operand:MVE_7 1 "nonimmediate_operand"))]
+  "TARGET_HAVE_MVE"
+  {
+  }
+)
+
+(define_insn "*mve_mov<mode>"
+  [(set (match_operand:MVE_7 0 "nonimmediate_operand" "=rk, m, r, Up, r")
+        (match_operand:MVE_7 1 "nonimmediate_operand"  "rk, r, m, r, Up"))]
+  "TARGET_HAVE_MVE
+  && (register_operand (operands[0], <MODE>mode)
+      || register_operand (operands[1], <MODE>mode))"
+  "@
+  mov%?\t%0, %1
+  strh%?\t%1, %0
+  ldrh%?\t%0, %1
+  vmsr%?\t P0, %1
+  vmrs%?\t %0, P0"
+)
-- 
2.25.1



* [PATCH 09/13] arm: Fix vcond_mask expander for MVE (PR target/100757)
  2021-09-07  9:16 [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p Christophe Lyon
                   ` (2 preceding siblings ...)
  2021-09-07  9:16 ` [PATCH 08/13] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates Christophe Lyon
@ 2021-09-07  9:17 ` Christophe Lyon
  2021-10-11 14:06   ` Richard Sandiford
  2021-09-28 11:23 ` [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p Kyrylo Tkachov
  2021-10-11 13:11 ` Richard Sandiford
  5 siblings, 1 reply; 13+ messages in thread
From: Christophe Lyon @ 2021-09-07  9:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: Christophe Lyon

From: Christophe Lyon <christophe.lyon@linaro.org>

The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
implies implementing vec_cmp<mode><MVE_vpred>,
vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can
move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and
vcond_mask_<mode><v_cmp_result> back to neon.md since they are not
used by MVE anymore.  The new *<MVE_vpred> patterns listed above are
implemented in mve.md since they are only valid for MVE.  However,
this may make maintenance/comparison more painful than having all of
them in vec-common.md.

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to the neon.md version of vcond_mask_<mode><v_cmp_result>
that existed before my "arm: Auto-vectorization for MVE: vcmp" patch
(r12-834), this keeps the VDQWH iterator added in r12-835 (to get
V4HF/V8HF support), as well as the (!<Is_float_mode> ||
flag_unsafe_math_optimizations) condition, which was not present
before r12-834 even though SF modes were enabled by VDQW (I think
that was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
	ldr     r3, .L3+48
	vldr.64 d4, .L3              // q2=(2.0,2.0,2.0,2.0)
	vldr.64 d5, .L3+8
	vldrw.32        q0, [r3]     // q0=a(0..3)
	adds    r3, r3, #16
	vcmp.f32        eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
	vldrw.32        q1, [r3]     // q1=a(4..7)
	vmrs     r3, P0
	vcmp.f32        eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
	vmrs    r2, P0  @ movhi
	ands    r3, r3, r2           // r3=select(a(0..3)) & select(a(4..7))
	vldr.64 d4, .L3+16           // q2=(5.0,5.0,5.0,5.0)
	vldr.64 d5, .L3+24
	vmsr     P0, r3
	vldr.64 d6, .L3+32           // q3=(4.0,4.0,4.0,4.0)
	vldr.64 d7, .L3+40
	vpsel q3, q3, q2             // q3=vcond_mask(4.0,5.0)
	vmov.32 r2, q3[1]            // keep the scalar max
	vmov.32 r0, q3[3]
	vmov.32 r3, q3[2]
	vmov.f32        s11, s12
	vmov    s15, r2
	vmov    s14, r3
	vmaxnm.f32      s15, s11, s15
	vmaxnm.f32      s15, s15, s14
	vmov    s14, r0
	vmaxnm.f32      s15, s15, s14
	vmov    r0, s15
	bx      lr
	.L4:
	.align  3
	.L3:
	.word   1073741824	// 2.0f
	.word   1073741824
	.word   1073741824
	.word   1073741824
	.word   1084227584	// 5.0f
	.word   1084227584
	.word   1084227584
	.word   1084227584
	.word   1082130432	// 4.0f
	.word   1082130432
	.word   1082130432
	.word   1082130432
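The vmrs/ands/vmsr sequence in the listing above amounts to combining
two 16-bit predicate masks with a single scalar AND; a minimal
standalone model (plain C, not compiler code):

```c
#include <stdint.h>

/* Minimal model of the mask merge in the listing above: each vcmp
   leaves a 16-bit predicate in VPR.P0; vmrs copies it to a core
   register, a single scalar 'ands' combines the two masks, and vmsr
   writes the result back for the final vpsel.  */
static uint16_t
merge_predicates (uint16_t p_first_half, uint16_t p_second_half)
{
  return p_first_half & p_second_half;   /* the 'ands r3, r3, r2' step */
}
```

This is what makes TARGET_VECTORIZE_GET_MASK_MODE pay off: no vpsel
with vectors of 0 and 1 is needed just to merge the two comparisons.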

2021-09-02  Christophe Lyon  <christophe.lyon@linaro.org>

	PR target/100757
	gcc/
	* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
	(arm_expand_vector_compare): Update prototype.
	* config/arm/arm.c (TARGET_VECTORIZE_GET_MASK_MODE): New.
	(arm_vector_mode_supported_p): Add support for VxBI modes.
	(arm_expand_vector_compare): Remove useless generation of vpsel.
	(arm_expand_vcond): Fix select operands.
	(arm_get_mask_mode): New.
	* config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New.
	(vec_cmpu<mode><MVE_vpred>): New.
	(vcond_mask_<mode><MVE_vpred>): New.
	* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>)
	(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): Move to ...
	* config/arm/neon.md (vec_cmp<mode><v_cmp_result>)
	(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): ... here
	and disable for MVE.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 9b1f61394ad..9e3d71e0c29 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -201,6 +201,7 @@ extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
 extern bool arm_pad_reg_upward (machine_mode, tree, int);
 #endif
 extern int arm_apply_result_size (void);
+extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
 
 #endif /* RTX_CODE */
 
@@ -372,7 +373,7 @@ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
 extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
 extern bool arm_valid_symbolic_address_p (rtx);
 extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
-extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool, bool);
+extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool);
 #endif /* RTX_CODE */
 
 extern bool arm_gen_setmem (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5f6637d9a5f..3326cd163a2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -835,6 +835,10 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
+
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE arm_get_mask_mode
+
 \f
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -29193,7 +29197,8 @@ arm_vector_mode_supported_p (machine_mode mode)
 
   if (TARGET_HAVE_MVE
       && (mode == V2DImode || mode == V4SImode || mode == V8HImode
-	  || mode == V16QImode))
+	  || mode == V16QImode
+	  || mode == V16BImode || mode == V8BImode || mode == V4BImode))
       return true;
 
   if (TARGET_HAVE_MVE_FLOAT
@@ -31012,16 +31017,12 @@ arm_mode_to_pred_mode (machine_mode mode)
    and return true if TARGET contains the inverse.  If !CAN_INVERT,
    always store the result in TARGET, never its inverse.
 
-   If VCOND_MVE, do not emit the vpsel instruction here, let arm_expand_vcond do
-   it with the right destination type to avoid emiting two vpsel, one here and
-   one in arm_expand_vcond.
-
    Note that the handling of floating-point comparisons is not
    IEEE compliant.  */
 
 bool
 arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
-			   bool can_invert, bool vcond_mve)
+			   bool can_invert)
 {
   machine_mode cmp_result_mode = GET_MODE (target);
   machine_mode cmp_mode = GET_MODE (op0);
@@ -31050,7 +31051,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
 	       and then store its inverse in TARGET.  This avoids reusing
 	       TARGET (which for integer NE could be one of the inputs).  */
 	    rtx tmp = gen_reg_rtx (cmp_result_mode);
-	    if (arm_expand_vector_compare (tmp, code, op0, op1, true, vcond_mve))
+	    if (arm_expand_vector_compare (tmp, code, op0, op1, true))
 	      gcc_unreachable ();
 	    emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp)));
 	    return false;
@@ -31086,36 +31087,20 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
     case NE:
       if (TARGET_HAVE_MVE)
 	{
-	  rtx vpr_p0;
-	  if (vcond_mve)
-	    vpr_p0 = target;
-	  else
-	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
-
 	  switch (GET_MODE_CLASS (cmp_mode))
 	    {
 	    case MODE_VECTOR_INT:
-	      emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+	      emit_insn (gen_mve_vcmpq (code, cmp_mode, target, op0, force_reg (cmp_mode, op1)));
 	      break;
 	    case MODE_VECTOR_FLOAT:
 	      if (TARGET_HAVE_MVE_FLOAT)
-		emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+		emit_insn (gen_mve_vcmpq_f (code, cmp_mode, target, op0, force_reg (cmp_mode, op1)));
 	      else
 		gcc_unreachable ();
 	      break;
 	    default:
 	      gcc_unreachable ();
 	    }
-
-	  /* If we are not expanding a vcond, build the result here.  */
-	  if (!vcond_mve)
-	    {
-	      rtx zero = gen_reg_rtx (cmp_result_mode);
-	      rtx one = gen_reg_rtx (cmp_result_mode);
-	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
-	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
-	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
-	    }
 	}
       else
 	emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
@@ -31127,23 +31112,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
     case GEU:
     case GTU:
       if (TARGET_HAVE_MVE)
-	{
-	  rtx vpr_p0;
-	  if (vcond_mve)
-	    vpr_p0 = target;
-	  else
-	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
-
-	  emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
-	  if (!vcond_mve)
-	    {
-	      rtx zero = gen_reg_rtx (cmp_result_mode);
-	      rtx one = gen_reg_rtx (cmp_result_mode);
-	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
-	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
-	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
-	    }
-	}
+	emit_insn (gen_mve_vcmpq (code, cmp_mode, target, op0, force_reg (cmp_mode, op1)));
       else
 	emit_insn (gen_neon_vc (code, cmp_mode, target,
 				op0, force_reg (cmp_mode, op1)));
@@ -31154,23 +31123,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
     case LEU:
     case LTU:
       if (TARGET_HAVE_MVE)
-	{
-	  rtx vpr_p0;
-	  if (vcond_mve)
-	    vpr_p0 = target;
-	  else
-	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
-
-	  emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0));
-	  if (!vcond_mve)
-	    {
-	      rtx zero = gen_reg_rtx (cmp_result_mode);
-	      rtx one = gen_reg_rtx (cmp_result_mode);
-	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
-	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
-	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
-	    }
-	}
+	emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, target, force_reg (cmp_mode, op1), op0));
       else
 	emit_insn (gen_neon_vc (swap_condition (code), cmp_mode,
 				target, force_reg (cmp_mode, op1), op0));
@@ -31185,8 +31138,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
 	rtx gt_res = gen_reg_rtx (cmp_result_mode);
 	rtx alt_res = gen_reg_rtx (cmp_result_mode);
 	rtx_code alt_code = (code == LTGT ? LT : LE);
-	if (arm_expand_vector_compare (gt_res, GT, op0, op1, true, vcond_mve)
-	    || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true, vcond_mve))
+	if (arm_expand_vector_compare (gt_res, GT, op0, op1, true)
+	    || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true))
 	  gcc_unreachable ();
 	emit_insn (gen_rtx_SET (target, gen_rtx_IOR (cmp_result_mode,
 						     gt_res, alt_res)));
@@ -31206,19 +31159,15 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
 {
   /* When expanding for MVE, we do not want to emit a (useless) vpsel in
      arm_expand_vector_compare, and another one here.  */
-  bool vcond_mve=false;
   rtx mask;
 
   if (TARGET_HAVE_MVE)
-    {
-      vcond_mve=true;
-      mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode));
-    }
+    mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode));
   else
     mask = gen_reg_rtx (cmp_result_mode);
 
   bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
-					     operands[4], operands[5], true, vcond_mve);
+					     operands[4], operands[5], true);
   if (inverted)
     std::swap (operands[1], operands[2]);
   if (TARGET_NEON)
@@ -31226,20 +31175,20 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
 			    mask, operands[1], operands[2]));
   else
     {
-      machine_mode cmp_mode = GET_MODE (operands[4]);
-      rtx vpr_p0 = mask;
-      rtx zero = gen_reg_rtx (cmp_mode);
-      rtx one = gen_reg_rtx (cmp_mode);
-      emit_move_insn (zero, CONST0_RTX (cmp_mode));
-      emit_move_insn (one, CONST1_RTX (cmp_mode));
+      machine_mode cmp_mode = GET_MODE (operands[0]);
+
       switch (GET_MODE_CLASS (cmp_mode))
 	{
 	case MODE_VECTOR_INT:
-	  emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0));
+	  emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_mode, operands[0],
+				     operands[1], operands[2], mask));
 	  break;
 	case MODE_VECTOR_FLOAT:
 	  if (TARGET_HAVE_MVE_FLOAT)
-	    emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0));
+	    emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0],
+					 operands[1], operands[2], mask));
+	  else
+	    gcc_unreachable ();
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -34149,4 +34098,15 @@ arm_mode_base_reg_class (machine_mode mode)
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE.  */
+
+opt_machine_mode
+arm_get_mask_mode (machine_mode mode)
+{
+  if (TARGET_HAVE_MVE)
+    return arm_mode_to_pred_mode (mode);
+
+  return default_get_mask_mode (mode);
+}
+
 #include "gt-arm.h"
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index c9c8e2c13fe..d663c698cfb 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -10550,3 +10550,58 @@ (define_insn "*mve_mov<mode>"
   vmsr%?\t P0, %1
   vmrs%?\t %0, P0"
 )
+
+;; Expanders for vec_cmp and vcond
+
+(define_expand "vec_cmp<mode><MVE_vpred>"
+  [(set (match_operand:<MVE_VPRED> 0 "s_register_operand")
+	(match_operator:<MVE_VPRED> 1 "comparison_operator"
+	  [(match_operand:MVE_VLD_ST 2 "s_register_operand")
+	   (match_operand:MVE_VLD_ST 3 "reg_or_zero_operand")]))]
+  "TARGET_HAVE_MVE
+   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+{
+  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
+			     operands[2], operands[3], false);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><MVE_vpred>"
+  [(set (match_operand:<MVE_VPRED> 0 "s_register_operand")
+	(match_operator:<MVE_VPRED> 1 "comparison_operator"
+	  [(match_operand:MVE_2 2 "s_register_operand")
+	   (match_operand:MVE_2 3 "reg_or_zero_operand")]))]
+  "TARGET_HAVE_MVE
+   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+{
+  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
+			     operands[2], operands[3], false);
+  DONE;
+})
+
+(define_expand "vcond_mask_<mode><MVE_vpred>"
+  [(set (match_operand:MVE_VLD_ST 0 "s_register_operand")
+	(if_then_else:MVE_VLD_ST
+	  (match_operand:<MVE_VPRED> 3 "s_register_operand")
+	  (match_operand:MVE_VLD_ST 1 "s_register_operand")
+	  (match_operand:MVE_VLD_ST 2 "s_register_operand")))]
+  "TARGET_HAVE_MVE"
+{
+  switch (GET_MODE_CLASS (<MODE>mode))
+    {
+      case MODE_VECTOR_INT:
+	emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0],
+				   operands[1], operands[2], operands[3]));
+	break;
+      case MODE_VECTOR_FLOAT:
+	if (TARGET_HAVE_MVE_FLOAT)
+	  emit_insn (gen_mve_vpselq_f (<MODE>mode, operands[0],
+				       operands[1], operands[2], operands[3]));
+	else
+	  gcc_unreachable ();
+	break;
+      default:
+	gcc_unreachable ();
+    }
+  DONE;
+})
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 8b0a396947c..28310d93a4e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1394,6 +1394,45 @@ (define_insn "*us_sub<mode>_neon"
   [(set_attr "type" "neon_qsub<q>")]
 )
 
+(define_expand "vec_cmp<mode><v_cmp_result>"
+  [(set (match_operand:<V_cmp_result> 0 "s_register_operand")
+	(match_operator:<V_cmp_result> 1 "comparison_operator"
+	  [(match_operand:VDQWH 2 "s_register_operand")
+	   (match_operand:VDQWH 3 "reg_or_zero_operand")]))]
+  "TARGET_NEON
+   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+{
+  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
+			     operands[2], operands[3], false);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><mode>"
+  [(set (match_operand:VDQIW 0 "s_register_operand")
+	(match_operator:VDQIW 1 "comparison_operator"
+	  [(match_operand:VDQIW 2 "s_register_operand")
+	   (match_operand:VDQIW 3 "reg_or_zero_operand")]))]
+  "TARGET_NEON"
+{
+  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
+			     operands[2], operands[3], false);
+  DONE;
+})
+
+(define_expand "vcond_mask_<mode><v_cmp_result>"
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+	(if_then_else:VDQWH
+	  (match_operand:<V_cmp_result> 3 "s_register_operand")
+	  (match_operand:VDQWH 1 "s_register_operand")
+	  (match_operand:VDQWH 2 "s_register_operand")))]
+  "TARGET_NEON
+   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+{
+  emit_insn (gen_neon_vbsl<mode> (operands[0], operands[3], operands[1],
+				  operands[2]));
+  DONE;
+})
+
 ;; Patterns for builtins.
 
 ; good for plain vadd, vaddq.
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 68de4f0f943..9b461a76155 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -363,33 +363,6 @@ (define_expand "vlshr<mode>3"
     }
 })
 
-(define_expand "vec_cmp<mode><v_cmp_result>"
-  [(set (match_operand:<V_cmp_result> 0 "s_register_operand")
-	(match_operator:<V_cmp_result> 1 "comparison_operator"
-	  [(match_operand:VDQWH 2 "s_register_operand")
-	   (match_operand:VDQWH 3 "reg_or_zero_operand")]))]
-  "ARM_HAVE_<MODE>_ARITH
-   && !TARGET_REALLY_IWMMXT
-   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
-{
-  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
-			     operands[2], operands[3], false, false);
-  DONE;
-})
-
-(define_expand "vec_cmpu<mode><mode>"
-  [(set (match_operand:VDQIW 0 "s_register_operand")
-	(match_operator:VDQIW 1 "comparison_operator"
-	  [(match_operand:VDQIW 2 "s_register_operand")
-	   (match_operand:VDQIW 3 "reg_or_zero_operand")]))]
-  "ARM_HAVE_<MODE>_ARITH
-   && !TARGET_REALLY_IWMMXT"
-{
-  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
-			     operands[2], operands[3], false, false);
-  DONE;
-})
-
 ;; Conditional instructions.  These are comparisons with conditional moves for
 ;; vectors.  They perform the assignment:
 ;;
@@ -461,31 +434,6 @@ (define_expand "vcondu<mode><v_cmp_result>"
   DONE;
 })
 
-(define_expand "vcond_mask_<mode><v_cmp_result>"
-  [(set (match_operand:VDQWH 0 "s_register_operand")
-        (if_then_else:VDQWH
-          (match_operand:<V_cmp_result> 3 "s_register_operand")
-          (match_operand:VDQWH 1 "s_register_operand")
-          (match_operand:VDQWH 2 "s_register_operand")))]
-  "ARM_HAVE_<MODE>_ARITH
-   && !TARGET_REALLY_IWMMXT
-   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
-{
-  if (TARGET_NEON)
-    {
-      emit_insn (gen_neon_vbsl (<MODE>mode, operands[0], operands[3],
-                                operands[1], operands[2]));
-    }
-  else if (TARGET_HAVE_MVE)
-    {
-      emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0],
-                                 operands[1], operands[2], operands[3]));
-    }
-  else
-    gcc_unreachable ();
-  DONE;
-})
-
 (define_expand "vec_load_lanesoi<mode>"
   [(set (match_operand:OI 0 "s_register_operand")
         (unspec:OI [(match_operand:OI 1 "neon_struct_operand")
-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p
  2021-09-07  9:16 [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p Christophe Lyon
                   ` (3 preceding siblings ...)
  2021-09-07  9:17 ` [PATCH 09/13] arm: Fix vcond_mask expander for MVE (PR target/100757) Christophe Lyon
@ 2021-09-28 11:23 ` Kyrylo Tkachov
  2021-10-11 13:11 ` Richard Sandiford
  5 siblings, 0 replies; 13+ messages in thread
From: Kyrylo Tkachov @ 2021-09-28 11:23 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches



> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: 07 September 2021 10:17
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 05/13] arm: Add support for VPR_REG in
> arm_class_likely_spilled_p
> 
> VPR_REG is the only register in its class, so it should be handled by
> TARGET_CLASS_LIKELY_SPILLED_P.  No test fails without this patch, but
> it seems it should be implemented.

The documentation for the hook does recommend returning true when there is only one register in the class.
So this seems sensible to me. It's supposed to affect optimisation rather than correctness so I'm in favour of it.
Ok.
Thanks,
Kyrill

> 
> 2021-09-01  Christophe Lyon  <christophe.lyon@foss.st.com>
> 
> 	gcc/
> 	* config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 11dafc70067..1222cb0d0fe 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -29307,6 +29307,9 @@ arm_class_likely_spilled_p (reg_class_t rclass)
>        || rclass  == CC_REG)
>      return true;
> 
> +  if (TARGET_HAVE_MVE && (rclass == VPR_REG))
> +    return true;
> +
>    return false;
>  }
> 
> --
> 2.25.1



* RE: [PATCH 06/13] arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  2021-09-07  9:16 ` [PATCH 06/13] arm: Fix mve_vmvnq_n_<supf><mode> argument mode Christophe Lyon
@ 2021-09-28 11:24   ` Kyrylo Tkachov
  2021-10-11 13:42   ` Richard Sandiford
  1 sibling, 0 replies; 13+ messages in thread
From: Kyrylo Tkachov @ 2021-09-28 11:24 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches



> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: 07 September 2021 10:17
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 06/13] arm: Fix mve_vmvnq_n_<supf><mode> argument
> mode
> 
> The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use the
> <V_elem> mode attribute instead of HI in mve_vmvnq_n_<supf><mode>.

Ok. This can go in independently from the rest if testing is ok.
Thanks,
Kyrill

> 
> 2021-09-03  Christophe Lyon  <christophe.lyon@foss.st.com>
> 
> 	gcc/
> 	* config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem
> mode
> 	for operand 1.
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index e393518ea88..14d17060290 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_<supf><mode>"
>  (define_insn "mve_vmvnq_n_<supf><mode>"
>    [
>     (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> -	(unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
> +	(unspec:MVE_5 [(match_operand:<V_elem> 1
> "immediate_operand" "i")]
>  	 VMVNQ_N))
>    ]
>    "TARGET_HAVE_MVE"
> --
> 2.25.1



* Re: [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p
  2021-09-07  9:16 [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p Christophe Lyon
                   ` (4 preceding siblings ...)
  2021-09-28 11:23 ` [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p Kyrylo Tkachov
@ 2021-10-11 13:11 ` Richard Sandiford
  5 siblings, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2021-10-11 13:11 UTC (permalink / raw)
  To: Christophe Lyon via Gcc-patches

Sorry for the very long delay in reviewing this.  Things have been
a bit hectic recently.

Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> VPR_REG is the only register in its class, so it should be handled by
> TARGET_CLASS_LIKELY_SPILLED_P.  No test fails without this patch, but
> it seems it should be implemented.

I think instead we should change the “return false” so that it
calls the default implementation (default_class_likely_spilled_p).
That should handle this case correctly, as well as any future
single-register classes that we might add.

Thanks,
Richard

>
> 2021-09-01  Christophe Lyon  <christophe.lyon@foss.st.com>
>
> 	gcc/
> 	* config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 11dafc70067..1222cb0d0fe 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -29307,6 +29307,9 @@ arm_class_likely_spilled_p (reg_class_t rclass)
>        || rclass  == CC_REG)
>      return true;
>  
> +  if (TARGET_HAVE_MVE && (rclass == VPR_REG))
> +    return true;
> +
>    return false;
>  }


* Re: [PATCH 06/13] arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  2021-09-07  9:16 ` [PATCH 06/13] arm: Fix mve_vmvnq_n_<supf><mode> argument mode Christophe Lyon
  2021-09-28 11:24   ` Kyrylo Tkachov
@ 2021-10-11 13:42   ` Richard Sandiford
  2021-10-13 10:19     ` Christophe Lyon
  1 sibling, 1 reply; 13+ messages in thread
From: Richard Sandiford @ 2021-10-11 13:42 UTC (permalink / raw)
  To: Christophe Lyon via Gcc-patches

Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use the
> <V_elem> mode attribute instead of HI in mve_vmvnq_n_<supf><mode>.
>
> 2021-09-03  Christophe Lyon  <christophe.lyon@foss.st.com>
>
> 	gcc/
> 	* config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
> 	for operand 1.
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index e393518ea88..14d17060290 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_<supf><mode>"
>  (define_insn "mve_vmvnq_n_<supf><mode>"
>    [
>     (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> -	(unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
> +	(unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")]
>  	 VMVNQ_N))
>    ]
>    "TARGET_HAVE_MVE"

I agree this is correct, but there's also the issue that the
predicate is too broad.  At the moment it allows any immediate,
so things like:

  #include <arm_mve.h>
  int32x4_t foo(void) { return vmvnq_n_s32(0x12345678); }

are accepted by the compiler and only rejected by the assembler.
Not your bug to fix, just saying :-)

Thanks,
Richard


* Re: [PATCH 07/13] arm: Implement MVE predicates as vectors of booleans
  2021-09-07  9:16 ` [PATCH 07/13] arm: Implement MVE predicates as vectors of booleans Christophe Lyon
@ 2021-10-11 13:50   ` Richard Sandiford
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2021-10-11 13:50 UTC (permalink / raw)
  To: Christophe Lyon via Gcc-patches

Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> This patch implements support for vectors of booleans to support MVE
> predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
> uint16_t) to represent predicates in intrinsics prototypes, we
> introduce a new "predicate" type qualifier so that we can map relevant
> builtins HImode arguments and return value to the appropriate vector
> of booleans (VxBI).
>
> We have to update test_vector_ops_duplicate, because it iterates using
> an offset in bytes, where we would need to iterate in bits: we stop
> iterating when we reach the end of the vector of booleans.
>
> 2021-09-01  Christophe Lyon  <christophe.lyon@foss.st.com>
>
> 	gcc/
> 	PR target/100757
> 	PR target/101325
> 	* config/arm/arm-builtins.c (arm_type_qualifiers): Add qualifier_predicate.
> 	(arm_init_simd_builtin_types): Add new simd types.
> 	(arm_init_builtin): Map predicate vectors arguments to HImode.
> 	(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
> 	rtx. Move return value to HImode rtx.
> 	* config/arm/arm-modes.def (V16BI, V8BI, V4BI): New modes.
> 	* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
> 	Pred2x8_t,Pred4x4_t): New.
> 	* simplify-rtx.c (test_vector_ops_duplicate): Avoid going past the
> 	end of the test vector.
>
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 3a9ff8f26b8..771759f0cdd 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -92,7 +92,9 @@ enum arm_type_qualifiers
>    qualifier_lane_pair_index = 0x1000,
>    /* Lane indices selected in quadtuplets - must be within range of previous
>       argument = a vector.  */
> -  qualifier_lane_quadtup_index = 0x2000
> +  qualifier_lane_quadtup_index = 0x2000,
> +  /* MVE vector predicates.  */
> +  qualifier_predicate = 0x4000
>  };
>  
>  /*  The qualifier_internal allows generation of a unary builtin from
> @@ -1633,6 +1635,13 @@ arm_init_simd_builtin_types (void)
>    arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
>    arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
>  
> +  if (TARGET_HAVE_MVE)
> +    {
> +      arm_simd_types[Pred1x16_t].eltype = unsigned_intHI_type_node;
> +      arm_simd_types[Pred2x8_t].eltype = unsigned_intHI_type_node;
> +      arm_simd_types[Pred4x4_t].eltype = unsigned_intHI_type_node;
> +    }
> +
>    for (i = 0; i < nelts; i++)
>      {
>        tree eltype = arm_simd_types[i].eltype;
> @@ -1780,6 +1789,11 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
>        if (qualifiers & qualifier_map_mode)
>  	op_mode = d->mode;
>  
> +      /* MVE Predicates use HImode as mandated by the ABI: pred16_t is unsigned
> +	 short.  */
> +      if (qualifiers & qualifier_predicate)
> +	op_mode = HImode;
> +
>        /* For pointers, we want a pointer to the basic type
>  	 of the vector.  */
>        if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
> @@ -3024,6 +3038,11 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
>  	    case ARG_BUILTIN_COPY_TO_REG:
>  	      if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
>  		op[argc] = convert_memory_address (Pmode, op[argc]);
> +
> +	      /* MVE uses mve_pred16_t (aka HImode) for vectors of predicates.  */
> +	      if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
> +		op[argc] = gen_lowpart (mode[argc], op[argc]);
> +
>  	      /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
>  	      if (!(*insn_data[icode].operand[opno].predicate)
>  		  (op[argc], mode[argc]))
> @@ -3229,6 +3248,13 @@ constant_arg:
>    else
>      emit_insn (insn);
>  
> +  if (GET_MODE_CLASS (tmode) == MODE_VECTOR_BOOL)
> +    {
> +      rtx HItarget = gen_reg_rtx (HImode);
> +      emit_move_insn (HItarget, gen_lowpart (HImode, target));
> +      return HItarget;
> +    }
> +
>    return target;
>  }
>  
> diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
> index a5e74ba3943..b414a709a62 100644
> --- a/gcc/config/arm/arm-modes.def
> +++ b/gcc/config/arm/arm-modes.def
> @@ -84,6 +84,11 @@ VECTOR_MODE (FLOAT, BF, 2);   /*                 V2BF.  */
>  VECTOR_MODE (FLOAT, BF, 4);   /*		 V4BF.  */
>  VECTOR_MODE (FLOAT, BF, 8);   /*		 V8BF.  */
>  
> +/* Predicates for MVE.  */
> +VECTOR_BOOL_MODE (V16BI, 16, 2);
> +VECTOR_BOOL_MODE (V8BI, 8, 2);
> +VECTOR_BOOL_MODE (V4BI, 4, 2);
> +
>  /* Fraction and accumulator vector modes.  */
>  VECTOR_MODES (FRACT, 4);      /* V4QQ  V2HQ */
>  VECTOR_MODES (UFRACT, 4);     /* V4UQQ V2UHQ */
> diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
> index c19a1b6e3eb..d3987985b4c 100644
> --- a/gcc/config/arm/arm-simd-builtin-types.def
> +++ b/gcc/config/arm/arm-simd-builtin-types.def
> @@ -51,3 +51,7 @@
>    ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
>    ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
>    ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
> +
> +  ENTRY (Pred1x16_t, V16BI, unsigned, 16, uint16, 21)
> +  ENTRY (Pred2x8_t, V8BI, unsigned, 8, uint16, 21)
> +  ENTRY (Pred4x4_t, V4BI, unsigned, 4, uint16, 21)
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index a719f57870f..1453f984f99 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -7642,6 +7642,13 @@ test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
>  	  rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
>  	  rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
>  	  poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
> +
> +	  /* OFFSET is in bytes, so stop testing when we go past the end of a
> +	     vector of booleans, where we would need an offset in bits.  */
> +	  if ((GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
> +	      && (maybe_ge (offset, GET_MODE_SIZE (mode))))
> +	    break;
> +

I think we should skip the whole for loop for vector booleans.  Although the
offset is in bytes, the vec_merge indices are still in elements (usually
bits) and so the loop will test something invalid for i != 0.

OK with that change, thanks.

Richard

>  	  ASSERT_RTX_EQ (scalar_reg,
>  			 simplify_gen_subreg (inner_mode, vm,
>  					      mode, offset));


* Re: [PATCH 08/13] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates
  2021-09-07  9:16 ` [PATCH 08/13] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates Christophe Lyon
@ 2021-10-11 13:59   ` Richard Sandiford
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2021-10-11 13:59 UTC (permalink / raw)
  To: Christophe Lyon via Gcc-patches

Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> We make use of qualifier_predicate to describe MVE builtins
> prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins,
> as they are exercised by the tests added earlier in the series.
>
> Special handling is needed for mve_vpselq because it has a v2di
> variant, which has no natural VPR.P0 representation: we keep HImode
> for it.
>
> The vector_compare expansion code is updated to use the right VxBI
> mode instead of HI for the result.
>
> New mov patterns are introduced to handle the new modes.
>
> 2021-09-01  Christophe Lyon <christophe.lyon@foss.st.com>
>
> 	gcc/
> 	PR target/100757
> 	PR target/101325
> 	* config/arm/arm-builtins.c (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
> 	(BINOP_PRED_NONE_NONE_QUALIFIERS)
> 	(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
> 	(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
> 	* config/arm/arm.c (arm_hard_regno_mode_ok): Handle new VxBI
> 	modes.
> 	(arm_mode_to_pred_mode): New.
> 	(arm_expand_vector_compare): Use the right VxBI mode instead of
> 	HI.
> 	(arm_expand_vcond): Likewise.
> 	* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
> 	(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
> 	(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
> 	(vpselq_s, vpselq_f): Use new predicated qualifiers.
> 	* config/arm/iterators.md (MVE_7): New mode iterator.
> 	(MVE_VPRED, MVE_vpred): New attribute iterators.
> 	* config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>)
> 	(@mve_vcmp<mve_cmp_op>q_f<mode>, @mve_vpselq_<supf><mode>)
> 	(@mve_vpselq_f<mode>): Use MVE_VPRED instead of HI.
> 	(@mve_vpselq_<supf>v2di): Define separately.
> 	(mov<mode>): New expander for VxBI modes.
> 	(mve_mov<mode>): New insn for VxBI modes.
>
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 771759f0cdd..6e3638869f1 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -469,6 +469,12 @@ arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \
>    (arm_binop_unone_unone_unone_qualifiers)
>  
> +static enum arm_type_qualifiers
> +arm_binop_pred_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned };
> +#define BINOP_PRED_UNONE_UNONE_QUALIFIERS \
> +  (arm_binop_pred_unone_unone_qualifiers)
> +
>  static enum arm_type_qualifiers
>  arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_unsigned, qualifier_none, qualifier_immediate };
> @@ -487,6 +493,12 @@ arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define BINOP_UNONE_NONE_NONE_QUALIFIERS \
>    (arm_binop_unone_none_none_qualifiers)
>  
> +static enum arm_type_qualifiers
> +arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_predicate, qualifier_none, qualifier_none };
> +#define BINOP_PRED_NONE_NONE_QUALIFIERS \
> +  (arm_binop_pred_none_none_qualifiers)
> +
>  static enum arm_type_qualifiers
>  arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
> @@ -558,6 +570,12 @@ arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
>    (arm_ternop_none_none_none_unone_qualifiers)
>  
> +static enum arm_type_qualifiers
> +arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
> +#define TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS \
> +  (arm_ternop_none_none_none_pred_qualifiers)
> +
>  static enum arm_type_qualifiers
>  arm_ternop_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_none, qualifier_none, qualifier_immediate, qualifier_unsigned };
> @@ -577,6 +595,13 @@ arm_ternop_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
>    (arm_ternop_unone_unone_unone_unone_qualifiers)
>  
> +static enum arm_type_qualifiers
> +arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
> +    qualifier_predicate };
> +#define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
> +  (arm_ternop_unone_unone_unone_pred_qualifiers)
> +
>  static enum arm_type_qualifiers
>  arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 1222cb0d0fe..5f6637d9a5f 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -25304,7 +25304,7 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>      return false;
>  
>    if (IS_VPR_REGNUM (regno))
> -    return mode == HImode;
> +    return mode == HImode || mode == V16BImode || mode == V8BImode || mode == V4BImode;

Nit: long line, should be:

    return (mode == HImode
	    || mode == V16BImode
	    || mode == V8BImode
	    || mode == V4BImode);

> @@ -30994,6 +30994,19 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
>      arm_post_atomic_barrier (model);
>  }
>  \f
> +/* Return the mode for the MVE vector of predicates corresponding to MODE.  */
> +machine_mode
> +arm_mode_to_pred_mode (machine_mode mode)
> +{
> +  switch (GET_MODE_NUNITS (mode))
> +    {
> +    case 16: return V16BImode;
> +    case 8: return V8BImode;
> +    case 4: return V4BImode;
> +    }
> +  gcc_unreachable ();
> +}
> +
>  /* Expand code to compare vectors OP0 and OP1 using condition CODE.
>     If CAN_INVERT, store either the result or its inverse in TARGET
>     and return true if TARGET contains the inverse.  If !CAN_INVERT,
> @@ -31077,7 +31090,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>  	  if (vcond_mve)
>  	    vpr_p0 = target;
>  	  else
> -	    vpr_p0 = gen_reg_rtx (HImode);
> +	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
>  
>  	  switch (GET_MODE_CLASS (cmp_mode))
>  	    {
> @@ -31119,7 +31132,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>  	  if (vcond_mve)
>  	    vpr_p0 = target;
>  	  else
> -	    vpr_p0 = gen_reg_rtx (HImode);
> +	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
>  
>  	  emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
>  	  if (!vcond_mve)
> @@ -31146,7 +31159,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>  	  if (vcond_mve)
>  	    vpr_p0 = target;
>  	  else
> -	    vpr_p0 = gen_reg_rtx (HImode);
> +	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
>  
>  	  emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0));
>  	  if (!vcond_mve)
> @@ -31199,7 +31212,7 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
>    if (TARGET_HAVE_MVE)
>      {
>        vcond_mve=true;
> -      mask = gen_reg_rtx (HImode);
> +      mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode));
>      }
>    else
>      mask = gen_reg_rtx (cmp_result_mode);
> diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
> index e9b5b28f506..58a05e61bd9 100644
> --- a/gcc/config/arm/arm_mve_builtins.def
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -89,7 +89,7 @@ VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
>  VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
>  VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vsubq_u, v16qi, v8hi, v4si)
> @@ -117,9 +117,9 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vhsubq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si)
> @@ -143,15 +143,15 @@ VAR3 (BINOP_UNONE_UNONE_IMM, vshlq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_IMM, vrshrq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_IMM, vqshlq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpeqq_n_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_IMM, vqshluq_n_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_UNONE, vaddvq_p_s, v16qi, v8hi, v4si)
> @@ -219,17 +219,17 @@ VAR2 (BINOP_UNONE_UNONE_IMM, vshllbq_n_u, v16qi, v8hi)
>  VAR2 (BINOP_UNONE_UNONE_IMM, vorrq_n_u, v8hi, v4si)
>  VAR2 (BINOP_UNONE_UNONE_IMM, vbicq_n_u, v8hi, v4si)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpneq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpneq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpneq_f, v8hf, v4sf)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpltq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpltq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpltq_f, v8hf, v4sf)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpleq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpleq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpleq_f, v8hf, v4sf)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpgtq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpgtq_f, v8hf, v4sf)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpgeq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpgeq_f, v8hf, v4sf)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpeqq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpeqq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpeqq_f, v8hf, v4sf)
>  VAR2 (BINOP_NONE_NONE_NONE, vsubq_f, v8hf, v4sf)
>  VAR2 (BINOP_NONE_NONE_NONE, vqmovntq_s, v8hi, v4si)
>  VAR2 (BINOP_NONE_NONE_NONE, vqmovnbq_s, v8hi, v4si)
> @@ -295,8 +295,8 @@ VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE, vcvtaq_m_u, v8hi, v4si)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtaq_m_s, v8hi, v4si)
>  VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vshlcq_vec_u, v16qi, v8hi, v4si)
>  VAR3 (TERNOP_NONE_NONE_UNONE_IMM, vshlcq_vec_s, v16qi, v8hi, v4si)
> -VAR4 (TERNOP_UNONE_UNONE_UNONE_UNONE, vpselq_u, v16qi, v8hi, v4si, v2di)
> -VAR4 (TERNOP_NONE_NONE_NONE_UNONE, vpselq_s, v16qi, v8hi, v4si, v2di)
> +VAR4 (TERNOP_UNONE_UNONE_UNONE_PRED, vpselq_u, v16qi, v8hi, v4si, v2di)
> +VAR4 (TERNOP_NONE_NONE_NONE_PRED, vpselq_s, v16qi, v8hi, v4si, v2di)
>  VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrev64q_m_u, v16qi, v8hi, v4si)
>  VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmvnq_m_u, v16qi, v8hi, v4si)
>  VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmlasq_n_u, v16qi, v8hi, v4si)
> @@ -426,7 +426,7 @@ VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrev64q_m_f, v8hf, v4sf)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrev32q_m_s, v16qi, v8hi)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vqmovntq_m_s, v8hi, v4si)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vqmovnbq_m_s, v8hi, v4si)
> -VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vpselq_f, v8hf, v4sf)
> +VAR2 (TERNOP_NONE_NONE_NONE_PRED, vpselq_f, v8hf, v4sf)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vnegq_m_f, v8hf, v4sf)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovntq_m_s, v8hi, v4si)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovnbq_m_s, v8hi, v4si)
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index fafbd2f94b8..df5d15e08b8 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -272,6 +272,7 @@ (define_mode_iterator MVE_3 [V16QI V8HI])
>  (define_mode_iterator MVE_2 [V16QI V8HI V4SI])
>  (define_mode_iterator MVE_5 [V8HI V4SI])
>  (define_mode_iterator MVE_6 [V8HI V4SI])
> +(define_mode_iterator MVE_7 [V16BI V8BI V4BI])
>  
>  ;;----------------------------------------------------------------------------
>  ;; Code iterators
> @@ -946,6 +947,10 @@ (define_mode_attr V_extr_elem [(V16QI "u8") (V8HI "u16") (V4SI "32")
>  			       (V8HF "u16") (V4SF "32")])
>  (define_mode_attr earlyclobber_32 [(V16QI "=w") (V8HI "=w") (V4SI "=&w")
>  						(V8HF "=w") (V4SF "=&w")])
> +(define_mode_attr MVE_VPRED [(V16QI "V16BI") (V8HI "V8BI") (V4SI "V4BI")
> +                             (V8HF "V8BI")   (V4SF "V4BI")])
> +(define_mode_attr MVE_vpred [(V16QI "v16bi") (V8HI "v8bi") (V4SI "v4bi")
> +                             (V8HF "v8bi")   (V4SF "v4bi")])
>  
>  ;;----------------------------------------------------------------------------
>  ;; Code attributes
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 14d17060290..c9c8e2c13fe 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -839,8 +839,8 @@ (define_insn "mve_vaddlvq_p_<supf>v4si"
>  ;;
>  (define_insn "@mve_vcmp<mve_cmp_op>q_<mode>"
>    [
> -   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> -	(MVE_COMPARISONS:HI (match_operand:MVE_2 1 "s_register_operand" "w")
> +   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
> +	(MVE_COMPARISONS:<MVE_VPRED> (match_operand:MVE_2 1 "s_register_operand" "w")
>  		    (match_operand:MVE_2 2 "s_register_operand" "w")))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -1929,8 +1929,8 @@ (define_insn "mve_vcaddq<mve_rot><mode>"
>  ;;
>  (define_insn "@mve_vcmp<mve_cmp_op>q_f<mode>"
>    [
> -   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> -	(MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w")
> +   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
> +	(MVE_FP_COMPARISONS:<MVE_VPRED> (match_operand:MVE_0 1 "s_register_operand" "w")
>  			       (match_operand:MVE_0 2 "s_register_operand" "w")))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -3321,9 +3321,21 @@ (define_insn "mve_vnegq_m_s<mode>"
>  ;;
>  (define_insn "@mve_vpselq_<supf><mode>"
>    [
> -   (set (match_operand:MVE_1 0 "s_register_operand" "=w")
> -	(unspec:MVE_1 [(match_operand:MVE_1 1 "s_register_operand" "w")
> -		       (match_operand:MVE_1 2 "s_register_operand" "w")
> +   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> +	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> +		       (match_operand:MVE_2 2 "s_register_operand" "w")
> +		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
> +	 VPSELQ))
> +  ]
> +  "TARGET_HAVE_MVE"
> +  "vpsel %q0, %q1, %q2"
> +  [(set_attr "type" "mve_move")
> +])
> +(define_insn "@mve_vpselq_<supf>v2di"
> +  [
> +   (set (match_operand:V2DI 0 "s_register_operand" "=w")
> +	(unspec:V2DI [(match_operand:V2DI 1 "s_register_operand" "w")
> +		       (match_operand:V2DI 2 "s_register_operand" "w")
>  		       (match_operand:HI 3 "vpr_register_operand" "Up")]
>  	 VPSELQ))
>    ]

I think we can keep this together and just make MVE_VPRED/MVE_vpred
map V2DI to HI/hi.
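
Untested sketch of what I mean, i.e. just extending the two mode
attributes from the patch with a V2DI entry (all other entries as in
the patch):

```lisp
(define_mode_attr MVE_VPRED [(V16QI "V16BI") (V8HI "V8BI") (V4SI "V4BI")
                             (V2DI "HI")     (V8HF "V8BI") (V4SF "V4BI")])
(define_mode_attr MVE_vpred [(V16QI "v16bi") (V8HI "v8bi") (V4SI "v4bi")
                             (V2DI "hi")     (V8HF "v8bi") (V4SF "v4bi")])
```

With that mapping, the separate @mve_vpselq_<supf>v2di pattern would no
longer be needed: the MVE_1 iterator version could keep covering V2DI,
with operand 3 taking <MVE_VPRED> (i.e. HImode for V2DI).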

> @@ -4419,7 +4431,7 @@ (define_insn "@mve_vpselq_f<mode>"
>     (set (match_operand:MVE_0 0 "s_register_operand" "=w")
>  	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
>  		       (match_operand:MVE_0 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
>  	 VPSELQ_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -10516,3 +10528,25 @@ (define_insn "*movmisalign<mode>_mve_load"
>    "vldr<V_sz_elem1>.<V_sz_elem>\t%q0, %E1"
>    [(set_attr "type" "mve_load")]
>  )
> +
> +(define_expand "mov<mode>"
> +  [(set (match_operand:MVE_7 0 "nonimmediate_operand")
> +        (match_operand:MVE_7 1 "nonimmediate_operand"))]
> +  "TARGET_HAVE_MVE"
> +  {
> +  }
> +)

Because of the (correct) register_operand condition on the define_insn,
this expander needs to force operand 1 into a register if neither
operand 0 nor operand 1 is a register:

  {
    if (!register_operand (operands[0], <MODE>mode))
      operands[1] = force_reg (<MODE>mode, operands[1]);
  }

Thanks,
Richard

> +
> +(define_insn "*mve_mov<mode>"
> +  [(set (match_operand:MVE_7 0 "nonimmediate_operand" "=rk, m, r, Up, r")
> +        (match_operand:MVE_7 1 "nonimmediate_operand"  "rk, r, m, r, Up"))]
> +  "TARGET_HAVE_MVE
> +  && (register_operand (operands[0], <MODE>mode)
> +      || register_operand (operands[1], <MODE>mode))"
> +  "@
> +  mov%?\t%0, %1
> +  strh%?\t%1, %0
> +  ldrh%?\t%0, %1
> +  vmsr%?\t P0, %1
> +  vmrs%?\t %0, P0"
> +)


* Re: [PATCH 09/13] arm: Fix vcond_mask expander for MVE (PR target/100757)
  2021-09-07  9:17 ` [PATCH 09/13] arm: Fix vcond_mask expander for MVE (PR target/100757) Christophe Lyon
@ 2021-10-11 14:06   ` Richard Sandiford
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2021-10-11 14:06 UTC (permalink / raw)
  To: Christophe Lyon via Gcc-patches; +Cc: Christophe Lyon, Christophe Lyon

Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> From: Christophe Lyon <christophe.lyon@linaro.org>
>
> The problem in this PR is that we call VPSEL with a mask of vector
> type instead of HImode. This happens because operand 3 in vcond_mask
> is the pre-computed vector comparison and has vector type.
>
> This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
> returning the appropriate VxBI mode when targeting MVE.  In turn, this
> implies implementing vec_cmp<mode><MVE_vpred>,
> vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can
> move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and
> vcond_mask_<mode><v_cmp_result> back to neon.md since they are not
> used by MVE anymore.  The new *<MVE_vpred> patterns listed above are
> implemented in mve.md since they are only valid for MVE. However this
> may make maintenance/comparison more painful than having all of them
> in vec-common.md.
>
> In the process, we can get rid of the recently added vcond_mve
> parameter of arm_expand_vector_compare.
>
> Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm:
> Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
> iterator added in r12-835 (to have V4HF/V8HF support), as well as the
> (!<Is_float_mode> || flag_unsafe_math_optimizations) condition which
> was not present before r12-834 although SF modes were enabled by VDQW
> (I think this was a bug).
>
> Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
> longer need to generate vpsel with vectors of 0 and 1: the masks are
> now merged via scalar 'ands' instructions operating on 16-bit masks
> after converting the boolean vectors.
>
> In addition, this patch fixes a problem in arm_expand_vcond() where
> the result would be a vector of 0 or 1 instead of operand 1 or 2.
>
> Reducing the number of iterations in pr100757-3.c from 32 to 8, we
> generate the code below:
>
> float a[32];
> float fn1(int d) {
>   float c = 4.0f;
>   for (int b = 0; b < 8; b++)
>     if (a[b] != 2.0f)
>       c = 5.0f;
>   return c;
> }
>
> fn1:
> 	ldr     r3, .L3+48
> 	vldr.64 d4, .L3              // q2=(2.0,2.0,2.0,2.0)
> 	vldr.64 d5, .L3+8
> 	vldrw.32        q0, [r3]     // q0=a(0..3)
> 	adds    r3, r3, #16
> 	vcmp.f32        eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
> 	vldrw.32        q1, [r3]     // q1=a(4..7)
> 	vmrs     r3, P0
> 	vcmp.f32        eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
> 	vmrs    r2, P0  @ movhi
> 	ands    r3, r3, r2           // r3=select(a(0..3]) & select(a(4..7))
> 	vldr.64 d4, .L3+16           // q2=(5.0,5.0,5.0,5.0)
> 	vldr.64 d5, .L3+24
> 	vmsr     P0, r3
> 	vldr.64 d6, .L3+32           // q3=(4.0,4.0,4.0,4.0)
> 	vldr.64 d7, .L3+40
> 	vpsel q3, q3, q2             // q3=vcond_mask(4.0,5.0)
> 	vmov.32 r2, q3[1]            // keep the scalar max
> 	vmov.32 r0, q3[3]
> 	vmov.32 r3, q3[2]
> 	vmov.f32        s11, s12
> 	vmov    s15, r2
> 	vmov    s14, r3
> 	vmaxnm.f32      s15, s11, s15
> 	vmaxnm.f32      s15, s15, s14
> 	vmov    s14, r0
> 	vmaxnm.f32      s15, s15, s14
> 	vmov    r0, s15
> 	bx      lr
> 	.L4:
> 	.align  3
> 	.L3:
> 	.word   1073741824	// 2.0f
> 	.word   1073741824
> 	.word   1073741824
> 	.word   1073741824
> 	.word   1084227584	// 5.0f
> 	.word   1084227584
> 	.word   1084227584
> 	.word   1084227584
> 	.word   1082130432	// 4.0f
> 	.word   1082130432
> 	.word   1082130432
> 	.word   1082130432
>
> 2021-09-02  Christophe Lyon  <christophe.lyon@linaro.org>
>
> 	PR target/100757
> 	gcc/
> 	* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
> 	(arm_expand_vector_compare): Update prototype.
> 	* config/arm/arm.c (TARGET_VECTORIZE_GET_MASK_MODE): New.
> 	(arm_vector_mode_supported_p): Add support for VxBI modes.
> 	(arm_expand_vector_compare): Remove useless generation of vpsel.
> 	(arm_expand_vcond): Fix select operands.
> 	(arm_get_mask_mode): New.
> 	* config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New.
> 	(vec_cmpu<mode><MVE_vpred>): New.
> 	(vcond_mask_<mode><MVE_vpred>): New.
> 	* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>)
> 	(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): Move to ...
> 	* config/arm/neon.md (vec_cmp<mode><v_cmp_result>)
> 	(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): ... here
> 	and disable for MVE.
>
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 9b1f61394ad..9e3d71e0c29 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -201,6 +201,7 @@ extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
>  extern bool arm_pad_reg_upward (machine_mode, tree, int);
>  #endif
>  extern int arm_apply_result_size (void);
> +extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
>  
>  #endif /* RTX_CODE */
>  
> @@ -372,7 +373,7 @@ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
>  extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
>  extern bool arm_valid_symbolic_address_p (rtx);
>  extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
> -extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool, bool);
> +extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool);
>  #endif /* RTX_CODE */
>  
>  extern bool arm_gen_setmem (rtx *);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 5f6637d9a5f..3326cd163a2 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -835,6 +835,10 @@ static const struct attribute_spec arm_attribute_table[] =
>  
>  #undef TARGET_MD_ASM_ADJUST
>  #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
> +
> +#undef TARGET_VECTORIZE_GET_MASK_MODE
> +#define TARGET_VECTORIZE_GET_MASK_MODE arm_get_mask_mode
> +
>  \f
>  /* Obstack for minipool constant handling.  */
>  static struct obstack minipool_obstack;
> @@ -29193,7 +29197,8 @@ arm_vector_mode_supported_p (machine_mode mode)
>  
>    if (TARGET_HAVE_MVE
>        && (mode == V2DImode || mode == V4SImode || mode == V8HImode
> -	  || mode == V16QImode))
> +	  || mode == V16QImode
> +	  || mode == V16BImode || mode == V8BImode || mode == V4BImode))
>        return true;
>  
>    if (TARGET_HAVE_MVE_FLOAT
> @@ -31012,16 +31017,12 @@ arm_mode_to_pred_mode (machine_mode mode)
>     and return true if TARGET contains the inverse.  If !CAN_INVERT,
>     always store the result in TARGET, never its inverse.
>  
> -   If VCOND_MVE, do not emit the vpsel instruction here, let arm_expand_vcond do
> -   it with the right destination type to avoid emiting two vpsel, one here and
> -   one in arm_expand_vcond.
> -
>     Note that the handling of floating-point comparisons is not
>     IEEE compliant.  */
>  
>  bool
>  arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
> -			   bool can_invert, bool vcond_mve)
> +			   bool can_invert)
>  {
>    machine_mode cmp_result_mode = GET_MODE (target);
>    machine_mode cmp_mode = GET_MODE (op0);
> @@ -31050,7 +31051,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>  	       and then store its inverse in TARGET.  This avoids reusing
>  	       TARGET (which for integer NE could be one of the inputs).  */
>  	    rtx tmp = gen_reg_rtx (cmp_result_mode);
> -	    if (arm_expand_vector_compare (tmp, code, op0, op1, true, vcond_mve))
> +	    if (arm_expand_vector_compare (tmp, code, op0, op1, true))
>  	      gcc_unreachable ();
>  	    emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp)));
>  	    return false;
> @@ -31086,36 +31087,20 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>      case NE:
>        if (TARGET_HAVE_MVE)
>  	{
> -	  rtx vpr_p0;
> -	  if (vcond_mve)
> -	    vpr_p0 = target;
> -	  else
> -	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> -
>  	  switch (GET_MODE_CLASS (cmp_mode))
>  	    {
>  	    case MODE_VECTOR_INT:
> -	      emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> +	      emit_insn (gen_mve_vcmpq (code, cmp_mode, target, op0, force_reg (cmp_mode, op1)));
>  	      break;
>  	    case MODE_VECTOR_FLOAT:
>  	      if (TARGET_HAVE_MVE_FLOAT)
> -		emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> +		emit_insn (gen_mve_vcmpq_f (code, cmp_mode, target, op0, force_reg (cmp_mode, op1)));
>  	      else
>  		gcc_unreachable ();
>  	      break;
>  	    default:
>  	      gcc_unreachable ();
>  	    }
> -
> -	  /* If we are not expanding a vcond, build the result here.  */
> -	  if (!vcond_mve)
> -	    {
> -	      rtx zero = gen_reg_rtx (cmp_result_mode);
> -	      rtx one = gen_reg_rtx (cmp_result_mode);
> -	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> -	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> -	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
> -	    }
>  	}
>        else
>  	emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
> @@ -31127,23 +31112,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>      case GEU:
>      case GTU:
>        if (TARGET_HAVE_MVE)
> -	{
> -	  rtx vpr_p0;
> -	  if (vcond_mve)
> -	    vpr_p0 = target;
> -	  else
> -	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> -
> -	  emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> -	  if (!vcond_mve)
> -	    {
> -	      rtx zero = gen_reg_rtx (cmp_result_mode);
> -	      rtx one = gen_reg_rtx (cmp_result_mode);
> -	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> -	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> -	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
> -	    }
> -	}
> +	emit_insn (gen_mve_vcmpq (code, cmp_mode, target, op0, force_reg (cmp_mode, op1)));
>        else
>  	emit_insn (gen_neon_vc (code, cmp_mode, target,
>  				op0, force_reg (cmp_mode, op1)));
> @@ -31154,23 +31123,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>      case LEU:
>      case LTU:
>        if (TARGET_HAVE_MVE)
> -	{
> -	  rtx vpr_p0;
> -	  if (vcond_mve)
> -	    vpr_p0 = target;
> -	  else
> -	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> -
> -	  emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0));
> -	  if (!vcond_mve)
> -	    {
> -	      rtx zero = gen_reg_rtx (cmp_result_mode);
> -	      rtx one = gen_reg_rtx (cmp_result_mode);
> -	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> -	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> -	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
> -	    }
> -	}
> +	emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, target, force_reg (cmp_mode, op1), op0));
>        else
>  	emit_insn (gen_neon_vc (swap_condition (code), cmp_mode,
>  				target, force_reg (cmp_mode, op1), op0));
> @@ -31185,8 +31138,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>  	rtx gt_res = gen_reg_rtx (cmp_result_mode);
>  	rtx alt_res = gen_reg_rtx (cmp_result_mode);
>  	rtx_code alt_code = (code == LTGT ? LT : LE);
> -	if (arm_expand_vector_compare (gt_res, GT, op0, op1, true, vcond_mve)
> -	    || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true, vcond_mve))
> +	if (arm_expand_vector_compare (gt_res, GT, op0, op1, true)
> +	    || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true))
>  	  gcc_unreachable ();
>  	emit_insn (gen_rtx_SET (target, gen_rtx_IOR (cmp_result_mode,
>  						     gt_res, alt_res)));
> @@ -31206,19 +31159,15 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
>  {
>    /* When expanding for MVE, we do not want to emit a (useless) vpsel in
>       arm_expand_vector_compare, and another one here.  */
> -  bool vcond_mve=false;
>    rtx mask;
>  
>    if (TARGET_HAVE_MVE)
> -    {
> -      vcond_mve=true;
> -      mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode));
> -    }
> +    mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode));
>    else
>      mask = gen_reg_rtx (cmp_result_mode);
>  
>    bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
> -					     operands[4], operands[5], true, vcond_mve);
> +					     operands[4], operands[5], true);
>    if (inverted)
>      std::swap (operands[1], operands[2]);
>    if (TARGET_NEON)
> @@ -31226,20 +31175,20 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
>  			    mask, operands[1], operands[2]));
>    else
>      {
> -      machine_mode cmp_mode = GET_MODE (operands[4]);
> -      rtx vpr_p0 = mask;
> -      rtx zero = gen_reg_rtx (cmp_mode);
> -      rtx one = gen_reg_rtx (cmp_mode);
> -      emit_move_insn (zero, CONST0_RTX (cmp_mode));
> -      emit_move_insn (one, CONST1_RTX (cmp_mode));
> +      machine_mode cmp_mode = GET_MODE (operands[0]);
> +
>        switch (GET_MODE_CLASS (cmp_mode))
>  	{
>  	case MODE_VECTOR_INT:
> -	  emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0));
> +	  emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_mode, operands[0],
> +				     operands[1], operands[2], mask));
>  	  break;
>  	case MODE_VECTOR_FLOAT:
>  	  if (TARGET_HAVE_MVE_FLOAT)
> -	    emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0));
> +	    emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0],
> +					 operands[1], operands[2], mask));
> +	  else
> +	    gcc_unreachable ();
>  	  break;
>  	default:
>  	  gcc_unreachable ();
> @@ -34149,4 +34098,15 @@ arm_mode_base_reg_class (machine_mode mode)
>  
>  struct gcc_target targetm = TARGET_INITIALIZER;
>  
> +/* Implement TARGET_VECTORIZE_GET_MASK_MODE.  */
> +
> +opt_machine_mode
> +arm_get_mask_mode (machine_mode mode)
> +{
> +  if (TARGET_HAVE_MVE)
> +    return arm_mode_to_pred_mode (mode);

I think this needs to check whether arm_mode_to_pred_mode accepts
the mode first.  (Alternatively, arm_mode_to_pred_mode could return
an opt_machine_mode and punt for modes that it doesn't understand.)
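
Untested sketch of the opt_machine_mode variant, reusing the names from
the patch (the exists () idiom is how opt_machine_mode is usually
consumed elsewhere):

```cpp
/* Return the MVE vector-of-predicates mode corresponding to MODE,
   or nothing if there is no corresponding predicate mode.  */
opt_machine_mode
arm_mode_to_pred_mode (machine_mode mode)
{
  switch (GET_MODE_NUNITS (mode))
    {
    case 16: return V16BImode;
    case 8: return V8BImode;
    case 4: return V4BImode;
    }
  return opt_machine_mode ();
}

/* Implement TARGET_VECTORIZE_GET_MASK_MODE.  */
opt_machine_mode
arm_get_mask_mode (machine_mode mode)
{
  machine_mode pred_mode;
  if (TARGET_HAVE_MVE
      && arm_mode_to_pred_mode (mode).exists (&pred_mode))
    return pred_mode;

  return default_get_mask_mode (mode);
}
```

The existing callers that want a machine_mode unconditionally (the
gen_reg_rtx calls, for example) would then use require () instead.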

> +
> +  return default_get_mask_mode (mode);
> +}
> +
>  #include "gt-arm.h"
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index c9c8e2c13fe..d663c698cfb 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -10550,3 +10550,58 @@ (define_insn "*mve_mov<mode>"
>    vmsr%?\t P0, %1
>    vmrs%?\t %0, P0"
>  )
> +
> +;; Expanders for vec_cmp and vcond
> +
> +(define_expand "vec_cmp<mode><MVE_vpred>"
> +  [(set (match_operand:<MVE_VPRED> 0 "s_register_operand")
> +	(match_operator:<MVE_VPRED> 1 "comparison_operator"
> +	  [(match_operand:MVE_VLD_ST 2 "s_register_operand")
> +	   (match_operand:MVE_VLD_ST 3 "reg_or_zero_operand")]))]
> +  "TARGET_HAVE_MVE
> +   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> +{
> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> +			     operands[2], operands[3], false);
> +  DONE;
> +})
> +
> +(define_expand "vec_cmpu<mode><MVE_vpred>"
> +  [(set (match_operand:<MVE_VPRED> 0 "s_register_operand")
> +	(match_operator:<MVE_VPRED> 1 "comparison_operator"
> +	  [(match_operand:MVE_2 2 "s_register_operand")
> +	   (match_operand:MVE_2 3 "reg_or_zero_operand")]))]
> +  "TARGET_HAVE_MVE
> +   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"

The float check should be redundant here, since MVE_2 only includes
integer modes.
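
I.e. (untested) the expander from the patch with the condition reduced
to just the target check:

```lisp
(define_expand "vec_cmpu<mode><MVE_vpred>"
  [(set (match_operand:<MVE_VPRED> 0 "s_register_operand")
	(match_operator:<MVE_VPRED> 1 "comparison_operator"
	  [(match_operand:MVE_2 2 "s_register_operand")
	   (match_operand:MVE_2 3 "reg_or_zero_operand")]))]
  "TARGET_HAVE_MVE"
{
  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
			     operands[2], operands[3], false);
  DONE;
})
```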

Looks good otherwise, thanks.

Richard

> +{
> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> +			     operands[2], operands[3], false);
> +  DONE;
> +})
> +
> +(define_expand "vcond_mask_<mode><MVE_vpred>"
> +  [(set (match_operand:MVE_VLD_ST 0 "s_register_operand")
> +	(if_then_else:MVE_VLD_ST
> +	  (match_operand:<MVE_VPRED> 3 "s_register_operand")
> +	  (match_operand:MVE_VLD_ST 1 "s_register_operand")
> +	  (match_operand:MVE_VLD_ST 2 "s_register_operand")))]
> +  "TARGET_HAVE_MVE"
> +{
> +  switch (GET_MODE_CLASS (<MODE>mode))
> +    {
> +      case MODE_VECTOR_INT:
> +	emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0],
> +				   operands[1], operands[2], operands[3]));
> +	break;
> +      case MODE_VECTOR_FLOAT:
> +	if (TARGET_HAVE_MVE_FLOAT)
> +	  emit_insn (gen_mve_vpselq_f (<MODE>mode, operands[0],
> +				       operands[1], operands[2], operands[3]));
> +	else
> +	  gcc_unreachable ();
> +	break;
> +      default:
> +	gcc_unreachable ();
> +    }
> +  DONE;
> +})
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 8b0a396947c..28310d93a4e 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -1394,6 +1394,45 @@ (define_insn "*us_sub<mode>_neon"
>    [(set_attr "type" "neon_qsub<q>")]
>  )
>  
> +(define_expand "vec_cmp<mode><v_cmp_result>"
> +  [(set (match_operand:<V_cmp_result> 0 "s_register_operand")
> +	(match_operator:<V_cmp_result> 1 "comparison_operator"
> +	  [(match_operand:VDQWH 2 "s_register_operand")
> +	   (match_operand:VDQWH 3 "reg_or_zero_operand")]))]
> +  "TARGET_NEON
> +   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> +{
> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> +			     operands[2], operands[3], false);
> +  DONE;
> +})
> +
> +(define_expand "vec_cmpu<mode><mode>"
> +  [(set (match_operand:VDQIW 0 "s_register_operand")
> +	(match_operator:VDQIW 1 "comparison_operator"
> +	  [(match_operand:VDQIW 2 "s_register_operand")
> +	   (match_operand:VDQIW 3 "reg_or_zero_operand")]))]
> +  "TARGET_NEON"
> +{
> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> +			     operands[2], operands[3], false);
> +  DONE;
> +})
> +
> +(define_expand "vcond_mask_<mode><v_cmp_result>"
> +  [(set (match_operand:VDQWH 0 "s_register_operand")
> +	(if_then_else:VDQWH
> +	  (match_operand:<V_cmp_result> 3 "s_register_operand")
> +	  (match_operand:VDQWH 1 "s_register_operand")
> +	  (match_operand:VDQWH 2 "s_register_operand")))]
> +  "TARGET_NEON
> +   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> +{
> +  emit_insn (gen_neon_vbsl<mode> (operands[0], operands[3], operands[1],
> +				  operands[2]));
> +  DONE;
> +})
> +
>  ;; Patterns for builtins.
>  
>  ; good for plain vadd, vaddq.
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index 68de4f0f943..9b461a76155 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -363,33 +363,6 @@ (define_expand "vlshr<mode>3"
>      }
>  })
>  
> -(define_expand "vec_cmp<mode><v_cmp_result>"
> -  [(set (match_operand:<V_cmp_result> 0 "s_register_operand")
> -	(match_operator:<V_cmp_result> 1 "comparison_operator"
> -	  [(match_operand:VDQWH 2 "s_register_operand")
> -	   (match_operand:VDQWH 3 "reg_or_zero_operand")]))]
> -  "ARM_HAVE_<MODE>_ARITH
> -   && !TARGET_REALLY_IWMMXT
> -   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> -{
> -  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> -			     operands[2], operands[3], false, false);
> -  DONE;
> -})
> -
> -(define_expand "vec_cmpu<mode><mode>"
> -  [(set (match_operand:VDQIW 0 "s_register_operand")
> -	(match_operator:VDQIW 1 "comparison_operator"
> -	  [(match_operand:VDQIW 2 "s_register_operand")
> -	   (match_operand:VDQIW 3 "reg_or_zero_operand")]))]
> -  "ARM_HAVE_<MODE>_ARITH
> -   && !TARGET_REALLY_IWMMXT"
> -{
> -  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> -			     operands[2], operands[3], false, false);
> -  DONE;
> -})
> -
>  ;; Conditional instructions.  These are comparisons with conditional moves for
>  ;; vectors.  They perform the assignment:
>  ;;
> @@ -461,31 +434,6 @@ (define_expand "vcondu<mode><v_cmp_result>"
>    DONE;
>  })
>  
> -(define_expand "vcond_mask_<mode><v_cmp_result>"
> -  [(set (match_operand:VDQWH 0 "s_register_operand")
> -        (if_then_else:VDQWH
> -          (match_operand:<V_cmp_result> 3 "s_register_operand")
> -          (match_operand:VDQWH 1 "s_register_operand")
> -          (match_operand:VDQWH 2 "s_register_operand")))]
> -  "ARM_HAVE_<MODE>_ARITH
> -   && !TARGET_REALLY_IWMMXT
> -   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> -{
> -  if (TARGET_NEON)
> -    {
> -      emit_insn (gen_neon_vbsl (<MODE>mode, operands[0], operands[3],
> -                                operands[1], operands[2]));
> -    }
> -  else if (TARGET_HAVE_MVE)
> -    {
> -      emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0],
> -                                 operands[1], operands[2], operands[3]));
> -    }
> -  else
> -    gcc_unreachable ();
> -  DONE;
> -})
> -
>  (define_expand "vec_load_lanesoi<mode>"
>    [(set (match_operand:OI 0 "s_register_operand")
>          (unspec:OI [(match_operand:OI 1 "neon_struct_operand")


* Re: [PATCH 06/13] arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  2021-10-11 13:42   ` Richard Sandiford
@ 2021-10-13 10:19     ` Christophe Lyon
  0 siblings, 0 replies; 13+ messages in thread
From: Christophe Lyon @ 2021-10-13 10:19 UTC (permalink / raw)
  To: Richard Sandiford, Christophe Lyon via Gcc-patches, Christophe Lyon

On Mon, Oct 11, 2021 at 4:10 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:

> Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use
> > <V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>.
> >
> > 2021-09-03  Christophe Lyon  <christophe.lyon@foss.st.com>
> >
> >       gcc/
> >       * config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
> >       for operand 1.
> >
> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> > index e393518ea88..14d17060290 100644
> > --- a/gcc/config/arm/mve.md
> > +++ b/gcc/config/arm/mve.md
> > @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_<supf><mode>"
> >  (define_insn "mve_vmvnq_n_<supf><mode>"
> >    [
> >     (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> > -     (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
> > +     (unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")]
> >        VMVNQ_N))
> >    ]
> >    "TARGET_HAVE_MVE"
>
> I agree this is correct, but there's also the issue that the
> predicate is too broad.  At the moment it allows any immediate,
> so things like:
>
>   #include <arm_mve.h>
>   int32x4_t foo(void) { return vmvnq_n_s32(0x12345678); }
>
> are accepted by the compiler and only rejected by the assembler.
> Not your bug to fix, just saying :-)
>
>
Right, and it seems to be the case for vbicq_n, vorrq_n, ...
I'll check that separately.



> Thanks,
> Richard
>


Thread overview: 13+ messages
2021-09-07  9:16 [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p Christophe Lyon
2021-09-07  9:16 ` [PATCH 06/13] arm: Fix mve_vmvnq_n_<supf><mode> argument mode Christophe Lyon
2021-09-28 11:24   ` Kyrylo Tkachov
2021-10-11 13:42   ` Richard Sandiford
2021-10-13 10:19     ` Christophe Lyon
2021-09-07  9:16 ` [PATCH 07/13] arm: Implement MVE predicates as vectors of booleans Christophe Lyon
2021-10-11 13:50   ` Richard Sandiford
2021-09-07  9:16 ` [PATCH 08/13] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates Christophe Lyon
2021-10-11 13:59   ` Richard Sandiford
2021-09-07  9:17 ` [PATCH 09/13] arm: Fix vcond_mask expander for MVE (PR target/100757) Christophe Lyon
2021-10-11 14:06   ` Richard Sandiford
2021-09-28 11:23 ` [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p Kyrylo Tkachov
2021-10-11 13:11 ` Richard Sandiford
