public inbox for gcc-patches@gcc.gnu.org
* [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates
@ 2022-01-13 14:56 Christophe Lyon
  2022-01-13 14:56 ` [PATCH v3 01/15] arm: Add new tests for comparison vectorization with Neon and MVE Christophe Lyon
                   ` (15 more replies)
  0 siblings, 16 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC
  To: gcc-patches


This is v3 of this patch series, fixing issues I discovered before
committing v2 (which had been approved).

Thanks a lot to Richard Sandiford for his help.

The changes v2 -> v3 are:

Patch 4: Fix arm_hard_regno_nregs and CLASS_MAX_NREGS to support VPR.

Patch 7: Changes to the underlying representation of vectors of
booleans to account for the different expectations between AArch64/SVE
and Arm/MVE.

Patch 8: Re-use and extend existing thumb2_movhi* patterns instead of
duplicating them in mve_mov<mode>. This requires the introduction of a
new constraint to match a constant vector of booleans. Add a new RTL
test.

Patch 9: Introduce check_effective_target_arm_mve and skip
gcc.dg/signbit-2.c, because with MVE there is no fallback architecture
unlike SVE or AVX512.

Patch 12: Update fewer load/store MVE builtins
(mve_vldrdq_gather_base_z_<supf>v2di,
mve_vldrdq_gather_offset_z_<supf>v2di,
mve_vldrdq_gather_shifted_offset_z_<supf>v2di,
mve_vstrdq_scatter_base_p_<supf>v2di,
mve_vstrdq_scatter_offset_p_<supf>v2di,
mve_vstrdq_scatter_offset_p_<supf>v2di_insn,
mve_vstrdq_scatter_shifted_offset_p_<supf>v2di,
mve_vstrdq_scatter_shifted_offset_p_<supf>v2di_insn,
mve_vstrdq_scatter_base_wb_p_<supf>v2di,
mve_vldrdq_gather_base_wb_z_<supf>v2di,
mve_vldrdq_gather_base_nowb_z_<supf>v2di,
mve_vldrdq_gather_base_wb_z_<supf>v2di_insn) for which we keep HI mode
for vpr_register_operand.

Patch 13: No need to update
gcc.target/arm/acle/cde-mve-full-assembly.c anymore since we re-use
the mov pattern that emits '@ movhi' in the assembly.

Patch 15: This is a new patch to fix a problem I noticed during this
v2->v3 update.



I'll squash patch 2 with patch 9 and patch 3 with patch 8.

Original text:

This patch series addresses PR 100757 and 101325 by representing
vectors of predicates (MVE VPR.P0 register) as vectors of booleans
rather than using HImode.
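
As a rough illustration of the layout behind this change (not code from
the series): VPR.P0 holds 16 predicate bits, one per byte of a 128-bit
vector, so a lane of N bytes owns N consecutive bits.  The sketch below
uses a hypothetical helper, expand_predicate, to turn such a 16-bit value
into one boolean per lane.  It assumes a lane counts as true only when
all of its bits are set; that convention is a choice made for this
sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch series: expand a 16-bit
   VPR.P0-style predicate into one boolean per lane.  elem_bytes is the
   lane size in bytes (1, 2 or 4), so there are 16 / elem_bytes lanes and
   each lane owns elem_bytes consecutive predicate bits.  A lane is
   reported as true only if all of its bits are set (assumption of this
   sketch).  Returns the number of lanes written, or 0 on bad input.  */
unsigned
expand_predicate (uint16_t p0, unsigned elem_bytes, bool lanes[16])
{
  if (elem_bytes != 1 && elem_bytes != 2 && elem_bytes != 4)
    return 0;

  unsigned nlanes = 16 / elem_bytes;
  uint16_t lane_mask = (uint16_t) ((1u << elem_bytes) - 1);

  for (unsigned i = 0; i < nlanes; i++)
    lanes[i] = ((p0 >> (i * elem_bytes)) & lane_mask) == lane_mask;

  return nlanes;
}
```

For example, expand_predicate (0x00f0, 4, lanes) reports four 32-bit
lanes with only lane 1 active, whereas an HImode value carries no
information about how its 16 bits divide into lanes.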

As this implies a lot of mostly mechanical changes, I have tried to
split the patches in a way that should help reviewers, but the split
is a bit artificial.

Patches 1-3 add new tests.

Patches 4-6 are small independent improvements.

Patch 7 implements the predicate qualifier, but does not change any
builtin yet.

Patch 8 is the first of the two main patches, and uses the new
qualifier to describe the vcmp and vpsel builtins that are useful for
auto-vectorization of comparisons.

Patch 9 is the second main patch, which fixes the vcond_mask expander.

Patches 10-13 convert almost all the remaining builtins with HI
operands to use the predicate qualifier.  After these, there are still
a few builtins with HI operands left, about which I am not sure: vctp,
vpnot, load-gather and store-scatter with v2di operands.  In fact,
patches 11/12 update some STR/LDR qualifiers in a way that breaks
these v2di builtins although existing tests still pass.

Christophe Lyon (15):
  arm: Add new tests for comparison vectorization with Neon and MVE
  arm: Add tests for PR target/100757
  arm: Add tests for PR target/101325
  arm: Add GENERAL_AND_VPR_REGS regclass
  arm: Add support for VPR_REG in arm_class_likely_spilled_p
  arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  arm: Implement MVE predicates as vectors of booleans
  arm: Implement auto-vectorized MVE comparisons with vectors of boolean
    predicates
  arm: Fix vcond_mask expander for MVE (PR target/100757)
  arm: Convert remaining MVE vcmp builtins to predicate qualifiers
  arm: Convert more MVE builtins to predicate qualifiers
  arm: Convert more load/store MVE builtins to predicate qualifiers
  arm: Convert more MVE/CDE builtins to predicate qualifiers
  arm: Add VPR_REG to ALL_REGS
  arm: Fix constraint check for V8HI in mve_vector_mem_operand

 gcc/config/aarch64/aarch64-modes.def          |   8 +-
 gcc/config/arm/arm-builtins.c                 | 224 +++--
 gcc/config/arm/arm-builtins.h                 |   4 +-
 gcc/config/arm/arm-modes.def                  |   8 +
 gcc/config/arm/arm-protos.h                   |   4 +-
 gcc/config/arm/arm-simd-builtin-types.def     |   4 +
 gcc/config/arm/arm.c                          | 169 ++--
 gcc/config/arm/arm.h                          |   9 +-
 gcc/config/arm/arm_mve_builtins.def           | 746 ++++++++--------
 gcc/config/arm/constraints.md                 |   6 +
 gcc/config/arm/iterators.md                   |   6 +
 gcc/config/arm/mve.md                         | 795 ++++++++++--------
 gcc/config/arm/neon.md                        |  39 +
 gcc/config/arm/vec-common.md                  |  52 --
 gcc/config/arm/vfp.md                         |  34 +-
 gcc/doc/sourcebuild.texi                      |   4 +
 gcc/emit-rtl.c                                |  20 +-
 gcc/genmodes.c                                |  81 +-
 gcc/machmode.def                              |   2 +-
 gcc/rtx-vector-builder.c                      |   4 +-
 gcc/simplify-rtx.c                            |  34 +-
 gcc/testsuite/gcc.dg/signbit-2.c              |   1 +
 .../gcc.target/arm/simd/mve-vcmp-f32-2.c      |  32 +
 .../gcc.target/arm/simd/neon-compare-1.c      |  78 ++
 .../gcc.target/arm/simd/neon-compare-2.c      |  13 +
 .../gcc.target/arm/simd/neon-compare-3.c      |  14 +
 .../arm/simd/neon-compare-scalar-1.c          |  57 ++
 .../gcc.target/arm/simd/neon-vcmp-f16.c       |  12 +
 .../gcc.target/arm/simd/neon-vcmp-f32-2.c     |  15 +
 .../gcc.target/arm/simd/neon-vcmp-f32-3.c     |  12 +
 .../gcc.target/arm/simd/neon-vcmp-f32.c       |  12 +
 gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c |  22 +
 .../gcc.target/arm/simd/pr100757-2.c          |  20 +
 .../gcc.target/arm/simd/pr100757-3.c          |  20 +
 .../gcc.target/arm/simd/pr100757-4.c          |  19 +
 gcc/testsuite/gcc.target/arm/simd/pr100757.c  |  19 +
 .../gcc.target/arm/simd/pr101325-2.c          |  19 +
 gcc/testsuite/gcc.target/arm/simd/pr101325.c  |  14 +
 gcc/testsuite/lib/target-supports.exp         |  15 +-
 gcc/varasm.c                                  |   7 +-
 40 files changed, 1635 insertions(+), 1019 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-scalar-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f16.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr101325.c

-- 
2.25.1



* [PATCH v3 01/15] arm: Add new tests for comparison vectorization with Neon and MVE
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-13 14:56 ` [PATCH v3 02/15] arm: Add tests for PR target/100757 Christophe Lyon
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC
  To: gcc-patches

This patch mainly adds Neon tests similar to existing MVE ones,
to make sure we do not break Neon when fixing MVE.

mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional
with 2.0f and 3.0f constants to help scan-assembler-times.
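
The decimal values that appear after .word in the scan patterns are the
IEEE-754 binary32 encodings of these constants.  A minimal sketch of how
to derive them, assuming float is a 32-bit IEEE-754 type (float_word is
a name made up for this example):

```c
#include <stdint.h>
#include <string.h>

/* Return the IEEE-754 binary32 bit pattern of f as an unsigned integer,
   i.e. the value that would appear after a .word directive.  Assumes
   float is a 32-bit IEEE-754 type, as on Arm targets.  */
uint32_t
float_word (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

float_word (2.0f) is 1073741824 (0x40000000) and float_word (3.0f) is
1077936128 (0x40400000).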

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/testsuite/
	* gcc.target/arm/simd/mve-vcmp-f32-2.c: New.
	* gcc.target/arm/simd/neon-compare-1.c: New.
	* gcc.target/arm/simd/neon-compare-2.c: New.
	* gcc.target/arm/simd/neon-compare-3.c: New.
	* gcc.target/arm/simd/neon-compare-scalar-1.c: New.
	* gcc.target/arm/simd/neon-vcmp-f16.c: New.
	* gcc.target/arm/simd/neon-vcmp-f32-2.c: New.
	* gcc.target/arm/simd/neon-vcmp-f32-3.c: New.
	* gcc.target/arm/simd/neon-vcmp-f32.c: New.
	* gcc.target/arm/simd/neon-vcmp.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
new file mode 100644
index 00000000000..917a95bf141
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include <stdint.h>
+
+#define NB 4
+
+#define FUNC(OP, NAME)							\
+  void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \
+    int i;								\
+    for (i=0; i<NB; i++) {						\
+      dest[i] = (a[i] OP b[i]) ? 2.0f : 3.0f;				\
+    }									\
+  }
+
+FUNC(==, vcmpeq)
+FUNC(!=, vcmpne)
+FUNC(<, vcmplt)
+FUNC(<=, vcmple)
+FUNC(>, vcmpgt)
+FUNC(>=, vcmpge)
+
+/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 24 } } */ /* Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1077936128\n} 24 } } */ /* Constant 3.0f.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
new file mode 100644
index 00000000000..2e0222a71f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3" } */
+
+#include "mve-compare-1.c"
+
+/* 64-bit vectors.  */
+/* vmvn is used by 'ne' comparisons: 3 sizes * 2 (signed/unsigned) * 2
+   (register/zero) = 12.  */
+/* { dg-final { scan-assembler-times {\tvmvn\td[0-9]+, d[0-9]+\n} 12 } } */
+
+/* { 8 bits } x { eq, ne, lt, le, gt, ge }. */
+/* ne uses eq, lt/le only apply to comparison with zero, they use gt/ge
+   otherwise.  */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcle.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+
+/* { 16 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcle.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+
+/* { 32 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s32\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcle.s32\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s32\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s32\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s32\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s32\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+
+/* 128-bit vectors.  */
+
+/* vmvn is used by 'ne' comparisons.  */
+/* { dg-final { scan-assembler-times {\tvmvn\tq[0-9]+, q[0-9]+\n} 12 } } */
+
+/* { 8 bits } x { eq, ne, lt, le, gt, ge }.  */
+/* { dg-final { scan-assembler-times {\tvceq.i8\tq[0-9]+, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i8\tq[0-9]+, q[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s8\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcle.s8\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
+
+/* { 16 bits } x { eq, ne, lt, le, gt, ge }.  */
+/* { dg-final { scan-assembler-times {\tvceq.i16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i16\tq[0-9]+, q[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s16\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcle.s16\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
+
+/* { 32 bits } x { eq, ne, lt, le, gt, ge }.  */
+/* { dg-final { scan-assembler-times {\tvceq.i32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i32\tq[0-9]+, q[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s32\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcle.s32\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s32\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s32\tq[0-9]+, q[0-9]+, #0\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-compare-2.c b/gcc/testsuite/gcc.target/arm/simd/neon-compare-2.c
new file mode 100644
index 00000000000..06f3c14c91e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-compare-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include "mve-compare-2.c"
+
+/* eq, ne, lt, le, gt, ge.  */
+/* ne uses eq+vmvn, lt/le use gt/ge with swapped operands.  */
+/* { dg-final { scan-assembler-times {\tvceq.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvmvn\tq[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-compare-3.c b/gcc/testsuite/gcc.target/arm/simd/neon-compare-3.c
new file mode 100644
index 00000000000..9c9f108843b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-compare-3.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include "mve-compare-3.c"
+
+
+/* eq, ne, lt, le, gt, ge.  */
+/* ne uses eq+vmvn, lt/le use gt/ge with swapped operands.  */
+/* { dg-final { scan-assembler-times {\tvceq.f16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvmvn\tq[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.f16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.f16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-compare-scalar-1.c b/gcc/testsuite/gcc.target/arm/simd/neon-compare-scalar-1.c
new file mode 100644
index 00000000000..0783624a3f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-compare-scalar-1.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3" } */
+
+#include "mve-compare-scalar-1.c"
+
+/* 64-bit vectors.  */
+/* vmvn is used by 'ne' comparisons.  */
+/* { dg-final { scan-assembler-times {\tvmvn\td[0-9]+, d[0-9]+\n} 6 } } */
+
+/* { 8 bits } x { eq, ne, lt, le, gt, ge }.  */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.u8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.u8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+
+/* { 16 bits } x { eq, ne, lt, le, gt, ge }.  */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.u16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.u16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+
+/* { 32 bits } x { eq, ne, lt, le, gt, ge }.  */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s32\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.u32\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s32\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.u32\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+
+/* 128-bit vectors.  */
+
+/* vmvn is used by 'ne' comparisons.  */
+/* { dg-final { scan-assembler-times {\tvmvn\tq[0-9]+, q[0-9]+\n} 6 } } */
+
+/* { 8 bits } x { eq, ne, lt, le, gt, ge }.  */
+/* { dg-final { scan-assembler-times {\tvceq.i8\tq[0-9]+, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.u8\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.u8\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+
+/* { 16 bits } x { eq, ne, lt, le, gt, ge }.  */
+/* { dg-final { scan-assembler-times {\tvceq.i16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.u16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.u16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+
+/* { 32 bits } x { eq, ne, lt, le, gt, ge }.  */
+/* { dg-final { scan-assembler-times {\tvceq.i32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.u32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.u32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f16.c b/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f16.c
new file mode 100644
index 00000000000..688fd9a235f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f16.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include "mve-vcmp-f16.c"
+
+/* 'ne' uses vceq.  */
+/* le and lt use ge and gt with inverted operands.  */
+/* { dg-final { scan-assembler-times {\tvceq.f16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.f16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.f16\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-2.c b/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-2.c
new file mode 100644
index 00000000000..a22923eb242
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include "mve-vcmp-f32-2.c"
+
+/* 'ne' uses vceq.  */
+/* le and lt use ge and gt with inverted operands.  */
+/* { dg-final { scan-assembler-times {\tvceq.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tvmov.f32\tq[0-9]+, #2.0e\+0} 6 } } */
+/* { dg-final { scan-assembler-times {\tvmov.f32\tq[0-9]+, #3.0e\+0} 6 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-3.c b/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-3.c
new file mode 100644
index 00000000000..4f12f043d3a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3" } */
+
+#include "mve-vcmp-f32.c"
+
+/* Should not be vectorized, since we do not use -funsafe-math-optimizations.  */
+
+/* { dg-final { scan-assembler-not {\tvceq.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} } } */
+/* { dg-final { scan-assembler-not {\tvcge.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} } } */
+/* { dg-final { scan-assembler-not {\tvcgt.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32.c b/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32.c
new file mode 100644
index 00000000000..06e5c4fd1d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include "mve-vcmp-f32.c"
+
+/* 'ne' uses vceq.  */
+/* le and lt use ge and gt with inverted operands.  */
+/* { dg-final { scan-assembler-times {\tvceq.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.f32\tq[0-9]+, q[0-9]+, q[0-9]+\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c b/gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c
new file mode 100644
index 00000000000..f2b92b1be7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3" } */
+
+#include "mve-vcmp.c"
+
+/* vceq is also used for 'ne' comparisons.  */
+/* { dg-final { scan-assembler-times {\tvceq.i[0-9]+\td[0-9]+, d[0-9]+, d[0-9]+\n} 12 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i[0-9]+\tq[0-9]+, q[0-9]+, q[0-9]+\n} 12 } } */
+
+/* lt and le are replaced with the opposite condition, hence the double number
+   of matches for gt and ge.  */
+/* { dg-final { scan-assembler-times {\tvcge.s[0-9]+\td[0-9]+, d[0-9]+, d[0-9]+\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s[0-9]+\tq[0-9]+, q[0-9]+, q[0-9]+\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tvcge.u[0-9]+\td[0-9]+, d[0-9]+, d[0-9]+\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tvcge.u[0-9]+\tq[0-9]+, q[0-9]+, q[0-9]+\n} 6 } } */
+
+/* { dg-final { scan-assembler-times {\tvcgt.s[0-9]+\td[0-9]+, d[0-9]+, d[0-9]+\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s[0-9]+\tq[0-9]+, q[0-9]+, q[0-9]+\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.u[0-9]+\td[0-9]+, d[0-9]+, d[0-9]+\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.u[0-9]+\tq[0-9]+, q[0-9]+, q[0-9]+\n} 6 } } */
-- 
2.25.1



* [PATCH v3 02/15] arm: Add tests for PR target/100757
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
  2022-01-13 14:56 ` [PATCH v3 01/15] arm: Add new tests for comparison vectorization with Neon and MVE Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-13 14:56 ` [PATCH v3 03/15] arm: Add tests for PR target/101325 Christophe Lyon
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC
  To: gcc-patches

These tests currently trigger an ICE which is fixed later in the patch
series.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c.  They force the use of MVE and use
various types, with return values other than 0 and 1 so that these
constants cannot be shared with boolean masks.  In addition, since we
should not need these masks, the tests check that they are not present.
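
To picture why no 0/1 mask vector should be needed, here is a scalar
model of the integer variant of the loop.  It is only an illustration
under my own assumptions, not GCC's generated code: the comparison
produces a 16-bit predicate with 4 bits per 32-bit lane, and the select
consumes that predicate directly.  cond_reduce_model and LANES are names
invented for this sketch.

```c
#include <stdint.h>

#define LANES 4  /* four 32-bit lanes in a 128-bit vector */

/* Illustrative model only, not GCC's generated code: compute
   "c = init; for each i, if (a[i] != 0) c = alt;" four lanes at a time.
   The comparison yields a 16-bit predicate with 4 bits per 32-bit lane,
   and the select consumes that predicate directly, so no vector of
   0/1 integers is ever built.  n is assumed to be a multiple of LANES.  */
int
cond_reduce_model (const int *a, unsigned n, int init, int alt)
{
  int acc[LANES] = { init, init, init, init };

  for (unsigned i = 0; i < n; i += LANES)
    {
      uint16_t pred = 0;

      /* Lane-wise compare against zero, like a predicate-setting vcmp.  */
      for (unsigned l = 0; l < LANES; l++)
        if (a[i + l] != 0)
          pred |= (uint16_t) (0xfu << (4 * l));

      /* Lane-wise select under the predicate, like vpsel.  */
      for (unsigned l = 0; l < LANES; l++)
        if ((pred >> (4 * l)) & 0xfu)
          acc[l] = alt;
    }

  /* Final reduction: any lane that was switched wins.  */
  for (unsigned l = 0; l < LANES; l++)
    if (acc[l] != init)
      return acc[l];
  return init;
}
```

With init = 2 and alt = 3, the only vector constants this model needs
are {2,2,2,2}, {3,3,3,3} and the zero vector used by the comparison.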

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/testsuite/
	PR target/100757
	* gcc.target/arm/simd/pr100757-2.c: New.
	* gcc.target/arm/simd/pr100757-3.c: New.
	* gcc.target/arm/simd/pr100757-4.c: New.
	* gcc.target/arm/simd/pr100757.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
new file mode 100644
index 00000000000..c2262b4d81e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+int fn1(int d) {
+  int c = 4;
+  for (int b = 0; b < 32; b++)
+    if (a[b] != 2.0f)
+      c = 5;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t4\n} 4 } } */ /* Initial value for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t5\n} 4 } } */ /* Possible value for c.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
new file mode 100644
index 00000000000..e604555c04c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Copied from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+float fn1(int d) {
+  float c = 4.0f;
+  for (int b = 0; b < 32; b++)
+    if (a[b] != 2.0f)
+      c = 5.0f;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1082130432\n} 4 } } */ /* Initial value for c (4.0).  */
+/* { dg-final { scan-assembler-times {\t.word\t1084227584\n} 4 } } */ /* Possible value for c (5.0).  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
new file mode 100644
index 00000000000..c12040c517f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+unsigned int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+    if (a[b])
+      c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value for c.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757.c b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
new file mode 100644
index 00000000000..41d6e4e2d7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+    if (a[b])
+      c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value for c.  */
-- 
2.25.1



* [PATCH v3 03/15] arm: Add tests for PR target/101325
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
  2022-01-13 14:56 ` [PATCH v3 01/15] arm: Add new tests for comparison vectorization with Neon and MVE Christophe Lyon
  2022-01-13 14:56 ` [PATCH v3 02/15] arm: Add tests for PR target/100757 Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-13 14:56 ` [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass Christophe Lyon
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

These tests are derived from the one provided in the PR.  There is a
compile-only test because, until recently, I did not have access to
anything that could execute MVE code.  Now that QEMU supports MVE, I
have also been able to add an executable test.

Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it
uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does.  This
ensures arm_mve_hw passes even if the toolchain does not generate MVE
code by default.

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/testsuite/
	PR target/101325
	* gcc.target/arm/simd/pr101325.c: New.
	* gcc.target/arm/simd/pr101325-2.c: New.
	* lib/target-supports.exp (check_effective_target_arm_mve_hw): Use
	add_options_for_arm_v8_1m_mve_fp.

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c b/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
new file mode 100644
index 00000000000..355f6473a00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_mve_hw } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_v8_1m_mve } */
+
+#include <arm_mve.h>
+
+
+__attribute((noipa))
+unsigned foo(int8x16_t v, int8x16_t w)
+{
+  return vcmpeqq (v, w);
+}
+
+int main(void)
+{
+  if (foo (vdupq_n_s8(0), vdupq_n_s8(0)) != 0xffffU)
+    __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325.c b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
new file mode 100644
index 00000000000..4cb2513da87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <arm_mve.h>
+
+unsigned foo(int8x16_t v, int8x16_t w)
+{
+  return vcmpeqq (v, w);
+}
+/* { dg-final { scan-assembler {\tvcmp.i8  eq} } } */
+/* { dg-final { scan-assembler {\tvmrs\tr[0-9]+, P0} } } */
+/* { dg-final { scan-assembler {\tuxth} } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b4bf2e6b495..0fe1e1e077a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5037,6 +5037,7 @@ proc check_effective_target_arm_cmse_hw { } {
 	}
     } "-mcmse"]
 }
+
 # Return 1 if the target supports executing MVE instructions, 0
 # otherwise.
 
@@ -5052,7 +5053,7 @@ proc check_effective_target_arm_mve_hw {} {
 	       : "0" (a), "r" (b));
 	  return (a != 2);
 	}
-    } ""]
+    } [add_options_for_arm_v8_1m_mve_fp ""]]
 }
 
 # Return 1 if this is an ARM target where ARMv8-M Security Extensions with
-- 
2.25.1



* [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (2 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 03/15] arm: Add tests for PR target/101325 Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-19 18:17   ` Andre Vieira (lists)
  2022-01-27 16:21   ` Kyrylo Tkachov
  2022-01-13 14:56 ` [PATCH v3 05/15] arm: Add support for VPR_REG in arm_class_likely_spilled_p Christophe Lyon
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS).  The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.

CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
-mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
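
As a rough illustration of the VPR case, the standalone sketch below
models the CEIL (GET_MODE_SIZE (mode), 2) computation used in the
patch.  The names CEIL_DIV and vpr_nregs are hypothetical and exist
only for this example; the only thing taken from the patch is the
formula, which counts 2 bytes per VPR register.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's CEIL macro: round-up integer
   division.  */
#define CEIL_DIV(x, y) (((x) + (y) - 1) / (y))

/* Model of the VPR case added to arm_hard_regno_nregs: the number of
   registers needed for a value of MODE_SIZE bytes, counting 2 bytes
   per register as the patch does.  */
unsigned int
vpr_nregs (unsigned int mode_size)
{
  assert (mode_size > 0);
  return CEIL_DIV (mode_size, 2);
}
```

Under this model a 2-byte (HImode-sized) value needs one register and
a 4-byte value needs two.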

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
	(REG_CLASS_NAMES): Likewise.
	(REG_CLASS_CONTENTS): Likewise.
	(CLASS_MAX_NREGS): Handle VPR.
	* config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bb75921f32d..c3559ca8703 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
 static unsigned int
 arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
 {
+  if (IS_VPR_REGNUM (regno))
+    return CEIL (GET_MODE_SIZE (mode), 2);
+
   if (TARGET_32BIT
       && regno > PC_REGNUM
       && regno != FRAME_POINTER_REGNUM
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index dacce2b7f08..2416fb5ef64 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1287,6 +1287,7 @@ enum reg_class
   SFP_REG,
   AFP_REG,
   VPR_REG,
+  GENERAL_AND_VPR_REGS,
   ALL_REGS,
   LIM_REG_CLASSES
 };
@@ -1316,6 +1317,7 @@ enum reg_class
   "SFP_REG",		\
   "AFP_REG",		\
   "VPR_REG",		\
+  "GENERAL_AND_VPR_REGS", \
   "ALL_REGS"		\
 }
 
@@ -1344,6 +1346,7 @@ enum reg_class
   { 0x00000000, 0x00000000, 0x00000000, 0x00000040 }, /* SFP_REG */	\
   { 0x00000000, 0x00000000, 0x00000000, 0x00000080 }, /* AFP_REG */	\
   { 0x00000000, 0x00000000, 0x00000000, 0x00000400 }, /* VPR_REG.  */	\
+  { 0x00005FFF, 0x00000000, 0x00000000, 0x00000400 }, /* GENERAL_AND_VPR_REGS.  */ \
   { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F }  /* ALL_REGS.  */	\
 }
 
@@ -1453,7 +1456,9 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS];
    ARM regs are UNITS_PER_WORD bits.  
    FIXME: Is this true for iWMMX?  */
 #define CLASS_MAX_NREGS(CLASS, MODE)  \
-  (ARM_NUM_REGS (MODE))
+  ((CLASS) == VPR_REG		      \
+   ? CEIL (GET_MODE_SIZE (MODE), 2)   \
+   : ARM_NUM_REGS (MODE))
 
 /* If defined, gives a class of registers that cannot be used as the
    operand of a SUBREG that changes the mode of the object illegally.  */
-- 
2.25.1



* [PATCH v3 05/15] arm: Add support for VPR_REG in arm_class_likely_spilled_p
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (3 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-19 18:25   ` Andre Vieira (lists)
  2022-01-13 14:56 ` [PATCH v3 06/15] arm: Fix mve_vmvnq_n_<supf><mode> argument mode Christophe Lyon
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

VPR_REG is the only register in its class, so it should be handled by
TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
default_class_likely_spilled_p.  No test fails without this patch, but
it seems it should be implemented.
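
A minimal sketch of the reasoning, assuming that
default_class_likely_spilled_p considers a class likely to be spilled
when it contains exactly one register.  The toy_* names and the
register counts are made up for this example and are not GCC's real
tables.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy register classes; the sizes below are illustrative only.  */
enum toy_reg_class { TOY_VPR_REG, TOY_GENERAL_REGS, TOY_NUM_CLASSES };

static const unsigned int toy_reg_class_size[TOY_NUM_CLASSES] = {
  1,   /* TOY_VPR_REG: a single register.  */
  14   /* TOY_GENERAL_REGS: several registers.  */
};

/* Assumed behaviour of the default hook: a class is likely to be
   spilled when it holds exactly one register.  */
bool
toy_class_likely_spilled_p (enum toy_reg_class rclass)
{
  assert (rclass < TOY_NUM_CLASSES);
  return toy_reg_class_size[rclass] == 1;
}
```

With such a rule, a single-register class like VPR_REG is reported as
likely spilled without being listed explicitly in
arm_class_likely_spilled_p.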

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	* config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c3559ca8703..64a8f2dc7de 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29317,7 +29317,7 @@ arm_class_likely_spilled_p (reg_class_t rclass)
       || rclass  == CC_REG)
     return true;
 
-  return false;
+  return default_class_likely_spilled_p (rclass);
 }
 
 /* Implements target hook small_register_classes_for_mode_p.  */
-- 
2.25.1



* [PATCH v3 06/15] arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (4 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 05/15] arm: Add support for VPR_REG in arm_class_likely_spilled_p Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-19 19:03   ` Andre Vieira (lists)
  2022-01-13 14:56 ` [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans Christophe Lyon
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use the
<V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>.

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	* config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
	for operand 1.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 171dd384133..5c3b34dce3a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_<supf><mode>"
 (define_insn "mve_vmvnq_n_<supf><mode>"
   [
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
-	(unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
+	(unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")]
 	 VMVNQ_N))
   ]
   "TARGET_HAVE_MVE"
-- 
2.25.1



* [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (5 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 06/15] arm: Fix mve_vmvnq_n_<supf><mode> argument mode Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-21 11:20   ` Andre Vieira (lists)
                     ` (2 more replies)
  2022-01-13 14:56 ` [PATCH v3 08/15] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates Christophe Lyon
                   ` (8 subsequent siblings)
  15 siblings, 3 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

This patch represents MVE predicates as vectors of booleans instead
of HImode.  Since the ABI mandates pred16_t (aka uint16_t) to
represent predicates in intrinsics prototypes, we introduce a new
"predicate" type qualifier so that we can map the HImode arguments
and return values of the relevant builtins to the appropriate vector
of booleans (VxBI).

We have to update test_vector_ops_duplicate, because it iterates using
an offset in bytes, where we would need to iterate in bits: we skip
the vec_merge part of the test for vectors of booleans.

In addition, we have to fix the underlying definition of vectors of
booleans because ARM/MVE needs a different representation than
AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
element size, so that a true element of V4BI is represented by
'0b1111'.  This patch updates the aarch64 definition of VNx*BI as
needed.
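
The MVE layout described above can be sketched as follows.  This is
only an illustration: pack_predicate is a hypothetical helper, not
part of the patch, and it assumes that lane 0 occupies the least
significant bits of the 16-bit predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a 16-bit MVE-style predicate from NLANES per-lane booleans.
   Each lane owns 16 / NLANES bits and a true lane sets all of them,
   e.g. 0b1111 for a 4-lane (V4BI-like) predicate.  */
uint16_t
pack_predicate (const bool *lanes, unsigned int nlanes)
{
  assert (nlanes == 4 || nlanes == 8 || nlanes == 16);
  unsigned int bits_per_lane = 16 / nlanes;
  uint16_t lane_mask = (uint16_t) ((1u << bits_per_lane) - 1);
  uint16_t pred = 0;

  for (unsigned int i = 0; i < nlanes; i++)
    if (lanes[i])
      pred |= (uint16_t) (lane_mask << (i * bits_per_lane));
  return pred;
}
```

For four lanes set to { true, false, true, false } this yields 0x0f0f.
Sixteen true lanes yield 0xffff, which is consistent with the
pr101325-2.c test earlier in the series.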

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
	Richard Sandiford  <richard.sandiford@arm.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
	VNx2BI): Update definition.
	* config/arm/arm-builtins.c (arm_init_simd_builtin_types): Add new
	simd types.
	(arm_init_builtin): Map predicate vectors arguments to HImode.
	(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
	rtx. Move return value to HImode rtx.
	* config/arm/arm-builtins.h (arm_type_qualifiers): Add qualifier_predicate.
	* config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes.
	* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
	Pred2x8_t, Pred4x4_t): New.
	* emit-rtl.c (init_emit_once): Handle all boolean modes.
	* genmodes.c (mode_data): Add boolean field.
	(blank_mode): Initialize it.
	(make_complex_modes): Fix handling of boolean modes.
	(make_vector_modes): Likewise.
	(VECTOR_BOOL_MODE): Use new COMPONENT parameter.
	(make_vector_bool_mode): Likewise.
	(BOOL_MODE): New.
	(make_bool_mode): New.
	(emit_insn_modes_h): Fix generation of boolean modes.
	(emit_class_narrowest_mode): Likewise.
	* machmode.def: Use new BOOL_MODE instead of FRACTIONAL_INT_MODE
	to define BImode.
	* rtx-vector-builder.c (rtx_vector_builder::find_cached_value):
	Fix handling of constm1_rtx for VECTOR_BOOL.
	* simplify-rtx.c (native_encode_rtx): Fix support for VECTOR_BOOL.
	(native_decode_vector_rtx): Likewise.
	(test_vector_ops_duplicate): Skip vec_merge test
	with vectors of booleans.
	* varasm.c (output_constant_pool_2): Fix support for VECTOR_BOOL.

diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index 976bf9b42be..8f399225a80 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -47,10 +47,10 @@ ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
 
 /* Vector modes.  */
 
-VECTOR_BOOL_MODE (VNx16BI, 16, 2);
-VECTOR_BOOL_MODE (VNx8BI, 8, 2);
-VECTOR_BOOL_MODE (VNx4BI, 4, 2);
-VECTOR_BOOL_MODE (VNx2BI, 2, 2);
+VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
+VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
+VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
+VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
 
 ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
 ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 9c645722230..2ccfa37c302 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -1548,6 +1548,13 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
   arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
 
+  if (TARGET_HAVE_MVE)
+    {
+      arm_simd_types[Pred1x16_t].eltype = unsigned_intHI_type_node;
+      arm_simd_types[Pred2x8_t].eltype = unsigned_intHI_type_node;
+      arm_simd_types[Pred4x4_t].eltype = unsigned_intHI_type_node;
+    }
+
   for (i = 0; i < nelts; i++)
     {
       tree eltype = arm_simd_types[i].eltype;
@@ -1695,6 +1702,11 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
       if (qualifiers & qualifier_map_mode)
 	op_mode = d->mode;
 
+      /* MVE Predicates use HImode as mandated by the ABI: pred16_t is unsigned
+	 short.  */
+      if (qualifiers & qualifier_predicate)
+	op_mode = HImode;
+
       /* For pointers, we want a pointer to the basic type
 	 of the vector.  */
       if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
@@ -2939,6 +2951,11 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
 	    case ARG_BUILTIN_COPY_TO_REG:
 	      if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
 		op[argc] = convert_memory_address (Pmode, op[argc]);
+
+	      /* MVE uses mve_pred16_t (aka HImode) for vectors of predicates.  */
+	      if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
+		op[argc] = gen_lowpart (mode[argc], op[argc]);
+
 	      /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
 	      if (!(*insn_data[icode].operand[opno].predicate)
 		  (op[argc], mode[argc]))
@@ -3144,6 +3161,13 @@ constant_arg:
   else
     emit_insn (insn);
 
+  if (GET_MODE_CLASS (tmode) == MODE_VECTOR_BOOL)
+    {
+      rtx HItarget = gen_reg_rtx (HImode);
+      emit_move_insn (HItarget, gen_lowpart (HImode, target));
+      return HItarget;
+    }
+
   return target;
 }
 
diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h
index e5130d6d286..a8ef8aef82d 100644
--- a/gcc/config/arm/arm-builtins.h
+++ b/gcc/config/arm/arm-builtins.h
@@ -84,7 +84,9 @@ enum arm_type_qualifiers
   qualifier_lane_pair_index = 0x1000,
   /* Lane indices selected in quadtuplets - must be within range of previous
      argument = a vector.  */
-  qualifier_lane_quadtup_index = 0x2000
+  qualifier_lane_quadtup_index = 0x2000,
+  /* MVE vector predicates.  */
+  qualifier_predicate = 0x4000
 };
 
 struct arm_simd_type_info
diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index de689c8b45e..9ed0cd042c5 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -84,6 +84,14 @@ VECTOR_MODE (FLOAT, BF, 2);   /*                 V2BF.  */
 VECTOR_MODE (FLOAT, BF, 4);   /*		 V4BF.  */
 VECTOR_MODE (FLOAT, BF, 8);   /*		 V8BF.  */
 
+/* Predicates for MVE.  */
+BOOL_MODE (B2I, 2, 1);
+BOOL_MODE (B4I, 4, 1);
+
+VECTOR_BOOL_MODE (V16BI, 16, BI, 2);
+VECTOR_BOOL_MODE (V8BI, 8, B2I, 2);
+VECTOR_BOOL_MODE (V4BI, 4, B4I, 2);
+
 /* Fraction and accumulator vector modes.  */
 VECTOR_MODES (FRACT, 4);      /* V4QQ  V2HQ */
 VECTOR_MODES (UFRACT, 4);     /* V4UQQ V2UHQ */
diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
index 6ba6f211531..920c2a68e4c 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -51,3 +51,7 @@
   ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
   ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
   ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
+
+  ENTRY (Pred1x16_t, V16BI, unsigned, 16, uint16, 21)
+  ENTRY (Pred2x8_t, V8BI, unsigned, 8, uint16, 21)
+  ENTRY (Pred4x4_t, V4BI, unsigned, 4, uint16, 21)
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index feeee16d320..5f559f8fd93 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -6239,9 +6239,14 @@ init_emit_once (void)
 
   /* For BImode, 1 and -1 are unsigned and signed interpretations
      of the same value.  */
-  const_tiny_rtx[0][(int) BImode] = const0_rtx;
-  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
-  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
+  for (mode = MIN_MODE_BOOL;
+       mode <= MAX_MODE_BOOL;
+       mode = (machine_mode)((int)(mode) + 1))
+    {
+      const_tiny_rtx[0][(int) mode] = const0_rtx;
+      const_tiny_rtx[1][(int) mode] = const_true_rtx;
+      const_tiny_rtx[3][(int) mode] = const_true_rtx;
+    }
 
   for (mode = MIN_MODE_PARTIAL_INT;
        mode <= MAX_MODE_PARTIAL_INT;
@@ -6260,13 +6265,16 @@ init_emit_once (void)
       const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
     }
 
-  /* As for BImode, "all 1" and "all -1" are unsigned and signed
-     interpretations of the same value.  */
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
     {
       const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
       const_tiny_rtx[3][(int) mode] = gen_const_vector (mode, 3);
-      const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
+      if (GET_MODE_INNER (mode) == BImode)
+	/* As for BImode, "all 1" and "all -1" are unsigned and signed
+	   interpretations of the same value.  */
+	const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
+      else
+	const_tiny_rtx[1][(int) mode] = gen_const_vector (mode, 1);
     }
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index 6001b854547..0bb1a7c0b48 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -78,6 +78,7 @@ struct mode_data
   bool need_bytesize_adj;	/* true if this mode needs dynamic size
 				   adjustment */
   unsigned int int_n;		/* If nonzero, then __int<INT_N> will be defined */
+  bool boolean;
 };
 
 static struct mode_data *modes[MAX_MODE_CLASS];
@@ -88,7 +89,8 @@ static const struct mode_data blank_mode = {
   0, "<unknown>", MAX_MODE_CLASS,
   0, -1U, -1U, -1U, -1U,
   0, 0, 0, 0, 0, 0,
-  "<unknown>", 0, 0, 0, 0, false, false, 0
+  "<unknown>", 0, 0, 0, 0, false, false, 0,
+  false
 };
 
 static htab_t modes_by_name;
@@ -456,7 +458,7 @@ make_complex_modes (enum mode_class cl,
       size_t m_len;
 
       /* Skip BImode.  FIXME: BImode probably shouldn't be MODE_INT.  */
-      if (m->precision == 1)
+      if (m->boolean)
 	continue;
 
       m_len = strlen (m->name);
@@ -528,7 +530,7 @@ make_vector_modes (enum mode_class cl, const char *prefix, unsigned int width,
 	 not be necessary.  */
       if (cl == MODE_FLOAT && m->bytesize == 1)
 	continue;
-      if (cl == MODE_INT && m->precision == 1)
+      if (m->boolean)
 	continue;
 
       if ((size_t) snprintf (buf, sizeof buf, "%s%u%s", prefix,
@@ -548,17 +550,18 @@ make_vector_modes (enum mode_class cl, const char *prefix, unsigned int width,
 
 /* Create a vector of booleans called NAME with COUNT elements and
    BYTESIZE bytes in total.  */
-#define VECTOR_BOOL_MODE(NAME, COUNT, BYTESIZE) \
-  make_vector_bool_mode (#NAME, COUNT, BYTESIZE, __FILE__, __LINE__)
+#define VECTOR_BOOL_MODE(NAME, COUNT, COMPONENT, BYTESIZE)		\
+  make_vector_bool_mode (#NAME, COUNT, #COMPONENT, BYTESIZE,		\
+			 __FILE__, __LINE__)
 static void ATTRIBUTE_UNUSED
 make_vector_bool_mode (const char *name, unsigned int count,
-		       unsigned int bytesize, const char *file,
-		       unsigned int line)
+		       const char *component, unsigned int bytesize,
+		       const char *file, unsigned int line)
 {
-  struct mode_data *m = find_mode ("BI");
+  struct mode_data *m = find_mode (component);
   if (!m)
     {
-      error ("%s:%d: no mode \"BI\"", file, line);
+      error ("%s:%d: no mode \"%s\"", file, line, component);
       return;
     }
 
@@ -596,6 +599,20 @@ make_int_mode (const char *name,
   m->precision = precision;
 }
 
+#define BOOL_MODE(N, B, Y) \
+  make_bool_mode (#N, B, Y, __FILE__, __LINE__)
+
+static void
+make_bool_mode (const char *name,
+		unsigned int precision, unsigned int bytesize,
+		const char *file, unsigned int line)
+{
+  struct mode_data *m = new_mode (MODE_INT, name, file, line);
+  m->bytesize = bytesize;
+  m->precision = precision;
+  m->boolean = true;
+}
+
 #define OPAQUE_MODE(N, B)			\
   make_opaque_mode (#N, -1U, B, __FILE__, __LINE__)
 
@@ -1298,9 +1315,21 @@ enum machine_mode\n{");
       /* Don't use BImode for MIN_MODE_INT, since otherwise the middle
 	 end will try to use it for bitfields in structures and the
 	 like, which we do not want.  Only the target md file should
-	 generate BImode widgets.  */
-      if (first && first->precision == 1 && c == MODE_INT)
-	first = first->next;
+	 generate BImode widgets.  Since some targets such as ARM/MVE
+	 define boolean modes with multiple bits, handle those too.  */
+      if (first && first->boolean)
+	{
+	  struct mode_data *last_bool = first;
+	  printf ("  MIN_MODE_BOOL = E_%smode,\n", first->name);
+
+	  while (first && first->boolean)
+	    {
+	      last_bool = first;
+	      first = first->next;
+	    }
+
+	  printf ("  MAX_MODE_BOOL = E_%smode,\n\n", last_bool->name);
+	}
 
       if (first && last)
 	printf ("  MIN_%s = E_%smode,\n  MAX_%s = E_%smode,\n\n",
@@ -1679,15 +1708,25 @@ emit_class_narrowest_mode (void)
   print_decl ("unsigned char", "class_narrowest_mode", "MAX_MODE_CLASS");
 
   for (c = 0; c < MAX_MODE_CLASS; c++)
-    /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
-    tagged_printf ("MIN_%s", mode_class_names[c],
-		   modes[c]
-		   ? ((c != MODE_INT || modes[c]->precision != 1)
-		      ? modes[c]->name
-		      : (modes[c]->next
-			 ? modes[c]->next->name
-			 : void_mode->name))
-		   : void_mode->name);
+    {
+      /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
+      const char *comment_name = void_mode->name;
+
+      if (modes[c])
+	if (c != MODE_INT || !modes[c]->boolean)
+	  comment_name = modes[c]->name;
+	else
+	  {
+	    struct mode_data *m = modes[c];
+	    while (m && m->boolean)
+	      m = m->next;
+	    if (m)
+	      comment_name = m->name;
+	    else
+	      comment_name = void_mode->name;
+	  }
+      tagged_printf ("MIN_%s", mode_class_names[c], comment_name);
+    }
 
   print_closer ();
 }
diff --git a/gcc/machmode.def b/gcc/machmode.def
index 866a2082d01..eb7905ea23d 100644
--- a/gcc/machmode.def
+++ b/gcc/machmode.def
@@ -196,7 +196,7 @@ RANDOM_MODE (VOID);
 RANDOM_MODE (BLK);
 
 /* Single bit mode used for booleans.  */
-FRACTIONAL_INT_MODE (BI, 1, 1);
+BOOL_MODE (BI, 1, 1);
 
 /* Basic integer modes.  We go up to TI in generic code (128 bits).
    TImode is needed here because the some front ends now genericly
diff --git a/gcc/rtx-vector-builder.c b/gcc/rtx-vector-builder.c
index e36aba010a0..55ffe0d5a76 100644
--- a/gcc/rtx-vector-builder.c
+++ b/gcc/rtx-vector-builder.c
@@ -90,8 +90,10 @@ rtx_vector_builder::find_cached_value ()
 
   if (GET_MODE_CLASS (m_mode) == MODE_VECTOR_BOOL)
     {
-      if (elt == const1_rtx || elt == constm1_rtx)
+      if (elt == const1_rtx)
 	return CONST1_RTX (m_mode);
+      else if (elt == constm1_rtx)
+	return CONSTM1_RTX (m_mode);
       else if (elt == const0_rtx)
 	return CONST0_RTX (m_mode);
       else
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index c36c825f958..532537ea48d 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -6876,12 +6876,13 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes,
 	  /* This is the only case in which elements can be smaller than
 	     a byte.  */
 	  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+	  auto mask = GET_MODE_MASK (GET_MODE_INNER (mode));
 	  for (unsigned int i = 0; i < num_bytes; ++i)
 	    {
 	      target_unit value = 0;
 	      for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits)
 		{
-		  value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & 1) << j;
+		  value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j;
 		  elt += 1;
 		}
 	      bytes.quick_push (value);
@@ -7025,9 +7026,8 @@ native_decode_vector_rtx (machine_mode mode, const vec<target_unit> &bytes,
 	  unsigned int bit_index = first_byte * BITS_PER_UNIT + i * elt_bits;
 	  unsigned int byte_index = bit_index / BITS_PER_UNIT;
 	  unsigned int lsb = bit_index % BITS_PER_UNIT;
-	  builder.quick_push (bytes[byte_index] & (1 << lsb)
-			      ? CONST1_RTX (BImode)
-			      : CONST0_RTX (BImode));
+	  unsigned int value = bytes[byte_index] >> lsb;
+	  builder.quick_push (gen_int_mode (value, GET_MODE_INNER (mode)));
 	}
     }
   else
@@ -7994,17 +7994,23 @@ test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
 						    duplicate, last_par));
 
       /* Test a scalar subreg of a VEC_MERGE of a VEC_DUPLICATE.  */
-      rtx vector_reg = make_test_reg (mode);
-      for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
+      /* Skip this test for vectors of booleans, because offset is in bytes,
+	 while vec_merge indices are in elements (usually bits).  */
+      if (GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
 	{
-	  if (i >= HOST_BITS_PER_WIDE_INT)
-	    break;
-	  rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
-	  rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
-	  poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
-	  ASSERT_RTX_EQ (scalar_reg,
-			 simplify_gen_subreg (inner_mode, vm,
-					      mode, offset));
+	  rtx vector_reg = make_test_reg (mode);
+	  for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
+	    {
+	      if (i >= HOST_BITS_PER_WIDE_INT)
+		break;
+	      rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
+	      rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
+	      poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
+
+	      ASSERT_RTX_EQ (scalar_reg,
+			     simplify_gen_subreg (inner_mode, vm,
+						  mode, offset));
+	    }
 	}
     }
 
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 76574be191f..5f59b6ace15 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -4085,6 +4085,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
 	unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
 	unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
 	scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
+	unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));
 
 	/* Build the constant up one integer at a time.  */
 	unsigned int elts_per_int = int_bits / elt_bits;
@@ -4093,8 +4094,10 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
 	    unsigned HOST_WIDE_INT value = 0;
 	    unsigned int limit = MIN (nelts - i, elts_per_int);
 	    for (unsigned int j = 0; j < limit; ++j)
-	      if (INTVAL (CONST_VECTOR_ELT (x, i + j)) != 0)
-		value |= 1 << (j * elt_bits);
+	    {
+	      auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j));
+	      value |= (elt & mask) << (j * elt_bits);
+	    }
 	    output_constant_pool_2 (int_mode, gen_int_mode (value, int_mode),
 				    i != 0 ? MIN (align, int_bits) : align);
 	  }
-- 
2.25.1



* [PATCH v3 08/15] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (6 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-27 16:37   ` Kyrylo Tkachov
  2022-01-13 14:56 ` [PATCH v3 09/15] arm: Fix vcond_mask expander for MVE (PR target/100757) Christophe Lyon
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

We make use of qualifier_predicate to describe MVE builtins
prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins,
as they are exercised by the tests added earlier in the series.

Special handling is needed for mve_vpselq because it has a v2di
variant, which has no natural VPR.P0 representation: we keep HImode
for it.

The vector_compare expansion code is updated to use the right VxBI
mode instead of HI for the result.
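
To illustrate the choice of predicate mode, the sketch below maps the
number of lanes of a 128-bit vector to the number of VPR.P0 bits used
per lane, following the BI, B2I and B4I element modes defined in
patch 7.  pred_bits_per_lane is a hypothetical helper written for
this example; it is not the implementation of arm_mode_to_pred_mode.

```c
#include <assert.h>

/* Hypothetical helper: number of predicate bits per lane for a
   128-bit vector with NLANES lanes.  Returns 0 when the series
   defines no boolean element mode for that lane count.  */
unsigned int
pred_bits_per_lane (unsigned int nlanes)
{
  assert (nlanes > 0 && 16 % nlanes == 0);
  switch (nlanes)
    {
    case 16: return 1;  /* V16BI, element mode BI.  */
    case 8:  return 2;  /* V8BI, element mode B2I.  */
    case 4:  return 4;  /* V4BI, element mode B4I.  */
    default: return 0;  /* e.g. 2 lanes (v2di): stays in HImode.  */
    }
}
```

The 2-lane case returns 0 here to mirror the special handling of the
v2di variant of mve_vpselq, which keeps HImode.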

We extend the existing thumb2_movhi_vfp and thumb2_movhi_fp16 patterns
to use the new MVE_7_HI iterator which covers HI and the new VxBI
modes, in conjunction with the new DB constraint for a constant vector
of booleans.

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
	Richard Sandiford  <richard.sandiford@arm.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.c (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
	(BINOP_PRED_NONE_NONE_QUALIFIERS)
	(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
	(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
	* config/arm/arm-protos.h (mve_const_bool_vec_to_hi): New.
	* config/arm/arm.c (arm_hard_regno_mode_ok): Handle new VxBI
	modes.
	(arm_mode_to_pred_mode): New.
	(arm_expand_vector_compare): Use the right VxBI mode instead of
	HI.
	(arm_expand_vcond): Likewise.
	(simd_valid_immediate): Handle MODE_VECTOR_BOOL.
	(mve_const_bool_vec_to_hi): New.
	(neon_make_constant): Call mve_const_bool_vec_to_hi when needed.
	* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
	(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
	(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
	(vpselq_s, vpselq_f): Use new predicated qualifiers.
	* config/arm/constraints.md (DB): New.
	* config/arm/iterators.md (MVE_7, MVE_7_HI): New mode iterators.
	(MVE_VPRED, MVE_vpred): New attribute iterators.
	* config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>)
	(@mve_vcmp<mve_cmp_op>q_f<mode>, @mve_vpselq_<supf><mode>)
	(@mve_vpselq_f<mode>): Use MVE_VPRED instead of HI.
	(@mve_vpselq_<supf>v2di): Define separately.
	(mov<mode>): New expander for VxBI modes.
	* config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Use
	MVE_7_HI iterator and add support for DB constraint.

	gcc/testsuite/
	PR target/100757
	PR target/101325
	* gcc.dg/rtl/arm/mve-vxbi.c: New test.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 2ccfa37c302..36d71ab1a13 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -420,6 +420,12 @@ arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_binop_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_PRED_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_pred_unone_unone_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_none, qualifier_immediate };
@@ -438,6 +444,12 @@ arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_NONE_NONE_QUALIFIERS \
   (arm_binop_unone_none_none_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none };
+#define BINOP_PRED_NONE_NONE_QUALIFIERS \
+  (arm_binop_pred_none_none_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
@@ -509,6 +521,12 @@ arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
   (arm_ternop_none_none_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
+#define TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_none_none_none_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_immediate, qualifier_unsigned };
@@ -528,6 +546,13 @@ arm_ternop_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_ternop_unone_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+    qualifier_predicate };
+#define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_unone_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index fb365ac5268..b978adf2038 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -101,6 +101,7 @@ extern char *neon_output_shift_immediate (const char *, char, rtx *,
 					  machine_mode, int, bool);
 extern void neon_pairwise_reduce (rtx, rtx, machine_mode,
 				  rtx (*) (rtx, rtx, rtx));
+extern rtx mve_const_bool_vec_to_hi (rtx const_vec);
 extern rtx neon_make_constant (rtx, bool generate = true);
 extern tree arm_builtin_vectorized_function (unsigned int, tree, tree);
 extern void neon_expand_vector_init (rtx, rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 64a8f2dc7de..fa18c7bd3fe 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12750,7 +12750,10 @@ simd_valid_immediate (rtx op, machine_mode mode, int inverse,
   innersize = GET_MODE_UNIT_SIZE (mode);
 
   /* Only support 128-bit vectors for MVE.  */
-  if (TARGET_HAVE_MVE && (!vector || n_elts * innersize != 16))
+  if (TARGET_HAVE_MVE
+      && (!vector
+	  || (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+	  || n_elts * innersize != 16))
     return -1;
 
   /* Vectors of float constants.  */
@@ -13115,6 +13118,29 @@ neon_vdup_constant (rtx vals, bool generate)
   return gen_vec_duplicate (mode, x);
 }
 
+/* Return a HI representation of CONST_VEC suitable for MVE predicates.  */
+rtx
+mve_const_bool_vec_to_hi (rtx const_vec)
+{
+  int n_elts = GET_MODE_NUNITS ( GET_MODE (const_vec));
+  int repeat = 16 / n_elts;
+  int i;
+  int hi_val = 0;
+
+  for (i = 0; i < n_elts; i++)
+    {
+      rtx el = CONST_VECTOR_ELT (const_vec, i);
+      unsigned HOST_WIDE_INT elpart;
+
+      gcc_assert (CONST_INT_P (el));
+      elpart = INTVAL (el);
+
+      for (int j = 0; j < repeat; j++)
+	hi_val |= elpart << (i * repeat + j);
+    }
+  return GEN_INT (hi_val);
+}
+
 /* Return a non-NULL RTX iff VALS, which is a PARALLEL containing only
    constants (for vec_init) or CONST_VECTOR, can be effeciently loaded
    into a register.
@@ -13155,6 +13181,8 @@ neon_make_constant (rtx vals, bool generate)
       && simd_immediate_valid_for_move (const_vec, mode, NULL, NULL))
     /* Load using VMOV.  On Cortex-A8 this takes one cycle.  */
     return const_vec;
+  else if (TARGET_HAVE_MVE && (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL))
+    return mve_const_bool_vec_to_hi (const_vec);
   else if ((target = neon_vdup_constant (vals, generate)) != NULL_RTX)
     /* Loaded using VDUP.  On Cortex-A8 the VDUP takes one NEON
        pipeline cycle; creating the constant takes one or two ARM
@@ -25313,7 +25341,10 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
     return false;
 
   if (IS_VPR_REGNUM (regno))
-    return mode == HImode;
+    return mode == HImode
+      || mode == V16BImode
+      || mode == V8BImode
+      || mode == V4BImode;
 
   if (TARGET_THUMB1)
     /* For the Thumb we only allow values bigger than SImode in
@@ -31001,6 +31032,19 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
     arm_post_atomic_barrier (model);
 }
 \f
+/* Return the mode for the MVE vector of predicates corresponding to MODE.  */
+machine_mode
+arm_mode_to_pred_mode (machine_mode mode)
+{
+  switch (GET_MODE_NUNITS (mode))
+    {
+    case 16: return V16BImode;
+    case 8: return V8BImode;
+    case 4: return V4BImode;
+    }
+  gcc_unreachable ();
+}
+
 /* Expand code to compare vectors OP0 and OP1 using condition CODE.
    If CAN_INVERT, store either the result or its inverse in TARGET
    and return true if TARGET contains the inverse.  If !CAN_INVERT,
@@ -31084,7 +31128,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
 	  if (vcond_mve)
 	    vpr_p0 = target;
 	  else
-	    vpr_p0 = gen_reg_rtx (HImode);
+	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
 
 	  switch (GET_MODE_CLASS (cmp_mode))
 	    {
@@ -31126,7 +31170,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
 	  if (vcond_mve)
 	    vpr_p0 = target;
 	  else
-	    vpr_p0 = gen_reg_rtx (HImode);
+	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
 
 	  emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
 	  if (!vcond_mve)
@@ -31153,7 +31197,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
 	  if (vcond_mve)
 	    vpr_p0 = target;
 	  else
-	    vpr_p0 = gen_reg_rtx (HImode);
+	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
 
 	  emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0));
 	  if (!vcond_mve)
@@ -31206,7 +31250,7 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
   if (TARGET_HAVE_MVE)
     {
       vcond_mve=true;
-      mask = gen_reg_rtx (HImode);
+      mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode));
     }
   else
     mask = gen_reg_rtx (cmp_result_mode);
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index c3ae40765fe..44b41eab4c5 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -89,7 +89,7 @@ VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
 VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
 VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vsubq_u, v16qi, v8hi, v4si)
@@ -117,9 +117,9 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vhsubq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si)
@@ -143,15 +143,15 @@ VAR3 (BINOP_UNONE_UNONE_IMM, vshlq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vrshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vqshlq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_NONE, vcmpeqq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_IMM, vqshluq_n_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_UNONE, vaddvq_p_s, v16qi, v8hi, v4si)
@@ -219,17 +219,17 @@ VAR2 (BINOP_UNONE_UNONE_IMM, vshllbq_n_u, v16qi, v8hi)
 VAR2 (BINOP_UNONE_UNONE_IMM, vorrq_n_u, v8hi, v4si)
 VAR2 (BINOP_UNONE_UNONE_IMM, vbicq_n_u, v8hi, v4si)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpneq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpneq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpneq_f, v8hf, v4sf)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpltq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpltq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpltq_f, v8hf, v4sf)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpleq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpleq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpleq_f, v8hf, v4sf)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpgtq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpgtq_f, v8hf, v4sf)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpgeq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpgeq_f, v8hf, v4sf)
 VAR2 (BINOP_UNONE_NONE_NONE, vcmpeqq_n_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpeqq_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpeqq_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vsubq_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vqmovntq_s, v8hi, v4si)
 VAR2 (BINOP_NONE_NONE_NONE, vqmovnbq_s, v8hi, v4si)
@@ -295,8 +295,8 @@ VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE, vcvtaq_m_u, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtaq_m_s, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vshlcq_vec_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_UNONE_IMM, vshlcq_vec_s, v16qi, v8hi, v4si)
-VAR4 (TERNOP_UNONE_UNONE_UNONE_UNONE, vpselq_u, v16qi, v8hi, v4si, v2di)
-VAR4 (TERNOP_NONE_NONE_NONE_UNONE, vpselq_s, v16qi, v8hi, v4si, v2di)
+VAR4 (TERNOP_UNONE_UNONE_UNONE_PRED, vpselq_u, v16qi, v8hi, v4si, v2di)
+VAR4 (TERNOP_NONE_NONE_NONE_PRED, vpselq_s, v16qi, v8hi, v4si, v2di)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrev64q_m_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmvnq_m_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmlasq_n_u, v16qi, v8hi, v4si)
@@ -426,7 +426,7 @@ VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrev64q_m_f, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrev32q_m_s, v16qi, v8hi)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vqmovntq_m_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vqmovnbq_m_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vpselq_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vpselq_f, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vnegq_m_f, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovntq_m_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovnbq_m_s, v8hi, v4si)
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 1920004b450..2b411b0cb0f 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -312,6 +312,12 @@ (define_constraint "Dz"
  (and (match_code "const_vector")
       (match_test "(TARGET_NEON || TARGET_HAVE_MVE) && op == CONST0_RTX (mode)")))
 
+(define_constraint "DB"
+ "@internal
+  In ARM/Thumb-2 state with MVE a constant vector of booleans."
+ (and (match_code "const_vector")
+      (match_test "TARGET_HAVE_MVE && GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL")))
+
 (define_constraint "Da"
  "@internal
   In ARM/Thumb-2 state a const_int, const_double or const_vector that can
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 8202c27cc82..37cf7971be8 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -272,6 +272,8 @@ (define_mode_iterator MVE_3 [V16QI V8HI])
 (define_mode_iterator MVE_2 [V16QI V8HI V4SI])
 (define_mode_iterator MVE_5 [V8HI V4SI])
 (define_mode_iterator MVE_6 [V8HI V4SI])
+(define_mode_iterator MVE_7 [V16BI V8BI V4BI])
+(define_mode_iterator MVE_7_HI [HI V16BI V8BI V4BI])
 
 ;;----------------------------------------------------------------------------
 ;; Code iterators
@@ -946,6 +948,10 @@ (define_mode_attr V_extr_elem [(V16QI "u8") (V8HI "u16") (V4SI "32")
 			       (V8HF "u16") (V4SF "32")])
 (define_mode_attr earlyclobber_32 [(V16QI "=w") (V8HI "=w") (V4SI "=&w")
 						(V8HF "=w") (V4SF "=&w")])
+(define_mode_attr MVE_VPRED [(V16QI "V16BI") (V8HI "V8BI") (V4SI "V4BI")
+                             (V2DI "HI") (V8HF "V8BI")   (V4SF "V4BI")])
+(define_mode_attr MVE_vpred [(V16QI "v16bi") (V8HI "v8bi") (V4SI "v4bi")
+                             (V2DI "hi") (V8HF "v8bi")   (V4SF "v4bi")])
 
 ;;----------------------------------------------------------------------------
 ;; Code attributes
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 5c3b34dce3a..983aa10e652 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -839,8 +839,8 @@ (define_insn "mve_vaddlvq_p_<supf>v4si"
 ;;
 (define_insn "@mve_vcmp<mve_cmp_op>q_<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(MVE_COMPARISONS:HI (match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(MVE_COMPARISONS:<MVE_VPRED> (match_operand:MVE_2 1 "s_register_operand" "w")
 		    (match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
@@ -1929,8 +1929,8 @@ (define_insn "mve_vcaddq<mve_rot><mode>"
 ;;
 (define_insn "@mve_vcmp<mve_cmp_op>q_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(MVE_FP_COMPARISONS:<MVE_VPRED> (match_operand:MVE_0 1 "s_register_operand" "w")
 			       (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3324,7 +3324,7 @@ (define_insn "@mve_vpselq_<supf><mode>"
    (set (match_operand:MVE_1 0 "s_register_operand" "=w")
 	(unspec:MVE_1 [(match_operand:MVE_1 1 "s_register_operand" "w")
 		       (match_operand:MVE_1 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VPSELQ))
   ]
   "TARGET_HAVE_MVE"
@@ -4419,7 +4419,7 @@ (define_insn "@mve_vpselq_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VPSELQ_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -10516,3 +10516,14 @@ (define_insn "*movmisalign<mode>_mve_load"
   "vldr<V_sz_elem1>.<V_sz_elem>\t%q0, %E1"
   [(set_attr "type" "mve_load")]
 )
+
+;; Expander for VxBI moves
+(define_expand "mov<mode>"
+  [(set (match_operand:MVE_7 0 "nonimmediate_operand")
+        (match_operand:MVE_7 1 "general_operand"))]
+  "TARGET_HAVE_MVE"
+  {
+    if (!register_operand (operands[0], <MODE>mode))
+      operands[1] = force_reg (<MODE>mode, operands[1]);
+  }
+)
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index f5ccb92d097..f00d1cad3e9 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -73,21 +73,26 @@ (define_insn "*arm_movhi_vfp"
 
 (define_insn "*thumb2_movhi_vfp"
  [(set
-   (match_operand:HI 0 "nonimmediate_operand"
+   (match_operand:MVE_7_HI 0 "nonimmediate_operand"
     "=rk, r, l, r, m, r, *t, r, *t, Up, r")
-   (match_operand:HI 1 "general_operand"
-    "rk, I, Py, n, r, m, r, *t, *t, r, Up"))]
+   (match_operand:MVE_7_HI 1 "general_operand"
+    "rk, IDB, Py, n, r, m, r, *t, *t, r, Up"))]
  "TARGET_THUMB2 && TARGET_VFP_BASE
   && !TARGET_VFP_FP16INST
-  && (register_operand (operands[0], HImode)
-       || register_operand (operands[1], HImode))"
+  && (register_operand (operands[0], <MODE>mode)
+       || register_operand (operands[1], <MODE>mode))"
 {
   switch (which_alternative)
     {
     case 0:
-    case 1:
     case 2:
       return "mov%?\t%0, %1\t%@ movhi";
+    case 1:
+      if (GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_VECTOR_BOOL)
+        operands[1] = mve_const_bool_vec_to_hi (operands[1]);
+      else
+        operands[1] = gen_lowpart (HImode, operands[1]);
+      return "mov%?\t%0, %1\t%@ movhi";
     case 3:
       return "movw%?\t%0, %L1\t%@ movhi";
     case 4:
@@ -173,20 +178,25 @@ (define_insn "*arm_movhi_fp16"
 
 (define_insn "*thumb2_movhi_fp16"
  [(set
-   (match_operand:HI 0 "nonimmediate_operand"
+   (match_operand:MVE_7_HI 0 "nonimmediate_operand"
     "=rk, r, l, r, m, r, *t, r, *t, Up, r")
-   (match_operand:HI 1 "general_operand"
-    "rk, I, Py, n, r, m, r, *t, *t, r, Up"))]
+   (match_operand:MVE_7_HI 1 "general_operand"
+    "rk, IDB, Py, n, r, m, r, *t, *t, r, Up"))]
  "TARGET_THUMB2 && (TARGET_VFP_FP16INST || TARGET_HAVE_MVE)
-  && (register_operand (operands[0], HImode)
-       || register_operand (operands[1], HImode))"
+  && (register_operand (operands[0], <MODE>mode)
+       || register_operand (operands[1], <MODE>mode))"
 {
   switch (which_alternative)
     {
     case 0:
-    case 1:
     case 2:
       return "mov%?\t%0, %1\t%@ movhi";
+    case 1:
+      if (GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_VECTOR_BOOL)
+        operands[1] = mve_const_bool_vec_to_hi (operands[1]);
+      else
+        operands[1] = gen_lowpart (HImode, operands[1]);
+      return "mov%?\t%0, %1\t%@ movhi";
     case 3:
       return "movw%?\t%0, %L1\t%@ movhi";
     case 4:
diff --git a/gcc/testsuite/gcc.dg/rtl/arm/mve-vxbi.c b/gcc/testsuite/gcc.dg/rtl/arm/mve-vxbi.c
new file mode 100644
index 00000000000..093283ed43c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/arm/mve-vxbi.c
@@ -0,0 +1,89 @@
+/* { dg-do compile { target arm*-*-* } } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O2" } */
+
+void __RTL (startwith ("ira")) foo (void *ptr)
+{
+  (function "foo"
+   (param "ptr"
+    (DECL_RTL (reg/v:SI <0> [ ptr ]))
+    (DECL_RTL_INCOMING (reg:SI r0 [ ptr ]))
+    ) ;; param "n"
+   (insn-chain
+    (block 2
+     (edge-from entry (flags "FALLTHRU"))
+     (cnote 5 [bb 2] NOTE_INSN_BASIC_BLOCK)
+     (insn 7 (set (reg:V4BI <1>)
+	      (const_vector:V4BI [(const_int 1)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 1)])) (nil))
+     (insn 8 (set (mem:V4BI (reg:SI <0>) [1 ptr+0 S2 A16]) (reg:V4BI <1>)))
+     (edge-to exit (flags "FALLTHRU"))
+     ) ;; block 2
+    ) ;; insn-chain
+   ) ;; function
+}
+
+void __RTL (startwith ("ira")) foo2 (void *ptr)
+{
+  (function "foo"
+   (param "ptr"
+    (DECL_RTL (reg/v:SI <0> [ ptr ]))
+    (DECL_RTL_INCOMING (reg:SI r0 [ ptr ]))
+    ) ;; param "n"
+   (insn-chain
+    (block 2
+     (edge-from entry (flags "FALLTHRU"))
+     (cnote 5 [bb 2] NOTE_INSN_BASIC_BLOCK)
+     (insn 7 (set (reg:V8BI <1>)
+	      (const_vector:V8BI [(const_int 1)
+				  (const_int 0)
+				  (const_int 1)
+				  (const_int 1)
+				  (const_int 1)
+				  (const_int 1)
+				  (const_int 0)
+				  (const_int 1)])) (nil))
+     (insn 8 (set (mem:V8BI (reg:SI <0>) [1 ptr+0 S2 A16]) (reg:V8BI <1>)))
+     (edge-to exit (flags "FALLTHRU"))
+     ) ;; block 2
+    ) ;; insn-chain
+   ) ;; function
+}
+
+void __RTL (startwith ("ira")) foo3 (void *ptr)
+{
+  (function "foo"
+   (param "ptr"
+    (DECL_RTL (reg/v:SI <0> [ ptr ]))
+    (DECL_RTL_INCOMING (reg:SI r0 [ ptr ]))
+    ) ;; param "n"
+   (insn-chain
+    (block 2
+     (edge-from entry (flags "FALLTHRU"))
+     (cnote 5 [bb 2] NOTE_INSN_BASIC_BLOCK)
+     (insn 7 (set (reg:V16BI <1>)
+	      (const_vector:V16BI [(const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)
+				  (const_int 0)])) (nil))
+     (insn 8 (set (mem:V16BI (reg:SI <0>) [1 ptr+0 S2 A16]) (reg:V16BI <1>)))
+     (edge-to exit (flags "FALLTHRU"))
+     ) ;; block 2
+    ) ;; insn-chain
+   ) ;; function
+}
-- 
2.25.1



* [PATCH v3 09/15] arm: Fix vcond_mask expander for MVE (PR target/100757)
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (7 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 08/15] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-27 16:55   ` Kyrylo Tkachov
  2022-01-13 14:56 ` [PATCH v3 10/15] arm: Convert remaining MVE vcmp builtins to predicate qualifiers Christophe Lyon
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
implies implementing vec_cmp<mode><MVE_vpred>,
vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can
move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and
vcond_mask_<mode><v_cmp_result> back to neon.md since they are not
used by MVE anymore.  The new *<MVE_vpred> patterns listed above are
implemented in mve.md since they are only valid for MVE. However, this
may make maintenance/comparison more painful than having all of them
in vec-common.md.
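
As an illustration only, the lane-count to predicate-mode mapping that
the mask-mode hook relies on can be sketched in plain C.  The enum and
the name lanes_to_pred_mode are hypothetical stand-ins, not GCC
identifiers; the switch mirrors arm_mode_to_pred_mode as modified by
this patch, with PRED_NONE playing the role of an empty
opt_machine_mode.

```c
/* Hypothetical stand-ins for the VxBI machine modes; not GCC's enum.  */
enum pred_mode { PRED_NONE, PRED_V16BI, PRED_V8BI, PRED_V4BI };

/* Map the number of lanes of a 128-bit data vector to a predicate
   mode.  Lane counts without a boolean-vector mode (for instance the
   2 lanes of V2DI) yield PRED_NONE, modelling an empty optional.  */
enum pred_mode
lanes_to_pred_mode (unsigned int nunits)
{
  switch (nunits)
    {
    case 16: return PRED_V16BI;
    case 8: return PRED_V8BI;
    case 4: return PRED_V4BI;
    default: return PRED_NONE;
    }
}
```

Returning an "empty" value instead of aborting lets the caller decide
what to do for modes without a matching predicate mode.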

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), the version moved
back to neon.md keeps the VDQWH iterator added in r12-835 (for
V4HF/V8HF support), as well as the
(!<Is_float_mode> || flag_unsafe_math_optimizations) condition, which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.
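
A minimal sketch of that conversion and merge, in plain C rather than
RTL.  The names bool_lanes_to_mask and merge_masks are hypothetical;
the bit replication follows the loop in mve_const_bool_vec_to_hi from
patch 08 (each of the N lanes covers 16 / N bits of the mask), and the
merge is the scalar AND described above.

```c
#include <stdint.h>

/* Expand N_ELTS boolean lanes (0 or 1 each) into a 16-bit predicate
   mask.  Each lane is replicated over 16 / N_ELTS consecutive bits,
   lane 0 occupying the least significant bits.  N_ELTS is expected to
   be 4, 8 or 16.  */
uint16_t
bool_lanes_to_mask (const unsigned char *lanes, int n_elts)
{
  int repeat = 16 / n_elts;
  uint16_t mask = 0;

  for (int i = 0; i < n_elts; i++)
    for (int j = 0; j < repeat; j++)
      mask |= (uint16_t) ((lanes[i] & 1u) << (i * repeat + j));
  return mask;
}

/* Merge two comparison results with a plain scalar AND.  */
uint16_t
merge_masks (uint16_t a, uint16_t b)
{
  return a & b;
}
```

With the V4BI constant {1, 0, 0, 1} used in mve-vxbi.c, this sketch
gives 0xf00f.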

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.
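
The intended behaviour can be pictured with a small C model.  This is
a sketch under the assumption that each of the 16 predicate bits
controls one byte of the 128-bit vector; select_bytes is a
hypothetical name, not a GCC function.  The point is that the result
holds bytes of the two data operands, never the 0/1 values of the
mask itself.

```c
#include <stdint.h>

/* Byte-wise select: byte I of OUT comes from ON_TRUE when bit I of
   MASK is set, and from ON_FALSE otherwise.  */
void
select_bytes (uint8_t out[16], const uint8_t on_true[16],
	      const uint8_t on_false[16], uint16_t mask)
{
  for (int i = 0; i < 16; i++)
    out[i] = ((mask >> i) & 1) ? on_true[i] : on_false[i];
}
```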

Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
arm_mve effective target.

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
	ldr     r3, .L3+48
	vldr.64 d4, .L3              // q2=(2.0,2.0,2.0,2.0)
	vldr.64 d5, .L3+8
	vldrw.32        q0, [r3]     // q0=a(0..3)
	adds    r3, r3, #16
	vcmp.f32        eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
	vldrw.32        q1, [r3]     // q1=a(4..7)
	vmrs     r3, P0
	vcmp.f32        eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
	vmrs    r2, P0  @ movhi
	ands    r3, r3, r2           // r3=select(a(0..3)) & select(a(4..7))
	vldr.64 d4, .L3+16           // q2=(5.0,5.0,5.0,5.0)
	vldr.64 d5, .L3+24
	vmsr     P0, r3
	vldr.64 d6, .L3+32           // q3=(4.0,4.0,4.0,4.0)
	vldr.64 d7, .L3+40
	vpsel q3, q3, q2             // q3=vcond_mask(4.0,5.0)
	vmov.32 r2, q3[1]            // keep the scalar max
	vmov.32 r0, q3[3]
	vmov.32 r3, q3[2]
	vmov.f32        s11, s12
	vmov    s15, r2
	vmov    s14, r3
	vmaxnm.f32      s15, s11, s15
	vmaxnm.f32      s15, s15, s14
	vmov    s14, r0
	vmaxnm.f32      s15, s15, s14
	vmov    r0, s15
	bx      lr
	.L4:
	.align  3
	.L3:
	.word   1073741824	// 2.0f
	.word   1073741824
	.word   1073741824
	.word   1073741824
	.word   1084227584	// 5.0f
	.word   1084227584
	.word   1084227584
	.word   1084227584
	.word   1082130432	// 4.0f
	.word   1082130432
	.word   1082130432
	.word   1082130432

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	PR target/100757
	gcc/
	* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
	(arm_expand_vector_compare): Update prototype.
	* config/arm/arm.c (TARGET_VECTORIZE_GET_MASK_MODE): New.
	(arm_vector_mode_supported_p): Add support for VxBI modes.
	(arm_expand_vector_compare): Remove useless generation of vpsel.
	(arm_expand_vcond): Fix select operands.
	(arm_get_mask_mode): New.
	* config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New.
	(vec_cmpu<mode><MVE_vpred>): New.
	(vcond_mask_<mode><MVE_vpred>): New.
	* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>)
	(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): Move to ...
	* config/arm/neon.md (vec_cmp<mode><v_cmp_result>)
	(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): ... here
	and disable for MVE.
	* doc/sourcebuild.texi (arm_mve): Document new effective-target.

	gcc/testsuite/
	* gcc.dg/signbit-2.c: Skip when targeting ARM/MVE.
	* lib/target-supports.exp (check_effective_target_arm_mve): New.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index b978adf2038..a84613104b1 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -202,6 +202,7 @@ extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
 extern bool arm_pad_reg_upward (machine_mode, tree, int);
 #endif
 extern int arm_apply_result_size (void);
+extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
 
 #endif /* RTX_CODE */
 
@@ -378,7 +379,7 @@ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
 extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
 extern bool arm_valid_symbolic_address_p (rtx);
 extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
-extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool, bool);
+extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool);
 #endif /* RTX_CODE */
 
 extern bool arm_gen_setmem (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fa18c7bd3fe..7d56fa71806 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -829,6 +829,10 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
+
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE arm_get_mask_mode
+
 \f
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -29234,7 +29238,8 @@ arm_vector_mode_supported_p (machine_mode mode)
 
   if (TARGET_HAVE_MVE
       && (mode == V2DImode || mode == V4SImode || mode == V8HImode
-	  || mode == V16QImode))
+	  || mode == V16QImode
+	  || mode == V16BImode || mode == V8BImode || mode == V4BImode))
       return true;
 
   if (TARGET_HAVE_MVE_FLOAT
@@ -31033,7 +31038,7 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 }
 \f
 /* Return the mode for the MVE vector of predicates corresponding to MODE.  */
-machine_mode
+opt_machine_mode
 arm_mode_to_pred_mode (machine_mode mode)
 {
   switch (GET_MODE_NUNITS (mode))
@@ -31042,7 +31047,7 @@ arm_mode_to_pred_mode (machine_mode mode)
     case 8: return V8BImode;
     case 4: return V4BImode;
     }
-  gcc_unreachable ();
+  return opt_machine_mode ();
 }
 
 /* Expand code to compare vectors OP0 and OP1 using condition CODE.
@@ -31050,16 +31055,12 @@ arm_mode_to_pred_mode (machine_mode mode)
    and return true if TARGET contains the inverse.  If !CAN_INVERT,
    always store the result in TARGET, never its inverse.
 
-   If VCOND_MVE, do not emit the vpsel instruction here, let arm_expand_vcond do
-   it with the right destination type to avoid emiting two vpsel, one here and
-   one in arm_expand_vcond.
-
    Note that the handling of floating-point comparisons is not
    IEEE compliant.  */
 
 bool
 arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
-			   bool can_invert, bool vcond_mve)
+			   bool can_invert)
 {
   machine_mode cmp_result_mode = GET_MODE (target);
   machine_mode cmp_mode = GET_MODE (op0);
@@ -31088,7 +31089,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
 	       and then store its inverse in TARGET.  This avoids reusing
 	       TARGET (which for integer NE could be one of the inputs).  */
 	    rtx tmp = gen_reg_rtx (cmp_result_mode);
-	    if (arm_expand_vector_compare (tmp, code, op0, op1, true, vcond_mve))
+	    if (arm_expand_vector_compare (tmp, code, op0, op1, true))
 	      gcc_unreachable ();
 	    emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp)));
 	    return false;
@@ -31124,36 +31125,22 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
     case NE:
       if (TARGET_HAVE_MVE)
 	{
-	  rtx vpr_p0;
-	  if (vcond_mve)
-	    vpr_p0 = target;
-	  else
-	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
-
 	  switch (GET_MODE_CLASS (cmp_mode))
 	    {
 	    case MODE_VECTOR_INT:
-	      emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+	      emit_insn (gen_mve_vcmpq (code, cmp_mode, target,
+					op0, force_reg (cmp_mode, op1)));
 	      break;
 	    case MODE_VECTOR_FLOAT:
 	      if (TARGET_HAVE_MVE_FLOAT)
-		emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+		emit_insn (gen_mve_vcmpq_f (code, cmp_mode, target,
+					    op0, force_reg (cmp_mode, op1)));
 	      else
 		gcc_unreachable ();
 	      break;
 	    default:
 	      gcc_unreachable ();
 	    }
-
-	  /* If we are not expanding a vcond, build the result here.  */
-	  if (!vcond_mve)
-	    {
-	      rtx zero = gen_reg_rtx (cmp_result_mode);
-	      rtx one = gen_reg_rtx (cmp_result_mode);
-	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
-	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
-	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
-	    }
 	}
       else
 	emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
@@ -31165,23 +31152,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
     case GEU:
     case GTU:
       if (TARGET_HAVE_MVE)
-	{
-	  rtx vpr_p0;
-	  if (vcond_mve)
-	    vpr_p0 = target;
-	  else
-	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
-
-	  emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
-	  if (!vcond_mve)
-	    {
-	      rtx zero = gen_reg_rtx (cmp_result_mode);
-	      rtx one = gen_reg_rtx (cmp_result_mode);
-	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
-	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
-	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
-	    }
-	}
+	emit_insn (gen_mve_vcmpq (code, cmp_mode, target,
+				  op0, force_reg (cmp_mode, op1)));
       else
 	emit_insn (gen_neon_vc (code, cmp_mode, target,
 				op0, force_reg (cmp_mode, op1)));
@@ -31192,23 +31164,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
     case LEU:
     case LTU:
       if (TARGET_HAVE_MVE)
-	{
-	  rtx vpr_p0;
-	  if (vcond_mve)
-	    vpr_p0 = target;
-	  else
-	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
-
-	  emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0));
-	  if (!vcond_mve)
-	    {
-	      rtx zero = gen_reg_rtx (cmp_result_mode);
-	      rtx one = gen_reg_rtx (cmp_result_mode);
-	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
-	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
-	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
-	    }
-	}
+	emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, target,
+				  force_reg (cmp_mode, op1), op0));
       else
 	emit_insn (gen_neon_vc (swap_condition (code), cmp_mode,
 				target, force_reg (cmp_mode, op1), op0));
@@ -31223,8 +31180,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
 	rtx gt_res = gen_reg_rtx (cmp_result_mode);
 	rtx alt_res = gen_reg_rtx (cmp_result_mode);
 	rtx_code alt_code = (code == LTGT ? LT : LE);
-	if (arm_expand_vector_compare (gt_res, GT, op0, op1, true, vcond_mve)
-	    || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true, vcond_mve))
+	if (arm_expand_vector_compare (gt_res, GT, op0, op1, true)
+	    || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true))
 	  gcc_unreachable ();
 	emit_insn (gen_rtx_SET (target, gen_rtx_IOR (cmp_result_mode,
 						     gt_res, alt_res)));
@@ -31244,19 +31201,15 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
 {
   /* When expanding for MVE, we do not want to emit a (useless) vpsel in
      arm_expand_vector_compare, and another one here.  */
-  bool vcond_mve=false;
   rtx mask;
 
   if (TARGET_HAVE_MVE)
-    {
-      vcond_mve=true;
-      mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode));
-    }
+    mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode).require ());
   else
     mask = gen_reg_rtx (cmp_result_mode);
 
   bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
-					     operands[4], operands[5], true, vcond_mve);
+					     operands[4], operands[5], true);
   if (inverted)
     std::swap (operands[1], operands[2]);
   if (TARGET_NEON)
@@ -31264,20 +31217,20 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
 			    mask, operands[1], operands[2]));
   else
     {
-      machine_mode cmp_mode = GET_MODE (operands[4]);
-      rtx vpr_p0 = mask;
-      rtx zero = gen_reg_rtx (cmp_mode);
-      rtx one = gen_reg_rtx (cmp_mode);
-      emit_move_insn (zero, CONST0_RTX (cmp_mode));
-      emit_move_insn (one, CONST1_RTX (cmp_mode));
+      machine_mode cmp_mode = GET_MODE (operands[0]);
+
       switch (GET_MODE_CLASS (cmp_mode))
 	{
 	case MODE_VECTOR_INT:
-	  emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0));
+	  emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_mode, operands[0],
+				     operands[1], operands[2], mask));
 	  break;
 	case MODE_VECTOR_FLOAT:
 	  if (TARGET_HAVE_MVE_FLOAT)
-	    emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0));
+	    emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0],
+					 operands[1], operands[2], mask));
+	  else
+	    gcc_unreachable ();
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -34187,4 +34140,15 @@ arm_mode_base_reg_class (machine_mode mode)
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE.  */
+
+opt_machine_mode
+arm_get_mask_mode (machine_mode mode)
+{
+  if (TARGET_HAVE_MVE)
+    return arm_mode_to_pred_mode (mode);
+
+  return default_get_mask_mode (mode);
+}
+
 #include "gt-arm.h"
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 983aa10e652..35564e870bc 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -10527,3 +10527,57 @@ (define_expand "mov<mode>"
       operands[1] = force_reg (<MODE>mode, operands[1]);
   }
 )
+
+;; Expanders for vec_cmp and vcond
+
+(define_expand "vec_cmp<mode><MVE_vpred>"
+  [(set (match_operand:<MVE_VPRED> 0 "s_register_operand")
+	(match_operator:<MVE_VPRED> 1 "comparison_operator"
+	  [(match_operand:MVE_VLD_ST 2 "s_register_operand")
+	   (match_operand:MVE_VLD_ST 3 "reg_or_zero_operand")]))]
+  "TARGET_HAVE_MVE
+   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+{
+  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
+			     operands[2], operands[3], false);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><MVE_vpred>"
+  [(set (match_operand:<MVE_VPRED> 0 "s_register_operand")
+	(match_operator:<MVE_VPRED> 1 "comparison_operator"
+	  [(match_operand:MVE_2 2 "s_register_operand")
+	   (match_operand:MVE_2 3 "reg_or_zero_operand")]))]
+  "TARGET_HAVE_MVE"
+{
+  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
+			     operands[2], operands[3], false);
+  DONE;
+})
+
+(define_expand "vcond_mask_<mode><MVE_vpred>"
+  [(set (match_operand:MVE_VLD_ST 0 "s_register_operand")
+	(if_then_else:MVE_VLD_ST
+	  (match_operand:<MVE_VPRED> 3 "s_register_operand")
+	  (match_operand:MVE_VLD_ST 1 "s_register_operand")
+	  (match_operand:MVE_VLD_ST 2 "s_register_operand")))]
+  "TARGET_HAVE_MVE"
+{
+  switch (GET_MODE_CLASS (<MODE>mode))
+    {
+      case MODE_VECTOR_INT:
+	emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0],
+				   operands[1], operands[2], operands[3]));
+	break;
+      case MODE_VECTOR_FLOAT:
+	if (TARGET_HAVE_MVE_FLOAT)
+	  emit_insn (gen_mve_vpselq_f (<MODE>mode, operands[0],
+				       operands[1], operands[2], operands[3]));
+	else
+	  gcc_unreachable ();
+	break;
+      default:
+	gcc_unreachable ();
+    }
+  DONE;
+})
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index e06c8245672..20e9f11ec81 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1394,6 +1394,45 @@ (define_insn "*us_sub<mode>_neon"
   [(set_attr "type" "neon_qsub<q>")]
 )
 
+(define_expand "vec_cmp<mode><v_cmp_result>"
+  [(set (match_operand:<V_cmp_result> 0 "s_register_operand")
+	(match_operator:<V_cmp_result> 1 "comparison_operator"
+	  [(match_operand:VDQWH 2 "s_register_operand")
+	   (match_operand:VDQWH 3 "reg_or_zero_operand")]))]
+  "TARGET_NEON
+   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+{
+  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
+			     operands[2], operands[3], false);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><mode>"
+  [(set (match_operand:VDQIW 0 "s_register_operand")
+	(match_operator:VDQIW 1 "comparison_operator"
+	  [(match_operand:VDQIW 2 "s_register_operand")
+	   (match_operand:VDQIW 3 "reg_or_zero_operand")]))]
+  "TARGET_NEON"
+{
+  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
+			     operands[2], operands[3], false);
+  DONE;
+})
+
+(define_expand "vcond_mask_<mode><v_cmp_result>"
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+	(if_then_else:VDQWH
+	  (match_operand:<V_cmp_result> 3 "s_register_operand")
+	  (match_operand:VDQWH 1 "s_register_operand")
+	  (match_operand:VDQWH 2 "s_register_operand")))]
+  "TARGET_NEON
+   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+{
+  emit_insn (gen_neon_vbsl<mode> (operands[0], operands[3], operands[1],
+				  operands[2]));
+  DONE;
+})
+
 ;; Patterns for builtins.
 
 ; good for plain vadd, vaddq.
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index cef358e44f5..20586973ed9 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -363,33 +363,6 @@ (define_expand "vlshr<mode>3"
     }
 })
 
-(define_expand "vec_cmp<mode><v_cmp_result>"
-  [(set (match_operand:<V_cmp_result> 0 "s_register_operand")
-	(match_operator:<V_cmp_result> 1 "comparison_operator"
-	  [(match_operand:VDQWH 2 "s_register_operand")
-	   (match_operand:VDQWH 3 "reg_or_zero_operand")]))]
-  "ARM_HAVE_<MODE>_ARITH
-   && !TARGET_REALLY_IWMMXT
-   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
-{
-  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
-			     operands[2], operands[3], false, false);
-  DONE;
-})
-
-(define_expand "vec_cmpu<mode><mode>"
-  [(set (match_operand:VDQIW 0 "s_register_operand")
-	(match_operator:VDQIW 1 "comparison_operator"
-	  [(match_operand:VDQIW 2 "s_register_operand")
-	   (match_operand:VDQIW 3 "reg_or_zero_operand")]))]
-  "ARM_HAVE_<MODE>_ARITH
-   && !TARGET_REALLY_IWMMXT"
-{
-  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
-			     operands[2], operands[3], false, false);
-  DONE;
-})
-
 ;; Conditional instructions.  These are comparisons with conditional moves for
 ;; vectors.  They perform the assignment:
 ;;
@@ -461,31 +434,6 @@ (define_expand "vcondu<mode><v_cmp_result>"
   DONE;
 })
 
-(define_expand "vcond_mask_<mode><v_cmp_result>"
-  [(set (match_operand:VDQWH 0 "s_register_operand")
-        (if_then_else:VDQWH
-          (match_operand:<V_cmp_result> 3 "s_register_operand")
-          (match_operand:VDQWH 1 "s_register_operand")
-          (match_operand:VDQWH 2 "s_register_operand")))]
-  "ARM_HAVE_<MODE>_ARITH
-   && !TARGET_REALLY_IWMMXT
-   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
-{
-  if (TARGET_NEON)
-    {
-      emit_insn (gen_neon_vbsl (<MODE>mode, operands[0], operands[3],
-                                operands[1], operands[2]));
-    }
-  else if (TARGET_HAVE_MVE)
-    {
-      emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0],
-                                 operands[1], operands[2], operands[3]));
-    }
-  else
-    gcc_unreachable ();
-  DONE;
-})
-
 (define_expand "vec_load_lanesoi<mode>"
   [(set (match_operand:OI 0 "s_register_operand")
         (unspec:OI [(match_operand:OI 1 "neon_struct_operand")
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 6095a35cd45..8d369935396 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2236,6 +2236,10 @@ ARM target supports the @code{-mfloat-abi=softfp} option.
 @anchor{arm_hard_ok}
 ARM target supports the @code{-mfloat-abi=hard} option.
 
+@item arm_mve
+@anchor{arm_mve}
+ARM target supports generating MVE instructions.
+
 @item arm_v8_1_lob_ok
 @anchor{arm_v8_1_lob_ok}
 ARM Target supports executing the Armv8.1-M Mainline Low Overhead Loop
diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
index b609f67dc9f..2f2dc448286 100644
--- a/gcc/testsuite/gcc.dg/signbit-2.c
+++ b/gcc/testsuite/gcc.dg/signbit-2.c
@@ -4,6 +4,7 @@
 /* This test does not work when the truth type does not match vector type.  */
 /* { dg-additional-options "-mno-avx512f" { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
+/* { dg-skip-if "no fallback for MVE" { arm_mve } } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 0fe1e1e077a..8dac516ec12 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5234,6 +5234,18 @@ proc check_effective_target_arm_hard_ok { } {
 	} "-mfloat-abi=hard"]
 }
 
+# Return 1 if this is an ARM target supporting MVE.
+proc check_effective_target_arm_mve { } {
+    if { ![istarget arm*-*-*] } {
+	return 0
+    }
+    return [check_no_compiler_messages arm_mve assembly {
+	#if !defined (__ARM_FEATURE_MVE)
+	#error FOO
+	#endif
+    }]
+}
+
 # Return 1 if the target supports ARMv8.1-M MVE with floating point
 # instructions, 0 otherwise.  The test is valid for ARM.
 # Record the command line options needed.
-- 
2.25.1



* [PATCH v3 10/15] arm: Convert remaining MVE vcmp builtins to predicate qualifiers
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (8 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 09/15] arm: Fix vcond_mask expander for MVE (PR target/100757) Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-13 14:56 ` [PATCH v3 11/15] arm: Convert more MVE " Christophe Lyon
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

This is a mostly mechanical change, tested only by the intrinsics
expansion tests.
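
For readers unfamiliar with the qualifier tables, the following is a
minimal stand-alone sketch of what the conversion amounts to.  It is
not GCC code: the enum is a simplified stand-in for
arm_type_qualifiers with illustrative values, the value of
SIMD_MAX_BUILTIN_ARGS is assumed, and the names old_ternop_qualifiers,
new_ternop_qualifiers and count_predicates are hypothetical.  Only the
two initializers mirror the tables touched by this patch.

```c
/* Simplified stand-in for GCC's arm_type_qualifiers.  The numeric
   values are illustrative, not those used in arm-builtins.c.  */
enum arm_type_qualifiers
{
  qualifier_none,
  qualifier_unsigned,
  qualifier_predicate
};

/* Assumed value, for illustration only.  */
#define SIMD_MAX_BUILTIN_ARGS 7

/* Before the patch: the result and the incoming predicate of a
   predicated compare were described as plain unsigned scalars.  */
static const enum arm_type_qualifiers
old_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
  = { qualifier_unsigned, qualifier_none, qualifier_none,
      qualifier_unsigned };

/* After the patch: both are described as predicates.  */
static const enum arm_type_qualifiers
new_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
  = { qualifier_predicate, qualifier_none, qualifier_none,
      qualifier_predicate };

/* Hypothetical helper: count the predicate slots among the first
   NARGS entries of a qualifier table.  */
static int
count_predicates (const enum arm_type_qualifiers *q, int nargs)
{
  int n = 0;
  for (int i = 0; i < nargs; i++)
    if (q[i] == qualifier_predicate)
      n++;
  return n;
}
```

In this model, slot 0 is the builtin's result and slots 1 to 3 are its
arguments, so the converted table marks the result and the last
argument as predicates.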

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.c (BINOP_UNONE_NONE_NONE_QUALIFIERS):
	Delete.
	(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
	(TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this.
	(TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New.
	* config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new
	predicate qualifiers.
	* config/arm/mve.md (mve_vcmp<mve_cmp_op>q_n_<mode>)
	(mve_vcmp*q_m_f<mode>): Use MVE_VPRED instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 36d71ab1a13..9cc192ddb9a 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -438,12 +438,6 @@ arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_NONE_NONE_UNONE_QUALIFIERS \
   (arm_binop_none_none_unone_qualifiers)
 
-static enum arm_type_qualifiers
-arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none };
-#define BINOP_UNONE_NONE_NONE_QUALIFIERS \
-  (arm_binop_unone_none_none_qualifiers)
-
 static enum arm_type_qualifiers
 arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_predicate, qualifier_none, qualifier_none };
@@ -504,10 +498,10 @@ arm_ternop_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_ternop_unone_unone_imm_unone_qualifiers)
 
 static enum arm_type_qualifiers
-arm_ternop_unone_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_unone_none_none_unone_qualifiers)
+arm_ternop_pred_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none, qualifier_predicate };
+#define TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_none_none_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -553,6 +547,13 @@ arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
   (arm_ternop_unone_unone_unone_pred_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_pred_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned,
+    qualifier_predicate };
+#define TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 44b41eab4c5..b7ebbcab87f 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -118,9 +118,9 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vaddvq_p_u, v16qi, v8hi, v4si)
@@ -142,17 +142,17 @@ VAR3 (BINOP_UNONE_UNONE_NONE, vbrsrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshlq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vrshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vqshlq_n_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpeqq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_IMM, vqshluq_n_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_UNONE, vaddvq_p_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vsubq_s, v16qi, v8hi, v4si)
@@ -218,17 +218,17 @@ VAR2 (BINOP_UNONE_UNONE_IMM, vshlltq_n_u, v16qi, v8hi)
 VAR2 (BINOP_UNONE_UNONE_IMM, vshllbq_n_u, v16qi, v8hi)
 VAR2 (BINOP_UNONE_UNONE_IMM, vorrq_n_u, v8hi, v4si)
 VAR2 (BINOP_UNONE_UNONE_IMM, vbicq_n_u, v8hi, v4si)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpneq_n_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpneq_n_f, v8hf, v4sf)
 VAR2 (BINOP_PRED_NONE_NONE, vcmpneq_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpltq_n_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpltq_n_f, v8hf, v4sf)
 VAR2 (BINOP_PRED_NONE_NONE, vcmpltq_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpleq_n_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpleq_n_f, v8hf, v4sf)
 VAR2 (BINOP_PRED_NONE_NONE, vcmpleq_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpgtq_n_f, v8hf, v4sf)
 VAR2 (BINOP_PRED_NONE_NONE, vcmpgtq_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpgeq_n_f, v8hf, v4sf)
 VAR2 (BINOP_PRED_NONE_NONE, vcmpgeq_f, v8hf, v4sf)
-VAR2 (BINOP_UNONE_NONE_NONE, vcmpeqq_n_f, v8hf, v4sf)
+VAR2 (BINOP_PRED_NONE_NONE, vcmpeqq_n_f, v8hf, v4sf)
 VAR2 (BINOP_PRED_NONE_NONE, vcmpeqq_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vsubq_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vqmovntq_s, v8hi, v4si)
@@ -285,7 +285,7 @@ VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlaldavhaq_s, v4si)
 VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrmlaldavhaq_u, v4si)
 VAR2 (TERNOP_NONE_NONE_UNONE_UNONE, vcvtq_m_to_f_u, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtq_m_to_f_s, v8hf, v4sf)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpeqq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpeqq_m_f, v8hf, v4sf)
 VAR3 (TERNOP_UNONE_NONE_UNONE_IMM, vshlcq_carry_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vshlcq_carry_u, v16qi, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_NONE_IMM, vqrshrunbq_n_s, v8hi, v4si)
@@ -306,14 +306,14 @@ VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmladavaq_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vminvq_p_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmaxvq_p_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vdupq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vcmpneq_m_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vcmpneq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vcmphiq_m_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vcmphiq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vcmpeqq_m_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vcmpeqq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vcmpcsq_m_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vcmpcsq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpneq_m_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpneq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmphiq_m_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmphiq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpeqq_m_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpeqq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpcsq_m_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpcsq_m_n_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vclzq_m_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vaddvaq_p_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vsriq_n_u, v16qi, v8hi, v4si)
@@ -326,18 +326,18 @@ VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vminavq_p_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vminaq_m_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vmaxavq_p_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vmaxaq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpneq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpneq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpltq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpltq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpleq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpleq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpgtq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpgtq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpgeq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpgeq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpeqq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpeqq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpneq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpneq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpltq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpltq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpleq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpleq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpgtq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpgtq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpgeq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpgeq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpeqq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpeqq_m_n_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vshlq_m_r_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vrshlq_m_n_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vrev64q_m_s, v16qi, v8hi, v4si)
@@ -405,17 +405,17 @@ VAR2 (TERNOP_UNONE_UNONE_NONE_IMM, vqshrunbq_n_s, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_NONE_IMM, vqrshruntq_n_s, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_IMM_UNONE, vorrq_m_n_u, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_IMM_UNONE, vmvnq_m_n_u, v8hi, v4si)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpneq_m_n_f, v8hf, v4sf)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpneq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpltq_m_n_f, v8hf, v4sf)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpltq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpleq_m_n_f, v8hf, v4sf)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpleq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpgtq_m_n_f, v8hf, v4sf)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpgtq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpgeq_m_n_f, v8hf, v4sf)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpgeq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpeqq_m_n_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpneq_m_n_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpneq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpltq_m_n_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpltq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpleq_m_n_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpleq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpgtq_m_n_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpgtq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpgeq_m_n_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpgeq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpeqq_m_n_f, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrndxq_m_f, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrndq_m_f, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrndpq_m_f, v8hf, v4sf)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 35564e870bc..c5cdc06c548 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -853,8 +853,8 @@ (define_insn "@mve_vcmp<mve_cmp_op>q_<mode>"
 ;;
 (define_insn "mve_vcmp<mve_cmp_op>q_n_<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(MVE_COMPARISONS:HI (match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(MVE_COMPARISONS:<MVE_VPRED> (match_operand:MVE_2 1 "s_register_operand" "w")
 		    (match_operand:<V_elem> 2 "s_register_operand" "r")))
   ]
   "TARGET_HAVE_MVE"
@@ -1943,8 +1943,8 @@ (define_insn "@mve_vcmp<mve_cmp_op>q_f<mode>"
 ;;
 (define_insn "@mve_vcmp<mve_cmp_op>q_n_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(MVE_FP_COMPARISONS:<MVE_VPRED> (match_operand:MVE_0 1 "s_register_operand" "w")
 			       (match_operand:<V_elem> 2 "s_register_operand" "r")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -2593,10 +2593,10 @@ (define_insn "mve_vbicq_m_n_<supf><mode>"
 ;;
 (define_insn "mve_vcmpeqq_m_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		    (match_operand:MVE_0 2 "s_register_operand" "w")
-		    (match_operand:HI 3 "vpr_register_operand" "Up")]
+		    (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPEQQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -2809,10 +2809,10 @@ (define_insn "mve_vclzq_m_<supf><mode>"
 ;;
 (define_insn "mve_vcmpcsq_m_n_u<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPCSQ_M_N_U))
   ]
   "TARGET_HAVE_MVE"
@@ -2825,10 +2825,10 @@ (define_insn "mve_vcmpcsq_m_n_u<mode>"
 ;;
 (define_insn "mve_vcmpcsq_m_u<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPCSQ_M_U))
   ]
   "TARGET_HAVE_MVE"
@@ -2841,10 +2841,10 @@ (define_insn "mve_vcmpcsq_m_u<mode>"
 ;;
 (define_insn "mve_vcmpeqq_m_n_<supf><mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPEQQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -2857,10 +2857,10 @@ (define_insn "mve_vcmpeqq_m_n_<supf><mode>"
 ;;
 (define_insn "mve_vcmpeqq_m_<supf><mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPEQQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -2873,10 +2873,10 @@ (define_insn "mve_vcmpeqq_m_<supf><mode>"
 ;;
 (define_insn "mve_vcmpgeq_m_n_s<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPGEQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -2889,10 +2889,10 @@ (define_insn "mve_vcmpgeq_m_n_s<mode>"
 ;;
 (define_insn "mve_vcmpgeq_m_s<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPGEQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -2905,10 +2905,10 @@ (define_insn "mve_vcmpgeq_m_s<mode>"
 ;;
 (define_insn "mve_vcmpgtq_m_n_s<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPGTQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -2921,10 +2921,10 @@ (define_insn "mve_vcmpgtq_m_n_s<mode>"
 ;;
 (define_insn "mve_vcmpgtq_m_s<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPGTQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -2937,10 +2937,10 @@ (define_insn "mve_vcmpgtq_m_s<mode>"
 ;;
 (define_insn "mve_vcmphiq_m_n_u<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPHIQ_M_N_U))
   ]
   "TARGET_HAVE_MVE"
@@ -2953,10 +2953,10 @@ (define_insn "mve_vcmphiq_m_n_u<mode>"
 ;;
 (define_insn "mve_vcmphiq_m_u<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPHIQ_M_U))
   ]
   "TARGET_HAVE_MVE"
@@ -2969,10 +2969,10 @@ (define_insn "mve_vcmphiq_m_u<mode>"
 ;;
 (define_insn "mve_vcmpleq_m_n_s<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPLEQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -2985,10 +2985,10 @@ (define_insn "mve_vcmpleq_m_n_s<mode>"
 ;;
 (define_insn "mve_vcmpleq_m_s<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPLEQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3001,10 +3001,10 @@ (define_insn "mve_vcmpleq_m_s<mode>"
 ;;
 (define_insn "mve_vcmpltq_m_n_s<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPLTQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3017,10 +3017,10 @@ (define_insn "mve_vcmpltq_m_n_s<mode>"
 ;;
 (define_insn "mve_vcmpltq_m_s<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPLTQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3033,10 +3033,10 @@ (define_insn "mve_vcmpltq_m_s<mode>"
 ;;
 (define_insn "mve_vcmpneq_m_n_<supf><mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPNEQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -3049,10 +3049,10 @@ (define_insn "mve_vcmpneq_m_n_<supf><mode>"
 ;;
 (define_insn "mve_vcmpneq_m_<supf><mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPNEQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -3770,10 +3770,10 @@ (define_insn "mve_vcmlaq<mve_rot><mode>"
 ;;
 (define_insn "mve_vcmpeqq_m_n_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPEQQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3786,10 +3786,10 @@ (define_insn "mve_vcmpeqq_m_n_f<mode>"
 ;;
 (define_insn "mve_vcmpgeq_m_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPGEQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3802,10 +3802,10 @@ (define_insn "mve_vcmpgeq_m_f<mode>"
 ;;
 (define_insn "mve_vcmpgeq_m_n_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPGEQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3818,10 +3818,10 @@ (define_insn "mve_vcmpgeq_m_n_f<mode>"
 ;;
 (define_insn "mve_vcmpgtq_m_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPGTQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3834,10 +3834,10 @@ (define_insn "mve_vcmpgtq_m_f<mode>"
 ;;
 (define_insn "mve_vcmpgtq_m_n_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPGTQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3850,10 +3850,10 @@ (define_insn "mve_vcmpgtq_m_n_f<mode>"
 ;;
 (define_insn "mve_vcmpleq_m_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPLEQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3866,10 +3866,10 @@ (define_insn "mve_vcmpleq_m_f<mode>"
 ;;
 (define_insn "mve_vcmpleq_m_n_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPLEQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3882,10 +3882,10 @@ (define_insn "mve_vcmpleq_m_n_f<mode>"
 ;;
 (define_insn "mve_vcmpltq_m_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPLTQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3898,10 +3898,10 @@ (define_insn "mve_vcmpltq_m_f<mode>"
 ;;
 (define_insn "mve_vcmpltq_m_n_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPLTQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3914,10 +3914,10 @@ (define_insn "mve_vcmpltq_m_n_f<mode>"
 ;;
 (define_insn "mve_vcmpneq_m_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPNEQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3930,10 +3930,10 @@ (define_insn "mve_vcmpneq_m_f<mode>"
 ;;
 (define_insn "mve_vcmpneq_m_n_f<mode>"
   [
-   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
-	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
+	(unspec:<MVE_VPRED> [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCMPNEQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-- 
2.25.1



* [PATCH v3 11/15] arm: Convert more MVE builtins to predicate qualifiers
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (9 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 10/15] arm: Convert remaining MVE vcmp builtins to predicate qualifiers Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-13 14:56 ` [PATCH v3 12/15] arm: Convert more load/store " Christophe Lyon
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

This patch covers all builtins that have an HI operand and use the
<mode> iterator, so that we can replace HI with <MVE_VPRED>.
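
As a rough illustration only (this is not part of the patch), the effect
of replacing HI with <MVE_VPRED> can be sketched in plain C as a lookup
from the vector mode to the predicate mode of its VPR operand. The table
contents are an assumption based on the rest of this series (one boolean
lane per vector element, with V2DI keeping HI), and all sketch_* names
are hypothetical; they do not exist in GCC.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the MVE_VPRED mode attribute: maps a vector
   mode name to the mode used for its predicate (VPR) operand.  The
   entries are an assumption, not copied from iterators.md.  */
struct sketch_vpred_entry
{
  const char *vec_mode;
  const char *pred_mode;
};

static const struct sketch_vpred_entry sketch_vpred_table[] = {
  { "V16QI", "V16BI" },
  { "V8HI",  "V8BI"  },
  { "V8HF",  "V8BI"  },
  { "V4SI",  "V4BI"  },
  { "V4SF",  "V4BI"  },
  { "V2DI",  "HI"    },	/* Assumed to keep HI, see patch 12.  */
};

/* Return the predicate mode name for VEC_MODE, or NULL if VEC_MODE is
   not in the sketch table.  */
static const char *
sketch_vpred_mode (const char *vec_mode)
{
  size_t n = sizeof sketch_vpred_table / sizeof sketch_vpred_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (sketch_vpred_table[i].vec_mode, vec_mode) == 0)
      return sketch_vpred_table[i].pred_mode;
  return NULL;
}
```

Under these assumptions, sketch_vpred_mode ("V4SI") gives "V4BI", which
is the mode a pattern iterating over V4SI would now use for its
vpr_register_operand instead of the former fixed HI.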

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.c (TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ...
	(TERNOP_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this.
	(TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
	(TERNOP_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
	(TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
	(TERNOP_NONE_NONE_IMM_PRED_QUALIFIERS): ... this.
	(TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Change to ...
	(TERNOP_NONE_NONE_UNONE_PRED_QUALIFIERS): ... this.
	(QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
	(QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS): ... this.
	(QUADOP_NONE_NONE_NONE_NONE_PRED_QUALIFIERS): New.
	(QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
	(QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS): ... this.
	(QUADOP_UNONE_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
	(QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
	(QUADOP_UNONE_UNONE_NONE_IMM_PRED_QUALIFIERS): ... this.
	(QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
	(QUADOP_NONE_NONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
	(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
	(QUADOP_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
	(QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ...
	(QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this.
	(STRS_P_QUALIFIERS): Use predicate qualifier.
	(STRU_P_QUALIFIERS): Likewise.
	(STRSU_P_QUALIFIERS): Likewise.
	(STRSS_P_QUALIFIERS): Likewise.
	(LDRGS_Z_QUALIFIERS): Likewise.
	(LDRGU_Z_QUALIFIERS): Likewise.
	(LDRS_Z_QUALIFIERS): Likewise.
	(LDRU_Z_QUALIFIERS): Likewise.
	(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
	(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
	(BINOP_NONE_NONE_PRED_QUALIFIERS): New.
	(BINOP_UNONE_UNONE_PRED_QUALIFIERS): New.
	* config/arm/arm_mve_builtins.def: Use new predicated qualifiers.
	* config/arm/mve.md: Use MVE_VPRED instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 9cc192ddb9a..0b063b5f037 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -484,18 +484,18 @@ arm_ternop_unone_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_ternop_unone_unone_none_imm_qualifiers)
 
 static enum arm_type_qualifiers
-arm_ternop_unone_unone_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_ternop_unone_unone_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none,
-      qualifier_unsigned };
-#define TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_unone_unone_none_unone_qualifiers)
+      qualifier_predicate };
+#define TERNOP_UNONE_UNONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_unone_unone_none_pred_qualifiers)
 
 static enum arm_type_qualifiers
-arm_ternop_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_ternop_unone_unone_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-    qualifier_unsigned };
-#define TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS \
-  (arm_ternop_unone_unone_imm_unone_qualifiers)
+    qualifier_predicate };
+#define TERNOP_UNONE_UNONE_IMM_PRED_QUALIFIERS \
+  (arm_ternop_unone_unone_imm_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ternop_pred_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -522,16 +522,16 @@ arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_ternop_none_none_none_pred_qualifiers)
 
 static enum arm_type_qualifiers
-arm_ternop_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_immediate, qualifier_unsigned };
-#define TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS \
-  (arm_ternop_none_none_imm_unone_qualifiers)
+arm_ternop_none_none_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_immediate, qualifier_predicate };
+#define TERNOP_NONE_NONE_IMM_PRED_QUALIFIERS \
+  (arm_ternop_none_none_imm_pred_qualifiers)
 
 static enum arm_type_qualifiers
-arm_ternop_none_none_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_unsigned };
-#define TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS \
-  (arm_ternop_none_none_unone_unone_qualifiers)
+arm_ternop_none_none_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_predicate };
+#define TERNOP_NONE_NONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_none_none_unone_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ternop_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -561,11 +561,11 @@ arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_ternop_none_none_none_none_qualifiers)
 
 static enum arm_type_qualifiers
-arm_quadop_unone_unone_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_quadop_unone_unone_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none, qualifier_none,
-    qualifier_unsigned };
-#define QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_quadop_unone_unone_none_none_unone_qualifiers)
+    qualifier_predicate };
+#define QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS \
+  (arm_quadop_unone_unone_none_none_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_quadop_none_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -575,11 +575,18 @@ arm_quadop_none_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_quadop_none_none_none_none_unone_qualifiers)
 
 static enum arm_type_qualifiers
-arm_quadop_none_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_quadop_none_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
+    qualifier_predicate };
+#define QUADOP_NONE_NONE_NONE_NONE_PRED_QUALIFIERS \
+  (arm_quadop_none_none_none_none_pred_qualifiers)
+
+static enum arm_type_qualifiers
+arm_quadop_none_none_none_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate,
-    qualifier_unsigned };
-#define QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS \
-  (arm_quadop_none_none_none_imm_unone_qualifiers)
+    qualifier_predicate };
+#define QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS \
+  (arm_quadop_none_none_none_imm_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_quadop_unone_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -589,32 +596,39 @@ arm_quadop_unone_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_quadop_unone_unone_unone_unone_unone_qualifiers)
 
 static enum arm_type_qualifiers
-arm_quadop_unone_unone_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_quadop_unone_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+    qualifier_unsigned, qualifier_predicate };
+#define QUADOP_UNONE_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_quadop_unone_unone_unone_unone_pred_qualifiers)
+
+static enum arm_type_qualifiers
+arm_quadop_unone_unone_none_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none,
-    qualifier_immediate, qualifier_unsigned };
-#define QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS \
-  (arm_quadop_unone_unone_none_imm_unone_qualifiers)
+    qualifier_immediate, qualifier_predicate };
+#define QUADOP_UNONE_UNONE_NONE_IMM_PRED_QUALIFIERS \
+  (arm_quadop_unone_unone_none_imm_pred_qualifiers)
 
 static enum arm_type_qualifiers
-arm_quadop_none_none_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_quadop_none_none_unone_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_immediate,
-    qualifier_unsigned };
-#define QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS \
-  (arm_quadop_none_none_unone_imm_unone_qualifiers)
+    qualifier_predicate };
+#define QUADOP_NONE_NONE_UNONE_IMM_PRED_QUALIFIERS \
+  (arm_quadop_none_none_unone_imm_pred_qualifiers)
 
 static enum arm_type_qualifiers
-arm_quadop_unone_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_quadop_unone_unone_unone_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
-    qualifier_immediate, qualifier_unsigned };
-#define QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS \
-  (arm_quadop_unone_unone_unone_imm_unone_qualifiers)
+    qualifier_immediate, qualifier_predicate };
+#define QUADOP_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS \
+  (arm_quadop_unone_unone_unone_imm_pred_qualifiers)
 
 static enum arm_type_qualifiers
-arm_quadop_unone_unone_unone_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_quadop_unone_unone_unone_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
-    qualifier_none, qualifier_unsigned };
-#define QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS \
-  (arm_quadop_unone_unone_unone_none_unone_qualifiers)
+    qualifier_none, qualifier_predicate };
+#define QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS \
+  (arm_quadop_unone_unone_unone_none_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -651,25 +665,25 @@ arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 
 static enum arm_type_qualifiers
 arm_strs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_void, qualifier_pointer, qualifier_none, qualifier_unsigned};
+  = { qualifier_void, qualifier_pointer, qualifier_none, qualifier_predicate};
 #define STRS_P_QUALIFIERS (arm_strs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_stru_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer, qualifier_unsigned,
-      qualifier_unsigned};
+      qualifier_predicate};
 #define STRU_P_QUALIFIERS (arm_stru_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer, qualifier_unsigned,
-      qualifier_unsigned, qualifier_unsigned};
+      qualifier_unsigned, qualifier_predicate};
 #define STRSU_P_QUALIFIERS (arm_strsu_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer, qualifier_unsigned,
-      qualifier_none, qualifier_unsigned};
+      qualifier_none, qualifier_predicate};
 #define STRSS_P_QUALIFIERS (arm_strss_p_qualifiers)
 
 static enum arm_type_qualifiers
@@ -729,31 +743,31 @@ arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_pointer, qualifier_unsigned,
-      qualifier_unsigned};
+      qualifier_predicate};
 #define LDRGS_Z_QUALIFIERS (arm_ldrgs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned,
-      qualifier_unsigned};
+      qualifier_predicate};
 #define LDRGU_Z_QUALIFIERS (arm_ldrgu_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_pointer, qualifier_unsigned};
+  = { qualifier_none, qualifier_pointer, qualifier_predicate};
 #define LDRS_Z_QUALIFIERS (arm_ldrs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldru_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned};
+  = { qualifier_unsigned, qualifier_pointer, qualifier_predicate};
 #define LDRU_Z_QUALIFIERS (arm_ldru_z_qualifiers)
 
 static enum arm_type_qualifiers
-arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_quinop_unone_unone_unone_unone_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
-      qualifier_unsigned, qualifier_immediate, qualifier_unsigned };
-#define QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS \
-  (arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers)
+      qualifier_unsigned, qualifier_immediate, qualifier_predicate };
+#define QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS \
+  (arm_quinop_unone_unone_unone_unone_imm_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -830,6 +844,18 @@ arm_sqshl_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const};
 #define SQSHL_QUALIFIERS (arm_sqshl_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_predicate };
+#define BINOP_NONE_NONE_PRED_QUALIFIERS \
+  (arm_binop_none_none_pred_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_predicate };
+#define BINOP_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_binop_unone_unone_pred_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index b7ebbcab87f..7db6d47867e 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -123,7 +123,7 @@ VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vaddvq_p_u, v16qi, v8hi, v4si)
+VAR3 (BINOP_UNONE_UNONE_PRED, vaddvq_p_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vaddvaq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vaddq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vabdq_u, v16qi, v8hi, v4si)
@@ -154,7 +154,7 @@ VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_IMM, vqshluq_n_s, v16qi, v8hi, v4si)
-VAR3 (BINOP_NONE_NONE_UNONE, vaddvq_p_s, v16qi, v8hi, v4si)
+VAR3 (BINOP_NONE_NONE_PRED, vaddvq_p_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vsubq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vsubq_n_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vshlq_r_s, v16qi, v8hi, v4si)
@@ -277,35 +277,35 @@ VAR1 (BINOP_NONE_NONE_NONE, vrmlaldavhq_s, v4si)
 VAR1 (BINOP_NONE_NONE_NONE, vcvttq_f16_f32, v8hf)
 VAR1 (BINOP_NONE_NONE_NONE, vcvtbq_f16_f32, v8hf)
 VAR1 (BINOP_NONE_NONE_NONE, vaddlvaq_s, v4si)
-VAR2 (TERNOP_NONE_NONE_IMM_UNONE, vbicq_m_n_s, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_IMM_UNONE, vbicq_m_n_u, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_IMM_PRED, vbicq_m_n_s, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_IMM_PRED, vbicq_m_n_u, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_IMM, vqrshrnbq_n_s, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_UNONE_IMM, vqrshrnbq_n_u, v8hi, v4si)
 VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlaldavhaq_s, v4si)
 VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrmlaldavhaq_u, v4si)
-VAR2 (TERNOP_NONE_NONE_UNONE_UNONE, vcvtq_m_to_f_u, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtq_m_to_f_s, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_UNONE_PRED, vcvtq_m_to_f_u, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vcvtq_m_to_f_s, v8hf, v4sf)
 VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpeqq_m_f, v8hf, v4sf)
 VAR3 (TERNOP_UNONE_NONE_UNONE_IMM, vshlcq_carry_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vshlcq_carry_u, v16qi, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_NONE_IMM, vqrshrunbq_n_s, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_NONE_NONE, vabavq_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vabavq_u, v16qi, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE, vcvtaq_m_u, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtaq_m_s, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_NONE_PRED, vcvtaq_m_u, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vcvtaq_m_s, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vshlcq_vec_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_UNONE_IMM, vshlcq_vec_s, v16qi, v8hi, v4si)
 VAR4 (TERNOP_UNONE_UNONE_UNONE_PRED, vpselq_u, v16qi, v8hi, v4si, v2di)
 VAR4 (TERNOP_NONE_NONE_NONE_PRED, vpselq_s, v16qi, v8hi, v4si, v2di)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrev64q_m_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmvnq_m_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_UNONE_PRED, vrev64q_m_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_UNONE_PRED, vmvnq_m_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmlasq_n_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmlaq_n_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmladavq_p_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_UNONE_PRED, vmladavq_p_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmladavaq_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vminvq_p_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmaxvq_p_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vdupq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_UNONE_PRED, vminvq_p_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_UNONE_PRED, vmaxvq_p_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_UNONE_PRED, vdupq_m_n_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpneq_m_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpneq_m_n_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmphiq_m_u, v16qi, v8hi, v4si)
@@ -314,18 +314,18 @@ VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpeqq_m_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpeqq_m_n_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpcsq_m_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_UNONE_UNONE_PRED, vcmpcsq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vclzq_m_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vaddvaq_p_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_UNONE_PRED, vclzq_m_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_UNONE_PRED, vaddvaq_p_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vsriq_n_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vsliq_n_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vshlq_m_r_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vrshlq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vqshlq_m_r_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vqrshlq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vminavq_p_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vminaq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vmaxavq_p_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_NONE_UNONE, vmaxaq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_NONE_PRED, vshlq_m_r_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_NONE_PRED, vrshlq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_NONE_PRED, vqshlq_m_r_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_NONE_PRED, vqrshlq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_NONE_PRED, vminavq_p_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_NONE_PRED, vminaq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_NONE_PRED, vmaxavq_p_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_NONE_PRED, vmaxaq_m_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpneq_m_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpneq_m_n_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpltq_m_s, v16qi, v8hi, v4si)
@@ -338,26 +338,26 @@ VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpgeq_m_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpgeq_m_n_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpeqq_m_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_PRED_NONE_NONE_PRED, vcmpeqq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vshlq_m_r_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vrshlq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vrev64q_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vqshlq_m_r_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vqrshlq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vqnegq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vqabsq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vnegq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vmvnq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vmlsdavxq_p_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vmlsdavq_p_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vmladavxq_p_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vmladavq_p_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vminvq_p_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vmaxvq_p_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vdupq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vclzq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vclsq_m_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vaddvaq_p_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_NONE_UNONE, vabsq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vshlq_m_r_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vrshlq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vrev64q_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vqshlq_m_r_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vqrshlq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vqnegq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vqabsq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vnegq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vmvnq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vmlsdavxq_p_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vmlsdavq_p_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vmladavxq_p_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vmladavq_p_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vminvq_p_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vmaxvq_p_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vdupq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vclzq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vclsq_m_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vaddvaq_p_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_NONE_PRED, vabsq_m_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_NONE, vqrdmlsdhxq_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_NONE, vqrdmlsdhq_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_NONE, vqrdmlashq_n_s, v16qi, v8hi, v4si)
@@ -378,14 +378,14 @@ VAR3 (TERNOP_NONE_NONE_NONE_NONE, vmladavaxq_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_NONE, vmladavaq_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_IMM, vsriq_n_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_IMM, vsliq_n_s, v16qi, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrev32q_m_u, v16qi, v8hi)
-VAR2 (TERNOP_UNONE_UNONE_UNONE_UNONE, vqmovntq_m_u, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_UNONE_UNONE, vqmovnbq_m_u, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmovntq_m_u, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmovnbq_m_u, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmovltq_m_u, v16qi, v8hi)
-VAR2 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmovlbq_m_u, v16qi, v8hi)
-VAR2 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmlaldavq_p_u, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_UNONE_PRED, vrev32q_m_u, v16qi, v8hi)
+VAR2 (TERNOP_UNONE_UNONE_UNONE_PRED, vqmovntq_m_u, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_UNONE_PRED, vqmovnbq_m_u, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_UNONE_PRED, vmovntq_m_u, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_UNONE_PRED, vmovnbq_m_u, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_UNONE_PRED, vmovltq_m_u, v16qi, v8hi)
+VAR2 (TERNOP_UNONE_UNONE_UNONE_PRED, vmovlbq_m_u, v16qi, v8hi)
+VAR2 (TERNOP_UNONE_UNONE_UNONE_PRED, vmlaldavq_p_u, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmlaldavaq_u, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_UNONE_IMM, vshrntq_n_u, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_UNONE_IMM, vshrnbq_n_u, v8hi, v4si)
@@ -394,17 +394,17 @@ VAR2 (TERNOP_UNONE_UNONE_UNONE_IMM, vrshrnbq_n_u, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_UNONE_IMM, vqshrntq_n_u, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_UNONE_IMM, vqshrnbq_n_u, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_UNONE_IMM, vqrshrntq_n_u, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE, vqmovuntq_m_s, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE, vqmovunbq_m_s, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE, vcvtq_m_from_f_u, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE, vcvtpq_m_u, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE, vcvtnq_m_u, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE, vcvtmq_m_u, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_NONE_PRED, vqmovuntq_m_s, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_NONE_PRED, vqmovunbq_m_s, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_NONE_PRED, vcvtq_m_from_f_u, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_NONE_PRED, vcvtpq_m_u, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_NONE_PRED, vcvtnq_m_u, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_NONE_PRED, vcvtmq_m_u, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_NONE_IMM, vqshruntq_n_s, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_NONE_IMM, vqshrunbq_n_s, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_NONE_IMM, vqrshruntq_n_s, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_IMM_UNONE, vorrq_m_n_u, v8hi, v4si)
-VAR2 (TERNOP_UNONE_UNONE_IMM_UNONE, vmvnq_m_n_u, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_IMM_PRED, vorrq_m_n_u, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_IMM_PRED, vmvnq_m_n_u, v8hi, v4si)
 VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpneq_m_n_f, v8hf, v4sf)
 VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpneq_m_f, v8hf, v4sf)
 VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpltq_m_n_f, v8hf, v4sf)
@@ -416,38 +416,38 @@ VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpgtq_m_f, v8hf, v4sf)
 VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpgeq_m_n_f, v8hf, v4sf)
 VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpgeq_m_f, v8hf, v4sf)
 VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpeqq_m_n_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrndxq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrndq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrndpq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrndnq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrndmq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrndaq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrev64q_m_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrev32q_m_s, v16qi, v8hi)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vqmovntq_m_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vqmovnbq_m_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vrndxq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vrndq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vrndpq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vrndnq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vrndmq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vrndaq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vrev64q_m_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vrev32q_m_s, v16qi, v8hi)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vqmovntq_m_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vqmovnbq_m_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_PRED, vpselq_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vnegq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovntq_m_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovnbq_m_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovltq_m_s, v16qi, v8hi)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovlbq_m_s, v16qi, v8hi)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmlsldavxq_p_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmlsldavq_p_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmlaldavxq_p_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmlaldavq_p_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vminnmvq_p_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vminnmavq_p_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vminnmaq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmaxnmvq_p_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmaxnmavq_p_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmaxnmaq_m_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vdupq_m_n_f, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtq_m_from_f_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtpq_m_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtnq_m_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtmq_m_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vabsq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vnegq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vmovntq_m_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vmovnbq_m_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vmovltq_m_s, v16qi, v8hi)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vmovlbq_m_s, v16qi, v8hi)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vmlsldavxq_p_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vmlsldavq_p_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vmlaldavxq_p_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vmlaldavq_p_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vminnmvq_p_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vminnmavq_p_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vminnmaq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vmaxnmvq_p_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vmaxnmavq_p_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vmaxnmaq_m_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vdupq_m_n_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vcvtq_m_from_f_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vcvtpq_m_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vcvtnq_m_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vcvtmq_m_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_PRED, vabsq_m_f, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_NONE, vmlsldavaxq_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_NONE, vmlsldavaq_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_NONE, vmlaldavaxq_s, v8hi, v4si)
@@ -463,8 +463,8 @@ VAR2 (TERNOP_NONE_NONE_NONE_IMM, vrshrnbq_n_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_IMM, vqshrntq_n_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_IMM, vqshrnbq_n_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_IMM, vqrshrntq_n_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_IMM_UNONE, vorrq_m_n_s, v8hi, v4si)
-VAR2 (TERNOP_NONE_NONE_IMM_UNONE, vmvnq_m_n_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_IMM_PRED, vorrq_m_n_s, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_IMM_PRED, vmvnq_m_n_s, v8hi, v4si)
 VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrmlaldavhq_p_u, v4si)
 VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrev16q_m_u, v16qi)
 VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vaddlvaq_p_u, v4si)
@@ -482,189 +482,189 @@ VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vaddlvaq_p_s, v4si)
 VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlsldavhaxq_s, v4si)
 VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlsldavhaq_s, v4si)
 VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlaldavhaxq_s, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vsriq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vsriq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsubq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsubq_m_u, v16qi, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_UNONE_IMM_UNONE, vcvtq_m_n_to_f_u, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vcvtq_m_n_to_f_s, v8hf, v4sf)
-VAR3 (QUADOP_UNONE_UNONE_NONE_IMM_UNONE, vqshluq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_NONE_NONE_UNONE, vabavq_p_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vabavq_p_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_NONE_UNONE, vshlq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vshlq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsubq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vrmulhq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vrhaddq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vqsubq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vqsubq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vqaddq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vqaddq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vorrq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vornq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmulq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmulq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmulltq_int_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmullbq_int_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmulhq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmlasq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmlaq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmladavaq_p_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vminq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmaxq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vhsubq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vhsubq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vhaddq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vhaddq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, veorq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vcaddq_rot90_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vcaddq_rot270_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vbicq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vandq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vaddq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vaddq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vabdq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_NONE_UNONE, vrshlq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_NONE_UNONE, vqshlq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_NONE_UNONE, vqrshlq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_NONE_UNONE, vbrsrq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vsliq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshrq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshlq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vrshrq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vqshlq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsubq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrshlq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmulhq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrhaddq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqsubq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqsubq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqshlq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqrshlq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqrdmulhq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqrdmulhq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqrdmlsdhxq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqrdmlsdhq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqrdmlashq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqrdmlahq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqrdmladhxq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqrdmladhq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmulhq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmulhq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmlsdhxq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmlsdhq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmlahq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmlashq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmladhxq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmladhq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqaddq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqaddq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vorrq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vornq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmulq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmulq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmulltq_int_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmullbq_int_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmulhq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmlsdavaxq_p_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmlsdavaq_p_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmlasq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmlaq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmladavaxq_p_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmladavaq_p_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vminq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmaxq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vhsubq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vhsubq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vhcaddq_rot90_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vhcaddq_rot270_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vhaddq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vhaddq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, veorq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcaddq_rot90_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcaddq_rot270_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vbrsrq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vbicq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vandq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vaddq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vaddq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vabdq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vsliq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vshrq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vshlq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vrshrq_m_n_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vqshlq_m_n_s, v16qi, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmulltq_poly_m_p, v16qi, v8hi)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmullbq_poly_m_p, v16qi, v8hi)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vmlaldavaq_p_u, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshrntq_m_n_u, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshrnbq_m_n_u, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshlltq_m_n_u, v16qi, v8hi)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshllbq_m_n_u, v16qi, v8hi)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vrshrntq_m_n_u, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vrshrnbq_m_n_u, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vqshrntq_m_n_u, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vqshrnbq_m_n_u, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vqrshrntq_m_n_u, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vqrshrnbq_m_n_u, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_UNONE, vqshruntq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_UNONE, vqshrunbq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_UNONE, vqrshruntq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_UNONE, vqrshrunbq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmulltq_m_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmulltq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmullbq_m_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vqdmullbq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmlsldavaxq_p_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmlsldavaq_p_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmlaldavaxq_p_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmlaldavaq_p_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vshrntq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vshrnbq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vshlltq_m_n_s, v16qi, v8hi)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vshllbq_m_n_s, v16qi, v8hi)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vrshrntq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vrshrnbq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vqshrntq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vqshrnbq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vqrshrntq_m_n_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vqrshrnbq_m_n_s, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_IMM_PRED, vsriq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vsriq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vsubq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vsubq_m_u, v16qi, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_UNONE_IMM_PRED, vcvtq_m_n_to_f_u, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vcvtq_m_n_to_f_s, v8hf, v4sf)
+VAR3 (QUADOP_UNONE_UNONE_NONE_IMM_PRED, vqshluq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_NONE_NONE_PRED, vabavq_p_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vabavq_p_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_NONE_PRED, vshlq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vshlq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vsubq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vrmulhq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vrhaddq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vqsubq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vqsubq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vqaddq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vqaddq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vorrq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vornq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmulq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmulq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmulltq_int_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmullbq_int_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmulhq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmlasq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmlaq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmladavaq_p_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vminq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmaxq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhsubq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhsubq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhaddq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhaddq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, veorq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot90_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot270_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vbicq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vandq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vaddq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vaddq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vabdq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_NONE_PRED, vrshlq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_NONE_PRED, vqshlq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_NONE_PRED, vqrshlq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_NONE_PRED, vbrsrq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vsliq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshrq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vrshrq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vqshlq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vsubq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vrshlq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vrmulhq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vrhaddq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqsubq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqsubq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqshlq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqrshlq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqrdmulhq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqrdmulhq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqrdmlsdhxq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqrdmlsdhq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqrdmlashq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqrdmlahq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqrdmladhxq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqrdmladhq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmulhq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmulhq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmlsdhxq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmlsdhq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmlahq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmlashq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmladhxq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmladhq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqaddq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqaddq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vorrq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vornq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmulq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmulq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmulltq_int_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmullbq_int_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmulhq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmlsdavaxq_p_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmlsdavaq_p_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmlasq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmlaq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmladavaxq_p_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmladavaq_p_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vminq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmaxq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhsubq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhsubq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhcaddq_rot90_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhcaddq_rot270_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, veorq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot90_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot270_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbrsrq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbicq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vandq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vaddq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vaddq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vabdq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_IMM_PRED, vsliq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_IMM_PRED, vshrq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_IMM_PRED, vshlq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_IMM_PRED, vrshrq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqshlq_m_n_s, v16qi, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmulltq_poly_m_p, v16qi, v8hi)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmullbq_poly_m_p, v16qi, v8hi)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vmlaldavaq_p_u, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshrntq_m_n_u, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshrnbq_m_n_u, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlltq_m_n_u, v16qi, v8hi)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshllbq_m_n_u, v16qi, v8hi)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vrshrntq_m_n_u, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vrshrnbq_m_n_u, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vqshrntq_m_n_u, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vqshrnbq_m_n_u, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vqrshrntq_m_n_u, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vqrshrnbq_m_n_u, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_PRED, vqshruntq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_PRED, vqshrunbq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_PRED, vqrshruntq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_PRED, vqrshrunbq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmulltq_m_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmulltq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmullbq_m_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vqdmullbq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmlsldavaxq_p_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmlsldavaq_p_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmlaldavaxq_p_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmlaldavaq_p_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vshrntq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vshrnbq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vshlltq_m_n_s, v16qi, v8hi)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vshllbq_m_n_s, v16qi, v8hi)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vrshrntq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vrshrnbq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqshrntq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqshrnbq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqrshrntq_m_n_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqrshrnbq_m_n_s, v8hi, v4si)
 VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vrmlaldavhaq_p_u, v4si)
 VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlsldavhaxq_p_s, v4si)
 VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlsldavhaq_p_s, v4si)
 VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlaldavhaxq_p_s, v4si)
 VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlaldavhaq_p_s, v4si)
-VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_UNONE, vcvtq_m_n_from_f_u, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vcvtq_m_n_from_f_s, v8hi, v4si)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vbrsrq_m_n_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsubq_m_n_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsubq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vorrq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vornq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmulq_m_n_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmulq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vminnmq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vmaxnmq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vfmsq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vfmasq_m_n_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vfmaq_m_n_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vfmaq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, veorq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcmulq_rot90_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcmulq_rot270_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcmulq_rot180_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcmulq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcmlaq_rot90_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcmlaq_rot270_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcmlaq_rot180_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcmlaq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcaddq_rot90_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vcaddq_rot270_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vbicq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vandq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vaddq_m_n_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vaddq_m_f, v8hf, v4sf)
-VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vabdq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_PRED, vcvtq_m_n_from_f_u, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vcvtq_m_n_from_f_s, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbrsrq_m_n_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vsubq_m_n_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vsubq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vorrq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vornq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmulq_m_n_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmulq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vminnmq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vmaxnmq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vfmsq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vfmasq_m_n_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vfmaq_m_n_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vfmaq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, veorq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcmulq_rot90_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcmulq_rot270_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcmulq_rot180_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcmulq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcmlaq_rot90_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcmlaq_rot270_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcmlaq_rot180_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcmlaq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot90_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot270_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbicq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vandq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vaddq_m_n_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vaddq_m_f, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vabdq_m_f, v8hf, v4sf)
 VAR3 (STRS, vstrbq_s, v16qi, v8hi, v4si)
 VAR3 (STRU, vstrbq_u, v16qi, v8hi, v4si)
 VAR3 (STRSS, vstrbq_scatter_offset_s, v16qi, v8hi, v4si)
@@ -797,14 +797,14 @@ VAR1 (STRSU_P, vstrwq_scatter_offset_p_u, v4si)
 VAR1 (STRSU_P, vstrwq_scatter_shifted_offset_p_u, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, viwdupq_wb_u, v16qi, v4si, v8hi)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vdwdupq_wb_u, v16qi, v4si, v8hi)
-VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE, viwdupq_m_wb_u, v16qi, v8hi, v4si)
-VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE, vdwdupq_m_wb_u, v16qi, v8hi, v4si)
-VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE, viwdupq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE, vdwdupq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, viwdupq_m_wb_u, v16qi, v8hi, v4si)
+VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, vdwdupq_m_wb_u, v16qi, v8hi, v4si)
+VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, viwdupq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, vdwdupq_m_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vddupq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vidupq_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vddupq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vidupq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vddupq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vidupq_m_n_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vdwdupq_n_u, v16qi, v4si, v8hi)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, viwdupq_n_u, v16qi, v4si, v8hi)
 VAR1 (STRSBWBU, vstrwq_scatter_base_wb_u, v4si)
@@ -870,10 +870,10 @@ VAR1 (UQSHL, urshr_, si)
 VAR1 (UQSHL, urshrl_, di)
 VAR1 (UQSHL, uqshl_, si)
 VAR1 (UQSHL, uqshll_, di)
-VAR3 (QUADOP_NONE_NONE_UNONE_IMM_UNONE, vshlcq_m_vec_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_UNONE_IMM_UNONE, vshlcq_m_carry_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshlcq_m_vec_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshlcq_m_carry_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_UNONE_IMM_PRED, vshlcq_m_vec_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_UNONE_IMM_PRED, vshlcq_m_carry_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_vec_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_carry_u, v16qi, v8hi, v4si)
 
 /* optabs without any suffixes.  */
 VAR5 (BINOP_NONE_NONE_NONE, vcaddq_rot90, v16qi, v8hi, v4si, v8hf, v4sf)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index c5cdc06c548..a8087815c22 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -130,7 +130,7 @@ (define_insn "mve_vrndq_m_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VRNDQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -918,7 +918,7 @@ (define_insn "mve_vaddvq_p_<supf><mode>"
   [
    (set (match_operand:SI 0 "s_register_operand" "=Te")
 	(unspec:SI [(match_operand:MVE_2 1 "s_register_operand" "w")
-		    (match_operand:HI 2 "vpr_register_operand" "Up")]
+		    (match_operand:<MVE_VPRED> 2 "vpr_register_operand" "Up")]
 	 VADDVQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -2581,7 +2581,7 @@ (define_insn "mve_vbicq_m_n_<supf><mode>"
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
 	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
 		       (match_operand:SI 2 "immediate_operand" "i")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VBICQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -2611,7 +2611,7 @@ (define_insn "mve_vcvtaq_m_<supf><mode>"
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
 	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
 		       (match_operand:<MVE_CNVT> 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCVTAQ_M))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -2626,7 +2626,7 @@ (define_insn "mve_vcvtq_m_to_f_<supf><mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:<MVE_CNVT> 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCVTQ_M_TO_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -2748,7 +2748,7 @@ (define_insn "mve_vabsq_m_s<mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VABSQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -2764,7 +2764,7 @@ (define_insn "mve_vaddvaq_p_<supf><mode>"
    (set (match_operand:SI 0 "s_register_operand" "=Te")
 	(unspec:SI [(match_operand:SI 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VADDVAQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -2780,7 +2780,7 @@ (define_insn "mve_vclsq_m_s<mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCLSQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -2796,7 +2796,7 @@ (define_insn "mve_vclzq_m_<supf><mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCLZQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -3068,7 +3068,7 @@ (define_insn "mve_vdupq_m_n_<supf><mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VDUPQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -3084,7 +3084,7 @@ (define_insn "mve_vmaxaq_m_s<mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMAXAQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3100,7 +3100,7 @@ (define_insn "mve_vmaxavq_p_s<mode>"
    (set (match_operand:<V_elem> 0 "s_register_operand" "=r")
 	(unspec:<V_elem> [(match_operand:<V_elem> 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMAXAVQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3116,7 +3116,7 @@ (define_insn "mve_vmaxvq_p_<supf><mode>"
    (set (match_operand:<V_elem> 0 "s_register_operand" "=r")
 	(unspec:<V_elem> [(match_operand:<V_elem> 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMAXVQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -3132,7 +3132,7 @@ (define_insn "mve_vminaq_m_s<mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMINAQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3148,7 +3148,7 @@ (define_insn "mve_vminavq_p_s<mode>"
    (set (match_operand:<V_elem> 0 "s_register_operand" "=r")
 	(unspec:<V_elem> [(match_operand:<V_elem> 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMINAVQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3164,7 +3164,7 @@ (define_insn "mve_vminvq_p_<supf><mode>"
    (set (match_operand:<V_elem> 0 "s_register_operand" "=r")
 	(unspec:<V_elem> [(match_operand:<V_elem> 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMINVQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -3196,7 +3196,7 @@ (define_insn "mve_vmladavq_p_<supf><mode>"
    (set (match_operand:SI 0 "s_register_operand" "=Te")
 	(unspec:SI [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMLADAVQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -3212,7 +3212,7 @@ (define_insn "mve_vmladavxq_p_s<mode>"
    (set (match_operand:SI 0 "s_register_operand" "=Te")
 	(unspec:SI [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMLADAVXQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3260,7 +3260,7 @@ (define_insn "mve_vmlsdavq_p_s<mode>"
    (set (match_operand:SI 0 "s_register_operand" "=Te")
 	(unspec:SI [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMLSDAVQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3276,7 +3276,7 @@ (define_insn "mve_vmlsdavxq_p_s<mode>"
    (set (match_operand:SI 0 "s_register_operand" "=Te")
 	(unspec:SI [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMLSDAVXQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3292,7 +3292,7 @@ (define_insn "mve_vmvnq_m_<supf><mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMVNQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -3308,7 +3308,7 @@ (define_insn "mve_vnegq_m_s<mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VNEGQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3340,7 +3340,7 @@ (define_insn "mve_vqabsq_m_s<mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VQABSQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3388,7 +3388,7 @@ (define_insn "mve_vqnegq_m_s<mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VQNEGQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -3500,7 +3500,7 @@ (define_insn "mve_vqrshlq_m_n_<supf><mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:SI 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VQRSHLQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -3516,7 +3516,7 @@ (define_insn "mve_vqshlq_m_r_<supf><mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:SI 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VQSHLQ_M_R))
   ]
   "TARGET_HAVE_MVE"
@@ -3532,7 +3532,7 @@ (define_insn "mve_vrev64q_m_<supf><mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VREV64Q_M))
   ]
   "TARGET_HAVE_MVE"
@@ -3548,7 +3548,7 @@ (define_insn "mve_vrshlq_m_n_<supf><mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:SI 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VRSHLQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -3564,7 +3564,7 @@ (define_insn "mve_vshlq_m_r_<supf><mode>"
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:SI 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VSHLQ_M_R))
   ]
   "TARGET_HAVE_MVE"
@@ -3723,7 +3723,7 @@ (define_insn "mve_vabsq_m_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VABSQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4013,7 +4013,7 @@ (define_insn "mve_vdupq_m_n_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VDUPQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4092,7 +4092,7 @@ (define_insn "mve_vmaxnmaq_m_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMAXNMAQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4107,7 +4107,7 @@ (define_insn "mve_vmaxnmavq_p_f<mode>"
    (set (match_operand:<V_elem> 0 "s_register_operand" "=r")
 	(unspec:<V_elem> [(match_operand:<V_elem> 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMAXNMAVQ_P_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4123,7 +4123,7 @@ (define_insn "mve_vmaxnmvq_p_f<mode>"
    (set (match_operand:<V_elem> 0 "s_register_operand" "=r")
 	(unspec:<V_elem> [(match_operand:<V_elem> 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMAXNMVQ_P_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4138,7 +4138,7 @@ (define_insn "mve_vminnmaq_m_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMINNMAQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4154,7 +4154,7 @@ (define_insn "mve_vminnmavq_p_f<mode>"
    (set (match_operand:<V_elem> 0 "s_register_operand" "=r")
 	(unspec:<V_elem> [(match_operand:<V_elem> 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMINNMAVQ_P_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4169,7 +4169,7 @@ (define_insn "mve_vminnmvq_p_f<mode>"
    (set (match_operand:<V_elem> 0 "s_register_operand" "=r")
 	(unspec:<V_elem> [(match_operand:<V_elem> 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMINNMVQ_P_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4217,7 +4217,7 @@ (define_insn "mve_vmlaldavq_p_<supf><mode>"
    (set (match_operand:DI 0 "s_register_operand" "=r")
 	(unspec:DI [(match_operand:MVE_5 1 "s_register_operand" "w")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMLALDAVQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -4233,7 +4233,7 @@ (define_insn "mve_vmlaldavxq_p_s<mode>"
    (set (match_operand:DI 0 "s_register_operand" "=r")
 	(unspec:DI [(match_operand:MVE_5 1 "s_register_operand" "w")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMLALDAVXQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -4280,7 +4280,7 @@ (define_insn "mve_vmlsldavq_p_s<mode>"
    (set (match_operand:DI 0 "s_register_operand" "=r")
 	(unspec:DI [(match_operand:MVE_5 1 "s_register_operand" "w")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMLSLDAVQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -4296,7 +4296,7 @@ (define_insn "mve_vmlsldavxq_p_s<mode>"
    (set (match_operand:DI 0 "s_register_operand" "=r")
 	(unspec:DI [(match_operand:MVE_5 1 "s_register_operand" "w")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMLSLDAVXQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -4311,7 +4311,7 @@ (define_insn "mve_vmovlbq_m_<supf><mode>"
    (set (match_operand:<V_double_width> 0 "s_register_operand" "=w")
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 		       (match_operand:MVE_3 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMOVLBQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -4326,7 +4326,7 @@ (define_insn "mve_vmovltq_m_<supf><mode>"
    (set (match_operand:<V_double_width> 0 "s_register_operand" "=w")
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 		       (match_operand:MVE_3 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMOVLTQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -4341,7 +4341,7 @@ (define_insn "mve_vmovnbq_m_<supf><mode>"
    (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMOVNBQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -4357,7 +4357,7 @@ (define_insn "mve_vmovntq_m_<supf><mode>"
    (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMOVNTQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -4373,7 +4373,7 @@ (define_insn "mve_vmvnq_m_n_<supf><mode>"
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
 	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
 		       (match_operand:SI 2 "immediate_operand" "i")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VMVNQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -4388,7 +4388,7 @@ (define_insn "mve_vnegq_m_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VNEGQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4404,7 +4404,7 @@ (define_insn "mve_vorrq_m_n_<supf><mode>"
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
 	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
 		       (match_operand:SI 2 "immediate_operand" "i")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VORRQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -4435,7 +4435,7 @@ (define_insn "mve_vqmovnbq_m_<supf><mode>"
    (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VQMOVNBQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -4451,7 +4451,7 @@ (define_insn "mve_vqmovntq_m_<supf><mode>"
    (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VQMOVNTQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -4467,7 +4467,7 @@ (define_insn "mve_vqmovunbq_m_s<mode>"
    (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VQMOVUNBQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -4483,7 +4483,7 @@ (define_insn "mve_vqmovuntq_m_s<mode>"
    (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VQMOVUNTQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -4611,7 +4611,7 @@ (define_insn "mve_vrev32q_m_<supf><mode>"
    (set (match_operand:MVE_3 0 "s_register_operand" "=w")
 	(unspec:MVE_3 [(match_operand:MVE_3 1 "s_register_operand" "0")
 		       (match_operand:MVE_3 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VREV32Q_M))
   ]
   "TARGET_HAVE_MVE"
@@ -4627,7 +4627,7 @@ (define_insn "mve_vrev64q_m_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VREV64Q_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4723,7 +4723,7 @@ (define_insn "mve_vrndaq_m_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VRNDAQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4739,7 +4739,7 @@ (define_insn "mve_vrndmq_m_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VRNDMQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4755,7 +4755,7 @@ (define_insn "mve_vrndnq_m_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VRNDNQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4771,7 +4771,7 @@ (define_insn "mve_vrndpq_m_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VRNDPQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4787,7 +4787,7 @@ (define_insn "mve_vrndxq_m_f<mode>"
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VRNDXQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4867,7 +4867,7 @@ (define_insn "mve_vcvtmq_m_<supf><mode>"
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
 	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
 		       (match_operand:<MVE_CNVT> 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCVTMQ_M))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4883,7 +4883,7 @@ (define_insn "mve_vcvtpq_m_<supf><mode>"
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
 	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
 		       (match_operand:<MVE_CNVT> 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCVTPQ_M))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4899,7 +4899,7 @@ (define_insn "mve_vcvtnq_m_<supf><mode>"
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
 	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
 		       (match_operand:<MVE_CNVT> 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCVTNQ_M))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4916,7 +4916,7 @@ (define_insn "mve_vcvtq_m_n_from_f_<supf><mode>"
 	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
 		       (match_operand:<MVE_CNVT> 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred2>" "<MVE_constraint2>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCVTQ_M_N_FROM_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4948,7 +4948,7 @@ (define_insn "mve_vcvtq_m_from_f_<supf><mode>"
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
 	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
 		       (match_operand:<MVE_CNVT> 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCVTQ_M_FROM_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4997,7 +4997,7 @@ (define_insn "mve_vabavq_p_<supf><mode>"
 	(unspec:SI [(match_operand:SI 1 "s_register_operand" "0")
 		    (match_operand:MVE_2 2 "s_register_operand" "w")
 		    (match_operand:MVE_2 3 "s_register_operand" "w")
-		    (match_operand:HI 4 "vpr_register_operand" "Up")]
+		    (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VABAVQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -5014,7 +5014,7 @@ (define_insn "mve_vqshluq_m_n_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "mve_imm_7" "Ra")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQSHLUQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5030,7 +5030,7 @@ (define_insn "mve_vshlq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSHLQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5046,7 +5046,7 @@ (define_insn "mve_vsriq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "mve_imm_selective_upto_8" "Rg")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSRIQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5062,7 +5062,7 @@ (define_insn "mve_vsubq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSUBQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5078,7 +5078,7 @@ (define_insn "mve_vcvtq_m_n_to_f_<supf><mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:<MVE_CNVT> 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred2>" "<MVE_constraint2>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCVTQ_M_N_TO_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -5094,7 +5094,7 @@ (define_insn "mve_vabdq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VABDQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5111,7 +5111,7 @@ (define_insn "mve_vaddq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VADDQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5128,7 +5128,7 @@ (define_insn "mve_vaddq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VADDQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5145,7 +5145,7 @@ (define_insn "mve_vandq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VANDQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5162,7 +5162,7 @@ (define_insn "mve_vbicq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VBICQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5179,7 +5179,7 @@ (define_insn "mve_vbrsrq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VBRSRQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5196,7 +5196,7 @@ (define_insn "mve_vcaddq_rot270_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCADDQ_ROT270_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5213,7 +5213,7 @@ (define_insn "mve_vcaddq_rot90_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCADDQ_ROT90_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5230,7 +5230,7 @@ (define_insn "mve_veorq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VEORQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5247,7 +5247,7 @@ (define_insn "mve_vhaddq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VHADDQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5264,7 +5264,7 @@ (define_insn "mve_vhaddq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VHADDQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5281,7 +5281,7 @@ (define_insn "mve_vhsubq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VHSUBQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5298,7 +5298,7 @@ (define_insn "mve_vhsubq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VHSUBQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5315,7 +5315,7 @@ (define_insn "mve_vmaxq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMAXQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5332,7 +5332,7 @@ (define_insn "mve_vminq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMINQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5349,7 +5349,7 @@ (define_insn "mve_vmladavaq_p_<supf><mode>"
 	(unspec:SI [(match_operand:SI 1 "s_register_operand" "0")
 		    (match_operand:MVE_2 2 "s_register_operand" "w")
 		    (match_operand:MVE_2 3 "s_register_operand" "w")
-		    (match_operand:HI 4 "vpr_register_operand" "Up")]
+		    (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMLADAVAQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -5366,7 +5366,7 @@ (define_insn "mve_vmlaq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMLAQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5383,7 +5383,7 @@ (define_insn "mve_vmlasq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMLASQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5400,7 +5400,7 @@ (define_insn "mve_vmulhq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMULHQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5417,7 +5417,7 @@ (define_insn "mve_vmullbq_int_m_<supf><mode>"
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 				  (match_operand:MVE_2 2 "s_register_operand" "w")
 				  (match_operand:MVE_2 3 "s_register_operand" "w")
-				  (match_operand:HI 4 "vpr_register_operand" "Up")]
+				  (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMULLBQ_INT_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5434,7 +5434,7 @@ (define_insn "mve_vmulltq_int_m_<supf><mode>"
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 				  (match_operand:MVE_2 2 "s_register_operand" "w")
 				  (match_operand:MVE_2 3 "s_register_operand" "w")
-				  (match_operand:HI 4 "vpr_register_operand" "Up")]
+				  (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMULLTQ_INT_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5451,7 +5451,7 @@ (define_insn "mve_vmulq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMULQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5468,7 +5468,7 @@ (define_insn "mve_vmulq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMULQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5485,7 +5485,7 @@ (define_insn "mve_vornq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VORNQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5502,7 +5502,7 @@ (define_insn "mve_vorrq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VORRQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5519,7 +5519,7 @@ (define_insn "mve_vqaddq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQADDQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5536,7 +5536,7 @@ (define_insn "mve_vqaddq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQADDQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5553,7 +5553,7 @@ (define_insn "mve_vqdmlahq_m_n_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMLAHQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5570,7 +5570,7 @@ (define_insn "mve_vqdmlashq_m_n_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMLASHQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5587,7 +5587,7 @@ (define_insn "mve_vqrdmlahq_m_n_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRDMLAHQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5604,7 +5604,7 @@ (define_insn "mve_vqrdmlashq_m_n_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRDMLASHQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5621,7 +5621,7 @@ (define_insn "mve_vqrshlq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRSHLQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5638,7 +5638,7 @@ (define_insn "mve_vqshlq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "immediate_operand" "i")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQSHLQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5655,7 +5655,7 @@ (define_insn "mve_vqshlq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQSHLQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5672,7 +5672,7 @@ (define_insn "mve_vqsubq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQSUBQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5689,7 +5689,7 @@ (define_insn "mve_vqsubq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQSUBQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5706,7 +5706,7 @@ (define_insn "mve_vrhaddq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VRHADDQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5723,7 +5723,7 @@ (define_insn "mve_vrmulhq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VRMULHQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5740,7 +5740,7 @@ (define_insn "mve_vrshlq_m_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VRSHLQ_M))
   ]
   "TARGET_HAVE_MVE"
@@ -5757,7 +5757,7 @@ (define_insn "mve_vrshrq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred2>" "<MVE_constraint2>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VRSHRQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5774,7 +5774,7 @@ (define_insn "mve_vshlq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "immediate_operand" "i")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSHLQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5791,7 +5791,7 @@ (define_insn "mve_vshrq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred2>" "<MVE_constraint2>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSHRQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5808,7 +5808,7 @@ (define_insn "mve_vsliq_m_n_<supf><mode>"
        (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred>" "<MVE_constraint>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSLIQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5825,7 +5825,7 @@ (define_insn "mve_vsubq_m_n_<supf><mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSUBQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -5842,7 +5842,7 @@ (define_insn "mve_vhcaddq_rot270_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VHCADDQ_ROT270_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5859,7 +5859,7 @@ (define_insn "mve_vhcaddq_rot90_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VHCADDQ_ROT90_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5876,7 +5876,7 @@ (define_insn "mve_vmladavaxq_p_s<mode>"
 	(unspec:SI [(match_operand:SI 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMLADAVAXQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5893,7 +5893,7 @@ (define_insn "mve_vmlsdavaq_p_s<mode>"
 	(unspec:SI [(match_operand:SI 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMLSDAVAQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5910,7 +5910,7 @@ (define_insn "mve_vmlsdavaxq_p_s<mode>"
 	(unspec:SI [(match_operand:SI 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMLSDAVAXQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5927,7 +5927,7 @@ (define_insn "mve_vqdmladhq_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMLADHQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5944,7 +5944,7 @@ (define_insn "mve_vqdmladhxq_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMLADHXQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5961,7 +5961,7 @@ (define_insn "mve_vqdmlsdhq_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMLSDHQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5978,7 +5978,7 @@ (define_insn "mve_vqdmlsdhxq_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMLSDHXQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -5995,7 +5995,7 @@ (define_insn "mve_vqdmulhq_m_n_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMULHQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6012,7 +6012,7 @@ (define_insn "mve_vqdmulhq_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMULHQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6029,7 +6029,7 @@ (define_insn "mve_vqrdmladhq_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRDMLADHQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6046,7 +6046,7 @@ (define_insn "mve_vqrdmladhxq_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRDMLADHXQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6063,7 +6063,7 @@ (define_insn "mve_vqrdmlsdhq_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRDMLSDHQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6080,7 +6080,7 @@ (define_insn "mve_vqrdmlsdhxq_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRDMLSDHXQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6097,7 +6097,7 @@ (define_insn "mve_vqrdmulhq_m_n_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRDMULHQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6114,7 +6114,7 @@ (define_insn "mve_vqrdmulhq_m_s<mode>"
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRDMULHQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6131,7 +6131,7 @@ (define_insn "mve_vmlaldavaq_p_<supf><mode>"
 	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:MVE_5 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMLALDAVAQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -6148,7 +6148,7 @@ (define_insn "mve_vmlaldavaxq_p_<supf><mode>"
 	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:MVE_5 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMLALDAVAXQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -6165,7 +6165,7 @@ (define_insn "mve_vqrshrnbq_m_n_<supf><mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "mve_imm_8" "Rb")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRSHRNBQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -6182,7 +6182,7 @@ (define_insn "mve_vqrshrntq_m_n_<supf><mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "mve_imm_8" "Rb")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRSHRNTQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -6199,7 +6199,7 @@ (define_insn "mve_vqshrnbq_m_n_<supf><mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred3>" "<MVE_constraint3>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQSHRNBQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -6216,7 +6216,7 @@ (define_insn "mve_vqshrntq_m_n_<supf><mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred3>" "<MVE_constraint3>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQSHRNTQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -6250,7 +6250,7 @@ (define_insn "mve_vrshrnbq_m_n_<supf><mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "mve_imm_8" "Rb")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VRSHRNBQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -6267,7 +6267,7 @@ (define_insn "mve_vrshrntq_m_n_<supf><mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "mve_imm_8" "Rb")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VRSHRNTQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -6284,7 +6284,7 @@ (define_insn "mve_vshllbq_m_n_<supf><mode>"
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 		       (match_operand:MVE_3 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "immediate_operand" "i")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSHLLBQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -6301,7 +6301,7 @@ (define_insn "mve_vshlltq_m_n_<supf><mode>"
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 		       (match_operand:MVE_3 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "immediate_operand" "i")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSHLLTQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -6318,7 +6318,7 @@ (define_insn "mve_vshrnbq_m_n_<supf><mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred3>" "<MVE_constraint3>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSHRNBQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -6335,7 +6335,7 @@ (define_insn "mve_vshrntq_m_n_<supf><mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred3>" "<MVE_constraint3>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSHRNTQ_M_N))
   ]
   "TARGET_HAVE_MVE"
@@ -6352,7 +6352,7 @@ (define_insn "mve_vmlsldavaq_p_s<mode>"
 	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:MVE_5 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMLSLDAVAQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6369,7 +6369,7 @@ (define_insn "mve_vmlsldavaxq_p_s<mode>"
 	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:MVE_5 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMLSLDAVAXQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6386,7 +6386,7 @@ (define_insn "mve_vmullbq_poly_m_p<mode>"
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 		       (match_operand:MVE_3 2 "s_register_operand" "w")
 		       (match_operand:MVE_3 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMULLBQ_POLY_M_P))
   ]
   "TARGET_HAVE_MVE"
@@ -6403,7 +6403,7 @@ (define_insn "mve_vmulltq_poly_m_p<mode>"
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 		       (match_operand:MVE_3 2 "s_register_operand" "w")
 		       (match_operand:MVE_3 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMULLTQ_POLY_M_P))
   ]
   "TARGET_HAVE_MVE"
@@ -6420,7 +6420,7 @@ (define_insn "mve_vqdmullbq_m_n_s<mode>"
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMULLBQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6437,7 +6437,7 @@ (define_insn "mve_vqdmullbq_m_s<mode>"
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:MVE_5 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMULLBQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6454,7 +6454,7 @@ (define_insn "mve_vqdmulltq_m_n_s<mode>"
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMULLTQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6471,7 +6471,7 @@ (define_insn "mve_vqdmulltq_m_s<mode>"
 	(unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:MVE_5 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQDMULLTQ_M_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6488,7 +6488,7 @@ (define_insn "mve_vqrshrunbq_m_n_s<mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "mve_imm_8" "Rb")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRSHRUNBQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6505,7 +6505,7 @@ (define_insn "mve_vqrshruntq_m_n_s<mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred3>" "<MVE_constraint3>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQRSHRUNTQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6522,7 +6522,7 @@ (define_insn "mve_vqshrunbq_m_n_s<mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred3>" "<MVE_constraint3>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQSHRUNBQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6539,7 +6539,7 @@ (define_insn "mve_vqshruntq_m_n_s<mode>"
 	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
 		       (match_operand:MVE_5 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "<MVE_pred3>" "<MVE_constraint3>")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VQSHRUNTQ_M_N_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6623,7 +6623,7 @@ (define_insn "mve_vabdq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VABDQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6640,7 +6640,7 @@ (define_insn "mve_vaddq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VADDQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6657,7 +6657,7 @@ (define_insn "mve_vaddq_m_n_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VADDQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6674,7 +6674,7 @@ (define_insn "mve_vandq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VANDQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6691,7 +6691,7 @@ (define_insn "mve_vbicq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VBICQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6708,7 +6708,7 @@ (define_insn "mve_vbrsrq_m_n_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:SI 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VBRSRQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6725,7 +6725,7 @@ (define_insn "mve_vcaddq_rot270_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCADDQ_ROT270_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6742,7 +6742,7 @@ (define_insn "mve_vcaddq_rot90_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCADDQ_ROT90_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6759,7 +6759,7 @@ (define_insn "mve_vcmlaq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCMLAQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6776,7 +6776,7 @@ (define_insn "mve_vcmlaq_rot180_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCMLAQ_ROT180_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6793,7 +6793,7 @@ (define_insn "mve_vcmlaq_rot270_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCMLAQ_ROT270_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6810,7 +6810,7 @@ (define_insn "mve_vcmlaq_rot90_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCMLAQ_ROT90_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6827,7 +6827,7 @@ (define_insn "mve_vcmulq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCMULQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6844,7 +6844,7 @@ (define_insn "mve_vcmulq_rot180_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCMULQ_ROT180_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6861,7 +6861,7 @@ (define_insn "mve_vcmulq_rot270_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCMULQ_ROT270_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6878,7 +6878,7 @@ (define_insn "mve_vcmulq_rot90_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VCMULQ_ROT90_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6895,7 +6895,7 @@ (define_insn "mve_veorq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VEORQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6912,7 +6912,7 @@ (define_insn "mve_vfmaq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VFMAQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6929,7 +6929,7 @@ (define_insn "mve_vfmaq_m_n_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VFMAQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6946,7 +6946,7 @@ (define_insn "mve_vfmasq_m_n_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VFMASQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6963,7 +6963,7 @@ (define_insn "mve_vfmsq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VFMSQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6980,7 +6980,7 @@ (define_insn "mve_vmaxnmq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMAXNMQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -6997,7 +6997,7 @@ (define_insn "mve_vminnmq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMINNMQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7014,7 +7014,7 @@ (define_insn "mve_vmulq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMULQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7031,7 +7031,7 @@ (define_insn "mve_vmulq_m_n_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VMULQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7048,7 +7048,7 @@ (define_insn "mve_vornq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VORNQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7065,7 +7065,7 @@ (define_insn "mve_vorrq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VORRQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7082,7 +7082,7 @@ (define_insn "mve_vsubq_m_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSUBQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7099,7 +7099,7 @@ (define_insn "mve_vsubq_m_n_f<mode>"
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VSUBQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7248,7 +7248,7 @@ (define_expand "mve_vstrbq_scatter_offset_p_<supf><mode>"
   [(match_operand:<MVE_B_ELEM>  0 "mve_scatter_memory")
    (match_operand:MVE_2 1 "s_register_operand")
    (match_operand:MVE_2 2 "s_register_operand")
-   (match_operand:HI 3 "vpr_register_operand" "Up")
+   (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")
    (unspec:V4SI [(const_int 0)] VSTRBSOQ)]
   "TARGET_HAVE_MVE"
 {
@@ -7267,7 +7267,7 @@ (define_insn "mve_vstrbq_scatter_offset_p_<supf><mode>_insn"
 	  [(match_operand:SI 0 "register_operand" "r")
 	   (match_operand:MVE_2 1 "s_register_operand" "w")
 	   (match_operand:MVE_2 2 "s_register_operand" "w")
-	   (match_operand:HI 3 "vpr_register_operand" "Up")]
+	   (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	  VSTRBSOQ))]
   "TARGET_HAVE_MVE"
   "vpst\;vstrbt.<V_sz_elem>\t%q2, [%0, %q1]"
@@ -7302,7 +7302,7 @@ (define_insn "mve_vstrwq_scatter_base_p_<supf>v4si"
 (define_insn "mve_vstrbq_p_<supf><mode>"
   [(set (match_operand:<MVE_B_ELEM> 0 "mve_memory_operand" "=Ux")
 	(unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1 "s_register_operand" "w")
-			      (match_operand:HI 2 "vpr_register_operand" "Up")]
+			      (match_operand:<MVE_VPRED> 2 "vpr_register_operand" "Up")]
 	 VSTRBQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7323,7 +7323,7 @@ (define_insn "mve_vldrbq_gather_offset_z_<supf><mode>"
   [(set (match_operand:MVE_2 0 "s_register_operand" "=&w")
 	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VLDRBGOQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7347,7 +7347,7 @@ (define_insn "mve_vldrbq_gather_offset_z_<supf><mode>"
 (define_insn "mve_vldrbq_z_<supf><mode>"
   [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "mve_memory_operand" "Ux")
-		       (match_operand:HI 2 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 2 "vpr_register_operand" "Up")]
 	 VLDRBQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7434,7 +7434,7 @@ (define_insn "mve_vldrhq_gather_offset_z_<supf><mode>"
   [(set (match_operand:MVE_6 0 "s_register_operand" "=&w")
 	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us")
 		       (match_operand:MVE_6 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")
 	]VLDRHGOQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7482,7 +7482,7 @@ (define_insn "mve_vldrhq_gather_shifted_offset_z_<supf><mode>"
   [(set (match_operand:MVE_6 0 "s_register_operand" "=&w")
 	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us")
 		       (match_operand:MVE_6 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")
 	]VLDRHGSOQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7548,7 +7548,7 @@ (define_insn "mve_vldrhq_z_fv8hf"
 (define_insn "mve_vldrhq_z_<supf><mode>"
   [(set (match_operand:MVE_6 0 "s_register_operand" "=w")
 	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "mve_memory_operand" "Ux")
-	(match_operand:HI 2 "vpr_register_operand" "Up")]
+	(match_operand:<MVE_VPRED> 2 "vpr_register_operand" "Up")]
 	 VLDRHQ))
   ]
   "TARGET_HAVE_MVE"
@@ -8124,7 +8124,7 @@ (define_insn "mve_vstrhq_p_fv8hf"
 (define_insn "mve_vstrhq_p_<supf><mode>"
   [(set (match_operand:<MVE_H_ELEM> 0 "mve_memory_operand" "=Ux")
 	(unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1 "s_register_operand" "w")
-			      (match_operand:HI 2 "vpr_register_operand" "Up")]
+			      (match_operand:<MVE_VPRED> 2 "vpr_register_operand" "Up")]
 	 VSTRHQ))
   ]
   "TARGET_HAVE_MVE"
@@ -8145,7 +8145,7 @@ (define_expand "mve_vstrhq_scatter_offset_p_<supf><mode>"
   [(match_operand:<MVE_H_ELEM> 0 "mve_scatter_memory")
    (match_operand:MVE_6 1 "s_register_operand")
    (match_operand:MVE_6 2 "s_register_operand")
-   (match_operand:HI 3 "vpr_register_operand")
+   (match_operand:<MVE_VPRED> 3 "vpr_register_operand")
    (unspec:V4SI [(const_int 0)] VSTRHSOQ)]
   "TARGET_HAVE_MVE"
 {
@@ -8164,7 +8164,7 @@ (define_insn "mve_vstrhq_scatter_offset_p_<supf><mode>_insn"
 	  [(match_operand:SI 0 "register_operand" "r")
 	   (match_operand:MVE_6 1 "s_register_operand" "w")
 	   (match_operand:MVE_6 2 "s_register_operand" "w")
-	   (match_operand:HI 3 "vpr_register_operand" "Up")]
+	   (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	  VSTRHSOQ))]
   "TARGET_HAVE_MVE"
   "vpst\;vstrht.<V_sz_elem>\t%q2, [%0, %q1]"
@@ -8205,7 +8205,7 @@ (define_expand "mve_vstrhq_scatter_shifted_offset_p_<supf><mode>"
   [(match_operand:<MVE_H_ELEM> 0 "mve_scatter_memory")
    (match_operand:MVE_6 1 "s_register_operand")
    (match_operand:MVE_6 2 "s_register_operand")
-   (match_operand:HI 3 "vpr_register_operand")
+   (match_operand:<MVE_VPRED> 3 "vpr_register_operand")
    (unspec:V4SI [(const_int 0)] VSTRHSSOQ)]
   "TARGET_HAVE_MVE"
 {
@@ -8224,7 +8224,7 @@ (define_insn "mve_vstrhq_scatter_shifted_offset_p_<supf><mode>_insn"
 	  [(match_operand:SI 0 "register_operand" "r")
 	   (match_operand:MVE_6 1 "s_register_operand" "w")
 	   (match_operand:MVE_6 2 "s_register_operand" "w")
-	   (match_operand:HI 3 "vpr_register_operand" "Up")]
+	   (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	  VSTRHSSOQ))]
   "TARGET_HAVE_MVE"
   "vpst\;vstrht.<V_sz_elem>\t%q2, [%0, %q1, uxtw #1]"
@@ -9011,7 +9011,7 @@ (define_expand "mve_vidupq_m_n_u<mode>"
    (match_operand:MVE_2 1 "s_register_operand")
    (match_operand:SI 2 "s_register_operand")
    (match_operand:SI 3 "mve_imm_selective_upto_8")
-   (match_operand:HI 4 "vpr_register_operand")]
+   (match_operand:<MVE_VPRED> 4 "vpr_register_operand")]
   "TARGET_HAVE_MVE"
 {
   rtx temp = gen_reg_rtx (SImode);
@@ -9031,7 +9031,7 @@ (define_insn "mve_vidupq_m_wb_u<mode>_insn"
        (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		      (match_operand:SI 3 "s_register_operand" "2")
 		      (match_operand:SI 4 "mve_imm_selective_upto_8" "Rg")
-		      (match_operand:HI 5 "vpr_register_operand" "Up")]
+		      (match_operand:<MVE_VPRED> 5 "vpr_register_operand" "Up")]
 	VIDUPQ_M))
   (set (match_operand:SI 2 "s_register_operand" "=Te")
        (plus:SI (match_dup 3)
@@ -9079,7 +9079,7 @@ (define_expand "mve_vddupq_m_n_u<mode>"
    (match_operand:MVE_2 1 "s_register_operand")
    (match_operand:SI 2 "s_register_operand")
    (match_operand:SI 3 "mve_imm_selective_upto_8")
-   (match_operand:HI 4 "vpr_register_operand")]
+   (match_operand:<MVE_VPRED> 4 "vpr_register_operand")]
   "TARGET_HAVE_MVE"
 {
   rtx temp = gen_reg_rtx (SImode);
@@ -9099,7 +9099,7 @@ (define_insn "mve_vddupq_m_wb_u<mode>_insn"
        (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		      (match_operand:SI 3 "s_register_operand" "2")
 		      (match_operand:SI 4 "mve_imm_selective_upto_8" "Rg")
-		      (match_operand:HI 5 "vpr_register_operand" "Up")]
+		      (match_operand:<MVE_VPRED> 5 "vpr_register_operand" "Up")]
 	VDDUPQ_M))
   (set (match_operand:SI 2 "s_register_operand" "=Te")
        (minus:SI (match_dup 3)
@@ -9170,7 +9170,7 @@ (define_expand "mve_vdwdupq_m_n_u<mode>"
   (match_operand:SI 2 "s_register_operand")
   (match_operand:DI 3 "s_register_operand")
   (match_operand:SI 4 "mve_imm_selective_upto_8")
-  (match_operand:HI 5 "vpr_register_operand")]
+  (match_operand:<MVE_VPRED> 5 "vpr_register_operand")]
  "TARGET_HAVE_MVE"
 {
   rtx ignore_wb = gen_reg_rtx (SImode);
@@ -9190,7 +9190,7 @@ (define_expand "mve_vdwdupq_m_wb_u<mode>"
   (match_operand:SI 2 "s_register_operand")
   (match_operand:DI 3 "s_register_operand")
   (match_operand:SI 4 "mve_imm_selective_upto_8")
-  (match_operand:HI 5 "vpr_register_operand")]
+  (match_operand:<MVE_VPRED> 5 "vpr_register_operand")]
  "TARGET_HAVE_MVE"
 {
   rtx ignore_vec = gen_reg_rtx (<MODE>mode);
@@ -9210,7 +9210,7 @@ (define_insn "mve_vdwdupq_m_wb_u<mode>_insn"
 		       (match_operand:SI 3 "s_register_operand" "1")
 		       (subreg:SI (match_operand:DI 4 "s_register_operand" "r") 4)
 		       (match_operand:SI 5 "mve_imm_selective_upto_8" "Rg")
-		       (match_operand:HI 6 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 6 "vpr_register_operand" "Up")]
 	 VDWDUPQ_M))
    (set (match_operand:SI 1 "s_register_operand" "=Te")
 	(unspec:SI [(match_dup 2)
@@ -9287,7 +9287,7 @@ (define_expand "mve_viwdupq_m_n_u<mode>"
   (match_operand:SI 2 "s_register_operand")
   (match_operand:DI 3 "s_register_operand")
   (match_operand:SI 4 "mve_imm_selective_upto_8")
-  (match_operand:HI 5 "vpr_register_operand")]
+  (match_operand:<MVE_VPRED> 5 "vpr_register_operand")]
  "TARGET_HAVE_MVE"
 {
   rtx ignore_wb = gen_reg_rtx (SImode);
@@ -9307,7 +9307,7 @@ (define_expand "mve_viwdupq_m_wb_u<mode>"
   (match_operand:SI 2 "s_register_operand")
   (match_operand:DI 3 "s_register_operand")
   (match_operand:SI 4 "mve_imm_selective_upto_8")
-  (match_operand:HI 5 "vpr_register_operand")]
+  (match_operand:<MVE_VPRED> 5 "vpr_register_operand")]
  "TARGET_HAVE_MVE"
 {
   rtx ignore_vec = gen_reg_rtx (<MODE>mode);
@@ -9327,7 +9327,7 @@ (define_insn "mve_viwdupq_m_wb_u<mode>_insn"
 		       (match_operand:SI 3 "s_register_operand" "1")
 		       (subreg:SI (match_operand:DI 4 "s_register_operand" "r") 4)
 		       (match_operand:SI 5 "mve_imm_selective_upto_8" "Rg")
-		       (match_operand:HI 6 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 6 "vpr_register_operand" "Up")]
 	 VIWDUPQ_M))
    (set (match_operand:SI 1 "s_register_operand" "=Te")
 	(unspec:SI [(match_dup 2)
@@ -10335,7 +10335,7 @@ (define_expand "mve_vshlcq_m_vec_<supf><mode>"
   (match_operand:MVE_2 1 "s_register_operand")
   (match_operand:SI 2 "s_register_operand")
   (match_operand:SI 3 "mve_imm_32")
-  (match_operand:HI 4 "vpr_register_operand")
+  (match_operand:<MVE_VPRED> 4 "vpr_register_operand")
   (unspec:MVE_2 [(const_int 0)] VSHLCQ_M)]
  "TARGET_HAVE_MVE"
 {
@@ -10351,7 +10351,7 @@ (define_expand "mve_vshlcq_m_carry_<supf><mode>"
   (match_operand:MVE_2 1 "s_register_operand")
   (match_operand:SI 2 "s_register_operand")
   (match_operand:SI 3 "mve_imm_32")
-  (match_operand:HI 4 "vpr_register_operand")
+  (match_operand:<MVE_VPRED> 4 "vpr_register_operand")
   (unspec:MVE_2 [(const_int 0)] VSHLCQ_M)]
  "TARGET_HAVE_MVE"
 {
@@ -10367,7 +10367,7 @@ (define_insn "mve_vshlcq_m_<supf><mode>"
        (unspec:MVE_2 [(match_operand:MVE_2 2 "s_register_operand" "0")
 		      (match_operand:SI 3 "s_register_operand" "1")
 		      (match_operand:SI 4 "mve_imm_32" "Rf")
-		      (match_operand:HI 5 "vpr_register_operand" "Up")]
+		      (match_operand:<MVE_VPRED> 5 "vpr_register_operand" "Up")]
 	VSHLCQ_M))
   (set (match_operand:SI  1 "s_register_operand" "=r")
        (unspec:SI [(match_dup 2)
-- 
2.25.1



* [PATCH v3 12/15] arm: Convert more load/store MVE builtins to predicate qualifiers
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (10 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 11/15] arm: Convert more MVE " Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-27 16:56   ` Kyrylo Tkachov
  2022-01-13 14:56 ` [PATCH v3 13/15] arm: Convert more MVE/CDE " Christophe Lyon
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

This patch covers a few load/store builtins whose patterns do not use the
<mode> iterator, so we cannot use <MVE_vpred>; the predicate mode (V4BI or
V8BI) is written out explicitly instead.

For v2di instructions, we keep the HI mode for predicates.
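
As a rough illustration of the mode choice described above, here is a
minimal C sketch that maps the lane count of a 128-bit MVE data vector to
the predicate mode used next to it. The enum and the function
sketch_pred_mode are hypothetical stand-ins invented for this example and
are not GCC's machine_mode or any real GCC helper; V16BI is taken from the
wider series rather than from this patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the modes named in this series; this is
   not GCC's real machine_mode enum.  */
enum sketch_mode { SK_HI, SK_V16BI, SK_V8BI, SK_V4BI };

/* Return the predicate mode paired with a 128-bit data vector that has
   NUNITS lanes.  Two-lane (v2di) patterns keep HI, as stated above.  */
static enum sketch_mode
sketch_pred_mode (unsigned nunits)
{
  assert (nunits == 2 || nunits == 4 || nunits == 8 || nunits == 16);

  switch (nunits)
    {
    case 16: return SK_V16BI;	/* V16QI data.  */
    case 8:  return SK_V8BI;	/* V8HI / V8HF data.  */
    case 4:  return SK_V4BI;	/* V4SI / V4SF data.  */
    default: return SK_HI;	/* V2DI data: predicate stays in HI.  */
    }
}
```

In the patterns below this choice is simply written out by hand, since
there is no iterator to derive it from.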

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.c (STRSBS_P_QUALIFIERS): Use predicate
	qualifier.
	(STRSBU_P_QUALIFIERS): Likewise.
	(LDRGBS_Z_QUALIFIERS): Likewise.
	(LDRGBU_Z_QUALIFIERS): Likewise.
	(LDRGBWBXU_Z_QUALIFIERS): Likewise.
	(LDRGBWBS_Z_QUALIFIERS): Likewise.
	(LDRGBWBU_Z_QUALIFIERS): Likewise.
	(STRSBWBS_P_QUALIFIERS): Likewise.
	(STRSBWBU_P_QUALIFIERS): Likewise.
	* config/arm/mve.md: Use VxBI instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 0b063b5f037..73678a00398 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -689,13 +689,13 @@ arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-      qualifier_none, qualifier_unsigned};
+      qualifier_none, qualifier_predicate};
 #define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-      qualifier_unsigned, qualifier_unsigned};
+      qualifier_unsigned, qualifier_predicate};
 #define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
 
 static enum arm_type_qualifiers
@@ -731,13 +731,13 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-      qualifier_unsigned};
+      qualifier_predicate};
 #define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-      qualifier_unsigned};
+      qualifier_predicate};
 #define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -777,7 +777,7 @@ arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-      qualifier_unsigned};
+      qualifier_predicate};
 #define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -793,13 +793,13 @@ arm_ldrgbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-      qualifier_unsigned};
+      qualifier_predicate};
 #define LDRGBWBS_Z_QUALIFIERS (arm_ldrgbwbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-      qualifier_unsigned};
+      qualifier_predicate};
 #define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -815,13 +815,13 @@ arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-      qualifier_none, qualifier_unsigned};
+      qualifier_none, qualifier_predicate};
 #define STRSBWBS_P_QUALIFIERS (arm_strsbwbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-      qualifier_unsigned, qualifier_unsigned};
+      qualifier_unsigned, qualifier_predicate};
 #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
 
 static enum arm_type_qualifiers
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index a8087815c22..9633b7187f6 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -7282,7 +7282,7 @@ (define_insn "mve_vstrwq_scatter_base_p_<supf>v4si"
 		[(match_operand:V4SI 0 "s_register_operand" "w")
 		 (match_operand:SI 1 "immediate_operand" "i")
 		 (match_operand:V4SI 2 "s_register_operand" "w")
-		 (match_operand:HI 3 "vpr_register_operand" "Up")]
+		 (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	 VSTRWSBQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7371,7 +7371,7 @@ (define_insn "mve_vldrwq_gather_base_z_<supf>v4si"
   [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
 	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
 		      (match_operand:SI 2 "immediate_operand" "i")
-		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	 VLDRWGBQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7609,7 +7609,7 @@ (define_insn "mve_vldrwq_<supf>v4si"
 (define_insn "mve_vldrwq_z_fv4sf"
   [(set (match_operand:V4SF 0 "s_register_operand" "=w")
 	(unspec:V4SF [(match_operand:V4SI 1 "mve_memory_operand" "Ux")
-	(match_operand:HI 2 "vpr_register_operand" "Up")]
+	(match_operand:V4BI 2 "vpr_register_operand" "Up")]
 	 VLDRWQ_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7629,7 +7629,7 @@ (define_insn "mve_vldrwq_z_fv4sf"
 (define_insn "mve_vldrwq_z_<supf>v4si"
   [(set (match_operand:V4SI 0 "s_register_operand" "=w")
 	(unspec:V4SI [(match_operand:V4SI 1 "mve_memory_operand" "Ux")
-	(match_operand:HI 2 "vpr_register_operand" "Up")]
+	(match_operand:V4BI 2 "vpr_register_operand" "Up")]
 	 VLDRWQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7813,7 +7813,7 @@ (define_insn "mve_vldrhq_gather_offset_z_fv8hf"
   [(set (match_operand:V8HF 0 "s_register_operand" "=&w")
 	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")
 		      (match_operand:V8HI 2 "s_register_operand" "w")
-		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+		      (match_operand:V8BI 3 "vpr_register_operand" "Up")]
 	 VLDRHQGO_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7855,7 +7855,7 @@ (define_insn "mve_vldrhq_gather_shifted_offset_z_fv8hf"
   [(set (match_operand:V8HF 0 "s_register_operand" "=&w")
 	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")
 		      (match_operand:V8HI 2 "s_register_operand" "w")
-		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+		      (match_operand:V8BI 3 "vpr_register_operand" "Up")]
 	 VLDRHQGSO_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7897,7 +7897,7 @@ (define_insn "mve_vldrwq_gather_base_z_fv4sf"
   [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
 	(unspec:V4SF [(match_operand:V4SI 1 "s_register_operand" "w")
 		      (match_operand:SI 2 "immediate_operand" "i")
-		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	 VLDRWQGB_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7958,7 +7958,7 @@ (define_insn "mve_vldrwq_gather_offset_z_fv4sf"
   [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
 	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")
 		      (match_operand:V4SI 2 "s_register_operand" "w")
-		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	 VLDRWQGO_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -7980,7 +7980,7 @@ (define_insn "mve_vldrwq_gather_offset_z_<supf>v4si"
   [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
 	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")
 		      (match_operand:V4SI 2 "s_register_operand" "w")
-		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	 VLDRWGOQ))
   ]
   "TARGET_HAVE_MVE"
@@ -8042,7 +8042,7 @@ (define_insn "mve_vldrwq_gather_shifted_offset_z_fv4sf"
   [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
 	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")
 		      (match_operand:V4SI 2 "s_register_operand" "w")
-		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	 VLDRWQGSO_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -8064,7 +8064,7 @@ (define_insn "mve_vldrwq_gather_shifted_offset_z_<supf>v4si"
   [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
 	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")
 		      (match_operand:V4SI 2 "s_register_operand" "w")
-		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	 VLDRWGSOQ))
   ]
   "TARGET_HAVE_MVE"
@@ -8104,7 +8104,7 @@ (define_insn "mve_vstrhq_fv8hf"
 (define_insn "mve_vstrhq_p_fv8hf"
   [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux")
 	(unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w")
-		      (match_operand:HI 2 "vpr_register_operand" "Up")]
+		      (match_operand:V8BI 2 "vpr_register_operand" "Up")]
 	 VSTRHQ_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -8323,7 +8323,7 @@ (define_insn "mve_vstrwq_p_fv4sf"
 (define_insn "mve_vstrwq_p_<supf>v4si"
   [(set (match_operand:V4SI 0 "mve_memory_operand" "=Ux")
 	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
-		      (match_operand:HI 2 "vpr_register_operand" "Up")]
+		      (match_operand:V4BI 2 "vpr_register_operand" "Up")]
 	 VSTRWQ))
   ]
   "TARGET_HAVE_MVE"
@@ -8576,7 +8576,7 @@ (define_expand "mve_vstrhq_scatter_offset_p_fv8hf"
   [(match_operand:V8HI 0 "mve_scatter_memory")
    (match_operand:V8HI 1 "s_register_operand")
    (match_operand:V8HF 2 "s_register_operand")
-   (match_operand:HI 3 "vpr_register_operand")
+   (match_operand:V8BI 3 "vpr_register_operand")
    (unspec:V4SI [(const_int 0)] VSTRHQSO_F)]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
 {
@@ -8594,7 +8594,7 @@ (define_insn "mve_vstrhq_scatter_offset_p_fv8hf_insn"
 	  [(match_operand:SI 0 "register_operand" "r")
 	   (match_operand:V8HI 1 "s_register_operand" "w")
 	   (match_operand:V8HF 2 "s_register_operand" "w")
-	   (match_operand:HI 3 "vpr_register_operand" "Up")]
+	   (match_operand:V8BI 3 "vpr_register_operand" "Up")]
 	  VSTRHQSO_F))]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vpst\;vstrht.16\t%q2, [%0, %q1]"
@@ -8635,7 +8635,7 @@ (define_expand "mve_vstrhq_scatter_shifted_offset_p_fv8hf"
   [(match_operand:V8HI 0 "memory_operand" "=Us")
    (match_operand:V8HI 1 "s_register_operand" "w")
    (match_operand:V8HF 2 "s_register_operand" "w")
-   (match_operand:HI 3 "vpr_register_operand" "Up")
+   (match_operand:V8BI 3 "vpr_register_operand" "Up")
    (unspec:V4SI [(const_int 0)] VSTRHQSSO_F)]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
 {
@@ -8654,7 +8654,7 @@ (define_insn "mve_vstrhq_scatter_shifted_offset_p_fv8hf_insn"
 	  [(match_operand:SI 0 "register_operand" "r")
 	   (match_operand:V8HI 1 "s_register_operand" "w")
 	   (match_operand:V8HF 2 "s_register_operand" "w")
-	   (match_operand:HI 3 "vpr_register_operand" "Up")]
+	   (match_operand:V8BI 3 "vpr_register_operand" "Up")]
 	  VSTRHQSSO_F))]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vpst\;vstrht.16\t%q2, [%0, %q1, uxtw #1]"
@@ -8691,7 +8691,7 @@ (define_insn "mve_vstrwq_scatter_base_p_fv4sf"
 		[(match_operand:V4SI 0 "s_register_operand" "w")
 		 (match_operand:SI 1 "immediate_operand" "i")
 		 (match_operand:V4SF 2 "s_register_operand" "w")
-		 (match_operand:HI 3 "vpr_register_operand" "Up")]
+		 (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	 VSTRWQSB_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -8740,7 +8740,7 @@ (define_expand "mve_vstrwq_scatter_offset_p_fv4sf"
   [(match_operand:V4SI 0 "mve_scatter_memory")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:V4SF 2 "s_register_operand")
-   (match_operand:HI 3 "vpr_register_operand")
+   (match_operand:V4BI 3 "vpr_register_operand")
    (unspec:V4SI [(const_int 0)] VSTRWQSO_F)]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
 {
@@ -8758,7 +8758,7 @@ (define_insn "mve_vstrwq_scatter_offset_p_fv4sf_insn"
 	  [(match_operand:SI 0 "register_operand" "r")
 	   (match_operand:V4SI 1 "s_register_operand" "w")
 	   (match_operand:V4SF 2 "s_register_operand" "w")
-	   (match_operand:HI 3 "vpr_register_operand" "Up")]
+	   (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	  VSTRWQSO_F))]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vpst\;vstrwt.32\t%q2, [%0, %q1]"
@@ -8771,7 +8771,7 @@ (define_expand "mve_vstrwq_scatter_offset_p_<supf>v4si"
   [(match_operand:V4SI 0 "mve_scatter_memory")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:V4SI 2 "s_register_operand")
-   (match_operand:HI 3 "vpr_register_operand")
+   (match_operand:V4BI 3 "vpr_register_operand")
    (unspec:V4SI [(const_int 0)] VSTRWSOQ)]
   "TARGET_HAVE_MVE"
 {
@@ -8789,7 +8789,7 @@ (define_insn "mve_vstrwq_scatter_offset_p_<supf>v4si_insn"
 	  [(match_operand:SI 0 "register_operand" "r")
 	   (match_operand:V4SI 1 "s_register_operand" "w")
 	   (match_operand:V4SI 2 "s_register_operand" "w")
-	   (match_operand:HI 3 "vpr_register_operand" "Up")]
+	   (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	  VSTRWSOQ))]
   "TARGET_HAVE_MVE"
   "vpst\;vstrwt.32\t%q2, [%0, %q1]"
@@ -8858,7 +8858,7 @@ (define_expand "mve_vstrwq_scatter_shifted_offset_p_fv4sf"
   [(match_operand:V4SI 0 "mve_scatter_memory")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:V4SF 2 "s_register_operand")
-   (match_operand:HI 3 "vpr_register_operand")
+   (match_operand:V4BI 3 "vpr_register_operand")
    (unspec:V4SI [(const_int 0)] VSTRWQSSO_F)]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
 {
@@ -8877,7 +8877,7 @@ (define_insn "mve_vstrwq_scatter_shifted_offset_p_fv4sf_insn"
 	  [(match_operand:SI 0 "register_operand" "r")
 	   (match_operand:V4SI 1 "s_register_operand" "w")
 	   (match_operand:V4SF 2 "s_register_operand" "w")
-	   (match_operand:HI 3 "vpr_register_operand" "Up")]
+	   (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	  VSTRWQSSO_F))]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vpst\;vstrwt.32\t%q2, [%0, %q1, uxtw #2]"
@@ -8890,7 +8890,7 @@ (define_expand "mve_vstrwq_scatter_shifted_offset_p_<supf>v4si"
   [(match_operand:V4SI 0 "mve_scatter_memory")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:V4SI 2 "s_register_operand")
-   (match_operand:HI 3 "vpr_register_operand")
+   (match_operand:V4BI 3 "vpr_register_operand")
    (unspec:V4SI [(const_int 0)] VSTRWSSOQ)]
   "TARGET_HAVE_MVE"
 {
@@ -8909,7 +8909,7 @@ (define_insn "mve_vstrwq_scatter_shifted_offset_p_<supf>v4si_insn"
 	  [(match_operand:SI 0 "register_operand" "r")
 	   (match_operand:V4SI 1 "s_register_operand" "w")
 	   (match_operand:V4SI 2 "s_register_operand" "w")
-	   (match_operand:HI 3 "vpr_register_operand" "Up")]
+	   (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	  VSTRWSSOQ))]
   "TARGET_HAVE_MVE"
   "vpst\;vstrwt.32\t%q2, [%0, %q1, uxtw #2]"
@@ -9376,7 +9376,7 @@ (define_insn "mve_vstrwq_scatter_base_wb_p_<supf>v4si"
 		[(match_operand:V4SI 1 "s_register_operand" "0")
 		 (match_operand:SI 2 "mve_vldrd_immediate" "Ri")
 		 (match_operand:V4SI 3 "s_register_operand" "w")
-		 (match_operand:HI 4 "vpr_register_operand")]
+		 (match_operand:V4BI 4 "vpr_register_operand")]
 	VSTRWSBWBQ))
    (set (match_operand:V4SI 0 "s_register_operand" "=w")
 	(unspec:V4SI [(match_dup 1) (match_dup 2)]
@@ -9427,7 +9427,7 @@ (define_insn "mve_vstrwq_scatter_base_wb_p_fv4sf"
 		[(match_operand:V4SI 1 "s_register_operand" "0")
 		 (match_operand:SI 2 "mve_vldrd_immediate" "Ri")
 		 (match_operand:V4SF 3 "s_register_operand" "w")
-		 (match_operand:HI 4 "vpr_register_operand")]
+		 (match_operand:V4BI 4 "vpr_register_operand")]
 	VSTRWQSBWB_F))
    (set (match_operand:V4SI 0 "s_register_operand" "=w")
 	(unspec:V4SI [(match_dup 1) (match_dup 2)]
@@ -9551,7 +9551,7 @@ (define_expand "mve_vldrwq_gather_base_wb_z_<supf>v4si"
   [(match_operand:V4SI 0 "s_register_operand")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:SI 2 "mve_vldrd_immediate")
-   (match_operand:HI 3 "vpr_register_operand")
+   (match_operand:V4BI 3 "vpr_register_operand")
    (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
   "TARGET_HAVE_MVE"
 {
@@ -9566,7 +9566,7 @@ (define_expand "mve_vldrwq_gather_base_nowb_z_<supf>v4si"
   [(match_operand:V4SI 0 "s_register_operand")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:SI 2 "mve_vldrd_immediate")
-   (match_operand:HI 3 "vpr_register_operand")
+   (match_operand:V4BI 3 "vpr_register_operand")
    (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
   "TARGET_HAVE_MVE"
 {
@@ -9585,7 +9585,7 @@ (define_insn "mve_vldrwq_gather_base_wb_z_<supf>v4si_insn"
   [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
 	(unspec:V4SI [(match_operand:V4SI 2 "s_register_operand" "1")
 		      (match_operand:SI 3 "mve_vldrd_immediate" "Ri")
-		      (match_operand:HI 4 "vpr_register_operand" "Up")
+		      (match_operand:V4BI 4 "vpr_register_operand" "Up")
 		      (mem:BLK (scratch))]
 	 VLDRWGBWBQ))
    (set (match_operand:V4SI 1 "s_register_operand" "=&w")
@@ -9659,7 +9659,7 @@ (define_expand "mve_vldrwq_gather_base_wb_z_fv4sf"
   [(match_operand:V4SI 0 "s_register_operand")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:SI 2 "mve_vldrd_immediate")
-   (match_operand:HI 3 "vpr_register_operand")
+   (match_operand:V4BI 3 "vpr_register_operand")
    (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
 {
@@ -9675,7 +9675,7 @@ (define_expand "mve_vldrwq_gather_base_nowb_z_fv4sf"
   [(match_operand:V4SF 0 "s_register_operand")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:SI 2 "mve_vldrd_immediate")
-   (match_operand:HI 3 "vpr_register_operand")
+   (match_operand:V4BI 3 "vpr_register_operand")
    (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
 {
@@ -9694,7 +9694,7 @@ (define_insn "mve_vldrwq_gather_base_wb_z_fv4sf_insn"
   [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
 	(unspec:V4SF [(match_operand:V4SI 2 "s_register_operand" "1")
 		      (match_operand:SI 3 "mve_vldrd_immediate" "Ri")
-		      (match_operand:HI 4 "vpr_register_operand" "Up")
+		      (match_operand:V4BI 4 "vpr_register_operand" "Up")
 		      (mem:BLK (scratch))]
 	 VLDRWQGBWB_F))
    (set (match_operand:V4SI 1 "s_register_operand" "=&w")
-- 
2.25.1



* [PATCH v3 13/15] arm: Convert more MVE/CDE builtins to predicate qualifiers
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (11 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 12/15] arm: Convert more load/store " Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-27 16:56   ` Kyrylo Tkachov
  2022-01-13 14:56 ` [PATCH v3 14/15] arm: Add VPR_REG to ALL_REGS Christophe Lyon
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

This patch covers a few non-load/store builtins where we do not use
the <mode> iterator and thus we cannot use <MVE_vpred>.
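
To make the qualifier change concrete, here is a small self-contained C
sketch of a qualifier table before and after the conversion. Everything in
it (the sketch_qualifier enum, its values, SKETCH_MAX_ARGS and
sketch_predicate_index) is hypothetical and deliberately simplified; it
only mirrors the shape of the tables in arm-builtins.c and is not the code
GCC uses to expand builtins.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ARGS 5

/* Hypothetical, trimmed-down qualifier enum.  SKETCH_END is zero so
   that unused trailing slots of a table terminate the list.  */
enum sketch_qualifier
{
  SKETCH_END = 0,
  SKETCH_NONE,
  SKETCH_UNSIGNED,
  SKETCH_PREDICATE
};

/* Slot 0 describes the return value, the following slots the
   arguments.  Before the conversion the predicate argument is
   described like any other unsigned integer.  */
static const enum sketch_qualifier sketch_ternop_unone[SKETCH_MAX_ARGS]
  = { SKETCH_NONE, SKETCH_NONE, SKETCH_NONE, SKETCH_UNSIGNED };

/* After the conversion the last argument is marked as a predicate.  */
static const enum sketch_qualifier sketch_ternop_pred[SKETCH_MAX_ARGS]
  = { SKETCH_NONE, SKETCH_NONE, SKETCH_NONE, SKETCH_PREDICATE };

/* Return the index of the first predicate slot in QUALS, or -1 if the
   table has none.  */
static int
sketch_predicate_index (const enum sketch_qualifier *quals)
{
  assert (quals != NULL);

  for (int i = 0; i < SKETCH_MAX_ARGS && quals[i] != SKETCH_END; i++)
    if (quals[i] == SKETCH_PREDICATE)
      return i;
  return -1;
}
```

With a dedicated qualifier, code that walks the table can tell the
predicate operand apart from an ordinary unsigned argument, which the
old table does not allow.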

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.c (CX_UNARY_UNONE_QUALIFIERS): Use
	predicate.
	(CX_BINARY_UNONE_QUALIFIERS): Likewise.
	(CX_TERNARY_UNONE_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
	(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
	(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete.
	* config/arm/arm_mve_builtins.def: Use predicate qualifiers.
	* config/arm/mve.md: Use VxBI instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 73678a00398..f9437752a22 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -295,7 +295,7 @@ static enum arm_type_qualifiers
 arm_cx_unary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate, qualifier_none,
       qualifier_unsigned_immediate,
-      qualifier_unsigned };
+      qualifier_predicate };
 #define CX_UNARY_UNONE_QUALIFIERS (arm_cx_unary_unone_qualifiers)
 
 /* T (immediate, T, T, unsigned immediate).  */
@@ -304,7 +304,7 @@ arm_cx_binary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
       qualifier_none, qualifier_none,
       qualifier_unsigned_immediate,
-      qualifier_unsigned };
+      qualifier_predicate };
 #define CX_BINARY_UNONE_QUALIFIERS (arm_cx_binary_unone_qualifiers)
 
 /* T (immediate, T, T, T, unsigned immediate).  */
@@ -313,7 +313,7 @@ arm_cx_ternary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
       qualifier_none, qualifier_none, qualifier_none,
       qualifier_unsigned_immediate,
-      qualifier_unsigned };
+      qualifier_predicate };
 #define CX_TERNARY_UNONE_QUALIFIERS (arm_cx_ternary_unone_qualifiers)
 
 /* The first argument (return type) of a store should be void type,
@@ -509,12 +509,6 @@ arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS \
   (arm_ternop_none_none_none_imm_qualifiers)
 
-static enum arm_type_qualifiers
-arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
@@ -567,13 +561,6 @@ arm_quadop_unone_unone_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS \
   (arm_quadop_unone_unone_none_none_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_none_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
-    qualifier_unsigned };
-#define QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_quadop_none_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_none_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
@@ -588,13 +575,6 @@ arm_quadop_none_none_none_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS \
   (arm_quadop_none_none_none_imm_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_unone_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
-    qualifier_unsigned, qualifier_unsigned };
-#define QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
-  (arm_quadop_unone_unone_unone_unone_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_unone_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 7db6d47867e..1c8ee34f5cb 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -87,8 +87,8 @@ VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u, v16qi, v8hi, v4si, v2di)
 VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
-VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
-VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
+VAR1 (BINOP_NONE_NONE_PRED, vaddlvq_p_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_PRED, vaddlvq_p_u, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
@@ -465,20 +465,20 @@ VAR2 (TERNOP_NONE_NONE_NONE_IMM, vqshrnbq_n_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_IMM, vqrshrntq_n_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_IMM_PRED, vorrq_m_n_s, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_IMM_PRED, vmvnq_m_n_s, v8hi, v4si)
-VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrmlaldavhq_p_u, v4si)
-VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrev16q_m_u, v16qi)
-VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vaddlvaq_p_u, v4si)
-VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrmlsldavhxq_p_s, v4si)
-VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrmlsldavhq_p_s, v4si)
-VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrmlaldavhxq_p_s, v4si)
-VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrmlaldavhq_p_s, v4si)
-VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrev32q_m_f, v8hf)
-VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrev16q_m_s, v16qi)
-VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vcvttq_m_f32_f16, v4sf)
-VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vcvttq_m_f16_f32, v8hf)
-VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vcvtbq_m_f32_f16, v4sf)
-VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vcvtbq_m_f16_f32, v8hf)
-VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vaddlvaq_p_s, v4si)
+VAR1 (TERNOP_UNONE_UNONE_UNONE_PRED, vrmlaldavhq_p_u, v4si)
+VAR1 (TERNOP_UNONE_UNONE_UNONE_PRED, vrev16q_m_u, v16qi)
+VAR1 (TERNOP_UNONE_UNONE_UNONE_PRED, vaddlvaq_p_u, v4si)
+VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrmlsldavhxq_p_s, v4si)
+VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrmlsldavhq_p_s, v4si)
+VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrmlaldavhxq_p_s, v4si)
+VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrmlaldavhq_p_s, v4si)
+VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrev32q_m_f, v8hf)
+VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrev16q_m_s, v16qi)
+VAR1 (TERNOP_NONE_NONE_NONE_PRED, vcvttq_m_f32_f16, v4sf)
+VAR1 (TERNOP_NONE_NONE_NONE_PRED, vcvttq_m_f16_f32, v8hf)
+VAR1 (TERNOP_NONE_NONE_NONE_PRED, vcvtbq_m_f32_f16, v4sf)
+VAR1 (TERNOP_NONE_NONE_NONE_PRED, vcvtbq_m_f16_f32, v8hf)
+VAR1 (TERNOP_NONE_NONE_NONE_PRED, vaddlvaq_p_s, v4si)
 VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlsldavhaxq_s, v4si)
 VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlsldavhaq_s, v4si)
 VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlaldavhaxq_s, v4si)
@@ -629,11 +629,11 @@ VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqshrntq_m_n_s, v8hi, v4si)
 VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqshrnbq_m_n_s, v8hi, v4si)
 VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqrshrntq_m_n_s, v8hi, v4si)
 VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqrshrnbq_m_n_s, v8hi, v4si)
-VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vrmlaldavhaq_p_u, v4si)
-VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlsldavhaxq_p_s, v4si)
-VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlsldavhaq_p_s, v4si)
-VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlaldavhaxq_p_s, v4si)
-VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlaldavhaq_p_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vrmlaldavhaq_p_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vrmlsldavhaxq_p_s, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vrmlsldavhaq_p_s, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vrmlaldavhaxq_p_s, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vrmlaldavhaq_p_s, v4si)
 VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_PRED, vcvtq_m_n_from_f_u, v8hi, v4si)
 VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vcvtq_m_n_from_f_s, v8hi, v4si)
 VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbrsrq_m_n_f, v8hf, v4sf)
@@ -845,14 +845,14 @@ VAR1 (BINOP_NONE_NONE_NONE, vsbciq_s, v4si)
 VAR1 (BINOP_UNONE_UNONE_UNONE, vsbciq_u, v4si)
 VAR1 (BINOP_NONE_NONE_NONE, vsbcq_s, v4si)
 VAR1 (BINOP_UNONE_UNONE_UNONE, vsbcq_u, v4si)
-VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vadciq_m_s, v4si)
-VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vadciq_m_u, v4si)
-VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vadcq_m_s, v4si)
-VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vadcq_m_u, v4si)
-VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsbciq_m_s, v4si)
-VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbciq_m_u, v4si)
-VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsbcq_m_s, v4si)
-VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbcq_m_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vadciq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vadciq_m_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vadcq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vadcq_m_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vsbciq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vsbciq_m_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vsbcq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vsbcq_m_u, v4si)
 VAR5 (STORE1, vst2q, v16qi, v8hi, v4si, v8hf, v4sf)
 VAR5 (LOAD1, vld4q, v16qi, v8hi, v4si, v8hf, v4sf)
 VAR5 (LOAD1, vld2q, v16qi, v8hi, v4si, v8hf, v4sf)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 9633b7187f6..41e85b1a278 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -826,7 +826,7 @@ (define_insn "mve_vaddlvq_p_<supf>v4si"
   [
    (set (match_operand:DI 0 "s_register_operand" "=r")
 	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
-		    (match_operand:HI 2 "vpr_register_operand" "Up")]
+		    (match_operand:V4BI 2 "vpr_register_operand" "Up")]
 	 VADDLVQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -3739,7 +3739,7 @@ (define_insn "mve_vaddlvaq_p_<supf>v4si"
    (set (match_operand:DI 0 "s_register_operand" "=r")
 	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
 		       (match_operand:V4SI 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	 VADDLVAQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -3949,7 +3949,7 @@ (define_insn "mve_vcvtbq_m_f16_f32v8hf"
    (set (match_operand:V8HF 0 "s_register_operand" "=w")
 	(unspec:V8HF [(match_operand:V8HF 1 "s_register_operand" "0")
 		       (match_operand:V4SF 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCVTBQ_M_F16_F32))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3965,7 +3965,7 @@ (define_insn "mve_vcvtbq_m_f32_f16v4sf"
    (set (match_operand:V4SF 0 "s_register_operand" "=w")
 	(unspec:V4SF [(match_operand:V4SF 1 "s_register_operand" "0")
 		       (match_operand:V8HF 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCVTBQ_M_F32_F16))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3981,7 +3981,7 @@ (define_insn "mve_vcvttq_m_f16_f32v8hf"
    (set (match_operand:V8HF 0 "s_register_operand" "=w")
 	(unspec:V8HF [(match_operand:V8HF 1 "s_register_operand" "0")
 		       (match_operand:V4SF 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCVTTQ_M_F16_F32))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -3997,7 +3997,7 @@ (define_insn "mve_vcvttq_m_f32_f16v4sf"
    (set (match_operand:V4SF 0 "s_register_operand" "=w")
 	(unspec:V4SF [(match_operand:V4SF 1 "s_register_operand" "0")
 		       (match_operand:V8HF 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VCVTTQ_M_F32_F16))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4595,7 +4595,7 @@ (define_insn "mve_vrev32q_m_fv8hf"
    (set (match_operand:V8HF 0 "s_register_operand" "=w")
 	(unspec:V8HF [(match_operand:V8HF 1 "s_register_operand" "0")
 		       (match_operand:V8HF 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VREV32Q_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -4659,7 +4659,7 @@ (define_insn "mve_vrmlaldavhxq_p_sv4si"
    (set (match_operand:DI 0 "s_register_operand" "=r")
 	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
 		       (match_operand:V4SI 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VRMLALDAVHXQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -4691,7 +4691,7 @@ (define_insn "mve_vrmlsldavhq_p_sv4si"
    (set (match_operand:DI 0 "s_register_operand" "=r")
 	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
 		       (match_operand:V4SI 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VRMLSLDAVHQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -4707,7 +4707,7 @@ (define_insn "mve_vrmlsldavhxq_p_sv4si"
    (set (match_operand:DI 0 "s_register_operand" "=r")
 	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
 		       (match_operand:V4SI 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
 	 VRMLSLDAVHXQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -4932,7 +4932,7 @@ (define_insn "mve_vrev16q_m_<supf>v16qi"
    (set (match_operand:V16QI 0 "s_register_operand" "=w")
 	(unspec:V16QI [(match_operand:V16QI 1 "s_register_operand" "0")
 		       (match_operand:V16QI 2 "s_register_operand" "w")
-		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+		       (match_operand:V16BI 3 "vpr_register_operand" "Up")]
 	 VREV16Q_M))
   ]
   "TARGET_HAVE_MVE"
@@ -4964,7 +4964,7 @@ (define_insn "mve_vrmlaldavhq_p_<supf>v4si"
    (set (match_operand:DI 0 "s_register_operand" "=r")
 	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
 		    (match_operand:V4SI 2 "s_register_operand" "w")
-		    (match_operand:HI 3 "vpr_register_operand" "Up")]
+		    (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 	 VRMLALDAVHQ_P))
   ]
   "TARGET_HAVE_MVE"
@@ -6233,7 +6233,7 @@ (define_insn "mve_vrmlaldavhaq_p_sv4si"
 	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
 		       (match_operand:V4SI 2 "s_register_operand" "w")
 		       (match_operand:V4SI 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VRMLALDAVHAQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6556,7 +6556,7 @@ (define_insn "mve_vrmlaldavhaq_p_uv4si"
 	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
 		       (match_operand:V4SI 2 "s_register_operand" "w")
 		       (match_operand:V4SI 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VRMLALDAVHAQ_P_U))
   ]
   "TARGET_HAVE_MVE"
@@ -6573,7 +6573,7 @@ (define_insn "mve_vrmlaldavhaxq_p_sv4si"
 	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
 		       (match_operand:V4SI 2 "s_register_operand" "w")
 		       (match_operand:V4SI 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VRMLALDAVHAXQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6590,7 +6590,7 @@ (define_insn "mve_vrmlsldavhaq_p_sv4si"
 	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
 		       (match_operand:V4SI 2 "s_register_operand" "w")
 		       (match_operand:V4SI 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VRMLSLDAVHAQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -6607,7 +6607,7 @@ (define_insn "mve_vrmlsldavhaxq_p_sv4si"
 	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
 		       (match_operand:V4SI 2 "s_register_operand" "w")
 		       (match_operand:V4SI 3 "s_register_operand" "w")
-		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
 	 VRMLSLDAVHAXQ_P_S))
   ]
   "TARGET_HAVE_MVE"
@@ -7528,7 +7528,7 @@ (define_insn "mve_vldrhq_<supf><mode>"
 (define_insn "mve_vldrhq_z_fv8hf"
   [(set (match_operand:V8HF 0 "s_register_operand" "=w")
 	(unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand" "Ux")
-	(match_operand:HI 2 "vpr_register_operand" "Up")]
+	(match_operand:<MVE_VPRED> 2 "vpr_register_operand" "Up")]
 	 VLDRHQ_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -8303,7 +8303,7 @@ (define_insn "mve_vstrwq_fv4sf"
 (define_insn "mve_vstrwq_p_fv4sf"
   [(set (match_operand:V4SI 0 "mve_memory_operand" "=Ux")
 	(unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")
-		      (match_operand:HI 2 "vpr_register_operand" "Up")]
+		      (match_operand:<MVE_VPRED> 2 "vpr_register_operand" "Up")]
 	 VSTRWQ_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
@@ -9844,7 +9844,7 @@ (define_insn "mve_vadciq_m_<supf>v4si"
 	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "0")
 		      (match_operand:V4SI 2 "s_register_operand" "w")
 		      (match_operand:V4SI 3 "s_register_operand" "w")
-		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+		      (match_operand:V4BI 4 "vpr_register_operand" "Up")]
 	 VADCIQ_M))
    (set (reg:SI VFPCC_REGNUM)
 	(unspec:SI [(const_int 0)]
@@ -9880,7 +9880,7 @@ (define_insn "mve_vadcq_m_<supf>v4si"
 	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "0")
 		      (match_operand:V4SI 2 "s_register_operand" "w")
 		      (match_operand:V4SI 3 "s_register_operand" "w")
-		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+		      (match_operand:V4BI 4 "vpr_register_operand" "Up")]
 	 VADCQ_M))
    (set (reg:SI VFPCC_REGNUM)
 	(unspec:SI [(reg:SI VFPCC_REGNUM)]
@@ -9917,7 +9917,7 @@ (define_insn "mve_vsbciq_m_<supf>v4si"
 	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
 		      (match_operand:V4SI 2 "s_register_operand" "w")
 		      (match_operand:V4SI 3 "s_register_operand" "w")
-		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+		      (match_operand:V4BI 4 "vpr_register_operand" "Up")]
 	 VSBCIQ_M))
    (set (reg:SI VFPCC_REGNUM)
 	(unspec:SI [(const_int 0)]
@@ -9953,7 +9953,7 @@ (define_insn "mve_vsbcq_m_<supf>v4si"
 	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
 		      (match_operand:V4SI 2 "s_register_operand" "w")
 		      (match_operand:V4SI 3 "s_register_operand" "w")
-		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+		      (match_operand:V4BI 4 "vpr_register_operand" "Up")]
 	 VSBCQ_M))
    (set (reg:SI VFPCC_REGNUM)
 	(unspec:SI [(reg:SI VFPCC_REGNUM)]
@@ -10457,7 +10457,7 @@ (define_insn "arm_vcx1q<a>_p_v16qi"
 	(unspec:V16QI [(match_operand:SI 1 "const_int_coproc_operand" "i")
 			   (match_operand:V16QI 2 "register_operand" "0")
 			   (match_operand:SI 3 "const_int_mve_cde1_operand" "i")
-			   (match_operand:HI 4 "vpr_register_operand" "Up")]
+			   (match_operand:V16BI 4 "vpr_register_operand" "Up")]
 	 CDE_VCX))]
   "TARGET_CDE && TARGET_HAVE_MVE"
   "vpst\;vcx1<a>t\\tp%c1, %q0, #%c3"
@@ -10471,7 +10471,7 @@ (define_insn "arm_vcx2q<a>_p_v16qi"
 			  (match_operand:V16QI 2 "register_operand" "0")
 			  (match_operand:V16QI 3 "register_operand" "t")
 			  (match_operand:SI 4 "const_int_mve_cde2_operand" "i")
-			  (match_operand:HI 5 "vpr_register_operand" "Up")]
+			  (match_operand:V16BI 5 "vpr_register_operand" "Up")]
 	 CDE_VCX))]
   "TARGET_CDE && TARGET_HAVE_MVE"
   "vpst\;vcx2<a>t\\tp%c1, %q0, %q3, #%c4"
@@ -10486,7 +10486,7 @@ (define_insn "arm_vcx3q<a>_p_v16qi"
 			  (match_operand:V16QI 3 "register_operand" "t")
 			  (match_operand:V16QI 4 "register_operand" "t")
 			  (match_operand:SI 5 "const_int_mve_cde3_operand" "i")
-			  (match_operand:HI 6 "vpr_register_operand" "Up")]
+			  (match_operand:V16BI 6 "vpr_register_operand" "Up")]
 	 CDE_VCX))]
   "TARGET_CDE && TARGET_HAVE_MVE"
   "vpst\;vcx3<a>t\\tp%c1, %q0, %q3, %q4, #%c5"
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 14/15] arm: Add VPR_REG to ALL_REGS
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (12 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 13/15] arm: Convert more MVE/CDE " Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-13 14:56 ` [PATCH v3 15/15] arm: Fix constraint check for V8HI in mve_vector_mem_operand Christophe Lyon
  2022-01-14 13:18 ` [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
  15 siblings, 0 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

VPR_REG should be part of ALL_REGS; this patch fixes that omission.
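
For illustration only, here is a minimal standalone sketch of what the
mask change means.  It assumes the usual layout in which each class is
four 32-bit words and bit N of word W stands for hard register
32 * W + N.  The reg_set type and is_subset helper are hypothetical
names invented for this sketch, not GCC code; only the mask values are
copied from the hunk below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a register class: four 32-bit words,
   where bit N of word W is assumed to mean hard register 32 * W + N.  */
typedef struct { uint32_t w[4]; } reg_set;

/* Mask values copied from the REG_CLASS_CONTENTS hunk in this patch.  */
static const reg_set vpr_reg      = {{ 0x00000000, 0x00000000, 0x00000000, 0x00000400 }};
static const reg_set all_regs_old = {{ 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F }};
static const reg_set all_regs_new = {{ 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000040F }};

/* Return true if every register in SUB is also present in SUPER.  */
static bool is_subset(const reg_set *sub, const reg_set *super)
{
  for (int i = 0; i < 4; i++)
    if (sub->w[i] & ~super->w[i])
      return false;
  return true;
}

/* Check the property this patch is about: VPR_REG was missing from
   the old ALL_REGS mask and is covered by the new one.  */
static void check_all_regs(void)
{
  assert(!is_subset(&vpr_reg, &all_regs_old));
  assert(is_subset(&vpr_reg, &all_regs_new));
}
```

The only difference between the two ALL_REGS masks is bit 10 of the
last word (0x0000000F | 0x00000400 == 0x0000040F), which is the bit
used by VPR_REG.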

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	* config/arm/arm.h (REG_CLASS_CONTENTS): Add VPR_REG to ALL_REGS.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 2416fb5ef64..ea9fb16b9b1 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1347,7 +1347,7 @@ enum reg_class
   { 0x00000000, 0x00000000, 0x00000000, 0x00000080 }, /* AFP_REG */	\
   { 0x00000000, 0x00000000, 0x00000000, 0x00000400 }, /* VPR_REG.  */	\
   { 0x00005FFF, 0x00000000, 0x00000000, 0x00000400 }, /* GENERAL_AND_VPR_REGS.  */ \
-  { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F }  /* ALL_REGS.  */	\
+  { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000040F }  /* ALL_REGS.  */	\
 }
 
 #define FP_SYSREGS \
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 15/15] arm: Fix constraint check for V8HI in mve_vector_mem_operand
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (13 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 14/15] arm: Add VPR_REG to ALL_REGS Christophe Lyon
@ 2022-01-13 14:56 ` Christophe Lyon
  2022-01-14 17:03   ` [arm] MVE: Relax addressing modes for full loads and stores Andre Vieira (lists)
  2022-01-14 13:18 ` [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
  15 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-13 14:56 UTC (permalink / raw)
  To: gcc-patches

When compiling gcc.target/arm/mve/intrinsics/mve_immediates_1_n.c with
-mthumb -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp, the compiler
crashes because:
error: insn does not satisfy its constraints:
(insn 28 14 17 2 (set (reg:V8HI 16 s0 [orig:249 u16 ] [249])
    (mem/c:V8HI (pre_modify:SI (reg/f:SI 12 ip [248])
            (plus:SI (reg/f:SI 12 ip [248])
                (const_int 32 [0x20]))) [1 u16+0 S16 A64])) "arm_mve.h":17113:10 3011 {*mve_movv8hi}
    (expr_list:REG_INC (reg/f:SI 12 ip [248])
      (nil)))
during RTL pass: reload

We are trying to generate:
vldrh.16        q3, [ip], #14
but the constraint check fails because ip is not a low reg.

This patch replaces LAST_LO_REGNUM with LAST_ARM_REGNUM in
mve_vector_mem_operand, which avoids the ICE.
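
For illustration only, a standalone sketch of the check touched by the
hunk below.  The function name halfword_addr_ok and the constant
FIRST_PSEUDO_REGISTER_SKETCH are hypothetical; the register numbers
(7 for the last low register, 12 for ip, 15 for the last core
register) are what I believe arm.h uses, so treat them as assumptions.
The offset test itself is copied from the patched code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Register numbers assumed for this sketch.  The pseudo-register
   threshold is a placeholder, not the real arm.h value.  */
enum {
  LAST_LO_REGNUM = 7,
  IP_REGNUM = 12,
  LAST_ARM_REGNUM = 15,
  FIRST_PSEUDO_REGISTER_SKETCH = 128
};

/* Mirror of the halfword-element case: the offset must be even and
   within +/-254, and the base must be either a hard register up to
   LAST_ALLOWED or a pseudo register.  */
static bool halfword_addr_ok(int reg_no, int val, int last_allowed)
{
  if (val % 2 == 0 && abs(val) <= 254)
    return reg_no <= last_allowed
           || reg_no >= FIRST_PSEUDO_REGISTER_SKETCH;
  return false;
}

/* The address from the ICE above: base ip, offset 32.  */
static void check_ip_base(void)
{
  /* Rejected before the patch, because ip is not a low register.  */
  assert(!halfword_addr_ok(IP_REGNUM, 32, LAST_LO_REGNUM));
  /* Accepted after the patch.  */
  assert(halfword_addr_ok(IP_REGNUM, 32, LAST_ARM_REGNUM));
}
```

With the old bound the ip-based address is rejected even though the
offset is valid; with the new bound any core register passes as long
as the offset is even and within range.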

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/
	* config/arm/arm.c (mve_vector_mem_operand): Fix handling of V8HI.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d56fa71806..5edca248fb7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13479,7 +13479,7 @@ mve_vector_mem_operand (machine_mode mode, rtx op, bool strict)
 	  case E_V4HImode:
 	  case E_V4HFmode:
 	    if (val % 2 == 0 && abs (val) <= 254)
-	      return reg_no <= LAST_LO_REGNUM
+	      return reg_no <= LAST_ARM_REGNUM
 		|| reg_no >= FIRST_PSEUDO_REGISTER;
 	    return FALSE;
 	  case E_V4SImode:
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates
  2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
                   ` (14 preceding siblings ...)
  2022-01-13 14:56 ` [PATCH v3 15/15] arm: Fix constraint check for V8HI in mve_vector_mem_operand Christophe Lyon
@ 2022-01-14 13:18 ` Christophe Lyon
  2022-01-14 13:33   ` Richard Biener
  15 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-14 13:18 UTC (permalink / raw)
  To: GCC Patches

Hi,

I hadn't realized we are moving to stage 4 this weekend :-(

The PRs I'm fixing are P3, but without these fixes MVE support is badly
broken, so I think it would be really good to fix that before the buggy
version becomes part of an actual release.
Anyway, I posted v1 of the patches during stage1, so should it still be OK
if they are accepted as-is?

Thanks,

Christophe

On Thu, Jan 13, 2022 at 3:58 PM Christophe Lyon via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
> This is v3 of this patch series, fixing issues I discovered before
> committing v2 (which had been approved).
>
> Thanks a lot to Richard Sandiford for his help.
>
> The changes v2 -> v3 are:
>
> Patch 4: Fix arm_hard_regno_nregs and CLASS_MAX_NREGS to support VPR.
>
> Patch 7: Changes to the underlying representation of vectors of
> booleans to account for the different expectations between AArch64/SVE
> and Arm/MVE.
>
> Patch 8: Re-use and extend existing thumb2_movhi* patterns instead of
> duplicating them in mve_mov<mode>. This requires the introduction of a
> new constraint to match a constant vector of booleans. Add a new RTL
> test.
>
> Patch 9: Introduce check_effective_target_arm_mve and skip
> gcc.dg/signbit-2.c, because with MVE there is no fallback architecture
> unlike SVE or AVX512.
>
> Patch 12: Update less load/store MVE builtins
> (mve_vldrdq_gather_base_z_<supf>v2di,
> mve_vldrdq_gather_offset_z_<supf>v2di,
> mve_vldrdq_gather_shifted_offset_z_<supf>v2di,
> mve_vstrdq_scatter_base_p_<supf>v2di,
> mve_vstrdq_scatter_offset_p_<supf>v2di,
> mve_vstrdq_scatter_offset_p_<supf>v2di_insn,
> mve_vstrdq_scatter_shifted_offset_p_<supf>v2di,
> mve_vstrdq_scatter_shifted_offset_p_<supf>v2di_insn,
> mve_vstrdq_scatter_base_wb_p_<supf>v2di,
> mve_vldrdq_gather_base_wb_z_<supf>v2di,
> mve_vldrdq_gather_base_nowb_z_<supf>v2di,
> mve_vldrdq_gather_base_wb_z_<supf>v2di_insn) for which we keep HI mode
> for vpr_register_operand.
>
> Patch 13: No need to update
> gcc.target/arm/acle/cde-mve-full-assembly.c anymore since we re-use
> the mov pattern that emits '@ movhi' in the assembly.
>
> Patch 15: This is a new patch to fix a problem I noticed during this
> v2->v3 update.
>
>
>
> I'll squash patch 2 with patch 9 and patch 3 with patch 8.
>
> Original text:
>
> This patch series addresses PR 100757 and 101325 by representing
> vectors of predicates (MVE VPR.P0 register) as vectors of booleans
> rather than using HImode.
>
> As this implies a lot of mostly mechanical changes, I have tried to
> split the patches in a way that should help reviewers, but the split
> is a bit artificial.
>
> Patches 1-3 add new tests.
>
> Patches 4-6 are small independent improvements.
>
> Patch 7 implements the predicate qualifier, but does not change any
> builtin yet.
>
> Patch 8 is the first of the two main patches, and uses the new
> qualifier to describe the vcmp and vpsel builtins that are useful for
> auto-vectorization of comparisons.
>
> Patch 9 is the second main patch, which fixes the vcond_mask expander.
>
> Patches 10-13 convert almost all the remaining builtins with HI
> operands to use the predicate qualifier.  After these, there are still
> a few builtins with HI operands left, about which I am not sure: vctp,
> vpnot, load-gather and store-scatter with v2di operands.  In fact,
> patches 11/12 update some STR/LDR qualifiers in a way that breaks
> these v2di builtins although existing tests still pass.
>
> Christophe Lyon (15):
>   arm: Add new tests for comparison vectorization with Neon and MVE
>   arm: Add tests for PR target/100757
>   arm: Add tests for PR target/101325
>   arm: Add GENERAL_AND_VPR_REGS regclass
>   arm: Add support for VPR_REG in arm_class_likely_spilled_p
>   arm: Fix mve_vmvnq_n_<supf><mode> argument mode
>   arm: Implement MVE predicates as vectors of booleans
>   arm: Implement auto-vectorized MVE comparisons with vectors of boolean
>     predicates
>   arm: Fix vcond_mask expander for MVE (PR target/100757)
>   arm: Convert remaining MVE vcmp builtins to predicate qualifiers
>   arm: Convert more MVE builtins to predicate qualifiers
>   arm: Convert more load/store MVE builtins to predicate qualifiers
>   arm: Convert more MVE/CDE builtins to predicate qualifiers
>   arm: Add VPR_REG to ALL_REGS
>   arm: Fix constraint check for V8HI in mve_vector_mem_operand
>
>  gcc/config/aarch64/aarch64-modes.def          |   8 +-
>  gcc/config/arm/arm-builtins.c                 | 224 +++--
>  gcc/config/arm/arm-builtins.h                 |   4 +-
>  gcc/config/arm/arm-modes.def                  |   8 +
>  gcc/config/arm/arm-protos.h                   |   4 +-
>  gcc/config/arm/arm-simd-builtin-types.def     |   4 +
>  gcc/config/arm/arm.c                          | 169 ++--
>  gcc/config/arm/arm.h                          |   9 +-
>  gcc/config/arm/arm_mve_builtins.def           | 746 ++++++++--------
>  gcc/config/arm/constraints.md                 |   6 +
>  gcc/config/arm/iterators.md                   |   6 +
>  gcc/config/arm/mve.md                         | 795 ++++++++++--------
>  gcc/config/arm/neon.md                        |  39 +
>  gcc/config/arm/vec-common.md                  |  52 --
>  gcc/config/arm/vfp.md                         |  34 +-
>  gcc/doc/sourcebuild.texi                      |   4 +
>  gcc/emit-rtl.c                                |  20 +-
>  gcc/genmodes.c                                |  81 +-
>  gcc/machmode.def                              |   2 +-
>  gcc/rtx-vector-builder.c                      |   4 +-
>  gcc/simplify-rtx.c                            |  34 +-
>  gcc/testsuite/gcc.dg/signbit-2.c              |   1 +
>  .../gcc.target/arm/simd/mve-vcmp-f32-2.c      |  32 +
>  .../gcc.target/arm/simd/neon-compare-1.c      |  78 ++
>  .../gcc.target/arm/simd/neon-compare-2.c      |  13 +
>  .../gcc.target/arm/simd/neon-compare-3.c      |  14 +
>  .../arm/simd/neon-compare-scalar-1.c          |  57 ++
>  .../gcc.target/arm/simd/neon-vcmp-f16.c       |  12 +
>  .../gcc.target/arm/simd/neon-vcmp-f32-2.c     |  15 +
>  .../gcc.target/arm/simd/neon-vcmp-f32-3.c     |  12 +
>  .../gcc.target/arm/simd/neon-vcmp-f32.c       |  12 +
>  gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c |  22 +
>  .../gcc.target/arm/simd/pr100757-2.c          |  20 +
>  .../gcc.target/arm/simd/pr100757-3.c          |  20 +
>  .../gcc.target/arm/simd/pr100757-4.c          |  19 +
>  gcc/testsuite/gcc.target/arm/simd/pr100757.c  |  19 +
>  .../gcc.target/arm/simd/pr101325-2.c          |  19 +
>  gcc/testsuite/gcc.target/arm/simd/pr101325.c  |  14 +
>  gcc/testsuite/lib/target-supports.exp         |  15 +-
>  gcc/varasm.c                                  |   7 +-
>  40 files changed, 1635 insertions(+), 1019 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-3.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-scalar-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f16.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-3.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr101325.c
>
> --
> 2.25.1
>
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates
  2022-01-14 13:18 ` [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
@ 2022-01-14 13:33   ` Richard Biener
  2022-01-14 14:22     ` Kyrylo Tkachov
  0 siblings, 1 reply; 54+ messages in thread
From: Richard Biener @ 2022-01-14 13:33 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: GCC Patches

On Fri, Jan 14, 2022 at 2:18 PM Christophe Lyon via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> I hadn't realized we are moving to stage 4 this week-end :-(
>
> The PRs I'm fixing are P3, but without these fixes MVE support is badly
> broken, so I think it would be really good to fix that before the buggy
> version becomes part of an actual release.
> Anyway I posted v1 of the patches during stage1, so it should still be OK
> if they are accepted as-is ?

In the end it's up to the target maintainers to weigh the risk of breakage
vs. the risk of not being useful ;)  But stage3 is where the "was posted
during stage1" rule can easily apply - at some point we have to stop with
such general ruling.

Richard.


* RE: [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates
  2022-01-14 13:33   ` Richard Biener
@ 2022-01-14 14:22     ` Kyrylo Tkachov
  2022-01-26  8:40       ` Christophe Lyon
  0 siblings, 1 reply; 54+ messages in thread
From: Kyrylo Tkachov @ 2022-01-14 14:22 UTC (permalink / raw)
  To: Richard Biener, Christophe Lyon; +Cc: gcc-patches

Hi Christophe, Richard,

> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Richard
> Biener via Gcc-patches
> Sent: Friday, January 14, 2022 1:33 PM
> To: Christophe Lyon <christophe.lyon.oss@gmail.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH v3 00/15] ARM/MVE use vectors of boolean for
> predicates
> 
> On Fri, Jan 14, 2022 at 2:18 PM Christophe Lyon via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi,
> >
> > I hadn't realized we are moving to stage 4 this week-end :-(
> >
> > The PRs I'm fixing are P3, but without these fixes MVE support is badly
> > broken, so I think I would be really good to fix that before the buggy
> > version becomes part of an actual release.
> > Anyway I posted v1 of the patches during stage1, so it should still be OK
> > if they are accepted as-is ?
> 
> In the end it's up to the target maintainers to weight the risk of breakage
> vs. the risk of not usefulness ;)  But stage3 is where the "was posted
> during stage1"
> rule can easily apply - at some point we have to stop with such general ruling.
> 

Thanks, that's in line with my interpretation.
These patches resolve some nasty brokenness in the MVE support that I'm keen to see fixed, and from what I can tell they shouldn't have a large effect on non-MVE code.
So the risk vs reward balance for the arm port as a whole looks good to me.
Andre has kindly agreed to help review the patches, and I'll also try to get to them today and next week so that we can get them in early in stage 4.

Thanks,
Kyrill

> Richard.
> 
> > Thanks,
> >
> > Christophe
> >


* [arm] MVE: Relax addressing modes for full loads and stores
  2022-01-13 14:56 ` [PATCH v3 15/15] arm: Fix constraint check for V8HI in mve_vector_mem_operand Christophe Lyon
@ 2022-01-14 17:03   ` Andre Vieira (lists)
  2022-01-17  7:48     ` Christophe Lyon
  0 siblings, 1 reply; 54+ messages in thread
From: Andre Vieira (lists) @ 2022-01-14 17:03 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 734 bytes --]

Hi Christophe,

This patch relaxes the addressing modes for the MVE full loads and stores
(by full loads and stores I mean non-widening loads and non-narrowing
stores, respectively). The previous code required a LO_REGNUM base
register for these, whereas that is only a requirement if the load is
widening or the store is narrowing.

So with this your patch should not be necessary.

Regression tested on arm-none-eabi-gcc.  Can you please confirm this 
fixes the issue you were seeing too?
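
For readers following along, the relaxed check can be modelled outside
GCC roughly as below. This is only a sketch under assumptions: the enum
vec_mode, the helper names, and the register numbers (r0-r7 as low
registers, r13 as SP, r15 as PC, pseudos starting at an arbitrary 100)
are illustrative stand-ins, not the definitions from arm.h or arm.c.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative stand-ins for GCC's machine modes (not the real enum).  */
enum vec_mode { V16QI, V8QI, V4QI, V8HI, V8HF, V4HI, V4HF, V4SI, V4SF, V2DI };

/* Assumed register numbering for this sketch only.  */
enum { LAST_LO_REG = 7, SP_REG = 13, LAST_ARM_REG = 15, FIRST_PSEUDO_REG = 100 };

/* Modes used by narrowing stores or widening loads; mirrors the
   MVE_STN_LDW_MODE macro added by the patch.  */
bool
stn_ldw_mode_p (enum vec_mode mode)
{
  return mode == V4QI || mode == V8QI || mode == V4HI;
}

/* The offset range and alignment depend only on the element size.  */
bool
offset_ok_p (enum vec_mode mode, int val)
{
  switch (mode)
    {
    case V16QI: case V8QI: case V4QI:
      return abs (val) <= 127;
    case V8HI: case V8HF: case V4HI: case V4HF:
      return val % 2 == 0 && abs (val) <= 254;
    case V4SI: case V4SF:
      return val % 4 == 0 && abs (val) <= 508;
    default:
      return false;
    }
}

/* Only widening loads / narrowing stores need a low base register;
   everything else accepts any core register except SP and PC.  */
bool
base_ok_p (enum vec_mode mode, int reg_no)
{
  if (reg_no >= FIRST_PSEUDO_REG)
    return true;
  if (stn_ldw_mode_p (mode))
    return reg_no <= LAST_LO_REG;
  return reg_no < LAST_ARM_REG && reg_no != SP_REG;
}

bool
mem_operand_ok_p (enum vec_mode mode, int reg_no, int val)
{
  return offset_ok_p (mode, val) && base_ok_p (mode, reg_no);
}
```

Under these assumptions a V8HI access based on r8 is accepted, while
the V4HI (widening/narrowing) form still needs a low register.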

gcc/ChangeLog:

         * config/arm/arm.h (MVE_STN_LDW_MODE): New macro.
         * config/arm/arm.c (mve_vector_mem_operand): Relax constraint on
         base register for non-widening loads and non-narrowing stores.


Kind Regards,
Andre Vieira

[-- Attachment #2: mve_addressing_modes.patch --]
[-- Type: text/plain, Size: 2047 bytes --]

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index dacce2b7f086eeffb0cd36b26f102f77130a92fa..f39786d0f9e19e81841a45f6d7e92e408272fe23 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1099,6 +1099,10 @@ extern const int arm_arch_cde_coproc_bits[];
   ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \
    || (MODE) == V16QImode)
 
+/* Modes used in MVE's narrowing stores or widening loads.  */
+#define MVE_STN_LDW_MODE(MODE) \
+  ((MODE) == V4QImode || (MODE) == V8QImode || (MODE) == V4HImode)
+
 #define VALID_MVE_SF_MODE(MODE) \
   ((MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DFmode)
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bb75921f32df6185711d5304c044ce67a2d5671e..f5e09cb00b5478546d29c05cc885aeaa89501d39 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13438,27 +13438,28 @@ mve_vector_mem_operand (machine_mode mode, rtx op, bool strict)
 	  case E_V16QImode:
 	  case E_V8QImode:
 	  case E_V4QImode:
-	    if (abs (val) <= 127)
-	      return (reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
-		|| reg_no >= FIRST_PSEUDO_REGISTER;
-	    return FALSE;
+	    if (abs (val) > 127)
+	      return FALSE;
+	    break;
 	  case E_V8HImode:
 	  case E_V8HFmode:
 	  case E_V4HImode:
 	  case E_V4HFmode:
-	    if (val % 2 == 0 && abs (val) <= 254)
-	      return reg_no <= LAST_LO_REGNUM
-		|| reg_no >= FIRST_PSEUDO_REGISTER;
-	    return FALSE;
+	    if (val % 2 != 0 || abs (val) > 254)
+	      return FALSE;
+	    break;
 	  case E_V4SImode:
 	  case E_V4SFmode:
-	    if (val % 4 == 0 && abs (val) <= 508)
-	      return (reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
-		|| reg_no >= FIRST_PSEUDO_REGISTER;
-	    return FALSE;
+	    if (val % 4 != 0 || abs (val) > 508)
+	      return FALSE;
+	    break;
 	  default:
 	    return FALSE;
 	}
+      return reg_no >= FIRST_PSEUDO_REGISTER
+	|| (MVE_STN_LDW_MODE (mode)
+	    ? reg_no <= LAST_LO_REGNUM
+	    : (reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM));
     }
   return FALSE;
 }


* Re: [arm] MVE: Relax addressing modes for full loads and stores
  2022-01-14 17:03   ` [arm] MVE: Relax addressing modes for full loads and stores Andre Vieira (lists)
@ 2022-01-17  7:48     ` Christophe Lyon
  2022-03-07 14:16       ` Andre Vieira (lists)
  0 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-17  7:48 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Christophe Lyon, GCC Patches

Hi André,

On Fri, Jan 14, 2022 at 6:03 PM Andre Vieira (lists) via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi Christophe,
>
> This patch relaxes the addressing modes for the mve full load and stores
> (by full loads and stores I mean non-widening or narrowing loads and
> stores resp). The code before was requiring a LO_REGNUM for these, where
> this is only a requirement if the load is widening or the store narrowing.
>
> So with this your patch should not be necessary.
>
> Regression tested on arm-none-eabi-gcc.  Can you please confirm this
> fixes the issue you were seeing too?
>

Yes, I confirm this fixes the problem I was fixing with my patch #15 in my
MVE/VCMP/VCOND series.
I'll drop it.

Thanks!

Christophe


>
> gcc/ChangeLog:
>
>          * config/arm/arm.h (MVE_STN_LDW_MODE): New MACRO.
>          * config/arm/arm.c (mve_vector_mem_operand): Relax constraint on
>          base register for non widening loads or narrowing stores.
>
>
> Kind Regards,
> Andre Vieira


* Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass
  2022-01-13 14:56 ` [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass Christophe Lyon
@ 2022-01-19 18:17   ` Andre Vieira (lists)
  2022-01-20  9:14     ` Christophe Lyon
  2022-01-27 16:21   ` Kyrylo Tkachov
  1 sibling, 1 reply; 54+ messages in thread
From: Andre Vieira (lists) @ 2022-01-19 18:17 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches

Hi Christophe,

On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> At some point during the development of this patch series, it appeared
> that in some cases the register allocator wants “VPR or general”
> rather than “VPR or general or FP” (which is the same thing as
> ALL_REGS).  The series does not seem to require this anymore, but it
> seems to be a good thing to do anyway, to give the register allocator
> more freedom.
Not sure I fully understand this, but I guess it creates an extra class
that the register allocator can use to group things that can go into VPR
or a general register?
>
> CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
> regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
> -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
I have not looked into this failure, but ...
>
> 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
>
> 	gcc/
> 	* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
> 	(REG_CLASS_NAMES): Likewise.
> 	(REG_CLASS_CONTENTS): Likewise.
> 	(CLASS_MAX_NREGS): Handle VPR.
> 	* config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index bb75921f32d..c3559ca8703 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
>   static unsigned int
>   arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
>   {
> +  if (IS_VPR_REGNUM (regno))
> +    return CEIL (GET_MODE_SIZE (mode), 2);
When do we ever want to use more than 1 register for VPR?
>   
> @@ -1453,7 +1456,9 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS];
>      ARM regs are UNITS_PER_WORD bits.
>      FIXME: Is this true for iWMMX?  */
>   #define CLASS_MAX_NREGS(CLASS, MODE)  \
> -  (ARM_NUM_REGS (MODE))
> +  (CLASS == VPR_REG)		      \
> +  ? CEIL (GET_MODE_SIZE (MODE), 2)    \
> +  : (ARM_NUM_REGS (MODE))
>   
Same.



* Re: [PATCH v3 05/15] arm: Add support for VPR_REG in arm_class_likely_spilled_p
  2022-01-13 14:56 ` [PATCH v3 05/15] arm: Add support for VPR_REG in arm_class_likely_spilled_p Christophe Lyon
@ 2022-01-19 18:25   ` Andre Vieira (lists)
  2022-01-20  9:20     ` Christophe Lyon
  0 siblings, 1 reply; 54+ messages in thread
From: Andre Vieira (lists) @ 2022-01-19 18:25 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches


On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> VPR_REG is the only register in its class, so it should be handled by
> TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
> default_class_likely_spilled_p.  No test fails without this patch, but
> it seems it should be implemented.
>
> 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
>
> 	gcc/
> 	* config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index c3559ca8703..64a8f2dc7de 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -29317,7 +29317,7 @@ arm_class_likely_spilled_p (reg_class_t rclass)
>         || rclass  == CC_REG)
>       return true;
>   
> -  return false;
> +  return default_class_likely_spilled_p (rclass);
>   }
>   
>   /* Implements target hook small_register_classes_for_mode_p.  */
LGTM, but please wait for reviewer approval. I suspect this would also
help avoid spilling other special registers, though I'm not sure we
generate code using any of them often enough to make a difference, which
is why it is likely to have no effect on anything else.
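
As far as I know, default_class_likely_spilled_p simply reports classes
that contain exactly one register. A standalone model of the
before/after behaviour might look like the sketch below; the class
list, the register counts, and the function names are illustrative
assumptions, not the real arm.h tables.

```c
#include <stdbool.h>

/* Illustrative subset of register classes (not GCC's enum reg_class).  */
enum reg_class_model
{
  LO_REGS_M, GENERAL_REGS_M, CC_REG_M, VPR_REG_M, N_CLASSES_M
};

/* Assumed number of registers in each class, for this sketch only.  */
static const unsigned class_size[N_CLASSES_M] = { 8, 16, 1, 1 };

/* Model of the default hook: single-register classes are likely spilled.  */
bool
default_likely_spilled_model (enum reg_class_model rclass)
{
  return class_size[rclass] == 1;
}

/* Before the patch: only the explicitly listed classes return true.  */
bool
old_likely_spilled (enum reg_class_model rclass, bool thumb1)
{
  if ((thumb1 && rclass == LO_REGS_M) || rclass == CC_REG_M)
    return true;
  return false;
}

/* After the patch: fall back to the default, which also covers VPR_REG.  */
bool
new_likely_spilled (enum reg_class_model rclass, bool thumb1)
{
  if ((thumb1 && rclass == LO_REGS_M) || rclass == CC_REG_M)
    return true;
  return default_likely_spilled_model (rclass);
}
```

In this model the only class whose answer changes is VPR_REG, which
would be consistent with no existing test being affected.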


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  2022-01-13 14:56 ` [PATCH v3 06/15] arm: Fix mve_vmvnq_n_<supf><mode> argument mode Christophe Lyon
@ 2022-01-19 19:03   ` Andre Vieira (lists)
  2022-01-20  9:23     ` Christophe Lyon
  2022-01-20 10:45     ` Richard Sandiford
  0 siblings, 2 replies; 54+ messages in thread
From: Andre Vieira (lists) @ 2022-01-19 19:03 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches; +Cc: Kyrylo Tkachov, Richard Sandiford


On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use
> <V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>.
>
> 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
>
> 	gcc/
> 	* config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
> 	for operand 1.
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 171dd384133..5c3b34dce3a 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_<supf><mode>"
>   (define_insn "mve_vmvnq_n_<supf><mode>"
>     [
>      (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> -	(unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
> +	(unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")]
>   	 VMVNQ_N))
>     ]
>     "TARGET_HAVE_MVE"

While fixing this, it might be good to also fix the constraint and
predicate, taking inspiration from "DL" and "neon_inv_logic_op2"
respectively. This would avoid the compiler generating wrong assembly;
instead it would probably lead to the compiler using a load literal.

I kind of think it would be better to have the intrinsic refuse the
immediate altogether, but it seems that for NEON we also use the load
literal approach.



* Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass
  2022-01-19 18:17   ` Andre Vieira (lists)
@ 2022-01-20  9:14     ` Christophe Lyon
  2022-01-20  9:43       ` Andre Vieira (lists)
  0 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-20  9:14 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Christophe Lyon, GCC Patches

On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi Christophe,
>
> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> > At some point during the development of this patch series, it appeared
> > that in some cases the register allocator wants “VPR or general”
> > rather than “VPR or general or FP” (which is the same thing as
> > ALL_REGS).  The series does not seem to require this anymore, but it
> > seems to be a good thing to do anyway, to give the register allocator
> > more freedom.
> Not sure I fully understand this, but I guess it creates an extra class
> the register allocator can use to group things that can go into VPR or
> general reg?
> >
> > CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
> > regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
> > -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
> I have not looked into this failure, but ...
> >
> > 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
> >
> >       gcc/
> >       * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
> >       (REG_CLASS_NAMES): Likewise.
> >       (REG_CLASS_CONTENTS): Likewise.
> >       (CLASS_MAX_NREGS): Handle VPR.
> >       * config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.
> >
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index bb75921f32d..c3559ca8703 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
> >   static unsigned int
> >   arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
> >   {
> > +  if (IS_VPR_REGNUM (regno))
> > +    return CEIL (GET_MODE_SIZE (mode), 2);
> When do we ever want to use more than 1 register for VPR?
>

That was tricky.
Richard Sandiford helped me analyze the problem, I guess I can quote him:

RS> I think the problem is a combination of a few things:
RS>
RS> (1) arm_hard_regno_mode_ok rejects SImode in VPR, so SImode moves
RS>     to or from the VPR_REG class get the maximum cost.
RS>
RS> (2) IRA thinks from CLASS_MAX_NREGS and arm_hard_regno_nregs that
RS>     VPR is big enough to hold SImode.
RS>
RS> (3) If a class C1 is a superset of a class C2, and if C2 is big enough
RS>     to hold a mode M, IRA ensures that move costs for M involving C1
RS>     are >= move costs for M involving C2.
RS>
RS> (1) is correct but (2) isn't.  IMO (3) is dubious: the trigger should
RS> be whether C2 is actually allowed to hold M, not whether C2 is big
RS> enough to hold M.  However, changing that is likely to cause problems
RS> elsewhere, and could lead to classes like GENERAL_AND_FP_REGS being
RS> used when FP_REGS are disabled (which might be confusing).
RS>
RS> “Fixing” (2) using:
RS>
RS>  CEIL (GET_MODE_SIZE (mode), 2)
RS>
RS> for VPR_REG & VPR_REGNUM seems to make the costs correct.  I don't know
RS> if it would cause other problems though.
RS>
RS> I don't think CLASS_MAX_NREGS should do anything special for
RS> superclasses of VPR_REG, even though that makes the definition
RS> non-obvious.  If an SImode is stored in GENERAL_AND_VPR_REGS, it will
RS> in reality be stored in the GENERAL_REGS subset, so the maximum count
RS> should come from there rather than VPR_REG.
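
To make the arithmetic concrete, here is a small standalone sketch (not
GCC code). It assumes a 2-byte VPR and 4-byte core registers, and the
helper names are made up for illustration:

```c
#include <stdbool.h>

/* Same rounding-up division as GCC's CEIL macro.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Assumed sizes in bytes for this sketch.  */
enum { VPR_BYTES = 2, WORD_BYTES = 4, HI_BYTES = 2, SI_BYTES = 4 };

/* Register count when measured in core-register-sized units, which is
   what was reported for every register (VPR included) before the fix.  */
unsigned
nregs_in_words (unsigned mode_bytes)
{
  return CEIL (mode_bytes, WORD_BYTES);
}

/* Register count when measured in 2-byte VPR-sized units, as in the fix.  */
unsigned
nregs_in_vpr (unsigned mode_bytes)
{
  return CEIL (mode_bytes, VPR_BYTES);
}

/* A class containing one register is "big enough" for a mode only if
   a single register suffices.  */
bool
single_reg_class_big_enough_p (unsigned nregs)
{
  return nregs <= 1;
}
```

Counting in words, SImode appears to need one register, so the
single-register VPR class looks big enough for it; counting in 2-byte
units, SImode needs two and no longer appears to fit, while HImode
still needs exactly one.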

Does that answer your question?


> >
> > @@ -1453,7 +1456,9 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS];
> >      ARM regs are UNITS_PER_WORD bits.
> >      FIXME: Is this true for iWMMX?  */
> >   #define CLASS_MAX_NREGS(CLASS, MODE)  \
> > -  (ARM_NUM_REGS (MODE))
> > +  (CLASS == VPR_REG)               \
> > +  ? CEIL (GET_MODE_SIZE (MODE), 2)    \
> > +  : (ARM_NUM_REGS (MODE))
> >
> Same.
>
>


* Re: [PATCH v3 05/15] arm: Add support for VPR_REG in arm_class_likely_spilled_p
  2022-01-19 18:25   ` Andre Vieira (lists)
@ 2022-01-20  9:20     ` Christophe Lyon
  0 siblings, 0 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-20  9:20 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Christophe Lyon, GCC Patches

On Wed, Jan 19, 2022 at 7:25 PM Andre Vieira (lists) via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> > VPR_REG is the only register in its class, so it should be handled by
> > TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
> > default_class_likely_spilled_p.  No test fails without this patch, but
> > it seems it should be implemented.
> >
> > 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
> >
> >       gcc/
> >       * config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.
> >
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index c3559ca8703..64a8f2dc7de 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -29317,7 +29317,7 @@ arm_class_likely_spilled_p (reg_class_t rclass)
> >         || rclass  == CC_REG)
> >       return true;
> >
> > -  return false;
> > +  return default_class_likely_spilled_p (rclass);
> >   }
> >
> >   /* Implements target hook small_register_classes_for_mode_p.  */
> LGTM, but await reviewer approval. I suspect this would help avoiding
> spilling of other special registers, though I'm not sure we codegen any
> enough to make a difference, which is why it is likely to have no effect
> on anything else.
>
>
Yeah.

I thought this had been approved at v2:
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581778.html
(like most other patches in the series, except the few ones I had to change
v2 -> v3)

Thanks,

Christophe


* Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  2022-01-19 19:03   ` Andre Vieira (lists)
@ 2022-01-20  9:23     ` Christophe Lyon
  2022-01-20  9:38       ` Andre Simoes Dias Vieira
  2022-01-20 10:45     ` Richard Sandiford
  1 sibling, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-20  9:23 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Christophe Lyon, GCC Patches, Richard Sandiford

On Wed, Jan 19, 2022 at 8:03 PM Andre Vieira (lists) via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> > The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use
> > <V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>.
> >
> > 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
> >
> >       gcc/
> >       * config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
> >       for operand 1.
> >
> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> > index 171dd384133..5c3b34dce3a 100644
> > --- a/gcc/config/arm/mve.md
> > +++ b/gcc/config/arm/mve.md
> > @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_<supf><mode>"
> >   (define_insn "mve_vmvnq_n_<supf><mode>"
> >     [
> >      (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> > -     (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
> > +     (unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")]
> >        VMVNQ_N))
> >     ]
> >     "TARGET_HAVE_MVE"
>
> While fixing this it might be good to fix the constraint and predicate
> inspired by "DL" and "neon_inv_logic_op2" respectively. This would avoid
> the compiler generating wrong assembly, and instead it would probably
> lead to the compiler using a load literal.
>
> I kind of think it would be better to have the intrinsic refuse the
> immediate altogether, but it seems for NEON we also use the load literal
> approach.
>
>
Ha, I thought that patch had been approved at v2 too:
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581344.html


* Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  2022-01-20  9:23     ` Christophe Lyon
@ 2022-01-20  9:38       ` Andre Simoes Dias Vieira
  2022-01-20  9:44         ` Christophe Lyon
  0 siblings, 1 reply; 54+ messages in thread
From: Andre Simoes Dias Vieira @ 2022-01-20  9:38 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Christophe Lyon, GCC Patches, Richard Sandiford


On 20/01/2022 09:23, Christophe Lyon wrote:
>
>
> On Wed, Jan 19, 2022 at 8:03 PM Andre Vieira (lists) via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
>
>
>     On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
>     > The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use
>     > <V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>.
>     >
>     > 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
>     >
>     >       gcc/
>     >       * config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem
>     mode
>     >       for operand 1.
>     >
>     > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>     > index 171dd384133..5c3b34dce3a 100644
>     > --- a/gcc/config/arm/mve.md
>     > +++ b/gcc/config/arm/mve.md
>     > @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_<supf><mode>"
>     >   (define_insn "mve_vmvnq_n_<supf><mode>"
>     >     [
>     >      (set (match_operand:MVE_5 0 "s_register_operand" "=w")
>     > -     (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
>     > +     (unspec:MVE_5 [(match_operand:<V_elem> 1
>     "immediate_operand" "i")]
>     >        VMVNQ_N))
>     >     ]
>     >     "TARGET_HAVE_MVE"
>
>     While fixing this it might be good to fix the constraint and
>     predicate
>     inspired by "DL" and "neon_inv_logic_op2" respectively. This would
>     avoid
>     the compiler generating wrong assembly, and instead it would probably
>     lead to the compiler using a load literal.
>
>     I kind of think it would be better to have the intrinsic refuse the
>     immediate altogether, but it seems for NEON we also use the load
>     literal
>     approach.
>
>
> Ha, I thought that patch had been approved at v2 too: 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581344.html
>
Yeah, sorry, I had not looked at the previous version of this series!

I can put together a follow-up for this then.


* Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass
  2022-01-20  9:14     ` Christophe Lyon
@ 2022-01-20  9:43       ` Andre Vieira (lists)
  2022-01-20 10:40         ` Richard Sandiford
  0 siblings, 1 reply; 54+ messages in thread
From: Andre Vieira (lists) @ 2022-01-20  9:43 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Christophe Lyon, GCC Patches, Richard Sandiford


On 20/01/2022 09:14, Christophe Lyon wrote:
>
>
> On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
>
>     Hi Christophe,
>
>     On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
>     > At some point during the development of this patch series, it
>     > appeared that in some cases the register allocator wants “VPR or
>     > general” rather than “VPR or general or FP” (which is the same
>     > thing as ALL_REGS).  The series does not seem to require this
>     > anymore, but it seems to be a good thing to do anyway, to give the
>     > register allocator more freedom.
>     Not sure I fully understand this, but I guess it creates an extra
>     class the register allocator can use to group things that can go
>     into VPR or general reg?
>     >
>     > CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
>     > regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
>     > -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
>     I have not looked into this failure, but ...
>     >
>     > 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
>     >
>     >       gcc/
>     >       * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
>     >       (REG_CLASS_NAMES): Likewise.
>     >       (REG_CLASS_CONTENTS): Likewise.
>     >       (CLASS_MAX_NREGS): Handle VPR.
>     >       * config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.
>     >
>     > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>     > index bb75921f32d..c3559ca8703 100644
>     > --- a/gcc/config/arm/arm.c
>     > +++ b/gcc/config/arm/arm.c
>     > @@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
>     >   static unsigned int
>     >   arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
>     >   {
>     > +  if (IS_VPR_REGNUM (regno))
>     > +    return CEIL (GET_MODE_SIZE (mode), 2);
>     When do we ever want to use more than 1 register for VPR?
>
>
> That was tricky.
> Richard Sandiford helped me analyze the problem, I guess I can quote him:
>
> RS> I think the problem is a combination of a few things:
> RS>
> RS> (1) arm_hard_regno_mode_ok rejects SImode in VPR, so SImode moves
> RS>     to or from the VPR_REG class get the maximum cost.
> RS>
> RS> (2) IRA thinks from CLASS_MAX_NREGS and arm_hard_regno_nregs that
> RS>    VPR is big enough to hold SImode.
> RS>
> RS> (3) If a class C1 is a superset of a class C2, and if C2 is big enough
> RS>     to hold a mode M, IRA ensures that move costs for M involving C1
> RS>     are >= move costs for M involving C2.
> RS>
> RS> (1) is correct but (2) isn't.  IMO (3) is dubious: the trigger should
> RS> be whether C2 is actually allowed to hold M, not whether C2 is big
> RS> enough to hold M.  However, changing that is likely to cause problems
> RS> elsewhere, and could lead to classes like GENERAL_AND_FP_REGS being
> RS> used when FP_REGS are disabled (which might be confusing).
> RS>

I understand everything up until here.

> RS> “Fixing” (2) using:
> RS>
> RS>  CEIL (GET_MODE_SIZE (mode), 2)
I was wondering why not just return '1' for VPR_REGNUM, rather than use
the fact that the mode-size we use for VPR is 2 bytes, so dividing it by 2
makes 1. Unless we ever decide to use a larger mode for VPR, maybe
that's what this is trying to address? I can't imagine we would ever
need to though, since for MVE there is only one VPR register and it is
always 16 bits. Just feels overly complicated to me.
> RS>
> RS> for VPR_REG & VPR_REGNUM seems to make the costs correct.  I don't
> RS> know if it would cause other problems though.
> RS>
> RS> I don't think CLASS_MAX_NREGS should do anything special for
> RS> superclasses of VPR_REG, even though that makes the definition
> RS> non-obvious.  If an SImode is stored in GENERAL_AND_VPR_REGS, it will
> RS> in reality be stored in the GENERAL_REGS subset, so the maximum count
> RS> should come from there rather than VPR_REG.
>
> Does that answer your question?
I guess it ends up being correct; I just don't understand the complexity,
that's all.


* Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  2022-01-20  9:38       ` Andre Simoes Dias Vieira
@ 2022-01-20  9:44         ` Christophe Lyon
  0 siblings, 0 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-20  9:44 UTC (permalink / raw)
  To: Andre Simoes Dias Vieira; +Cc: Christophe Lyon, GCC Patches, Richard Sandiford

On Thu, Jan 20, 2022 at 10:38 AM Andre Simoes Dias Vieira <
andre.simoesdiasvieira@arm.com> wrote:

>
> On 20/01/2022 09:23, Christophe Lyon wrote:
>
>
>
> On Wed, Jan 19, 2022 at 8:03 PM Andre Vieira (lists) via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
>>
>> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
>> > The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use
>> > <V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>.
>> >
>> > 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
>> >
>> >       gcc/
>> >       * config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
>> >       for operand 1.
>> >
>> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>> > index 171dd384133..5c3b34dce3a 100644
>> > --- a/gcc/config/arm/mve.md
>> > +++ b/gcc/config/arm/mve.md
>> > @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_<supf><mode>"
>> >   (define_insn "mve_vmvnq_n_<supf><mode>"
>> >     [
>> >      (set (match_operand:MVE_5 0 "s_register_operand" "=w")
>> > -     (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
>> > +     (unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")]
>> >        VMVNQ_N))
>> >     ]
>> >     "TARGET_HAVE_MVE"
>>
>> While fixing this it might be good to fix the constraint and predicate
>> inspired by "DL" and "neon_inv_logic_op2" respectively. This would avoid
>> the compiler generating wrong assembly, and instead it would probably
>> lead to the compiler using a load literal.
>>
>> I kind of think it would be better to have the intrinsic refuse the
>> immediate altogether, but it seems for NEON we also use the load literal
>> approach.
>>
>>
> Ha, I thought that patch had been approved at v2 too:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581344.html
>
> Yeah, sorry, I had not looked at the previous version of this series!
>
> I can put together a follow-up for this then.
>

No problem, thanks!


* Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass
  2022-01-20  9:43       ` Andre Vieira (lists)
@ 2022-01-20 10:40         ` Richard Sandiford
  2022-01-20 10:45           ` Andre Vieira (lists)
  0 siblings, 1 reply; 54+ messages in thread
From: Richard Sandiford @ 2022-01-20 10:40 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Christophe Lyon, Christophe Lyon, GCC Patches

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> On 20/01/2022 09:14, Christophe Lyon wrote:
>>
>>
>> On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches 
>> <gcc-patches@gcc.gnu.org> wrote:
>>
>>     Hi Christophe,
>>
>>     On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
>>     > At some point during the development of this patch series, it
>>     appeared
>>     > that in some cases the register allocator wants “VPR or general”
>>     > rather than “VPR or general or FP” (which is the same thing as
>>     > ALL_REGS).  The series does not seem to require this anymore, but it
>>     > seems to be a good thing to do anyway, to give the register
>>     allocator
>>     > more freedom.
>>     Not sure I fully understand this, but I guess it creates an extra
>>     class
>>     the register allocator can use to group things that can go into
>>     VPR or
>>     general reg?
>>     >
>>     > CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
>>     > regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
>>     > -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
>>     I have not looked into this failure, but ...
>>     >
>>     > 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
>>     >
>>     >       gcc/
>>     >       * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
>>     >       (REG_CLASS_NAMES): Likewise.
>>     >       (REG_CLASS_CONTENTS): Likewise.
>>     >       (CLASS_MAX_NREGS): Handle VPR.
>>     >       * config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.
>>     >
>>     > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>     > index bb75921f32d..c3559ca8703 100644
>>     > --- a/gcc/config/arm/arm.c
>>     > +++ b/gcc/config/arm/arm.c
>>     > @@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
>>     >   static unsigned int
>>     >   arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
>>     >   {
>>     > +  if (IS_VPR_REGNUM (regno))
>>     > +    return CEIL (GET_MODE_SIZE (mode), 2);
>>     When do we ever want to use more than 1 register for VPR?
>>
>>
>> That was tricky.
>> Richard Sandiford helped me analyze the problem, I guess I can quote him:
>>
>> RS> I think the problem is a combination of a few things:
>> RS>
>> RS> (1) arm_hard_regno_mode_ok rejects SImode in VPR, so SImode moves
>> RS>     to or from the VPR_REG class get the maximum cost.
>> RS>
>> RS> (2) IRA thinks from CLASS_MAX_NREGS and arm_hard_regno_nregs that
>> RS>    VPR is big enough to hold SImode.
>> RS>
>> RS> (3) If a class C1 is a superset of a class C2, and if C2 is big enough
>> RS>     to hold a mode M, IRA ensures that move costs for M involving C1
>> RS>     are >= move costs for M involving C2.
>> RS>
>> RS> (1) is correct but (2) isn't.  IMO (3) is dubious: the trigger should
>> RS> be whether C2 is actually allowed to hold M, not whether C2 is big 
>> enough
>> RS> to hold M.  However, changing that is likely to cause problems 
>> elsewhere,
>> RS> and could lead to classes like GENERAL_AND_FP_REGS being used when
>> RS> FP_REGS are disabled (which might be confusing).
>> RS>
>
> I understand everything up until here.
>
>> RS> “Fixing” (2) using:
>> RS>
>> RS>  CEIL (GET_MODE_SIZE (mode), 2)
> I was wondering why not just return '1' for VPR_REGNUM, rather than use
> the fact that the mode-size we use for VPR is 2 bytes, so dividing it by 2
> makes 1. Unless we ever decide to use a larger mode for VPR, maybe
> that's what this is trying to address? I can't imagine we would ever
> need to though, since for MVE there is only one VPR register and it is
> always 16 bits. Just feels overly complicated to me.

For context, that's what the first version did, and is what led to
the reload failure.  The above is trying to explain why returning
1 doesn't work in practice.

To put (2) a slightly different way: if the port says VPR_REGNUM takes
1 register regardless of the mode passed in, the port is effectively
saying that VPR (and thus VPR_REGNUM) has enough bits to hold *any* mode
passed in (SImode, DImode, etc.).  It actually makes VPR seem bigger
than a general register.
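
The size argument can be made concrete with a small standalone sketch.
This is not GCC code: CEIL is redefined locally, mode sizes are passed as
plain byte counts, and both helper names are hypothetical.  It only
contrasts a constant 1 with CEIL (size, 2) for a 2-byte register.

```c
/* Local stand-in for GCC's CEIL macro: integer division rounding up.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Hypothetical helper: always claim one register, whatever the mode
   size.  This models "return 1 for VPR_REGNUM".  */
unsigned int
vpr_nregs_constant (unsigned int mode_size_bytes)
{
  (void) mode_size_bytes;
  return 1;
}

/* Hypothetical helper: count how many 2-byte units the mode needs.
   This models CEIL (GET_MODE_SIZE (mode), 2).  */
unsigned int
vpr_nregs_ceil (unsigned int mode_size_bytes)
{
  return CEIL (mode_size_bytes, 2);
}
```

Calling both helpers with 2, 4 and 8 (the byte sizes of HImode, SImode
and DImode) gives 1 in every case for the constant version, so every mode
appears to fit in the single VPR.  The CEIL version gives 1, 2 and 4, so
only the 2-byte mode fits in one register.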

In the particular case of the reload failure, returning 1 effectively
tells the RA that VPR is big enough to hold SImode, but that the port is
nevertheless choosing not to allow VPR to be used to hold SImode.  This
then “infects” the SImode cost of GENERAL_AND_VPR_REGS.
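
The "infection" can be sketched with a toy model.  Everything here is an
assumption made for illustration: the TOY_ names, the cost values and the
max rule are a simplification of point (3) above, not IRA's actual
implementation.

```c
/* Local stand-in for GCC's CEIL macro.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Illustrative values only; real costs come from the target.  */
#define TOY_MAX_COST 65535
#define TOY_NORMAL_COST 2

/* Toy version of point (1): VPR may only hold 2-byte modes.  */
int
toy_vpr_mode_ok (unsigned int size)
{
  return size == 2;
}

/* Toy move cost for the VPR_REG class on its own.  */
int
toy_vpr_cost (unsigned int size)
{
  return toy_vpr_mode_ok (size) ? TOY_NORMAL_COST : TOY_MAX_COST;
}

/* Toy version of point (2): does the single-register VPR class look
   "big enough" for the mode?  USE_CEIL selects the v3 behaviour.  */
int
toy_vpr_big_enough (unsigned int size, int use_ceil)
{
  unsigned int nregs = use_ceil ? CEIL (size, 2) : 1;
  return nregs <= 1;
}

/* Toy version of point (3): the cost for the union class is raised to
   the subclass cost whenever the subclass looks big enough.  */
int
toy_general_and_vpr_cost (unsigned int size, int use_ceil)
{
  int cost = TOY_NORMAL_COST;
  if (toy_vpr_big_enough (size, use_ceil) && toy_vpr_cost (size) > cost)
    cost = toy_vpr_cost (size);
  return cost;
}
```

In this model, a 4-byte mode with use_ceil == 0 gets TOY_MAX_COST for the
union class, because VPR looks big enough but is not allowed to hold it.
With use_ceil == 1 the size check fails first and the union class keeps
its normal cost.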

Thanks,
Richard


* Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass
  2022-01-20 10:40         ` Richard Sandiford
@ 2022-01-20 10:45           ` Andre Vieira (lists)
  0 siblings, 0 replies; 54+ messages in thread
From: Andre Vieira (lists) @ 2022-01-20 10:45 UTC (permalink / raw)
  To: Christophe Lyon, Christophe Lyon, GCC Patches, richard.sandiford


On 20/01/2022 10:40, Richard Sandiford wrote:
> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>> On 20/01/2022 09:14, Christophe Lyon wrote:
>>>
>>> On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>
>>>      Hi Christophe,
>>>
>>>      On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
>>>      > At some point during the development of this patch series, it
>>>      appeared
>>>      > that in some cases the register allocator wants “VPR or general”
>>>      > rather than “VPR or general or FP” (which is the same thing as
>>>      > ALL_REGS).  The series does not seem to require this anymore, but it
>>>      > seems to be a good thing to do anyway, to give the register
>>>      allocator
>>>      > more freedom.
>>>      Not sure I fully understand this, but I guess it creates an extra
>>>      class
>>>      the register allocator can use to group things that can go into
>>>      VPR or
>>>      general reg?
>>>      >
>>>      > CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
>>>      > regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
>>>      > -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
>>>      I have not looked into this failure, but ...
>>>      >
>>>      > 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
>>>      >
>>>      >       gcc/
>>>      >       * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
>>>      >       (REG_CLASS_NAMES): Likewise.
>>>      >       (REG_CLASS_CONTENTS): Likewise.
>>>      >       (CLASS_MAX_NREGS): Handle VPR.
>>>      >       * config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.
>>>      >
>>>      > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>>      > index bb75921f32d..c3559ca8703 100644
>>>      > --- a/gcc/config/arm/arm.c
>>>      > +++ b/gcc/config/arm/arm.c
>>>      > @@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
>>>      >   static unsigned int
>>>      >   arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
>>>      >   {
>>>      > +  if (IS_VPR_REGNUM (regno))
>>>      > +    return CEIL (GET_MODE_SIZE (mode), 2);
>>>      When do we ever want to use more than 1 register for VPR?
>>>
>>>
>>> That was tricky.
>>> Richard Sandiford helped me analyze the problem, I guess I can quote him:
>>>
>>> RS> I think the problem is a combination of a few things:
>>> RS>
>>> RS> (1) arm_hard_regno_mode_ok rejects SImode in VPR, so SImode moves
>>> RS>     to or from the VPR_REG class get the maximum cost.
>>> RS>
>>> RS> (2) IRA thinks from CLASS_MAX_NREGS and arm_hard_regno_nregs that
>>> RS>    VPR is big enough to hold SImode.
>>> RS>
>>> RS> (3) If a class C1 is a superset of a class C2, and if C2 is big enough
>>> RS>     to hold a mode M, IRA ensures that move costs for M involving C1
>>> RS>     are >= move costs for M involving C2.
>>> RS>
>>> RS> (1) is correct but (2) isn't.  IMO (3) is dubious: the trigger should
>>> RS> be whether C2 is actually allowed to hold M, not whether C2 is big
>>> enough
>>> RS> to hold M.  However, changing that is likely to cause problems
>>> elsewhere,
>>> RS> and could lead to classes like GENERAL_AND_FP_REGS being used when
>>> RS> FP_REGS are disabled (which might be confusing).
>>> RS>
>> I understand everything up until here.
>>
>>> RS> “Fixing” (2) using:
>>> RS>
>>> RS>  CEIL (GET_MODE_SIZE (mode), 2)
>> I was wondering why not just return '1' for VPR_REGNUM, rather than use
>> the fact that the mode-size we use for VPR is 2 bytes, so dividing it by 2
>> makes 1. Unless we ever decide to use a larger mode for VPR, maybe
>> that's what this is trying to address? I can't imagine we would ever
>> need to though, since for MVE there is only one VPR register and it is
>> always 16 bits. Just feels overly complicated to me.
> For context, that's what the first version did, and is what led to
> the reload failure.  The above is trying to explain why returning
> 1 doesn't work in practice.
>
> To put (2) a slightly different way: if the port says VPR_REGNUM takes
> 1 register regardless of the mode passed in, the port is effectively
> saying that VPR (and thus VPR_REGNUM) has enough bits to hold *any* mode
> passed in (SImode, DImode, etc.).  It actually makes VPR seem bigger
> than a general register.
>
> In the particular case of the reload failure, returning 1 effectively
> tells the RA that VPR is big enough to hold SImode, but that the port is
> nevertheless choosing not to allow VPR to be used to hold SImode.  This
> then “infects” the SImode cost of GENERAL_AND_VPR_REGS.
>
> Thanks,
> Richard
Ah OK thanks for the explanation.



* Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  2022-01-19 19:03   ` Andre Vieira (lists)
  2022-01-20  9:23     ` Christophe Lyon
@ 2022-01-20 10:45     ` Richard Sandiford
  2022-01-20 11:06       ` Andre Vieira (lists)
  1 sibling, 1 reply; 54+ messages in thread
From: Richard Sandiford @ 2022-01-20 10:45 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Christophe Lyon, gcc-patches, Kyrylo Tkachov

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
>> The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use
>> <V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>.
>>
>> 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
>>
>> 	gcc/
>> 	* config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
>> 	for operand 1.
>>
>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>> index 171dd384133..5c3b34dce3a 100644
>> --- a/gcc/config/arm/mve.md
>> +++ b/gcc/config/arm/mve.md
>> @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_<supf><mode>"
>>   (define_insn "mve_vmvnq_n_<supf><mode>"
>>     [
>>      (set (match_operand:MVE_5 0 "s_register_operand" "=w")
>> -	(unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
>> +	(unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")]
>>   	 VMVNQ_N))
>>     ]
>>     "TARGET_HAVE_MVE"
>
> While fixing this it might be good to fix the constraint and predicate 
> inspired by "DL" and "neon_inv_logic_op2" respectively. This would avoid 
> the compiler generating wrong assembly, and instead it would probably 
> lead to the compiler using a load literal.

FWIW: for cases like this, I think it's better to define a predicate
only (not a constraint).  By design, the only time that constraints
are used independently of predicates is during RA, and there's nothing
that RA can/should do for immediate operands.

Thanks,
Richard


* Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_<supf><mode> argument mode
  2022-01-20 10:45     ` Richard Sandiford
@ 2022-01-20 11:06       ` Andre Vieira (lists)
  0 siblings, 0 replies; 54+ messages in thread
From: Andre Vieira (lists) @ 2022-01-20 11:06 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Kyrylo Tkachov, richard.sandiford


On 20/01/2022 10:45, Richard Sandiford wrote:
> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
>>> The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use
>>> <V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>.
>>>
>>> 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
>>>
>>> 	gcc/
>>> 	* config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
>>> 	for operand 1.
>>>
>>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>>> index 171dd384133..5c3b34dce3a 100644
>>> --- a/gcc/config/arm/mve.md
>>> +++ b/gcc/config/arm/mve.md
>>> @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_<supf><mode>"
>>>    (define_insn "mve_vmvnq_n_<supf><mode>"
>>>      [
>>>       (set (match_operand:MVE_5 0 "s_register_operand" "=w")
>>> -	(unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
>>> +	(unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")]
>>>    	 VMVNQ_N))
>>>      ]
>>>      "TARGET_HAVE_MVE"
>> While fixing this it might be good to fix the constraint and predicate
>> inspired by "DL" and "neon_inv_logic_op2" respectively. This would avoid
>> the compiler generating wrong assembly, and instead it would probably
>> lead to the compiler using a load literal.
> FWIW: for cases like this, I think it's better to define a predicate
> only (not a constraint).  By design, the only time that constraints
> are used independently of predicates is during RA, and there's nothing
> that RA can/should do for immediate operands.
>
> Thanks,
> Richard
Yeah, if I use a predicate it doesn't like the fact that we are passing
an argument 'imm' rather than an actual immediate. To use a constraint like
DL I'd also need to change the builtin to take a vector of immediates,
since we can't use immediates as they don't have a mode and the
constraint needs to be able to know what mode we are using.

This will have to wait...


* Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-01-13 14:56 ` [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans Christophe Lyon
@ 2022-01-21 11:20   ` Andre Vieira (lists)
  2022-01-21 22:30     ` Christophe Lyon
  2022-01-27 16:28   ` Kyrylo Tkachov
  2022-01-31 18:01   ` Richard Sandiford
  2 siblings, 1 reply; 54+ messages in thread
From: Andre Vieira (lists) @ 2022-01-21 11:20 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches

Hi Christophe,

On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
> index 6ba6f211531..920c2a68e4c 100644
> --- a/gcc/config/arm/arm-simd-builtin-types.def
> +++ b/gcc/config/arm/arm-simd-builtin-types.def
> @@ -51,3 +51,7 @@
>     ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
>     ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
>     ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
> +
> +  ENTRY (Pred1x16_t, V16BI, unsigned, 16, uint16, 21)
> +  ENTRY (Pred2x8_t, V8BI, unsigned, 8, uint16, 21)
> +  ENTRY (Pred4x4_t, V4BI, unsigned, 4, uint16, 21)

I'm trying to lower masked loads and when I tried to use the 
arm_simd_types[Pred1x16_t].itype as the mask type I noticed the 
TYPE_SIZE of that is 256, rather than the expected 16. Instead I used 
truth_type_for (arm_simd_types[Uint8x16_t].itype) and that gives me a 
compatible vector of booleans. So the itype for Pred1x16_t seems wrong 
to me.
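
One way to read the 256 is as 16 elements of 16 bits each.  The sketch
below only does that arithmetic; it assumes the itype is built as a vector
of the listed element type (uint16) with the element count of the mode,
which is an assumption and not checked against the builtin initialisation
code.  The function name is hypothetical.

```c
/* Total size in bits of a vector type with NUNITS elements of ELT_BITS
   bits each.  Purely arithmetic; no GCC types involved.  */
unsigned int
vector_type_bits (unsigned int nunits, unsigned int elt_bits)
{
  return nunits * elt_bits;
}
```

Under that assumption, vector_type_bits (16, 16) is 256, matching the
TYPE_SIZE seen, while a 16-element vector of 1-bit booleans would be
vector_type_bits (16, 1), i.e. the expected 16.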



* Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-01-21 11:20   ` Andre Vieira (lists)
@ 2022-01-21 22:30     ` Christophe Lyon
  0 siblings, 0 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-21 22:30 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Christophe Lyon, GCC Patches

Hi Andre,

On Fri, Jan 21, 2022 at 12:23 PM Andre Vieira (lists) via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi Christophe,
>
> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> > diff --git a/gcc/config/arm/arm-simd-builtin-types.def
> b/gcc/config/arm/arm-simd-builtin-types.def
> > index 6ba6f211531..920c2a68e4c 100644
> > --- a/gcc/config/arm/arm-simd-builtin-types.def
> > +++ b/gcc/config/arm/arm-simd-builtin-types.def
> > @@ -51,3 +51,7 @@
> >     ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
> >     ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
> >     ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
> > +
> > +  ENTRY (Pred1x16_t, V16BI, unsigned, 16, uint16, 21)
> > +  ENTRY (Pred2x8_t, V8BI, unsigned, 8, uint16, 21)
> > +  ENTRY (Pred4x4_t, V4BI, unsigned, 4, uint16, 21)
>
> I'm trying to lower masked loads and when I tried to use the
> arm_simd_types[Pred1x16_t].itype as the mask type I noticed the
> TYPE_SIZE of that is 256, rather than the expected 16. Instead I used
> truth_type_for (arm_simd_types[Uint8x16_t].itype) and that gives me a
> compatible vector of booleans. So the itype for Pred1x16_t seems wrong
> to me.
>
How about:
ENTRY (Pred1x16_t, V16BI, predicate, 16, pred1, 21)
ENTRY (Pred2x8_t, V8BI, predicate, 8, pred1, 21)
ENTRY (Pred4x4_t, V4BI, predicate, 4, pred1, 21)

Christophe


* Re: [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates
  2022-01-14 14:22     ` Kyrylo Tkachov
@ 2022-01-26  8:40       ` Christophe Lyon
  0 siblings, 0 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-26  8:40 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches

Ping?

As discussed elsewhere with André, I'll drop patch #15 from this series,
since his patch is a better fix.

Since v2 of this series had been approved, I think only patches 4, 7, 8, 9,
12 and 13 need proper review.

Thanks,

Christophe


On Fri, Jan 14, 2022 at 3:22 PM Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
wrote:

> Hi Christophe, Richard,
>
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-
> > bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Richard
> > Biener via Gcc-patches
> > Sent: Friday, January 14, 2022 1:33 PM
> > To: Christophe Lyon <christophe.lyon.oss@gmail.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH v3 00/15] ARM/MVE use vectors of boolean for
> > predicates
> >
> > On Fri, Jan 14, 2022 at 2:18 PM Christophe Lyon via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi,
> > >
> > > I hadn't realized we are moving to stage 4 this week-end :-(
> > >
> > > > The PRs I'm fixing are P3, but without these fixes MVE support is badly
> > > > broken, so I think it would be really good to fix that before the buggy
> > > > version becomes part of an actual release.
> > > > Anyway I posted v1 of the patches during stage1, so it should still be
> > > > OK if they are accepted as-is?
> >
> > In the end it's up to the target maintainers to weigh the risk of
> > breakage vs. the risk of not usefulness ;)  But stage3 is where the
> > "was posted during stage1" rule can easily apply - at some point we
> > have to stop with such general ruling.
> >
>
> Thanks, that's in line with my interpretation.
> These patches resolve some nasty brokenness in the MVE support that I'm
> keen to see fixed and from what I can tell the patches shouldn't have a
> large effect on non-MVE code.
> So the risk vs reward balance for the arm port as a whole looks good to me.
> Andre has kindly agreed to help review the patches and I'll also try to
> get to them today and next week so that we can get them in early stage4.
>
> Thanks,
> Kyrill
>
> > Richard.
> >
> > > Thanks,
> > >
> > > Christophe
> > >
> > > On Thu, Jan 13, 2022 at 3:58 PM Christophe Lyon via Gcc-patches <
> > > gcc-patches@gcc.gnu.org> wrote:
> > >
> > > >
> > > > This is v3 of this patch series, fixing issues I discovered before
> > > > committing v2 (which had been approved).
> > > >
> > > > Thanks a lot to Richard Sandiford for his help.
> > > >
> > > > The changes v2 -> v3 are:
> > > >
> > > > Patch 4: Fix arm_hard_regno_nregs and CLASS_MAX_NREGS to support
> > VPR.
> > > >
> > > > Patch 7: Changes to the underlying representation of vectors of
> > > > booleans to account for the different expectations between
> AArch64/SVE
> > > > and Arm/MVE.
> > > >
> > > > Patch 8: Re-use and extend existing thumb2_movhi* patterns instead of
> > > > duplicating them in mve_mov<mode>. This requires the introduction of
> a
> > > > new constraint to match a constant vector of booleans. Add a new RTL
> > > > test.
> > > >
> > > > Patch 9: Introduce check_effective_target_arm_mve and skip
> > > > gcc.dg/signbit-2.c, because with MVE there is no fallback
> architecture
> > > > unlike SVE or AVX512.
> > > >
> > > > Patch 12: Update less load/store MVE builtins
> > > > (mve_vldrdq_gather_base_z_<supf>v2di,
> > > > mve_vldrdq_gather_offset_z_<supf>v2di,
> > > > mve_vldrdq_gather_shifted_offset_z_<supf>v2di,
> > > > mve_vstrdq_scatter_base_p_<supf>v2di,
> > > > mve_vstrdq_scatter_offset_p_<supf>v2di,
> > > > mve_vstrdq_scatter_offset_p_<supf>v2di_insn,
> > > > mve_vstrdq_scatter_shifted_offset_p_<supf>v2di,
> > > > mve_vstrdq_scatter_shifted_offset_p_<supf>v2di_insn,
> > > > mve_vstrdq_scatter_base_wb_p_<supf>v2di,
> > > > mve_vldrdq_gather_base_wb_z_<supf>v2di,
> > > > mve_vldrdq_gather_base_nowb_z_<supf>v2di,
> > > > mve_vldrdq_gather_base_wb_z_<supf>v2di_insn) for which we keep HI
> > mode
> > > > for vpr_register_operand.
> > > >
> > > > Patch 13: No need to update
> > > > gcc.target/arm/acle/cde-mve-full-assembly.c anymore since we re-use
> > > > the mov pattern that emits '@ movhi' in the assembly.
> > > >
> > > > Patch 15: This is a new patch to fix a problem I noticed during this
> > > > v2->v3 update.
> > > >
> > > >
> > > >
> > > > I'll squash patch 2 with patch 9 and patch 3 with patch 8.
> > > >
> > > > Original text:
> > > >
> > > > This patch series addresses PR 100757 and 101325 by representing
> > > > vectors of predicates (MVE VPR.P0 register) as vectors of booleans
> > > > rather than using HImode.
> > > >
> > > > As this implies a lot of mostly mechanical changes, I have tried to
> > > > split the patches in a way that should help reviewers, but the split
> > > > is a bit artificial.
> > > >
> > > > Patches 1-3 add new tests.
> > > >
> > > > Patches 4-6 are small independent improvements.
> > > >
> > > > Patch 7 implements the predicate qualifier, but does not change any
> > > > builtin yet.
> > > >
> > > > Patch 8 is the first of the two main patches, and uses the new
> > > > qualifier to describe the vcmp and vpsel builtins that are useful for
> > > > auto-vectorization of comparisons.
> > > >
> > > > Patch 9 is the second main patch, which fixes the vcond_mask
> expander.
> > > >
> > > > Patches 10-13 convert almost all the remaining builtins with HI
> > > > operands to use the predicate qualifier.  After these, there are
> still
> > > > a few builtins with HI operands left, about which I am not sure:
> vctp,
> > > > vpnot, load-gather and store-scatter with v2di operands.  In fact,
> > > > patches 11/12 update some STR/LDR qualifiers in a way that breaks
> > > > these v2di builtins although existing tests still pass.
> > > >
> > > > Christophe Lyon (15):
> > > >   arm: Add new tests for comparison vectorization with Neon and MVE
> > > >   arm: Add tests for PR target/100757
> > > >   arm: Add tests for PR target/101325
> > > >   arm: Add GENERAL_AND_VPR_REGS regclass
> > > >   arm: Add support for VPR_REG in arm_class_likely_spilled_p
> > > >   arm: Fix mve_vmvnq_n_<supf><mode> argument mode
> > > >   arm: Implement MVE predicates as vectors of booleans
> > > >   arm: Implement auto-vectorized MVE comparisons with vectors of
> > boolean
> > > >     predicates
> > > >   arm: Fix vcond_mask expander for MVE (PR target/100757)
> > > >   arm: Convert remaining MVE vcmp builtins to predicate qualifiers
> > > >   arm: Convert more MVE builtins to predicate qualifiers
> > > >   arm: Convert more load/store MVE builtins to predicate qualifiers
> > > >   arm: Convert more MVE/CDE builtins to predicate qualifiers
> > > >   arm: Add VPR_REG to ALL_REGS
> > > >   arm: Fix constraint check for V8HI in mve_vector_mem_operand
> > > >
> > > >  gcc/config/aarch64/aarch64-modes.def          |   8 +-
> > > >  gcc/config/arm/arm-builtins.c                 | 224 +++--
> > > >  gcc/config/arm/arm-builtins.h                 |   4 +-
> > > >  gcc/config/arm/arm-modes.def                  |   8 +
> > > >  gcc/config/arm/arm-protos.h                   |   4 +-
> > > >  gcc/config/arm/arm-simd-builtin-types.def     |   4 +
> > > >  gcc/config/arm/arm.c                          | 169 ++--
> > > >  gcc/config/arm/arm.h                          |   9 +-
> > > >  gcc/config/arm/arm_mve_builtins.def           | 746 ++++++++--------
> > > >  gcc/config/arm/constraints.md                 |   6 +
> > > >  gcc/config/arm/iterators.md                   |   6 +
> > > >  gcc/config/arm/mve.md                         | 795 ++++++++++--------
> > > >  gcc/config/arm/neon.md                        |  39 +
> > > >  gcc/config/arm/vec-common.md                  |  52 --
> > > >  gcc/config/arm/vfp.md                         |  34 +-
> > > >  gcc/doc/sourcebuild.texi                      |   4 +
> > > >  gcc/emit-rtl.c                                |  20 +-
> > > >  gcc/genmodes.c                                |  81 +-
> > > >  gcc/machmode.def                              |   2 +-
> > > >  gcc/rtx-vector-builder.c                      |   4 +-
> > > >  gcc/simplify-rtx.c                            |  34 +-
> > > >  gcc/testsuite/gcc.dg/signbit-2.c              |   1 +
> > > >  .../gcc.target/arm/simd/mve-vcmp-f32-2.c      |  32 +
> > > >  .../gcc.target/arm/simd/neon-compare-1.c      |  78 ++
> > > >  .../gcc.target/arm/simd/neon-compare-2.c      |  13 +
> > > >  .../gcc.target/arm/simd/neon-compare-3.c      |  14 +
> > > >  .../arm/simd/neon-compare-scalar-1.c          |  57 ++
> > > >  .../gcc.target/arm/simd/neon-vcmp-f16.c       |  12 +
> > > >  .../gcc.target/arm/simd/neon-vcmp-f32-2.c     |  15 +
> > > >  .../gcc.target/arm/simd/neon-vcmp-f32-3.c     |  12 +
> > > >  .../gcc.target/arm/simd/neon-vcmp-f32.c       |  12 +
> > > >  gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c |  22 +
> > > >  .../gcc.target/arm/simd/pr100757-2.c          |  20 +
> > > >  .../gcc.target/arm/simd/pr100757-3.c          |  20 +
> > > >  .../gcc.target/arm/simd/pr100757-4.c          |  19 +
> > > >  gcc/testsuite/gcc.target/arm/simd/pr100757.c  |  19 +
> > > >  .../gcc.target/arm/simd/pr101325-2.c          |  19 +
> > > >  gcc/testsuite/gcc.target/arm/simd/pr101325.c  |  14 +
> > > >  gcc/testsuite/lib/target-supports.exp         |  15 +-
> > > >  gcc/varasm.c                                  |   7 +-
> > > >  40 files changed, 1635 insertions(+), 1019 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-3.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-scalar-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f16.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-3.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr101325.c
> > > >
> > > > --
> > > > 2.25.1
> > > >
> > > >
>

* RE: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass
  2022-01-13 14:56 ` [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass Christophe Lyon
  2022-01-19 18:17   ` Andre Vieira (lists)
@ 2022-01-27 16:21   ` Kyrylo Tkachov
  1 sibling, 0 replies; 54+ messages in thread
From: Kyrylo Tkachov @ 2022-01-27 16:21 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches



> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: Thursday, January 13, 2022 2:56 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass
> 
> At some point during the development of this patch series, it appeared
> that in some cases the register allocator wants “VPR or general”
> rather than “VPR or general or FP” (which is the same thing as
> ALL_REGS).  The series does not seem to require this anymore, but it
> seems to be a good thing to do anyway, to give the register allocator
> more freedom.
> 
> CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
> regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
> -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
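
The register-count arithmetic described above can be sketched in isolation.
This is only an illustration, not the GCC code: nregs, WORD_BYTES and
VPR_BYTES are hypothetical names, and the sketch assumes 4-byte general
registers and a VPR counted in 2-byte units, as the CEIL (..., 2) in the
patch suggests.

```c
#include <assert.h>

/* Round-up division, mirroring the CEIL macro used in the patch.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Assumptions for this sketch: general registers hold 4 bytes, and
   VPR is counted in 2-byte (16 predicate bits) units.  */
enum { WORD_BYTES = 4, VPR_BYTES = 2 };

/* Hypothetical helper: number of registers needed to hold a value of
   MODE_BYTES bytes, either in VPR or in general registers.  */
static unsigned
nregs (int is_vpr, unsigned mode_bytes)
{
  return CEIL (mode_bytes, is_vpr ? VPR_BYTES : WORD_BYTES);
}
```

With these assumptions a 2-byte (HImode-sized) value needs one register of
either kind, while a 4-byte value would need two VPR-sized units but only
one general register.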

Given the discussions I've seen on this patch (thanks Andre and Richard), this is OK.
Please rebase it though, as we've since renamed arm.c to arm.cc.

Thanks,
Kyrill

> 
> 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
> 
> 	gcc/
> 	* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
> 	(REG_CLASS_NAMES): Likewise.
> 	(REG_CLASS_CONTENTS): Likewise.
> 	(CLASS_MAX_NREGS): Handle VPR.
> 	* config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index bb75921f32d..c3559ca8703 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
>  static unsigned int
>  arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
>  {
> +  if (IS_VPR_REGNUM (regno))
> +    return CEIL (GET_MODE_SIZE (mode), 2);
> +
>    if (TARGET_32BIT
>        && regno > PC_REGNUM
>        && regno != FRAME_POINTER_REGNUM
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index dacce2b7f08..2416fb5ef64 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1287,6 +1287,7 @@ enum reg_class
>    SFP_REG,
>    AFP_REG,
>    VPR_REG,
> +  GENERAL_AND_VPR_REGS,
>    ALL_REGS,
>    LIM_REG_CLASSES
>  };
> @@ -1316,6 +1317,7 @@ enum reg_class
>    "SFP_REG",		\
>    "AFP_REG",		\
>    "VPR_REG",		\
> +  "GENERAL_AND_VPR_REGS", \
>    "ALL_REGS"		\
>  }
> 
> @@ -1344,6 +1346,7 @@ enum reg_class
>    { 0x00000000, 0x00000000, 0x00000000, 0x00000040 }, /* SFP_REG */	\
>    { 0x00000000, 0x00000000, 0x00000000, 0x00000080 }, /* AFP_REG */	\
>    { 0x00000000, 0x00000000, 0x00000000, 0x00000400 }, /* VPR_REG.  */	\
> +  { 0x00005FFF, 0x00000000, 0x00000000, 0x00000400 }, /* GENERAL_AND_VPR_REGS.  */ \
>    { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F }  /* ALL_REGS.  */	\
>  }
> 
> @@ -1453,7 +1456,9 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS];
>     ARM regs are UNITS_PER_WORD bits.
>     FIXME: Is this true for iWMMX?  */
>  #define CLASS_MAX_NREGS(CLASS, MODE)  \
> -  (ARM_NUM_REGS (MODE))
> +  (CLASS == VPR_REG)		      \
> +  ? CEIL (GET_MODE_SIZE (MODE), 2)    \
> +  : (ARM_NUM_REGS (MODE))
> 
>  /* If defined, gives a class of registers that cannot be used as the
>     operand of a SUBREG that changes the mode of the object illegally.  */
> --
> 2.25.1


* RE: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-01-13 14:56 ` [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans Christophe Lyon
  2022-01-21 11:20   ` Andre Vieira (lists)
@ 2022-01-27 16:28   ` Kyrylo Tkachov
  2022-01-27 18:10     ` Christophe Lyon
  2022-01-31 18:01   ` Richard Sandiford
  2 siblings, 1 reply; 54+ messages in thread
From: Kyrylo Tkachov @ 2022-01-27 16:28 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Richard Sandiford, gcc-patches

Hi Christophe,

> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: Thursday, January 13, 2022 2:56 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of
> booleans
> 
> This patch implements support for vectors of booleans to support MVE
> predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
> uint16_t) to represent predicates in intrinsics prototypes, we
> introduce a new "predicate" type qualifier so that we can map relevant
> builtins HImode arguments and return value to the appropriate vector
> of booleans (VxBI).
> 
> We have to update test_vector_ops_duplicate, because it iterates using
> an offset in bytes, where we would need to iterate in bits: we skip
> its vec_merge test for vectors of booleans.
> 
> In addition, we have to fix the underlying definition of vectors of
> booleans because ARM/MVE needs a different representation than
> AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
> element size, so that a true element of V4BI is represented by
> '0b1111'.  This patch updates the aarch64 definition of VNx*BI as
> needed.
> 
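
To make the representation described above concrete, here is a small
stand-alone sketch.  It is an illustration only: pred_from_lanes is a
hypothetical helper, not part of GCC or of the MVE intrinsics, and it
assumes a 16-bit predicate whose lanes are laid out from the least
significant bit upwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a 16-bit MVE-style predicate from a lane
   mask.  NLANES is 16, 8 or 4 (matching V16BI, V8BI and V4BI); each
   true lane sets all of the 16 / NLANES bits that it owns.  */
static uint16_t
pred_from_lanes (unsigned nlanes, unsigned lane_mask)
{
  unsigned bits_per_lane = 16 / nlanes;
  uint16_t lane_ones = (uint16_t) ((1u << bits_per_lane) - 1);
  uint16_t pred = 0;

  for (unsigned i = 0; i < nlanes; i++)
    if (lane_mask & (1u << i))
      pred |= (uint16_t) (lane_ones << (i * bits_per_lane));

  return pred;
}
```

Under these assumptions, lane 0 of a 4-lane predicate becomes 0b1111 (as in
the V4BI example above), whereas lane 0 of a 16-lane predicate is the single
bit 0b1.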
> 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
> 	Richard Sandiford  <richard.sandiford@arm.com>
> 
> 	gcc/
> 	PR target/100757
> 	PR target/101325
> 	* config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
> 	VNx2BI): Update definition.
> 	* config/arm/arm-builtins.c (arm_init_simd_builtin_types): Add new
> 	simd types.
> 	(arm_init_builtin): Map predicate vectors arguments to HImode.
> 	(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
> 	rtx. Move return value to HImode rtx.
> 	* config/arm/arm-builtins.h (arm_type_qualifiers): Add
> 	qualifier_predicate.
> 	* config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes.
> 	* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
> 	Pred2x8_t, Pred4x4_t): New.
> 	* emit-rtl.c (init_emit_once): Handle all boolean modes.
> 	* genmodes.c (mode_data): Add boolean field.
> 	(blank_mode): Initialize it.
> 	(make_complex_modes): Fix handling of boolean modes.
> 	(make_vector_modes): Likewise.
> 	(VECTOR_BOOL_MODE): Use new COMPONENT parameter.
> 	(make_vector_bool_mode): Likewise.
> 	(BOOL_MODE): New.
> 	(make_bool_mode): New.
> 	(emit_insn_modes_h): Fix generation of boolean modes.
> 	(emit_class_narrowest_mode): Likewise.
> 	* machmode.def: Use new BOOL_MODE instead of FRACTIONAL_INT_MODE
> 	to define BImode.
> 	* rtx-vector-builder.c (rtx_vector_builder::find_cached_value):
> 	Fix handling of constm1_rtx for VECTOR_BOOL.
> 	* simplify-rtx.c (native_encode_rtx): Fix support for VECTOR_BOOL.
> 	(native_decode_vector_rtx): Likewise.
> 	(test_vector_ops_duplicate): Skip vec_merge test
> 	with vectors of booleans.
> 	* varasm.c (output_constant_pool_2): Fix support for VECTOR_BOOL.

The arm parts look ok. I guess Richard is best placed to approve the midend parts, but I see he's on the ChangeLog so maybe he needs others to review them. But then again Richard is maintainer of the gen* machinery that's the most complicated part of the patch so he can self-approve 😊
Thanks,
Kyrill

> 
> diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
> index 976bf9b42be..8f399225a80 100644
> --- a/gcc/config/aarch64/aarch64-modes.def
> +++ b/gcc/config/aarch64/aarch64-modes.def
> @@ -47,10 +47,10 @@ ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
> 
>  /* Vector modes.  */
> 
> -VECTOR_BOOL_MODE (VNx16BI, 16, 2);
> -VECTOR_BOOL_MODE (VNx8BI, 8, 2);
> -VECTOR_BOOL_MODE (VNx4BI, 4, 2);
> -VECTOR_BOOL_MODE (VNx2BI, 2, 2);
> +VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
> +VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
> +VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
> +VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
> 
>  ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
>  ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 9c645722230..2ccfa37c302 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -1548,6 +1548,13 @@ arm_init_simd_builtin_types (void)
>    arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
>    arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
> 
> +  if (TARGET_HAVE_MVE)
> +    {
> +      arm_simd_types[Pred1x16_t].eltype = unsigned_intHI_type_node;
> +      arm_simd_types[Pred2x8_t].eltype = unsigned_intHI_type_node;
> +      arm_simd_types[Pred4x4_t].eltype = unsigned_intHI_type_node;
> +    }
> +
>    for (i = 0; i < nelts; i++)
>      {
>        tree eltype = arm_simd_types[i].eltype;
> @@ -1695,6 +1702,11 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
>        if (qualifiers & qualifier_map_mode)
>  	op_mode = d->mode;
> 
> +      /* MVE Predicates use HImode as mandated by the ABI: pred16_t is unsigned
> +	 short.  */
> +      if (qualifiers & qualifier_predicate)
> +	op_mode = HImode;
> +
>        /* For pointers, we want a pointer to the basic type
>  	 of the vector.  */
>        if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
> @@ -2939,6 +2951,11 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
>  	    case ARG_BUILTIN_COPY_TO_REG:
>  	      if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
>  		op[argc] = convert_memory_address (Pmode, op[argc]);
> +
> +	      /* MVE uses mve_pred16_t (aka HImode) for vectors of predicates.  */
> +	      if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
> +		op[argc] = gen_lowpart (mode[argc], op[argc]);
> +
>  	      /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
>  	      if (!(*insn_data[icode].operand[opno].predicate)
>  		  (op[argc], mode[argc]))
> @@ -3144,6 +3161,13 @@ constant_arg:
>    else
>      emit_insn (insn);
> 
> +  if (GET_MODE_CLASS (tmode) == MODE_VECTOR_BOOL)
> +    {
> +      rtx HItarget = gen_reg_rtx (HImode);
> +      emit_move_insn (HItarget, gen_lowpart (HImode, target));
> +      return HItarget;
> +    }
> +
>    return target;
>  }
> 
> diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h
> index e5130d6d286..a8ef8aef82d 100644
> --- a/gcc/config/arm/arm-builtins.h
> +++ b/gcc/config/arm/arm-builtins.h
> @@ -84,7 +84,9 @@ enum arm_type_qualifiers
>    qualifier_lane_pair_index = 0x1000,
>    /* Lane indices selected in quadtuplets - must be within range of previous
>       argument = a vector.  */
> -  qualifier_lane_quadtup_index = 0x2000
> +  qualifier_lane_quadtup_index = 0x2000,
> +  /* MVE vector predicates.  */
> +  qualifier_predicate = 0x4000
>  };
> 
>  struct arm_simd_type_info
> diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
> index de689c8b45e..9ed0cd042c5 100644
> --- a/gcc/config/arm/arm-modes.def
> +++ b/gcc/config/arm/arm-modes.def
> @@ -84,6 +84,14 @@ VECTOR_MODE (FLOAT, BF, 2);   /*                 V2BF.  */
>  VECTOR_MODE (FLOAT, BF, 4);   /*		 V4BF.  */
>  VECTOR_MODE (FLOAT, BF, 8);   /*		 V8BF.  */
> 
> +/* Predicates for MVE.  */
> +BOOL_MODE (B2I, 2, 1);
> +BOOL_MODE (B4I, 4, 1);
> +
> +VECTOR_BOOL_MODE (V16BI, 16, BI, 2);
> +VECTOR_BOOL_MODE (V8BI, 8, B2I, 2);
> +VECTOR_BOOL_MODE (V4BI, 4, B4I, 2);
> +
>  /* Fraction and accumulator vector modes.  */
>  VECTOR_MODES (FRACT, 4);      /* V4QQ  V2HQ */
>  VECTOR_MODES (UFRACT, 4);     /* V4UQQ V2UHQ */
> diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
> index 6ba6f211531..920c2a68e4c 100644
> --- a/gcc/config/arm/arm-simd-builtin-types.def
> +++ b/gcc/config/arm/arm-simd-builtin-types.def
> @@ -51,3 +51,7 @@
>    ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
>    ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
>    ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
> +
> +  ENTRY (Pred1x16_t, V16BI, unsigned, 16, uint16, 21)
> +  ENTRY (Pred2x8_t, V8BI, unsigned, 8, uint16, 21)
> +  ENTRY (Pred4x4_t, V4BI, unsigned, 4, uint16, 21)
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index feeee16d320..5f559f8fd93 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -6239,9 +6239,14 @@ init_emit_once (void)
> 
>    /* For BImode, 1 and -1 are unsigned and signed interpretations
>       of the same value.  */
> -  const_tiny_rtx[0][(int) BImode] = const0_rtx;
> -  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
> -  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
> +  for (mode = MIN_MODE_BOOL;
> +       mode <= MAX_MODE_BOOL;
> +       mode = (machine_mode)((int)(mode) + 1))
> +    {
> +      const_tiny_rtx[0][(int) mode] = const0_rtx;
> +      const_tiny_rtx[1][(int) mode] = const_true_rtx;
> +      const_tiny_rtx[3][(int) mode] = const_true_rtx;
> +    }
> 
>    for (mode = MIN_MODE_PARTIAL_INT;
>         mode <= MAX_MODE_PARTIAL_INT;
> @@ -6260,13 +6265,16 @@ init_emit_once (void)
>        const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
>      }
> 
> -  /* As for BImode, "all 1" and "all -1" are unsigned and signed
> -     interpretations of the same value.  */
>    FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
>      {
>        const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
>        const_tiny_rtx[3][(int) mode] = gen_const_vector (mode, 3);
> -      const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
> +      if (GET_MODE_INNER (mode) == BImode)
> +	/* As for BImode, "all 1" and "all -1" are unsigned and signed
> +	   interpretations of the same value.  */
> +	const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
> +      else
> +	const_tiny_rtx[1][(int) mode] = gen_const_vector (mode, 1);
>      }
> 
>    FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
> diff --git a/gcc/genmodes.c b/gcc/genmodes.c
> index 6001b854547..0bb1a7c0b48 100644
> --- a/gcc/genmodes.c
> +++ b/gcc/genmodes.c
> @@ -78,6 +78,7 @@ struct mode_data
>    bool need_bytesize_adj;	/* true if this mode needs dynamic size
>  				   adjustment */
>    unsigned int int_n;		/* If nonzero, then __int<INT_N> will be defined */
> +  bool boolean;
>  };
> 
>  static struct mode_data *modes[MAX_MODE_CLASS];
> @@ -88,7 +89,8 @@ static const struct mode_data blank_mode = {
>    0, "<unknown>", MAX_MODE_CLASS,
>    0, -1U, -1U, -1U, -1U,
>    0, 0, 0, 0, 0, 0,
> -  "<unknown>", 0, 0, 0, 0, false, false, 0
> +  "<unknown>", 0, 0, 0, 0, false, false, 0,
> +  false
>  };
> 
>  static htab_t modes_by_name;
> @@ -456,7 +458,7 @@ make_complex_modes (enum mode_class cl,
>        size_t m_len;
> 
>        /* Skip BImode.  FIXME: BImode probably shouldn't be MODE_INT.  */
> -      if (m->precision == 1)
> +      if (m->boolean)
>  	continue;
> 
>        m_len = strlen (m->name);
> @@ -528,7 +530,7 @@ make_vector_modes (enum mode_class cl, const char *prefix, unsigned int width,
>  	 not be necessary.  */
>        if (cl == MODE_FLOAT && m->bytesize == 1)
>  	continue;
> -      if (cl == MODE_INT && m->precision == 1)
> +      if (m->boolean)
>  	continue;
> 
>        if ((size_t) snprintf (buf, sizeof buf, "%s%u%s", prefix,
> @@ -548,17 +550,18 @@ make_vector_modes (enum mode_class cl, const char *prefix, unsigned int width,
> 
>  /* Create a vector of booleans called NAME with COUNT elements and
>     BYTESIZE bytes in total.  */
> -#define VECTOR_BOOL_MODE(NAME, COUNT, BYTESIZE) \
> -  make_vector_bool_mode (#NAME, COUNT, BYTESIZE, __FILE__, __LINE__)
> +#define VECTOR_BOOL_MODE(NAME, COUNT, COMPONENT, BYTESIZE)		\
> +  make_vector_bool_mode (#NAME, COUNT, #COMPONENT, BYTESIZE,		\
> +			 __FILE__, __LINE__)
>  static void ATTRIBUTE_UNUSED
>  make_vector_bool_mode (const char *name, unsigned int count,
> -		       unsigned int bytesize, const char *file,
> -		       unsigned int line)
> +		       const char *component, unsigned int bytesize,
> +		       const char *file, unsigned int line)
>  {
> -  struct mode_data *m = find_mode ("BI");
> +  struct mode_data *m = find_mode (component);
>    if (!m)
>      {
> -      error ("%s:%d: no mode \"BI\"", file, line);
> +      error ("%s:%d: no mode \"%s\"", file, line, component);
>        return;
>      }
> 
> @@ -596,6 +599,20 @@ make_int_mode (const char *name,
>    m->precision = precision;
>  }
> 
> +#define BOOL_MODE(N, B, Y) \
> +  make_bool_mode (#N, B, Y, __FILE__, __LINE__)
> +
> +static void
> +make_bool_mode (const char *name,
> +		unsigned int precision, unsigned int bytesize,
> +		const char *file, unsigned int line)
> +{
> +  struct mode_data *m = new_mode (MODE_INT, name, file, line);
> +  m->bytesize = bytesize;
> +  m->precision = precision;
> +  m->boolean = true;
> +}
> +
>  #define OPAQUE_MODE(N, B)			\
>    make_opaque_mode (#N, -1U, B, __FILE__, __LINE__)
> 
> @@ -1298,9 +1315,21 @@ enum machine_mode\n{");
>        /* Don't use BImode for MIN_MODE_INT, since otherwise the middle
>  	 end will try to use it for bitfields in structures and the
>  	 like, which we do not want.  Only the target md file should
> -	 generate BImode widgets.  */
> -      if (first && first->precision == 1 && c == MODE_INT)
> -	first = first->next;
> +	 generate BImode widgets.  Since some targets such as ARM/MVE
> +	 define boolean modes with multiple bits, handle those too.  */
> +      if (first && first->boolean)
> +	{
> +	  struct mode_data *last_bool = first;
> +	  printf ("  MIN_MODE_BOOL = E_%smode,\n", first->name);
> +
> +	  while (first && first->boolean)
> +	    {
> +	      last_bool = first;
> +	      first = first->next;
> +	    }
> +
> +	  printf ("  MAX_MODE_BOOL = E_%smode,\n\n", last_bool->name);
> +	}
> 
>        if (first && last)
>  	printf ("  MIN_%s = E_%smode,\n  MAX_%s = E_%smode,\n\n",
> @@ -1679,15 +1708,25 @@ emit_class_narrowest_mode (void)
>    print_decl ("unsigned char", "class_narrowest_mode", "MAX_MODE_CLASS");
> 
>    for (c = 0; c < MAX_MODE_CLASS; c++)
> -    /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> -    tagged_printf ("MIN_%s", mode_class_names[c],
> -		   modes[c]
> -		   ? ((c != MODE_INT || modes[c]->precision != 1)
> -		      ? modes[c]->name
> -		      : (modes[c]->next
> -			 ? modes[c]->next->name
> -			 : void_mode->name))
> -		   : void_mode->name);
> +    {
> +      /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> +      const char *comment_name = void_mode->name;
> +
> +      if (modes[c])
> +	if (c != MODE_INT || !modes[c]->boolean)
> +	  comment_name = modes[c]->name;
> +	else
> +	  {
> +	    struct mode_data *m = modes[c];
> +	    while (m->boolean)
> +	      m = m->next;
> +	    if (m)
> +	      comment_name = m->name;
> +	    else
> +	      comment_name = void_mode->name;
> +	  }
> +      tagged_printf ("MIN_%s", mode_class_names[c], comment_name);
> +    }
> 
>    print_closer ();
>  }
> diff --git a/gcc/machmode.def b/gcc/machmode.def
> index 866a2082d01..eb7905ea23d 100644
> --- a/gcc/machmode.def
> +++ b/gcc/machmode.def
> @@ -196,7 +196,7 @@ RANDOM_MODE (VOID);
>  RANDOM_MODE (BLK);
> 
>  /* Single bit mode used for booleans.  */
> -FRACTIONAL_INT_MODE (BI, 1, 1);
> +BOOL_MODE (BI, 1, 1);
> 
>  /* Basic integer modes.  We go up to TI in generic code (128 bits).
>     TImode is needed here because the some front ends now genericly
> diff --git a/gcc/rtx-vector-builder.c b/gcc/rtx-vector-builder.c
> index e36aba010a0..55ffe0d5a76 100644
> --- a/gcc/rtx-vector-builder.c
> +++ b/gcc/rtx-vector-builder.c
> @@ -90,8 +90,10 @@ rtx_vector_builder::find_cached_value ()
> 
>    if (GET_MODE_CLASS (m_mode) == MODE_VECTOR_BOOL)
>      {
> -      if (elt == const1_rtx || elt == constm1_rtx)
> +      if (elt == const1_rtx)
>  	return CONST1_RTX (m_mode);
> +      else if (elt == constm1_rtx)
> +	return CONSTM1_RTX (m_mode);
>        else if (elt == const0_rtx)
>  	return CONST0_RTX (m_mode);
>        else
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index c36c825f958..532537ea48d 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -6876,12 +6876,13 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes,
>  	  /* This is the only case in which elements can be smaller than
>  	     a byte.  */
>  	  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> +	  auto mask = GET_MODE_MASK (GET_MODE_INNER (mode));
>  	  for (unsigned int i = 0; i < num_bytes; ++i)
>  	    {
>  	      target_unit value = 0;
>  	      for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits)
>  		{
> -		  value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & 1) << j;
> +		  value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j;
>  		  elt += 1;
>  		}
>  	      bytes.quick_push (value);
> @@ -7025,9 +7026,8 @@ native_decode_vector_rtx (machine_mode mode, const vec<target_unit> &bytes,
>  	  unsigned int bit_index = first_byte * BITS_PER_UNIT + i * elt_bits;
>  	  unsigned int byte_index = bit_index / BITS_PER_UNIT;
>  	  unsigned int lsb = bit_index % BITS_PER_UNIT;
> -	  builder.quick_push (bytes[byte_index] & (1 << lsb)
> -			      ? CONST1_RTX (BImode)
> -			      : CONST0_RTX (BImode));
> +	  unsigned int value = bytes[byte_index] >> lsb;
> +	  builder.quick_push (gen_int_mode (value, GET_MODE_INNER (mode)));
>  	}
>      }
>    else
> @@ -7994,17 +7994,23 @@ test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
>  						    duplicate, last_par));
> 
>        /* Test a scalar subreg of a VEC_MERGE of a VEC_DUPLICATE.  */
> -      rtx vector_reg = make_test_reg (mode);
> -      for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
> +      /* Skip this test for vectors of booleans, because offset is in bytes,
> +	 while vec_merge indices are in elements (usually bits).  */
> +      if (GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
>  	{
> -	  if (i >= HOST_BITS_PER_WIDE_INT)
> -	    break;
> -	  rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
> -	  rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
> -	  poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
> -	  ASSERT_RTX_EQ (scalar_reg,
> -			 simplify_gen_subreg (inner_mode, vm,
> -					      mode, offset));
> +	  rtx vector_reg = make_test_reg (mode);
> +	  for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
> +	    {
> +	      if (i >= HOST_BITS_PER_WIDE_INT)
> +		break;
> +	      rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
> +	      rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
> +	      poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
> +
> +	      ASSERT_RTX_EQ (scalar_reg,
> +			     simplify_gen_subreg (inner_mode, vm,
> +						  mode, offset));
> +	    }
>  	}
>      }
> 
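
The sub-byte packing that the native_encode_rtx and
native_decode_vector_rtx hunks deal with can be sketched outside GCC as
follows.  This is a simplified model, not the GCC implementation:
pack_elts and unpack_elt are hypothetical helpers, elements are assumed to
be 1, 2, 4 or 8 bits wide with the lowest-numbered element in the least
significant bits, and the decoder returns a plain masked value instead of
the mode-truncated constant that gen_int_mode would build.

```c
#include <assert.h>
#include <stdint.h>

/* Pack NELTS elements of ELT_BITS bits each into BYTES.  Assumes that
   ELT_BITS divides 8 and that NELTS * ELT_BITS is a multiple of 8.
   Each element is masked to ELT_BITS bits rather than to a single
   bit, which is the point of the encode change.  */
static void
pack_elts (const uint8_t *elts, unsigned nelts, unsigned elt_bits,
           uint8_t *bytes)
{
  uint8_t mask = (uint8_t) ((1u << elt_bits) - 1);
  unsigned elt = 0;

  for (unsigned i = 0; i < nelts * elt_bits / 8; i++)
    {
      uint8_t value = 0;
      for (unsigned j = 0; j < 8; j += elt_bits)
        value |= (uint8_t) ((elts[elt++] & mask) << j);
      bytes[i] = value;
    }
}

/* Extract element I again, keeping all ELT_BITS bits of it.  */
static uint8_t
unpack_elt (const uint8_t *bytes, unsigned i, unsigned elt_bits)
{
  unsigned bit_index = i * elt_bits;
  uint8_t mask = (uint8_t) ((1u << elt_bits) - 1);

  return (uint8_t) ((bytes[bit_index / 8] >> (bit_index % 8)) & mask);
}
```

For a 4-element vector with 4-bit elements this fills two bytes, and a true
element reads back as 0xF rather than 1.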
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 76574be191f..5f59b6ace15 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -4085,6 +4085,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
>  	unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
>  	unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
>  	scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
> +	unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));
> 
>  	/* Build the constant up one integer at a time.  */
>  	unsigned int elts_per_int = int_bits / elt_bits;
> @@ -4093,8 +4094,10 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
>  	    unsigned HOST_WIDE_INT value = 0;
>  	    unsigned int limit = MIN (nelts - i, elts_per_int);
>  	    for (unsigned int j = 0; j < limit; ++j)
> -	      if (INTVAL (CONST_VECTOR_ELT (x, i + j)) != 0)
> -		value |= 1 << (j * elt_bits);
> +	    {
> +	      auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j));
> +	      value |= (elt & mask) << (j * elt_bits);
> +	    }
>  	    output_constant_pool_2 (int_mode, gen_int_mode (value, int_mode),
>  				    i != 0 ? MIN (align, int_bits) : align);
>  	  }
> --
> 2.25.1


* RE: [PATCH v3 08/15] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates
  2022-01-13 14:56 ` [PATCH v3 08/15] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates Christophe Lyon
@ 2022-01-27 16:37   ` Kyrylo Tkachov
  0 siblings, 0 replies; 54+ messages in thread
From: Kyrylo Tkachov @ 2022-01-27 16:37 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

Hi Christophe,

> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: Thursday, January 13, 2022 2:56 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH v3 08/15] arm: Implement auto-vectorized MVE
> comparisons with vectors of boolean predicates
> 
> We make use of qualifier_predicate to describe MVE builtins
> prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins,
> as they are exercised by the tests added earlier in the series.
> 
> Special handling is needed for mve_vpselq because it has a v2di
> variant, which has no natural VPR.P0 representation: we keep HImode
> for it.
> 
> The vector_compare expansion code is updated to use the right VxBI
> mode instead of HI for the result.
> 
> We extend the existing thumb2_movhi_vfp and thumb2_movhi_fp16 patterns
> to use the new MVE_7_HI iterator which covers HI and the new VxBI
> modes, in conjunction with the new DB constraint for a constant vector
> of booleans.
> 
> 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
> 	Richard Sandiford  <richard.sandiford@arm.com>
> 
> 	gcc/
> 	PR target/100757
> 	PR target/101325
> 	* config/arm/arm-builtins.c (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
> 	(BINOP_PRED_NONE_NONE_QUALIFIERS)
> 	(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
> 	(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
> 	* config/arm/arm-protos.h (mve_const_bool_vec_to_hi): New.
> 	* config/arm/arm.c (arm_hard_regno_mode_ok): Handle new VxBI
> 	modes.
> 	(arm_mode_to_pred_mode): New.
> 	(arm_expand_vector_compare): Use the right VxBI mode instead of
> 	HI.
> 	(arm_expand_vcond): Likewise.
> 	(simd_valid_immediate): Handle MODE_VECTOR_BOOL.
> 	(mve_const_bool_vec_to_hi): New.
> 	(neon_make_constant): Call mve_const_bool_vec_to_hi when needed.
> 	* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
> 	(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
> 	(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
> 	(vpselq_s, vpselq_f): Use new predicated qualifiers.
> 	* config/arm/constraints.md (DB): New.
> 	* config/arm/iterators.md (MVE_7, MVE_7_HI): New mode iterators.
> 	(MVE_VPRED, MVE_vpred): New attribute iterators.
> 	* config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>)
> 	(@mve_vcmp<mve_cmp_op>q_f<mode>, @mve_vpselq_<supf><mode>)
> 	(@mve_vpselq_f<mode>): Use MVE_VPRED instead of HI.
> 	(@mve_vpselq_<supf>v2di): Define separately.
> 	(mov<mode>): New expander for VxBI modes.
> 	* config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Use
> 	MVE_7_HI iterator and add support for DB constraint.
> 
> 	gcc/testsuite/
> 	PR target/100757
> 	PR target/101325
> 	* gcc.dg/rtl/arm/mve-vxbi.c: New test.
> 
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 2ccfa37c302..36d71ab1a13 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -420,6 +420,12 @@ arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \
>    (arm_binop_unone_unone_unone_qualifiers)
> 
> +static enum arm_type_qualifiers
> +arm_binop_pred_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned };
> +#define BINOP_PRED_UNONE_UNONE_QUALIFIERS \
> +  (arm_binop_pred_unone_unone_qualifiers)
> +
>  static enum arm_type_qualifiers
>  arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_unsigned, qualifier_none, qualifier_immediate };
> @@ -438,6 +444,12 @@ arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define BINOP_UNONE_NONE_NONE_QUALIFIERS \
>    (arm_binop_unone_none_none_qualifiers)
> 
> +static enum arm_type_qualifiers
> +arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_predicate, qualifier_none, qualifier_none };
> +#define BINOP_PRED_NONE_NONE_QUALIFIERS \
> +  (arm_binop_pred_none_none_qualifiers)
> +
>  static enum arm_type_qualifiers
>  arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
> @@ -509,6 +521,12 @@
> arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS
> ]
>  #define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
>    (arm_ternop_none_none_none_unone_qualifiers)
> 
> +static enum arm_type_qualifiers
> +arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
> +#define TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS \
> +  (arm_ternop_none_none_none_pred_qualifiers)
> +
>  static enum arm_type_qualifiers
> 
> arm_ternop_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_none, qualifier_none, qualifier_immediate, qualifier_unsigned
> };
> @@ -528,6 +546,13 @@
> arm_ternop_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_A
> RGS]
>  #define TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
>    (arm_ternop_unone_unone_unone_unone_qualifiers)
> 
> +static enum arm_type_qualifiers
> +arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_A
> RGS]
> +  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
> +    qualifier_predicate };
> +#define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
> +  (arm_ternop_unone_unone_unone_pred_qualifiers)
> +
>  static enum arm_type_qualifiers
>  arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index fb365ac5268..b978adf2038 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -101,6 +101,7 @@ extern char *neon_output_shift_immediate (const
> char *, char, rtx *,
>  					  machine_mode, int, bool);
>  extern void neon_pairwise_reduce (rtx, rtx, machine_mode,
>  				  rtx (*) (rtx, rtx, rtx));
> +extern rtx mve_const_bool_vec_to_hi (rtx const_vec);
>  extern rtx neon_make_constant (rtx, bool generate = true);
>  extern tree arm_builtin_vectorized_function (unsigned int, tree, tree);
>  extern void neon_expand_vector_init (rtx, rtx);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 64a8f2dc7de..fa18c7bd3fe 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -12750,7 +12750,10 @@ simd_valid_immediate (rtx op, machine_mode
> mode, int inverse,
>    innersize = GET_MODE_UNIT_SIZE (mode);
> 
>    /* Only support 128-bit vectors for MVE.  */
> -  if (TARGET_HAVE_MVE && (!vector || n_elts * innersize != 16))
> +  if (TARGET_HAVE_MVE
> +      && (!vector
> +	  || (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
> +	  || n_elts * innersize != 16))
>      return -1;
> 
>    /* Vectors of float constants.  */
> @@ -13115,6 +13118,29 @@ neon_vdup_constant (rtx vals, bool generate)
>    return gen_vec_duplicate (mode, x);
>  }
> 
> +/* Return a HI representation of CONST_VEC suitable for MVE predicates.
> */
> +rtx
> +mve_const_bool_vec_to_hi (rtx const_vec)

I was a bit confused by the "hi" in the name. I guess it means HImode rather than the "high" part.
Maybe rename it to something like mve_bool_vec_to_const. The HImode part is implied by the fact that it's an MVE bool vector.

> +{
> +  int n_elts = GET_MODE_NUNITS ( GET_MODE (const_vec));
> +  int repeat = 16 / n_elts;
> +  int i;
> +  int hi_val = 0;
> +
> +  for (i = 0; i < n_elts; i++)
> +    {
> +      rtx el = CONST_VECTOR_ELT (const_vec, i);
> +      unsigned HOST_WIDE_INT elpart;
> +
> +      gcc_assert (CONST_INT_P (el));
> +      elpart = INTVAL (el);
> +
> +      for (int j = 0; j < repeat; j++)
> +	hi_val |= elpart << (i * repeat + j);
> +    }
> +  return GEN_INT (hi_val);

I think it's better to use gen_int_mode (hi_val, HImode) here, to ensure that the HOST_WIDE_INT representation inside the CONST_INT is correctly sign-extended for HImode.
OK with those changes.
Thanks,
Kyrill

> +}
> +
>  /* Return a non-NULL RTX iff VALS, which is a PARALLEL containing only
>     constants (for vec_init) or CONST_VECTOR, can be effeciently loaded
>     into a register.
> @@ -13155,6 +13181,8 @@ neon_make_constant (rtx vals, bool generate)
>        && simd_immediate_valid_for_move (const_vec, mode, NULL, NULL))
>      /* Load using VMOV.  On Cortex-A8 this takes one cycle.  */
>      return const_vec;
> +  else if (TARGET_HAVE_MVE && (GET_MODE_CLASS (mode) ==
> MODE_VECTOR_BOOL))
> +    return mve_const_bool_vec_to_hi (const_vec);
>    else if ((target = neon_vdup_constant (vals, generate)) != NULL_RTX)
>      /* Loaded using VDUP.  On Cortex-A8 the VDUP takes one NEON
>         pipeline cycle; creating the constant takes one or two ARM
> @@ -25313,7 +25341,10 @@ arm_hard_regno_mode_ok (unsigned int
> regno, machine_mode mode)
>      return false;
> 
>    if (IS_VPR_REGNUM (regno))
> -    return mode == HImode;
> +    return mode == HImode
> +      || mode == V16BImode
> +      || mode == V8BImode
> +      || mode == V4BImode;
> 
>    if (TARGET_THUMB1)
>      /* For the Thumb we only allow values bigger than SImode in
> @@ -31001,6 +31032,19 @@ arm_split_atomic_op (enum rtx_code code, rtx
> old_out, rtx new_out, rtx mem,
>      arm_post_atomic_barrier (model);
>  }
> 
> 
> 
> +/* Return the mode for the MVE vector of predicates corresponding to
> MODE.  */
> +machine_mode
> +arm_mode_to_pred_mode (machine_mode mode)
> +{
> +  switch (GET_MODE_NUNITS (mode))
> +    {
> +    case 16: return V16BImode;
> +    case 8: return V8BImode;
> +    case 4: return V4BImode;
> +    }
> +  gcc_unreachable ();
> +}
> +
>  /* Expand code to compare vectors OP0 and OP1 using condition CODE.
>     If CAN_INVERT, store either the result or its inverse in TARGET
>     and return true if TARGET contains the inverse.  If !CAN_INVERT,
> @@ -31084,7 +31128,7 @@ arm_expand_vector_compare (rtx target,
> rtx_code code, rtx op0, rtx op1,
>  	  if (vcond_mve)
>  	    vpr_p0 = target;
>  	  else
> -	    vpr_p0 = gen_reg_rtx (HImode);
> +	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> 
>  	  switch (GET_MODE_CLASS (cmp_mode))
>  	    {
> @@ -31126,7 +31170,7 @@ arm_expand_vector_compare (rtx target,
> rtx_code code, rtx op0, rtx op1,
>  	  if (vcond_mve)
>  	    vpr_p0 = target;
>  	  else
> -	    vpr_p0 = gen_reg_rtx (HImode);
> +	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> 
>  	  emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0,
> force_reg (cmp_mode, op1)));
>  	  if (!vcond_mve)
> @@ -31153,7 +31197,7 @@ arm_expand_vector_compare (rtx target,
> rtx_code code, rtx op0, rtx op1,
>  	  if (vcond_mve)
>  	    vpr_p0 = target;
>  	  else
> -	    vpr_p0 = gen_reg_rtx (HImode);
> +	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> 
>  	  emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode,
> vpr_p0, force_reg (cmp_mode, op1), op0));
>  	  if (!vcond_mve)
> @@ -31206,7 +31250,7 @@ arm_expand_vcond (rtx *operands,
> machine_mode cmp_result_mode)
>    if (TARGET_HAVE_MVE)
>      {
>        vcond_mve=true;
> -      mask = gen_reg_rtx (HImode);
> +      mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode));
>      }
>    else
>      mask = gen_reg_rtx (cmp_result_mode);
> diff --git a/gcc/config/arm/arm_mve_builtins.def
> b/gcc/config/arm/arm_mve_builtins.def
> index c3ae40765fe..44b41eab4c5 100644
> --- a/gcc/config/arm/arm_mve_builtins.def
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -89,7 +89,7 @@ VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi,
> v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
>  VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
>  VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vsubq_u, v16qi, v8hi, v4si)
> @@ -117,9 +117,9 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vhsubq_n_u,
> v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si)
> @@ -143,15 +143,15 @@ VAR3 (BINOP_UNONE_UNONE_IMM, vshlq_n_u,
> v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_IMM, vrshrq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_IMM, vqshlq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
> -VAR3 (BINOP_UNONE_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_NONE, vcmpeqq_n_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_NONE_IMM, vqshluq_n_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_UNONE, vaddvq_p_s, v16qi, v8hi, v4si)
> @@ -219,17 +219,17 @@ VAR2 (BINOP_UNONE_UNONE_IMM, vshllbq_n_u,
> v16qi, v8hi)
>  VAR2 (BINOP_UNONE_UNONE_IMM, vorrq_n_u, v8hi, v4si)
>  VAR2 (BINOP_UNONE_UNONE_IMM, vbicq_n_u, v8hi, v4si)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpneq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpneq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpneq_f, v8hf, v4sf)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpltq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpltq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpltq_f, v8hf, v4sf)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpleq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpleq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpleq_f, v8hf, v4sf)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpgtq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpgtq_f, v8hf, v4sf)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpgeq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpgeq_f, v8hf, v4sf)
>  VAR2 (BINOP_UNONE_NONE_NONE, vcmpeqq_n_f, v8hf, v4sf)
> -VAR2 (BINOP_UNONE_NONE_NONE, vcmpeqq_f, v8hf, v4sf)
> +VAR2 (BINOP_PRED_NONE_NONE, vcmpeqq_f, v8hf, v4sf)
>  VAR2 (BINOP_NONE_NONE_NONE, vsubq_f, v8hf, v4sf)
>  VAR2 (BINOP_NONE_NONE_NONE, vqmovntq_s, v8hi, v4si)
>  VAR2 (BINOP_NONE_NONE_NONE, vqmovnbq_s, v8hi, v4si)
> @@ -295,8 +295,8 @@ VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE,
> vcvtaq_m_u, v8hi, v4si)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtaq_m_s, v8hi, v4si)
>  VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vshlcq_vec_u, v16qi, v8hi,
> v4si)
>  VAR3 (TERNOP_NONE_NONE_UNONE_IMM, vshlcq_vec_s, v16qi, v8hi, v4si)
> -VAR4 (TERNOP_UNONE_UNONE_UNONE_UNONE, vpselq_u, v16qi, v8hi,
> v4si, v2di)
> -VAR4 (TERNOP_NONE_NONE_NONE_UNONE, vpselq_s, v16qi, v8hi, v4si,
> v2di)
> +VAR4 (TERNOP_UNONE_UNONE_UNONE_PRED, vpselq_u, v16qi, v8hi, v4si,
> v2di)
> +VAR4 (TERNOP_NONE_NONE_NONE_PRED, vpselq_s, v16qi, v8hi, v4si, v2di)
>  VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrev64q_m_u, v16qi,
> v8hi, v4si)
>  VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmvnq_m_u, v16qi, v8hi,
> v4si)
>  VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vmlasq_n_u, v16qi, v8hi,
> v4si)
> @@ -426,7 +426,7 @@ VAR2 (TERNOP_NONE_NONE_NONE_UNONE,
> vrev64q_m_f, v8hf, v4sf)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vrev32q_m_s, v16qi, v8hi)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vqmovntq_m_s, v8hi, v4si)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vqmovnbq_m_s, v8hi, v4si)
> -VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vpselq_f, v8hf, v4sf)
> +VAR2 (TERNOP_NONE_NONE_NONE_PRED, vpselq_f, v8hf, v4sf)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vnegq_m_f, v8hf, v4sf)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovntq_m_s, v8hi, v4si)
>  VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vmovnbq_m_s, v8hi, v4si)
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index 1920004b450..2b411b0cb0f 100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -312,6 +312,12 @@ (define_constraint "Dz"
>   (and (match_code "const_vector")
>        (match_test "(TARGET_NEON || TARGET_HAVE_MVE) && op ==
> CONST0_RTX (mode)")))
> 
> +(define_constraint "DB"
> + "@internal
> +  In ARM/Thumb-2 state with MVE a constant vector of booleans."
> + (and (match_code "const_vector")
> +      (match_test "TARGET_HAVE_MVE && GET_MODE_CLASS (mode) ==
> MODE_VECTOR_BOOL")))
> +
>  (define_constraint "Da"
>   "@internal
>    In ARM/Thumb-2 state a const_int, const_double or const_vector that can
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 8202c27cc82..37cf7971be8 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -272,6 +272,8 @@ (define_mode_iterator MVE_3 [V16QI V8HI])
>  (define_mode_iterator MVE_2 [V16QI V8HI V4SI])
>  (define_mode_iterator MVE_5 [V8HI V4SI])
>  (define_mode_iterator MVE_6 [V8HI V4SI])
> +(define_mode_iterator MVE_7 [V16BI V8BI V4BI])
> +(define_mode_iterator MVE_7_HI [HI V16BI V8BI V4BI])
> 
>  ;;----------------------------------------------------------------------------
>  ;; Code iterators
> @@ -946,6 +948,10 @@ (define_mode_attr V_extr_elem [(V16QI "u8")
> (V8HI "u16") (V4SI "32")
>  			       (V8HF "u16") (V4SF "32")])
>  (define_mode_attr earlyclobber_32 [(V16QI "=w") (V8HI "=w") (V4SI "=&w")
>  						(V8HF "=w") (V4SF "=&w")])
> +(define_mode_attr MVE_VPRED [(V16QI "V16BI") (V8HI "V8BI") (V4SI
> "V4BI")
> +                             (V2DI "HI") (V8HF "V8BI")   (V4SF "V4BI")])
> +(define_mode_attr MVE_vpred [(V16QI "v16bi") (V8HI "v8bi") (V4SI "v4bi")
> +                             (V2DI "hi") (V8HF "v8bi")   (V4SF "v4bi")])
> 
>  ;;----------------------------------------------------------------------------
>  ;; Code attributes
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 5c3b34dce3a..983aa10e652 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -839,8 +839,8 @@ (define_insn "mve_vaddlvq_p_<supf>v4si"
>  ;;
>  (define_insn "@mve_vcmp<mve_cmp_op>q_<mode>"
>    [
> -   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> -	(MVE_COMPARISONS:HI (match_operand:MVE_2 1
> "s_register_operand" "w")
> +   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
> +	(MVE_COMPARISONS:<MVE_VPRED> (match_operand:MVE_2 1
> "s_register_operand" "w")
>  		    (match_operand:MVE_2 2 "s_register_operand" "w")))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -1929,8 +1929,8 @@ (define_insn "mve_vcaddq<mve_rot><mode>"
>  ;;
>  (define_insn "@mve_vcmp<mve_cmp_op>q_f<mode>"
>    [
> -   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> -	(MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1
> "s_register_operand" "w")
> +   (set (match_operand:<MVE_VPRED> 0 "vpr_register_operand" "=Up")
> +	(MVE_FP_COMPARISONS:<MVE_VPRED> (match_operand:MVE_0 1
> "s_register_operand" "w")
>  			       (match_operand:MVE_0 2 "s_register_operand"
> "w")))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -3324,7 +3324,7 @@ (define_insn "@mve_vpselq_<supf><mode>"
>     (set (match_operand:MVE_1 0 "s_register_operand" "=w")
>  	(unspec:MVE_1 [(match_operand:MVE_1 1 "s_register_operand"
> "w")
>  		       (match_operand:MVE_1 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 3
> "vpr_register_operand" "Up")]
>  	 VPSELQ))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -4419,7 +4419,7 @@ (define_insn "@mve_vpselq_f<mode>"
>     (set (match_operand:MVE_0 0 "s_register_operand" "=w")
>  	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand"
> "w")
>  		       (match_operand:MVE_0 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 3
> "vpr_register_operand" "Up")]
>  	 VPSELQ_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -10516,3 +10516,14 @@ (define_insn
> "*movmisalign<mode>_mve_load"
>    "vldr<V_sz_elem1>.<V_sz_elem>\t%q0, %E1"
>    [(set_attr "type" "mve_load")]
>  )
> +
> +;; Expander for VxBI moves
> +(define_expand "mov<mode>"
> +  [(set (match_operand:MVE_7 0 "nonimmediate_operand")
> +        (match_operand:MVE_7 1 "general_operand"))]
> +  "TARGET_HAVE_MVE"
> +  {
> +    if (!register_operand (operands[0], <MODE>mode))
> +      operands[1] = force_reg (<MODE>mode, operands[1]);
> +  }
> +)
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index f5ccb92d097..f00d1cad3e9 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -73,21 +73,26 @@ (define_insn "*arm_movhi_vfp"
> 
>  (define_insn "*thumb2_movhi_vfp"
>   [(set
> -   (match_operand:HI 0 "nonimmediate_operand"
> +   (match_operand:MVE_7_HI 0 "nonimmediate_operand"
>      "=rk, r, l, r, m, r, *t, r, *t, Up, r")
> -   (match_operand:HI 1 "general_operand"
> -    "rk, I, Py, n, r, m, r, *t, *t, r, Up"))]
> +   (match_operand:MVE_7_HI 1 "general_operand"
> +    "rk, IDB, Py, n, r, m, r, *t, *t, r, Up"))]
>   "TARGET_THUMB2 && TARGET_VFP_BASE
>    && !TARGET_VFP_FP16INST
> -  && (register_operand (operands[0], HImode)
> -       || register_operand (operands[1], HImode))"
> +  && (register_operand (operands[0], <MODE>mode)
> +       || register_operand (operands[1], <MODE>mode))"
>  {
>    switch (which_alternative)
>      {
>      case 0:
> -    case 1:
>      case 2:
>        return "mov%?\t%0, %1\t%@ movhi";
> +    case 1:
> +      if (GET_MODE_CLASS (GET_MODE (operands[1])) ==
> MODE_VECTOR_BOOL)
> +        operands[1] = mve_const_bool_vec_to_hi (operands[1]);
> +      else
> +        operands[1] = gen_lowpart (HImode, operands[1]);
> +      return "mov%?\t%0, %1\t%@ movhi";
>      case 3:
>        return "movw%?\t%0, %L1\t%@ movhi";
>      case 4:
> @@ -173,20 +178,25 @@ (define_insn "*arm_movhi_fp16"
> 
>  (define_insn "*thumb2_movhi_fp16"
>   [(set
> -   (match_operand:HI 0 "nonimmediate_operand"
> +   (match_operand:MVE_7_HI 0 "nonimmediate_operand"
>      "=rk, r, l, r, m, r, *t, r, *t, Up, r")
> -   (match_operand:HI 1 "general_operand"
> -    "rk, I, Py, n, r, m, r, *t, *t, r, Up"))]
> +   (match_operand:MVE_7_HI 1 "general_operand"
> +    "rk, IDB, Py, n, r, m, r, *t, *t, r, Up"))]
>   "TARGET_THUMB2 && (TARGET_VFP_FP16INST || TARGET_HAVE_MVE)
> -  && (register_operand (operands[0], HImode)
> -       || register_operand (operands[1], HImode))"
> +  && (register_operand (operands[0], <MODE>mode)
> +       || register_operand (operands[1], <MODE>mode))"
>  {
>    switch (which_alternative)
>      {
>      case 0:
> -    case 1:
>      case 2:
>        return "mov%?\t%0, %1\t%@ movhi";
> +    case 1:
> +      if (GET_MODE_CLASS (GET_MODE (operands[1])) ==
> MODE_VECTOR_BOOL)
> +        operands[1] = mve_const_bool_vec_to_hi (operands[1]);
> +      else
> +        operands[1] = gen_lowpart (HImode, operands[1]);
> +      return "mov%?\t%0, %1\t%@ movhi";
>      case 3:
>        return "movw%?\t%0, %L1\t%@ movhi";
>      case 4:
> diff --git a/gcc/testsuite/gcc.dg/rtl/arm/mve-vxbi.c
> b/gcc/testsuite/gcc.dg/rtl/arm/mve-vxbi.c
> new file mode 100644
> index 00000000000..093283ed43c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/rtl/arm/mve-vxbi.c
> @@ -0,0 +1,89 @@
> +/* { dg-do compile { target arm*-*-* } } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O2" } */
> +
> +void __RTL (startwith ("ira")) foo (void *ptr)
> +{
> +  (function "foo"
> +   (param "ptr"
> +    (DECL_RTL (reg/v:SI <0> [ ptr ]))
> +    (DECL_RTL_INCOMING (reg:SI r0 [ ptr ]))
> +    ) ;; param "n"
> +   (insn-chain
> +    (block 2
> +     (edge-from entry (flags "FALLTHRU"))
> +     (cnote 5 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +     (insn 7 (set (reg:V4BI <1>)
> +	      (const_vector:V4BI [(const_int 1)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 1)])) (nil))
> +     (insn 8 (set (mem:V4BI (reg:SI <0>) [1 ptr+0 S2 A16]) (reg:V4BI <1>)))
> +     (edge-to exit (flags "FALLTHRU"))
> +     ) ;; block 2
> +    ) ;; insn-chain
> +   ) ;; function
> +}
> +
> +void __RTL (startwith ("ira")) foo2 (void *ptr)
> +{
> +  (function "foo"
> +   (param "ptr"
> +    (DECL_RTL (reg/v:SI <0> [ ptr ]))
> +    (DECL_RTL_INCOMING (reg:SI r0 [ ptr ]))
> +    ) ;; param "n"
> +   (insn-chain
> +    (block 2
> +     (edge-from entry (flags "FALLTHRU"))
> +     (cnote 5 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +     (insn 7 (set (reg:V8BI <1>)
> +	      (const_vector:V8BI [(const_int 1)
> +				  (const_int 0)
> +				  (const_int 1)
> +				  (const_int 1)
> +				  (const_int 1)
> +				  (const_int 1)
> +				  (const_int 0)
> +				  (const_int 1)])) (nil))
> +     (insn 8 (set (mem:V8BI (reg:SI <0>) [1 ptr+0 S2 A16]) (reg:V8BI <1>)))
> +     (edge-to exit (flags "FALLTHRU"))
> +     ) ;; block 2
> +    ) ;; insn-chain
> +   ) ;; function
> +}
> +
> +void __RTL (startwith ("ira")) foo3 (void *ptr)
> +{
> +  (function "foo"
> +   (param "ptr"
> +    (DECL_RTL (reg/v:SI <0> [ ptr ]))
> +    (DECL_RTL_INCOMING (reg:SI r0 [ ptr ]))
> +    ) ;; param "n"
> +   (insn-chain
> +    (block 2
> +     (edge-from entry (flags "FALLTHRU"))
> +     (cnote 5 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +     (insn 7 (set (reg:V16BI <1>)
> +	      (const_vector:V16BI [(const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)
> +				  (const_int 0)])) (nil))
> +     (insn 8 (set (mem:V16BI (reg:SI <0>) [1 ptr+0 S2 A16]) (reg:V16BI <1>)))
> +     (edge-to exit (flags "FALLTHRU"))
> +     ) ;; block 2
> +    ) ;; insn-chain
> +   ) ;; function
> +}
> --
> 2.25.1


* RE: [PATCH v3 09/15] arm: Fix vcond_mask expander for MVE (PR target/100757)
  2022-01-13 14:56 ` [PATCH v3 09/15] arm: Fix vcond_mask expander for MVE (PR target/100757) Christophe Lyon
@ 2022-01-27 16:55   ` Kyrylo Tkachov
  0 siblings, 0 replies; 54+ messages in thread
From: Kyrylo Tkachov @ 2022-01-27 16:55 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

Hi Christophe,

> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: Thursday, January 13, 2022 2:56 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH v3 09/15] arm: Fix vcond_mask expander for MVE (PR
> target/100757)
> 
> The problem in this PR is that we call VPSEL with a mask of vector
> type instead of HImode. This happens because operand 3 in vcond_mask
> is the pre-computed vector comparison and has vector type.
> 
> This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
> returning the appropriate VxBI mode when targeting MVE.  In turn, this
> implies implementing vec_cmp<mode><MVE_vpred>,
> vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>,
> and we can
> move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and
> vcond_mask_<mode><v_cmp_result> back to neon.md since they are not
> used by MVE anymore.  The new *<MVE_vpred> patterns listed above are
> implemented in mve.md since they are only valid for MVE. However this
> may make maintenance/comparison more painful than having all of them
> in vec-common.md.
> 
> In the process, we can get rid of the recently added vcond_mve
> parameter of arm_expand_vector_compare.
> 
> Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my
> "arm:
> Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
> iterator added in r12-835 (to have V4HF/V8HF support), as well as the
> (!<Is_float_mode> || flag_unsafe_math_optimizations) condition which
> was not present before r12-834 although SF modes were enabled by VDQW
> (I think this was a bug).
> 
> Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
> longer need to generate vpsel with vectors of 0 and 1: the masks are
> now merged via scalar 'ands' instructions operating on 16-bit masks
> after converting the boolean vectors.
> 
> In addition, this patch fixes a problem in arm_expand_vcond() where
> the result would be a vector of 0 or 1 instead of operand 1 or 2.
> 
> Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
> arm_mve effective target.
> 
> Reducing the number of iterations in pr100757-3.c from 32 to 8, we
> generate the code below:
> 
> float a[32];
> float fn1(int d) {
>   float c = 4.0f;
>   for (int b = 0; b < 8; b++)
>     if (a[b] != 2.0f)
>       c = 5.0f;
>   return c;
> }
> 
> fn1:
> 	ldr     r3, .L3+48
> 	vldr.64 d4, .L3              // q2=(2.0,2.0,2.0,2.0)
> 	vldr.64 d5, .L3+8
> 	vldrw.32        q0, [r3]     // q0=a(0..3)
> 	adds    r3, r3, #16
> 	vcmp.f32        eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
> 	vldrw.32        q1, [r3]     // q1=a(4..7)
> 	vmrs     r3, P0
> 	vcmp.f32        eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
> 	vmrs    r2, P0  @ movhi
> 	ands    r3, r3, r2           // r3=select(a(0..3)) & select(a(4..7))
> 	vldr.64 d4, .L3+16           // q2=(5.0,5.0,5.0,5.0)
> 	vldr.64 d5, .L3+24
> 	vmsr     P0, r3
> 	vldr.64 d6, .L3+32           // q3=(4.0,4.0,4.0,4.0)
> 	vldr.64 d7, .L3+40
> 	vpsel q3, q3, q2             // q3=vcond_mask(4.0,5.0)
> 	vmov.32 r2, q3[1]            // keep the scalar max
> 	vmov.32 r0, q3[3]
> 	vmov.32 r3, q3[2]
> 	vmov.f32        s11, s12
> 	vmov    s15, r2
> 	vmov    s14, r3
> 	vmaxnm.f32      s15, s11, s15
> 	vmaxnm.f32      s15, s15, s14
> 	vmov    s14, r0
> 	vmaxnm.f32      s15, s15, s14
> 	vmov    r0, s15
> 	bx      lr
> 	.L4:
> 	.align  3
> 	.L3:
> 	.word   1073741824	// 2.0f
> 	.word   1073741824
> 	.word   1073741824
> 	.word   1073741824
> 	.word   1084227584	// 5.0f
> 	.word   1084227584
> 	.word   1084227584
> 	.word   1084227584
> 	.word   1082130432	// 4.0f
> 	.word   1082130432
> 	.word   1082130432
> 	.word   1082130432
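
As an aside for readers, the listing above can be modelled in scalar C. This is only a sketch under stated assumptions: cmp_eq_f32x4 and fn1_model are hypothetical names, and the model assumes that each 32-bit lane owns four consecutive bits of the 16-bit predicate, and that vpsel takes its first source operand where the predicate is set. It mirrors the structure of the assembly (two compares, one scalar AND, one select, one max reduction), not the vectorizer's internal representation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 4 x f32 "vcmp eq": lane I sets predicate
   bits 4*I .. 4*I+3 when V[I] == K.  */
static uint16_t
cmp_eq_f32x4 (const float *v, float k)
{
  uint16_t pred = 0;

  for (int i = 0; i < 4; i++)
    if (v[i] == k)
      pred |= (uint16_t) (0xFu << (4 * i));
  return pred;
}

/* Scalar model of the 8-iteration fn1: AND the two predicates,
   select 4.0f where the merged predicate is set and 5.0f elsewhere,
   then reduce the four lanes with max.  */
static float
fn1_model (const float *a)
{
  uint16_t lo = cmp_eq_f32x4 (a, 2.0f);
  uint16_t hi = cmp_eq_f32x4 (a + 4, 2.0f);
  uint16_t p0 = (uint16_t) (lo & hi);
  float result = 0.0f;

  for (int i = 0; i < 4; i++)
    {
      float lane = ((p0 >> (4 * i)) & 1) ? 4.0f : 5.0f;
      if (i == 0 || lane > result)
        result = lane;
    }
  return result;
}
```

If all eight inputs equal 2.0f, every lane selects 4.0f and the reduction returns 4.0f. A single differing element clears one lane of the merged predicate, so that lane selects 5.0f and the max reduction returns 5.0f, matching the C source of fn1.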
> 
> 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
> 
> 	PR target/100757
> 	gcc/
> 	* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
> 	(arm_expand_vector_compare): Update prototype.
> 	* config/arm/arm.c (TARGET_VECTORIZE_GET_MASK_MODE): New.
> 	(arm_vector_mode_supported_p): Add support for VxBI modes.
> 	(arm_expand_vector_compare): Remove useless generation of vpsel.
> 	(arm_expand_vcond): Fix select operands.
> 	(arm_get_mask_mode): New.
> 	* config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New.
> 	(vec_cmpu<mode><MVE_vpred>): New.
> 	(vcond_mask_<mode><MVE_vpred>): New.
> 	* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>)
> 	(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>):
> Move to ...
> 	* config/arm/neon.md (vec_cmp<mode><v_cmp_result>)
> 	(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): ...
> here
> 	and disable for MVE.
> 	* doc/sourcebuild.texi (arm_mve): Document new effective-target.
> 
> 	gcc/testsuite/
> 	* gcc.dg/signbit-2.c: Skip when targeting ARM/MVE.
> 	* lib/target-supports.exp (check_effective_target_arm_mve): New.
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index b978adf2038..a84613104b1 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -202,6 +202,7 @@ extern void arm_init_cumulative_args
> (CUMULATIVE_ARGS *, tree, rtx, tree);
>  extern bool arm_pad_reg_upward (machine_mode, tree, int);
>  #endif
>  extern int arm_apply_result_size (void);
> +extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
> 
>  #endif /* RTX_CODE */
> 
> @@ -378,7 +379,7 @@ extern void arm_emit_coreregs_64bit_shift (enum
> rtx_code, rtx, rtx, rtx, rtx,
>  extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
>  extern bool arm_valid_symbolic_address_p (rtx);
>  extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
> -extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool,
> bool);
> +extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool);
>  #endif /* RTX_CODE */
> 
>  extern bool arm_gen_setmem (rtx *);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index fa18c7bd3fe..7d56fa71806 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -829,6 +829,10 @@ static const struct attribute_spec
> arm_attribute_table[] =
> 
>  #undef TARGET_MD_ASM_ADJUST
>  #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
> +
> +#undef TARGET_VECTORIZE_GET_MASK_MODE
> +#define TARGET_VECTORIZE_GET_MASK_MODE arm_get_mask_mode
> +
> 
> 
> 
>  /* Obstack for minipool constant handling.  */
>  static struct obstack minipool_obstack;
> @@ -29234,7 +29238,8 @@ arm_vector_mode_supported_p
> (machine_mode mode)
> 
>    if (TARGET_HAVE_MVE
>        && (mode == V2DImode || mode == V4SImode || mode == V8HImode
> -	  || mode == V16QImode))
> +	  || mode == V16QImode
> +	  || mode == V16BImode || mode == V8BImode || mode ==
> V4BImode))
>        return true;
> 
>    if (TARGET_HAVE_MVE_FLOAT
> @@ -31033,7 +31038,7 @@ arm_split_atomic_op (enum rtx_code code, rtx
> old_out, rtx new_out, rtx mem,
>  }
> 
> 
> 
>  /* Return the mode for the MVE vector of predicates corresponding to
> MODE.  */
> -machine_mode
> +opt_machine_mode
>  arm_mode_to_pred_mode (machine_mode mode)
>  {
>    switch (GET_MODE_NUNITS (mode))
> @@ -31042,7 +31047,7 @@ arm_mode_to_pred_mode (machine_mode
> mode)
>      case 8: return V8BImode;
>      case 4: return V4BImode;
>      }
> -  gcc_unreachable ();
> +  return opt_machine_mode ();
>  }
> 
>  /* Expand code to compare vectors OP0 and OP1 using condition CODE.
> @@ -31050,16 +31055,12 @@ arm_mode_to_pred_mode (machine_mode
> mode)
>     and return true if TARGET contains the inverse.  If !CAN_INVERT,
>     always store the result in TARGET, never its inverse.
> 
> -   If VCOND_MVE, do not emit the vpsel instruction here, let
> arm_expand_vcond do
> -   it with the right destination type to avoid emiting two vpsel, one here and
> -   one in arm_expand_vcond.
> -
>     Note that the handling of floating-point comparisons is not
>     IEEE compliant.  */
> 
>  bool
>  arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
> -			   bool can_invert, bool vcond_mve)
> +			   bool can_invert)
>  {
>    machine_mode cmp_result_mode = GET_MODE (target);
>    machine_mode cmp_mode = GET_MODE (op0);
> @@ -31088,7 +31089,7 @@ arm_expand_vector_compare (rtx target,
> rtx_code code, rtx op0, rtx op1,
>  	       and then store its inverse in TARGET.  This avoids reusing
>  	       TARGET (which for integer NE could be one of the inputs).  */
>  	    rtx tmp = gen_reg_rtx (cmp_result_mode);
> -	    if (arm_expand_vector_compare (tmp, code, op0, op1, true, vcond_mve))
> +	    if (arm_expand_vector_compare (tmp, code, op0, op1, true))
>  	      gcc_unreachable ();
>  	    emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp)));
>  	    return false;
> @@ -31124,36 +31125,22 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>      case NE:
>        if (TARGET_HAVE_MVE)
>  	{
> -	  rtx vpr_p0;
> -	  if (vcond_mve)
> -	    vpr_p0 = target;
> -	  else
> -	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> -
>  	  switch (GET_MODE_CLASS (cmp_mode))
>  	    {
>  	    case MODE_VECTOR_INT:
> -	      emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> +	      emit_insn (gen_mve_vcmpq (code, cmp_mode, target,
> +					op0, force_reg (cmp_mode, op1)));
>  	      break;
>  	    case MODE_VECTOR_FLOAT:
>  	      if (TARGET_HAVE_MVE_FLOAT)
> -		emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> +		emit_insn (gen_mve_vcmpq_f (code, cmp_mode, target,
> +					    op0, force_reg (cmp_mode, op1)));
>  	      else
>  		gcc_unreachable ();
>  	      break;
>  	    default:
>  	      gcc_unreachable ();
>  	    }
> -
> -	  /* If we are not expanding a vcond, build the result here.  */
> -	  if (!vcond_mve)
> -	    {
> -	      rtx zero = gen_reg_rtx (cmp_result_mode);
> -	      rtx one = gen_reg_rtx (cmp_result_mode);
> -	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> -	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> -	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
> -	    }
>  	}
>        else
>  	emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
> @@ -31165,23 +31152,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>      case GEU:
>      case GTU:
>        if (TARGET_HAVE_MVE)
> -	{
> -	  rtx vpr_p0;
> -	  if (vcond_mve)
> -	    vpr_p0 = target;
> -	  else
> -	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> -
> -	  emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> -	  if (!vcond_mve)
> -	    {
> -	      rtx zero = gen_reg_rtx (cmp_result_mode);
> -	      rtx one = gen_reg_rtx (cmp_result_mode);
> -	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> -	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> -	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
> -	    }
> -	}
> +	emit_insn (gen_mve_vcmpq (code, cmp_mode, target,
> +				  op0, force_reg (cmp_mode, op1)));
>        else
>  	emit_insn (gen_neon_vc (code, cmp_mode, target,
>  				op0, force_reg (cmp_mode, op1)));
> @@ -31192,23 +31164,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>      case LEU:
>      case LTU:
>        if (TARGET_HAVE_MVE)
> -	{
> -	  rtx vpr_p0;
> -	  if (vcond_mve)
> -	    vpr_p0 = target;
> -	  else
> -	    vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> -
> -	  emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0));
> -	  if (!vcond_mve)
> -	    {
> -	      rtx zero = gen_reg_rtx (cmp_result_mode);
> -	      rtx one = gen_reg_rtx (cmp_result_mode);
> -	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> -	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> -	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
> -	    }
> -	}
> +	emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, target,
> +				  force_reg (cmp_mode, op1), op0));
>        else
>  	emit_insn (gen_neon_vc (swap_condition (code), cmp_mode,
>  				target, force_reg (cmp_mode, op1), op0));
> @@ -31223,8 +31180,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>  	rtx gt_res = gen_reg_rtx (cmp_result_mode);
>  	rtx alt_res = gen_reg_rtx (cmp_result_mode);
>  	rtx_code alt_code = (code == LTGT ? LT : LE);
> -	if (arm_expand_vector_compare (gt_res, GT, op0, op1, true, vcond_mve)
> -	    || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true, vcond_mve))
> +	if (arm_expand_vector_compare (gt_res, GT, op0, op1, true)
> +	    || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true))
>  	  gcc_unreachable ();
>  	emit_insn (gen_rtx_SET (target, gen_rtx_IOR (cmp_result_mode,
>  						     gt_res, alt_res)));
> @@ -31244,19 +31201,15 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
>  {
>    /* When expanding for MVE, we do not want to emit a (useless) vpsel in
>       arm_expand_vector_compare, and another one here.  */
> -  bool vcond_mve=false;
>    rtx mask;
> 
>    if (TARGET_HAVE_MVE)
> -    {
> -      vcond_mve=true;
> -      mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode));
> -    }
> +    mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode).require ());
>    else
>      mask = gen_reg_rtx (cmp_result_mode);
> 
>    bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
> -					     operands[4], operands[5], true, vcond_mve);
> +					     operands[4], operands[5], true);
>    if (inverted)
>      std::swap (operands[1], operands[2]);
>    if (TARGET_NEON)
> @@ -31264,20 +31217,20 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
>  			    mask, operands[1], operands[2]));
>    else
>      {
> -      machine_mode cmp_mode = GET_MODE (operands[4]);
> -      rtx vpr_p0 = mask;
> -      rtx zero = gen_reg_rtx (cmp_mode);
> -      rtx one = gen_reg_rtx (cmp_mode);
> -      emit_move_insn (zero, CONST0_RTX (cmp_mode));
> -      emit_move_insn (one, CONST1_RTX (cmp_mode));
> +      machine_mode cmp_mode = GET_MODE (operands[0]);
> +
>        switch (GET_MODE_CLASS (cmp_mode))
>  	{
>  	case MODE_VECTOR_INT:
> -	  emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0));
> +	  emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_mode, operands[0],
> +				     operands[1], operands[2], mask));
>  	  break;
>  	case MODE_VECTOR_FLOAT:
>  	  if (TARGET_HAVE_MVE_FLOAT)
> -	    emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0));
> +	    emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0],
> +					 operands[1], operands[2], mask));
> +	  else
> +	    gcc_unreachable ();
>  	  break;
>  	default:
>  	  gcc_unreachable ();
> @@ -34187,4 +34140,15 @@ arm_mode_base_reg_class (machine_mode mode)
> 
>  struct gcc_target targetm = TARGET_INITIALIZER;
> 
> +/* Implement TARGET_VECTORIZE_GET_MASK_MODE.  */
> +
> +opt_machine_mode
> +arm_get_mask_mode (machine_mode mode)
> +{
> +  if (TARGET_HAVE_MVE)
> +    return arm_mode_to_pred_mode (mode);
> +
> +  return default_get_mask_mode (mode);
> +}
> +
>  #include "gt-arm.h"
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 983aa10e652..35564e870bc 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -10527,3 +10527,57 @@ (define_expand "mov<mode>"
>        operands[1] = force_reg (<MODE>mode, operands[1]);
>    }
>  )
> +
> +;; Expanders for vec_cmp and vcond
> +
> +(define_expand "vec_cmp<mode><MVE_vpred>"
> +  [(set (match_operand:<MVE_VPRED> 0 "s_register_operand")
> +	(match_operator:<MVE_VPRED> 1 "comparison_operator"
> +	  [(match_operand:MVE_VLD_ST 2 "s_register_operand")
> +	   (match_operand:MVE_VLD_ST 3 "reg_or_zero_operand")]))]
> +  "TARGET_HAVE_MVE
> +   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> +{
> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> +			     operands[2], operands[3], false);
> +  DONE;
> +})
> +
> +(define_expand "vec_cmpu<mode><MVE_vpred>"
> +  [(set (match_operand:<MVE_VPRED> 0 "s_register_operand")
> +	(match_operator:<MVE_VPRED> 1 "comparison_operator"
> +	  [(match_operand:MVE_2 2 "s_register_operand")
> +	   (match_operand:MVE_2 3 "reg_or_zero_operand")]))]
> +  "TARGET_HAVE_MVE"
> +{
> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> +			     operands[2], operands[3], false);
> +  DONE;
> +})
> +
> +(define_expand "vcond_mask_<mode><MVE_vpred>"
> +  [(set (match_operand:MVE_VLD_ST 0 "s_register_operand")
> +	(if_then_else:MVE_VLD_ST
> +	  (match_operand:<MVE_VPRED> 3 "s_register_operand")
> +	  (match_operand:MVE_VLD_ST 1 "s_register_operand")
> +	  (match_operand:MVE_VLD_ST 2 "s_register_operand")))]
> +  "TARGET_HAVE_MVE"
> +{
> +  switch (GET_MODE_CLASS (<MODE>mode))
> +    {
> +      case MODE_VECTOR_INT:
> +	emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0],
> +				   operands[1], operands[2], operands[3]));
> +	break;
> +      case MODE_VECTOR_FLOAT:
> +	if (TARGET_HAVE_MVE_FLOAT)
> +	  emit_insn (gen_mve_vpselq_f (<MODE>mode, operands[0],
> +				       operands[1], operands[2], operands[3]));
> +	else
> +	  gcc_unreachable ();

I think this logic is more complicated than it needs to be. The vpselq_f pattern is already guarded by TARGET_HAVE_MVE_FLOAT, so the compiler will ICE anyway if it is generated without MVE float.
There is therefore no need for the "if (TARGET_HAVE_MVE_FLOAT)" check or the gcc_unreachable () here.
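
To illustrate the point outside of GCC, here is a minimal toy model in C++. None of these names are real GCC types or functions; they are stand-ins assumed for illustration only. The generator refuses to run when float support is missing (standing in for the guarded pattern), so the expander body can call it unconditionally.

```cpp
#include <stdexcept>
#include <string>

// Toy stand-ins, assumed for illustration; not GCC's real API.
enum class ModeClass { VectorInt, VectorFloat, Other };

struct Target { bool have_mve_float; };

// Models a generator whose pattern is conditioned on MVE float support:
// using it without that support fails loudly, much as an ICE would.
std::string gen_vpselq_f(const Target &t) {
  if (!t.have_mve_float)
    throw std::logic_error("vpselq_f used without MVE float");
  return "vpselq_f";
}

// Models the integer generator, which has no extra condition.
std::string gen_vpselq() { return "vpselq"; }

// Simplified expander body: no caller-side float check, because the
// generator above already enforces it.
std::string expand_vcond_mask(const Target &t, ModeClass cls) {
  switch (cls) {
  case ModeClass::VectorInt:
    return gen_vpselq();
  case ModeClass::VectorFloat:
    return gen_vpselq_f(t);
  default:
    throw std::logic_error("unexpected mode class");
  }
}
```

In the real expander the equivalent change would simply drop the "if"/"else gcc_unreachable ()" pair around the gen_mve_vpselq_f call.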

Ok with that change.
Thanks,
Kyrill


> +	break;
> +      default:
> +	gcc_unreachable ();
> +    }
> +  DONE;
> +})
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index e06c8245672..20e9f11ec81 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -1394,6 +1394,45 @@ (define_insn "*us_sub<mode>_neon"
>    [(set_attr "type" "neon_qsub<q>")]
>  )
> 
> +(define_expand "vec_cmp<mode><v_cmp_result>"
> +  [(set (match_operand:<V_cmp_result> 0 "s_register_operand")
> +	(match_operator:<V_cmp_result> 1 "comparison_operator"
> +	  [(match_operand:VDQWH 2 "s_register_operand")
> +	   (match_operand:VDQWH 3 "reg_or_zero_operand")]))]
> +  "TARGET_NEON
> +   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> +{
> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> +			     operands[2], operands[3], false);
> +  DONE;
> +})
> +
> +(define_expand "vec_cmpu<mode><mode>"
> +  [(set (match_operand:VDQIW 0 "s_register_operand")
> +	(match_operator:VDQIW 1 "comparison_operator"
> +	  [(match_operand:VDQIW 2 "s_register_operand")
> +	   (match_operand:VDQIW 3 "reg_or_zero_operand")]))]
> +  "TARGET_NEON"
> +{
> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> +			     operands[2], operands[3], false);
> +  DONE;
> +})
> +
> +(define_expand "vcond_mask_<mode><v_cmp_result>"
> +  [(set (match_operand:VDQWH 0 "s_register_operand")
> +	(if_then_else:VDQWH
> +	  (match_operand:<V_cmp_result> 3 "s_register_operand")
> +	  (match_operand:VDQWH 1 "s_register_operand")
> +	  (match_operand:VDQWH 2 "s_register_operand")))]
> +  "TARGET_NEON
> +   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> +{
> +  emit_insn (gen_neon_vbsl<mode> (operands[0], operands[3], operands[1],
> +				  operands[2]));
> +  DONE;
> +})
> +
>  ;; Patterns for builtins.
> 
>  ; good for plain vadd, vaddq.
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index cef358e44f5..20586973ed9 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -363,33 +363,6 @@ (define_expand "vlshr<mode>3"
>      }
>  })
> 
> -(define_expand "vec_cmp<mode><v_cmp_result>"
> -  [(set (match_operand:<V_cmp_result> 0 "s_register_operand")
> -	(match_operator:<V_cmp_result> 1 "comparison_operator"
> -	  [(match_operand:VDQWH 2 "s_register_operand")
> -	   (match_operand:VDQWH 3 "reg_or_zero_operand")]))]
> -  "ARM_HAVE_<MODE>_ARITH
> -   && !TARGET_REALLY_IWMMXT
> -   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> -{
> -  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> -			     operands[2], operands[3], false, false);
> -  DONE;
> -})
> -
> -(define_expand "vec_cmpu<mode><mode>"
> -  [(set (match_operand:VDQIW 0 "s_register_operand")
> -	(match_operator:VDQIW 1 "comparison_operator"
> -	  [(match_operand:VDQIW 2 "s_register_operand")
> -	   (match_operand:VDQIW 3 "reg_or_zero_operand")]))]
> -  "ARM_HAVE_<MODE>_ARITH
> -   && !TARGET_REALLY_IWMMXT"
> -{
> -  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> -			     operands[2], operands[3], false, false);
> -  DONE;
> -})
> -
>  ;; Conditional instructions.  These are comparisons with conditional moves for
>  ;; vectors.  They perform the assignment:
>  ;;
> @@ -461,31 +434,6 @@ (define_expand "vcondu<mode><v_cmp_result>"
>    DONE;
>  })
> 
> -(define_expand "vcond_mask_<mode><v_cmp_result>"
> -  [(set (match_operand:VDQWH 0 "s_register_operand")
> -        (if_then_else:VDQWH
> -          (match_operand:<V_cmp_result> 3 "s_register_operand")
> -          (match_operand:VDQWH 1 "s_register_operand")
> -          (match_operand:VDQWH 2 "s_register_operand")))]
> -  "ARM_HAVE_<MODE>_ARITH
> -   && !TARGET_REALLY_IWMMXT
> -   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> -{
> -  if (TARGET_NEON)
> -    {
> -      emit_insn (gen_neon_vbsl (<MODE>mode, operands[0], operands[3],
> -                                operands[1], operands[2]));
> -    }
> -  else if (TARGET_HAVE_MVE)
> -    {
> -      emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0],
> -                                 operands[1], operands[2], operands[3]));
> -    }
> -  else
> -    gcc_unreachable ();
> -  DONE;
> -})
> -
>  (define_expand "vec_load_lanesoi<mode>"
>    [(set (match_operand:OI 0 "s_register_operand")
>          (unspec:OI [(match_operand:OI 1 "neon_struct_operand")
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 6095a35cd45..8d369935396 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2236,6 +2236,10 @@ ARM target supports the @code{-mfloat-abi=softfp} option.
>  @anchor{arm_hard_ok}
>  ARM target supports the @code{-mfloat-abi=hard} option.
> 
> +@item arm_mve
> +@anchor{arm_mve}
> +ARM target supports generating MVE instructions.
> +
>  @item arm_v8_1_lob_ok
>  @anchor{arm_v8_1_lob_ok}
>  ARM Target supports executing the Armv8.1-M Mainline Low Overhead Loop
> diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
> index b609f67dc9f..2f2dc448286 100644
> --- a/gcc/testsuite/gcc.dg/signbit-2.c
> +++ b/gcc/testsuite/gcc.dg/signbit-2.c
> @@ -4,6 +4,7 @@
>  /* This test does not work when the truth type does not match vector type.  */
>  /* { dg-additional-options "-mno-avx512f" { target { i?86-*-* x86_64-*-* } } } */
>  /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
> +/* { dg-skip-if "no fallback for MVE" { arm_mve } } */
> 
>  #include <stdint.h>
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 0fe1e1e077a..8dac516ec12 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -5234,6 +5234,18 @@ proc check_effective_target_arm_hard_ok { } {
>  	} "-mfloat-abi=hard"]
>  }
> 
> +# Return 1 if this is an ARM target supporting MVE.
> +proc check_effective_target_arm_mve { } {
> +    if { ![istarget arm*-*-*] } {
> +	return 0
> +    }
> +    return [check_no_compiler_messages arm_mve assembly {
> +	#if !defined (__ARM_FEATURE_MVE)
> +	#error FOO
> +	#endif
> +    }]
> +}
> +
>  # Return 1 if the target supports ARMv8.1-M MVE with floating point
>  # instructions, 0 otherwise.  The test is valid for ARM.
>  # Record the command line options needed.
> --
> 2.25.1



* RE: [PATCH v3 12/15] arm: Convert more load/store MVE builtins to predicate qualifiers
  2022-01-13 14:56 ` [PATCH v3 12/15] arm: Convert more load/store " Christophe Lyon
@ 2022-01-27 16:56   ` Kyrylo Tkachov
  0 siblings, 0 replies; 54+ messages in thread
From: Kyrylo Tkachov @ 2022-01-27 16:56 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches



> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: Thursday, January 13, 2022 2:56 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH v3 12/15] arm: Convert more load/store MVE builtins to
> predicate qualifiers
> 
> This patch covers a few builtins where we do not use the <mode>
> iterator and thus we cannot use <MVE_vpred>.
> 
> For v2di instructions, we keep the HI mode for predicates.
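
As a rough sketch of the mode choice described above (a standalone C++17 model, not GCC code; the helper names are hypothetical): lane counts of 16, 8 and 4 get a vector-of-booleans predicate mode, mirroring arm_mode_to_pred_mode from the earlier patch, while anything else, including the 2-lane v2di case, falls back to HI.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: map the number of lanes in a 128-bit MVE vector
// to the name of its vector-of-booleans predicate mode, if one exists.
std::optional<std::string> pred_mode_for_lanes(int nunits) {
  switch (nunits) {
  case 16: return "V16BI";
  case 8:  return "V8BI";
  case 4:  return "V4BI";
  default: return std::nullopt;  // e.g. 2 lanes (v2di): no VxBI mode
  }
}

// Hypothetical helper: mode name to use for the predicate operand.
// Falls back to HI when there is no VxBI mode, as for v2di.
std::string pred_operand_mode(int nunits) {
  return pred_mode_for_lanes(nunits).value_or("HI");
}
```

The empty-optional return plays the role of the empty opt_machine_mode in the real function; the caller decides what the fallback is.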

Ok.
Thanks,
Kyrill

> 
> 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
> 
> 	gcc/
> 	PR target/100757
> 	PR target/101325
> 	* config/arm/arm-builtins.c (STRSBS_P_QUALIFIERS): Use predicate
> 	qualifier.
> 	(STRSBU_P_QUALIFIERS): Likewise.
> 	(LDRGBS_Z_QUALIFIERS): Likewise.
> 	(LDRGBU_Z_QUALIFIERS): Likewise.
> 	(LDRGBWBXU_Z_QUALIFIERS): Likewise.
> 	(LDRGBWBS_Z_QUALIFIERS): Likewise.
> 	(LDRGBWBU_Z_QUALIFIERS): Likewise.
> 	(STRSBWBS_P_QUALIFIERS): Likewise.
> 	(STRSBWBU_P_QUALIFIERS): Likewise.
> 	* config/arm/mve.md: Use VxBI instead of HI.
> 
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 0b063b5f037..73678a00398 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -689,13 +689,13 @@ arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum arm_type_qualifiers
>  arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_void, qualifier_unsigned, qualifier_immediate,
> -      qualifier_none, qualifier_unsigned};
> +      qualifier_none, qualifier_predicate};
>  #define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
> 
>  static enum arm_type_qualifiers
>  arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_void, qualifier_unsigned, qualifier_immediate,
> -      qualifier_unsigned, qualifier_unsigned};
> +      qualifier_unsigned, qualifier_predicate};
>  #define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
> 
>  static enum arm_type_qualifiers
> @@ -731,13 +731,13 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum arm_type_qualifiers
>  arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_none, qualifier_unsigned, qualifier_immediate,
> -      qualifier_unsigned};
> +      qualifier_predicate};
>  #define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
> 
>  static enum arm_type_qualifiers
>  arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
> -      qualifier_unsigned};
> +      qualifier_predicate};
>  #define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
> 
>  static enum arm_type_qualifiers
> @@ -777,7 +777,7 @@ arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum arm_type_qualifiers
>  arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
> -      qualifier_unsigned};
> +      qualifier_predicate};
>  #define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
> 
>  static enum arm_type_qualifiers
> @@ -793,13 +793,13 @@ arm_ldrgbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum arm_type_qualifiers
>  arm_ldrgbwbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_none, qualifier_unsigned, qualifier_immediate,
> -      qualifier_unsigned};
> +      qualifier_predicate};
>  #define LDRGBWBS_Z_QUALIFIERS (arm_ldrgbwbs_z_qualifiers)
> 
>  static enum arm_type_qualifiers
>  arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
> -      qualifier_unsigned};
> +      qualifier_predicate};
>  #define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers)
> 
>  static enum arm_type_qualifiers
> @@ -815,13 +815,13 @@ arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum arm_type_qualifiers
>  arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
> -      qualifier_none, qualifier_unsigned};
> +      qualifier_none, qualifier_predicate};
>  #define STRSBWBS_P_QUALIFIERS (arm_strsbwbs_p_qualifiers)
> 
>  static enum arm_type_qualifiers
>  arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
> -      qualifier_unsigned, qualifier_unsigned};
> +      qualifier_unsigned, qualifier_predicate};
>  #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
> 
>  static enum arm_type_qualifiers
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index a8087815c22..9633b7187f6 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -7282,7 +7282,7 @@ (define_insn "mve_vstrwq_scatter_base_p_<supf>v4si"
>  		[(match_operand:V4SI 0 "s_register_operand" "w")
>  		 (match_operand:SI 1 "immediate_operand" "i")
>  		 (match_operand:V4SI 2 "s_register_operand" "w")
> -		 (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		 (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	 VSTRWSBQ))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -7371,7 +7371,7 @@ (define_insn "mve_vldrwq_gather_base_z_<supf>v4si"
>    [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
>  	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
>  		      (match_operand:SI 2 "immediate_operand" "i")
> -		      (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	 VLDRWGBQ))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -7609,7 +7609,7 @@ (define_insn "mve_vldrwq_<supf>v4si"
>  (define_insn "mve_vldrwq_z_fv4sf"
>    [(set (match_operand:V4SF 0 "s_register_operand" "=w")
>  	(unspec:V4SF [(match_operand:V4SI 1 "mve_memory_operand" "Ux")
> -	(match_operand:HI 2 "vpr_register_operand" "Up")]
> +	(match_operand:V4BI 2 "vpr_register_operand" "Up")]
>  	 VLDRWQ_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -7629,7 +7629,7 @@ (define_insn "mve_vldrwq_z_fv4sf"
>  (define_insn "mve_vldrwq_z_<supf>v4si"
>    [(set (match_operand:V4SI 0 "s_register_operand" "=w")
>  	(unspec:V4SI [(match_operand:V4SI 1 "mve_memory_operand" "Ux")
> -	(match_operand:HI 2 "vpr_register_operand" "Up")]
> +	(match_operand:V4BI 2 "vpr_register_operand" "Up")]
>  	 VLDRWQ))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -7813,7 +7813,7 @@ (define_insn "mve_vldrhq_gather_offset_z_fv8hf"
>    [(set (match_operand:V8HF 0 "s_register_operand" "=&w")
>  	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")
>  		      (match_operand:V8HI 2 "s_register_operand" "w")
> -		      (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		      (match_operand:V8BI 3 "vpr_register_operand" "Up")]
>  	 VLDRHQGO_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -7855,7 +7855,7 @@ (define_insn "mve_vldrhq_gather_shifted_offset_z_fv8hf"
>    [(set (match_operand:V8HF 0 "s_register_operand" "=&w")
>  	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")
>  		      (match_operand:V8HI 2 "s_register_operand" "w")
> -		      (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		      (match_operand:V8BI 3 "vpr_register_operand" "Up")]
>  	 VLDRHQGSO_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -7897,7 +7897,7 @@ (define_insn "mve_vldrwq_gather_base_z_fv4sf"
>    [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
>  	(unspec:V4SF [(match_operand:V4SI 1 "s_register_operand" "w")
>  		      (match_operand:SI 2 "immediate_operand" "i")
> -		      (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	 VLDRWQGB_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -7958,7 +7958,7 @@ (define_insn "mve_vldrwq_gather_offset_z_fv4sf"
>    [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
>  	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")
>  		      (match_operand:V4SI 2 "s_register_operand" "w")
> -		      (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	 VLDRWQGO_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -7980,7 +7980,7 @@ (define_insn "mve_vldrwq_gather_offset_z_<supf>v4si"
>    [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
>  	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")
>  		      (match_operand:V4SI 2 "s_register_operand" "w")
> -		      (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	 VLDRWGOQ))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -8042,7 +8042,7 @@ (define_insn "mve_vldrwq_gather_shifted_offset_z_fv4sf"
>    [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
>  	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")
>  		      (match_operand:V4SI 2 "s_register_operand" "w")
> -		      (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	 VLDRWQGSO_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -8064,7 +8064,7 @@ (define_insn "mve_vldrwq_gather_shifted_offset_z_<supf>v4si"
>    [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
>  	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")
>  		      (match_operand:V4SI 2 "s_register_operand" "w")
> -		      (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		      (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	 VLDRWGSOQ))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -8104,7 +8104,7 @@ (define_insn "mve_vstrhq_fv8hf"
>  (define_insn "mve_vstrhq_p_fv8hf"
>    [(set (match_operand:V8HI 0 "mve_memory_operand" "=Ux")
>  	(unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w")
> -		      (match_operand:HI 2 "vpr_register_operand" "Up")]
> +		      (match_operand:V8BI 2 "vpr_register_operand" "Up")]
>  	 VSTRHQ_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -8323,7 +8323,7 @@ (define_insn "mve_vstrwq_p_fv4sf"
>  (define_insn "mve_vstrwq_p_<supf>v4si"
>    [(set (match_operand:V4SI 0 "mve_memory_operand" "=Ux")
>  	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
> -		      (match_operand:HI 2 "vpr_register_operand" "Up")]
> +		      (match_operand:V4BI 2 "vpr_register_operand" "Up")]
>  	 VSTRWQ))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -8576,7 +8576,7 @@ (define_expand "mve_vstrhq_scatter_offset_p_fv8hf"
>    [(match_operand:V8HI 0 "mve_scatter_memory")
>     (match_operand:V8HI 1 "s_register_operand")
>     (match_operand:V8HF 2 "s_register_operand")
> -   (match_operand:HI 3 "vpr_register_operand")
> +   (match_operand:V8BI 3 "vpr_register_operand")
>     (unspec:V4SI [(const_int 0)] VSTRHQSO_F)]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>  {
> @@ -8594,7 +8594,7 @@ (define_insn "mve_vstrhq_scatter_offset_p_fv8hf_insn"
>  	  [(match_operand:SI 0 "register_operand" "r")
>  	   (match_operand:V8HI 1 "s_register_operand" "w")
>  	   (match_operand:V8HF 2 "s_register_operand" "w")
> -	   (match_operand:HI 3 "vpr_register_operand" "Up")]
> +	   (match_operand:V8BI 3 "vpr_register_operand" "Up")]
>  	  VSTRHQSO_F))]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>    "vpst\;vstrht.16\t%q2, [%0, %q1]"
> @@ -8635,7 +8635,7 @@ (define_expand "mve_vstrhq_scatter_shifted_offset_p_fv8hf"
>    [(match_operand:V8HI 0 "memory_operand" "=Us")
>     (match_operand:V8HI 1 "s_register_operand" "w")
>     (match_operand:V8HF 2 "s_register_operand" "w")
> -   (match_operand:HI 3 "vpr_register_operand" "Up")
> +   (match_operand:V8BI 3 "vpr_register_operand" "Up")
>     (unspec:V4SI [(const_int 0)] VSTRHQSSO_F)]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>  {
> @@ -8654,7 +8654,7 @@ (define_insn "mve_vstrhq_scatter_shifted_offset_p_fv8hf_insn"
>  	  [(match_operand:SI 0 "register_operand" "r")
>  	   (match_operand:V8HI 1 "s_register_operand" "w")
>  	   (match_operand:V8HF 2 "s_register_operand" "w")
> -	   (match_operand:HI 3 "vpr_register_operand" "Up")]
> +	   (match_operand:V8BI 3 "vpr_register_operand" "Up")]
>  	  VSTRHQSSO_F))]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>    "vpst\;vstrht.16\t%q2, [%0, %q1, uxtw #1]"
> @@ -8691,7 +8691,7 @@ (define_insn "mve_vstrwq_scatter_base_p_fv4sf"
>  		[(match_operand:V4SI 0 "s_register_operand" "w")
>  		 (match_operand:SI 1 "immediate_operand" "i")
>  		 (match_operand:V4SF 2 "s_register_operand" "w")
> -		 (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		 (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	 VSTRWQSB_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -8740,7 +8740,7 @@ (define_expand "mve_vstrwq_scatter_offset_p_fv4sf"
>    [(match_operand:V4SI 0 "mve_scatter_memory")
>     (match_operand:V4SI 1 "s_register_operand")
>     (match_operand:V4SF 2 "s_register_operand")
> -   (match_operand:HI 3 "vpr_register_operand")
> +   (match_operand:V4BI 3 "vpr_register_operand")
>     (unspec:V4SI [(const_int 0)] VSTRWQSO_F)]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>  {
> @@ -8758,7 +8758,7 @@ (define_insn "mve_vstrwq_scatter_offset_p_fv4sf_insn"
>  	  [(match_operand:SI 0 "register_operand" "r")
>  	   (match_operand:V4SI 1 "s_register_operand" "w")
>  	   (match_operand:V4SF 2 "s_register_operand" "w")
> -	   (match_operand:HI 3 "vpr_register_operand" "Up")]
> +	   (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	  VSTRWQSO_F))]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>    "vpst\;vstrwt.32\t%q2, [%0, %q1]"
> @@ -8771,7 +8771,7 @@ (define_expand "mve_vstrwq_scatter_offset_p_<supf>v4si"
>    [(match_operand:V4SI 0 "mve_scatter_memory")
>     (match_operand:V4SI 1 "s_register_operand")
>     (match_operand:V4SI 2 "s_register_operand")
> -   (match_operand:HI 3 "vpr_register_operand")
> +   (match_operand:V4BI 3 "vpr_register_operand")
>     (unspec:V4SI [(const_int 0)] VSTRWSOQ)]
>    "TARGET_HAVE_MVE"
>  {
> @@ -8789,7 +8789,7 @@ (define_insn "mve_vstrwq_scatter_offset_p_<supf>v4si_insn"
>  	  [(match_operand:SI 0 "register_operand" "r")
>  	   (match_operand:V4SI 1 "s_register_operand" "w")
>  	   (match_operand:V4SI 2 "s_register_operand" "w")
> -	   (match_operand:HI 3 "vpr_register_operand" "Up")]
> +	   (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	  VSTRWSOQ))]
>    "TARGET_HAVE_MVE"
>    "vpst\;vstrwt.32\t%q2, [%0, %q1]"
> @@ -8858,7 +8858,7 @@ (define_expand "mve_vstrwq_scatter_shifted_offset_p_fv4sf"
>    [(match_operand:V4SI 0 "mve_scatter_memory")
>     (match_operand:V4SI 1 "s_register_operand")
>     (match_operand:V4SF 2 "s_register_operand")
> -   (match_operand:HI 3 "vpr_register_operand")
> +   (match_operand:V4BI 3 "vpr_register_operand")
>     (unspec:V4SI [(const_int 0)] VSTRWQSSO_F)]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>  {
> @@ -8877,7 +8877,7 @@ (define_insn "mve_vstrwq_scatter_shifted_offset_p_fv4sf_insn"
>  	  [(match_operand:SI 0 "register_operand" "r")
>  	   (match_operand:V4SI 1 "s_register_operand" "w")
>  	   (match_operand:V4SF 2 "s_register_operand" "w")
> -	   (match_operand:HI 3 "vpr_register_operand" "Up")]
> +	   (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	  VSTRWQSSO_F))]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>    "vpst\;vstrwt.32\t%q2, [%0, %q1, uxtw #2]"
> @@ -8890,7 +8890,7 @@ (define_expand "mve_vstrwq_scatter_shifted_offset_p_<supf>v4si"
>    [(match_operand:V4SI 0 "mve_scatter_memory")
>     (match_operand:V4SI 1 "s_register_operand")
>     (match_operand:V4SI 2 "s_register_operand")
> -   (match_operand:HI 3 "vpr_register_operand")
> +   (match_operand:V4BI 3 "vpr_register_operand")
>     (unspec:V4SI [(const_int 0)] VSTRWSSOQ)]
>    "TARGET_HAVE_MVE"
>  {
> @@ -8909,7 +8909,7 @@ (define_insn "mve_vstrwq_scatter_shifted_offset_p_<supf>v4si_insn"
>  	  [(match_operand:SI 0 "register_operand" "r")
>  	   (match_operand:V4SI 1 "s_register_operand" "w")
>  	   (match_operand:V4SI 2 "s_register_operand" "w")
> -	   (match_operand:HI 3 "vpr_register_operand" "Up")]
> +	   (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	  VSTRWSSOQ))]
>    "TARGET_HAVE_MVE"
>    "vpst\;vstrwt.32\t%q2, [%0, %q1, uxtw #2]"
> @@ -9376,7 +9376,7 @@ (define_insn "mve_vstrwq_scatter_base_wb_p_<supf>v4si"
>  		[(match_operand:V4SI 1 "s_register_operand" "0")
>  		 (match_operand:SI 2 "mve_vldrd_immediate" "Ri")
>  		 (match_operand:V4SI 3 "s_register_operand" "w")
> -		 (match_operand:HI 4 "vpr_register_operand")]
> +		 (match_operand:V4BI 4 "vpr_register_operand")]
>  	VSTRWSBWBQ))
>     (set (match_operand:V4SI 0 "s_register_operand" "=w")
>  	(unspec:V4SI [(match_dup 1) (match_dup 2)]
> @@ -9427,7 +9427,7 @@ (define_insn
> "mve_vstrwq_scatter_base_wb_p_fv4sf"
>  		[(match_operand:V4SI 1 "s_register_operand" "0")
>  		 (match_operand:SI 2 "mve_vldrd_immediate" "Ri")
>  		 (match_operand:V4SF 3 "s_register_operand" "w")
> -		 (match_operand:HI 4 "vpr_register_operand")]
> +		 (match_operand:V4BI 4 "vpr_register_operand")]
>  	VSTRWQSBWB_F))
>     (set (match_operand:V4SI 0 "s_register_operand" "=w")
>  	(unspec:V4SI [(match_dup 1) (match_dup 2)]
> @@ -9551,7 +9551,7 @@ (define_expand
> "mve_vldrwq_gather_base_wb_z_<supf>v4si"
>    [(match_operand:V4SI 0 "s_register_operand")
>     (match_operand:V4SI 1 "s_register_operand")
>     (match_operand:SI 2 "mve_vldrd_immediate")
> -   (match_operand:HI 3 "vpr_register_operand")
> +   (match_operand:V4BI 3 "vpr_register_operand")
>     (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
>    "TARGET_HAVE_MVE"
>  {
> @@ -9566,7 +9566,7 @@ (define_expand
> "mve_vldrwq_gather_base_nowb_z_<supf>v4si"
>    [(match_operand:V4SI 0 "s_register_operand")
>     (match_operand:V4SI 1 "s_register_operand")
>     (match_operand:SI 2 "mve_vldrd_immediate")
> -   (match_operand:HI 3 "vpr_register_operand")
> +   (match_operand:V4BI 3 "vpr_register_operand")
>     (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
>    "TARGET_HAVE_MVE"
>  {
> @@ -9585,7 +9585,7 @@ (define_insn
> "mve_vldrwq_gather_base_wb_z_<supf>v4si_insn"
>    [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
>  	(unspec:V4SI [(match_operand:V4SI 2 "s_register_operand" "1")
>  		      (match_operand:SI 3 "mve_vldrd_immediate" "Ri")
> -		      (match_operand:HI 4 "vpr_register_operand" "Up")
> +		      (match_operand:V4BI 4 "vpr_register_operand" "Up")
>  		      (mem:BLK (scratch))]
>  	 VLDRWGBWBQ))
>     (set (match_operand:V4SI 1 "s_register_operand" "=&w")
> @@ -9659,7 +9659,7 @@ (define_expand
> "mve_vldrwq_gather_base_wb_z_fv4sf"
>    [(match_operand:V4SI 0 "s_register_operand")
>     (match_operand:V4SI 1 "s_register_operand")
>     (match_operand:SI 2 "mve_vldrd_immediate")
> -   (match_operand:HI 3 "vpr_register_operand")
> +   (match_operand:V4BI 3 "vpr_register_operand")
>     (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>  {
> @@ -9675,7 +9675,7 @@ (define_expand
> "mve_vldrwq_gather_base_nowb_z_fv4sf"
>    [(match_operand:V4SF 0 "s_register_operand")
>     (match_operand:V4SI 1 "s_register_operand")
>     (match_operand:SI 2 "mve_vldrd_immediate")
> -   (match_operand:HI 3 "vpr_register_operand")
> +   (match_operand:V4BI 3 "vpr_register_operand")
>     (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>  {
> @@ -9694,7 +9694,7 @@ (define_insn
> "mve_vldrwq_gather_base_wb_z_fv4sf_insn"
>    [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
>  	(unspec:V4SF [(match_operand:V4SI 2 "s_register_operand" "1")
>  		      (match_operand:SI 3 "mve_vldrd_immediate" "Ri")
> -		      (match_operand:HI 4 "vpr_register_operand" "Up")
> +		      (match_operand:V4BI 4 "vpr_register_operand" "Up")
>  		      (mem:BLK (scratch))]
>  	 VLDRWQGBWB_F))
>     (set (match_operand:V4SI 1 "s_register_operand" "=&w")
> --
> 2.25.1



* RE: [PATCH v3 13/15] arm: Convert more MVE/CDE builtins to predicate qualifiers
  2022-01-13 14:56 ` [PATCH v3 13/15] arm: Convert more MVE/CDE " Christophe Lyon
@ 2022-01-27 16:56   ` Kyrylo Tkachov
  0 siblings, 0 replies; 54+ messages in thread
From: Kyrylo Tkachov @ 2022-01-27 16:56 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches



> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: Thursday, January 13, 2022 2:56 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH v3 13/15] arm: Convert more MVE/CDE builtins to predicate
> qualifiers
> 
> This patch covers a few non-load/store builtins where we do not use
> the <mode> iterator and thus we cannot use <MVE_vpred>.
> 
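
An editorial sketch of the point quoted above, not code from the patch: a pattern that is not generated from the <mode> iterator has to name its predicate mode explicitly. The hypothetical helper below, sketch_pred_mode_name, hard-codes the lane-count to mode-name mapping this series uses (V16BI, V8BI, V4BI) and keeps HI for the two-lane v2di case, as the cover letter describes for patch 12. The function and its name are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GCC: name of the predicate mode that
   goes with a 128-bit MVE vector of NLANES lanes.  */
static const char *
sketch_pred_mode_name (unsigned nlanes)
{
  switch (nlanes)
    {
    case 16: return "V16BI";  /* e.g. V16QI */
    case 8:  return "V8BI";   /* e.g. V8HI, V8HF */
    case 4:  return "V4BI";   /* e.g. V4SI, V4SF */
    case 2:  return "HI";     /* V2DI: HImode is kept */
    default: return NULL;     /* not an MVE lane count */
    }
}
```

In the real machine description this choice is written directly in each define_insn, which is why the diff below replaces HI operand by operand.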

Ok.
Thanks,
Kyrill

> 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
> 
> 	gcc/
> 	PR target/100757
> 	PR target/101325
> 	* config/arm/arm-builtins.c (CX_UNARY_UNONE_QUALIFIERS): Use
> 	predicate.
> 	(CX_BINARY_UNONE_QUALIFIERS): Likewise.
> 	(CX_TERNARY_UNONE_QUALIFIERS): Likewise.
> 	(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
> 	(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
> 	(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete.
> 	* config/arm/arm_mve_builtins.def: Use predicated qualifiers.
> 	* config/arm/mve.md: Use VxBI instead of HI.
> 
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 73678a00398..f9437752a22 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -295,7 +295,7 @@ static enum arm_type_qualifiers
>  arm_cx_unary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_none, qualifier_immediate, qualifier_none,
>        qualifier_unsigned_immediate,
> -      qualifier_unsigned };
> +      qualifier_predicate };
>  #define CX_UNARY_UNONE_QUALIFIERS (arm_cx_unary_unone_qualifiers)
> 
>  /* T (immediate, T, T, unsigned immediate).  */
> @@ -304,7 +304,7 @@
> arm_cx_binary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_none, qualifier_immediate,
>        qualifier_none, qualifier_none,
>        qualifier_unsigned_immediate,
> -      qualifier_unsigned };
> +      qualifier_predicate };
>  #define CX_BINARY_UNONE_QUALIFIERS (arm_cx_binary_unone_qualifiers)
> 
>  /* T (immediate, T, T, T, unsigned immediate).  */
> @@ -313,7 +313,7 @@
> arm_cx_ternary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_none, qualifier_immediate,
>        qualifier_none, qualifier_none, qualifier_none,
>        qualifier_unsigned_immediate,
> -      qualifier_unsigned };
> +      qualifier_predicate };
>  #define CX_TERNARY_UNONE_QUALIFIERS
> (arm_cx_ternary_unone_qualifiers)
> 
>  /* The first argument (return type) of a store should be void type,
> @@ -509,12 +509,6 @@
> arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS \
>    (arm_ternop_none_none_none_imm_qualifiers)
> 
> -static enum arm_type_qualifiers
> -
> arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS
> ]
> -  = { qualifier_none, qualifier_none, qualifier_none, qualifier_unsigned };
> -#define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
> -  (arm_ternop_none_none_none_unone_qualifiers)
> -
>  static enum arm_type_qualifiers
>  arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
> @@ -567,13 +561,6 @@
> arm_quadop_unone_unone_none_none_pred_qualifiers[SIMD_MAX_BUILTI
> N_ARGS]
>  #define QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS \
>    (arm_quadop_unone_unone_none_none_pred_qualifiers)
> 
> -static enum arm_type_qualifiers
> -
> arm_quadop_none_none_none_none_unone_qualifiers[SIMD_MAX_BUILTI
> N_ARGS]
> -  = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
> -    qualifier_unsigned };
> -#define QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS \
> -  (arm_quadop_none_none_none_none_unone_qualifiers)
> -
>  static enum arm_type_qualifiers
> 
> arm_quadop_none_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_
> ARGS]
>    = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
> @@ -588,13 +575,6 @@
> arm_quadop_none_none_none_imm_pred_qualifiers[SIMD_MAX_BUILTIN_
> ARGS]
>  #define QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS \
>    (arm_quadop_none_none_none_imm_pred_qualifiers)
> 
> -static enum arm_type_qualifiers
> -
> arm_quadop_unone_unone_unone_unone_unone_qualifiers[SIMD_MAX_B
> UILTIN_ARGS]
> -  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
> -    qualifier_unsigned, qualifier_unsigned };
> -#define QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
> -  (arm_quadop_unone_unone_unone_unone_unone_qualifiers)
> -
>  static enum arm_type_qualifiers
> 
> arm_quadop_unone_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUI
> LTIN_ARGS]
>    = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
> diff --git a/gcc/config/arm/arm_mve_builtins.def
> b/gcc/config/arm/arm_mve_builtins.def
> index 7db6d47867e..1c8ee34f5cb 100644
> --- a/gcc/config/arm/arm_mve_builtins.def
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -87,8 +87,8 @@ VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u,
> v16qi, v8hi, v4si, v2di)
>  VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di)
>  VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
> -VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
> -VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
> +VAR1 (BINOP_NONE_NONE_PRED, vaddlvq_p_s, v4si)
> +VAR1 (BINOP_UNONE_UNONE_PRED, vaddlvq_p_u, v4si)
>  VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
> @@ -465,20 +465,20 @@ VAR2 (TERNOP_NONE_NONE_NONE_IMM,
> vqshrnbq_n_s, v8hi, v4si)
>  VAR2 (TERNOP_NONE_NONE_NONE_IMM, vqrshrntq_n_s, v8hi, v4si)
>  VAR2 (TERNOP_NONE_NONE_IMM_PRED, vorrq_m_n_s, v8hi, v4si)
>  VAR2 (TERNOP_NONE_NONE_IMM_PRED, vmvnq_m_n_s, v8hi, v4si)
> -VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrmlaldavhq_p_u, v4si)
> -VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrev16q_m_u, v16qi)
> -VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vaddlvaq_p_u, v4si)
> -VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrmlsldavhxq_p_s, v4si)
> -VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrmlsldavhq_p_s, v4si)
> -VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrmlaldavhxq_p_s, v4si)
> -VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrmlaldavhq_p_s, v4si)
> -VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrev32q_m_f, v8hf)
> -VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vrev16q_m_s, v16qi)
> -VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vcvttq_m_f32_f16, v4sf)
> -VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vcvttq_m_f16_f32, v8hf)
> -VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vcvtbq_m_f32_f16, v4sf)
> -VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vcvtbq_m_f16_f32, v8hf)
> -VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vaddlvaq_p_s, v4si)
> +VAR1 (TERNOP_UNONE_UNONE_UNONE_PRED, vrmlaldavhq_p_u, v4si)
> +VAR1 (TERNOP_UNONE_UNONE_UNONE_PRED, vrev16q_m_u, v16qi)
> +VAR1 (TERNOP_UNONE_UNONE_UNONE_PRED, vaddlvaq_p_u, v4si)
> +VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrmlsldavhxq_p_s, v4si)
> +VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrmlsldavhq_p_s, v4si)
> +VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrmlaldavhxq_p_s, v4si)
> +VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrmlaldavhq_p_s, v4si)
> +VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrev32q_m_f, v8hf)
> +VAR1 (TERNOP_NONE_NONE_NONE_PRED, vrev16q_m_s, v16qi)
> +VAR1 (TERNOP_NONE_NONE_NONE_PRED, vcvttq_m_f32_f16, v4sf)
> +VAR1 (TERNOP_NONE_NONE_NONE_PRED, vcvttq_m_f16_f32, v8hf)
> +VAR1 (TERNOP_NONE_NONE_NONE_PRED, vcvtbq_m_f32_f16, v4sf)
> +VAR1 (TERNOP_NONE_NONE_NONE_PRED, vcvtbq_m_f16_f32, v8hf)
> +VAR1 (TERNOP_NONE_NONE_NONE_PRED, vaddlvaq_p_s, v4si)
>  VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlsldavhaxq_s, v4si)
>  VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlsldavhaq_s, v4si)
>  VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlaldavhaxq_s, v4si)
> @@ -629,11 +629,11 @@ VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED,
> vqshrntq_m_n_s, v8hi, v4si)
>  VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqshrnbq_m_n_s, v8hi,
> v4si)
>  VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqrshrntq_m_n_s, v8hi,
> v4si)
>  VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vqrshrnbq_m_n_s, v8hi,
> v4si)
> -VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE,
> vrmlaldavhaq_p_u, v4si)
> -VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlsldavhaxq_p_s,
> v4si)
> -VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlsldavhaq_p_s,
> v4si)
> -VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlaldavhaxq_p_s,
> v4si)
> -VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vrmlaldavhaq_p_s,
> v4si)
> +VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED,
> vrmlaldavhaq_p_u, v4si)
> +VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vrmlsldavhaxq_p_s,
> v4si)
> +VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vrmlsldavhaq_p_s, v4si)
> +VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vrmlaldavhaxq_p_s,
> v4si)
> +VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vrmlaldavhaq_p_s, v4si)
>  VAR2 (QUADOP_UNONE_UNONE_NONE_IMM_PRED, vcvtq_m_n_from_f_u,
> v8hi, v4si)
>  VAR2 (QUADOP_NONE_NONE_NONE_IMM_PRED, vcvtq_m_n_from_f_s,
> v8hi, v4si)
>  VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbrsrq_m_n_f, v8hf,
> v4sf)
> @@ -845,14 +845,14 @@ VAR1 (BINOP_NONE_NONE_NONE, vsbciq_s, v4si)
>  VAR1 (BINOP_UNONE_UNONE_UNONE, vsbciq_u, v4si)
>  VAR1 (BINOP_NONE_NONE_NONE, vsbcq_s, v4si)
>  VAR1 (BINOP_UNONE_UNONE_UNONE, vsbcq_u, v4si)
> -VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vadciq_m_s, v4si)
> -VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vadciq_m_u,
> v4si)
> -VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vadcq_m_s, v4si)
> -VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vadcq_m_u,
> v4si)
> -VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsbciq_m_s, v4si)
> -VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbciq_m_u,
> v4si)
> -VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsbcq_m_s, v4si)
> -VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbcq_m_u,
> v4si)
> +VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vadciq_m_s, v4si)
> +VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vadciq_m_u, v4si)
> +VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vadcq_m_s, v4si)
> +VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vadcq_m_u, v4si)
> +VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vsbciq_m_s, v4si)
> +VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vsbciq_m_u, v4si)
> +VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vsbcq_m_s, v4si)
> +VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vsbcq_m_u, v4si)
>  VAR5 (STORE1, vst2q, v16qi, v8hi, v4si, v8hf, v4sf)
>  VAR5 (LOAD1, vld4q, v16qi, v8hi, v4si, v8hf, v4sf)
>  VAR5 (LOAD1, vld2q, v16qi, v8hi, v4si, v8hf, v4sf)
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 9633b7187f6..41e85b1a278 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -826,7 +826,7 @@ (define_insn "mve_vaddlvq_p_<supf>v4si"
>    [
>     (set (match_operand:DI 0 "s_register_operand" "=r")
>  	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
> -		    (match_operand:HI 2 "vpr_register_operand" "Up")]
> +		    (match_operand:V4BI 2 "vpr_register_operand" "Up")]
>  	 VADDLVQ_P))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -3739,7 +3739,7 @@ (define_insn "mve_vaddlvaq_p_<supf>v4si"
>     (set (match_operand:DI 0 "s_register_operand" "=r")
>  	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
>  		       (match_operand:V4SI 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	 VADDLVAQ_P))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -3949,7 +3949,7 @@ (define_insn "mve_vcvtbq_m_f16_f32v8hf"
>     (set (match_operand:V8HF 0 "s_register_operand" "=w")
>  	(unspec:V8HF [(match_operand:V8HF 1 "s_register_operand" "0")
>  		       (match_operand:V4SF 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 3
> "vpr_register_operand" "Up")]
>  	 VCVTBQ_M_F16_F32))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -3965,7 +3965,7 @@ (define_insn "mve_vcvtbq_m_f32_f16v4sf"
>     (set (match_operand:V4SF 0 "s_register_operand" "=w")
>  	(unspec:V4SF [(match_operand:V4SF 1 "s_register_operand" "0")
>  		       (match_operand:V8HF 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 3
> "vpr_register_operand" "Up")]
>  	 VCVTBQ_M_F32_F16))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -3981,7 +3981,7 @@ (define_insn "mve_vcvttq_m_f16_f32v8hf"
>     (set (match_operand:V8HF 0 "s_register_operand" "=w")
>  	(unspec:V8HF [(match_operand:V8HF 1 "s_register_operand" "0")
>  		       (match_operand:V4SF 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 3
> "vpr_register_operand" "Up")]
>  	 VCVTTQ_M_F16_F32))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -3997,7 +3997,7 @@ (define_insn "mve_vcvttq_m_f32_f16v4sf"
>     (set (match_operand:V4SF 0 "s_register_operand" "=w")
>  	(unspec:V4SF [(match_operand:V4SF 1 "s_register_operand" "0")
>  		       (match_operand:V8HF 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 3
> "vpr_register_operand" "Up")]
>  	 VCVTTQ_M_F32_F16))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -4595,7 +4595,7 @@ (define_insn "mve_vrev32q_m_fv8hf"
>     (set (match_operand:V8HF 0 "s_register_operand" "=w")
>  	(unspec:V8HF [(match_operand:V8HF 1 "s_register_operand" "0")
>  		       (match_operand:V8HF 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 3
> "vpr_register_operand" "Up")]
>  	 VREV32Q_M_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -4659,7 +4659,7 @@ (define_insn "mve_vrmlaldavhxq_p_sv4si"
>     (set (match_operand:DI 0 "s_register_operand" "=r")
>  	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
>  		       (match_operand:V4SI 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 3
> "vpr_register_operand" "Up")]
>  	 VRMLALDAVHXQ_P_S))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -4691,7 +4691,7 @@ (define_insn "mve_vrmlsldavhq_p_sv4si"
>     (set (match_operand:DI 0 "s_register_operand" "=r")
>  	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
>  		       (match_operand:V4SI 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 3
> "vpr_register_operand" "Up")]
>  	 VRMLSLDAVHQ_P_S))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -4707,7 +4707,7 @@ (define_insn "mve_vrmlsldavhxq_p_sv4si"
>     (set (match_operand:DI 0 "s_register_operand" "=r")
>  	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
>  		       (match_operand:V4SI 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 3
> "vpr_register_operand" "Up")]
>  	 VRMLSLDAVHXQ_P_S))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -4932,7 +4932,7 @@ (define_insn "mve_vrev16q_m_<supf>v16qi"
>     (set (match_operand:V16QI 0 "s_register_operand" "=w")
>  	(unspec:V16QI [(match_operand:V16QI 1 "s_register_operand" "0")
>  		       (match_operand:V16QI 2 "s_register_operand" "w")
> -		       (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		       (match_operand:V16BI 3 "vpr_register_operand" "Up")]
>  	 VREV16Q_M))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -4964,7 +4964,7 @@ (define_insn "mve_vrmlaldavhq_p_<supf>v4si"
>     (set (match_operand:DI 0 "s_register_operand" "=r")
>  	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
>  		    (match_operand:V4SI 2 "s_register_operand" "w")
> -		    (match_operand:HI 3 "vpr_register_operand" "Up")]
> +		    (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>  	 VRMLALDAVHQ_P))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -6233,7 +6233,7 @@ (define_insn "mve_vrmlaldavhaq_p_sv4si"
>  	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
>  		       (match_operand:V4SI 2 "s_register_operand" "w")
>  		       (match_operand:V4SI 3 "s_register_operand" "w")
> -		       (match_operand:HI 4 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
>  	 VRMLALDAVHAQ_P_S))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -6556,7 +6556,7 @@ (define_insn "mve_vrmlaldavhaq_p_uv4si"
>  	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
>  		       (match_operand:V4SI 2 "s_register_operand" "w")
>  		       (match_operand:V4SI 3 "s_register_operand" "w")
> -		       (match_operand:HI 4 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
>  	 VRMLALDAVHAQ_P_U))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -6573,7 +6573,7 @@ (define_insn "mve_vrmlaldavhaxq_p_sv4si"
>  	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
>  		       (match_operand:V4SI 2 "s_register_operand" "w")
>  		       (match_operand:V4SI 3 "s_register_operand" "w")
> -		       (match_operand:HI 4 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
>  	 VRMLALDAVHAXQ_P_S))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -6590,7 +6590,7 @@ (define_insn "mve_vrmlsldavhaq_p_sv4si"
>  	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
>  		       (match_operand:V4SI 2 "s_register_operand" "w")
>  		       (match_operand:V4SI 3 "s_register_operand" "w")
> -		       (match_operand:HI 4 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
>  	 VRMLSLDAVHAQ_P_S))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -6607,7 +6607,7 @@ (define_insn "mve_vrmlsldavhaxq_p_sv4si"
>  	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
>  		       (match_operand:V4SI 2 "s_register_operand" "w")
>  		       (match_operand:V4SI 3 "s_register_operand" "w")
> -		       (match_operand:HI 4 "vpr_register_operand" "Up")]
> +		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
>  	 VRMLSLDAVHAXQ_P_S))
>    ]
>    "TARGET_HAVE_MVE"
> @@ -7528,7 +7528,7 @@ (define_insn "mve_vldrhq_<supf><mode>"
>  (define_insn "mve_vldrhq_z_fv8hf"
>    [(set (match_operand:V8HF 0 "s_register_operand" "=w")
>  	(unspec:V8HF [(match_operand:V8HI 1 "mve_memory_operand"
> "Ux")
> -	(match_operand:HI 2 "vpr_register_operand" "Up")]
> +	(match_operand:<MVE_VPRED> 2 "vpr_register_operand" "Up")]
>  	 VLDRHQ_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -8303,7 +8303,7 @@ (define_insn "mve_vstrwq_fv4sf"
>  (define_insn "mve_vstrwq_p_fv4sf"
>    [(set (match_operand:V4SI 0 "mve_memory_operand" "=Ux")
>  	(unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")
> -		      (match_operand:HI 2 "vpr_register_operand" "Up")]
> +		      (match_operand:<MVE_VPRED> 2
> "vpr_register_operand" "Up")]
>  	 VSTRWQ_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> @@ -9844,7 +9844,7 @@ (define_insn "mve_vadciq_m_<supf>v4si"
>  	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "0")
>  		      (match_operand:V4SI 2 "s_register_operand" "w")
>  		      (match_operand:V4SI 3 "s_register_operand" "w")
> -		      (match_operand:HI 4 "vpr_register_operand" "Up")]
> +		      (match_operand:V4BI 4 "vpr_register_operand" "Up")]
>  	 VADCIQ_M))
>     (set (reg:SI VFPCC_REGNUM)
>  	(unspec:SI [(const_int 0)]
> @@ -9880,7 +9880,7 @@ (define_insn "mve_vadcq_m_<supf>v4si"
>  	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "0")
>  		      (match_operand:V4SI 2 "s_register_operand" "w")
>  		      (match_operand:V4SI 3 "s_register_operand" "w")
> -		      (match_operand:HI 4 "vpr_register_operand" "Up")]
> +		      (match_operand:V4BI 4 "vpr_register_operand" "Up")]
>  	 VADCQ_M))
>     (set (reg:SI VFPCC_REGNUM)
>  	(unspec:SI [(reg:SI VFPCC_REGNUM)]
> @@ -9917,7 +9917,7 @@ (define_insn "mve_vsbciq_m_<supf>v4si"
>  	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
>  		      (match_operand:V4SI 2 "s_register_operand" "w")
>  		      (match_operand:V4SI 3 "s_register_operand" "w")
> -		      (match_operand:HI 4 "vpr_register_operand" "Up")]
> +		      (match_operand:V4BI 4 "vpr_register_operand" "Up")]
>  	 VSBCIQ_M))
>     (set (reg:SI VFPCC_REGNUM)
>  	(unspec:SI [(const_int 0)]
> @@ -9953,7 +9953,7 @@ (define_insn "mve_vsbcq_m_<supf>v4si"
>  	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
>  		      (match_operand:V4SI 2 "s_register_operand" "w")
>  		      (match_operand:V4SI 3 "s_register_operand" "w")
> -		      (match_operand:HI 4 "vpr_register_operand" "Up")]
> +		      (match_operand:V4BI 4 "vpr_register_operand" "Up")]
>  	 VSBCQ_M))
>     (set (reg:SI VFPCC_REGNUM)
>  	(unspec:SI [(reg:SI VFPCC_REGNUM)]
> @@ -10457,7 +10457,7 @@ (define_insn "arm_vcx1q<a>_p_v16qi"
>  	(unspec:V16QI [(match_operand:SI 1 "const_int_coproc_operand"
> "i")
>  			   (match_operand:V16QI 2 "register_operand" "0")
>  			   (match_operand:SI 3
> "const_int_mve_cde1_operand" "i")
> -			   (match_operand:HI 4 "vpr_register_operand"
> "Up")]
> +			   (match_operand:V16BI 4 "vpr_register_operand"
> "Up")]
>  	 CDE_VCX))]
>    "TARGET_CDE && TARGET_HAVE_MVE"
>    "vpst\;vcx1<a>t\\tp%c1, %q0, #%c3"
> @@ -10471,7 +10471,7 @@ (define_insn "arm_vcx2q<a>_p_v16qi"
>  			  (match_operand:V16QI 2 "register_operand" "0")
>  			  (match_operand:V16QI 3 "register_operand" "t")
>  			  (match_operand:SI 4
> "const_int_mve_cde2_operand" "i")
> -			  (match_operand:HI 5 "vpr_register_operand"
> "Up")]
> +			  (match_operand:V16BI 5 "vpr_register_operand"
> "Up")]
>  	 CDE_VCX))]
>    "TARGET_CDE && TARGET_HAVE_MVE"
>    "vpst\;vcx2<a>t\\tp%c1, %q0, %q3, #%c4"
> @@ -10486,7 +10486,7 @@ (define_insn "arm_vcx3q<a>_p_v16qi"
>  			  (match_operand:V16QI 3 "register_operand" "t")
>  			  (match_operand:V16QI 4 "register_operand" "t")
>  			  (match_operand:SI 5
> "const_int_mve_cde3_operand" "i")
> -			  (match_operand:HI 6 "vpr_register_operand"
> "Up")]
> +			  (match_operand:V16BI 6 "vpr_register_operand"
> "Up")]
>  	 CDE_VCX))]
>    "TARGET_CDE && TARGET_HAVE_MVE"
>    "vpst\;vcx3<a>t\\tp%c1, %q0, %q3, %q4, #%c5"
> --
> 2.25.1



* Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-01-27 16:28   ` Kyrylo Tkachov
@ 2022-01-27 18:10     ` Christophe Lyon
  0 siblings, 0 replies; 54+ messages in thread
From: Christophe Lyon @ 2022-01-27 18:10 UTC (permalink / raw)
  To: Kyrylo Tkachov; +Cc: Christophe Lyon, Richard Sandiford, gcc-patches

On Thu, Jan 27, 2022 at 5:29 PM Kyrylo Tkachov via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi Christophe,
>
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-
> > bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Christophe
> > Lyon via Gcc-patches
> > Sent: Thursday, January 13, 2022 2:56 PM
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of
> > booleans
> >
> > This patch implements support for vectors of booleans to support MVE
> > predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
> > uint16_t) to represent predicates in intrinsics prototypes, we
> > introduce a new "predicate" type qualifier so that we can map relevant
> > builtins HImode arguments and return value to the appropriate vector
> > of booleans (VxBI).
> >
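
A minimal editorial sketch of the qualifier idea described above, under stated assumptions: qualifiers are bit flags, and a set predicate flag forces the builtin's prototype to use HImode even though the instruction pattern takes a vector of booleans. Only the 0x4000 value is taken from the patch; the enum names, the sketch_mode type and sketch_prototype_mode are hypothetical stand-ins, not GCC's own declarations.

```c
/* Simplified stand-ins for the qualifier flags.  Only the value of the
   predicate flag (0x4000) comes from the patch.  */
enum sketch_qualifiers
{
  SKETCH_QUALIFIER_NONE = 0x0,
  SKETCH_QUALIFIER_UNSIGNED = 0x1,
  SKETCH_QUALIFIER_PREDICATE = 0x4000
};

/* Hypothetical, tiny subset of machine modes.  */
enum sketch_mode { SKETCH_HI, SKETCH_V4SI, SKETCH_V4BI };

/* Mode used in the builtin's C prototype for an operand whose insn
   pattern expects INSN_MODE.  Predicates are always exposed as HImode
   because the ABI type pred16_t is a 16-bit unsigned integer.  */
static enum sketch_mode
sketch_prototype_mode (int qualifiers, enum sketch_mode insn_mode)
{
  if (qualifiers & SKETCH_QUALIFIER_PREDICATE)
    return SKETCH_HI;
  return insn_mode;
}
```

At expansion time the conversion goes the other way: the HImode value supplied by the caller is reinterpreted as the VxBI operand the pattern expects, which the patch does with gen_lowpart.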
> > We have to update test_vector_ops_duplicate, because it iterates using
> > an offset in bytes, where we would need to iterate in bits: we stop
> > iterating when we reach the end of the vector of booleans.
> >
> > In addition, we have to fix the underlying definition of vectors of
> > booleans because ARM/MVE needs a different representation than
> > AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
> > element size, so that a true element of V4BI is represented by
> > '0b1111'.  This patch updates the aarch64 definition of VNx*BI as
> > needed.
> >
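
To make the representation described above concrete, here is a small editorial sketch, not code from the patch. It assumes a 16-bit predicate with one bit per byte of a 128-bit vector, so each lane owns 16 / nlanes bits and a true lane sets all of them. The helper name sketch_expand_pred and its interface are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: LANES_TRUE holds one bit per lane (bit i set means
   lane i is true).  Return the 16-bit predicate in which each true lane
   has its bit duplicated over the lane's share of the 16 bits.  */
static uint16_t
sketch_expand_pred (unsigned lanes_true, unsigned nlanes)
{
  assert (nlanes == 4 || nlanes == 8 || nlanes == 16);

  unsigned bits_per_lane = 16 / nlanes;	/* 4, 2 or 1 */
  uint16_t lane_ones = (uint16_t) ((1u << bits_per_lane) - 1);
  uint16_t pred = 0;

  for (unsigned i = 0; i < nlanes; i++)
    if (lanes_true & (1u << i))
      pred |= (uint16_t) (lane_ones << (i * bits_per_lane));

  return pred;
}
```

With four lanes, a single true lane 0 gives 0x000f, matching the '0b1111' in the description; lanes 0 and 2 together give 0x0f0f.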
> > 2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
> >       Richard Sandiford  <richard.sandiford@arm.com>
> >
> >       gcc/
> >       PR target/100757
> >       PR target/101325
> >       * config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
> >       VNx2BI): Update definition.
> >       * config/arm/arm-builtins.c (arm_init_simd_builtin_types): Add new
> >       simd types.
> >       (arm_init_builtin): Map predicate vector arguments to HImode.
> >       (arm_expand_builtin_args): Move HImode predicate arguments to VxBI
> >       rtx. Move return value to HImode rtx.
> >       * config/arm/arm-builtins.h (arm_type_qualifiers): Add
> >       qualifier_predicate.
> >       * config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes.
> >       * config/arm/arm-simd-builtin-types.def (Pred1x16_t,
> >       Pred2x8_t, Pred4x4_t): New.
> >       * emit-rtl.c (init_emit_once): Handle all boolean modes.
> >       * genmodes.c (mode_data): Add boolean field.
> >       (blank_mode): Initialize it.
> >       (make_complex_modes): Fix handling of boolean modes.
> >       (make_vector_modes): Likewise.
> >       (VECTOR_BOOL_MODE): Use new COMPONENT parameter.
> >       (make_vector_bool_mode): Likewise.
> >       (BOOL_MODE): New.
> >       (make_bool_mode): New.
> >       (emit_insn_modes_h): Fix generation of boolean modes.
> >       (emit_class_narrowest_mode): Likewise.
> >       * machmode.def: Use new BOOL_MODE instead of FRACTIONAL_INT_MODE
> >       to define BImode.
> >       * rtx-vector-builder.c (rtx_vector_builder::find_cached_value):
> >       Fix handling of constm1_rtx for VECTOR_BOOL.
> >       * simplify-rtx.c (native_encode_rtx): Fix support for VECTOR_BOOL.
> >       (native_decode_vector_rtx): Likewise.
> >       (test_vector_ops_duplicate): Skip vec_merge test
> >       with vectors of booleans.
> >       * varasm.c (output_constant_pool_2): Likewise.
>
> The arm parts look ok. I guess Richard is best placed to approve the
> midend parts, but I see he's on the ChangeLog so maybe he needs others to
> review them. But then again Richard is maintainer of the gen* machinery
> that's the most complicated part of the patch so he can self-approve 😊
>

Thanks Kyrill,

Regarding the ARM part, Andre had a concern; I don't know whether my proposal
is OK for him.

Christophe


> Thanks,
> Kyrill
>
> >
> > diff --git a/gcc/config/aarch64/aarch64-modes.def
> > b/gcc/config/aarch64/aarch64-modes.def
> > index 976bf9b42be..8f399225a80 100644
> > --- a/gcc/config/aarch64/aarch64-modes.def
> > +++ b/gcc/config/aarch64/aarch64-modes.def
> > @@ -47,10 +47,10 @@ ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
> >
> >  /* Vector modes.  */
> >
> > -VECTOR_BOOL_MODE (VNx16BI, 16, 2);
> > -VECTOR_BOOL_MODE (VNx8BI, 8, 2);
> > -VECTOR_BOOL_MODE (VNx4BI, 4, 2);
> > -VECTOR_BOOL_MODE (VNx2BI, 2, 2);
> > +VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
> > +VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
> > +VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
> > +VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
> >
> >  ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
> >  ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
> > diff --git a/gcc/config/arm/arm-builtins.c
> b/gcc/config/arm/arm-builtins.c
> > index 9c645722230..2ccfa37c302 100644
> > --- a/gcc/config/arm/arm-builtins.c
> > +++ b/gcc/config/arm/arm-builtins.c
> > @@ -1548,6 +1548,13 @@ arm_init_simd_builtin_types (void)
> >    arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
> >    arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
> >
> > +  if (TARGET_HAVE_MVE)
> > +    {
> > +      arm_simd_types[Pred1x16_t].eltype = unsigned_intHI_type_node;
> > +      arm_simd_types[Pred2x8_t].eltype = unsigned_intHI_type_node;
> > +      arm_simd_types[Pred4x4_t].eltype = unsigned_intHI_type_node;
> > +    }
> > +
> >    for (i = 0; i < nelts; i++)
> >      {
> >        tree eltype = arm_simd_types[i].eltype;
> > @@ -1695,6 +1702,11 @@ arm_init_builtin (unsigned int fcode,
> > arm_builtin_datum *d,
> >        if (qualifiers & qualifier_map_mode)
> >       op_mode = d->mode;
> >
> > +      /* MVE Predicates use HImode as mandated by the ABI: pred16_t is
> > unsigned
> > +      short.  */
> > +      if (qualifiers & qualifier_predicate)
> > +     op_mode = HImode;
> > +
> >        /* For pointers, we want a pointer to the basic type
> >        of the vector.  */
> >        if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
> > @@ -2939,6 +2951,11 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
> >           case ARG_BUILTIN_COPY_TO_REG:
> >             if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
> >               op[argc] = convert_memory_address (Pmode, op[argc]);
> > +
> > +           /* MVE uses mve_pred16_t (aka HImode) for vectors of predicates.  */
> > +           if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
> > +             op[argc] = gen_lowpart (mode[argc], op[argc]);
> > +
> >             /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
> >             if (!(*insn_data[icode].operand[opno].predicate)
> >                 (op[argc], mode[argc]))
> > @@ -3144,6 +3161,13 @@ constant_arg:
> >    else
> >      emit_insn (insn);
> >
> > +  if (GET_MODE_CLASS (tmode) == MODE_VECTOR_BOOL)
> > +    {
> > +      rtx HItarget = gen_reg_rtx (HImode);
> > +      emit_move_insn (HItarget, gen_lowpart (HImode, target));
> > +      return HItarget;
> > +    }
> > +
> >    return target;
> >  }
> >
> > diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h
> > index e5130d6d286..a8ef8aef82d 100644
> > --- a/gcc/config/arm/arm-builtins.h
> > +++ b/gcc/config/arm/arm-builtins.h
> > @@ -84,7 +84,9 @@ enum arm_type_qualifiers
> >    qualifier_lane_pair_index = 0x1000,
> >    /* Lane indices selected in quadtuplets - must be within range of previous
> >       argument = a vector.  */
> > -  qualifier_lane_quadtup_index = 0x2000
> > +  qualifier_lane_quadtup_index = 0x2000,
> > +  /* MVE vector predicates.  */
> > +  qualifier_predicate = 0x4000
> >  };
> >
> >  struct arm_simd_type_info
> > diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
> > index de689c8b45e..9ed0cd042c5 100644
> > --- a/gcc/config/arm/arm-modes.def
> > +++ b/gcc/config/arm/arm-modes.def
> > @@ -84,6 +84,14 @@ VECTOR_MODE (FLOAT, BF, 2);   /*              V2BF.  */
> >  VECTOR_MODE (FLOAT, BF, 4);   /*              V4BF.  */
> >  VECTOR_MODE (FLOAT, BF, 8);   /*              V8BF.  */
> >
> > +/* Predicates for MVE.  */
> > +BOOL_MODE (B2I, 2, 1);
> > +BOOL_MODE (B4I, 4, 1);
> > +
> > +VECTOR_BOOL_MODE (V16BI, 16, BI, 2);
> > +VECTOR_BOOL_MODE (V8BI, 8, B2I, 2);
> > +VECTOR_BOOL_MODE (V4BI, 4, B4I, 2);
> > +
> >  /* Fraction and accumulator vector modes.  */
> >  VECTOR_MODES (FRACT, 4);      /* V4QQ  V2HQ */
> >  VECTOR_MODES (UFRACT, 4);     /* V4UQQ V2UHQ */
> > diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
> > index 6ba6f211531..920c2a68e4c 100644
> > --- a/gcc/config/arm/arm-simd-builtin-types.def
> > +++ b/gcc/config/arm/arm-simd-builtin-types.def
> > @@ -51,3 +51,7 @@
> >    ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
> >    ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
> >    ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
> > +
> > +  ENTRY (Pred1x16_t, V16BI, unsigned, 16, uint16, 21)
> > +  ENTRY (Pred2x8_t, V8BI, unsigned, 8, uint16, 21)
> > +  ENTRY (Pred4x4_t, V4BI, unsigned, 4, uint16, 21)
> > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > index feeee16d320..5f559f8fd93 100644
> > --- a/gcc/emit-rtl.c
> > +++ b/gcc/emit-rtl.c
> > @@ -6239,9 +6239,14 @@ init_emit_once (void)
> >
> >    /* For BImode, 1 and -1 are unsigned and signed interpretations
> >       of the same value.  */
> > -  const_tiny_rtx[0][(int) BImode] = const0_rtx;
> > -  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
> > -  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
> > +  for (mode = MIN_MODE_BOOL;
> > +       mode <= MAX_MODE_BOOL;
> > +       mode = (machine_mode)((int)(mode) + 1))
> > +    {
> > +      const_tiny_rtx[0][(int) mode] = const0_rtx;
> > +      const_tiny_rtx[1][(int) mode] = const_true_rtx;
> > +      const_tiny_rtx[3][(int) mode] = const_true_rtx;
> > +    }
> >
> >    for (mode = MIN_MODE_PARTIAL_INT;
> >         mode <= MAX_MODE_PARTIAL_INT;
> > @@ -6260,13 +6265,16 @@ init_emit_once (void)
> >        const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
> >      }
> >
> > -  /* As for BImode, "all 1" and "all -1" are unsigned and signed
> > -     interpretations of the same value.  */
> >    FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
> >      {
> >        const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
> >        const_tiny_rtx[3][(int) mode] = gen_const_vector (mode, 3);
> > -      const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
> > +      if (GET_MODE_INNER (mode) == BImode)
> > +     /* As for BImode, "all 1" and "all -1" are unsigned and signed
> > +        interpretations of the same value.  */
> > +     const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
> > +      else
> > +     const_tiny_rtx[1][(int) mode] = gen_const_vector (mode, 1);
> >      }
> >
> >    FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
> > diff --git a/gcc/genmodes.c b/gcc/genmodes.c
> > index 6001b854547..0bb1a7c0b48 100644
> > --- a/gcc/genmodes.c
> > +++ b/gcc/genmodes.c
> > @@ -78,6 +78,7 @@ struct mode_data
> >    bool need_bytesize_adj;    /* true if this mode needs dynamic size
> >                                  adjustment */
> >    unsigned int int_n;                /* If nonzero, then __int<INT_N> will be defined */
> > +  bool boolean;
> >  };
> >
> >  static struct mode_data *modes[MAX_MODE_CLASS];
> > @@ -88,7 +89,8 @@ static const struct mode_data blank_mode = {
> >    0, "<unknown>", MAX_MODE_CLASS,
> >    0, -1U, -1U, -1U, -1U,
> >    0, 0, 0, 0, 0, 0,
> > -  "<unknown>", 0, 0, 0, 0, false, false, 0
> > +  "<unknown>", 0, 0, 0, 0, false, false, 0,
> > +  false
> >  };
> >
> >  static htab_t modes_by_name;
> > @@ -456,7 +458,7 @@ make_complex_modes (enum mode_class cl,
> >        size_t m_len;
> >
> >        /* Skip BImode.  FIXME: BImode probably shouldn't be MODE_INT.  */
> > -      if (m->precision == 1)
> > +      if (m->boolean)
> >       continue;
> >
> >        m_len = strlen (m->name);
> > @@ -528,7 +530,7 @@ make_vector_modes (enum mode_class cl, const char *prefix, unsigned int width,
> >        not be necessary.  */
> >        if (cl == MODE_FLOAT && m->bytesize == 1)
> >       continue;
> > -      if (cl == MODE_INT && m->precision == 1)
> > +      if (m->boolean)
> >       continue;
> >
> >        if ((size_t) snprintf (buf, sizeof buf, "%s%u%s", prefix,
> > @@ -548,17 +550,18 @@ make_vector_modes (enum mode_class cl, const char *prefix, unsigned int width,
> >
> >  /* Create a vector of booleans called NAME with COUNT elements and
> >     BYTESIZE bytes in total.  */
> > -#define VECTOR_BOOL_MODE(NAME, COUNT, BYTESIZE) \
> > -  make_vector_bool_mode (#NAME, COUNT, BYTESIZE, __FILE__, __LINE__)
> > +#define VECTOR_BOOL_MODE(NAME, COUNT, COMPONENT, BYTESIZE) \
> > +  make_vector_bool_mode (#NAME, COUNT, #COMPONENT, BYTESIZE, \
> > +                      __FILE__, __LINE__)
> >  static void ATTRIBUTE_UNUSED
> >  make_vector_bool_mode (const char *name, unsigned int count,
> > -                    unsigned int bytesize, const char *file,
> > -                    unsigned int line)
> > +                    const char *component, unsigned int bytesize,
> > +                    const char *file, unsigned int line)
> >  {
> > -  struct mode_data *m = find_mode ("BI");
> > +  struct mode_data *m = find_mode (component);
> >    if (!m)
> >      {
> > -      error ("%s:%d: no mode \"BI\"", file, line);
> > +      error ("%s:%d: no mode \"%s\"", file, line, component);
> >        return;
> >      }
> >
> > @@ -596,6 +599,20 @@ make_int_mode (const char *name,
> >    m->precision = precision;
> >  }
> >
> > +#define BOOL_MODE(N, B, Y) \
> > +  make_bool_mode (#N, B, Y, __FILE__, __LINE__)
> > +
> > +static void
> > +make_bool_mode (const char *name,
> > +             unsigned int precision, unsigned int bytesize,
> > +             const char *file, unsigned int line)
> > +{
> > +  struct mode_data *m = new_mode (MODE_INT, name, file, line);
> > +  m->bytesize = bytesize;
> > +  m->precision = precision;
> > +  m->boolean = true;
> > +}
> > +
> >  #define OPAQUE_MODE(N, B)                    \
> >    make_opaque_mode (#N, -1U, B, __FILE__, __LINE__)
> >
> > @@ -1298,9 +1315,21 @@ enum machine_mode\n{");
> >        /* Don't use BImode for MIN_MODE_INT, since otherwise the middle
> >        end will try to use it for bitfields in structures and the
> >        like, which we do not want.  Only the target md file should
> > -      generate BImode widgets.  */
> > -      if (first && first->precision == 1 && c == MODE_INT)
> > -     first = first->next;
> > +      generate BImode widgets.  Since some targets such as ARM/MVE
> > +      define boolean modes with multiple bits, handle those too.  */
> > +      if (first && first->boolean)
> > +     {
> > +       struct mode_data *last_bool = first;
> > +       printf ("  MIN_MODE_BOOL = E_%smode,\n", first->name);
> > +
> > +       while (first && first->boolean)
> > +         {
> > +           last_bool = first;
> > +           first = first->next;
> > +         }
> > +
> > +       printf ("  MAX_MODE_BOOL = E_%smode,\n\n", last_bool->name);
> > +     }
> >
> >        if (first && last)
> >       printf ("  MIN_%s = E_%smode,\n  MAX_%s = E_%smode,\n\n",
> > @@ -1679,15 +1708,25 @@ emit_class_narrowest_mode (void)
> >    print_decl ("unsigned char", "class_narrowest_mode", "MAX_MODE_CLASS");
> >
> >    for (c = 0; c < MAX_MODE_CLASS; c++)
> > -    /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> > -    tagged_printf ("MIN_%s", mode_class_names[c],
> > -                modes[c]
> > -                ? ((c != MODE_INT || modes[c]->precision != 1)
> > -                   ? modes[c]->name
> > -                   : (modes[c]->next
> > -                      ? modes[c]->next->name
> > -                      : void_mode->name))
> > -                : void_mode->name);
> > +    {
> > +      /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> > +      const char *comment_name = void_mode->name;
> > +
> > +      if (modes[c])
> > +     if (c != MODE_INT || !modes[c]->boolean)
> > +       comment_name = modes[c]->name;
> > +     else
> > +       {
> > +         struct mode_data *m = modes[c];
> > +         while (m->boolean)
> > +           m = m->next;
> > +         if (m)
> > +           comment_name = m->name;
> > +         else
> > +           comment_name = void_mode->name;
> > +       }
> > +      tagged_printf ("MIN_%s", mode_class_names[c], comment_name);
> > +    }
> >
> >    print_closer ();
> >  }
> > diff --git a/gcc/machmode.def b/gcc/machmode.def
> > index 866a2082d01..eb7905ea23d 100644
> > --- a/gcc/machmode.def
> > +++ b/gcc/machmode.def
> > @@ -196,7 +196,7 @@ RANDOM_MODE (VOID);
> >  RANDOM_MODE (BLK);
> >
> >  /* Single bit mode used for booleans.  */
> > -FRACTIONAL_INT_MODE (BI, 1, 1);
> > +BOOL_MODE (BI, 1, 1);
> >
> >  /* Basic integer modes.  We go up to TI in generic code (128 bits).
> >     TImode is needed here because the some front ends now genericly
> > diff --git a/gcc/rtx-vector-builder.c b/gcc/rtx-vector-builder.c
> > index e36aba010a0..55ffe0d5a76 100644
> > --- a/gcc/rtx-vector-builder.c
> > +++ b/gcc/rtx-vector-builder.c
> > @@ -90,8 +90,10 @@ rtx_vector_builder::find_cached_value ()
> >
> >    if (GET_MODE_CLASS (m_mode) == MODE_VECTOR_BOOL)
> >      {
> > -      if (elt == const1_rtx || elt == constm1_rtx)
> > +      if (elt == const1_rtx)
> >       return CONST1_RTX (m_mode);
> > +      else if (elt == constm1_rtx)
> > +     return CONSTM1_RTX (m_mode);
> >        else if (elt == const0_rtx)
> >       return CONST0_RTX (m_mode);
> >        else
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index c36c825f958..532537ea48d 100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -6876,12 +6876,13 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes,
> >         /* This is the only case in which elements can be smaller than
> >            a byte.  */
> >         gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> > +       auto mask = GET_MODE_MASK (GET_MODE_INNER (mode));
> >         for (unsigned int i = 0; i < num_bytes; ++i)
> >           {
> >             target_unit value = 0;
> >             for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits)
> >               {
> > -               value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & 1) << j;
> > +               value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j;
> >                 elt += 1;
> >               }
> >             bytes.quick_push (value);
> > @@ -7025,9 +7026,8 @@ native_decode_vector_rtx (machine_mode mode, const vec<target_unit> &bytes,
> >         unsigned int bit_index = first_byte * BITS_PER_UNIT + i * elt_bits;
> >         unsigned int byte_index = bit_index / BITS_PER_UNIT;
> >         unsigned int lsb = bit_index % BITS_PER_UNIT;
> > -       builder.quick_push (bytes[byte_index] & (1 << lsb)
> > -                           ? CONST1_RTX (BImode)
> > -                           : CONST0_RTX (BImode));
> > +       unsigned int value = bytes[byte_index] >> lsb;
> > +       builder.quick_push (gen_int_mode (value, GET_MODE_INNER (mode)));
> >       }
> >      }
> >    else
> > @@ -7994,17 +7994,23 @@ test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
> >                                                   duplicate, last_par));
> >
> >        /* Test a scalar subreg of a VEC_MERGE of a VEC_DUPLICATE.  */
> > -      rtx vector_reg = make_test_reg (mode);
> > -      for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
> > +      /* Skip this test for vectors of booleans, because offset is in bytes,
> > +      while vec_merge indices are in elements (usually bits).  */
> > +      if (GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
> >       {
> > -       if (i >= HOST_BITS_PER_WIDE_INT)
> > -         break;
> > -       rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
> > -       rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
> > -       poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
> > -       ASSERT_RTX_EQ (scalar_reg,
> > -                      simplify_gen_subreg (inner_mode, vm,
> > -                                           mode, offset));
> > +       rtx vector_reg = make_test_reg (mode);
> > +       for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
> > +         {
> > +           if (i >= HOST_BITS_PER_WIDE_INT)
> > +             break;
> > +           rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
> > +           rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
> > +           poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
> > +
> > +           ASSERT_RTX_EQ (scalar_reg,
> > +                          simplify_gen_subreg (inner_mode, vm,
> > +                                               mode, offset));
> > +         }
> >       }
> >      }
> >
> > diff --git a/gcc/varasm.c b/gcc/varasm.c
> > index 76574be191f..5f59b6ace15 100644
> > --- a/gcc/varasm.c
> > +++ b/gcc/varasm.c
> > @@ -4085,6 +4085,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
> >       unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
> >       unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
> >       scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
> > +     unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));
> >
> >       /* Build the constant up one integer at a time.  */
> >       unsigned int elts_per_int = int_bits / elt_bits;
> > @@ -4093,8 +4094,10 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
> >           unsigned HOST_WIDE_INT value = 0;
> >           unsigned int limit = MIN (nelts - i, elts_per_int);
> >           for (unsigned int j = 0; j < limit; ++j)
> > -           if (INTVAL (CONST_VECTOR_ELT (x, i + j)) != 0)
> > -             value |= 1 << (j * elt_bits);
> > +         {
> > +           auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j));
> > +           value |= (elt & mask) << (j * elt_bits);
> > +         }
> >           output_constant_pool_2 (int_mode, gen_int_mode (value, int_mode),
> >                                   i != 0 ? MIN (align, int_bits) : align);
> >         }
> > --
> > 2.25.1
>
>
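
For readers following the native_encode_rtx and output_constant_pool_2
hunks quoted above: the switch from "& 1" to "& mask" matters once an
element is wider than one bit.  Below is a minimal standalone sketch of
the packing step, not GCC code.  pack_bool_elts is a hypothetical name,
and the sketch assumes the element width divides 8 and that the
lowest-numbered element lands in the least significant bits, as the
"<< j" in the quoted loop suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): pack NELTS boolean elements of
   ELT_BITS bits each into OUT, lowest-numbered element in the least
   significant bits of each byte.  ELT_BITS must divide 8.  */
static void
pack_bool_elts (const int *elts, size_t nelts, unsigned int elt_bits,
		uint8_t *out)
{
  assert (elt_bits >= 1 && elt_bits <= 8 && 8 % elt_bits == 0);

  /* Keep every bit of the element, not just bit 0.  */
  unsigned int mask = (1u << elt_bits) - 1;
  size_t elts_per_byte = 8 / elt_bits;

  for (size_t i = 0; i < nelts; i += elts_per_byte)
    {
      uint8_t value = 0;
      for (size_t j = 0; j < elts_per_byte && i + j < nelts; ++j)
	value |= (uint8_t) (((unsigned int) elts[i + j] & mask)
			    << (j * elt_bits));
      out[i / elts_per_byte] = value;
    }
}
```

With 2-bit elements { -1, 0, 1, -1 } this produces the byte 0xD3; masking
with 1 instead would lose the upper bit of each -1 element.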

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-01-13 14:56 ` [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans Christophe Lyon
  2022-01-21 11:20   ` Andre Vieira (lists)
  2022-01-27 16:28   ` Kyrylo Tkachov
@ 2022-01-31 18:01   ` Richard Sandiford
  2022-01-31 22:57     ` Christophe Lyon
  2 siblings, 1 reply; 54+ messages in thread
From: Richard Sandiford @ 2022-01-31 18:01 UTC (permalink / raw)
  To: Christophe Lyon via Gcc-patches; +Cc: Christophe Lyon

Sorry for the slow response, was out last week.

Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index feeee16d320..5f559f8fd93 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -6239,9 +6239,14 @@ init_emit_once (void)
>  
>    /* For BImode, 1 and -1 are unsigned and signed interpretations
>       of the same value.  */
> -  const_tiny_rtx[0][(int) BImode] = const0_rtx;
> -  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
> -  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
> +  for (mode = MIN_MODE_BOOL;
> +       mode <= MAX_MODE_BOOL;
> +       mode = (machine_mode)((int)(mode) + 1))
> +    {
> +      const_tiny_rtx[0][(int) mode] = const0_rtx;
> +      const_tiny_rtx[1][(int) mode] = const_true_rtx;
> +      const_tiny_rtx[3][(int) mode] = const_true_rtx;
> +    }
>  
>    for (mode = MIN_MODE_PARTIAL_INT;
>         mode <= MAX_MODE_PARTIAL_INT;

Does this do the right thing for:

  gen_int_mode (-1, B2Imode)

(which is used e.g. in native_decode_vector_rtx)?  It looks like it
would give 0b01 rather than 0b11.

Maybe for non-BImode we should use const1_rtx and constm1_rtx, like with
MODE_INT.
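
To make the 0b01 versus 0b11 concern concrete, here is a small
standalone sketch, not GCC code.  sext_low_bits is a hypothetical helper
that keeps the low bits of a value and sign-extends them, which is
roughly what truncating a constant to a narrow integer mode amounts to.
It ignores any BImode-specific handling in the real sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC function): keep the low BITS bits of
   VALUE and sign-extend the result.  */
static int64_t
sext_low_bits (int64_t value, unsigned int bits)
{
  assert (bits >= 1 && bits < 64);

  uint64_t mask = ((uint64_t) 1 << bits) - 1;
  uint64_t low = (uint64_t) value & mask;
  uint64_t sign = (uint64_t) 1 << (bits - 1);

  /* XOR-then-subtract maps [0, 2^BITS) onto the signed range.  */
  return (int64_t) (low ^ sign) - (int64_t) sign;
}
```

With one bit, 1 and -1 share the same bit pattern, so they truncate to
the same value.  With two bits they do not: 1 stays 0b01 while -1 is
0b11, which is why a single shared "true" constant only works for a
1-bit mode.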

> @@ -1298,9 +1315,21 @@ enum machine_mode\n{");
>        /* Don't use BImode for MIN_MODE_INT, since otherwise the middle
>  	 end will try to use it for bitfields in structures and the
>  	 like, which we do not want.  Only the target md file should
> -	 generate BImode widgets.  */
> -      if (first && first->precision == 1 && c == MODE_INT)
> -	first = first->next;
> +	 generate BImode widgets.  Since some targets such as ARM/MVE
> +	 define boolean modes with multiple bits, handle those too.  */
> +      if (first && first->boolean)
> +	{
> +	  struct mode_data *last_bool = first;
> +	  printf ("  MIN_MODE_BOOL = E_%smode,\n", first->name);
> +
> +	  while (first && first->boolean)
> +	    {
> +	      last_bool = first;
> +	      first = first->next;
> +	    }
> +
> +	  printf ("  MAX_MODE_BOOL = E_%smode,\n\n", last_bool->name);
> +	}
>  
>        if (first && last)
>  	printf ("  MIN_%s = E_%smode,\n  MAX_%s = E_%smode,\n\n",

For the record: this means that MIN_MODE_BOOL and MAX_MODE_BOOL are
in principle only conditionally available, whereas:

   /* For BImode, 1 and -1 are unsigned and signed interpretations
      of the same value.  */
-  const_tiny_rtx[0][(int) BImode] = const0_rtx;
-  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
-  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
+  for (mode = MIN_MODE_BOOL;
+       mode <= MAX_MODE_BOOL;
+       mode = (machine_mode)((int)(mode) + 1))
+    {
+      const_tiny_rtx[0][(int) mode] = const0_rtx;
+      const_tiny_rtx[1][(int) mode] = const_true_rtx;
+      const_tiny_rtx[3][(int) mode] = const_true_rtx;
+    }

assumes that they are unconditionally available.  In some ways it
might be clearer if we assert that first->boolean is true and
emit the MIN/MAX stuff unconditionally.

However, that would make the generator less robust against malformed
input, and it would probably be inconsistent with the current generator
code, so I agree that the patch's version is better on balance.

> @@ -1679,15 +1708,25 @@ emit_class_narrowest_mode (void)
>    print_decl ("unsigned char", "class_narrowest_mode", "MAX_MODE_CLASS");
>  
>    for (c = 0; c < MAX_MODE_CLASS; c++)
> -    /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> -    tagged_printf ("MIN_%s", mode_class_names[c],
> -		   modes[c]
> -		   ? ((c != MODE_INT || modes[c]->precision != 1)
> -		      ? modes[c]->name
> -		      : (modes[c]->next
> -			 ? modes[c]->next->name
> -			 : void_mode->name))
> -		   : void_mode->name);
> +    {
> +      /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> +      const char *comment_name = void_mode->name;
> +
> +      if (modes[c])
> +	if (c != MODE_INT || !modes[c]->boolean)
> +	  comment_name = modes[c]->name;
> +	else
> +	  {
> +	    struct mode_data *m = modes[c];
> +	    while (m->boolean)
> +	      m = m->next;
> +	    if (m)
> +	      comment_name = m->name;
> +	    else
> +	      comment_name = void_mode->name;
> +	  }

Have you tried bootstrapping the patch on a host of your choice?
I would expect a warning/Werror about an ambiguous else here.

I guess this reduces to:

    struct mode_data *m = modes[c];
    while (m && m->boolean)
      m = m->next;
    const char *comment_name = (m ? m : void_mode)->name;

but I don't know if that's more readable.
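
As a self-contained illustration of that reduction (a sketch only: the
struct below is a cut-down stand-in for the generator's mode_data, and
first_non_bool_name is a hypothetical name):

```c
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for the generator's struct mode_data; only the
   fields used here are modelled.  */
struct mode_data
{
  const char *name;
  bool boolean;
  struct mode_data *next;
};

/* Return the name of the first non-boolean mode in the list starting
   at FIRST, or FALLBACK if the list is empty or all-boolean.  */
static const char *
first_non_bool_name (struct mode_data *first, const char *fallback)
{
  struct mode_data *m = first;

  /* Checking M before M->boolean also covers a list that ends while
     still inside the run of boolean modes.  */
  while (m && m->boolean)
    m = m->next;
  return m ? m->name : fallback;
}
```

For a list BI -> B2I -> SI this returns "SI"; for an empty list or one
holding only boolean modes it returns the fallback.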

LGTM otherwise.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-01-31 18:01   ` Richard Sandiford
@ 2022-01-31 22:57     ` Christophe Lyon
  2022-02-01  3:42       ` Richard Sandiford
  0 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-01-31 22:57 UTC (permalink / raw)
  To: Richard Sandiford, Christophe Lyon via Gcc-patches,
	Andre Simoes Dias Vieira

On Mon, Jan 31, 2022 at 7:01 PM Richard Sandiford via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Sorry for the slow response, was out last week.
>
> Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > index feeee16d320..5f559f8fd93 100644
> > --- a/gcc/emit-rtl.c
> > +++ b/gcc/emit-rtl.c
> > @@ -6239,9 +6239,14 @@ init_emit_once (void)
> >
> >    /* For BImode, 1 and -1 are unsigned and signed interpretations
> >       of the same value.  */
> > -  const_tiny_rtx[0][(int) BImode] = const0_rtx;
> > -  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
> > -  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
> > +  for (mode = MIN_MODE_BOOL;
> > +       mode <= MAX_MODE_BOOL;
> > +       mode = (machine_mode)((int)(mode) + 1))
> > +    {
> > +      const_tiny_rtx[0][(int) mode] = const0_rtx;
> > +      const_tiny_rtx[1][(int) mode] = const_true_rtx;
> > +      const_tiny_rtx[3][(int) mode] = const_true_rtx;
> > +    }
> >
> >    for (mode = MIN_MODE_PARTIAL_INT;
> >         mode <= MAX_MODE_PARTIAL_INT;
>
> Does this do the right thing for:
>
>   gen_int_mode (-1, B2Imode)
>
> (which is used e.g. in native_decode_vector_rtx)?  It looks like it
> would give 0b01 rather than 0b11.
>
> Maybe for non-BImode we should use const1_rtx and constm1_rtx, like with
> MODE_INT.
>

debug_rtx (gen_int_mode (-1, B2Imode)) says:
(const_int -1 [0xffffffffffffffff])
so that looks right?


> > @@ -1298,9 +1315,21 @@ enum machine_mode\n{");
> >        /* Don't use BImode for MIN_MODE_INT, since otherwise the middle
> >        end will try to use it for bitfields in structures and the
> >        like, which we do not want.  Only the target md file should
> > -      generate BImode widgets.  */
> > -      if (first && first->precision == 1 && c == MODE_INT)
> > -     first = first->next;
> > +      generate BImode widgets.  Since some targets such as ARM/MVE
> > +      define boolean modes with multiple bits, handle those too.  */
> > +      if (first && first->boolean)
> > +     {
> > +       struct mode_data *last_bool = first;
> > +       printf ("  MIN_MODE_BOOL = E_%smode,\n", first->name);
> > +
> > +       while (first && first->boolean)
> > +         {
> > +           last_bool = first;
> > +           first = first->next;
> > +         }
> > +
> > +       printf ("  MAX_MODE_BOOL = E_%smode,\n\n", last_bool->name);
> > +     }
> >
> >        if (first && last)
> >       printf ("  MIN_%s = E_%smode,\n  MAX_%s = E_%smode,\n\n",
>
> For the record: this means that MIN_MODE_BOOL and MAX_MODE_BOOL are
> in principle only conditionally available, whereas:
>
>    /* For BImode, 1 and -1 are unsigned and signed interpretations
>       of the same value.  */
> -  const_tiny_rtx[0][(int) BImode] = const0_rtx;
> -  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
> -  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
> +  for (mode = MIN_MODE_BOOL;
> +       mode <= MAX_MODE_BOOL;
> +       mode = (machine_mode)((int)(mode) + 1))
> +    {
> +      const_tiny_rtx[0][(int) mode] = const0_rtx;
> +      const_tiny_rtx[1][(int) mode] = const_true_rtx;
> +      const_tiny_rtx[3][(int) mode] = const_true_rtx;
> +    }
>
> assumes that they are unconditionally available.  In some ways it
> might be clearer if we assert that first->boolean is true and
> emit the MIN/MAX stuff unconditionally.
>
> However, that would make the generator less robust against malformed
> input, and it would probably be inconsistent with the current generator
> code, so I agree that the patch's version is better on balance.
>
ack


>
> > @@ -1679,15 +1708,25 @@ emit_class_narrowest_mode (void)
> >    print_decl ("unsigned char", "class_narrowest_mode", "MAX_MODE_CLASS");
> >
> >    for (c = 0; c < MAX_MODE_CLASS; c++)
> > -    /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> > -    tagged_printf ("MIN_%s", mode_class_names[c],
> > -                modes[c]
> > -                ? ((c != MODE_INT || modes[c]->precision != 1)
> > -                   ? modes[c]->name
> > -                   : (modes[c]->next
> > -                      ? modes[c]->next->name
> > -                      : void_mode->name))
> > -                : void_mode->name);
> > +    {
> > +      /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> > +      const char *comment_name = void_mode->name;
> > +
> > +      if (modes[c])
> > +     if (c != MODE_INT || !modes[c]->boolean)
> > +       comment_name = modes[c]->name;
> > +     else
> > +       {
> > +         struct mode_data *m = modes[c];
> > +         while (m->boolean)
> > +           m = m->next;
> > +         if (m)
> > +           comment_name = m->name;
> > +         else
> > +           comment_name = void_mode->name;
> > +       }
>
> Have you tried bootstrapping the patch on a host of your choice?
> I would expect a warning/Werror about an ambiguous else here.
>
No, I hadn't, and indeed the build fails.

>
> I guess this reduces to:
>
>     struct mode_data *m = modes[c];
>     while (m && m->boolean)
>       m = m->next;
>     const char *comment_name = (m ? m : void_mode)->name;
>
> but I don't know if that's more readable.
>
But to my understanding, the problem is that the ambiguous else
is the first one, and the code should read:
       if (modes[c])
+        {
           if (c != MODE_INT || !modes[c]->boolean)
             comment_name = modes[c]->name;
           else
             {
               struct mode_data *m = modes[c];
               while (m->boolean)
                 m = m->next;
               if (m)
                 comment_name = m->name;
               else
                 comment_name = void_mode->name;
             }
+        }

> LGTM otherwise.
>
Thanks.

Andre, what about you? Did you try my suggestion to use:
  ENTRY (Pred1x16_t, V16BI, predicate, 16, pred1, 21)
  ENTRY (Pred2x8_t, V8BI, predicate, 8, pred1, 21)
  ENTRY (Pred4x4_t, V4BI, predicate, 4, pred1, 21)

Does that work for you?

Christophe


> Thanks,
> Richard
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-01-31 22:57     ` Christophe Lyon
@ 2022-02-01  3:42       ` Richard Sandiford
  2022-02-02 16:51         ` Christophe Lyon
  0 siblings, 1 reply; 54+ messages in thread
From: Richard Sandiford @ 2022-02-01  3:42 UTC (permalink / raw)
  To: Christophe Lyon via Gcc-patches

Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Mon, Jan 31, 2022 at 7:01 PM Richard Sandiford via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
>> Sorry for the slow response, was out last week.
>>
>> Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
>> > index feeee16d320..5f559f8fd93 100644
>> > --- a/gcc/emit-rtl.c
>> > +++ b/gcc/emit-rtl.c
>> > @@ -6239,9 +6239,14 @@ init_emit_once (void)
>> >
>> >    /* For BImode, 1 and -1 are unsigned and signed interpretations
>> >       of the same value.  */
>> > -  const_tiny_rtx[0][(int) BImode] = const0_rtx;
>> > -  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
>> > -  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
>> > +  for (mode = MIN_MODE_BOOL;
>> > +       mode <= MAX_MODE_BOOL;
>> > +       mode = (machine_mode)((int)(mode) + 1))
>> > +    {
>> > +      const_tiny_rtx[0][(int) mode] = const0_rtx;
>> > +      const_tiny_rtx[1][(int) mode] = const_true_rtx;
>> > +      const_tiny_rtx[3][(int) mode] = const_true_rtx;
>> > +    }
>> >
>> >    for (mode = MIN_MODE_PARTIAL_INT;
>> >         mode <= MAX_MODE_PARTIAL_INT;
>>
>> Does this do the right thing for:
>>
>>   gen_int_mode (-1, B2Imode)
>>
>> (which is used e.g. in native_decode_vector_rtx)?  It looks like it
>> would give 0b01 rather than 0b11.
>>
>> Maybe for non-BImode we should use const1_rtx and constm1_rtx, like with
>> MODE_INT.
>>
>
> debug_rtx (gen_int_mode (-1, B2Imode)) says:
> (const_int -1 [0xffffffffffffffff])
> so that looks right?

Ah, right, I forgot that the mode is unused for the small constant lookup.
But it looks like CONSTM1_RTX (B2Imode) would be (const_int 1) instead,
even though the two should be equal.

>> > @@ -1679,15 +1708,25 @@ emit_class_narrowest_mode (void)
>> >    print_decl ("unsigned char", "class_narrowest_mode", "MAX_MODE_CLASS");
>> >
>> >    for (c = 0; c < MAX_MODE_CLASS; c++)
>> > -    /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
>> > -    tagged_printf ("MIN_%s", mode_class_names[c],
>> > -                modes[c]
>> > -                ? ((c != MODE_INT || modes[c]->precision != 1)
>> > -                   ? modes[c]->name
>> > -                   : (modes[c]->next
>> > -                      ? modes[c]->next->name
>> > -                      : void_mode->name))
>> > -                : void_mode->name);
>> > +    {
>> > +      /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
>> > +      const char *comment_name = void_mode->name;
>> > +
>> > +      if (modes[c])
>> > +     if (c != MODE_INT || !modes[c]->boolean)
>> > +       comment_name = modes[c]->name;
>> > +     else
>> > +       {
>> > +         struct mode_data *m = modes[c];
>> > +         while (m->boolean)
>> > +           m = m->next;
>> > +         if (m)
>> > +           comment_name = m->name;
>> > +         else
>> > +           comment_name = void_mode->name;
>> > +       }
>>
>> Have you tried bootstrapping the patch on a host of your choice?
>> I would expect a warning/Werror about an ambiguous else here.
>>
> No, I hadn't, and indeed the build fails.
>
>>
>> I guess this reduces to:
>>
>>     struct mode_data *m = modes[c];
>>     while (m && m->boolean)
>>       m = m->next;
>>     const char *comment_name = (m ? m : void_mode)->name;
>>
>> but I don't know if that's more readable.
>>
> But to my understanding, the problem is that the ambiguous else
> is the first one, and the code should read:
>        if (modes[c])
> +        {
>            if (c != MODE_INT || !modes[c]->boolean)
>              comment_name = modes[c]->name;
>            else
>              {
>                struct mode_data *m = modes[c];
>                while (m->boolean)
>                  m = m->next;
>                if (m)
>                  comment_name = m->name;
>                else
>                  comment_name = void_mode->name;
>              }
> +        }

Yeah.  I just meant that the alternative loop was probably simpler,
as a replacement for the outer “if”.

It looks like the outer “if” is effectively a peeled iteration of
the while loop in the outer “else”.  And the “c != MODE_INT” part ought
to be redundant: as it stands, the boolean modes don't belong to any class.
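
A minimal sketch of that equivalence, assuming a cut-down stand-in for
genmodes.c's struct mode_data (only next, name and boolean) and made-up
helper names comment_name_nested and comment_name_loop.  It is an
illustration, not the code from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for genmodes.c's struct mode_data:
   only the fields used to pick the comment name.  */
struct mode_data
{
  struct mode_data *next;
  const char *name;
  bool boolean;
};

static struct mode_data void_mode_data = { NULL, "VOID", false };
static struct mode_data *void_mode = &void_mode_data;

/* Nested form, with braces around the outer "if" body.  The
   "c != MODE_INT" test is dropped on the assumption that it is
   redundant, and the loop checks "m" so that a list holding only
   boolean modes does not dereference NULL.  */
static const char *
comment_name_nested (struct mode_data *first)
{
  const char *comment_name = void_mode->name;

  if (first)
    {
      if (!first->boolean)
        comment_name = first->name;
      else
        {
          struct mode_data *m = first;
          while (m && m->boolean)
            m = m->next;
          if (m)
            comment_name = m->name;
        }
    }
  return comment_name;
}

/* Single-loop form: skip leading boolean modes, fall back to VOID.  */
static const char *
comment_name_loop (struct mode_data *first)
{
  struct mode_data *m = first;
  while (m && m->boolean)
    m = m->next;
  return (m ? m : void_mode)->name;
}
```

Both helpers should return the same name for an empty list, a list that
starts with a non-boolean mode, and a list that starts with one or more
boolean modes, which is why the outer “if” can go.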

Thanks,
Richard

* Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-02-01  3:42       ` Richard Sandiford
@ 2022-02-02 16:51         ` Christophe Lyon
  2022-02-04  9:42           ` Richard Sandiford
  0 siblings, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-02-02 16:51 UTC (permalink / raw)
  To: Christophe Lyon via Gcc-patches, Andre Simoes Dias Vieira,
	Christophe Lyon, Richard Sandiford

On Tue, Feb 1, 2022 at 4:42 AM Richard Sandiford <richard.sandiford@arm.com>
wrote:

> Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Mon, Jan 31, 2022 at 7:01 PM Richard Sandiford via Gcc-patches <
> > gcc-patches@gcc.gnu.org> wrote:
> >
> >> Sorry for the slow response, was out last week.
> >>
> >> Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> >> > index feeee16d320..5f559f8fd93 100644
> >> > --- a/gcc/emit-rtl.c
> >> > +++ b/gcc/emit-rtl.c
> >> > @@ -6239,9 +6239,14 @@ init_emit_once (void)
> >> >
> >> >    /* For BImode, 1 and -1 are unsigned and signed interpretations
> >> >       of the same value.  */
> >> > -  const_tiny_rtx[0][(int) BImode] = const0_rtx;
> >> > -  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
> >> > -  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
> >> > +  for (mode = MIN_MODE_BOOL;
> >> > +       mode <= MAX_MODE_BOOL;
> >> > +       mode = (machine_mode)((int)(mode) + 1))
> >> > +    {
> >> > +      const_tiny_rtx[0][(int) mode] = const0_rtx;
> >> > +      const_tiny_rtx[1][(int) mode] = const_true_rtx;
> >> > +      const_tiny_rtx[3][(int) mode] = const_true_rtx;
> >> > +    }
> >> >
> >> >    for (mode = MIN_MODE_PARTIAL_INT;
> >> >         mode <= MAX_MODE_PARTIAL_INT;
> >>
> >> Does this do the right thing for:
> >>
> >>   gen_int_mode (-1, B2Imode)
> >>
> >> (which is used e.g. in native_decode_vector_rtx)?  It looks like it
> >> would give 0b01 rather than 0b11.
> >>
> >> Maybe for non-BImode we should use const1_rtx and constm1_rtx, like with
> >> MODE_INT.
> >>
> >
> > debug_rtx (gen_int_mode (-1, B2Imode)) says:
> > (const_int -1 [0xffffffffffffffff])
> > so that looks right?
>
> Ah, right, I forgot that the mode is unused for the small constant lookup.
> But it looks like CONSTM1_RTX (B2Imode) would be (const_int 1) instead,
> even though the two should be equal.
>

Indeed!

So I changed the above loop into:
   /* For BImode, 1 and -1 are unsigned and signed interpretations
     of the same value.  */
  for (mode = MIN_MODE_BOOL;
       mode <= MAX_MODE_BOOL;
       mode = (machine_mode)((int)(mode) + 1))
    {
      const_tiny_rtx[0][(int) mode] = const0_rtx;
      const_tiny_rtx[1][(int) mode] = const_true_rtx;
-      const_tiny_rtx[3][(int) mode] = const_true_rtx;
+      const_tiny_rtx[3][(int) mode] = constm1_rtx;
    }
which works: both constants are now equal and validation still passes.



> >> > @@ -1679,15 +1708,25 @@ emit_class_narrowest_mode (void)
> >> >    print_decl ("unsigned char", "class_narrowest_mode",
> >> "MAX_MODE_CLASS");
> >> >
> >> >    for (c = 0; c < MAX_MODE_CLASS; c++)
> >> > -    /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> >> > -    tagged_printf ("MIN_%s", mode_class_names[c],
> >> > -                modes[c]
> >> > -                ? ((c != MODE_INT || modes[c]->precision != 1)
> >> > -                   ? modes[c]->name
> >> > -                   : (modes[c]->next
> >> > -                      ? modes[c]->next->name
> >> > -                      : void_mode->name))
> >> > -                : void_mode->name);
> >> > +    {
> >> > +      /* Bleah, all this to get the comment right for MIN_MODE_INT.
> */
> >> > +      const char *comment_name = void_mode->name;
> >> > +
> >> > +      if (modes[c])
> >> > +     if (c != MODE_INT || !modes[c]->boolean)
> >> > +       comment_name = modes[c]->name;
> >> > +     else
> >> > +       {
> >> > +         struct mode_data *m = modes[c];
> >> > +         while (m->boolean)
> >> > +           m = m->next;
> >> > +         if (m)
> >> > +           comment_name = m->name;
> >> > +         else
> >> > +           comment_name = void_mode->name;
> >> > +       }
> >>
> >> Have you tried bootstrapping the patch on a host of your choice?
> >> I would expect a warning/Werror about an ambiguous else here.
> >>
> > No I hadn't and indeed the build fails
> >
> >>
> >> I guess this reduces to:
> >>
> >>     struct mode_data *m = modes[c];
> >>     while (m && m->boolean)
> >>       m = m->next;
> >>     const char *comment_name = (m ? m : void_mode)->name;
> >>
> >> but I don't know if that's more readable.
> >>
> > but to my understanding the problem is that the ambiguous else
> > is the first one, and the code should read:
> >  if (modes[c])
> > +      {
> >         if (c != MODE_INT || !modes[c]->boolean)
> >           comment_name = modes[c]->name;
> >         else
> >           {
> >             struct mode_data *m = modes[c];
> >             while (m->boolean)
> >               m = m->next;
> >             if (m)
> >               comment_name = m->name;
> >             else
> >               comment_name = void_mode->name;
> >           }
> >  +    }
>
> Yeah.  I just meant that the alternative loop was probably simpler,
> as a replacement for the outer “if”.
>
> It looks like the outer “if” is effectively a peeled iteration of
> the while loop in the outer “else”.  And the “c != MODE_INT” part ought
> to be redundant: as it stands, the boolean modes don't belong to any class.
>
Ack, I have now:
   for (c = 0; c < MAX_MODE_CLASS; c++)
    {
      /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
      struct mode_data *m = modes[c];
      while (m && m->boolean)
        m = m->next;
      const char *comment_name = (m ? m : void_mode)->name;

      tagged_printf ("MIN_%s", mode_class_names[c], comment_name);
    }


Andre, any chance you tried the suggestion of:
ENTRY (Pred1x16_t, V16BI, predicate, 16, pred1, 21)
ENTRY (Pred2x8_t, V8BI, predicate, 8, pred1, 21)
ENTRY (Pred4x4_t, V4BI, predicate, 4, pred1, 21)


Thanks,
Christophe




> Thanks,
> Richard
>

* Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-02-02 16:51         ` Christophe Lyon
@ 2022-02-04  9:42           ` Richard Sandiford
  2022-02-04  9:54             ` Richard Sandiford
  2022-02-17 15:39             ` Christophe Lyon
  0 siblings, 2 replies; 54+ messages in thread
From: Richard Sandiford @ 2022-02-04  9:42 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Christophe Lyon via Gcc-patches, Andre Simoes Dias Vieira

Christophe Lyon <christophe.lyon.oss@gmail.com> writes:
> On Tue, Feb 1, 2022 at 4:42 AM Richard Sandiford <richard.sandiford@arm.com>
> wrote:
>
>> Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > On Mon, Jan 31, 2022 at 7:01 PM Richard Sandiford via Gcc-patches <
>> > gcc-patches@gcc.gnu.org> wrote:
>> >
>> >> Sorry for the slow response, was out last week.
>> >>
>> >> Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> >> > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
>> >> > index feeee16d320..5f559f8fd93 100644
>> >> > --- a/gcc/emit-rtl.c
>> >> > +++ b/gcc/emit-rtl.c
>> >> > @@ -6239,9 +6239,14 @@ init_emit_once (void)
>> >> >
>> >> >    /* For BImode, 1 and -1 are unsigned and signed interpretations
>> >> >       of the same value.  */
>> >> > -  const_tiny_rtx[0][(int) BImode] = const0_rtx;
>> >> > -  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
>> >> > -  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
>> >> > +  for (mode = MIN_MODE_BOOL;
>> >> > +       mode <= MAX_MODE_BOOL;
>> >> > +       mode = (machine_mode)((int)(mode) + 1))
>> >> > +    {
>> >> > +      const_tiny_rtx[0][(int) mode] = const0_rtx;
>> >> > +      const_tiny_rtx[1][(int) mode] = const_true_rtx;
>> >> > +      const_tiny_rtx[3][(int) mode] = const_true_rtx;
>> >> > +    }
>> >> >
>> >> >    for (mode = MIN_MODE_PARTIAL_INT;
>> >> >         mode <= MAX_MODE_PARTIAL_INT;
>> >>
>> >> Does this do the right thing for:
>> >>
>> >>   gen_int_mode (-1, B2Imode)
>> >>
>> >> (which is used e.g. in native_decode_vector_rtx)?  It looks like it
>> >> would give 0b01 rather than 0b11.
>> >>
>> >> Maybe for non-BImode we should use const1_rtx and constm1_rtx, like with
>> >> MODE_INT.
>> >>
>> >
>> > debug_rtx (gen_int_mode (-1, B2Imode)) says:
>> > (const_int -1 [0xffffffffffffffff])
>> > so that looks right?
>>
>> Ah, right, I forgot that the mode is unused for the small constant lookup.
>> But it looks like CONSTM1_RTX (B2Imode) would be (const_int 1) instead,
>> even though the two should be equal.
>>
>
> Indeed!
>
> So I changed the above loop into:
>    /* For BImode, 1 and -1 are unsigned and signed interpretations
>      of the same value.  */
>   for (mode = MIN_MODE_BOOL;
>        mode <= MAX_MODE_BOOL;
>        mode = (machine_mode)((int)(mode) + 1))
>     {
>       const_tiny_rtx[0][(int) mode] = const0_rtx;
>       const_tiny_rtx[1][(int) mode] = const_true_rtx;
> -      const_tiny_rtx[3][(int) mode] = const_true_rtx;
> +      const_tiny_rtx[3][(int) mode] = constm1_rtx;
>     }
> which works, both constants are now equal and the validation still passes.

I think we need to keep const_true_rtx for both [BImode][1] and [BImode][3].
BImode is an awkward special case in that the (only) nonzero value must be
exactly STORE_FLAG_VALUE, even if that leads to an otherwise non-canonical
const_int representation.

For the multi-bit booleans, [1] needs to be const1_rtx rather than
const_true_rtx in case STORE_FLAG_VALUE != 1.
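
To make the two cases concrete, here is a small model of the table
entries, assuming a hypothetical target where STORE_FLAG_VALUE is -1.
The enum, the tiny[][] array (plain integers standing in for the rtx
constants) and init_bool_tiny are invented for the illustration:

```c
#include <assert.h>

/* Hypothetical stand-ins for the boolean machine modes.  */
enum bool_mode { MODE_BI, MODE_B2I, MODE_B4I, NUM_BOOL_MODES };

/* Assumed value for the sketch: a target whose "true" is -1.  */
#define SKETCH_STORE_FLAG_VALUE (-1)

/* tiny[0] models CONST0_RTX, tiny[1] CONST1_RTX, tiny[3] CONSTM1_RTX.
   Plain integers stand in for the const_int rtxes.  */
static long tiny[4][NUM_BOOL_MODES];

static void
init_bool_tiny (void)
{
  for (int mode = MODE_BI; mode < NUM_BOOL_MODES; mode++)
    {
      tiny[0][mode] = 0;
      if (mode == MODE_BI)
        {
          /* BImode: the only nonzero value is STORE_FLAG_VALUE, so
             both "1" and "-1" map to const_true_rtx.  */
          tiny[1][mode] = SKETCH_STORE_FLAG_VALUE;
          tiny[3][mode] = SKETCH_STORE_FLAG_VALUE;
        }
      else
        {
          /* Multi-bit booleans behave like MODE_INT here.  */
          tiny[1][mode] = 1;
          tiny[3][mode] = -1;
        }
    }
}
```

With STORE_FLAG_VALUE == -1, using const_true_rtx for [1] of B2Imode
would hand back -1 where 1 was requested, which is the case the
const1_rtx entry avoids.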

>> >> > @@ -1679,15 +1708,25 @@ emit_class_narrowest_mode (void)
>> >> >    print_decl ("unsigned char", "class_narrowest_mode",
>> >> "MAX_MODE_CLASS");
>> >> >
>> >> >    for (c = 0; c < MAX_MODE_CLASS; c++)
>> >> > -    /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
>> >> > -    tagged_printf ("MIN_%s", mode_class_names[c],
>> >> > -                modes[c]
>> >> > -                ? ((c != MODE_INT || modes[c]->precision != 1)
>> >> > -                   ? modes[c]->name
>> >> > -                   : (modes[c]->next
>> >> > -                      ? modes[c]->next->name
>> >> > -                      : void_mode->name))
>> >> > -                : void_mode->name);
>> >> > +    {
>> >> > +      /* Bleah, all this to get the comment right for MIN_MODE_INT.
>> */
>> >> > +      const char *comment_name = void_mode->name;
>> >> > +
>> >> > +      if (modes[c])
>> >> > +     if (c != MODE_INT || !modes[c]->boolean)
>> >> > +       comment_name = modes[c]->name;
>> >> > +     else
>> >> > +       {
>> >> > +         struct mode_data *m = modes[c];
>> >> > +         while (m->boolean)
>> >> > +           m = m->next;
>> >> > +         if (m)
>> >> > +           comment_name = m->name;
>> >> > +         else
>> >> > +           comment_name = void_mode->name;
>> >> > +       }
>> >>
>> >> Have you tried bootstrapping the patch on a host of your choice?
>> >> I would expect a warning/Werror about an ambiguous else here.
>> >>
>> > No I hadn't and indeed the build fails
>> >
>> >>
>> >> I guess this reduces to:
>> >>
>> >>     struct mode_data *m = modes[c];
>> >>     while (m && m->boolean)
>> >>       m = m->next;
>> >>     const char *comment_name = (m ? m : void_mode)->name;
>> >>
>> >> but I don't know if that's more readable.
>> >>
>> > but to my understanding the problem is that the ambiguous else
>> > is the first one, and the code should read:
>> >  if (modes[c])
>> > +      {
>> >         if (c != MODE_INT || !modes[c]->boolean)
>> >           comment_name = modes[c]->name;
>> >         else
>> >           {
>> >             struct mode_data *m = modes[c];
>> >             while (m->boolean)
>> >               m = m->next;
>> >             if (m)
>> >               comment_name = m->name;
>> >             else
>> >               comment_name = void_mode->name;
>> >           }
>> >  +    }
>>
>> Yeah.  I just meant that the alternative loop was probably simpler,
>> as a replacement for the outer “if”.
>>
>> It looks like the outer “if” is effectively a peeled iteration of
>> the while loop in the outer “else”.  And the “c != MODE_INT” part ought
>> to be redundant: as it stands, the boolean modes don't belong to any class.
>>
> Ack, I have now:
>    for (c = 0; c < MAX_MODE_CLASS; c++)
>     {
>       /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
>       struct mode_data *m = modes[c];
>       while (m && m->boolean)
>         m = m->next;
>       const char *comment_name = (m ? m : void_mode)->name;
>
>       tagged_printf ("MIN_%s", mode_class_names[c], comment_name);
>     }
>
>
> Andre, any chance you tried the suggestion of:
> ENTRY (Pred1x16_t, V16BI, predicate, 16, pred1, 21)
> ENTRY (Pred2x8_t, V8BI, predicate, 8, pred1, 21)
> ENTRY (Pred4x4_t, V4BI, predicate, 4, pred1, 21)

BTW: the final argument should be the length of the __simd<N>_<elt>_t
type name (for mangling purposes).  It looks like the existing 32-bit
and 64-bit bfloat entries also get this wrong.

But as far as Andre's point goes: I think we need to construct
a boolean type explicitly, using build_truth_vector_type_for_mode
or truth_type_for.  Although the entries above specify the correct mode
(V16BI, etc.), the mode is really a function of the type tree properties,
rather than the other way round.

The main thing that makes truth vector types special is that those
types are the only ones that allow multiple elements in the same byte.
A “normal” 16-byte vector created by build_vector_type(_for_mode)
cannot be smaller than 16 bytes.

Thanks,
Richard

* Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-02-04  9:42           ` Richard Sandiford
@ 2022-02-04  9:54             ` Richard Sandiford
  2022-02-17 15:39             ` Christophe Lyon
  1 sibling, 0 replies; 54+ messages in thread
From: Richard Sandiford @ 2022-02-04  9:54 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Christophe Lyon via Gcc-patches, Andre Simoes Dias Vieira

Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> The main thing that makes truth vector types special is that those
> types are the only ones that allow multiple elements in the same byte.
> A “normal” 16-byte vector created by build_vector_type(_for_mode)
> cannot be smaller than 16 bytes.

Er, of course I meant “16-element vector created by...”.  16-byte
vectors that are smaller than 16 bytes would indeed be a problem.
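
As a rough illustration of the size difference (a sketch only; the
helper names packed_bytes and unpacked_bytes are made up and nothing
here is a GCC API):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed when elements may share a byte, as with truth vector
   types: round the total bit count up to whole bytes.  */
static size_t
packed_bytes (size_t nelts, size_t elt_bits)
{
  return (nelts * elt_bits + 7) / 8;
}

/* Bytes needed when every element occupies at least one whole byte,
   as with an ordinary vector type.  */
static size_t
unpacked_bytes (size_t nelts, size_t elt_bits)
{
  size_t elt_bytes = (elt_bits + 7) / 8;
  return nelts * elt_bytes;
}
```

For 16 one-bit elements the packed form needs 2 bytes, while a vector
whose elements cannot share a byte needs at least 16.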

Thanks,
Richard

* Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-02-04  9:42           ` Richard Sandiford
  2022-02-04  9:54             ` Richard Sandiford
@ 2022-02-17 15:39             ` Christophe Lyon
  2022-02-21 18:18               ` Richard Sandiford
  1 sibling, 1 reply; 54+ messages in thread
From: Christophe Lyon @ 2022-02-17 15:39 UTC (permalink / raw)
  To: Christophe Lyon, Christophe Lyon via Gcc-patches,
	Andre Simoes Dias Vieira, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 7930 bytes --]

Hi,

On Fri, Feb 4, 2022 at 10:43 AM Richard Sandiford <richard.sandiford@arm.com>
wrote:

> Christophe Lyon <christophe.lyon.oss@gmail.com> writes:
> > On Tue, Feb 1, 2022 at 4:42 AM Richard Sandiford <
> richard.sandiford@arm.com>
> > wrote:
> >
> >> Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > On Mon, Jan 31, 2022 at 7:01 PM Richard Sandiford via Gcc-patches <
> >> > gcc-patches@gcc.gnu.org> wrote:
> >> >
> >> >> Sorry for the slow response, was out last week.
> >> >>
> >> >> Christophe Lyon via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> >> > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> >> >> > index feeee16d320..5f559f8fd93 100644
> >> >> > --- a/gcc/emit-rtl.c
> >> >> > +++ b/gcc/emit-rtl.c
> >> >> > @@ -6239,9 +6239,14 @@ init_emit_once (void)
> >> >> >
> >> >> >    /* For BImode, 1 and -1 are unsigned and signed interpretations
> >> >> >       of the same value.  */
> >> >> > -  const_tiny_rtx[0][(int) BImode] = const0_rtx;
> >> >> > -  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
> >> >> > -  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
> >> >> > +  for (mode = MIN_MODE_BOOL;
> >> >> > +       mode <= MAX_MODE_BOOL;
> >> >> > +       mode = (machine_mode)((int)(mode) + 1))
> >> >> > +    {
> >> >> > +      const_tiny_rtx[0][(int) mode] = const0_rtx;
> >> >> > +      const_tiny_rtx[1][(int) mode] = const_true_rtx;
> >> >> > +      const_tiny_rtx[3][(int) mode] = const_true_rtx;
> >> >> > +    }
> >> >> >
> >> >> >    for (mode = MIN_MODE_PARTIAL_INT;
> >> >> >         mode <= MAX_MODE_PARTIAL_INT;
> >> >>
> >> >> Does this do the right thing for:
> >> >>
> >> >>   gen_int_mode (-1, B2Imode)
> >> >>
> >> >> (which is used e.g. in native_decode_vector_rtx)?  It looks like it
> >> >> would give 0b01 rather than 0b11.
> >> >>
> >> >> Maybe for non-BImode we should use const1_rtx and constm1_rtx, like
> with
> >> >> MODE_INT.
> >> >>
> >> >
> >> > debug_rtx (gen_int_mode (-1, B2Imode)) says:
> >> > (const_int -1 [0xffffffffffffffff])
> >> > so that looks right?
> >>
> >> Ah, right, I forgot that the mode is unused for the small constant
> lookup.
> >> But it looks like CONSTM1_RTX (B2Imode) would be (const_int 1) instead,
> >> even though the two should be equal.
> >>
> >
> > Indeed!
> >
> > So I changed the above loop into:
> >    /* For BImode, 1 and -1 are unsigned and signed interpretations
> >      of the same value.  */
> >   for (mode = MIN_MODE_BOOL;
> >        mode <= MAX_MODE_BOOL;
> >        mode = (machine_mode)((int)(mode) + 1))
> >     {
> >       const_tiny_rtx[0][(int) mode] = const0_rtx;
> >       const_tiny_rtx[1][(int) mode] = const_true_rtx;
> > -      const_tiny_rtx[3][(int) mode] = const_true_rtx;
> > +      const_tiny_rtx[3][(int) mode] = constm1_rtx;
> >     }
> > which works, both constants are now equal and the validation still
> passes.
>
> I think we need to keep const_true_rtx for both [BImode][1] and
> [BImode][3].
> BImode is an awkward special case in that the (only) nonzero value must be
> exactly STORE_FLAG_VALUE, even if that leads to an otherwise non-canonical
> const_int representation.
>

OK, done.


>
> For the multi-bit booleans, [1] needs to be const1_rtx rather than
> const_true_rtx in case STORE_FLAG_VALUE != 1.
>
> >> >> > @@ -1679,15 +1708,25 @@ emit_class_narrowest_mode (void)
> >> >> >    print_decl ("unsigned char", "class_narrowest_mode",
> >> >> "MAX_MODE_CLASS");
> >> >> >
> >> >> >    for (c = 0; c < MAX_MODE_CLASS; c++)
> >> >> > -    /* Bleah, all this to get the comment right for
> MIN_MODE_INT.  */
> >> >> > -    tagged_printf ("MIN_%s", mode_class_names[c],
> >> >> > -                modes[c]
> >> >> > -                ? ((c != MODE_INT || modes[c]->precision != 1)
> >> >> > -                   ? modes[c]->name
> >> >> > -                   : (modes[c]->next
> >> >> > -                      ? modes[c]->next->name
> >> >> > -                      : void_mode->name))
> >> >> > -                : void_mode->name);
> >> >> > +    {
> >> >> > +      /* Bleah, all this to get the comment right for
> MIN_MODE_INT.
> >> */
> >> >> > +      const char *comment_name = void_mode->name;
> >> >> > +
> >> >> > +      if (modes[c])
> >> >> > +     if (c != MODE_INT || !modes[c]->boolean)
> >> >> > +       comment_name = modes[c]->name;
> >> >> > +     else
> >> >> > +       {
> >> >> > +         struct mode_data *m = modes[c];
> >> >> > +         while (m->boolean)
> >> >> > +           m = m->next;
> >> >> > +         if (m)
> >> >> > +           comment_name = m->name;
> >> >> > +         else
> >> >> > +           comment_name = void_mode->name;
> >> >> > +       }
> >> >>
> >> >> Have you tried bootstrapping the patch on a host of your choice?
> >> >> I would expect a warning/Werror about an ambiguous else here.
> >> >>
> >> > No I hadn't and indeed the build fails
> >> >
> >> >>
> >> >> I guess this reduces to:
> >> >>
> >> >>     struct mode_data *m = modes[c];
> >> >>     while (m && m->boolean)
> >> >>       m = m->next;
> >> >>     const char *comment_name = (m ? m : void_mode)->name;
> >> >>
> >> >> but I don't know if that's more readable.
> >> >>
> >> > but to my understanding the problem is that the ambiguous else
> >> > is the first one, and the code should read:
> >> >  if (modes[c])
> >> > +      {
> >> >         if (c != MODE_INT || !modes[c]->boolean)
> >> >           comment_name = modes[c]->name;
> >> >         else
> >> >           {
> >> >             struct mode_data *m = modes[c];
> >> >             while (m->boolean)
> >> >               m = m->next;
> >> >             if (m)
> >> >               comment_name = m->name;
> >> >             else
> >> >               comment_name = void_mode->name;
> >> >           }
> >> >  +    }
> >>
> >> Yeah.  I just meant that the alternative loop was probably simpler,
> >> as a replacement for the outer “if”.
> >>
> >> It looks like the outer “if” is effectively a peeled iteration of
> >> the while loop in the outer “else”.  And the “c != MODE_INT” part ought
> >> to be redundant: as it stands, the boolean modes don't belong to any
> class.
> >>
> > Ack, I have now:
> >    for (c = 0; c < MAX_MODE_CLASS; c++)
> >     {
> >       /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> >       struct mode_data *m = modes[c];
> >       while (m && m->boolean)
> >         m = m->next;
> >       const char *comment_name = (m ? m : void_mode)->name;
> >
> >       tagged_printf ("MIN_%s", mode_class_names[c], comment_name);
> >     }
> >
> >
> > Andre, any chance you tried the suggestion of:
> > ENTRY (Pred1x16_t, V16BI, predicate, 16, pred1, 21)
> > ENTRY (Pred2x8_t, V8BI, predicate, 8, pred1, 21)
> > ENTRY (Pred4x4_t, V4BI, predicate, 4, pred1, 21)
>
> BTW: the final argument should be the length of the __simd<N>_<elt>_t
> type name (for mangling purposes).  It looks like the existing 32-bit
> and 64-bit bfloat entries also get this wrong.
>
> But as far as Andre's point goes: I think we need to construct
> a boolean type explicitly, using build_truth_vector_type_for_mode
> or truth_type_for.  Although the entries above specify the correct mode
> (V16BI, etc.), the mode is really a function of the type tree properties,
> rather than the other way round.
>
> The main thing that makes truth vector types special is that those
> types are the only ones that allow multiple elements in the same byte.
> A “normal” 16-byte vector created by build_vector_type(_for_mode)
> cannot be smaller than 16 bytes.
>
>
Thanks for the help, here is a new version of this patch, which contains
all the changes requested.

If OK, I'll rebase and commit the series.

Thanks
Christophe



> Thanks,
> Richard
>

[-- Attachment #2: v4-0007-arm-Implement-MVE-predicates-as-vectors-of-boolea.patch --]
[-- Type: text/x-patch, Size: 19928 bytes --]

From 1eaec2a01d1bcbb397c20d7034f85b7b85c6831d Mon Sep 17 00:00:00 2001
From: Christophe Lyon <christophe.lyon@foss.st.com>
Date: Wed, 13 Oct 2021 09:16:22 +0000
Subject: [PATCH v4 07/15] arm: Implement MVE predicates as vectors of booleans

This patch implements support for vectors of booleans to support MVE
predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
uint16_t) to represent predicates in intrinsics prototypes, we
introduce a new "predicate" type qualifier so that we can map the
HImode arguments and return values of the relevant builtins to the
appropriate vector of booleans (VxBI).

We have to update test_vector_ops_duplicate, because it iterates using
an offset in bytes, where we would need to iterate in bits: we stop
iterating when we reach the end of the vector of booleans.

In addition, we have to fix the underlying definition of vectors of
booleans because ARM/MVE needs a different representation than
AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
element size, so that a true element of V4BI is represented by
'0b1111'.  This patch updates the aarch64 definition of VNx*BI as
needed.
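
A minimal sketch of that layout, assuming element 0 sits in the least
significant bits of the 16-bit predicate (an assumption made for the
illustration) and using a made-up helper name, mve_pred_encode:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a 16-bit predicate from NELTS boolean elements, duplicating
   each true element over its whole 16 / NELTS bit field.  Lane 0 in
   the low bits is an assumption of this sketch.  */
static uint16_t
mve_pred_encode (const bool *elts, unsigned nelts)
{
  assert (nelts == 4 || nelts == 8 || nelts == 16);

  unsigned elt_bits = 16 / nelts;
  uint16_t elt_mask = (uint16_t) ((1u << elt_bits) - 1);
  uint16_t pred = 0;

  for (unsigned i = 0; i < nelts; i++)
    if (elts[i])
      pred |= (uint16_t) (elt_mask << (i * elt_bits));

  return pred;
}
```

Under these assumptions a V4BI-style vector { true, false, false, true }
encodes as 0xf00f: each true element contributes '0b1111'.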

2022-01-13  Christophe Lyon  <christophe.lyon@foss.st.com>
	Richard Sandiford  <richard.sandiford@arm.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
	VNx2BI): Update definition.
	* config/arm/arm-builtins.c (arm_init_simd_builtin_types): Add new
	simd types.
	(arm_init_builtin): Map predicate vectors arguments to HImode.
	(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
	rtx. Move return value to HImode rtx.
	* config/arm/arm-builtins.h (arm_type_qualifiers): Add qualifier_predicate.
	* config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes.
	* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
	Pred2x8_t, Pred4x4_t): New.
	* emit-rtl.c (init_emit_once): Handle all boolean modes.
	* genmodes.c (mode_data): Add boolean field.
	(blank_mode): Initialize it.
	(make_complex_modes): Fix handling of boolean modes.
	(make_vector_modes): Likewise.
	(VECTOR_BOOL_MODE): Use new COMPONENT parameter.
	(make_vector_bool_mode): Likewise.
	(BOOL_MODE): New.
	(make_bool_mode): New.
	(emit_insn_modes_h): Fix generation of boolean modes.
	(emit_class_narrowest_mode): Likewise.
	* machmode.def (VECTOR_BOOL_MODE): Document new COMPONENT
	parameter.  Use new BOOL_MODE instead of FRACTIONAL_INT_MODE to
	define BImode.
	* rtx-vector-builder.c (rtx_vector_builder::find_cached_value):
	Fix handling of constm1_rtx for VECTOR_BOOL.
	* simplify-rtx.c (native_encode_rtx): Fix support for VECTOR_BOOL.
	(native_decode_vector_rtx): Likewise.
	(test_vector_ops_duplicate): Skip vec_merge test
	with vectors of booleans.
	* varasm.c (output_constant_pool_2): Likewise.

diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index 976bf9b42be..8f399225a80 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -47,10 +47,10 @@ ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
 
 /* Vector modes.  */
 
-VECTOR_BOOL_MODE (VNx16BI, 16, 2);
-VECTOR_BOOL_MODE (VNx8BI, 8, 2);
-VECTOR_BOOL_MODE (VNx4BI, 4, 2);
-VECTOR_BOOL_MODE (VNx2BI, 2, 2);
+VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
+VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
+VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
+VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
 
 ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
 ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 9c645722230..dd537ec1679 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -1553,11 +1553,25 @@ arm_init_simd_builtin_types (void)
       tree eltype = arm_simd_types[i].eltype;
       machine_mode mode = arm_simd_types[i].mode;
 
-      if (eltype == NULL)
+      if (eltype == NULL
+	  /* VECTOR_BOOL is not supported unless MVE is activated, this would
+	     make build_truth_vector_type_for_mode crash.  */
+	  && ((GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
+	      ||!TARGET_HAVE_MVE))
 	continue;
       if (arm_simd_types[i].itype == NULL)
 	{
-	  tree type = build_vector_type (eltype, GET_MODE_NUNITS (mode));
+	  tree type;
+	  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+	    {
+	      /* Handle MVE predicates: they are internally stored as 16 bits,
+		 but are used as vectors of 1, 2 or 4-bit elements.  */
+	      type = build_truth_vector_type_for_mode (GET_MODE_NUNITS (mode), mode);
+	      eltype = TREE_TYPE (type);
+	    }
+	  else
+	    type = build_vector_type (eltype, GET_MODE_NUNITS (mode));
+
 	  type = build_distinct_type_copy (type);
 	  SET_TYPE_STRUCTURAL_EQUALITY (type);
 
@@ -1695,6 +1709,11 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
       if (qualifiers & qualifier_map_mode)
 	op_mode = d->mode;
 
+      /* MVE Predicates use HImode as mandated by the ABI: pred16_t is unsigned
+	 short.  */
+      if (qualifiers & qualifier_predicate)
+	op_mode = HImode;
+
       /* For pointers, we want a pointer to the basic type
 	 of the vector.  */
       if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
@@ -2939,6 +2958,11 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
 	    case ARG_BUILTIN_COPY_TO_REG:
 	      if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
 		op[argc] = convert_memory_address (Pmode, op[argc]);
+
+	      /* MVE uses mve_pred16_t (aka HImode) for vectors of predicates.  */
+	      if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
+		op[argc] = gen_lowpart (mode[argc], op[argc]);
+
 	      /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
 	      if (!(*insn_data[icode].operand[opno].predicate)
 		  (op[argc], mode[argc]))
@@ -3144,6 +3168,13 @@ constant_arg:
   else
     emit_insn (insn);
 
+  if (GET_MODE_CLASS (tmode) == MODE_VECTOR_BOOL)
+    {
+      rtx HItarget = gen_reg_rtx (HImode);
+      emit_move_insn (HItarget, gen_lowpart (HImode, target));
+      return HItarget;
+    }
+
   return target;
 }
 
diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h
index e5130d6d286..a8ef8aef82d 100644
--- a/gcc/config/arm/arm-builtins.h
+++ b/gcc/config/arm/arm-builtins.h
@@ -84,7 +84,9 @@ enum arm_type_qualifiers
   qualifier_lane_pair_index = 0x1000,
   /* Lane indices selected in quadtuplets - must be within range of previous
      argument = a vector.  */
-  qualifier_lane_quadtup_index = 0x2000
+  qualifier_lane_quadtup_index = 0x2000,
+  /* MVE vector predicates.  */
+  qualifier_predicate = 0x4000
 };
 
 struct arm_simd_type_info
diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index de689c8b45e..9ed0cd042c5 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -84,6 +84,14 @@ VECTOR_MODE (FLOAT, BF, 2);   /*                 V2BF.  */
 VECTOR_MODE (FLOAT, BF, 4);   /*		 V4BF.  */
 VECTOR_MODE (FLOAT, BF, 8);   /*		 V8BF.  */
 
+/* Predicates for MVE.  */
+BOOL_MODE (B2I, 2, 1);
+BOOL_MODE (B4I, 4, 1);
+
+VECTOR_BOOL_MODE (V16BI, 16, BI, 2);
+VECTOR_BOOL_MODE (V8BI, 8, B2I, 2);
+VECTOR_BOOL_MODE (V4BI, 4, B4I, 2);
+
 /* Fraction and accumulator vector modes.  */
 VECTOR_MODES (FRACT, 4);      /* V4QQ  V2HQ */
 VECTOR_MODES (UFRACT, 4);     /* V4UQQ V2UHQ */
diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
index 6ba6f211531..d1d6416dad1 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -51,3 +51,7 @@
   ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
   ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
   ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
+
+  ENTRY (Pred1x16_t, V16BI, predicate, 16, pred1, 16)
+  ENTRY (Pred2x8_t, V8BI, predicate, 8, pred1, 15)
+  ENTRY (Pred4x4_t, V4BI, predicate, 4, pred1, 15)
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index feeee16d320..5bf7d37cfa6 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -6239,9 +6239,22 @@ init_emit_once (void)
 
   /* For BImode, 1 and -1 are unsigned and signed interpretations
      of the same value.  */
-  const_tiny_rtx[0][(int) BImode] = const0_rtx;
-  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
-  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
+  for (mode = MIN_MODE_BOOL;
+       mode <= MAX_MODE_BOOL;
+       mode = (machine_mode)((int)(mode) + 1))
+    {
+      const_tiny_rtx[0][(int) mode] = const0_rtx;
+      if (mode == BImode)
+	{
+	  const_tiny_rtx[1][(int) mode] = const_true_rtx;
+	  const_tiny_rtx[3][(int) mode] = const_true_rtx;
+	}
+      else
+	{
+	  const_tiny_rtx[1][(int) mode] = const1_rtx;
+	  const_tiny_rtx[3][(int) mode] = constm1_rtx;
+	}
+    }
 
   for (mode = MIN_MODE_PARTIAL_INT;
        mode <= MAX_MODE_PARTIAL_INT;
@@ -6260,13 +6273,16 @@ init_emit_once (void)
       const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
     }
 
-  /* As for BImode, "all 1" and "all -1" are unsigned and signed
-     interpretations of the same value.  */
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
     {
       const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
       const_tiny_rtx[3][(int) mode] = gen_const_vector (mode, 3);
-      const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
+      if (GET_MODE_INNER (mode) == BImode)
+	/* As for BImode, "all 1" and "all -1" are unsigned and signed
+	   interpretations of the same value.  */
+	const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
+      else
+	const_tiny_rtx[1][(int) mode] = gen_const_vector (mode, 1);
     }
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index 6001b854547..5881abd846c 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -78,6 +78,7 @@ struct mode_data
   bool need_bytesize_adj;	/* true if this mode needs dynamic size
 				   adjustment */
   unsigned int int_n;		/* If nonzero, then __int<INT_N> will be defined */
+  bool boolean;
 };
 
 static struct mode_data *modes[MAX_MODE_CLASS];
@@ -88,7 +89,8 @@ static const struct mode_data blank_mode = {
   0, "<unknown>", MAX_MODE_CLASS,
   0, -1U, -1U, -1U, -1U,
   0, 0, 0, 0, 0, 0,
-  "<unknown>", 0, 0, 0, 0, false, false, 0
+  "<unknown>", 0, 0, 0, 0, false, false, 0,
+  false
 };
 
 static htab_t modes_by_name;
@@ -456,7 +458,7 @@ make_complex_modes (enum mode_class cl,
       size_t m_len;
 
       /* Skip BImode.  FIXME: BImode probably shouldn't be MODE_INT.  */
-      if (m->precision == 1)
+      if (m->boolean)
 	continue;
 
       m_len = strlen (m->name);
@@ -528,7 +530,7 @@ make_vector_modes (enum mode_class cl, const char *prefix, unsigned int width,
 	 not be necessary.  */
       if (cl == MODE_FLOAT && m->bytesize == 1)
 	continue;
-      if (cl == MODE_INT && m->precision == 1)
+      if (m->boolean)
 	continue;
 
       if ((size_t) snprintf (buf, sizeof buf, "%s%u%s", prefix,
@@ -548,17 +550,18 @@ make_vector_modes (enum mode_class cl, const char *prefix, unsigned int width,
 
 /* Create a vector of booleans called NAME with COUNT elements and
    BYTESIZE bytes in total.  */
-#define VECTOR_BOOL_MODE(NAME, COUNT, BYTESIZE) \
-  make_vector_bool_mode (#NAME, COUNT, BYTESIZE, __FILE__, __LINE__)
+#define VECTOR_BOOL_MODE(NAME, COUNT, COMPONENT, BYTESIZE)		\
+  make_vector_bool_mode (#NAME, COUNT, #COMPONENT, BYTESIZE,		\
+			 __FILE__, __LINE__)
 static void ATTRIBUTE_UNUSED
 make_vector_bool_mode (const char *name, unsigned int count,
-		       unsigned int bytesize, const char *file,
-		       unsigned int line)
+		       const char *component, unsigned int bytesize,
+		       const char *file, unsigned int line)
 {
-  struct mode_data *m = find_mode ("BI");
+  struct mode_data *m = find_mode (component);
   if (!m)
     {
-      error ("%s:%d: no mode \"BI\"", file, line);
+      error ("%s:%d: no mode \"%s\"", file, line, component);
       return;
     }
 
@@ -596,6 +599,20 @@ make_int_mode (const char *name,
   m->precision = precision;
 }
 
+#define BOOL_MODE(N, B, Y) \
+  make_bool_mode (#N, B, Y, __FILE__, __LINE__)
+
+static void
+make_bool_mode (const char *name,
+		unsigned int precision, unsigned int bytesize,
+		const char *file, unsigned int line)
+{
+  struct mode_data *m = new_mode (MODE_INT, name, file, line);
+  m->bytesize = bytesize;
+  m->precision = precision;
+  m->boolean = true;
+}
+
 #define OPAQUE_MODE(N, B)			\
   make_opaque_mode (#N, -1U, B, __FILE__, __LINE__)
 
@@ -1298,9 +1315,21 @@ enum machine_mode\n{");
       /* Don't use BImode for MIN_MODE_INT, since otherwise the middle
 	 end will try to use it for bitfields in structures and the
 	 like, which we do not want.  Only the target md file should
-	 generate BImode widgets.  */
-      if (first && first->precision == 1 && c == MODE_INT)
-	first = first->next;
+	 generate BImode widgets.  Since some targets such as ARM/MVE
+	 define boolean modes with multiple bits, handle those too.  */
+      if (first && first->boolean)
+	{
+	  struct mode_data *last_bool = first;
+	  printf ("  MIN_MODE_BOOL = E_%smode,\n", first->name);
+
+	  while (first && first->boolean)
+	    {
+	      last_bool = first;
+	      first = first->next;
+	    }
+
+	  printf ("  MAX_MODE_BOOL = E_%smode,\n\n", last_bool->name);
+	}
 
       if (first && last)
 	printf ("  MIN_%s = E_%smode,\n  MAX_%s = E_%smode,\n\n",
@@ -1679,15 +1708,15 @@ emit_class_narrowest_mode (void)
   print_decl ("unsigned char", "class_narrowest_mode", "MAX_MODE_CLASS");
 
   for (c = 0; c < MAX_MODE_CLASS; c++)
-    /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
-    tagged_printf ("MIN_%s", mode_class_names[c],
-		   modes[c]
-		   ? ((c != MODE_INT || modes[c]->precision != 1)
-		      ? modes[c]->name
-		      : (modes[c]->next
-			 ? modes[c]->next->name
-			 : void_mode->name))
-		   : void_mode->name);
+    {
+      /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
+      struct mode_data *m = modes[c];
+      while (m && m->boolean)
+	m = m->next;
+      const char *comment_name = (m ? m : void_mode)->name;
+
+      tagged_printf ("MIN_%s", mode_class_names[c], comment_name);
+    }
 
   print_closer ();
 }
diff --git a/gcc/machmode.def b/gcc/machmode.def
index 866a2082d01..533cf6ab4b2 100644
--- a/gcc/machmode.def
+++ b/gcc/machmode.def
@@ -146,12 +146,13 @@ along with GCC; see the file COPYING3.  If not see
 	Like VECTOR_MODES, but start the mode names with PREFIX instead
 	of the usual "V".
 
-     VECTOR_BOOL_MODE (NAME, COUNT, BYTESIZE)
+     VECTOR_BOOL_MODE (NAME, COUNT, COMPONENT, BYTESIZE)
         Create a vector mode called NAME that contains COUNT boolean
         elements and occupies BYTESIZE bytes in total.  Each boolean
-        element occupies (COUNT * BITS_PER_UNIT) / BYTESIZE bits, with
-        the element at index 0 occupying the lsb of the first byte in
-        memory.  Only the lowest bit of each element is significant.
+        element is of COMPONENT type and occupies (COUNT * BITS_PER_UNIT) /
+        BYTESIZE bits, with the element at index 0 occupying the lsb of the
+        first byte in memory.  Only the lowest bit of each element is
+        significant.
 
      OPAQUE_MODE (NAME, BYTESIZE)
         Create an opaque mode called NAME that is BYTESIZE bytes wide.
@@ -196,7 +197,7 @@ RANDOM_MODE (VOID);
 RANDOM_MODE (BLK);
 
 /* Single bit mode used for booleans.  */
-FRACTIONAL_INT_MODE (BI, 1, 1);
+BOOL_MODE (BI, 1, 1);
 
 /* Basic integer modes.  We go up to TI in generic code (128 bits).
    TImode is needed here because the some front ends now genericly
diff --git a/gcc/rtx-vector-builder.c b/gcc/rtx-vector-builder.c
index e36aba010a0..55ffe0d5a76 100644
--- a/gcc/rtx-vector-builder.c
+++ b/gcc/rtx-vector-builder.c
@@ -90,8 +90,10 @@ rtx_vector_builder::find_cached_value ()
 
   if (GET_MODE_CLASS (m_mode) == MODE_VECTOR_BOOL)
     {
-      if (elt == const1_rtx || elt == constm1_rtx)
+      if (elt == const1_rtx)
 	return CONST1_RTX (m_mode);
+      else if (elt == constm1_rtx)
+	return CONSTM1_RTX (m_mode);
       else if (elt == const0_rtx)
 	return CONST0_RTX (m_mode);
       else
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index c36c825f958..532537ea48d 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -6876,12 +6876,13 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes,
 	  /* This is the only case in which elements can be smaller than
 	     a byte.  */
 	  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+	  auto mask = GET_MODE_MASK (GET_MODE_INNER (mode));
 	  for (unsigned int i = 0; i < num_bytes; ++i)
 	    {
 	      target_unit value = 0;
 	      for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits)
 		{
-		  value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & 1) << j;
+		  value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j;
 		  elt += 1;
 		}
 	      bytes.quick_push (value);
@@ -7025,9 +7026,8 @@ native_decode_vector_rtx (machine_mode mode, const vec<target_unit> &bytes,
 	  unsigned int bit_index = first_byte * BITS_PER_UNIT + i * elt_bits;
 	  unsigned int byte_index = bit_index / BITS_PER_UNIT;
 	  unsigned int lsb = bit_index % BITS_PER_UNIT;
-	  builder.quick_push (bytes[byte_index] & (1 << lsb)
-			      ? CONST1_RTX (BImode)
-			      : CONST0_RTX (BImode));
+	  unsigned int value = bytes[byte_index] >> lsb;
+	  builder.quick_push (gen_int_mode (value, GET_MODE_INNER (mode)));
 	}
     }
   else
@@ -7994,17 +7994,23 @@ test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
 						    duplicate, last_par));
 
       /* Test a scalar subreg of a VEC_MERGE of a VEC_DUPLICATE.  */
-      rtx vector_reg = make_test_reg (mode);
-      for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
+      /* Skip this test for vectors of booleans, because offset is in bytes,
+	 while vec_merge indices are in elements (usually bits).  */
+      if (GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
 	{
-	  if (i >= HOST_BITS_PER_WIDE_INT)
-	    break;
-	  rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
-	  rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
-	  poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
-	  ASSERT_RTX_EQ (scalar_reg,
-			 simplify_gen_subreg (inner_mode, vm,
-					      mode, offset));
+	  rtx vector_reg = make_test_reg (mode);
+	  for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
+	    {
+	      if (i >= HOST_BITS_PER_WIDE_INT)
+		break;
+	      rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
+	      rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
+	      poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
+
+	      ASSERT_RTX_EQ (scalar_reg,
+			     simplify_gen_subreg (inner_mode, vm,
+						  mode, offset));
+	    }
 	}
     }
 
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 76574be191f..5f59b6ace15 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -4085,6 +4085,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
 	unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
 	unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
 	scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
+	unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));
 
 	/* Build the constant up one integer at a time.  */
 	unsigned int elts_per_int = int_bits / elt_bits;
@@ -4093,8 +4094,10 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
 	    unsigned HOST_WIDE_INT value = 0;
 	    unsigned int limit = MIN (nelts - i, elts_per_int);
 	    for (unsigned int j = 0; j < limit; ++j)
-	      if (INTVAL (CONST_VECTOR_ELT (x, i + j)) != 0)
-		value |= 1 << (j * elt_bits);
+	    {
+	      auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j));
+	      value |= (elt & mask) << (j * elt_bits);
+	    }
 	    output_constant_pool_2 (int_mode, gen_int_mode (value, int_mode),
 				    i != 0 ? MIN (align, int_bits) : align);
 	  }
-- 
2.25.1



* Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans
  2022-02-17 15:39             ` Christophe Lyon
@ 2022-02-21 18:18               ` Richard Sandiford
  0 siblings, 0 replies; 54+ messages in thread
From: Richard Sandiford @ 2022-02-21 18:18 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Christophe Lyon via Gcc-patches, Andre Simoes Dias Vieira

Christophe Lyon <christophe.lyon.oss@gmail.com> writes:
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 9c645722230..dd537ec1679 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -1553,11 +1553,25 @@ arm_init_simd_builtin_types (void)
>        tree eltype = arm_simd_types[i].eltype;
>        machine_mode mode = arm_simd_types[i].mode;
>  
> -      if (eltype == NULL)
> +      if (eltype == NULL
> +	  /* VECTOR_BOOL is not supported unless MVE is activated, this would
> +	     make build_truth_vector_type_for_mode crash.  */
> +	  && ((GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
> +	      ||!TARGET_HAVE_MVE))

For the record: this kind of thing wouldn't be OK in aarch64,
since there we should allow a target to be selected later.
But I agree that here it's valid, since TARGET_HAVE_MVE already
decides whether arm_neon.h or arm_mve.h builtins are registered.

Formatting nit though: missing space after “||”.

>  	continue;
>        if (arm_simd_types[i].itype == NULL)
>  	{
> -	  tree type = build_vector_type (eltype, GET_MODE_NUNITS (mode));
> +	  tree type;
> +	  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
> +	    {
> +	      /* Handle MVE predicates: they are internally stored as 16 bits,
> +		 but are used as vectors of 1, 2 or 4-bit elements.  */
> +	      type = build_truth_vector_type_for_mode (GET_MODE_NUNITS (mode), mode);

Formatting nit: line too long.

OK with those changes, thanks.

Richard

> +	      eltype = TREE_TYPE (type);
> +	    }
> +	  else
> +	    type = build_vector_type (eltype, GET_MODE_NUNITS (mode));
> +
>  	  type = build_distinct_type_copy (type);
>  	  SET_TYPE_STRUCTURAL_EQUALITY (type);
>  
> @@ -1695,6 +1709,11 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
>        if (qualifiers & qualifier_map_mode)
>  	op_mode = d->mode;
>  
> +      /* MVE Predicates use HImode as mandated by the ABI: pred16_t is unsigned
> +	 short.  */
> +      if (qualifiers & qualifier_predicate)
> +	op_mode = HImode;
> +
>        /* For pointers, we want a pointer to the basic type
>  	 of the vector.  */
>        if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
> @@ -2939,6 +2958,11 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
>  	    case ARG_BUILTIN_COPY_TO_REG:
>  	      if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
>  		op[argc] = convert_memory_address (Pmode, op[argc]);
> +
> +	      /* MVE uses mve_pred16_t (aka HImode) for vectors of predicates.  */
> +	      if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
> +		op[argc] = gen_lowpart (mode[argc], op[argc]);
> +
>  	      /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
>  	      if (!(*insn_data[icode].operand[opno].predicate)
>  		  (op[argc], mode[argc]))
> @@ -3144,6 +3168,13 @@ constant_arg:
>    else
>      emit_insn (insn);
>  
> +  if (GET_MODE_CLASS (tmode) == MODE_VECTOR_BOOL)
> +    {
> +      rtx HItarget = gen_reg_rtx (HImode);
> +      emit_move_insn (HItarget, gen_lowpart (HImode, target));
> +      return HItarget;
> +    }
> +
>    return target;
>  }
>  
> diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h
> index e5130d6d286..a8ef8aef82d 100644
> --- a/gcc/config/arm/arm-builtins.h
> +++ b/gcc/config/arm/arm-builtins.h
> @@ -84,7 +84,9 @@ enum arm_type_qualifiers
>    qualifier_lane_pair_index = 0x1000,
>    /* Lane indices selected in quadtuplets - must be within range of previous
>       argument = a vector.  */
> -  qualifier_lane_quadtup_index = 0x2000
> +  qualifier_lane_quadtup_index = 0x2000,
> +  /* MVE vector predicates.  */
> +  qualifier_predicate = 0x4000
>  };
>  
>  struct arm_simd_type_info
> diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
> index de689c8b45e..9ed0cd042c5 100644
> --- a/gcc/config/arm/arm-modes.def
> +++ b/gcc/config/arm/arm-modes.def
> @@ -84,6 +84,14 @@ VECTOR_MODE (FLOAT, BF, 2);   /*                 V2BF.  */
>  VECTOR_MODE (FLOAT, BF, 4);   /*		 V4BF.  */
>  VECTOR_MODE (FLOAT, BF, 8);   /*		 V8BF.  */
>  
> +/* Predicates for MVE.  */
> +BOOL_MODE (B2I, 2, 1);
> +BOOL_MODE (B4I, 4, 1);
> +
> +VECTOR_BOOL_MODE (V16BI, 16, BI, 2);
> +VECTOR_BOOL_MODE (V8BI, 8, B2I, 2);
> +VECTOR_BOOL_MODE (V4BI, 4, B4I, 2);
> +
>  /* Fraction and accumulator vector modes.  */
>  VECTOR_MODES (FRACT, 4);      /* V4QQ  V2HQ */
>  VECTOR_MODES (UFRACT, 4);     /* V4UQQ V2UHQ */
> diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
> index 6ba6f211531..d1d6416dad1 100644
> --- a/gcc/config/arm/arm-simd-builtin-types.def
> +++ b/gcc/config/arm/arm-simd-builtin-types.def
> @@ -51,3 +51,7 @@
>    ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
>    ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
>    ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
> +
> +  ENTRY (Pred1x16_t, V16BI, predicate, 16, pred1, 16)
> +  ENTRY (Pred2x8_t, V8BI, predicate, 8, pred1, 15)
> +  ENTRY (Pred4x4_t, V4BI, predicate, 4, pred1, 15)
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index feeee16d320..5bf7d37cfa6 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -6239,9 +6239,22 @@ init_emit_once (void)
>  
>    /* For BImode, 1 and -1 are unsigned and signed interpretations
>       of the same value.  */
> -  const_tiny_rtx[0][(int) BImode] = const0_rtx;
> -  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
> -  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
> +  for (mode = MIN_MODE_BOOL;
> +       mode <= MAX_MODE_BOOL;
> +       mode = (machine_mode)((int)(mode) + 1))
> +    {
> +      const_tiny_rtx[0][(int) mode] = const0_rtx;
> +      if (mode == BImode)
> +	{
> +	  const_tiny_rtx[1][(int) mode] = const_true_rtx;
> +	  const_tiny_rtx[3][(int) mode] = const_true_rtx;
> +	}
> +      else
> +	{
> +	  const_tiny_rtx[1][(int) mode] = const1_rtx;
> +	  const_tiny_rtx[3][(int) mode] = constm1_rtx;
> +	}
> +    }
>  
>    for (mode = MIN_MODE_PARTIAL_INT;
>         mode <= MAX_MODE_PARTIAL_INT;
> @@ -6260,13 +6273,16 @@ init_emit_once (void)
>        const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
>      }
>  
> -  /* As for BImode, "all 1" and "all -1" are unsigned and signed
> -     interpretations of the same value.  */
>    FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
>      {
>        const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
>        const_tiny_rtx[3][(int) mode] = gen_const_vector (mode, 3);
> -      const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
> +      if (GET_MODE_INNER (mode) == BImode)
> +	/* As for BImode, "all 1" and "all -1" are unsigned and signed
> +	   interpretations of the same value.  */
> +	const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
> +      else
> +	const_tiny_rtx[1][(int) mode] = gen_const_vector (mode, 1);
>      }
>  
>    FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
> diff --git a/gcc/genmodes.c b/gcc/genmodes.c
> index 6001b854547..5881abd846c 100644
> --- a/gcc/genmodes.c
> +++ b/gcc/genmodes.c
> @@ -78,6 +78,7 @@ struct mode_data
>    bool need_bytesize_adj;	/* true if this mode needs dynamic size
>  				   adjustment */
>    unsigned int int_n;		/* If nonzero, then __int<INT_N> will be defined */
> +  bool boolean;
>  };
>  
>  static struct mode_data *modes[MAX_MODE_CLASS];
> @@ -88,7 +89,8 @@ static const struct mode_data blank_mode = {
>    0, "<unknown>", MAX_MODE_CLASS,
>    0, -1U, -1U, -1U, -1U,
>    0, 0, 0, 0, 0, 0,
> -  "<unknown>", 0, 0, 0, 0, false, false, 0
> +  "<unknown>", 0, 0, 0, 0, false, false, 0,
> +  false
>  };
>  
>  static htab_t modes_by_name;
> @@ -456,7 +458,7 @@ make_complex_modes (enum mode_class cl,
>        size_t m_len;
>  
>        /* Skip BImode.  FIXME: BImode probably shouldn't be MODE_INT.  */
> -      if (m->precision == 1)
> +      if (m->boolean)
>  	continue;
>  
>        m_len = strlen (m->name);
> @@ -528,7 +530,7 @@ make_vector_modes (enum mode_class cl, const char *prefix, unsigned int width,
>  	 not be necessary.  */
>        if (cl == MODE_FLOAT && m->bytesize == 1)
>  	continue;
> -      if (cl == MODE_INT && m->precision == 1)
> +      if (m->boolean)
>  	continue;
>  
>        if ((size_t) snprintf (buf, sizeof buf, "%s%u%s", prefix,
> @@ -548,17 +550,18 @@ make_vector_modes (enum mode_class cl, const char *prefix, unsigned int width,
>  
>  /* Create a vector of booleans called NAME with COUNT elements and
>     BYTESIZE bytes in total.  */
> -#define VECTOR_BOOL_MODE(NAME, COUNT, BYTESIZE) \
> -  make_vector_bool_mode (#NAME, COUNT, BYTESIZE, __FILE__, __LINE__)
> +#define VECTOR_BOOL_MODE(NAME, COUNT, COMPONENT, BYTESIZE)		\
> +  make_vector_bool_mode (#NAME, COUNT, #COMPONENT, BYTESIZE,		\
> +			 __FILE__, __LINE__)
>  static void ATTRIBUTE_UNUSED
>  make_vector_bool_mode (const char *name, unsigned int count,
> -		       unsigned int bytesize, const char *file,
> -		       unsigned int line)
> +		       const char *component, unsigned int bytesize,
> +		       const char *file, unsigned int line)
>  {
> -  struct mode_data *m = find_mode ("BI");
> +  struct mode_data *m = find_mode (component);
>    if (!m)
>      {
> -      error ("%s:%d: no mode \"BI\"", file, line);
> +      error ("%s:%d: no mode \"%s\"", file, line, component);
>        return;
>      }
>  
> @@ -596,6 +599,20 @@ make_int_mode (const char *name,
>    m->precision = precision;
>  }
>  
> +#define BOOL_MODE(N, B, Y) \
> +  make_bool_mode (#N, B, Y, __FILE__, __LINE__)
> +
> +static void
> +make_bool_mode (const char *name,
> +		unsigned int precision, unsigned int bytesize,
> +		const char *file, unsigned int line)
> +{
> +  struct mode_data *m = new_mode (MODE_INT, name, file, line);
> +  m->bytesize = bytesize;
> +  m->precision = precision;
> +  m->boolean = true;
> +}
> +
>  #define OPAQUE_MODE(N, B)			\
>    make_opaque_mode (#N, -1U, B, __FILE__, __LINE__)
>  
> @@ -1298,9 +1315,21 @@ enum machine_mode\n{");
>        /* Don't use BImode for MIN_MODE_INT, since otherwise the middle
>  	 end will try to use it for bitfields in structures and the
>  	 like, which we do not want.  Only the target md file should
> -	 generate BImode widgets.  */
> -      if (first && first->precision == 1 && c == MODE_INT)
> -	first = first->next;
> +	 generate BImode widgets.  Since some targets such as ARM/MVE
> +	 define boolean modes with multiple bits, handle those too.  */
> +      if (first && first->boolean)
> +	{
> +	  struct mode_data *last_bool = first;
> +	  printf ("  MIN_MODE_BOOL = E_%smode,\n", first->name);
> +
> +	  while (first && first->boolean)
> +	    {
> +	      last_bool = first;
> +	      first = first->next;
> +	    }
> +
> +	  printf ("  MAX_MODE_BOOL = E_%smode,\n\n", last_bool->name);
> +	}
>  
>        if (first && last)
>  	printf ("  MIN_%s = E_%smode,\n  MAX_%s = E_%smode,\n\n",
> @@ -1679,15 +1708,15 @@ emit_class_narrowest_mode (void)
>    print_decl ("unsigned char", "class_narrowest_mode", "MAX_MODE_CLASS");
>  
>    for (c = 0; c < MAX_MODE_CLASS; c++)
> -    /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> -    tagged_printf ("MIN_%s", mode_class_names[c],
> -		   modes[c]
> -		   ? ((c != MODE_INT || modes[c]->precision != 1)
> -		      ? modes[c]->name
> -		      : (modes[c]->next
> -			 ? modes[c]->next->name
> -			 : void_mode->name))
> -		   : void_mode->name);
> +    {
> +      /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> +      struct mode_data *m = modes[c];
> +      while (m && m->boolean)
> +	m = m->next;
> +      const char *comment_name = (m ? m : void_mode)->name;
> +
> +      tagged_printf ("MIN_%s", mode_class_names[c], comment_name);
> +    }
>  
>    print_closer ();
>  }
> diff --git a/gcc/machmode.def b/gcc/machmode.def
> index 866a2082d01..533cf6ab4b2 100644
> --- a/gcc/machmode.def
> +++ b/gcc/machmode.def
> @@ -146,12 +146,13 @@ along with GCC; see the file COPYING3.  If not see
>  	Like VECTOR_MODES, but start the mode names with PREFIX instead
>  	of the usual "V".
>  
> -     VECTOR_BOOL_MODE (NAME, COUNT, BYTESIZE)
> +     VECTOR_BOOL_MODE (NAME, COUNT, COMPONENT, BYTESIZE)
>          Create a vector mode called NAME that contains COUNT boolean
>          elements and occupies BYTESIZE bytes in total.  Each boolean
> -        element occupies (COUNT * BITS_PER_UNIT) / BYTESIZE bits, with
> -        the element at index 0 occupying the lsb of the first byte in
> -        memory.  Only the lowest bit of each element is significant.
> +        element is of COMPONENT type and occupies (COUNT * BITS_PER_UNIT) /
> +        BYTESIZE bits, with the element at index 0 occupying the lsb of the
> +        first byte in memory.  Only the lowest bit of each element is
> +        significant.
>  
>       OPAQUE_MODE (NAME, BYTESIZE)
>          Create an opaque mode called NAME that is BYTESIZE bytes wide.
> @@ -196,7 +197,7 @@ RANDOM_MODE (VOID);
>  RANDOM_MODE (BLK);
>  
>  /* Single bit mode used for booleans.  */
> -FRACTIONAL_INT_MODE (BI, 1, 1);
> +BOOL_MODE (BI, 1, 1);
>  
>  /* Basic integer modes.  We go up to TI in generic code (128 bits).
>     TImode is needed here because the some front ends now genericly
> diff --git a/gcc/rtx-vector-builder.c b/gcc/rtx-vector-builder.c
> index e36aba010a0..55ffe0d5a76 100644
> --- a/gcc/rtx-vector-builder.c
> +++ b/gcc/rtx-vector-builder.c
> @@ -90,8 +90,10 @@ rtx_vector_builder::find_cached_value ()
>  
>    if (GET_MODE_CLASS (m_mode) == MODE_VECTOR_BOOL)
>      {
> -      if (elt == const1_rtx || elt == constm1_rtx)
> +      if (elt == const1_rtx)
>  	return CONST1_RTX (m_mode);
> +      else if (elt == constm1_rtx)
> +	return CONSTM1_RTX (m_mode);
>        else if (elt == const0_rtx)
>  	return CONST0_RTX (m_mode);
>        else
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index c36c825f958..532537ea48d 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -6876,12 +6876,13 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes,
>  	  /* This is the only case in which elements can be smaller than
>  	     a byte.  */
>  	  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> +	  auto mask = GET_MODE_MASK (GET_MODE_INNER (mode));
>  	  for (unsigned int i = 0; i < num_bytes; ++i)
>  	    {
>  	      target_unit value = 0;
>  	      for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits)
>  		{
> -		  value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & 1) << j;
> +		  value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j;
>  		  elt += 1;
>  		}
>  	      bytes.quick_push (value);
> @@ -7025,9 +7026,8 @@ native_decode_vector_rtx (machine_mode mode, const vec<target_unit> &bytes,
>  	  unsigned int bit_index = first_byte * BITS_PER_UNIT + i * elt_bits;
>  	  unsigned int byte_index = bit_index / BITS_PER_UNIT;
>  	  unsigned int lsb = bit_index % BITS_PER_UNIT;
> -	  builder.quick_push (bytes[byte_index] & (1 << lsb)
> -			      ? CONST1_RTX (BImode)
> -			      : CONST0_RTX (BImode));
> +	  unsigned int value = bytes[byte_index] >> lsb;
> +	  builder.quick_push (gen_int_mode (value, GET_MODE_INNER (mode)));
>  	}
>      }
>    else
> @@ -7994,17 +7994,23 @@ test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
>  						    duplicate, last_par));
>  
>        /* Test a scalar subreg of a VEC_MERGE of a VEC_DUPLICATE.  */
> -      rtx vector_reg = make_test_reg (mode);
> -      for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
> +      /* Skip this test for vectors of booleans, because offset is in bytes,
> +	 while vec_merge indices are in elements (usually bits).  */
> +      if (GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
>  	{
> -	  if (i >= HOST_BITS_PER_WIDE_INT)
> -	    break;
> -	  rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
> -	  rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
> -	  poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
> -	  ASSERT_RTX_EQ (scalar_reg,
> -			 simplify_gen_subreg (inner_mode, vm,
> -					      mode, offset));
> +	  rtx vector_reg = make_test_reg (mode);
> +	  for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
> +	    {
> +	      if (i >= HOST_BITS_PER_WIDE_INT)
> +		break;
> +	      rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
> +	      rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
> +	      poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
> +
> +	      ASSERT_RTX_EQ (scalar_reg,
> +			     simplify_gen_subreg (inner_mode, vm,
> +						  mode, offset));
> +	    }
>  	}
>      }
>  
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 76574be191f..5f59b6ace15 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -4085,6 +4085,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
>  	unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
>  	unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
>  	scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
> +	unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));
>  
>  	/* Build the constant up one integer at a time.  */
>  	unsigned int elts_per_int = int_bits / elt_bits;
> @@ -4093,8 +4094,10 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
>  	    unsigned HOST_WIDE_INT value = 0;
>  	    unsigned int limit = MIN (nelts - i, elts_per_int);
>  	    for (unsigned int j = 0; j < limit; ++j)
> -	      if (INTVAL (CONST_VECTOR_ELT (x, i + j)) != 0)
> -		value |= 1 << (j * elt_bits);
> +	    {
> +	      auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j));
> +	      value |= (elt & mask) << (j * elt_bits);
> +	    }
>  	    output_constant_pool_2 (int_mode, gen_int_mode (value, int_mode),
>  				    i != 0 ? MIN (align, int_bits) : align);
>  	  }


* Re: [arm] MVE: Relax addressing modes for full loads and stores
  2022-01-17  7:48     ` Christophe Lyon
@ 2022-03-07 14:16       ` Andre Vieira (lists)
  2022-03-07 16:14         ` Kyrylo Tkachov
  0 siblings, 1 reply; 54+ messages in thread
From: Andre Vieira (lists) @ 2022-03-07 14:16 UTC (permalink / raw)
  To: kyrylo Tkachov; +Cc: GCC Patches

On 17/01/2022 07:48, Christophe Lyon wrote:
> Hi André,
>
> On Fri, Jan 14, 2022 at 6:03 PM Andre Vieira (lists) via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
>
>     Hi Christophe,
>
>     This patch relaxes the addressing modes for the mve full load and
>     stores
>     (by full loads and stores I mean non-widening or narrowing loads and
>     stores resp). The code before was requiring a LO_REGNUM for these,
>     where
>     this is only a requirement if the load is widening or the store
>     narrowing.
>
>     So with this your patch should not be necessary.
>
>     Regression tested on arm-none-eabi-gcc.  Can you please confirm this
>     fixes the issue you were seeing too?
>
>
> Yes, I confirm this fixes the problem I was fixing with my patch #15 
> in my MVE/VCMP/VCOND series.
> I'll drop it.
>
> Thanks!
>
> Christophe
>
>
>     gcc/ChangeLog:
>
>              * config/arm/arm.h (MVE_STN_LDW_MODE): New MACRO.
>              * config/arm/arm.c (mve_vector_mem_operand): Relax
>     constraint on
>              base register for non widening loads or narrowing stores.
>
>
>     Kind Regards,
>     Andre Vieira
>

Ping, I noticed this also fixes PR 104790.

Kind regards,
Andre


* RE: [arm] MVE: Relax addressing modes for full loads and stores
  2022-03-07 14:16       ` Andre Vieira (lists)
@ 2022-03-07 16:14         ` Kyrylo Tkachov
  0 siblings, 0 replies; 54+ messages in thread
From: Kyrylo Tkachov @ 2022-03-07 16:14 UTC (permalink / raw)
  To: Andre Simoes Dias Vieira; +Cc: GCC Patches

Ok, please include PR 104790 in the ChangeLog.
Thanks,
Kyrill

From: Andre Vieira (lists) <andre.simoesdiasvieira@arm.com>
Sent: Monday, March 7, 2022 2:17 PM
To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [arm] MVE: Relax addressing modes for full loads and stores

On 17/01/2022 07:48, Christophe Lyon wrote:
Hi André,

On Fri, Jan 14, 2022 at 6:03 PM Andre Vieira (lists) via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
Hi Christophe,

This patch relaxes the addressing modes for the MVE full loads and stores
(by full loads and stores I mean non-widening loads and non-narrowing
stores, respectively). The code before was requiring a LO_REGNUM for these,
whereas this is only a requirement if the load is widening or the store is
narrowing.

So with this your patch should not be necessary.

Regression tested on arm-none-eabi-gcc.  Can you please confirm this
fixes the issue you were seeing too?

Yes, I confirm this fixes the problem I was fixing with my patch #15 in my MVE/VCMP/VCOND series.
I'll drop it.

Thanks!

Christophe


gcc/ChangeLog:

         * config/arm/arm.h (MVE_STN_LDW_MODE): New macro.
         * config/arm/arm.c (mve_vector_mem_operand): Relax constraint on
         base register for non-widening loads and non-narrowing stores.


Kind Regards,
Andre Vieira



Ping, I noticed this also fixes PR 104790.

Kind regards,
Andre

end of thread (last message: 2022-03-07 16:14 UTC)

Thread overview: 54+ messages
2022-01-13 14:56 [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
2022-01-13 14:56 ` [PATCH v3 01/15] arm: Add new tests for comparison vectorization with Neon and MVE Christophe Lyon
2022-01-13 14:56 ` [PATCH v3 02/15] arm: Add tests for PR target/100757 Christophe Lyon
2022-01-13 14:56 ` [PATCH v3 03/15] arm: Add tests for PR target/101325 Christophe Lyon
2022-01-13 14:56 ` [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass Christophe Lyon
2022-01-19 18:17   ` Andre Vieira (lists)
2022-01-20  9:14     ` Christophe Lyon
2022-01-20  9:43       ` Andre Vieira (lists)
2022-01-20 10:40         ` Richard Sandiford
2022-01-20 10:45           ` Andre Vieira (lists)
2022-01-27 16:21   ` Kyrylo Tkachov
2022-01-13 14:56 ` [PATCH v3 05/15] arm: Add support for VPR_REG in arm_class_likely_spilled_p Christophe Lyon
2022-01-19 18:25   ` Andre Vieira (lists)
2022-01-20  9:20     ` Christophe Lyon
2022-01-13 14:56 ` [PATCH v3 06/15] arm: Fix mve_vmvnq_n_<supf><mode> argument mode Christophe Lyon
2022-01-19 19:03   ` Andre Vieira (lists)
2022-01-20  9:23     ` Christophe Lyon
2022-01-20  9:38       ` Andre Simoes Dias Vieira
2022-01-20  9:44         ` Christophe Lyon
2022-01-20 10:45     ` Richard Sandiford
2022-01-20 11:06       ` Andre Vieira (lists)
2022-01-13 14:56 ` [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans Christophe Lyon
2022-01-21 11:20   ` Andre Vieira (lists)
2022-01-21 22:30     ` Christophe Lyon
2022-01-27 16:28   ` Kyrylo Tkachov
2022-01-27 18:10     ` Christophe Lyon
2022-01-31 18:01   ` Richard Sandiford
2022-01-31 22:57     ` Christophe Lyon
2022-02-01  3:42       ` Richard Sandiford
2022-02-02 16:51         ` Christophe Lyon
2022-02-04  9:42           ` Richard Sandiford
2022-02-04  9:54             ` Richard Sandiford
2022-02-17 15:39             ` Christophe Lyon
2022-02-21 18:18               ` Richard Sandiford
2022-01-13 14:56 ` [PATCH v3 08/15] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates Christophe Lyon
2022-01-27 16:37   ` Kyrylo Tkachov
2022-01-13 14:56 ` [PATCH v3 09/15] arm: Fix vcond_mask expander for MVE (PR target/100757) Christophe Lyon
2022-01-27 16:55   ` Kyrylo Tkachov
2022-01-13 14:56 ` [PATCH v3 10/15] arm: Convert remaining MVE vcmp builtins to predicate qualifiers Christophe Lyon
2022-01-13 14:56 ` [PATCH v3 11/15] arm: Convert more MVE " Christophe Lyon
2022-01-13 14:56 ` [PATCH v3 12/15] arm: Convert more load/store " Christophe Lyon
2022-01-27 16:56   ` Kyrylo Tkachov
2022-01-13 14:56 ` [PATCH v3 13/15] arm: Convert more MVE/CDE " Christophe Lyon
2022-01-27 16:56   ` Kyrylo Tkachov
2022-01-13 14:56 ` [PATCH v3 14/15] arm: Add VPR_REG to ALL_REGS Christophe Lyon
2022-01-13 14:56 ` [PATCH v3 15/15] arm: Fix constraint check for V8HI in mve_vector_mem_operand Christophe Lyon
2022-01-14 17:03   ` [arm] MVE: Relax addressing modes for full loads and stores Andre Vieira (lists)
2022-01-17  7:48     ` Christophe Lyon
2022-03-07 14:16       ` Andre Vieira (lists)
2022-03-07 16:14         ` Kyrylo Tkachov
2022-01-14 13:18 ` [PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates Christophe Lyon
2022-01-14 13:33   ` Richard Biener
2022-01-14 14:22     ` Kyrylo Tkachov
2022-01-26  8:40       ` Christophe Lyon