>> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?

Yes, I think so.

Let me explain RVV more clearly. Suppose the vector length of the RVV CPU is 64 bits:

VNx1BI is exactly 1 bit.
VNx2BI is exactly 2 consecutive bits.
VNx4BI is exactly 4 consecutive bits.
VNx8BI is exactly 8 consecutive bits.

For VNx1BI (vbool64_t), we load it with this asm:
vsetvl e8mf8
vlm.v

For VNx2BI (vbool32_t), we load it with this asm:
vsetvl e8mf4
vlm.v

For VNx4BI (vbool16_t), we load it with this asm:
vsetvl e8mf2
vlm.v

For VNx8BI (vbool8_t), we load it with this asm:
vsetvl e8m1
vlm.v

Consider this code sequence:
vbool16_t v4 = *(vbool16_t *)in;
vbool8_t v3 = *(vbool8_t*)in;

Since VNx4BI (vbool16_t) is smaller than VNx8BI (vbool8_t), we can't use the data loaded as VNx4BI (vbool16_t) for VNx8BI (vbool8_t).
But we can use the data loaded as VNx8BI (vbool8_t) for VNx4BI (vbool16_t).

In this example, GCC thinks the load for vbool8_t v3 can be replaced by vbool16_t v4, which is already loaded.
That is incorrect for RVV.

Maybe @kito can give us more information about the RVV ISA if I haven't explained it clearly.

juzhe.zhong@rivai.ai

From: Richard Biener
Date: 2023-02-13 16:07
To: juzhe.zhong
CC: Pan Li; gcc-patches; kito.cheng; richard.sandiford; ams
Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types

On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:

> Thanks for contributing this.
> Hi, Richard. Can you help us with this issue?
> In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode), vbool64_t (VNx1BImode)
> Since we are using a 1-bit mask, each BOOL occupies 1 bit.
> According to the RVV ISA, we adjust these modes as follows:
>
> VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
> VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
> VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
> VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)

So how's VNx1BImode laid out for N == 2?
Is that still a single byte and two consecutive bits? I suppose so.
But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?

I'm not sure what GET_MODE_PRECISION of the vector mode itself
should be here, but then I wonder ...

> If we use GET_MODE_BITSIZE or GET_MODE_NUNITS on these modes, the values are different.
> However, if we use GET_MODE_SIZE on these modes, the values are the same (poly (1,1)).
> This makes the modes tied together and gives wrong code gen, since their bitsizes are different.
> Consider this case:
> #include "riscv_vector.h"
> void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
> {
>   vint8m1_t v = *(vint8m1_t*)in;
>   *(vint8m1_t*)out = v;
>   vbool16_t v4 = *(vbool16_t *)in;
>   *(vbool16_t *)(out + 300) = v4;
>   vbool8_t v3 = *(vbool8_t*)in;
>   *(vbool8_t*)(out + 200) = v3;
> }
> The second vbool8_t load (vlm.v) is missing, since GCC gives "v3 = VIEW_CONVERT (vbool8_t) v4" in gimple.
> We failed to fix it in the RISC-V backend. Can you help us with this? Thanks.

... why for the loads the "padding" is not loaded? The above testcase
is probably more complicated than necessary as well?

Thanks,
Richard.

>
> juzhe.zhong@rivai.ai
>
> From: incarnation.p.lee
> Date: 2023-02-11 16:46
> To: gcc-patches
> CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
> Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> From: Pan Li
>
> Fix the bug for mode tieable of the rvv bool types. The vbool*_t
> cannot be tied as the actual load/store size is determined by
> the vl. The mode size of the rvv bool types is also adjusted for the
> underlying optimization pass. The rvv bool type is vbool*_t, aka
> vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
> vbool64_t.
> > PR 108185 > PR 108654 > > gcc/ChangeLog: > > * config/riscv/riscv-modes.def (ADJUST_BYTESIZE): > * config/riscv/riscv.cc (riscv_v_adjust_bytesize): > (riscv_modes_tieable_p): > * config/riscv/riscv.h (riscv_v_adjust_bytesize): > * machmode.h (VECTOR_BOOL_MODE_P): > * tree-ssa-sccvn.cc (visit_reference_op_load): > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/pr108185-1.c: New test. > * gcc.target/riscv/pr108185-2.c: New test. > * gcc.target/riscv/pr108185-3.c: New test. > * gcc.target/riscv/pr108185-4.c: New test. > * gcc.target/riscv/pr108185-5.c: New test. > * gcc.target/riscv/pr108185-6.c: New test. > * gcc.target/riscv/pr108185-7.c: New test. > * gcc.target/riscv/pr108185-8.c: New test. > > Signed-off-by: Pan Li > --- > gcc/config/riscv/riscv-modes.def | 14 ++-- > gcc/config/riscv/riscv.cc | 34 ++++++++- > gcc/config/riscv/riscv.h | 2 + > gcc/machmode.h | 3 + > gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++ > gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++ > gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++ > gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++ > gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++ > gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++ > gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++ > gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++ > gcc/tree-ssa-sccvn.cc | 13 +++- > 13 files changed, 608 insertions(+), 11 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c > create 
mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c > > diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def > index d5305efa8a6..cc21d3c83a2 100644 > --- a/gcc/config/riscv/riscv-modes.def > +++ b/gcc/config/riscv/riscv-modes.def > @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1); > ADJUST_ALIGNMENT (VNx32BI, 1); > ADJUST_ALIGNMENT (VNx64BI, 1); > -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); > -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); > -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); > -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); > -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); > -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); > -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8)); > +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1)); > +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1)); > +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1)); > +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1)); > +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2)); > +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4)); > +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8)); > /* > | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 | > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index 3b7804b7501..138c052e13c 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale) > return scale; > } > +/* Call from ADJUST_BYTESIZE in riscv-modes.def. Return the correct > + BYTES size for corresponding machine_mode. 
*/
> +
> +poly_int64
> +riscv_v_adjust_bytesize (machine_mode mode, int scale)
> +{
> +  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> +
> +  if (riscv_v_ext_vector_mode_p (mode))
> +    {
> +      poly_uint16 mode_size = GET_MODE_SIZE (mode);
> +
> +      if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
> +        return mode_size;
> +      else
> +        return BYTES_PER_RISCV_VECTOR;
> +    }
> +
> +  return scale;
> +}
> +
>  /* Return true if X is a valid address for machine mode MODE. If it is,
>     fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>     effect. */
> @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>  /* Implement TARGET_MODES_TIEABLE_P.
>     Don't allow floating-point modes to be tied, since type punning of
> -   single-precision and double-precision is implementation defined. */
> +   single-precision and double-precision is implementation defined.
> +
> +   Don't allow different vbool*_t modes to be tied, since the type
> +   size is determined by vl. */
>  static bool
>  riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>  {
> +  if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
> +    {
> +      if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
> +        return false;
> +
> +      return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
> +    }
> +
>    return (mode1 == mode2
>            || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
>                 && GET_MODE_CLASS (mode2) == MODE_FLOAT));
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index faffd5a77fe..f857223338c 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
>  extern unsigned riscv_bytes_per_vector_chunk;
>  extern poly_uint16 riscv_vector_chunks;
>  extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
> +
>  /* The number of bits and bytes in a RVV vector.
*/ > #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8)) > #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk)) > diff --git a/gcc/machmode.h b/gcc/machmode.h > index f1865c1ef42..6720472f2c9 100644 > --- a/gcc/machmode.h > +++ b/gcc/machmode.h > @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES]; > || CLASS == MODE_ACCUM \ > || CLASS == MODE_UACCUM) > +/* Nonzero if MODE is an vector bool mode. */ > +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL) > + > /* An optional T (i.e. a T or nothing), where T is some form of mode class. */ > template > class opt_mode > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c > new file mode 100644 > index 00000000000..c3d0b10271a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c > @@ -0,0 +1,68 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > + > +#include "riscv_vector.h" > + > +void > +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > + vbool1_t v1 = *(vbool1_t*)in; > + vbool2_t v2 = *(vbool2_t*)in; > + > + *(vbool1_t*)(out + 100) = v1; > + *(vbool2_t*)(out + 200) = v2; > +} > + > +void > +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > + vbool1_t v1 = *(vbool1_t*)in; > + vbool4_t v2 = *(vbool4_t*)in; > + > + *(vbool1_t*)(out + 100) = v1; > + *(vbool4_t*)(out + 200) = v2; > +} > + > +void > +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) { > + vbool1_t v1 = *(vbool1_t*)in; > + vbool8_t v2 = *(vbool8_t*)in; > + > + *(vbool1_t*)(out + 100) = v1; > + *(vbool8_t*)(out + 200) = v2; > +} > + > +void > +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > + vbool1_t v1 = *(vbool1_t*)in; > + vbool16_t v2 = *(vbool16_t*)in; > + > + *(vbool1_t*)(out + 100) = v1; > + *(vbool16_t*)(out + 200) = 
v2; > +} > + > +void > +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) { > + vbool1_t v1 = *(vbool1_t*)in; > + vbool32_t v2 = *(vbool32_t*)in; > + > + *(vbool1_t*)(out + 100) = v1; > + *(vbool32_t*)(out + 200) = v2; > +} > + > +void > +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) { > + vbool1_t v1 = *(vbool1_t*)in; > + vbool64_t v2 = *(vbool64_t*)in; > + > + *(vbool1_t*)(out + 100) = v1; > + *(vbool64_t*)(out + 200) = v2; > +} > + > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c > new file mode 100644 > index 00000000000..bd13ba916da > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c > @@ -0,0 +1,68 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > + > +#include "riscv_vector.h" > + > +void > +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > + vbool2_t v1 = *(vbool2_t*)in; > + vbool1_t v2 = *(vbool1_t*)in; > + > + *(vbool2_t*)(out + 100) = v1; > + *(vbool1_t*)(out + 200) = v2; > +} > + > 
+void > +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > + vbool2_t v1 = *(vbool2_t*)in; > + vbool4_t v2 = *(vbool4_t*)in; > + > + *(vbool2_t*)(out + 100) = v1; > + *(vbool4_t*)(out + 200) = v2; > +} > + > +void > +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) { > + vbool2_t v1 = *(vbool2_t*)in; > + vbool8_t v2 = *(vbool8_t*)in; > + > + *(vbool2_t*)(out + 100) = v1; > + *(vbool8_t*)(out + 200) = v2; > +} > + > +void > +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > + vbool2_t v1 = *(vbool2_t*)in; > + vbool16_t v2 = *(vbool16_t*)in; > + > + *(vbool2_t*)(out + 100) = v1; > + *(vbool16_t*)(out + 200) = v2; > +} > + > +void > +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) { > + vbool2_t v1 = *(vbool2_t*)in; > + vbool32_t v2 = *(vbool32_t*)in; > + > + *(vbool2_t*)(out + 100) = v1; > + *(vbool32_t*)(out + 200) = v2; > +} > + > +void > +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) { > + vbool2_t v1 = *(vbool2_t*)in; > + vbool64_t v2 = *(vbool64_t*)in; > + > + *(vbool2_t*)(out + 100) = v1; > + *(vbool64_t*)(out + 200) = v2; > +} > + > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > +/* { 
dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c > new file mode 100644 > index 00000000000..99928f7b1cc > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c > @@ -0,0 +1,68 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > + > +#include "riscv_vector.h" > + > +void > +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > + vbool4_t v1 = *(vbool4_t*)in; > + vbool1_t v2 = *(vbool1_t*)in; > + > + *(vbool4_t*)(out + 100) = v1; > + *(vbool1_t*)(out + 200) = v2; > +} > + > +void > +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > + vbool4_t v1 = *(vbool4_t*)in; > + vbool2_t v2 = *(vbool2_t*)in; > + > + *(vbool4_t*)(out + 100) = v1; > + *(vbool2_t*)(out + 200) = v2; > +} > + > +void > +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) { > + vbool4_t v1 = *(vbool4_t*)in; > + vbool8_t v2 = *(vbool8_t*)in; > + > + *(vbool4_t*)(out + 100) = v1; > + *(vbool8_t*)(out + 200) = v2; > +} > + > +void > +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > + vbool4_t v1 = *(vbool4_t*)in; > + vbool16_t v2 = *(vbool16_t*)in; > + > + *(vbool4_t*)(out + 100) = v1; > + *(vbool16_t*)(out + 200) = v2; > +} > + > +void > +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) { > + vbool4_t v1 = *(vbool4_t*)in; > + vbool32_t v2 = *(vbool32_t*)in; > + > + *(vbool4_t*)(out + 100) = v1; > + *(vbool32_t*)(out + 200) = v2; > +} > + > +void > +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) { > + vbool4_t v1 = *(vbool4_t*)in; > + vbool64_t v2 = *(vbool64_t*)in; > + > + *(vbool4_t*)(out + 100) = v1; > + *(vbool64_t*)(out + 200) = v2; > +} > + > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */ > +/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c > new file mode 100644 > index 00000000000..e70284fada8 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c > @@ -0,0 +1,68 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > + > +#include "riscv_vector.h" > + > +void > +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > + vbool8_t v1 = *(vbool8_t*)in; > + vbool1_t v2 = *(vbool1_t*)in; > + > + *(vbool8_t*)(out + 100) = v1; > + *(vbool1_t*)(out + 200) = v2; > +} > + > +void > +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > + vbool8_t v1 = *(vbool8_t*)in; > + vbool2_t v2 = *(vbool2_t*)in; > + > + *(vbool8_t*)(out + 100) = v1; > + *(vbool2_t*)(out + 200) = v2; > +} > + > +void > +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > + vbool8_t v1 = *(vbool8_t*)in; > + vbool4_t v2 = *(vbool4_t*)in; > + > + *(vbool8_t*)(out + 100) = v1; > + *(vbool4_t*)(out + 200) = v2; > +} > + > +void > +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > + vbool8_t v1 = *(vbool8_t*)in; > + vbool16_t v2 = *(vbool16_t*)in; > + > + 
*(vbool8_t*)(out + 100) = v1; > + *(vbool16_t*)(out + 200) = v2; > +} > + > +void > +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) { > + vbool8_t v1 = *(vbool8_t*)in; > + vbool32_t v2 = *(vbool32_t*)in; > + > + *(vbool8_t*)(out + 100) = v1; > + *(vbool32_t*)(out + 200) = v2; > +} > + > +void > +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) { > + vbool8_t v1 = *(vbool8_t*)in; > + vbool64_t v2 = *(vbool64_t*)in; > + > + *(vbool8_t*)(out + 100) = v1; > + *(vbool64_t*)(out + 200) = v2; > +} > + > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c > new file mode 100644 > index 00000000000..575a7842cdf > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c > @@ -0,0 +1,68 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > + > +#include "riscv_vector.h" > + > +void > +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > + vbool16_t v1 = *(vbool16_t*)in; > + vbool1_t v2 = *(vbool1_t*)in; > + > + 
*(vbool16_t*)(out + 100) = v1; > + *(vbool1_t*)(out + 200) = v2; > +} > + > +void > +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > + vbool16_t v1 = *(vbool16_t*)in; > + vbool2_t v2 = *(vbool2_t*)in; > + > + *(vbool16_t*)(out + 100) = v1; > + *(vbool2_t*)(out + 200) = v2; > +} > + > +void > +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > + vbool16_t v1 = *(vbool16_t*)in; > + vbool4_t v2 = *(vbool4_t*)in; > + > + *(vbool16_t*)(out + 100) = v1; > + *(vbool4_t*)(out + 200) = v2; > +} > + > +void > +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) { > + vbool16_t v1 = *(vbool16_t*)in; > + vbool8_t v2 = *(vbool8_t*)in; > + > + *(vbool16_t*)(out + 100) = v1; > + *(vbool8_t*)(out + 200) = v2; > +} > + > +void > +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) { > + vbool16_t v1 = *(vbool16_t*)in; > + vbool32_t v2 = *(vbool32_t*)in; > + > + *(vbool16_t*)(out + 100) = v1; > + *(vbool32_t*)(out + 200) = v2; > +} > + > +void > +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) { > + vbool16_t v1 = *(vbool16_t*)in; > + vbool64_t v2 = *(vbool64_t*)in; > + > + *(vbool16_t*)(out + 100) = v1; > + *(vbool64_t*)(out + 200) = v2; > +} > + > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > 
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c > new file mode 100644 > index 00000000000..95a11d37016 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c > @@ -0,0 +1,68 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > + > +#include "riscv_vector.h" > + > +void > +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > + vbool32_t v1 = *(vbool32_t*)in; > + vbool1_t v2 = *(vbool1_t*)in; > + > + *(vbool32_t*)(out + 100) = v1; > + *(vbool1_t*)(out + 200) = v2; > +} > + > +void > +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > + vbool32_t v1 = *(vbool32_t*)in; > + vbool2_t v2 = *(vbool2_t*)in; > + > + *(vbool32_t*)(out + 100) = v1; > + *(vbool2_t*)(out + 200) = v2; > +} > + > +void > +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > + vbool32_t v1 = *(vbool32_t*)in; > + vbool4_t v2 = *(vbool4_t*)in; > + > + *(vbool32_t*)(out + 100) = v1; > + *(vbool4_t*)(out + 200) = v2; > +} > + > +void > +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) { > + vbool32_t v1 = *(vbool32_t*)in; > + vbool8_t v2 = *(vbool8_t*)in; > + > + *(vbool32_t*)(out + 100) = v1; > + *(vbool8_t*)(out + 200) = v2; > +} > + > +void > +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > + vbool32_t v1 = *(vbool32_t*)in; > + vbool16_t v2 = *(vbool16_t*)in; > + > + *(vbool32_t*)(out + 100) = v1; > + *(vbool16_t*)(out + 200) = v2; > +} > + > +void > +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) { > + vbool32_t v1 = *(vbool32_t*)in; > + vbool64_t v2 = *(vbool64_t*)in; > + > + *(vbool32_t*)(out + 100) = v1; > + *(vbool64_t*)(out + 200) = v2; > +} > + > +/* { dg-final { 
scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c > new file mode 100644 > index 00000000000..8f6f0b11f09 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c > @@ -0,0 +1,68 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > + > +#include "riscv_vector.h" > + > +void > +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > + vbool64_t v1 = *(vbool64_t*)in; > + vbool1_t v2 = *(vbool1_t*)in; > + > + *(vbool64_t*)(out + 100) = v1; > + *(vbool1_t*)(out + 200) = v2; > +} > + > +void > +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > + vbool64_t v1 = *(vbool64_t*)in; > + vbool2_t v2 = *(vbool2_t*)in; > + > + *(vbool64_t*)(out + 100) = v1; > + *(vbool2_t*)(out + 200) = v2; > +} > + > +void > +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > + vbool64_t v1 = *(vbool64_t*)in; > + vbool4_t v2 = *(vbool4_t*)in; > + > + *(vbool64_t*)(out + 100) = v1; > + *(vbool4_t*)(out + 200) = v2; > +} > + > +void > 
+test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) { > + vbool64_t v1 = *(vbool64_t*)in; > + vbool8_t v2 = *(vbool8_t*)in; > + > + *(vbool64_t*)(out + 100) = v1; > + *(vbool8_t*)(out + 200) = v2; > +} > + > +void > +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > + vbool64_t v1 = *(vbool64_t*)in; > + vbool16_t v2 = *(vbool16_t*)in; > + > + *(vbool64_t*)(out + 100) = v1; > + *(vbool16_t*)(out + 200) = v2; > +} > + > +void > +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) { > + vbool64_t v1 = *(vbool64_t*)in; > + vbool32_t v2 = *(vbool32_t*)in; > + > + *(vbool64_t*)(out + 100) = v1; > + *(vbool32_t*)(out + 200) = v2; > +} > + > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c > new file mode 100644 > index 00000000000..d96959dd064 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c > @@ -0,0 +1,77 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > + > +#include "riscv_vector.h" > + > +void > 
+test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > + vbool1_t v1 = *(vbool1_t*)in; > + vbool1_t v2 = *(vbool1_t*)in; > + > + *(vbool1_t*)(out + 100) = v1; > + *(vbool1_t*)(out + 200) = v2; > +} > + > +void > +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > + vbool2_t v1 = *(vbool2_t*)in; > + vbool2_t v2 = *(vbool2_t*)in; > + > + *(vbool2_t*)(out + 100) = v1; > + *(vbool2_t*)(out + 200) = v2; > +} > + > +void > +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > + vbool4_t v1 = *(vbool4_t*)in; > + vbool4_t v2 = *(vbool4_t*)in; > + > + *(vbool4_t*)(out + 100) = v1; > + *(vbool4_t*)(out + 200) = v2; > +} > + > +void > +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) { > + vbool8_t v1 = *(vbool8_t*)in; > + vbool8_t v2 = *(vbool8_t*)in; > + > + *(vbool8_t*)(out + 100) = v1; > + *(vbool8_t*)(out + 200) = v2; > +} > + > +void > +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > + vbool16_t v1 = *(vbool16_t*)in; > + vbool16_t v2 = *(vbool16_t*)in; > + > + *(vbool16_t*)(out + 100) = v1; > + *(vbool16_t*)(out + 200) = v2; > +} > + > +void > +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) { > + vbool32_t v1 = *(vbool32_t*)in; > + vbool32_t v2 = *(vbool32_t*)in; > + > + *(vbool32_t*)(out + 100) = v1; > + *(vbool32_t*)(out + 200) = v2; > +} > + > +void > +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) { > + vbool64_t v1 = *(vbool64_t*)in; > + vbool64_t v2 = *(vbool64_t*)in; > + > + *(vbool64_t*)(out + 100) = v1; > + *(vbool64_t*)(out + 200) = v2; > +} > + > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */ > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */ > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc > index 028bedbc9a0..19fdba8cfa2 100644 > --- a/gcc/tree-ssa-sccvn.cc > +++ b/gcc/tree-ssa-sccvn.cc > @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see > #include "gimple-fold.h" > #include "tree-eh.h" > #include "gimplify.h" > +#include "target.h" > #include "flags.h" > #include "dojump.h" > #include "explow.h" > @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt) > if (result > && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op))) > { > + machine_mode result_mode = TYPE_MODE (TREE_TYPE (result)); > + machine_mode op_mode = TYPE_MODE (TREE_TYPE (op)); > + poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode); > + poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode); > + > /* Avoid the type punning in case the result mode has padding where > - the op we lookup has not. */ > - if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))), > - GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op))))) > + the op we lookup has not. > + Avoid the type punning in case the target mode cannot be tied. 
*/
> +          if (maybe_lt (result_mode_precision, op_mode_precision)
> +              || !targetm.modes_tieable_p (result_mode, op_mode))
>              result = NULL_TREE;
>            else
>              {
>

--
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)