* [PATCH] IFN: Fix vector extraction into promoted subreg.
From: juzhe.zhong @ 2023-08-16 1:31 UTC
To: gcc-patches; +Cc: richard.sandiford, rguenther

Hi, Robin, Richard and Richi.

I am wondering whether we can just simply replace the VEC_EXTRACT expander with binary?

Like this :?

 DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
-                       vec_extract, vec_extract)
+                       vec_extract, binary)

to fix the sign extend issue.

And remove the vec_extract explicit expander in internal-fn.cc ?

Thanks.

juzhe.zhong@rivai.ai
* Re: [PATCH] IFN: Fix vector extraction into promoted subreg.
From: Richard Sandiford @ 2023-08-16 6:45 UTC
To: juzhe.zhong; +Cc: gcc-patches, rguenther

"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Hi, Robin, Richard and Richi.
>
> I am wondering whether we can just simply replace the VEC_EXTRACT expander with binary?
>
> Like this :?
>
> DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
> -                       vec_extract, vec_extract)
> +                       vec_extract, binary)
>
> to fix the sign extend issue.
>
> And remove the vec_extract explicit expander in internal-fn.cc ?

I'm not sure how that would work.  The vec_extract optab takes two modes
whereas binary optabs take one mode.

However:

| #define vec_extract_direct { 3, 3, false }

This looks wrong.  The numbers are argument numbers (or -1 for a return
value).  vec_extract only takes 2 arguments, so 3 looks to be out-of-range.

| #define direct_vec_extract_optab_supported_p direct_optab_supported_p

I would expect this to be convert_optab_supported_p.

On the promoted subreg thing, I think expand_vec_extract_optab_fn
should use expand_fn_using_insn.

Thanks,
Richard
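The selector convention described above (an argument number, or -1 for the return value) can be sketched as a small standalone model. This is not GCC code: `fn_info_model`, `select_type` and `vec_extract_selectors_demo` are hypothetical names, and the mode strings "V16QI", "SI" and "QI" are only illustrative labels for a two-argument vec_extract call (vector, index) returning one element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an entry such as vec_extract_direct:
   two type selectors plus a vectorizable flag.  */
struct fn_info_model
{
  int type0;
  int type1;
  int vectorizable;
};

/* Resolve one selector.  -1 names the return type, 0 .. NARGS-1 names
   the type of that argument, anything else is out of range (NULL).  */
static const char *
select_type (int selector, const char *ret_type,
             const char *const *arg_types, int nargs)
{
  if (selector == -1)
    return ret_type;
  if (selector >= 0 && selector < nargs)
    return arg_types[selector];
  return NULL;
}

/* Compare the old { 3, 3, false } entry with the new { 0, -1, false }
   entry for a call that has exactly two arguments.  */
void
vec_extract_selectors_demo (void)
{
  const char *const args[] = { "V16QI", "SI" };
  struct fn_info_model old_info = { 3, 3, 0 };
  struct fn_info_model new_info = { 0, -1, 0 };

  /* Selector 3 does not name any of the two arguments.  */
  assert (select_type (old_info.type0, "QI", args, 2) == NULL);
  assert (select_type (old_info.type1, "QI", args, 2) == NULL);

  /* The new entry picks the vector argument and the return type,
     i.e. the two modes a convert optab lookup needs.  */
  assert (strcmp (select_type (new_info.type0, "QI", args, 2), "V16QI") == 0);
  assert (strcmp (select_type (new_info.type1, "QI", args, 2), "QI") == 0);
}
```

Under these assumptions, { 0, -1, false } yields the (vector mode, element mode) pair, which is presumably why convert_optab_supported_p is the matching predicate.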
* Re: [PATCH] IFN: Fix vector extraction into promoted subreg.
From: Robin Dapp @ 2023-08-16 9:37 UTC
To: juzhe.zhong, gcc-patches, rguenther, richard.sandiford; +Cc: rdapp.gcc

> However:
>
> | #define vec_extract_direct { 3, 3, false }
>
> This looks wrong.  The numbers are argument numbers (or -1 for a return
> value).  vec_extract only takes 2 arguments, so 3 looks to be out-of-range.
>
> | #define direct_vec_extract_optab_supported_p direct_optab_supported_p
>
> I would expect this to be convert_optab_supported_p.
>
> On the promoted subreg thing, I think expand_vec_extract_optab_fn
> should use expand_fn_using_insn.

Thanks, really easier that way.  Attached a new version that's currently
bootstrapping.  Does that look better?

Regards
 Robin

Subject: [PATCH v2] internal-fn: Fix vector extraction into promoted subreg.

This patch fixes the case where vec_extract gets passed a promoted
subreg (e.g. from a return value).  This is achieved by using
expand_convert_optab_fn instead of a separate expander function.

gcc/ChangeLog:

        * internal-fn.cc (vec_extract_direct): Change type argument
        numbers.
        (expand_vec_extract_optab_fn): Call convert_optab_fn.
        (direct_vec_extract_optab_supported_p): Use
        convert_optab_supported_p.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c: New test.
        * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c: New test.
        * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c: New test.
        * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c: New test.
        * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c: New test.
---
 gcc/internal-fn.cc                            |  44 +-----
 .../rvv/autovec/vls-vlmax/vec_extract-1u.c    |  63 ++++++++
 .../rvv/autovec/vls-vlmax/vec_extract-2u.c    |  69 +++++++++
 .../rvv/autovec/vls-vlmax/vec_extract-3u.c    |  69 +++++++++
 .../rvv/autovec/vls-vlmax/vec_extract-4u.c    |  70 +++++++++
 .../rvv/autovec/vls-vlmax/vec_extract-runu.c  | 137 ++++++++++++++++++
 6 files changed, 413 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 4f2b20a79e5..5cce36a789b 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -175,7 +175,7 @@ init_internal_fns ()
 #define len_store_direct { 3, 3, false }
 #define mask_len_store_direct { 4, 5, false }
 #define vec_set_direct { 3, 3, false }
-#define vec_extract_direct { 3, 3, false }
+#define vec_extract_direct { 0, -1, false }
 #define unary_direct { 0, 0, true }
 #define unary_convert_direct { -1, 0, true }
 #define binary_direct { 0, 0, true }
@@ -3127,43 +3127,6 @@ expand_vec_set_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
   gcc_unreachable ();
 }
 
-/* Expand VEC_EXTRACT optab internal function.  */
-
-static void
-expand_vec_extract_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
-{
-  tree lhs = gimple_call_lhs (stmt);
-  tree op0 = gimple_call_arg (stmt, 0);
-  tree op1 = gimple_call_arg (stmt, 1);
-
-  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
-
-  machine_mode outermode = TYPE_MODE (TREE_TYPE (op0));
-  machine_mode extract_mode = TYPE_MODE (TREE_TYPE (lhs));
-
-  rtx src = expand_normal (op0);
-  rtx pos = expand_normal (op1);
-
-  class expand_operand ops[3];
-  enum insn_code icode = convert_optab_handler (optab, outermode,
-                                                extract_mode);
-
-  if (icode != CODE_FOR_nothing)
-    {
-      create_output_operand (&ops[0], target, extract_mode);
-      create_input_operand (&ops[1], src, outermode);
-      create_convert_operand_from (&ops[2], pos,
-                                   TYPE_MODE (TREE_TYPE (op1)), true);
-      if (maybe_expand_insn (icode, 3, ops))
-        {
-          if (!rtx_equal_p (target, ops[0].value))
-            emit_move_insn (target, ops[0].value);
-          return;
-        }
-    }
-  gcc_unreachable ();
-}
-
 static void
 expand_ABNORMAL_DISPATCHER (internal_fn, gcall *)
 {
@@ -3917,6 +3880,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab,
 #define expand_unary_convert_optab_fn(FN, STMT, OPTAB) \
   expand_convert_optab_fn (FN, STMT, OPTAB, 1)
 
+#define expand_vec_extract_optab_fn(FN, STMT, OPTAB) \
+  expand_convert_optab_fn (FN, STMT, OPTAB, 2)
+
 /* RETURN_TYPE and ARGS are a return type and argument list that are
    in principle compatible with FN (which satisfies direct_internal_fn_p).
    Return the types that should be used to determine whether the
@@ -4019,7 +3985,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_len_fold_left_optab_supported_p direct_optab_supported_p
 #define direct_check_ptrs_optab_supported_p direct_optab_supported_p
 #define direct_vec_set_optab_supported_p direct_optab_supported_p
-#define direct_vec_extract_optab_supported_p direct_optab_supported_p
+#define direct_vec_extract_optab_supported_p convert_optab_supported_p
 
 /* Return the optab used by internal function FN.  */
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c
new file mode 100644
index 00000000000..a35988ff55d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d -Wno-pedantic -Wno-psabi" } */
+
+#include <stdint-gcc.h>
+
+typedef uint64_t vnx2di __attribute__((vector_size (16)));
+typedef uint32_t vnx4si __attribute__((vector_size (16)));
+typedef uint16_t vnx8hi __attribute__((vector_size (16)));
+typedef uint8_t vnx16qi __attribute__((vector_size (16)));
+
+#define VEC_EXTRACT(S,V,IDX) \
+  S \
+  __attribute__((noipa)) \
+  vec_extract_##V##_##IDX (V v) \
+  { \
+    return v[IDX]; \
+  }
+
+#define VEC_EXTRACT_VAR1(S,V) \
+  S \
+  __attribute__((noipa)) \
+  vec_extract_var_##V (V v, int8_t idx) \
+  { \
+    return v[idx]; \
+  }
+
+#define TEST_ALL1(T) \
+  T (uint64_t, vnx2di, 0) \
+  T (uint64_t, vnx2di, 1) \
+  T (uint32_t, vnx4si, 0) \
+  T (uint32_t, vnx4si, 1) \
+  T (uint32_t, vnx4si, 3) \
+  T (uint16_t, vnx8hi, 0) \
+  T (uint16_t, vnx8hi, 2) \
+  T (uint16_t, vnx8hi, 6) \
+  T (uint8_t, vnx16qi, 0) \
+  T (uint8_t, vnx16qi, 1) \
+  T (uint8_t, vnx16qi, 7) \
+  T (uint8_t, vnx16qi, 11) \
+  T (uint8_t, vnx16qi, 15) \
+
+#define TEST_ALL_VAR1(T) \
+  T (uint64_t, vnx2di) \
+  T (uint32_t,
vnx4si) \ + T (uint16_t, vnx8hi) \ + T (uint8_t, vnx16qi) \ + +TEST_ALL1 (VEC_EXTRACT) +TEST_ALL_VAR1 (VEC_EXTRACT_VAR1) + +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */ +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m1,\s*ta,\s*ma} 4 } } */ +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m1,\s*ta,\s*ma} 4 } } */ +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m1,\s*ta,\s*ma} 3 } } */ + +/* { dg-final { scan-assembler-times {\tvslidedown.vi} 9 } } */ +/* { dg-final { scan-assembler-times {\tvslidedown.vx} 4 } } */ + +/* { dg-final { scan-assembler-times {\tvmv.x.s} 17 } } */ + +/* { dg-final { scan-assembler-times {\tandi\ta0,a0,0xff} 6 } } */ +/* { dg-final { scan-assembler-times {\tslli\ta0,a0,48} 4 } } */ +/* { dg-final { scan-assembler-times {\tsrli\ta0,a0,48} 4 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c new file mode 100644 index 00000000000..8c3c16a7047 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c @@ -0,0 +1,69 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d -Wno-pedantic -Wno-psabi" } */ + +#include <stdint-gcc.h> + +typedef uint64_t vnx4di __attribute__((vector_size (32))); +typedef uint32_t vnx8si __attribute__((vector_size (32))); +typedef uint16_t vnx16hi __attribute__((vector_size (32))); +typedef uint8_t vnx32qi __attribute__((vector_size (32))); + +#define VEC_EXTRACT(S,V,IDX) \ + S \ + __attribute__((noipa)) \ + vec_extract_##V##_##IDX (V v) \ + { \ + return v[IDX]; \ + } + +#define VEC_EXTRACT_VAR2(S,V) \ + S \ + __attribute__((noipa)) \ + vec_extract_var_##V (V v, int16_t idx) \ + { \ + return v[idx]; \ + } + +#define TEST_ALL2(T) \ + T (uint64_t, vnx4di, 0) \ + T (uint64_t, vnx4di, 1) \ + T (uint64_t, vnx4di, 2) \ + T 
(uint64_t, vnx4di, 3) \ + T (uint32_t, vnx8si, 0) \ + T (uint32_t, vnx8si, 1) \ + T (uint32_t, vnx8si, 3) \ + T (uint32_t, vnx8si, 4) \ + T (uint32_t, vnx8si, 7) \ + T (uint16_t, vnx16hi, 0) \ + T (uint16_t, vnx16hi, 1) \ + T (uint16_t, vnx16hi, 7) \ + T (uint16_t, vnx16hi, 8) \ + T (uint16_t, vnx16hi, 15) \ + T (uint8_t, vnx32qi, 0) \ + T (uint8_t, vnx32qi, 1) \ + T (uint8_t, vnx32qi, 15) \ + T (uint8_t, vnx32qi, 16) \ + T (uint8_t, vnx32qi, 31) \ + +#define TEST_ALL_VAR2(T) \ + T (uint64_t, vnx4di) \ + T (uint32_t, vnx8si) \ + T (uint16_t, vnx16hi) \ + T (uint8_t, vnx32qi) \ + +TEST_ALL2 (VEC_EXTRACT) +TEST_ALL_VAR2 (VEC_EXTRACT_VAR2) + +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */ +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m2,\s*ta,\s*ma} 6 } } */ +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m2,\s*ta,\s*ma} 6 } } */ +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m2,\s*ta,\s*ma} 5 } } */ + +/* { dg-final { scan-assembler-times {\tvslidedown.vi} 15 } } */ +/* { dg-final { scan-assembler-times {\tvslidedown.vx} 4 } } */ + +/* { dg-final { scan-assembler-times {\tvmv.x.s} 23 } } */ + +/* { dg-final { scan-assembler-times {\tandi\ta0,a0,0xff} 6 } } */ +/* { dg-final { scan-assembler-times {\tslli\ta0,a0,48} 6 } } */ +/* { dg-final { scan-assembler-times {\tsrli\ta0,a0,48} 6 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c new file mode 100644 index 00000000000..ab49f29c3f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c @@ -0,0 +1,69 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d -Wno-pedantic -Wno-psabi" } */ + +#include <stdint-gcc.h> + +typedef uint64_t vnx8di __attribute__((vector_size (64))); +typedef uint32_t vnx16si 
__attribute__((vector_size (64))); +typedef uint16_t vnx32hi __attribute__((vector_size (64))); +typedef uint8_t vnx64qi __attribute__((vector_size (64))); + +#define VEC_EXTRACT(S,V,IDX) \ + S \ + __attribute__((noipa)) \ + vec_extract_##V##_##IDX (V v) \ + { \ + return v[IDX]; \ + } + +#define VEC_EXTRACT_VAR3(S,V) \ + S \ + __attribute__((noipa)) \ + vec_extract_var_##V (V v, int32_t idx) \ + { \ + return v[idx]; \ + } + +#define TEST_ALL3(T) \ + T (uint64_t, vnx8di, 0) \ + T (uint64_t, vnx8di, 2) \ + T (uint64_t, vnx8di, 4) \ + T (uint64_t, vnx8di, 6) \ + T (uint32_t, vnx16si, 0) \ + T (uint32_t, vnx16si, 2) \ + T (uint32_t, vnx16si, 6) \ + T (uint32_t, vnx16si, 8) \ + T (uint32_t, vnx16si, 14) \ + T (uint16_t, vnx32hi, 0) \ + T (uint16_t, vnx32hi, 2) \ + T (uint16_t, vnx32hi, 14) \ + T (uint16_t, vnx32hi, 16) \ + T (uint16_t, vnx32hi, 30) \ + T (uint8_t, vnx64qi, 0) \ + T (uint8_t, vnx64qi, 2) \ + T (uint8_t, vnx64qi, 30) \ + T (uint8_t, vnx64qi, 32) \ + T (uint8_t, vnx64qi, 63) \ + +#define TEST_ALL_VAR3(T) \ + T (uint64_t, vnx8di) \ + T (uint32_t, vnx16si) \ + T (uint16_t, vnx32hi) \ + T (uint8_t, vnx64qi) \ + +TEST_ALL3 (VEC_EXTRACT) +TEST_ALL_VAR3 (VEC_EXTRACT_VAR3) + +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */ +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m4,\s*ta,\s*ma} 6 } } */ +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m4,\s*ta,\s*ma} 6 } } */ +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m4,\s*ta,\s*ma} 5 } } */ + +/* { dg-final { scan-assembler-times {\tvslidedown.vi} 13 } } */ +/* { dg-final { scan-assembler-times {\tvslidedown.vx} 6 } } */ + +/* { dg-final { scan-assembler-times {\tvmv.x.s} 23 } } */ + +/* { dg-final { scan-assembler-times {\tandi\ta0,a0,0xff} 6 } } */ +/* { dg-final { scan-assembler-times {\tslli\ta0,a0,48} 6 } } */ +/* { dg-final { scan-assembler-times {\tsrli\ta0,a0,48} 6 } } */ diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c new file mode 100644 index 00000000000..328d426e572 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c @@ -0,0 +1,70 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d -Wno-pedantic -Wno-psabi" } */ + +#include <stdint-gcc.h> + +typedef uint64_t vnx16di __attribute__((vector_size (128))); +typedef uint32_t vnx32si __attribute__((vector_size (128))); +typedef uint16_t vnx64hi __attribute__((vector_size (128))); +typedef uint8_t vnx128qi __attribute__((vector_size (128))); + +#define VEC_EXTRACT(S,V,IDX) \ + S \ + __attribute__((noipa)) \ + vec_extract_##V##_##IDX (V v) \ + { \ + return v[IDX]; \ + } + +#define VEC_EXTRACT_VAR4(S,V) \ + S \ + __attribute__((noipa)) \ + vec_extract_var_##V (V v, int64_t idx) \ + { \ + return v[idx]; \ + } + +#define TEST_ALL4(T) \ + T (uint64_t, vnx16di, 0) \ + T (uint64_t, vnx16di, 4) \ + T (uint64_t, vnx16di, 8) \ + T (uint64_t, vnx16di, 12) \ + T (uint32_t, vnx32si, 0) \ + T (uint32_t, vnx32si, 4) \ + T (uint32_t, vnx32si, 12) \ + T (uint32_t, vnx32si, 16) \ + T (uint32_t, vnx32si, 28) \ + T (uint16_t, vnx64hi, 0) \ + T (uint16_t, vnx64hi, 4) \ + T (uint16_t, vnx64hi, 28) \ + T (uint16_t, vnx64hi, 32) \ + T (uint16_t, vnx64hi, 60) \ + T (uint8_t, vnx128qi, 0) \ + T (uint8_t, vnx128qi, 4) \ + T (uint8_t, vnx128qi, 30) \ + T (uint8_t, vnx128qi, 60) \ + T (uint8_t, vnx128qi, 64) \ + T (uint8_t, vnx128qi, 127) \ + +#define TEST_ALL_VAR4(T) \ + T (uint64_t, vnx16di) \ + T (uint32_t, vnx32si) \ + T (uint16_t, vnx64hi) \ + T (uint8_t, vnx128qi) \ + +TEST_ALL4 (VEC_EXTRACT) +TEST_ALL_VAR4 (VEC_EXTRACT_VAR4) + +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m8,\s*ta,\s*ma} 7 } } */ +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m8,\s*ta,\s*ma} 6 } } */ +/* { dg-final { 
scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m8,\s*ta,\s*ma} 6 } } */ +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m8,\s*ta,\s*ma} 5 } } */ + +/* { dg-final { scan-assembler-times {\tvslidedown.vi} 11 } } */ +/* { dg-final { scan-assembler-times {\tvslidedown.vx} 9 } } */ + +/* { dg-final { scan-assembler-times {\tvmv.x.s} 24 } } */ + +/* { dg-final { scan-assembler-times {\tandi\ta0,a0,0xff} 7 } } */ +/* { dg-final { scan-assembler-times {\tslli\ta0,a0,48} 6 } } */ +/* { dg-final { scan-assembler-times {\tsrli\ta0,a0,48} 6 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c new file mode 100644 index 00000000000..924e40c9dbb --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c @@ -0,0 +1,137 @@ +/* { dg-do run { target { riscv_vector } } } */ +/* { dg-additional-options "-std=c99 -Wno-pedantic -Wno-psabi" } */ + +#include <assert.h> +#include <limits.h> + +#include "vec_extract-1u.c" +#include "vec_extract-2u.c" +#include "vec_extract-3u.c" +#include "vec_extract-4u.c" + +#define CHECK(S, V, IDX) \ + __attribute__ ((noipa, optimize ("0"))) void check_##V##_##IDX () \ + { \ + V v; \ + for (int i = 0; i < sizeof (V) / sizeof (S); i++) \ + v[i] = (S) (INT_MAX - i); \ + S res = vec_extract_##V##_##IDX (v); \ + assert (res == (S) (INT_MAX - IDX)); \ + } + +#define CHECK_VAR(S, V) \ + __attribute__ ((noipa, optimize ("0"))) void check_var_##V (int32_t idx) \ + { \ + V v; \ + for (int i = 0; i < sizeof (V) / sizeof (S); i++) \ + v[i] = (S) (INT_MAX - i); \ + S res = vec_extract_var_##V (v, idx); \ + assert (res == (S) (INT_MAX - idx)); \ + } + +#define RUN(S, V, IDX) check_##V##_##IDX (); + +#define RUN_VAR(S, V) \ + for (int i = 0; i < sizeof (V) / sizeof (S); i++) \ + check_var_##V (i); + +#define RUN_ALL(T) \ + T (uint64_t, vnx2di, 0) \ + T (uint64_t, vnx2di, 1) \ + T 
(uint32_t, vnx4si, 0) \ + T (uint32_t, vnx4si, 1) \ + T (uint32_t, vnx4si, 3) \ + T (uint16_t, vnx8hi, 0) \ + T (uint16_t, vnx8hi, 2) \ + T (uint16_t, vnx8hi, 6) \ + T (uint8_t, vnx16qi, 0) \ + T (uint8_t, vnx16qi, 1) \ + T (uint8_t, vnx16qi, 7) \ + T (uint8_t, vnx16qi, 11) \ + T (uint8_t, vnx16qi, 15) \ + T (uint64_t, vnx4di, 0) \ + T (uint64_t, vnx4di, 1) \ + T (uint64_t, vnx4di, 2) \ + T (uint64_t, vnx4di, 3) \ + T (uint32_t, vnx8si, 0) \ + T (uint32_t, vnx8si, 1) \ + T (uint32_t, vnx8si, 3) \ + T (uint32_t, vnx8si, 4) \ + T (uint32_t, vnx8si, 7) \ + T (uint16_t, vnx16hi, 0) \ + T (uint16_t, vnx16hi, 1) \ + T (uint16_t, vnx16hi, 7) \ + T (uint16_t, vnx16hi, 8) \ + T (uint16_t, vnx16hi, 15) \ + T (uint8_t, vnx32qi, 0) \ + T (uint8_t, vnx32qi, 1) \ + T (uint8_t, vnx32qi, 15) \ + T (uint8_t, vnx32qi, 16) \ + T (uint8_t, vnx32qi, 31) \ + T (uint64_t, vnx8di, 0) \ + T (uint64_t, vnx8di, 2) \ + T (uint64_t, vnx8di, 4) \ + T (uint64_t, vnx8di, 6) \ + T (uint32_t, vnx16si, 0) \ + T (uint32_t, vnx16si, 2) \ + T (uint32_t, vnx16si, 6) \ + T (uint32_t, vnx16si, 8) \ + T (uint32_t, vnx16si, 14) \ + T (uint16_t, vnx32hi, 0) \ + T (uint16_t, vnx32hi, 2) \ + T (uint16_t, vnx32hi, 14) \ + T (uint16_t, vnx32hi, 16) \ + T (uint16_t, vnx32hi, 30) \ + T (uint8_t, vnx64qi, 0) \ + T (uint8_t, vnx64qi, 2) \ + T (uint8_t, vnx64qi, 30) \ + T (uint8_t, vnx64qi, 32) \ + T (uint8_t, vnx64qi, 63) \ + T (uint64_t, vnx16di, 0) \ + T (uint64_t, vnx16di, 4) \ + T (uint64_t, vnx16di, 8) \ + T (uint64_t, vnx16di, 12) \ + T (uint32_t, vnx32si, 0) \ + T (uint32_t, vnx32si, 4) \ + T (uint32_t, vnx32si, 12) \ + T (uint32_t, vnx32si, 16) \ + T (uint32_t, vnx32si, 28) \ + T (uint16_t, vnx64hi, 0) \ + T (uint16_t, vnx64hi, 4) \ + T (uint16_t, vnx64hi, 28) \ + T (uint16_t, vnx64hi, 32) \ + T (uint16_t, vnx64hi, 60) \ + T (uint8_t, vnx128qi, 0) \ + T (uint8_t, vnx128qi, 4) \ + T (uint8_t, vnx128qi, 30) \ + T (uint8_t, vnx128qi, 60) \ + T (uint8_t, vnx128qi, 64) \ + T (uint8_t, vnx128qi, 127) + +#define 
RUN_ALL_VAR(T) \
+  T (uint64_t, vnx2di) \
+  T (uint32_t, vnx4si) \
+  T (uint16_t, vnx8hi) \
+  T (uint8_t, vnx16qi) \
+  T (uint64_t, vnx4di) \
+  T (uint32_t, vnx8si) \
+  T (uint16_t, vnx16hi) \
+  T (uint8_t, vnx32qi) \
+  T (uint64_t, vnx8di) \
+  T (uint32_t, vnx16si) \
+  T (uint16_t, vnx32hi) \
+  T (uint8_t, vnx64qi) \
+  T (uint64_t, vnx16di) \
+  T (uint32_t, vnx32si) \
+  T (uint16_t, vnx64hi) \
+  T (uint8_t, vnx128qi)
+
+RUN_ALL (CHECK)
+RUN_ALL_VAR (CHECK_VAR)
+
+int
+main ()
+{
+  RUN_ALL (RUN);
+  RUN_ALL_VAR (RUN_VAR);
+}
-- 
2.41.0
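The compile tests in the patch scan for `andi a0,a0,0xff` and for `slli a0,a0,48` followed by `srli a0,a0,48`, i.e. zero-extension of 8-bit and 16-bit results in the return register. A minimal sketch of what those sequences compute on a 64-bit register, assuming the symptom being fixed is a sign-extended value where a zero-extended one is expected; the function names are hypothetical and none of this is GCC code:

```c
#include <assert.h>
#include <stdint.h>

/* Model of `andi a0,a0,0xff`: keep the low 8 bits, clear the rest.  */
uint64_t
zext8_model (uint64_t reg)
{
  return reg & 0xff;
}

/* Model of `slli a0,a0,48` then `srli a0,a0,48` (logical shifts):
   keep the low 16 bits, clear the rest.  */
uint64_t
zext16_model (uint64_t reg)
{
  return (reg << 48) >> 48;
}

/* What a sign-extending extract would leave in a 64-bit register
   for an 8-bit element: bit 7 is copied into bits 8..63.  */
uint64_t
sext8_model (uint8_t elt)
{
  return (elt & 0x80) ? ((uint64_t) elt | ~(uint64_t) 0xff) : (uint64_t) elt;
}

/* An element with the top bit set is where the two extensions differ.  */
void
extension_demo (void)
{
  uint64_t wrong = sext8_model (0xff);

  assert (wrong == UINT64_MAX);          /* Sign-extended: all ones.  */
  assert (zext8_model (wrong) == 0xff);  /* After the andi: value of a uint8_t.  */
}
```

With a 32-bit int, the run test stores `(uint8_t) (INT_MAX - i)` into the vectors, which is 0xff for i = 0, so the extracted bytes do have the top bit set and the two extensions give different results.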
* Re: [PATCH] IFN: Fix vector extraction into promoted subreg.
From: Richard Sandiford @ 2023-08-16 10:05 UTC
To: Robin Dapp; +Cc: juzhe.zhong, gcc-patches, rguenther

Robin Dapp <rdapp.gcc@gmail.com> writes:
>> However:
>>
>> | #define vec_extract_direct { 3, 3, false }
>>
>> This looks wrong.  The numbers are argument numbers (or -1 for a return
>> value).  vec_extract only takes 2 arguments, so 3 looks to be out-of-range.
>>
>> | #define direct_vec_extract_optab_supported_p direct_optab_supported_p
>>
>> I would expect this to be convert_optab_supported_p.
>>
>> On the promoted subreg thing, I think expand_vec_extract_optab_fn
>> should use expand_fn_using_insn.
>
> Thanks, really easier that way.  Attached a new version that's currently
> bootstrapping.  Does that look better?

LGTM, thanks.  OK if testing passes.

Richard

> Regards
>  Robin
>
> Subject: [PATCH v2] internal-fn: Fix vector extraction into promoted subreg.
>
> This patch fixes the case where vec_extract gets passed a promoted
> subreg (e.g. from a return value).  This is achieved by using
> expand_convert_optab_fn instead of a separate expander function.
>
> gcc/ChangeLog:
>
>       * internal-fn.cc (vec_extract_direct): Change type argument
>       numbers.
>       (expand_vec_extract_optab_fn): Call convert_optab_fn.
>       (direct_vec_extract_optab_supported_p): Use
>       convert_optab_supported_p.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c: New test.
>       * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c: New test.
>       * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c: New test.
>       * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c: New test.
>       * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c: New test.
(uint8_t, vnx128qi, 0) \ > + T (uint8_t, vnx128qi, 4) \ > + T (uint8_t, vnx128qi, 30) \ > + T (uint8_t, vnx128qi, 60) \ > + T (uint8_t, vnx128qi, 64) \ > + T (uint8_t, vnx128qi, 127) \ > + > +#define TEST_ALL_VAR4(T) \ > + T (uint64_t, vnx16di) \ > + T (uint32_t, vnx32si) \ > + T (uint16_t, vnx64hi) \ > + T (uint8_t, vnx128qi) \ > + > +TEST_ALL4 (VEC_EXTRACT) > +TEST_ALL_VAR4 (VEC_EXTRACT_VAR4) > + > +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m8,\s*ta,\s*ma} 7 } } */ > +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m8,\s*ta,\s*ma} 6 } } */ > +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m8,\s*ta,\s*ma} 6 } } */ > +/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m8,\s*ta,\s*ma} 5 } } */ > + > +/* { dg-final { scan-assembler-times {\tvslidedown.vi} 11 } } */ > +/* { dg-final { scan-assembler-times {\tvslidedown.vx} 9 } } */ > + > +/* { dg-final { scan-assembler-times {\tvmv.x.s} 24 } } */ > + > +/* { dg-final { scan-assembler-times {\tandi\ta0,a0,0xff} 7 } } */ > +/* { dg-final { scan-assembler-times {\tslli\ta0,a0,48} 6 } } */ > +/* { dg-final { scan-assembler-times {\tsrli\ta0,a0,48} 6 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c > new file mode 100644 > index 00000000000..924e40c9dbb > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c > @@ -0,0 +1,137 @@ > +/* { dg-do run { target { riscv_vector } } } */ > +/* { dg-additional-options "-std=c99 -Wno-pedantic -Wno-psabi" } */ > + > +#include <assert.h> > +#include <limits.h> > + > +#include "vec_extract-1u.c" > +#include "vec_extract-2u.c" > +#include "vec_extract-3u.c" > +#include "vec_extract-4u.c" > + > +#define CHECK(S, V, IDX) \ > + __attribute__ ((noipa, optimize ("0"))) void check_##V##_##IDX () \ > + { \ > + V v; \ > + for (int i = 0; i 
< sizeof (V) / sizeof (S); i++) \ > + v[i] = (S) (INT_MAX - i); \ > + S res = vec_extract_##V##_##IDX (v); \ > + assert (res == (S) (INT_MAX - IDX)); \ > + } > + > +#define CHECK_VAR(S, V) \ > + __attribute__ ((noipa, optimize ("0"))) void check_var_##V (int32_t idx) \ > + { \ > + V v; \ > + for (int i = 0; i < sizeof (V) / sizeof (S); i++) \ > + v[i] = (S) (INT_MAX - i); \ > + S res = vec_extract_var_##V (v, idx); \ > + assert (res == (S) (INT_MAX - idx)); \ > + } > + > +#define RUN(S, V, IDX) check_##V##_##IDX (); > + > +#define RUN_VAR(S, V) \ > + for (int i = 0; i < sizeof (V) / sizeof (S); i++) \ > + check_var_##V (i); > + > +#define RUN_ALL(T) \ > + T (uint64_t, vnx2di, 0) \ > + T (uint64_t, vnx2di, 1) \ > + T (uint32_t, vnx4si, 0) \ > + T (uint32_t, vnx4si, 1) \ > + T (uint32_t, vnx4si, 3) \ > + T (uint16_t, vnx8hi, 0) \ > + T (uint16_t, vnx8hi, 2) \ > + T (uint16_t, vnx8hi, 6) \ > + T (uint8_t, vnx16qi, 0) \ > + T (uint8_t, vnx16qi, 1) \ > + T (uint8_t, vnx16qi, 7) \ > + T (uint8_t, vnx16qi, 11) \ > + T (uint8_t, vnx16qi, 15) \ > + T (uint64_t, vnx4di, 0) \ > + T (uint64_t, vnx4di, 1) \ > + T (uint64_t, vnx4di, 2) \ > + T (uint64_t, vnx4di, 3) \ > + T (uint32_t, vnx8si, 0) \ > + T (uint32_t, vnx8si, 1) \ > + T (uint32_t, vnx8si, 3) \ > + T (uint32_t, vnx8si, 4) \ > + T (uint32_t, vnx8si, 7) \ > + T (uint16_t, vnx16hi, 0) \ > + T (uint16_t, vnx16hi, 1) \ > + T (uint16_t, vnx16hi, 7) \ > + T (uint16_t, vnx16hi, 8) \ > + T (uint16_t, vnx16hi, 15) \ > + T (uint8_t, vnx32qi, 0) \ > + T (uint8_t, vnx32qi, 1) \ > + T (uint8_t, vnx32qi, 15) \ > + T (uint8_t, vnx32qi, 16) \ > + T (uint8_t, vnx32qi, 31) \ > + T (uint64_t, vnx8di, 0) \ > + T (uint64_t, vnx8di, 2) \ > + T (uint64_t, vnx8di, 4) \ > + T (uint64_t, vnx8di, 6) \ > + T (uint32_t, vnx16si, 0) \ > + T (uint32_t, vnx16si, 2) \ > + T (uint32_t, vnx16si, 6) \ > + T (uint32_t, vnx16si, 8) \ > + T (uint32_t, vnx16si, 14) \ > + T (uint16_t, vnx32hi, 0) \ > + T (uint16_t, vnx32hi, 2) \ > + T (uint16_t, vnx32hi, 14) 
\ > + T (uint16_t, vnx32hi, 16) \ > + T (uint16_t, vnx32hi, 30) \ > + T (uint8_t, vnx64qi, 0) \ > + T (uint8_t, vnx64qi, 2) \ > + T (uint8_t, vnx64qi, 30) \ > + T (uint8_t, vnx64qi, 32) \ > + T (uint8_t, vnx64qi, 63) \ > + T (uint64_t, vnx16di, 0) \ > + T (uint64_t, vnx16di, 4) \ > + T (uint64_t, vnx16di, 8) \ > + T (uint64_t, vnx16di, 12) \ > + T (uint32_t, vnx32si, 0) \ > + T (uint32_t, vnx32si, 4) \ > + T (uint32_t, vnx32si, 12) \ > + T (uint32_t, vnx32si, 16) \ > + T (uint32_t, vnx32si, 28) \ > + T (uint16_t, vnx64hi, 0) \ > + T (uint16_t, vnx64hi, 4) \ > + T (uint16_t, vnx64hi, 28) \ > + T (uint16_t, vnx64hi, 32) \ > + T (uint16_t, vnx64hi, 60) \ > + T (uint8_t, vnx128qi, 0) \ > + T (uint8_t, vnx128qi, 4) \ > + T (uint8_t, vnx128qi, 30) \ > + T (uint8_t, vnx128qi, 60) \ > + T (uint8_t, vnx128qi, 64) \ > + T (uint8_t, vnx128qi, 127) > + > +#define RUN_ALL_VAR(T) \ > + T (uint64_t, vnx2di) \ > + T (uint32_t, vnx4si) \ > + T (uint16_t, vnx8hi) \ > + T (uint8_t, vnx16qi) \ > + T (uint64_t, vnx4di) \ > + T (uint32_t, vnx8si) \ > + T (uint16_t, vnx16hi) \ > + T (uint8_t, vnx32qi) \ > + T (uint64_t, vnx8di) \ > + T (uint32_t, vnx16si) \ > + T (uint16_t, vnx32hi) \ > + T (uint8_t, vnx64qi) \ > + T (uint64_t, vnx16di) \ > + T (uint32_t, vnx32si) \ > + T (uint16_t, vnx64hi) \ > + T (uint8_t, vnx128qi) > + > +RUN_ALL (CHECK) > +RUN_ALL_VAR (CHECK_VAR) > + > +int > +main () > +{ > + RUN_ALL (RUN); > + RUN_ALL_VAR (RUN_VAR); > +} ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] IFN: Fix vector extraction into promoted subreg.
@ 2023-08-15 14:02 Robin Dapp
From: Robin Dapp @ 2023-08-15 14:02 UTC
To: gcc-patches; +Cc: rdapp.gcc

Hi,

this patch fixes the case where vec_extract gets passed a promoted
subreg (e.g. from a return value).  When such a subreg is the
destination of a vector extraction, we create a separate pseudo
register and ensure that the necessary promotion is performed
afterwards.  Before this patch, an extraction result that the backend
sign-extends was written directly into the promoted subreg, so it was
erroneously not zero-extended, e.g. when used as a return value.

I added the missing test cases for unsigned vec_extract on RISC-V,
which check for the proper behavior.  Bootstrapped and regression-tested
on x86, aarch64 and power10.

Regards
 Robin

gcc/ChangeLog:

	* internal-fn.cc (expand_vec_extract_optab_fn): Handle
	SUBREG_PROMOTED_VAR_P.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c: New test.
---
 gcc/internal-fn.cc                            |  25 +++-
 .../rvv/autovec/vls-vlmax/vec_extract-1u.c    |  63 ++++++++
 .../rvv/autovec/vls-vlmax/vec_extract-2u.c    |  69 +++++++++
 .../rvv/autovec/vls-vlmax/vec_extract-3u.c    |  69 +++++++++
 .../rvv/autovec/vls-vlmax/vec_extract-4u.c    |  70 +++++++++
 .../rvv/autovec/vls-vlmax/vec_extract-runu.c  | 137 ++++++++++++++++++
 6 files changed, 430 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 4f2b20a79e5..b1b12cc8369 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3150,14 +3150,33 @@ expand_vec_extract_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 
   if (icode != CODE_FOR_nothing)
     {
-      create_output_operand (&ops[0], target, extract_mode);
+      /* Some backends like riscv sign-extend the extraction result to a full
+         Pmode register.  If we are passed a promoted subreg as target make
+         sure not to use it as target directly.  Instead, use a new pseudo
+         and perform the necessary extension afterwards.  */
+      rtx dest = target;
+      if (target && SUBREG_P (target) && SUBREG_PROMOTED_VAR_P (target))
+        dest = gen_reg_rtx (extract_mode);
+
+      create_output_operand (&ops[0], dest, extract_mode);
+
       create_input_operand (&ops[1], src, outermode);
       create_convert_operand_from (&ops[2], pos,
                                    TYPE_MODE (TREE_TYPE (op1)), true);
       if (maybe_expand_insn (icode, 3, ops))
         {
-          if (!rtx_equal_p (target, ops[0].value))
-            emit_move_insn (target, ops[0].value);
+          if (!rtx_equal_p (dest, target))
+            {
+              if (SUBREG_P (target) && SUBREG_PROMOTED_VAR_P (target))
+                {
+                  /* Have convert_move perform the subreg promotion.  */
+                  rtx tmp = convert_to_mode (extract_mode, ops[0].value, 0);
+                  convert_move (SUBREG_REG (target), tmp,
+                                SUBREG_PROMOTED_SIGN (target));
+                }
+              else
+                emit_move_insn (target, dest);
+            }
           return;
         }
     }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c
new file mode 100644
index 00000000000..a35988ff55d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d -Wno-pedantic -Wno-psabi" } */
+
+#include <stdint-gcc.h>
+
+typedef uint64_t vnx2di __attribute__((vector_size (16)));
+typedef uint32_t vnx4si __attribute__((vector_size (16)));
+typedef uint16_t vnx8hi __attribute__((vector_size (16)));
+typedef uint8_t vnx16qi __attribute__((vector_size (16)));
+
+#define VEC_EXTRACT(S,V,IDX) \
+  S \
+  __attribute__((noipa)) \
+  vec_extract_##V##_##IDX (V v) \
+  { \
+    return v[IDX]; \
+  }
+
+#define VEC_EXTRACT_VAR1(S,V) \
+  S \
+  __attribute__((noipa)) \
+  vec_extract_var_##V (V v, int8_t idx) \
+  { \
+    return v[idx]; \
+  }
+
+#define TEST_ALL1(T) \
+  T (uint64_t, vnx2di, 0) \
+  T (uint64_t, vnx2di, 1) \
+  T (uint32_t, vnx4si, 0) \
+  T (uint32_t, vnx4si, 1) \
+  T (uint32_t, vnx4si, 3) \
+  T (uint16_t, vnx8hi, 0) \
+  T (uint16_t, vnx8hi, 2) \
+  T (uint16_t, vnx8hi, 6) \
+  T (uint8_t, vnx16qi, 0) \
+  T (uint8_t, vnx16qi, 1) \
+  T (uint8_t, vnx16qi, 7) \
+  T (uint8_t, vnx16qi, 11) \
+  T (uint8_t, vnx16qi, 15) \
+
+#define TEST_ALL_VAR1(T) \
+  T (uint64_t, vnx2di) \
+  T (uint32_t, vnx4si) \
+  T (uint16_t, vnx8hi) \
+  T (uint8_t, vnx16qi) \
+
+TEST_ALL1 (VEC_EXTRACT)
+TEST_ALL_VAR1 (VEC_EXTRACT_VAR1)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m1,\s*ta,\s*ma} 4 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m1,\s*ta,\s*ma} 4 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m1,\s*ta,\s*ma} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tvslidedown.vi} 9 } } */
+/* { dg-final { scan-assembler-times {\tvslidedown.vx} 4 } } */
+
+/* { dg-final { scan-assembler-times {\tvmv.x.s} 17 } } */
+
+/* { dg-final { scan-assembler-times {\tandi\ta0,a0,0xff} 6 } } */
+/* { dg-final { scan-assembler-times {\tslli\ta0,a0,48} 4 } } */
+/* { dg-final { scan-assembler-times {\tsrli\ta0,a0,48} 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c
new file mode 100644
index 00000000000..8c3c16a7047
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c
@@ -0,0 +1,69 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d -Wno-pedantic -Wno-psabi" } */
+
+#include <stdint-gcc.h>
+
+typedef uint64_t vnx4di __attribute__((vector_size (32)));
+typedef uint32_t vnx8si __attribute__((vector_size (32)));
+typedef uint16_t vnx16hi __attribute__((vector_size (32)));
+typedef uint8_t vnx32qi __attribute__((vector_size (32)));
+
+#define VEC_EXTRACT(S,V,IDX) \
+  S \
+  __attribute__((noipa)) \
+  vec_extract_##V##_##IDX (V v) \
+  { \
+    return v[IDX]; \
+  }
+
+#define VEC_EXTRACT_VAR2(S,V) \
+  S \
+  __attribute__((noipa)) \
+  vec_extract_var_##V (V v, int16_t idx) \
+  { \
+    return v[idx]; \
+  }
+
+#define TEST_ALL2(T) \
+  T (uint64_t, vnx4di, 0) \
+  T (uint64_t, vnx4di, 1) \
+  T (uint64_t, vnx4di, 2) \
+  T (uint64_t, vnx4di, 3) \
+  T (uint32_t, vnx8si, 0) \
+  T (uint32_t, vnx8si, 1) \
+  T (uint32_t, vnx8si, 3) \
+  T (uint32_t, vnx8si, 4) \
+  T (uint32_t, vnx8si, 7) \
+  T (uint16_t, vnx16hi, 0) \
+  T (uint16_t, vnx16hi, 1) \
+  T (uint16_t, vnx16hi, 7) \
+  T (uint16_t, vnx16hi, 8) \
+  T (uint16_t, vnx16hi, 15) \
+  T (uint8_t, vnx32qi, 0) \
+  T (uint8_t, vnx32qi, 1) \
+  T (uint8_t, vnx32qi, 15) \
+  T (uint8_t, vnx32qi, 16) \
+  T (uint8_t, vnx32qi, 31) \
+
+#define TEST_ALL_VAR2(T) \
+  T (uint64_t, vnx4di) \
+  T (uint32_t, vnx8si) \
+  T (uint16_t, vnx16hi) \
+  T (uint8_t, vnx32qi) \
+
+TEST_ALL2 (VEC_EXTRACT)
+TEST_ALL_VAR2 (VEC_EXTRACT_VAR2)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m2,\s*ta,\s*ma} 5 } } */
+
+/* { dg-final { scan-assembler-times {\tvslidedown.vi} 15 } } */
+/* { dg-final { scan-assembler-times {\tvslidedown.vx} 4 } } */
+
+/* { dg-final { scan-assembler-times {\tvmv.x.s} 23 } } */
+
+/* { dg-final { scan-assembler-times {\tandi\ta0,a0,0xff} 6 } } */
+/* { dg-final { scan-assembler-times {\tslli\ta0,a0,48} 6 } } */
+/* { dg-final { scan-assembler-times {\tsrli\ta0,a0,48} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c
new file mode 100644
index 00000000000..ab49f29c3f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c
@@ -0,0 +1,69 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d -Wno-pedantic -Wno-psabi" } */
+
+#include <stdint-gcc.h>
+
+typedef uint64_t vnx8di __attribute__((vector_size (64)));
+typedef uint32_t vnx16si __attribute__((vector_size (64)));
+typedef uint16_t vnx32hi __attribute__((vector_size (64)));
+typedef uint8_t vnx64qi __attribute__((vector_size (64)));
+
+#define VEC_EXTRACT(S,V,IDX) \
+  S \
+  __attribute__((noipa)) \
+  vec_extract_##V##_##IDX (V v) \
+  { \
+    return v[IDX]; \
+  }
+
+#define VEC_EXTRACT_VAR3(S,V) \
+  S \
+  __attribute__((noipa)) \
+  vec_extract_var_##V (V v, int32_t idx) \
+  { \
+    return v[idx]; \
+  }
+
+#define TEST_ALL3(T) \
+  T (uint64_t, vnx8di, 0) \
+  T (uint64_t, vnx8di, 2) \
+  T (uint64_t, vnx8di, 4) \
+  T (uint64_t, vnx8di, 6) \
+  T (uint32_t, vnx16si, 0) \
+  T (uint32_t, vnx16si, 2) \
+  T (uint32_t, vnx16si, 6) \
+  T (uint32_t, vnx16si, 8) \
+  T (uint32_t, vnx16si, 14) \
+  T (uint16_t, vnx32hi, 0) \
+  T (uint16_t, vnx32hi, 2) \
+  T (uint16_t, vnx32hi, 14) \
+  T (uint16_t, vnx32hi, 16) \
+  T (uint16_t, vnx32hi, 30) \
+  T (uint8_t, vnx64qi, 0) \
+  T (uint8_t, vnx64qi, 2) \
+  T (uint8_t, vnx64qi, 30) \
+  T (uint8_t, vnx64qi, 32) \
+  T (uint8_t, vnx64qi, 63) \
+
+#define TEST_ALL_VAR3(T) \
+  T (uint64_t, vnx8di) \
+  T (uint32_t, vnx16si) \
+  T (uint16_t, vnx32hi) \
+  T (uint8_t, vnx64qi) \
+
+TEST_ALL3 (VEC_EXTRACT)
+TEST_ALL_VAR3 (VEC_EXTRACT_VAR3)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m4,\s*ta,\s*ma} 5 } } */
+
+/* { dg-final { scan-assembler-times {\tvslidedown.vi} 13 } } */
+/* { dg-final { scan-assembler-times {\tvslidedown.vx} 6 } } */
+
+/* { dg-final { scan-assembler-times {\tvmv.x.s} 23 } } */
+
+/* { dg-final { scan-assembler-times {\tandi\ta0,a0,0xff} 6 } } */
+/* { dg-final { scan-assembler-times {\tslli\ta0,a0,48} 6 } } */
+/* { dg-final { scan-assembler-times {\tsrli\ta0,a0,48} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c
new file mode 100644
index 00000000000..328d426e572
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d -Wno-pedantic -Wno-psabi" } */
+
+#include <stdint-gcc.h>
+
+typedef uint64_t vnx16di __attribute__((vector_size (128)));
+typedef uint32_t vnx32si __attribute__((vector_size (128)));
+typedef uint16_t vnx64hi __attribute__((vector_size (128)));
+typedef uint8_t vnx128qi __attribute__((vector_size (128)));
+
+#define VEC_EXTRACT(S,V,IDX) \
+  S \
+  __attribute__((noipa)) \
+  vec_extract_##V##_##IDX (V v) \
+  { \
+    return v[IDX]; \
+  }
+
+#define VEC_EXTRACT_VAR4(S,V) \
+  S \
+  __attribute__((noipa)) \
+  vec_extract_var_##V (V v, int64_t idx) \
+  { \
+    return v[idx]; \
+  }
+
+#define TEST_ALL4(T) \
+  T (uint64_t, vnx16di, 0) \
+  T (uint64_t, vnx16di, 4) \
+  T (uint64_t, vnx16di, 8) \
+  T (uint64_t, vnx16di, 12) \
+  T (uint32_t, vnx32si, 0) \
+  T (uint32_t, vnx32si, 4) \
+  T (uint32_t, vnx32si, 12) \
+  T (uint32_t, vnx32si, 16) \
+  T (uint32_t, vnx32si, 28) \
+  T (uint16_t, vnx64hi, 0) \
+  T (uint16_t, vnx64hi, 4) \
+  T (uint16_t, vnx64hi, 28) \
+  T (uint16_t, vnx64hi, 32) \
+  T (uint16_t, vnx64hi, 60) \
+  T (uint8_t, vnx128qi, 0) \
+  T (uint8_t, vnx128qi, 4) \
+  T (uint8_t, vnx128qi, 30) \
+  T (uint8_t, vnx128qi, 60) \
+  T (uint8_t, vnx128qi, 64) \
+  T (uint8_t, vnx128qi, 127) \
+
+#define TEST_ALL_VAR4(T) \
+  T (uint64_t, vnx16di) \
+  T (uint32_t, vnx32si) \
+  T (uint16_t, vnx64hi) \
+  T (uint8_t, vnx128qi) \
+
+TEST_ALL4 (VEC_EXTRACT)
+TEST_ALL_VAR4 (VEC_EXTRACT_VAR4)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m8,\s*ta,\s*ma} 7 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m8,\s*ta,\s*ma} 5 } } */
+
+/* { dg-final { scan-assembler-times {\tvslidedown.vi} 11 } } */
+/* { dg-final { scan-assembler-times {\tvslidedown.vx} 9 } } */
+
+/* { dg-final { scan-assembler-times {\tvmv.x.s} 24 } } */
+
+/* { dg-final { scan-assembler-times {\tandi\ta0,a0,0xff} 7 } } */
+/* { dg-final { scan-assembler-times {\tslli\ta0,a0,48} 6 } } */
+/* { dg-final { scan-assembler-times {\tsrli\ta0,a0,48} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c
new file mode 100644
index 00000000000..924e40c9dbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c
@@ -0,0 +1,137 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-std=c99 -Wno-pedantic -Wno-psabi" } */
+
+#include <assert.h>
+#include <limits.h>
+
+#include "vec_extract-1u.c"
+#include "vec_extract-2u.c"
+#include "vec_extract-3u.c"
+#include "vec_extract-4u.c"
+
+#define CHECK(S, V, IDX) \
+  __attribute__ ((noipa, optimize ("0"))) void check_##V##_##IDX () \
+  { \
+    V v; \
+    for (int i = 0; i < sizeof (V) / sizeof (S); i++) \
+      v[i] = (S) (INT_MAX - i); \
+    S res = vec_extract_##V##_##IDX (v); \
+    assert (res == (S) (INT_MAX - IDX)); \
+  }
+
+#define CHECK_VAR(S, V) \
+  __attribute__ ((noipa, optimize ("0"))) void check_var_##V (int32_t idx) \
+  { \
+    V v; \
+    for (int i = 0; i < sizeof (V) / sizeof (S); i++) \
+      v[i] = (S) (INT_MAX - i); \
+    S res = vec_extract_var_##V (v, idx); \
+    assert (res == (S) (INT_MAX - idx)); \
+  }
+
+#define RUN(S, V, IDX) check_##V##_##IDX ();
+
+#define RUN_VAR(S, V) \
+  for (int i = 0; i < sizeof (V) / sizeof (S); i++) \
+    check_var_##V (i);
+
+#define RUN_ALL(T) \
+  T (uint64_t, vnx2di, 0) \
+  T (uint64_t, vnx2di, 1) \
+  T (uint32_t, vnx4si, 0) \
+  T (uint32_t, vnx4si, 1) \
+  T (uint32_t, vnx4si, 3) \
+  T (uint16_t, vnx8hi, 0) \
+  T (uint16_t, vnx8hi, 2) \
+  T (uint16_t, vnx8hi, 6) \
+  T (uint8_t, vnx16qi, 0) \
+  T (uint8_t, vnx16qi, 1) \
+  T (uint8_t, vnx16qi, 7) \
+  T (uint8_t, vnx16qi, 11) \
+  T (uint8_t, vnx16qi, 15) \
+  T (uint64_t, vnx4di, 0) \
+  T (uint64_t, vnx4di, 1) \
+  T (uint64_t, vnx4di, 2) \
+  T (uint64_t, vnx4di, 3) \
+  T (uint32_t, vnx8si, 0) \
+  T (uint32_t, vnx8si, 1) \
+  T (uint32_t, vnx8si, 3) \
+  T (uint32_t, vnx8si, 4) \
+  T (uint32_t, vnx8si, 7) \
+  T (uint16_t, vnx16hi, 0) \
+  T (uint16_t, vnx16hi, 1) \
+  T (uint16_t, vnx16hi, 7) \
+  T (uint16_t, vnx16hi, 8) \
+  T (uint16_t, vnx16hi, 15) \
+  T (uint8_t, vnx32qi, 0) \
+  T (uint8_t, vnx32qi, 1) \
+  T (uint8_t, vnx32qi, 15) \
+  T (uint8_t, vnx32qi, 16) \
+  T (uint8_t, vnx32qi, 31) \
+  T (uint64_t, vnx8di, 0) \
+  T (uint64_t, vnx8di, 2) \
+  T (uint64_t, vnx8di, 4) \
+  T (uint64_t, vnx8di, 6) \
+  T (uint32_t, vnx16si, 0) \
+  T (uint32_t, vnx16si, 2) \
+  T (uint32_t, vnx16si, 6) \
+  T (uint32_t, vnx16si, 8) \
+  T (uint32_t, vnx16si, 14) \
+  T (uint16_t, vnx32hi, 0) \
+  T (uint16_t, vnx32hi, 2) \
+  T (uint16_t, vnx32hi, 14) \
+  T (uint16_t, vnx32hi, 16) \
+  T (uint16_t, vnx32hi, 30) \
+  T (uint8_t, vnx64qi, 0) \
+  T (uint8_t, vnx64qi, 2) \
+  T (uint8_t, vnx64qi, 30) \
+  T (uint8_t, vnx64qi, 32) \
+  T (uint8_t, vnx64qi, 63) \
+  T (uint64_t, vnx16di, 0) \
+  T (uint64_t, vnx16di, 4) \
+  T (uint64_t, vnx16di, 8) \
+  T (uint64_t, vnx16di, 12) \
+  T (uint32_t, vnx32si, 0) \
+  T (uint32_t, vnx32si, 4) \
+  T (uint32_t, vnx32si, 12) \
+  T (uint32_t, vnx32si, 16) \
+  T (uint32_t, vnx32si, 28) \
+  T (uint16_t, vnx64hi, 0) \
+  T (uint16_t, vnx64hi, 4) \
+  T (uint16_t, vnx64hi, 28) \
+  T (uint16_t, vnx64hi, 32) \
+  T (uint16_t, vnx64hi, 60) \
+  T (uint8_t, vnx128qi, 0) \
+  T (uint8_t, vnx128qi, 4) \
+  T (uint8_t, vnx128qi, 30) \
+  T (uint8_t, vnx128qi, 60) \
+  T (uint8_t, vnx128qi, 64) \
+  T (uint8_t, vnx128qi, 127)
+
+#define RUN_ALL_VAR(T) \
+  T (uint64_t, vnx2di) \
+  T (uint32_t, vnx4si) \
+  T (uint16_t, vnx8hi) \
+  T (uint8_t, vnx16qi) \
+  T (uint64_t, vnx4di) \
+  T (uint32_t, vnx8si) \
+  T (uint16_t, vnx16hi) \
+  T (uint8_t, vnx32qi) \
+  T (uint64_t, vnx8di) \
+  T (uint32_t, vnx16si) \
+  T (uint16_t, vnx32hi) \
+  T (uint8_t, vnx64qi) \
+  T (uint64_t, vnx16di) \
+  T (uint32_t, vnx32si) \
+  T (uint16_t, vnx64hi) \
+  T (uint8_t, vnx128qi)
+
+RUN_ALL (CHECK)
+RUN_ALL_VAR (CHECK_VAR)
+
+int
+main ()
+{
+  RUN_ALL (RUN);
+  RUN_ALL_VAR (RUN_VAR);
+}
-- 
2.41.0
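
For readers following the thread, the extension problem the patch describes can be modelled in plain C without any RTL. The sketch below is only an illustration under stated assumptions: it is not part of the patch, all three function names are hypothetical, and the 64-bit "register" is just an int64_t. It assumes, as the comment in the patch says, that the backend's extraction sign-extends the element to the full register width, while a promoted unsigned 8-bit value is expected to arrive zero-extended.

```c
#include <stdint.h>

/* Hypothetical model of the backend's extraction: the 8-bit element is
   sign-extended into a 64-bit register.  Converting an out-of-range
   value to int8_t is implementation-defined in ISO C; GCC wraps modulo
   2^8, which is the behavior assumed here.  */
static int64_t
model_extract_sign_extended (const uint8_t *v, int idx)
{
  return (int64_t) (int8_t) v[idx];
}

/* Models the old behavior: the extraction writes straight into the
   promoted destination, so the sign bits remain in the upper part.  */
static uint64_t
promoted_without_fix (const uint8_t *v, int idx)
{
  return (uint64_t) model_extract_sign_extended (v, idx);
}

/* Models the patched behavior: extract into a separate narrow
   temporary first, then perform the unsigned promotion explicitly.  */
static uint64_t
promoted_with_fix (const uint8_t *v, int idx)
{
  uint8_t tmp = (uint8_t) model_extract_sign_extended (v, idx);
  return (uint64_t) tmp;
}
```

With an element of 0xff the first variant yields an all-ones register, whereas the second yields 0xff; for elements below 0x80 both agree, which is why the problem only shows up for values with the top bit set. The explicit narrowing step plays the role of the `andi a0,a0,0xff` (and the `slli`/`srli` by 48 for uint16_t) that the new tests scan for.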
end of thread, other threads:[~2023-08-16 10:05 UTC | newest]

Thread overview: 5+ messages
2023-08-16  1:31 [PATCH] IFN: Fix vector extraction into promoted subreg juzhe.zhong
2023-08-16  6:45 ` Richard Sandiford
2023-08-16  9:37   ` Robin Dapp
2023-08-16 10:05     ` Richard Sandiford
  -- strict thread matches above, loose matches on Subject: below --
2023-08-15 14:02 Robin Dapp