* [PATCH] gcc: Add vec_select -> subreg RTL simplification
@ 2021-07-02  9:53 Jonathan Wright
  2021-07-07 13:35 ` [PATCH V2] " Jonathan Wright
  0 siblings, 1 reply; 6+ messages in thread

From: Jonathan Wright @ 2021-07-02 9:53 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Sandiford, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 3221 bytes --]

Hi,

As subject, this patch adds a new RTL simplification for the case of a
VEC_SELECT selecting the low part of a vector. The simplification
returns a SUBREG.

The primary goal of this patch is to enable better combinations of
Neon RTL patterns - specifically allowing generation of
'write-to-high-half' narrowing instructions.

Adding this RTL simplification means that the expected results for a
number of tests need to be updated:
* aarch64 Neon: Update the scan-assembler regex for intrinsics tests
  to expect a scalar register instead of lane 0 of a vector.
* aarch64 SVE: Likewise.
* arm MVE: Use lane 1 instead of lane 0 for lane-extraction intrinsics
  tests (as the move instructions get optimized away for lane 0.)

Regression tested and bootstrapped on aarch64-none-linux-gnu,
x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
aarch64_be-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-08  Jonathan Wright  <jonathan.wright@arm.com>

	* combine.c (combine_simplify_rtx): Add vec_select -> subreg
	simplification.
	* config/aarch64/aarch64.md
	(*zero_extend<SHORT:mode><GPI:mode>2_aarch64): Add Neon to
	general purpose register case for zero-extend pattern.
	* config/arm/vfp.md (*arm_movsi_vfp): Remove "*" from *t -> r
	case to prevent some cases opting to go through memory.
	* cse.c (fold_rtx): Add vec_select -> subreg simplification.
	* simplify-rtx.c (simplify_context::simplify_binary_operation_1):
	Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/extract_zero_extend.c: Remove dump scan
	for RTL pattern match.
	* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Update
	scan-assembler regex to look for a scalar register instead of
	lane 0 of a vector.
	* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_laneq_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_laneq_s32.c: Likewise.
	* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
	* gcc.target/aarch64/sve/live_1.c: Update scan-assembler regex
	cases to look for 'b' and 'h' registers instead of 'w'.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: Extract lane
	1 as the moves for lane 0 now get optimized away.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.
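The core test the patch adds in combine.c, cse.c and simplify-rtx.c — whether a VEC_SELECT's lane indices pick out exactly the low part of the source vector — can be modelled in a few lines of Python. This is a sketch of the logic only, not GCC code; the function and parameter names are illustrative:

```python
def selects_low_part(sel_indices, src_nunits, big_endian=False):
    """Return True if selecting these lanes from a vector of src_nunits
    elements is equivalent to taking the low-part subreg.

    On big-endian targets the low part of the register holds the
    highest-numbered lanes, hence the offset (mirroring the
    BYTES_BIG_ENDIAN adjustment in the patch)."""
    offset = src_nunits - len(sel_indices) if big_endian else 0
    return all(idx == i + offset for i, idx in enumerate(sel_indices))

# Little-endian: lanes [0, 1] of a 4-element vector are the low part.
print(selects_low_part([0, 1], 4))                   # True
# Big-endian: the low part is lanes [2, 3] instead.
print(selects_low_part([2, 3], 4, big_endian=True))  # True
# Any other selection cannot be rewritten as a subreg.
print(selects_low_part([1, 2], 4))                   # False
```

When this predicate holds, all three passes rewrite the VEC_SELECT as a call to lowpart_subreg, which is what lets later combine steps see a plain scalar register.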
[-- Attachment #2: rb14526.patch --]
[-- Type: application/octet-stream, Size: 24917 bytes --]

diff --git a/gcc/combine.c b/gcc/combine.c
index 6476812a21268e28219d1e302ee1c979d528a6ca..965b1a69ab2162a537b5846f0563f5120090fb22 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -6276,6 +6276,36 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
 			      - 1, 0));
       break;
 
+    case VEC_SELECT:
+      {
+	rtx trueop0 = XEXP (x, 0);
+	mode = GET_MODE (trueop0);
+	rtx trueop1 = XEXP (x, 1);
+	int nunits;
+	/* If we select a low-part subreg, return that.  */
+	if (GET_MODE_NUNITS (mode).is_constant (&nunits)
+	    && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
+	  {
+	    int flag = 0;
+	    int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
+
+	    for (int i = 0; i < XVECLEN (trueop1, 0); i++)
+	      {
+		if (i + offset != INTVAL (XVECEXP (trueop1, 0, i)))
+		  {
+		    flag = 1;
+		    break;
+		  }
+	      }
+
+	    if (flag == 0)
+	      {
+		rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
+		if (new_rtx != NULL_RTX)
+		  return new_rtx;
+	      }
+	  }
+      }
+
     default:
       break;
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1884,15 +1884,16 @@
 )
 
 (define_insn "*zero_extend<SHORT:mode><GPI:mode>2_aarch64"
-  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
-        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m")))]
+  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
+        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m,w")))]
   ""
   "@
    and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
    ldr<SHORT:size>\t%w0, %1
-   ldr\t%<SHORT:size>0, %1"
-  [(set_attr "type" "logic_imm,load_4,f_loads")
-   (set_attr "arch" "*,*,fp")]
+   ldr\t%<SHORT:size>0, %1
+   umov\t%w0, %1.<SHORT:size>[0]"
+  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
+   (set_attr "arch" "*,*,fp,fp")]
 )
 (define_expand "<optab>qihi2"

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -224,7 +224,7 @@
 ;; problems because small constants get converted into adds.
 (define_insn "*arm_movsi_vfp"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m ,*t,r,*t,*t, *Uv")
-	(match_operand:SI 1 "general_operand"	   "rk, I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
+	(match_operand:SI 1 "general_operand"	   "rk, I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
   "TARGET_ARM && TARGET_HARD_FLOAT
    && (   s_register_operand (operands[0], SImode)
        || s_register_operand (operands[1], SImode))"

diff --git a/gcc/cse.c b/gcc/cse.c
index 4b7cbdce600e9d0e1d4768c17a99381c76e1cef1..51e0599d2a34b19a2a8b71780e47a25027afea1c 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -3171,6 +3171,36 @@ fold_rtx (rtx x, rtx_insn *insn)
       if (NO_FUNCTION_CSE && CONSTANT_P (XEXP (XEXP (x, 0), 0)))
	return x;
       break;
 
+    case VEC_SELECT:
+      {
+	rtx trueop0 = XEXP (x, 0);
+	mode = GET_MODE (trueop0);
+	rtx trueop1 = XEXP (x, 1);
+	int nunits;
+	/* If we select a low-part subreg, return that.  */
+	if (GET_MODE_NUNITS (mode).is_constant (&nunits)
+	    && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
+	  {
+	    int flag = 0;
+	    int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
+
+	    for (int i = 0; i < XVECLEN (trueop1, 0); i++)
+	      {
+		if (i + offset != INTVAL (XVECEXP (trueop1, 0, i)))
+		  {
+		    flag = 1;
+		    break;
+		  }
+	      }
+
+	    if (flag == 0)
+	      {
+		rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
+		if (new_rtx != NULL_RTX)
+		  return new_rtx;
+	      }
+	  }
+      }
 
     /* Anything else goes through the loop below.  */
    default:

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index c82101c73a46e300bc65eb2104a2205433ff5d24..3b41588932e0801fd379e9aa36fa5b094b33d15e 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4201,6 +4201,34 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
	  return trueop0;
	}
 
+      /* If we select a low-part subreg, return that.  */
+      int nunits;
+      if (GET_MODE_NUNITS (GET_MODE (trueop0)).is_constant (&nunits)
+	  && targetm.can_change_mode_class (GET_MODE (trueop0), mode,
+					    ALL_REGS))
+	{
+	  int flag = 0;
+	  int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0)
+					: 0;
+
+	  for (int i = 0; i < XVECLEN (trueop1, 0); i++)
+	    {
+	      if (i + offset != INTVAL (XVECEXP (trueop1, 0, i)))
+		{
+		  flag = 1;
+		  break;
+		}
+	    }
+
+	  if (flag == 0)
+	    {
+	      rtx new_rtx = lowpart_subreg (mode, trueop0,
+					    GET_MODE (trueop0));
+	      if (new_rtx != NULL_RTX)
+		return new_rtx;
+	    }
+	}
+
       /* If we build {a,b} then permute it, build the result directly.  */
       if (XVECLEN (trueop1, 0) == 2
	  && CONST_INT_P (XVECEXP (trueop1, 0, 0))

diff --git a/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c
index 0209305cd55b0b62b794f790a1cc3606fcc7a44b..193b945b41ad821da6d1112ffae79ca463b4a5e4 100644
--- a/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c
+++ b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c
@@ -70,12 +70,3 @@ foo_siv4hi (siv4hi a)
 
 /* { dg-final { scan-assembler-times "umov\\t" 8 } } */
 /* { dg-final { scan-assembler-not "and\\t" } } */
-
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8qi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8hi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv4hi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv16qi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8qi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8hi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv4hi" "final" } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c
index db79d5355bc925098555788c0dd09c99029576c7..9ef001eb3bad40ea09008d1d79b2211ff81f911a 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c
@@ -72,5 +72,5 @@ main (void)
   set_and_test_case3 ();
   return 0;
 }
-/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[1\\\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c
index 3f8303c574ff40967c5b9ce5a152d70c4a11a9dc..232ade910472bf2ea3aa182f4216f55c8403b45b 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c
@@ -58,5 +58,5 @@ main (void)
   set_and_test_case3 ();
   return 0;
 }
-/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[1\\\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c
index 124dcd8c4ec187b38ffb03606fad4121d9280451..37aa0ec270c29d998973ef37acd4d06470caf1f1 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c
@@ -57,5 +57,5 @@ main (void)
   set_and_test_case3 ();
   return 0;
 }
-/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[1\\\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c
index 255f0968822ffee7f3429c5997b02e3fcfca68f3..c9f2484975a66afd7d69e7fc1d9ea023a655a4d6 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c
@@ -79,7 +79,7 @@ main (void)
   set_and_test_case3 ();
   return 0;
 }
-/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[1\\\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[2\\\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[3\\\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c
index 21ae724cf0ede2378cc21a2b151e948ddb198137..6b96d1cbf0fa0de7c79811abcce25990867549ab 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c
@@ -11,4 +11,4 @@ t_vqdmlalh_lane_s16 (int32_t a, int16_t b, int16x4_t c)
   return vqdmlalh_lane_s16 (a, b, c, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c
index 79db7b73de07000c4a0546c2afa5e3b27584ebe9..a780ddbe2f90a0750497448ed05f0be61bd173c0 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c
@@ -11,4 +11,4 @@ t_vqdmlals_lane_s32 (int64_t a, int32_t b, int32x2_t c)
   return vqdmlals_lane_s32 (a, b, c, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c
index 185507b9362527b842d6f0f07934e19f77e61c97..8bbac1a3c59f60844fb75aeec57adf1b8b830d2a 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c
@@ -11,4 +11,4 @@ t_vqdmlslh_lane_s16 (int32_t a, int16_t b, int16x4_t c)
   return vqdmlslh_lane_s16 (a, b, c, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c
index f692923850e959946c7113b5b60bcef052938b75..069ba918d5bbae20bda5fa6b3c23e41dd8068b40 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c
@@ -11,4 +11,4 @@ t_vqdmlsls_lane_s32 (int64_t a, int32_t b, int32x2_t c)
   return vqdmlsls_lane_s32 (a, b, c, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c
index debf191abc71429cb26e1478ca837cc7734760d2..fcd496b1aaa773204053bec6a0d3b764a71fcf63 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c
@@ -11,4 +11,4 @@ t_vqdmullh_lane_s16 (int16_t a, int16x4_t b)
   return vqdmullh_lane_s16 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c
index e810c4713bcc66f3e8aa04cba9304325d7e62a25..db77fff27f3ec4838f9e2d06f0d9cede495dedac 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c
@@ -11,4 +11,4 @@ t_vqdmullh_laneq_s16 (int16_t a, int16x8_t b)
   return vqdmullh_laneq_s16 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c
index a5fe60fbe16983bef97c688948743b2052109e96..04bbe7f9daf19b93ef48779452ff03898cc62c19 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c
@@ -11,4 +11,4 @@ t_vqdmulls_lane_s32 (int32_t a, int32x2_t b)
   return vqdmulls_lane_s32 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c
index bd856d8e71fb1210ecec46f116c47645bbdef4e4..e8e236894fbb7d029995dcb7f9938c4f0c4511f2 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c
@@ -11,4 +11,4 @@ t_vqdmulls_laneq_s32 (int32_t a, int32x4_t b)
   return vqdmulls_laneq_s32 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c b/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c
index 532847bb7e50095217988fbd66e9c58a006fdfc7..14c1f5ab4c2de84bf923eae5ae26e1bdd81cd6ef 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c
@@ -56,15 +56,27 @@ TEST_ALL (DUP_LANE)
 
 /* { dg-final { scan-assembler-not {\ttbl\t} } } */
 
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[0\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, d[0-9]} 2 {
+ target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[0\]} 2 {
+ target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[2\]} 2 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[3\]} 2 } } */
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[0\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, s[0-9]} 2 {
+ target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[0\]} 2 {
+ target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[5\]} 2 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[7\]} 2 } } */
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[0\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, h[0-9]} 2 {
+ target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[0\]} 2 {
+ target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[6\]} 2 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[15\]} 2 } } */
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[0\]} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.b, b[0-9]} 1 {
+ target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[0\]} 1 {
+ target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[19\]} 1 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[31\]} 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
index e8d92ec7e9f57a4f2d1c2fd8b259a41d87eb03c3..80ee176d1807bf628ad47551d69ff5d84deda79e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
@@ -32,10 +32,9 @@ TEST_ALL (EXTRACT_LAST)
 /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].s, } 4 } } */
 /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].d, } 4 } } */
 
-/* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tlastb\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tlastb\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tlastb\tx[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tlastb\th[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tlastb\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tlastb\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
index 2a5aa63f4572a666e50d7825c8820d49eb9cd70e..a92e1d47393ac1e6d5d39d967787c4a88f16d0f9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
@@ -8,7 +8,7 @@
 float16_t
 foo (float16x8_t a)
 {
-  return vgetq_lane_f16 (a, 0);
+  return vgetq_lane_f16 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16" } } */
@@ -16,7 +16,7 @@ foo (float16x8_t a)
 float16_t
 foo1 (float16x8_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16" } } */

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
index f1839cccffe1c34478f2372cd20b47761357b142..98319eff5c0f5825edd3563b8fa018a437fa3458 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
@@ -8,7 +8,7 @@
 float32_t
 foo (float32x4_t a)
 {
-  return vgetq_lane_f32 (a, 0);
+  return vgetq_lane_f32 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */
@@ -16,7 +16,7 @@ foo (float32x4_t a)
 float32_t
 foo1 (float32x4_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
index ed1c2178839568dcc3eea3342606ba8eff57ea72..c9eefeb9972eaac8168218b5c10c5efaa2e59fce 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
@@ -8,7 +8,7 @@
 int16_t
 foo (int16x8_t a)
 {
-  return vgetq_lane_s16 (a, 0);
+  return vgetq_lane_s16 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s16" } } */
@@ -16,7 +16,7 @@ foo (int16x8_t a)
 int16_t
 foo1 (int16x8_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s16" } } */

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
index c87ed93e70def5bbf6b1055d99656f7386f97ea8..0925a25bb45df9708d46038b5f534a02a2d6dbbb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
@@ -8,7 +8,7 @@
 int32_t
 foo (int32x4_t a)
 {
-  return vgetq_lane_s32 (a, 0);
+  return vgetq_lane_s32 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */
@@ -16,7 +16,7 @@ foo (int32x4_t a)
 int32_t
 foo1 (int32x4_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
index 11242ff3bc090a11bf7f8f163f0348824158bed7..5b76e3da5562fb8e2a2a49de851bed3329bc6ea0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
@@ -8,7 +8,7 @@
 int8_t
 foo (int8x16_t a)
 {
-  return vgetq_lane_s8 (a, 0);
+  return vgetq_lane_s8 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s8" } } */
@@ -16,7 +16,7 @@ foo (int8x16_t a)
 int8_t
 foo1 (int8x16_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s8" } } */

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
index 2788b585535c46a3271be65849b1ba058df1adcf..c4a3fb0d3794c67a789c3c479fa7ca6415da35c4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
@@ -8,7 +8,7 @@
 uint16_t
 foo (uint16x8_t a)
 {
-  return vgetq_lane_u16 (a, 0);
+  return vgetq_lane_u16 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16" } } */
@@ -16,7 +16,7 @@ foo (uint16x8_t a)
 uint16_t
 foo1 (uint16x8_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16" } } */

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
index 721c5a5ffd77cd1ad038d44f32fa197fe2687311..d79837023248e84d4c30774afc07e243edc8ba65 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
@@ -8,7 +8,7 @@
 uint32_t
 foo (uint32x4_t a)
 {
-  return vgetq_lane_u32 (a, 0);
+  return vgetq_lane_u32 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */
@@ -16,7 +16,7 @@ foo (uint32x4_t a)
 uint32_t
 foo1 (uint32x4_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
index 2bcaeac3fe1f5775f448d7f702ea139726fadcc3..631d995dc17f99c7a30cb9cbf56883f818fa2b1d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
@@ -8,7 +8,7 @@
 uint8_t
 foo (uint8x16_t a)
 {
-  return vgetq_lane_u8 (a, 0);
+  return vgetq_lane_u8 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u8" } } */
@@ -16,7 +16,7 @@ foo (uint8x16_t a)
 uint8_t
 foo1 (uint8x16_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u8" } } */

^ permalink raw reply	[flat|nested] 6+ messages in thread
* [PATCH V2] gcc: Add vec_select -> subreg RTL simplification
  2021-07-02  9:53 [PATCH] gcc: Add vec_select -> subreg RTL simplification Jonathan Wright
@ 2021-07-07 13:35 ` Jonathan Wright
  2021-07-12 15:30   ` Richard Sandiford
  0 siblings, 1 reply; 6+ messages in thread

From: Jonathan Wright @ 2021-07-07 13:35 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Sandiford, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 6525 bytes --]

Hi,

Version 2 of this patch adds more code generation tests to show the
benefit of this RTL simplification as well as adding a new helper
function 'rtx_vec_series_p' to reduce code duplication.

Patch tested as version 1 - ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-08  Jonathan Wright  <jonathan.wright@arm.com>

	* combine.c (combine_simplify_rtx): Add vec_select -> subreg
	simplification.
	* config/aarch64/aarch64.md
	(*zero_extend<SHORT:mode><GPI:mode>2_aarch64): Add Neon to
	general purpose register case for zero-extend pattern.
	* config/arm/vfp.md (*arm_movsi_vfp): Remove "*" from *t -> r
	case to prevent some cases opting to go through memory.
	* cse.c (fold_rtx): Add vec_select -> subreg simplification.
	* rtl.c (rtx_vec_series_p): Define helper function to
	determine whether RTX vector-selection indices are in series.
	* rtl.h (rtx_vec_series_p): Define.
	* simplify-rtx.c (simplify_context::simplify_binary_operation_1):
	Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/extract_zero_extend.c: Remove dump scan
	for RTL pattern match.
	* gcc.target/aarch64/narrow_high_combine.c: Add new tests.
	* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Update
	scan-assembler regex to look for a scalar register instead of
	lane 0 of a vector.
	* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_laneq_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_laneq_s32.c: Likewise.
	* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
	* gcc.target/aarch64/sve/live_1.c: Update scan-assembler regex
	cases to look for 'b' and 'h' registers instead of 'w'.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: Extract lane
	1 as the moves for lane 0 now get optimized away.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.

From: Jonathan Wright
Sent: 02 July 2021 10:53
To: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: [PATCH] gcc: Add vec_select -> subreg RTL simplification

Hi,

As subject, this patch adds a new RTL simplification for the case of a
VEC_SELECT selecting the low part of a vector. The simplification
returns a SUBREG.

The primary goal of this patch is to enable better combinations of
Neon RTL patterns - specifically allowing generation of
'write-to-high-half' narrowing instructions.

Adding this RTL simplification means that the expected results for a
number of tests need to be updated:
* aarch64 Neon: Update the scan-assembler regex for intrinsics tests
  to expect a scalar register instead of lane 0 of a vector.
* aarch64 SVE: Likewise.
* arm MVE: Use lane 1 instead of lane 0 for lane-extraction intrinsics
  tests (as the move instructions get optimized away for lane 0.)

Regression tested and bootstrapped on aarch64-none-linux-gnu,
x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
aarch64_be-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-08  Jonathan Wright  <jonathan.wright@arm.com>

	* combine.c (combine_simplify_rtx): Add vec_select -> subreg
	simplification.
	* config/aarch64/aarch64.md
	(*zero_extend<SHORT:mode><GPI:mode>2_aarch64): Add Neon to
	general purpose register case for zero-extend pattern.
	* config/arm/vfp.md (*arm_movsi_vfp): Remove "*" from *t -> r
	case to prevent some cases opting to go through memory.
	* cse.c (fold_rtx): Add vec_select -> subreg simplification.
	* simplify-rtx.c (simplify_context::simplify_binary_operation_1):
	Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/extract_zero_extend.c: Remove dump scan
	for RTL pattern match.
	* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Update
	scan-assembler regex to look for a scalar register instead of
	lane 0 of a vector.
	* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_laneq_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_laneq_s32.c: Likewise.
	* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
	* gcc.target/aarch64/sve/live_1.c: Update scan-assembler regex
	cases to look for 'b' and 'h' registers instead of 'w'.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: Extract lane
	1 as the moves for lane 0 now get optimized away.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.

[-- Attachment #2: rb14526.patch --]
[-- Type: application/octet-stream, Size: 43047 bytes --]

diff --git a/gcc/combine.c b/gcc/combine.c
index 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
 			      - 1, 0));
       break;
 
+    case VEC_SELECT:
+      {
+	rtx trueop0 = XEXP (x, 0);
+	mode = GET_MODE (trueop0);
+	rtx trueop1 = XEXP (x, 1);
+	int nunits;
+	/* If we select a low-part subreg, return that.  */
+	if (GET_MODE_NUNITS (mode).is_constant (&nunits)
+	    && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
+	  {
+	    int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
+
+	    if (rtx_vec_series_p (trueop1, offset))
+	      {
+		rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
+		if (new_rtx != NULL_RTX)
+		  return new_rtx;
+	      }
+	  }
+      }
+
     default:
       break;
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1884,15 +1884,16 @@
 )
 
 (define_insn "*zero_extend<SHORT:mode><GPI:mode>2_aarch64"
-  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
-        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m")))]
+  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
+        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m,w")))]
   ""
   "@
    and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
    ldr<SHORT:size>\t%w0, %1
-   ldr\t%<SHORT:size>0, %1"
-  [(set_attr "type" "logic_imm,load_4,f_loads")
-   (set_attr "arch" "*,*,fp")]
+   ldr\t%<SHORT:size>0, %1
+   umov\t%w0, %1.<SHORT:size>[0]"
+  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
+   (set_attr "arch" "*,*,fp,fp")]
 )
 
(define_expand "<optab>qihi2"

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -224,7 +224,7 @@
 ;; problems because small constants get converted into adds.
(define_insn "*arm_movsi_vfp" [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m ,*t,r,*t,*t, *Uv") - (match_operand:SI 1 "general_operand" "rk, I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))] + (match_operand:SI 1 "general_operand" "rk, I,K,j,mi,rk,r,t,*t,*Uvi,*t"))] "TARGET_ARM && TARGET_HARD_FLOAT && ( s_register_operand (operands[0], SImode) || s_register_operand (operands[1], SImode))" diff --git a/gcc/cse.c b/gcc/cse.c index 4b7cbdce600e9d0e1d4768c17a99381c76e1cef1..053c9dcc1566d3dcb4f75e22716368342d9ec75a 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -3171,6 +3171,26 @@ fold_rtx (rtx x, rtx_insn *insn) if (NO_FUNCTION_CSE && CONSTANT_P (XEXP (XEXP (x, 0), 0))) return x; break; + case VEC_SELECT: + { + rtx trueop0 = XEXP (x, 0); + mode = GET_MODE (trueop0); + rtx trueop1 = XEXP (x, 1); + int nunits; + /* If we select a low-part subreg, return that. */ + if (GET_MODE_NUNITS (mode).is_constant (&nunits) + && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS)) + { + int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0; + + if (rtx_vec_series_p (trueop1, offset)) + { + rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode); + if (new_rtx != NULL_RTX) + return new_rtx; + } + } + } /* Anything else goes through the loop below. */ default: diff --git a/gcc/rtl.h b/gcc/rtl.h index 5ed0d6dd6fa6356f283f1ca9c3b029b8d22aa4f7..abd2d5a4a9392e883cd15cd1dd7abae2a136acd9 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -2996,6 +2996,7 @@ extern unsigned int rtx_size (const_rtx); extern rtx shallow_copy_rtx (const_rtx CXX_MEM_STAT_INFO); extern int rtx_equal_p (const_rtx, const_rtx); extern bool rtvec_all_equal_p (const_rtvec); +extern bool rtx_vec_series_p (const_rtx, int); /* Return true if X is a vector constant with a duplicated element value. 
*/ diff --git a/gcc/rtl.c b/gcc/rtl.c index aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa 100644 --- a/gcc/rtl.c +++ b/gcc/rtl.c @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_rtvec vec) } } +/* Return true if element-selection indices in VEC are in series. */ + +bool +rtx_vec_series_p (const_rtx vec, int start) +{ + for (int i = 0; i < XVECLEN (vec, 0); i++) + { + if (i + start != INTVAL (XVECEXP (vec, 0, i))) + return false; + } + return true; +} + /* Return an indication of which type of insn should have X as a body. In generator files, this can be UNKNOWN if the answer is only known at (GCC) runtime. Otherwise the value is CODE_LABEL, INSN, CALL_INSN diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index c82101c73a46e300bc65eb2104a2205433ff5d24..fea88d50c6148b984f52e8d7f60e032c27b3c25d 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -4201,6 +4201,24 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, return trueop0; } + /* If we select a low-part subreg, return that. */ + int nunits; + if (GET_MODE_NUNITS (GET_MODE (trueop0)).is_constant (&nunits) + && targetm.can_change_mode_class (GET_MODE (trueop0), mode, + ALL_REGS)) + { + int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) + : 0; + + if (rtx_vec_series_p (trueop1, offset)) + { + rtx new_rtx = lowpart_subreg (mode, trueop0, + GET_MODE (trueop0)); + if (new_rtx != NULL_RTX) + return new_rtx; + } + } + /* If we build {a,b} then permute it, build the result directly. 
*/ if (XVECLEN (trueop1, 0) == 2 && CONST_INT_P (XVECEXP (trueop1, 0, 0)) diff --git a/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c index 0209305cd55b0b62b794f790a1cc3606fcc7a44b..193b945b41ad821da6d1112ffae79ca463b4a5e4 100644 --- a/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c +++ b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c @@ -70,12 +70,3 @@ foo_siv4hi (siv4hi a) /* { dg-final { scan-assembler-times "umov\\t" 8 } } */ /* { dg-final { scan-assembler-not "and\\t" } } */ - -/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */ -/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8qi" "final" } } */ -/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8hi" "final" } } */ -/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv4hi" "final" } } */ -/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv16qi" "final" } } */ -/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8qi" "final" } } */ -/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8hi" "final" } } */ -/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv4hi" "final" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c b/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c index cf649bda28d4d648c9392d202fcc5660107a11d7..50ecab002a3552d37a5cc0d8921f42f6c3dba195 100644 --- a/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c +++ b/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c @@ -4,122 +4,228 @@ #include <arm_neon.h> -#define TEST_ARITH(name, rettype, rmwtype, intype, fs, rs) \ - rettype test_ ## name ## _ ## fs ## _high_combine \ +#define TEST_1_ARITH(name, rettype, rmwtype, intype, fs, rs) \ + rettype test_1_ ## name ## _ ## fs ## _high_combine \ (rmwtype a, intype b, intype c) \ { \ return vcombine_ ## rs (a, name ## _ ## fs (b, c)); \ } -TEST_ARITH (vaddhn, int8x16_t, int8x8_t, 
int16x8_t, s16, s8) -TEST_ARITH (vaddhn, int16x8_t, int16x4_t, int32x4_t, s32, s16) -TEST_ARITH (vaddhn, int32x4_t, int32x2_t, int64x2_t, s64, s32) -TEST_ARITH (vaddhn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) -TEST_ARITH (vaddhn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) -TEST_ARITH (vaddhn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) - -TEST_ARITH (vraddhn, int8x16_t, int8x8_t, int16x8_t, s16, s8) -TEST_ARITH (vraddhn, int16x8_t, int16x4_t, int32x4_t, s32, s16) -TEST_ARITH (vraddhn, int32x4_t, int32x2_t, int64x2_t, s64, s32) -TEST_ARITH (vraddhn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) -TEST_ARITH (vraddhn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) -TEST_ARITH (vraddhn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) - -TEST_ARITH (vsubhn, int8x16_t, int8x8_t, int16x8_t, s16, s8) -TEST_ARITH (vsubhn, int16x8_t, int16x4_t, int32x4_t, s32, s16) -TEST_ARITH (vsubhn, int32x4_t, int32x2_t, int64x2_t, s64, s32) -TEST_ARITH (vsubhn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) -TEST_ARITH (vsubhn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) -TEST_ARITH (vsubhn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) - -TEST_ARITH (vrsubhn, int8x16_t, int8x8_t, int16x8_t, s16, s8) -TEST_ARITH (vrsubhn, int16x8_t, int16x4_t, int32x4_t, s32, s16) -TEST_ARITH (vrsubhn, int32x4_t, int32x2_t, int64x2_t, s64, s32) -TEST_ARITH (vrsubhn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) -TEST_ARITH (vrsubhn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) -TEST_ARITH (vrsubhn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) - -#define TEST_SHIFT(name, rettype, rmwtype, intype, fs, rs) \ - rettype test_ ## name ## _ ## fs ## _high_combine \ +TEST_1_ARITH (vaddhn, int8x16_t, int8x8_t, int16x8_t, s16, s8) +TEST_1_ARITH (vaddhn, int16x8_t, int16x4_t, int32x4_t, s32, s16) +TEST_1_ARITH (vaddhn, int32x4_t, int32x2_t, int64x2_t, s64, s32) +TEST_1_ARITH (vaddhn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) +TEST_1_ARITH (vaddhn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) +TEST_1_ARITH 
(vaddhn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) + +TEST_1_ARITH (vraddhn, int8x16_t, int8x8_t, int16x8_t, s16, s8) +TEST_1_ARITH (vraddhn, int16x8_t, int16x4_t, int32x4_t, s32, s16) +TEST_1_ARITH (vraddhn, int32x4_t, int32x2_t, int64x2_t, s64, s32) +TEST_1_ARITH (vraddhn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) +TEST_1_ARITH (vraddhn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) +TEST_1_ARITH (vraddhn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) + +TEST_1_ARITH (vsubhn, int8x16_t, int8x8_t, int16x8_t, s16, s8) +TEST_1_ARITH (vsubhn, int16x8_t, int16x4_t, int32x4_t, s32, s16) +TEST_1_ARITH (vsubhn, int32x4_t, int32x2_t, int64x2_t, s64, s32) +TEST_1_ARITH (vsubhn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) +TEST_1_ARITH (vsubhn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) +TEST_1_ARITH (vsubhn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) + +TEST_1_ARITH (vrsubhn, int8x16_t, int8x8_t, int16x8_t, s16, s8) +TEST_1_ARITH (vrsubhn, int16x8_t, int16x4_t, int32x4_t, s32, s16) +TEST_1_ARITH (vrsubhn, int32x4_t, int32x2_t, int64x2_t, s64, s32) +TEST_1_ARITH (vrsubhn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) +TEST_1_ARITH (vrsubhn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) +TEST_1_ARITH (vrsubhn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) + +#define TEST_2_ARITH(name, rettype, intype, fs, rs) \ + rettype test_2_ ## name ## _ ## fs ## _high_combine \ + (intype a, intype b, intype c) \ + { \ + return vcombine_ ## rs (name ## _ ## fs (a, c), \ + name ## _ ## fs (b, c)); \ + } + +TEST_2_ARITH (vaddhn, int8x16_t, int16x8_t, s16, s8) +TEST_2_ARITH (vaddhn, int16x8_t, int32x4_t, s32, s16) +TEST_2_ARITH (vaddhn, int32x4_t, int64x2_t, s64, s32) +TEST_2_ARITH (vaddhn, uint8x16_t, uint16x8_t, u16, u8) +TEST_2_ARITH (vaddhn, uint16x8_t, uint32x4_t, u32, u16) +TEST_2_ARITH (vaddhn, uint32x4_t, uint64x2_t, u64, u32) + +TEST_2_ARITH (vraddhn, int8x16_t, int16x8_t, s16, s8) +TEST_2_ARITH (vraddhn, int16x8_t, int32x4_t, s32, s16) +TEST_2_ARITH (vraddhn, int32x4_t, 
int64x2_t, s64, s32) +TEST_2_ARITH (vraddhn, uint8x16_t, uint16x8_t, u16, u8) +TEST_2_ARITH (vraddhn, uint16x8_t, uint32x4_t, u32, u16) +TEST_2_ARITH (vraddhn, uint32x4_t, uint64x2_t, u64, u32) + +TEST_2_ARITH (vsubhn, int8x16_t, int16x8_t, s16, s8) +TEST_2_ARITH (vsubhn, int16x8_t, int32x4_t, s32, s16) +TEST_2_ARITH (vsubhn, int32x4_t, int64x2_t, s64, s32) +TEST_2_ARITH (vsubhn, uint8x16_t, uint16x8_t, u16, u8) +TEST_2_ARITH (vsubhn, uint16x8_t, uint32x4_t, u32, u16) +TEST_2_ARITH (vsubhn, uint32x4_t, uint64x2_t, u64, u32) + +TEST_2_ARITH (vrsubhn, int8x16_t, int16x8_t, s16, s8) +TEST_2_ARITH (vrsubhn, int16x8_t, int32x4_t, s32, s16) +TEST_2_ARITH (vrsubhn, int32x4_t, int64x2_t, s64, s32) +TEST_2_ARITH (vrsubhn, uint8x16_t, uint16x8_t, u16, u8) +TEST_2_ARITH (vrsubhn, uint16x8_t, uint32x4_t, u32, u16) +TEST_2_ARITH (vrsubhn, uint32x4_t, uint64x2_t, u64, u32) + +#define TEST_1_SHIFT(name, rettype, rmwtype, intype, fs, rs) \ + rettype test_1_ ## name ## _ ## fs ## _high_combine \ (rmwtype a, intype b) \ { \ return vcombine_ ## rs (a, name ## _ ## fs (b, 4)); \ } -TEST_SHIFT (vshrn_n, int8x16_t, int8x8_t, int16x8_t, s16, s8) -TEST_SHIFT (vshrn_n, int16x8_t, int16x4_t, int32x4_t, s32, s16) -TEST_SHIFT (vshrn_n, int32x4_t, int32x2_t, int64x2_t, s64, s32) -TEST_SHIFT (vshrn_n, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) -TEST_SHIFT (vshrn_n, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) -TEST_SHIFT (vshrn_n, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) - -TEST_SHIFT (vrshrn_n, int8x16_t, int8x8_t, int16x8_t, s16, s8) -TEST_SHIFT (vrshrn_n, int16x8_t, int16x4_t, int32x4_t, s32, s16) -TEST_SHIFT (vrshrn_n, int32x4_t, int32x2_t, int64x2_t, s64, s32) -TEST_SHIFT (vrshrn_n, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) -TEST_SHIFT (vrshrn_n, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) -TEST_SHIFT (vrshrn_n, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) - -TEST_SHIFT (vqshrn_n, int8x16_t, int8x8_t, int16x8_t, s16, s8) -TEST_SHIFT (vqshrn_n, int16x8_t, int16x4_t, int32x4_t, 
s32, s16) -TEST_SHIFT (vqshrn_n, int32x4_t, int32x2_t, int64x2_t, s64, s32) -TEST_SHIFT (vqshrn_n, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) -TEST_SHIFT (vqshrn_n, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) -TEST_SHIFT (vqshrn_n, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) - -TEST_SHIFT (vqrshrn_n, int8x16_t, int8x8_t, int16x8_t, s16, s8) -TEST_SHIFT (vqrshrn_n, int16x8_t, int16x4_t, int32x4_t, s32, s16) -TEST_SHIFT (vqrshrn_n, int32x4_t, int32x2_t, int64x2_t, s64, s32) -TEST_SHIFT (vqrshrn_n, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) -TEST_SHIFT (vqrshrn_n, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) -TEST_SHIFT (vqrshrn_n, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) - -TEST_SHIFT (vqshrun_n, uint8x16_t, uint8x8_t, int16x8_t, s16, u8) -TEST_SHIFT (vqshrun_n, uint16x8_t, uint16x4_t, int32x4_t, s32, u16) -TEST_SHIFT (vqshrun_n, uint32x4_t, uint32x2_t, int64x2_t, s64, u32) - -TEST_SHIFT (vqrshrun_n, uint8x16_t, uint8x8_t, int16x8_t, s16, u8) -TEST_SHIFT (vqrshrun_n, uint16x8_t, uint16x4_t, int32x4_t, s32, u16) -TEST_SHIFT (vqrshrun_n, uint32x4_t, uint32x2_t, int64x2_t, s64, u32) - -#define TEST_UNARY(name, rettype, rmwtype, intype, fs, rs) \ - rettype test_ ## name ## _ ## fs ## _high_combine \ +TEST_1_SHIFT (vshrn_n, int8x16_t, int8x8_t, int16x8_t, s16, s8) +TEST_1_SHIFT (vshrn_n, int16x8_t, int16x4_t, int32x4_t, s32, s16) +TEST_1_SHIFT (vshrn_n, int32x4_t, int32x2_t, int64x2_t, s64, s32) +TEST_1_SHIFT (vshrn_n, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) +TEST_1_SHIFT (vshrn_n, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) +TEST_1_SHIFT (vshrn_n, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) + +TEST_1_SHIFT (vrshrn_n, int8x16_t, int8x8_t, int16x8_t, s16, s8) +TEST_1_SHIFT (vrshrn_n, int16x8_t, int16x4_t, int32x4_t, s32, s16) +TEST_1_SHIFT (vrshrn_n, int32x4_t, int32x2_t, int64x2_t, s64, s32) +TEST_1_SHIFT (vrshrn_n, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) +TEST_1_SHIFT (vrshrn_n, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) +TEST_1_SHIFT 
(vrshrn_n, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) + +TEST_1_SHIFT (vqshrn_n, int8x16_t, int8x8_t, int16x8_t, s16, s8) +TEST_1_SHIFT (vqshrn_n, int16x8_t, int16x4_t, int32x4_t, s32, s16) +TEST_1_SHIFT (vqshrn_n, int32x4_t, int32x2_t, int64x2_t, s64, s32) +TEST_1_SHIFT (vqshrn_n, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) +TEST_1_SHIFT (vqshrn_n, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) +TEST_1_SHIFT (vqshrn_n, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) + +TEST_1_SHIFT (vqrshrn_n, int8x16_t, int8x8_t, int16x8_t, s16, s8) +TEST_1_SHIFT (vqrshrn_n, int16x8_t, int16x4_t, int32x4_t, s32, s16) +TEST_1_SHIFT (vqrshrn_n, int32x4_t, int32x2_t, int64x2_t, s64, s32) +TEST_1_SHIFT (vqrshrn_n, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) +TEST_1_SHIFT (vqrshrn_n, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) +TEST_1_SHIFT (vqrshrn_n, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) + +TEST_1_SHIFT (vqshrun_n, uint8x16_t, uint8x8_t, int16x8_t, s16, u8) +TEST_1_SHIFT (vqshrun_n, uint16x8_t, uint16x4_t, int32x4_t, s32, u16) +TEST_1_SHIFT (vqshrun_n, uint32x4_t, uint32x2_t, int64x2_t, s64, u32) + +TEST_1_SHIFT (vqrshrun_n, uint8x16_t, uint8x8_t, int16x8_t, s16, u8) +TEST_1_SHIFT (vqrshrun_n, uint16x8_t, uint16x4_t, int32x4_t, s32, u16) +TEST_1_SHIFT (vqrshrun_n, uint32x4_t, uint32x2_t, int64x2_t, s64, u32) + +#define TEST_2_SHIFT(name, rettype, intype, fs, rs) \ + rettype test_2_ ## name ## _ ## fs ## _high_combine \ + (intype a, intype b) \ + { \ + return vcombine_ ## rs (name ## _ ## fs (a, 4), \ + name ## _ ## fs (b, 4)); \ + } + +TEST_2_SHIFT (vshrn_n, int8x16_t, int16x8_t, s16, s8) +TEST_2_SHIFT (vshrn_n, int16x8_t, int32x4_t, s32, s16) +TEST_2_SHIFT (vshrn_n, int32x4_t, int64x2_t, s64, s32) +TEST_2_SHIFT (vshrn_n, uint8x16_t, uint16x8_t, u16, u8) +TEST_2_SHIFT (vshrn_n, uint16x8_t, uint32x4_t, u32, u16) +TEST_2_SHIFT (vshrn_n, uint32x4_t, uint64x2_t, u64, u32) + +TEST_2_SHIFT (vrshrn_n, int8x16_t, int16x8_t, s16, s8) +TEST_2_SHIFT (vrshrn_n, int16x8_t, int32x4_t, 
s32, s16) +TEST_2_SHIFT (vrshrn_n, int32x4_t, int64x2_t, s64, s32) +TEST_2_SHIFT (vrshrn_n, uint8x16_t, uint16x8_t, u16, u8) +TEST_2_SHIFT (vrshrn_n, uint16x8_t, uint32x4_t, u32, u16) +TEST_2_SHIFT (vrshrn_n, uint32x4_t, uint64x2_t, u64, u32) + +TEST_2_SHIFT (vqshrn_n, int8x16_t, int16x8_t, s16, s8) +TEST_2_SHIFT (vqshrn_n, int16x8_t, int32x4_t, s32, s16) +TEST_2_SHIFT (vqshrn_n, int32x4_t, int64x2_t, s64, s32) +TEST_2_SHIFT (vqshrn_n, uint8x16_t, uint16x8_t, u16, u8) +TEST_2_SHIFT (vqshrn_n, uint16x8_t, uint32x4_t, u32, u16) +TEST_2_SHIFT (vqshrn_n, uint32x4_t, uint64x2_t, u64, u32) + +TEST_2_SHIFT (vqrshrn_n, int8x16_t, int16x8_t, s16, s8) +TEST_2_SHIFT (vqrshrn_n, int16x8_t, int32x4_t, s32, s16) +TEST_2_SHIFT (vqrshrn_n, int32x4_t, int64x2_t, s64, s32) +TEST_2_SHIFT (vqrshrn_n, uint8x16_t, uint16x8_t, u16, u8) +TEST_2_SHIFT (vqrshrn_n, uint16x8_t, uint32x4_t, u32, u16) +TEST_2_SHIFT (vqrshrn_n, uint32x4_t, uint64x2_t, u64, u32) + +TEST_2_SHIFT (vqshrun_n, uint8x16_t, int16x8_t, s16, u8) +TEST_2_SHIFT (vqshrun_n, uint16x8_t, int32x4_t, s32, u16) +TEST_2_SHIFT (vqshrun_n, uint32x4_t, int64x2_t, s64, u32) + +TEST_2_SHIFT (vqrshrun_n, uint8x16_t, int16x8_t, s16, u8) +TEST_2_SHIFT (vqrshrun_n, uint16x8_t, int32x4_t, s32, u16) +TEST_2_SHIFT (vqrshrun_n, uint32x4_t, int64x2_t, s64, u32) + +#define TEST_1_UNARY(name, rettype, rmwtype, intype, fs, rs) \ + rettype test_1_ ## name ## _ ## fs ## _high_combine \ (rmwtype a, intype b) \ { \ return vcombine_ ## rs (a, name ## _ ## fs (b)); \ } -TEST_UNARY (vmovn, int8x16_t, int8x8_t, int16x8_t, s16, s8) -TEST_UNARY (vmovn, int16x8_t, int16x4_t, int32x4_t, s32, s16) -TEST_UNARY (vmovn, int32x4_t, int32x2_t, int64x2_t, s64, s32) -TEST_UNARY (vmovn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) -TEST_UNARY (vmovn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) -TEST_UNARY (vmovn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) - -TEST_UNARY (vqmovn, int8x16_t, int8x8_t, int16x8_t, s16, s8) -TEST_UNARY (vqmovn, int16x8_t, int16x4_t, 
int32x4_t, s32, s16) -TEST_UNARY (vqmovn, int32x4_t, int32x2_t, int64x2_t, s64, s32) -TEST_UNARY (vqmovn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) -TEST_UNARY (vqmovn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) -TEST_UNARY (vqmovn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) - -TEST_UNARY (vqmovun, uint8x16_t, uint8x8_t, int16x8_t, s16, u8) -TEST_UNARY (vqmovun, uint16x8_t, uint16x4_t, int32x4_t, s32, u16) -TEST_UNARY (vqmovun, uint32x4_t, uint32x2_t, int64x2_t, s64, u32) - -/* { dg-final { scan-assembler-times "\\taddhn2\\tv" 6} } */ -/* { dg-final { scan-assembler-times "\\tsubhn2\\tv" 6} } */ -/* { dg-final { scan-assembler-times "\\trsubhn2\\tv" 6} } */ -/* { dg-final { scan-assembler-times "\\traddhn2\\tv" 6} } */ -/* { dg-final { scan-assembler-times "\\trshrn2\\tv" 6} } */ -/* { dg-final { scan-assembler-times "\\tshrn2\\tv" 6} } */ -/* { dg-final { scan-assembler-times "\\tsqshrun2\\tv" 3} } */ -/* { dg-final { scan-assembler-times "\\tsqrshrun2\\tv" 3} } */ -/* { dg-final { scan-assembler-times "\\tsqshrn2\\tv" 3} } */ -/* { dg-final { scan-assembler-times "\\tuqshrn2\\tv" 3} } */ -/* { dg-final { scan-assembler-times "\\tsqrshrn2\\tv" 3} } */ -/* { dg-final { scan-assembler-times "\\tuqrshrn2\\tv" 3} } */ -/* { dg-final { scan-assembler-times "\\txtn2\\tv" 6} } */ -/* { dg-final { scan-assembler-times "\\tuqxtn2\\tv" 3} } */ -/* { dg-final { scan-assembler-times "\\tsqxtn2\\tv" 3} } */ -/* { dg-final { scan-assembler-times "\\tsqxtun2\\tv" 3} } */ +TEST_1_UNARY (vmovn, int8x16_t, int8x8_t, int16x8_t, s16, s8) +TEST_1_UNARY (vmovn, int16x8_t, int16x4_t, int32x4_t, s32, s16) +TEST_1_UNARY (vmovn, int32x4_t, int32x2_t, int64x2_t, s64, s32) +TEST_1_UNARY (vmovn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) +TEST_1_UNARY (vmovn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) +TEST_1_UNARY (vmovn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) + +TEST_1_UNARY (vqmovn, int8x16_t, int8x8_t, int16x8_t, s16, s8) +TEST_1_UNARY (vqmovn, int16x8_t, int16x4_t, 
int32x4_t, s32, s16) +TEST_1_UNARY (vqmovn, int32x4_t, int32x2_t, int64x2_t, s64, s32) +TEST_1_UNARY (vqmovn, uint8x16_t, uint8x8_t, uint16x8_t, u16, u8) +TEST_1_UNARY (vqmovn, uint16x8_t, uint16x4_t, uint32x4_t, u32, u16) +TEST_1_UNARY (vqmovn, uint32x4_t, uint32x2_t, uint64x2_t, u64, u32) + +TEST_1_UNARY (vqmovun, uint8x16_t, uint8x8_t, int16x8_t, s16, u8) +TEST_1_UNARY (vqmovun, uint16x8_t, uint16x4_t, int32x4_t, s32, u16) +TEST_1_UNARY (vqmovun, uint32x4_t, uint32x2_t, int64x2_t, s64, u32) + +#define TEST_2_UNARY(name, rettype, intype, fs, rs) \ + rettype test_2_ ## name ## _ ## fs ## _high_combine \ + (intype a, intype b) \ + { \ + return vcombine_ ## rs (name ## _ ## fs (a), \ + name ## _ ## fs (b)); \ + } + +TEST_2_UNARY (vmovn, int8x16_t, int16x8_t, s16, s8) +TEST_2_UNARY (vmovn, int16x8_t, int32x4_t, s32, s16) +TEST_2_UNARY (vmovn, int32x4_t, int64x2_t, s64, s32) +TEST_2_UNARY (vmovn, uint8x16_t, uint16x8_t, u16, u8) +TEST_2_UNARY (vmovn, uint16x8_t, uint32x4_t, u32, u16) +TEST_2_UNARY (vmovn, uint32x4_t, uint64x2_t, u64, u32) + +TEST_2_UNARY (vqmovn, int8x16_t, int16x8_t, s16, s8) +TEST_2_UNARY (vqmovn, int16x8_t, int32x4_t, s32, s16) +TEST_2_UNARY (vqmovn, int32x4_t, int64x2_t, s64, s32) +TEST_2_UNARY (vqmovn, uint8x16_t, uint16x8_t, u16, u8) +TEST_2_UNARY (vqmovn, uint16x8_t, uint32x4_t, u32, u16) +TEST_2_UNARY (vqmovn, uint32x4_t, uint64x2_t, u64, u32) + +TEST_2_UNARY (vqmovun, uint8x16_t, int16x8_t, s16, u8) +TEST_2_UNARY (vqmovun, uint16x8_t, int32x4_t, s32, u16) +TEST_2_UNARY (vqmovun, uint32x4_t, int64x2_t, s64, u32) + +/* { dg-final { scan-assembler-times "\\taddhn2\\tv" 12} } */ +/* { dg-final { scan-assembler-times "\\tsubhn2\\tv" 12} } */ +/* { dg-final { scan-assembler-times "\\trsubhn2\\tv" 12} } */ +/* { dg-final { scan-assembler-times "\\traddhn2\\tv" 12} } */ +/* { dg-final { scan-assembler-times "\\trshrn2\\tv" 12} } */ +/* { dg-final { scan-assembler-times "\\tshrn2\\tv" 12} } */ +/* { dg-final { scan-assembler-times "\\tsqshrun2\\tv" 6} 
} */ +/* { dg-final { scan-assembler-times "\\tsqrshrun2\\tv" 6} } */ +/* { dg-final { scan-assembler-times "\\tsqshrn2\\tv" 6} } */ +/* { dg-final { scan-assembler-times "\\tuqshrn2\\tv" 6} } */ +/* { dg-final { scan-assembler-times "\\tsqrshrn2\\tv" 6} } */ +/* { dg-final { scan-assembler-times "\\tuqrshrn2\\tv" 6} } */ +/* { dg-final { scan-assembler-times "\\txtn2\\tv" 12} } */ +/* { dg-final { scan-assembler-times "\\tuqxtn2\\tv" 6} } */ +/* { dg-final { scan-assembler-times "\\tsqxtn2\\tv" 6} } */ +/* { dg-final { scan-assembler-times "\\tsqxtun2\\tv" 6} } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c index db79d5355bc925098555788c0dd09c99029576c7..9ef001eb3bad40ea09008d1d79b2211ff81f911a 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c @@ -72,5 +72,5 @@ main (void) set_and_test_case3 (); return 0; } -/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[0\\\]\n" 1 } } */ +/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]\n" 1 } } */ /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[1\\\]\n" 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c index 3f8303c574ff40967c5b9ce5a152d70c4a11a9dc..232ade910472bf2ea3aa182f4216f55c8403b45b 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c @@ -58,5 +58,5 @@ main (void) set_and_test_case3 (); return 0; } -/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[0\\\]\n" 1 } } */ +/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, 
?\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]\n" 1 } } */ /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[1\\\]\n" 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c index 124dcd8c4ec187b38ffb03606fad4121d9280451..37aa0ec270c29d998973ef37acd4d06470caf1f1 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c @@ -57,5 +57,5 @@ main (void) set_and_test_case3 (); return 0; } -/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */ +/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */ /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[1\\\]\n" 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c index 255f0968822ffee7f3429c5997b02e3fcfca68f3..c9f2484975a66afd7d69e7fc1d9ea023a655a4d6 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c @@ -79,7 +79,7 @@ main (void) set_and_test_case3 (); return 0; } -/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */ +/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */ /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[1\\\]\n" 1 } } */ /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[2\\\]\n" 1 } } */ /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, 
?\[vV\]\[0-9\]+\.\[sS\]\\\[3\\\]\n" 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c index 21ae724cf0ede2378cc21a2b151e948ddb198137..6b96d1cbf0fa0de7c79811abcce25990867549ab 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c @@ -11,4 +11,4 @@ t_vqdmlalh_lane_s16 (int32_t a, int16_t b, int16x4_t c) return vqdmlalh_lane_s16 (a, b, c, 0); } -/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */ +/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c index 79db7b73de07000c4a0546c2afa5e3b27584ebe9..a780ddbe2f90a0750497448ed05f0be61bd173c0 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c @@ -11,4 +11,4 @@ t_vqdmlals_lane_s32 (int64_t a, int32_t b, int32x2_t c) return vqdmlals_lane_s32 (a, b, c, 0); } -/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */ +/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c index 185507b9362527b842d6f0f07934e19f77e61c97..8bbac1a3c59f60844fb75aeec57adf1b8b830d2a 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c @@ -11,4 +11,4 @@ t_vqdmlslh_lane_s16 (int32_t a, int16_t b, int16x4_t c) return vqdmlslh_lane_s16 (a, b, c, 0); } -/* { dg-final { scan-assembler-times "sqdmlsl\[ 
\t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c
index f692923850e959946c7113b5b60bcef052938b75..069ba918d5bbae20bda5fa6b3c23e41dd8068b40 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c
@@ -11,4 +11,4 @@ t_vqdmlsls_lane_s32 (int64_t a, int32_t b, int32x2_t c)
   return vqdmlsls_lane_s32 (a, b, c, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c
index debf191abc71429cb26e1478ca837cc7734760d2..fcd496b1aaa773204053bec6a0d3b764a71fcf63 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c
@@ -11,4 +11,4 @@ t_vqdmullh_lane_s16 (int16_t a, int16x4_t b)
   return vqdmullh_lane_s16 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c
index e810c4713bcc66f3e8aa04cba9304325d7e62a25..db77fff27f3ec4838f9e2d06f0d9cede495dedac 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c
@@ -11,4 +11,4 @@ t_vqdmullh_laneq_s16 (int16_t a, int16x8_t b)
   return vqdmullh_laneq_s16 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c
index a5fe60fbe16983bef97c688948743b2052109e96..04bbe7f9daf19b93ef48779452ff03898cc62c19 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c
@@ -11,4 +11,4 @@ t_vqdmulls_lane_s32 (int32_t a, int32x2_t b)
   return vqdmulls_lane_s32 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c
index bd856d8e71fb1210ecec46f116c47645bbdef4e4..e8e236894fbb7d029995dcb7f9938c4f0c4511f2 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c
@@ -11,4 +11,4 @@ t_vqdmulls_laneq_s32 (int32_t a, int32x4_t b)
   return vqdmulls_laneq_s32 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c b/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c
index 532847bb7e50095217988fbd66e9c58a006fdfc7..14c1f5ab4c2de84bf923eae5ae26e1bdd81cd6ef 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c
@@ -56,15 +56,27 @@ TEST_ALL (DUP_LANE)
 
 /* { dg-final { scan-assembler-not {\ttbl\t} } } */
 
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[0\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, d[0-9]} 2 {
+     target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[0\]} 2 {
+     target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[2\]} 2 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[3\]} 2 } } */
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[0\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, s[0-9]} 2 {
+     target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[0\]} 2 {
+     target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[5\]} 2 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[7\]} 2 } } */
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[0\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, h[0-9]} 2 {
+     target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[0\]} 2 {
+     target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[6\]} 2 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[15\]} 2 } } */
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[0\]} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.b, b[0-9]} 1 {
+     target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[0\]} 1 {
+     target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[19\]} 1 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[31\]} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
index e8d92ec7e9f57a4f2d1c2fd8b259a41d87eb03c3..80ee176d1807bf628ad47551d69ff5d84deda79e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
@@ -32,10 +32,9 @@ TEST_ALL (EXTRACT_LAST)
 /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].s, } 4 } } */
 /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].d, } 4 } } */
 
-/* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tlastb\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tlastb\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tlastb\tx[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tlastb\th[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tlastb\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tlastb\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
index 2a5aa63f4572a666e50d7825c8820d49eb9cd70e..a92e1d47393ac1e6d5d39d967787c4a88f16d0f9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
@@ -8,7 +8,7 @@
 float16_t
 foo (float16x8_t a)
 {
-  return vgetq_lane_f16 (a, 0);
+  return vgetq_lane_f16 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16" } } */
@@ -16,7 +16,7 @@ foo (float16x8_t a)
 float16_t
 foo1 (float16x8_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
index f1839cccffe1c34478f2372cd20b47761357b142..98319eff5c0f5825edd3563b8fa018a437fa3458 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
@@ -8,7 +8,7 @@
 float32_t
 foo (float32x4_t a)
 {
-  return vgetq_lane_f32 (a, 0);
+  return vgetq_lane_f32 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */
@@ -16,7 +16,7 @@ foo (float32x4_t a)
 float32_t
 foo1 (float32x4_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
index ed1c2178839568dcc3eea3342606ba8eff57ea72..c9eefeb9972eaac8168218b5c10c5efaa2e59fce 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
@@ -8,7 +8,7 @@
 int16_t
 foo (int16x8_t a)
 {
-  return vgetq_lane_s16 (a, 0);
+  return vgetq_lane_s16 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s16" } } */
@@ -16,7 +16,7 @@ foo (int16x8_t a)
 int16_t
 foo1 (int16x8_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
index c87ed93e70def5bbf6b1055d99656f7386f97ea8..0925a25bb45df9708d46038b5f534a02a2d6dbbb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
@@ -8,7 +8,7 @@
 int32_t
 foo (int32x4_t a)
 {
-  return vgetq_lane_s32 (a, 0);
+  return vgetq_lane_s32 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */
@@ -16,7 +16,7 @@ foo (int32x4_t a)
 int32_t
 foo1 (int32x4_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
index 11242ff3bc090a11bf7f8f163f0348824158bed7..5b76e3da5562fb8e2a2a49de851bed3329bc6ea0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
@@ -8,7 +8,7 @@
 int8_t
 foo (int8x16_t a)
 {
-  return vgetq_lane_s8 (a, 0);
+  return vgetq_lane_s8 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s8" } } */
@@ -16,7 +16,7 @@ foo (int8x16_t a)
 int8_t
 foo1 (int8x16_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s8" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
index 2788b585535c46a3271be65849b1ba058df1adcf..c4a3fb0d3794c67a789c3c479fa7ca6415da35c4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
@@ -8,7 +8,7 @@
 uint16_t
 foo (uint16x8_t a)
 {
-  return vgetq_lane_u16 (a, 0);
+  return vgetq_lane_u16 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16" } } */
@@ -16,7 +16,7 @@ foo (uint16x8_t a)
 uint16_t
 foo1 (uint16x8_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
index 721c5a5ffd77cd1ad038d44f32fa197fe2687311..d79837023248e84d4c30774afc07e243edc8ba65 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
@@ -8,7 +8,7 @@
 uint32_t
 foo (uint32x4_t a)
 {
-  return vgetq_lane_u32 (a, 0);
+  return vgetq_lane_u32 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */
@@ -16,7 +16,7 @@ foo (uint32x4_t a)
 uint32_t
 foo1 (uint32x4_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
index 2bcaeac3fe1f5775f448d7f702ea139726fadcc3..631d995dc17f99c7a30cb9cbf56883f818fa2b1d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
@@ -8,7 +8,7 @@
 uint8_t
 foo (uint8x16_t a)
 {
-  return vgetq_lane_u8 (a, 0);
+  return vgetq_lane_u8 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u8" } } */
@@ -16,7 +16,7 @@ foo (uint8x16_t a)
 uint8_t
 foo1 (uint8x16_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u8" } } */

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification
  2021-07-07 13:35 ` [PATCH V2] " Jonathan Wright
@ 2021-07-12 15:30   ` Richard Sandiford
  2021-07-15  9:09     ` Christophe Lyon
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2021-07-12 15:30 UTC (permalink / raw)
  To: Jonathan Wright; +Cc: gcc-patches, Kyrylo Tkachov

Jonathan Wright <Jonathan.Wright@arm.com> writes:
> Hi,
>
> Version 2 of this patch adds more code generation tests to show the
> benefit of this RTL simplification as well as adding a new helper function
> 'rtx_vec_series_p' to reduce code duplication.
>
> Patch tested as version 1 - ok for master?

Sorry for the slow reply.

> Regression tested and bootstrapped on aarch64-none-linux-gnu,
> x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
> aarch64_be-none-linux-gnu - no issues.

I've also tested this on powerpc64le-unknown-linux-gnu, no issues again.

> diff --git a/gcc/combine.c b/gcc/combine.c
> index 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
>						- 1,
>						0));
>       break;
> +    case VEC_SELECT:
> +      {
> +	rtx trueop0 = XEXP (x, 0);
> +	mode = GET_MODE (trueop0);
> +	rtx trueop1 = XEXP (x, 1);
> +	int nunits;
> +	/* If we select a low-part subreg, return that.  */
> +	if (GET_MODE_NUNITS (mode).is_constant (&nunits)
> +	    && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
> +	  {
> +	    int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
> +
> +	    if (rtx_vec_series_p (trueop1, offset))
> +	      {
> +		rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
> +		if (new_rtx != NULL_RTX)
> +		  return new_rtx;
> +	      }
> +	  }
> +      }

Since this occurs three times, I think it would be worth having
a new predicate:

/* Return true if, for all OP of mode OP_MODE:

     (vec_select:RESULT_MODE OP SEL)

   is equivalent to the lowpart RESULT_MODE of OP.  */

bool
vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel)

containing the GET_MODE_NUNITS (…).is_constant, can_change_mode_class
and rtx_vec_series_p tests.

I think the function belongs in rtlanal.[hc], even though subreg_lowpart_p
is in emit-rtl.c.

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1884,15 +1884,16 @@
>  )
>
>  (define_insn "*zero_extend<SHORT:mode><GPI:mode>2_aarch64"
> -  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
> -        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m")))]
> +  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
> +        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m,w")))]
>    ""
>    "@
>     and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
>     ldr<SHORT:size>\t%w0, %1
> -   ldr\t%<SHORT:size>0, %1"
> -  [(set_attr "type" "logic_imm,load_4,f_loads")
> -   (set_attr "arch" "*,*,fp")]
> +   ldr\t%<SHORT:size>0, %1
> +   umov\t%w0, %1.<SHORT:size>[0]"
> +  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
> +   (set_attr "arch" "*,*,fp,fp")]

FTR (just to show I thought about it): I don't know whether the umov
can really be considered an fp operation rather than a simd operation,
but since we don't support fp without simd, this is already a distinction
without a difference.  So the pattern is IMO OK as-is.

> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -224,7 +224,7 @@
>  ;; problems because small constants get converted into adds.
>  (define_insn "*arm_movsi_vfp"
>    [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m ,*t,r,*t,*t, *Uv")
> -        (match_operand:SI 1 "general_operand" "rk, I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
> +        (match_operand:SI 1 "general_operand" "rk, I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
>    "TARGET_ARM && TARGET_HARD_FLOAT
>     && (   s_register_operand (operands[0], SImode)
>         || s_register_operand (operands[1], SImode))"

I'll assume that an Arm maintainer would have spoken up by now if
they didn't want this for some reason.

> diff --git a/gcc/rtl.c b/gcc/rtl.c
> index aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa 100644
> --- a/gcc/rtl.c
> +++ b/gcc/rtl.c
> @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_rtvec vec)
>      }
>  }
>
> +/* Return true if element-selection indices in VEC are in series.  */
> +
> +bool
> +rtx_vec_series_p (const_rtx vec, int start)

I think rtvec_series_p would be better, for consistency with
rtvec_all_equal_p.  Also, let's generalise it to:

/* Return true if VEC contains a linear series of integers
   { START, START+1, START+2, ... }.  */

bool
rtvec_series_p (rtvec vec, int start)
{
}

> +{
> +  for (int i = 0; i < XVECLEN (vec, 0); i++)
> +    {
> +      if (i + start != INTVAL (XVECEXP (vec, 0, i)))
> +	return false;
> +    }
> +  return true;

With the general definition I think this should be:

  for (int i = 0; i < GET_NUM_ELEM (vec); i++)
    {
      rtx x = RTVEC_ELT (vec, i);
      if (!CONST_INT_P (x) || INTVAL (x) != i + start)
	return false;
    }

Then pass XVEC (sel, 0) to the function, instead of just sel.

OK with those changes, thanks.

Richard
* Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification
  2021-07-12 15:30   ` Richard Sandiford
@ 2021-07-15  9:09     ` Christophe Lyon
  2021-07-15 13:06       ` Jonathan Wright
  0 siblings, 1 reply; 6+ messages in thread
From: Christophe Lyon @ 2021-07-15  9:09 UTC (permalink / raw)
  To: Richard Sandiford, Jonathan Wright, gcc-patches, Kyrylo Tkachov

On Mon, Jul 12, 2021 at 5:31 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
> [snip - full review quoted upthread]

Hi,

Some of the updated tests fail on aarch64_be:
gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 5
gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 5
gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 6
gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 6

Can you check?

Thanks,

Christophe
* Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification
  2021-07-15  9:09     ` Christophe Lyon
@ 2021-07-15 13:06       ` Jonathan Wright
  2021-08-03  9:36         ` Christophe Lyon
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Wright @ 2021-07-15 13:06 UTC (permalink / raw)
  To: Christophe Lyon, Richard Sandiford, gcc-patches, Kyrylo Tkachov

Ah, yes - those test results should have only been changed for little
endian.

I've submitted a patch to the list restoring the original expected
results for big endian.

Thanks,
Jonathan

________________________________
From: Christophe Lyon <christophe.lyon.oss@gmail.com>
Sent: 15 July 2021 10:09
To: Richard Sandiford <Richard.Sandiford@arm.com>; Jonathan Wright
<Jonathan.Wright@arm.com>; gcc-patches@gcc.gnu.org; Kyrylo Tkachov
<Kyrylo.Tkachov@arm.com>
Subject: Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification

[snip - quoted thread, including the full review, trimmed]
* Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification
  2021-07-15 13:06       ` Jonathan Wright
@ 2021-08-03  9:36         ` Christophe Lyon
  0 siblings, 0 replies; 6+ messages in thread
From: Christophe Lyon @ 2021-08-03  9:36 UTC (permalink / raw)
  To: Jonathan Wright; +Cc: Richard Sandiford, gcc-patches, Kyrylo Tkachov

Hi,

Since the arm-linux toolchain build has been fixed, I have noticed
additional failures on armeb:
gcc.target/arm/crypto-vsha1cq_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1cq_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1h_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1h_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1mq_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1mq_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1pq_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1pq_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3

I don't see them mentioned in this thread though?

Can you check?

Thanks

Christophe

On Thu, Jul 15, 2021 at 3:07 PM Jonathan Wright <Jonathan.Wright@arm.com> wrote:
> Ah, yes - those test results should have only been changed for little
> endian. I've submitted a patch to the list restoring the original
> expected results for big endian.
> [snip - remainder of quoted thread trimmed]
end of thread, other threads:[~2021-08-03  9:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-02  9:53 [PATCH] gcc: Add vec_select -> subreg RTL simplification Jonathan Wright
2021-07-07 13:35 ` [PATCH V2] " Jonathan Wright
2021-07-12 15:30   ` Richard Sandiford
2021-07-15  9:09     ` Christophe Lyon
2021-07-15 13:06       ` Jonathan Wright
2021-08-03  9:36         ` Christophe Lyon