From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 2017)
	id E241F3858402; Tue, 12 Sep 2023 20:47:04 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E241F3858402
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1694551624;
	bh=MmmO0DfL+4vPnrP5fWYCn+JG1EvgtssoGVrC9yuMRk0=;
	h=From:To:Subject:Date:From;
	b=K3Teu0aCoLNfg6lh/TBe9i1VGrbe9FPxWB4mQlcPXP7J+5XmjB41zrHWarmNJYmaO
	 R2sn+yc/JmZefbwsHDLdNtoyeQnL2iLkXfeBUXhDVWqIi3iFhdghzje+eji5dNEvfm
	 7BCjR3JyttZqUHZNf5Gi7W+HJmiD3iTE5js7Db7c=
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
From: Robin Dapp
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r14-3911] RISC-V: Enable vec_int testsuite for RVV VLA vectorization
X-Act-Checkin: gcc
X-Git-Author: Juzhe-Zhong
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 701b9309b687ed46188b9caeb7d88ad60b0212e5
X-Git-Newrev: fcf66bceb4670fcd6ed8efef7f64003354e609f1
Message-Id: <20230912204704.E241F3858402@sourceware.org>
Date: Tue, 12 Sep 2023 20:47:04 +0000 (GMT)
List-Id:

https://gcc.gnu.org/g:fcf66bceb4670fcd6ed8efef7f64003354e609f1

commit r14-3911-gfcf66bceb4670fcd6ed8efef7f64003354e609f1
Author: Juzhe-Zhong
Date:   Wed Aug 30 20:05:49 2023 +0800

    RISC-V: Enable vec_int testsuite for RVV VLA vectorization

    This patch is the final version of enabling the vect_int tests for RVV.

    There are still 80+ FAILs, and they can't be fixed by adjusting
    testcases or target-supports.exp.

    Here is the analysis of **ALL** FAILs:

    1. REAL highest priority FAILs:

    ICE:
    FAIL: gcc.dg/vect/vect-live-6.c (internal compiler error: in force_align_down_and_div, at poly-int.h:1903)
    FAIL: gcc.dg/vect/vect-live-6.c (test for excess errors)
    FAIL: gcc.dg/vect/vect-live-6.c -flto -ffat-lto-objects (internal compiler error: in force_align_down_and_div, at poly-int.h:1903)
    FAIL: gcc.dg/vect/vect-live-6.c -flto -ffat-lto-objects (test for excess errors)

    Execution fails:
    FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects execution test
    FAIL: gcc.dg/vect/slp-reduc-7.c execution test
    FAIL: gcc.dg/vect/vect-alias-check-10.c -flto -ffat-lto-objects execution test
    FAIL: gcc.dg/vect/vect-alias-check-10.c execution test
    FAIL: gcc.dg/vect/vect-alias-check-11.c -flto -ffat-lto-objects execution test
    FAIL: gcc.dg/vect/vect-alias-check-11.c execution test
    FAIL: gcc.dg/vect/vect-alias-check-12.c -flto -ffat-lto-objects execution test
    FAIL: gcc.dg/vect/vect-alias-check-12.c execution test
    FAIL: gcc.dg/vect/vect-alias-check-14.c -flto -ffat-lto-objects execution test
    FAIL: gcc.dg/vect/vect-alias-check-14.c execution test
    FAIL: gcc.dg/vect/vect-double-reduc-5.c -flto -ffat-lto-objects execution test
    FAIL: gcc.dg/vect/vect-double-reduc-5.c execution test

    These FAILs are REAL problems that we need to address first.

    2. Missed optimizations due to lacking VLS modes patterns:

    FAIL: gcc.dg/vect/pr57705.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loop" 2
    FAIL: gcc.dg/vect/pr57705.c scan-tree-dump-times vect "vectorized 1 loop" 2
    FAIL: gcc.dg/vect/pr65518.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 0 loops in function" 2
    FAIL: gcc.dg/vect/pr65518.c scan-tree-dump-times vect "vectorized 0 loops in function" 2
    FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 4
    FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4
    FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/slp-16.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-16.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-34-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-34-big-array.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-34.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-34.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-35.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/slp-35.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/slp-43.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 13
    FAIL: gcc.dg/vect/slp-43.c scan-tree-dump-times vect "vectorized 1 loops" 13
    FAIL: gcc.dg/vect/slp-45.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 13
    FAIL: gcc.dg/vect/slp-45.c scan-tree-dump-times vect "vectorized 1 loops" 13
    FAIL: gcc.dg/vect/slp-47.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-47.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-48.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-48.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2

    These testcases need VLS modes vec_init patterns.

    FAIL: gcc.dg/vect/vect-bic-bitmask-12.c -flto -ffat-lto-objects scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
    FAIL: gcc.dg/vect/vect-bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
    FAIL: gcc.dg/vect/vect-bic-bitmask-23.c -flto -ffat-lto-objects scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1, 65535 }"
    FAIL: gcc.dg/vect/vect-bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1, 65535 }"

    These testcases need VLS modes VCOND_MASK and vec_cmp patterns.

    3. Maybe bogus dump check FAILs:

    FAIL: gcc.dg/vect/vect-multitypes-11.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1
    FAIL: gcc.dg/vect/vect-multitypes-11.c scan-tree-dump-times vect "vectorized 1 loops" 1
    FAIL: gcc.dg/vect/vect-outer-4c-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "zero step in outer loop." 1
    FAIL: gcc.dg/vect/vect-outer-4c-big-array.c scan-tree-dump-times vect "zero step in outer loop." 1
    FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-s8a.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-s8a.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-s8a.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-s8a.c scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-s8b.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-s8b.c scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-u16b.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-u16b.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-u8a.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-u8a.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-u8b.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-dot-u8b.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-pattern-1a.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-pattern-1a.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-pattern-1b-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-pattern-1b-big-array.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-pattern-1c-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-pattern-1c-big-array.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-pattern-2a.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-pattern-2a.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-pattern-2b-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
    FAIL: gcc.dg/vect/vect-reduc-pattern-2b-big-array.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
    FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1

    These testcases fail because we don't support widen_sum/vec_unpack, etc.
    patterns.  Currently, we don't support them since we don't see the
    benefits.  Should we support those patterns if they are beneficial, or
    fix the testcases?

    Conclusion:

    IMHO, I think we can merge this patch after we have addressed all REAL
    highest priority issues (1).  The remaining FAILs are not big issues;
    we can reduce them by supporting more features (for example, VLS modes).

    Feel free to give any comments.

    gcc/testsuite/ChangeLog:

            * lib/target-supports.exp: Enable vect_int for RVV.
Diff:
---
 gcc/testsuite/lib/target-supports.exp | 59 ++++++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1080a5cfc443..edaa010258fa 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1836,14 +1836,14 @@ proc check_effective_target_riscv_vector_hw { } {
            asm ("vadd.vv v8,v8,v16" : : : "v8");
            return 0;
        }
-    } "-march=rv32gcv -mabi=ilp32d"] || [check_runtime riscv_vector_hw64 {
+    } ""] || [check_runtime riscv_vector_hw64 {
        int main (void)
        {
            asm ("vsetivli zero,8,e16,m1,ta,ma");
            asm ("vadd.vv v8,v8,v16" : : : "v8");
            return 0;
        }
-    } "-march=rv64gcv -mabi=lp64d"]
+    } ""]
 }
 
 # Return 1 if the we can build a Zvfh vector example with proper -march flags
@@ -3821,6 +3821,7 @@ proc check_effective_target_vect_int { } {
                 || [et-is-effective-target mips_msa]))
             || ([istarget s390*-*-*]
                 && [check_effective_target_s390_vx])
+            || [istarget riscv*-*-*]
        }}]
 }
 
@@ -7535,7 +7536,8 @@ proc check_effective_target_vect_widen_sum_hi_to_si { } {
     return [check_cached_effective_target_indexed vect_widen_sum_hi_to_si {
       expr { [check_effective_target_vect_unpack]
             || [istarget powerpc*-*-*]
-            || [istarget ia64-*-*] }}]
+            || [istarget ia64-*-*]
+            || [istarget riscv*-*-*] }}]
 }
 
 # Return 1 if the target plus current options supports a vector
@@ -7549,7 +7551,8 @@ proc check_effective_target_vect_widen_sum_qi_to_hi { } {
     return [check_cached_effective_target_indexed vect_widen_sum_qi_to_hi {
       expr { [check_effective_target_vect_unpack]
             || [is-effective-target arm_neon]
-            || [istarget ia64-*-*] }}]
+            || [istarget ia64-*-*]
+            || [istarget riscv*-*-*] }}]
 }
 
 # Return 1 if the target plus current options supports a vector
@@ -7559,7 +7562,8 @@ proc check_effective_target_vect_widen_sum_qi_to_hi { } {
 
 proc check_effective_target_vect_widen_sum_qi_to_si { } {
     return [check_cached_effective_target_indexed vect_widen_sum_qi_to_si {
-      expr { [istarget powerpc*-*-*] }}]
+      expr { [istarget powerpc*-*-*]
+            || [istarget riscv*-*-*] }}]
 }
 
 # Return 1 if the target plus current options supports a vector
@@ -7848,7 +7852,8 @@ proc check_effective_target_vect_hw_misalign { } {
             || [istarget aarch64*-*-*]
             || ([istarget mips*-*-*] && [et-is-effective-target mips_msa])
             || ([istarget s390*-*-*]
-                && [check_effective_target_s390_vx]) } {
+                && [check_effective_target_s390_vx])
+            || ([istarget riscv*-*-*]) } {
          return 1
        }
        if { [istarget arm*-*-*]
@@ -7954,7 +7959,8 @@ proc check_effective_target_vect_check_ptrs { } {
 
 proc check_effective_target_vect_fully_masked { } {
     return [expr { [check_effective_target_aarch64_sve]
-                  || [istarget amdgcn*-*-*] }]
+                  || [istarget amdgcn*-*-*]
+                  || [check_effective_target_riscv_vector] }]
 }
 
 # Return true if the target supports the @code{len_load} and
@@ -7962,7 +7968,8 @@ proc check_effective_target_vect_fully_masked { } {
 
 proc check_effective_target_vect_len_load_store { } {
     return [expr { [check_effective_target_has_arch_pwr9]
-                  || [check_effective_target_s390_vx] }]
+                  || [check_effective_target_s390_vx]
+                  || [check_effective_target_riscv_vector] }]
 }
 
 # Return the value of parameter vect-partial-vector-usage specified for
@@ -8023,8 +8030,9 @@ proc check_effective_target_vect_partial_vectors { } {
 # alignment during vectorization.
 
 proc check_effective_target_vect_element_align_preferred { } {
-    return [expr { [check_effective_target_aarch64_sve]
-                  && [check_effective_target_vect_variable_length] }]
+    return [expr { ([check_effective_target_aarch64_sve]
+                   && [check_effective_target_vect_variable_length])
+                  || [check_effective_target_riscv_vector] }]
 }
 
 # Return true if vectorization of v2qi/v4qi/v8qi/v16qi/v2hi store is enabed.
@@ -8429,7 +8437,8 @@ proc check_effective_target_vect_load_lanes { } {
     return [check_cached_effective_target vect_load_lanes {
       expr { ([check_effective_target_arm_little_endian]
              && [check_effective_target_arm_neon_ok])
-            || [istarget aarch64*-*-*] }}]
+            || [istarget aarch64*-*-*]
+            || [istarget riscv*-*-*] }}]
 }
 
 # Return 1 if the target supports vector masked loads.
@@ -8445,7 +8454,8 @@ proc check_effective_target_vect_masked_load { } {
 proc check_effective_target_vect_masked_store { } {
     return [expr { [check_avx_available]
                   || [check_effective_target_aarch64_sve]
-                  || [istarget amdgcn*-*-*] }]
+                  || [istarget amdgcn*-*-*]
+                  || [check_effective_target_riscv_vector] }]
 }
 
 # Return 1 if the target supports vector gather loads via internal functions.
@@ -8525,7 +8535,8 @@ proc check_effective_target_vect_short_mult { } {
                 || [et-is-effective-target mips_loongson_mmi]))
             || ([istarget s390*-*-*]
                 && [check_effective_target_s390_vx])
-            || [istarget amdgcn-*-*] }}]
+            || [istarget amdgcn-*-*]
+            || [istarget riscv*-*-*] }}]
 }
 
 # Return 1 if the target supports vector int multiplication, 0 otherwise.
@@ -8541,7 +8552,8 @@ proc check_effective_target_vect_int_mult { } {
             || [check_effective_target_arm32]
             || ([istarget s390*-*-*]
                 && [check_effective_target_s390_vx])
-            || [istarget amdgcn-*-*] }}]
+            || [istarget amdgcn-*-*]
+            || [istarget riscv*-*-*] }}]
 }
 
 # Return 1 if the target supports 64 bit hardware vector
@@ -8623,6 +8635,9 @@ foreach N {2 3 4 8} {
                     || [istarget aarch64*-*-*]) && N >= 2 && N <= 4 } {
                    return 1
                }
+               if { ([istarget riscv*-*-*]) && N >= 2 && N <= 8 } {
+                   return 1
+               }
                if [check_effective_target_vect_fully_masked] {
                    return 1
                }
@@ -8659,6 +8674,11 @@ proc available_vector_sizes { } {
     } elseif { [istarget amdgcn*-*-*] } {
        # 6 different lane counts, and 4 element sizes
        lappend result 4096 2048 1024 512 256 128 64 32 16 8 4 2
+    } elseif { [istarget riscv*-*-*] } {
+       if { [check_effective_target_riscv_vector] } {
+           lappend result 0 32
+       }
+       lappend result 128
     } else {
        # The traditional default asumption.
        lappend result 128
@@ -11143,6 +11163,17 @@ proc check_vect_support_and_set_flags { } {
        }
     } elseif [istarget amdgcn-*-*] {
        set dg-do-what-default run
+    } elseif [istarget riscv64-*-*] {
+       if [check_effective_target_riscv_vector_hw] {
+           lappend DEFAULT_VECTCFLAGS "--param" "riscv-autovec-preference=scalable"
+           lappend DEFAULT_VECTCFLAGS "--param" "riscv-vector-abi"
+           set dg-do-what-default run
+       } else {
+           lappend DEFAULT_VECTCFLAGS "-march=rv64gcv_zvfh" "-mabi=lp64d"
+           lappend DEFAULT_VECTCFLAGS "--param" "riscv-autovec-preference=scalable"
+           lappend DEFAULT_VECTCFLAGS "--param" "riscv-vector-abi"
+           set dg-do-what-default compile
+       }
     } else {
         return 0
     }