From: Patrick O'Neill
To: "juzhe.zhong@rivai.ai", gcc-patches
Cc: "kito.cheng", "Kito.cheng", jeffreyalaw, Robin Dapp
Date: Tue, 24 Oct 2023 08:03:22 -0700
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
Message-ID: <0f93e039-9bd3-07f1-2d48-9b4a13efe99b@rivosinc.com>
In-Reply-To: <6e22f033-887a-3bf1-316a-5e5ec69a3434@rivosinc.com>

I'm seeing a variety of new failures, constrained to rv32gcv.

Tested using newlib/linux:
rv32gcv/ ilp32d/ medlow
rv64gcv/  lp64d/ medlow
rv64gcv_zvbb_zvbc_zvkg_zvkn_zvknc_zvkned_zvkng_zvknha_zvknhb_zvks_zvksc_zvksed_zvksg_zvksh_zvkt/ lp64d/ medlow
rv64imafdcv_zicond_zawrs_zbc_zvkng_zvksg_zvbb_zvbc_zicsr_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt/ lp64d/ medlow

Newlib failures:
rv32gcv:
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test

Debug logs for testcases other than pr110557.cc
look like this:

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o ./popcount-run-1.exe (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o ./popcount-run-1.exe
PASS: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c (test for excess errors)
spawn riscv64-unknown-elf-run ./popcount-run-1.exe
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

Debug log for pr110557.cc:

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs -lm -o ./pr110557.exe (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++14 (test for excess errors)
spawn riscv64-unknown-elf-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-newlib/../scripts/wrapper/qemu/riscv64-unknown-elf-run: line 15: 3449805 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test

Linux failures:
rv32gcv:
FAIL: gcc.dg/nextafter-2.c execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
FAIL: gfortran.dg/default_format_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_2.f90   -Os  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -Os  execution test
FAIL: gfortran.dg/large_real_kind_2.F90   -O0  execution test
FAIL: gfortran.dg/round_4.f90   -O0  execution test
FAIL: gfortran.dg/zero_sized_3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test
FAIL: gfortran.dg/ieee/large_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O1  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O2  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -Os  execution test
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_sum.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Some (not all) debug log outputs:

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
PASS: gfortran.fortran-torture/execute/intrinsic_count.f90 compilation, -O2 -fomit-frame-pointer -finline-functions
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
STOP 2
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution, -O2 -fomit-frame-pointer -finline-functions

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -funroll-loops -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -funroll-loops -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation, -O2 -fomit-frame-pointer -finline-functions -funroll-loops
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
STOP 3
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution, -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o ./large_2.exe (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o ./large_2.exe
PASS: gfortran.dg/ieee/large_2.f90 -O0 (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./large_2.exe
0.333333333333333333333333333333333317
2.24271998593667819112500193394291495E+1644
STOP 1
FAIL: gfortran.dg/ieee/large_2.f90 -O0 execution test

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm -o ./pr110557.exe (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc -std=c++98 (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 323485 Trace/breakpoint trap (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc -std=c++98 execution test

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe
PASS: gcc.dg/vect/vect-reduc-dot-21.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-reduc-dot-21.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3484803 Aborted (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-alias-check-16.exe (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-alias-check-16.exe
PASS: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-alias-check-16.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3431975 Aborted (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "flags: *RAW\\n"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "using an address-based overlap test"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump-not vect "using an index-based"

I've observed nextafter-2.c being flaky on the CI, so that particular
failure might not be real. If you want any particular testcase's debug logs please let me know. Patrick On 10/23/23 21:30, Patrick O'Neill wrote: > > The CI just picked it up: > https://github.com/ewlu/gcc-precommit-ci/issues/449#issue-1958483272 > Since it doesn't apply to the CI's baseline hash it's only performing > a build. > I'll re-run it in the morning once the baseline has been updated. > > In the meantime I started a full build+test run on my local machine. > I'll send you the results in ~10 hours - morning my time :-) > > Patrick > > On 10/23/23 20:44, juzhe.zhong@rivai.ai wrote: >> CCing Patrick... >> >> Hi, @Patrick. >> Could you apply this patch and trigger your regression CI? >> >> I don't have an environment to test Fortran for now (I only test it >> on C/C++). >> >> Thanks. >> >> ------------------------------------------------------------------------ >> juzhe.zhong@rivai.ai >> >> *From:* Juzhe-Zhong >> *Date:* 2023-10-24 11:32 >> *To:* gcc-patches >> *CC:* kito.cheng ; kito.cheng >> ; jeffreyalaw >> ; rdapp.gcc >> ; Juzhe-Zhong >> >> *Subject:* [PATCH] RISC-V: Add AVL propagation PASS for RVV >> auto-vectorization >> This patch addresses the redundant AVL/VL toggling in RVV partial >> auto-vectorization, >> which has been a known issue for a long time; I finally found the >> time to address it. >> Consider a simple vector addition operation: >> https://godbolt.org/z/7hfGfEjW3 >> void >> foo (int *__restrict a, >>      int *__restrict b, >>      int n) >> { >>   for (int i = 0; i < n; i++) >>       a[i] = a[i] + b[i]; >> } >> Optimized IR: >> Loop body: >>   _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, >> 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma >>   ... >>   vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, >> _38, 0);    -> vle32.v v2,0(a0) >>   vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... 
}, >> _38, 0);   -> vle32.v v1,0(a1) >>   vect__7.12_19 = vect__6.11_20 + >> vect__4.8_27;                              -> vsetvli >> a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2 >>   .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, >> vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4) >> We can see 2 redundant vsetvls inside the loop body due to AVL/VL >> toggling. >> The AVL/VL toggling happens because the LEN information is missing from the >> simple PLUS_EXPR GIMPLE assignment: >> vect__7.12_19 = vect__6.11_20 + vect__4.8_27; >> In partial vectorization, GCC applies predicated loads/stores and >> un-predicated full-vector operations. >> This flow is used by all other targets, e.g. ARM SVE (RVV also >> uses it): >> ARM SVE: >> .L3: >>         ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load >>         ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load >>         add     z31.s, z31.s, z30.s            -> un-predicated add >>         st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store >> This vectorization flow causes AVL/VL toggling on RVV, so we need >> an AVL propagation PASS for it. >> Also, it's very unlikely that we can apply predicated operations >> in all vectorization, for the following reasons: >> 1. Supporting them in all vectorization is a very heavy workload, >> and we see no benefit when the target >> backend can handle it. >> 2. Changing the loop vectorizer for it would make the code base ugly and >> hard to maintain. >> 3. We would need patterns for every operation: not only >> COND_LEN_ADD, COND_LEN_SUB, ..., >>    but also COND_LEN_EXTEND, ..., COND_LEN_CEIL, ... That is over >> 100 patterns, an unreasonable number. >> To conclude, we prefer un-predicated operations here and design >> a clean AVL propagation PASS to elide the >> redundant vsetvls >> caused by AVL/VL toggling. >> The second question is why AVL propagation is a separate >> PASS. 
Why not optimize it in the VSETVL PASS? (We definitely >> can optimize AVL there.) >> Frankly, I was planning to address this issue in the VSETVL PASS; >> that's why we recently refactored it. However, I changed >> my mind after several >> experiments. >> The reasons are as follows: >> 1. Code base management and maintenance. The current VSETVL >> PASS is already complicated and has enough aggressive, >> fancy optimizations, which >>    generate optimal codegen in most >> cases. It's not a good idea to keep adding features to the >> VSETVL PASS and make it >> heavier and heavier again, which would force us to refactor >> it again in the future. >> Actually, the VSETVL PASS is very stable and optimal after the >> recent refactoring. Hopefully, we will not need to change it >> any more except for minor >> fixes. >> 2. vsetvl insertion (what the VSETVL PASS does) and AVL >> propagation are two different things; I don't think we should fuse >> them into the same PASS. >> 3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation should >> be done before RA, where it can reduce scalar register pressure. >> 4. This patch's AVL propagation PASS only handles >> RVV partial auto-vectorization. >>    The code is only a few hundred lines, which is very >> manageable and easy to extend with new features and enhancements. >> We can extend AVL propagation further in a clean, >> separate PASS in the future. (Doing it in the VSETVL PASS would >> complicate >> the already complicated VSETVL PASS again.) 
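A minimal, self-contained C++ sketch of the propagation rule described above may help. It is loosely modelled on `get_autovectorize_preferred_avl` in the patch below, but it is not GCC code: `VInsn`, `Lmul`, `sew_lmul_ratio` and `preferred_avl` are hypothetical names, an AVL is modelled as a pseudo-register number with `std::nullopt` standing for VLMAX, and the real pass's RTL-SSA checks (every use must be a real RVV instruction, the AVL definition must precede the instruction in the same basic block) as well as its fixpoint iteration are deliberately left out. The SEW/LMUL ratio check exists because VLMAX = VLEN * LMUL / SEW, so two instructions with the same ratio compute the same VL from the same AVL.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified model of an RVV instruction (NOT GCC's insn_info
// or rtx API).  An AVL is a pseudo-register number; std::nullopt == VLMAX.
enum class Lmul { MF8, MF4, MF2, M1, M2, M4, M8 };

struct VInsn {
  std::optional<int> avl;            // nullopt means "uses VLMAX"
  unsigned sew;                      // element width in bits: 8/16/32/64
  Lmul lmul;
  bool tail_agnostic;
  bool has_vl_op = true;             // does this insn depend on VL at all?
  std::vector<const VInsn *> users;  // insns consuming this insn's result
};

// SEW/LMUL ratio.  VLMAX = VLEN / ratio, so equal ratios + equal AVL
// always yield the same VL.
unsigned sew_lmul_ratio(unsigned sew, Lmul lmul) {
  assert(sew >= 8 && sew <= 64);
  switch (lmul) {
    case Lmul::MF8: return sew * 8;
    case Lmul::MF4: return sew * 4;
    case Lmul::MF2: return sew * 2;
    case Lmul::M1:  return sew;
    case Lmul::M2:  return sew / 2;
    case Lmul::M4:  return sew / 4;
    case Lmul::M8:  return sew / 8;
  }
  return 0;  // unreachable
}

// Return the AVL a VLMAX, tail-agnostic insn may adopt from its users, or
// nullopt if this simplified model finds propagation unsafe.
std::optional<int> preferred_avl(const VInsn &insn) {
  if (insn.avl.has_value() || !insn.tail_agnostic)
    return std::nullopt;  // only VLMAX + tail-agnostic insns are candidates
  const unsigned ratio = sew_lmul_ratio(insn.sew, insn.lmul);
  std::optional<int> common;
  for (const VInsn *use : insn.users) {
    if (!use->has_vl_op)
      continue;  // this user does not care about VL
    if (!use->avl.has_value()                  // user is VLMAX itself
        || (common && *common != *use->avl)    // users disagree on AVL
        || sew_lmul_ratio(use->sew, use->lmul) != ratio
        || !use->tail_agnostic)
      return std::nullopt;
    common = use->avl;
  }
  return common;  // nullopt when no user constrains VL
}
```

In the `foo` example, the `e8,mf4` vsetvli emitted for `.SELECT_VL` and the `e32,m1` loads, add and store all have ratio 32, which is why the un-predicated `vadd.vv` can adopt the store's AVL in this model.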
>> Here is an example to demonstrate more: >> https://godbolt.org/z/bE86sv3q5 >> void foo2 (int *__restrict a, >>           int *__restrict b, >>           int *__restrict c, >>           int *__restrict a2, >>           int *__restrict b2, >>           int *__restrict c2, >>           int *__restrict a3, >>           int *__restrict b3, >>           int *__restrict c3, >>           int *__restrict a4, >>           int *__restrict b4, >>           int *__restrict c4, >>           int *__restrict a5, >>           int *__restrict b5, >>           int *__restrict c5, >>           int n) >> { >>     for (int i = 0; i < n; i++){ >>       a[i] = b[i] + c[i]; >>       b5[i] = b[i] + c[i]; >>       a2[i] = b2[i] + c2[i]; >>       a3[i] = b3[i] + c3[i]; >>       a4[i] = b4[i] + c4[i]; >>       a5[i] = a[i] + a4[i]; >>       a[i] = a5[i] + b5[i]+ a[i]; >>       a[i] = a[i] + c[i]; >>       b5[i] = a[i] + c[i]; >>       a2[i] = a[i] + c2[i]; >>       a3[i] = a[i] + c3[i]; >>       a4[i] = a[i] + c4[i]; >>       a5[i] = a[i] + a4[i]; >>       a[i] = a[i] + b5[i]+ a[i]; >>     } >> } >> 1. 
Loop Body: >> Before this patch: >>         vsetvli a4,t1,e8,mf4,ta,ma >>         vle32.v v2,0(a2) >>         vle32.v v4,0(a1) >>         vle32.v v1,0(t2) >>         vsetvli a7,zero,e32,m1,ta,ma >>         vadd.vv v4,v2,v4 >>         vsetvli zero,a4,e32,m1,ta,ma >>         vle32.v v3,0(s0) >>         vsetvli a7,zero,e32,m1,ta,ma >>         vadd.vv v1,v3,v1 >>         vadd.vv v1,v1,v4 >>         vadd.vv v1,v1,v4 >>         vadd.vv v1,v1,v4 >>         vsetvli zero,a4,e32,m1,ta,ma >>         vle32.v v4,0(a5) >>         vsetvli a7,zero,e32,m1,ta,ma >>         vadd.vv v1,v1,v2 >>         vadd.vv v2,v1,v2 >>         vadd.vv v4,v1,v4 >>         vsetvli zero,a4,e32,m1,ta,ma >>         vse32.v v2,0(t5) >>         vse32.v v4,0(a3) >>         vsetvli a7,zero,e32,m1,ta,ma >>         vadd.vv v3,v1,v3 >>         vadd.vv v2,v2,v1 >>         vadd.vv v2,v2,v1 >>         vsetvli zero,a4,e32,m1,ta,ma >>         vse32.v v2,0(a0) >>         vse32.v v3,0(t3) >>         vle32.v v2,0(t0) >>         vsetvli a7,zero,e32,m1,ta,ma >>         vadd.vv v3,v3,v1 >>         vsetvli zero,a4,e32,m1,ta,ma >>         vse32.v v3,0(t4) >>         vsetvli a7,zero,e32,m1,ta,ma >>         slli a7,a4,2 >>         vadd.vv v1,v1,v2 >>         sub t1,t1,a4 >>         vsetvli zero,a4,e32,m1,ta,ma >>         vse32.v v1,0(a6) >> After this patch: >>         vsetvli a4,t1,e32,m1,ta,ma >>         vle32.v v2,0(a2) >>         vle32.v v3,0(t2) >>         vle32.v v4,0(a1) >>         vle32.v v1,0(t0) >>         vadd.vv v4,v2,v4 >>         vadd.vv v1,v3,v1 >>         vadd.vv v1,v1,v4 >>         vadd.vv v1,v1,v4 >>         vadd.vv v1,v1,v4 >>         vadd.vv v1,v1,v2 >>         vadd.vv v2,v1,v2 >>         vse32.v v2,0(t5) >>         vadd.vv v2,v2,v1 >>         vadd.vv v2,v2,v1 >>         slli a7,a4,2 >>         vadd.vv v3,v1,v3 >>         vle32.v v5,0(a5) >>         vle32.v v6,0(t6) >>         vse32.v v3,0(t3) >>         vse32.v v2,0(a0) >>         vadd.vv v3,v3,v1 >>         vadd.vv v2,v1,v5 >>         vse32.v v3,0(t4) >>         vadd.vv v1,v1,v6 >>         vse32.v v2,0(a3) >>         vse32.v v1,0(a6) >> As shown, all of the heavy, redundant vsetvls inside the loop >> body are eliminated. >> 2. Epilogue: >>     Before this patch: >> .L5: >>         ld s0,8(sp) >>         addi sp,sp,16 >>         jr ra >>     After this patch: >> .L5: >>         ret >> This is the benefit of doing AVL propagation before RA: we >> eliminate the use of the 'a7' register >> by the redundant AVL/VL toggling instruction >> 'vsetvli a7,zero,e32,m1,ta,ma'. >> The final codegen after this patch: >> foo2: >> lw t1,56(sp) >> ld t6,0(sp) >> ld t3,8(sp) >> ld t0,16(sp) >> ld t2,24(sp) >> ld t4,32(sp) >> ld t5,40(sp) >> ble t1,zero,.L5 >> .L3: >> vsetvli a4,t1,e32,m1,ta,ma >> vle32.v v2,0(a2) >> vle32.v v3,0(t2) >> vle32.v v4,0(a1) >> vle32.v v1,0(t0) >> vadd.vv v4,v2,v4 >> vadd.vv v1,v3,v1 >> vadd.vv v1,v1,v4 >> vadd.vv v1,v1,v4 >> vadd.vv v1,v1,v4 >> vadd.vv v1,v1,v2 >> vadd.vv v2,v1,v2 >> vse32.v v2,0(t5) >> vadd.vv v2,v2,v1 >> vadd.vv v2,v2,v1 >> slli a7,a4,2 >> vadd.vv v3,v1,v3 >> vle32.v v5,0(a5) >> vle32.v v6,0(t6) >> vse32.v v3,0(t3) >> vse32.v v2,0(a0) >> vadd.vv v3,v3,v1 >> vadd.vv v2,v1,v5 >> vse32.v v3,0(t4) >> vadd.vv v1,v1,v6 >> vse32.v v2,0(a3) 
>> vse32.v v1,0(a6) >> sub t1,t1,a4 >> add a1,a1,a7 >> add a2,a2,a7 >> add a5,a5,a7 >> add t6,t6,a7 >> add t0,t0,a7 >> add t2,t2,a7 >> add t5,t5,a7 >> add a3,a3,a7 >> add a6,a6,a7 >> add t3,t3,a7 >> add t4,t4,a7 >> add a0,a0,a7 >> bne t1,zero,.L3 >> .L5: >> ret >> PR target/111888 >> gcc/ChangeLog: >> * config.gcc: Add AVL propagation PASS. >> * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto. >> * config/riscv/riscv-protos.h (make_pass_avlprop): Ditto. >> (has_vtype_op): Export as global. >> (has_vl_op): Ditto. >> (tail_agnostic_p): Ditto. >> (validate_change_or_fail): Ditto. >> (vlmax_avl_type_p): Ditto. >> (vlmax_avl_p): Ditto. >> (get_sew): Ditto. >> (enum vlmul_type): Ditto. >> (const_vlmax_p): Ditto. >> * config/riscv/riscv-v.cc (has_vtype_op): Ditto. >> (has_vl_op): Ditto. >> (get_default_ta): Ditto. >> (tail_agnostic_p): Ditto. >> (validate_change_or_fail): Ditto. >> (vlmax_avl_type_p): Ditto. >> (vlmax_avl_p): Ditto. >> (get_sew): Ditto. >> (enum vlmul_type): Ditto. >> (get_vlmul): Ditto. >> * config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto. >> (has_vtype_op): Ditto. >> (has_vl_op): Ditto. >> (get_sew): Ditto. >> (get_vlmul): Ditto. >> (get_default_ta): Ditto. >> (tail_agnostic_p): Ditto. >> (validate_change_or_fail): Ditto. >> * config/riscv/t-riscv: Add AVL propagation PASS. >> * config/riscv/vector.md: Fix VLS modes attribute. >> * config/riscv/riscv-avlprop.cc: New file. >> gcc/testsuite/ChangeLog: >> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test. >> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto. >> * gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto. >> * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto. >> * gcc.target/riscv/rvv/rvv.exp: Add AVL propagation. >> * gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test. >> * gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test. 
>> --- >> gcc/config.gcc                                |   2 +- >> gcc/config/riscv/riscv-avlprop.cc             | 350 >> ++++++++++++++++++ >> gcc/config/riscv/riscv-passes.def             |   1 + >> gcc/config/riscv/riscv-protos.h               |  10 + >> gcc/config/riscv/riscv-v.cc                   |  84 ++++- >> gcc/config/riscv/riscv-vsetvl.cc              |  82 +--- >> gcc/config/riscv/t-riscv                      |   6 + >> gcc/config/riscv/vector.md                    |   2 +- >> .../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +- >> .../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +- >> .../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +- >> .../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 - >> .../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 + >> .../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++ >> gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 + >> 15 files changed, 514 insertions(+), 84 deletions(-) >> create mode 100644 gcc/config/riscv/riscv-avlprop.cc >> create mode 100644 >> gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c >> create mode 100644 >> gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c >> diff --git a/gcc/config.gcc b/gcc/config.gcc >> index 606d3a8513e..efd53965c9a 100644 >> --- a/gcc/config.gcc >> +++ b/gcc/config.gcc >> @@ -544,7 +544,7 @@ pru-*-*) >> riscv*) >> cpu_type=riscv >> extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o >> riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o" >> - extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o >> riscv-vector-costs.o" >> + extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o >> riscv-vector-costs.o riscv-avlprop.o" >> extra_objs="${extra_objs} riscv-vector-builtins.o >> riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o" >> extra_objs="${extra_objs} thead.o" >> d_target_objs="riscv-d.o" >> diff --git a/gcc/config/riscv/riscv-avlprop.cc >> b/gcc/config/riscv/riscv-avlprop.cc >> new file mode 100644 >> index 00000000000..bf3becd8371 >> --- 
/dev/null >> +++ b/gcc/config/riscv/riscv-avlprop.cc >> @@ -0,0 +1,350 @@ >> +/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler. >> +   Copyright (C) 2023-2023 Free Software Foundation, Inc. >> +   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI >> Technologies Ltd. >> + >> +This file is part of GCC. >> + >> +GCC is free software; you can redistribute it and/or modify >> +it under the terms of the GNU General Public License as published by >> +the Free Software Foundation; either version 3, or(at your option) >> +any later version. >> + >> +GCC is distributed in the hope that it will be useful, >> +but WITHOUT ANY WARRANTY; without even the implied warranty of >> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +GNU General Public License for more details. >> + >> +You should have received a copy of the GNU General Public License >> +along with GCC; see the file COPYING3.  If not see >> +. */ >> + >> +/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions. >> +   A standalone AVL propagation pass is designed because: >> + >> +     - Better code maintain: >> +       Current LCM-based VSETVL pass is so complicated that codes >> +       there will become even harder to maintain. A straight forward >> +       AVL propagation PASS is much easier to maintain. >> + >> +     - Reduce scalar register pressure: >> +       A type of AVL propagation is we propagate AVL from NON-VLMAX >> +       instruction to VLMAX instruction. >> +       Note: VLMAX instruction should be ignore tail elements (TA) >> +       and the result should be used by the NON-VLMAX instruction. 
>> +       This optimization is mostly for auto-vectorization codes: >> + >> +   vsetvli r136, r137      --- SELECT_VL >> +   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD >> +   vadd.vv (use VLMAX)     --- PLUS_EXPR >> +   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE >> + >> + NO AVL propation: >> + >> +   vsetvli a5, a4, ta >> +   vle8.v v1 >> +   vsetvli t0, zero, ta >> +   vadd.vv v2, v1, v1 >> +   vse8.v v2 >> + >> + We can propagate the AVL to 'vadd.vv' since its result >> + is consumed by a 'vse8.v' which has AVL = a5 and its >> + tail elements are agnostic. >> + >> +       We DON'T do this optimization on VSETVL pass since it is a >> +       post-RA pass that consumed 't0' already wheras a standalone >> +       pre-RA AVL propagation pass allows us elide the consumption >> +       of the pseudo register of 't0' then we can reduce scalar >> +       register pressure. >> + >> +     - More AVL propagation opportunities: >> +       A pre-RA pass is more flexible for AVL REG def-use chain, >> +       thus we will get more potential AVL propagation as long as >> +       it doesn't increase the scalar register pressure. >> +*/ >> + >> +#define IN_TARGET_CODE 1 >> +#define INCLUDE_ALGORITHM >> +#define INCLUDE_FUNCTIONAL >> + >> +#include "config.h" >> +#include "system.h" >> +#include "coretypes.h" >> +#include "tm.h" >> +#include "backend.h" >> +#include "rtl.h" >> +#include "target.h" >> +#include "tree-pass.h" >> +#include "df.h" >> +#include "rtl-ssa.h" >> +#include "cfgcleanup.h" >> +#include "insn-attr.h" >> + >> +using namespace rtl_ssa; >> +using namespace riscv_vector; >> + >> +/* The AVL propagation instructions and corresponding preferred AVL. >> +   It will be updated during the analysis.  
*/ >> +static hash_map<insn_info *, rtx> *avlprops; >> + >> +const pass_data pass_data_avlprop = { >> +  RTL_PASS, /* type */ >> +  "avlprop", /* name */ >> +  OPTGROUP_NONE, /* optinfo_flags */ >> +  TV_NONE, /* tv_id */ >> +  0, /* properties_required */ >> +  0, /* properties_provided */ >> +  0, /* properties_destroyed */ >> +  0, /* todo_flags_start */ >> +  0, /* todo_flags_finish */ >> +}; >> + >> +class pass_avlprop : public rtl_opt_pass >> +{ >> +public: >> +  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass >> (pass_data_avlprop, ctxt) {} >> + >> +  /* opt_pass methods: */ >> +  virtual bool gate (function *) final override >> +  { >> +    return TARGET_VECTOR && optimize > 0; >> +  } >> +  virtual unsigned int execute (function *) final override; >> +}; // class pass_avlprop >> + >> +static void >> +avlprop_init (void) >> +{ >> +  calculate_dominance_info (CDI_DOMINATORS); >> +  df_analyze (); >> +  crtl->ssa = new function_info (cfun); >> +  avlprops = new hash_map<insn_info *, rtx>; >> +} >> + >> +static void >> +avlprop_done (void) >> +{ >> +  free_dominance_info (CDI_DOMINATORS); >> +  if (crtl->ssa->perform_pending_updates ()) >> +    cleanup_cfg (0); >> +  delete crtl->ssa; >> +  crtl->ssa = nullptr; >> +  delete avlprops; >> +  avlprops = NULL; >> +} >> + >> +/* Helper function to get AVL operand.  */ >> +static rtx >> +get_avl (insn_info *insn, bool avlprop_p) >> +{ >> +  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE >> +      || get_attr_avl_type (insn->rtl ()) == VLS) >> +    return NULL_RTX; >> +  if (avlprop_p) >> +    { >> +      if (avlprops->get (insn)) >> + return (*avlprops->get (insn)); >> +      else if (vlmax_avl_type_p (insn->rtl ())) >> + return RVV_VLMAX; >> +    } >> +  extract_insn_cached (insn->rtl ()); >> +  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())]; >> +} >> + >> +/* This is a straight forward pattern ALWAYS in paritial >> auto-vectorization: >> + >> +     VL = SELECT_AVL (AVL, ...) 
>> +     V0 = MASK_LEN_LOAD (..., VL) >> +     V1 = MASK_LEN_LOAD (..., VL) >> +     V2 = V0 + V1 --- Missed LEN information. >> +     MASK_LEN_STORE (..., V2, VL) >> + >> +   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, >> V1, dummy LEN) >> +   because: >> + >> +     - Few code changes in Loop Vectorizer. >> +     - Reuse the current clean flow of partial vectorization, >> That is, apply >> +       predicate LEN or MASK into LOAD/STORE operations and >> other special >> +       arithmetic operations (e.d. DIV), then do the whole >> vector register >> +       operation if it DON'T affect the correctness. >> +       Such flow is used by all other targets like x86, sve, >> s390, ... etc. >> +     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD. >> + >> +   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like >> PLUS_EXPR which >> +   generates the VLMAX instruction due to missed LEN >> information. The later >> +   VSETVL PASS will elided the redundant vsetvls. >> +*/ >> + >> +static rtx >> +get_autovectorize_preferred_avl (insn_info *insn) >> +{ >> +  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p >> (insn->rtl ())) >> +    return NULL_RTX; >> + >> +  rtx use_avl = NULL_RTX; >> +  insn_info *avl_use_insn = nullptr; >> +  unsigned int ratio >> +    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul >> (insn->rtl ())); >> +  for (def_info *def : insn->defs ()) >> +    { >> +      auto set = safe_dyn_cast<set_info *> (def); >> +      if (!set || !set->is_reg ()) >> + return NULL_RTX; >> +      for (use_info *use : set->all_uses ()) >> + { >> +   if (!use->is_in_nondebug_insn ()) >> +     return NULL_RTX; >> +   insn_info *use_insn = use->insn (); >> +   /* FIXME: Stop AVL propagation if any USE is not a RVV real >> +      instruction. It should be totally enough for vectorized >> codes since >> +      they always locate at extended blocks. 
>> + >> +      TODO: We can extend PHI checking for intrinsic codes if it >> +      necessary in the future.  */ >> +   if (use_insn->is_artificial () || !has_vtype_op >> (use_insn->rtl ())) >> +     return NULL_RTX; >> +   if (!has_vl_op (use_insn->rtl ())) >> +     continue; >> + >> +   rtx new_use_avl = get_avl (use_insn, true); >> +   if (!new_use_avl) >> +     return NULL_RTX; >> +   if (!use_avl) >> +     use_avl = new_use_avl; >> +   if (!rtx_equal_p (use_avl, new_use_avl) >> +       || calculate_ratio (get_sew (use_insn->rtl ()), >> +   get_vlmul (use_insn->rtl ())) >> +    != ratio >> +       || vlmax_avl_p (new_use_avl) >> +       || !tail_agnostic_p (use_insn->rtl ())) >> +     return NULL_RTX; >> +   if (!avl_use_insn) >> +     avl_use_insn = use_insn; >> + } >> +    } >> + >> +  if (use_avl && register_operand (use_avl, Pmode)) >> +    { >> +      gcc_assert (avl_use_insn); >> +      // Find a definition at or neighboring INSN. >> +      resource_info resource = full_register (REGNO (use_avl)); >> +      def_lookup dl1 = crtl->ssa->find_def (resource, insn); >> +      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn); >> +      if (dl1.matching_set () || dl2.matching_set ()) >> + return NULL_RTX; >> +      def_info *def1 = dl1.last_def_of_prev_group (); >> +      def_info *def2 = dl2.last_def_of_prev_group (); >> +      if (def1 != def2) >> + return NULL_RTX; >> +      /* FIXME: We only all AVL propation within a block which >> should >> + be totally enough for vectorized codes. >> + >> + TODO: We can enhance it here for intrinsic codes in the future >> + if it is necessary.  */ >> +      if (def1->insn ()->bb () != insn->bb () >> +   || def1->insn ()->compare_with (insn) >= 0) >> + return NULL_RTX; >> +    } >> +  return use_avl; >> +} >> + >> +/* If we have a preferred AVL to propagate, return the AVL. >> +   Otherwise, return NULL_RTX as we don't need have any preferred >> +   AVL.  
*/ >> + >> +static rtx >> +get_preferred_avl (insn_info *insn) >> +{ >> +  /* TODO: We only do AVL propagation for missed-LEN partial >> +     autovectorization for now.  We could add more more AVL >> +     propagation for intrinsic codes in the future. */ >> +  return get_autovectorize_preferred_avl (insn); >> +} >> + >> +/* Return the AVL TYPE operand index.  */ >> +static int >> +get_avl_type_index (insn_info *insn) >> +{ >> +  extract_insn_cached (insn->rtl ()); >> +  /* Except rounding mode patterns, AVL TYPE operand >> +     is always the last operand.  */ >> +  if (find_access (insn->uses (), VXRM_REGNUM) >> +      || find_access (insn->uses (), FRM_REGNUM)) >> +    return recog_data.n_operands - 2; >> +  return recog_data.n_operands - 1; >> +} >> + >> +/* Main entry point for this pass.  */ >> +unsigned int >> +pass_avlprop::execute (function *) >> +{ >> +  avlprop_init (); >> + >> +  /* Go through all the instructions looking for AVL that we >> could propagate. */ >> + >> +  insn_info *next; >> +  bool change_p = true; >> + >> +  while (change_p) >> +    { >> +      /* Iterate on each instruction until no more change need.  */ >> +      change_p = false; >> +      for (insn_info *insn = crtl->ssa->first_insn (); insn; >> insn = next) >> + { >> +   next = insn->next_any_insn (); >> +   /* We only forward AVL to the instruction that has AVL/VL operand >> +      and can be optimized in RTL_SSA level.  
*/ >> +   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ())) >> +     continue; >> + >> +   rtx new_avl = get_preferred_avl (insn); >> +   if (new_avl) >> +     { >> +       gcc_assert (!vlmax_avl_p (new_avl)); >> +       auto &update = avlprops->get_or_insert (insn); >> +       change_p = !rtx_equal_p (update, new_avl); >> +       update = new_avl; >> +     } >> + } >> +    } >> + >> +  if (dump_file) >> +    fprintf (dump_file, "\nNumber of successful AVL >> propagations: %d\n\n", >> +      (int) avlprops->elements ()); >> + >> +  for (const auto iter : *avlprops) >> +    { >> +      rtx_insn *rinsn = iter.first->rtl (); >> +      if (dump_file) >> + { >> +   fprintf (dump_file, "\nPropagating AVL: "); >> +   print_rtl_single (dump_file, iter.second); >> +   fprintf (dump_file, "into: "); >> +   print_rtl_single (dump_file, rinsn); >> + } >> +      /* Replace AVL operand.  */ >> +      rtx new_pat >> + = simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first, >> false), >> + iter.second); >> +      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, >> false); >> + >> +      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */ >> +      if (vlmax_avl_type_p (rinsn)) >> + validate_change_or_fail ( >> +   rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)], >> +   get_avl_type_rtx (avl_type::NONVLMAX), false); >> +      if (dump_file) >> + { >> +   fprintf (dump_file, "Successfully to match this instruction: "); >> +   print_rtl_single (dump_file, rinsn); >> + } >> +    } >> + >> +  avlprop_done (); >> +  return 0; >> +} >> + >> +rtl_opt_pass * >> +make_pass_avlprop (gcc::context *ctxt) >> +{ >> +  return new pass_avlprop (ctxt); >> +} >> diff --git a/gcc/config/riscv/riscv-passes.def >> b/gcc/config/riscv/riscv-passes.def >> index 4084122cf0a..b6260939d5c 100644 >> --- a/gcc/config/riscv/riscv-passes.def >> +++ b/gcc/config/riscv/riscv-passes.def >> @@ -18,4 +18,5 @@ >> . 
*/ >> INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs); >> +INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop); >> INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl); >> diff --git a/gcc/config/riscv/riscv-protos.h >> b/gcc/config/riscv/riscv-protos.h >> index 6cb9d459ee9..2b09ec9ea9e 100644 >> --- a/gcc/config/riscv/riscv-protos.h >> +++ b/gcc/config/riscv/riscv-protos.h >> @@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const >> char *, struct gcc_options *, locatio >> extern bool riscv_hard_regno_rename_ok (unsigned, unsigned); >> rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt); >> +rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt); >> rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt); >> /* Routines implemented in riscv-string.c.  */ >> @@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode); >> bool cmp_lmul_gt_one (machine_mode); >> bool gather_scatter_valid_offset_mode_p (machine_mode); >> bool vls_mode_valid_p (machine_mode); >> +bool has_vtype_op (rtx_insn *); >> +bool has_vl_op (rtx_insn *); >> +bool tail_agnostic_p (rtx_insn *); >> +void validate_change_or_fail (rtx, rtx *, rtx, bool); >> +bool vlmax_avl_type_p (rtx_insn *); >> +bool vlmax_avl_p (rtx); >> +uint8_t get_sew (rtx_insn *); >> +enum vlmul_type get_vlmul (rtx_insn *); >> +bool const_vlmax_p (machine_mode); >> } >> /* We classify builtin types into two classes: >> diff --git a/gcc/config/riscv/riscv-v.cc >> b/gcc/config/riscv/riscv-v.cc >> index e39a9507803..473622ac321 100644 >> --- a/gcc/config/riscv/riscv-v.cc >> +++ b/gcc/config/riscv/riscv-v.cc >> @@ -56,7 +56,7 @@ using namespace riscv_vector; >> namespace riscv_vector { >> /* Return true if vlmax is constant value and can be used in >> vsetivl.  
*/
>> -static bool
>> +bool
>> const_vlmax_p (machine_mode mode)
>> {
>>    poly_uint64 nuints = GET_MODE_NUNITS (mode);
>> @@ -298,14 +298,6 @@ public:
>>       len = force_reg (Pmode, len);
>>     vls_p = true;
>>   }
>> - else if (const_vlmax_p (vtype_mode))
>> -   {
>> -     /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
>> -        the vsetvli to obtain the value of vlmax.  */
>> -     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
>> -     len = gen_int_mode (nunits, Pmode);
>> -     vls_p = true;
>> -   }
>> else if (can_create_pseudo_p ())
>>   {
>>     len = gen_reg_rtx (Pmode);
>> @@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
>>    emit_move_insn (dst, x4);
>> }
>> +/* Return true if it is an RVV instruction depends on VTYPE global
>> +   status register.  */
>> +bool
>> +has_vtype_op (rtx_insn *rinsn)
>> +{
>> +  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
>> +}
>> +
>> +/* Return true if it is an RVV instruction depends on VL global
>> +   status register.  */
>> +bool
>> +has_vl_op (rtx_insn *rinsn)
>> +{
>> +  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
>> +}
>> +
>> +/* Get default tail policy.  */
>> +static bool
>> +get_default_ta ()
>> +{
>> +  /* For the instruction that doesn't require TA, we still need a default value
>> +     to emit vsetvl. We pick up the default value according to prefer policy. */
>> +  return (bool) (get_prefer_tail_policy () & 0x1
>> + || (get_prefer_tail_policy () >> 1 & 0x1));
>> +}
>> +
>> +/* Helper function to get TA operand.  */
>> +bool
>> +tail_agnostic_p (rtx_insn *rinsn)
>> +{
>> +  /* If it doesn't have TA, we return agnostic by default.  */
>> +  extract_insn_cached (rinsn);
>> +  int ta = get_attr_ta (rinsn);
>> +  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
>> +}
>> +
>> +/* Change insn and Assert the change always happens. */
>> +void
>> +validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
>> +{
>> +  bool change_p = validate_change (object, loc, new_rtx, in_group);
>> +  gcc_assert (change_p);
>> +}
>> +
>> +/* Return true if it is VLMAX AVL TYPE.  */
>> +bool
>> +vlmax_avl_type_p (rtx_insn *rinsn)
>> +{
>> +  return get_attr_avl_type (rinsn) == VLMAX;
>> +}
>> +
>> +/* Return true if RTX is RVV VLMAX AVL.  */
>> +bool
>> +vlmax_avl_p (rtx x)
>> +{
>> +  return x && rtx_equal_p (x, RVV_VLMAX);
>> +}
>> +
>> +/* Helper function to get SEW operand. We always have SEW value for
>> +   all RVV instructions that have VTYPE OP.  */
>> +uint8_t
>> +get_sew (rtx_insn *rinsn)
>> +{
>> +  return get_attr_sew (rinsn);
>> +}
>> +
>> +/* Helper function to get VLMUL operand. We always have VLMUL value for
>> +   all RVV instructions that have VTYPE OP. */
>> +enum vlmul_type
>> +get_vlmul (rtx_insn *rinsn)
>> +{
>> +  return (enum vlmul_type) get_attr_vlmul (rinsn);
>> +}
>> +
>> } // namespace riscv_vector
>> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
>> index e9dd669de98..f2f19e423bf 100644
>> --- a/gcc/config/riscv/riscv-vsetvl.cc
>> +++ b/gcc/config/riscv/riscv-vsetvl.cc
>> @@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
>>    return agnostic_p ? "agnostic" : "undisturbed";
>> }
>> -static bool
>> -vlmax_avl_p (rtx x)
>> -{
>> -  return x && rtx_equal_p (x, RVV_VLMAX);
>> -}
>> -
>> -/* Return true if it is an RVV instruction depends on VTYPE global
>> -   status register.  */
>> -static bool
>> -has_vtype_op (rtx_insn *rinsn)
>> -{
>> -  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
>> -}
>> -
>> -/* Return true if it is an RVV instruction depends on VL global
>> -   status register.  */
>> -static bool
>> -has_vl_op (rtx_insn *rinsn)
>> -{
>> -  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
>> -}
>> -
>> /* Return true if the instruction ignores VLMUL field of VTYPE.  */
>> static bool
>> ignore_vlmul_insn_p (rtx_insn *rinsn)
>> @@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
>>    if (!has_vl_op (rinsn))
>>      return NULL_RTX;
>> -  if (get_attr_avl_type (rinsn) == VLMAX)
>> -    return RVV_VLMAX;
>> -  extract_insn_cached (rinsn);
>> -  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
>> -}
>> -/* Helper function to get SEW operand. We always have SEW value for
>> -   all RVV instructions that have VTYPE OP.  */
>> -static uint8_t
>> -get_sew (rtx_insn *rinsn)
>> -{
>> -  return get_attr_sew (rinsn);
>> -}
>> -
>> -/* Helper function to get VLMUL operand. We always have VLMUL value for
>> -   all RVV instructions that have VTYPE OP. */
>> -static enum vlmul_type
>> -get_vlmul (rtx_insn *rinsn)
>> -{
>> -  return (enum vlmul_type) get_attr_vlmul (rinsn);
>> -}
>> +  extract_insn_cached (rinsn);
>> +  if (vlmax_avl_type_p (rinsn))
>> +    {
>> +      if (BYTES_PER_RISCV_VECTOR.is_constant ())
>> + {
>> +   for (int i = 0; i < recog_data.n_operands; i++)
>> +     if (GET_MODE_CLASS (recog_data.operand_mode[i]) == MODE_VECTOR_BOOL
>> + && const_vlmax_p (recog_data.operand_mode[i]))
>> +       return gen_int_mode (GET_MODE_NUNITS (recog_data.operand_mode[i]),
>> +    Pmode);
>> + }
>> +      return RVV_VLMAX;
>> +    }
>> -/* Get default tail policy.  */
>> -static bool
>> -get_default_ta ()
>> -{
>> -  /* For the instruction that doesn't require TA, we still need a default value
>> -     to emit vsetvl. We pick up the default value according to prefer policy. */
>> -  return (bool) (get_prefer_tail_policy () & 0x1
>> - || (get_prefer_tail_policy () >> 1 & 0x1));
>> +  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
>> }
>> /* Get default mask policy.  */
>> @@ -407,16 +371,6 @@ get_default_ma ()
>> || (get_prefer_mask_policy () >> 1 & 0x1));
>> }
>> -/* Helper function to get TA operand.  */
>> -static bool
>> -tail_agnostic_p (rtx_insn *rinsn)
>> -{
>> -  /* If it doesn't have TA, we return agnostic by default.  */
>> -  extract_insn_cached (rinsn);
>> -  int ta = get_attr_ta (rinsn);
>> -  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
>> -}
>> -
>> /* Helper function to get MA operand.  */
>> static bool
>> mask_agnostic_p (rtx_insn *rinsn)
>> @@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
>>    return true;
>> }
>> -/* Change insn and Assert the change always happens. */
>> -static void
>> -validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
>> -{
>> -  bool change_p = validate_change (object, loc, new_rtx, in_group);
>> -  gcc_assert (change_p);
>> -}
>> -
>> /* This flags indicates the minimum demand of the vl and vtype values by the
>>     RVV instruction. For example, DEMAND_RATIO_P indicates that this RVV
>>     instruction only needs the SEW/LMUL ratio to remain the same, and does not
>> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
>> index dd17056fe82..08de62853a6 100644
>> --- a/gcc/config/riscv/t-riscv
>> +++ b/gcc/config/riscv/t-riscv
>> @@ -69,6 +69,12 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
>> $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>> $(srcdir)/config/riscv/riscv-vsetvl.cc
>> +riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
>> +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
>> +  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h
>> + $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>> + $(srcdir)/config/riscv/riscv-avlprop.cc
>> +
>> riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
>>    $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
>>    $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
>> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
>> index ef91950178f..0c59d1b90bc 100644
>> --- a/gcc/config/riscv/vector.md
>> +++ b/gcc/config/riscv/vector.md
>> @@ -809,7 +809,7 @@
>> V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
>> V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
>> V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
>> -    (symbol_ref "riscv_vector::NONVLMAX")
>> +    (symbol_ref "riscv_vector::VLS")
>> (eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
>> vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
>> vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
>> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>> index 928a507a363..5278e4aa38f 100644
>> --- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>> @@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
>>      }
>> }
>> -/* { dg-final { scan-assembler {e32,m4} } } */
>> +/* { dg-final { scan-assembler {e16,m2} } } */
>> /* { dg-final { scan-assembler-not {csrr} } } */
>> /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
>> /* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>> index a50265fc1ec..1db2e073846 100644
>> --- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>> @@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict b, int n)
>>      a[i] = a[i] + b[i];
>> }
>> -/* { dg-final { scan-assembler {e32,m8} } } */
>> +/* { dg-final { scan-assembler {e16,m4} } } */
>> /* { dg-final { scan-assembler-not {csrr} } } */
>> /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
>> /* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>> index eac7cbc757b..ca88d42cdf4 100644
>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>> @@ -7,10 +7,11 @@
>> /*
>> ** foo:
>> ** vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>> +** ...
>> ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
>> ** ...
>> -** vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>> -** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
>> +** vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>> +** ...
>> ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
>> ** ...
>> */
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>> index 965365da4bb..13367423751 100644
>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>> @@ -3,7 +3,6 @@
>> #include "ternop-2.c"
>> -/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
>> /* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
>> /* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized" } } */
>> /* { dg-final { scan-assembler-not {\tvmv} } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>> new file mode 100644
>> index 00000000000..b0d21650c3d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
>> +
>> +void
>> +foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
>> +{
>> +  for (int i = 0; i < n; i++)
>> +    a[i] = b[i] + c[i];
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
>> +/* { dg-final { scan-assembler-not {vsetivli} } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
>> +/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
>> +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
>> +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>> new file mode 100644
>> index 00000000000..f2d8aa54b88
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>> @@ -0,0 +1,33 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
>> +
>> +void
>> +foo (int *__restrict a, int *__restrict b, int *__restrict c,
>> +     int *__restrict a2, int *__restrict b2, int *__restrict c2,
>> +     int *__restrict a3, int *__restrict b3, int *__restrict c3,
>> +     int *__restrict a4, int *__restrict b4, int *__restrict c4,
>> +     int *__restrict a5, int *__restrict b5, int *__restrict c5,
>> +     int *__restrict d, int *__restrict d2, int *__restrict d3,
>> +     int *__restrict d4, int *__restrict d5, int n, int m)
>> +{
>> +  for (int i = 0; i < n; i++)
>> +    {
>> +      a[i] = b[i] + c[i];
>> +      a2[i] = b2[i] + c2[i];
>> +      a3[i] = b3[i] + c3[i];
>> +      a4[i] = b4[i] + c4[i];
>> +      a5[i] = a[i] + a4[i];
>> +      d[i] = a[i] - a2[i];
>> +      d2[i] = a2[i] * a[i];
>> +      d3[i] = a3[i] * a2[i];
>> +      d4[i] = a2[i] * d2[i];
>> +      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
>> +    }
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
>> +/* { dg-final { scan-assembler-not {vsetivli} } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
>> +/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
>> +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
>> +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>> index 674ba0d72b4..fc830f2cd4d 100644
>> --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>> @@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
>> "" $CFLAGS
>> dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \
>> "-O3 -ftree-vectorize" $CFLAGS
>> +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/avlprop/*.\[cS\]]] \
>> + "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
>> dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
>> "-O3 -ftree-vectorize --param riscv-autovec-preference=scalable" $CFLAGS
>> dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
>> --
>> 2.36.3
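The patch changes the VLMAX branch of `get_avl ()` in riscv-vsetvl.cc. The C++ sketch below models that decision outside GCC. It is hypothetical and is not GCC code.

- `ModeInfo` and `pick_constant_avl` are illustrative names.
- `vlen_is_constant` stands in for `BYTES_PER_RISCV_VECTOR.is_constant ()`.
- `const_vlmax_p` is assumed to accept a mode only when its element count is a compile-time constant in 0..31. That is the range the 5-bit unsigned AVL immediate of `vsetivli` can encode.

When the vector length is fixed, the first mask (`MODE_VECTOR_BOOL`) operand whose unit count qualifies supplies a constant AVL. Otherwise the AVL stays VLMAX.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the machine-mode properties that the patched
// get_avl () inspects; this is not a GCC type.
struct ModeInfo
{
  bool is_vector_bool;     // models GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
  bool nunits_is_constant; // models GET_MODE_NUNITS (mode).is_constant ()
  uint64_t nunits;         // element count, meaningful only when constant
};

// Assumed behaviour of const_vlmax_p: a constant element count that fits
// the 5-bit unsigned AVL immediate of vsetivli (0..31).
static bool
const_vlmax_p (const ModeInfo &mode)
{
  return mode.nunits_is_constant && mode.nunits <= 31;
}

// Model of the VLMAX branch of get_avl ().  Returns the constant AVL taken
// from the first qualifying mask operand, or std::nullopt meaning "keep
// RVV_VLMAX".
static std::optional<uint64_t>
pick_constant_avl (bool vlen_is_constant, const std::vector<ModeInfo> &operands)
{
  if (!vlen_is_constant)
    return std::nullopt;
  for (const ModeInfo &mode : operands)
    if (mode.is_vector_bool && const_vlmax_p (mode))
      return mode.nunits;
  return std::nullopt;
}
```

In this model, a fixed VLEN with an 8-unit mask operand yields a constant AVL of 8, which could be encoded directly in a `vsetivli`. The AVL stays VLMAX in either of these cases:

- the vector length is scalable;
- the mask has more than 31 units, because that cannot be an immediate.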