From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=94ck=GG=rivosinc.com=patrick@sourceware.org>
Received: from mail-pf1-x432.google.com (mail-pf1-x432.google.com [IPv6:2607:f8b0:4864:20::432])
	by sourceware.org (Postfix) with ESMTPS id DB8CC3858D37
	for <gcc-patches@gcc.gnu.org>; Tue, 24 Oct 2023 15:03:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DB8CC3858D37
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org DB8CC3858D37
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::432
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698159820; cv=none;
	b=pE4z/spLGNxM88ZmVwUDAn+Cg97QypuhW//RHrEwQHJVpTZmJP5r/oZ/qD+dQKhV55Mul3N5zZwHb9OtlnTJ2pHa5ZvNYvitScqfcK+wzgOlmiKfKIFDHoycTSOqtau4ULqu4OJ3O1NO5LBTRX+wFYQVuAG5PiofcewAIKnniAc=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1698159820; c=relaxed/simple;
	bh=N3lST9cTaIBHFTq9HDzhWTz9Iho3RSdJl792RAoLQ6c=;
	h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:From:To; b=uYGqTIglwAvbVDuvSNcd2qZAfNIitFP1TyfjzD4RfQXY6PSRgruBbhmHwccxXXA9MVS3utF9Mk1dXbQ6jKRhTQsUt5EoaVWoX0ZY7XRH8uPIim/zcNRny7P45CmRhgIhKcz2nKNns8Vmb9CkWcCWaFwxL7y/MF21+YU/K7JNMNw=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: by mail-pf1-x432.google.com with SMTP id d2e1a72fcca58-6bd0e1b1890so3568093b3a.3
        for <gcc-patches@gcc.gnu.org>; Tue, 24 Oct 2023 08:03:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1698159809; x=1698764609; darn=gcc.gnu.org;
        h=in-reply-to:references:cc:to:from:content-language:subject
         :user-agent:mime-version:date:message-id:from:to:cc:subject:date
         :message-id:reply-to;
        bh=ImClya6pX4ZYPcaQ5lq682lyV0JwDZ0wZ6Eq6m/IjAM=;
        b=jYKFzHA8nPWwTjF03uF22hf982J0+YP8SXeIRgU3ONxFeUA+E1eb9IA1gcHirsq6Kf
         sHajQUr4DKduWTaMHrCSrP90JARj9j30jLhu16LC6xUhv66+b7hOyEPuo+BbUD+htfaC
         KDcB47xwJiGDxbre9t8KLMUopfv7PK/dQj7jrtMO+N87ZVxm6E0bNZWw9kWbWAeY/M7f
         oIWyO6vdPwyQQrU3eoUlaPm4fAIFXvzSULEU67Yk+SpFb00UdTb/utzRs6JXw1jUOsLd
         /nCmz97EgvYTQV5MAhERUlPK40b5inoNjXKEaiaysSnwTbPl6X+8n9o0nmlNgiocGVad
         PsXQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1698159809; x=1698764609;
        h=in-reply-to:references:cc:to:from:content-language:subject
         :user-agent:mime-version:date:message-id:x-gm-message-state:from:to
         :cc:subject:date:message-id:reply-to;
        bh=ImClya6pX4ZYPcaQ5lq682lyV0JwDZ0wZ6Eq6m/IjAM=;
        b=UQAf5lDxjDtUTJNhDaucFKqISR3BBwK7MRGsDywCE0T5jiVG+AVG3czhWbq+S7mTxI
         qKg8PyII4uoQCFfCl3ajFDPbyHm8UZvGbk/pRjgLC1+Cu1Cu49LH1/0atKEVxVMFmu7u
         BgZI/5GiyUiWkvVoPIEV8wgOv+0tSocxPAzDKLYYUZRcQoqC8bi9IPpWGyg3gXYP7vDo
         dCBXfUVDrHpQdWDJTPkJ5fT36BNIsHbDHbEUUMg4zB8lMta3R2G5JyP6Oh9+rzdTDvnD
         aHQiGGZZMlQkl/g6qRxpQ+yk7YeOOkrk2SWXg/uP1+InhNggybmiEbll7FHAHCo30YT+
         JsLg==
X-Gm-Message-State: AOJu0YyZy2g1kCYlsXHQsl7pNEiS/D9kBIaI9hpIbn1zkbZsqflegCef
	erUQHWFSnVagtbh6IbBjhQutvw==
X-Google-Smtp-Source: AGHT+IGHGAGgldQhHQBF+COWAtsBrU4oiNgNOWme89qTrzldEpUqlkyNNQ6U9yB5UCEF0gFP5h3qDA==
X-Received: by 2002:a05:6a21:3e0d:b0:171:8e16:ea83 with SMTP id bk13-20020a056a213e0d00b001718e16ea83mr2752051pzc.29.1698159805028;
        Tue, 24 Oct 2023 08:03:25 -0700 (PDT)
Received: from ?IPV6:2601:647:5700:6860:cead:830d:6436:e172? ([2601:647:5700:6860:cead:830d:6436:e172])
        by smtp.gmail.com with ESMTPSA id y27-20020aa78f3b000000b006bdc8bb2ed5sm7662856pfr.82.2023.10.24.08.03.23
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Tue, 24 Oct 2023 08:03:24 -0700 (PDT)
Content-Type: multipart/alternative;
 boundary="------------x0PQCsXXJBiqhfVf2Jt78yA0"
Message-ID: <0f93e039-9bd3-07f1-2d48-9b4a13efe99b@rivosinc.com>
Date: Tue, 24 Oct 2023 08:03:22 -0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.13.0
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV
 auto-vectorization
Content-Language: en-US
From: Patrick O'Neill <patrick@rivosinc.com>
To: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>,
 gcc-patches <gcc-patches@gcc.gnu.org>
Cc: "kito.cheng" <kito.cheng@gmail.com>, "Kito.cheng"
 <kito.cheng@sifive.com>, jeffreyalaw <jeffreyalaw@gmail.com>,
 Robin Dapp <rdapp.gcc@gmail.com>
References: <20231024033200.224558-1-juzhe.zhong@rivai.ai>
 <4A9A3B661519DAC3+202310241144138297945@rivai.ai>
 <6e22f033-887a-3bf1-316a-5e5ec69a3434@rivosinc.com>
In-Reply-To: <6e22f033-887a-3bf1-316a-5e5ec69a3434@rivosinc.com>
X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,GIT_PATCH_0,HTML_MESSAGE,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SCC_10_SHORT_WORD_LINES,SCC_5_SHORT_WORD_LINES,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

This is a multi-part message in MIME format.
--------------x0PQCsXXJBiqhfVf2Jt78yA0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

I'm seeing a variety of new failures, constrained to rv32gcv:

Tested using newlib/linux:
rv32gcv/ ilp32d/ medlow
rv64gcv/  lp64d/ medlow
rv64gcv_zvbb_zvbc_zvkg_zvkn_zvknc_zvkned_zvkng_zvknha_zvknhb_zvks_zvksc_zvksed_zvksg_zvksh_zvkt/ lp64d/ medlow
rv64imafdcv_zicond_zawrs_zbc_zvkng_zvksg_zvbb_zvbc_zicsr_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt/ lp64d/ medlow

Newlib failures:
rv32gcv:
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
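
For triage, the list above can be grouped by source file so that the option variants of one testcase collapse into a single entry. A minimal sketch in Python, assuming each FAIL result sits on one line as DejaGnu prints it in the .sum file; group_failures and sample are illustrative names, not part of any GCC tooling:

```python
import re
from collections import defaultdict

# Matches DejaGnu result lines such as
#   FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
# Assumption: each result is on a single line (rejoin wrapped lines first).
RESULT_RE = re.compile(r"^FAIL:\s+(\S+)\s*(.*)$")


def group_failures(lines):
    """Map each failing source file to the list of variants that failed."""
    groups = defaultdict(list)
    for line in lines:
        m = RESULT_RE.match(line.strip())
        if m:
            groups[m.group(1)].append(m.group(2).strip())
    return dict(groups)


# A few lines copied from the list above, used only as sample input.
sample = [
    "FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test",
    "FAIL: gcc.dg/vect/vect-reduc-10.c execution test",
    "FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test",
    "FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test",
]

for test, variants in group_failures(sample).items():
    print(f"{test}: {len(variants)} failing variant(s)")
```

Grouping this way makes it easier to see that the failures come from a small set of testcases rather than from many unrelated ones.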

Debug logs for testcases other than pr110557.cc look like this:

Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o 
./popcount-run-1.exe (timeout = 600)
spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o 
./popcount-run-1.exe
PASS: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c (test for excess errors)
spawn riscv64-unknown-elf-run ./popcount-run-1.exe
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

Debug log for pr110557.cc:

Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ 
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../  
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow   
-fdiagnostics-plain-output  -nostdinc++ 
-I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf 
-I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util 
-fmessage-length=0  -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-fdump-tree-vect-details        
-L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  
-L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs  
-lm  -o ./pr110557.exe    (timeout = 600)
spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ 
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-nostdinc++ 
-I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf 
-I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util 
-fmessage-length=0 -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-fdump-tree-vect-details 
-L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs 
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs 
-lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++14 (test for excess errors)
spawn riscv64-unknown-elf-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-newlib/../scripts/wrapper/qemu/riscv64-unknown-elf-run: line 15: 3449805 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test

Linux failures:
rv32gcv:
FAIL: gcc.dg/nextafter-2.c execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
FAIL: gfortran.dg/default_format_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_2.f90   -Os  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -Os  execution test
FAIL: gfortran.dg/large_real_kind_2.F90   -O0  execution test
FAIL: gfortran.dg/round_4.f90   -O0  execution test
FAIL: gfortran.dg/zero_sized_3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test
FAIL: gfortran.dg/ieee/large_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O1  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O2  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -Os  execution test
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_sum.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Some (not all) debug log outputs:

Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 
-march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -w -O2 -fomit-frame-pointer 
-finline-functions 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs 
-lm -o 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x 
(timeout = 600)
spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 
-march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -w -O2 -fomit-frame-pointer 
-finline-functions 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs 
-lm -o 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x 
PASS: gfortran.fortran-torture/execute/intrinsic_count.f90 compilation, -O2 -fomit-frame-pointer -finline-functions
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
STOP 2
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution, -O2 -fomit-frame-pointer -finline-functions

Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -w -O2 -fomit-frame-pointer 
-finline-functions -funroll-loops 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs 
-lm -o 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x 
(timeout = 600)
spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -w -O2 -fomit-frame-pointer 
-finline-functions -funroll-loops 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs 
-lm -o 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x 
PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation, -O2 -fomit-frame-pointer -finline-functions -funroll-loops
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
STOP 3
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution, -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
-fno-unsafe-math-optimizations -frounding-math -fsignaling-nans 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs 
-lm -o ./large_2.exe (timeout = 600)
spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
-fno-unsafe-math-optimizations -frounding-math -fsignaling-nans 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs 
-lm -o ./large_2.exe
PASS: gfortran.dg/ieee/large_2.f90 -O0 (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./large_2.exe
0.333333333333333333333333333333333317
2.24271998593667819112500193394291495E+1644
STOP 1
FAIL: gfortran.dg/ieee/large_2.f90 -O0 execution test

Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-nostdinc++ 
-I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu 
-I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util 
-fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-fdump-tree-vect-details 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs 
-lm -o ./pr110557.exe (timeout = 600)
spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-nostdinc++ 
-I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu 
-I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util 
-fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-fdump-tree-vect-details 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs 
-lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc -std=c++98 (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 323485 Trace/breakpoint trap (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc -std=c++98 execution test

Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
-fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe 
(timeout = 600)
spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
-fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe 
PASS: gcc.dg/vect/vect-reduc-dot-21.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-reduc-dot-21.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3484803 Aborted (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test

Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
-fno-common -O2 -fdump-tree-vect-details -lm -o 
./vect-alias-check-16.exe (timeout = 600)
spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
-fno-common -O2 -fdump-tree-vect-details -lm -o 
./vect-alias-check-16.exe
PASS: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-alias-check-16.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3431975 Aborted (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "flags: *RAW\\n"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "using an address-based overlap test"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump-not vect "using an index-based"

I've observed nextafter-2.c being flaky on the CI so that particular 
failure might not be real.

If you want the debug logs for any particular testcase, please let me know.

Patrick

On 10/23/23 21:30, Patrick O'Neill wrote:
>
> The CI just picked it up: 
> https://github.com/ewlu/gcc-precommit-ci/issues/449#issue-1958483272
> Since it doesn't apply to the CI's baseline hash it's only performing 
> a build.
> I'll re-run it in the morning once the baseline has been updated.
>
> In the meantime I started a full build+test run on my local machine.
> I'll send you the results in ~10 hours - morning my time :-)
>
> Patrick
>
> On 10/23/23 20:44, juzhe.zhong@rivai.ai wrote:
>> CCing Patrick...
>>
>> Hi, @Patrick.
>> Could you apply this patch and trigger your regression CI?
>>
>> I don't have an environment to test fortran for now (I only test it 
>> on C/C++).
>>
>> Thanks.
>>
>> ------------------------------------------------------------------------
>> juzhe.zhong@rivai.ai
>>
>>     *From:* Juzhe-Zhong <mailto:juzhe.zhong@rivai.ai>
>>     *Date:* 2023-10-24 11:32
>>     *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
>>     *CC:* kito.cheng <mailto:kito.cheng@gmail.com>; kito.cheng
>>     <mailto:kito.cheng@sifive.com>; jeffreyalaw
>>     <mailto:jeffreyalaw@gmail.com>; rdapp.gcc
>>     <mailto:rdapp.gcc@gmail.com>; Juzhe-Zhong
>>     <mailto:juzhe.zhong@rivai.ai>
>>     *Subject:* [PATCH] RISC-V: Add AVL propagation PASS for RVV
>>     auto-vectorization
>>     This patch addresses the redundant AVL/VL toggling in RVV partial
>>     auto-vectorization, which has been a known issue for a long time;
>>     I have finally found the time to address it.
>>     Consider a simple vector addition operation:
>>     https://godbolt.org/z/7hfGfEjW3
>>     void
>>     foo (int *__restrict a,
>>          int *__restrict b,
>>          int n)
>>     {
>>       for (int i = 0; i < n; i++)
>>           a[i] = a[i] + b[i];
>>     }
>>     Optimized IR:
>>     Loop body:
>>       _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);
>>         -> vsetvli a5,a2,e8,mf4,ta,ma
>>       ...
>>       vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);
>>         -> vle32.v v2,0(a0)
>>       vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);
>>         -> vle32.v v1,0(a1)
>>       vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
>>         -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
>>       .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);
>>         -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
>>     We can see 2 redundant vsetvls inside the loop body due to AVL/VL
>>     toggling.
>>     The AVL/VL toggling happens because LEN information is missing
>>     from the simple PLUS_EXPR GIMPLE assignment:
>>     vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
>>     For partial vectorization, GCC applies predicated (partial)
>>     loads/stores and un-predicated full-vector operations.
>>     This flow is used by all other targets, such as ARM SVE (RVV also
>>     uses it):
>>     ARM SVE:
>>     .L3:
>>             ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
>>             ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
>>             add     z31.s, z31.s, z30.s            -> un-predicated add
>>             st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store
>>     This vectorization flow causes AVL/VL toggling on RVV, so we need
>>     an AVL propagation PASS for it.
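
For intuition only, here is a toy C model of the toggling described above. It is not code from the patch: the names count_vsetvls, propagate_avl, AVL_VLMAX and AVL_NONE are made up for illustration, and it assumes that each operation's result is consumed by the next operation in the array. The real pass works on RTL-SSA def-use chains and also checks tail policy, the SEW/LMUL ratio and the basic block.

```c
/* Toy model, not GCC code.  Each vector operation demands either VLMAX
   or the length held in a scalar register, identified by a small
   non-negative integer.  */
enum { AVL_VLMAX = -1, AVL_NONE = -2 };

/* Count the vsetvli instructions a naive emitter would need: one each
   time the demanded AVL differs from the AVL currently in effect.  */
static int
count_vsetvls (const int *avl, int n)
{
  int count = 0;
  int current = AVL_NONE; /* nothing configured yet */
  for (int i = 0; i < n; i++)
    if (avl[i] != current)
      {
        count++;
        current = avl[i];
      }
  return count;
}

/* Give each VLMAX operation the AVL of its consumer.  Simplifying
   assumption: the consumer is the next operation in the array.  Walking
   backwards lets a chain of VLMAX operations pick up the same AVL.  */
static void
propagate_avl (int *avl, int n)
{
  for (int i = n - 2; i >= 0; i--)
    if (avl[i] == AVL_VLMAX && avl[i + 1] != AVL_VLMAX)
      avl[i] = avl[i + 1];
}
```

Encoding the loop body of foo as {5, 5, AVL_VLMAX, 5} (two loads, the add, the store, with 5 standing in for a5), count_vsetvls returns 3 before propagate_avl and 1 after it, which corresponds to the three vsetvlis in the listing above versus the single one that should remain.
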
>>     Also, it's very unlikely that we can apply predicated operations
>>     to all vectorization, for the following reasons:
>>     1. Supporting them for all vectorization is a very heavy workload,
>>     and we don't see any benefit if the target backend can handle
>>     this instead.
>>     2. Changing the loop vectorizer for it would make the code base
>>     ugly and hard to maintain.
>>     3. We would need patterns for all operations: not only
>>     COND_LEN_ADD, COND_LEN_SUB, ... but also COND_LEN_EXTEND, ...,
>>     COND_LEN_CEIL, ... That is over 100 patterns, an unreasonable
>>     number.
>>     To conclude, we prefer un-predicated operations here, and design
>>     a nice and clean AVL propagation PASS to elide the redundant
>>     vsetvls caused by AVL/VL toggling.
>>     The second question is why we add a separate PASS for AVL
>>     propagation. Why not do it in the VSETVL PASS? (We definitely
>>     can optimize AVL in the VSETVL PASS.)
>>     Frankly, I was planning to address this issue in the VSETVL PASS,
>>     which is why we recently refactored it. However, I changed my
>>     mind after several experiments.
>>     The reasons are as follows:
>>     1. Code base management and maintenance. The current VSETVL PASS
>>     is complicated enough and already has enough aggressive and fancy
>>     optimizations, and it turns out to generate optimal codegen in
>>     most cases. It's not a good idea to keep adding features to the
>>     VSETVL PASS and make it heavy again; we would then need to
>>     refactor it again in the future.
>>     Actually, the VSETVL PASS is very stable and optimal after the
>>     recent refactoring. Hopefully, we should not need to change it
>>     any more except for minor fixes.
>>     2. vsetvl insertion (which the VSETVL PASS does) and AVL
>>     propagation are 2 different things; I don't think we should fuse
>>     them into the same PASS.
>>     3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation
>>     should be done before RA, which can reduce register pressure.
>>     4. This patch's AVL propagation PASS only does AVL propagation
>>     for RVV partial auto-vectorization situations.
>>        The code is only a few hundred lines, which is very manageable
>>     and easy to extend with new features and enhancements.
>>     We can easily extend and enhance AVL propagation in a clean
>>     and separate PASS in the future. (If we did it in the VSETVL PASS,
>>     we would complicate the VSETVL PASS again, which is already so
>>     complicated.)
>>     Here is an example to demonstrate more:
>>     https://godbolt.org/z/bE86sv3q5
>>     void foo2 (int *__restrict a,
>>               int *__restrict b,
>>               int *__restrict c,
>>               int *__restrict a2,
>>               int *__restrict b2,
>>               int *__restrict c2,
>>               int *__restrict a3,
>>               int *__restrict b3,
>>               int *__restrict c3,
>>               int *__restrict a4,
>>               int *__restrict b4,
>>               int *__restrict c4,
>>               int *__restrict a5,
>>               int *__restrict b5,
>>               int *__restrict c5,
>>               int n)
>>     {
>>         for (int i = 0; i < n; i++){
>>           a[i] = b[i] + c[i];
>>           b5[i] = b[i] + c[i];
>>           a2[i] = b2[i] + c2[i];
>>           a3[i] = b3[i] + c3[i];
>>           a4[i] = b4[i] + c4[i];
>>           a5[i] = a[i] + a4[i];
>>           a[i] = a5[i] + b5[i]+ a[i];
>>           a[i] = a[i] + c[i];
>>           b5[i] = a[i] + c[i];
>>           a2[i] = a[i] + c2[i];
>>           a3[i] = a[i] + c3[i];
>>           a4[i] = a[i] + c4[i];
>>           a5[i] = a[i] + a4[i];
>>           a[i] = a[i] + b5[i]+ a[i];
>>         }
>>     }
>>     1. Loop Body:
>>       Before this patch:              After this patch:
>>       vsetvli a4,t1,e8,mf4,ta,ma      vsetvli a4,t1,e32,m1,ta,ma
>>       vle32.v v2,0(a2)                vle32.v v2,0(a2)
>>       vle32.v v4,0(a1)                vle32.v v3,0(t2)
>>       vle32.v v1,0(t2)                vle32.v v4,0(a1)
>>       vsetvli a7,zero,e32,m1,ta,ma    vle32.v v1,0(t0)
>>       vadd.vv v4,v2,v4                vadd.vv v4,v2,v4
>>       vsetvli zero,a4,e32,m1,ta,ma    vadd.vv v1,v3,v1
>>       vle32.v v3,0(s0)                vadd.vv v1,v1,v4
>>       vsetvli a7,zero,e32,m1,ta,ma    vadd.vv v1,v1,v4
>>       vadd.vv v1,v3,v1                vadd.vv v1,v1,v4
>>       vadd.vv v1,v1,v4                vadd.vv v1,v1,v2
>>       vadd.vv v1,v1,v4                vadd.vv v2,v1,v2
>>       vadd.vv v1,v1,v4                vse32.v v2,0(t5)
>>       vsetvli zero,a4,e32,m1,ta,ma    vadd.vv v2,v2,v1
>>       vle32.v v4,0(a5)                vadd.vv v2,v2,v1
>>       vsetvli a7,zero,e32,m1,ta,ma    slli a7,a4,2
>>       vadd.vv v1,v1,v2                vadd.vv v3,v1,v3
>>       vadd.vv v2,v1,v2                vle32.v v5,0(a5)
>>       vadd.vv v4,v1,v4                vle32.v v6,0(t6)
>>       vsetvli zero,a4,e32,m1,ta,ma    vse32.v v3,0(t3)
>>       vse32.v v2,0(t5)                vse32.v v2,0(a0)
>>       vse32.v v4,0(a3)                vadd.vv v3,v3,v1
>>       vsetvli a7,zero,e32,m1,ta,ma    vadd.vv v2,v1,v5
>>       vadd.vv v3,v1,v3                vse32.v v3,0(t4)
>>       vadd.vv v2,v2,v1                vadd.vv v1,v1,v6
>>       vadd.vv v2,v2,v1                vse32.v v2,0(a3)
>>       vsetvli zero,a4,e32,m1,ta,ma    vse32.v v1,0(a6)
>>       vse32.v v2,0(a0)
>>       vse32.v v3,0(t3)
>>       vle32.v v2,0(t0)
>>       vsetvli a7,zero,e32,m1,ta,ma
>>       vadd.vv v3,v3,v1
>>       vsetvli zero,a4,e32,m1,ta,ma
>>       vse32.v v3,0(t4)
>>       vsetvli a7,zero,e32,m1,ta,ma
>>       slli a7,a4,2
>>       vadd.vv v1,v1,v2
>>       sub t1,t1,a4
>>       vsetvli zero,a4,e32,m1,ta,ma
>>       vse32.v v1,0(a6)
>>     As the comparison shows, all the heavy, redundant vsetvls inside
>>     the loop body are eliminated.
>>     2. Epilogue:
>>         Before this patch:            After this patch:
>>     .L5:                              .L5:
>>             ld s0,8(sp)                       ret
>>             addi sp,sp,16
>>             jr ra
>>     This is the benefit of doing the AVL propagation before RA: we
>>     eliminate the use of the 'a7' register,
>>     which is used by the redundant AVL/VL toggling instruction:
>>     'vsetvli a7,zero,e32,m1,ta,ma'
>>     The final codegen after this patch:
>>     foo2:
>>     lw t1,56(sp)
>>     ld t6,0(sp)
>>     ld t3,8(sp)
>>     ld t0,16(sp)
>>     ld t2,24(sp)
>>     ld t4,32(sp)
>>     ld t5,40(sp)
>>     ble t1,zero,.L5
>>     .L3:
>>     vsetvli a4,t1,e32,m1,ta,ma
>>     vle32.v v2,0(a2)
>>     vle32.v v3,0(t2)
>>     vle32.v v4,0(a1)
>>     vle32.v v1,0(t0)
>>     vadd.vv v4,v2,v4
>>     vadd.vv v1,v3,v1
>>     vadd.vv v1,v1,v4
>>     vadd.vv v1,v1,v4
>>     vadd.vv v1,v1,v4
>>     vadd.vv v1,v1,v2
>>     vadd.vv v2,v1,v2
>>     vse32.v v2,0(t5)
>>     vadd.vv v2,v2,v1
>>     vadd.vv v2,v2,v1
>>     slli a7,a4,2
>>     vadd.vv v3,v1,v3
>>     vle32.v v5,0(a5)
>>     vle32.v v6,0(t6)
>>     vse32.v v3,0(t3)
>>     vse32.v v2,0(a0)
>>     vadd.vv v3,v3,v1
>>     vadd.vv v2,v1,v5
>>     vse32.v v3,0(t4)
>>     vadd.vv v1,v1,v6
>>     vse32.v v2,0(a3)
>>     vse32.v v1,0(a6)
>>     sub t1,t1,a4
>>     add a1,a1,a7
>>     add a2,a2,a7
>>     add a5,a5,a7
>>     add t6,t6,a7
>>     add t0,t0,a7
>>     add t2,t2,a7
>>     add t5,t5,a7
>>     add a3,a3,a7
>>     add a6,a6,a7
>>     add t3,t3,a7
>>     add t4,t4,a7
>>     add a0,a0,a7
>>     bne t1,zero,.L3
>>     .L5:
>>     ret
>>     PR target/111888
>>     gcc/ChangeLog:
>>     * config.gcc: Add AVL propagation PASS.
>>     * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
>>     * config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
>>     (has_vtype_op): Export as global.
>>     (has_vl_op): Ditto.
>>     (tail_agnostic_p): Ditto.
>>     (validate_change_or_fail): Ditto.
>>     (vlmax_avl_type_p): Ditto.
>>     (vlmax_avl_p): Ditto.
>>     (get_sew): Ditto.
>>     (enum vlmul_type): Ditto.
>>     (const_vlmax_p): Ditto.
>>     * config/riscv/riscv-v.cc (has_vtype_op): Ditto.
>>     (has_vl_op): Ditto.
>>     (get_default_ta): Ditto.
>>     (tail_agnostic_p): Ditto.
>>     (validate_change_or_fail): Ditto.
>>     (vlmax_avl_type_p): Ditto.
>>     (vlmax_avl_p): Ditto.
>>     (get_sew): Ditto.
>>     (enum vlmul_type): Ditto.
>>     (get_vlmul): Ditto.
>>     * config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
>>     (has_vtype_op): Ditto.
>>     (has_vl_op): Ditto.
>>     (get_sew): Ditto.
>>     (get_vlmul): Ditto.
>>     (get_default_ta): Ditto.
>>     (tail_agnostic_p): Ditto.
>>     (validate_change_or_fail): Ditto.
>>     * config/riscv/t-riscv: Add AVL propagation PASS.
>>     * config/riscv/vector.md: Fix VLS modes attribute.
>>     * config/riscv/riscv-avlprop.cc: New file.
>>     gcc/testsuite/ChangeLog:
>>     * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
>>     * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
>>     * gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
>>     * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
>>     * gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
>>     * gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
>>     * gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.
>>     ---
>>     gcc/config.gcc                                |   2 +-
>>     gcc/config/riscv/riscv-avlprop.cc             | 350 ++++++++++++++++++
>>     gcc/config/riscv/riscv-passes.def             |   1 +
>>     gcc/config/riscv/riscv-protos.h               |  10 +
>>     gcc/config/riscv/riscv-v.cc                   |  84 ++++-
>>     gcc/config/riscv/riscv-vsetvl.cc              |  82 +---
>>     gcc/config/riscv/t-riscv                      |   6 +
>>     gcc/config/riscv/vector.md                    |   2 +-
>>     .../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +-
>>     .../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +-
>>     .../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +-
>>     .../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 -
>>     .../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
>>     .../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
>>     gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
>>     15 files changed, 514 insertions(+), 84 deletions(-)
>>     create mode 100644 gcc/config/riscv/riscv-avlprop.cc
>>     create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>     create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>     diff --git a/gcc/config.gcc b/gcc/config.gcc
>>     index 606d3a8513e..efd53965c9a 100644
>>     --- a/gcc/config.gcc
>>     +++ b/gcc/config.gcc
>>     @@ -544,7 +544,7 @@ pru-*-*)
>>     riscv*)
>>     cpu_type=riscv
>>     extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o
>>     riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
>>     - extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o
>>     riscv-vector-costs.o"
>>     + extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o
>>     riscv-vector-costs.o riscv-avlprop.o"
>>     extra_objs="${extra_objs} riscv-vector-builtins.o
>>     riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>>     extra_objs="${extra_objs} thead.o"
>>     d_target_objs="riscv-d.o"
>>     diff --git a/gcc/config/riscv/riscv-avlprop.cc
>>     b/gcc/config/riscv/riscv-avlprop.cc
>>     new file mode 100644
>>     index 00000000000..bf3becd8371
>>     --- /dev/null
>>     +++ b/gcc/config/riscv/riscv-avlprop.cc
>>     @@ -0,0 +1,350 @@
>>     +/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler.
>>     +   Copyright (C) 2023-2023 Free Software Foundation, Inc.
>>     +   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI
>>     Technologies Ltd.
>>     +
>>     +This file is part of GCC.
>>     +
>>     +GCC is free software; you can redistribute it and/or modify
>>     +it under the terms of the GNU General Public License as published by
>>     +the Free Software Foundation; either version 3, or(at your option)
>>     +any later version.
>>     +
>>     +GCC is distributed in the hope that it will be useful,
>>     +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>     +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>     +GNU General Public License for more details.
>>     +
>>     +You should have received a copy of the GNU General Public License
>>     +along with GCC; see the file COPYING3.  If not see
>>     +<http://www.gnu.org/licenses/>. */
>>     +
>>     +/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions.
>>     +   A standalone AVL propagation pass is designed because:
>>     +
>>     +     - Better code maintain:
>>     +       Current LCM-based VSETVL pass is so complicated that codes
>>     +       there will become even harder to maintain. A straight forward
>>     +       AVL propagation PASS is much easier to maintain.
>>     +
>>     +     - Reduce scalar register pressure:
>>     +       A type of AVL propagation is we propagate AVL from NON-VLMAX
>>     +       instruction to VLMAX instruction.
>>     +       Note: VLMAX instruction should be ignore tail elements (TA)
>>     +       and the result should be used by the NON-VLMAX instruction.
>>     +       This optimization is mostly for auto-vectorization codes:
>>     +
>>     +   vsetvli r136, r137      --- SELECT_VL
>>     +   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
>>     +   vadd.vv (use VLMAX)     --- PLUS_EXPR
>>     +   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
>>     +
>>     + NO AVL propation:
>>     +
>>     +   vsetvli a5, a4, ta
>>     +   vle8.v v1
>>     +   vsetvli t0, zero, ta
>>     +   vadd.vv v2, v1, v1
>>     +   vse8.v v2
>>     +
>>     + We can propagate the AVL to 'vadd.vv' since its result
>>     + is consumed by a 'vse8.v' which has AVL = a5 and its
>>     + tail elements are agnostic.
>>     +
>>     +       We DON'T do this optimization on VSETVL pass since it is a
>>     +       post-RA pass that consumed 't0' already wheras a standalone
>>     +       pre-RA AVL propagation pass allows us elide the consumption
>>     +       of the pseudo register of 't0' then we can reduce scalar
>>     +       register pressure.
>>     +
>>     +     - More AVL propagation opportunities:
>>     +       A pre-RA pass is more flexible for AVL REG def-use chain,
>>     +       thus we will get more potential AVL propagation as long as
>>     +       it doesn't increase the scalar register pressure.
>>     +*/
>>     +
>>     +#define IN_TARGET_CODE 1
>>     +#define INCLUDE_ALGORITHM
>>     +#define INCLUDE_FUNCTIONAL
>>     +
>>     +#include "config.h"
>>     +#include "system.h"
>>     +#include "coretypes.h"
>>     +#include "tm.h"
>>     +#include "backend.h"
>>     +#include "rtl.h"
>>     +#include "target.h"
>>     +#include "tree-pass.h"
>>     +#include "df.h"
>>     +#include "rtl-ssa.h"
>>     +#include "cfgcleanup.h"
>>     +#include "insn-attr.h"
>>     +
>>     +using namespace rtl_ssa;
>>     +using namespace riscv_vector;
>>     +
>>     +/* The AVL propagation instructions and corresponding preferred AVL.
>>     +   It will be updated during the analysis.  */
>>     +static hash_map<insn_info *, rtx> *avlprops;
>>     +
>>     +const pass_data pass_data_avlprop = {
>>     +  RTL_PASS, /* type */
>>     +  "avlprop", /* name */
>>     +  OPTGROUP_NONE, /* optinfo_flags */
>>     +  TV_NONE, /* tv_id */
>>     +  0, /* properties_required */
>>     +  0, /* properties_provided */
>>     +  0, /* properties_destroyed */
>>     +  0, /* todo_flags_start */
>>     +  0, /* todo_flags_finish */
>>     +};
>>     +
>>     +class pass_avlprop : public rtl_opt_pass
>>     +{
>>     +public:
>>     +  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass
>>     (pass_data_avlprop, ctxt) {}
>>     +
>>     +  /* opt_pass methods: */
>>     +  virtual bool gate (function *) final override
>>     +  {
>>     +    return TARGET_VECTOR && optimize > 0;
>>     +  }
>>     +  virtual unsigned int execute (function *) final override;
>>     +}; // class pass_avlprop
>>     +
>>     +static void
>>     +avlprop_init (void)
>>     +{
>>     +  calculate_dominance_info (CDI_DOMINATORS);
>>     +  df_analyze ();
>>     +  crtl->ssa = new function_info (cfun);
>>     +  avlprops = new hash_map<insn_info *, rtx>;
>>     +}
>>     +
>>     +static void
>>     +avlprop_done (void)
>>     +{
>>     +  free_dominance_info (CDI_DOMINATORS);
>>     +  if (crtl->ssa->perform_pending_updates ())
>>     +    cleanup_cfg (0);
>>     +  delete crtl->ssa;
>>     +  crtl->ssa = nullptr;
>>     +  delete avlprops;
>>     +  avlprops = NULL;
>>     +}
>>     +
>>     +/* Helper function to get AVL operand.  */
>>     +static rtx
>>     +get_avl (insn_info *insn, bool avlprop_p)
>>     +{
>>     +  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
>>     +      || get_attr_avl_type (insn->rtl ()) == VLS)
>>     +    return NULL_RTX;
>>     +  if (avlprop_p)
>>     +    {
>>     +      if (avlprops->get (insn))
>>     + return (*avlprops->get (insn));
>>     +      else if (vlmax_avl_type_p (insn->rtl ()))
>>     + return RVV_VLMAX;
>>     +    }
>>     +  extract_insn_cached (insn->rtl ());
>>     +  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
>>     +}
>>     +
>>     +/* This is a straight forward pattern ALWAYS in paritial
>>     auto-vectorization:
>>     +
>>     +     VL = SELECT_AVL (AVL, ...)
>>     +     V0 = MASK_LEN_LOAD (..., VL)
>>     +     V1 = MASK_LEN_LOAD (..., VL)
>>     +     V2 = V0 + V1 --- Missed LEN information.
>>     +     MASK_LEN_STORE (..., V2, VL)
>>     +
>>     +   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0,
>>     V1, dummy LEN)
>>     +   because:
>>     +
>>     +     - Few code changes in Loop Vectorizer.
>>     +     - Reuse the current clean flow of partial vectorization,
>>     That is, apply
>>     +       predicate LEN or MASK into LOAD/STORE operations and
>>     other special
>>     +       arithmetic operations (e.d. DIV), then do the whole
>>     vector register
>>     +       operation if it DON'T affect the correctness.
>>     +       Such flow is used by all other targets like x86, sve,
>>     s390, ... etc.
>>     +     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
>>     +
>>     +   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like
>>     PLUS_EXPR which
>>     +   generates the VLMAX instruction due to missed LEN
>>     information. The later
>>     +   VSETVL PASS will elided the redundant vsetvls.
>>     +*/
>>     +
>>     +static rtx
>>     +get_autovectorize_preferred_avl (insn_info *insn)
>>     +{
>>     +  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p
>>     (insn->rtl ()))
>>     +    return NULL_RTX;
>>     +
>>     +  rtx use_avl = NULL_RTX;
>>     +  insn_info *avl_use_insn = nullptr;
>>     +  unsigned int ratio
>>     +    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul
>>     (insn->rtl ()));
>>     +  for (def_info *def : insn->defs ())
>>     +    {
>>     +      auto set = safe_dyn_cast<set_info *> (def);
>>     +      if (!set || !set->is_reg ())
>>     + return NULL_RTX;
>>     +      for (use_info *use : set->all_uses ())
>>     + {
>>     +   if (!use->is_in_nondebug_insn ())
>>     +     return NULL_RTX;
>>     +   insn_info *use_insn = use->insn ();
>>     +   /* FIXME: Stop AVL propagation if any USE is not a RVV real
>>     +      instruction. It should be totally enough for vectorized
>>     codes since
>>     +      they always locate at extended blocks.
>>     +
>>     +      TODO: We can extend PHI checking for intrinsic codes if it
>>     +      necessary in the future.  */
>>     +   if (use_insn->is_artificial () || !has_vtype_op
>>     (use_insn->rtl ()))
>>     +     return NULL_RTX;
>>     +   if (!has_vl_op (use_insn->rtl ()))
>>     +     continue;
>>     +
>>     +   rtx new_use_avl = get_avl (use_insn, true);
>>     +   if (!new_use_avl)
>>     +     return NULL_RTX;
>>     +   if (!use_avl)
>>     +     use_avl = new_use_avl;
>>     +   if (!rtx_equal_p (use_avl, new_use_avl)
>>     +       || calculate_ratio (get_sew (use_insn->rtl ()),
>>     +   get_vlmul (use_insn->rtl ()))
>>     +    != ratio
>>     +       || vlmax_avl_p (new_use_avl)
>>     +       || !tail_agnostic_p (use_insn->rtl ()))
>>     +     return NULL_RTX;
>>     +   if (!avl_use_insn)
>>     +     avl_use_insn = use_insn;
>>     + }
>>     +    }
>>     +
>>     +  if (use_avl && register_operand (use_avl, Pmode))
>>     +    {
>>     +      gcc_assert (avl_use_insn);
>>     +      // Find a definition at or neighboring INSN.
>>     +      resource_info resource = full_register (REGNO (use_avl));
>>     +      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
>>     +      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
>>     +      if (dl1.matching_set () || dl2.matching_set ())
>>     + return NULL_RTX;
>>     +      def_info *def1 = dl1.last_def_of_prev_group ();
>>     +      def_info *def2 = dl2.last_def_of_prev_group ();
>>     +      if (def1 != def2)
>>     + return NULL_RTX;
>>     +      /* FIXME: We only all AVL propation within a block which
>>     should
>>     + be totally enough for vectorized codes.
>>     +
>>     + TODO: We can enhance it here for intrinsic codes in the future
>>     + if it is necessary.  */
>>     +      if (def1->insn ()->bb () != insn->bb ()
>>     +   || def1->insn ()->compare_with (insn) >= 0)
>>     + return NULL_RTX;
>>     +    }
>>     +  return use_avl;
>>     +}
>>     +
>>     +/* If we have a preferred AVL to propagate, return the AVL.
>>     +   Otherwise, return NULL_RTX as we don't need have any preferred
>>     +   AVL.  */
>>     +
>>     +static rtx
>>     +get_preferred_avl (insn_info *insn)
>>     +{
>>     +  /* TODO: We only do AVL propagation for missed-LEN partial
>>     +     autovectorization for now.  We could add more more AVL
>>     +     propagation for intrinsic codes in the future. */
>>     +  return get_autovectorize_preferred_avl (insn);
>>     +}
>>     +
>>     +/* Return the AVL TYPE operand index.  */
>>     +static int
>>     +get_avl_type_index (insn_info *insn)
>>     +{
>>     +  extract_insn_cached (insn->rtl ());
>>     +  /* Except rounding mode patterns, AVL TYPE operand
>>     +     is always the last operand.  */
>>     +  if (find_access (insn->uses (), VXRM_REGNUM)
>>     +      || find_access (insn->uses (), FRM_REGNUM))
>>     +    return recog_data.n_operands - 2;
>>     +  return recog_data.n_operands - 1;
>>     +}
>>     +
>>     +/* Main entry point for this pass.  */
>>     +unsigned int
>>     +pass_avlprop::execute (function *)
>>     +{
>>     +  avlprop_init ();
>>     +
>>     +  /* Go through all the instructions looking for AVL that we
>>     could propagate. */
>>     +
>>     +  insn_info *next;
>>     +  bool change_p = true;
>>     +
>>     +  while (change_p)
>>     +    {
>>     +      /* Iterate on each instruction until no more change need.  */
>>     +      change_p = false;
>>     +      for (insn_info *insn = crtl->ssa->first_insn (); insn;
>>     insn = next)
>>     + {
>>     +   next = insn->next_any_insn ();
>>     +   /* We only forward AVL to the instruction that has AVL/VL operand
>>     +      and can be optimized in RTL_SSA level.  */
>>     +   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
>>     +     continue;
>>     +
>>     +   rtx new_avl = get_preferred_avl (insn);
>>     +   if (new_avl)
>>     +     {
>>     +       gcc_assert (!vlmax_avl_p (new_avl));
>>     +       auto &update = avlprops->get_or_insert (insn);
>>     +       change_p = !rtx_equal_p (update, new_avl);
>>     +       update = new_avl;
>>     +     }
>>     + }
>>     +    }
>>     +
>>     +  if (dump_file)
>>     +    fprintf (dump_file, "\nNumber of successful AVL
>>     propagations: %d\n\n",
>>     +      (int) avlprops->elements ());
>>     +
>>     +  for (const auto iter : *avlprops)
>>     +    {
>>     +      rtx_insn *rinsn = iter.first->rtl ();
>>     +      if (dump_file)
>>     + {
>>     +   fprintf (dump_file, "\nPropagating AVL: ");
>>     +   print_rtl_single (dump_file, iter.second);
>>     +   fprintf (dump_file, "into: ");
>>     +   print_rtl_single (dump_file, rinsn);
>>     + }
>>     +      /* Replace AVL operand.  */
>>     +      rtx new_pat
>>     + = simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first,
>>     false),
>>     + iter.second);
>>     +      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat,
>>     false);
>>     +
>>     +      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
>>     +      if (vlmax_avl_type_p (rinsn))
>>     + validate_change_or_fail (
>>     +   rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)],
>>     +   get_avl_type_rtx (avl_type::NONVLMAX), false);
>>     +      if (dump_file)
>>     + {
>>     +   fprintf (dump_file, "Successfully to match this instruction: ");
>>     +   print_rtl_single (dump_file, rinsn);
>>     + }
>>     +    }
>>     +
>>     +  avlprop_done ();
>>     +  return 0;
>>     +}
>>     +
>>     +rtl_opt_pass *
>>     +make_pass_avlprop (gcc::context *ctxt)
>>     +{
>>     +  return new pass_avlprop (ctxt);
>>     +}
>>     diff --git a/gcc/config/riscv/riscv-passes.def
>>     b/gcc/config/riscv/riscv-passes.def
>>     index 4084122cf0a..b6260939d5c 100644
>>     --- a/gcc/config/riscv/riscv-passes.def
>>     +++ b/gcc/config/riscv/riscv-passes.def
>>     @@ -18,4 +18,5 @@
>>     <http://www.gnu.org/licenses/>. */
>>     INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
>>     +INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
>>     INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
>>     diff --git a/gcc/config/riscv/riscv-protos.h
>>     b/gcc/config/riscv/riscv-protos.h
>>     index 6cb9d459ee9..2b09ec9ea9e 100644
>>     --- a/gcc/config/riscv/riscv-protos.h
>>     +++ b/gcc/config/riscv/riscv-protos.h
>>     @@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const
>>     char *, struct gcc_options *, locatio
>>     extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
>>     rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
>>     +rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
>>     rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
>>     /* Routines implemented in riscv-string.c.  */
>>     @@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
>>     bool cmp_lmul_gt_one (machine_mode);
>>     bool gather_scatter_valid_offset_mode_p (machine_mode);
>>     bool vls_mode_valid_p (machine_mode);
>>     +bool has_vtype_op (rtx_insn *);
>>     +bool has_vl_op (rtx_insn *);
>>     +bool tail_agnostic_p (rtx_insn *);
>>     +void validate_change_or_fail (rtx, rtx *, rtx, bool);
>>     +bool vlmax_avl_type_p (rtx_insn *);
>>     +bool vlmax_avl_p (rtx);
>>     +uint8_t get_sew (rtx_insn *);
>>     +enum vlmul_type get_vlmul (rtx_insn *);
>>     +bool const_vlmax_p (machine_mode);
>>     }
>>     /* We classify builtin types into two classes:
>>     diff --git a/gcc/config/riscv/riscv-v.cc
>>     b/gcc/config/riscv/riscv-v.cc
>>     index e39a9507803..473622ac321 100644
>>     --- a/gcc/config/riscv/riscv-v.cc
>>     +++ b/gcc/config/riscv/riscv-v.cc
>>     @@ -56,7 +56,7 @@ using namespace riscv_vector;
>>     namespace riscv_vector {
>>     /* Return true if vlmax is constant value and can be used in
>>     vsetivl.  */
>>     -static bool
>>     +bool
>>     const_vlmax_p (machine_mode mode)
>>     {
>>        poly_uint64 nuints = GET_MODE_NUNITS (mode);
>>     @@ -298,14 +298,6 @@ public:
>>           len = force_reg (Pmode, len);
>>         vls_p = true;
>>       }
>>     - else if (const_vlmax_p (vtype_mode))
>>     -   {
>>     -     /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
>>     -        the vsetvli to obtain the value of vlmax.  */
>>     -     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
>>     -     len = gen_int_mode (nunits, Pmode);
>>     -     vls_p = true;
>>     -   }
>>     else if (can_create_pseudo_p ())
>>       {
>>         len = gen_reg_rtx (Pmode);
>>     @@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
>>        emit_move_insn (dst, x4);
>>     }
>>     +/* Return true if it is an RVV instruction that depends on the VTYPE
>>     +   global status register.  */
>>     +bool
>>     +has_vtype_op (rtx_insn *rinsn)
>>     +{
>>     +  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op
>>     (rinsn);
>>     +}
>>     +
>>     +/* Return true if it is an RVV instruction that depends on the VL
>>     +   global status register.  */
>>     +bool
>>     +has_vl_op (rtx_insn *rinsn)
>>     +{
>>     +  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
>>     +}
>>     +
>>     +/* Get default tail policy.  */
>>     +static bool
>>     +get_default_ta ()
>>     +{
>>     +  /* For an instruction that doesn't require TA, we still need a
>>     +     default value to emit vsetvl.  We pick the default value
>>     +     according to the preferred policy.  */
>>     +  return (bool) (get_prefer_tail_policy () & 0x1
>>     + || (get_prefer_tail_policy () >> 1 & 0x1));
>>     +}
>>     +
>>     +/* Helper function to get TA operand.  */
>>     +bool
>>     +tail_agnostic_p (rtx_insn *rinsn)
>>     +{
>>     +  /* If it doesn't have TA, we return agnostic by default.  */
>>     +  extract_insn_cached (rinsn);
>>     +  int ta = get_attr_ta (rinsn);
>>     +  return ta == INVALID_ATTRIBUTE ? get_default_ta () :
>>     IS_AGNOSTIC (ta);
>>     +}
>>     +
>>     +/* Change insn and assert that the change always happens.  */
>>     +void
>>     +validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool
>>     in_group)
>>     +{
>>     +  bool change_p = validate_change (object, loc, new_rtx, in_group);
>>     +  gcc_assert (change_p);
>>     +}
>>     +
>>     +/* Return true if it is VLMAX AVL TYPE.  */
>>     +bool
>>     +vlmax_avl_type_p (rtx_insn *rinsn)
>>     +{
>>     +  return get_attr_avl_type (rinsn) == VLMAX;
>>     +}
>>     +
>>     +/* Return true if RTX is RVV VLMAX AVL.  */
>>     +bool
>>     +vlmax_avl_p (rtx x)
>>     +{
>>     +  return x && rtx_equal_p (x, RVV_VLMAX);
>>     +}
>>     +
>>     +/* Helper function to get SEW operand. We always have SEW value for
>>     +   all RVV instructions that have VTYPE OP.  */
>>     +uint8_t
>>     +get_sew (rtx_insn *rinsn)
>>     +{
>>     +  return get_attr_sew (rinsn);
>>     +}
>>     +
>>     +/* Helper function to get VLMUL operand. We always have VLMUL
>>     value for
>>     +   all RVV instructions that have VTYPE OP. */
>>     +enum vlmul_type
>>     +get_vlmul (rtx_insn *rinsn)
>>     +{
>>     +  return (enum vlmul_type) get_attr_vlmul (rinsn);
>>     +}
>>     +
>>     } // namespace riscv_vector
>>     diff --git a/gcc/config/riscv/riscv-vsetvl.cc
>>     b/gcc/config/riscv/riscv-vsetvl.cc
>>     index e9dd669de98..f2f19e423bf 100644
>>     --- a/gcc/config/riscv/riscv-vsetvl.cc
>>     +++ b/gcc/config/riscv/riscv-vsetvl.cc
>>     @@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
>>        return agnostic_p ? "agnostic" : "undisturbed";
>>     }
>>     -static bool
>>     -vlmax_avl_p (rtx x)
>>     -{
>>     -  return x && rtx_equal_p (x, RVV_VLMAX);
>>     -}
>>     -
>>     -/* Return true if it is an RVV instruction depends on VTYPE global
>>     -   status register.  */
>>     -static bool
>>     -has_vtype_op (rtx_insn *rinsn)
>>     -{
>>     -  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op
>>     (rinsn);
>>     -}
>>     -
>>     -/* Return true if it is an RVV instruction depends on VL global
>>     -   status register.  */
>>     -static bool
>>     -has_vl_op (rtx_insn *rinsn)
>>     -{
>>     -  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
>>     -}
>>     -
>>     /* Return true if the instruction ignores VLMUL field of VTYPE.  */
>>     static bool
>>     ignore_vlmul_insn_p (rtx_insn *rinsn)
>>     @@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
>>        if (!has_vl_op (rinsn))
>>          return NULL_RTX;
>>     -  if (get_attr_avl_type (rinsn) == VLMAX)
>>     -    return RVV_VLMAX;
>>     -  extract_insn_cached (rinsn);
>>     -  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
>>     -}
>>     -/* Helper function to get SEW operand. We always have SEW value for
>>     -   all RVV instructions that have VTYPE OP.  */
>>     -static uint8_t
>>     -get_sew (rtx_insn *rinsn)
>>     -{
>>     -  return get_attr_sew (rinsn);
>>     -}
>>     -
>>     -/* Helper function to get VLMUL operand. We always have VLMUL
>>     value for
>>     -   all RVV instructions that have VTYPE OP. */
>>     -static enum vlmul_type
>>     -get_vlmul (rtx_insn *rinsn)
>>     -{
>>     -  return (enum vlmul_type) get_attr_vlmul (rinsn);
>>     -}
>>     +  extract_insn_cached (rinsn);
>>     +  if (vlmax_avl_type_p (rinsn))
>>     +    {
>>     +      if (BYTES_PER_RISCV_VECTOR.is_constant ())
>>     + {
>>     +   for (int i = 0; i < recog_data.n_operands; i++)
>>     +     if (GET_MODE_CLASS (recog_data.operand_mode[i]) ==
>>     MODE_VECTOR_BOOL
>>     + && const_vlmax_p (recog_data.operand_mode[i]))
>>     +       return gen_int_mode (GET_MODE_NUNITS
>>     (recog_data.operand_mode[i]),
>>     +    Pmode);
>>     + }
>>     +      return RVV_VLMAX;
>>     +    }
>>     -/* Get default tail policy.  */
>>     -static bool
>>     -get_default_ta ()
>>     -{
>>     -  /* For the instruction that doesn't require TA, we still need
>>     a default value
>>     -     to emit vsetvl. We pick up the default value according to
>>     prefer policy. */
>>     -  return (bool) (get_prefer_tail_policy () & 0x1
>>     - || (get_prefer_tail_policy () >> 1 & 0x1));
>>     +  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
>>     }
>>     /* Get default mask policy.  */
>>     @@ -407,16 +371,6 @@ get_default_ma ()
>>     || (get_prefer_mask_policy () >> 1 & 0x1));
>>     }
>>     -/* Helper function to get TA operand.  */
>>     -static bool
>>     -tail_agnostic_p (rtx_insn *rinsn)
>>     -{
>>     -  /* If it doesn't have TA, we return agnostic by default.  */
>>     -  extract_insn_cached (rinsn);
>>     -  int ta = get_attr_ta (rinsn);
>>     -  return ta == INVALID_ATTRIBUTE ? get_default_ta () :
>>     IS_AGNOSTIC (ta);
>>     -}
>>     -
>>     /* Helper function to get MA operand.  */
>>     static bool
>>     mask_agnostic_p (rtx_insn *rinsn)
>>     @@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn
>>     *rinsn, int regno)
>>        return true;
>>     }
>>     -/* Change insn and Assert the change always happens. */
>>     -static void
>>     -validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool
>>     in_group)
>>     -{
>>     -  bool change_p = validate_change (object, loc, new_rtx, in_group);
>>     -  gcc_assert (change_p);
>>     -}
>>     -
>>     /* This flags indicates the minimum demand of the vl and vtype
>>     values by the
>>         RVV instruction. For example, DEMAND_RATIO_P indicates that
>>     this RVV
>>         instruction only needs the SEW/LMUL ratio to remain the same,
>>     and does not
>>     diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
>>     index dd17056fe82..08de62853a6 100644
>>     --- a/gcc/config/riscv/t-riscv
>>     +++ b/gcc/config/riscv/t-riscv
>>     @@ -69,6 +69,12 @@ riscv-vsetvl.o:
>>     $(srcdir)/config/riscv/riscv-vsetvl.cc \
>>     $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>>     $(srcdir)/config/riscv/riscv-vsetvl.cc
>>     +riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
>>     +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
>>     +  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h
>>     + $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>>     + $(srcdir)/config/riscv/riscv-avlprop.cc
>>     +
>>     riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
>>        $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H)
>>     $(FUNCTION_H) \
>>        $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
>>     diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
>>     index ef91950178f..0c59d1b90bc 100644
>>     --- a/gcc/config/riscv/vector.md
>>     +++ b/gcc/config/riscv/vector.md
>>     @@ -809,7 +809,7 @@
>>     V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
>>     V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
>>     V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
>>     -    (symbol_ref "riscv_vector::NONVLMAX")
>>     +    (symbol_ref "riscv_vector::VLS")
>>     (eq_attr "type"
>>     "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
>>     vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
>>     vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
>>     diff --git
>>     a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>     b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>     index 928a507a363..5278e4aa38f 100644
>>     --- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>     +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>     @@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
>>          }
>>     }
>>     -/* { dg-final { scan-assembler {e32,m4} } } */
>>     +/* { dg-final { scan-assembler {e16,m2} } } */
>>     /* { dg-final { scan-assembler-not {csrr} } } */
>>     /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect"
>>     } } */
>>     /* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect"
>>     } } */
>>     diff --git
>>     a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>     b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>     index a50265fc1ec..1db2e073846 100644
>>     --- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>     +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>     @@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict
>>     b, int n)
>>          a[i] = a[i] + b[i];
>>     }
>>     -/* { dg-final { scan-assembler {e32,m8} } } */
>>     +/* { dg-final { scan-assembler {e16,m4} } } */
>>     /* { dg-final { scan-assembler-not {csrr} } } */
>>     /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect"
>>     } } */
>>     /* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
>>     diff --git
>>     a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>     b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>     index eac7cbc757b..ca88d42cdf4 100644
>>     ---
>>     a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>     +++
>>     b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>     @@ -7,10 +7,11 @@
>>     /*
>>     ** foo:
>>     **
>>     vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>>     +** ...
>>     ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
>>     ** ...
>>     -**
>>     vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>>     -** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
>>     +**
>>     vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>>     +** ...
>>     ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
>>     ** ...
>>     */
>>     diff --git
>>     a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>     b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>     index 965365da4bb..13367423751 100644
>>     ---
>>     a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>     +++
>>     b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>     @@ -3,7 +3,6 @@
>>     #include "ternop-2.c"
>>     -/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
>>     /* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
>>     /* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized"
>>     } } */
>>     /* { dg-final { scan-assembler-not {\tvmv} } } */
>>     diff --git
>>     a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>     b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>     new file mode 100644
>>     index 00000000000..b0d21650c3d
>>     --- /dev/null
>>     +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>     @@ -0,0 +1,16 @@
>>     +/* { dg-do compile } */
>>     +/* { dg-options "-march=rv64gcv -mabi=lp64d
>>     --param=riscv-autovec-preference=fixed-vlmax -O3" } */
>>     +
>>     +void
>>     +foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
>>     +{
>>     +  for (int i = 0; i < n; i++)
>>     +    a[i] = b[i] + c[i];
>>     +}
>>     +
>>     +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
>>     +/* { dg-final { scan-assembler-not {vsetivli} } } */
>>     +/* { dg-final { scan-assembler-times
>>     {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
>>     +/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero}
>>     } } */
>>     +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
>>     +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
>>     diff --git
>>     a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>     b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>     new file mode 100644
>>     index 00000000000..f2d8aa54b88
>>     --- /dev/null
>>     +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>     @@ -0,0 +1,33 @@
>>     +/* { dg-do compile } */
>>     +/* { dg-options "-march=rv64gcv -mabi=lp64d
>>     --param=riscv-autovec-preference=fixed-vlmax -O3" } */
>>     +
>>     +void
>>     +foo (int *__restrict a, int *__restrict b, int *__restrict c,
>>     +     int *__restrict a2, int *__restrict b2, int *__restrict c2,
>>     +     int *__restrict a3, int *__restrict b3, int *__restrict c3,
>>     +     int *__restrict a4, int *__restrict b4, int *__restrict c4,
>>     +     int *__restrict a5, int *__restrict b5, int *__restrict c5,
>>     +     int *__restrict d, int *__restrict d2, int *__restrict d3,
>>     +     int *__restrict d4, int *__restrict d5, int n, int m)
>>     +{
>>     +  for (int i = 0; i < n; i++)
>>     +    {
>>     +      a[i] = b[i] + c[i];
>>     +      a2[i] = b2[i] + c2[i];
>>     +      a3[i] = b3[i] + c3[i];
>>     +      a4[i] = b4[i] + c4[i];
>>     +      a5[i] = a[i] + a4[i];
>>     +      d[i] = a[i] - a2[i];
>>     +      d2[i] = a2[i] * a[i];
>>     +      d3[i] = a3[i] * a2[i];
>>     +      d4[i] = a2[i] * d2[i];
>>     +      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
>>     +    }
>>     +}
>>     +
>>     +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
>>     +/* { dg-final { scan-assembler-not {vsetivli} } } */
>>     +/* { dg-final { scan-assembler-times
>>     {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
>>     +/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero}
>>     } } */
>>     +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
>>     +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
>>     diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>     b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>     index 674ba0d72b4..fc830f2cd4d 100644
>>     --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>     +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>     @@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain
>>     $srcdir/$subdir/vsetvl/*.\[cS\]]] \
>>     "" $CFLAGS
>>     dg-runtest [lsort [glob -nocomplain
>>     $srcdir/$subdir/autovec/*.\[cS\]]] \
>>     "-O3 -ftree-vectorize" $CFLAGS
>>     +dg-runtest [lsort [glob -nocomplain
>>     $srcdir/$subdir/avlprop/*.\[cS\]]] \
>>     + "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
>>     dg-runtest [lsort [glob -nocomplain
>>     $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
>>     "-O3 -ftree-vectorize --param riscv-autovec-preference=scalable"
>>     $CFLAGS
>>     dg-runtest [lsort [glob -nocomplain
>>     $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
>>     -- 
>>     2.36.3
>>
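
For anyone reading the get_default_ta () hunk that moves into riscv-v.cc: the expression tests bit 0 and bit 1 of the preferred tail policy, so any non-zero value in the low two bits yields "agnostic". Below is a minimal standalone sketch of that bit test. It is not part of the patch. The enum name and its enumerator values are assumptions made for illustration only, and default_tail_agnostic is a hypothetical stand-in that takes the policy as a parameter instead of calling get_prefer_tail_policy ().

```cpp
// Illustrative sketch only; not code from the patch.
// ASSUMPTION: the enum name and enumerator values are picked for this example.
enum tail_policy_sketch
{
  TAIL_UNDISTURBED = 0,
  TAIL_AGNOSTIC = 1,
  TAIL_ANY = 2
};

// Hypothetical stand-in for get_default_ta (): same bit test, but the
// preferred policy is passed in rather than read from the target options.
constexpr bool
default_tail_agnostic (int prefer_policy)
{
  // True when bit 0 or bit 1 of the policy value is set.
  return (prefer_policy & 0x1) || ((prefer_policy >> 1) & 0x1);
}

// Compile-time checks of the three assumed policy values.
static_assert (!default_tail_agnostic (TAIL_UNDISTURBED), "no bit set");
static_assert (default_tail_agnostic (TAIL_AGNOSTIC), "bit 0 set");
static_assert (default_tail_agnostic (TAIL_ANY), "bit 1 set");
```

Under those assumed values, only the undisturbed preference produces a non-agnostic default; both the agnostic and "any" preferences resolve to tail agnostic.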
--------------x0PQCsXXJBiqhfVf2Jt78yA0--