From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpbgbr2.qq.com (smtpbgbr2.qq.com [54.207.22.56]) by sourceware.org (Postfix) with ESMTPS id CCA2C3858D37 for ; Mon, 23 Oct 2023 09:04:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CCA2C3858D37 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai ARC-Filter: OpenARC Filter v1.0.0 sourceware.org CCA2C3858D37 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=54.207.22.56 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698051858; cv=none; b=REws+TioOe7OjnbTPqNvvFtzLxbhWwBPgbsCy+L5flv0BPKmJur2XGMHMXzRZhADeIVQly3BvhrmOsqjyMECE5A2ASnmKWXdQscK4eRc8K9yp9lfF1UexJQ6XtPwZStNHF6jVRa0Y7vf0gMtZr+5PBNAa4irDvMIDjTObRauZTc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698051858; c=relaxed/simple; bh=dZ1k50J4+qEqYJessNPZ0/BMtZeaiGbkpPINvqxwPcE=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=jCOXe7hZyWLNdDA25y7m/zc5o4Pzz/0JFLIWkNFMfxtFIWps4XFHYG1/rkzDB4PHhEaVtvgYiuEMMLBVTvCDs8clsHQgRLXZTNhu+DHfeRBJBQYsoaDgGJPGE17uDytyx8bOXIJTQhmmMWooifcXR6nv8r8x1SX2FSULY0cKXIA= ARC-Authentication-Results: i=1; server2.sourceware.org X-QQ-mid: bizesmtp74t1698051843tusgsqyr Received: from rios-cad121.hadoop.rioslab.org ( [58.60.1.9]) by bizesmtp.qq.com (ESMTP) with id ; Mon, 23 Oct 2023 17:04:02 +0800 (CST) X-QQ-SSF: 01400000000000G0V000000A0000000 X-QQ-FEAT: vrqOr+ppv0tZ6kGoE/ZxFdfU86ESMfVMfv6NpEKEl4J2gRA0E458M6wCMCauA Ft9PXS5JY4gKKNpZgAxKEm4baHtEJ/xU25u+gcpWphQCvZlbesNVjZ7Y5Aj/+3TJclq8rQS 5vPc8TTF5el0BzH0CBWctdsty3i/LhA/0lZY73fnQQrN5ebcxEyFSZfgMbB5mUQb6D+RV4y IjlXmuQaW6sRSVoIV3Vg1wlQRfxdqM4QRn7/+wRtGicDSThD2D8EJ8funZvy8NYgK0kh8fz 5K0UtpigA3SFteWEHF9cRqeyOlJRjZkJNXFc4QOPHGTfUnJ2yRbIat9t+JmG6LIupCYIM0M 8N7xqRsl/YHGEkl3kPBrWkn1WB4Fp/e9e9ohQ3/PdNIh2rUsCTTv0ji5DzX3J/dDG2HPiuD iPU+c9M+tpXHWHmIhiVf7Q== X-QQ-GoodBg: 2 X-BIZMAIL-ID: 10275362683197330235 From: Juzhe-Zhong To: gcc-patches@gcc.gnu.org Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong Subject: [PATCH] RISC-V: Fix ICE for the fusion case from vsetvl to scalar move[PR111927] Date: Mon, 23 Oct 2023 17:04:01 +0800 Message-Id: <20231023090401.1724890-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.3 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_STATUS,KAM_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,TXREP,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: ICE: during RTL pass: vsetvl : In function 'riscv_lms_f32': :240:1: internal compiler error: in merge, at config/riscv/riscv-vsetvl.cc:1997 240 | } In general compatible_p (avl_equal_p) has: if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ()) return false; Don't fuse AVL of vsetvl if the VL operand is used by non-RVV instructrions. It is reasonable to add it into 'can_use_next_avl_p' since we don't want to fuse AVL of vsetvl into a scalar move instruction which doesn't demand AVL. And after the fusion, we will alway use compatible_p to check whether the demand is correct or not. PR target/111927 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc: Fix ICE. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr111927.c: New test. --- gcc/config/riscv/riscv-vsetvl.cc | 23 ++ .../gcc.target/riscv/rvv/vsetvl/pr111927.c | 243 ++++++++++++++++++ 2 files changed, 266 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index 47b459fddd4..42295732ed7 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -1541,6 +1541,29 @@ private: inline bool can_use_next_avl_p (const vsetvl_info &prev, const vsetvl_info &next) { + /* Forbid the AVL/VL propagation if VL of NEXT is used + by non-RVV instructions. This is because: + + bb 2: + scalar move (no AVL) + bb 3: + vsetvl a5(VL), a4(AVL) ... + branch a5,zero + + Since user vsetvl instruction is no side effect instruction + which should be placed in the correct and optimal location + of the program by the previous PASS, it is unreasonble that + VSETVL PASS tries to move it to another places if it used by + non-RVV instructions. + + Note: We only forbid the cases that VL is used by the following + non-RVV instructions which will cause issues. We don't forbid + other cases since it won't cause correctness issues and we still + more more demand info are fused backward. The later LCM algorithm + should know the optimal location of the vsetvl. */ + if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ()) + return false; + if (!next.has_nonvlmax_reg_avl () && !next.has_vl ()) return true; diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c new file mode 100644 index 00000000000..62f395fee33 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c @@ -0,0 +1,243 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */ + +#include "riscv_vector.h" +#include + +#define RISCV_MATH_LOOPUNROLL +#define RISCV_MATH_VECTOR +typedef float float32_t; + + typedef struct + { + uint16_t numTaps; /**< number of coefficients in the filter. */ + float32_t *pState; /**< points to the state variable array. The array is of length numTaps+blockSize-1. */ + float32_t *pCoeffs; /**< points to the coefficient array. The array is of length numTaps. */ + float32_t mu; /**< step size that controls filter coefficient updates. */ + } riscv_lms_instance_f32; + + +void riscv_lms_f32( + const riscv_lms_instance_f32 * S, + const float32_t * pSrc, + float32_t * pRef, + float32_t * pOut, + float32_t * pErr, + uint32_t blockSize) +{ + float32_t *pState = S->pState; /* State pointer */ + float32_t *pCoeffs = S->pCoeffs; /* Coefficient pointer */ + float32_t *pStateCurnt; /* Points to the current sample of the state */ + float32_t *px, *pb; /* Temporary pointers for state and coefficient buffers */ + float32_t mu = S->mu; /* Adaptive factor */ + float32_t acc, e; /* Accumulator, error */ + float32_t w; /* Weight factor */ + uint32_t numTaps = S->numTaps; /* Number of filter coefficients in the filter */ + uint32_t tapCnt, blkCnt; /* Loop counters */ + + /* Initializations of error, difference, Coefficient update */ + e = 0.0f; + w = 0.0f; + + /* S->pState points to state array which contains previous frame (numTaps - 1) samples */ + /* pStateCurnt points to the location where the new input data should be written */ + pStateCurnt = &(S->pState[(numTaps - 1U)]); + + /* initialise loop count */ + blkCnt = blockSize; + + while (blkCnt > 0U) + { + /* Copy the new input sample into the state buffer */ + *pStateCurnt++ = *pSrc++; + + /* Initialize pState pointer */ + px = pState; + + /* Initialize coefficient pointer */ + pb = pCoeffs; + + /* Set the accumulator to zero */ + acc = 0.0f; +#if defined (RISCV_MATH_VECTOR) + uint32_t vblkCnt = numTaps; /* Loop counter */ + size_t l; + vfloat32m8_t vx, vy; + vfloat32m1_t temp00m1; + l = __riscv_vsetvl_e32m1(1); + temp00m1 = __riscv_vfmv_v_f_f32m1(0, l); + for (; (l = __riscv_vsetvl_e32m8(vblkCnt)) > 0; vblkCnt -= l) { + vx = __riscv_vle32_v_f32m8(px, l); + px += l; + vy = __riscv_vle32_v_f32m8(pb, l); + pb += l; + temp00m1 = __riscv_vfredusum_vs_f32m8_f32m1(__riscv_vfmul_vv_f32m8(vx, vy, l), temp00m1, l); + } + acc += __riscv_vfmv_f_s_f32m1_f32(temp00m1); +#else +#if defined (RISCV_MATH_LOOPUNROLL) + + /* Loop unrolling: Compute 4 taps at a time. */ + tapCnt = numTaps >> 2U; + + while (tapCnt > 0U) + { + /* Perform the multiply-accumulate */ + acc += (*px++) * (*pb++); + + acc += (*px++) * (*pb++); + + acc += (*px++) * (*pb++); + + acc += (*px++) * (*pb++); + + /* Decrement loop counter */ + tapCnt--; + } + + /* Loop unrolling: Compute remaining taps */ + tapCnt = numTaps & 0x3U; + +#else + + /* Initialize tapCnt with number of samples */ + tapCnt = numTaps; + +#endif /* #if defined (RISCV_MATH_LOOPUNROLL) */ + + while (tapCnt > 0U) + { + /* Perform the multiply-accumulate */ + acc += (*px++) * (*pb++); + + /* Decrement the loop counter */ + tapCnt--; + } +#endif /* defined (RISCV_MATH_VECTOR) */ + /* Store the result from accumulator into the destination buffer. */ + *pOut++ = acc; + + /* Compute and store error */ + e = (float32_t) *pRef++ - acc; + *pErr++ = e; + + /* Calculation of Weighting factor for updating filter coefficients */ + w = e * mu; + + /* Initialize pState pointer */ + /* Advance state pointer by 1 for the next sample */ + px = pState++; + + /* Initialize coefficient pointer */ + pb = pCoeffs; + +#if defined (RISCV_MATH_VECTOR) + vblkCnt = numTaps; + for (; (l = __riscv_vsetvl_e32m8(vblkCnt)) > 0; vblkCnt -= l) { + vx = __riscv_vle32_v_f32m8(px, l); + px += l; + __riscv_vse32_v_f32m8(pb, __riscv_vfadd_vv_f32m8(__riscv_vfmul_vf_f32m8(vx, w, l), __riscv_vle32_v_f32m8(pb, l), l) , l); + pb += l; + } +#else +#if defined (RISCV_MATH_LOOPUNROLL) + + /* Loop unrolling: Compute 4 taps at a time. */ + tapCnt = numTaps >> 2U; + + /* Update filter coefficients */ + while (tapCnt > 0U) + { + /* Perform the multiply-accumulate */ + *pb += w * (*px++); + pb++; + + *pb += w * (*px++); + pb++; + + *pb += w * (*px++); + pb++; + + *pb += w * (*px++); + pb++; + + /* Decrement loop counter */ + tapCnt--; + } + + /* Loop unrolling: Compute remaining taps */ + tapCnt = numTaps & 0x3U; + +#else + + /* Initialize tapCnt with number of samples */ + tapCnt = numTaps; + +#endif /* #if defined (RISCV_MATH_LOOPUNROLL) */ + + while (tapCnt > 0U) + { + /* Perform the multiply-accumulate */ + *pb += w * (*px++); + pb++; + + /* Decrement loop counter */ + tapCnt--; + } +#endif /* defined (RISCV_MATH_VECTOR) */ + /* Decrement loop counter */ + blkCnt--; + } + + /* Processing is complete. + Now copy the last numTaps - 1 samples to the start of the state buffer. + This prepares the state buffer for the next function call. */ + + /* Points to the start of the pState buffer */ + pStateCurnt = S->pState; + + /* copy data */ +#if defined (RISCV_MATH_VECTOR) + uint32_t vblkCnt = (numTaps - 1U); /* Loop counter */ + size_t l; + for (; (l = __riscv_vsetvl_e32m8(vblkCnt)) > 0; vblkCnt -= l) { + __riscv_vse32_v_f32m8(pStateCurnt, __riscv_vle32_v_f32m8(pState, l) , l); + pState += l; + pStateCurnt += l; + } +#else +#if defined (RISCV_MATH_LOOPUNROLL) + + /* Loop unrolling: Compute 4 taps at a time. */ + tapCnt = (numTaps - 1U) >> 2U; + + while (tapCnt > 0U) + { + *pStateCurnt++ = *pState++; + *pStateCurnt++ = *pState++; + *pStateCurnt++ = *pState++; + *pStateCurnt++ = *pState++; + + /* Decrement loop counter */ + tapCnt--; + } + + /* Loop unrolling: Compute remaining taps */ + tapCnt = (numTaps - 1U) & 0x3U; + +#else + + /* Initialize tapCnt with number of samples */ + tapCnt = (numTaps - 1U); + +#endif /* #if defined (RISCV_MATH_LOOPUNROLL) */ + + while (tapCnt > 0U) + { + *pStateCurnt++ = *pState++; + + /* Decrement loop counter */ + tapCnt--; + } +#endif /* defined (RISCV_MATH_VECTOR) */ +} -- 2.36.3