From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 2119) id 1C36D3858415; Mon, 19 Jun 2023 11:41:26 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1C36D3858415 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1687174886; bh=UCyPweX6FNnTxAXLF4l0jJR+I3rQT+vJiQHM6LfpP3g=; h=From:To:Subject:Date:From; b=e42PU0TLQw+rFtMbLa+brtN2aq/VYJNnf14wi+CGV6bqJ3v55cFWMuAwdIzo08BqA GejH/p1rw9dZvui6svVknD8Kmpg5ahmlm/kM7NyUx+IQjIatgdz1skTv6t58vwG68O kOkX6+iyuUA6GLTa5IQ5jmDrm4vPfvnt5ygj7sb0= Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: Jeff Law To: gcc-cvs@gcc.gnu.org Subject: [gcc(refs/vendors/riscv/heads/gcc-13-with-riscv-opts)] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32 X-Act-Checkin: gcc X-Git-Author: Pan Li X-Git-Refname: refs/vendors/riscv/heads/gcc-13-with-riscv-opts X-Git-Oldrev: 17f9b68a95fb084daa02590ddc0890fd2e1cd948 X-Git-Newrev: b4e94443b6c815d3a6a078e7e2721bc6f3cc288e Message-Id: <20230619114126.1C36D3858415@sourceware.org> Date: Mon, 19 Jun 2023 11:41:26 +0000 (GMT) List-Id: https://gcc.gnu.org/g:b4e94443b6c815d3a6a078e7e2721bc6f3cc288e commit b4e94443b6c815d3a6a078e7e2721bc6f3cc288e Author: Pan Li Date: Tue Jun 13 23:19:14 2023 +0800 RISC-V: Bugfix for vec_init repeating auto vectorization in RV32 When constructing a vector mask from individual elements we wrongly assumed that we can broadcast BITS_PER_WORD (i.e. XLEN). The maximum is actually the vector element length (i.e. ELEN). This patch fixes this. After this patch, below failures on RV32 will be fixed. FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-3.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax execution test Signed-off-by: Pan Li gcc/ChangeLog: * config/riscv/riscv-v.cc (rvv_builder::get_merge_scalar_mask): Take elen instead of scalar BITS_PER_WORD. (expand_vector_init_merge_repeating_sequence): Use inner_bits_size instead of scaler BITS_PER_WORD. Diff: --- gcc/config/riscv/riscv-v.cc | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index e07d5c2901a..01f647bc0bd 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -399,10 +399,17 @@ rvv_builder::get_merge_scalar_mask (unsigned int index_in_pattern) const { unsigned HOST_WIDE_INT mask = 0; unsigned HOST_WIDE_INT base_mask = (1ULL << index_in_pattern); + /* Here we construct a mask pattern that will later be broadcast + to a vector register. The maximum broadcast size for vmv.v.x/vmv.s.x + is determined by the length of a vector element (ELEN) and not by + XLEN so make sure we do not exceed it. One example is -march=zve32* + which mandates ELEN == 32 but can be combined with -march=rv64 + with XLEN == 64. */ + unsigned int elen = TARGET_VECTOR_ELEN_64 ? 64 : 32; - gcc_assert (BITS_PER_WORD % npatterns () == 0); + gcc_assert (elen % npatterns () == 0); - int limit = BITS_PER_WORD / npatterns (); + int limit = elen / npatterns (); for (int i = 0; i < limit; i++) mask |= base_mask << (i * npatterns ()); @@ -1928,7 +1935,7 @@ expand_vector_init_merge_repeating_sequence (rtx target, rtx mask = gen_reg_rtx (mask_mode); rtx dup = gen_reg_rtx (dup_mode); - if (full_nelts <= BITS_PER_WORD) /* vmv.s.x. */ + if (full_nelts <= builder.inner_bits_size ()) /* vmv.s.x. */ { rtx ops[] = {dup, gen_scalar_move_mask (dup_mask_mode), RVV_VUNDEF (dup_mode), merge_mask}; @@ -1938,7 +1945,8 @@ expand_vector_init_merge_repeating_sequence (rtx target, else /* vmv.v.x. */ { rtx ops[] = {dup, force_reg (GET_MODE_INNER (dup_mode), merge_mask)}; - rtx vl = gen_int_mode (CEIL (full_nelts, BITS_PER_WORD), Pmode); + rtx vl = gen_int_mode (CEIL (full_nelts, builder.inner_bits_size ()), + Pmode); emit_nonvlmax_integer_move_insn (code_for_pred_broadcast (dup_mode), ops, vl); }