[Bug c/112431] New: RISC-V GCC-15 feature: Support register overlap on widen RVV instructions

From: juzhe.zhong at rivai dot ai @ 2023-11-08  0:15 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

            Bug ID: 112431
           Summary: RISC-V GCC-15 feature: Support register overlap on
                    widen RVV instructions
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

According to RVV ISA:

"The destination EEW is smaller than the source EEW and the overlap is in the 
 lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi 
 v0, v0, 3 is legal, but a destination of v1 is not)."

GCC already supports register overlap for narrowing operations.
Consider the following example:

#include "riscv_vector.h"
void f20 (uint16_t *base, uint8_t *out, size_t vl, size_t shift)
{
    vuint16m2_t src = __riscv_vle16_v_u16m2 (base, vl);
    /* Only allow load v30,v31.  */
    asm volatile("#" ::
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17", 
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",  
                   "v26", "v27", "v28", "v29");

    vuint8m1_t v = __riscv_vnclipu_wx_u8m1(src,shift,0,vl);
    /* Only v30 is left: force vnclipu DEST == SRC lowpart (v30).  */
    asm volatile("#" ::                                                        
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", 
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",     
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",     
                   "v26", "v27", "v28", "v29", "v31");

    __riscv_vse8_v_u8m1 (out,v,vl);
}

https://gcc.godbolt.org/z/j98xejKh5

GCC generates no register spills here, whereas LLVM does.

However, GCC does not yet support register overlap for RVV widening
operations, even though the RVV ISA allows it:

"The destination EEW is greater than the source EEW, the source EMUL is at
 least 1, and the overlap is in the highest-numbered part of the destination
 register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source
 of v0, v2, or v4 is not)."

Consider the following case:

#include "riscv_vector.h"
void f20 (void *base,void *out,size_t vl, size_t shift)
{
    vuint16m1_t src = __riscv_vle16_v_u16m1 (base, vl);
    /* Only allow load v31.  */
    asm volatile("#" ::
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17", 
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",  
                   "v26", "v27", "v28", "v29", "v30");

    vuint32m2_t v = __riscv_vwaddu_vv_u32m2(src,src,vl);
    /* Only v30,v31 are left: vwaddu.vv DEST must be v30, overlapping SRC v31.  */
    asm volatile("#" ::                                                        
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", 
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",     
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",     
                   "v26", "v27", "v28", "v29");

    __riscv_vse32_v_u32m2 (out,v,vl);
}

https://gcc.godbolt.org/z/h3cM9vhnY

Because GCC, like LLVM, marks the destination of RVV widening instructions as
early clobber, both compilers fail to overlap the registers.

GCC ASM:

f20:
        vsetvli zero,a2,e16,m1,ta,ma
        vle16.v v31,0(a0)
        vwaddu.vv       v2,v31,v31
        vmv2r.v v30,v2                   ----> Redundant mov instruction.
        vse32.v v30,0(a1)
        ret

We should be able to generate vwaddu.vv v30,v31,v31, which eliminates the
redundant move instruction.

This issue will be fixed in GCC-15 since we don't have enough time for GCC-14.
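
The two ISA rules quoted in this report, plus the equal-EEW case, can be written as a small check. This is only a sketch for illustration: the struct, the function names and the "EMUL times 8" encoding are assumptions made here and are not taken from GCC or LLVM.

```c
#include <assert.h>
#include <stdbool.h>

/* A vector register group.  EMUL is stored multiplied by 8 so that the
   fractional values 1/8, 1/4 and 1/2 stay integral (1, 2 and 4).  */
struct vgroup
{
  int base;    /* lowest register number of the group, 0..31 */
  int eew;     /* effective element width in bits */
  int emul_x8; /* EMUL * 8, i.e. 1 .. 64 */
};

/* Number of whole registers the group occupies (at least one).  */
static int
group_nregs (struct vgroup g)
{
  assert (g.emul_x8 >= 1 && g.emul_x8 <= 64);
  return g.emul_x8 < 8 ? 1 : g.emul_x8 / 8;
}

/* True if the two groups share at least one register.  */
static bool
groups_overlap (struct vgroup a, struct vgroup b)
{
  return a.base < b.base + group_nregs (b)
         && b.base < a.base + group_nregs (a);
}

/* May DEST overlap SRC the way it does?  Non-overlapping groups and
   equal EEW are always fine.  */
bool
overlap_is_legal (struct vgroup dest, struct vgroup src)
{
  if (!groups_overlap (dest, src) || dest.eew == src.eew)
    return true;
  if (dest.eew < src.eew)
    /* Narrowing: DEST must sit in the lowest-numbered part of SRC.  */
    return dest.base == src.base;
  /* Widening: SRC EMUL >= 1 and SRC is the highest-numbered part of DEST.  */
  return src.emul_x8 >= 8
         && src.base + group_nregs (src) == dest.base + group_nregs (dest);
}
```

With this check, the requested vwaddu.vv v30,v31,v31 (destination e32/m2 at v30, source e16/m1 at v31) is accepted, while the same source in v30 would be rejected.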

* [Bug c/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
From: juzhe.zhong at rivai dot ai @ 2023-11-08  0:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lehua.ding at rivai dot ai

--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
My colleague will take responsibility for supporting this feature.

* [Bug c/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
From: juzhe.zhong at rivai dot ai @ 2023-11-08  0:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #2 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
My colleague Lehua will take responsibility for supporting this feature.

* [Bug c/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
From: kito at gcc dot gnu.org @ 2023-11-08  1:56 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #3 from Kito Cheng <kito at gcc dot gnu.org> ---
Sharing some thoughts from my end: we have tried at least three different
approaches on the LLVM side, and we now model this as "partial early clobber".
We plan to upstream it to LLVM, but it has not been given high enough priority
yet :(

What does that mean? Here are some practical examples to demonstrate the idea:

1. A normal live range without early clobber.

vadd x, y, z # y and z are dead after this use.

|---------------------|
| read  |     y    z  |
| write | x           |
|---------------------|


2. A live range with early clobber.

vadd x, y, z # y and z are dead after this use; assume x is early clobber.

|---------------------|
| read  | x   y    z  |
| write | x           |
|---------------------|


3. A live range with partial early clobber.

vwadd.vv x, y, z # x is twice as large as y and z

So we split x into xh and xl to represent the high part and the low part, and
assume the high part can overlap with other operands.

|------------------------|
| read  |    xl  y    z  |
| write | xh xl          |
|------------------------|

And the following case assumes the high part can overlap with others:

|------------------------|
| read  | xh     y    z  |
| write | xh xl          |
|------------------------|

Then the register allocator should be able to do the overlapping allocation
naturally, IF we build the live ranges like this.
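
As a rough illustration of the three cases above, the sketch below applies each model to a two-register destination (xl in DEST, xh in DEST + 1) and a one-register source that dies at the instruction, and counts how many aligned destinations stay available. The enum and function names are hypothetical, and this is not how LLVM or GCC implement live ranges; it only mirrors the first partial-early-clobber table, where xl conflicts with the sources and xh does not.

```c
#include <stdbool.h>

enum clobber_model
{
  NO_EARLY_CLOBBER,     /* case 1: destination may reuse a dying source */
  EARLY_CLOBBER,        /* case 2: whole destination conflicts with sources */
  PARTIAL_EARLY_CLOBBER /* case 3: only the low half xl conflicts */
};

/* May a destination occupying registers DEST (xl) and DEST + 1 (xh) be
   allocated while a one-register source that dies here lives in SRC?  */
bool
dest_allowed (enum clobber_model model, int dest, int src)
{
  bool hits_low = src == dest;
  bool hits_high = src == dest + 1;

  switch (model)
    {
    case NO_EARLY_CLOBBER:
      return true;
    case EARLY_CLOBBER:
      return !hits_low && !hits_high;
    case PARTIAL_EARLY_CLOBBER:
      return !hits_low;
    }
  return false;
}

/* Count the even-aligned two-register destinations in v0..v31 that remain
   available for a source in SRC.  */
int
count_candidates (enum clobber_model model, int src)
{
  int n = 0;
  for (int dest = 0; dest < 32; dest += 2)
    if (dest_allowed (model, dest, src))
      n++;
  return n;
}
```

For a source in v31, full early clobber leaves 15 candidate destinations, while the partial model keeps all 16, including the overlapping pair v30/v31.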

* [Bug c/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
From: pinskia at gcc dot gnu.org @ 2023-11-12 21:18 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement

* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
From: cvs-commit at gcc dot gnu.org @ 2023-11-29  9:37 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:bdad036da32f72b84a96070518e7d75c21706dc2

commit r14-5960-gbdad036da32f72b84a96070518e7d75c21706dc2
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Wed Nov 29 16:34:10 2023 +0800

    RISC-V: Support highpart register overlap for vwcvt

    Since Richard recently added support for register filters, we are able to
    support highpart register overlap for widening RVV instructions.

    This patch supports it for vwcvt intrinsics.

    I use real application code for vwcvt:
    https://github.com/riscv/riscv-v-spec/issues/929
    https://godbolt.org/z/xoeGnzd8q

    This is real application code that uses LMUL = 8 with unrolling to get
    optimal performance for a specific library.

    As the codegen shows, GCC already generates optimal code for this case
    because it supports lowpart register overlap for narrowing instructions
    (dest EEW < source EEW).

    Starting with this patch, we support highpart register overlap for
    widening instructions (dest EEW > source EEW).

    The same intrinsic code as above, but for vwcvt:

    https://godbolt.org/z/1TMPE5Wfr

    size_t
    foo (char const *buf, size_t len)
    {
      size_t sum = 0;
      size_t vl = __riscv_vsetvlmax_e8m8 ();
      size_t step = vl * 4;
      const char *it = buf, *end = buf + len;
      for (; it + step <= end;)
        {
          vint8m4_t v0 = __riscv_vle8_v_i8m4 ((void *) it, vl);
          it += vl;
          vint8m4_t v1 = __riscv_vle8_v_i8m4 ((void *) it, vl);
          it += vl;
          vint8m4_t v2 = __riscv_vle8_v_i8m4 ((void *) it, vl);
          it += vl;
          vint8m4_t v3 = __riscv_vle8_v_i8m4 ((void *) it, vl);
          it += vl;

          asm volatile("nop" ::: "memory");
          vint16m8_t vw0 = __riscv_vwcvt_x_x_v_i16m8 (v0, vl);
          vint16m8_t vw1 = __riscv_vwcvt_x_x_v_i16m8 (v1, vl);
          vint16m8_t vw2 = __riscv_vwcvt_x_x_v_i16m8 (v2, vl);
          vint16m8_t vw3 = __riscv_vwcvt_x_x_v_i16m8 (v3, vl);

          asm volatile("nop" ::: "memory");
          size_t sum0 = __riscv_vmv_x_s_i16m8_i16 (vw0);
          size_t sum1 = __riscv_vmv_x_s_i16m8_i16 (vw1);
          size_t sum2 = __riscv_vmv_x_s_i16m8_i16 (vw2);
          size_t sum3 = __riscv_vmv_x_s_i16m8_i16 (vw3);

          sum += sumation (sum0, sum1, sum2, sum3);
        }
      return sum;
    }

    Before this patch:

    ...
    csrr    t0,vlenb
    ...
            vwcvt.x.x.v     v16,v8
            vwcvt.x.x.v     v8,v28
            vs8r.v  v16,0(sp)               ---> spill
            vwcvt.x.x.v     v16,v24
            vwcvt.x.x.v     v24,v4
            nop
            vsetvli zero,zero,e16,m8,ta,ma
            vmv.x.s a2,v16
            vl8re16.v       v16,0(sp)      --->  reload
    ...
    csrr    t0,vlenb
    ...

    You can see heavy spills and reloads inside the loop body.

    After this patch:

    ...
            vwcvt.x.x.v     v8,v12
            vwcvt.x.x.v     v16,v20
            vwcvt.x.x.v     v24,v28
            vwcvt.x.x.v     v0,v4
    ...

    Optimal codegen after this patch.

    Tested on zvl128b with no regression.

    I am going to test zve64d/zvl256b/zvl512b/zvl1024b.

    OK for trunk if there is no regression in the testing above?

    Co-authored-by: kito-cheng <kito.cheng@sifive.com>
    Co-authored-by: kito-cheng <kito.cheng@gmail.com>

            PR target/112431

    gcc/ChangeLog:

            * config/riscv/constraints.md (TARGET_VECTOR ? V_REGS : NO_REGS):
            New register filters.
            * config/riscv/riscv.md (no,W21,W42,W84,W41,W81,W82): Ditto.
            (no,yes): Ditto.
            * config/riscv/vector.md: Support highpart register overlap for
            vwcvt.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr112431-1.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-2.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-3.c: New test.
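
The commit message does not spell out what each new register filter tests, so the predicate below is only an assumption derived from the ISA rule, not the GCC implementation. It asks whether a source register can be the highest-numbered part of an aligned destination group, and which destination that would be. Both helper names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper, not GCC code.  DEST_NREGS and SRC_NREGS are the
   sizes of the destination and source groups in registers (powers of
   two, source EMUL >= 1).  The source can overlap the destination only
   when it is the highest-numbered part of an aligned destination.  */
bool
src_is_highpart (int src_regno, int dest_nregs, int src_nregs)
{
  return src_regno % dest_nregs == dest_nregs - src_nregs;
}

/* The aligned destination group that contains SRC_REGNO.  */
int
overlapping_dest (int src_regno, int dest_nregs)
{
  return src_regno - src_regno % dest_nregs;
}
```

The register pairs in the "After this patch" listing are consistent with this: for vwcvt.x.x.v v8,v12 (an m4 source widened to m8), v12 is the high half of v8..v15, and likewise for v16/v20, v24/v28 and v0/v4.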

* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
From: cvs-commit at gcc dot gnu.org @ 2023-11-30  1:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:62685890d8861b72f812bfe171a20332df08bd49

commit r14-5982-g62685890d8861b72f812bfe171a20332df08bd49
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Wed Nov 29 18:53:06 2023 +0800

    RISC-V: Support highpart overlap for vext.vf

    PR target/112431

    gcc/ChangeLog:

            * config/riscv/vector.md: Support highpart overlap for vext.vf2

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/unop_v_constraint-2.c: Adapt test.
            * gcc.target/riscv/rvv/base/pr112431-4.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-5.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-6.c: New test.

* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
From: cvs-commit at gcc dot gnu.org @ 2023-11-30  2:40 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Lehua Ding <lhtin@gcc.gnu.org>:

https://gcc.gnu.org/g:8614cbb253484e28c3eb20cde4d1067aad56de58

commit r14-5984-g8614cbb253484e28c3eb20cde4d1067aad56de58
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Thu Nov 30 10:36:30 2023 +0800

    RISC-V: Support highpart overlap for floating-point widen instructions

    This patch leverages the approach used for vwcvt/vext.vf2, which has
    already been approved. The two approaches are exactly the same.

    Tested no regression and committed.

            PR target/112431

    gcc/ChangeLog:

            * config/riscv/vector.md: Add widening overlap.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr112431-10.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-11.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-12.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-13.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-14.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-15.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-7.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-8.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-9.c: New test.

* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
From: cvs-commit at gcc dot gnu.org @ 2023-11-30 10:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Lehua Ding <lhtin@gcc.gnu.org>:

https://gcc.gnu.org/g:5a35152f87a36db480693884dfb27ff6a5d5d683

commit r14-6007-g5a35152f87a36db480693884dfb27ff6a5d5d683
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Thu Nov 30 18:12:04 2023 +0800

    RISC-V: Remove earlyclobber for wx/wf instructions.

    While working on overlap for widening instructions, I realized that we
    mark vwadd.wx/vfwadd.wf as earlyclobber, which is incorrect.

    According to the RVV ISA, overlap is legal when:
    "The destination EEW equals the source EEW."

    vwadd.vx widens the first source operand (i.e. 2 * source EEW = dest EEW),
    while vwadd.wx only widens the second/scalar source operand.

    Therefore unrestricted overlap is legal for wx, whereas vx only allows
    the highpart overlap.

    Before this patch (heavy spilling):

            csrr    a5,vlenb
            slli    a5,a5,1
            addi    a5,a5,64
            vfwadd.wf       v2,v14,fs0
            add     a5,a5,sp
            vs2r.v  v2,0(a5)
            vl2re32.v       v2,0(a1)
            vfwadd.wf       v14,v12,fs0
            vfwadd.wf       v12,v10,fs0
            vfwadd.wf       v10,v8,fs0
            vfwadd.wf       v8,v6,fs0
            vfwadd.wf       v6,v4,fs0
            vfwadd.wf       v4,v2,fs0
            vfwadd.wf       v2,v16,fs0
            vfwadd.wf       v16,v18,fs0
            vfwadd.wf       v18,v20,fs0
            vfwadd.wf       v20,v22,fs0
            vfwadd.wf       v22,v24,fs0
            vfwadd.wf       v24,v26,fs0
            vfwadd.wf       v26,v28,fs0
            vfwadd.wf       v28,v30,fs0
            vfwadd.wf       v30,v0,fs0
            nop
            vsetvli zero,zero,e32,m2,ta,ma
            csrr    a5,vlenb

    After this patch (no spillings):

            vfwadd.wf       v16,v16,fs0
            vfwadd.wf       v14,v14,fs0
            vfwadd.wf       v12,v12,fs0
            vfwadd.wf       v10,v10,fs0
            vfwadd.wf       v8,v8,fs0
            vfwadd.wf       v6,v6,fs0
            vfwadd.wf       v4,v4,fs0
            vfwadd.wf       v2,v2,fs0
            vfwadd.wf       v18,v18,fs0
            vfwadd.wf       v20,v20,fs0
            vfwadd.wf       v22,v22,fs0
            vfwadd.wf       v24,v24,fs0
            vfwadd.wf       v26,v26,fs0
            vfwadd.wf       v28,v28,fs0
            vfwadd.wf       v30,v30,fs0
            vfwadd.wf       v0,v0,fs0

    Confirmed that the codegen above runs successfully on both SPIKE and QEMU.

            PR target/112431

    gcc/ChangeLog:

            * config/riscv/vector.md: Remove earlyclobber for wx/wf
            instructions.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr112431-19.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-20.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-21.c: New test.
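
One way to build intuition for the wx/vx difference described in this commit is a toy register file processed one element at a time. That sequential model, the tiny VLENB and all function names below are assumptions made for illustration. The ISA text, not this model, defines which overlaps are legal, and real hardware need not execute elements in this order.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { VLENB = 8, NREGS = 4 }; /* toy sizes, chosen for illustration */

/* Toy vwaddu.vx: VD (two registers, 16-bit elements) receives the
   zero-extended 8-bit elements of VS2 plus X, one element at a time.  */
static void
toy_vwaddu_vx (uint8_t *vrf, int vd, int vs2, uint8_t x, int vl)
{
  for (int i = 0; i < vl; i++)
    {
      uint16_t r = (uint16_t) (vrf[vs2 * VLENB + i] + x);
      memcpy (&vrf[vd * VLENB + 2 * i], &r, sizeof r);
    }
}

/* Toy vwaddu.wx: VD and VS2 both hold 16-bit elements; only X is widened.  */
static void
toy_vwaddu_wx (uint8_t *vrf, int vd, int vs2, uint8_t x, int vl)
{
  for (int i = 0; i < vl; i++)
    {
      uint16_t s;
      memcpy (&s, &vrf[vs2 * VLENB + 2 * i], sizeof s);
      uint16_t r = (uint16_t) (s + x);
      memcpy (&vrf[vd * VLENB + 2 * i], &r, sizeof r);
    }
}

/* Run one toy instruction on source elements 1, 2, ... and report whether
   every destination element equals source + X.  */
bool
toy_result_ok (bool wide_source, int vd, int vs2)
{
  uint8_t vrf[NREGS * VLENB] = { 0 };
  const uint8_t x = 10;

  for (int i = 0; i < VLENB; i++)
    {
      uint16_t s = (uint16_t) (i + 1);
      if (wide_source)
        memcpy (&vrf[vs2 * VLENB + 2 * i], &s, sizeof s);
      else
        vrf[vs2 * VLENB + i] = (uint8_t) s;
    }

  if (wide_source)
    toy_vwaddu_wx (vrf, vd, vs2, x, VLENB);
  else
    toy_vwaddu_vx (vrf, vd, vs2, x, VLENB);

  for (int i = 0; i < VLENB; i++)
    {
      uint16_t got;
      memcpy (&got, &vrf[vd * VLENB + 2 * i], sizeof got);
      if (got != i + 1 + x)
        return false;
    }
  return true;
}
```

In this toy model the wx form gives the right answer fully in place (vd == vs2), and the vx form gives the right answer when the source is the high register of the destination pair, but not when it is the low register, because early destination writes land on source elements that have not been read yet.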

* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
From: cvs-commit at gcc dot gnu.org @ 2023-11-30 12:11 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #8 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:303195e2a6b6f0e8f42e0578b61f9f37c6250beb

commit r14-6008-g303195e2a6b6f0e8f42e0578b61f9f37c6250beb
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Thu Nov 30 20:08:43 2023 +0800

    RISC-V: Support widening register overlap for vf4/vf8

    size_t
    foo (char const *buf, size_t len)
    {
      size_t sum = 0;
      size_t vl = __riscv_vsetvlmax_e8m8 ();
      size_t step = vl * 4;
      const char *it = buf, *end = buf + len;
      for (; it + step <= end;)
        {
          vint8m1_t v0 = __riscv_vle8_v_i8m1 ((void *) it, vl);
          it += vl;
          vint8m1_t v1 = __riscv_vle8_v_i8m1 ((void *) it, vl);
          it += vl;
          vint8m1_t v2 = __riscv_vle8_v_i8m1 ((void *) it, vl);
          it += vl;
          vint8m1_t v3 = __riscv_vle8_v_i8m1 ((void *) it, vl);
          it += vl;

          asm volatile("nop" ::: "memory");
          vint64m8_t vw0 = __riscv_vsext_vf8_i64m8 (v0, vl);
          vint64m8_t vw1 = __riscv_vsext_vf8_i64m8 (v1, vl);
          vint64m8_t vw2 = __riscv_vsext_vf8_i64m8 (v2, vl);
          vint64m8_t vw3 = __riscv_vsext_vf8_i64m8 (v3, vl);

          asm volatile("nop" ::: "memory");
          size_t sum0 = __riscv_vmv_x_s_i64m8_i64 (vw0);
          size_t sum1 = __riscv_vmv_x_s_i64m8_i64 (vw1);
          size_t sum2 = __riscv_vmv_x_s_i64m8_i64 (vw2);
          size_t sum3 = __riscv_vmv_x_s_i64m8_i64 (vw3);

          sum += sumation (sum0, sum1, sum2, sum3);
        }
      return sum;
    }

    Before this patch:

            add     a3,s0,s1
            add     a4,s6,s1
            add     a5,s7,s1
            vsetvli zero,s0,e64,m8,ta,ma
            vle8.v  v4,0(s1)
            vle8.v  v3,0(a3)
            mv      s1,s2
            vle8.v  v2,0(a4)
            vle8.v  v1,0(a5)
            nop
            vsext.vf8       v8,v4
            vsext.vf8       v16,v2
            vs8r.v  v8,0(sp)
            vsext.vf8       v24,v1
            vsext.vf8       v8,v3
            nop
            vmv.x.s a1,v8
            vl8re64.v       v8,0(sp)
            vmv.x.s a3,v24
            vmv.x.s a2,v16
            vmv.x.s a0,v8
            add     s2,s2,s5
            call    sumation
            add     s3,s3,a0
            bgeu    s4,s2,.L5

    After this patch:

            add     a3,s0,s1
            add     a4,s6,s1
            add     a5,s7,s1
            vsetvli zero,s0,e64,m8,ta,ma
            vle8.v  v15,0(s1)
            vle8.v  v23,0(a3)
            mv      s1,s2
            vle8.v  v31,0(a4)
            vle8.v  v7,0(a5)
            vsext.vf8       v8,v15
            vsext.vf8       v16,v23
            vsext.vf8       v24,v31
            vsext.vf8       v0,v7
            vmv.x.s a3,v0
            vmv.x.s a2,v24
            vmv.x.s a1,v16
            vmv.x.s a0,v8
            add     s2,s2,s5
            call    sumation
            add     s3,s3,a0
            bgeu    s4,s2,.L5

            PR target/112431

    gcc/ChangeLog:

            * config/riscv/vector.md: Add widening overlap of vf2/vf4.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr112431-16.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-17.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-18.c: New test.

* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
From: cvs-commit at gcc dot gnu.org @ 2023-12-01 12:09 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Lehua Ding <lhtin@gcc.gnu.org>:

https://gcc.gnu.org/g:4418d55bcd1b7e0ef823981b6a781d7de5c38cce

commit r14-6054-g4418d55bcd1b7e0ef823981b6a781d7de5c38cce
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Fri Dec 1 16:09:59 2023 +0800

    RISC-V: Support highpart overlap for indexed load with SRC EEW < DEST EEW

    This patch leverages the previous approach.

    Before this patch:

    .L5:
            add     a3,s0,s2
            add     a4,s6,s2
            add     a5,s7,s2
            vsetvli zero,s0,e64,m8,ta,ma
            vle8.v  v4,0(s2)
            vle8.v  v3,0(a3)
            mv      s2,s1
            vle8.v  v2,0(a4)
            vle8.v  v1,0(a5)
            nop
            vluxei8.v       v8,(s1),v4
            vs8r.v  v8,0(sp)              ---> spill
            vluxei8.v       v8,(s1),v3
            vluxei8.v       v16,(s1),v2
            vluxei8.v       v24,(s1),v1
            nop
            vmv.x.s a1,v8
            vl8re64.v       v8,0(sp)     ---> reload
            vmv.x.s a3,v24
            vmv.x.s a2,v16
            vmv.x.s a0,v8
            add     s1,s1,s5
            call    sumation
            add     s3,s3,a0
            bgeu    s4,s1,.L5

    After this patch:

    .L5:
            add     a3,s0,s2
            add     a4,s6,s2
            add     a5,s7,s2
            vsetvli zero,s0,e64,m8,ta,ma
            vle8.v  v15,0(s2)
            vle8.v  v23,0(a3)
            mv      s2,s1
            vle8.v  v31,0(a4)
            vle8.v  v7,0(a5)
            vluxei8.v       v8,(s1),v15
            vluxei8.v       v16,(s1),v23
            vluxei8.v       v24,(s1),v31
            vluxei8.v       v0,(s1),v7
            vmv.x.s a3,v0
            vmv.x.s a2,v24
            vmv.x.s a1,v16
            vmv.x.s a0,v8
            add     s1,s1,s5
            call    sumation
            add     s3,s3,a0
            bgeu    s4,s1,.L5

            PR target/112431

    gcc/ChangeLog:

            * config/riscv/vector.md: Support highpart overlap for indexed
            load.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr112431-28.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-29.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-30.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-31.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-32.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-33.c: New test.

* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
From: cvs-commit at gcc dot gnu.org @ 2023-12-01 12:09 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #10 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Lehua Ding <lhtin@gcc.gnu.org>:

https://gcc.gnu.org/g:a23415d7572774701d7ec04664390260ab9a3f63

commit r14-6055-ga23415d7572774701d7ec04664390260ab9a3f63
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Fri Dec 1 15:00:27 2023 +0800

    RISC-V: Support highpart register overlap for widen vx/vf instructions

    This patch leverages the same approach as vwcvt.

    Before this patch:

    .L5:
            add     a3,s0,s1
            add     a4,s6,s1
            add     a5,s7,s1
            vsetvli zero,s0,e32,m4,ta,ma
            vle32.v v16,0(s1)
            vle32.v v12,0(a3)
            mv      s1,s2
            vle32.v v8,0(a4)
            vle32.v v4,0(a5)
            nop
            vfwadd.vf       v24,v16,fs0
            vfwadd.vf       v16,v12,fs0
            vs8r.v  v16,0(sp)                -----> spill
            vfwadd.vf       v16,v8,fs0
            vfwadd.vf       v8,v4,fs0
            nop
            vsetvli zero,zero,e64,m8,ta,ma
            vfmv.f.s        fa4,v24
            vl8re64.v       v24,0(sp)       -----> reload
            vfmv.f.s        fa5,v24
            fcvt.lu.d a0,fa4,rtz
            fcvt.lu.d a1,fa5,rtz
            vfmv.f.s        fa4,v16
            vfmv.f.s        fa5,v8
            fcvt.lu.d a2,fa4,rtz
            fcvt.lu.d a3,fa5,rtz
            add     s2,s2,s5
            call    sumation
            add     s3,s3,a0
            bgeu    s4,s2,.L5

    After this patch:

    .L5:
            add     a3,s0,s1
            add     a4,s6,s1
            add     a5,s7,s1
            vsetvli zero,s0,e32,m4,ta,ma
            vle32.v v4,0(s1)
            vle32.v v28,0(a3)
            mv      s1,s2
            vle32.v v20,0(a4)
            vle32.v v12,0(a5)
            vfwadd.vf       v0,v4,fs0
            vfwadd.vf       v24,v28,fs0
            vfwadd.vf       v16,v20,fs0
            vfwadd.vf       v8,v12,fs0
            vsetvli zero,zero,e64,m8,ta,ma
            vfmv.f.s        fa4,v0
            vfmv.f.s        fa5,v24
            fcvt.lu.d a0,fa4,rtz
            fcvt.lu.d a1,fa5,rtz
            vfmv.f.s        fa4,v16
            vfmv.f.s        fa5,v8
            fcvt.lu.d a2,fa4,rtz
            fcvt.lu.d a3,fa5,rtz
            add     s2,s2,s5
            call    sumation
            add     s3,s3,a0
            bgeu    s4,s2,.L5

            PR target/112431

    gcc/ChangeLog:

            * config/riscv/vector.md: Support highpart overlap for vx/vf.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr112431-22.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-23.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-24.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-25.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-26.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-27.c: New test.


* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
  2023-11-08  0:15 [Bug c/112431] New: RISC-V GCC-15 feature: Support register overlap on widen RVV instructions juzhe.zhong at rivai dot ai
                   ` (10 preceding siblings ...)
  2023-12-01 12:09 ` cvs-commit at gcc dot gnu.org
@ 2023-12-04 10:45 ` cvs-commit at gcc dot gnu.org
  2023-12-04 11:21 ` juzhe.zhong at rivai dot ai
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-12-04 10:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #11 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Lehua Ding <lhtin@gcc.gnu.org>:

https://gcc.gnu.org/g:7804b4e24cd16283067225d4c2c4a4483a2b31bc

commit r14-6113-g7804b4e24cd16283067225d4c2c4a4483a2b31bc
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Mon Dec 4 16:51:06 2023 +0800

    RISC-V: Remove earlyclobber from widen reduction

    Since the destination of a widening reduction is not a vector register
    group, there is no need to apply the overlap constraint.

    This was also checked against Clang/LLVM.

    The LLVM MIR for a widening add is marked early-clobber:
    early-clobber %49:vrm2 = PseudoVWADD_VX_M1 $noreg(tied-def 0), killed %17:vr, %48:gpr, %0:gprnox0, 3, 0; example.c:59:24

    The LLVM MIR for a widening reduction is not marked early-clobber:
    %48:vr = PseudoVWREDSUM_VS_M2_E8 $noreg(tied-def 0), %17:vrm2, killed %33:vr, %0:gprnox0, 3, 1; example.c:60:26

    LLVM also accepts both of the following as legal:

    vwredsum.vs     v24, v8, v24
    vwredsum.vs     v8, v8, v24

    To align with LLVM and honor the RISC-V V spec, remove the earlyclobber.

    Before this patch:

            vwredsum.vs     v8,v24,v8
            vwredsum.vs     v7,v22,v7
            vwredsum.vs     v6,v20,v6
            vwredsum.vs     v5,v18,v5
            vwredsum.vs     v4,v16,v4
            vwredsum.vs     v3,v14,v3
            vwredsum.vs     v2,v12,v2
            vwredsum.vs     v1,v10,v1
            vmv1r.v v9,v8
            vwredsum.vs     v9,v24,v9
            vmv1r.v v24,v7
            vwredsum.vs     v24,v22,v24
            vmv1r.v v22,v6
            vwredsum.vs     v22,v20,v22
            vmv1r.v v20,v5
            vwredsum.vs     v20,v18,v20
            vmv1r.v v18,v4
            vwredsum.vs     v18,v16,v18
            vmv1r.v v16,v3
            vwredsum.vs     v16,v14,v16
            vmv1r.v v14,v2
            vwredsum.vs     v14,v12,v14
            vmv1r.v v12,v1
            vwredsum.vs     v12,v10,v12

    After this patch:

            vfwredusum.vs   v17,v12,v17
            vfwredusum.vs   v18,v10,v18
            vfwredusum.vs   v15,v26,v15
            vfwredusum.vs   v16,v24,v16
            vfwredusum.vs   v12,v12,v17
            vfwredusum.vs   v10,v10,v18
            vfwredusum.vs   v13,v6,v20
            vfwredusum.vs   v11,v8,v19
            vfwredusum.vs   v6,v6,v13
            vfwredusum.vs   v8,v8,v11
            vfwredusum.vs   v7,v4,v21
            vfwredusum.vs   v9,v2,v22
            vfwredusum.vs   v14,v26,v15
            vfwredusum.vs   v1,v24,v16
            vfwredusum.vs   v4,v4,v7
            vfwredusum.vs   v2,v2,v9

    This matches LLVM's behavior and honors the RISC-V V spec.

            PR target/112431

    gcc/ChangeLog:

            * config/riscv/vector.md: Remove earlyclobber from widen reduction.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr112431-35.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-36.c: New test.
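
The reasoning in the commit above (a widening reduction writes only a scalar result, so its destination may alias a source) can be illustrated with a toy model. This is only a sketch: the register file layout, the `VLENB` value, and the helper names `toy_vwredsumu_vs` and `toy_reset` are assumptions made for illustration and are not part of GCC, LLVM, or the RVV specification. It models an unsigned widening sum at SEW=8 and ignores tail elements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VLENB 16            /* assumed VLEN = 128 bits; illustrative only */
#define NUM_VREGS 32

static uint8_t vrf[NUM_VREGS * VLENB];   /* toy flat vector register file */

/* Toy model of vwredsumu.vs vd, vs2, vs1 at SEW=8:
   vd[0] = vs1[0] + vs2[0] + ... + vs2[vl-1], accumulated in 16 bits.
   All sources are read before the single destination element is written.  */
static uint16_t toy_vwredsumu_vs(int vd, int vs2, int vs1, int vl)
{
    assert(vl >= 0 && vs2 * VLENB + vl <= NUM_VREGS * VLENB);
    uint16_t acc;
    memcpy(&acc, &vrf[vs1 * VLENB], sizeof acc);
    for (int i = 0; i < vl; i++)
        acc = (uint16_t)(acc + vrf[vs2 * VLENB + i]);
    /* Only element 0 of vd is written in this model.  */
    memcpy(&vrf[vd * VLENB], &acc, sizeof acc);
    return acc;
}

/* Reset the toy state: v8-v9 hold 32 ones, v24[0] holds the scalar 10.  */
static void toy_reset(void)
{
    memset(vrf, 0, sizeof vrf);
    memset(&vrf[8 * VLENB], 1, 2 * VLENB);
    uint16_t init = 10;
    memcpy(&vrf[24 * VLENB], &init, sizeof init);
}
```

Calling `toy_reset()` followed by `toy_vwredsumu_vs(24, 8, 24, 32)` and, after another reset, `toy_vwredsumu_vs(8, 8, 24, 32)` gives the same sum in this model, whether the destination aliases the scalar source or the start of the vector source group.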


* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
  2023-11-08  0:15 [Bug c/112431] New: RISC-V GCC-15 feature: Support register overlap on widen RVV instructions juzhe.zhong at rivai dot ai
                   ` (11 preceding siblings ...)
  2023-12-04 10:45 ` cvs-commit at gcc dot gnu.org
@ 2023-12-04 11:21 ` juzhe.zhong at rivai dot ai
  2023-12-04 13:36 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-04 11:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #12 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
All widening EEW overlap cases are now done, except for the vv/wv variants
of the widening instructions.

It seems that the current register filter cannot accurately model the
highest-numbered overlap for vwadd.vv/vwadd.wv instructions.
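
For reference, the highest-numbered overlap rule being discussed can be written down as a small predicate. This is a minimal sketch based on my reading of the RVV spec wording (destination EEW greater than source EEW, source EMUL at least 1, overlap only in the highest-numbered part of the destination group). The function name `widen_overlap_legal` and the convention of passing `src_emul == 0` for a fractional source EMUL are assumptions for illustration. It is not how GCC's register filter is implemented.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a narrower source group starting at src_base may be used
   together with a wider destination group starting at dest_base.
   EMULs are given in whole registers (1, 2, 4, 8); a fractional source
   EMUL is modelled as src_emul == 0 and still occupies one register.  */
static bool widen_overlap_legal(int dest_base, int dest_emul,
                                int src_base, int src_emul)
{
    assert(dest_emul >= 1 && dest_emul <= 8);
    int src_regs = src_emul >= 1 ? src_emul : 1;

    /* Groups that do not overlap at all are always fine.  */
    bool disjoint = src_base + src_regs <= dest_base
                    || dest_base + dest_emul <= src_base;
    if (disjoint)
        return true;

    /* Overlap requires source EMUL >= 1 ...  */
    if (src_emul < 1)
        return false;

    /* ... and the source must sit in the top src_emul registers.  */
    return src_base == dest_base + dest_emul - src_emul;
}
```

Under this model, `widen_overlap_legal(2, 2, 3, 1)` accepts the `v3` source of a destination group `v2-v3`, while `widen_overlap_legal(2, 2, 2, 1)` rejects a source of `v2`.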


* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
  2023-11-08  0:15 [Bug c/112431] New: RISC-V GCC-15 feature: Support register overlap on widen RVV instructions juzhe.zhong at rivai dot ai
                   ` (12 preceding siblings ...)
  2023-12-04 11:21 ` juzhe.zhong at rivai dot ai
@ 2023-12-04 13:36 ` cvs-commit at gcc dot gnu.org
  2023-12-04 13:48 ` cvs-commit at gcc dot gnu.org
  2023-12-11  7:56 ` cvs-commit at gcc dot gnu.org
  15 siblings, 0 replies; 17+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-12-04 13:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #13 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:27fde325d64447a3a0d5d550c5976e5f3fb6dc16

commit r14-6117-g27fde325d64447a3a0d5d550c5976e5f3fb6dc16
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Mon Dec 4 21:32:06 2023 +0800

    RISC-V: Support highest-number regno overlap for widen ternary

    Consider this example:

    #include "riscv_vector.h"
    void
    foo6 (void *in, void *out)
    {
      vfloat64m8_t accum = __riscv_vle64_v_f64m8 (in, 4);
      vfloat64m4_t high_eew64 = __riscv_vget_v_f64m8_f64m4 (accum, 1);
      vint64m4_t high_eew64_i = __riscv_vreinterpret_v_f64m4_i64m4 (high_eew64);
      vint32m4_t high_eew32_i = __riscv_vreinterpret_v_i64m4_i32m4 (high_eew64_i);
      vfloat32m4_t high_eew32 = __riscv_vreinterpret_v_i32m4_f32m4 (high_eew32_i);
      vfloat64m8_t result = __riscv_vfwnmsac_vf_f64m8 (accum, 64, high_eew32, 4);
      __riscv_vse64_v_f64m8 (out, result, 4);
    }

    Before this patch:

    foo6:                                   # @foo6
            vsetivli        zero, 4, e32, m4, ta, ma
            vle64.v v8, (a0)
            lui     a0, 272384
            fmv.w.x fa5, a0
            vmv8r.v v16, v8
            vfwnmsac.vf     v16, fa5, v12
            vse64.v v16, (a1)
            ret

    After this patch:

    foo6:
    .LFB5:
            .cfi_startproc
            lui     a5,%hi(.LC0)
            flw     fa5,%lo(.LC0)(a5)
            vsetivli        zero,4,e32,m4,ta,ma
            vle64.v v8,0(a0)
            vfwnmsac.vf     v8,fa5,v12
            vse64.v v8,0(a1)
            ret

            PR target/112431

    gcc/ChangeLog:

            * config/riscv/vector.md: Add highest-number overlap support.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr112431-37.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-38.c: New test.


* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
  2023-11-08  0:15 [Bug c/112431] New: RISC-V GCC-15 feature: Support register overlap on widen RVV instructions juzhe.zhong at rivai dot ai
                   ` (13 preceding siblings ...)
  2023-12-04 13:36 ` cvs-commit at gcc dot gnu.org
@ 2023-12-04 13:48 ` cvs-commit at gcc dot gnu.org
  2023-12-11  7:56 ` cvs-commit at gcc dot gnu.org
  15 siblings, 0 replies; 17+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-12-04 13:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #14 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:018ba3ac952bed4ae01344c060360f13f7cc084a

commit r14-6118-g018ba3ac952bed4ae01344c060360f13f7cc084a
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Mon Dec 4 21:44:56 2023 +0800

    RISC-V: Fix overlap group incorrect overlap on v0

    In a case with very high register pressure (test appended in this patch),
    we see vluxei8.v       v0,(s1),v1,v0.t, which is not allowed.
    According to the RVV ISA:

    +;; The destination vector register group for a masked vector instruction cannot overlap the source mask register (v0),
    +;; unless the destination vector register is being written with a mask value (e.g., compares) or the scalar result of a reduction.

    This case currently produces no spills, but we expect it to spill and
    reload the data.

    The root cause is a mistake I made in a previous patch when matching the
    dest operand and mask operand constraints:

    dest: "=vr"
    mask: "vmWc1"

    After this patch:

    dest: "vd,vr"
    mask: "vm,Wc1"

    This makes the EEW widening patterns the same as the other instruction
    patterns.

            PR target/112431

    gcc/ChangeLog:

            * config/riscv/vector.md: Fix incorrect overlap in v0.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr112431-34.c: New test.
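
The mask-overlap rule quoted in the commit above can be sketched as a predicate. This is an illustration only: the function name `masked_dest_legal` and its parameters are hypothetical, and the code does not reflect how the GCC constraints are implemented. As I understand the commit, pairing the destination and mask constraints per alternative means a destination in v0 is only considered when the instruction is unmasked.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the destination group [dest_base, dest_base + dest_emul)
   is acceptable with respect to the mask register v0.
   writes_mask_or_reduction_scalar covers the two exceptions named in the
   quoted spec text (mask-producing instructions and reduction results).  */
static bool masked_dest_legal(int dest_base, int dest_emul, bool is_masked,
                              bool writes_mask_or_reduction_scalar)
{
    assert(dest_base >= 0 && dest_emul >= 1 && dest_base + dest_emul <= 32);

    /* Unmasked instructions and the two exceptions may use v0 freely.  */
    if (!is_masked || writes_mask_or_reduction_scalar)
        return true;

    /* A register group contains v0 only when it starts at v0.  */
    return dest_base != 0;
}
```

With this model, the `vluxei8.v v0,(s1),v1,v0.t` case from the commit corresponds to `masked_dest_legal(0, 1, true, false)`, which is rejected.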


* [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
  2023-11-08  0:15 [Bug c/112431] New: RISC-V GCC-15 feature: Support register overlap on widen RVV instructions juzhe.zhong at rivai dot ai
                   ` (14 preceding siblings ...)
  2023-12-04 13:48 ` cvs-commit at gcc dot gnu.org
@ 2023-12-11  7:56 ` cvs-commit at gcc dot gnu.org
  15 siblings, 0 replies; 17+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-12-11  7:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #15 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:7e854b58084c131fceca9e8fa9dcc7469972e69d

commit r14-6400-g7e854b58084c131fceca9e8fa9dcc7469972e69d
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Sat Dec 9 12:06:29 2023 +0800

    RISC-V: Support highest overlap for wv instructions

    According to the RVV ISA, the overlap in vwadd.wv v2, v2, v3 is allowed.

    Before this patch:

            nop
            vsetivli        zero,4,e8,m4,tu,ma
            vle16.v v8,0(a0)
            vmv8r.v v0,v8
            vwsub.wv        v0,v8,v12
            nop
            addi    a4,a0,100
            vle16.v v8,0(a4)
            vmv8r.v v24,v8
            vwsub.wv        v24,v8,v12
            nop
            addi    a4,a0,200
            vle16.v v8,0(a4)
            vmv8r.v v16,v8
            vwsub.wv        v16,v8,v12
            nop

    After this patch:

            nop
            vsetivli        zero,4,e8,m4,tu,ma
            vle16.v v0,0(a0)
            vwsub.wv        v0,v0,v4
            nop
            addi    a4,a0,100
            vle16.v v24,0(a4)
            vwsub.wv        v24,v24,v28
            nop
            addi    a4,a0,200
            vle16.v v16,0(a4)
            vwsub.wv        v16,v16,v20

            PR target/112431

    gcc/ChangeLog:

            * config/riscv/vector.md: Support highest overlap for wv
            instructions.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr112431-39.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-40.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-41.c: New test.


Thread overview: 17+ messages
2023-11-08  0:15 [Bug c/112431] New: RISC-V GCC-15 feature: Support register overlap on widen RVV instructions juzhe.zhong at rivai dot ai
2023-11-08  0:16 ` [Bug c/112431] " juzhe.zhong at rivai dot ai
2023-11-08  0:16 ` juzhe.zhong at rivai dot ai
2023-11-08  1:56 ` kito at gcc dot gnu.org
2023-11-12 21:18 ` pinskia at gcc dot gnu.org
2023-11-29  9:37 ` [Bug target/112431] " cvs-commit at gcc dot gnu.org
2023-11-30  1:16 ` cvs-commit at gcc dot gnu.org
2023-11-30  2:40 ` cvs-commit at gcc dot gnu.org
2023-11-30 10:50 ` cvs-commit at gcc dot gnu.org
2023-11-30 12:11 ` cvs-commit at gcc dot gnu.org
2023-12-01 12:09 ` cvs-commit at gcc dot gnu.org
2023-12-01 12:09 ` cvs-commit at gcc dot gnu.org
2023-12-04 10:45 ` cvs-commit at gcc dot gnu.org
2023-12-04 11:21 ` juzhe.zhong at rivai dot ai
2023-12-04 13:36 ` cvs-commit at gcc dot gnu.org
2023-12-04 13:48 ` cvs-commit at gcc dot gnu.org
2023-12-11  7:56 ` cvs-commit at gcc dot gnu.org
