public inbox for gcc-bugs@sourceware.org
* [Bug c/111848] New: RISC-V: RVV cost model pick unexpected big LMUL
@ 2023-10-17 11:03 juzhe.zhong at rivai dot ai
  2023-10-17 11:07 ` [Bug c/111848] " juzhe.zhong at rivai dot ai
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-17 11:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

            Bug ID: 111848
           Summary: RISC-V: RVV cost model pick unexpected big LMUL
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

#include <stdint.h>
void
f3 (uint8_t *restrict a, uint8_t *restrict b,
   uint8_t *restrict c, uint8_t *restrict d,
   int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i * 8] = c[i * 8] + d[i * 8];
      a[i * 8 + 1] = c[i * 8] + d[i * 8 + 1];
      a[i * 8 + 2] = c[i * 8 + 2] + d[i * 8 + 2];
      a[i * 8 + 3] = c[i * 8 + 2] + d[i * 8 + 3];
      a[i * 8 + 4] = c[i * 8 + 4] + d[i * 8 + 4];
      a[i * 8 + 5] = c[i * 8 + 4] + d[i * 8 + 5];
      a[i * 8 + 6] = c[i * 8 + 6] + d[i * 8 + 6];
      a[i * 8 + 7] = c[i * 8 + 6] + d[i * 8 + 7];
      b[i * 8] = c[i * 8 + 1] + d[i * 8];
      b[i * 8 + 1] = c[i * 8 + 1] + d[i * 8 + 1];
      b[i * 8 + 2] = c[i * 8 + 3] + d[i * 8 + 2];
      b[i * 8 + 3] = c[i * 8 + 3] + d[i * 8 + 3];
      b[i * 8 + 4] = c[i * 8 + 5] + d[i * 8 + 4];
      b[i * 8 + 5] = c[i * 8 + 5] + d[i * 8 + 5];
      b[i * 8 + 6] = c[i * 8 + 7] + d[i * 8 + 6];
      b[i * 8 + 7] = c[i * 8 + 7] + d[i * 8 + 7];
    }
}

This case picks LMUL = 8, which causes severe vector register spilling.

Experiments show that the ideal LMUL is 2.
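
The access pattern of f3 can be restated more compactly, which makes the required permutation easier to see. The sketch below is only an illustration: `f3_compact` is a hypothetical name that does not appear in the report, and it assumes the sixteen unrolled statements reduce to "even source lane for a, odd source lane for b".

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not from the bug report): compact scalar
   restatement of f3.  For every flat index j in [0, 8 * n):
     a[j] = c[j rounded down to even] + d[j]
     b[j] = c[j forced to odd]        + d[j]  */
void
f3_compact (uint8_t *restrict a, uint8_t *restrict b,
            uint8_t *restrict c, uint8_t *restrict d,
            int n)
{
  assert (n <= INT_MAX / 8);	/* n * 8 must not overflow */
  for (int j = 0; j < n * 8; ++j)
    {
      a[j] = c[j & ~1] + d[j];	/* even source lane of c */
      b[j] = c[j | 1] + d[j];	/* odd source lane of c */
    }
}
```

Written this way, each output is a plain element-wise add of d with a permuted copy of c, so the vectorizer has to materialize two index vectors (even and odd lanes) for the gathers.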


* [Bug c/111848] RISC-V: RVV cost model pick unexpected big LMUL
  2023-10-17 11:03 [Bug c/111848] New: RISC-V: RVV cost model pick unexpected big LMUL juzhe.zhong at rivai dot ai
@ 2023-10-17 11:07 ` juzhe.zhong at rivai dot ai
  2023-10-20  3:57 ` [Bug target/111848] " cvs-commit at gcc dot gnu.org
  2023-10-20  6:39 ` juzhe.zhong at rivai dot ai
  2 siblings, 0 replies; 4+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-17 11:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Sorry, it actually picks LMUL = 4:

f3:
        ble     a4,zero,.L11
        csrr    t0,vlenb
        slli    t1,t0,4
        csrr    a6,vlenb
        sub     sp,sp,t1
        csrr    a5,vlenb
        slli    a6,a6,3
        slli    a5,a5,2
        add     a6,a6,sp
        vsetvli a7,zero,e16,m8,ta,ma
        slli    a4,a4,3
        vid.v   v8
        addi    t6,a5,-1
        vand.vi v8,v8,-2
        neg     t5,a5
        vs8r.v  v8,0(sp)
        vadd.vi v8,v8,1
        vs8r.v  v8,0(a6)
        j       .L7
.L14:
        vsetvli a7,zero,e16,m8,ta,ma
.L7:
        csrr    t0,vlenb
        slli    t0,t0,3
        vl8re16.v       v16,0(sp)
        add     t0,t0,sp
        vmv.v.x v8,t6
        mv      t1,a4
        vand.vv v24,v16,v8
        mv      a6,a4
        vl8re16.v       v16,0(t0)
        vand.vv v8,v16,v8
        bleu    a4,a5,.L6
        mv      a6,a5
.L6:
        vsetvli zero,a6,e8,m4,ta,ma
        vle8.v  v20,0(a2)
        vle8.v  v16,0(a3)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v20,v24
        vadd.vv v4,v16,v4
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v  v4,0(a0)
        vle8.v  v20,0(a2)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v20,v8
        vadd.vv v4,v4,v16
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v  v4,0(a1)


Ideally LMUL should be 2:

f3:
        ble     a4,zero,.L9
        csrr    a5,vlenb
        slli    a5,a5,1
        vsetvli a7,zero,e16,m4,ta,ma
        slli    a4,a4,3
        vid.v   v12
        addi    t6,a5,-1
        vand.vi v12,v12,-2
        neg     t5,a5
        vadd.vi v16,v12,1
        j       .L7
.L10:
        vsetvli a7,zero,e16,m4,ta,ma
.L7:
        vmv.v.x v4,t6
        mv      t1,a4
        vand.vv v20,v12,v4
        mv      a6,a4
        vand.vv v4,v16,v4
        bleu    a4,a5,.L6
        mv      a6,a5
.L6:
        vsetvli zero,a6,e8,m2,ta,ma
        vle8.v  v10,0(a2)
        vle8.v  v8,0(a3)
        vsetvli a7,zero,e8,m2,ta,ma
        vrgatherei16.vv v2,v10,v20
        vadd.vv v2,v8,v2
        vsetvli zero,a6,e8,m2,ta,ma
        vse8.v  v2,0(a0)
        vle8.v  v10,0(a2)
        vsetvli a7,zero,e8,m2,ta,ma
        vrgatherei16.vv v2,v10,v4
        vadd.vv v2,v2,v8
        vsetvli zero,a6,e8,m2,ta,ma
        vse8.v  v2,0(a1)
        add     a4,a4,t5
        add     a0,a0,a5
        add     a3,a3,a5
        add     a1,a1,a5
        add     a2,a2,a5
        bgtu    t1,a5,.L10
.L9:
        ret
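
For readers less familiar with the listings, here is a scalar model of what the index setup (vid.v, vand.vi with -2, vadd.vi with 1, vand.vv with a splat of VLMAX - 1) and the vrgatherei16.vv gathers compute. This is a sketch, not compiler output: the function names are made up for illustration, and it assumes VLMAX is a power of two so that the AND with VLMAX - 1 acts as a wrap-around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the index vectors built before the loop:
     even_idx[i] = (i & ~1)       & (vlmax - 1)
     odd_idx[i]  = ((i & ~1) + 1) & (vlmax - 1)  */
void
build_gather_indices (uint16_t *even_idx, uint16_t *odd_idx, size_t vlmax)
{
  /* Assumption: vlmax is a nonzero power of two.  */
  assert (vlmax != 0 && (vlmax & (vlmax - 1)) == 0);
  uint16_t mask = (uint16_t) (vlmax - 1);
  for (size_t i = 0; i < vlmax; ++i)
    {
      uint16_t base = (uint16_t) (i & ~(size_t) 1);	/* vid.v + vand.vi -2 */
      even_idx[i] = (uint16_t) (base & mask);
      odd_idx[i] = (uint16_t) ((base + 1) & mask);	/* vadd.vi 1 */
    }
}

/* Hypothetical model of vrgatherei16.vv with in-range indices:
   dst[i] = src[idx[i]].  */
void
gather_u8 (uint8_t *dst, const uint8_t *src, const uint16_t *idx, size_t vl)
{
  for (size_t i = 0; i < vl; ++i)
    dst[i] = src[idx[i]];
}
```

With vlmax = 8 the even index vector is {0,0,2,2,4,4,6,6} and the odd one is {1,1,3,3,5,5,7,7}. Because the indices are 16-bit while the data is 8-bit, each index vector occupies twice as many registers as a data vector, which is why the listings switch between e16 and e8 settings.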


* [Bug target/111848] RISC-V: RVV cost model pick unexpected big LMUL
  2023-10-17 11:03 [Bug c/111848] New: RISC-V: RVV cost model pick unexpected big LMUL juzhe.zhong at rivai dot ai
  2023-10-17 11:07 ` [Bug c/111848] " juzhe.zhong at rivai dot ai
@ 2023-10-20  3:57 ` cvs-commit at gcc dot gnu.org
  2023-10-20  6:39 ` juzhe.zhong at rivai dot ai
  2 siblings, 0 replies; 4+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-10-20  3:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Lehua Ding <lhtin@gcc.gnu.org>:

https://gcc.gnu.org/g:f0e28d8c13713f509fde26fbe7dd13280b67fb87

commit r14-4774-gf0e28d8c13713f509fde26fbe7dd13280b67fb87
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Wed Oct 18 18:25:33 2023 +0800

    RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

    The dynamic LMUL algorithm is confirmed to work as intended in choosing
    LMUL = 4 for the PR:
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

    However, the generated code has severe register spilling.

    The root cause is that we didn't hoist the vmv.v.x outside the loop, which
    increases the register pressure of the SLP loop.

    So, change the CONST_VECTOR move into a vec_duplicate splitter so that we
    can gain better optimizations:

    1. better LICM.
    2. More opportunities of transforming 'vv' into 'vx' in the future.

    Before this patch:

    f3:
            ble     a4,zero,.L8
            csrr    t0,vlenb
            slli    t1,t0,4
            csrr    a6,vlenb
            sub     sp,sp,t1
            csrr    a5,vlenb
            slli    a6,a6,3
            slli    a5,a5,2
            add     a6,a6,sp
            vsetvli a7,zero,e16,m8,ta,ma
            slli    a4,a4,3
            vid.v   v8
            addi    t6,a5,-1
            vand.vi v8,v8,-2
            neg     t5,a5
            vs8r.v  v8,0(sp)
            vadd.vi v8,v8,1
            vs8r.v  v8,0(a6)
            j       .L4
    .L12:
            vsetvli a7,zero,e16,m8,ta,ma
    .L4:
            csrr    t0,vlenb
            slli    t0,t0,3
            vl8re16.v       v16,0(sp)
            add     t0,t0,sp
            vmv.v.x v8,t6
            mv      t1,a4
            vand.vv v24,v16,v8
            mv      a6,a4
            vl8re16.v       v16,0(t0)
            vand.vv v8,v16,v8
            bleu    a4,a5,.L3
            mv      a6,a5
    .L3:
            vsetvli zero,a6,e8,m4,ta,ma
            vle8.v  v20,0(a2)
            vle8.v  v16,0(a3)
            vsetvli a7,zero,e8,m4,ta,ma
            vrgatherei16.vv v4,v20,v24
            vadd.vv v4,v16,v4
            vsetvli zero,a6,e8,m4,ta,ma
            vse8.v  v4,0(a0)
            vle8.v  v20,0(a2)
            vsetvli a7,zero,e8,m4,ta,ma
            vrgatherei16.vv v4,v20,v8
            vadd.vv v4,v4,v16
            vsetvli zero,a6,e8,m4,ta,ma
            vse8.v  v4,0(a1)
            add     a4,a4,t5
            add     a0,a0,a5
            add     a3,a3,a5
            add     a1,a1,a5
            add     a2,a2,a5
            bgtu    t1,a5,.L12
            csrr    t0,vlenb
            slli    t1,t0,4
            add     sp,sp,t1
            jr      ra
    .L8:
            ret

    After this patch:

    f3:
            ble     a4,zero,.L6
            csrr    a6,vlenb
            csrr    a5,vlenb
            slli    a6,a6,2
            slli    a5,a5,2
            addi    a6,a6,-1
            slli    a4,a4,3
            neg     t5,a5
            vsetvli t1,zero,e16,m8,ta,ma
            vmv.v.x v24,a6
            vid.v   v8
            vand.vi v8,v8,-2
            vadd.vi v16,v8,1
            vand.vv v8,v8,v24
            vand.vv v16,v16,v24
    .L4:
            mv      t1,a4
            mv      a6,a4
            bleu    a4,a5,.L3
            mv      a6,a5
    .L3:
            vsetvli zero,a6,e8,m4,ta,ma
            vle8.v  v28,0(a2)
            vle8.v  v24,0(a3)
            vsetvli a7,zero,e8,m4,ta,ma
            vrgatherei16.vv v4,v28,v8
            vadd.vv v4,v24,v4
            vsetvli zero,a6,e8,m4,ta,ma
            vse8.v  v4,0(a0)
            vle8.v  v28,0(a2)
            vsetvli a7,zero,e8,m4,ta,ma
            vrgatherei16.vv v4,v28,v16
            vadd.vv v4,v4,v24
            vsetvli zero,a6,e8,m4,ta,ma
            vse8.v  v4,0(a1)
            add     a4,a4,t5
            add     a0,a0,a5
            add     a3,a3,a5
            add     a1,a1,a5
            add     a2,a2,a5
            bgtu    t1,a5,.L4
    .L6:
            ret
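
A rough register count helps explain the difference between the two listings. The sketch below is only an estimate based on reading the assembly, not something produced by GCC's cost model; the helper names are hypothetical. It assumes 32 vector registers and that the e16 gather indices need EMUL = (16 / 8) * LMUL of the e8 data.

```c
#include <assert.h>

enum { RVV_NUM_VREGS = 32 };	/* v0..v31 */

/* Hypothetical estimate of vector registers live in the loop body.
   data_lmul is the LMUL of the e8 data vectors (1, 2 or 4 here);
   the e16 index vectors use twice that register group size.  */
unsigned
loop_vreg_pressure (unsigned live_index_vectors, unsigned live_data_vectors,
                    unsigned data_lmul)
{
  assert (data_lmul == 1 || data_lmul == 2 || data_lmul == 4);
  unsigned index_emul = (16 / 8) * data_lmul;
  return live_index_vectors * index_emul + live_data_vectors * data_lmul;
}

/* Nonzero when the estimate exceeds the register file.  */
int
expects_spill (unsigned pressure)
{
  return pressure > (unsigned) RVV_NUM_VREGS;
}
```

Under these assumptions, the "before" loop keeps four index-sized values alive (the two unmasked index vectors plus their two masked copies) next to three data vectors, so `loop_vreg_pressure (4, 3, 4)` gives 44, which exceeds 32. Once the masking is hoisted, only the two masked index vectors stay live: `loop_vreg_pressure (2, 3, 4)` gives 28, matching the use of v4 through v31 in the "after" listing. The same count for data LMUL = 2 without hoisting, `loop_vreg_pressure (4, 3, 2)`, gives 22, consistent with the spill-free LMUL = 2 code in comment #1.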

    Note that this patch triggers multiple FAILs:
    FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test
    FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test

    All of these failures are caused by bugs in the VSETVL pass:

       10dd4:       0c707057                vsetvli zero,zero,e8,mf2,ta,ma
       10dd8:       5e06b8d7                vmv.v.i v17,13
       10ddc:       9ed030d7                vmv1r.v v1,v13
       10de0:       b21040d7                vncvt.x.x.w     v1,v1
       ----> raises an illegal instruction exception, since we don't have
             SEW = 8 -> SEW = 4 narrowing.
       10de4:       5e0785d7                vmv.v.v v11,v15

    The recent VSETVL refactor patch is confirmed to fix all of them:
    https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633231.html

    So this patch should be committed after the VSETVL refactor patch.

            PR target/111848

    gcc/ChangeLog:

            * config/riscv/riscv-selftests.cc (run_const_vector_selftests):
            Adapt selftest.
            * config/riscv/riscv-v.cc (expand_const_vector): Change it into
            vec_duplicate splitter.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Adapt test.
            * gcc.dg/vect/costmodel/riscv/rvv/pr111848.c: New test.


* [Bug target/111848] RISC-V: RVV cost model pick unexpected big LMUL
  2023-10-17 11:03 [Bug c/111848] New: RISC-V: RVV cost model pick unexpected big LMUL juzhe.zhong at rivai dot ai
  2023-10-17 11:07 ` [Bug c/111848] " juzhe.zhong at rivai dot ai
  2023-10-20  3:57 ` [Bug target/111848] " cvs-commit at gcc dot gnu.org
@ 2023-10-20  6:39 ` juzhe.zhong at rivai dot ai
  2 siblings, 0 replies; 4+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-20  6:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed
