[Bug c/113112] New: RISC-V: Dynamic LMUL feature stabilization for GCC-14 release

public inbox for gcc-bugs@sourceware.org

* [Bug c/113112] New: RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: juzhe.zhong at rivai dot ai @ 2023-12-22  9:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112

            Bug ID: 113112
           Summary: RISC-V: Dynamic LMUL feature stabilization for GCC-14
                    release
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Created attachment 56922
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56922&action=edit
dynamic LMUL fail case

Hi, as we know, we support the dynamic LMUL feature, but it is not yet stable.

As far as I know, full-coverage testing shows only these 2 execution FAILs:
FAIL: gcc.dg/pr30957-1.c execution test
FAIL: gcc.dg/signbit-5.c execution test
They are not real failures; the tests need to be adjusted.

I have tested on K230 and other hardware, and it turns out we get over 30%
performance improvement (compared with the default LMUL = M1) on various
benchmarks if we can select a reasonably large LMUL (without additional
register spilling).

However, I also found that some benchmarks show a significant performance
drop (compared with the default LMUL = M1) when using dynamic LMUL.
I am pretty sure this is because we pick the wrong large LMUL (LMUL > 1),
which causes additional register spilling and therefore bad performance in
such situations.
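
To make the spilling argument concrete, here is a minimal C sketch of the register-budget arithmetic. It rests on simplifying assumptions made only for illustration: RVV has 32 vector registers, a value at LMUL = k occupies a group of k registers, and the peak number of live vector values is already known. The name pick_lmul is hypothetical; this is not GCC's dynamic LMUL cost model.

```c
/* RVV defines 32 vector registers (v0-v31).  This simplified model
   ignores v0 being used for masks and any other real-world
   constraints a production cost model may account for.  */
#define RVV_NUM_VREGS 32

/* Return the largest LMUL in {8, 4, 2, 1} such that MAX_LIVE values,
   each occupying a group of LMUL registers, fit in the register file
   at the same time.  */
int
pick_lmul (int max_live)
{
  /* Try the candidate LMULs from largest to smallest.  */
  for (int lmul = 8; lmul >= 1; lmul /= 2)
    if (max_live * lmul <= RVV_NUM_VREGS)
      return lmul;
  /* Even LMUL = 1 spills; it is still the least bad choice here.  */
  return 1;
}
```

For instance, with 4 values live at the peak, LMUL = 8 still fits (4 * 8 = 32), while 5 live values force LMUL = 4 (5 * 8 = 40 > 32, 5 * 4 = 20 <= 32).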

For example:

#include <stdint-gcc.h>

#define N 40

int a[N];

__attribute__ ((noinline)) int
foo (int n){
  int i,j;
  int sum,x;

  for (i = 0; i < n; i++) {
    sum = 0;
    for (j = 0; j < n; j++) {
      sum += (i + j);
    }
    a[i] = sum;
  }
  return 0;
}

-march=rv64gcv -mabi=lp64d -O3 --param riscv-autovec-lmul=dynamic --param riscv-autovec-preference=fixed-vlmax

ASM:

foo:
        ble     a0,zero,.L11
        lui     a2,%hi(.LANCHOR0)
        addi    sp,sp,-128
        addi    a2,a2,%lo(.LANCHOR0)
        mv      a1,a0
        vsetvli a6,zero,e32,m8,ta,ma
        vid.v   v8
        vs8r.v  v8,0(sp)
.L3:
        vl8re32.v       v16,0(sp)
        vsetvli a4,a1,e8,m2,ta,ma
        li      a3,0
        vsetvli a5,zero,e32,m8,ta,ma
        vmv8r.v v0,v16
        vmv.v.x v8,a4
        vmv.v.i v24,0
        vadd.vv v8,v16,v8
        vmv8r.v v16,v24
        vs8r.v  v8,0(sp)
.L4:
        addiw   a3,a3,1
        vadd.vv v8,v0,v16
        vadd.vi v16,v16,1
        vadd.vv v24,v24,v8
        bne     a0,a3,.L4
        vsetvli zero,a4,e32,m8,ta,ma
        sub     a1,a1,a4
        vse32.v v24,0(a2)
        slli    a4,a4,2
        add     a2,a2,a4
        bne     a1,zero,.L3
        li      a0,0
        addi    sp,sp,128
        jr      ra
.L11:
        li      a0,0
        ret

As we can see, we pick LMUL = 8, which then spills.

This case was found with the following check, which I added to the mov pattern:

      if (known_gt (GET_MODE_SIZE (mode), BYTES_PER_RISCV_VECTOR)
          && riscv_autovec_lmul == RVV_DYNAMIC && lra_in_progress)
        gcc_unreachable ();

The attachment lists the cases where we pick an incorrectly large LMUL, which
causes additional spilling.

I will work on this issue in the following days.

* [Bug target/113112] RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: cvs-commit at gcc dot gnu.org @ 2023-12-23  0:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112

--- Comment #1 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:290230034092898981488d0716ddae43bd36c09f

commit r14-6810-g290230034092898981488d0716ddae43bd36c09f
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Sat Dec 23 07:07:42 2023 +0800

    RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model analysis

    Consider the following case:

    foo:
            ble     a0,zero,.L11
            lui     a2,%hi(.LANCHOR0)
            addi    sp,sp,-128
            addi    a2,a2,%lo(.LANCHOR0)
            mv      a1,a0
            vsetvli a6,zero,e32,m8,ta,ma
            vid.v   v8
            vs8r.v  v8,0(sp)                     ---> spill
    .L3:
            vl8re32.v       v16,0(sp)            ---> reload
            vsetvli a4,a1,e8,m2,ta,ma
            li      a3,0
            vsetvli a5,zero,e32,m8,ta,ma
            vmv8r.v v0,v16
            vmv.v.x v8,a4
            vmv.v.i v24,0
            vadd.vv v8,v16,v8
            vmv8r.v v16,v24
            vs8r.v  v8,0(sp)                    ---> spill
    .L4:
            addiw   a3,a3,1
            vadd.vv v8,v0,v16
            vadd.vi v16,v16,1
            vadd.vv v24,v24,v8
            bne     a0,a3,.L4
            vsetvli zero,a4,e32,m8,ta,ma
            sub     a1,a1,a4
            vse32.v v24,0(a2)
            slli    a4,a4,2
            add     a2,a2,a4
            bne     a1,zero,.L3
            li      a0,0
            addi    sp,sp,128
            jr      ra
    .L11:
            li      a0,0
            ret

    We pick an unexpected LMUL = 8.

    The root cause is that we did not include the PHI initial value in the
    dynamic LMUL calculation:

      # j_17 = PHI <j_11(9), 0(5)>
      ---> # vect_vec_iv_.8_24 = PHI <_25(9), { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }(5)>

    We did not count { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
    as consuming a vector register, but it does get allocated a vector
    register group.

    This patch adds the missing count. After this patch we pick the ideal
    LMUL (LMUL = M4):

    foo:
            ble     a0,zero,.L9
            lui     a4,%hi(.LANCHOR0)
            addi    a4,a4,%lo(.LANCHOR0)
            mv      a2,a0
            vsetivli        zero,16,e32,m4,ta,ma
            vid.v   v20
    .L3:
            vsetvli a3,a2,e8,m1,ta,ma
            li      a5,0
            vsetivli        zero,16,e32,m4,ta,ma
            vmv4r.v v16,v20
            vmv.v.i v12,0
            vmv.v.x v4,a3
            vmv4r.v v8,v12
            vadd.vv v20,v20,v4
    .L4:
            addiw   a5,a5,1
            vmv4r.v v4,v8
            vadd.vi v8,v8,1
            vadd.vv v4,v16,v4
            vadd.vv v12,v12,v4
            bne     a0,a5,.L4
            slli    a5,a3,2
            vsetvli zero,a3,e32,m4,ta,ma
            sub     a2,a2,a3
            vse32.v v12,0(a4)
            add     a4,a4,a5
            bne     a2,zero,.L3
    .L9:
            li      a0,0
            ret

    Tested on --with-arch=gcv, no regressions.

            PR target/113112

    gcc/ChangeLog:

            * config/riscv/riscv-vector-costs.cc (max_number_of_live_regs):
            Refine dump information.
            (preferred_new_lmul_p): Include PHI initial value in live regs
            calculation.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: New test.
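
The fix above can be illustrated with a small C sketch. It assumes a hypothetical per-loop summary and the simplified rule that a constant vector PHI initial value, such as the { 0, ... } vector, needs its own register group. struct loop_summary, live_groups and fits_without_spill are illustrative names, not GCC internals, and the counts in the note below are made up for illustration.

```c
#include <stdbool.h>

#define RVV_NUM_VREGS 32 /* v0-v31 */

/* Hypothetical summary of a loop as seen by a liveness estimator.  */
struct loop_summary
{
  int live_values;     /* vector SSA values live at the peak point */
  int const_phi_inits; /* PHIs whose initial value is a constant vector */
};

/* Register groups needed at the peak.  With COUNT_PHI_INITS false this
   models the behaviour described before the fix: the constant initial
   value is ignored although it is materialized in a register group.  */
int
live_groups (struct loop_summary s, bool count_phi_inits)
{
  return s.live_values + (count_phi_inits ? s.const_phi_inits : 0);
}

/* True if GROUPS register groups of LMUL registers each fit in the
   vector register file without spilling.  */
bool
fits_without_spill (int groups, int lmul)
{
  return groups * lmul <= RVV_NUM_VREGS;
}
```

With the hypothetical counts { 4, 1 }, ignoring the initial value makes LMUL = 8 look spill-free (4 * 8 = 32), whereas counting it gives 5 * 8 = 40 registers, so LMUL = 4 is the largest choice that fits.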

* [Bug target/113112] RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: cvs-commit at gcc dot gnu.org @ 2023-12-26  9:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112

--- Comment #2 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:f83cfb8148bcf0876df76761a9a4545bc939667d

commit r14-6836-gf83cfb8148bcf0876df76761a9a4545bc939667d
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Tue Dec 26 16:42:27 2023 +0800

    RISC-V: Some minor tweaks on dynamic LMUL cost model

    Tweak the dynamic LMUL cost model code to make its computation more
    predictable and accurate.

    Tested on both RV32 and RV64, no regressions.

    Committed.

            PR target/113112

    gcc/ChangeLog:

            * config/riscv/riscv-vector-costs.cc (compute_estimated_lmul):
            Tweak LMUL estimation.
            (has_unexpected_spills_p): Ditto.
            (costs::record_potential_unexpected_spills): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: Add more
            checks.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: Ditto.
            * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-12.c: New test.
            * gcc.dg/vect/costmodel/riscv/rvv/pr113112-2.c: New test.

* [Bug target/113112] RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: cvs-commit at gcc dot gnu.org @ 2023-12-27  9:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112

--- Comment #3 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:c4ac073d4fc7474e29d085bbd10971138ee7478e

commit r14-6850-gc4ac073d4fc7474e29d085bbd10971138ee7478e
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Wed Dec 27 16:16:41 2023 +0800

    RISC-V: Make known NITERS loops aware of dynamic LMUL cost model liveness information

    Consider the following case:

    int f[12][100];

    void bad1(int v1, int v2)
    {
      for (int r = 0; r < 100; r += 4)
        {
          int i = r + 1;
          f[0][r] = f[1][r] * (f[2][r]) - f[1][i] * (f[2][i]);
          f[0][i] = f[1][r] * (f[2][i]) + f[1][i] * (f[2][r]);
          f[0][r+2] = f[1][r+2] * (f[2][r+2]) - f[1][i+2] * (f[2][i+2]);
          f[0][i+2] = f[1][r+2] * (f[2][i+2]) + f[1][i+2] * (f[2][r+2]);
        }
    }

    We blindly pick LMUL = 8 VLS:

            lui     a4,%hi(f)
            addi    a4,a4,%lo(f)
            addi    sp,sp,-592
            addi    a3,a4,800
            lui     a5,%hi(.LANCHOR0)
            vl8re32.v       v24,0(a3)
            addi    a5,a5,%lo(.LANCHOR0)
            addi    a1,a4,400
            addi    a3,sp,140
            vl8re32.v       v16,0(a1)
            vl4re16.v       v4,0(a5)
            addi    a7,a5,192
            vs4r.v  v4,0(a3)
            addi    t0,a5,64
            addi    a3,sp,336
            li      t2,32
            addi    a2,a5,128
            vsetvli a5,zero,e32,m8,ta,ma
            vrgatherei16.vv v8,v16,v4
            vmul.vv v8,v8,v24
            vl8re32.v       v0,0(a7)
            vs8r.v  v8,0(a3)
            vmsltu.vx       v8,v0,t2
            addi    a3,sp,12
            addi    t2,sp,204
            vsm.v   v8,0(t2)
            vl4re16.v       v4,0(t0)
            vl4re16.v       v0,0(a2)
            vs4r.v  v4,0(a3)
            addi    t0,sp,336
            vrgatherei16.vv v8,v24,v4
            addi    a3,sp,208
            vrgatherei16.vv v24,v16,v0
            vs4r.v  v0,0(a3)
            vmul.vv v8,v8,v24
            vlm.v   v0,0(t2)
            vl8re32.v       v24,0(t0)
            addi    a3,sp,208
            vsub.vv v16,v24,v8
            addi    t6,a4,528
            vadd.vv v8,v24,v8
            addi    t5,a4,928
            vmerge.vvm      v8,v8,v16,v0
            addi    t3,a4,128
            vs8r.v  v8,0(a4)
            addi    t4,a4,1056
            addi    t1,a4,656
            addi    a0,a4,256
            addi    a6,a4,1184
            addi    a1,a4,784
            addi    a7,a4,384
            addi    a4,sp,140
            vl4re16.v       v0,0(a3)
            vl8re32.v       v24,0(t6)
            vl4re16.v       v4,0(a4)
            vrgatherei16.vv v16,v24,v0
            addi    a3,sp,12
            vs8r.v  v16,0(t0)
            vl8re32.v       v8,0(t5)
            vrgatherei16.vv v16,v24,v4
            vl4re16.v       v4,0(a3)
            vrgatherei16.vv v24,v8,v4
            vmul.vv v16,v16,v8
            vl8re32.v       v8,0(t0)
            vmul.vv v8,v8,v24
            vsub.vv v24,v16,v8
            vlm.v   v0,0(t2)
            addi    a3,sp,208
            vadd.vv v8,v8,v16
            vl8re32.v       v16,0(t4)
            vmerge.vvm      v8,v8,v24,v0
            vrgatherei16.vv v24,v16,v4
            vs8r.v  v24,0(t0)
            vl4re16.v       v28,0(a3)
            addi    a3,sp,464
            vs8r.v  v8,0(t3)
            vl8re32.v       v8,0(t1)
            vrgatherei16.vv v0,v8,v28
            vs8r.v  v0,0(a3)
            addi    a3,sp,140
            vl4re16.v       v24,0(a3)
            addi    a3,sp,464
            vrgatherei16.vv v0,v8,v24
            vl8re32.v       v24,0(t0)
            vmv8r.v v8,v0
            vl8re32.v       v0,0(a3)
            vmul.vv v8,v8,v16
            vmul.vv v24,v24,v0
            vsub.vv v16,v8,v24
            vadd.vv v8,v8,v24
            vsetivli        zero,4,e32,m8,ta,ma
            vle32.v v24,0(a6)
            vsetvli a4,zero,e32,m8,ta,ma
            addi    a4,sp,12
            vlm.v   v0,0(t2)
            vmerge.vvm      v8,v8,v16,v0
            vl4re16.v       v16,0(a4)
            vrgatherei16.vv v0,v24,v16
            vsetivli        zero,4,e32,m8,ta,ma
            vs8r.v  v0,0(a4)
            addi    a4,sp,208
            vl4re16.v       v0,0(a4)
            vs8r.v  v8,0(a0)
            vle32.v v16,0(a1)
            vsetvli a5,zero,e32,m8,ta,ma
            vrgatherei16.vv v8,v16,v0
            vs8r.v  v8,0(a4)
            addi    a4,sp,140
            vl4re16.v       v4,0(a4)
            addi    a5,sp,12
            vrgatherei16.vv v8,v16,v4
            vl8re32.v       v0,0(a5)
            vsetivli        zero,4,e32,m8,ta,ma
            addi    a5,sp,208
            vmv8r.v v16,v8
            vl8re32.v       v8,0(a5)
            vmul.vv v24,v24,v16
            vmul.vv v8,v0,v8
            vsub.vv v16,v24,v8
            vadd.vv v8,v8,v24
            vsetvli a5,zero,e8,m2,ta,ma
            vlm.v   v0,0(t2)
            vsetivli        zero,4,e32,m8,ta,ma
            vmerge.vvm      v8,v8,v16,v0
            vse32.v v8,0(a7)
            addi    sp,sp,592
            jr      ra

    This patch makes loops with known NITERS aware of the liveness estimation.
    After this patch, we choose LMUL = 4:

            lui     a5,%hi(f)
            addi    a5,a5,%lo(f)
            addi    a3,a5,400
            addi    a4,a5,800
            vsetivli        zero,8,e32,m2,ta,ma
            vlseg4e32.v     v16,(a3)
            vlseg4e32.v     v8,(a4)
            vmul.vv v2,v8,v16
            addi    a3,a5,528
            vmv.v.v v24,v10
            vnmsub.vv       v24,v18,v2
            addi    a4,a5,928
            vmul.vv v2,v12,v22
            vmul.vv v6,v8,v18
            vmv.v.v v30,v2
            vmacc.vv        v30,v14,v20
            vmv.v.v v26,v6
            vmacc.vv        v26,v10,v16
            vmul.vv v4,v12,v20
            vmv.v.v v28,v14
            vnmsub.vv       v28,v22,v4
            vsseg4e32.v     v24,(a5)
            vlseg4e32.v     v16,(a3)
            vlseg4e32.v     v8,(a4)
            vmul.vv v2,v8,v16
            addi    a6,a5,128
            vmv.v.v v24,v10
            vnmsub.vv       v24,v18,v2
            addi    a0,a5,656
            vmul.vv v2,v12,v22
            addi    a1,a5,1056
            vmv.v.v v30,v2
            vmacc.vv        v30,v14,v20
            vmul.vv v6,v8,v18
            vmul.vv v4,v12,v20
            vmv.v.v v26,v6
            vmacc.vv        v26,v10,v16
            vmv.v.v v28,v14
            vnmsub.vv       v28,v22,v4
            vsseg4e32.v     v24,(a6)
            vlseg4e32.v     v16,(a0)
            vlseg4e32.v     v8,(a1)
            vmul.vv v2,v8,v16
            addi    a2,a5,256
            vmv.v.v v24,v10
            vnmsub.vv       v24,v18,v2
            addi    a3,a5,784
            vmul.vv v2,v12,v22
            addi    a4,a5,1184
            vmv.v.v v30,v2
            vmacc.vv        v30,v14,v20
            vmul.vv v6,v8,v18
            vmul.vv v4,v12,v20
            vmv.v.v v26,v6
            vmacc.vv        v26,v10,v16
            vmv.v.v v28,v14
            vnmsub.vv       v28,v22,v4
            addi    a5,a5,384
            vsseg4e32.v     v24,(a2)
            vsetivli        zero,1,e32,m2,ta,ma
            vlseg4e32.v     v16,(a3)
            vlseg4e32.v     v8,(a4)
            vmul.vv v2,v16,v8
            vmul.vv v6,v18,v8
            vmv.v.v v24,v18
            vnmsub.vv       v24,v10,v2
            vmul.vv v4,v20,v12
            vmul.vv v2,v22,v12
            vmv.v.v v26,v6
            vmacc.vv        v26,v16,v10
            vmv.v.v v28,v22
            vnmsub.vv       v28,v14,v4
            vmv.v.v v30,v2
            vmacc.vv        v30,v20,v14
            vsseg4e32.v     v24,(a5)
            ret

    Tested on both RV32 and RV64, no regressions.

            PR target/113112

    gcc/ChangeLog:

            * config/riscv/riscv-vector-costs.cc (is_gimple_assign_or_call):
            New function.
            (get_first_lane_point): Ditto.
            (get_last_lane_point): Ditto.
            (max_number_of_live_regs): Refine live point dump.
            (compute_estimated_lmul): Make unknown NITERS loop be aware of
            liveness.
            (costs::better_main_loop_than_p): Ditto.
            * config/riscv/riscv-vector-costs.h (struct stmt_point): Add new
            member.
    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c:
            * gcc.dg/vect/costmodel/riscv/rvv/pr113112-3.c: New test.
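
The liveness information that this commit extends to known-NITERS loops comes down to finding the peak number of simultaneously live vector values. The commit introduces helpers named get_first_lane_point and get_last_lane_point. The C sketch below shows only the generic idea of a live-range sweep over statement points, using a hypothetical struct live_range; it is not GCC's implementation.

```c
/* A live range over statement points: a value becomes live at its
   defining statement (DEF) and dies after its last use (LAST_USE).  */
struct live_range
{
  int def;
  int last_use;
};

/* Return the maximum number of ranges that overlap at any single
   statement point in [0, NPOINTS).  Ranges are inclusive on both
   ends.  This is a plain O(points * ranges) sweep, chosen for
   clarity rather than speed.  */
int
max_live (const struct live_range *r, int nranges, int npoints)
{
  int best = 0;
  for (int p = 0; p < npoints; p++)
    {
      int live = 0;
      for (int i = 0; i < nranges; i++)
        if (r[i].def <= p && p <= r[i].last_use)
          live++;
      if (live > best)
        best = live;
    }
  return best;
}
```

For the ranges {0,3}, {1,2}, {2,5} and {4,5}, the peak is 3, reached at point 2. That peak, multiplied by the candidate LMUL, is what then has to fit in the 32 vector registers.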

* [Bug target/113112] RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: cvs-commit at gcc dot gnu.org @ 2024-01-02  0:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:9a29b00365a07745c4ba2ed2af374e7c732aaeb3

commit r14-6877-g9a29b00365a07745c4ba2ed2af374e7c732aaeb3
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Fri Dec 29 09:21:02 2023 +0800

    RISC-V: Count pointer type SSA into RVV regs liveness for dynamic LMUL cost model

    This patch fixes the following case, where we choose an unexpectedly large
    LMUL that causes register spilling.

    Before this patch, we chose LMUL = 4:

            addi    sp,sp,-160
            addiw   t1,a2,-1
            li      a5,7
            bleu    t1,a5,.L16
            vsetivli        zero,8,e64,m4,ta,ma
            vmv.v.x v4,a0
            vs4r.v  v4,0(sp)                        ---> spill to the stack.
            vmv.v.x v4,a1
            addi    a5,sp,64
            vs4r.v  v4,0(a5)                        ---> spill to the stack.

    The root cause is the following code:

                      if (poly_int_tree_p (var)
                          || (is_gimple_val (var)
                             && !POINTER_TYPE_P (TREE_TYPE (var))))

    We count the variable as consuming an RVV register group when it is not of
    POINTER_TYPE.

    This is correct for load/store statements, for example:

    _1 = (MEM)*addr -->  addr won't be allocated an RVV vector group.

    However, it is not correct for non-load/store statements:

    _3 = _1 == x_8(D);

    _1 is also of pointer type, but we do allocate an RVV register group for it.

    So after this patch, we choose the ideal LMUL for the testcase in this
    patch:

            ble     a2,zero,.L17
            addiw   a7,a2,-1
            li      a5,3
            bleu    a7,a5,.L15
            srliw   a5,a7,2
            slli    a6,a5,1
            add     a6,a6,a5
            lui     a5,%hi(replacements)
            addi    t1,a5,%lo(replacements)
            slli    a6,a6,5
            lui     t4,%hi(.LANCHOR0)
            lui     t3,%hi(.LANCHOR0+8)
            lui     a3,%hi(.LANCHOR0+16)
            lui     a4,%hi(.LC1)
            vsetivli        zero,4,e16,mf2,ta,ma
            addi    t4,t4,%lo(.LANCHOR0)
            addi    t3,t3,%lo(.LANCHOR0+8)
            addi    a3,a3,%lo(.LANCHOR0+16)
            addi    a4,a4,%lo(.LC1)
            add     a6,t1,a6
            addi    a5,a5,%lo(replacements)
            vle16.v v18,0(t4)
            vle16.v v17,0(t3)
            vle16.v v16,0(a3)
            vmsgeu.vi       v25,v18,4
            vadd.vi v24,v18,-4
            vmsgeu.vi       v23,v17,4
            vadd.vi v22,v17,-4
            vlm.v   v21,0(a4)
            vmsgeu.vi       v20,v16,4
            vadd.vi v19,v16,-4
            vsetvli zero,zero,e64,m2,ta,mu
            vmv.v.x v12,a0
            vmv.v.x v14,a1
    .L4:
            vlseg3e64.v     v6,(a5)
            vmseq.vv        v2,v6,v12
            vmseq.vv        v0,v8,v12
            vmsne.vv        v1,v8,v12
            vmand.mm        v1,v1,v2
            vmerge.vvm      v2,v8,v14,v0
            vmv1r.v v0,v1
            addi    a4,a5,24
            vmerge.vvm      v6,v6,v14,v0
            vmerge.vim      v2,v2,0,v0
            vrgatherei16.vv v4,v6,v18
            vmv1r.v v0,v25
            vrgatherei16.vv v4,v2,v24,v0.t
            vs1r.v  v4,0(a5)
            addi    a3,a5,48
            vmv1r.v v0,v21
            vmv2r.v v4,v2
            vcompress.vm    v4,v6,v0
            vs1r.v  v4,0(a4)
            vmv1r.v v0,v23
            addi    a4,a5,72
            vrgatherei16.vv v4,v6,v17
            vrgatherei16.vv v4,v2,v22,v0.t
            vs1r.v  v4,0(a3)
            vmv1r.v v0,v20
            vrgatherei16.vv v4,v6,v16
            addi    a5,a5,96
            vrgatherei16.vv v4,v2,v19,v0.t
            vs1r.v  v4,0(a4)
            bne     a6,a5,.L4

    No spillings, no "sp" register used.

    Tested on both RV32 and RV64, no regression.

    OK for trunk?

            PR target/113112

    gcc/ChangeLog:

            * config/riscv/riscv-vector-costs.cc (compute_nregs_for_mode): Fix
            pointer type liveness count.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c: New test.
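
One reading of the pointer-type fix can be expressed as a C sketch. The old condition quoted in the commit is reproduced as needs_vreg_old. needs_vreg_new encodes the rule the message describes: pointer operands are skipped only when they act as a load/store address. The struct operand fields and both function names are hypothetical; the actual change lives in compute_nregs_for_mode and may be structured differently.

```c
#include <stdbool.h>

/* Hypothetical description of an operand as seen by a liveness
   estimator.  */
struct operand
{
  bool is_poly_int;    /* e.g. a poly_int constant */
  bool is_gimple_val;  /* SSA name or invariant */
  bool is_pointer;     /* POINTER_TYPE_P of its type */
  bool is_mem_address; /* used only as the address of a load/store */
};

/* Old rule, modelled on the quoted condition: pointer-typed values
   never count as occupying a vector register group.  */
bool
needs_vreg_old (struct operand op)
{
  return op.is_poly_int || (op.is_gimple_val && !op.is_pointer);
}

/* Rule as described in the commit message: a pointer escapes the count
   only when it serves as a load/store address.  In other statements,
   such as _3 = _1 == x_8(D), it is vectorized and needs a group.  */
bool
needs_vreg_new (struct operand op)
{
  return op.is_poly_int
         || (op.is_gimple_val && !(op.is_pointer && op.is_mem_address));
}
```

Both rules agree on load/store addresses and on non-pointer values. They differ only for a pointer used as an ordinary operand, which is exactly the _1 in the comparison above.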

* [Bug target/113112] RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: cvs-commit at gcc dot gnu.org @ 2024-01-03  9:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:a43bd8255451227fc1cd3601b1f0265b21fafada

commit r14-6889-ga43bd8255451227fc1cd3601b1f0265b21fafada
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Tue Jan 2 11:37:43 2024 +0800

    RISC-V: Make liveness aware of the rgroup number of LENS [dynamic LMUL]

    This patch fixes the following situation:
    vl4re16.v       v12,0(a5)
    ...
    vl4re16.v       v16,0(a3)
    vs4r.v  v12,0(a5)
    ...
    vl4re16.v       v4,0(a0)
    vs4r.v  v16,0(a3)
    ...
    vsetvli a3,zero,e16,m4,ta,ma
    ...
    vmv.v.x v8,t6
    vmsgeu.vv       v2,v16,v8
    vsub.vv v16,v16,v8
    vs4r.v  v16,0(a5)
    ...
    vs4r.v  v4,0(a0)
    vmsgeu.vv       v1,v4,v8
    ...
    vsub.vv v4,v4,v8
    slli    a6,a4,2
    vs4r.v  v4,0(a5)
    ...
    vsub.vv v4,v12,v8
    vmsgeu.vv       v3,v12,v8
    vs4r.v  v4,0(a5)
    ...

    There are many spills (the 'vs4r.v' stores).  The root cause is that we do
    not count vector REG liveness referencing the rgroup controls.

    _29 = _25->iatom[0]; is transformed into the following vector statements
    with 4 different loop_len values (loop_len_74, loop_len_75, loop_len_76,
    loop_len_77):

      vect__29.11_78 = .MASK_LEN_LOAD (vectp_sb.9_72, 32B, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_74, 0);
      vect__29.12_80 = .MASK_LEN_LOAD (vectp_sb.9_79, 32B, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_75, 0);
      vect__29.13_82 = .MASK_LEN_LOAD (vectp_sb.9_81, 32B, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_76, 0);
      vect__29.14_84 = .MASK_LEN_LOAD (vectp_sb.9_83, 32B, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_77, 0);

    The number of these lens is LOOP_VINFO_LENS (loop_vinfo).length ().

    Count liveness according to LOOP_VINFO_LENS (loop_vinfo).length () to
    compute liveness more accurately:

    vsetivli        zero,8,e16,m1,ta,ma
    vmsgeu.vi       v19,v14,8
    vadd.vi v18,v14,-8
    vmsgeu.vi       v17,v1,8
    vadd.vi v16,v1,-8
    vlm.v   v15,0(a5)
    ...

    Tested, no regressions. OK for trunk?

            PR target/113112

    gcc/ChangeLog:

            * config/riscv/riscv-vector-costs.cc (compute_nregs_for_mode): Add
            rgroup info.
            (max_number_of_live_regs): Ditto.
            (has_unexpected_spills_p): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/pr113112-5.c: New test.
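
A back-of-the-envelope C sketch shows why the rgroup count matters. It assumes (an illustration, not something taken from the patch) that each of the N loop_len controls yields a separate vector value occupying an LMUL-sized register group. nregs_for_stmt and fits_in_vregs are hypothetical names.

```c
#define RVV_NUM_VREGS 32 /* v0-v31 */

/* Registers consumed by one scalar statement vectorized into NUM_LENS
   copies, one per loop_len control (like the four .MASK_LEN_LOAD calls
   above), each copy occupying an LMUL-register group.  Passing
   NUM_LENS == 1 models an estimate that ignores the rgroup controls.  */
int
nregs_for_stmt (int lmul, int num_lens)
{
  return lmul * (num_lens > 0 ? num_lens : 1);
}

/* Nonzero if NSTMTS such statements can be live at once without
   spilling.  */
int
fits_in_vregs (int lmul, int num_lens, int nstmts)
{
  return nstmts * nregs_for_stmt (lmul, num_lens) <= RVV_NUM_VREGS;
}
```

With four such statements live at LMUL = 4, ignoring the lens suggests 4 * 4 = 16 registers. Four lens raise that to 4 * 16 = 64, well over the 32 available. At LMUL = 1 the same statements need only 16.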
