[Bug c/111720] New: RISC-V: Ugly codegen in RVV

public inbox for gcc-bugs@sourceware.org

* [Bug c/111720] New: RISC-V: Ugly codegen in RVV
@ 2023-10-07 22:28 juzhe.zhong at rivai dot ai
  2023-10-07 22:34 ` [Bug target/111720] " juzhe.zhong at rivai dot ai
                   ` (30 more replies)
  0 siblings, 31 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-07 22:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

            Bug ID: 111720
           Summary: RISC-V: Ugly codegen in RVV
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Reference: https://godbolt.org/z/YqW7Y5Yve

#include<riscv_vector.h>
vbool8_t fn() {

    uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};
    uint8_t m = 1;

    vuint8m1_t varr = __riscv_vle8_v_u8m1(arr, 32);
    vuint8m1_t vand_m = __riscv_vand_vx_u8m1(varr, m, 32);
    vbool8_t vmask = __riscv_vreinterpret_v_u8m1_b8(vand_m);

    return vmask;
}
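
For readers without an RVV toolchain, the computation in the reproducer can be modelled in scalar C. This is only an illustrative sketch: the helper names `scalar_vand_vx_u8` and `mask_bit` are made up here, and the bit layout (mask bit i is bit i of the register contents, least significant bit first) is an assumption about how the reinterpret to vbool8_t is read.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of __riscv_vand_vx_u8m1: AND the first `vl` bytes with `m`.
   Hypothetical helper, not part of any RVV header. */
static void scalar_vand_vx_u8(const uint8_t *src, uint8_t m, size_t vl,
                              uint8_t *dst)
{
    for (size_t i = 0; i < vl; i++)
        dst[i] = src[i] & m;
}

/* Read bit `i` of a byte buffer, least significant bit first.  Assumed to
   mirror how mask bits are read once the bytes are reinterpreted as a mask. */
static int mask_bit(const uint8_t *bytes, size_t i)
{
    return (bytes[i / 8] >> (i % 8)) & 1;
}
```

Under that assumption, only every eighth mask bit can be set in this test case, because each byte of the AND result is either 0 or 1.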

GCC asm:

fn:
        lui     a5,%hi(.LANCHOR0)
        addi    sp,sp,-32
        vsetivli        zero,4,e64,m2,ta,ma
        addi    a5,a5,%lo(.LANCHOR0)
        li      a4,32
        vle64.v v2,0(a5)
        vse64.v v2,0(sp)
        vsetvli zero,a4,e8,m1,ta,ma
        vle8.v  v1,0(sp)
        vand.vi v1,v1,1
        vsetvli a5,zero,e8,m1,ta,ma
        vsm.v   v1,0(a0)
        addi    sp,sp,32
        jr      ra

LLVM ASM:

fn:                                     # @fn
.Lpcrel_hi0:
        auipc   a0, %pcrel_hi(.L__const.fn.arr)
        addi    a0, a0, %pcrel_lo(.Lpcrel_hi0)
        li      a1, 32
        vsetvli zero, a1, e8, m1, ta, ma
        vle8.v  v8, (a0)
        vand.vi v0, v8, 1
        ret
.L__const.fn.arr:
        .ascii  "\001\002\007\001\003\004\005\003\001\000\001\002\004\004\t\t\001\002\007\001\003\004\005\003\001\000\001\002\004\004\t\t"


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
@ 2023-10-07 22:34 ` juzhe.zhong at rivai dot ai
  2023-10-07 22:36 ` pinskia at gcc dot gnu.org
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-07 22:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
The root cause is unnecessary VLS-mode data movement:

(insn 10 9 11 2 (set (reg:V4DI 143)
        (mem/u/c:V4DI (reg:DI 142) [0  S32 A128])) "/app/example.c":4:13 1119
{*movv4di}
     (nil))
(insn 11 10 12 2 (set (mem/c:V4DI (reg:DI 141) [0  S32 A128])
        (reg:V4DI 143)) "/app/example.c":4:13 1119 {*movv4di}
     (nil))


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
  2023-10-07 22:34 ` [Bug target/111720] " juzhe.zhong at rivai dot ai
@ 2023-10-07 22:36 ` pinskia at gcc dot gnu.org
  2023-10-07 22:38 ` juzhe.zhong at rivai dot ai
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-07 22:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I noticed there is an ABI difference here.

GCC is returning via a store to a0:
        vsm.v   v1,0(a0)

While LLVM is returning via v0 .

Which one is correct?


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
  2023-10-07 22:34 ` [Bug target/111720] " juzhe.zhong at rivai dot ai
  2023-10-07 22:36 ` pinskia at gcc dot gnu.org
@ 2023-10-07 22:38 ` juzhe.zhong at rivai dot ai
  2023-10-07 22:41 ` juzhe.zhong at rivai dot ai
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-07 22:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #2)
> I noticed there is an ABI difference here.
> 
> GCC is returning via a store to a0:
>         vsm.v   v1,0(a0)
> 
> While LLVM is returning via v0 .
> 
> Which one is correct?

Both are correct. We have an experimental ABI document.

GCC also supports the same ABI, but it needs --param=riscv-vector-abi.

Then GCC ASM:

fn:
        lui     a5,%hi(.LANCHOR0)
        addi    sp,sp,-32
        addi    a5,a5,%lo(.LANCHOR0)
        vsetivli        zero,4,e64,m2,ta,ma
        li      a4,32
        vle64.v v8,0(a5)
        vse64.v v8,0(sp)
        vsetvli zero,a4,e8,m1,ta,ma
        vle8.v  v0,0(sp)
        vand.vi v0,v0,1
        addi    sp,sp,32
        jr      ra

GCC also returns via v0 when that ABI is enabled.


The root cause is unnecessary load/store:

        vle64.v v8,0(a5)
        vse64.v v8,0(sp)


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (2 preceding siblings ...)
  2023-10-07 22:38 ` juzhe.zhong at rivai dot ai
@ 2023-10-07 22:41 ` juzhe.zhong at rivai dot ai
  2023-10-07 22:43 ` juzhe.zhong at rivai dot ai
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-07 22:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
I found this is not caused by VLS modes.

With --param=riscv-autovec-preference=fixed-vlmax and VLS modes disabled, we
still see the unnecessary load/store:

fn:
        lui     a5,%hi(.LANCHOR0)
        addi    sp,sp,-32
        addi    a5,a5,%lo(.LANCHOR0)
        vl2re64.v       v8,0(a5)   ----- ??? unnecessary
        li      a4,32
        vs2r.v  v8,0(sp)            ----- ??? unnecessary
        vsetvli zero,a4,e8,m1,ta,ma
        vle8.v  v0,0(sp)
        vand.vi v0,v0,1
        addi    sp,sp,32
        jr      ra

The optimized tree is reasonable, but after the "expand" stage, the redundant
load and store are produced.


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (3 preceding siblings ...)
  2023-10-07 22:41 ` juzhe.zhong at rivai dot ai
@ 2023-10-07 22:43 ` juzhe.zhong at rivai dot ai
  2023-10-07 22:44 ` pinskia at gcc dot gnu.org
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-07 22:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #5 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Similar issue in GCC 13.2:

https://godbolt.org/z/axKc4qj47

fn:
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        ld      a1,0(a5)
        ld      a2,8(a5)
        ld      a3,16(a5)
        ld      a4,24(a5)
        addi    sp,sp,-32
        sd      a1,0(sp)
        sd      a2,8(sp)
        sd      a3,16(sp)
        sd      a4,24(sp)
        li      a5,32
        vsetvli zero,a5,e8,m1,ta,ma
        vle8.v  v24,0(sp)
        vand.vi v24,v24,1
        vs1r.v  v24,0(a0)
        addi    sp,sp,32
        jr      ra


Multiple ld/sd. It seems that we don't allow a natural constant memory pool?


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (4 preceding siblings ...)
  2023-10-07 22:43 ` juzhe.zhong at rivai dot ai
@ 2023-10-07 22:44 ` pinskia at gcc dot gnu.org
  2023-10-07 22:44 ` pinskia at gcc dot gnu.org
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-07 22:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect if __riscv_vle8_v_u8m1 gets lowered into a load on the gimple level,
it might just work ...

But it gets expanded as:
(insn 14 13 0 (set (reg/v:RVVM1QI 134 [ varrD.56526 ])
        (if_then_else:RVVM1QI (unspec:RVVMF8BI [
                    (const_vector:RVVMF8BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 145)
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (mem:RVVM1QI (reg:DI 144) [0  S[16, 16] A8])
            (unspec:RVVM1QI [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "/app/example.c":7:23 -1
     (nil))

That seems complex.


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (5 preceding siblings ...)
  2023-10-07 22:44 ` pinskia at gcc dot gnu.org
@ 2023-10-07 22:44 ` pinskia at gcc dot gnu.org
  2023-10-07 22:47 ` juzhe.zhong at rivai dot ai
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-07 22:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-10-07
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
.


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (6 preceding siblings ...)
  2023-10-07 22:44 ` pinskia at gcc dot gnu.org
@ 2023-10-07 22:47 ` juzhe.zhong at rivai dot ai
  2023-10-07 22:49 ` juzhe.zhong at rivai dot ai
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-07 22:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #8 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #6)
> I suspect if __riscv_vle8_v_u8m1 gets lowered into a load on the gimple
> level, it might just work ...
> 
> But it gets expanded as:
> (insn 14 13 0 (set (reg/v:RVVM1QI 134 [ varrD.56526 ])
>         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
>                     (const_vector:RVVMF8BI repeat [
>                             (const_int 1 [0x1])
>                         ])
>                     (reg:DI 145)
>                     (const_int 2 [0x2]) repeated x2
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (mem:RVVM1QI (reg:DI 144) [0  S[16, 16] A8])
>             (unspec:RVVM1QI [
>                     (reg:SI 0 zero)
>                 ] UNSPEC_VUNDEF))) "/app/example.c":7:23 -1
>      (nil))
> 
> That seems complex.

You mean the normal load MEM_REF in GCC ?

I don't think we can do that since this intrinsic is defined with mask, len,
else value,...etc.


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (7 preceding siblings ...)
  2023-10-07 22:47 ` juzhe.zhong at rivai dot ai
@ 2023-10-07 22:49 ` juzhe.zhong at rivai dot ai
  2023-10-07 22:51 ` pinskia at gcc dot gnu.org
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-07 22:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #9 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #7)
> .

Besides, if we remove the data initialization:


https://godbolt.org/z/qcjcP7s1c

#include<riscv_vector.h>
vuint8m1_t fn() {

    uint8_t arr[32];
    uint8_t m = 1;

    vuint8m1_t varr = __riscv_vle8_v_u8m1(arr, 32);
    vuint8m1_t vand_m = __riscv_vand_vx_u8m1(varr, m, 32);
    //vbool8_t vmask = __riscv_vreinterpret_v_u8m1_b8(vand_m);

    return vand_m;
}

The issue is gone:

fn:
        addi    sp,sp,-32
        li      a5,32
        vsetvli zero,a5,e8,m1,ta,ma
        vle8.v  v24,0(sp)
        vand.vi v24,v24,1
        vs1r.v  v24,0(a0)
        addi    sp,sp,32
        jr      ra

The codegen is as good as LLVM's.

I still think it is something like a constant memory pool issue.


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (8 preceding siblings ...)
  2023-10-07 22:49 ` juzhe.zhong at rivai dot ai
@ 2023-10-07 22:51 ` pinskia at gcc dot gnu.org
  2023-10-07 22:55 ` juzhe.zhong at rivai dot ai
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-07 22:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The issue is GCC does prop the load/store for arr into __riscv_vle8_v_u8m1
really.


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (9 preceding siblings ...)
  2023-10-07 22:51 ` pinskia at gcc dot gnu.org
@ 2023-10-07 22:55 ` juzhe.zhong at rivai dot ai
  2023-10-07 23:09 ` juzhe.zhong at rivai dot ai
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-07 22:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #11 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #10)
> The issue is GCC does prop the load/store for arr into __riscv_vle8_v_u8m1
> really.

Ok. Do you know why GCC props the load/store for arr into __riscv_vle8_v_u8m1?

Is it just because the __riscv_vle8_v_u8m1 pattern is complex?

I don't think we can simplify the __riscv_vle8_v_u8m1 pattern, since we tried
to fuse all features into a single pattern (a pattern that includes multiple
features becomes complex) to reduce the build cost of insn-emit.cc and
insn-opinit.cc.


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (10 preceding siblings ...)
  2023-10-07 22:55 ` juzhe.zhong at rivai dot ai
@ 2023-10-07 23:09 ` juzhe.zhong at rivai dot ai
  2023-10-17  8:26 ` juzhe.zhong at rivai dot ai
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-07 23:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #12 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, Andrew.

I tried something else:

https://godbolt.org/z/heKxcMWsY

Change the load into a normal load of arr:
vuint8m1_t varr = *(vuint8m1_t*)arr;

Like you said,

The issue is gone (as good as LLVM):
fn:
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        li      a4,32
        vl1re8.v        v1,0(a5)
        vsetvli zero,a4,e8,m1,ta,ma
        vand.vi v1,v1,1
        vs1r.v  v1,0(a0)
        ret

It seems that GCC can only optimize the normal load?

Do we have a chance to optimize such a case (for an unknown load)?


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (11 preceding siblings ...)
  2023-10-07 23:09 ` juzhe.zhong at rivai dot ai
@ 2023-10-17  8:26 ` juzhe.zhong at rivai dot ai
  2023-10-18  3:29 ` pan2.li at intel dot com
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-17  8:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #13 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Confirmed that ARM SVE has the same issue:

https://godbolt.org/z/TjcaM6xsP

#include<arm_sve.h>
void fn(uint8_t * __restrict out) {

    uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};
    uint8_t m = 1;

    svint8_t varr = *(svint8_t*)arr;
    *(svint8_t*)out = varr;
}

ARM GCC:

fn:
        adrp    x1, .LANCHOR0
        add     x1, x1, :lo12:.LANCHOR0
        sub     sp, sp, #32
        ptrue   p7.b, all
        ldp     q31, q30, [x1]   -----> redundant stack spillings.
        stp     q31, q30, [sp]   -----> redundant stack spillings.
        ld1b    z31.b, p7/z, [sp]
        st1b    z31.b, p7, [x0]
        add     sp, sp, 32
        ret

ARM clang:

fn:                                     // @fn
        ptrue   p0.b
        adrp    x8, .L__const.fn.arr
        add     x8, x8, :lo12:.L__const.fn.arr
        ld1b    { z0.b }, p0/z, [x8]
        st1b    { z0.b }, p0, [x0]
        ret

Hi, Richard. Could you comment on this issue?
Thanks.


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (12 preceding siblings ...)
  2023-10-17  8:26 ` juzhe.zhong at rivai dot ai
@ 2023-10-18  3:29 ` pan2.li at intel dot com
  2023-10-19  2:07 ` juzhe.zhong at rivai dot ai
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: pan2.li at intel dot com @ 2023-10-18  3:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #14 from Li Pan <pan2.li at intel dot com> ---
It looks like the -fmerge-all-constants option doesn't help in this case, and
the same holds for RISC-V.

For RISC-V, the CLOBBER is present after gimplification.

void test (vuint8m1_t *out) {

  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};

  *out = *(vuint8m1_t *)arr;
}

void test (vuint8m1_t * out)
{
  uint8_t arr[32];

  try
    {
      arr =
"\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
      arr.0_1 = &arr;
      _2 = MEM[(vuint8m1_t *)arr.0_1];
      *out = _2;
    }
  finally
    {
      arr = {CLOBBER(eol)};
    }
}


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (13 preceding siblings ...)
  2023-10-18  3:29 ` pan2.li at intel dot com
@ 2023-10-19  2:07 ` juzhe.zhong at rivai dot ai
  2023-10-19  6:37 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-19  2:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #15 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
After investigation, this seems to be an issue with variable-length vectors:

https://godbolt.org/z/6Wrjz9ofE

void fn (char * restrict out, int x)
{
  <bb 2> [local count: 1073741824]:
  MEM[(int8x16_t *)out_2(D)] = { 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9
};
  return;

}


void fn2 (char * restrict out, int x)
{
  svint8_t varr;
  char arr[32];

  <bb 2> [local count: 1073741824]:
  arr =
"\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
  varr_3 = MEM[(svint8_t *)&arr];
  MEM[(svint8_t *)out_4(D)] = varr_3;
  arr ={v} {CLOBBER(eol)};
  return;

}

If we use an ARM NEON type, the gimple IR has no CLOBBER, and there is no
stack transfer.

fn:
        adrp    x1, .LC0
        ldr     q31, [x1, #:lo12:.LC0]
        str     q31, [x0]
        ret
fn2:
        adrp    x1, .LANCHOR0
        add     x1, x1, :lo12:.LANCHOR0
        sub     sp, sp, #32
        ptrue   p7.b, all
        ldp     q31, q30, [x1]
        stp     q31, q30, [sp]
        ld1b    z31.b, p7/z, [sp]
        st1b    z31.b, p7, [x0]
        add     sp, sp, 32
        ret

With an ARM SVE type, the gimple IR has a CLOBBER, which then causes the
redundant stack transfer in the ASM.
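
As a rough fixed-length analogue of the NEON case, here is a sketch using the GCC/Clang `vector_size` extension. It is not the code behind the godbolt link; the type name `u8x16` and the function `load_fixed` are made up for illustration. The point is only that the access size (16 bytes) is a compile-time constant, so the compiler is in a position to fold the load from the initializer.

```c
#include <stdint.h>
#include <string.h>

/* 16-byte fixed-length vector via the GCC/Clang vector_size extension. */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

/* Load the first 16 bytes of a locally initialized array into a vector. */
static u8x16 load_fixed(void)
{
    uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
                       1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};
    u8x16 v;
    memcpy(&v, arr, sizeof v); /* sizeof v is the constant 16 */
    return v;
}
```

With a variable-length type such as svint8_t there is no such compile-time size, which is where the two functions above differ.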


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (14 preceding siblings ...)
  2023-10-19  2:07 ` juzhe.zhong at rivai dot ai
@ 2023-10-19  6:37 ` rguenth at gcc dot gnu.org
  2023-10-19  7:45 ` juzhe.zhong at rivai dot ai
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-10-19  6:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that we cannot CSE a VLA typed "load" (whatever that is) to a
constant.

    char arr[] = {1, 2, 7, 1, 3, 4, 5, 3, 1
    , 0, 1, 2, 4, 4, 9, 9, 1, 2, 7, 1, 3, 4, 5, 3, 
    1, 0, 1, 2, 4, 4, 9, 9};
    char m = 1;

    svint8_t varr = *(svint8_t*)arr;

we don't know what portion of 'arr' this accesses.  The relevant bit in
vn_reference_lookup_3 would be

  /* 3) Assignment from a constant.  We can use folds native encode/interpret
     routines to extract the assigned bits.  */
  else if (known_eq (ref->size, maxsize)
           && is_gimple_reg_type (vr->type)
           && !reverse_storage_order_for_component_p (vr->operands)
           && !contains_storage_order_barrier_p (vr->operands)
           && gimple_assign_single_p (def_stmt)
           && CHAR_BIT == 8
           && BITS_PER_UNIT == 8
           && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
           /* native_encode and native_decode operate on arrays of bytes
              and so fundamentally need a compile-time size and offset.  */
           && maxsize.is_constant (&maxsizei)
           && offset.is_constant (&offseti)
           && (is_gimple_min_invariant (gimple_assign_rhs1 (def_stmt))
               || (TREE_CODE (gimple_assign_rhs1 (def_stmt)) == SSA_NAME
                   && is_gimple_min_invariant (SSA_VAL (gimple_assign_rhs1
(def_

and we fail at maxsize.is_constant (&maxsizei), that's the actual size of
the load.  Maybe there's constraints that are target specific and not
encoded in poly-int that could be used here, but I don't really know.

So yes, pieces of the compiler are defensive about VLA accesses and
they probably have to be.

In particular this part of VN doesn't try to use undefinedness (the access
exceeds the size of 'arr') to limit things - but in the end we'd still
need to construct a VLA typed constant and I have no idea how to do that.

Maybe Richard has an idea.

Note this has nothing to do with whether we have a CLOBBER or not.  You
can "disable" those with -fstack-reuse=none and that doesn't make a
difference.
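
The is_constant check described above can be pictured with a toy model. This is a sketch only and does not use GCC's real poly_int API: `toy_poly_size`, `toy_is_constant` and `toy_fold_load` are hypothetical names. A size is modelled as base + per_unit * N, where N is only known at run time (compare the S[16, 16] size in the RTL dump earlier in the thread).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a poly-int size: bytes = base + per_unit * N, with N
   unknown at compile time.  Hypothetical type, not GCC's poly_int. */
struct toy_poly_size {
    size_t base;
    size_t per_unit;
};

/* Succeeds only when the size does not depend on N. */
static bool toy_is_constant(struct toy_poly_size s, size_t *out)
{
    if (s.per_unit != 0)
        return false;
    *out = s.base;
    return true;
}

/* Try to fold a load of `size` bytes at `offset` from a constant
   initializer of `init_len` bytes into `out`. */
static bool toy_fold_load(const unsigned char *init, size_t init_len,
                          size_t offset, struct toy_poly_size size,
                          unsigned char *out)
{
    size_t n;
    if (!toy_is_constant(size, &n))
        return false; /* VLA-sized access: be defensive and give up */
    if (offset > init_len || n > init_len - offset)
        return false; /* would read past the initializer */
    memcpy(out, init + offset, n);
    return true;
}
```

In this model a 16-byte access folds, while a 16 + 16 * N access is rejected before any bytes are looked at, which mirrors the defensive behaviour described for VLA accesses.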


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (15 preceding siblings ...)
  2023-10-19  6:37 ` rguenth at gcc dot gnu.org
@ 2023-10-19  7:45 ` juzhe.zhong at rivai dot ai
  2023-10-19 11:16 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-19  7:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #17 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #16)
> The issue is that we cannot CSE a VLA typed "load" (whatever that is) to a
> constant.
> 
>     char arr[] = {1, 2, 7, 1, 3, 4, 5, 3, 1
>     , 0, 1, 2, 4, 4, 9, 9, 1, 2, 7, 1, 3, 4, 5, 3, 
>     1, 0, 1, 2, 4, 4, 9, 9};
>     char m = 1;
> 
>     svint8_t varr = *(svint8_t*)arr;
> 
> we don't know what portion of 'arr' this accesses.  The relevant bit in
> vn_reference_lookup_3 would be
> 
>   /* 3) Assignment from a constant.  We can use folds native encode/interpret
>      routines to extract the assigned bits.  */
>   else if (known_eq (ref->size, maxsize)
>            && is_gimple_reg_type (vr->type)
>            && !reverse_storage_order_for_component_p (vr->operands)
>            && !contains_storage_order_barrier_p (vr->operands)
>            && gimple_assign_single_p (def_stmt)
>            && CHAR_BIT == 8
>            && BITS_PER_UNIT == 8
>            && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
>            /* native_encode and native_decode operate on arrays of bytes
>               and so fundamentally need a compile-time size and offset.  */
>            && maxsize.is_constant (&maxsizei)
>            && offset.is_constant (&offseti)
>            && (is_gimple_min_invariant (gimple_assign_rhs1 (def_stmt))
>                || (TREE_CODE (gimple_assign_rhs1 (def_stmt)) == SSA_NAME
>                    && is_gimple_min_invariant (SSA_VAL (gimple_assign_rhs1
> (def_
> 
> and we fail at maxsize.is_constant (&maxsizei), that's the actual size of
> the load.  Maybe there's constraints that are target specific and not
> encoded in poly-int that could be used here, but I don't really know.
> 
> So yes, pieces of the compiler are defensive about VLA accesses and
> they probably have to be.
> 
> In particular this part of VN doesn't try to use undefinedness (the access
> exceeds the size of 'arr') to limit things - but in the end we'd still
> need to construct a VLA typed constant and I have no idea how to do that.
> 
> Maybe Richard has an idea.
> 
> Note this has nothing to do about whether we have a CLOBBER or not.  You
> can "disable" those with -fstack-reuse=none and that doesn't make a
> difference.

Thanks Richi.

But how about this case in RVV:

https://godbolt.org/z/sMYor3arP

Using --param=riscv-autovec-preference=fixed-vlmax sets the mode to a known
size.  This code still has the redundant stack load/store.


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (16 preceding siblings ...)
  2023-10-19  7:45 ` juzhe.zhong at rivai dot ai
@ 2023-10-19 11:16 ` rguenth at gcc dot gnu.org
  2023-10-19 11:30 ` juzhe.zhong at rivai dot ai
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-10-19 11:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
With RVV you have intrinsic calls in GIMPLE so nothing to optimize:

vbool8_t fn ()
{
  vbool8_t vmask;
  vuint8m1_t vand_m;
  vuint8m1_t varr;
  uint8_t arr[32];

  <bb 2> [local count: 1073741824]:
  arr =
"\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
  varr_3 = __riscv_vle8_v_u8m1 (&arr, 32); [return slot optimization]
  vand_m_4 = __riscv_vand_vx_u8m1 (varr_3, 1, 32); [return slot optimization]
  vmask_5 = __riscv_vreinterpret_v_u8m1_b8 (vand_m_4); [return slot
optimization]
  <retval> = vmask_5;
  arr ={v} {CLOBBER(eol)};
  return <retval>;

and on RTL I see lots of UNSPECs, RTL opts cannot do anything with those.

This is what Andrew said already.


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (17 preceding siblings ...)
  2023-10-19 11:16 ` rguenth at gcc dot gnu.org
@ 2023-10-19 11:30 ` juzhe.zhong at rivai dot ai
  2023-10-19 11:34 ` rguenther at suse dot de
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-19 11:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #18)
> With RVV you have intrinsic calls in GIMPLE so nothing to optimize:
> 
> vbool8_t fn ()
> {
>   vbool8_t vmask;
>   vuint8m1_t vand_m;
>   vuint8m1_t varr;
>   uint8_t arr[32];
> 
>   <bb 2> [local count: 1073741824]:
>   arr =
> "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
>   varr_3 = __riscv_vle8_v_u8m1 (&arr, 32); [return slot optimization]
>   vand_m_4 = __riscv_vand_vx_u8m1 (varr_3, 1, 32); [return slot optimization]
>   vmask_5 = __riscv_vreinterpret_v_u8m1_b8 (vand_m_4); [return slot
> optimization]
>   <retval> = vmask_5;
>   arr ={v} {CLOBBER(eol)};
>   return <retval>;
> 
> and on RTL I see lots of UNSPECs, RTL opts cannot do anything with those.
> 
> This is what Andrew said already.

Ok. I wonder why this issue is gone when I make arr static:

https://godbolt.org/z/Tdoshdfr6


* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (18 preceding siblings ...)
  2023-10-19 11:30 ` juzhe.zhong at rivai dot ai
@ 2023-10-19 11:34 ` rguenther at suse dot de
  2023-10-19 11:58 ` juzhe.zhong at rivai dot ai
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: rguenther at suse dot de @ 2023-10-19 11:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #20 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> 
> --- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> (In reply to Richard Biener from comment #18)
> > With RVV you have intrinsic calls in GIMPLE so nothing to optimize:
> > 
> > vbool8_t fn ()
> > {
> >   vbool8_t vmask;
> >   vuint8m1_t vand_m;
> >   vuint8m1_t varr;
> >   uint8_t arr[32];
> > 
> >   <bb 2> [local count: 1073741824]:
> >   arr =
> > "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> > \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> >   varr_3 = __riscv_vle8_v_u8m1 (&arr, 32); [return slot optimization]
> >   vand_m_4 = __riscv_vand_vx_u8m1 (varr_3, 1, 32); [return slot optimization]
> >   vmask_5 = __riscv_vreinterpret_v_u8m1_b8 (vand_m_4); [return slot
> > optimization]
> >   <retval> = vmask_5;
> >   arr ={v} {CLOBBER(eol)};
> >   return <retval>;
> > 
> > and on RTL I see lots of UNSPECs, RTL opts cannot do anything with those.
> > 
> > This is what Andrew said already.
> 
> Ok. I wonder why this issue is gone when I change it into:
> 
> arr as static
> 
> https://godbolt.org/z/Tdoshdfr6

Because the stack initialization isn't required then.
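
A minimal plain-C sketch of that difference, for illustration only. The helper
count_odd and both function names are hypothetical and are not part of the
test case; the scalar loop merely stands in for the vle8/vand sequence. With
ordinary C like this the compiler can usually see through the array and fold
everything, but when the consumer is an opaque intrinsic call the automatic
array first has to be materialized in the stack frame, whereas the static one
is already in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for the vector load + AND: counts odd bytes. */
static unsigned count_odd(const uint8_t *p, size_t n)
{
    unsigned odd = 0;
    for (size_t i = 0; i < n; i++)
        odd += p[i] & 1u;
    return odd;
}

/* Automatic array: every call owns a fresh object, so unless the initializer
   can be forwarded into the consumer, the constant data must be copied into
   the stack frame before it is read. */
unsigned odd_from_local(void)
{
    uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
                       1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};
    return count_odd(arr, sizeof arr);
}

/* Static array: the object lives in the data section and is initialized
   before the program runs, so no per-call stack copy is needed. */
unsigned odd_from_static(void)
{
    static uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
                              1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};
    return count_odd(arr, sizeof arr);
}
```

Both functions compute the same value; only the storage of arr differs.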

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (19 preceding siblings ...)
  2023-10-19 11:34 ` rguenther at suse dot de
@ 2023-10-19 11:58 ` juzhe.zhong at rivai dot ai
  2023-10-19 12:02 ` rguenther at suse dot de
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-19 11:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to rguenther@suse.de from comment #20)
> On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > 
> > --- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > (In reply to Richard Biener from comment #18)
> > > With RVV you have intrinsic calls in GIMPLE so nothing to optimize:
> > > 
> > > vbool8_t fn ()
> > > {
> > >   vbool8_t vmask;
> > >   vuint8m1_t vand_m;
> > >   vuint8m1_t varr;
> > >   uint8_t arr[32];
> > > 
> > >   <bb 2> [local count: 1073741824]:
> > >   arr =
> > > "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> > > \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> > >   varr_3 = __riscv_vle8_v_u8m1 (&arr, 32); [return slot optimization]
> > >   vand_m_4 = __riscv_vand_vx_u8m1 (varr_3, 1, 32); [return slot optimization]
> > >   vmask_5 = __riscv_vreinterpret_v_u8m1_b8 (vand_m_4); [return slot
> > > optimization]
> > >   <retval> = vmask_5;
> > >   arr ={v} {CLOBBER(eol)};
> > >   return <retval>;
> > > 
> > > and on RTL I see lots of UNSPECs, RTL opts cannot do anything with those.
> > > 
> > > This is what Andrew said already.
> > 
> > Ok. I wonder why this issue is gone when I change it into:
> > 
> > arr as static
> > 
> > https://godbolt.org/z/Tdoshdfr6
> 
> Because the stacik initialization isn't required then.

I experimented with a simplified pattern:


(insn 14 13 15 2 (set (reg/v:RVVM1QI 134 [ varr ])
        (if_then_else:RVVM1QI (unspec:RVVMF8BI [
                    (const_vector:RVVMF8BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 143)
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (mem:RVVM1QI (reg:DI 142) [0  S[16, 16] A8])
            (const_vector:RVVM1QI repeat [
                    (const_int 0 [0])
                ]))) "rvv.c":5:23 1476 {*pred_movrvvm1qi}
     (nil))
(insn 15 14 16 2 (set (reg:DI 144)
        (const_int 32 [0x20])) "rvv.c":6:5 206 {*movdi_64bit}
     (nil))
(insn 16 15 0 2 (set (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])
        (if_then_else:RVVM1QI (unspec:RVVMF8BI [
                    (const_vector:RVVMF8BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 144)
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (reg/v:RVVM1QI 134 [ varr ])
            (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])))
"rvv.c":6:5 1592 {pred_storervvm1qi}
     (nil))

You can see there is only one UNSPEC now, but the redundant transfer through
the stack is still there.

Is it because the pattern is too complicated?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (20 preceding siblings ...)
  2023-10-19 11:58 ` juzhe.zhong at rivai dot ai
@ 2023-10-19 12:02 ` rguenther at suse dot de
  2023-10-19 12:08 ` juzhe.zhong at rivai dot ai
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: rguenther at suse dot de @ 2023-10-19 12:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #22 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> 
> --- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> (In reply to rguenther@suse.de from comment #20)
> > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > 
> > > --- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > (In reply to Richard Biener from comment #18)
> > > > With RVV you have intrinsic calls in GIMPLE so nothing to optimize:
> > > > 
> > > > vbool8_t fn ()
> > > > {
> > > >   vbool8_t vmask;
> > > >   vuint8m1_t vand_m;
> > > >   vuint8m1_t varr;
> > > >   uint8_t arr[32];
> > > > 
> > > >   <bb 2> [local count: 1073741824]:
> > > >   arr =
> > > > "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> > > > \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> > > >   varr_3 = __riscv_vle8_v_u8m1 (&arr, 32); [return slot optimization]
> > > >   vand_m_4 = __riscv_vand_vx_u8m1 (varr_3, 1, 32); [return slot optimization]
> > > >   vmask_5 = __riscv_vreinterpret_v_u8m1_b8 (vand_m_4); [return slot
> > > > optimization]
> > > >   <retval> = vmask_5;
> > > >   arr ={v} {CLOBBER(eol)};
> > > >   return <retval>;
> > > > 
> > > > and on RTL I see lots of UNSPECs, RTL opts cannot do anything with those.
> > > > 
> > > > This is what Andrew said already.
> > > 
> > > Ok. I wonder why this issue is gone when I change it into:
> > > 
> > > arr as static
> > > 
> > > https://godbolt.org/z/Tdoshdfr6
> > 
> > Because the stacik initialization isn't required then.
> 
> I have experiment with a simplifed pattern:
> 
> 
> (insn 14 13 15 2 (set (reg/v:RVVM1QI 134 [ varr ])
>         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
>                     (const_vector:RVVMF8BI repeat [
>                             (const_int 1 [0x1])
>                         ])
>                     (reg:DI 143)
>                     (const_int 2 [0x2]) repeated x2
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (mem:RVVM1QI (reg:DI 142) [0  S[16, 16] A8])
>             (const_vector:RVVM1QI repeat [
>                     (const_int 0 [0])
>                 ]))) "rvv.c":5:23 1476 {*pred_movrvvm1qi}
>      (nil))
> (insn 15 14 16 2 (set (reg:DI 144)
>         (const_int 32 [0x20])) "rvv.c":6:5 206 {*movdi_64bit}
>      (nil))
> (insn 16 15 0 2 (set (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])
>         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
>                     (const_vector:RVVMF8BI repeat [
>                             (const_int 1 [0x1])
>                         ])
>                     (reg:DI 144)
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (reg/v:RVVM1QI 134 [ varr ])
>             (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])))
> "rvv.c":6:5 1592 {pred_storervvm1qi}
>      (nil))
> 
> You can see there is only one UNSPEC now. Still has redundant stack
> transferring.
> 
> Is it because the pattern too complicated?

It's because it has an UNSPEC in it - that makes it have target
specific (unknown to the middle-end) behavior so nothing can
be optimized here.

Specifically passes likely refuse to replace MEM operands in
such a construct.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (21 preceding siblings ...)
  2023-10-19 12:02 ` rguenther at suse dot de
@ 2023-10-19 12:08 ` juzhe.zhong at rivai dot ai
  2023-10-19 12:20 ` rguenther at suse dot de
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-19 12:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #23 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to rguenther@suse.de from comment #22)
> On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > 
> > --- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > (In reply to rguenther@suse.de from comment #20)
> > > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > > 
> > > > --- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > > (In reply to Richard Biener from comment #18)
> > > > > With RVV you have intrinsic calls in GIMPLE so nothing to optimize:
> > > > > 
> > > > > vbool8_t fn ()
> > > > > {
> > > > >   vbool8_t vmask;
> > > > >   vuint8m1_t vand_m;
> > > > >   vuint8m1_t varr;
> > > > >   uint8_t arr[32];
> > > > > 
> > > > >   <bb 2> [local count: 1073741824]:
> > > > >   arr =
> > > > > "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> > > > > \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> > > > >   varr_3 = __riscv_vle8_v_u8m1 (&arr, 32); [return slot optimization]
> > > > >   vand_m_4 = __riscv_vand_vx_u8m1 (varr_3, 1, 32); [return slot optimization]
> > > > >   vmask_5 = __riscv_vreinterpret_v_u8m1_b8 (vand_m_4); [return slot
> > > > > optimization]
> > > > >   <retval> = vmask_5;
> > > > >   arr ={v} {CLOBBER(eol)};
> > > > >   return <retval>;
> > > > > 
> > > > > and on RTL I see lots of UNSPECs, RTL opts cannot do anything with those.
> > > > > 
> > > > > This is what Andrew said already.
> > > > 
> > > > Ok. I wonder why this issue is gone when I change it into:
> > > > 
> > > > arr as static
> > > > 
> > > > https://godbolt.org/z/Tdoshdfr6
> > > 
> > > Because the stacik initialization isn't required then.
> > 
> > I have experiment with a simplifed pattern:
> > 
> > 
> > (insn 14 13 15 2 (set (reg/v:RVVM1QI 134 [ varr ])
> >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> >                     (const_vector:RVVMF8BI repeat [
> >                             (const_int 1 [0x1])
> >                         ])
> >                     (reg:DI 143)
> >                     (const_int 2 [0x2]) repeated x2
> >                     (const_int 0 [0])
> >                     (reg:SI 66 vl)
> >                     (reg:SI 67 vtype)
> >                 ] UNSPEC_VPREDICATE)
> >             (mem:RVVM1QI (reg:DI 142) [0  S[16, 16] A8])
> >             (const_vector:RVVM1QI repeat [
> >                     (const_int 0 [0])
> >                 ]))) "rvv.c":5:23 1476 {*pred_movrvvm1qi}
> >      (nil))
> > (insn 15 14 16 2 (set (reg:DI 144)
> >         (const_int 32 [0x20])) "rvv.c":6:5 206 {*movdi_64bit}
> >      (nil))
> > (insn 16 15 0 2 (set (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])
> >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> >                     (const_vector:RVVMF8BI repeat [
> >                             (const_int 1 [0x1])
> >                         ])
> >                     (reg:DI 144)
> >                     (const_int 0 [0])
> >                     (reg:SI 66 vl)
> >                     (reg:SI 67 vtype)
> >                 ] UNSPEC_VPREDICATE)
> >             (reg/v:RVVM1QI 134 [ varr ])
> >             (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])))
> > "rvv.c":6:5 1592 {pred_storervvm1qi}
> >      (nil))
> > 
> > You can see there is only one UNSPEC now. Still has redundant stack
> > transferring.
> > 
> > Is it because the pattern too complicated?
> 
> It's because it has an UNSPEC in it - that makes it have target
> specific (unknown to the middle-end) behavior so nothing can
> be optimized here.
> 
> Specifically passes likely refuse to replace MEM operands in
> such a construct.

I see that the ARM SVE load/store intrinsics also use UNSPECs, and they
don't have this issue:

https://godbolt.org/z/fsW6Ko93z

But their patterns are much simpler than the RVV patterns.

I am still trying to find a way to optimize the RVV pattern for this.
However, it seems to be very difficult, since we merge the intrinsics of each
type into a single pattern to avoid an explosion in the size of insn-output.cc
and insn-emit.cc.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (22 preceding siblings ...)
  2023-10-19 12:08 ` juzhe.zhong at rivai dot ai
@ 2023-10-19 12:20 ` rguenther at suse dot de
  2023-10-19 12:38 ` juzhe.zhong at rivai dot ai
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: rguenther at suse dot de @ 2023-10-19 12:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #24 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> 
> --- Comment #23 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> (In reply to rguenther@suse.de from comment #22)
> > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > 
> > > --- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > (In reply to rguenther@suse.de from comment #20)
> > > > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > > > 
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > > > 
> > > > > --- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > > > (In reply to Richard Biener from comment #18)
> > > > > > With RVV you have intrinsic calls in GIMPLE so nothing to optimize:
> > > > > > 
> > > > > > vbool8_t fn ()
> > > > > > {
> > > > > >   vbool8_t vmask;
> > > > > >   vuint8m1_t vand_m;
> > > > > >   vuint8m1_t varr;
> > > > > >   uint8_t arr[32];
> > > > > > 
> > > > > >   <bb 2> [local count: 1073741824]:
> > > > > >   arr =
> > > > > > "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> > > > > > \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> > > > > >   varr_3 = __riscv_vle8_v_u8m1 (&arr, 32); [return slot optimization]
> > > > > >   vand_m_4 = __riscv_vand_vx_u8m1 (varr_3, 1, 32); [return slot optimization]
> > > > > >   vmask_5 = __riscv_vreinterpret_v_u8m1_b8 (vand_m_4); [return slot
> > > > > > optimization]
> > > > > >   <retval> = vmask_5;
> > > > > >   arr ={v} {CLOBBER(eol)};
> > > > > >   return <retval>;
> > > > > > 
> > > > > > and on RTL I see lots of UNSPECs, RTL opts cannot do anything with those.
> > > > > > 
> > > > > > This is what Andrew said already.
> > > > > 
> > > > > Ok. I wonder why this issue is gone when I change it into:
> > > > > 
> > > > > arr as static
> > > > > 
> > > > > https://godbolt.org/z/Tdoshdfr6
> > > > 
> > > > Because the stacik initialization isn't required then.
> > > 
> > > I have experiment with a simplifed pattern:
> > > 
> > > 
> > > (insn 14 13 15 2 (set (reg/v:RVVM1QI 134 [ varr ])
> > >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> > >                     (const_vector:RVVMF8BI repeat [
> > >                             (const_int 1 [0x1])
> > >                         ])
> > >                     (reg:DI 143)
> > >                     (const_int 2 [0x2]) repeated x2
> > >                     (const_int 0 [0])
> > >                     (reg:SI 66 vl)
> > >                     (reg:SI 67 vtype)
> > >                 ] UNSPEC_VPREDICATE)
> > >             (mem:RVVM1QI (reg:DI 142) [0  S[16, 16] A8])
> > >             (const_vector:RVVM1QI repeat [
> > >                     (const_int 0 [0])
> > >                 ]))) "rvv.c":5:23 1476 {*pred_movrvvm1qi}
> > >      (nil))
> > > (insn 15 14 16 2 (set (reg:DI 144)
> > >         (const_int 32 [0x20])) "rvv.c":6:5 206 {*movdi_64bit}
> > >      (nil))
> > > (insn 16 15 0 2 (set (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])
> > >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> > >                     (const_vector:RVVMF8BI repeat [
> > >                             (const_int 1 [0x1])
> > >                         ])
> > >                     (reg:DI 144)
> > >                     (const_int 0 [0])
> > >                     (reg:SI 66 vl)
> > >                     (reg:SI 67 vtype)
> > >                 ] UNSPEC_VPREDICATE)
> > >             (reg/v:RVVM1QI 134 [ varr ])
> > >             (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])))
> > > "rvv.c":6:5 1592 {pred_storervvm1qi}
> > >      (nil))
> > > 
> > > You can see there is only one UNSPEC now. Still has redundant stack
> > > transferring.
> > > 
> > > Is it because the pattern too complicated?
> > 
> > It's because it has an UNSPEC in it - that makes it have target
> > specific (unknown to the middle-end) behavior so nothing can
> > be optimized here.
> > 
> > Specifically passes likely refuse to replace MEM operands in
> > such a construct.
> 
> I saw ARM SVE load/store intrinsic also have UNSPEC.
> They don't have such issues.
> 
> https://godbolt.org/z/fsW6Ko93z
> 
> But their patterns are much simplier than RVV patterns. 
> 
> I am still trying find a way to optimize the RVV pattern for that.
> However, it seems to be very diffcult since we are trying to merge each type
> intrinsics into same single pattern to avoid explosion of the insn-ouput.cc
> and insn-emit.cc

They also expose the semantics to GIMPLE instead of keeping
builtin function calls:

void fn (svbool_t pg, uint8_t * out)
{
  svuint8_t varr;
  static uint8_t arr[32] = 
"\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";

  <bb 2> [local count: 1073741824]:
  varr_3 = .MASK_LOAD (&arr, 8B, pg_2(D));
  .MASK_STORE (out_4(D), 8B, pg_2(D), varr_3); [tail call]
  return;
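
As a rough scalar model of what the two internal function calls above compute
(a sketch only; the function names are hypothetical, the second "8B" argument,
which carries alignment and alias information, is ignored, and zeroing the
inactive lanes on load is an assumption of this model rather than something
stated in this thread):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a masked load: active lanes read memory. Inactive lanes are set
   to zero here, which is an assumption made for this sketch. */
void mask_load_model(uint8_t *dst, const uint8_t *src,
                     const bool *mask, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        dst[i] = mask[i] ? src[i] : 0;
}

/* Model of a masked store: only active lanes write memory; the bytes behind
   inactive lanes keep their previous contents. */
void mask_store_model(uint8_t *dst, const bool *mask,
                      const uint8_t *val, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        if (mask[i])
            dst[i] = val[i];
}
```

Because these semantics are visible to the middle-end, it can reason about
which memory is read and written, which is not possible for an opaque builtin
call.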

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (23 preceding siblings ...)
  2023-10-19 12:20 ` rguenther at suse dot de
@ 2023-10-19 12:38 ` juzhe.zhong at rivai dot ai
  2023-10-19 13:30 ` rguenther at suse dot de
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-19 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #25 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to rguenther@suse.de from comment #24)
> On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > 
> > --- Comment #23 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > (In reply to rguenther@suse.de from comment #22)
> > > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > > 
> > > > --- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > > (In reply to rguenther@suse.de from comment #20)
> > > > > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > > > > 
> > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > > > > 
> > > > > > --- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > > > > (In reply to Richard Biener from comment #18)
> > > > > > > With RVV you have intrinsic calls in GIMPLE so nothing to optimize:
> > > > > > > 
> > > > > > > vbool8_t fn ()
> > > > > > > {
> > > > > > >   vbool8_t vmask;
> > > > > > >   vuint8m1_t vand_m;
> > > > > > >   vuint8m1_t varr;
> > > > > > >   uint8_t arr[32];
> > > > > > > 
> > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > >   arr =
> > > > > > > "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> > > > > > > \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> > > > > > >   varr_3 = __riscv_vle8_v_u8m1 (&arr, 32); [return slot optimization]
> > > > > > >   vand_m_4 = __riscv_vand_vx_u8m1 (varr_3, 1, 32); [return slot optimization]
> > > > > > >   vmask_5 = __riscv_vreinterpret_v_u8m1_b8 (vand_m_4); [return slot
> > > > > > > optimization]
> > > > > > >   <retval> = vmask_5;
> > > > > > >   arr ={v} {CLOBBER(eol)};
> > > > > > >   return <retval>;
> > > > > > > 
> > > > > > > and on RTL I see lots of UNSPECs, RTL opts cannot do anything with those.
> > > > > > > 
> > > > > > > This is what Andrew said already.
> > > > > > 
> > > > > > Ok. I wonder why this issue is gone when I change it into:
> > > > > > 
> > > > > > arr as static
> > > > > > 
> > > > > > https://godbolt.org/z/Tdoshdfr6
> > > > > 
> > > > > Because the stacik initialization isn't required then.
> > > > 
> > > > I have experiment with a simplifed pattern:
> > > > 
> > > > 
> > > > (insn 14 13 15 2 (set (reg/v:RVVM1QI 134 [ varr ])
> > > >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> > > >                     (const_vector:RVVMF8BI repeat [
> > > >                             (const_int 1 [0x1])
> > > >                         ])
> > > >                     (reg:DI 143)
> > > >                     (const_int 2 [0x2]) repeated x2
> > > >                     (const_int 0 [0])
> > > >                     (reg:SI 66 vl)
> > > >                     (reg:SI 67 vtype)
> > > >                 ] UNSPEC_VPREDICATE)
> > > >             (mem:RVVM1QI (reg:DI 142) [0  S[16, 16] A8])
> > > >             (const_vector:RVVM1QI repeat [
> > > >                     (const_int 0 [0])
> > > >                 ]))) "rvv.c":5:23 1476 {*pred_movrvvm1qi}
> > > >      (nil))
> > > > (insn 15 14 16 2 (set (reg:DI 144)
> > > >         (const_int 32 [0x20])) "rvv.c":6:5 206 {*movdi_64bit}
> > > >      (nil))
> > > > (insn 16 15 0 2 (set (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])
> > > >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> > > >                     (const_vector:RVVMF8BI repeat [
> > > >                             (const_int 1 [0x1])
> > > >                         ])
> > > >                     (reg:DI 144)
> > > >                     (const_int 0 [0])
> > > >                     (reg:SI 66 vl)
> > > >                     (reg:SI 67 vtype)
> > > >                 ] UNSPEC_VPREDICATE)
> > > >             (reg/v:RVVM1QI 134 [ varr ])
> > > >             (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])))
> > > > "rvv.c":6:5 1592 {pred_storervvm1qi}
> > > >      (nil))
> > > > 
> > > > You can see there is only one UNSPEC now. Still has redundant stack
> > > > transferring.
> > > > 
> > > > Is it because the pattern too complicated?
> > > 
> > > It's because it has an UNSPEC in it - that makes it have target
> > > specific (unknown to the middle-end) behavior so nothing can
> > > be optimized here.
> > > 
> > > Specifically passes likely refuse to replace MEM operands in
> > > such a construct.
> > 
> > I saw ARM SVE load/store intrinsic also have UNSPEC.
> > They don't have such issues.
> > 
> > https://godbolt.org/z/fsW6Ko93z
> > 
> > But their patterns are much simplier than RVV patterns. 
> > 
> > I am still trying find a way to optimize the RVV pattern for that.
> > However, it seems to be very diffcult since we are trying to merge each type
> > intrinsics into same single pattern to avoid explosion of the insn-ouput.cc
> > and insn-emit.cc
> 
> They also expose the semantics to GIMPLE instead of keeping
> builtin function calls:
> 
> void fn (svbool_t pg, uint8_t * out)
> {
>   svuint8_t varr;
>   static uint8_t arr[32] = 
> "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> 
>   <bb 2> [local count: 1073741824]:
>   varr_3 = .MASK_LOAD (&arr, 8B, pg_2(D));
>   .MASK_STORE (out_4(D), 8B, pg_2(D), varr_3); [tail call]
>   return;

Yeah, I noticed, but the autovectorization patterns don't match the RVV
intrinsics.
So I can't fold them into MASK_LEN_LOAD... since the RVV intrinsics are more
complicated.

It seems impossible to fix this in the middle-end.
Maybe we should add a RISC-V specific pass to optimize it?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (24 preceding siblings ...)
  2023-10-19 12:38 ` juzhe.zhong at rivai dot ai
@ 2023-10-19 13:30 ` rguenther at suse dot de
  2023-11-01  7:33 ` pan2.li at intel dot com
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: rguenther at suse dot de @ 2023-10-19 13:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #26 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> 
> --- Comment #25 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> (In reply to rguenther@suse.de from comment #24)
> > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > 
> > > --- Comment #23 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > (In reply to rguenther@suse.de from comment #22)
> > > > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > > > 
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > > > 
> > > > > --- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > > > (In reply to rguenther@suse.de from comment #20)
> > > > > > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > > > > > 
> > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > > > > > 
> > > > > > > --- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > > > > > (In reply to Richard Biener from comment #18)
> > > > > > > > With RVV you have intrinsic calls in GIMPLE so nothing to optimize:
> > > > > > > > 
> > > > > > > > vbool8_t fn ()
> > > > > > > > {
> > > > > > > >   vbool8_t vmask;
> > > > > > > >   vuint8m1_t vand_m;
> > > > > > > >   vuint8m1_t varr;
> > > > > > > >   uint8_t arr[32];
> > > > > > > > 
> > > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > > >   arr =
> > > > > > > > "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> > > > > > > > \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> > > > > > > >   varr_3 = __riscv_vle8_v_u8m1 (&arr, 32); [return slot optimization]
> > > > > > > >   vand_m_4 = __riscv_vand_vx_u8m1 (varr_3, 1, 32); [return slot optimization]
> > > > > > > >   vmask_5 = __riscv_vreinterpret_v_u8m1_b8 (vand_m_4); [return slot
> > > > > > > > optimization]
> > > > > > > >   <retval> = vmask_5;
> > > > > > > >   arr ={v} {CLOBBER(eol)};
> > > > > > > >   return <retval>;
> > > > > > > > 
> > > > > > > > and on RTL I see lots of UNSPECs, RTL opts cannot do anything with those.
> > > > > > > > 
> > > > > > > > This is what Andrew said already.
> > > > > > > 
> > > > > > > Ok. I wonder why this issue is gone when I change it into:
> > > > > > > 
> > > > > > > arr as static
> > > > > > > 
> > > > > > > https://godbolt.org/z/Tdoshdfr6
> > > > > > 
> > > > > > Because the stacik initialization isn't required then.
> > > > > 
> > > > > I have experiment with a simplifed pattern:
> > > > > 
> > > > > 
> > > > > (insn 14 13 15 2 (set (reg/v:RVVM1QI 134 [ varr ])
> > > > >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> > > > >                     (const_vector:RVVMF8BI repeat [
> > > > >                             (const_int 1 [0x1])
> > > > >                         ])
> > > > >                     (reg:DI 143)
> > > > >                     (const_int 2 [0x2]) repeated x2
> > > > >                     (const_int 0 [0])
> > > > >                     (reg:SI 66 vl)
> > > > >                     (reg:SI 67 vtype)
> > > > >                 ] UNSPEC_VPREDICATE)
> > > > >             (mem:RVVM1QI (reg:DI 142) [0  S[16, 16] A8])
> > > > >             (const_vector:RVVM1QI repeat [
> > > > >                     (const_int 0 [0])
> > > > >                 ]))) "rvv.c":5:23 1476 {*pred_movrvvm1qi}
> > > > >      (nil))
> > > > > (insn 15 14 16 2 (set (reg:DI 144)
> > > > >         (const_int 32 [0x20])) "rvv.c":6:5 206 {*movdi_64bit}
> > > > >      (nil))
> > > > > (insn 16 15 0 2 (set (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])
> > > > >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> > > > >                     (const_vector:RVVMF8BI repeat [
> > > > >                             (const_int 1 [0x1])
> > > > >                         ])
> > > > >                     (reg:DI 144)
> > > > >                     (const_int 0 [0])
> > > > >                     (reg:SI 66 vl)
> > > > >                     (reg:SI 67 vtype)
> > > > >                 ] UNSPEC_VPREDICATE)
> > > > >             (reg/v:RVVM1QI 134 [ varr ])
> > > > >             (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])))
> > > > > "rvv.c":6:5 1592 {pred_storervvm1qi}
> > > > >      (nil))
> > > > > 
> > > > > You can see there is only one UNSPEC now. Still has redundant stack
> > > > > transferring.
> > > > > 
> > > > > Is it because the pattern too complicated?
> > > > 
> > > > It's because it has an UNSPEC in it - that makes it have target
> > > > specific (unknown to the middle-end) behavior so nothing can
> > > > be optimized here.
> > > > 
> > > > Specifically passes likely refuse to replace MEM operands in
> > > > such a construct.
> > > 
> > > I saw ARM SVE load/store intrinsic also have UNSPEC.
> > > They don't have such issues.
> > > 
> > > https://godbolt.org/z/fsW6Ko93z
> > > 
> > > But their patterns are much simplier than RVV patterns. 
> > > 
> > > I am still trying find a way to optimize the RVV pattern for that.
> > > However, it seems to be very diffcult since we are trying to merge each type
> > > intrinsics into same single pattern to avoid explosion of the insn-ouput.cc
> > > and insn-emit.cc
> > 
> > They also expose the semantics to GIMPLE instead of keeping
> > builtin function calls:
> > 
> > void fn (svbool_t pg, uint8_t * out)
> > {
> >   svuint8_t varr;
> >   static uint8_t arr[32] = 
> > "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> > \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> > 
> >   <bb 2> [local count: 1073741824]:
> >   varr_3 = .MASK_LOAD (&arr, 8B, pg_2(D));
> >   .MASK_STORE (out_4(D), 8B, pg_2(D), varr_3); [tail call]
> >   return;
> 
> Yeah, I noticed, but the autovectorization patterns don't match the RVV
> intrinsics.
> So I can't fold them into MASK_LEN_LOAD... since the RVV intrinsics are more
> complicated.
> 
> It seems that we can't fix it in the middle-end.
> Maybe we should add a RISC-V specific pass to optimize it?

You can look at what combine tries to recognize; maybe it needs some
helper patterns.  Other than that, what's the issue with GIMPLE
and the intrinsics?

* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (25 preceding siblings ...)
  2023-10-19 13:30 ` rguenther at suse dot de
@ 2023-11-01  7:33 ` pan2.li at intel dot com
  2023-11-06 10:27 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: pan2.li at intel dot com @ 2023-11-01  7:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #27 from Li Pan <pan2.li at intel dot com> ---
Hi Richard and Juzhe.

I investigated this issue recently and noticed that it may be related to the
size of the constant array. Consider the two functions below.

vuint8m1_t fn_00000 () {
  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};

  return __riscv_vle8_v_u8m1(arr, 32);
}

vuint8m2_t fn_11111 () {
  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};

  return __riscv_vle8_v_u8m2(arr, 32);
}

The vuint8m1 version keeps the stack variable but the vuint8m2 version doesn't,
so I suspected a limitation in one of the optimization passes. I tracked it down
to extract_low_bits, which is called from get_stored_val in DSE. It looks like
it can only handle scalar modes when the nunits are not equal.

rtx extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
{
  ...
  if (!int_mode_for_mode (src_mode).exists (&src_int_mode)
      || !int_mode_for_mode (mode).exists (&int_mode))
    return NULL_RTX;
  ...
}

I tried allowing vector modes for gen_lowpart here if and only if the size of
mode is not greater than the size of src_mode. For the functions above, this
eliminates the stack variables as expected, up to a point.
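
To make the size condition concrete, here is a minimal sketch in plain C. It
is not GCC code: struct stored_value and forward_from_store are hypothetical
names made up for illustration, and it assumes a little-endian layout with the
read starting at the same address as the store (no gap). It only models the
rule that a read may be served from an earlier store when the read is not
larger than the store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a value that an earlier store wrote and that
   is still being tracked; this is not a GCC data structure. */
struct stored_value {
    const unsigned char *bytes; /* contents of the store */
    size_t size;                /* store size in bytes */
};

/* Try to satisfy a later read directly from the stored value.
   Returns true and fills DST when the read is no larger than the store;
   returns false when the read would need bytes the store never wrote. */
bool forward_from_store(const struct stored_value *store,
                        unsigned char *dst, size_t read_size)
{
    assert(store != NULL && dst != NULL);
    if (read_size > store->size)
        return false;                     /* read > store: keep the memory access */
    memcpy(dst, store->bytes, read_size); /* low bytes, assuming little-endian */
    return true;
}
```

In this model a 16-byte read from a 32-byte store is forwarded, and so is a
32-byte read, while a 64-byte read is rejected.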

The RVV regression tests look good so far, but before we start doing more
testing I would like to double-check with you that this is reasonable. ;)

Thanks.

* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (26 preceding siblings ...)
  2023-11-01  7:33 ` pan2.li at intel dot com
@ 2023-11-06 10:27 ` rguenth at gcc dot gnu.org
  2023-11-06 10:44 ` juzhe.zhong at rivai dot ai
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-06 10:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #28 from Richard Biener <rguenth at gcc dot gnu.org> ---
I tried to look up the requirements of __riscv_vle8_v_u8m2 in the vector
intrinsic specs but besides listing all those intrinsics the spec doesn't
contain _any_ documentation?  The 2nd arg is named 'vl' which I interpret
as vector length so what's so difficult for this intrinsic?  Why isn't it
just even a plain load?  I read the specified 'vl' isn't exact but the
intrinsics are still strongly typed, so a VLA typed gimple load should match
here?  And there should be a way to constrain the implementation somehow
since 'arr' has limited size.  Is the implementation constrained to use a
vector length <= the specified 'vl'?

* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (27 preceding siblings ...)
  2023-11-06 10:27 ` rguenth at gcc dot gnu.org
@ 2023-11-06 10:44 ` juzhe.zhong at rivai dot ai
  2023-11-23  1:20 ` cvs-commit at gcc dot gnu.org
  2023-11-23  1:29 ` pan2.li at intel dot com
  30 siblings, 0 replies; 32+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-11-06 10:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #29 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #28)
> I tried to look up the requirements of __riscv_vle8_v_u8m2 in the vector
> intrinsic specs but besides listing all those intrinsics the spec doesn't
> contain _any_ documentation?  The 2nd arg is named 'vl' which I interpret
> as vector length so what's so difficult for this intrinsic?  Why isn't it
> just even a plain load?  I read the specified 'vl' isn't exact but the
> intrinsics are still strongly typed, so a VLA typed gimple load should match
> here?  And there should be a way to constrain the implementation somehow
> since 'arr' has limited size.  Is the implementation constrained to use a
> vector length <= the specified 'vl'?

Yes, 'vl' is the vector length.
The thing is that there are multiple variants of each intrinsic:

__riscv_vle8_v_u8m2
__riscv_vle8_v_u8m2_tu
__riscv_vle8_v_u8m2_tumu
__riscv_vle8_v_u8m2_mu

All of them update the elements with index < vl.
For elements with index >= vl, __riscv_vle8_v_u8m2 doesn't care about their
values, so they can be anything.

Whereas __riscv_vle8_v_u8m2_tu needs the elements with index >= vl to keep
their original values.
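
A scalar model in plain C may make the difference easier to see. This is only
a sketch: model_vle8_ta and model_vle8_tu are made-up names, not intrinsics,
and vlmax stands in for the number of elements in the destination register.
For the tail-agnostic case the model writes all-ones into the tail, which is
just one outcome an implementation may produce; real code must not rely on
any particular tail value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the tail-undisturbed (_tu) variant: elements with index < vl are
   loaded from SRC, elements with index >= vl keep their old value. */
void model_vle8_tu(uint8_t *dest, const uint8_t *src, size_t vl, size_t vlmax)
{
    assert(vl <= vlmax);
    for (size_t i = 0; i < vl; i++)
        dest[i] = src[i];
    /* Tail [vl, vlmax) is deliberately left untouched. */
}

/* Model of the plain (tail-agnostic) variant: elements with index < vl are
   loaded from SRC, the tail may hold anything. The model picks all-ones. */
void model_vle8_ta(uint8_t *dest, const uint8_t *src, size_t vl, size_t vlmax)
{
    assert(vl <= vlmax);
    for (size_t i = 0; i < vl; i++)
        dest[i] = src[i];
    for (size_t i = vl; i < vlmax; i++)
        dest[i] = 0xFF; /* assumption: one permitted tail value, not guaranteed */
}
```

With vl = 5 and vlmax = 8, both models copy the first five bytes; only the
_tu model guarantees that the last three bytes still hold what was in the
destination before the load.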

* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (28 preceding siblings ...)
  2023-11-06 10:44 ` juzhe.zhong at rivai dot ai
@ 2023-11-23  1:20 ` cvs-commit at gcc dot gnu.org
  2023-11-23  1:29 ` pan2.li at intel dot com
  30 siblings, 0 replies; 32+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-23  1:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #30 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:990769a343f090088f5025ad233f88824b2c6263

commit r14-5769-g990769a343f090088f5025ad233f88824b2c6263
Author: Pan Li <pan2.li@intel.com>
Date:   Mon Nov 13 11:22:37 2023 +0800

    DSE: Allow vector type for get_stored_val when read < store

    Update in v4:
    * Merge upstream and removed some independent changes.

    Update in v3:
    * Take known_le instead of known_lt for vector size.
    * Return NULL_RTX when gap is not equal 0 and not constant.

    Update in v2:
    * Move vector type support to get_stored_val.

    Original log:

    This patch allows vector modes in get_stored_val in DSE.
    This is valid for the read rtx if and only if the read
    bitsize is less than or equal to the stored bitsize.

    Given the example code below, compiled with
    --param=riscv-autovec-preference=fixed-vlmax.

    vuint8m1_t test () {
      uint8_t arr[32] = {
        1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
        1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
      };

      return __riscv_vle8_v_u8m1(arr, 32);
    }

    Before this patch:
    test:
      lui     a5,%hi(.LANCHOR0)
      addi    sp,sp,-32
      addi    a5,a5,%lo(.LANCHOR0)
      li      a3,32
      vl2re64.v       v2,0(a5)
      vsetvli zero,a3,e8,m1,ta,ma
      vs2r.v  v2,0(sp)             <== Unnecessary store to stack
      vle8.v  v1,0(sp)             <== Ditto
      vs1r.v  v1,0(a0)
      addi    sp,sp,32
      jr      ra

    After this patch:
    test:
      lui     a5,%hi(.LANCHOR0)
      addi    a5,a5,%lo(.LANCHOR0)
      li      a4,32
      addi    sp,sp,-32
      vsetvli zero,a4,e8,m1,ta,ma
      vle8.v  v1,0(a5)
      vs1r.v  v1,0(a0)
      addi    sp,sp,32
      jr      ra

    The following tests pass with this patch:
    * The RISC-V regression test.
    * The x86 bootstrap and regression test.
    * The aarch64 regression test.

            PR target/111720

    gcc/ChangeLog:

            * dse.cc (get_stored_val): Allow vector mode if read size is
            less than or equal to stored size.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr111720-0.c: New test.
            * gcc.target/riscv/rvv/base/pr111720-1.c: New test.
            * gcc.target/riscv/rvv/base/pr111720-10.c: New test.
            * gcc.target/riscv/rvv/base/pr111720-2.c: New test.
            * gcc.target/riscv/rvv/base/pr111720-3.c: New test.
            * gcc.target/riscv/rvv/base/pr111720-4.c: New test.
            * gcc.target/riscv/rvv/base/pr111720-5.c: New test.
            * gcc.target/riscv/rvv/base/pr111720-6.c: New test.
            * gcc.target/riscv/rvv/base/pr111720-7.c: New test.
            * gcc.target/riscv/rvv/base/pr111720-8.c: New test.
            * gcc.target/riscv/rvv/base/pr111720-9.c: New test.

    Signed-off-by: Pan Li <pan2.li@intel.com>

* [Bug target/111720] RISC-V: Ugly codegen in RVV
  2023-10-07 22:28 [Bug c/111720] New: RISC-V: Ugly codegen in RVV juzhe.zhong at rivai dot ai
                   ` (29 preceding siblings ...)
  2023-11-23  1:20 ` cvs-commit at gcc dot gnu.org
@ 2023-11-23  1:29 ` pan2.li at intel dot com
  30 siblings, 0 replies; 32+ messages in thread
From: pan2.li at intel dot com @ 2023-11-23  1:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #31 from Li Pan <pan2.li at intel dot com> ---
We still have some unnecessary stack-related code here; I will take care of it
in another patch.

    After this patch:
    test:
      lui     a5,%hi(.LANCHOR0)
      addi    a5,a5,%lo(.LANCHOR0)
      li      a4,32
      addi    sp,sp,-32                   <== unnecessary insn
      vsetvli zero,a4,e8,m1,ta,ma
      vle8.v  v1,0(a5)
      vs1r.v  v1,0(a0)
      addi    sp,sp,32                    <== unnecessary insn
      jr      ra
