[Bug c/108270] New: un-optimal vsetvl for multi-loop if avl is 0 ~ 31 immediate

public inbox for gcc-bugs@sourceware.org

* [Bug c/108270] New: un-optimal vsetvl for multi-loop if avl is 0 ~ 31 immediate
@ 2023-01-03  1:51 juzhe.zhong at rivai dot ai
  2023-01-03  1:52 ` [Bug c/108270] " juzhe.zhong at rivai dot ai
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-01-03  1:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270

            Bug ID: 108270
           Summary: un-optimal vsetvl for multi-loop if avl is 0 ~ 31
                    immediate
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Consider the following case:

#include "riscv_vector.h"
void f1 (void * restrict in, void * restrict out, int l, int n, int m)
{
  for (int i = 0; i < l; i++){
    for (int j = 0; j < m; j++){
      for (int k = 0; k < n; k++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
          __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
        }
    }
  }
}

GCC ASM:
f1:
0:      mv      a7,a2
1:      mv      a6,a0
2:      mv      t1,a1
3:      mv      a2,a3
4:      vsetivli        zero,17,e8,mf8,ta,ma
5:      ble     a7,zero,.L1
6:      ble     a4,zero,.L1
7:      ble     a3,zero,.L1
8:      add     a1,a0,a4
9:      li      a0,0
10:.L4:
11:     add     a3,a6,a0
12:     add     a4,t1,a0
13:.L7:
14:     li      a5,0
15:.L5:
16:     vle8.v  v24,0(a3)
17:     addiw   a5,a5,1
18:     vse8.v  v24,0(a4)
19:     bne     a2,a5,.L5
20:     addi    a3,a3,1
21:     addi    a4,a4,1
22:     bne     a3,a1,.L7
23:     addi    a0,a0,1
24:     addi    a1,a1,1
25:     bne     a0,a7,.L4
26:.L1:
27:     ret

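The vsetivli instruction is hoisted too early: at line 4 it executes even when
one of the early-exit branches (lines 5 to 7) jumps straight to .L1. The best
location for the vsetivli is any point from line 8 to line 9, after the guards
and before the loops.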

* [Bug c/108270] un-optimal vsetvl for multi-loop if avl is 0 ~ 31 immediate
From: juzhe.zhong at rivai dot ai @ 2023-01-03  1:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juzhe.zhong at rivai dot ai

--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
This is a trivial issue. I will fix it later.

* [Bug target/108270] un-optimal vsetvl for multi-loop if avl is 0 ~ 31 immediate
From: cvs-commit at gcc dot gnu.org @ 2023-04-21  9:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kito Cheng <kito@gcc.gnu.org>:

https://gcc.gnu.org/g:d06e9264b0192c2c77e07d7fb0fe090efcb510c0

commit r14-135-gd06e9264b0192c2c77e07d7fb0fe090efcb510c0
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Fri Apr 21 17:19:12 2023 +0800

    RISC-V: Defer vsetvli insertion to later if possible [PR108270]

    Fix issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.

    Consider the following testcase:
    void f (void * restrict in, void * restrict out, int l, int n, int m)
    {
      for (int i = 0; i < l; i++){
        for (int j = 0; j < m; j++){
          for (int k = 0; k < n; k++)
            {
              vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
              __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
            }
        }
      }
    }

    Compile option: -O3

    Before this patch:
            mv      a7,a2
            mv      a6,a0
            mv      t1,a1
            mv      a2,a3
            vsetivli        zero,17,e8,mf8,ta,ma
            ble     a7,zero,.L1
            ble     a4,zero,.L1
            ble     a3,zero,.L1
    ...

    After this patch:
            mv      a7,a2
            mv      a6,a0
            mv      t1,a1
            mv      a2,a3
            ble     a7,zero,.L1
            ble     a4,zero,.L1
            ble     a3,zero,.L1
            add     a1,a0,a4
            li      a0,0
            vsetivli        zero,17,e8,mf8,ta,ma
    ...

    This issue is a missed optimization caused by Phase 3 (global backward
    demand fusion), not by LCM.

    This patch fixes the poor placement of the vsetvl.

    The insertion point is selected not by LCM but by Phase 3 (backward
    fusion and propagation of VL/VTYPE demand info), which I introduced
    into the VSETVL pass to enhance LCM and improve vsetvl instruction
    performance.

    This patch suppresses Phase 3's overly aggressive backward fusion and
    propagation to the top of the function when the AVL has no defining
    instruction (the AVL is a 0 ~ 31 immediate, since the vsetivli
    instruction takes an immediate value instead of a register).
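
The condition "AVL is a 0 ~ 31 immediate" can be sketched in plain C. This
is only an illustration: avl_fits_vsetivli_imm is a hypothetical helper, not
a function in riscv-vsetvl.cc. It relies on one fact, that vsetivli encodes
the AVL as a 5-bit unsigned immediate.

```c
#include <stdbool.h>

/* Hypothetical helper for illustration only; it does not exist in
   riscv-vsetvl.cc.  vsetivli encodes the AVL as a 5-bit unsigned
   immediate, so only the values 0..31 avoid a register operand.  */
static bool
avl_fits_vsetivli_imm (long avl)
{
  return avl >= 0 && avl <= 31;
}
```

With this model, the AVL of 17 from the PR108270 testcase fits the immediate
form, so no instruction defines it and nothing stops a backward propagation
at a definition point. An AVL of 101 does not fit and has to live in a
register.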

    You may ask why we need Phase 3 to do the job.
    There are many situations that pure LCM fails to optimize. Here is a
    simple case to demonstrate it:

    void f (void * restrict in, void * restrict out, int n, int m, int cond)
    {
      size_t vl = 101;
      for (size_t j = 0; j < m; j++){
        if (cond) {
          for (size_t i = 0; i < n; i++)
            {
              vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, vl);
              __riscv_vse8_v_i8mf8 (out + i, v, vl);
            }
        } else {
          for (size_t i = 0; i < n; i++)
            {
              vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i + j, vl);
              v = __riscv_vadd_vv_i32mf2 (v,v,vl);
              __riscv_vse32_v_i32mf2 (out + i, v, vl);
            }
        }
      }
    }

    You can see:
    The first inner loop needs vsetvli e8 mf8 for vle+vse.
    The second inner loop needs vsetvli e32 mf2 for vle+vadd+vse.

    If we don't have Phase 3 (only LCM, i.e. Phase 4), we will end up with:

    outerloop:
    ...
    vsetvli e8mf8
    inner loop 1:
    ....

    vsetvli e32mf2
    inner loop 2:
    ....

    However, with Phase 3, the vsetvli e32 mf2 of inner loop 2 is fused
    into the vsetvli e8 mf8 of inner loop 1, and we end up with this
    result after Phase 3:

    outerloop:
    ...
    inner loop 1:
    vsetvli e32mf2
    ....

    inner loop 2:
    vsetvli e32mf2
    ....
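
The commit message does not spell out why e8 mf8 can be replaced by e32 mf2.
A plausible reading, stated here as an assumption, is that both settings have
the same SEW/LMUL ratio and therefore the same VLMAX, while vle8.v and vse8.v
carry their element width in the instruction itself. The sketch below models
only that ratio check; struct vtype_model and both helpers are hypothetical
names, not part of GCC.

```c
/* Hypothetical model of a vtype setting: SEW in bits and LMUL as the
   fraction lmul_num / lmul_den (mf8 is 1/8, mf2 is 1/2, m1 is 1/1).  */
struct vtype_model
{
  unsigned sew;
  unsigned lmul_num;
  unsigned lmul_den;
};

/* SEW/LMUL ratio.  VLMAX is VLEN divided by this ratio, so two settings
   with the same ratio yield the same VL for the same AVL.  */
static unsigned
sew_lmul_ratio (struct vtype_model v)
{
  return v.sew * v.lmul_den / v.lmul_num;
}

/* Nonzero when both settings share the same SEW/LMUL ratio.  */
static int
same_ratio_p (struct vtype_model a, struct vtype_model b)
{
  return sew_lmul_ratio (a) == sew_lmul_ratio (b);
}
```

For e8 mf8 the ratio is 8 * 8 = 64 and for e32 mf2 it is 32 * 2 = 64, so the
two settings agree under this model. A setting such as e8 m1 (ratio 8) would
not.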

    This demand information is then well optimized by Phase 4 (LCM). The
    result after Phase 4 is:

    vsetvli e32mf2
    outerloop:
    ...
    inner loop 1:
    ....

    inner loop 2:
    ....
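
To make the difference between the two layouts concrete, the following sketch
counts how many vsetvli instructions would execute at run time for m
iterations of the outer loop. It is a toy model of the pseudo-assembly shown
in this message, not a measurement; both function names are hypothetical, and
it assumes exactly one of the two inner loops runs per outer iteration.

```c
/* Toy model of the LCM-only layout: one vsetvli sits in front of each
   inner loop, so one executes on every outer iteration.  */
static unsigned long
vsetvli_count_lcm_only (unsigned long m)
{
  unsigned long count = 0;
  for (unsigned long j = 0; j < m; j++)
    count++; /* vsetvli in front of whichever inner loop runs */
  return count;
}

/* Toy model of the Phase 3 + Phase 4 layout: a single vsetvli e32mf2 is
   placed before the outer loop, so it executes once regardless of m.  */
static unsigned long
vsetvli_count_phase3_then_lcm (unsigned long m)
{
  (void) m; /* the loop body contains no vsetvli */
  return 1;
}
```

Under this model the hoisted layout executes one vsetvli instead of m. It
also executes one when m is 0, which is the kind of over-eager hoisting that
PR108270 is about.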

    This is the optimal codegen from the current VSETVL pass (Phase 3:
    demand backward fusion and propagation + Phase 4: LCM). This is a
    known issue from when I started to implement the VSETVL pass.

    gcc/ChangeLog:

            PR target/108270
            * config/riscv/riscv-vsetvl.cc
            (vector_infos_manager::all_empty_predecessor_p): New function.
            (pass_vsetvl::backward_demand_fusion): Ditto.
            * config/riscv/riscv-vsetvl.h: Ditto.

    gcc/testsuite/ChangeLog:

            PR target/108270
            * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt testcase.
            * gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Ditto.
            * gcc.target/riscv/rvv/vsetvl/pr108270.c: New test.

* [Bug target/108270] un-optimal vsetvl for multi-loop if avl is 0 ~ 31 immediate
From: juzhe.zhong at rivai dot ai @ 2023-05-03 15:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed
