[PATCH V4] RISC-V: Defer vsetvli insertion to later if possible [PR108270]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH V4] RISC-V: Defer vsetvli insertion to later if possible [PR108270]
@ 2023-04-21  9:19 juzhe.zhong
  2023-04-21  9:51 ` Kito Cheng
  0 siblings, 1 reply; 2+ messages in thread
From: juzhe.zhong @ 2023-04-21  9:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, Juzhe-Zhong

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

Fix issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.

Consider the following testcase:
void f (void * restrict in, void * restrict out, int l, int n, int m)
{
  for (int i = 0; i < l; i++){
    for (int j = 0; j < m; j++){
      for (int k = 0; k < n; k++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
          __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
        }
    }
  }
}

Compile option: -O3

Before this patch:
        mv      a7,a2
        mv      a6,a0
        mv      t1,a1
        mv      a2,a3
        vsetivli        zero,17,e8,mf8,ta,ma
        ble     a7,zero,.L1
        ble     a4,zero,.L1
        ble     a3,zero,.L1
...

After this patch:
        mv      a7,a2
        mv      a6,a0
        mv      t1,a1
        mv      a2,a3
        ble     a7,zero,.L1
        ble     a4,zero,.L1
        ble     a3,zero,.L1
        add     a1,a0,a4
        li      a0,0
        vsetivli        zero,17,e8,mf8,ta,ma
...

This issue is a missed optmization produced by Phase 3 global backward demand fusion instead of
LCM.

This patch is fixing poor placement of the vsetvl.

This point is seletected not because LCM but by Phase 3 (VL/VTYPE demand info backward fusion and propogation) which
is I introduced into VSETVL PASS to enhance LCM && improve vsetvl instruction performance.

This patch is to supress the Phase 3 too aggressive backward fusion and propagation to the top of the function program
when there is no define instruction of AVL (AVL is 0 ~ 31 imm since vsetivli instruction allows imm value instead of reg).

You may want to ask why we need Phase 3 to the job.
Well, we have so many situations that pure LCM fails to optimize, here I can show you a simple case to demonstrate it:

void f (void * restrict in, void * restrict out, int n, int m, int cond)
{
  size_t vl = 101;
  for (size_t j = 0; j < m; j++){
    if (cond) {
      for (size_t i = 0; i < n; i++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, vl);
          __riscv_vse8_v_i8mf8 (out + i, v, vl);
        }
    } else {
      for (size_t i = 0; i < n; i++)
        {
          vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i + j, vl);
          v = __riscv_vadd_vv_i32mf2 (v,v,vl);
          __riscv_vse32_v_i32mf2 (out + i, v, vl);
        }
    }
  }
}

You can see:
The first inner loop needs vsetvli e8 mf8 for vle+vse.
The second inner loop need vsetvli e32 mf2 for vle+vadd+vse.

If we don't have Phase 3 (Only handled by LCM (Phase 4)), we will end up with :

outerloop:
...
vsetvli e8mf8
inner loop 1:
....

vsetvli e32mf2
inner loop 2:
....

However, if we have Phase 3, Phase 3 is going to fuse the vsetvli e32 mf2 of inner loop 2 into vsetvli e8 mf8, then we will end up with this result after phase 3:

outerloop:
...
inner loop 1:
vsetvli e32mf2
....

inner loop 2:
vsetvli e32mf2
....

Then, this demand information after phase 3 will be well optimized after phase 4 (LCM), after Phase 4 result is:

vsetvli e32mf2
outerloop:
...
inner loop 1:
....

inner loop 2:
....

You can see this is the optimal codegen after current VSETVL PASS (Phase 3: Demand backward fusion and propagation + Phase 4: LCM ). This is a known issue when I start to implement VSETVL PASS.

        PR 108270

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc (vector_infos_manager::all_empty_predecessor_p): New function.
        (pass_vsetvl::backward_demand_fusion): Ditto.
        * config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt testcase.
        * gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Ditto.
        * gcc.target/riscv/rvv/vsetvl/pr108270.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc              | 23 +++++++++++++++++++
 gcc/config/riscv/riscv-vsetvl.h               |  2 ++
 .../riscv/rvv/vsetvl/imm_bb_prop-1.c          |  2 +-
 .../riscv/rvv/vsetvl/imm_conflict-3.c         |  4 ++--
 .../gcc.target/riscv/rvv/vsetvl/pr108270.c    | 19 +++++++++++++++
 5 files changed, 47 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 5f424221659..167e3c6145c 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2355,6 +2355,21 @@ vector_infos_manager::get_all_available_exprs (
   return available_list;
 }
 
+bool
+vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
+{
+  hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
+  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
+    {
+      const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
+      if (!pred_block_info.local_dem.valid_or_dirty_p ()
+	  && !pred_block_info.reaching_out.valid_or_dirty_p ())
+	continue;
+      return false;
+    }
+  return true;
+}
+
 bool
 vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
 {
@@ -3138,6 +3153,14 @@ pass_vsetvl::backward_demand_fusion (void)
       if (!backward_propagate_worthwhile_p (cfg_bb, curr_block_info))
 	continue;
 
+      /* Fix PR108270:
+
+		bb 0 -> bb 1
+	 We don't need to backward fuse VL/VTYPE info from bb 1 to bb 0
+	 if bb 1 is not inside a loop and all predecessors of bb 0 are empty. */
+      if (m_vector_manager->all_empty_predecessor_p (cfg_bb))
+	continue;
+
       edge e;
       edge_iterator ei;
       /* Backward propagate to each predecessor.  */
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 237381f7026..eec03d35071 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -450,6 +450,8 @@ public:
   /* Return true if all expression set in bitmap are same ratio.  */
   bool all_same_ratio_p (sbitmap) const;
 
+  bool all_empty_predecessor_p (const basic_block) const;
+
   void release (void);
   void create_bitmap_vectors (void);
   void free_bitmap_vectors (void);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
index cd4ee7dd0d3..ed32a40f5e7 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
@@ -29,4 +29,4 @@ void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c
index 1f7c0f036a2..2fa29c01dbc 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c
@@ -20,7 +20,7 @@ void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c
new file mode 100644
index 00000000000..d2ae43bf263
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
+
+#include "riscv_vector.h"
+
+void f (void * restrict in, void * restrict out, int l, int n, int m)
+{
+  for (int i = 0; i < l; i++){
+    for (int j = 0; j < m; j++){
+      for (int k = 0; k < n; k++)
+        {
+          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
+          __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
+        }
+    }
+  }
+}
+
+/* { dg-final { scan-assembler-not {mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+vsetivli} } } */
-- 
2.36.1


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH V4] RISC-V: Defer vsetvli insertion to later if possible [PR108270]
  2023-04-21  9:19 [PATCH V4] RISC-V: Defer vsetvli insertion to later if possible [PR108270] juzhe.zhong
@ 2023-04-21  9:51 ` Kito Cheng
  0 siblings, 0 replies; 2+ messages in thread
From: Kito Cheng @ 2023-04-21  9:51 UTC (permalink / raw)
  To: juzhe.zhong; +Cc: gcc-patches

Thanks, committed to trunk.

On Fri, Apr 21, 2023 at 5:19 PM <juzhe.zhong@rivai.ai> wrote:
>
> From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
>
> Fix issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.
>
> Consider the following testcase:
> void f (void * restrict in, void * restrict out, int l, int n, int m)
> {
>   for (int i = 0; i < l; i++){
>     for (int j = 0; j < m; j++){
>       for (int k = 0; k < n; k++)
>         {
>           vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
>           __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
>         }
>     }
>   }
> }
>
> Compile option: -O3
>
> Before this patch:
>         mv      a7,a2
>         mv      a6,a0
>         mv      t1,a1
>         mv      a2,a3
>         vsetivli        zero,17,e8,mf8,ta,ma
>         ble     a7,zero,.L1
>         ble     a4,zero,.L1
>         ble     a3,zero,.L1
> ...
>
> After this patch:
>         mv      a7,a2
>         mv      a6,a0
>         mv      t1,a1
>         mv      a2,a3
>         ble     a7,zero,.L1
>         ble     a4,zero,.L1
>         ble     a3,zero,.L1
>         add     a1,a0,a4
>         li      a0,0
>         vsetivli        zero,17,e8,mf8,ta,ma
> ...
>
> This issue is a missed optmization produced by Phase 3 global backward demand fusion instead of
> LCM.
>
> This patch is fixing poor placement of the vsetvl.
>
> This point is seletected not because LCM but by Phase 3 (VL/VTYPE demand info backward fusion and propogation) which
> is I introduced into VSETVL PASS to enhance LCM && improve vsetvl instruction performance.
>
> This patch is to supress the Phase 3 too aggressive backward fusion and propagation to the top of the function program
> when there is no define instruction of AVL (AVL is 0 ~ 31 imm since vsetivli instruction allows imm value instead of reg).
>
> You may want to ask why we need Phase 3 to the job.
> Well, we have so many situations that pure LCM fails to optimize, here I can show you a simple case to demonstrate it:
>
> void f (void * restrict in, void * restrict out, int n, int m, int cond)
> {
>   size_t vl = 101;
>   for (size_t j = 0; j < m; j++){
>     if (cond) {
>       for (size_t i = 0; i < n; i++)
>         {
>           vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, vl);
>           __riscv_vse8_v_i8mf8 (out + i, v, vl);
>         }
>     } else {
>       for (size_t i = 0; i < n; i++)
>         {
>           vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i + j, vl);
>           v = __riscv_vadd_vv_i32mf2 (v,v,vl);
>           __riscv_vse32_v_i32mf2 (out + i, v, vl);
>         }
>     }
>   }
> }
>
> You can see:
> The first inner loop needs vsetvli e8 mf8 for vle+vse.
> The second inner loop need vsetvli e32 mf2 for vle+vadd+vse.
>
> If we don't have Phase 3 (Only handled by LCM (Phase 4)), we will end up with :
>
> outerloop:
> ...
> vsetvli e8mf8
> inner loop 1:
> ....
>
> vsetvli e32mf2
> inner loop 2:
> ....
>
> However, if we have Phase 3, Phase 3 is going to fuse the vsetvli e32 mf2 of inner loop 2 into vsetvli e8 mf8, then we will end up with this result after phase 3:
>
> outerloop:
> ...
> inner loop 1:
> vsetvli e32mf2
> ....
>
> inner loop 2:
> vsetvli e32mf2
> ....
>
> Then, this demand information after phase 3 will be well optimized after phase 4 (LCM), after Phase 4 result is:
>
> vsetvli e32mf2
> outerloop:
> ...
> inner loop 1:
> ....
>
> inner loop 2:
> ....
>
> You can see this is the optimal codegen after current VSETVL PASS (Phase 3: Demand backward fusion and propagation + Phase 4: LCM ). This is a known issue when I start to implement VSETVL PASS.
>
>         PR 108270
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-vsetvl.cc (vector_infos_manager::all_empty_predecessor_p): New function.
>         (pass_vsetvl::backward_demand_fusion): Ditto.
>         * config/riscv/riscv-vsetvl.h: Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt testcase.
>         * gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Ditto.
>         * gcc.target/riscv/rvv/vsetvl/pr108270.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc              | 23 +++++++++++++++++++
>  gcc/config/riscv/riscv-vsetvl.h               |  2 ++
>  .../riscv/rvv/vsetvl/imm_bb_prop-1.c          |  2 +-
>  .../riscv/rvv/vsetvl/imm_conflict-3.c         |  4 ++--
>  .../gcc.target/riscv/rvv/vsetvl/pr108270.c    | 19 +++++++++++++++
>  5 files changed, 47 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index 5f424221659..167e3c6145c 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -2355,6 +2355,21 @@ vector_infos_manager::get_all_available_exprs (
>    return available_list;
>  }
>
> +bool
> +vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
> +{
> +  hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
> +  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
> +    {
> +      const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
> +      if (!pred_block_info.local_dem.valid_or_dirty_p ()
> +         && !pred_block_info.reaching_out.valid_or_dirty_p ())
> +       continue;
> +      return false;
> +    }
> +  return true;
> +}
> +
>  bool
>  vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
>  {
> @@ -3138,6 +3153,14 @@ pass_vsetvl::backward_demand_fusion (void)
>        if (!backward_propagate_worthwhile_p (cfg_bb, curr_block_info))
>         continue;
>
> +      /* Fix PR108270:
> +
> +               bb 0 -> bb 1
> +        We don't need to backward fuse VL/VTYPE info from bb 1 to bb 0
> +        if bb 1 is not inside a loop and all predecessors of bb 0 are empty. */
> +      if (m_vector_manager->all_empty_predecessor_p (cfg_bb))
> +       continue;
> +
>        edge e;
>        edge_iterator ei;
>        /* Backward propagate to each predecessor.  */
> diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
> index 237381f7026..eec03d35071 100644
> --- a/gcc/config/riscv/riscv-vsetvl.h
> +++ b/gcc/config/riscv/riscv-vsetvl.h
> @@ -450,6 +450,8 @@ public:
>    /* Return true if all expression set in bitmap are same ratio.  */
>    bool all_same_ratio_p (sbitmap) const;
>
> +  bool all_empty_predecessor_p (const basic_block) const;
> +
>    void release (void);
>    void create_bitmap_vectors (void);
>    void free_bitmap_vectors (void);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
> index cd4ee7dd0d3..ed32a40f5e7 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
> @@ -29,4 +29,4 @@ void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
>    }
>  }
>
> -/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c
> index 1f7c0f036a2..2fa29c01dbc 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c
> @@ -20,7 +20,7 @@ void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
>    }
>  }
>
> -/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
>  /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
> -/* { dg-final { scan-assembler-times {vsetivli} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
> +/* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
>  /* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c
> new file mode 100644
> index 00000000000..d2ae43bf263
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
> +
> +#include "riscv_vector.h"
> +
> +void f (void * restrict in, void * restrict out, int l, int n, int m)
> +{
> +  for (int i = 0; i < l; i++){
> +    for (int j = 0; j < m; j++){
> +      for (int k = 0; k < n; k++)
> +        {
> +          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
> +          __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
> +        }
> +    }
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not {mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+vsetivli} } } */
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-04-21  9:51 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-21  9:19 [PATCH V4] RISC-V: Defer vsetvli insertion to later if possible [PR108270] juzhe.zhong
2023-04-21  9:51 ` Kito Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).