[PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
@ 2023-10-24  3:32 Juzhe-Zhong
  2023-10-24  3:44 ` juzhe.zhong
  2023-10-24 15:26 ` Kito Cheng
  0 siblings, 2 replies; 13+ messages in thread
From: Juzhe-Zhong @ 2023-10-24  3:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, kito.cheng, jeffreyalaw, rdapp.gcc, Juzhe-Zhong

This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization
which is a known issue for a long time and I finally find the time to address it.

Consider a simple vector addition operation:

https://godbolt.org/z/7hfGfEjW3

void
foo (int *__restrict a,
     int *__restrict b,
     int *__restrict n)
{
  for (int i = 0; i < n; i++)
      a[i] = a[i] + b[i];
}

Optimized IR:

Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);    -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);   -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                              -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)

We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment:

vect__7.12_19 = vect__6.11_20 + vect__4.8_27;

GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization.
Such flow are used by all other targets like ARM SVE (RVV also uses such flow):

ARM SVE:
   
.L3:
        ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
        add     z31.s, z31.s, z30.s            -> un-predicated add
        st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store

Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it.

Also, It's very unlikely that we can apply predicated operations on all vectorization for following reasons:

1. It's very heavy workload to support them on all vectorization and we don't see any benefits if we can handle that on targets backend.
2. Changing Loop vectorizer for it will make code base ugly and hard to maintain.
3. We will need so many patterns for all operations. Not only COND_LEN_ADD, COND_LEN_SUB, ....
   We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over 100+ patterns, unreasonable number of patterns.

To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS for it to elide the redundant vsetvls
due to AVL/VL toggling.

The second question is that why we separate a PASS called AVL propagation. Why not optimize it in VSETVL PASS (We definitetly can optimize AVL in VSETVL PASS)

Frankly, I was planning to address such issue in VSETVL PASS that's why we recently refactored VSETVL PASS. However, I changed my mind recently after several
experiments and tries.

The reasons as follows:

1. For code base management and maintainience. Current VSETVL PASS is complicated enough and aleady has enough aggressive and fancy optimizations which
   turns out it can always generate optimal codegen in most of the cases. It's not a good idea keep adding more features into VSETVL PASS to make VSETVL
	 PASS become heavy and heavy again, then we will need to refactor it again in the future.
	 Actuall, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change VSETVL PASS any more except the minor
	 fixes.

2. vsetvl insertion (VSETVL PASS does this thing) and AVL propagation are 2 different things,  I don't think we should fuse them into same PASS.

3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be done before RA which can reduce register allocation.

4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
   This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements.
	 We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate 
	 VSETVL PASS again which is already so complicated.) 

Here is an example to demonstrate more:

https://godbolt.org/z/bE86sv3q5

void foo2 (int *__restrict a,
          int *__restrict b,
          int *__restrict c,
          int *__restrict a2,
          int *__restrict b2,
          int *__restrict c2,
          int *__restrict a3,
          int *__restrict b3,
          int *__restrict c3,
          int *__restrict a4,
          int *__restrict b4,
          int *__restrict c4,
          int *__restrict a5,
          int *__restrict b5,
          int *__restrict c5,
          int n)
{
    for (int i = 0; i < n; i++){
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i]+ a[i];

      a[i] = a[i] + c[i];
      b5[i] = a[i] + c[i];
      a2[i] = a[i] + c2[i];
      a3[i] = a[i] + c3[i];
      a4[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i]+ a[i];
    }
}

1. Loop Body:

Before this patch:                                          After this patch:
  
	      vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli	a4,t1,e32,m1,ta,ma                                     
        vle32.v v2,0(a2)                                     vle32.v	v2,0(a2)
        vle32.v v4,0(a1)                                     vle32.v	v3,0(t2)
        vle32.v v1,0(t2)                                     vle32.v	v4,0(a1)
        vsetvli a7,zero,e32,m1,ta,ma                         vle32.v	v1,0(t0)
        vadd.vv v4,v2,v4                                     vadd.vv	v4,v2,v4
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv	v1,v3,v1
        vle32.v v3,0(s0)                                     vadd.vv	v1,v1,v4
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv	v1,v1,v4
        vadd.vv v1,v3,v1                                     vadd.vv	v1,v1,v4
        vadd.vv v1,v1,v4                                     vadd.vv	v1,v1,v2
        vadd.vv v1,v1,v4                                     vadd.vv	v2,v1,v2
        vadd.vv v1,v1,v4                                     vse32.v	v2,0(t5)
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv	v2,v2,v1
        vle32.v v4,0(a5)                                     vadd.vv	v2,v2,v1
        vsetvli a7,zero,e32,m1,ta,ma                         slli	a7,a4,2
        vadd.vv v1,v1,v2                                     vadd.vv	v3,v1,v3
        vadd.vv v2,v1,v2                                     vle32.v	v5,0(a5)
        vadd.vv v4,v1,v4                                     vle32.v	v6,0(t6)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v	v3,0(t3)
        vse32.v v2,0(t5)                                     vse32.v	v2,0(a0)
        vse32.v v4,0(a3)                                     vadd.vv	v3,v3,v1
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv	v2,v1,v5
        vadd.vv v3,v1,v3                                     vse32.v	v3,0(t4)
        vadd.vv v2,v2,v1                                     vadd.vv	v1,v1,v6
        vadd.vv v2,v2,v1                                     vse32.v	v2,0(a3)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v	v1,0(a6)
        vse32.v v2,0(a0)                                      
        vse32.v v3,0(t3)                                      
        vle32.v v2,0(t0)                                      
        vsetvli a7,zero,e32,m1,ta,ma                                      
        vadd.vv v3,v3,v1                                      
        vsetvli zero,a4,e32,m1,ta,ma                                      
        vse32.v v3,0(t4)                                      
        vsetvli a7,zero,e32,m1,ta,ma                                      
        slli    a7,a4,2                                      
        vadd.vv v1,v1,v2                                      
        sub     t1,t1,a4                                      
        vsetvli zero,a4,e32,m1,ta,ma                                      
        vse32.v v1,0(a6)                                      

It's quite obvious, all heavy && redundant vsetvls inside loop body are eliminated.

2. Epilogue:
    Before this patch:                                          After this patch:

     .L5:                                                      .L5:                                           
        ld      s0,8(sp)                                         ret
        addi    sp,sp,16                                         
        jr      ra                                         

This is the benefit we do the AVL propation before RA since we eliminate the use of 'a7' register
which is used by the redudant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'

The final codegen after this patch:

foo2:
	lw	t1,56(sp)
	ld	t6,0(sp)
	ld	t3,8(sp)
	ld	t0,16(sp)
	ld	t2,24(sp)
	ld	t4,32(sp)
	ld	t5,40(sp)
	ble	t1,zero,.L5
.L3:
	vsetvli	a4,t1,e32,m1,ta,ma
	vle32.v	v2,0(a2)
	vle32.v	v3,0(t2)
	vle32.v	v4,0(a1)
	vle32.v	v1,0(t0)
	vadd.vv	v4,v2,v4
	vadd.vv	v1,v3,v1
	vadd.vv	v1,v1,v4
	vadd.vv	v1,v1,v4
	vadd.vv	v1,v1,v4
	vadd.vv	v1,v1,v2
	vadd.vv	v2,v1,v2
	vse32.v	v2,0(t5)
	vadd.vv	v2,v2,v1
	vadd.vv	v2,v2,v1
	slli	a7,a4,2
	vadd.vv	v3,v1,v3
	vle32.v	v5,0(a5)
	vle32.v	v6,0(t6)
	vse32.v	v3,0(t3)
	vse32.v	v2,0(a0)
	vadd.vv	v3,v3,v1
	vadd.vv	v2,v1,v5
	vse32.v	v3,0(t4)
	vadd.vv	v1,v1,v6
	vse32.v	v2,0(a3)
	vse32.v	v1,0(a6)
	sub	t1,t1,a4
	add	a1,a1,a7
	add	a2,a2,a7
	add	a5,a5,a7
	add	t6,t6,a7
	add	t0,t0,a7
	add	t2,t2,a7
	add	t5,t5,a7
	add	a3,a3,a7
	add	a6,a6,a7
	add	t3,t3,a7
	add	t4,t4,a7
	add	a0,a0,a7
	bne	t1,zero,.L3
.L5:
	ret

	PR target/111888

gcc/ChangeLog:

	* config.gcc: Add AVL propgatation PASS.
	* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
	* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
	(has_vtype_op): Export as global.
	(has_vl_op): Ditto.
	(tail_agnostic_p): Ditto.
	(validate_change_or_fail): Ditto.
	(vlmax_avl_type_p): Ditto.
	(vlmax_avl_p): Ditto.
	(get_sew): Ditto.
	(enum vlmul_type): Ditto.
	(const_vlmax_p): Ditto.
	* config/riscv/riscv-v.cc (has_vtype_op): Ditto.
	(has_vl_op): Ditto.
	(get_default_ta): Ditto.
	(tail_agnostic_p): Ditto.
	(validate_change_or_fail): Ditto.
	(vlmax_avl_type_p): Ditto.
	(vlmax_avl_p): Ditto.
	(get_sew): Ditto.
	(enum vlmul_type): Ditto.
	(get_vlmul): Ditto.
	* config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
	(has_vtype_op): Ditto.
	(has_vl_op): Ditto.
	(get_sew): Ditto.
	(get_vlmul): Ditto.
	(get_default_ta): Ditto.
	(tail_agnostic_p): Ditto.
	(validate_change_or_fail): Ditto.
	* config/riscv/t-riscv: Add AVL propagation PASS.
	* config/riscv/vector.md: Fix VLS modes attribute.
	* config/riscv/riscv-avlprop.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
	* gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
	* gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
	* gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.

---
 gcc/config.gcc                                |   2 +-
 gcc/config/riscv/riscv-avlprop.cc             | 350 ++++++++++++++++++
 gcc/config/riscv/riscv-passes.def             |   1 +
 gcc/config/riscv/riscv-protos.h               |  10 +
 gcc/config/riscv/riscv-v.cc                   |  84 ++++-
 gcc/config/riscv/riscv-vsetvl.cc              |  82 +---
 gcc/config/riscv/t-riscv                      |   6 +
 gcc/config/riscv/vector.md                    |   2 +-
 .../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +-
 .../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +-
 .../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +-
 .../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 -
 .../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
 .../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
 15 files changed, 514 insertions(+), 84 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-avlprop.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 606d3a8513e..efd53965c9a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -544,7 +544,7 @@ pru-*-*)
 riscv*)
 	cpu_type=riscv
 	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
-	extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
+	extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o riscv-avlprop.o"
 	extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
 	extra_objs="${extra_objs} thead.o"
 	d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-avlprop.cc b/gcc/config/riscv/riscv-avlprop.cc
new file mode 100644
index 00000000000..bf3becd8371
--- /dev/null
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -0,0 +1,350 @@
+/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2023-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI Technologies Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or(at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions.
+   A standalone AVL propagation pass is designed because:
+
+     - Better code maintain:
+       Current LCM-based VSETVL pass is so complicated that codes
+       there will become even harder to maintain. A straight forward
+       AVL propagation PASS is much easier to maintain.
+
+     - Reduce scalar register pressure:
+       A type of AVL propagation is we propagate AVL from NON-VLMAX
+       instruction to VLMAX instruction.
+       Note: VLMAX instruction should be ignore tail elements (TA)
+       and the result should be used by the NON-VLMAX instruction.
+       This optimization is mostly for auto-vectorization codes:
+
+	  vsetvli r136, r137      --- SELECT_VL
+	  vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
+	  vadd.vv (use VLMAX)     --- PLUS_EXPR
+	  vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
+
+	NO AVL propation:
+
+	  vsetvli a5, a4, ta
+	  vle8.v v1
+	  vsetvli t0, zero, ta
+	  vadd.vv v2, v1, v1
+	  vse8.v v2
+
+	We can propagate the AVL to 'vadd.vv' since its result
+	is consumed by a 'vse8.v' which has AVL = a5 and its
+	tail elements are agnostic.
+
+       We DON'T do this optimization on VSETVL pass since it is a
+       post-RA pass that consumed 't0' already wheras a standalone
+       pre-RA AVL propagation pass allows us elide the consumption
+       of the pseudo register of 't0' then we can reduce scalar
+       register pressure.
+
+     - More AVL propagation opportunities:
+       A pre-RA pass is more flexible for AVL REG def-use chain,
+       thus we will get more potential AVL propagation as long as
+       it doesn't increase the scalar register pressure.
+*/
+
+#define IN_TARGET_CODE 1
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "target.h"
+#include "tree-pass.h"
+#include "df.h"
+#include "rtl-ssa.h"
+#include "cfgcleanup.h"
+#include "insn-attr.h"
+
+using namespace rtl_ssa;
+using namespace riscv_vector;
+
+/* The AVL propagation instructions and corresponding preferred AVL.
+   It will be updated during the analysis.  */
+static hash_map<insn_info *, rtx> *avlprops;
+
+const pass_data pass_data_avlprop = {
+  RTL_PASS,	 /* type */
+  "avlprop",	 /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE,	 /* tv_id */
+  0,		 /* properties_required */
+  0,		 /* properties_provided */
+  0,		 /* properties_destroyed */
+  0,		 /* todo_flags_start */
+  0,		 /* todo_flags_finish */
+};
+
+class pass_avlprop : public rtl_opt_pass
+{
+public:
+  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) final override
+  {
+    return TARGET_VECTOR && optimize > 0;
+  }
+  virtual unsigned int execute (function *) final override;
+}; // class pass_avlprop
+
+static void
+avlprop_init (void)
+{
+  calculate_dominance_info (CDI_DOMINATORS);
+  df_analyze ();
+  crtl->ssa = new function_info (cfun);
+  avlprops = new hash_map<insn_info *, rtx>;
+}
+
+static void
+avlprop_done (void)
+{
+  free_dominance_info (CDI_DOMINATORS);
+  if (crtl->ssa->perform_pending_updates ())
+    cleanup_cfg (0);
+  delete crtl->ssa;
+  crtl->ssa = nullptr;
+  delete avlprops;
+  avlprops = NULL;
+}
+
+/* Helper function to get AVL operand.  */
+static rtx
+get_avl (insn_info *insn, bool avlprop_p)
+{
+  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
+      || get_attr_avl_type (insn->rtl ()) == VLS)
+    return NULL_RTX;
+  if (avlprop_p)
+    {
+      if (avlprops->get (insn))
+	return (*avlprops->get (insn));
+      else if (vlmax_avl_type_p (insn->rtl ()))
+	return RVV_VLMAX;
+    }
+  extract_insn_cached (insn->rtl ());
+  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
+}
+
+/* This is a straight forward pattern ALWAYS in paritial auto-vectorization:
+
+     VL = SELECT_AVL (AVL, ...)
+     V0 = MASK_LEN_LOAD (..., VL)
+     V1 = MASK_LEN_LOAD (..., VL)
+     V2 = V0 + V1 --- Missed LEN information.
+     MASK_LEN_STORE (..., V2, VL)
+
+   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1, dummy LEN)
+   because:
+
+     - Few code changes in Loop Vectorizer.
+     - Reuse the current clean flow of partial vectorization, That is, apply
+       predicate LEN or MASK into LOAD/STORE operations and other special
+       arithmetic operations (e.d. DIV), then do the whole vector register
+       operation if it DON'T affect the correctness.
+       Such flow is used by all other targets like x86, sve, s390, ... etc.
+     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
+
+   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like PLUS_EXPR which
+   generates the VLMAX instruction due to missed LEN information. The later
+   VSETVL PASS will elided the redundant vsetvls.
+*/
+
+static rtx
+get_autovectorize_preferred_avl (insn_info *insn)
+{
+  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p (insn->rtl ()))
+    return NULL_RTX;
+
+  rtx use_avl = NULL_RTX;
+  insn_info *avl_use_insn = nullptr;
+  unsigned int ratio
+    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul (insn->rtl ()));
+  for (def_info *def : insn->defs ())
+    {
+      auto set = safe_dyn_cast<set_info *> (def);
+      if (!set || !set->is_reg ())
+	return NULL_RTX;
+      for (use_info *use : set->all_uses ())
+	{
+	  if (!use->is_in_nondebug_insn ())
+	    return NULL_RTX;
+	  insn_info *use_insn = use->insn ();
+	  /* FIXME: Stop AVL propagation if any USE is not a RVV real
+	     instruction. It should be totally enough for vectorized codes since
+	     they always locate at extended blocks.
+
+	     TODO: We can extend PHI checking for intrinsic codes if it
+	     necessary in the future.  */
+	  if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl ()))
+	    return NULL_RTX;
+	  if (!has_vl_op (use_insn->rtl ()))
+	    continue;
+
+	  rtx new_use_avl = get_avl (use_insn, true);
+	  if (!new_use_avl)
+	    return NULL_RTX;
+	  if (!use_avl)
+	    use_avl = new_use_avl;
+	  if (!rtx_equal_p (use_avl, new_use_avl)
+	      || calculate_ratio (get_sew (use_insn->rtl ()),
+				  get_vlmul (use_insn->rtl ()))
+		   != ratio
+	      || vlmax_avl_p (new_use_avl)
+	      || !tail_agnostic_p (use_insn->rtl ()))
+	    return NULL_RTX;
+	  if (!avl_use_insn)
+	    avl_use_insn = use_insn;
+	}
+    }
+
+  if (use_avl && register_operand (use_avl, Pmode))
+    {
+      gcc_assert (avl_use_insn);
+      // Find a definition at or neighboring INSN.
+      resource_info resource = full_register (REGNO (use_avl));
+      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
+      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
+      if (dl1.matching_set () || dl2.matching_set ())
+	return NULL_RTX;
+      def_info *def1 = dl1.last_def_of_prev_group ();
+      def_info *def2 = dl2.last_def_of_prev_group ();
+      if (def1 != def2)
+	return NULL_RTX;
+      /* FIXME: We only all AVL propation within a block which should
+	 be totally enough for vectorized codes.
+
+	 TODO: We can enhance it here for intrinsic codes in the future
+	 if it is necessary.  */
+      if (def1->insn ()->bb () != insn->bb ()
+	  || def1->insn ()->compare_with (insn) >= 0)
+	return NULL_RTX;
+    }
+  return use_avl;
+}
+
+/* If we have a preferred AVL to propagate, return the AVL.
+   Otherwise, return NULL_RTX as we don't need have any preferred
+   AVL.  */
+
+static rtx
+get_preferred_avl (insn_info *insn)
+{
+  /* TODO: We only do AVL propagation for missed-LEN partial
+     autovectorization for now.  We could add more more AVL
+     propagation for intrinsic codes in the future.  */
+  return get_autovectorize_preferred_avl (insn);
+}
+
+/* Return the AVL TYPE operand index.  */
+static int
+get_avl_type_index (insn_info *insn)
+{
+  extract_insn_cached (insn->rtl ());
+  /* Except rounding mode patterns, AVL TYPE operand
+     is always the last operand.  */
+  if (find_access (insn->uses (), VXRM_REGNUM)
+      || find_access (insn->uses (), FRM_REGNUM))
+    return recog_data.n_operands - 2;
+  return recog_data.n_operands - 1;
+}
+
+/* Main entry point for this pass.  */
+unsigned int
+pass_avlprop::execute (function *)
+{
+  avlprop_init ();
+
+  /* Go through all the instructions looking for AVL that we could propagate. */
+
+  insn_info *next;
+  bool change_p = true;
+
+  while (change_p)
+    {
+      /* Iterate on each instruction until no more change need.  */
+      change_p = false;
+      for (insn_info *insn = crtl->ssa->first_insn (); insn; insn = next)
+	{
+	  next = insn->next_any_insn ();
+	  /* We only forward AVL to the instruction that has AVL/VL operand
+	     and can be optimized in RTL_SSA level.  */
+	  if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
+	    continue;
+
+	  rtx new_avl = get_preferred_avl (insn);
+	  if (new_avl)
+	    {
+	      gcc_assert (!vlmax_avl_p (new_avl));
+	      auto &update = avlprops->get_or_insert (insn);
+	      change_p = !rtx_equal_p (update, new_avl);
+	      update = new_avl;
+	    }
+	}
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "\nNumber of successful AVL propagations: %d\n\n",
+	     (int) avlprops->elements ());
+
+  for (const auto iter : *avlprops)
+    {
+      rtx_insn *rinsn = iter.first->rtl ();
+      if (dump_file)
+	{
+	  fprintf (dump_file, "\nPropagating AVL: ");
+	  print_rtl_single (dump_file, iter.second);
+	  fprintf (dump_file, "into: ");
+	  print_rtl_single (dump_file, rinsn);
+	}
+      /* Replace AVL operand.  */
+      rtx new_pat
+	= simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first, false),
+				iter.second);
+      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
+
+      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
+      if (vlmax_avl_type_p (rinsn))
+	validate_change_or_fail (
+	  rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)],
+	  get_avl_type_rtx (avl_type::NONVLMAX), false);
+      if (dump_file)
+	{
+	  fprintf (dump_file, "Successfully to match this instruction: ");
+	  print_rtl_single (dump_file, rinsn);
+	}
+    }
+
+  avlprop_done ();
+  return 0;
+}
+
+rtl_opt_pass *
+make_pass_avlprop (gcc::context *ctxt)
+{
+  return new pass_avlprop (ctxt);
+}
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
index 4084122cf0a..b6260939d5c 100644
--- a/gcc/config/riscv/riscv-passes.def
+++ b/gcc/config/riscv/riscv-passes.def
@@ -18,4 +18,5 @@
    <http://www.gnu.org/licenses/>.  */
 
 INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
+INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
 INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6cb9d459ee9..2b09ec9ea9e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const char *, struct gcc_options *, locatio
 extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
 
 rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
 rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 
 /* Routines implemented in riscv-string.c.  */
@@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
 bool cmp_lmul_gt_one (machine_mode);
 bool gather_scatter_valid_offset_mode_p (machine_mode);
 bool vls_mode_valid_p (machine_mode);
+bool has_vtype_op (rtx_insn *);
+bool has_vl_op (rtx_insn *);
+bool tail_agnostic_p (rtx_insn *);
+void validate_change_or_fail (rtx, rtx *, rtx, bool);
+bool vlmax_avl_type_p (rtx_insn *);
+bool vlmax_avl_p (rtx);
+uint8_t get_sew (rtx_insn *);
+enum vlmul_type get_vlmul (rtx_insn *);
+bool const_vlmax_p (machine_mode);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e39a9507803..473622ac321 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -56,7 +56,7 @@ using namespace riscv_vector;
 namespace riscv_vector {
 
 /* Return true if vlmax is constant value and can be used in vsetivl.  */
-static bool
+bool
 const_vlmax_p (machine_mode mode)
 {
   poly_uint64 nuints = GET_MODE_NUNITS (mode);
@@ -298,14 +298,6 @@ public:
 	      len = force_reg (Pmode, len);
 	    vls_p = true;
 	  }
-	else if (const_vlmax_p (vtype_mode))
-	  {
-	    /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
-	       the vsetvli to obtain the value of vlmax.  */
-	    poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
-	    len = gen_int_mode (nunits, Pmode);
-	    vls_p = true;
-	  }
 	else if (can_create_pseudo_p ())
 	  {
 	    len = gen_reg_rtx (Pmode);
@@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
   emit_move_insn (dst, x4);
 }
 
+/* Return true if it is an RVV instruction depends on VTYPE global
+   status register.  */
+bool
+has_vtype_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
+}
+
+/* Return true if it is an RVV instruction depends on VL global
+   status register.  */
+bool
+has_vl_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
+}
+
+/* Get default tail policy.  */
+static bool
+get_default_ta ()
+{
+  /* For the instruction that doesn't require TA, we still need a default value
+     to emit vsetvl. We pick up the default value according to prefer policy. */
+  return (bool) (get_prefer_tail_policy () & 0x1
+		 || (get_prefer_tail_policy () >> 1 & 0x1));
+}
+
+/* Helper function to get TA operand.  */
+bool
+tail_agnostic_p (rtx_insn *rinsn)
+{
+  /* If it doesn't have TA, we return agnostic by default.  */
+  extract_insn_cached (rinsn);
+  int ta = get_attr_ta (rinsn);
+  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
+}
+
+/* Change insn and Assert the change always happens.  */
+void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
+
+/* Return true if it is VLMAX AVL TYPE.  */
+bool
+vlmax_avl_type_p (rtx_insn *rinsn)
+{
+  return get_attr_avl_type (rinsn) == VLMAX;
+}
+
+/* Return true if RTX is RVV VLMAX AVL.  */
+bool
+vlmax_avl_p (rtx x)
+{
+  return x && rtx_equal_p (x, RVV_VLMAX);
+}
+
+/* Helper function to get SEW operand. We always have SEW value for
+   all RVV instructions that have VTYPE OP.  */
+uint8_t
+get_sew (rtx_insn *rinsn)
+{
+  return get_attr_sew (rinsn);
+}
+
+/* Helper function to get VLMUL operand. We always have VLMUL value for
+   all RVV instructions that have VTYPE OP. */
+enum vlmul_type
+get_vlmul (rtx_insn *rinsn)
+{
+  return (enum vlmul_type) get_attr_vlmul (rinsn);
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index e9dd669de98..f2f19e423bf 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
   return agnostic_p ? "agnostic" : "undisturbed";
 }
 
-static bool
-vlmax_avl_p (rtx x)
-{
-  return x && rtx_equal_p (x, RVV_VLMAX);
-}
-
-/* Return true if it is an RVV instruction depends on VTYPE global
-   status register.  */
-static bool
-has_vtype_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
-}
-
-/* Return true if it is an RVV instruction depends on VL global
-   status register.  */
-static bool
-has_vl_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
-}
-
 /* Return true if the instruction ignores VLMUL field of VTYPE.  */
 static bool
 ignore_vlmul_insn_p (rtx_insn *rinsn)
@@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
 
   if (!has_vl_op (rinsn))
     return NULL_RTX;
-  if (get_attr_avl_type (rinsn) == VLMAX)
-    return RVV_VLMAX;
-  extract_insn_cached (rinsn);
-  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
-}
 
-/* Helper function to get SEW operand. We always have SEW value for
-   all RVV instructions that have VTYPE OP.  */
-static uint8_t
-get_sew (rtx_insn *rinsn)
-{
-  return get_attr_sew (rinsn);
-}
-
-/* Helper function to get VLMUL operand. We always have VLMUL value for
-   all RVV instructions that have VTYPE OP. */
-static enum vlmul_type
-get_vlmul (rtx_insn *rinsn)
-{
-  return (enum vlmul_type) get_attr_vlmul (rinsn);
-}
+  extract_insn_cached (rinsn);
+  if (vlmax_avl_type_p (rinsn))
+    {
+      if (BYTES_PER_RISCV_VECTOR.is_constant ())
+	{
+	  for (int i = 0; i < recog_data.n_operands; i++)
+	    if (GET_MODE_CLASS (recog_data.operand_mode[i]) == MODE_VECTOR_BOOL
+		&& const_vlmax_p (recog_data.operand_mode[i]))
+	      return gen_int_mode (GET_MODE_NUNITS (recog_data.operand_mode[i]),
+				   Pmode);
+	}
+      return RVV_VLMAX;
+    }
 
-/* Get default tail policy.  */
-static bool
-get_default_ta ()
-{
-  /* For the instruction that doesn't require TA, we still need a default value
-     to emit vsetvl. We pick up the default value according to prefer policy. */
-  return (bool) (get_prefer_tail_policy () & 0x1
-		 || (get_prefer_tail_policy () >> 1 & 0x1));
+  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
 }
 
 /* Get default mask policy.  */
@@ -407,16 +371,6 @@ get_default_ma ()
 		 || (get_prefer_mask_policy () >> 1 & 0x1));
 }
 
-/* Helper function to get TA operand.  */
-static bool
-tail_agnostic_p (rtx_insn *rinsn)
-{
-  /* If it doesn't have TA, we return agnostic by default.  */
-  extract_insn_cached (rinsn);
-  int ta = get_attr_ta (rinsn);
-  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
-}
-
 /* Helper function to get MA operand.  */
 static bool
 mask_agnostic_p (rtx_insn *rinsn)
@@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
   return true;
 }
 
-/* Change insn and Assert the change always happens.  */
-static void
-validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
-{
-  bool change_p = validate_change (object, loc, new_rtx, in_group);
-  gcc_assert (change_p);
-}
-
 /* This flags indicates the minimum demand of the vl and vtype values by the
    RVV instruction. For example, DEMAND_RATIO_P indicates that this RVV
    instruction only needs the SEW/LMUL ratio to remain the same, and does not
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index dd17056fe82..08de62853a6 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -69,6 +69,12 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/riscv/riscv-vsetvl.cc
 
+riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
+  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h 
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+		$(srcdir)/config/riscv/riscv-avlprop.cc
+
 riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
   $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ef91950178f..0c59d1b90bc 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -809,7 +809,7 @@
 			  V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
 			  V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
 			  V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
-	   (symbol_ref "riscv_vector::NONVLMAX")
+	   (symbol_ref "riscv_vector::VLS")
 	(eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
 			  vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
 			  vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
index 928a507a363..5278e4aa38f 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
@@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
     }
 }
 
-/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler {e16,m2} } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
 /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
index a50265fc1ec..1db2e073846 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
@@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict b, int n)
     a[i] = a[i] + b[i];
 }
 
-/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler {e16,m4} } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
 /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
index eac7cbc757b..ca88d42cdf4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
@@ -7,10 +7,11 @@
 /*
 ** foo:
 **	vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+**	...
 **	vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
 **	...
-**	vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
-**	add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
+**	vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+**	...
 **	vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
 **	...
 */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
index 965365da4bb..13367423751 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
@@ -3,7 +3,6 @@
 
 #include "ternop-2.c"
 
-/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
 /* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
 /* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized" } } */
 /* { dg-final { scan-assembler-not {\tvmv} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
new file mode 100644
index 00000000000..b0d21650c3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
new file mode 100644
index 00000000000..f2d8aa54b88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c,
+     int *__restrict a2, int *__restrict b2, int *__restrict c2,
+     int *__restrict a3, int *__restrict b3, int *__restrict c3,
+     int *__restrict a4, int *__restrict b4, int *__restrict c4,
+     int *__restrict a5, int *__restrict b5, int *__restrict c5,
+     int *__restrict d, int *__restrict d2, int *__restrict d3,
+     int *__restrict d4, int *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 674ba0d72b4..fc830f2cd4d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
 	"" $CFLAGS
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \
 	"-O3 -ftree-vectorize" $CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/avlprop/*.\[cS\]]] \
+	"-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
 	"-O3 -ftree-vectorize --param riscv-autovec-preference=scalable" $CFLAGS
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
-- 
2.36.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-24  3:32 [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization Juzhe-Zhong
@ 2023-10-24  3:44 ` juzhe.zhong
  2023-10-24  4:30   ` Patrick O'Neill
  2023-10-24 15:26 ` Kito Cheng
  1 sibling, 1 reply; 13+ messages in thread
From: juzhe.zhong @ 2023-10-24  3:44 UTC (permalink / raw)
  To: 钟居哲, gcc-patches
  Cc: kito.cheng, Kito.cheng, jeffreyalaw, Robin Dapp, Patrick O'Neill

[-- Attachment #1: Type: text/plain, Size: 41578 bytes --]

CCing Patrick...

Hi, @Patrick.
Could you apply this patch and trigger your regression CI?

I don't have an environment to test fortran for now (I only test it on C/C++).

Thanks. 



juzhe.zhong@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-24 11:32
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization
which is a known issue for a long time and I finally find the time to address it.
 
Consider a simple vector addition operation:
 
https://godbolt.org/z/7hfGfEjW3
 
void
foo (int *__restrict a,
     int *__restrict b,
     int *__restrict n)
{
  for (int i = 0; i < n; i++)
      a[i] = a[i] + b[i];
}
 
Optimized IR:
 
Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);    -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);   -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                              -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
 
We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment:
 
vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
 
GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization.
Such flow are used by all other targets like ARM SVE (RVV also uses such flow):
 
ARM SVE:
   
.L3:
        ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
        add     z31.s, z31.s, z30.s            -> un-predicated add
        st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store
 
Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it.
 
Also, It's very unlikely that we can apply predicated operations on all vectorization for following reasons:
 
1. It's very heavy workload to support them on all vectorization and we don't see any benefits if we can handle that on targets backend.
2. Changing Loop vectorizer for it will make code base ugly and hard to maintain.
3. We will need so many patterns for all operations. Not only COND_LEN_ADD, COND_LEN_SUB, ....
   We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over 100+ patterns, unreasonable number of patterns.
 
To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS for it to elide the redundant vsetvls
due to AVL/VL toggling.
 
The second question is that why we separate a PASS called AVL propagation. Why not optimize it in VSETVL PASS (We definitetly can optimize AVL in VSETVL PASS)
 
Frankly, I was planning to address such issue in VSETVL PASS that's why we recently refactored VSETVL PASS. However, I changed my mind recently after several
experiments and tries.
 
The reasons as follows:
 
1. For code base management and maintainience. Current VSETVL PASS is complicated enough and aleady has enough aggressive and fancy optimizations which
   turns out it can always generate optimal codegen in most of the cases. It's not a good idea keep adding more features into VSETVL PASS to make VSETVL
PASS become heavy and heavy again, then we will need to refactor it again in the future.
Actuall, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change VSETVL PASS any more except the minor
fixes.
 
2. vsetvl insertion (VSETVL PASS does this thing) and AVL propagation are 2 different things,  I don't think we should fuse them into same PASS.
 
3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be done before RA which can reduce register allocation.
 
4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
   This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements.
We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate 
VSETVL PASS again which is already so complicated.) 
 
Here is an example to demonstrate more:
 
https://godbolt.org/z/bE86sv3q5
 
void foo2 (int *__restrict a,
          int *__restrict b,
          int *__restrict c,
          int *__restrict a2,
          int *__restrict b2,
          int *__restrict c2,
          int *__restrict a3,
          int *__restrict b3,
          int *__restrict c3,
          int *__restrict a4,
          int *__restrict b4,
          int *__restrict c4,
          int *__restrict a5,
          int *__restrict b5,
          int *__restrict c5,
          int n)
{
    for (int i = 0; i < n; i++){
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i]+ a[i];
 
      a[i] = a[i] + c[i];
      b5[i] = a[i] + c[i];
      a2[i] = a[i] + c2[i];
      a3[i] = a[i] + c3[i];
      a4[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i]+ a[i];
    }
}
 
1. Loop Body:
 
Before this patch:                                          After this patch:
  
      vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli a4,t1,e32,m1,ta,ma                                     
        vle32.v v2,0(a2)                                     vle32.v v2,0(a2)
        vle32.v v4,0(a1)                                     vle32.v v3,0(t2)
        vle32.v v1,0(t2)                                     vle32.v v4,0(a1)
        vsetvli a7,zero,e32,m1,ta,ma                         vle32.v v1,0(t0)
        vadd.vv v4,v2,v4                                     vadd.vv v4,v2,v4
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v1,v3,v1
        vle32.v v3,0(s0)                                     vadd.vv v1,v1,v4
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v1,v1,v4
        vadd.vv v1,v3,v1                                     vadd.vv v1,v1,v4
        vadd.vv v1,v1,v4                                     vadd.vv v1,v1,v2
        vadd.vv v1,v1,v4                                     vadd.vv v2,v1,v2
        vadd.vv v1,v1,v4                                     vse32.v v2,0(t5)
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v2,v2,v1
        vle32.v v4,0(a5)                                     vadd.vv v2,v2,v1
        vsetvli a7,zero,e32,m1,ta,ma                         slli a7,a4,2
        vadd.vv v1,v1,v2                                     vadd.vv v3,v1,v3
        vadd.vv v2,v1,v2                                     vle32.v v5,0(a5)
        vadd.vv v4,v1,v4                                     vle32.v v6,0(t6)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v3,0(t3)
        vse32.v v2,0(t5)                                     vse32.v v2,0(a0)
        vse32.v v4,0(a3)                                     vadd.vv v3,v3,v1
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v2,v1,v5
        vadd.vv v3,v1,v3                                     vse32.v v3,0(t4)
        vadd.vv v2,v2,v1                                     vadd.vv v1,v1,v6
        vadd.vv v2,v2,v1                                     vse32.v v2,0(a3)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v1,0(a6)
        vse32.v v2,0(a0)                                      
        vse32.v v3,0(t3)                                      
        vle32.v v2,0(t0)                                      
        vsetvli a7,zero,e32,m1,ta,ma                                      
        vadd.vv v3,v3,v1                                      
        vsetvli zero,a4,e32,m1,ta,ma                                      
        vse32.v v3,0(t4)                                      
        vsetvli a7,zero,e32,m1,ta,ma                                      
        slli    a7,a4,2                                      
        vadd.vv v1,v1,v2                                      
        sub     t1,t1,a4                                      
        vsetvli zero,a4,e32,m1,ta,ma                                      
        vse32.v v1,0(a6)                                      
 
It's quite obvious, all heavy && redundant vsetvls inside loop body are eliminated.
 
2. Epilogue:
    Before this patch:                                          After this patch:
 
     .L5:                                                      .L5:                                           
        ld      s0,8(sp)                                         ret
        addi    sp,sp,16                                         
        jr      ra                                         
 
This is the benefit we do the AVL propation before RA since we eliminate the use of 'a7' register
which is used by the redudant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'
 
The final codegen after this patch:
 
foo2:
lw t1,56(sp)
ld t6,0(sp)
ld t3,8(sp)
ld t0,16(sp)
ld t2,24(sp)
ld t4,32(sp)
ld t5,40(sp)
ble t1,zero,.L5
.L3:
vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2)
vle32.v v3,0(t2)
vle32.v v4,0(a1)
vle32.v v1,0(t0)
vadd.vv v4,v2,v4
vadd.vv v1,v3,v1
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v2
vadd.vv v2,v1,v2
vse32.v v2,0(t5)
vadd.vv v2,v2,v1
vadd.vv v2,v2,v1
slli a7,a4,2
vadd.vv v3,v1,v3
vle32.v v5,0(a5)
vle32.v v6,0(t6)
vse32.v v3,0(t3)
vse32.v v2,0(a0)
vadd.vv v3,v3,v1
vadd.vv v2,v1,v5
vse32.v v3,0(t4)
vadd.vv v1,v1,v6
vse32.v v2,0(a3)
vse32.v v1,0(a6)
sub t1,t1,a4
add a1,a1,a7
add a2,a2,a7
add a5,a5,a7
add t6,t6,a7
add t0,t0,a7
add t2,t2,a7
add t5,t5,a7
add a3,a3,a7
add a6,a6,a7
add t3,t3,a7
add t4,t4,a7
add a0,a0,a7
bne t1,zero,.L3
.L5:
ret
 
PR target/111888
 
gcc/ChangeLog:
 
* config.gcc: Add AVL propgatation PASS.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
(has_vtype_op): Export as global.
(has_vl_op): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(const_vlmax_p): Ditto.
* config/riscv/riscv-v.cc (has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(get_vlmul): Ditto.
* config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
(has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_sew): Ditto.
(get_vlmul): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
* config/riscv/t-riscv: Add AVL propagation PASS.
* config/riscv/vector.md: Fix VLS modes attribute.
* config/riscv/riscv-avlprop.cc: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
* gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
* gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.
 
---
gcc/config.gcc                                |   2 +-
gcc/config/riscv/riscv-avlprop.cc             | 350 ++++++++++++++++++
gcc/config/riscv/riscv-passes.def             |   1 +
gcc/config/riscv/riscv-protos.h               |  10 +
gcc/config/riscv/riscv-v.cc                   |  84 ++++-
gcc/config/riscv/riscv-vsetvl.cc              |  82 +---
gcc/config/riscv/t-riscv                      |   6 +
gcc/config/riscv/vector.md                    |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +-
.../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +-
.../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 -
.../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
.../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
15 files changed, 514 insertions(+), 84 deletions(-)
create mode 100644 gcc/config/riscv/riscv-avlprop.cc
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 606d3a8513e..efd53965c9a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -544,7 +544,7 @@ pru-*-*)
riscv*)
cpu_type=riscv
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
- extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
+ extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o riscv-avlprop.o"
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o"
d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-avlprop.cc b/gcc/config/riscv/riscv-avlprop.cc
new file mode 100644
index 00000000000..bf3becd8371
--- /dev/null
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -0,0 +1,350 @@
+/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2023-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI Technologies Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or(at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions.
+   A standalone AVL propagation pass is designed because:
+
+     - Better code maintain:
+       Current LCM-based VSETVL pass is so complicated that codes
+       there will become even harder to maintain. A straight forward
+       AVL propagation PASS is much easier to maintain.
+
+     - Reduce scalar register pressure:
+       A type of AVL propagation is we propagate AVL from NON-VLMAX
+       instruction to VLMAX instruction.
+       Note: VLMAX instruction should be ignore tail elements (TA)
+       and the result should be used by the NON-VLMAX instruction.
+       This optimization is mostly for auto-vectorization codes:
+
+   vsetvli r136, r137      --- SELECT_VL
+   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
+   vadd.vv (use VLMAX)     --- PLUS_EXPR
+   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
+
+ NO AVL propation:
+
+   vsetvli a5, a4, ta
+   vle8.v v1
+   vsetvli t0, zero, ta
+   vadd.vv v2, v1, v1
+   vse8.v v2
+
+ We can propagate the AVL to 'vadd.vv' since its result
+ is consumed by a 'vse8.v' which has AVL = a5 and its
+ tail elements are agnostic.
+
+       We DON'T do this optimization on VSETVL pass since it is a
+       post-RA pass that consumed 't0' already wheras a standalone
+       pre-RA AVL propagation pass allows us elide the consumption
+       of the pseudo register of 't0' then we can reduce scalar
+       register pressure.
+
+     - More AVL propagation opportunities:
+       A pre-RA pass is more flexible for AVL REG def-use chain,
+       thus we will get more potential AVL propagation as long as
+       it doesn't increase the scalar register pressure.
+*/
+
+#define IN_TARGET_CODE 1
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "target.h"
+#include "tree-pass.h"
+#include "df.h"
+#include "rtl-ssa.h"
+#include "cfgcleanup.h"
+#include "insn-attr.h"
+
+using namespace rtl_ssa;
+using namespace riscv_vector;
+
+/* The AVL propagation instructions and corresponding preferred AVL.
+   It will be updated during the analysis.  */
+static hash_map<insn_info *, rtx> *avlprops;
+
+const pass_data pass_data_avlprop = {
+  RTL_PASS, /* type */
+  "avlprop", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_avlprop : public rtl_opt_pass
+{
+public:
+  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) final override
+  {
+    return TARGET_VECTOR && optimize > 0;
+  }
+  virtual unsigned int execute (function *) final override;
+}; // class pass_avlprop
+
+static void
+avlprop_init (void)
+{
+  calculate_dominance_info (CDI_DOMINATORS);
+  df_analyze ();
+  crtl->ssa = new function_info (cfun);
+  avlprops = new hash_map<insn_info *, rtx>;
+}
+
+static void
+avlprop_done (void)
+{
+  free_dominance_info (CDI_DOMINATORS);
+  if (crtl->ssa->perform_pending_updates ())
+    cleanup_cfg (0);
+  delete crtl->ssa;
+  crtl->ssa = nullptr;
+  delete avlprops;
+  avlprops = NULL;
+}
+
+/* Helper function to get AVL operand.  */
+static rtx
+get_avl (insn_info *insn, bool avlprop_p)
+{
+  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
+      || get_attr_avl_type (insn->rtl ()) == VLS)
+    return NULL_RTX;
+  if (avlprop_p)
+    {
+      if (avlprops->get (insn))
+ return (*avlprops->get (insn));
+      else if (vlmax_avl_type_p (insn->rtl ()))
+ return RVV_VLMAX;
+    }
+  extract_insn_cached (insn->rtl ());
+  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
+}
+
+/* This is a straight forward pattern ALWAYS in paritial auto-vectorization:
+
+     VL = SELECT_AVL (AVL, ...)
+     V0 = MASK_LEN_LOAD (..., VL)
+     V1 = MASK_LEN_LOAD (..., VL)
+     V2 = V0 + V1 --- Missed LEN information.
+     MASK_LEN_STORE (..., V2, VL)
+
+   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1, dummy LEN)
+   because:
+
+     - Few code changes in Loop Vectorizer.
+     - Reuse the current clean flow of partial vectorization, That is, apply
+       predicate LEN or MASK into LOAD/STORE operations and other special
+       arithmetic operations (e.d. DIV), then do the whole vector register
+       operation if it DON'T affect the correctness.
+       Such flow is used by all other targets like x86, sve, s390, ... etc.
+     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
+
+   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like PLUS_EXPR which
+   generates the VLMAX instruction due to missed LEN information. The later
+   VSETVL PASS will elided the redundant vsetvls.
+*/
+
+static rtx
+get_autovectorize_preferred_avl (insn_info *insn)
+{
+  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p (insn->rtl ()))
+    return NULL_RTX;
+
+  rtx use_avl = NULL_RTX;
+  insn_info *avl_use_insn = nullptr;
+  unsigned int ratio
+    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul (insn->rtl ()));
+  for (def_info *def : insn->defs ())
+    {
+      auto set = safe_dyn_cast<set_info *> (def);
+      if (!set || !set->is_reg ())
+ return NULL_RTX;
+      for (use_info *use : set->all_uses ())
+ {
+   if (!use->is_in_nondebug_insn ())
+     return NULL_RTX;
+   insn_info *use_insn = use->insn ();
+   /* FIXME: Stop AVL propagation if any USE is not a RVV real
+      instruction. It should be totally enough for vectorized codes since
+      they always locate at extended blocks.
+
+      TODO: We can extend PHI checking for intrinsic codes if it
+      necessary in the future.  */
+   if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!has_vl_op (use_insn->rtl ()))
+     continue;
+
+   rtx new_use_avl = get_avl (use_insn, true);
+   if (!new_use_avl)
+     return NULL_RTX;
+   if (!use_avl)
+     use_avl = new_use_avl;
+   if (!rtx_equal_p (use_avl, new_use_avl)
+       || calculate_ratio (get_sew (use_insn->rtl ()),
+   get_vlmul (use_insn->rtl ()))
+    != ratio
+       || vlmax_avl_p (new_use_avl)
+       || !tail_agnostic_p (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!avl_use_insn)
+     avl_use_insn = use_insn;
+ }
+    }
+
+  if (use_avl && register_operand (use_avl, Pmode))
+    {
+      gcc_assert (avl_use_insn);
+      // Find a definition at or neighboring INSN.
+      resource_info resource = full_register (REGNO (use_avl));
+      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
+      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
+      if (dl1.matching_set () || dl2.matching_set ())
+ return NULL_RTX;
+      def_info *def1 = dl1.last_def_of_prev_group ();
+      def_info *def2 = dl2.last_def_of_prev_group ();
+      if (def1 != def2)
+ return NULL_RTX;
+      /* FIXME: We only all AVL propation within a block which should
+ be totally enough for vectorized codes.
+
+ TODO: We can enhance it here for intrinsic codes in the future
+ if it is necessary.  */
+      if (def1->insn ()->bb () != insn->bb ()
+   || def1->insn ()->compare_with (insn) >= 0)
+ return NULL_RTX;
+    }
+  return use_avl;
+}
+
+/* If we have a preferred AVL to propagate, return the AVL.
+   Otherwise, return NULL_RTX as we don't need have any preferred
+   AVL.  */
+
+static rtx
+get_preferred_avl (insn_info *insn)
+{
+  /* TODO: We only do AVL propagation for missed-LEN partial
+     autovectorization for now.  We could add more more AVL
+     propagation for intrinsic codes in the future.  */
+  return get_autovectorize_preferred_avl (insn);
+}
+
+/* Return the AVL TYPE operand index.  */
+static int
+get_avl_type_index (insn_info *insn)
+{
+  extract_insn_cached (insn->rtl ());
+  /* Except rounding mode patterns, AVL TYPE operand
+     is always the last operand.  */
+  if (find_access (insn->uses (), VXRM_REGNUM)
+      || find_access (insn->uses (), FRM_REGNUM))
+    return recog_data.n_operands - 2;
+  return recog_data.n_operands - 1;
+}
+
+/* Main entry point for this pass.  */
+unsigned int
+pass_avlprop::execute (function *)
+{
+  avlprop_init ();
+
+  /* Go through all the instructions looking for AVL that we could propagate. */
+
+  insn_info *next;
+  bool change_p = true;
+
+  while (change_p)
+    {
+      /* Iterate on each instruction until no more change need.  */
+      change_p = false;
+      for (insn_info *insn = crtl->ssa->first_insn (); insn; insn = next)
+ {
+   next = insn->next_any_insn ();
+   /* We only forward AVL to the instruction that has AVL/VL operand
+      and can be optimized in RTL_SSA level.  */
+   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
+     continue;
+
+   rtx new_avl = get_preferred_avl (insn);
+   if (new_avl)
+     {
+       gcc_assert (!vlmax_avl_p (new_avl));
+       auto &update = avlprops->get_or_insert (insn);
+       change_p = !rtx_equal_p (update, new_avl);
+       update = new_avl;
+     }
+ }
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "\nNumber of successful AVL propagations: %d\n\n",
+      (int) avlprops->elements ());
+
+  for (const auto iter : *avlprops)
+    {
+      rtx_insn *rinsn = iter.first->rtl ();
+      if (dump_file)
+ {
+   fprintf (dump_file, "\nPropagating AVL: ");
+   print_rtl_single (dump_file, iter.second);
+   fprintf (dump_file, "into: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+      /* Replace AVL operand.  */
+      rtx new_pat
+ = simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first, false),
+ iter.second);
+      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
+
+      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
+      if (vlmax_avl_type_p (rinsn))
+ validate_change_or_fail (
+   rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)],
+   get_avl_type_rtx (avl_type::NONVLMAX), false);
+      if (dump_file)
+ {
+   fprintf (dump_file, "Successfully to match this instruction: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+    }
+
+  avlprop_done ();
+  return 0;
+}
+
+rtl_opt_pass *
+make_pass_avlprop (gcc::context *ctxt)
+{
+  return new pass_avlprop (ctxt);
+}
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
index 4084122cf0a..b6260939d5c 100644
--- a/gcc/config/riscv/riscv-passes.def
+++ b/gcc/config/riscv/riscv-passes.def
@@ -18,4 +18,5 @@
    <http://www.gnu.org/licenses/>.  */
INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
+INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6cb9d459ee9..2b09ec9ea9e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const char *, struct gcc_options *, locatio
extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
/* Routines implemented in riscv-string.c.  */
@@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
bool cmp_lmul_gt_one (machine_mode);
bool gather_scatter_valid_offset_mode_p (machine_mode);
bool vls_mode_valid_p (machine_mode);
+bool has_vtype_op (rtx_insn *);
+bool has_vl_op (rtx_insn *);
+bool tail_agnostic_p (rtx_insn *);
+void validate_change_or_fail (rtx, rtx *, rtx, bool);
+bool vlmax_avl_type_p (rtx_insn *);
+bool vlmax_avl_p (rtx);
+uint8_t get_sew (rtx_insn *);
+enum vlmul_type get_vlmul (rtx_insn *);
+bool const_vlmax_p (machine_mode);
}
/* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e39a9507803..473622ac321 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -56,7 +56,7 @@ using namespace riscv_vector;
namespace riscv_vector {
/* Return true if vlmax is constant value and can be used in vsetivl.  */
-static bool
+bool
const_vlmax_p (machine_mode mode)
{
   poly_uint64 nuints = GET_MODE_NUNITS (mode);
@@ -298,14 +298,6 @@ public:
      len = force_reg (Pmode, len);
    vls_p = true;
  }
- else if (const_vlmax_p (vtype_mode))
-   {
-     /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
-        the vsetvli to obtain the value of vlmax.  */
-     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
-     len = gen_int_mode (nunits, Pmode);
-     vls_p = true;
-   }
else if (can_create_pseudo_p ())
  {
    len = gen_reg_rtx (Pmode);
@@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
   emit_move_insn (dst, x4);
}
+/* Return true if it is an RVV instruction depends on VTYPE global
+   status register.  */
+bool
+has_vtype_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
+}
+
+/* Return true if it is an RVV instruction depends on VL global
+   status register.  */
+bool
+has_vl_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
+}
+
+/* Get default tail policy.  */
+static bool
+get_default_ta ()
+{
+  /* For the instruction that doesn't require TA, we still need a default value
+     to emit vsetvl. We pick up the default value according to prefer policy. */
+  return (bool) (get_prefer_tail_policy () & 0x1
+ || (get_prefer_tail_policy () >> 1 & 0x1));
+}
+
+/* Helper function to get TA operand.  */
+bool
+tail_agnostic_p (rtx_insn *rinsn)
+{
+  /* If it doesn't have TA, we return agnostic by default.  */
+  extract_insn_cached (rinsn);
+  int ta = get_attr_ta (rinsn);
+  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
+}
+
+/* Change insn and Assert the change always happens.  */
+void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
+
+/* Return true if it is VLMAX AVL TYPE.  */
+bool
+vlmax_avl_type_p (rtx_insn *rinsn)
+{
+  return get_attr_avl_type (rinsn) == VLMAX;
+}
+
+/* Return true if RTX is RVV VLMAX AVL.  */
+bool
+vlmax_avl_p (rtx x)
+{
+  return x && rtx_equal_p (x, RVV_VLMAX);
+}
+
+/* Helper function to get SEW operand. We always have SEW value for
+   all RVV instructions that have VTYPE OP.  */
+uint8_t
+get_sew (rtx_insn *rinsn)
+{
+  return get_attr_sew (rinsn);
+}
+
+/* Helper function to get VLMUL operand. We always have VLMUL value for
+   all RVV instructions that have VTYPE OP. */
+enum vlmul_type
+get_vlmul (rtx_insn *rinsn)
+{
+  return (enum vlmul_type) get_attr_vlmul (rinsn);
+}
+
} // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index e9dd669de98..f2f19e423bf 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
   return agnostic_p ? "agnostic" : "undisturbed";
}
-static bool
-vlmax_avl_p (rtx x)
-{
-  return x && rtx_equal_p (x, RVV_VLMAX);
-}
-
-/* Return true if it is an RVV instruction depends on VTYPE global
-   status register.  */
-static bool
-has_vtype_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
-}
-
-/* Return true if it is an RVV instruction depends on VL global
-   status register.  */
-static bool
-has_vl_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
-}
-
/* Return true if the instruction ignores VLMUL field of VTYPE.  */
static bool
ignore_vlmul_insn_p (rtx_insn *rinsn)
@@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
   if (!has_vl_op (rinsn))
     return NULL_RTX;
-  if (get_attr_avl_type (rinsn) == VLMAX)
-    return RVV_VLMAX;
-  extract_insn_cached (rinsn);
-  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
-}
-/* Helper function to get SEW operand. We always have SEW value for
-   all RVV instructions that have VTYPE OP.  */
-static uint8_t
-get_sew (rtx_insn *rinsn)
-{
-  return get_attr_sew (rinsn);
-}
-
-/* Helper function to get VLMUL operand. We always have VLMUL value for
-   all RVV instructions that have VTYPE OP. */
-static enum vlmul_type
-get_vlmul (rtx_insn *rinsn)
-{
-  return (enum vlmul_type) get_attr_vlmul (rinsn);
-}
+  extract_insn_cached (rinsn);
+  if (vlmax_avl_type_p (rinsn))
+    {
+      if (BYTES_PER_RISCV_VECTOR.is_constant ())
+ {
+   for (int i = 0; i < recog_data.n_operands; i++)
+     if (GET_MODE_CLASS (recog_data.operand_mode[i]) == MODE_VECTOR_BOOL
+ && const_vlmax_p (recog_data.operand_mode[i]))
+       return gen_int_mode (GET_MODE_NUNITS (recog_data.operand_mode[i]),
+    Pmode);
+ }
+      return RVV_VLMAX;
+    }
-/* Get default tail policy.  */
-static bool
-get_default_ta ()
-{
-  /* For the instruction that doesn't require TA, we still need a default value
-     to emit vsetvl. We pick up the default value according to prefer policy. */
-  return (bool) (get_prefer_tail_policy () & 0x1
- || (get_prefer_tail_policy () >> 1 & 0x1));
+  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
}
/* Get default mask policy.  */
@@ -407,16 +371,6 @@ get_default_ma ()
|| (get_prefer_mask_policy () >> 1 & 0x1));
}
-/* Helper function to get TA operand.  */
-static bool
-tail_agnostic_p (rtx_insn *rinsn)
-{
-  /* If it doesn't have TA, we return agnostic by default.  */
-  extract_insn_cached (rinsn);
-  int ta = get_attr_ta (rinsn);
-  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
-}
-
/* Helper function to get MA operand.  */
static bool
mask_agnostic_p (rtx_insn *rinsn)
@@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
   return true;
}
-/* Change insn and Assert the change always happens.  */
-static void
-validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
-{
-  bool change_p = validate_change (object, loc, new_rtx, in_group);
-  gcc_assert (change_p);
-}
-
/* This flags indicates the minimum demand of the vl and vtype values by the
    RVV instruction. For example, DEMAND_RATIO_P indicates that this RVV
    instruction only needs the SEW/LMUL ratio to remain the same, and does not
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index dd17056fe82..08de62853a6 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -69,6 +69,12 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/riscv/riscv-vsetvl.cc
+riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
+  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h 
+ $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+ $(srcdir)/config/riscv/riscv-avlprop.cc
+
riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
   $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ef91950178f..0c59d1b90bc 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -809,7 +809,7 @@
  V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
  V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
  V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
-    (symbol_ref "riscv_vector::NONVLMAX")
+    (symbol_ref "riscv_vector::VLS")
(eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
  vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
  vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
index 928a507a363..5278e4aa38f 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
@@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
     }
}
-/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler {e16,m2} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
index a50265fc1ec..1db2e073846 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
@@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict b, int n)
     a[i] = a[i] + b[i];
}
-/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler {e16,m4} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
index eac7cbc757b..ca88d42cdf4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
@@ -7,10 +7,11 @@
/*
** foo:
** vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
-** vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
-** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
+** vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
*/
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
index 965365da4bb..13367423751 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
@@ -3,7 +3,6 @@
#include "ternop-2.c"
-/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
/* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
/* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized" } } */
/* { dg-final { scan-assembler-not {\tvmv} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
new file mode 100644
index 00000000000..b0d21650c3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
new file mode 100644
index 00000000000..f2d8aa54b88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c,
+     int *__restrict a2, int *__restrict b2, int *__restrict c2,
+     int *__restrict a3, int *__restrict b3, int *__restrict c3,
+     int *__restrict a4, int *__restrict b4, int *__restrict c4,
+     int *__restrict a5, int *__restrict b5, int *__restrict c5,
+     int *__restrict d, int *__restrict d2, int *__restrict d3,
+     int *__restrict d4, int *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 674ba0d72b4..fc830f2cd4d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
"" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \
"-O3 -ftree-vectorize" $CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/avlprop/*.\[cS\]]] \
+ "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
"-O3 -ftree-vectorize --param riscv-autovec-preference=scalable" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
-- 
2.36.3
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-24  3:44 ` juzhe.zhong
@ 2023-10-24  4:30   ` Patrick O'Neill
  2023-10-24 15:03     ` Patrick O'Neill
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick O'Neill @ 2023-10-24  4:30 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: kito.cheng, Kito.cheng, jeffreyalaw, Robin Dapp

[-- Attachment #1: Type: text/plain, Size: 50284 bytes --]

The CI just picked it up: 
https://github.com/ewlu/gcc-precommit-ci/issues/449#issue-1958483272
Since it doesn't apply to the CI's baseline hash it's only performing a 
build.
I'll re-run it in the morning once the baseline has been updated.

In the meantime I started a full build+test run on my local machine.
I'll send you the results in ~10 hours - morning my time :-)

Patrick

On 10/23/23 20:44, juzhe.zhong@rivai.ai wrote:
> CCing Patrick...
>
> Hi, @Patrick.
> Could you apply this patch and trigger your regression CI?
>
> I don't have an environment to test fortran for now (I only test it on 
> C/C++).
>
> Thanks.
>
> ------------------------------------------------------------------------
> juzhe.zhong@rivai.ai
>
>     *From:* Juzhe-Zhong <mailto:juzhe.zhong@rivai.ai>
>     *Date:* 2023-10-24 11:32
>     *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
>     *CC:* kito.cheng <mailto:kito.cheng@gmail.com>; kito.cheng
>     <mailto:kito.cheng@sifive.com>; jeffreyalaw
>     <mailto:jeffreyalaw@gmail.com>; rdapp.gcc
>     <mailto:rdapp.gcc@gmail.com>; Juzhe-Zhong
>     <mailto:juzhe.zhong@rivai.ai>
>     *Subject:* [PATCH] RISC-V: Add AVL propagation PASS for RVV
>     auto-vectorization
>     This patch addresses the redundant AVL/VL toggling in RVV partial
>     auto-vectorization
>     which is a known issue for a long time and I finally find the time
>     to address it.
>     Consider a simple vector addition operation:
>     https://godbolt.org/z/7hfGfEjW3
>     void
>     foo (int *__restrict a,
>          int *__restrict b,
>          int *__restrict n)
>     {
>       for (int i = 0; i < n; i++)
>           a[i] = a[i] + b[i];
>     }
>     Optimized IR:
>     Loop body:
>       _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4,
>     4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
>       ...
>       vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... },
>     _38, 0);    -> vle32.v v2,0(a0)
>       vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... },
>     _38, 0);   -> vle32.v v1,0(a1)
>       vect__7.12_19 = vect__6.11_20 +
>     vect__4.8_27;                              -> vsetvli
>     a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
>       .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0,
>     vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
>     We can see 2 redundant vsetvls inside the loop body due to AVL/VL
>     toggling.
>     The AVL/VL toggling is because we are missing LEN information in
>     simple PLUS_EXPR GIMPLE assignment:
>     vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
>     GCC apply partial predicate load/store and un-predicated full
>     vector operation on partial vectorization.
>     Such flow are used by all other targets like ARM SVE (RVV also
>     uses such flow):
>     ARM SVE:
>     .L3:
>             ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
>             ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
>             add     z31.s, z31.s, z30.s            -> un-predicated add
>             st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store
>     Such vectorization flow causes AVL/VL toggling on RVV so we need
>     AVL propagation PASS for it.
>     Also, It's very unlikely that we can apply predicated operations
>     on all vectorization for following reasons:
>     1. It's very heavy workload to support them on all vectorization
>     and we don't see any benefits if we can handle that on targets
>     backend.
>     2. Changing Loop vectorizer for it will make code base ugly and
>     hard to maintain.
>     3. We will need so many patterns for all operations. Not only
>     COND_LEN_ADD, COND_LEN_SUB, ....
>        We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over
>     100+ patterns, unreasonable number of patterns.
>     To conclude, we prefer un-predicated operations here, and design a
>     nice and clean AVL propagation PASS for it to elide the redundant
>     vsetvls
>     due to AVL/VL toggling.
>     The second question is that why we separate a PASS called AVL
>     propagation. Why not optimize it in VSETVL PASS (We definitetly
>     can optimize AVL in VSETVL PASS)
>     Frankly, I was planning to address such issue in VSETVL PASS
>     that's why we recently refactored VSETVL PASS. However, I changed
>     my mind recently after several
>     experiments and tries.
>     The reasons as follows:
>     1. For code base management and maintainience. Current VSETVL PASS
>     is complicated enough and aleady has enough aggressive and fancy
>     optimizations which
>        turns out it can always generate optimal codegen in most of the
>     cases. It's not a good idea keep adding more features into VSETVL
>     PASS to make VSETVL
>     PASS become heavy and heavy again, then we will need to refactor
>     it again in the future.
>     Actuall, the VSETVL PASS is very stable and optimal after the
>     recent refactoring. Hopefully, we should not change VSETVL PASS
>     any more except the minor
>     fixes.
>     2. vsetvl insertion (VSETVL PASS does this thing) and AVL
>     propagation are 2 different things,  I don't think we should fuse
>     them into same PASS.
>     3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be
>     done before RA which can reduce register allocation.
>     4. This patch's AVL propagation PASS only does AVL propagation for
>     RVV partial auto-vectorization situations.
>        This patch's codes are only hundreds lines which is very
>     managable and can be very easily extended features and enhancements.
>     We can easily extend and enhance more AVL propagation in a clean
>     and separate PASS in the future. (If we do it on VSETVL PASS, we
>     will complicate
>     VSETVL PASS again which is already so complicated.)
>     Here is an example to demonstrate more:
>     https://godbolt.org/z/bE86sv3q5
>     void foo2 (int *__restrict a,
>               int *__restrict b,
>               int *__restrict c,
>               int *__restrict a2,
>               int *__restrict b2,
>               int *__restrict c2,
>               int *__restrict a3,
>               int *__restrict b3,
>               int *__restrict c3,
>               int *__restrict a4,
>               int *__restrict b4,
>               int *__restrict c4,
>               int *__restrict a5,
>               int *__restrict b5,
>               int *__restrict c5,
>               int n)
>     {
>         for (int i = 0; i < n; i++){
>           a[i] = b[i] + c[i];
>           b5[i] = b[i] + c[i];
>           a2[i] = b2[i] + c2[i];
>           a3[i] = b3[i] + c3[i];
>           a4[i] = b4[i] + c4[i];
>           a5[i] = a[i] + a4[i];
>           a[i] = a5[i] + b5[i]+ a[i];
>           a[i] = a[i] + c[i];
>           b5[i] = a[i] + c[i];
>           a2[i] = a[i] + c2[i];
>           a3[i] = a[i] + c3[i];
>           a4[i] = a[i] + c4[i];
>           a5[i] = a[i] + a4[i];
>           a[i] = a[i] + b5[i]+ a[i];
>         }
>     }
>     1. Loop Body:
>     Before this patch:                                          After
>     this patch:
>           vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli
>     a4,t1,e32,m1,ta,ma
>             vle32.v v2,0(a2)                                    
>     vle32.v v2,0(a2)
>             vle32.v v4,0(a1)                                    
>     vle32.v v3,0(t2)
>             vle32.v v1,0(t2)                                    
>     vle32.v v4,0(a1)
>             vsetvli a7,zero,e32,m1,ta,ma                        
>     vle32.v v1,0(t0)
>             vadd.vv v4,v2,v4                                    
>     vadd.vv v4,v2,v4
>             vsetvli zero,a4,e32,m1,ta,ma                        
>     vadd.vv v1,v3,v1
>             vle32.v v3,0(s0)                                    
>     vadd.vv v1,v1,v4
>             vsetvli a7,zero,e32,m1,ta,ma                        
>     vadd.vv v1,v1,v4
>             vadd.vv v1,v3,v1                                    
>     vadd.vv v1,v1,v4
>             vadd.vv v1,v1,v4                                    
>     vadd.vv v1,v1,v2
>             vadd.vv v1,v1,v4                                    
>     vadd.vv v2,v1,v2
>             vadd.vv v1,v1,v4                                    
>     vse32.v v2,0(t5)
>             vsetvli zero,a4,e32,m1,ta,ma                        
>     vadd.vv v2,v2,v1
>             vle32.v v4,0(a5)                                    
>     vadd.vv v2,v2,v1
>             vsetvli a7,zero,e32,m1,ta,ma                         slli
>     a7,a4,2
>             vadd.vv v1,v1,v2                                    
>     vadd.vv v3,v1,v3
>             vadd.vv v2,v1,v2                                    
>     vle32.v v5,0(a5)
>             vadd.vv v4,v1,v4                                    
>     vle32.v v6,0(t6)
>             vsetvli zero,a4,e32,m1,ta,ma                        
>     vse32.v v3,0(t3)
>             vse32.v v2,0(t5)                                    
>     vse32.v v2,0(a0)
>             vse32.v v4,0(a3)                                    
>     vadd.vv v3,v3,v1
>             vsetvli a7,zero,e32,m1,ta,ma                        
>     vadd.vv v2,v1,v5
>             vadd.vv v3,v1,v3                                    
>     vse32.v v3,0(t4)
>             vadd.vv v2,v2,v1                                    
>     vadd.vv v1,v1,v6
>             vadd.vv v2,v2,v1                                    
>     vse32.v v2,0(a3)
>             vsetvli zero,a4,e32,m1,ta,ma                        
>     vse32.v v1,0(a6)
>             vse32.v v2,0(a0)
>             vse32.v v3,0(t3)
>             vle32.v v2,0(t0)
>             vsetvli a7,zero,e32,m1,ta,ma
>             vadd.vv v3,v3,v1
>             vsetvli zero,a4,e32,m1,ta,ma
>             vse32.v v3,0(t4)
>             vsetvli a7,zero,e32,m1,ta,ma
>             slli a7,a4,2
>             vadd.vv v1,v1,v2
>             sub t1,t1,a4
>             vsetvli zero,a4,e32,m1,ta,ma
>             vse32.v v1,0(a6)
>     It's quite obvious, all heavy && redundant vsetvls inside loop
>     body are eliminated.
>     2. Epilogue:
>         Before this patch:                                         
>     After this patch:
>     .L5: .L5:
>             ld s0,8(sp)                                         ret
>             addi sp,sp,16
>             jr ra
>     This is the benefit we do the AVL propation before RA since we
>     eliminate the use of 'a7' register
>     which is used by the redudant AVL/VL toggling instruction:
>     'vsetvli a7,zero,e32,m1,ta,ma'
>     The final codegen after this patch:
>     foo2:
>     lw t1,56(sp)
>     ld t6,0(sp)
>     ld t3,8(sp)
>     ld t0,16(sp)
>     ld t2,24(sp)
>     ld t4,32(sp)
>     ld t5,40(sp)
>     ble t1,zero,.L5
>     .L3:
>     vsetvli a4,t1,e32,m1,ta,ma
>     vle32.v v2,0(a2)
>     vle32.v v3,0(t2)
>     vle32.v v4,0(a1)
>     vle32.v v1,0(t0)
>     vadd.vv v4,v2,v4
>     vadd.vv v1,v3,v1
>     vadd.vv v1,v1,v4
>     vadd.vv v1,v1,v4
>     vadd.vv v1,v1,v4
>     vadd.vv v1,v1,v2
>     vadd.vv v2,v1,v2
>     vse32.v v2,0(t5)
>     vadd.vv v2,v2,v1
>     vadd.vv v2,v2,v1
>     slli a7,a4,2
>     vadd.vv v3,v1,v3
>     vle32.v v5,0(a5)
>     vle32.v v6,0(t6)
>     vse32.v v3,0(t3)
>     vse32.v v2,0(a0)
>     vadd.vv v3,v3,v1
>     vadd.vv v2,v1,v5
>     vse32.v v3,0(t4)
>     vadd.vv v1,v1,v6
>     vse32.v v2,0(a3)
>     vse32.v v1,0(a6)
>     sub t1,t1,a4
>     add a1,a1,a7
>     add a2,a2,a7
>     add a5,a5,a7
>     add t6,t6,a7
>     add t0,t0,a7
>     add t2,t2,a7
>     add t5,t5,a7
>     add a3,a3,a7
>     add a6,a6,a7
>     add t3,t3,a7
>     add t4,t4,a7
>     add a0,a0,a7
>     bne t1,zero,.L3
>     .L5:
>     ret
>     PR target/111888
>     gcc/ChangeLog:
>     * config.gcc: Add AVL propgatation PASS.
>     * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
>     * config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
>     (has_vtype_op): Export as global.
>     (has_vl_op): Ditto.
>     (tail_agnostic_p): Ditto.
>     (validate_change_or_fail): Ditto.
>     (vlmax_avl_type_p): Ditto.
>     (vlmax_avl_p): Ditto.
>     (get_sew): Ditto.
>     (enum vlmul_type): Ditto.
>     (const_vlmax_p): Ditto.
>     * config/riscv/riscv-v.cc (has_vtype_op): Ditto.
>     (has_vl_op): Ditto.
>     (get_default_ta): Ditto.
>     (tail_agnostic_p): Ditto.
>     (validate_change_or_fail): Ditto.
>     (vlmax_avl_type_p): Ditto.
>     (vlmax_avl_p): Ditto.
>     (get_sew): Ditto.
>     (enum vlmul_type): Ditto.
>     (get_vlmul): Ditto.
>     * config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
>     (has_vtype_op): Ditto.
>     (has_vl_op): Ditto.
>     (get_sew): Ditto.
>     (get_vlmul): Ditto.
>     (get_default_ta): Ditto.
>     (tail_agnostic_p): Ditto.
>     (validate_change_or_fail): Ditto.
>     * config/riscv/t-riscv: Add AVL propagation PASS.
>     * config/riscv/vector.md: Fix VLS modes attribute.
>     * config/riscv/riscv-avlprop.cc: New file.
>     gcc/testsuite/ChangeLog:
>     * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
>     * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
>     * gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
>     * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
>     * gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
>     * gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
>     * gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.
>     ---
>     gcc/config.gcc                                |   2 +-
>     gcc/config/riscv/riscv-avlprop.cc             | 350 ++++++++++++++++++
>     gcc/config/riscv/riscv-passes.def             |   1 +
>     gcc/config/riscv/riscv-protos.h               |  10 +
>     gcc/config/riscv/riscv-v.cc                   |  84 ++++-
>     gcc/config/riscv/riscv-vsetvl.cc              |  82 +---
>     gcc/config/riscv/t-riscv                      |   6 +
>     gcc/config/riscv/vector.md                    |   2 +-
>     .../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +-
>     .../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +-
>     .../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +-
>     .../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 -
>     .../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
>     .../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
>     gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
>     15 files changed, 514 insertions(+), 84 deletions(-)
>     create mode 100644 gcc/config/riscv/riscv-avlprop.cc
>     create mode 100644
>     gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>     create mode 100644
>     gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>     diff --git a/gcc/config.gcc b/gcc/config.gcc
>     index 606d3a8513e..efd53965c9a 100644
>     --- a/gcc/config.gcc
>     +++ b/gcc/config.gcc
>     @@ -544,7 +544,7 @@ pru-*-*)
>     riscv*)
>     cpu_type=riscv
>     extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o
>     riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
>     - extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o
>     riscv-vector-costs.o"
>     + extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o
>     riscv-vector-costs.o riscv-avlprop.o"
>     extra_objs="${extra_objs} riscv-vector-builtins.o
>     riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>     extra_objs="${extra_objs} thead.o"
>     d_target_objs="riscv-d.o"
>     diff --git a/gcc/config/riscv/riscv-avlprop.cc
>     b/gcc/config/riscv/riscv-avlprop.cc
>     new file mode 100644
>     index 00000000000..bf3becd8371
>     --- /dev/null
>     +++ b/gcc/config/riscv/riscv-avlprop.cc
>     @@ -0,0 +1,350 @@
>     +/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler.
>     +   Copyright (C) 2023-2023 Free Software Foundation, Inc.
>     +   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI
>     Technologies Ltd.
>     +
>     +This file is part of GCC.
>     +
>     +GCC is free software; you can redistribute it and/or modify
>     +it under the terms of the GNU General Public License as published by
>     +the Free Software Foundation; either version 3, or(at your option)
>     +any later version.
>     +
>     +GCC is distributed in the hope that it will be useful,
>     +but WITHOUT ANY WARRANTY; without even the implied warranty of
>     +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>     +GNU General Public License for more details.
>     +
>     +You should have received a copy of the GNU General Public License
>     +along with GCC; see the file COPYING3.  If not see
>     +<http://www.gnu.org/licenses/>.  */
>     +
>     +/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions.
>     +   A standalone AVL propagation pass is designed because:
>     +
>     +     - Better code maintain:
>     +       Current LCM-based VSETVL pass is so complicated that codes
>     +       there will become even harder to maintain. A straight forward
>     +       AVL propagation PASS is much easier to maintain.
>     +
>     +     - Reduce scalar register pressure:
>     +       A type of AVL propagation is we propagate AVL from NON-VLMAX
>     +       instruction to VLMAX instruction.
>     +       Note: VLMAX instruction should be ignore tail elements (TA)
>     +       and the result should be used by the NON-VLMAX instruction.
>     +       This optimization is mostly for auto-vectorization codes:
>     +
>     +   vsetvli r136, r137      --- SELECT_VL
>     +   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
>     +   vadd.vv (use VLMAX)     --- PLUS_EXPR
>     +   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
>     +
>     + NO AVL propation:
>     +
>     +   vsetvli a5, a4, ta
>     +   vle8.v v1
>     +   vsetvli t0, zero, ta
>     +   vadd.vv v2, v1, v1
>     +   vse8.v v2
>     +
>     + We can propagate the AVL to 'vadd.vv' since its result
>     + is consumed by a 'vse8.v' which has AVL = a5 and its
>     + tail elements are agnostic.
>     +
>     +       We DON'T do this optimization on VSETVL pass since it is a
>     +       post-RA pass that consumed 't0' already wheras a standalone
>     +       pre-RA AVL propagation pass allows us elide the consumption
>     +       of the pseudo register of 't0' then we can reduce scalar
>     +       register pressure.
>     +
>     +     - More AVL propagation opportunities:
>     +       A pre-RA pass is more flexible for AVL REG def-use chain,
>     +       thus we will get more potential AVL propagation as long as
>     +       it doesn't increase the scalar register pressure.
>     +*/
>     +
>     +#define IN_TARGET_CODE 1
>     +#define INCLUDE_ALGORITHM
>     +#define INCLUDE_FUNCTIONAL
>     +
>     +#include "config.h"
>     +#include "system.h"
>     +#include "coretypes.h"
>     +#include "tm.h"
>     +#include "backend.h"
>     +#include "rtl.h"
>     +#include "target.h"
>     +#include "tree-pass.h"
>     +#include "df.h"
>     +#include "rtl-ssa.h"
>     +#include "cfgcleanup.h"
>     +#include "insn-attr.h"
>     +
>     +using namespace rtl_ssa;
>     +using namespace riscv_vector;
>     +
>     +/* The AVL propagation instructions and corresponding preferred AVL.
>     +   It will be updated during the analysis.  */
>     +static hash_map<insn_info *, rtx> *avlprops;
>     +
>     +const pass_data pass_data_avlprop = {
>     +  RTL_PASS, /* type */
>     +  "avlprop", /* name */
>     +  OPTGROUP_NONE, /* optinfo_flags */
>     +  TV_NONE, /* tv_id */
>     +  0, /* properties_required */
>     +  0, /* properties_provided */
>     +  0, /* properties_destroyed */
>     +  0, /* todo_flags_start */
>     +  0, /* todo_flags_finish */
>     +};
>     +
>     +class pass_avlprop : public rtl_opt_pass
>     +{
>     +public:
>     +  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass
>     (pass_data_avlprop, ctxt) {}
>     +
>     +  /* opt_pass methods: */
>     +  virtual bool gate (function *) final override
>     +  {
>     +    return TARGET_VECTOR && optimize > 0;
>     +  }
>     +  virtual unsigned int execute (function *) final override;
>     +}; // class pass_avlprop
>     +
>     +static void
>     +avlprop_init (void)
>     +{
>     +  calculate_dominance_info (CDI_DOMINATORS);
>     +  df_analyze ();
>     +  crtl->ssa = new function_info (cfun);
>     +  avlprops = new hash_map<insn_info *, rtx>;
>     +}
>     +
>     +static void
>     +avlprop_done (void)
>     +{
>     +  free_dominance_info (CDI_DOMINATORS);
>     +  if (crtl->ssa->perform_pending_updates ())
>     +    cleanup_cfg (0);
>     +  delete crtl->ssa;
>     +  crtl->ssa = nullptr;
>     +  delete avlprops;
>     +  avlprops = NULL;
>     +}
>     +
>     +/* Helper function to get AVL operand.  */
>     +static rtx
>     +get_avl (insn_info *insn, bool avlprop_p)
>     +{
>     +  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
>     +      || get_attr_avl_type (insn->rtl ()) == VLS)
>     +    return NULL_RTX;
>     +  if (avlprop_p)
>     +    {
>     +      if (avlprops->get (insn))
>     + return (*avlprops->get (insn));
>     +      else if (vlmax_avl_type_p (insn->rtl ()))
>     + return RVV_VLMAX;
>     +    }
>     +  extract_insn_cached (insn->rtl ());
>     +  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
>     +}
>     +
>     +/* This is a straight forward pattern ALWAYS in paritial
>     auto-vectorization:
>     +
>     +     VL = SELECT_AVL (AVL, ...)
>     +     V0 = MASK_LEN_LOAD (..., VL)
>     +     V1 = MASK_LEN_LOAD (..., VL)
>     +     V2 = V0 + V1 --- Missed LEN information.
>     +     MASK_LEN_STORE (..., V2, VL)
>     +
>     +   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1,
>     dummy LEN)
>     +   because:
>     +
>     +     - Few code changes in Loop Vectorizer.
>     +     - Reuse the current clean flow of partial vectorization,
>     That is, apply
>     +       predicate LEN or MASK into LOAD/STORE operations and other
>     special
>     +       arithmetic operations (e.d. DIV), then do the whole vector
>     register
>     +       operation if it DON'T affect the correctness.
>     +       Such flow is used by all other targets like x86, sve,
>     s390, ... etc.
>     +     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
>     +
>     +   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like
>     PLUS_EXPR which
>     +   generates the VLMAX instruction due to missed LEN information.
>     The later
>     +   VSETVL PASS will elided the redundant vsetvls.
>     +*/
>     +
>     +static rtx
>     +get_autovectorize_preferred_avl (insn_info *insn)
>     +{
>     +  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p
>     (insn->rtl ()))
>     +    return NULL_RTX;
>     +
>     +  rtx use_avl = NULL_RTX;
>     +  insn_info *avl_use_insn = nullptr;
>     +  unsigned int ratio
>     +    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul
>     (insn->rtl ()));
>     +  for (def_info *def : insn->defs ())
>     +    {
>     +      auto set = safe_dyn_cast<set_info *> (def);
>     +      if (!set || !set->is_reg ())
>     + return NULL_RTX;
>     +      for (use_info *use : set->all_uses ())
>     + {
>     +   if (!use->is_in_nondebug_insn ())
>     +     return NULL_RTX;
>     +   insn_info *use_insn = use->insn ();
>     +   /* FIXME: Stop AVL propagation if any USE is not a RVV real
>     +      instruction. It should be totally enough for vectorized
>     codes since
>     +      they always locate at extended blocks.
>     +
>     +      TODO: We can extend PHI checking for intrinsic codes if it
>     +      necessary in the future.  */
>     +   if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl
>     ()))
>     +     return NULL_RTX;
>     +   if (!has_vl_op (use_insn->rtl ()))
>     +     continue;
>     +
>     +   rtx new_use_avl = get_avl (use_insn, true);
>     +   if (!new_use_avl)
>     +     return NULL_RTX;
>     +   if (!use_avl)
>     +     use_avl = new_use_avl;
>     +   if (!rtx_equal_p (use_avl, new_use_avl)
>     +       || calculate_ratio (get_sew (use_insn->rtl ()),
>     +   get_vlmul (use_insn->rtl ()))
>     +    != ratio
>     +       || vlmax_avl_p (new_use_avl)
>     +       || !tail_agnostic_p (use_insn->rtl ()))
>     +     return NULL_RTX;
>     +   if (!avl_use_insn)
>     +     avl_use_insn = use_insn;
>     + }
>     +    }
>     +
>     +  if (use_avl && register_operand (use_avl, Pmode))
>     +    {
>     +      gcc_assert (avl_use_insn);
>     +      // Find a definition at or neighboring INSN.
>     +      resource_info resource = full_register (REGNO (use_avl));
>     +      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
>     +      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
>     +      if (dl1.matching_set () || dl2.matching_set ())
>     + return NULL_RTX;
>     +      def_info *def1 = dl1.last_def_of_prev_group ();
>     +      def_info *def2 = dl2.last_def_of_prev_group ();
>     +      if (def1 != def2)
>     + return NULL_RTX;
>     +      /* FIXME: We only all AVL propation within a block which should
>     + be totally enough for vectorized codes.
>     +
>     + TODO: We can enhance it here for intrinsic codes in the future
>     + if it is necessary.  */
>     +      if (def1->insn ()->bb () != insn->bb ()
>     +   || def1->insn ()->compare_with (insn) >= 0)
>     + return NULL_RTX;
>     +    }
>     +  return use_avl;
>     +}
>     +
>     +/* If we have a preferred AVL to propagate, return the AVL.
>     +   Otherwise, return NULL_RTX as we don't need have any preferred
>     +   AVL.  */
>     +
>     +static rtx
>     +get_preferred_avl (insn_info *insn)
>     +{
>     +  /* TODO: We only do AVL propagation for missed-LEN partial
>     +     autovectorization for now.  We could add more more AVL
>     +     propagation for intrinsic codes in the future.  */
>     +  return get_autovectorize_preferred_avl (insn);
>     +}
>     +
>     +/* Return the AVL TYPE operand index.  */
>     +static int
>     +get_avl_type_index (insn_info *insn)
>     +{
>     +  extract_insn_cached (insn->rtl ());
>     +  /* Except rounding mode patterns, AVL TYPE operand
>     +     is always the last operand.  */
>     +  if (find_access (insn->uses (), VXRM_REGNUM)
>     +      || find_access (insn->uses (), FRM_REGNUM))
>     +    return recog_data.n_operands - 2;
>     +  return recog_data.n_operands - 1;
>     +}
>     +
>     +/* Main entry point for this pass.  */
>     +unsigned int
>     +pass_avlprop::execute (function *)
>     +{
>     +  avlprop_init ();
>     +
>     +  /* Go through all the instructions looking for AVL that we
>     could propagate. */
>     +
>     +  insn_info *next;
>     +  bool change_p = true;
>     +
>     +  while (change_p)
>     +    {
>     +      /* Iterate on each instruction until no more change need.  */
>     +      change_p = false;
>     +      for (insn_info *insn = crtl->ssa->first_insn (); insn; insn
>     = next)
>     + {
>     +   next = insn->next_any_insn ();
>     +   /* We only forward AVL to the instruction that has AVL/VL operand
>     +      and can be optimized in RTL_SSA level.  */
>     +   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
>     +     continue;
>     +
>     +   rtx new_avl = get_preferred_avl (insn);
>     +   if (new_avl)
>     +     {
>     +       gcc_assert (!vlmax_avl_p (new_avl));
>     +       auto &update = avlprops->get_or_insert (insn);
>     +       change_p = !rtx_equal_p (update, new_avl);
>     +       update = new_avl;
>     +     }
>     + }
>     +    }
>     +
>     +  if (dump_file)
>     +    fprintf (dump_file, "\nNumber of successful AVL propagations:
>     %d\n\n",
>     +      (int) avlprops->elements ());
>     +
>     +  for (const auto iter : *avlprops)
>     +    {
>     +      rtx_insn *rinsn = iter.first->rtl ();
>     +      if (dump_file)
>     + {
>     +   fprintf (dump_file, "\nPropagating AVL: ");
>     +   print_rtl_single (dump_file, iter.second);
>     +   fprintf (dump_file, "into: ");
>     +   print_rtl_single (dump_file, rinsn);
>     + }
>     +      /* Replace AVL operand.  */
>     +      rtx new_pat
>     + = simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first,
>     false),
>     + iter.second);
>     +      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat,
>     false);
>     +
>     +      /* Change AVL TYPE into NONVLMAX if it is VLMAX. */
>     +      if (vlmax_avl_type_p (rinsn))
>     + validate_change_or_fail (
>     +   rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)],
>     +   get_avl_type_rtx (avl_type::NONVLMAX), false);
>     +      if (dump_file)
>     + {
>     +   fprintf (dump_file, "Successfully to match this instruction: ");
>     +   print_rtl_single (dump_file, rinsn);
>     + }
>     +    }
>     +
>     +  avlprop_done ();
>     +  return 0;
>     +}
>     +
>     +rtl_opt_pass *
>     +make_pass_avlprop (gcc::context *ctxt)
>     +{
>     +  return new pass_avlprop (ctxt);
>     +}
>     diff --git a/gcc/config/riscv/riscv-passes.def
>     b/gcc/config/riscv/riscv-passes.def
>     index 4084122cf0a..b6260939d5c 100644
>     --- a/gcc/config/riscv/riscv-passes.def
>     +++ b/gcc/config/riscv/riscv-passes.def
>     @@ -18,4 +18,5 @@
>     <http://www.gnu.org/licenses/>.  */
>     INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
>     +INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
>     INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
>     diff --git a/gcc/config/riscv/riscv-protos.h
>     b/gcc/config/riscv/riscv-protos.h
>     index 6cb9d459ee9..2b09ec9ea9e 100644
>     --- a/gcc/config/riscv/riscv-protos.h
>     +++ b/gcc/config/riscv/riscv-protos.h
>     @@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const
>     char *, struct gcc_options *, locatio
>     extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
>     rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
>     +rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
>     rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
>     /* Routines implemented in riscv-string.c.  */
>     @@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
>     bool cmp_lmul_gt_one (machine_mode);
>     bool gather_scatter_valid_offset_mode_p (machine_mode);
>     bool vls_mode_valid_p (machine_mode);
>     +bool has_vtype_op (rtx_insn *);
>     +bool has_vl_op (rtx_insn *);
>     +bool tail_agnostic_p (rtx_insn *);
>     +void validate_change_or_fail (rtx, rtx *, rtx, bool);
>     +bool vlmax_avl_type_p (rtx_insn *);
>     +bool vlmax_avl_p (rtx);
>     +uint8_t get_sew (rtx_insn *);
>     +enum vlmul_type get_vlmul (rtx_insn *);
>     +bool const_vlmax_p (machine_mode);
>     }
>     /* We classify builtin types into two classes:
>     diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
>     index e39a9507803..473622ac321 100644
>     --- a/gcc/config/riscv/riscv-v.cc
>     +++ b/gcc/config/riscv/riscv-v.cc
>     @@ -56,7 +56,7 @@ using namespace riscv_vector;
>     namespace riscv_vector {
>     /* Return true if vlmax is constant value and can be used in
>     vsetivl.  */
>     -static bool
>     +bool
>     const_vlmax_p (machine_mode mode)
>     {
>        poly_uint64 nuints = GET_MODE_NUNITS (mode);
>     @@ -298,14 +298,6 @@ public:
>           len = force_reg (Pmode, len);
>         vls_p = true;
>       }
>     - else if (const_vlmax_p (vtype_mode))
>     -   {
>     -     /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
>     -        the vsetvli to obtain the value of vlmax.  */
>     -     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
>     -     len = gen_int_mode (nunits, Pmode);
>     -     vls_p = true;
>     -   }
>     else if (can_create_pseudo_p ())
>       {
>         len = gen_reg_rtx (Pmode);
>     @@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
>        emit_move_insn (dst, x4);
>     }
>     +/* Return true if it is an RVV instruction depends on VTYPE global
>     +   status register.  */
>     +bool
>     +has_vtype_op (rtx_insn *rinsn)
>     +{
>     +  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op
>     (rinsn);
>     +}
>     +
>     +/* Return true if it is an RVV instruction depends on VL global
>     +   status register.  */
>     +bool
>     +has_vl_op (rtx_insn *rinsn)
>     +{
>     +  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
>     +}
>     +
>     +/* Get default tail policy.  */
>     +static bool
>     +get_default_ta ()
>     +{
>     +  /* For the instruction that doesn't require TA, we still need a
>     default value
>     +     to emit vsetvl. We pick up the default value according to
>     prefer policy. */
>     +  return (bool) (get_prefer_tail_policy () & 0x1
>     + || (get_prefer_tail_policy () >> 1 & 0x1));
>     +}
>     +
>     +/* Helper function to get TA operand.  */
>     +bool
>     +tail_agnostic_p (rtx_insn *rinsn)
>     +{
>     +  /* If it doesn't have TA, we return agnostic by default.  */
>     +  extract_insn_cached (rinsn);
>     +  int ta = get_attr_ta (rinsn);
>     +  return ta == INVALID_ATTRIBUTE ? get_default_ta () :
>     IS_AGNOSTIC (ta);
>     +}
>     +
>     +/* Change insn and Assert the change always happens.  */
>     +void
>     +validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool
>     in_group)
>     +{
>     +  bool change_p = validate_change (object, loc, new_rtx, in_group);
>     +  gcc_assert (change_p);
>     +}
>     +
>     +/* Return true if it is VLMAX AVL TYPE.  */
>     +bool
>     +vlmax_avl_type_p (rtx_insn *rinsn)
>     +{
>     +  return get_attr_avl_type (rinsn) == VLMAX;
>     +}
>     +
>     +/* Return true if RTX is RVV VLMAX AVL.  */
>     +bool
>     +vlmax_avl_p (rtx x)
>     +{
>     +  return x && rtx_equal_p (x, RVV_VLMAX);
>     +}
>     +
>     +/* Helper function to get SEW operand. We always have SEW value for
>     +   all RVV instructions that have VTYPE OP.  */
>     +uint8_t
>     +get_sew (rtx_insn *rinsn)
>     +{
>     +  return get_attr_sew (rinsn);
>     +}
>     +
>     +/* Helper function to get VLMUL operand. We always have VLMUL
>     value for
>     +   all RVV instructions that have VTYPE OP. */
>     +enum vlmul_type
>     +get_vlmul (rtx_insn *rinsn)
>     +{
>     +  return (enum vlmul_type) get_attr_vlmul (rinsn);
>     +}
>     +
>     } // namespace riscv_vector
>     diff --git a/gcc/config/riscv/riscv-vsetvl.cc
>     b/gcc/config/riscv/riscv-vsetvl.cc
>     index e9dd669de98..f2f19e423bf 100644
>     --- a/gcc/config/riscv/riscv-vsetvl.cc
>     +++ b/gcc/config/riscv/riscv-vsetvl.cc
>     @@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
>        return agnostic_p ? "agnostic" : "undisturbed";
>     }
>     -static bool
>     -vlmax_avl_p (rtx x)
>     -{
>     -  return x && rtx_equal_p (x, RVV_VLMAX);
>     -}
>     -
>     -/* Return true if it is an RVV instruction depends on VTYPE global
>     -   status register.  */
>     -static bool
>     -has_vtype_op (rtx_insn *rinsn)
>     -{
>     -  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op
>     (rinsn);
>     -}
>     -
>     -/* Return true if it is an RVV instruction depends on VL global
>     -   status register.  */
>     -static bool
>     -has_vl_op (rtx_insn *rinsn)
>     -{
>     -  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
>     -}
>     -
>     /* Return true if the instruction ignores VLMUL field of VTYPE.  */
>     static bool
>     ignore_vlmul_insn_p (rtx_insn *rinsn)
>     @@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
>        if (!has_vl_op (rinsn))
>          return NULL_RTX;
>     -  if (get_attr_avl_type (rinsn) == VLMAX)
>     -    return RVV_VLMAX;
>     -  extract_insn_cached (rinsn);
>     -  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
>     -}
>     -/* Helper function to get SEW operand. We always have SEW value for
>     -   all RVV instructions that have VTYPE OP.  */
>     -static uint8_t
>     -get_sew (rtx_insn *rinsn)
>     -{
>     -  return get_attr_sew (rinsn);
>     -}
>     -
>     -/* Helper function to get VLMUL operand. We always have VLMUL
>     value for
>     -   all RVV instructions that have VTYPE OP. */
>     -static enum vlmul_type
>     -get_vlmul (rtx_insn *rinsn)
>     -{
>     -  return (enum vlmul_type) get_attr_vlmul (rinsn);
>     -}
>     +  extract_insn_cached (rinsn);
>     +  if (vlmax_avl_type_p (rinsn))
>     +    {
>     +      if (BYTES_PER_RISCV_VECTOR.is_constant ())
>     + {
>     +   for (int i = 0; i < recog_data.n_operands; i++)
>     +     if (GET_MODE_CLASS (recog_data.operand_mode[i]) ==
>     MODE_VECTOR_BOOL
>     + && const_vlmax_p (recog_data.operand_mode[i]))
>     +       return gen_int_mode (GET_MODE_NUNITS
>     (recog_data.operand_mode[i]),
>     +    Pmode);
>     + }
>     +      return RVV_VLMAX;
>     +    }
>     -/* Get default tail policy.  */
>     -static bool
>     -get_default_ta ()
>     -{
>     -  /* For the instruction that doesn't require TA, we still need a
>     default value
>     -     to emit vsetvl. We pick up the default value according to
>     prefer policy. */
>     -  return (bool) (get_prefer_tail_policy () & 0x1
>     - || (get_prefer_tail_policy () >> 1 & 0x1));
>     +  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
>     }
>     /* Get default mask policy.  */
>     @@ -407,16 +371,6 @@ get_default_ma ()
>     || (get_prefer_mask_policy () >> 1 & 0x1));
>     }
>     -/* Helper function to get TA operand.  */
>     -static bool
>     -tail_agnostic_p (rtx_insn *rinsn)
>     -{
>     -  /* If it doesn't have TA, we return agnostic by default.  */
>     -  extract_insn_cached (rinsn);
>     -  int ta = get_attr_ta (rinsn);
>     -  return ta == INVALID_ATTRIBUTE ? get_default_ta () :
>     IS_AGNOSTIC (ta);
>     -}
>     -
>     /* Helper function to get MA operand.  */
>     static bool
>     mask_agnostic_p (rtx_insn *rinsn)
>     @@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn
>     *rinsn, int regno)
>        return true;
>     }
>     -/* Change insn and Assert the change always happens.  */
>     -static void
>     -validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool
>     in_group)
>     -{
>     -  bool change_p = validate_change (object, loc, new_rtx, in_group);
>     -  gcc_assert (change_p);
>     -}
>     -
>     /* This flags indicates the minimum demand of the vl and vtype
>     values by the
>         RVV instruction. For example, DEMAND_RATIO_P indicates that
>     this RVV
>         instruction only needs the SEW/LMUL ratio to remain the same,
>     and does not
>     diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
>     index dd17056fe82..08de62853a6 100644
>     --- a/gcc/config/riscv/t-riscv
>     +++ b/gcc/config/riscv/t-riscv
>     @@ -69,6 +69,12 @@ riscv-vsetvl.o:
>     $(srcdir)/config/riscv/riscv-vsetvl.cc \
>     $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>     $(srcdir)/config/riscv/riscv-vsetvl.cc
>     +riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
>     +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
>     +  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h
>     + $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>     + $(srcdir)/config/riscv/riscv-avlprop.cc
>     +
>     riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
>        $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H)
>     $(FUNCTION_H) \
>        $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
>     diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
>     index ef91950178f..0c59d1b90bc 100644
>     --- a/gcc/config/riscv/vector.md
>     +++ b/gcc/config/riscv/vector.md
>     @@ -809,7 +809,7 @@
>     V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
>     V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
>     V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
>     -    (symbol_ref "riscv_vector::NONVLMAX")
>     +    (symbol_ref "riscv_vector::VLS")
>     (eq_attr "type"
>     "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
>       vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
>     vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
>     diff --git
>     a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>     b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>     index 928a507a363..5278e4aa38f 100644
>     --- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>     +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>     @@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
>          }
>     }
>     -/* { dg-final { scan-assembler {e32,m4} } } */
>     +/* { dg-final { scan-assembler {e16,m2} } } */
>     /* { dg-final { scan-assembler-not {csrr} } } */
>     /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" }
>     } */
>     /* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" }
>     } */
>     diff --git
>     a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>     b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>     index a50265fc1ec..1db2e073846 100644
>     --- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>     +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>     @@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict
>     b, int n)
>          a[i] = a[i] + b[i];
>     }
>     -/* { dg-final { scan-assembler {e32,m8} } } */
>     +/* { dg-final { scan-assembler {e16,m4} } } */
>     /* { dg-final { scan-assembler-not {csrr} } } */
>     /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" }
>     } */
>     /* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
>     diff --git
>     a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>     b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>     index eac7cbc757b..ca88d42cdf4 100644
>     --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>     +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>     @@ -7,10 +7,11 @@
>     /*
>     ** foo:
>     **
>     vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>     +** ...
>     ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
>     ** ...
>     -**
>     vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>     -** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
>     +**
>     vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>     +** ...
>     ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
>     ** ...
>     */
>     diff --git
>     a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>     b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>     index 965365da4bb..13367423751 100644
>     ---
>     a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>     +++
>     b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>     @@ -3,7 +3,6 @@
>     #include "ternop-2.c"
>     -/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
>     /* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
>     /* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized"
>     } } */
>     /* { dg-final { scan-assembler-not {\tvmv} } } */
>     diff --git
>     a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>     b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>     new file mode 100644
>     index 00000000000..b0d21650c3d
>     --- /dev/null
>     +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>     @@ -0,0 +1,16 @@
>     +/* { dg-do compile } */
>     +/* { dg-options "-march=rv64gcv -mabi=lp64d
>     --param=riscv-autovec-preference=fixed-vlmax -O3" } */
>     +
>     +void
>     +foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
>     +{
>     +  for (int i = 0; i < n; i++)
>     +    a[i] = b[i] + c[i];
>     +}
>     +
>     +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
>     +/* { dg-final { scan-assembler-not {vsetivli} } } */
>     +/* { dg-final { scan-assembler-times
>     {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
>     +/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero}
>     } } */
>     +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
>     +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
>     diff --git
>     a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>     b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>     new file mode 100644
>     index 00000000000..f2d8aa54b88
>     --- /dev/null
>     +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>     @@ -0,0 +1,33 @@
>     +/* { dg-do compile } */
>     +/* { dg-options "-march=rv64gcv -mabi=lp64d
>     --param=riscv-autovec-preference=fixed-vlmax -O3" } */
>     +
>     +void
>     +foo (int *__restrict a, int *__restrict b, int *__restrict c,
>     +     int *__restrict a2, int *__restrict b2, int *__restrict c2,
>     +     int *__restrict a3, int *__restrict b3, int *__restrict c3,
>     +     int *__restrict a4, int *__restrict b4, int *__restrict c4,
>     +     int *__restrict a5, int *__restrict b5, int *__restrict c5,
>     +     int *__restrict d, int *__restrict d2, int *__restrict d3,
>     +     int *__restrict d4, int *__restrict d5, int n, int m)
>     +{
>     +  for (int i = 0; i < n; i++)
>     +    {
>     +      a[i] = b[i] + c[i];
>     +      a2[i] = b2[i] + c2[i];
>     +      a3[i] = b3[i] + c3[i];
>     +      a4[i] = b4[i] + c4[i];
>     +      a5[i] = a[i] + a4[i];
>     +      d[i] = a[i] - a2[i];
>     +      d2[i] = a2[i] * a[i];
>     +      d3[i] = a3[i] * a2[i];
>     +      d4[i] = a2[i] * d2[i];
>     +      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
>     +    }
>     +}
>     +
>     +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
>     +/* { dg-final { scan-assembler-not {vsetivli} } } */
>     +/* { dg-final { scan-assembler-times
>     {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
>     +/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero}
>     } } */
>     +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
>     +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
>     diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>     b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>     index 674ba0d72b4..fc830f2cd4d 100644
>     --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>     +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>     @@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain
>     $srcdir/$subdir/vsetvl/*.\[cS\]]] \
>     "" $CFLAGS
>     dg-runtest [lsort [glob -nocomplain
>     $srcdir/$subdir/autovec/*.\[cS\]]] \
>     "-O3 -ftree-vectorize" $CFLAGS
>     +dg-runtest [lsort [glob -nocomplain
>     $srcdir/$subdir/avlprop/*.\[cS\]]] \
>     + "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
>     dg-runtest [lsort [glob -nocomplain
>     $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
>     "-O3 -ftree-vectorize --param riscv-autovec-preference=scalable"
>     $CFLAGS
>     dg-runtest [lsort [glob -nocomplain
>     $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
>     -- 
>     2.36.3
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-24  4:30   ` Patrick O'Neill
@ 2023-10-24 15:03     ` Patrick O'Neill
  2023-10-25 12:20       ` juzhe.zhong
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick O'Neill @ 2023-10-24 15:03 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: kito.cheng, Kito.cheng, jeffreyalaw, Robin Dapp

[-- Attachment #1: Type: text/plain, Size: 79491 bytes --]

I'm seeing a variety of new failures, constrained to rv32gcv:

Tested using newlib/linux:
rv32gcv/ ilp32d/ medlow
rv64gcv/  lp64d/ medlow
rv64gcv_zvbb_zvbc_zvkg_zvkn_zvknc_zvkned_zvkng_zvknha_zvknhb_zvks_zvksc_zvksed_zvksg_zvksh_zvkt/ 
lp64d/ medlow
rv64imafdcv_zicond_zawrs_zbc_zvkng_zvksg_zvbb_zvbc_zicsr_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt/ 
lp64d/ medlow

Newlib failures:
rv32gcv:
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects 
execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects 
execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test

Debug log for testcases that aren't pr110557.c look like this:

Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o 
./popcount-run-1.exe (timeout = 600) spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o 
./popcount-run-1.exe PASS: 
gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c (test for excess 
errors) spawn riscv64-unknown-elf-run ./popcount-run-1.exe FAIL: 
gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

Debug log for pr110557.c:

Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ 
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../  
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow   
-fdiagnostics-plain-output  -nostdinc++ 
-I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf 
-I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util 
-fmessage-length=0  -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-fdump-tree-vect-details        
-L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  
-L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs  
-lm  -o ./pr110557.exe    (timeout = 600) spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ 
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-nostdinc++ 
-I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf 
-I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util 
-fmessage-length=0 -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-fdump-tree-vect-details 
-L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs 
-B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs 
-lm -o ./pr110557.exe PASS: g++.dg/vect/pr110557.cc  -std=c++14 (test 
for excess errors) spawn riscv64-unknown-elf-run ./pr110557.exe 
/scratch/tc-testing/tc-oct-23-avl/build-newlib/../scripts/wrapper/qemu/riscv64-unknown-elf-run: 
line 15: 3449805 Trace/breakpoint trap   (core dumped) 
QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 
5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@" FAIL: 
g++.dg/vect/pr110557.cc  -std=c++14 execution test

Linux failures:
rv32gcv:
FAIL: gcc.dg/nextafter-2.c execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects 
execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects 
execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
FAIL: gfortran.dg/default_format_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_2.f90   -Os  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer 
-finline-functions  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -Os  execution test
FAIL: gfortran.dg/large_real_kind_2.F90   -O0  execution test
FAIL: gfortran.dg/round_4.f90   -O0  execution test
FAIL: gfortran.dg/zero_sized_3.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test
FAIL: gfortran.dg/ieee/large_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O1  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O2  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -Os  execution test
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  
-O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  
-O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_sum.f90 execution,  -O2 
-fomit-frame-pointer -finline-functions -funroll-loops

Some (not all) debug log outputs:

Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 
-march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -w -O2 -fomit-frame-pointer 
-finline-functions 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs 
-lm -o 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x 
(timeout = 600) spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 
-march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -w -O2 -fomit-frame-pointer 
-finline-functions 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs 
-lm -o 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x 
PASS: gfortran.fortran-torture/execute/intrinsic_count.f90 compilation, 
-O2 -fomit-frame-pointer -finline-functions spawn 
riscv64-unknown-linux-gnu-run 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x 
STOP 2 FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 
execution, -O2 -fomit-frame-pointer -finline-functions Executing on 
host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -w -O2 -fomit-frame-pointer 
-finline-functions -funroll-loops 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs 
-lm -o 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x 
(timeout = 600) spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -w -O2 -fomit-frame-pointer 
-finline-functions -funroll-loops 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs 
-lm -o 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x 
PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation, 
-O2 -fomit-frame-pointer -finline-functions -funroll-loops spawn 
riscv64-unknown-linux-gnu-run 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x 
STOP 3 FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 
execution, -O2 -fomit-frame-pointer -finline-functions -funroll-loops
Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
-fno-unsafe-math-optimizations -frounding-math -fsignaling-nans 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs 
-lm -o ./large_2.exe (timeout = 600) spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ 
-fno-unsafe-math-optimizations -frounding-math -fsignaling-nans 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs 
-lm -o ./large_2.exe PASS: gfortran.dg/ieee/large_2.f90 -O0 (test for 
excess errors) spawn riscv64-unknown-linux-gnu-run ./large_2.exe 
0.333333333333333333333333333333333317 
2.24271998593667819112500193394291495E+1644 STOP 1 FAIL: 
gfortran.dg/ieee/large_2.f90 -O0 execution test Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-nostdinc++ 
-I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu 
-I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util 
-fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-fdump-tree-vect-details 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs 
-lm -o ./pr110557.exe (timeout = 600) spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-nostdinc++ 
-I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu 
-I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward 
-I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util 
-fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-fdump-tree-vect-details 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ 
-L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs 
-lm -o ./pr110557.exe PASS: g++.dg/vect/pr110557.cc -std=c++98 (test for 
excess errors) spawn riscv64-unknown-linux-gnu-run ./pr110557.exe 
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: 
line 15: 323485 Trace/breakpoint trap (core dumped) 
QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 
5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@" FAIL: 
g++.dg/vect/pr110557.cc -std=c++98 execution test Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
-fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe 
(timeout = 600) spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
-fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe 
PASS: gcc.dg/vect/vect-reduc-dot-21.c (test for excess errors) spawn 
riscv64-unknown-linux-gnu-run ./vect-reduc-dot-21.exe 
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: 
line 15: 3484803 Aborted (core dumped) QEMU_CPU="$(march-to-cpu-opt 
--get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L 
${RISC_V_SYSROOT} "$@" FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution 
test Executing on host: 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
-fno-common -O2 -fdump-tree-vect-details -lm -o 
./vect-alias-check-16.exe (timeout = 600) spawn -ignore SIGHUP 
/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc 
-B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ 
/scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
--param riscv-autovec-preference=scalable --param riscv-vector-abi 
-ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
-fno-common -O2 -fdump-tree-vect-details -lm -o 
./vect-alias-check-16.exe PASS: gcc.dg/vect/vect-alias-check-16.c (test 
for excess errors) spawn riscv64-unknown-linux-gnu-run 
./vect-alias-check-16.exe 
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: 
line 15: 3431975 Aborted (core dumped) QEMU_CPU="$(march-to-cpu-opt 
--get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L 
${RISC_V_SYSROOT} "$@" FAIL: gcc.dg/vect/vect-alias-check-16.c execution 
test PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "flags: 
*RAW\\n" PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect 
"using an address-based overlap test" PASS: 
gcc.dg/vect/vect-alias-check-16.c scan-tree-dump-not vect "using an 
index-based"

I've observed nextafter-2.c being flaky on the CI so that particular 
failure might not be real.

If you want any particular testcase's debug logs please let me know.

Patrick

On 10/23/23 21:30, Patrick O'Neill wrote:
>
> The CI just picked it up: 
> https://github.com/ewlu/gcc-precommit-ci/issues/449#issue-1958483272
> Since it doesn't apply to the CI's baseline hash it's only performing 
> a build.
> I'll re-run it in the morning once the baseline has been updated.
>
> In the meantime I started a full build+test run on my local machine.
> I'll send you the results in ~10 hours - morning my time :-)
>
> Patrick
>
> On 10/23/23 20:44, juzhe.zhong@rivai.ai wrote:
>> CCing Patrick...
>>
>> Hi, @Patrick.
>> Could you apply this patch and trigger your regression CI?
>>
>> I don't have an environment to test fortran for now (I only test it 
>> on C/C++).
>>
>> Thanks.
>>
>> ------------------------------------------------------------------------
>> juzhe.zhong@rivai.ai
>>
>>     *From:* Juzhe-Zhong <mailto:juzhe.zhong@rivai.ai>
>>     *Date:* 2023-10-24 11:32
>>     *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
>>     *CC:* kito.cheng <mailto:kito.cheng@gmail.com>; kito.cheng
>>     <mailto:kito.cheng@sifive.com>; jeffreyalaw
>>     <mailto:jeffreyalaw@gmail.com>; rdapp.gcc
>>     <mailto:rdapp.gcc@gmail.com>; Juzhe-Zhong
>>     <mailto:juzhe.zhong@rivai.ai>
>>     *Subject:* [PATCH] RISC-V: Add AVL propagation PASS for RVV
>>     auto-vectorization
>>     This patch addresses the redundant AVL/VL toggling in RVV partial
>>     auto-vectorization
>>     which is a known issue for a long time and I finally find the
>>     time to address it.
>>     Consider a simple vector addition operation:
>>     https://godbolt.org/z/7hfGfEjW3
>>     void
>>     foo (int *__restrict a,
>>          int *__restrict b,
>>          int *__restrict n)
>>     {
>>       for (int i = 0; i < n; i++)
>>           a[i] = a[i] + b[i];
>>     }
>>     Optimized IR:
>>     Loop body:
>>       _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4,
>>     4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
>>       ...
>>       vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... },
>>     _38, 0);    -> vle32.v v2,0(a0)
>>       vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... },
>>     _38, 0);   -> vle32.v v1,0(a1)
>>       vect__7.12_19 = vect__6.11_20 +
>>     vect__4.8_27;                              -> vsetvli
>>     a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
>>       .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0,
>>     vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
>>     We can see 2 redundant vsetvls inside the loop body due to AVL/VL
>>     toggling.
>>     The AVL/VL toggling is because we are missing LEN information in
>>     simple PLUS_EXPR GIMPLE assignment:
>>     vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
>>     GCC apply partial predicate load/store and un-predicated full
>>     vector operation on partial vectorization.
>>     Such flow are used by all other targets like ARM SVE (RVV also
>>     uses such flow):
>>     ARM SVE:
>>     .L3:
>>             ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
>>             ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
>>             add     z31.s, z31.s, z30.s            -> un-predicated add
>>             st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store
>>     Such vectorization flow causes AVL/VL toggling on RVV so we need
>>     AVL propagation PASS for it.
>>     Also, It's very unlikely that we can apply predicated operations
>>     on all vectorization for following reasons:
>>     1. It's very heavy workload to support them on all vectorization
>>     and we don't see any benefits if we can handle that on targets
>>     backend.
>>     2. Changing Loop vectorizer for it will make code base ugly and
>>     hard to maintain.
>>     3. We will need so many patterns for all operations. Not only
>>     COND_LEN_ADD, COND_LEN_SUB, ....
>>        We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over
>>     100+ patterns, unreasonable number of patterns.
>>     To conclude, we prefer un-predicated operations here, and design
>>     a nice and clean AVL propagation PASS for it to elide the
>>     redundant vsetvls
>>     due to AVL/VL toggling.
>>     The second question is that why we separate a PASS called AVL
>>     propagation. Why not optimize it in VSETVL PASS (We definitetly
>>     can optimize AVL in VSETVL PASS)
>>     Frankly, I was planning to address such issue in VSETVL PASS
>>     that's why we recently refactored VSETVL PASS. However, I changed
>>     my mind recently after several
>>     experiments and tries.
>>     The reasons as follows:
>>     1. For code base management and maintainience. Current VSETVL
>>     PASS is complicated enough and aleady has enough aggressive and
>>     fancy optimizations which
>>        turns out it can always generate optimal codegen in most of
>>     the cases. It's not a good idea keep adding more features into
>>     VSETVL PASS to make VSETVL
>>     PASS become heavy and heavy again, then we will need to refactor
>>     it again in the future.
>>     Actuall, the VSETVL PASS is very stable and optimal after the
>>     recent refactoring. Hopefully, we should not change VSETVL PASS
>>     any more except the minor
>>     fixes.
>>     2. vsetvl insertion (VSETVL PASS does this thing) and AVL
>>     propagation are 2 different things,  I don't think we should fuse
>>     them into same PASS.
>>     3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should
>>     be done before RA which can reduce register allocation.
>>     4. This patch's AVL propagation PASS only does AVL propagation
>>     for RVV partial auto-vectorization situations.
>>        This patch's codes are only hundreds lines which is very
>>     managable and can be very easily extended features and enhancements.
>>     We can easily extend and enhance more AVL propagation in a clean
>>     and separate PASS in the future. (If we do it on VSETVL PASS, we
>>     will complicate
>>     VSETVL PASS again which is already so complicated.)
>>     Here is an example to demonstrate more:
>>     https://godbolt.org/z/bE86sv3q5
>>     void foo2 (int *__restrict a,
>>               int *__restrict b,
>>               int *__restrict c,
>>               int *__restrict a2,
>>               int *__restrict b2,
>>               int *__restrict c2,
>>               int *__restrict a3,
>>               int *__restrict b3,
>>               int *__restrict c3,
>>               int *__restrict a4,
>>               int *__restrict b4,
>>               int *__restrict c4,
>>               int *__restrict a5,
>>               int *__restrict b5,
>>               int *__restrict c5,
>>               int n)
>>     {
>>         for (int i = 0; i < n; i++){
>>           a[i] = b[i] + c[i];
>>           b5[i] = b[i] + c[i];
>>           a2[i] = b2[i] + c2[i];
>>           a3[i] = b3[i] + c3[i];
>>           a4[i] = b4[i] + c4[i];
>>           a5[i] = a[i] + a4[i];
>>           a[i] = a5[i] + b5[i]+ a[i];
>>           a[i] = a[i] + c[i];
>>           b5[i] = a[i] + c[i];
>>           a2[i] = a[i] + c2[i];
>>           a3[i] = a[i] + c3[i];
>>           a4[i] = a[i] + c4[i];
>>           a5[i] = a[i] + a4[i];
>>           a[i] = a[i] + b5[i]+ a[i];
>>         }
>>     }
>>     1. Loop Body:
>>     Before this patch:                                          After
>>     this patch:
>>           vsetvli a4,t1,e8,mf4,ta,ma                          
>>     vsetvli a4,t1,e32,m1,ta,ma
>>             vle32.v v2,0(a2)                                    
>>     vle32.v v2,0(a2)
>>             vle32.v v4,0(a1)                                    
>>     vle32.v v3,0(t2)
>>             vle32.v v1,0(t2)                                    
>>     vle32.v v4,0(a1)
>>             vsetvli a7,zero,e32,m1,ta,ma                        
>>     vle32.v v1,0(t0)
>>             vadd.vv v4,v2,v4                                    
>>     vadd.vv v4,v2,v4
>>             vsetvli zero,a4,e32,m1,ta,ma                        
>>     vadd.vv v1,v3,v1
>>             vle32.v v3,0(s0)                                    
>>     vadd.vv v1,v1,v4
>>             vsetvli a7,zero,e32,m1,ta,ma                        
>>     vadd.vv v1,v1,v4
>>             vadd.vv v1,v3,v1                                    
>>     vadd.vv v1,v1,v4
>>             vadd.vv v1,v1,v4                                    
>>     vadd.vv v1,v1,v2
>>             vadd.vv v1,v1,v4                                    
>>     vadd.vv v2,v1,v2
>>             vadd.vv v1,v1,v4                                    
>>     vse32.v v2,0(t5)
>>             vsetvli zero,a4,e32,m1,ta,ma                        
>>     vadd.vv v2,v2,v1
>>             vle32.v v4,0(a5)                                    
>>     vadd.vv v2,v2,v1
>>             vsetvli a7,zero,e32,m1,ta,ma                         slli
>>     a7,a4,2
>>             vadd.vv v1,v1,v2                                    
>>     vadd.vv v3,v1,v3
>>             vadd.vv v2,v1,v2                                    
>>     vle32.v v5,0(a5)
>>             vadd.vv v4,v1,v4                                    
>>     vle32.v v6,0(t6)
>>             vsetvli zero,a4,e32,m1,ta,ma                        
>>     vse32.v v3,0(t3)
>>             vse32.v v2,0(t5)                                    
>>     vse32.v v2,0(a0)
>>             vse32.v v4,0(a3)                                    
>>     vadd.vv v3,v3,v1
>>             vsetvli a7,zero,e32,m1,ta,ma                        
>>     vadd.vv v2,v1,v5
>>             vadd.vv v3,v1,v3                                    
>>     vse32.v v3,0(t4)
>>             vadd.vv v2,v2,v1                                    
>>     vadd.vv v1,v1,v6
>>             vadd.vv v2,v2,v1                                    
>>     vse32.v v2,0(a3)
>>             vsetvli zero,a4,e32,m1,ta,ma                        
>>     vse32.v v1,0(a6)
>>             vse32.v v2,0(a0)
>>             vse32.v v3,0(t3)
>>             vle32.v v2,0(t0)
>>             vsetvli a7,zero,e32,m1,ta,ma
>>             vadd.vv v3,v3,v1
>>             vsetvli zero,a4,e32,m1,ta,ma
>>             vse32.v v3,0(t4)
>>             vsetvli a7,zero,e32,m1,ta,ma
>>             slli a7,a4,2
>>             vadd.vv v1,v1,v2
>>             sub t1,t1,a4
>>             vsetvli zero,a4,e32,m1,ta,ma
>>             vse32.v v1,0(a6)
>>     It's quite obvious, all heavy && redundant vsetvls inside loop
>>     body are eliminated.
>>     2. Epilogue:
>>         Before this patch:                                         
>>     After this patch:
>>     .L5: .L5:
>>             ld s0,8(sp)                                         ret
>>             addi sp,sp,16
>>             jr ra
>>     This is the benefit we do the AVL propation before RA since we
>>     eliminate the use of 'a7' register
>>     which is used by the redudant AVL/VL toggling instruction:
>>     'vsetvli a7,zero,e32,m1,ta,ma'
>>     The final codegen after this patch:
>>     foo2:
>>     lw t1,56(sp)
>>     ld t6,0(sp)
>>     ld t3,8(sp)
>>     ld t0,16(sp)
>>     ld t2,24(sp)
>>     ld t4,32(sp)
>>     ld t5,40(sp)
>>     ble t1,zero,.L5
>>     .L3:
>>     vsetvli a4,t1,e32,m1,ta,ma
>>     vle32.v v2,0(a2)
>>     vle32.v v3,0(t2)
>>     vle32.v v4,0(a1)
>>     vle32.v v1,0(t0)
>>     vadd.vv v4,v2,v4
>>     vadd.vv v1,v3,v1
>>     vadd.vv v1,v1,v4
>>     vadd.vv v1,v1,v4
>>     vadd.vv v1,v1,v4
>>     vadd.vv v1,v1,v2
>>     vadd.vv v2,v1,v2
>>     vse32.v v2,0(t5)
>>     vadd.vv v2,v2,v1
>>     vadd.vv v2,v2,v1
>>     slli a7,a4,2
>>     vadd.vv v3,v1,v3
>>     vle32.v v5,0(a5)
>>     vle32.v v6,0(t6)
>>     vse32.v v3,0(t3)
>>     vse32.v v2,0(a0)
>>     vadd.vv v3,v3,v1
>>     vadd.vv v2,v1,v5
>>     vse32.v v3,0(t4)
>>     vadd.vv v1,v1,v6
>>     vse32.v v2,0(a3)
>>     vse32.v v1,0(a6)
>>     sub t1,t1,a4
>>     add a1,a1,a7
>>     add a2,a2,a7
>>     add a5,a5,a7
>>     add t6,t6,a7
>>     add t0,t0,a7
>>     add t2,t2,a7
>>     add t5,t5,a7
>>     add a3,a3,a7
>>     add a6,a6,a7
>>     add t3,t3,a7
>>     add t4,t4,a7
>>     add a0,a0,a7
>>     bne t1,zero,.L3
>>     .L5:
>>     ret
>>     PR target/111888
>>     gcc/ChangeLog:
>>     * config.gcc: Add AVL propgatation PASS.
>>     * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
>>     * config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
>>     (has_vtype_op): Export as global.
>>     (has_vl_op): Ditto.
>>     (tail_agnostic_p): Ditto.
>>     (validate_change_or_fail): Ditto.
>>     (vlmax_avl_type_p): Ditto.
>>     (vlmax_avl_p): Ditto.
>>     (get_sew): Ditto.
>>     (enum vlmul_type): Ditto.
>>     (const_vlmax_p): Ditto.
>>     * config/riscv/riscv-v.cc (has_vtype_op): Ditto.
>>     (has_vl_op): Ditto.
>>     (get_default_ta): Ditto.
>>     (tail_agnostic_p): Ditto.
>>     (validate_change_or_fail): Ditto.
>>     (vlmax_avl_type_p): Ditto.
>>     (vlmax_avl_p): Ditto.
>>     (get_sew): Ditto.
>>     (enum vlmul_type): Ditto.
>>     (get_vlmul): Ditto.
>>     * config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
>>     (has_vtype_op): Ditto.
>>     (has_vl_op): Ditto.
>>     (get_sew): Ditto.
>>     (get_vlmul): Ditto.
>>     (get_default_ta): Ditto.
>>     (tail_agnostic_p): Ditto.
>>     (validate_change_or_fail): Ditto.
>>     * config/riscv/t-riscv: Add AVL propagation PASS.
>>     * config/riscv/vector.md: Fix VLS modes attribute.
>>     * config/riscv/riscv-avlprop.cc: New file.
>>     gcc/testsuite/ChangeLog:
>>     * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
>>     * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
>>     * gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
>>     * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
>>     * gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
>>     * gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
>>     * gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.
>>     ---
>>     gcc/config.gcc                                |   2 +-
>>     gcc/config/riscv/riscv-avlprop.cc             | 350
>>     ++++++++++++++++++
>>     gcc/config/riscv/riscv-passes.def             |   1 +
>>     gcc/config/riscv/riscv-protos.h               |  10 +
>>     gcc/config/riscv/riscv-v.cc                   |  84 ++++-
>>     gcc/config/riscv/riscv-vsetvl.cc              |  82 +---
>>     gcc/config/riscv/t-riscv                      |   6 +
>>     gcc/config/riscv/vector.md                    |   2 +-
>>     .../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +-
>>     .../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +-
>>     .../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +-
>>     .../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 -
>>     .../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
>>     .../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
>>     gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
>>     15 files changed, 514 insertions(+), 84 deletions(-)
>>     create mode 100644 gcc/config/riscv/riscv-avlprop.cc
>>     create mode 100644
>>     gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>     create mode 100644
>>     gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>     diff --git a/gcc/config.gcc b/gcc/config.gcc
>>     index 606d3a8513e..efd53965c9a 100644
>>     --- a/gcc/config.gcc
>>     +++ b/gcc/config.gcc
>>     @@ -544,7 +544,7 @@ pru-*-*)
>>     riscv*)
>>     cpu_type=riscv
>>     extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o
>>     riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
>>     - extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o
>>     riscv-vector-costs.o"
>>     + extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o
>>     riscv-vector-costs.o riscv-avlprop.o"
>>     extra_objs="${extra_objs} riscv-vector-builtins.o
>>     riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>>     extra_objs="${extra_objs} thead.o"
>>     d_target_objs="riscv-d.o"
>>     diff --git a/gcc/config/riscv/riscv-avlprop.cc
>>     b/gcc/config/riscv/riscv-avlprop.cc
>>     new file mode 100644
>>     index 00000000000..bf3becd8371
>>     --- /dev/null
>>     +++ b/gcc/config/riscv/riscv-avlprop.cc
>>     @@ -0,0 +1,350 @@
>>     +/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler.
>>     +   Copyright (C) 2023-2023 Free Software Foundation, Inc.
>>     +   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI
>>     Technologies Ltd.
>>     +
>>     +This file is part of GCC.
>>     +
>>     +GCC is free software; you can redistribute it and/or modify
>>     +it under the terms of the GNU General Public License as published by
>>     +the Free Software Foundation; either version 3, or(at your option)
>>     +any later version.
>>     +
>>     +GCC is distributed in the hope that it will be useful,
>>     +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>     +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>     +GNU General Public License for more details.
>>     +
>>     +You should have received a copy of the GNU General Public License
>>     +along with GCC; see the file COPYING3.  If not see
>>     +<http://www.gnu.org/licenses/>. */
>>     +
>>     +/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions.
>>     +   A standalone AVL propagation pass is designed because:
>>     +
>>     +     - Better code maintain:
>>     +       Current LCM-based VSETVL pass is so complicated that codes
>>     +       there will become even harder to maintain. A straight forward
>>     +       AVL propagation PASS is much easier to maintain.
>>     +
>>     +     - Reduce scalar register pressure:
>>     +       A type of AVL propagation is we propagate AVL from NON-VLMAX
>>     +       instruction to VLMAX instruction.
>>     +       Note: VLMAX instruction should be ignore tail elements (TA)
>>     +       and the result should be used by the NON-VLMAX instruction.
>>     +       This optimization is mostly for auto-vectorization codes:
>>     +
>>     +   vsetvli r136, r137      --- SELECT_VL
>>     +   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
>>     +   vadd.vv (use VLMAX)     --- PLUS_EXPR
>>     +   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
>>     +
>>     + NO AVL propation:
>>     +
>>     +   vsetvli a5, a4, ta
>>     +   vle8.v v1
>>     +   vsetvli t0, zero, ta
>>     +   vadd.vv v2, v1, v1
>>     +   vse8.v v2
>>     +
>>     + We can propagate the AVL to 'vadd.vv' since its result
>>     + is consumed by a 'vse8.v' which has AVL = a5 and its
>>     + tail elements are agnostic.
>>     +
>>     +       We DON'T do this optimization on VSETVL pass since it is a
>>     +       post-RA pass that consumed 't0' already wheras a standalone
>>     +       pre-RA AVL propagation pass allows us elide the consumption
>>     +       of the pseudo register of 't0' then we can reduce scalar
>>     +       register pressure.
>>     +
>>     +     - More AVL propagation opportunities:
>>     +       A pre-RA pass is more flexible for AVL REG def-use chain,
>>     +       thus we will get more potential AVL propagation as long as
>>     +       it doesn't increase the scalar register pressure.
>>     +*/
>>     +
>>     +#define IN_TARGET_CODE 1
>>     +#define INCLUDE_ALGORITHM
>>     +#define INCLUDE_FUNCTIONAL
>>     +
>>     +#include "config.h"
>>     +#include "system.h"
>>     +#include "coretypes.h"
>>     +#include "tm.h"
>>     +#include "backend.h"
>>     +#include "rtl.h"
>>     +#include "target.h"
>>     +#include "tree-pass.h"
>>     +#include "df.h"
>>     +#include "rtl-ssa.h"
>>     +#include "cfgcleanup.h"
>>     +#include "insn-attr.h"
>>     +
>>     +using namespace rtl_ssa;
>>     +using namespace riscv_vector;
>>     +
>>     +/* The AVL propagation instructions and corresponding preferred AVL.
>>     +   It will be updated during the analysis.  */
>>     +static hash_map<insn_info *, rtx> *avlprops;
>>     +
>>     +const pass_data pass_data_avlprop = {
>>     +  RTL_PASS, /* type */
>>     +  "avlprop", /* name */
>>     +  OPTGROUP_NONE, /* optinfo_flags */
>>     +  TV_NONE, /* tv_id */
>>     +  0, /* properties_required */
>>     +  0, /* properties_provided */
>>     +  0, /* properties_destroyed */
>>     +  0, /* todo_flags_start */
>>     +  0, /* todo_flags_finish */
>>     +};
>>     +
>>     +class pass_avlprop : public rtl_opt_pass
>>     +{
>>     +public:
>>     +  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass
>>     (pass_data_avlprop, ctxt) {}
>>     +
>>     +  /* opt_pass methods: */
>>     +  virtual bool gate (function *) final override
>>     +  {
>>     +    return TARGET_VECTOR && optimize > 0;
>>     +  }
>>     +  virtual unsigned int execute (function *) final override;
>>     +}; // class pass_avlprop
>>     +
>>     +static void
>>     +avlprop_init (void)
>>     +{
>>     +  calculate_dominance_info (CDI_DOMINATORS);
>>     +  df_analyze ();
>>     +  crtl->ssa = new function_info (cfun);
>>     +  avlprops = new hash_map<insn_info *, rtx>;
>>     +}
>>     +
>>     +static void
>>     +avlprop_done (void)
>>     +{
>>     +  free_dominance_info (CDI_DOMINATORS);
>>     +  if (crtl->ssa->perform_pending_updates ())
>>     +    cleanup_cfg (0);
>>     +  delete crtl->ssa;
>>     +  crtl->ssa = nullptr;
>>     +  delete avlprops;
>>     +  avlprops = NULL;
>>     +}
>>     +
>>     +/* Helper function to get AVL operand.  */
>>     +static rtx
>>     +get_avl (insn_info *insn, bool avlprop_p)
>>     +{
>>     +  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
>>     +      || get_attr_avl_type (insn->rtl ()) == VLS)
>>     +    return NULL_RTX;
>>     +  if (avlprop_p)
>>     +    {
>>     +      if (avlprops->get (insn))
>>     + return (*avlprops->get (insn));
>>     +      else if (vlmax_avl_type_p (insn->rtl ()))
>>     + return RVV_VLMAX;
>>     +    }
>>     +  extract_insn_cached (insn->rtl ());
>>     +  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
>>     +}
>>     +
>>     +/* This is a straight forward pattern ALWAYS in paritial
>>     auto-vectorization:
>>     +
>>     +     VL = SELECT_AVL (AVL, ...)
>>     +     V0 = MASK_LEN_LOAD (..., VL)
>>     +     V1 = MASK_LEN_LOAD (..., VL)
>>     +     V2 = V0 + V1 --- Missed LEN information.
>>     +     MASK_LEN_STORE (..., V2, VL)
>>     +
>>     +   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0,
>>     V1, dummy LEN)
>>     +   because:
>>     +
>>     +     - Few code changes in Loop Vectorizer.
>>     +     - Reuse the current clean flow of partial vectorization,
>>     That is, apply
>>     +       predicate LEN or MASK into LOAD/STORE operations and
>>     other special
>>     +       arithmetic operations (e.d. DIV), then do the whole
>>     vector register
>>     +       operation if it DON'T affect the correctness.
>>     +       Such flow is used by all other targets like x86, sve,
>>     s390, ... etc.
>>     +     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
>>     +
>>     +   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like
>>     PLUS_EXPR which
>>     +   generates the VLMAX instruction due to missed LEN
>>     information. The later
>>     +   VSETVL PASS will elided the redundant vsetvls.
>>     +*/
>>     +
>>     +static rtx
>>     +get_autovectorize_preferred_avl (insn_info *insn)
>>     +{
>>     +  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p
>>     (insn->rtl ()))
>>     +    return NULL_RTX;
>>     +
>>     +  rtx use_avl = NULL_RTX;
>>     +  insn_info *avl_use_insn = nullptr;
>>     +  unsigned int ratio
>>     +    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul
>>     (insn->rtl ()));
>>     +  for (def_info *def : insn->defs ())
>>     +    {
>>     +      auto set = safe_dyn_cast<set_info *> (def);
>>     +      if (!set || !set->is_reg ())
>>     + return NULL_RTX;
>>     +      for (use_info *use : set->all_uses ())
>>     + {
>>     +   if (!use->is_in_nondebug_insn ())
>>     +     return NULL_RTX;
>>     +   insn_info *use_insn = use->insn ();
>>     +   /* FIXME: Stop AVL propagation if any USE is not a RVV real
>>     +      instruction. It should be totally enough for vectorized
>>     codes since
>>     +      they always locate at extended blocks.
>>     +
>>     +      TODO: We can extend PHI checking for intrinsic codes if it
>>     +      necessary in the future.  */
>>     +   if (use_insn->is_artificial () || !has_vtype_op
>>     (use_insn->rtl ()))
>>     +     return NULL_RTX;
>>     +   if (!has_vl_op (use_insn->rtl ()))
>>     +     continue;
>>     +
>>     +   rtx new_use_avl = get_avl (use_insn, true);
>>     +   if (!new_use_avl)
>>     +     return NULL_RTX;
>>     +   if (!use_avl)
>>     +     use_avl = new_use_avl;
>>     +   if (!rtx_equal_p (use_avl, new_use_avl)
>>     +       || calculate_ratio (get_sew (use_insn->rtl ()),
>>     +   get_vlmul (use_insn->rtl ()))
>>     +    != ratio
>>     +       || vlmax_avl_p (new_use_avl)
>>     +       || !tail_agnostic_p (use_insn->rtl ()))
>>     +     return NULL_RTX;
>>     +   if (!avl_use_insn)
>>     +     avl_use_insn = use_insn;
>>     + }
>>     +    }
>>     +
>>     +  if (use_avl && register_operand (use_avl, Pmode))
>>     +    {
>>     +      gcc_assert (avl_use_insn);
>>     +      // Find a definition at or neighboring INSN.
>>     +      resource_info resource = full_register (REGNO (use_avl));
>>     +      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
>>     +      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
>>     +      if (dl1.matching_set () || dl2.matching_set ())
>>     + return NULL_RTX;
>>     +      def_info *def1 = dl1.last_def_of_prev_group ();
>>     +      def_info *def2 = dl2.last_def_of_prev_group ();
>>     +      if (def1 != def2)
>>     + return NULL_RTX;
>>     +      /* FIXME: We only all AVL propation within a block which
>>     should
>>     + be totally enough for vectorized codes.
>>     +
>>     + TODO: We can enhance it here for intrinsic codes in the future
>>     + if it is necessary.  */
>>     +      if (def1->insn ()->bb () != insn->bb ()
>>     +   || def1->insn ()->compare_with (insn) >= 0)
>>     + return NULL_RTX;
>>     +    }
>>     +  return use_avl;
>>     +}
>>     +
>>     +/* If we have a preferred AVL to propagate, return the AVL.
>>     +   Otherwise, return NULL_RTX as we don't need have any preferred
>>     +   AVL.  */
>>     +
>>     +static rtx
>>     +get_preferred_avl (insn_info *insn)
>>     +{
>>     +  /* TODO: We only do AVL propagation for missed-LEN partial
>>     +     autovectorization for now.  We could add more more AVL
>>     +     propagation for intrinsic codes in the future. */
>>     +  return get_autovectorize_preferred_avl (insn);
>>     +}
>>     +
>>     +/* Return the AVL TYPE operand index.  */
>>     +static int
>>     +get_avl_type_index (insn_info *insn)
>>     +{
>>     +  extract_insn_cached (insn->rtl ());
>>     +  /* Except rounding mode patterns, AVL TYPE operand
>>     +     is always the last operand.  */
>>     +  if (find_access (insn->uses (), VXRM_REGNUM)
>>     +      || find_access (insn->uses (), FRM_REGNUM))
>>     +    return recog_data.n_operands - 2;
>>     +  return recog_data.n_operands - 1;
>>     +}
>>     +
>>     +/* Main entry point for this pass.  */
>>     +unsigned int
>>     +pass_avlprop::execute (function *)
>>     +{
>>     +  avlprop_init ();
>>     +
>>     +  /* Go through all the instructions looking for AVL that we
>>     could propagate. */
>>     +
>>     +  insn_info *next;
>>     +  bool change_p = true;
>>     +
>>     +  while (change_p)
>>     +    {
>>     +      /* Iterate on each instruction until no more change need.  */
>>     +      change_p = false;
>>     +      for (insn_info *insn = crtl->ssa->first_insn (); insn;
>>     insn = next)
>>     + {
>>     +   next = insn->next_any_insn ();
>>     +   /* We only forward AVL to the instruction that has AVL/VL operand
>>     +      and can be optimized in RTL_SSA level.  */
>>     +   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
>>     +     continue;
>>     +
>>     +   rtx new_avl = get_preferred_avl (insn);
>>     +   if (new_avl)
>>     +     {
>>     +       gcc_assert (!vlmax_avl_p (new_avl));
>>     +       auto &update = avlprops->get_or_insert (insn);
>>     +       change_p = !rtx_equal_p (update, new_avl);
>>     +       update = new_avl;
>>     +     }
>>     + }
>>     +    }
>>     +
>>     +  if (dump_file)
>>     +    fprintf (dump_file, "\nNumber of successful AVL
>>     propagations: %d\n\n",
>>     +      (int) avlprops->elements ());
>>     +
>>     +  for (const auto iter : *avlprops)
>>     +    {
>>     +      rtx_insn *rinsn = iter.first->rtl ();
>>     +      if (dump_file)
>>     + {
>>     +   fprintf (dump_file, "\nPropagating AVL: ");
>>     +   print_rtl_single (dump_file, iter.second);
>>     +   fprintf (dump_file, "into: ");
>>     +   print_rtl_single (dump_file, rinsn);
>>     + }
>>     +      /* Replace AVL operand.  */
>>     +      rtx new_pat
>>     + = simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first,
>>     false),
>>     + iter.second);
>>     +      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat,
>>     false);
>>     +
>>     +      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
>>     +      if (vlmax_avl_type_p (rinsn))
>>     + validate_change_or_fail (
>>     +   rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)],
>>     +   get_avl_type_rtx (avl_type::NONVLMAX), false);
>>     +      if (dump_file)
>>     + {
>>     +   fprintf (dump_file, "Successfully to match this instruction: ");
>>     +   print_rtl_single (dump_file, rinsn);
>>     + }
>>     +    }
>>     +
>>     +  avlprop_done ();
>>     +  return 0;
>>     +}
>>     +
>>     +rtl_opt_pass *
>>     +make_pass_avlprop (gcc::context *ctxt)
>>     +{
>>     +  return new pass_avlprop (ctxt);
>>     +}
>>     diff --git a/gcc/config/riscv/riscv-passes.def
>>     b/gcc/config/riscv/riscv-passes.def
>>     index 4084122cf0a..b6260939d5c 100644
>>     --- a/gcc/config/riscv/riscv-passes.def
>>     +++ b/gcc/config/riscv/riscv-passes.def
>>     @@ -18,4 +18,5 @@
>>     <http://www.gnu.org/licenses/>. */
>>     INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
>>     +INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
>>     INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
>>     diff --git a/gcc/config/riscv/riscv-protos.h
>>     b/gcc/config/riscv/riscv-protos.h
>>     index 6cb9d459ee9..2b09ec9ea9e 100644
>>     --- a/gcc/config/riscv/riscv-protos.h
>>     +++ b/gcc/config/riscv/riscv-protos.h
>>     @@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const
>>     char *, struct gcc_options *, locatio
>>     extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
>>     rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
>>     +rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
>>     rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
>>     /* Routines implemented in riscv-string.c.  */
>>     @@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
>>     bool cmp_lmul_gt_one (machine_mode);
>>     bool gather_scatter_valid_offset_mode_p (machine_mode);
>>     bool vls_mode_valid_p (machine_mode);
>>     +bool has_vtype_op (rtx_insn *);
>>     +bool has_vl_op (rtx_insn *);
>>     +bool tail_agnostic_p (rtx_insn *);
>>     +void validate_change_or_fail (rtx, rtx *, rtx, bool);
>>     +bool vlmax_avl_type_p (rtx_insn *);
>>     +bool vlmax_avl_p (rtx);
>>     +uint8_t get_sew (rtx_insn *);
>>     +enum vlmul_type get_vlmul (rtx_insn *);
>>     +bool const_vlmax_p (machine_mode);
>>     }
>>     /* We classify builtin types into two classes:
>>     diff --git a/gcc/config/riscv/riscv-v.cc
>>     b/gcc/config/riscv/riscv-v.cc
>>     index e39a9507803..473622ac321 100644
>>     --- a/gcc/config/riscv/riscv-v.cc
>>     +++ b/gcc/config/riscv/riscv-v.cc
>>     @@ -56,7 +56,7 @@ using namespace riscv_vector;
>>     namespace riscv_vector {
>>     /* Return true if vlmax is constant value and can be used in
>>     vsetivl.  */
>>     -static bool
>>     +bool
>>     const_vlmax_p (machine_mode mode)
>>     {
>>        poly_uint64 nuints = GET_MODE_NUNITS (mode);
>>     @@ -298,14 +298,6 @@ public:
>>           len = force_reg (Pmode, len);
>>         vls_p = true;
>>       }
>>     - else if (const_vlmax_p (vtype_mode))
>>     -   {
>>     -     /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
>>     -        the vsetvli to obtain the value of vlmax.  */
>>     -     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
>>     -     len = gen_int_mode (nunits, Pmode);
>>     -     vls_p = true;
>>     -   }
>>     else if (can_create_pseudo_p ())
>>       {
>>         len = gen_reg_rtx (Pmode);
>>     @@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
>>        emit_move_insn (dst, x4);
>>     }
>>     +/* Return true if it is an RVV instruction depends on VTYPE global
>>     +   status register.  */
>>     +bool
>>     +has_vtype_op (rtx_insn *rinsn)
>>     +{
>>     +  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op
>>     (rinsn);
>>     +}
>>     +
>>     +/* Return true if it is an RVV instruction depends on VL global
>>     +   status register.  */
>>     +bool
>>     +has_vl_op (rtx_insn *rinsn)
>>     +{
>>     +  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
>>     +}
>>     +
>>     +/* Get default tail policy.  */
>>     +static bool
>>     +get_default_ta ()
>>     +{
>>     +  /* For the instruction that doesn't require TA, we still need
>>     a default value
>>     +     to emit vsetvl. We pick up the default value according to
>>     prefer policy. */
>>     +  return (bool) (get_prefer_tail_policy () & 0x1
>>     + || (get_prefer_tail_policy () >> 1 & 0x1));
>>     +}
>>     +
>>     +/* Helper function to get TA operand.  */
>>     +bool
>>     +tail_agnostic_p (rtx_insn *rinsn)
>>     +{
>>     +  /* If it doesn't have TA, we return agnostic by default.  */
>>     +  extract_insn_cached (rinsn);
>>     +  int ta = get_attr_ta (rinsn);
>>     +  return ta == INVALID_ATTRIBUTE ? get_default_ta () :
>>     IS_AGNOSTIC (ta);
>>     +}
>>     +
>>     +/* Change insn and Assert the change always happens. */
>>     +void
>>     +validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool
>>     in_group)
>>     +{
>>     +  bool change_p = validate_change (object, loc, new_rtx, in_group);
>>     +  gcc_assert (change_p);
>>     +}
>>     +
>>     +/* Return true if it is VLMAX AVL TYPE.  */
>>     +bool
>>     +vlmax_avl_type_p (rtx_insn *rinsn)
>>     +{
>>     +  return get_attr_avl_type (rinsn) == VLMAX;
>>     +}
>>     +
>>     +/* Return true if RTX is RVV VLMAX AVL.  */
>>     +bool
>>     +vlmax_avl_p (rtx x)
>>     +{
>>     +  return x && rtx_equal_p (x, RVV_VLMAX);
>>     +}
>>     +
>>     +/* Helper function to get SEW operand. We always have SEW value for
>>     +   all RVV instructions that have VTYPE OP.  */
>>     +uint8_t
>>     +get_sew (rtx_insn *rinsn)
>>     +{
>>     +  return get_attr_sew (rinsn);
>>     +}
>>     +
>>     +/* Helper function to get VLMUL operand. We always have VLMUL
>>     value for
>>     +   all RVV instructions that have VTYPE OP. */
>>     +enum vlmul_type
>>     +get_vlmul (rtx_insn *rinsn)
>>     +{
>>     +  return (enum vlmul_type) get_attr_vlmul (rinsn);
>>     +}
>>     +
>>     } // namespace riscv_vector
>>     diff --git a/gcc/config/riscv/riscv-vsetvl.cc
>>     b/gcc/config/riscv/riscv-vsetvl.cc
>>     index e9dd669de98..f2f19e423bf 100644
>>     --- a/gcc/config/riscv/riscv-vsetvl.cc
>>     +++ b/gcc/config/riscv/riscv-vsetvl.cc
>>     @@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
>>        return agnostic_p ? "agnostic" : "undisturbed";
>>     }
>>     -static bool
>>     -vlmax_avl_p (rtx x)
>>     -{
>>     -  return x && rtx_equal_p (x, RVV_VLMAX);
>>     -}
>>     -
>>     -/* Return true if it is an RVV instruction depends on VTYPE global
>>     -   status register.  */
>>     -static bool
>>     -has_vtype_op (rtx_insn *rinsn)
>>     -{
>>     -  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op
>>     (rinsn);
>>     -}
>>     -
>>     -/* Return true if it is an RVV instruction depends on VL global
>>     -   status register.  */
>>     -static bool
>>     -has_vl_op (rtx_insn *rinsn)
>>     -{
>>     -  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
>>     -}
>>     -
>>     /* Return true if the instruction ignores VLMUL field of VTYPE.  */
>>     static bool
>>     ignore_vlmul_insn_p (rtx_insn *rinsn)
>>     @@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
>>        if (!has_vl_op (rinsn))
>>          return NULL_RTX;
>>     -  if (get_attr_avl_type (rinsn) == VLMAX)
>>     -    return RVV_VLMAX;
>>     -  extract_insn_cached (rinsn);
>>     -  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
>>     -}
>>     -/* Helper function to get SEW operand. We always have SEW value for
>>     -   all RVV instructions that have VTYPE OP.  */
>>     -static uint8_t
>>     -get_sew (rtx_insn *rinsn)
>>     -{
>>     -  return get_attr_sew (rinsn);
>>     -}
>>     -
>>     -/* Helper function to get VLMUL operand. We always have VLMUL
>>     value for
>>     -   all RVV instructions that have VTYPE OP. */
>>     -static enum vlmul_type
>>     -get_vlmul (rtx_insn *rinsn)
>>     -{
>>     -  return (enum vlmul_type) get_attr_vlmul (rinsn);
>>     -}
>>     +  extract_insn_cached (rinsn);
>>     +  if (vlmax_avl_type_p (rinsn))
>>     +    {
>>     +      if (BYTES_PER_RISCV_VECTOR.is_constant ())
>>     + {
>>     +   for (int i = 0; i < recog_data.n_operands; i++)
>>     +     if (GET_MODE_CLASS (recog_data.operand_mode[i]) ==
>>     MODE_VECTOR_BOOL
>>     + && const_vlmax_p (recog_data.operand_mode[i]))
>>     +       return gen_int_mode (GET_MODE_NUNITS
>>     (recog_data.operand_mode[i]),
>>     +    Pmode);
>>     + }
>>     +      return RVV_VLMAX;
>>     +    }
>>     -/* Get default tail policy.  */
>>     -static bool
>>     -get_default_ta ()
>>     -{
>>     -  /* For the instruction that doesn't require TA, we still need
>>     a default value
>>     -     to emit vsetvl. We pick up the default value according to
>>     prefer policy. */
>>     -  return (bool) (get_prefer_tail_policy () & 0x1
>>     - || (get_prefer_tail_policy () >> 1 & 0x1));
>>     +  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
>>     }
>>     /* Get default mask policy.  */
>>     @@ -407,16 +371,6 @@ get_default_ma ()
>>     || (get_prefer_mask_policy () >> 1 & 0x1));
>>     }
>>     -/* Helper function to get TA operand.  */
>>     -static bool
>>     -tail_agnostic_p (rtx_insn *rinsn)
>>     -{
>>     -  /* If it doesn't have TA, we return agnostic by default.  */
>>     -  extract_insn_cached (rinsn);
>>     -  int ta = get_attr_ta (rinsn);
>>     -  return ta == INVALID_ATTRIBUTE ? get_default_ta () :
>>     IS_AGNOSTIC (ta);
>>     -}
>>     -
>>     /* Helper function to get MA operand.  */
>>     static bool
>>     mask_agnostic_p (rtx_insn *rinsn)
>>     @@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn
>>     *rinsn, int regno)
>>        return true;
>>     }
>>     -/* Change insn and Assert the change always happens. */
>>     -static void
>>     -validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool
>>     in_group)
>>     -{
>>     -  bool change_p = validate_change (object, loc, new_rtx, in_group);
>>     -  gcc_assert (change_p);
>>     -}
>>     -
>>     /* This flags indicates the minimum demand of the vl and vtype
>>     values by the
>>         RVV instruction. For example, DEMAND_RATIO_P indicates that
>>     this RVV
>>         instruction only needs the SEW/LMUL ratio to remain the same,
>>     and does not
>>     diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
>>     index dd17056fe82..08de62853a6 100644
>>     --- a/gcc/config/riscv/t-riscv
>>     +++ b/gcc/config/riscv/t-riscv
>>     @@ -69,6 +69,12 @@ riscv-vsetvl.o:
>>     $(srcdir)/config/riscv/riscv-vsetvl.cc \
>>     $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>>     $(srcdir)/config/riscv/riscv-vsetvl.cc
>>     +riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
>>     +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
>>     +  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h
>>     + $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>>     + $(srcdir)/config/riscv/riscv-avlprop.cc
>>     +
>>     riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
>>        $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H)
>>     $(FUNCTION_H) \
>>        $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
>>     diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
>>     index ef91950178f..0c59d1b90bc 100644
>>     --- a/gcc/config/riscv/vector.md
>>     +++ b/gcc/config/riscv/vector.md
>>     @@ -809,7 +809,7 @@
>>     V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
>>     V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
>>     V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
>>     -    (symbol_ref "riscv_vector::NONVLMAX")
>>     +    (symbol_ref "riscv_vector::VLS")
>>     (eq_attr "type"
>>     "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
>>     vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
>>     vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
>>     diff --git
>>     a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>     b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>     index 928a507a363..5278e4aa38f 100644
>>     --- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>     +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>     @@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
>>          }
>>     }
>>     -/* { dg-final { scan-assembler {e32,m4} } } */
>>     +/* { dg-final { scan-assembler {e16,m2} } } */
>>     /* { dg-final { scan-assembler-not {csrr} } } */
>>     /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect"
>>     } } */
>>     /* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect"
>>     } } */
>>     diff --git
>>     a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>     b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>     index a50265fc1ec..1db2e073846 100644
>>     --- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>     +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>     @@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict
>>     b, int n)
>>          a[i] = a[i] + b[i];
>>     }
>>     -/* { dg-final { scan-assembler {e32,m8} } } */
>>     +/* { dg-final { scan-assembler {e16,m4} } } */
>>     /* { dg-final { scan-assembler-not {csrr} } } */
>>     /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect"
>>     } } */
>>     /* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
>>     diff --git
>>     a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>     b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>     index eac7cbc757b..ca88d42cdf4 100644
>>     ---
>>     a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>     +++
>>     b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>     @@ -7,10 +7,11 @@
>>     /*
>>     ** foo:
>>     **
>>     vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>>     +** ...
>>     ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
>>     ** ...
>>     -**
>>     vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>>     -** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
>>     +**
>>     vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>>     +** ...
>>     ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
>>     ** ...
>>     */
>>     diff --git
>>     a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>     b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>     index 965365da4bb..13367423751 100644
>>     ---
>>     a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>     +++
>>     b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>     @@ -3,7 +3,6 @@
>>     #include "ternop-2.c"
>>     -/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
>>     /* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
>>     /* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized"
>>     } } */
>>     /* { dg-final { scan-assembler-not {\tvmv} } } */
>>     diff --git
>>     a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>     b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>     new file mode 100644
>>     index 00000000000..b0d21650c3d
>>     --- /dev/null
>>     +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>     @@ -0,0 +1,16 @@
>>     +/* { dg-do compile } */
>>     +/* { dg-options "-march=rv64gcv -mabi=lp64d
>>     --param=riscv-autovec-preference=fixed-vlmax -O3" } */
>>     +
>>     +void
>>     +foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
>>     +{
>>     +  for (int i = 0; i < n; i++)
>>     +    a[i] = b[i] + c[i];
>>     +}
>>     +
>>     +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
>>     +/* { dg-final { scan-assembler-not {vsetivli} } } */
>>     +/* { dg-final { scan-assembler-times
>>     {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
>>     +/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero}
>>     } } */
>>     +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
>>     +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
>>     diff --git
>>     a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>     b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>     new file mode 100644
>>     index 00000000000..f2d8aa54b88
>>     --- /dev/null
>>     +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>     @@ -0,0 +1,33 @@
>>     +/* { dg-do compile } */
>>     +/* { dg-options "-march=rv64gcv -mabi=lp64d
>>     --param=riscv-autovec-preference=fixed-vlmax -O3" } */
>>     +
>>     +void
>>     +foo (int *__restrict a, int *__restrict b, int *__restrict c,
>>     +     int *__restrict a2, int *__restrict b2, int *__restrict c2,
>>     +     int *__restrict a3, int *__restrict b3, int *__restrict c3,
>>     +     int *__restrict a4, int *__restrict b4, int *__restrict c4,
>>     +     int *__restrict a5, int *__restrict b5, int *__restrict c5,
>>     +     int *__restrict d, int *__restrict d2, int *__restrict d3,
>>     +     int *__restrict d4, int *__restrict d5, int n, int m)
>>     +{
>>     +  for (int i = 0; i < n; i++)
>>     +    {
>>     +      a[i] = b[i] + c[i];
>>     +      a2[i] = b2[i] + c2[i];
>>     +      a3[i] = b3[i] + c3[i];
>>     +      a4[i] = b4[i] + c4[i];
>>     +      a5[i] = a[i] + a4[i];
>>     +      d[i] = a[i] - a2[i];
>>     +      d2[i] = a2[i] * a[i];
>>     +      d3[i] = a3[i] * a2[i];
>>     +      d4[i] = a2[i] * d2[i];
>>     +      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
>>     +    }
>>     +}
>>     +
>>     +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
>>     +/* { dg-final { scan-assembler-not {vsetivli} } } */
>>     +/* { dg-final { scan-assembler-times
>>     {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
>>     +/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero}
>>     } } */
>>     +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
>>     +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
>>     diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>     b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>     index 674ba0d72b4..fc830f2cd4d 100644
>>     --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>     +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>     @@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain
>>     $srcdir/$subdir/vsetvl/*.\[cS\]]] \
>>     "" $CFLAGS
>>     dg-runtest [lsort [glob -nocomplain
>>     $srcdir/$subdir/autovec/*.\[cS\]]] \
>>     "-O3 -ftree-vectorize" $CFLAGS
>>     +dg-runtest [lsort [glob -nocomplain
>>     $srcdir/$subdir/avlprop/*.\[cS\]]] \
>>     + "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
>>     dg-runtest [lsort [glob -nocomplain
>>     $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
>>     "-O3 -ftree-vectorize --param riscv-autovec-preference=scalable"
>>     $CFLAGS
>>     dg-runtest [lsort [glob -nocomplain
>>     $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
>>     -- 
>>     2.36.3
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-24 15:03     ` Patrick O'Neill
@ 2023-10-25 12:20       ` juzhe.zhong
  2023-10-26  0:37         ` Patrick O'Neill
  0 siblings, 1 reply; 13+ messages in thread
From: juzhe.zhong @ 2023-10-25 12:20 UTC (permalink / raw)
  To: Patrick O'Neill, gcc-patches
  Cc: kito.cheng, Kito.cheng, jeffreyalaw, Robin Dapp

[-- Attachment #1: Type: text/plain, Size: 70629 bytes --]

Hi, Patrick.

I have fixed on V2 patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634267.html

I have tested on RV32/RV64 C/C++, no regression. But I am not able to test on Fortran.

The failures you showed have been fixed. Except this one:
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
This FAIL is not because of this patch since I confirmed it already existed without this patch.
We will fix that on stage 3.

Could you verify with Fortran test ? 

Thanks.



juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-24 23:03
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
I'm seeing a variety of new failures, constrained to rv32gcv:

Tested using newlib/linux: 
rv32gcv/ ilp32d/ medlow
rv64gcv/  lp64d/ medlow
rv64gcv_zvbb_zvbc_zvkg_zvkn_zvknc_zvkned_zvkng_zvknha_zvknhb_zvks_zvksc_zvksed_zvksg_zvksh_zvkt/  lp64d/ medlow
rv64imafdcv_zicond_zawrs_zbc_zvkng_zvksg_zvbb_zvbc_zicsr_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt/  lp64d/ medlow

Newlib failures:
rv32gcv:
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test

Debug log for testcases that aren't pr110557.c look like this:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-lmul=m4      -lm  -o ./popcount-run-1.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o ./popcount-run-1.exe
PASS: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c (test for excess errors)
spawn riscv64-unknown-elf-run ./popcount-run-1.exe
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

Debug log for pr110557.c:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs  -lm  -o ./pr110557.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++14 (test for excess errors)
spawn riscv64-unknown-elf-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-newlib/../scripts/wrapper/qemu/riscv64-unknown-elf-run: line 15: 3449805 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
Linux failures:
rv32gcv:
FAIL: gcc.dg/nextafter-2.c execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
FAIL: gfortran.dg/default_format_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_2.f90   -Os  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -Os  execution test
FAIL: gfortran.dg/large_real_kind_2.F90   -O0  execution test
FAIL: gfortran.dg/round_4.f90   -O0  execution test
FAIL: gfortran.dg/zero_sized_3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test
FAIL: gfortran.dg/ieee/large_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O1  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O2  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -Os  execution test
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_sum.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Some (not all) debug log outputs:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions        -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
PASS: gfortran.fortran-torture/execute/intrinsic_count.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions 
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
STOP 2
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions 
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions -funroll-loops       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -funroll-loops -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops 
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
STOP 3
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output    -O0   -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o ./large_2.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o ./large_2.exe
PASS: gfortran.dg/ieee/large_2.f90   -O0  (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./large_2.exe
  0.333333333333333333333333333333333317         2.24271998593667819112500193394291495E+1644
STOP 1
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm  -o ./pr110557.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++98 (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 323485 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-reduc-dot-21.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe
PASS: gcc.dg/vect/vect-reduc-dot-21.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-reduc-dot-21.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3484803 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-alias-check-16.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-alias-check-16.exe
PASS: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-alias-check-16.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3431975 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "flags: *RAW\\n"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "using an address-based overlap test"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump-not vect "using an index-based"

I've observed nextafter-2.c being flaky on the CI so that particular failure might not be real.

If you want any particular testcase's debug logs please let me know.

Patrick

On 10/23/23 21:30, Patrick O'Neill wrote:
The CI just picked it up: https://github.com/ewlu/gcc-precommit-ci/issues/449#issue-1958483272
Since it doesn't apply to the CI's baseline hash it's only performing a build.
I'll re-run it in the morning once the baseline has been updated.

In the meantime I started a full build+test run on my local machine.
I'll send you the results in ~10 hours - morning my time :-)

Patrick
On 10/23/23 20:44, juzhe.zhong@rivai.ai wrote:
CCing Patrick...

Hi, @Patrick.
Could you apply this patch and trigger your regression CI?

I don't have an environment to test fortran for now (I only test it on C/C++).

Thanks. 



juzhe.zhong@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-24 11:32
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization
which is a known issue for a long time and I finally find the time to address it.
 
Consider a simple vector addition operation:
 
https://godbolt.org/z/7hfGfEjW3
 
void
foo (int *__restrict a,
     int *__restrict b,
     int *__restrict n)
{
  for (int i = 0; i < n; i++)
      a[i] = a[i] + b[i];
}
 
Optimized IR:
 
Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);    -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);   -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                              -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
 
We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment:
 
vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
 
GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization.
Such flow are used by all other targets like ARM SVE (RVV also uses such flow):
 
ARM SVE:
   
.L3:
        ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
        add     z31.s, z31.s, z30.s            -> un-predicated add
        st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store
 
Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it.
 
Also, It's very unlikely that we can apply predicated operations on all vectorization for following reasons:
 
1. It's very heavy workload to support them on all vectorization and we don't see any benefits if we can handle that on targets backend.
2. Changing Loop vectorizer for it will make code base ugly and hard to maintain.
3. We will need so many patterns for all operations. Not only COND_LEN_ADD, COND_LEN_SUB, ....
   We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over 100+ patterns, unreasonable number of patterns.
 
To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS for it to elide the redundant vsetvls
due to AVL/VL toggling.
 
The second question is that why we separate a PASS called AVL propagation. Why not optimize it in VSETVL PASS (We definitetly can optimize AVL in VSETVL PASS)
 
Frankly, I was planning to address such issue in VSETVL PASS that's why we recently refactored VSETVL PASS. However, I changed my mind recently after several
experiments and tries.
 
The reasons as follows:
 
1. For code base management and maintainience. Current VSETVL PASS is complicated enough and aleady has enough aggressive and fancy optimizations which
   turns out it can always generate optimal codegen in most of the cases. It's not a good idea keep adding more features into VSETVL PASS to make VSETVL
PASS become heavy and heavy again, then we will need to refactor it again in the future.
Actuall, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change VSETVL PASS any more except the minor
fixes.
 
2. vsetvl insertion (VSETVL PASS does this thing) and AVL propagation are 2 different things,  I don't think we should fuse them into same PASS.
 
3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be done before RA which can reduce register allocation.
 
4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
   This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements.
We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate 
VSETVL PASS again which is already so complicated.) 
 
Here is an example to demonstrate more:
 
https://godbolt.org/z/bE86sv3q5
 
void foo2 (int *__restrict a,
          int *__restrict b,
          int *__restrict c,
          int *__restrict a2,
          int *__restrict b2,
          int *__restrict c2,
          int *__restrict a3,
          int *__restrict b3,
          int *__restrict c3,
          int *__restrict a4,
          int *__restrict b4,
          int *__restrict c4,
          int *__restrict a5,
          int *__restrict b5,
          int *__restrict c5,
          int n)
{
    for (int i = 0; i < n; i++){
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i]+ a[i];
 
      a[i] = a[i] + c[i];
      b5[i] = a[i] + c[i];
      a2[i] = a[i] + c2[i];
      a3[i] = a[i] + c3[i];
      a4[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i]+ a[i];
    }
}
 
1. Loop Body:
 
Before this patch:                                          After this patch:
  
      vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli a4,t1,e32,m1,ta,ma                                     
        vle32.v v2,0(a2)                                     vle32.v v2,0(a2)
        vle32.v v4,0(a1)                                     vle32.v v3,0(t2)
        vle32.v v1,0(t2)                                     vle32.v v4,0(a1)
        vsetvli a7,zero,e32,m1,ta,ma                         vle32.v v1,0(t0)
        vadd.vv v4,v2,v4                                     vadd.vv v4,v2,v4
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v1,v3,v1
        vle32.v v3,0(s0)                                     vadd.vv v1,v1,v4
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v1,v1,v4
        vadd.vv v1,v3,v1                                     vadd.vv v1,v1,v4
        vadd.vv v1,v1,v4                                     vadd.vv v1,v1,v2
        vadd.vv v1,v1,v4                                     vadd.vv v2,v1,v2
        vadd.vv v1,v1,v4                                     vse32.v v2,0(t5)
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v2,v2,v1
        vle32.v v4,0(a5)                                     vadd.vv v2,v2,v1
        vsetvli a7,zero,e32,m1,ta,ma                         slli a7,a4,2
        vadd.vv v1,v1,v2                                     vadd.vv v3,v1,v3
        vadd.vv v2,v1,v2                                     vle32.v v5,0(a5)
        vadd.vv v4,v1,v4                                     vle32.v v6,0(t6)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v3,0(t3)
        vse32.v v2,0(t5)                                     vse32.v v2,0(a0)
        vse32.v v4,0(a3)                                     vadd.vv v3,v3,v1
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v2,v1,v5
        vadd.vv v3,v1,v3                                     vse32.v v3,0(t4)
        vadd.vv v2,v2,v1                                     vadd.vv v1,v1,v6
        vadd.vv v2,v2,v1                                     vse32.v v2,0(a3)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v1,0(a6)
        vse32.v v2,0(a0)                                      
        vse32.v v3,0(t3)                                      
        vle32.v v2,0(t0)                                      
        vsetvli a7,zero,e32,m1,ta,ma                                      
        vadd.vv v3,v3,v1                                      
        vsetvli zero,a4,e32,m1,ta,ma                                      
        vse32.v v3,0(t4)                                      
        vsetvli a7,zero,e32,m1,ta,ma                                      
        slli    a7,a4,2                                      
        vadd.vv v1,v1,v2                                      
        sub     t1,t1,a4                                      
        vsetvli zero,a4,e32,m1,ta,ma                                      
        vse32.v v1,0(a6)                                      
 
It's quite obvious, all heavy && redundant vsetvls inside loop body are eliminated.
 
2. Epilogue:
    Before this patch:                                          After this patch:
 
     .L5:                                                      .L5:                                           
        ld      s0,8(sp)                                         ret
        addi    sp,sp,16                                         
        jr      ra                                         
 
This is the benefit we do the AVL propation before RA since we eliminate the use of 'a7' register
which is used by the redudant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'
 
The final codegen after this patch:
 
foo2:
lw t1,56(sp)
ld t6,0(sp)
ld t3,8(sp)
ld t0,16(sp)
ld t2,24(sp)
ld t4,32(sp)
ld t5,40(sp)
ble t1,zero,.L5
.L3:
vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2)
vle32.v v3,0(t2)
vle32.v v4,0(a1)
vle32.v v1,0(t0)
vadd.vv v4,v2,v4
vadd.vv v1,v3,v1
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v2
vadd.vv v2,v1,v2
vse32.v v2,0(t5)
vadd.vv v2,v2,v1
vadd.vv v2,v2,v1
slli a7,a4,2
vadd.vv v3,v1,v3
vle32.v v5,0(a5)
vle32.v v6,0(t6)
vse32.v v3,0(t3)
vse32.v v2,0(a0)
vadd.vv v3,v3,v1
vadd.vv v2,v1,v5
vse32.v v3,0(t4)
vadd.vv v1,v1,v6
vse32.v v2,0(a3)
vse32.v v1,0(a6)
sub t1,t1,a4
add a1,a1,a7
add a2,a2,a7
add a5,a5,a7
add t6,t6,a7
add t0,t0,a7
add t2,t2,a7
add t5,t5,a7
add a3,a3,a7
add a6,a6,a7
add t3,t3,a7
add t4,t4,a7
add a0,a0,a7
bne t1,zero,.L3
.L5:
ret
 
PR target/111888
 
gcc/ChangeLog:
 
* config.gcc: Add AVL propgatation PASS.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
(has_vtype_op): Export as global.
(has_vl_op): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(const_vlmax_p): Ditto.
* config/riscv/riscv-v.cc (has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(get_vlmul): Ditto.
* config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
(has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_sew): Ditto.
(get_vlmul): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
* config/riscv/t-riscv: Add AVL propagation PASS.
* config/riscv/vector.md: Fix VLS modes attribute.
* config/riscv/riscv-avlprop.cc: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
* gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
* gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.
 
---
gcc/config.gcc                                |   2 +-
gcc/config/riscv/riscv-avlprop.cc             | 350 ++++++++++++++++++
gcc/config/riscv/riscv-passes.def             |   1 +
gcc/config/riscv/riscv-protos.h               |  10 +
gcc/config/riscv/riscv-v.cc                   |  84 ++++-
gcc/config/riscv/riscv-vsetvl.cc              |  82 +---
gcc/config/riscv/t-riscv                      |   6 +
gcc/config/riscv/vector.md                    |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +-
.../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +-
.../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 -
.../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
.../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
15 files changed, 514 insertions(+), 84 deletions(-)
create mode 100644 gcc/config/riscv/riscv-avlprop.cc
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 606d3a8513e..efd53965c9a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -544,7 +544,7 @@ pru-*-*)
riscv*)
cpu_type=riscv
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
- extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
+ extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o riscv-avlprop.o"
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o"
d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-avlprop.cc b/gcc/config/riscv/riscv-avlprop.cc
new file mode 100644
index 00000000000..bf3becd8371
--- /dev/null
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -0,0 +1,350 @@
+/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2023-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI Technologies Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or(at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions.
+   A standalone AVL propagation pass is designed because:
+
+     - Better code maintain:
+       Current LCM-based VSETVL pass is so complicated that codes
+       there will become even harder to maintain. A straight forward
+       AVL propagation PASS is much easier to maintain.
+
+     - Reduce scalar register pressure:
+       A type of AVL propagation is we propagate AVL from NON-VLMAX
+       instruction to VLMAX instruction.
+       Note: VLMAX instruction should be ignore tail elements (TA)
+       and the result should be used by the NON-VLMAX instruction.
+       This optimization is mostly for auto-vectorization codes:
+
+   vsetvli r136, r137      --- SELECT_VL
+   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
+   vadd.vv (use VLMAX)     --- PLUS_EXPR
+   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
+
+ NO AVL propation:
+
+   vsetvli a5, a4, ta
+   vle8.v v1
+   vsetvli t0, zero, ta
+   vadd.vv v2, v1, v1
+   vse8.v v2
+
+ We can propagate the AVL to 'vadd.vv' since its result
+ is consumed by a 'vse8.v' which has AVL = a5 and its
+ tail elements are agnostic.
+
+       We DON'T do this optimization on VSETVL pass since it is a
+       post-RA pass that consumed 't0' already wheras a standalone
+       pre-RA AVL propagation pass allows us elide the consumption
+       of the pseudo register of 't0' then we can reduce scalar
+       register pressure.
+
+     - More AVL propagation opportunities:
+       A pre-RA pass is more flexible for AVL REG def-use chain,
+       thus we will get more potential AVL propagation as long as
+       it doesn't increase the scalar register pressure.
+*/
+
+#define IN_TARGET_CODE 1
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "target.h"
+#include "tree-pass.h"
+#include "df.h"
+#include "rtl-ssa.h"
+#include "cfgcleanup.h"
+#include "insn-attr.h"
+
+using namespace rtl_ssa;
+using namespace riscv_vector;
+
+/* The AVL propagation instructions and corresponding preferred AVL.
+   It will be updated during the analysis.  */
+static hash_map<insn_info *, rtx> *avlprops;
+
+const pass_data pass_data_avlprop = {
+  RTL_PASS, /* type */
+  "avlprop", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_avlprop : public rtl_opt_pass
+{
+public:
+  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) final override
+  {
+    return TARGET_VECTOR && optimize > 0;
+  }
+  virtual unsigned int execute (function *) final override;
+}; // class pass_avlprop
+
+static void
+avlprop_init (void)
+{
+  calculate_dominance_info (CDI_DOMINATORS);
+  df_analyze ();
+  crtl->ssa = new function_info (cfun);
+  avlprops = new hash_map<insn_info *, rtx>;
+}
+
+static void
+avlprop_done (void)
+{
+  free_dominance_info (CDI_DOMINATORS);
+  if (crtl->ssa->perform_pending_updates ())
+    cleanup_cfg (0);
+  delete crtl->ssa;
+  crtl->ssa = nullptr;
+  delete avlprops;
+  avlprops = NULL;
+}
+
+/* Helper function to get AVL operand.  */
+static rtx
+get_avl (insn_info *insn, bool avlprop_p)
+{
+  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
+      || get_attr_avl_type (insn->rtl ()) == VLS)
+    return NULL_RTX;
+  if (avlprop_p)
+    {
+      if (avlprops->get (insn))
+ return (*avlprops->get (insn));
+      else if (vlmax_avl_type_p (insn->rtl ()))
+ return RVV_VLMAX;
+    }
+  extract_insn_cached (insn->rtl ());
+  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
+}
+
+/* This is a straight forward pattern ALWAYS in paritial auto-vectorization:
+
+     VL = SELECT_AVL (AVL, ...)
+     V0 = MASK_LEN_LOAD (..., VL)
+     V1 = MASK_LEN_LOAD (..., VL)
+     V2 = V0 + V1 --- Missed LEN information.
+     MASK_LEN_STORE (..., V2, VL)
+
+   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1, dummy LEN)
+   because:
+
+     - Few code changes in Loop Vectorizer.
+     - Reuse the current clean flow of partial vectorization, That is, apply
+       predicate LEN or MASK into LOAD/STORE operations and other special
+       arithmetic operations (e.d. DIV), then do the whole vector register
+       operation if it DON'T affect the correctness.
+       Such flow is used by all other targets like x86, sve, s390, ... etc.
+     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
+
+   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like PLUS_EXPR which
+   generates the VLMAX instruction due to missed LEN information. The later
+   VSETVL PASS will elided the redundant vsetvls.
+*/
+
+static rtx
+get_autovectorize_preferred_avl (insn_info *insn)
+{
+  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p (insn->rtl ()))
+    return NULL_RTX;
+
+  rtx use_avl = NULL_RTX;
+  insn_info *avl_use_insn = nullptr;
+  unsigned int ratio
+    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul (insn->rtl ()));
+  for (def_info *def : insn->defs ())
+    {
+      auto set = safe_dyn_cast<set_info *> (def);
+      if (!set || !set->is_reg ())
+ return NULL_RTX;
+      for (use_info *use : set->all_uses ())
+ {
+   if (!use->is_in_nondebug_insn ())
+     return NULL_RTX;
+   insn_info *use_insn = use->insn ();
+   /* FIXME: Stop AVL propagation if any USE is not a RVV real
+      instruction. It should be totally enough for vectorized codes since
+      they always locate at extended blocks.
+
+      TODO: We can extend PHI checking for intrinsic codes if it
+      necessary in the future.  */
+   if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!has_vl_op (use_insn->rtl ()))
+     continue;
+
+   rtx new_use_avl = get_avl (use_insn, true);
+   if (!new_use_avl)
+     return NULL_RTX;
+   if (!use_avl)
+     use_avl = new_use_avl;
+   if (!rtx_equal_p (use_avl, new_use_avl)
+       || calculate_ratio (get_sew (use_insn->rtl ()),
+   get_vlmul (use_insn->rtl ()))
+    != ratio
+       || vlmax_avl_p (new_use_avl)
+       || !tail_agnostic_p (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!avl_use_insn)
+     avl_use_insn = use_insn;
+ }
+    }
+
+  if (use_avl && register_operand (use_avl, Pmode))
+    {
+      gcc_assert (avl_use_insn);
+      // Find a definition at or neighboring INSN.
+      resource_info resource = full_register (REGNO (use_avl));
+      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
+      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
+      if (dl1.matching_set () || dl2.matching_set ())
+ return NULL_RTX;
+      def_info *def1 = dl1.last_def_of_prev_group ();
+      def_info *def2 = dl2.last_def_of_prev_group ();
+      if (def1 != def2)
+ return NULL_RTX;
+      /* FIXME: We only all AVL propation within a block which should
+ be totally enough for vectorized codes.
+
+ TODO: We can enhance it here for intrinsic codes in the future
+ if it is necessary.  */
+      if (def1->insn ()->bb () != insn->bb ()
+   || def1->insn ()->compare_with (insn) >= 0)
+ return NULL_RTX;
+    }
+  return use_avl;
+}
+
+/* If we have a preferred AVL to propagate, return the AVL.
+   Otherwise, return NULL_RTX as we don't need have any preferred
+   AVL.  */
+
+static rtx
+get_preferred_avl (insn_info *insn)
+{
+  /* TODO: We only do AVL propagation for missed-LEN partial
+     autovectorization for now.  We could add more more AVL
+     propagation for intrinsic codes in the future.  */
+  return get_autovectorize_preferred_avl (insn);
+}
+
+/* Return the AVL TYPE operand index.  */
+static int
+get_avl_type_index (insn_info *insn)
+{
+  extract_insn_cached (insn->rtl ());
+  /* Except rounding mode patterns, AVL TYPE operand
+     is always the last operand.  */
+  if (find_access (insn->uses (), VXRM_REGNUM)
+      || find_access (insn->uses (), FRM_REGNUM))
+    return recog_data.n_operands - 2;
+  return recog_data.n_operands - 1;
+}
+
+/* Main entry point for this pass.  */
+unsigned int
+pass_avlprop::execute (function *)
+{
+  avlprop_init ();
+
+  /* Go through all the instructions looking for AVL that we could propagate. */
+
+  insn_info *next;
+  bool change_p = true;
+
+  while (change_p)
+    {
+      /* Iterate on each instruction until no more change need.  */
+      change_p = false;
+      for (insn_info *insn = crtl->ssa->first_insn (); insn; insn = next)
+ {
+   next = insn->next_any_insn ();
+   /* We only forward AVL to the instruction that has AVL/VL operand
+      and can be optimized in RTL_SSA level.  */
+   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
+     continue;
+
+   rtx new_avl = get_preferred_avl (insn);
+   if (new_avl)
+     {
+       gcc_assert (!vlmax_avl_p (new_avl));
+       auto &update = avlprops->get_or_insert (insn);
+       change_p = !rtx_equal_p (update, new_avl);
+       update = new_avl;
+     }
+ }
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "\nNumber of successful AVL propagations: %d\n\n",
+      (int) avlprops->elements ());
+
+  for (const auto iter : *avlprops)
+    {
+      rtx_insn *rinsn = iter.first->rtl ();
+      if (dump_file)
+ {
+   fprintf (dump_file, "\nPropagating AVL: ");
+   print_rtl_single (dump_file, iter.second);
+   fprintf (dump_file, "into: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+      /* Replace AVL operand.  */
+      rtx new_pat
+ = simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first, false),
+ iter.second);
+      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
+
+      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
+      if (vlmax_avl_type_p (rinsn))
+ validate_change_or_fail (
+   rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)],
+   get_avl_type_rtx (avl_type::NONVLMAX), false);
+      if (dump_file)
+ {
+   fprintf (dump_file, "Successfully to match this instruction: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+    }
+
+  avlprop_done ();
+  return 0;
+}
+
+rtl_opt_pass *
+make_pass_avlprop (gcc::context *ctxt)
+{
+  return new pass_avlprop (ctxt);
+}
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
index 4084122cf0a..b6260939d5c 100644
--- a/gcc/config/riscv/riscv-passes.def
+++ b/gcc/config/riscv/riscv-passes.def
@@ -18,4 +18,5 @@
    <http://www.gnu.org/licenses/>.  */
INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
+INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6cb9d459ee9..2b09ec9ea9e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const char *, struct gcc_options *, locatio
extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
/* Routines implemented in riscv-string.c.  */
@@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
bool cmp_lmul_gt_one (machine_mode);
bool gather_scatter_valid_offset_mode_p (machine_mode);
bool vls_mode_valid_p (machine_mode);
+bool has_vtype_op (rtx_insn *);
+bool has_vl_op (rtx_insn *);
+bool tail_agnostic_p (rtx_insn *);
+void validate_change_or_fail (rtx, rtx *, rtx, bool);
+bool vlmax_avl_type_p (rtx_insn *);
+bool vlmax_avl_p (rtx);
+uint8_t get_sew (rtx_insn *);
+enum vlmul_type get_vlmul (rtx_insn *);
+bool const_vlmax_p (machine_mode);
}
/* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e39a9507803..473622ac321 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -56,7 +56,7 @@ using namespace riscv_vector;
namespace riscv_vector {
/* Return true if vlmax is constant value and can be used in vsetivl.  */
-static bool
+bool
const_vlmax_p (machine_mode mode)
{
   poly_uint64 nuints = GET_MODE_NUNITS (mode);
@@ -298,14 +298,6 @@ public:
      len = force_reg (Pmode, len);
    vls_p = true;
  }
- else if (const_vlmax_p (vtype_mode))
-   {
-     /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
-        the vsetvli to obtain the value of vlmax.  */
-     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
-     len = gen_int_mode (nunits, Pmode);
-     vls_p = true;
-   }
else if (can_create_pseudo_p ())
  {
    len = gen_reg_rtx (Pmode);
@@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
   emit_move_insn (dst, x4);
}
+/* Return true if it is an RVV instruction depends on VTYPE global
+   status register.  */
+bool
+has_vtype_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
+}
+
+/* Return true if it is an RVV instruction depends on VL global
+   status register.  */
+bool
+has_vl_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
+}
+
+/* Get default tail policy.  */
+static bool
+get_default_ta ()
+{
+  /* For the instruction that doesn't require TA, we still need a default value
+     to emit vsetvl. We pick up the default value according to prefer policy. */
+  return (bool) (get_prefer_tail_policy () & 0x1
+ || (get_prefer_tail_policy () >> 1 & 0x1));
+}
+
+/* Helper function to get TA operand.  */
+bool
+tail_agnostic_p (rtx_insn *rinsn)
+{
+  /* If it doesn't have TA, we return agnostic by default.  */
+  extract_insn_cached (rinsn);
+  int ta = get_attr_ta (rinsn);
+  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
+}
+
+/* Change insn and Assert the change always happens.  */
+void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
+
+/* Return true if it is VLMAX AVL TYPE.  */
+bool
+vlmax_avl_type_p (rtx_insn *rinsn)
+{
+  return get_attr_avl_type (rinsn) == VLMAX;
+}
+
+/* Return true if RTX is RVV VLMAX AVL.  */
+bool
+vlmax_avl_p (rtx x)
+{
+  return x && rtx_equal_p (x, RVV_VLMAX);
+}
+
+/* Helper function to get SEW operand. We always have SEW value for
+   all RVV instructions that have VTYPE OP.  */
+uint8_t
+get_sew (rtx_insn *rinsn)
+{
+  return get_attr_sew (rinsn);
+}
+
+/* Helper function to get VLMUL operand. We always have VLMUL value for
+   all RVV instructions that have VTYPE OP. */
+enum vlmul_type
+get_vlmul (rtx_insn *rinsn)
+{
+  return (enum vlmul_type) get_attr_vlmul (rinsn);
+}
+
} // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index e9dd669de98..f2f19e423bf 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
   return agnostic_p ? "agnostic" : "undisturbed";
}
-static bool
-vlmax_avl_p (rtx x)
-{
-  return x && rtx_equal_p (x, RVV_VLMAX);
-}
-
-/* Return true if it is an RVV instruction depends on VTYPE global
-   status register.  */
-static bool
-has_vtype_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
-}
-
-/* Return true if it is an RVV instruction depends on VL global
-   status register.  */
-static bool
-has_vl_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
-}
-
/* Return true if the instruction ignores VLMUL field of VTYPE.  */
static bool
ignore_vlmul_insn_p (rtx_insn *rinsn)
@@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
   if (!has_vl_op (rinsn))
     return NULL_RTX;
-  if (get_attr_avl_type (rinsn) == VLMAX)
-    return RVV_VLMAX;
-  extract_insn_cached (rinsn);
-  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
-}
-/* Helper function to get SEW operand. We always have SEW value for
-   all RVV instructions that have VTYPE OP.  */
-static uint8_t
-get_sew (rtx_insn *rinsn)
-{
-  return get_attr_sew (rinsn);
-}
-
-/* Helper function to get VLMUL operand. We always have VLMUL value for
-   all RVV instructions that have VTYPE OP. */
-static enum vlmul_type
-get_vlmul (rtx_insn *rinsn)
-{
-  return (enum vlmul_type) get_attr_vlmul (rinsn);
-}
+  extract_insn_cached (rinsn);
+  if (vlmax_avl_type_p (rinsn))
+    {
+      if (BYTES_PER_RISCV_VECTOR.is_constant ())
+ {
+   for (int i = 0; i < recog_data.n_operands; i++)
+     if (GET_MODE_CLASS (recog_data.operand_mode[i]) == MODE_VECTOR_BOOL
+ && const_vlmax_p (recog_data.operand_mode[i]))
+       return gen_int_mode (GET_MODE_NUNITS (recog_data.operand_mode[i]),
+    Pmode);
+ }
+      return RVV_VLMAX;
+    }
-/* Get default tail policy.  */
-static bool
-get_default_ta ()
-{
-  /* For the instruction that doesn't require TA, we still need a default value
-     to emit vsetvl. We pick up the default value according to prefer policy. */
-  return (bool) (get_prefer_tail_policy () & 0x1
- || (get_prefer_tail_policy () >> 1 & 0x1));
+  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
}
/* Get default mask policy.  */
@@ -407,16 +371,6 @@ get_default_ma ()
|| (get_prefer_mask_policy () >> 1 & 0x1));
}
-/* Helper function to get TA operand.  */
-static bool
-tail_agnostic_p (rtx_insn *rinsn)
-{
-  /* If it doesn't have TA, we return agnostic by default.  */
-  extract_insn_cached (rinsn);
-  int ta = get_attr_ta (rinsn);
-  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
-}
-
/* Helper function to get MA operand.  */
static bool
mask_agnostic_p (rtx_insn *rinsn)
@@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
   return true;
}
-/* Change insn and Assert the change always happens.  */
-static void
-validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
-{
-  bool change_p = validate_change (object, loc, new_rtx, in_group);
-  gcc_assert (change_p);
-}
-
/* This flags indicates the minimum demand of the vl and vtype values by the
    RVV instruction. For example, DEMAND_RATIO_P indicates that this RVV
    instruction only needs the SEW/LMUL ratio to remain the same, and does not
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index dd17056fe82..08de62853a6 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -69,6 +69,12 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/riscv/riscv-vsetvl.cc
+riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
+  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h 
+ $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+ $(srcdir)/config/riscv/riscv-avlprop.cc
+
riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
   $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ef91950178f..0c59d1b90bc 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -809,7 +809,7 @@
  V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
  V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
  V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
-    (symbol_ref "riscv_vector::NONVLMAX")
+    (symbol_ref "riscv_vector::VLS")
(eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
  vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
  vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
index 928a507a363..5278e4aa38f 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
@@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
     }
}
-/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler {e16,m2} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
index a50265fc1ec..1db2e073846 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
@@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict b, int n)
     a[i] = a[i] + b[i];
}
-/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler {e16,m4} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
index eac7cbc757b..ca88d42cdf4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
@@ -7,10 +7,11 @@
/*
** foo:
** vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
-** vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
-** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
+** vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
*/
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
index 965365da4bb..13367423751 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
@@ -3,7 +3,6 @@
#include "ternop-2.c"
-/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
/* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
/* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized" } } */
/* { dg-final { scan-assembler-not {\tvmv} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
new file mode 100644
index 00000000000..b0d21650c3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
new file mode 100644
index 00000000000..f2d8aa54b88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c,
+     int *__restrict a2, int *__restrict b2, int *__restrict c2,
+     int *__restrict a3, int *__restrict b3, int *__restrict c3,
+     int *__restrict a4, int *__restrict b4, int *__restrict c4,
+     int *__restrict a5, int *__restrict b5, int *__restrict c5,
+     int *__restrict d, int *__restrict d2, int *__restrict d3,
+     int *__restrict d4, int *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 674ba0d72b4..fc830f2cd4d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
"" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \
"-O3 -ftree-vectorize" $CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/avlprop/*.\[cS\]]] \
+ "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
"-O3 -ftree-vectorize --param riscv-autovec-preference=scalable" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
-- 
2.36.3
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-25 12:20       ` juzhe.zhong
@ 2023-10-26  0:37         ` Patrick O'Neill
  2023-10-26  0:49           ` juzhe.zhong
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick O'Neill @ 2023-10-26  0:37 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: kito.cheng, Kito.cheng, jeffreyalaw, Robin Dapp

[-- Attachment #1: Type: text/plain, Size: 88337 bytes --]

Hi Juzhe,

I tested on glibc rv32/64gcv qemu.
Applied patch to/comparing with 668c4c3783970e7adf0591396b6d0d5286cc0541.

V2 results look much better! I don't see any new fortran failures but I 
am seeing new gcc failures:

rv64gcv:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 
-ftree-vectorize --param riscv-autovec-lmul=dynamic scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 
-ftree-vectorize --param riscv-autovec-lmul=dynamic scan-assembler e32,m8
FAIL: 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c 
execution test

rv32gcv:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 
-ftree-vectorize --param riscv-autovec-lmul=dynamic scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 
-ftree-vectorize --param riscv-autovec-lmul=dynamic scan-assembler e32,m8
FAIL: 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

The popcount-run-1.c test doesn't show up for me on 
668c4c3783970e7adf0591396b6d0d5286cc0541 rv32gcv or rv64gcv.
After applying your patch it only shows up on rv32gcv (rv64gcv still 
does not have the failure). This might be due to a difference in our 
testing setups.

Thanks,
Patrick

On 10/25/23 05:20, juzhe.zhong@rivai.ai wrote:
> Hi, Patrick.
>
> I have fixed on V2 patch: 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634267.html
>
> I have tested on RV32/RV64 C/C++, no regression. But I am not able to 
> test on Fortran.
>
> The failures you showed have been fixed. Except this one:
> FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
> This FAIL is not because of this patch since I confirmed it already 
> existed without this patch.
> We will fix that on stage 3.
>
> Could you verify with Fortran test ?
>
> Thanks.
>
> ------------------------------------------------------------------------
> juzhe.zhong@rivai.ai
>
>     *From:* Patrick O'Neill <mailto:patrick@rivosinc.com>
>     *Date:* 2023-10-24 23:03
>     *To:* juzhe.zhong@rivai.ai; gcc-patches
>     <mailto:gcc-patches@gcc.gnu.org>
>     *CC:* kito.cheng <mailto:kito.cheng@gmail.com>; Kito.cheng
>     <mailto:kito.cheng@sifive.com>; jeffreyalaw
>     <mailto:jeffreyalaw@gmail.com>; Robin Dapp
>     <mailto:rdapp.gcc@gmail.com>
>     *Subject:* Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV
>     auto-vectorization
>
>     I'm seeing a variety of new failures, constrained to rv32gcv:
>
>     Tested using newlib/linux:
>     rv32gcv/ ilp32d/ medlow
>     rv64gcv/  lp64d/ medlow
>     rv64gcv_zvbb_zvbc_zvkg_zvkn_zvknc_zvkned_zvkng_zvknha_zvknhb_zvks_zvksc_zvksed_zvksg_zvksh_zvkt/
>     lp64d/ medlow
>     rv64imafdcv_zicond_zawrs_zbc_zvkng_zvksg_zvbb_zvbc_zicsr_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt/
>     lp64d/ medlow
>
>     Newlib failures:
>     rv32gcv:
>     FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
>     FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
>     FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-reduc-10.c execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
>     FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c
>     execution test
>     FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c
>     execution test
>     FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
>     FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution
>     test
>     FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
>     FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
>     FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
>     FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
>
>     Debug log for testcases that aren't pr110557.c look like this:
>
>     Executing on host:
>     /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc
>     -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output -ftree-vectorize -O3 --param
>     riscv-autovec-lmul=m4 -lm -o ./popcount-run-1.exe (timeout = 600)
>     spawn -ignore SIGHUP
>     /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc
>     -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output -ftree-vectorize -O3 --param
>     riscv-autovec-lmul=m4 -lm -o ./popcount-run-1.exe PASS:
>     gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c (test for
>     excess errors) spawn riscv64-unknown-elf-run ./popcount-run-1.exe
>     FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution
>     test
>
>     Debug log for pr110557.c:
>
>     Executing on host:
>     /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++
>     -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../ 
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc 
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow  
>     -fdiagnostics-plain-output  -nostdinc++
>     -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf
>     -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util
>     -fmessage-length=0  -std=c++14 -O2 -ftree-vectorize
>     -fno-vect-cost-model --param riscv-autovec-preference=scalable
>     --param riscv-vector-abi -fdump-tree-vect-details       
>     -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs 
>     -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs 
>     -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs 
>     -lm  -o ./pr110557.exe    (timeout = 600) spawn -ignore SIGHUP
>     /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++
>     -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output -nostdinc++
>     -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf
>     -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util
>     -fmessage-length=0 -std=c++14 -O2 -ftree-vectorize
>     -fno-vect-cost-model --param riscv-autovec-preference=scalable
>     --param riscv-vector-abi -fdump-tree-vect-details
>     -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs
>     -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs
>     -lm -o ./pr110557.exe PASS: g++.dg/vect/pr110557.cc  -std=c++14
>     (test for excess errors) spawn riscv64-unknown-elf-run
>     ./pr110557.exe
>     /scratch/tc-testing/tc-oct-23-avl/build-newlib/../scripts/wrapper/qemu/riscv64-unknown-elf-run:
>     line 15: 3449805 Trace/breakpoint trap   (core dumped)
>     QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen
>     -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@" FAIL:
>     g++.dg/vect/pr110557.cc  -std=c++14 execution test
>
>     Linux failures:
>     rv32gcv:
>     FAIL: gcc.dg/nextafter-2.c execution test
>     FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
>     FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
>     FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-reduc-10.c execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects
>     execution test
>     FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
>     FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c
>     execution test
>     FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c
>     execution test
>     FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
>     FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
>     FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution
>     test
>     FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution
>     test
>     FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
>     FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
>     FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
>     FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
>     FAIL: gfortran.dg/default_format_2.f90   -O0  execution test
>     FAIL: gfortran.dg/default_format_2.f90   -O1  execution test
>     FAIL: gfortran.dg/default_format_2.f90   -O2  execution test
>     FAIL: gfortran.dg/default_format_2.f90   -O3 -fomit-frame-pointer
>     -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
>     test
>     FAIL: gfortran.dg/default_format_2.f90   -O3 -g  execution test
>     FAIL: gfortran.dg/default_format_2.f90   -Os  execution test
>     FAIL: gfortran.dg/default_format_denormal_2.f90   -O0 execution test
>     FAIL: gfortran.dg/default_format_denormal_2.f90   -O1 execution test
>     FAIL: gfortran.dg/default_format_denormal_2.f90   -O2 execution test
>     FAIL: gfortran.dg/default_format_denormal_2.f90   -O3
>     -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
>     -finline-functions  execution test
>     FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g execution
>     test
>     FAIL: gfortran.dg/default_format_denormal_2.f90   -Os execution test
>     FAIL: gfortran.dg/large_real_kind_2.F90   -O0  execution test
>     FAIL: gfortran.dg/round_4.f90   -O0  execution test
>     FAIL: gfortran.dg/zero_sized_3.f90   -O3 -fomit-frame-pointer
>     -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
>     test
>     FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
>     FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
>     FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
>     FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer
>     -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
>     test
>     FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
>     FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test
>     FAIL: gfortran.dg/ieee/large_1.f90   -O0  execution test
>     FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
>     FAIL: gfortran.dg/ieee/large_2.f90   -O1  execution test
>     FAIL: gfortran.dg/ieee/large_2.f90   -O2  execution test
>     FAIL: gfortran.dg/ieee/large_2.f90   -O3 -fomit-frame-pointer
>     -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
>     test
>     FAIL: gfortran.dg/ieee/large_2.f90   -O3 -g  execution test
>     FAIL: gfortran.dg/ieee/large_2.f90   -Os  execution test
>     FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90
>     execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
>     FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90
>     execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
>     FAIL: gfortran.fortran-torture/execute/intrinsic_sum.f90
>     execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
>
>     Some (not all) debug log outputs:
>
>     Executing on host:
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90
>     -march=rv64gcv -mabi=lp64d -mcmodel=medlow
>     -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2
>     -fomit-frame-pointer -finline-functions
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs
>     -lm -o
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
>     (timeout = 600) spawn -ignore SIGHUP
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90
>     -march=rv64gcv -mabi=lp64d -mcmodel=medlow
>     -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2
>     -fomit-frame-pointer -finline-functions
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs
>     -lm -o
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
>     PASS: gfortran.fortran-torture/execute/intrinsic_count.f90
>     compilation, -O2 -fomit-frame-pointer -finline-functions spawn
>     riscv64-unknown-linux-gnu-run
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
>     STOP 2 FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90
>     execution, -O2 -fomit-frame-pointer -finline-functions Executing
>     on host:
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2
>     -fomit-frame-pointer -finline-functions -funroll-loops
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs
>     -lm -o
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
>     (timeout = 600) spawn -ignore SIGHUP
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2
>     -fomit-frame-pointer -finline-functions -funroll-loops
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs
>     -lm -o
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
>     PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90
>     compilation, -O2 -fomit-frame-pointer -finline-functions
>     -funroll-loops spawn riscv64-unknown-linux-gnu-run
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
>     STOP 3 FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90
>     execution, -O2 -fomit-frame-pointer -finline-functions -funroll-loops
>     Executing on host:
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output -fdiagnostics-plain-output -O0
>     -pedantic-errors -fintrinsic-modules-path
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/
>     -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs
>     -lm -o ./large_2.exe (timeout = 600) spawn -ignore SIGHUP
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output -fdiagnostics-plain-output -O0
>     -pedantic-errors -fintrinsic-modules-path
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/
>     -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs
>     -lm -o ./large_2.exe PASS: gfortran.dg/ieee/large_2.f90 -O0 (test
>     for excess errors) spawn riscv64-unknown-linux-gnu-run
>     ./large_2.exe 0.333333333333333333333333333333333317
>     2.24271998593667819112500193394291495E+1644 STOP 1 FAIL:
>     gfortran.dg/ieee/large_2.f90 -O0 execution test Executing on host:
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output -nostdinc++
>     -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu
>     -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util
>     -fmessage-length=0 -std=c++98 -O2 -ftree-vectorize
>     -fno-vect-cost-model --param riscv-autovec-preference=scalable
>     --param riscv-vector-abi -fdump-tree-vect-details
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs
>     -lm -o ./pr110557.exe (timeout = 600) spawn -ignore SIGHUP
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output -nostdinc++
>     -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu
>     -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward
>     -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util
>     -fmessage-length=0 -std=c++98 -O2 -ftree-vectorize
>     -fno-vect-cost-model --param riscv-autovec-preference=scalable
>     --param riscv-vector-abi -fdump-tree-vect-details
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/
>     -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs
>     -lm -o ./pr110557.exe PASS: g++.dg/vect/pr110557.cc -std=c++98
>     (test for excess errors) spawn riscv64-unknown-linux-gnu-run
>     ./pr110557.exe
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run:
>     line 15: 323485 Trace/breakpoint trap (core dumped)
>     QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen
>     -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@" FAIL:
>     g++.dg/vect/pr110557.cc -std=c++98 execution test Executing on
>     host:
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output --param
>     riscv-autovec-preference=scalable --param riscv-vector-abi
>     -ftree-vectorize -fno-tree-loop-distribute-patterns
>     -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm
>     -o ./vect-reduc-dot-21.exe (timeout = 600) spawn -ignore SIGHUP
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output --param
>     riscv-autovec-preference=scalable --param riscv-vector-abi
>     -ftree-vectorize -fno-tree-loop-distribute-patterns
>     -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm
>     -o ./vect-reduc-dot-21.exe PASS: gcc.dg/vect/vect-reduc-dot-21.c
>     (test for excess errors) spawn riscv64-unknown-linux-gnu-run
>     ./vect-reduc-dot-21.exe
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run:
>     line 15: 3484803 Aborted (core dumped)
>     QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen
>     -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@" FAIL:
>     gcc.dg/vect/vect-reduc-dot-21.c execution test Executing on host:
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output --param
>     riscv-autovec-preference=scalable --param riscv-vector-abi
>     -ftree-vectorize -fno-tree-loop-distribute-patterns
>     -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm
>     -o ./vect-alias-check-16.exe (timeout = 600) spawn -ignore SIGHUP
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc
>     -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/
>     /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
>     -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
>     -fdiagnostics-plain-output --param
>     riscv-autovec-preference=scalable --param riscv-vector-abi
>     -ftree-vectorize -fno-tree-loop-distribute-patterns
>     -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm
>     -o ./vect-alias-check-16.exe PASS:
>     gcc.dg/vect/vect-alias-check-16.c (test for excess errors) spawn
>     riscv64-unknown-linux-gnu-run ./vect-alias-check-16.exe
>     /scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run:
>     line 15: 3431975 Aborted (core dumped)
>     QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen
>     -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@" FAIL:
>     gcc.dg/vect/vect-alias-check-16.c execution test PASS:
>     gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "flags:
>     *RAW\\n" PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump
>     vect "using an address-based overlap test" PASS:
>     gcc.dg/vect/vect-alias-check-16.c scan-tree-dump-not vect "using
>     an index-based"
>
>     I've observed nextafter-2.c being flaky on the CI so that
>     particular failure might not be real.
>
>     If you want any particular testcase's debug logs please let me know.
>
>     Patrick
>
>     On 10/23/23 21:30, Patrick O'Neill wrote:
>>
>>     The CI just picked it up:
>>     https://github.com/ewlu/gcc-precommit-ci/issues/449#issue-1958483272
>>     Since it doesn't apply to the CI's baseline hash it's only
>>     performing a build.
>>     I'll re-run it in the morning once the baseline has been updated.
>>
>>     In the meantime I started a full build+test run on my local machine.
>>     I'll send you the results in ~10 hours - morning my time :-)
>>
>>     Patrick
>>
>>     On 10/23/23 20:44, juzhe.zhong@rivai.ai wrote:
>>>     CCing Patrick...
>>>
>>>     Hi, @Patrick.
>>>     Could you apply this patch and trigger your regression CI?
>>>
>>>     I don't have an environment to test fortran for now (I only test
>>>     it on C/C++).
>>>
>>>     Thanks.
>>>
>>>     ------------------------------------------------------------------------
>>>     juzhe.zhong@rivai.ai
>>>
>>>         *From:* Juzhe-Zhong <mailto:juzhe.zhong@rivai.ai>
>>>         *Date:* 2023-10-24 11:32
>>>         *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
>>>         *CC:* kito.cheng <mailto:kito.cheng@gmail.com>; kito.cheng
>>>         <mailto:kito.cheng@sifive.com>; jeffreyalaw
>>>         <mailto:jeffreyalaw@gmail.com>; rdapp.gcc
>>>         <mailto:rdapp.gcc@gmail.com>; Juzhe-Zhong
>>>         <mailto:juzhe.zhong@rivai.ai>
>>>         *Subject:* [PATCH] RISC-V: Add AVL propagation PASS for RVV
>>>         auto-vectorization
>>>         This patch addresses the redundant AVL/VL toggling in RVV
>>>         partial auto-vectorization
>>>         which is a known issue for a long time and I finally find
>>>         the time to address it.
>>>         Consider a simple vector addition operation:
>>>         https://godbolt.org/z/7hfGfEjW3
>>>         void
>>>         foo (int *__restrict a,
>>>              int *__restrict b,
>>>              int *__restrict n)
>>>         {
>>>           for (int i = 0; i < n; i++)
>>>               a[i] = a[i] + b[i];
>>>         }
>>>         Optimized IR:
>>>         Loop body:
>>>           _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4,
>>>         4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
>>>           ...
>>>           vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1,
>>>         ... }, _38, 0);    -> vle32.v v2,0(a0)
>>>           vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1,
>>>         ... }, _38, 0);   -> vle32.v v1,0(a1)
>>>           vect__7.12_19 = vect__6.11_20 +
>>>         vect__4.8_27;                              -> vsetvli
>>>         a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
>>>           .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0,
>>>         vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v
>>>         v1,0(a4)
>>>         We can see 2 redundant vsetvls inside the loop body due to
>>>         AVL/VL toggling.
>>>         The AVL/VL toggling is because we are missing LEN
>>>         information in simple PLUS_EXPR GIMPLE assignment:
>>>         vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
>>>         GCC apply partial predicate load/store and un-predicated
>>>         full vector operation on partial vectorization.
>>>         Such flow are used by all other targets like ARM SVE (RVV
>>>         also uses such flow):
>>>         ARM SVE:
>>>         .L3:
>>>                 ld1w    z30.s, p7/z, [x0, x3, lsl 2] -> predicated load
>>>                 ld1w    z31.s, p7/z, [x1, x3, lsl 2] -> predicated load
>>>                 add     z31.s, z31.s, z30.s -> un-predicated add
>>>                 st1w    z31.s, p7, [x0, x3, lsl 2] -> predicated store
>>>         Such vectorization flow causes AVL/VL toggling on RVV so we
>>>         need AVL propagation PASS for it.
>>>         Also, It's very unlikely that we can apply predicated
>>>         operations on all vectorization for following reasons:
>>>         1. It's very heavy workload to support them on all
>>>         vectorization and we don't see any benefits if we can handle
>>>         that on targets backend.
>>>         2. Changing Loop vectorizer for it will make code base ugly
>>>         and hard to maintain.
>>>         3. We will need so many patterns for all operations. Not
>>>         only COND_LEN_ADD, COND_LEN_SUB, ....
>>>            We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... ..
>>>         over 100+ patterns, unreasonable number of patterns.
>>>         To conclude, we prefer un-predicated operations here, and
>>>         design a nice and clean AVL propagation PASS for it to elide
>>>         the redundant vsetvls
>>>         due to AVL/VL toggling.
>>>         The second question is that why we separate a PASS called
>>>         AVL propagation. Why not optimize it in VSETVL PASS (We
>>>         definitetly can optimize AVL in VSETVL PASS)
>>>         Frankly, I was planning to address such issue in VSETVL PASS
>>>         that's why we recently refactored VSETVL PASS. However, I
>>>         changed my mind recently after several
>>>         experiments and tries.
>>>         The reasons as follows:
>>>         1. For code base management and maintainience. Current
>>>         VSETVL PASS is complicated enough and aleady has enough
>>>         aggressive and fancy optimizations which
>>>            turns out it can always generate optimal codegen in most
>>>         of the cases. It's not a good idea keep adding more features
>>>         into VSETVL PASS to make VSETVL
>>>         PASS become heavy and heavy again, then we will need to
>>>         refactor it again in the future.
>>>         Actuall, the VSETVL PASS is very stable and optimal after
>>>         the recent refactoring. Hopefully, we should not change
>>>         VSETVL PASS any more except the minor
>>>         fixes.
>>>         2. vsetvl insertion (VSETVL PASS does this thing) and AVL
>>>         propagation are 2 different things,  I don't think we should
>>>         fuse them into same PASS.
>>>         3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion
>>>         should be done before RA which can reduce register allocation.
>>>         4. This patch's AVL propagation PASS only does AVL
>>>         propagation for RVV partial auto-vectorization situations.
>>>            This patch's codes are only hundreds lines which is very
>>>         managable and can be very easily extended features and
>>>         enhancements.
>>>         We can easily extend and enhance more AVL propagation in a
>>>         clean and separate PASS in the future. (If we do it on
>>>         VSETVL PASS, we will complicate
>>>         VSETVL PASS again which is already so complicated.)
>>>         Here is an example to demonstrate more:
>>>         https://godbolt.org/z/bE86sv3q5
>>>         void foo2 (int *__restrict a,
>>>                   int *__restrict b,
>>>                   int *__restrict c,
>>>                   int *__restrict a2,
>>>                   int *__restrict b2,
>>>                   int *__restrict c2,
>>>                   int *__restrict a3,
>>>                   int *__restrict b3,
>>>                   int *__restrict c3,
>>>                   int *__restrict a4,
>>>                   int *__restrict b4,
>>>                   int *__restrict c4,
>>>                   int *__restrict a5,
>>>                   int *__restrict b5,
>>>                   int *__restrict c5,
>>>                   int n)
>>>         {
>>>             for (int i = 0; i < n; i++){
>>>               a[i] = b[i] + c[i];
>>>               b5[i] = b[i] + c[i];
>>>               a2[i] = b2[i] + c2[i];
>>>               a3[i] = b3[i] + c3[i];
>>>               a4[i] = b4[i] + c4[i];
>>>               a5[i] = a[i] + a4[i];
>>>               a[i] = a5[i] + b5[i]+ a[i];
>>>               a[i] = a[i] + c[i];
>>>               b5[i] = a[i] + c[i];
>>>               a2[i] = a[i] + c2[i];
>>>               a3[i] = a[i] + c3[i];
>>>               a4[i] = a[i] + c4[i];
>>>               a5[i] = a[i] + a4[i];
>>>               a[i] = a[i] + b5[i]+ a[i];
>>>             }
>>>         }
>>>         1. Loop Body:
>>>         Before this patch: After this patch:
>>>               vsetvli a4,t1,e8,mf4,ta,ma vsetvli a4,t1,e32,m1,ta,ma
>>>                 vle32.v v2,0(a2) vle32.v v2,0(a2)
>>>                 vle32.v v4,0(a1) vle32.v v3,0(t2)
>>>                 vle32.v v1,0(t2) vle32.v v4,0(a1)
>>>                 vsetvli a7,zero,e32,m1,ta,ma vle32.v v1,0(t0)
>>>                 vadd.vv v4,v2,v4 vadd.vv v4,v2,v4
>>>                 vsetvli zero,a4,e32,m1,ta,ma vadd.vv v1,v3,v1
>>>                 vle32.v v3,0(s0) vadd.vv v1,v1,v4
>>>                 vsetvli a7,zero,e32,m1,ta,ma vadd.vv v1,v1,v4
>>>                 vadd.vv v1,v3,v1 vadd.vv v1,v1,v4
>>>                 vadd.vv v1,v1,v4 vadd.vv v1,v1,v2
>>>                 vadd.vv v1,v1,v4 vadd.vv v2,v1,v2
>>>                 vadd.vv v1,v1,v4 vse32.v v2,0(t5)
>>>                 vsetvli zero,a4,e32,m1,ta,ma vadd.vv v2,v2,v1
>>>                 vle32.v v4,0(a5) vadd.vv v2,v2,v1
>>>                 vsetvli a7,zero,e32,m1,ta,ma                        
>>>         slli a7,a4,2
>>>                 vadd.vv v1,v1,v2 vadd.vv v3,v1,v3
>>>                 vadd.vv v2,v1,v2 vle32.v v5,0(a5)
>>>                 vadd.vv v4,v1,v4 vle32.v v6,0(t6)
>>>                 vsetvli zero,a4,e32,m1,ta,ma vse32.v v3,0(t3)
>>>                 vse32.v v2,0(t5) vse32.v v2,0(a0)
>>>                 vse32.v v4,0(a3) vadd.vv v3,v3,v1
>>>                 vsetvli a7,zero,e32,m1,ta,ma vadd.vv v2,v1,v5
>>>                 vadd.vv v3,v1,v3 vse32.v v3,0(t4)
>>>                 vadd.vv v2,v2,v1 vadd.vv v1,v1,v6
>>>                 vadd.vv v2,v2,v1 vse32.v v2,0(a3)
>>>                 vsetvli zero,a4,e32,m1,ta,ma vse32.v v1,0(a6)
>>>                 vse32.v v2,0(a0)
>>>                 vse32.v v3,0(t3)
>>>                 vle32.v v2,0(t0)
>>>                 vsetvli a7,zero,e32,m1,ta,ma
>>>                 vadd.vv v3,v3,v1
>>>                 vsetvli zero,a4,e32,m1,ta,ma
>>>                 vse32.v v3,0(t4)
>>>                 vsetvli a7,zero,e32,m1,ta,ma
>>>                 slli a7,a4,2
>>>                 vadd.vv v1,v1,v2
>>>                 sub t1,t1,a4
>>>                 vsetvli zero,a4,e32,m1,ta,ma
>>>                 vse32.v v1,0(a6)
>>>         It's quite obvious, all heavy && redundant vsetvls inside
>>>         loop body are eliminated.
>>>         2. Epilogue:
>>>             Before this patch: After this patch:
>>>         .L5: .L5:
>>>                 ld s0,8(sp) ret
>>>                 addi sp,sp,16
>>>                 jr ra
>>>         This is the benefit we do the AVL propation before RA since
>>>         we eliminate the use of 'a7' register
>>>         which is used by the redudant AVL/VL toggling instruction:
>>>         'vsetvli a7,zero,e32,m1,ta,ma'
>>>         The final codegen after this patch:
>>>         foo2:
>>>         lw t1,56(sp)
>>>         ld t6,0(sp)
>>>         ld t3,8(sp)
>>>         ld t0,16(sp)
>>>         ld t2,24(sp)
>>>         ld t4,32(sp)
>>>         ld t5,40(sp)
>>>         ble t1,zero,.L5
>>>         .L3:
>>>         vsetvli a4,t1,e32,m1,ta,ma
>>>         vle32.v v2,0(a2)
>>>         vle32.v v3,0(t2)
>>>         vle32.v v4,0(a1)
>>>         vle32.v v1,0(t0)
>>>         vadd.vv v4,v2,v4
>>>         vadd.vv v1,v3,v1
>>>         vadd.vv v1,v1,v4
>>>         vadd.vv v1,v1,v4
>>>         vadd.vv v1,v1,v4
>>>         vadd.vv v1,v1,v2
>>>         vadd.vv v2,v1,v2
>>>         vse32.v v2,0(t5)
>>>         vadd.vv v2,v2,v1
>>>         vadd.vv v2,v2,v1
>>>         slli a7,a4,2
>>>         vadd.vv v3,v1,v3
>>>         vle32.v v5,0(a5)
>>>         vle32.v v6,0(t6)
>>>         vse32.v v3,0(t3)
>>>         vse32.v v2,0(a0)
>>>         vadd.vv v3,v3,v1
>>>         vadd.vv v2,v1,v5
>>>         vse32.v v3,0(t4)
>>>         vadd.vv v1,v1,v6
>>>         vse32.v v2,0(a3)
>>>         vse32.v v1,0(a6)
>>>         sub t1,t1,a4
>>>         add a1,a1,a7
>>>         add a2,a2,a7
>>>         add a5,a5,a7
>>>         add t6,t6,a7
>>>         add t0,t0,a7
>>>         add t2,t2,a7
>>>         add t5,t5,a7
>>>         add a3,a3,a7
>>>         add a6,a6,a7
>>>         add t3,t3,a7
>>>         add t4,t4,a7
>>>         add a0,a0,a7
>>>         bne t1,zero,.L3
>>>         .L5:
>>>         ret
>>>         PR target/111888
>>>         gcc/ChangeLog:
>>>         * config.gcc: Add AVL propgatation PASS.
>>>         * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
>>>         * config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
>>>         (has_vtype_op): Export as global.
>>>         (has_vl_op): Ditto.
>>>         (tail_agnostic_p): Ditto.
>>>         (validate_change_or_fail): Ditto.
>>>         (vlmax_avl_type_p): Ditto.
>>>         (vlmax_avl_p): Ditto.
>>>         (get_sew): Ditto.
>>>         (enum vlmul_type): Ditto.
>>>         (const_vlmax_p): Ditto.
>>>         * config/riscv/riscv-v.cc (has_vtype_op): Ditto.
>>>         (has_vl_op): Ditto.
>>>         (get_default_ta): Ditto.
>>>         (tail_agnostic_p): Ditto.
>>>         (validate_change_or_fail): Ditto.
>>>         (vlmax_avl_type_p): Ditto.
>>>         (vlmax_avl_p): Ditto.
>>>         (get_sew): Ditto.
>>>         (enum vlmul_type): Ditto.
>>>         (get_vlmul): Ditto.
>>>         * config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
>>>         (has_vtype_op): Ditto.
>>>         (has_vl_op): Ditto.
>>>         (get_sew): Ditto.
>>>         (get_vlmul): Ditto.
>>>         (get_default_ta): Ditto.
>>>         (tail_agnostic_p): Ditto.
>>>         (validate_change_or_fail): Ditto.
>>>         * config/riscv/t-riscv: Add AVL propagation PASS.
>>>         * config/riscv/vector.md: Fix VLS modes attribute.
>>>         * config/riscv/riscv-avlprop.cc: New file.
>>>         gcc/testsuite/ChangeLog:
>>>         * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
>>>         * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
>>>         * gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
>>>         * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
>>>         * gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
>>>         * gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
>>>         * gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.
>>>         ---
>>>         gcc/config.gcc |   2 +-
>>>         gcc/config/riscv/riscv-avlprop.cc | 350 ++++++++++++++++++
>>>         gcc/config/riscv/riscv-passes.def |   1 +
>>>         gcc/config/riscv/riscv-protos.h |  10 +
>>>         gcc/config/riscv/riscv-v.cc |  84 ++++-
>>>         gcc/config/riscv/riscv-vsetvl.cc |  82 +---
>>>         gcc/config/riscv/t-riscv |   6 +
>>>         gcc/config/riscv/vector.md |   2 +-
>>>         .../costmodel/riscv/rvv/dynamic-lmul4-5.c |   2 +-
>>>         .../costmodel/riscv/rvv/dynamic-lmul8-2.c |   2 +-
>>>         .../riscv/rvv/autovec/partial/select_vl-2.c |   5 +-
>>>         .../riscv/rvv/autovec/ternop/ternop_nofm-2.c |   1 -
>>>         .../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
>>>         .../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
>>>         gcc/testsuite/gcc.target/riscv/rvv/rvv.exp |   2 +
>>>         15 files changed, 514 insertions(+), 84 deletions(-)
>>>         create mode 100644 gcc/config/riscv/riscv-avlprop.cc
>>>         create mode 100644
>>>         gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>>         create mode 100644
>>>         gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>>         diff --git a/gcc/config.gcc b/gcc/config.gcc
>>>         index 606d3a8513e..efd53965c9a 100644
>>>         --- a/gcc/config.gcc
>>>         +++ b/gcc/config.gcc
>>>         @@ -544,7 +544,7 @@ pru-*-*)
>>>         riscv*)
>>>         cpu_type=riscv
>>>         extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o
>>>         riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
>>>         - extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o
>>>         riscv-vector-costs.o"
>>>         + extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o
>>>         riscv-vector-costs.o riscv-avlprop.o"
>>>         extra_objs="${extra_objs} riscv-vector-builtins.o
>>>         riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>>>         extra_objs="${extra_objs} thead.o"
>>>         d_target_objs="riscv-d.o"
>>>         diff --git a/gcc/config/riscv/riscv-avlprop.cc
>>>         b/gcc/config/riscv/riscv-avlprop.cc
>>>         new file mode 100644
>>>         index 00000000000..bf3becd8371
>>>         --- /dev/null
>>>         +++ b/gcc/config/riscv/riscv-avlprop.cc
>>>         @@ -0,0 +1,350 @@
>>>         +/* AVL propagation pass for RISC-V 'V' Extension for GNU
>>>         compiler.
>>>         +   Copyright (C) 2023-2023 Free Software Foundation, Inc.
>>>         +   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI
>>>         Technologies Ltd.
>>>         +
>>>         +This file is part of GCC.
>>>         +
>>>         +GCC is free software; you can redistribute it and/or modify
>>>         +it under the terms of the GNU General Public License as
>>>         published by
>>>         +the Free Software Foundation; either version 3, or(at your
>>>         option)
>>>         +any later version.
>>>         +
>>>         +GCC is distributed in the hope that it will be useful,
>>>         +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>         +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>         +GNU General Public License for more details.
>>>         +
>>>         +You should have received a copy of the GNU General Public
>>>         License
>>>         +along with GCC; see the file COPYING3.  If not see
>>>         +<http://www.gnu.org/licenses/>. */
>>>         +
>>>         +/* Pre-RA RTL_SSA-based pass propagates AVL for RVV
>>>         instructions.
>>>         +   A standalone AVL propagation pass is designed because:
>>>         +
>>>         +     - Better code maintain:
>>>         +       Current LCM-based VSETVL pass is so complicated that
>>>         codes
>>>         +       there will become even harder to maintain. A
>>>         straight forward
>>>         +       AVL propagation PASS is much easier to maintain.
>>>         +
>>>         +     - Reduce scalar register pressure:
>>>         +       A type of AVL propagation is we propagate AVL from
>>>         NON-VLMAX
>>>         +       instruction to VLMAX instruction.
>>>         +       Note: VLMAX instruction should be ignore tail
>>>         elements (TA)
>>>         +       and the result should be used by the NON-VLMAX
>>>         instruction.
>>>         +       This optimization is mostly for auto-vectorization
>>>         codes:
>>>         +
>>>         +   vsetvli r136, r137      --- SELECT_VL
>>>         +   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
>>>         +   vadd.vv (use VLMAX)     --- PLUS_EXPR
>>>         +   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
>>>         +
>>>         + NO AVL propation:
>>>         +
>>>         +   vsetvli a5, a4, ta
>>>         +   vle8.v v1
>>>         +   vsetvli t0, zero, ta
>>>         +   vadd.vv v2, v1, v1
>>>         +   vse8.v v2
>>>         +
>>>         + We can propagate the AVL to 'vadd.vv' since its result
>>>         + is consumed by a 'vse8.v' which has AVL = a5 and its
>>>         + tail elements are agnostic.
>>>         +
>>>         +       We DON'T do this optimization on VSETVL pass since
>>>         it is a
>>>         +       post-RA pass that consumed 't0' already wheras a
>>>         standalone
>>>         +       pre-RA AVL propagation pass allows us elide the
>>>         consumption
>>>         +       of the pseudo register of 't0' then we can reduce scalar
>>>         +       register pressure.
>>>         +
>>>         +     - More AVL propagation opportunities:
>>>         +       A pre-RA pass is more flexible for AVL REG def-use
>>>         chain,
>>>         +       thus we will get more potential AVL propagation as
>>>         long as
>>>         +       it doesn't increase the scalar register pressure.
>>>         +*/
>>>         +
>>>         +#define IN_TARGET_CODE 1
>>>         +#define INCLUDE_ALGORITHM
>>>         +#define INCLUDE_FUNCTIONAL
>>>         +
>>>         +#include "config.h"
>>>         +#include "system.h"
>>>         +#include "coretypes.h"
>>>         +#include "tm.h"
>>>         +#include "backend.h"
>>>         +#include "rtl.h"
>>>         +#include "target.h"
>>>         +#include "tree-pass.h"
>>>         +#include "df.h"
>>>         +#include "rtl-ssa.h"
>>>         +#include "cfgcleanup.h"
>>>         +#include "insn-attr.h"
>>>         +
>>>         +using namespace rtl_ssa;
>>>         +using namespace riscv_vector;
>>>         +
>>>         +/* The AVL propagation instructions and corresponding
>>>         preferred AVL.
>>>         +   It will be updated during the analysis.  */
>>>         +static hash_map<insn_info *, rtx> *avlprops;
>>>         +
>>>         +const pass_data pass_data_avlprop = {
>>>         +  RTL_PASS, /* type */
>>>         +  "avlprop", /* name */
>>>         +  OPTGROUP_NONE, /* optinfo_flags */
>>>         +  TV_NONE, /* tv_id */
>>>         +  0, /* properties_required */
>>>         +  0, /* properties_provided */
>>>         +  0, /* properties_destroyed */
>>>         +  0, /* todo_flags_start */
>>>         +  0, /* todo_flags_finish */
>>>         +};
>>>         +
>>>         +class pass_avlprop : public rtl_opt_pass
>>>         +{
>>>         +public:
>>>         +  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass
>>>         (pass_data_avlprop, ctxt) {}
>>>         +
>>>         +  /* opt_pass methods: */
>>>         +  virtual bool gate (function *) final override
>>>         +  {
>>>         +    return TARGET_VECTOR && optimize > 0;
>>>         +  }
>>>         +  virtual unsigned int execute (function *) final override;
>>>         +}; // class pass_avlprop
>>>         +
>>>         +static void
>>>         +avlprop_init (void)
>>>         +{
>>>         +  calculate_dominance_info (CDI_DOMINATORS);
>>>         +  df_analyze ();
>>>         +  crtl->ssa = new function_info (cfun);
>>>         +  avlprops = new hash_map<insn_info *, rtx>;
>>>         +}
>>>         +
>>>         +static void
>>>         +avlprop_done (void)
>>>         +{
>>>         +  free_dominance_info (CDI_DOMINATORS);
>>>         +  if (crtl->ssa->perform_pending_updates ())
>>>         +    cleanup_cfg (0);
>>>         +  delete crtl->ssa;
>>>         +  crtl->ssa = nullptr;
>>>         +  delete avlprops;
>>>         +  avlprops = NULL;
>>>         +}
>>>         +
>>>         +/* Helper function to get AVL operand.  */
>>>         +static rtx
>>>         +get_avl (insn_info *insn, bool avlprop_p)
>>>         +{
>>>         +  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
>>>         +      || get_attr_avl_type (insn->rtl ()) == VLS)
>>>         +    return NULL_RTX;
>>>         +  if (avlprop_p)
>>>         +    {
>>>         +      if (avlprops->get (insn))
>>>         + return (*avlprops->get (insn));
>>>         +      else if (vlmax_avl_type_p (insn->rtl ()))
>>>         + return RVV_VLMAX;
>>>         +    }
>>>         +  extract_insn_cached (insn->rtl ());
>>>         +  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
>>>         +}
>>>         +
>>>         +/* This is a straight forward pattern ALWAYS in paritial
>>>         auto-vectorization:
>>>         +
>>>         +     VL = SELECT_AVL (AVL, ...)
>>>         +     V0 = MASK_LEN_LOAD (..., VL)
>>>         +     V1 = MASK_LEN_LOAD (..., VL)
>>>         +     V2 = V0 + V1 --- Missed LEN information.
>>>         +     MASK_LEN_STORE (..., V2, VL)
>>>         +
>>>         +   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD
>>>         (V0, V1, dummy LEN)
>>>         +   because:
>>>         +
>>>         +     - Few code changes in Loop Vectorizer.
>>>         +     - Reuse the current clean flow of partial
>>>         vectorization, That is, apply
>>>         +       predicate LEN or MASK into LOAD/STORE operations and
>>>         other special
>>>         +       arithmetic operations (e.d. DIV), then do the whole
>>>         vector register
>>>         +       operation if it DON'T affect the correctness.
>>>         +       Such flow is used by all other targets like x86,
>>>         sve, s390, ... etc.
>>>         +     - PLUS_EXPR has better gimple optimizations than
>>>         COND_LEN_ADD.
>>>         +
>>>         +   We propagate AVL from NON-VLMAX to VLMAX for gimple IR
>>>         like PLUS_EXPR which
>>>         +   generates the VLMAX instruction due to missed LEN
>>>         information. The later
>>>         +   VSETVL PASS will elided the redundant vsetvls.
>>>         +*/
>>>         +
>>>         +static rtx
>>>         +get_autovectorize_preferred_avl (insn_info *insn)
>>>         +{
>>>         +  if (!vlmax_avl_p (get_avl (insn, true)) ||
>>>         !tail_agnostic_p (insn->rtl ()))
>>>         +    return NULL_RTX;
>>>         +
>>>         +  rtx use_avl = NULL_RTX;
>>>         +  insn_info *avl_use_insn = nullptr;
>>>         +  unsigned int ratio
>>>         +    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul
>>>         (insn->rtl ()));
>>>         +  for (def_info *def : insn->defs ())
>>>         +    {
>>>         +      auto set = safe_dyn_cast<set_info *> (def);
>>>         +      if (!set || !set->is_reg ())
>>>         + return NULL_RTX;
>>>         +      for (use_info *use : set->all_uses ())
>>>         + {
>>>         +   if (!use->is_in_nondebug_insn ())
>>>         +     return NULL_RTX;
>>>         +   insn_info *use_insn = use->insn ();
>>>         +   /* FIXME: Stop AVL propagation if any USE is not a RVV real
>>>         +      instruction. It should be totally enough for
>>>         vectorized codes since
>>>         +      they always locate at extended blocks.
>>>         +
>>>         +      TODO: We can extend PHI checking for intrinsic codes
>>>         if it
>>>         +      necessary in the future.  */
>>>         +   if (use_insn->is_artificial () || !has_vtype_op
>>>         (use_insn->rtl ()))
>>>         +     return NULL_RTX;
>>>         +   if (!has_vl_op (use_insn->rtl ()))
>>>         +     continue;
>>>         +
>>>         +   rtx new_use_avl = get_avl (use_insn, true);
>>>         +   if (!new_use_avl)
>>>         +     return NULL_RTX;
>>>         +   if (!use_avl)
>>>         +     use_avl = new_use_avl;
>>>         +   if (!rtx_equal_p (use_avl, new_use_avl)
>>>         +       || calculate_ratio (get_sew (use_insn->rtl ()),
>>>         +   get_vlmul (use_insn->rtl ()))
>>>         +    != ratio
>>>         +       || vlmax_avl_p (new_use_avl)
>>>         +       || !tail_agnostic_p (use_insn->rtl ()))
>>>         +     return NULL_RTX;
>>>         +   if (!avl_use_insn)
>>>         +     avl_use_insn = use_insn;
>>>         + }
>>>         +    }
>>>         +
>>>         +  if (use_avl && register_operand (use_avl, Pmode))
>>>         +    {
>>>         +      gcc_assert (avl_use_insn);
>>>         +      // Find a definition at or neighboring INSN.
>>>         +      resource_info resource = full_register (REGNO (use_avl));
>>>         +      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
>>>         +      def_lookup dl2 = crtl->ssa->find_def (resource,
>>>         avl_use_insn);
>>>         +      if (dl1.matching_set () || dl2.matching_set ())
>>>         + return NULL_RTX;
>>>         +      def_info *def1 = dl1.last_def_of_prev_group ();
>>>         +      def_info *def2 = dl2.last_def_of_prev_group ();
>>>         +      if (def1 != def2)
>>>         + return NULL_RTX;
>>>         +      /* FIXME: We only all AVL propation within a block
>>>         which should
>>>         + be totally enough for vectorized codes.
>>>         +
>>>         + TODO: We can enhance it here for intrinsic codes in the future
>>>         + if it is necessary.  */
>>>         +      if (def1->insn ()->bb () != insn->bb ()
>>>         +   || def1->insn ()->compare_with (insn) >= 0)
>>>         + return NULL_RTX;
>>>         +    }
>>>         +  return use_avl;
>>>         +}
>>>         +
>>>         +/* If we have a preferred AVL to propagate, return the AVL.
>>>         +   Otherwise, return NULL_RTX as we don't need have any
>>>         preferred
>>>         +   AVL.  */
>>>         +
>>>         +static rtx
>>>         +get_preferred_avl (insn_info *insn)
>>>         +{
>>>         +  /* TODO: We only do AVL propagation for missed-LEN partial
>>>         +     autovectorization for now.  We could add more more AVL
>>>         +     propagation for intrinsic codes in the future.  */
>>>         +  return get_autovectorize_preferred_avl (insn);
>>>         +}
>>>         +
>>>         +/* Return the AVL TYPE operand index.  */
>>>         +static int
>>>         +get_avl_type_index (insn_info *insn)
>>>         +{
>>>         +  extract_insn_cached (insn->rtl ());
>>>         +  /* Except rounding mode patterns, AVL TYPE operand
>>>         +     is always the last operand.  */
>>>         +  if (find_access (insn->uses (), VXRM_REGNUM)
>>>         +      || find_access (insn->uses (), FRM_REGNUM))
>>>         +    return recog_data.n_operands - 2;
>>>         +  return recog_data.n_operands - 1;
>>>         +}
>>>         +
>>>         +/* Main entry point for this pass.  */
>>>         +unsigned int
>>>         +pass_avlprop::execute (function *)
>>>         +{
>>>         +  avlprop_init ();
>>>         +
>>>         +  /* Go through all the instructions looking for AVL that
>>>         we could propagate. */
>>>         +
>>>         +  insn_info *next;
>>>         +  bool change_p = true;
>>>         +
>>>         +  while (change_p)
>>>         +    {
>>>         +      /* Iterate on each instruction until no more change
>>>         need.  */
>>>         +      change_p = false;
>>>         +      for (insn_info *insn = crtl->ssa->first_insn ();
>>>         insn; insn = next)
>>>         + {
>>>         +   next = insn->next_any_insn ();
>>>         +   /* We only forward AVL to the instruction that has
>>>         AVL/VL operand
>>>         +      and can be optimized in RTL_SSA level. */
>>>         +   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
>>>         +     continue;
>>>         +
>>>         +   rtx new_avl = get_preferred_avl (insn);
>>>         +   if (new_avl)
>>>         +     {
>>>         +       gcc_assert (!vlmax_avl_p (new_avl));
>>>         +       auto &update = avlprops->get_or_insert (insn);
>>>         +       change_p = !rtx_equal_p (update, new_avl);
>>>         +       update = new_avl;
>>>         +     }
>>>         + }
>>>         +    }
>>>         +
>>>         +  if (dump_file)
>>>         +    fprintf (dump_file, "\nNumber of successful AVL
>>>         propagations: %d\n\n",
>>>         +      (int) avlprops->elements ());
>>>         +
>>>         +  for (const auto iter : *avlprops)
>>>         +    {
>>>         +      rtx_insn *rinsn = iter.first->rtl ();
>>>         +      if (dump_file)
>>>         + {
>>>         +   fprintf (dump_file, "\nPropagating AVL: ");
>>>         +   print_rtl_single (dump_file, iter.second);
>>>         +   fprintf (dump_file, "into: ");
>>>         +   print_rtl_single (dump_file, rinsn);
>>>         + }
>>>         +      /* Replace AVL operand.  */
>>>         +      rtx new_pat
>>>         + = simplify_replace_rtx (PATTERN (rinsn), get_avl
>>>         (iter.first, false),
>>>         + iter.second);
>>>         +      validate_change_or_fail (rinsn, &PATTERN (rinsn),
>>>         new_pat, false);
>>>         +
>>>         +      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
>>>         +      if (vlmax_avl_type_p (rinsn))
>>>         + validate_change_or_fail (
>>>         +   rinsn, recog_data.operand_loc[get_avl_type_index
>>>         (iter.first)],
>>>         +   get_avl_type_rtx (avl_type::NONVLMAX), false);
>>>         +      if (dump_file)
>>>         + {
>>>         +   fprintf (dump_file, "Successfully to match this
>>>         instruction: ");
>>>         +   print_rtl_single (dump_file, rinsn);
>>>         + }
>>>         +    }
>>>         +
>>>         +  avlprop_done ();
>>>         +  return 0;
>>>         +}
>>>         +
>>>         +rtl_opt_pass *
>>>         +make_pass_avlprop (gcc::context *ctxt)
>>>         +{
>>>         +  return new pass_avlprop (ctxt);
>>>         +}
>>>         diff --git a/gcc/config/riscv/riscv-passes.def
>>>         b/gcc/config/riscv/riscv-passes.def
>>>         index 4084122cf0a..b6260939d5c 100644
>>>         --- a/gcc/config/riscv/riscv-passes.def
>>>         +++ b/gcc/config/riscv/riscv-passes.def
>>>         @@ -18,4 +18,5 @@
>>>         <http://www.gnu.org/licenses/>. */
>>>         INSERT_PASS_AFTER (pass_rtl_store_motion, 1,
>>>         pass_shorten_memrefs);
>>>         +INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
>>>         INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
>>>         diff --git a/gcc/config/riscv/riscv-protos.h
>>>         b/gcc/config/riscv/riscv-protos.h
>>>         index 6cb9d459ee9..2b09ec9ea9e 100644
>>>         --- a/gcc/config/riscv/riscv-protos.h
>>>         +++ b/gcc/config/riscv/riscv-protos.h
>>>         @@ -156,6 +156,7 @@ extern void riscv_parse_arch_string
>>>         (const char *, struct gcc_options *, locatio
>>>         extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
>>>         rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
>>>         +rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
>>>         rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
>>>         /* Routines implemented in riscv-string.c.  */
>>>         @@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
>>>         bool cmp_lmul_gt_one (machine_mode);
>>>         bool gather_scatter_valid_offset_mode_p (machine_mode);
>>>         bool vls_mode_valid_p (machine_mode);
>>>         +bool has_vtype_op (rtx_insn *);
>>>         +bool has_vl_op (rtx_insn *);
>>>         +bool tail_agnostic_p (rtx_insn *);
>>>         +void validate_change_or_fail (rtx, rtx *, rtx, bool);
>>>         +bool vlmax_avl_type_p (rtx_insn *);
>>>         +bool vlmax_avl_p (rtx);
>>>         +uint8_t get_sew (rtx_insn *);
>>>         +enum vlmul_type get_vlmul (rtx_insn *);
>>>         +bool const_vlmax_p (machine_mode);
>>>         }
>>>         /* We classify builtin types into two classes:
>>>         diff --git a/gcc/config/riscv/riscv-v.cc
>>>         b/gcc/config/riscv/riscv-v.cc
>>>         index e39a9507803..473622ac321 100644
>>>         --- a/gcc/config/riscv/riscv-v.cc
>>>         +++ b/gcc/config/riscv/riscv-v.cc
>>>         @@ -56,7 +56,7 @@ using namespace riscv_vector;
>>>         namespace riscv_vector {
>>>         /* Return true if vlmax is constant value and can be used in
>>>         vsetivl.  */
>>>         -static bool
>>>         +bool
>>>         const_vlmax_p (machine_mode mode)
>>>         {
>>>            poly_uint64 nuints = GET_MODE_NUNITS (mode);
>>>         @@ -298,14 +298,6 @@ public:
>>>               len = force_reg (Pmode, len);
>>>             vls_p = true;
>>>           }
>>>         - else if (const_vlmax_p (vtype_mode))
>>>         -   {
>>>         -     /* Optimize VLS-VLMAX code gen, we can use vsetivli
>>>         instead of
>>>         -        the vsetvli to obtain the value of vlmax.  */
>>>         -     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
>>>         -     len = gen_int_mode (nunits, Pmode);
>>>         -     vls_p = true;
>>>         -   }
>>>         else if (can_create_pseudo_p ())
>>>           {
>>>             len = gen_reg_rtx (Pmode);
>>>         @@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
>>>            emit_move_insn (dst, x4);
>>>         }
>>>         +/* Return true if it is an RVV instruction depends on VTYPE
>>>         global
>>>         +   status register.  */
>>>         +bool
>>>         +has_vtype_op (rtx_insn *rinsn)
>>>         +{
>>>         +  return recog_memoized (rinsn) >= 0 &&
>>>         get_attr_has_vtype_op (rinsn);
>>>         +}
>>>         +
>>>         +/* Return true if it is an RVV instruction depends on VL global
>>>         +   status register.  */
>>>         +bool
>>>         +has_vl_op (rtx_insn *rinsn)
>>>         +{
>>>         +  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op
>>>         (rinsn);
>>>         +}
>>>         +
>>>         +/* Get default tail policy.  */
>>>         +static bool
>>>         +get_default_ta ()
>>>         +{
>>>         +  /* For the instruction that doesn't require TA, we still
>>>         need a default value
>>>         +     to emit vsetvl. We pick up the default value according
>>>         to prefer policy. */
>>>         +  return (bool) (get_prefer_tail_policy () & 0x1
>>>         + || (get_prefer_tail_policy () >> 1 & 0x1));
>>>         +}
>>>         +
>>>         +/* Helper function to get TA operand.  */
>>>         +bool
>>>         +tail_agnostic_p (rtx_insn *rinsn)
>>>         +{
>>>         +  /* If it doesn't have TA, we return agnostic by default.  */
>>>         +  extract_insn_cached (rinsn);
>>>         +  int ta = get_attr_ta (rinsn);
>>>         +  return ta == INVALID_ATTRIBUTE ? get_default_ta () :
>>>         IS_AGNOSTIC (ta);
>>>         +}
>>>         +
>>>         +/* Change insn and Assert the change always happens.  */
>>>         +void
>>>         +validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx,
>>>         bool in_group)
>>>         +{
>>>         +  bool change_p = validate_change (object, loc, new_rtx,
>>>         in_group);
>>>         +  gcc_assert (change_p);
>>>         +}
>>>         +
>>>         +/* Return true if it is VLMAX AVL TYPE.  */
>>>         +bool
>>>         +vlmax_avl_type_p (rtx_insn *rinsn)
>>>         +{
>>>         +  return get_attr_avl_type (rinsn) == VLMAX;
>>>         +}
>>>         +
>>>         +/* Return true if RTX is RVV VLMAX AVL.  */
>>>         +bool
>>>         +vlmax_avl_p (rtx x)
>>>         +{
>>>         +  return x && rtx_equal_p (x, RVV_VLMAX);
>>>         +}
>>>         +
>>>         +/* Helper function to get SEW operand. We always have SEW
>>>         value for
>>>         +   all RVV instructions that have VTYPE OP. */
>>>         +uint8_t
>>>         +get_sew (rtx_insn *rinsn)
>>>         +{
>>>         +  return get_attr_sew (rinsn);
>>>         +}
>>>         +
>>>         +/* Helper function to get VLMUL operand. We always have
>>>         VLMUL value for
>>>         +   all RVV instructions that have VTYPE OP. */
>>>         +enum vlmul_type
>>>         +get_vlmul (rtx_insn *rinsn)
>>>         +{
>>>         +  return (enum vlmul_type) get_attr_vlmul (rinsn);
>>>         +}
>>>         +
>>>         } // namespace riscv_vector
>>>         diff --git a/gcc/config/riscv/riscv-vsetvl.cc
>>>         b/gcc/config/riscv/riscv-vsetvl.cc
>>>         index e9dd669de98..f2f19e423bf 100644
>>>         --- a/gcc/config/riscv/riscv-vsetvl.cc
>>>         +++ b/gcc/config/riscv/riscv-vsetvl.cc
>>>         @@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
>>>            return agnostic_p ? "agnostic" : "undisturbed";
>>>         }
>>>         -static bool
>>>         -vlmax_avl_p (rtx x)
>>>         -{
>>>         -  return x && rtx_equal_p (x, RVV_VLMAX);
>>>         -}
>>>         -
>>>         -/* Return true if it is an RVV instruction depends on VTYPE
>>>         global
>>>         -   status register.  */
>>>         -static bool
>>>         -has_vtype_op (rtx_insn *rinsn)
>>>         -{
>>>         -  return recog_memoized (rinsn) >= 0 &&
>>>         get_attr_has_vtype_op (rinsn);
>>>         -}
>>>         -
>>>         -/* Return true if it is an RVV instruction depends on VL global
>>>         -   status register.  */
>>>         -static bool
>>>         -has_vl_op (rtx_insn *rinsn)
>>>         -{
>>>         -  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op
>>>         (rinsn);
>>>         -}
>>>         -
>>>         /* Return true if the instruction ignores VLMUL field of
>>>         VTYPE.  */
>>>         static bool
>>>         ignore_vlmul_insn_p (rtx_insn *rinsn)
>>>         @@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
>>>            if (!has_vl_op (rinsn))
>>>              return NULL_RTX;
>>>         -  if (get_attr_avl_type (rinsn) == VLMAX)
>>>         -    return RVV_VLMAX;
>>>         -  extract_insn_cached (rinsn);
>>>         -  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
>>>         -}
>>>         -/* Helper function to get SEW operand. We always have SEW
>>>         value for
>>>         -   all RVV instructions that have VTYPE OP. */
>>>         -static uint8_t
>>>         -get_sew (rtx_insn *rinsn)
>>>         -{
>>>         -  return get_attr_sew (rinsn);
>>>         -}
>>>         -
>>>         -/* Helper function to get VLMUL operand. We always have
>>>         VLMUL value for
>>>         -   all RVV instructions that have VTYPE OP. */
>>>         -static enum vlmul_type
>>>         -get_vlmul (rtx_insn *rinsn)
>>>         -{
>>>         -  return (enum vlmul_type) get_attr_vlmul (rinsn);
>>>         -}
>>>         +  extract_insn_cached (rinsn);
>>>         +  if (vlmax_avl_type_p (rinsn))
>>>         +    {
>>>         +      if (BYTES_PER_RISCV_VECTOR.is_constant ())
>>>         + {
>>>         +   for (int i = 0; i < recog_data.n_operands; i++)
>>>         +     if (GET_MODE_CLASS (recog_data.operand_mode[i]) ==
>>>         MODE_VECTOR_BOOL
>>>         + && const_vlmax_p (recog_data.operand_mode[i]))
>>>         +       return gen_int_mode (GET_MODE_NUNITS
>>>         (recog_data.operand_mode[i]),
>>>         +    Pmode);
>>>         + }
>>>         +      return RVV_VLMAX;
>>>         +    }
>>>         -/* Get default tail policy.  */
>>>         -static bool
>>>         -get_default_ta ()
>>>         -{
>>>         -  /* For the instruction that doesn't require TA, we still
>>>         need a default value
>>>         -     to emit vsetvl. We pick up the default value according
>>>         to prefer policy. */
>>>         -  return (bool) (get_prefer_tail_policy () & 0x1
>>>         - || (get_prefer_tail_policy () >> 1 & 0x1));
>>>         +  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
>>>         }
>>>         /* Get default mask policy.  */
>>>         @@ -407,16 +371,6 @@ get_default_ma ()
>>>         || (get_prefer_mask_policy () >> 1 & 0x1));
>>>         }
>>>         -/* Helper function to get TA operand.  */
>>>         -static bool
>>>         -tail_agnostic_p (rtx_insn *rinsn)
>>>         -{
>>>         -  /* If it doesn't have TA, we return agnostic by default.  */
>>>         -  extract_insn_cached (rinsn);
>>>         -  int ta = get_attr_ta (rinsn);
>>>         -  return ta == INVALID_ATTRIBUTE ? get_default_ta () :
>>>         IS_AGNOSTIC (ta);
>>>         -}
>>>         -
>>>         /* Helper function to get MA operand.  */
>>>         static bool
>>>         mask_agnostic_p (rtx_insn *rinsn)
>>>         @@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb,
>>>         rtx_insn *rinsn, int regno)
>>>            return true;
>>>         }
>>>         -/* Change insn and Assert the change always happens.  */
>>>         -static void
>>>         -validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx,
>>>         bool in_group)
>>>         -{
>>>         -  bool change_p = validate_change (object, loc, new_rtx,
>>>         in_group);
>>>         -  gcc_assert (change_p);
>>>         -}
>>>         -
>>>         /* This flags indicates the minimum demand of the vl and
>>>         vtype values by the
>>>             RVV instruction. For example, DEMAND_RATIO_P indicates
>>>         that this RVV
>>>             instruction only needs the SEW/LMUL ratio to remain the
>>>         same, and does not
>>>         diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
>>>         index dd17056fe82..08de62853a6 100644
>>>         --- a/gcc/config/riscv/t-riscv
>>>         +++ b/gcc/config/riscv/t-riscv
>>>         @@ -69,6 +69,12 @@ riscv-vsetvl.o:
>>>         $(srcdir)/config/riscv/riscv-vsetvl.cc \
>>>         $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS)
>>>         $(INCLUDES) \
>>>         $(srcdir)/config/riscv/riscv-vsetvl.cc
>>>         +riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
>>>         +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H)
>>>         $(REGS_H) \
>>>         +  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h
>>>         insn-attr.h
>>>         + $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS)
>>>         $(INCLUDES) \
>>>         + $(srcdir)/config/riscv/riscv-avlprop.cc
>>>         +
>>>         riscv-vector-costs.o:
>>>         $(srcdir)/config/riscv/riscv-vector-costs.cc \
>>>            $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H)
>>>         $(FUNCTION_H) \
>>>            $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h
>>>         cfgloop.h \
>>>         diff --git a/gcc/config/riscv/vector.md
>>>         b/gcc/config/riscv/vector.md
>>>         index ef91950178f..0c59d1b90bc 100644
>>>         --- a/gcc/config/riscv/vector.md
>>>         +++ b/gcc/config/riscv/vector.md
>>>         @@ -809,7 +809,7 @@
>>>         V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
>>>         V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
>>>         V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
>>>         -    (symbol_ref "riscv_vector::NONVLMAX")
>>>         +    (symbol_ref "riscv_vector::VLS")
>>>         (eq_attr "type"
>>>         "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
>>>         vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
>>>         vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
>>>         diff --git
>>>         a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>>         b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>>         index 928a507a363..5278e4aa38f 100644
>>>         ---
>>>         a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>>         +++
>>>         b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
>>>         @@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
>>>              }
>>>         }
>>>         -/* { dg-final { scan-assembler {e32,m4} } } */
>>>         +/* { dg-final { scan-assembler {e16,m2} } } */
>>>         /* { dg-final { scan-assembler-not {csrr} } } */
>>>         /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1
>>>         "vect" } } */
>>>         /* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1
>>>         "vect" } } */
>>>         diff --git
>>>         a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>>         b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>>         index a50265fc1ec..1db2e073846 100644
>>>         ---
>>>         a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>>         +++
>>>         b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
>>>         @@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t
>>>         *__restrict b, int n)
>>>              a[i] = a[i] + b[i];
>>>         }
>>>         -/* { dg-final { scan-assembler {e32,m8} } } */
>>>         +/* { dg-final { scan-assembler {e16,m4} } } */
>>>         /* { dg-final { scan-assembler-not {csrr} } } */
>>>         /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1
>>>         "vect" } } */
>>>         /* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect"
>>>         } } */
>>>         diff --git
>>>         a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>>         b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>>         index eac7cbc757b..ca88d42cdf4 100644
>>>         ---
>>>         a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>>         +++
>>>         b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
>>>         @@ -7,10 +7,11 @@
>>>         /*
>>>         ** foo:
>>>         **
>>>         vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>>>         +** ...
>>>         ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
>>>         ** ...
>>>         -**
>>>         vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>>>         -** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
>>>         +**
>>>         vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
>>>         +** ...
>>>         ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
>>>         ** ...
>>>         */
>>>         diff --git
>>>         a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>>         b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>>         index 965365da4bb..13367423751 100644
>>>         ---
>>>         a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>>         +++
>>>         b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
>>>         @@ -3,7 +3,6 @@
>>>         #include "ternop-2.c"
>>>         -/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
>>>         /* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv}
>>>         9 } } */
>>>         /* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9
>>>         "optimized" } } */
>>>         /* { dg-final { scan-assembler-not {\tvmv} } } */
>>>         diff --git
>>>         a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>>         b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>>         new file mode 100644
>>>         index 00000000000..b0d21650c3d
>>>         --- /dev/null
>>>         +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
>>>         @@ -0,0 +1,16 @@
>>>         +/* { dg-do compile } */
>>>         +/* { dg-options "-march=rv64gcv -mabi=lp64d
>>>         --param=riscv-autovec-preference=fixed-vlmax -O3" } */
>>>         +
>>>         +void
>>>         +foo (int *__restrict a, int *__restrict b, int *__restrict
>>>         c, int n)
>>>         +{
>>>         +  for (int i = 0; i < n; i++)
>>>         +    a[i] = b[i] + c[i];
>>>         +}
>>>         +
>>>         +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
>>>         +/* { dg-final { scan-assembler-not {vsetivli} } } */
>>>         +/* { dg-final { scan-assembler-times
>>>         {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
>>>         +/* { dg-final { scan-assembler-not
>>>         {vsetvli\s*[a-x0-9]+,\s*zero} } } */
>>>         +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
>>>         +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
>>>         diff --git
>>>         a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>>         b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>>         new file mode 100644
>>>         index 00000000000..f2d8aa54b88
>>>         --- /dev/null
>>>         +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>>>         @@ -0,0 +1,33 @@
>>>         +/* { dg-do compile } */
>>>         +/* { dg-options "-march=rv64gcv -mabi=lp64d
>>>         --param=riscv-autovec-preference=fixed-vlmax -O3" } */
>>>         +
>>>         +void
>>>         +foo (int *__restrict a, int *__restrict b, int *__restrict c,
>>>         +     int *__restrict a2, int *__restrict b2, int
>>>         *__restrict c2,
>>>         +     int *__restrict a3, int *__restrict b3, int
>>>         *__restrict c3,
>>>         +     int *__restrict a4, int *__restrict b4, int
>>>         *__restrict c4,
>>>         +     int *__restrict a5, int *__restrict b5, int
>>>         *__restrict c5,
>>>         +     int *__restrict d, int *__restrict d2, int *__restrict d3,
>>>         +     int *__restrict d4, int *__restrict d5, int n, int m)
>>>         +{
>>>         +  for (int i = 0; i < n; i++)
>>>         +    {
>>>         +      a[i] = b[i] + c[i];
>>>         +      a2[i] = b2[i] + c2[i];
>>>         +      a3[i] = b3[i] + c3[i];
>>>         +      a4[i] = b4[i] + c4[i];
>>>         +      a5[i] = a[i] + a4[i];
>>>         +      d[i] = a[i] - a2[i];
>>>         +      d2[i] = a2[i] * a[i];
>>>         +      d3[i] = a3[i] * a2[i];
>>>         +      d4[i] = a2[i] * d2[i];
>>>         +      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
>>>         +    }
>>>         +}
>>>         +
>>>         +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
>>>         +/* { dg-final { scan-assembler-not {vsetivli} } } */
>>>         +/* { dg-final { scan-assembler-times
>>>         {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
>>>         +/* { dg-final { scan-assembler-not
>>>         {vsetvli\s*[a-x0-9]+,\s*zero} } } */
>>>         +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
>>>         +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
>>>         diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>>         b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>>         index 674ba0d72b4..fc830f2cd4d 100644
>>>         --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>>         +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>>         @@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain
>>>         $srcdir/$subdir/vsetvl/*.\[cS\]]] \
>>>         "" $CFLAGS
>>>         dg-runtest [lsort [glob -nocomplain
>>>         $srcdir/$subdir/autovec/*.\[cS\]]] \
>>>         "-O3 -ftree-vectorize" $CFLAGS
>>>         +dg-runtest [lsort [glob -nocomplain
>>>         $srcdir/$subdir/avlprop/*.\[cS\]]] \
>>>         + "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
>>>         dg-runtest [lsort [glob -nocomplain
>>>         $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
>>>         "-O3 -ftree-vectorize --param
>>>         riscv-autovec-preference=scalable" $CFLAGS
>>>         dg-runtest [lsort [glob -nocomplain
>>>         $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
>>>         -- 
>>>         2.36.3
>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-26  0:37         ` Patrick O'Neill
@ 2023-10-26  0:49           ` juzhe.zhong
  2023-10-26  1:22             ` Patrick O'Neill
  0 siblings, 1 reply; 13+ messages in thread
From: juzhe.zhong @ 2023-10-26  0:49 UTC (permalink / raw)
  To: Patrick O'Neill, gcc-patches
  Cc: kito.cheng, Kito.cheng, jeffreyalaw, Robin Dapp

FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8

These 2 FAILs are bogus. Testcases need to be adapted, I notice I didn't include this in this patch.

FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

These 2 already exist on the trunk for RV32.

FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test 
This FAIL for RV64 is odd. I don't have it.  Could you share me the debug log ?

juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-26 08:37
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
Hi Juzhe,

I tested on glibc rv32/64gcv qemu.
Applied patch to/comparing with 668c4c3783970e7adf0591396b6d0d5286cc0541.

V2 results look much better! I don't see any new fortran failures but I am seeing new gcc failures:

rv64gcv:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test

rv32gcv:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

The popcount-run-1.c test doesn't show up for me on 668c4c3783970e7adf0591396b6d0d5286cc0541 rv32gcv or rv64gcv.
After applying your patch it only shows up on rv32gcv (rv64gcv still does not have the failure). This might be due to a difference in our testing setups.

Thanks,
Patrick

On 10/25/23 05:20, juzhe.zhong@rivai.ai wrote:
Hi, Patrick.

I have fixed on V2 patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634267.html

I have tested on RV32/RV64 C/C++, no regression. But I am not able to test on Fortran.

The failures you showed have been fixed. Except this one:
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
This FAIL is not because of this patch since I confirmed it already existed without this patch.
We will fix that on stage 3.

Could you verify with Fortran test ? 

Thanks.

juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-24 23:03
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
I'm seeing a variety of new failures, constrained to rv32gcv:

Tested using newlib/linux:
rv32gcv/ ilp32d/ medlow
rv64gcv/  lp64d/ medlow
rv64gcv_zvbb_zvbc_zvkg_zvkn_zvknc_zvkned_zvkng_zvknha_zvknhb_zvks_zvksc_zvksed_zvksg_zvksh_zvkt/  lp64d/ medlow
rv64imafdcv_zicond_zawrs_zbc_zvkng_zvksg_zvbb_zvbc_zicsr_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt/  lp64d/ medlow

Newlib failures:
rv32gcv:
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test

Debug log for testcases that aren't pr110557.c look like this:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-lmul=m4      -lm  -o ./popcount-run-1.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o ./popcount-run-1.exe
PASS: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c (test for excess errors)
spawn riscv64-unknown-elf-run ./popcount-run-1.exe
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
Debug log for pr110557.c:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs  -lm  -o ./pr110557.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++14 (test for excess errors)
spawn riscv64-unknown-elf-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-newlib/../scripts/wrapper/qemu/riscv64-unknown-elf-run: line 15: 3449805 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
Linux failures:
rv32gcv:
FAIL: gcc.dg/nextafter-2.c execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
FAIL: gfortran.dg/default_format_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_2.f90   -Os  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -Os  execution test
FAIL: gfortran.dg/large_real_kind_2.F90   -O0  execution test
FAIL: gfortran.dg/round_4.f90   -O0  execution test
FAIL: gfortran.dg/zero_sized_3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test
FAIL: gfortran.dg/ieee/large_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O1  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O2  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -Os  execution test
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_sum.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Some (not all) debug log outputs:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions        -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
PASS: gfortran.fortran-torture/execute/intrinsic_count.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
STOP 2
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions -funroll-loops       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -funroll-loops -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
STOP 3
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output    -O0   -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o ./large_2.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o ./large_2.exe
PASS: gfortran.dg/ieee/large_2.f90   -O0  (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./large_2.exe
  0.333333333333333333333333333333333317         2.24271998593667819112500193394291495E+1644
STOP 1
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm  -o ./pr110557.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++98 (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 323485 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-reduc-dot-21.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe
PASS: gcc.dg/vect/vect-reduc-dot-21.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-reduc-dot-21.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3484803 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-alias-check-16.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-alias-check-16.exe
PASS: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-alias-check-16.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3431975 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "flags: *RAW\\n"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "using an address-based overlap test"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump-not vect "using an index-based"
I've observed nextafter-2.c being flaky on the CI so that particular failure might not be real.

If you want any particular testcase's debug logs please let me know.

Patrick

On 10/23/23 21:30, Patrick O'Neill wrote:
The CI just picked it up: https://github.com/ewlu/gcc-precommit-ci/issues/449#issue-1958483272
Since it doesn't apply to the CI's baseline hash it's only performing a build.
I'll re-run it in the morning once the baseline has been updated.

In the meantime I started a full build+test run on my local machine.
I'll send you the results in ~10 hours - morning my time :-)

Patrick
On 10/23/23 20:44, juzhe.zhong@rivai.ai wrote:
CCing Patrick...

Hi, @Patrick.
Could you apply this patch and trigger your regression CI?

I don't have an environment to test fortran for now (I only test it on C/C++).

Thanks. 

juzhe.zhong@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-24 11:32
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization
which is a known issue for a long time and I finally find the time to address it.
 
Consider a simple vector addition operation:
 
https://godbolt.org/z/7hfGfEjW3
 
void
foo (int *__restrict a,
     int *__restrict b,
     int *__restrict n)
{
  for (int i = 0; i < n; i++)
      a[i] = a[i] + b[i];
}
 
Optimized IR:
 
Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);    -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);   -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                              -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
 
We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment:
 
vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
 
GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization.
Such flow are used by all other targets like ARM SVE (RVV also uses such flow):
 
ARM SVE:
  
.L3:
        ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
        add     z31.s, z31.s, z30.s            -> un-predicated add
        st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store
 
Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it.
 
Also, It's very unlikely that we can apply predicated operations on all vectorization for following reasons:
 
1. It's very heavy workload to support them on all vectorization and we don't see any benefits if we can handle that on targets backend.
2. Changing Loop vectorizer for it will make code base ugly and hard to maintain.
3. We will need so many patterns for all operations. Not only COND_LEN_ADD, COND_LEN_SUB, ....
   We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over 100+ patterns, unreasonable number of patterns.
 
To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS for it to elide the redundant vsetvls
due to AVL/VL toggling.
 
The second question is that why we separate a PASS called AVL propagation. Why not optimize it in VSETVL PASS (We definitetly can optimize AVL in VSETVL PASS)
 
Frankly, I was planning to address such issue in VSETVL PASS that's why we recently refactored VSETVL PASS. However, I changed my mind recently after several
experiments and tries.
 
The reasons as follows:
 
1. For code base management and maintainience. Current VSETVL PASS is complicated enough and aleady has enough aggressive and fancy optimizations which
   turns out it can always generate optimal codegen in most of the cases. It's not a good idea keep adding more features into VSETVL PASS to make VSETVL
PASS become heavy and heavy again, then we will need to refactor it again in the future.
Actuall, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change VSETVL PASS any more except the minor
fixes.
 
2. vsetvl insertion (VSETVL PASS does this thing) and AVL propagation are 2 different things,  I don't think we should fuse them into same PASS.
 
3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be done before RA which can reduce register allocation.
 
4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
   This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements.
We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate
VSETVL PASS again which is already so complicated.)
 
Here is an example to demonstrate more:
 
https://godbolt.org/z/bE86sv3q5
 
void foo2 (int *__restrict a,
          int *__restrict b,
          int *__restrict c,
          int *__restrict a2,
          int *__restrict b2,
          int *__restrict c2,
          int *__restrict a3,
          int *__restrict b3,
          int *__restrict c3,
          int *__restrict a4,
          int *__restrict b4,
          int *__restrict c4,
          int *__restrict a5,
          int *__restrict b5,
          int *__restrict c5,
          int n)
{
    for (int i = 0; i < n; i++){
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i]+ a[i];
 
      a[i] = a[i] + c[i];
      b5[i] = a[i] + c[i];
      a2[i] = a[i] + c2[i];
      a3[i] = a[i] + c3[i];
      a4[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i]+ a[i];
    }
}
 
1. Loop Body:
 
Before this patch:                                          After this patch:
 
      vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli a4,t1,e32,m1,ta,ma                                    
        vle32.v v2,0(a2)                                     vle32.v v2,0(a2)
        vle32.v v4,0(a1)                                     vle32.v v3,0(t2)
        vle32.v v1,0(t2)                                     vle32.v v4,0(a1)
        vsetvli a7,zero,e32,m1,ta,ma                         vle32.v v1,0(t0)
        vadd.vv v4,v2,v4                                     vadd.vv v4,v2,v4
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v1,v3,v1
        vle32.v v3,0(s0)                                     vadd.vv v1,v1,v4
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v1,v1,v4
        vadd.vv v1,v3,v1                                     vadd.vv v1,v1,v4
        vadd.vv v1,v1,v4                                     vadd.vv v1,v1,v2
        vadd.vv v1,v1,v4                                     vadd.vv v2,v1,v2
        vadd.vv v1,v1,v4                                     vse32.v v2,0(t5)
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v2,v2,v1
        vle32.v v4,0(a5)                                     vadd.vv v2,v2,v1
        vsetvli a7,zero,e32,m1,ta,ma                         slli a7,a4,2
        vadd.vv v1,v1,v2                                     vadd.vv v3,v1,v3
        vadd.vv v2,v1,v2                                     vle32.v v5,0(a5)
        vadd.vv v4,v1,v4                                     vle32.v v6,0(t6)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v3,0(t3)
        vse32.v v2,0(t5)                                     vse32.v v2,0(a0)
        vse32.v v4,0(a3)                                     vadd.vv v3,v3,v1
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v2,v1,v5
        vadd.vv v3,v1,v3                                     vse32.v v3,0(t4)
        vadd.vv v2,v2,v1                                     vadd.vv v1,v1,v6
        vadd.vv v2,v2,v1                                     vse32.v v2,0(a3)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v1,0(a6)
        vse32.v v2,0(a0)                                     
        vse32.v v3,0(t3)                                     
        vle32.v v2,0(t0)                                     
        vsetvli a7,zero,e32,m1,ta,ma                                     
        vadd.vv v3,v3,v1                                     
        vsetvli zero,a4,e32,m1,ta,ma                                     
        vse32.v v3,0(t4)                                     
        vsetvli a7,zero,e32,m1,ta,ma                                     
        slli    a7,a4,2                                     
        vadd.vv v1,v1,v2                                     
        sub     t1,t1,a4                                     
        vsetvli zero,a4,e32,m1,ta,ma                                     
        vse32.v v1,0(a6)                                     
 
It's quite obvious, all heavy && redundant vsetvls inside loop body are eliminated.
 
2. Epilogue:
    Before this patch:                                          After this patch:
 
     .L5:                                                      .L5:                                          
        ld      s0,8(sp)                                         ret
        addi    sp,sp,16                                        
        jr      ra                                        
 
This is the benefit we do the AVL propation before RA since we eliminate the use of 'a7' register
which is used by the redudant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'
 
The final codegen after this patch:
 
foo2:
lw t1,56(sp)
ld t6,0(sp)
ld t3,8(sp)
ld t0,16(sp)
ld t2,24(sp)
ld t4,32(sp)
ld t5,40(sp)
ble t1,zero,.L5
.L3:
vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2)
vle32.v v3,0(t2)
vle32.v v4,0(a1)
vle32.v v1,0(t0)
vadd.vv v4,v2,v4
vadd.vv v1,v3,v1
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v2
vadd.vv v2,v1,v2
vse32.v v2,0(t5)
vadd.vv v2,v2,v1
vadd.vv v2,v2,v1
slli a7,a4,2
vadd.vv v3,v1,v3
vle32.v v5,0(a5)
vle32.v v6,0(t6)
vse32.v v3,0(t3)
vse32.v v2,0(a0)
vadd.vv v3,v3,v1
vadd.vv v2,v1,v5
vse32.v v3,0(t4)
vadd.vv v1,v1,v6
vse32.v v2,0(a3)
vse32.v v1,0(a6)
sub t1,t1,a4
add a1,a1,a7
add a2,a2,a7
add a5,a5,a7
add t6,t6,a7
add t0,t0,a7
add t2,t2,a7
add t5,t5,a7
add a3,a3,a7
add a6,a6,a7
add t3,t3,a7
add t4,t4,a7
add a0,a0,a7
bne t1,zero,.L3
.L5:
ret
 
PR target/111888
 
gcc/ChangeLog:
 
* config.gcc: Add AVL propgatation PASS.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
(has_vtype_op): Export as global.
(has_vl_op): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(const_vlmax_p): Ditto.
* config/riscv/riscv-v.cc (has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(get_vlmul): Ditto.
* config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
(has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_sew): Ditto.
(get_vlmul): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
* config/riscv/t-riscv: Add AVL propagation PASS.
* config/riscv/vector.md: Fix VLS modes attribute.
* config/riscv/riscv-avlprop.cc: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
* gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
* gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.
 
---
gcc/config.gcc                                |   2 +-
gcc/config/riscv/riscv-avlprop.cc             | 350 ++++++++++++++++++
gcc/config/riscv/riscv-passes.def             |   1 +
gcc/config/riscv/riscv-protos.h               |  10 +
gcc/config/riscv/riscv-v.cc                   |  84 ++++-
gcc/config/riscv/riscv-vsetvl.cc              |  82 +---
gcc/config/riscv/t-riscv                      |   6 +
gcc/config/riscv/vector.md                    |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +-
.../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +-
.../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 -
.../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
.../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
15 files changed, 514 insertions(+), 84 deletions(-)
create mode 100644 gcc/config/riscv/riscv-avlprop.cc
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 606d3a8513e..efd53965c9a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -544,7 +544,7 @@ pru-*-*)
riscv*)
cpu_type=riscv
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
- extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
+ extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o riscv-avlprop.o"
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o"
d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-avlprop.cc b/gcc/config/riscv/riscv-avlprop.cc
new file mode 100644
index 00000000000..bf3becd8371
--- /dev/null
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -0,0 +1,350 @@
+/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2023-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI Technologies Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or(at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions.
+   A standalone AVL propagation pass is designed because:
+
+     - Better code maintain:
+       Current LCM-based VSETVL pass is so complicated that codes
+       there will become even harder to maintain. A straight forward
+       AVL propagation PASS is much easier to maintain.
+
+     - Reduce scalar register pressure:
+       A type of AVL propagation is we propagate AVL from NON-VLMAX
+       instruction to VLMAX instruction.
+       Note: VLMAX instruction should be ignore tail elements (TA)
+       and the result should be used by the NON-VLMAX instruction.
+       This optimization is mostly for auto-vectorization codes:
+
+   vsetvli r136, r137      --- SELECT_VL
+   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
+   vadd.vv (use VLMAX)     --- PLUS_EXPR
+   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
+
+ NO AVL propation:
+
+   vsetvli a5, a4, ta
+   vle8.v v1
+   vsetvli t0, zero, ta
+   vadd.vv v2, v1, v1
+   vse8.v v2
+
+ We can propagate the AVL to 'vadd.vv' since its result
+ is consumed by a 'vse8.v' which has AVL = a5 and its
+ tail elements are agnostic.
+
+       We DON'T do this optimization on VSETVL pass since it is a
+       post-RA pass that consumed 't0' already wheras a standalone
+       pre-RA AVL propagation pass allows us elide the consumption
+       of the pseudo register of 't0' then we can reduce scalar
+       register pressure.
+
+     - More AVL propagation opportunities:
+       A pre-RA pass is more flexible for AVL REG def-use chain,
+       thus we will get more potential AVL propagation as long as
+       it doesn't increase the scalar register pressure.
+*/
+
+#define IN_TARGET_CODE 1
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "target.h"
+#include "tree-pass.h"
+#include "df.h"
+#include "rtl-ssa.h"
+#include "cfgcleanup.h"
+#include "insn-attr.h"
+
+using namespace rtl_ssa;
+using namespace riscv_vector;
+
+/* The AVL propagation instructions and corresponding preferred AVL.
+   It will be updated during the analysis.  */
+static hash_map<insn_info *, rtx> *avlprops;
+
+const pass_data pass_data_avlprop = {
+  RTL_PASS, /* type */
+  "avlprop", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_avlprop : public rtl_opt_pass
+{
+public:
+  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) final override
+  {
+    return TARGET_VECTOR && optimize > 0;
+  }
+  virtual unsigned int execute (function *) final override;
+}; // class pass_avlprop
+
+static void
+avlprop_init (void)
+{
+  calculate_dominance_info (CDI_DOMINATORS);
+  df_analyze ();
+  crtl->ssa = new function_info (cfun);
+  avlprops = new hash_map<insn_info *, rtx>;
+}
+
+static void
+avlprop_done (void)
+{
+  free_dominance_info (CDI_DOMINATORS);
+  if (crtl->ssa->perform_pending_updates ())
+    cleanup_cfg (0);
+  delete crtl->ssa;
+  crtl->ssa = nullptr;
+  delete avlprops;
+  avlprops = NULL;
+}
+
+/* Helper function to get AVL operand.  */
+static rtx
+get_avl (insn_info *insn, bool avlprop_p)
+{
+  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
+      || get_attr_avl_type (insn->rtl ()) == VLS)
+    return NULL_RTX;
+  if (avlprop_p)
+    {
+      if (avlprops->get (insn))
+ return (*avlprops->get (insn));
+      else if (vlmax_avl_type_p (insn->rtl ()))
+ return RVV_VLMAX;
+    }
+  extract_insn_cached (insn->rtl ());
+  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
+}
+
+/* This is a straight forward pattern ALWAYS in paritial auto-vectorization:
+
+     VL = SELECT_AVL (AVL, ...)
+     V0 = MASK_LEN_LOAD (..., VL)
+     V1 = MASK_LEN_LOAD (..., VL)
+     V2 = V0 + V1 --- Missed LEN information.
+     MASK_LEN_STORE (..., V2, VL)
+
+   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1, dummy LEN)
+   because:
+
+     - Few code changes in Loop Vectorizer.
+     - Reuse the current clean flow of partial vectorization, That is, apply
+       predicate LEN or MASK into LOAD/STORE operations and other special
+       arithmetic operations (e.d. DIV), then do the whole vector register
+       operation if it DON'T affect the correctness.
+       Such flow is used by all other targets like x86, sve, s390, ... etc.
+     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
+
+   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like PLUS_EXPR which
+   generates the VLMAX instruction due to missed LEN information. The later
+   VSETVL PASS will elided the redundant vsetvls.
+*/
+
+static rtx
+get_autovectorize_preferred_avl (insn_info *insn)
+{
+  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p (insn->rtl ()))
+    return NULL_RTX;
+
+  rtx use_avl = NULL_RTX;
+  insn_info *avl_use_insn = nullptr;
+  unsigned int ratio
+    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul (insn->rtl ()));
+  for (def_info *def : insn->defs ())
+    {
+      auto set = safe_dyn_cast<set_info *> (def);
+      if (!set || !set->is_reg ())
+ return NULL_RTX;
+      for (use_info *use : set->all_uses ())
+ {
+   if (!use->is_in_nondebug_insn ())
+     return NULL_RTX;
+   insn_info *use_insn = use->insn ();
+   /* FIXME: Stop AVL propagation if any USE is not a RVV real
+      instruction. It should be totally enough for vectorized codes since
+      they always locate at extended blocks.
+
+      TODO: We can extend PHI checking for intrinsic codes if it
+      necessary in the future.  */
+   if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!has_vl_op (use_insn->rtl ()))
+     continue;
+
+   rtx new_use_avl = get_avl (use_insn, true);
+   if (!new_use_avl)
+     return NULL_RTX;
+   if (!use_avl)
+     use_avl = new_use_avl;
+   if (!rtx_equal_p (use_avl, new_use_avl)
+       || calculate_ratio (get_sew (use_insn->rtl ()),
+   get_vlmul (use_insn->rtl ()))
+    != ratio
+       || vlmax_avl_p (new_use_avl)
+       || !tail_agnostic_p (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!avl_use_insn)
+     avl_use_insn = use_insn;
+ }
+    }
+
+  if (use_avl && register_operand (use_avl, Pmode))
+    {
+      gcc_assert (avl_use_insn);
+      // Find a definition at or neighboring INSN.
+      resource_info resource = full_register (REGNO (use_avl));
+      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
+      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
+      if (dl1.matching_set () || dl2.matching_set ())
+ return NULL_RTX;
+      def_info *def1 = dl1.last_def_of_prev_group ();
+      def_info *def2 = dl2.last_def_of_prev_group ();
+      if (def1 != def2)
+ return NULL_RTX;
+      /* FIXME: We only all AVL propation within a block which should
+ be totally enough for vectorized codes.
+
+ TODO: We can enhance it here for intrinsic codes in the future
+ if it is necessary.  */
+      if (def1->insn ()->bb () != insn->bb ()
+   || def1->insn ()->compare_with (insn) >= 0)
+ return NULL_RTX;
+    }
+  return use_avl;
+}
+
+/* If we have a preferred AVL to propagate, return the AVL.
+   Otherwise, return NULL_RTX as we don't need have any preferred
+   AVL.  */
+
+static rtx
+get_preferred_avl (insn_info *insn)
+{
+  /* TODO: We only do AVL propagation for missed-LEN partial
+     autovectorization for now.  We could add more more AVL
+     propagation for intrinsic codes in the future.  */
+  return get_autovectorize_preferred_avl (insn);
+}
+
+/* Return the AVL TYPE operand index.  */
+static int
+get_avl_type_index (insn_info *insn)
+{
+  extract_insn_cached (insn->rtl ());
+  /* Except rounding mode patterns, AVL TYPE operand
+     is always the last operand.  */
+  if (find_access (insn->uses (), VXRM_REGNUM)
+      || find_access (insn->uses (), FRM_REGNUM))
+    return recog_data.n_operands - 2;
+  return recog_data.n_operands - 1;
+}
+
+/* Main entry point for this pass.  */
+unsigned int
+pass_avlprop::execute (function *)
+{
+  avlprop_init ();
+
+  /* Go through all the instructions looking for AVL that we could propagate. */
+
+  insn_info *next;
+  bool change_p = true;
+
+  while (change_p)
+    {
+      /* Iterate on each instruction until no more change need.  */
+      change_p = false;
+      for (insn_info *insn = crtl->ssa->first_insn (); insn; insn = next)
+ {
+   next = insn->next_any_insn ();
+   /* We only forward AVL to the instruction that has AVL/VL operand
+      and can be optimized in RTL_SSA level.  */
+   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
+     continue;
+
+   rtx new_avl = get_preferred_avl (insn);
+   if (new_avl)
+     {
+       gcc_assert (!vlmax_avl_p (new_avl));
+       auto &update = avlprops->get_or_insert (insn);
+       change_p = !rtx_equal_p (update, new_avl);
+       update = new_avl;
+     }
+ }
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "\nNumber of successful AVL propagations: %d\n\n",
+      (int) avlprops->elements ());
+
+  for (const auto iter : *avlprops)
+    {
+      rtx_insn *rinsn = iter.first->rtl ();
+      if (dump_file)
+ {
+   fprintf (dump_file, "\nPropagating AVL: ");
+   print_rtl_single (dump_file, iter.second);
+   fprintf (dump_file, "into: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+      /* Replace AVL operand.  */
+      rtx new_pat
+ = simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first, false),
+ iter.second);
+      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
+
+      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
+      if (vlmax_avl_type_p (rinsn))
+ validate_change_or_fail (
+   rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)],
+   get_avl_type_rtx (avl_type::NONVLMAX), false);
+      if (dump_file)
+ {
+   fprintf (dump_file, "Successfully to match this instruction: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+    }
+
+  avlprop_done ();
+  return 0;
+}
+
+rtl_opt_pass *
+make_pass_avlprop (gcc::context *ctxt)
+{
+  return new pass_avlprop (ctxt);
+}
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
index 4084122cf0a..b6260939d5c 100644
--- a/gcc/config/riscv/riscv-passes.def
+++ b/gcc/config/riscv/riscv-passes.def
@@ -18,4 +18,5 @@
    <http://www.gnu.org/licenses/>.  */
INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
+INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6cb9d459ee9..2b09ec9ea9e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const char *, struct gcc_options *, locatio
extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
/* Routines implemented in riscv-string.c.  */
@@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
bool cmp_lmul_gt_one (machine_mode);
bool gather_scatter_valid_offset_mode_p (machine_mode);
bool vls_mode_valid_p (machine_mode);
+bool has_vtype_op (rtx_insn *);
+bool has_vl_op (rtx_insn *);
+bool tail_agnostic_p (rtx_insn *);
+void validate_change_or_fail (rtx, rtx *, rtx, bool);
+bool vlmax_avl_type_p (rtx_insn *);
+bool vlmax_avl_p (rtx);
+uint8_t get_sew (rtx_insn *);
+enum vlmul_type get_vlmul (rtx_insn *);
+bool const_vlmax_p (machine_mode);
}
/* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e39a9507803..473622ac321 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -56,7 +56,7 @@ using namespace riscv_vector;
namespace riscv_vector {
/* Return true if vlmax is constant value and can be used in vsetivl.  */
-static bool
+bool
const_vlmax_p (machine_mode mode)
{
   poly_uint64 nuints = GET_MODE_NUNITS (mode);
@@ -298,14 +298,6 @@ public:
      len = force_reg (Pmode, len);
    vls_p = true;
  }
- else if (const_vlmax_p (vtype_mode))
-   {
-     /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
-        the vsetvli to obtain the value of vlmax.  */
-     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
-     len = gen_int_mode (nunits, Pmode);
-     vls_p = true;
-   }
else if (can_create_pseudo_p ())
  {
    len = gen_reg_rtx (Pmode);
@@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
   emit_move_insn (dst, x4);
}
+/* Return true if it is an RVV instruction depends on VTYPE global
+   status register.  */
+bool
+has_vtype_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
+}
+
+/* Return true if it is an RVV instruction depends on VL global
+   status register.  */
+bool
+has_vl_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
+}
+
+/* Get default tail policy.  */
+static bool
+get_default_ta ()
+{
+  /* For the instruction that doesn't require TA, we still need a default value
+     to emit vsetvl. We pick up the default value according to prefer policy. */
+  return (bool) (get_prefer_tail_policy () & 0x1
+ || (get_prefer_tail_policy () >> 1 & 0x1));
+}
+
+/* Helper function to get TA operand.  */
+bool
+tail_agnostic_p (rtx_insn *rinsn)
+{
+  /* If it doesn't have TA, we return agnostic by default.  */
+  extract_insn_cached (rinsn);
+  int ta = get_attr_ta (rinsn);
+  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
+}
+
+/* Change insn and Assert the change always happens.  */
+void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
+
+/* Return true if it is VLMAX AVL TYPE.  */
+bool
+vlmax_avl_type_p (rtx_insn *rinsn)
+{
+  return get_attr_avl_type (rinsn) == VLMAX;
+}
+
+/* Return true if RTX is RVV VLMAX AVL.  */
+bool
+vlmax_avl_p (rtx x)
+{
+  return x && rtx_equal_p (x, RVV_VLMAX);
+}
+
+/* Helper function to get SEW operand. We always have SEW value for
+   all RVV instructions that have VTYPE OP.  */
+uint8_t
+get_sew (rtx_insn *rinsn)
+{
+  return get_attr_sew (rinsn);
+}
+
+/* Helper function to get VLMUL operand. We always have VLMUL value for
+   all RVV instructions that have VTYPE OP. */
+enum vlmul_type
+get_vlmul (rtx_insn *rinsn)
+{
+  return (enum vlmul_type) get_attr_vlmul (rinsn);
+}
+
} // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index e9dd669de98..f2f19e423bf 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
   return agnostic_p ? "agnostic" : "undisturbed";
}
-static bool
-vlmax_avl_p (rtx x)
-{
-  return x && rtx_equal_p (x, RVV_VLMAX);
-}
-
-/* Return true if it is an RVV instruction depends on VTYPE global
-   status register.  */
-static bool
-has_vtype_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
-}
-
-/* Return true if it is an RVV instruction depends on VL global
-   status register.  */
-static bool
-has_vl_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
-}
-
/* Return true if the instruction ignores VLMUL field of VTYPE.  */
static bool
ignore_vlmul_insn_p (rtx_insn *rinsn)
@@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
   if (!has_vl_op (rinsn))
     return NULL_RTX;
-  if (get_attr_avl_type (rinsn) == VLMAX)
-    return RVV_VLMAX;
-  extract_insn_cached (rinsn);
-  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
-}
-/* Helper function to get SEW operand. We always have SEW value for
-   all RVV instructions that have VTYPE OP.  */
-static uint8_t
-get_sew (rtx_insn *rinsn)
-{
-  return get_attr_sew (rinsn);
-}
-
-/* Helper function to get VLMUL operand. We always have VLMUL value for
-   all RVV instructions that have VTYPE OP. */
-static enum vlmul_type
-get_vlmul (rtx_insn *rinsn)
-{
-  return (enum vlmul_type) get_attr_vlmul (rinsn);
-}
+  extract_insn_cached (rinsn);
+  if (vlmax_avl_type_p (rinsn))
+    {
+      if (BYTES_PER_RISCV_VECTOR.is_constant ())
+ {
+   for (int i = 0; i < recog_data.n_operands; i++)
+     if (GET_MODE_CLASS (recog_data.operand_mode[i]) == MODE_VECTOR_BOOL
+ && const_vlmax_p (recog_data.operand_mode[i]))
+       return gen_int_mode (GET_MODE_NUNITS (recog_data.operand_mode[i]),
+    Pmode);
+ }
+      return RVV_VLMAX;
+    }
-/* Get default tail policy.  */
-static bool
-get_default_ta ()
-{
-  /* For the instruction that doesn't require TA, we still need a default value
-     to emit vsetvl. We pick up the default value according to prefer policy. */
-  return (bool) (get_prefer_tail_policy () & 0x1
- || (get_prefer_tail_policy () >> 1 & 0x1));
+  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
}
/* Get default mask policy.  */
@@ -407,16 +371,6 @@ get_default_ma ()
|| (get_prefer_mask_policy () >> 1 & 0x1));
}
-/* Helper function to get TA operand.  */
-static bool
-tail_agnostic_p (rtx_insn *rinsn)
-{
-  /* If it doesn't have TA, we return agnostic by default.  */
-  extract_insn_cached (rinsn);
-  int ta = get_attr_ta (rinsn);
-  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
-}
-
/* Helper function to get MA operand.  */
static bool
mask_agnostic_p (rtx_insn *rinsn)
@@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
   return true;
}
-/* Change insn and Assert the change always happens.  */
-static void
-validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
-{
-  bool change_p = validate_change (object, loc, new_rtx, in_group);
-  gcc_assert (change_p);
-}
-
/* This flags indicates the minimum demand of the vl and vtype values by the
    RVV instruction. For example, DEMAND_RATIO_P indicates that this RVV
    instruction only needs the SEW/LMUL ratio to remain the same, and does not
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index dd17056fe82..08de62853a6 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -69,6 +69,12 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/riscv/riscv-vsetvl.cc
+riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
+  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h
+ $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+ $(srcdir)/config/riscv/riscv-avlprop.cc
+
riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
   $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ef91950178f..0c59d1b90bc 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -809,7 +809,7 @@
  V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
  V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
  V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
-    (symbol_ref "riscv_vector::NONVLMAX")
+    (symbol_ref "riscv_vector::VLS")
(eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
  vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
  vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
index 928a507a363..5278e4aa38f 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
@@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
     }
}
-/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler {e16,m2} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
index a50265fc1ec..1db2e073846 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
@@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict b, int n)
     a[i] = a[i] + b[i];
}
-/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler {e16,m4} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
index eac7cbc757b..ca88d42cdf4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
@@ -7,10 +7,11 @@
/*
** foo:
** vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
-** vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
-** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
+** vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
*/
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
index 965365da4bb..13367423751 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
@@ -3,7 +3,6 @@
#include "ternop-2.c"
-/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
/* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
/* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized" } } */
/* { dg-final { scan-assembler-not {\tvmv} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
new file mode 100644
index 00000000000..b0d21650c3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
new file mode 100644
index 00000000000..f2d8aa54b88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c,
+     int *__restrict a2, int *__restrict b2, int *__restrict c2,
+     int *__restrict a3, int *__restrict b3, int *__restrict c3,
+     int *__restrict a4, int *__restrict b4, int *__restrict c4,
+     int *__restrict a5, int *__restrict b5, int *__restrict c5,
+     int *__restrict d, int *__restrict d2, int *__restrict d3,
+     int *__restrict d4, int *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 674ba0d72b4..fc830f2cd4d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
"" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \
"-O3 -ftree-vectorize" $CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/avlprop/*.\[cS\]]] \
+ "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
"-O3 -ftree-vectorize --param riscv-autovec-preference=scalable" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
--
2.36.3
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-26  0:49           ` juzhe.zhong
@ 2023-10-26  1:22             ` Patrick O'Neill
  2023-10-26  1:27               ` juzhe.zhong
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick O'Neill @ 2023-10-26  1:22 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: kito.cheng, Kito.cheng, jeffreyalaw, Robin Dapp

[-- Attachment #1: Type: text/plain, Size: 82255 bytes --]


On 10/25/23 17:49, juzhe.zhong@rivai.ai wrote:
> FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
> FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
>
> These 2 FAILs are bogus. Testcases need to be adapted, I notice I didn't include this in this patch.
>
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
>
> These 2 already exist on the trunk for RV32.
>
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
> This FAIL for RV64 is odd. I don't have it.  Could you share me the debug log ?

rv64gcv debug log:

Executing on host: /scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany      -lm  -o ./mask_gather_load_run-11.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany -lm -o ./mask_gather_load_run-11.exe
PASS: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./mask_gather_load_run-11.exe
mask_gather_load_run-11.exe: /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:98: main: Assertion `dest_uint16_t_uint8_t[i * 2] == dest2_uint16_t_uint8_t[i * 2]' failed.
/scratch/tc-testing/tc-avl/build-rv64gcv/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 1520161 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test

rv32gcv debug log:

Executing on host: /scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany      -lm  -o ./mask_gather_load_run-11.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany -lm -o ./mask_gather_load_run-11.exe
PASS: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./mask_gather_load_run-11.exe
mask_gather_load_run-11.exe: /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:98: main: Assertion `dest_uint16_t_uint8_t[i * 2] == dest2_uint16_t_uint8_t[i * 2]' failed.
/scratch/tc-testing/tc-avl/build-rv32gcv/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 2593314 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test

Patrick

>
> juzhe.zhong@rivai.ai
>   
> From: Patrick O'Neill
> Date: 2023-10-26 08:37
> To:juzhe.zhong@rivai.ai; gcc-patches
> CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
> Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
> Hi Juzhe,
>
> I tested on glibc rv32/64gcv qemu.
> Applied patch to/comparing with 668c4c3783970e7adf0591396b6d0d5286cc0541.
>
> V2 results look much better! I don't see any new fortran failures but I am seeing new gcc failures:
>
> rv64gcv:
> FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
> FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
>
> rv32gcv:
> FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
> FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
>
> The popcount-run-1.c test doesn't show up for me on 668c4c3783970e7adf0591396b6d0d5286cc0541 rv32gcv or rv64gcv.
> After applying your patch it only shows up on rv32gcv (rv64gcv still does not have the failure). This might be due to a difference in our testing setups.
>
> Thanks,
> Patrick
>
> On 10/25/23 05:20,juzhe.zhong@rivai.ai  wrote:
> Hi, Patrick.
>
> I have fixed on V2 patch:https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634267.html
>
> I have tested on RV32/RV64 C/C++, no regression. But I am not able to test on Fortran.
>
> The failures you showed have been fixed. Except this one:
> FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
> This FAIL is not because of this patch since I confirmed it already existed without this patch.
> We will fix that on stage 3.
>
> Could you verify with Fortran test ?
>
> Thanks.
>
> juzhe.zhong@rivai.ai
>   
> From: Patrick O'Neill
> Date: 2023-10-24 23:03
> To:juzhe.zhong@rivai.ai; gcc-patches
> CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
> Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
> I'm seeing a variety of new failures, constrained to rv32gcv:
>
> Tested using newlib/linux:
> rv32gcv/ ilp32d/ medlow
> rv64gcv/  lp64d/ medlow
> rv64gcv_zvbb_zvbc_zvkg_zvkn_zvknc_zvkned_zvkng_zvknha_zvknhb_zvks_zvksc_zvksed_zvksg_zvksh_zvkt/  lp64d/ medlow
> rv64imafdcv_zicond_zawrs_zbc_zvkng_zvksg_zvbb_zvbc_zicsr_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt/  lp64d/ medlow
>
> Newlib failures:
> rv32gcv:
> FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
> FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
> FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-reduc-10.c execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
> FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
> FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
> FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
> FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
>
> Debug log for testcases that aren't pr110557.c look like this:
> Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-lmul=m4      -lm  -o ./popcount-run-1.exe    (timeout = 600)
> spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o ./popcount-run-1.exe
> PASS: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c (test for excess errors)
> spawn riscv64-unknown-elf-run ./popcount-run-1.exe
> FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
> Debug log for pr110557.c:
> Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs  -lm  -o ./pr110557.exe    (timeout = 600)
> spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs -lm -o ./pr110557.exe
> PASS: g++.dg/vect/pr110557.cc  -std=c++14 (test for excess errors)
> spawn riscv64-unknown-elf-run ./pr110557.exe
> /scratch/tc-testing/tc-oct-23-avl/build-newlib/../scripts/wrapper/qemu/riscv64-unknown-elf-run: line 15: 3449805 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
> FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
> Linux failures:
> rv32gcv:
> FAIL: gcc.dg/nextafter-2.c execution test
> FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
> FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
> FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-reduc-10.c execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
> FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
> FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
> FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
> FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
> FAIL: gfortran.dg/default_format_2.f90   -O0  execution test
> FAIL: gfortran.dg/default_format_2.f90   -O1  execution test
> FAIL: gfortran.dg/default_format_2.f90   -O2  execution test
> FAIL: gfortran.dg/default_format_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/default_format_2.f90   -O3 -g  execution test
> FAIL: gfortran.dg/default_format_2.f90   -Os  execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90   -O0  execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90   -O1  execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90   -O2  execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g  execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90   -Os  execution test
> FAIL: gfortran.dg/large_real_kind_2.F90   -O0  execution test
> FAIL: gfortran.dg/round_4.f90   -O0  execution test
> FAIL: gfortran.dg/zero_sized_3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test
> FAIL: gfortran.dg/ieee/large_1.f90   -O0  execution test
> FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
> FAIL: gfortran.dg/ieee/large_2.f90   -O1  execution test
> FAIL: gfortran.dg/ieee/large_2.f90   -O2  execution test
> FAIL: gfortran.dg/ieee/large_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/ieee/large_2.f90   -O3 -g  execution test
> FAIL: gfortran.dg/ieee/large_2.f90   -Os  execution test
> FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
> FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
> FAIL: gfortran.fortran-torture/execute/intrinsic_sum.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
>
> Some (not all) debug log outputs:
> Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions        -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x    (timeout = 600)
> spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
> PASS: gfortran.fortran-torture/execute/intrinsic_count.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions
> spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
> STOP 2
> FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions
> Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions -funroll-loops       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x    (timeout = 600)
> spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -funroll-loops -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
> PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
> spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
> STOP 3
> FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
>
> Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output    -O0   -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o ./large_2.exe    (timeout = 600)
> spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o ./large_2.exe
> PASS: gfortran.dg/ieee/large_2.f90   -O0  (test for excess errors)
> spawn riscv64-unknown-linux-gnu-run ./large_2.exe
>    0.333333333333333333333333333333333317         2.24271998593667819112500193394291495E+1644
> STOP 1
> FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
> Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm  -o ./pr110557.exe    (timeout = 600)
> spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm -o ./pr110557.exe
> PASS: g++.dg/vect/pr110557.cc  -std=c++98 (test for excess errors)
> spawn riscv64-unknown-linux-gnu-run ./pr110557.exe
> /scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 323485 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
> FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
> Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-reduc-dot-21.exe    (timeout = 600)
> spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe
> PASS: gcc.dg/vect/vect-reduc-dot-21.c (test for excess errors)
> spawn riscv64-unknown-linux-gnu-run ./vect-reduc-dot-21.exe
> /scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3484803 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
> FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
> Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-alias-check-16.exe    (timeout = 600)
> spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-alias-check-16.exe
> PASS: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
> spawn riscv64-unknown-linux-gnu-run ./vect-alias-check-16.exe
> /scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3431975 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
> FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
> PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "flags: *RAW\\n"
> PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "using an address-based overlap test"
> PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump-not vect "using an index-based"
> I've observed nextafter-2.c being flaky on the CI so that particular failure might not be real.
>
> If you want any particular testcase's debug logs please let me know.
>
> Patrick
>
> On 10/23/23 21:30, Patrick O'Neill wrote:
> The CI just picked it up:https://github.com/ewlu/gcc-precommit-ci/issues/449#issue-1958483272
> Since it doesn't apply to the CI's baseline hash it's only performing a build.
> I'll re-run it in the morning once the baseline has been updated.
>
> In the meantime I started a full build+test run on my local machine.
> I'll send you the results in ~10 hours - morning my time :-)
>
> Patrick
> On 10/23/23 20:44,juzhe.zhong@rivai.ai  wrote:
> CCing Patrick...
>
> Hi, @Patrick.
> Could you apply this patch and trigger your regression CI?
>
> I don't have an environment to test fortran for now (I only test it on C/C++).
>
> Thanks.
>
> juzhe.zhong@rivai.ai
>   
> From: Juzhe-Zhong
> Date: 2023-10-24 11:32
> To: gcc-patches
> CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
> Subject: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
> This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization
> which is a known issue for a long time and I finally find the time to address it.
>   
> Consider a simple vector addition operation:
>   
> https://godbolt.org/z/7hfGfEjW3
>   
> void
> foo (int *__restrict a,
>       int *__restrict b,
>       int *__restrict n)
> {
>    for (int i = 0; i < n; i++)
>        a[i] = a[i] + b[i];
> }
>   
> Optimized IR:
>   
> Loop body:
>    _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
>    ...
>    vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);    -> vle32.v v2,0(a0)
>    vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);   -> vle32.v v1,0(a1)
>    vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                              -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
>    .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
>   
> We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
> The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment:
>   
> vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
>   
> GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization.
> Such flow are used by all other targets like ARM SVE (RVV also uses such flow):
>   
> ARM SVE:
>    
> .L3:
>          ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
>          ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
>          add     z31.s, z31.s, z30.s            -> un-predicated add
>          st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store
>   
> Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it.
>   
> Also, It's very unlikely that we can apply predicated operations on all vectorization for following reasons:
>   
> 1. It's very heavy workload to support them on all vectorization and we don't see any benefits if we can handle that on targets backend.
> 2. Changing Loop vectorizer for it will make code base ugly and hard to maintain.
> 3. We will need so many patterns for all operations. Not only COND_LEN_ADD, COND_LEN_SUB, ....
>     We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over 100+ patterns, unreasonable number of patterns.
>   
> To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS for it to elide the redundant vsetvls
> due to AVL/VL toggling.
>   
> The second question is that why we separate a PASS called AVL propagation. Why not optimize it in VSETVL PASS (We definitetly can optimize AVL in VSETVL PASS)
>   
> Frankly, I was planning to address such issue in VSETVL PASS that's why we recently refactored VSETVL PASS. However, I changed my mind recently after several
> experiments and tries.
>   
> The reasons as follows:
>   
> 1. For code base management and maintainience. Current VSETVL PASS is complicated enough and aleady has enough aggressive and fancy optimizations which
>     turns out it can always generate optimal codegen in most of the cases. It's not a good idea keep adding more features into VSETVL PASS to make VSETVL
> PASS become heavy and heavy again, then we will need to refactor it again in the future.
> Actuall, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change VSETVL PASS any more except the minor
> fixes.
>   
> 2. vsetvl insertion (VSETVL PASS does this thing) and AVL propagation are 2 different things,  I don't think we should fuse them into same PASS.
>   
> 3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be done before RA which can reduce register allocation.
>   
> 4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
>     This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements.
> We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate
> VSETVL PASS again which is already so complicated.)
>   
> Here is an example to demonstrate more:
>   
> https://godbolt.org/z/bE86sv3q5
>   
> void foo2 (int *__restrict a,
>            int *__restrict b,
>            int *__restrict c,
>            int *__restrict a2,
>            int *__restrict b2,
>            int *__restrict c2,
>            int *__restrict a3,
>            int *__restrict b3,
>            int *__restrict c3,
>            int *__restrict a4,
>            int *__restrict b4,
>            int *__restrict c4,
>            int *__restrict a5,
>            int *__restrict b5,
>            int *__restrict c5,
>            int n)
> {
>      for (int i = 0; i < n; i++){
>        a[i] = b[i] + c[i];
>        b5[i] = b[i] + c[i];
>        a2[i] = b2[i] + c2[i];
>        a3[i] = b3[i] + c3[i];
>        a4[i] = b4[i] + c4[i];
>        a5[i] = a[i] + a4[i];
>        a[i] = a5[i] + b5[i]+ a[i];
>   
>        a[i] = a[i] + c[i];
>        b5[i] = a[i] + c[i];
>        a2[i] = a[i] + c2[i];
>        a3[i] = a[i] + c3[i];
>        a4[i] = a[i] + c4[i];
>        a5[i] = a[i] + a4[i];
>        a[i] = a[i] + b5[i]+ a[i];
>      }
> }
>   
> 1. Loop Body:
>   
> Before this patch:                                          After this patch:
>   
>        vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli a4,t1,e32,m1,ta,ma
>          vle32.v v2,0(a2)                                     vle32.v v2,0(a2)
>          vle32.v v4,0(a1)                                     vle32.v v3,0(t2)
>          vle32.v v1,0(t2)                                     vle32.v v4,0(a1)
>          vsetvli a7,zero,e32,m1,ta,ma                         vle32.v v1,0(t0)
>          vadd.vv v4,v2,v4                                     vadd.vv v4,v2,v4
>          vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v1,v3,v1
>          vle32.v v3,0(s0)                                     vadd.vv v1,v1,v4
>          vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v1,v1,v4
>          vadd.vv v1,v3,v1                                     vadd.vv v1,v1,v4
>          vadd.vv v1,v1,v4                                     vadd.vv v1,v1,v2
>          vadd.vv v1,v1,v4                                     vadd.vv v2,v1,v2
>          vadd.vv v1,v1,v4                                     vse32.v v2,0(t5)
>          vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v2,v2,v1
>          vle32.v v4,0(a5)                                     vadd.vv v2,v2,v1
>          vsetvli a7,zero,e32,m1,ta,ma                         slli a7,a4,2
>          vadd.vv v1,v1,v2                                     vadd.vv v3,v1,v3
>          vadd.vv v2,v1,v2                                     vle32.v v5,0(a5)
>          vadd.vv v4,v1,v4                                     vle32.v v6,0(t6)
>          vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v3,0(t3)
>          vse32.v v2,0(t5)                                     vse32.v v2,0(a0)
>          vse32.v v4,0(a3)                                     vadd.vv v3,v3,v1
>          vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v2,v1,v5
>          vadd.vv v3,v1,v3                                     vse32.v v3,0(t4)
>          vadd.vv v2,v2,v1                                     vadd.vv v1,v1,v6
>          vadd.vv v2,v2,v1                                     vse32.v v2,0(a3)
>          vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v1,0(a6)
>          vse32.v v2,0(a0)
>          vse32.v v3,0(t3)
>          vle32.v v2,0(t0)
>          vsetvli a7,zero,e32,m1,ta,ma
>          vadd.vv v3,v3,v1
>          vsetvli zero,a4,e32,m1,ta,ma
>          vse32.v v3,0(t4)
>          vsetvli a7,zero,e32,m1,ta,ma
>          slli    a7,a4,2
>          vadd.vv v1,v1,v2
>          sub     t1,t1,a4
>          vsetvli zero,a4,e32,m1,ta,ma
>          vse32.v v1,0(a6)
>   
> It's quite obvious, all heavy && redundant vsetvls inside loop body are eliminated.
>   
> 2. Epilogue:
>      Before this patch:                                          After this patch:
>   
>       .L5:                                                      .L5:
>          ld      s0,8(sp)                                         ret
>          addi    sp,sp,16
>          jr      ra
>   
> This is the benefit we do the AVL propation before RA since we eliminate the use of 'a7' register
> which is used by the redudant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'
>   
> The final codegen after this patch:
>   
> foo2:
> lw t1,56(sp)
> ld t6,0(sp)
> ld t3,8(sp)
> ld t0,16(sp)
> ld t2,24(sp)
> ld t4,32(sp)
> ld t5,40(sp)
> ble t1,zero,.L5
> .L3:
> vsetvli a4,t1,e32,m1,ta,ma
> vle32.v v2,0(a2)
> vle32.v v3,0(t2)
> vle32.v v4,0(a1)
> vle32.v v1,0(t0)
> vadd.vv v4,v2,v4
> vadd.vv v1,v3,v1
> vadd.vv v1,v1,v4
> vadd.vv v1,v1,v4
> vadd.vv v1,v1,v4
> vadd.vv v1,v1,v2
> vadd.vv v2,v1,v2
> vse32.v v2,0(t5)
> vadd.vv v2,v2,v1
> vadd.vv v2,v2,v1
> slli a7,a4,2
> vadd.vv v3,v1,v3
> vle32.v v5,0(a5)
> vle32.v v6,0(t6)
> vse32.v v3,0(t3)
> vse32.v v2,0(a0)
> vadd.vv v3,v3,v1
> vadd.vv v2,v1,v5
> vse32.v v3,0(t4)
> vadd.vv v1,v1,v6
> vse32.v v2,0(a3)
> vse32.v v1,0(a6)
> sub t1,t1,a4
> add a1,a1,a7
> add a2,a2,a7
> add a5,a5,a7
> add t6,t6,a7
> add t0,t0,a7
> add t2,t2,a7
> add t5,t5,a7
> add a3,a3,a7
> add a6,a6,a7
> add t3,t3,a7
> add t4,t4,a7
> add a0,a0,a7
> bne t1,zero,.L3
> .L5:
> ret
>   
> PR target/111888
>   
> gcc/ChangeLog:
>   
> * config.gcc: Add AVL propgatation PASS.
> * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
> * config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
> (has_vtype_op): Export as global.
> (has_vl_op): Ditto.
> (tail_agnostic_p): Ditto.
> (validate_change_or_fail): Ditto.
> (vlmax_avl_type_p): Ditto.
> (vlmax_avl_p): Ditto.
> (get_sew): Ditto.
> (enum vlmul_type): Ditto.
> (const_vlmax_p): Ditto.
> * config/riscv/riscv-v.cc (has_vtype_op): Ditto.
> (has_vl_op): Ditto.
> (get_default_ta): Ditto.
> (tail_agnostic_p): Ditto.
> (validate_change_or_fail): Ditto.
> (vlmax_avl_type_p): Ditto.
> (vlmax_avl_p): Ditto.
> (get_sew): Ditto.
> (enum vlmul_type): Ditto.
> (get_vlmul): Ditto.
> * config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
> (has_vtype_op): Ditto.
> (has_vl_op): Ditto.
> (get_sew): Ditto.
> (get_vlmul): Ditto.
> (get_default_ta): Ditto.
> (tail_agnostic_p): Ditto.
> (validate_change_or_fail): Ditto.
> * config/riscv/t-riscv: Add AVL propagation PASS.
> * config/riscv/vector.md: Fix VLS modes attribute.
> * config/riscv/riscv-avlprop.cc: New file.
>   
> gcc/testsuite/ChangeLog:
>   
> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
> * gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
> * gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
> * gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.
>   
> ---
> gcc/config.gcc                                |   2 +-
> gcc/config/riscv/riscv-avlprop.cc             | 350 ++++++++++++++++++
> gcc/config/riscv/riscv-passes.def             |   1 +
> gcc/config/riscv/riscv-protos.h               |  10 +
> gcc/config/riscv/riscv-v.cc                   |  84 ++++-
> gcc/config/riscv/riscv-vsetvl.cc              |  82 +---
> gcc/config/riscv/t-riscv                      |   6 +
> gcc/config/riscv/vector.md                    |   2 +-
> .../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +-
> .../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +-
> .../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +-
> .../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 -
> .../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
> .../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
> gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
> 15 files changed, 514 insertions(+), 84 deletions(-)
> create mode 100644 gcc/config/riscv/riscv-avlprop.cc
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
>   
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 606d3a8513e..efd53965c9a 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -544,7 +544,7 @@ pru-*-*)
> riscv*)
> cpu_type=riscv
> extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
> - extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
> + extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o riscv-avlprop.o"
> extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
> extra_objs="${extra_objs} thead.o"
> d_target_objs="riscv-d.o"
> diff --git a/gcc/config/riscv/riscv-avlprop.cc b/gcc/config/riscv/riscv-avlprop.cc
> new file mode 100644
> index 00000000000..bf3becd8371
> --- /dev/null
> +++ b/gcc/config/riscv/riscv-avlprop.cc
> @@ -0,0 +1,350 @@
> +/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler.
> +   Copyright (C) 2023-2023 Free Software Foundation, Inc.
> +   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI Technologies Ltd.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or(at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions.
> +   A standalone AVL propagation pass is designed because:
> +
> +     - Better code maintain:
> +       Current LCM-based VSETVL pass is so complicated that codes
> +       there will become even harder to maintain. A straight forward
> +       AVL propagation PASS is much easier to maintain.
> +
> +     - Reduce scalar register pressure:
> +       A type of AVL propagation is we propagate AVL from NON-VLMAX
> +       instruction to VLMAX instruction.
> +       Note: VLMAX instruction should be ignore tail elements (TA)
> +       and the result should be used by the NON-VLMAX instruction.
> +       This optimization is mostly for auto-vectorization codes:
> +
> +   vsetvli r136, r137      --- SELECT_VL
> +   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
> +   vadd.vv (use VLMAX)     --- PLUS_EXPR
> +   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
> +
> + NO AVL propation:
> +
> +   vsetvli a5, a4, ta
> +   vle8.v v1
> +   vsetvli t0, zero, ta
> +   vadd.vv v2, v1, v1
> +   vse8.v v2
> +
> + We can propagate the AVL to 'vadd.vv' since its result
> + is consumed by a 'vse8.v' which has AVL = a5 and its
> + tail elements are agnostic.
> +
> +       We DON'T do this optimization on VSETVL pass since it is a
> +       post-RA pass that consumed 't0' already wheras a standalone
> +       pre-RA AVL propagation pass allows us elide the consumption
> +       of the pseudo register of 't0' then we can reduce scalar
> +       register pressure.
> +
> +     - More AVL propagation opportunities:
> +       A pre-RA pass is more flexible for AVL REG def-use chain,
> +       thus we will get more potential AVL propagation as long as
> +       it doesn't increase the scalar register pressure.
> +*/
> +
> +#define IN_TARGET_CODE 1
> +#define INCLUDE_ALGORITHM
> +#define INCLUDE_FUNCTIONAL
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "target.h"
> +#include "tree-pass.h"
> +#include "df.h"
> +#include "rtl-ssa.h"
> +#include "cfgcleanup.h"
> +#include "insn-attr.h"
> +
> +using namespace rtl_ssa;
> +using namespace riscv_vector;
> +
> +/* The AVL propagation instructions and corresponding preferred AVL.
> +   It will be updated during the analysis.  */
> +static hash_map<insn_info *, rtx> *avlprops;
> +
> +const pass_data pass_data_avlprop = {
> +  RTL_PASS, /* type */
> +  "avlprop", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_avlprop : public rtl_opt_pass
> +{
> +public:
> +  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) final override
> +  {
> +    return TARGET_VECTOR && optimize > 0;
> +  }
> +  virtual unsigned int execute (function *) final override;
> +}; // class pass_avlprop
> +
> +static void
> +avlprop_init (void)
> +{
> +  calculate_dominance_info (CDI_DOMINATORS);
> +  df_analyze ();
> +  crtl->ssa = new function_info (cfun);
> +  avlprops = new hash_map<insn_info *, rtx>;
> +}
> +
> +static void
> +avlprop_done (void)
> +{
> +  free_dominance_info (CDI_DOMINATORS);
> +  if (crtl->ssa->perform_pending_updates ())
> +    cleanup_cfg (0);
> +  delete crtl->ssa;
> +  crtl->ssa = nullptr;
> +  delete avlprops;
> +  avlprops = NULL;
> +}
> +
> +/* Helper function to get AVL operand.  */
> +static rtx
> +get_avl (insn_info *insn, bool avlprop_p)
> +{
> +  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
> +      || get_attr_avl_type (insn->rtl ()) == VLS)
> +    return NULL_RTX;
> +  if (avlprop_p)
> +    {
> +      if (avlprops->get (insn))
> + return (*avlprops->get (insn));
> +      else if (vlmax_avl_type_p (insn->rtl ()))
> + return RVV_VLMAX;
> +    }
> +  extract_insn_cached (insn->rtl ());
> +  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
> +}
> +
> +/* This is a straight forward pattern ALWAYS in paritial auto-vectorization:
> +
> +     VL = SELECT_AVL (AVL, ...)
> +     V0 = MASK_LEN_LOAD (..., VL)
> +     V1 = MASK_LEN_LOAD (..., VL)
> +     V2 = V0 + V1 --- Missed LEN information.
> +     MASK_LEN_STORE (..., V2, VL)
> +
> +   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1, dummy LEN)
> +   because:
> +
> +     - Few code changes in Loop Vectorizer.
> +     - Reuse the current clean flow of partial vectorization, That is, apply
> +       predicate LEN or MASK into LOAD/STORE operations and other special
> +       arithmetic operations (e.d. DIV), then do the whole vector register
> +       operation if it DON'T affect the correctness.
> +       Such flow is used by all other targets like x86, sve, s390, ... etc.
> +     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
> +
> +   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like PLUS_EXPR which
> +   generates the VLMAX instruction due to missed LEN information. The later
> +   VSETVL PASS will elided the redundant vsetvls.
> +*/
> +
> +static rtx
> +get_autovectorize_preferred_avl (insn_info *insn)
> +{
> +  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p (insn->rtl ()))
> +    return NULL_RTX;
> +
> +  rtx use_avl = NULL_RTX;
> +  insn_info *avl_use_insn = nullptr;
> +  unsigned int ratio
> +    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul (insn->rtl ()));
> +  for (def_info *def : insn->defs ())
> +    {
> +      auto set = safe_dyn_cast<set_info *> (def);
> +      if (!set || !set->is_reg ())
> + return NULL_RTX;
> +      for (use_info *use : set->all_uses ())
> + {
> +   if (!use->is_in_nondebug_insn ())
> +     return NULL_RTX;
> +   insn_info *use_insn = use->insn ();
> +   /* FIXME: Stop AVL propagation if any USE is not a RVV real
> +      instruction. It should be totally enough for vectorized codes since
> +      they always locate at extended blocks.
> +
> +      TODO: We can extend PHI checking for intrinsic codes if it
> +      necessary in the future.  */
> +   if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl ()))
> +     return NULL_RTX;
> +   if (!has_vl_op (use_insn->rtl ()))
> +     continue;
> +
> +   rtx new_use_avl = get_avl (use_insn, true);
> +   if (!new_use_avl)
> +     return NULL_RTX;
> +   if (!use_avl)
> +     use_avl = new_use_avl;
> +   if (!rtx_equal_p (use_avl, new_use_avl)
> +       || calculate_ratio (get_sew (use_insn->rtl ()),
> +   get_vlmul (use_insn->rtl ()))
> +    != ratio
> +       || vlmax_avl_p (new_use_avl)
> +       || !tail_agnostic_p (use_insn->rtl ()))
> +     return NULL_RTX;
> +   if (!avl_use_insn)
> +     avl_use_insn = use_insn;
> + }
> +    }
> +
> +  if (use_avl && register_operand (use_avl, Pmode))
> +    {
> +      gcc_assert (avl_use_insn);
> +      // Find a definition at or neighboring INSN.
> +      resource_info resource = full_register (REGNO (use_avl));
> +      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
> +      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
> +      if (dl1.matching_set () || dl2.matching_set ())
> + return NULL_RTX;
> +      def_info *def1 = dl1.last_def_of_prev_group ();
> +      def_info *def2 = dl2.last_def_of_prev_group ();
> +      if (def1 != def2)
> + return NULL_RTX;
> +      /* FIXME: We only all AVL propation within a block which should
> + be totally enough for vectorized codes.
> +
> + TODO: We can enhance it here for intrinsic codes in the future
> + if it is necessary.  */
> +      if (def1->insn ()->bb () != insn->bb ()
> +   || def1->insn ()->compare_with (insn) >= 0)
> + return NULL_RTX;
> +    }
> +  return use_avl;
> +}
> +
> +/* If we have a preferred AVL to propagate, return the AVL.
> +   Otherwise, return NULL_RTX as we don't need have any preferred
> +   AVL.  */
> +
> +static rtx
> +get_preferred_avl (insn_info *insn)
> +{
> +  /* TODO: We only do AVL propagation for missed-LEN partial
> +     autovectorization for now.  We could add more more AVL
> +     propagation for intrinsic codes in the future.  */
> +  return get_autovectorize_preferred_avl (insn);
> +}
> +
> +/* Return the AVL TYPE operand index.  */
> +static int
> +get_avl_type_index (insn_info *insn)
> +{
> +  extract_insn_cached (insn->rtl ());
> +  /* Except rounding mode patterns, AVL TYPE operand
> +     is always the last operand.  */
> +  if (find_access (insn->uses (), VXRM_REGNUM)
> +      || find_access (insn->uses (), FRM_REGNUM))
> +    return recog_data.n_operands - 2;
> +  return recog_data.n_operands - 1;
> +}
> +
> +/* Main entry point for this pass.  */
> +unsigned int
> +pass_avlprop::execute (function *)
> +{
> +  avlprop_init ();
> +
> +  /* Go through all the instructions looking for AVL that we could propagate. */
> +
> +  insn_info *next;
> +  bool change_p = true;
> +
> +  while (change_p)
> +    {
> +      /* Iterate on each instruction until no more change need.  */
> +      change_p = false;
> +      for (insn_info *insn = crtl->ssa->first_insn (); insn; insn = next)
> + {
> +   next = insn->next_any_insn ();
> +   /* We only forward AVL to the instruction that has AVL/VL operand
> +      and can be optimized in RTL_SSA level.  */
> +   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
> +     continue;
> +
> +   rtx new_avl = get_preferred_avl (insn);
> +   if (new_avl)
> +     {
> +       gcc_assert (!vlmax_avl_p (new_avl));
> +       auto &update = avlprops->get_or_insert (insn);
> +       change_p = !rtx_equal_p (update, new_avl);
> +       update = new_avl;
> +     }
> + }
> +    }
> +
> +  if (dump_file)
> +    fprintf (dump_file, "\nNumber of successful AVL propagations: %d\n\n",
> +      (int) avlprops->elements ());
> +
> +  for (const auto iter : *avlprops)
> +    {
> +      rtx_insn *rinsn = iter.first->rtl ();
> +      if (dump_file)
> + {
> +   fprintf (dump_file, "\nPropagating AVL: ");
> +   print_rtl_single (dump_file, iter.second);
> +   fprintf (dump_file, "into: ");
> +   print_rtl_single (dump_file, rinsn);
> + }
> +      /* Replace AVL operand.  */
> +      rtx new_pat
> + = simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first, false),
> + iter.second);
> +      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
> +
> +      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
> +      if (vlmax_avl_type_p (rinsn))
> + validate_change_or_fail (
> +   rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)],
> +   get_avl_type_rtx (avl_type::NONVLMAX), false);
> +      if (dump_file)
> + {
> +   fprintf (dump_file, "Successfully to match this instruction: ");
> +   print_rtl_single (dump_file, rinsn);
> + }
> +    }
> +
> +  avlprop_done ();
> +  return 0;
> +}
> +
> +rtl_opt_pass *
> +make_pass_avlprop (gcc::context *ctxt)
> +{
> +  return new pass_avlprop (ctxt);
> +}
> diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
> index 4084122cf0a..b6260939d5c 100644
> --- a/gcc/config/riscv/riscv-passes.def
> +++ b/gcc/config/riscv/riscv-passes.def
> @@ -18,4 +18,5 @@
>      <http://www.gnu.org/licenses/>.  */
> INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
> +INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
> INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 6cb9d459ee9..2b09ec9ea9e 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const char *, struct gcc_options *, locatio
> extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
> rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
> +rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
> rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
> /* Routines implemented in riscv-string.c.  */
> @@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
> bool cmp_lmul_gt_one (machine_mode);
> bool gather_scatter_valid_offset_mode_p (machine_mode);
> bool vls_mode_valid_p (machine_mode);
> +bool has_vtype_op (rtx_insn *);
> +bool has_vl_op (rtx_insn *);
> +bool tail_agnostic_p (rtx_insn *);
> +void validate_change_or_fail (rtx, rtx *, rtx, bool);
> +bool vlmax_avl_type_p (rtx_insn *);
> +bool vlmax_avl_p (rtx);
> +uint8_t get_sew (rtx_insn *);
> +enum vlmul_type get_vlmul (rtx_insn *);
> +bool const_vlmax_p (machine_mode);
> }
> /* We classify builtin types into two classes:
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index e39a9507803..473622ac321 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -56,7 +56,7 @@ using namespace riscv_vector;
> namespace riscv_vector {
> /* Return true if vlmax is constant value and can be used in vsetivl.  */
> -static bool
> +bool
> const_vlmax_p (machine_mode mode)
> {
>     poly_uint64 nuints = GET_MODE_NUNITS (mode);
> @@ -298,14 +298,6 @@ public:
>        len = force_reg (Pmode, len);
>      vls_p = true;
>    }
> - else if (const_vlmax_p (vtype_mode))
> -   {
> -     /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
> -        the vsetvli to obtain the value of vlmax.  */
> -     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
> -     len = gen_int_mode (nunits, Pmode);
> -     vls_p = true;
> -   }
> else if (can_create_pseudo_p ())
>    {
>      len = gen_reg_rtx (Pmode);
> @@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
>     emit_move_insn (dst, x4);
> }
> +/* Return true if it is an RVV instruction depends on VTYPE global
> +   status register.  */
> +bool
> +has_vtype_op (rtx_insn *rinsn)
> +{
> +  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
> +}
> +
> +/* Return true if it is an RVV instruction depends on VL global
> +   status register.  */
> +bool
> +has_vl_op (rtx_insn *rinsn)
> +{
> +  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
> +}
> +
> +/* Get default tail policy.  */
> +static bool
> +get_default_ta ()
> +{
> +  /* For the instruction that doesn't require TA, we still need a default value
> +     to emit vsetvl. We pick up the default value according to prefer policy. */
> +  return (bool) (get_prefer_tail_policy () & 0x1
> + || (get_prefer_tail_policy () >> 1 & 0x1));
> +}
> +
> +/* Helper function to get TA operand.  */
> +bool
> +tail_agnostic_p (rtx_insn *rinsn)
> +{
> +  /* If it doesn't have TA, we return agnostic by default.  */
> +  extract_insn_cached (rinsn);
> +  int ta = get_attr_ta (rinsn);
> +  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
> +}
> +
> +/* Change insn and Assert the change always happens.  */
> +void
> +validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
> +{
> +  bool change_p = validate_change (object, loc, new_rtx, in_group);
> +  gcc_assert (change_p);
> +}
> +
> +/* Return true if it is VLMAX AVL TYPE.  */
> +bool
> +vlmax_avl_type_p (rtx_insn *rinsn)
> +{
> +  return get_attr_avl_type (rinsn) == VLMAX;
> +}
> +
> +/* Return true if RTX is RVV VLMAX AVL.  */
> +bool
> +vlmax_avl_p (rtx x)
> +{
> +  return x && rtx_equal_p (x, RVV_VLMAX);
> +}
> +
> +/* Helper function to get SEW operand. We always have SEW value for
> +   all RVV instructions that have VTYPE OP.  */
> +uint8_t
> +get_sew (rtx_insn *rinsn)
> +{
> +  return get_attr_sew (rinsn);
> +}
> +
> +/* Helper function to get VLMUL operand. We always have VLMUL value for
> +   all RVV instructions that have VTYPE OP. */
> +enum vlmul_type
> +get_vlmul (rtx_insn *rinsn)
> +{
> +  return (enum vlmul_type) get_attr_vlmul (rinsn);
> +}
> +
> } // namespace riscv_vector
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index e9dd669de98..f2f19e423bf 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
>     return agnostic_p ? "agnostic" : "undisturbed";
> }
> -static bool
> -vlmax_avl_p (rtx x)
> -{
> -  return x && rtx_equal_p (x, RVV_VLMAX);
> -}
> -
> -/* Return true if it is an RVV instruction depends on VTYPE global
> -   status register.  */
> -static bool
> -has_vtype_op (rtx_insn *rinsn)
> -{
> -  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
> -}
> -
> -/* Return true if it is an RVV instruction depends on VL global
> -   status register.  */
> -static bool
> -has_vl_op (rtx_insn *rinsn)
> -{
> -  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
> -}
> -
> /* Return true if the instruction ignores VLMUL field of VTYPE.  */
> static bool
> ignore_vlmul_insn_p (rtx_insn *rinsn)
> @@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
>     if (!has_vl_op (rinsn))
>       return NULL_RTX;
> -  if (get_attr_avl_type (rinsn) == VLMAX)
> -    return RVV_VLMAX;
> -  extract_insn_cached (rinsn);
> -  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
> -}
> -/* Helper function to get SEW operand. We always have SEW value for
> -   all RVV instructions that have VTYPE OP.  */
> -static uint8_t
> -get_sew (rtx_insn *rinsn)
> -{
> -  return get_attr_sew (rinsn);
> -}
> -
> -/* Helper function to get VLMUL operand. We always have VLMUL value for
> -   all RVV instructions that have VTYPE OP. */
> -static enum vlmul_type
> -get_vlmul (rtx_insn *rinsn)
> -{
> -  return (enum vlmul_type) get_attr_vlmul (rinsn);
> -}
> +  extract_insn_cached (rinsn);
> +  if (vlmax_avl_type_p (rinsn))
> +    {
> +      if (BYTES_PER_RISCV_VECTOR.is_constant ())
> + {
> +   for (int i = 0; i < recog_data.n_operands; i++)
> +     if (GET_MODE_CLASS (recog_data.operand_mode[i]) == MODE_VECTOR_BOOL
> + && const_vlmax_p (recog_data.operand_mode[i]))
> +       return gen_int_mode (GET_MODE_NUNITS (recog_data.operand_mode[i]),
> +    Pmode);
> + }
> +      return RVV_VLMAX;
> +    }
> -/* Get default tail policy.  */
> -static bool
> -get_default_ta ()
> -{
> -  /* For the instruction that doesn't require TA, we still need a default value
> -     to emit vsetvl. We pick up the default value according to prefer policy. */
> -  return (bool) (get_prefer_tail_policy () & 0x1
> - || (get_prefer_tail_policy () >> 1 & 0x1));
> +  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
> }
> /* Get default mask policy.  */
> @@ -407,16 +371,6 @@ get_default_ma ()
> || (get_prefer_mask_policy () >> 1 & 0x1));
> }
> -/* Helper function to get TA operand.  */
> -static bool
> -tail_agnostic_p (rtx_insn *rinsn)
> -{
> -  /* If it doesn't have TA, we return agnostic by default.  */
> -  extract_insn_cached (rinsn);
> -  int ta = get_attr_ta (rinsn);
> -  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
> -}
> -
> /* Helper function to get MA operand.  */
> static bool
> mask_agnostic_p (rtx_insn *rinsn)
> @@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
>     return true;
> }
> -/* Change insn and Assert the change always happens.  */
> -static void
> -validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
> -{
> -  bool change_p = validate_change (object, loc, new_rtx, in_group);
> -  gcc_assert (change_p);
> -}
> -
> /* This flags indicates the minimum demand of the vl and vtype values by the
>      RVV instruction. For example, DEMAND_RATIO_P indicates that this RVV
>      instruction only needs the SEW/LMUL ratio to remain the same, and does not
> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> index dd17056fe82..08de62853a6 100644
> --- a/gcc/config/riscv/t-riscv
> +++ b/gcc/config/riscv/t-riscv
> @@ -69,6 +69,12 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
> $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> $(srcdir)/config/riscv/riscv-vsetvl.cc
> +riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
> +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
> +  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h
> + $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> + $(srcdir)/config/riscv/riscv-avlprop.cc
> +
> riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
>     $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
>     $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index ef91950178f..0c59d1b90bc 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -809,7 +809,7 @@
>    V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
>    V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
>    V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
> -    (symbol_ref "riscv_vector::NONVLMAX")
> +    (symbol_ref "riscv_vector::VLS")
> (eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
>    vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
>    vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
> index 928a507a363..5278e4aa38f 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
> @@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
>       }
> }
> -/* { dg-final { scan-assembler {e32,m4} } } */
> +/* { dg-final { scan-assembler {e16,m2} } } */
> /* { dg-final { scan-assembler-not {csrr} } } */
> /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
> /* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
> index a50265fc1ec..1db2e073846 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
> @@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict b, int n)
>       a[i] = a[i] + b[i];
> }
> -/* { dg-final { scan-assembler {e32,m8} } } */
> +/* { dg-final { scan-assembler {e16,m4} } } */
> /* { dg-final { scan-assembler-not {csrr} } } */
> /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
> /* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
> index eac7cbc757b..ca88d42cdf4 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
> @@ -7,10 +7,11 @@
> /*
> ** foo:
> ** vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
> +** ...
> ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
> ** ...
> -** vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
> -** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
> +** vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
> +** ...
> ** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
> ** ...
> */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
> index 965365da4bb..13367423751 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
> @@ -3,7 +3,6 @@
> #include "ternop-2.c"
> -/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
> /* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
> /* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized" } } */
> /* { dg-final { scan-assembler-not {\tvmv} } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
> new file mode 100644
> index 00000000000..b0d21650c3d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
> +
> +void
> +foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    a[i] = b[i] + c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
> +/* { dg-final { scan-assembler-not {vsetivli} } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
> +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
> +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
> new file mode 100644
> index 00000000000..f2d8aa54b88
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
> +
> +void
> +foo (int *__restrict a, int *__restrict b, int *__restrict c,
> +     int *__restrict a2, int *__restrict b2, int *__restrict c2,
> +     int *__restrict a3, int *__restrict b3, int *__restrict c3,
> +     int *__restrict a4, int *__restrict b4, int *__restrict c4,
> +     int *__restrict a5, int *__restrict b5, int *__restrict c5,
> +     int *__restrict d, int *__restrict d2, int *__restrict d3,
> +     int *__restrict d4, int *__restrict d5, int n, int m)
> +{
> +  for (int i = 0; i < n; i++)
> +    {
> +      a[i] = b[i] + c[i];
> +      a2[i] = b2[i] + c2[i];
> +      a3[i] = b3[i] + c3[i];
> +      a4[i] = b4[i] + c4[i];
> +      a5[i] = a[i] + a4[i];
> +      d[i] = a[i] - a2[i];
> +      d2[i] = a2[i] * a[i];
> +      d3[i] = a3[i] * a2[i];
> +      d4[i] = a2[i] * d2[i];
> +      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
> +/* { dg-final { scan-assembler-not {vsetivli} } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
> +/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
> +/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> index 674ba0d72b4..fc830f2cd4d 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> @@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
> "" $CFLAGS
> dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \
> "-O3 -ftree-vectorize" $CFLAGS
> +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/avlprop/*.\[cS\]]] \
> + "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
> dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
> "-O3 -ftree-vectorize --param riscv-autovec-preference=scalable" $CFLAGS
> dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
> --
> 2.36.3
>   

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-26  1:22             ` Patrick O'Neill
@ 2023-10-26  1:27               ` juzhe.zhong
  2023-10-26  7:33                 ` Li, Pan2
  0 siblings, 1 reply; 13+ messages in thread
From: juzhe.zhong @ 2023-10-26  1:27 UTC (permalink / raw)
  To: Patrick O'Neill, gcc-patches
  Cc: kito.cheng, Kito.cheng, jeffreyalaw, Robin Dapp

I think it's QEMU issue:

line 15: 1520161 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test

I use SPIKE works fine. This is my SPIKE configuration

spike \
    --isa=rv64gcv_zvfh_zfh \
    --misaligned \
    ${PK_PATH}/pk${xlen} "$@"



juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-26 09:22
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization

On 10/25/23 17:49, juzhe.zhong@rivai.ai wrote:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8

These 2 FAILs are bogus. Testcases need to be adapted, I notice I didn't include this in this patch.

FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

These 2 already exist on the trunk for RV32.

FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test 
This FAIL for RV64 is odd. I don't have it.  Could you share me the debug log ?
rv64gcv debug log:

Executing on host: /scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany      -lm  -o ./mask_gather_load_run-11.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany -lm -o ./mask_gather_load_run-11.exe
PASS: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./mask_gather_load_run-11.exe
mask_gather_load_run-11.exe: /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:98: main: Assertion `dest_uint16_t_uint8_t[i * 2] == dest2_uint16_t_uint8_t[i * 2]' failed.
/scratch/tc-testing/tc-avl/build-rv64gcv/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 1520161 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test

rv32gcv debug log:

Executing on host: /scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany      -lm  -o ./mask_gather_load_run-11.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany -lm -o ./mask_gather_load_run-11.exe
PASS: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./mask_gather_load_run-11.exe
mask_gather_load_run-11.exe: /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:98: main: Assertion `dest_uint16_t_uint8_t[i * 2] == dest2_uint16_t_uint8_t[i * 2]' failed.
/scratch/tc-testing/tc-avl/build-rv32gcv/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 2593314 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test

Patrick
juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-26 08:37
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
Hi Juzhe,

I tested on glibc rv32/64gcv qemu.
Applied patch to/comparing with 668c4c3783970e7adf0591396b6d0d5286cc0541.

V2 results look much better! I don't see any new fortran failures but I am seeing new gcc failures:

rv64gcv:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test

rv32gcv:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

The popcount-run-1.c test doesn't show up for me on 668c4c3783970e7adf0591396b6d0d5286cc0541 rv32gcv or rv64gcv.
After applying your patch it only shows up on rv32gcv (rv64gcv still does not have the failure). This might be due to a difference in our testing setups.

Thanks,
Patrick

On 10/25/23 05:20, juzhe.zhong@rivai.ai wrote:
Hi, Patrick.

I have fixed on V2 patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634267.html

I have tested on RV32/RV64 C/C++, no regression. But I am not able to test on Fortran.

The failures you showed have been fixed. Except this one:
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
This FAIL is not because of this patch since I confirmed it already existed without this patch.
We will fix that on stage 3.

Could you verify with Fortran test ? 

Thanks.

juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-24 23:03
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
I'm seeing a variety of new failures, constrained to rv32gcv:

Tested using newlib/linux:
rv32gcv/ ilp32d/ medlow
rv64gcv/  lp64d/ medlow
rv64gcv_zvbb_zvbc_zvkg_zvkn_zvknc_zvkned_zvkng_zvknha_zvknhb_zvks_zvksc_zvksed_zvksg_zvksh_zvkt/  lp64d/ medlow
rv64imafdcv_zicond_zawrs_zbc_zvkng_zvksg_zvbb_zvbc_zicsr_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt/  lp64d/ medlow

Newlib failures:
rv32gcv:
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test

Debug log for testcases that aren't pr110557.c look like this:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-lmul=m4      -lm  -o ./popcount-run-1.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o ./popcount-run-1.exe
PASS: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c (test for excess errors)
spawn riscv64-unknown-elf-run ./popcount-run-1.exe
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
Debug log for pr110557.c:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs  -lm  -o ./pr110557.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++14 (test for excess errors)
spawn riscv64-unknown-elf-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-newlib/../scripts/wrapper/qemu/riscv64-unknown-elf-run: line 15: 3449805 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
Linux failures:
rv32gcv:
FAIL: gcc.dg/nextafter-2.c execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
FAIL: gfortran.dg/default_format_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_2.f90   -Os  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -Os  execution test
FAIL: gfortran.dg/large_real_kind_2.F90   -O0  execution test
FAIL: gfortran.dg/round_4.f90   -O0  execution test
FAIL: gfortran.dg/zero_sized_3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test
FAIL: gfortran.dg/ieee/large_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O1  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O2  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -Os  execution test
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_sum.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Some (not all) debug log outputs:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions        -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
PASS: gfortran.fortran-torture/execute/intrinsic_count.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
STOP 2
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions -funroll-loops       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -funroll-loops -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
STOP 3
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output    -O0   -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o ./large_2.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o ./large_2.exe
PASS: gfortran.dg/ieee/large_2.f90   -O0  (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./large_2.exe
  0.333333333333333333333333333333333317         2.24271998593667819112500193394291495E+1644
STOP 1
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm  -o ./pr110557.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++98 (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 323485 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-reduc-dot-21.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe
PASS: gcc.dg/vect/vect-reduc-dot-21.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-reduc-dot-21.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3484803 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-alias-check-16.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-alias-check-16.exe
PASS: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-alias-check-16.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3431975 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "flags: *RAW\\n"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "using an address-based overlap test"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump-not vect "using an index-based"
I've observed nextafter-2.c being flaky on the CI so that particular failure might not be real.

If you want any particular testcase's debug logs please let me know.

Patrick

On 10/23/23 21:30, Patrick O'Neill wrote:
The CI just picked it up: https://github.com/ewlu/gcc-precommit-ci/issues/449#issue-1958483272
Since it doesn't apply to the CI's baseline hash it's only performing a build.
I'll re-run it in the morning once the baseline has been updated.

In the meantime I started a full build+test run on my local machine.
I'll send you the results in ~10 hours - morning my time :-)

Patrick
On 10/23/23 20:44, juzhe.zhong@rivai.ai wrote:
CCing Patrick...

Hi, @Patrick.
Could you apply this patch and trigger your regression CI?

I don't have an environment to test fortran for now (I only test it on C/C++).

Thanks. 

juzhe.zhong@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-24 11:32
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization
which is a known issue for a long time and I finally find the time to address it.
 
Consider a simple vector addition operation:
 
https://godbolt.org/z/7hfGfEjW3
 
void
foo (int *__restrict a,
     int *__restrict b,
     int *__restrict n)
{
  for (int i = 0; i < n; i++)
      a[i] = a[i] + b[i];
}
 
Optimized IR:
 
Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);    -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);   -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                              -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
 
We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment:
 
vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
 
GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization.
Such flow are used by all other targets like ARM SVE (RVV also uses such flow):
 
ARM SVE:
  
.L3:
        ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
        add     z31.s, z31.s, z30.s            -> un-predicated add
        st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store
 
Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it.
 
Also, It's very unlikely that we can apply predicated operations on all vectorization for following reasons:
 
1. It's very heavy workload to support them on all vectorization and we don't see any benefits if we can handle that on targets backend.
2. Changing Loop vectorizer for it will make code base ugly and hard to maintain.
3. We will need so many patterns for all operations. Not only COND_LEN_ADD, COND_LEN_SUB, ....
   We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over 100+ patterns, unreasonable number of patterns.
 
To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS for it to elide the redundant vsetvls
due to AVL/VL toggling.
 
The second question is that why we separate a PASS called AVL propagation. Why not optimize it in VSETVL PASS (We definitetly can optimize AVL in VSETVL PASS)
 
Frankly, I was planning to address such issue in VSETVL PASS that's why we recently refactored VSETVL PASS. However, I changed my mind recently after several
experiments and tries.
 
The reasons as follows:
 
1. For code base management and maintainience. Current VSETVL PASS is complicated enough and aleady has enough aggressive and fancy optimizations which
   turns out it can always generate optimal codegen in most of the cases. It's not a good idea keep adding more features into VSETVL PASS to make VSETVL
PASS become heavy and heavy again, then we will need to refactor it again in the future.
Actuall, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change VSETVL PASS any more except the minor
fixes.
 
2. vsetvl insertion (VSETVL PASS does this thing) and AVL propagation are 2 different things,  I don't think we should fuse them into same PASS.
 
3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be done before RA which can reduce register allocation.
 
4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
   This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements.
We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate
VSETVL PASS again which is already so complicated.)
 
Here is an example to demonstrate more:
 
https://godbolt.org/z/bE86sv3q5
 
void foo2 (int *__restrict a,
          int *__restrict b,
          int *__restrict c,
          int *__restrict a2,
          int *__restrict b2,
          int *__restrict c2,
          int *__restrict a3,
          int *__restrict b3,
          int *__restrict c3,
          int *__restrict a4,
          int *__restrict b4,
          int *__restrict c4,
          int *__restrict a5,
          int *__restrict b5,
          int *__restrict c5,
          int n)
{
    for (int i = 0; i < n; i++){
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i]+ a[i];
 
      a[i] = a[i] + c[i];
      b5[i] = a[i] + c[i];
      a2[i] = a[i] + c2[i];
      a3[i] = a[i] + c3[i];
      a4[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i]+ a[i];
    }
}
 
1. Loop Body:
 
Before this patch:                                          After this patch:
 
      vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli a4,t1,e32,m1,ta,ma                                    
        vle32.v v2,0(a2)                                     vle32.v v2,0(a2)
        vle32.v v4,0(a1)                                     vle32.v v3,0(t2)
        vle32.v v1,0(t2)                                     vle32.v v4,0(a1)
        vsetvli a7,zero,e32,m1,ta,ma                         vle32.v v1,0(t0)
        vadd.vv v4,v2,v4                                     vadd.vv v4,v2,v4
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v1,v3,v1
        vle32.v v3,0(s0)                                     vadd.vv v1,v1,v4
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v1,v1,v4
        vadd.vv v1,v3,v1                                     vadd.vv v1,v1,v4
        vadd.vv v1,v1,v4                                     vadd.vv v1,v1,v2
        vadd.vv v1,v1,v4                                     vadd.vv v2,v1,v2
        vadd.vv v1,v1,v4                                     vse32.v v2,0(t5)
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v2,v2,v1
        vle32.v v4,0(a5)                                     vadd.vv v2,v2,v1
        vsetvli a7,zero,e32,m1,ta,ma                         slli a7,a4,2
        vadd.vv v1,v1,v2                                     vadd.vv v3,v1,v3
        vadd.vv v2,v1,v2                                     vle32.v v5,0(a5)
        vadd.vv v4,v1,v4                                     vle32.v v6,0(t6)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v3,0(t3)
        vse32.v v2,0(t5)                                     vse32.v v2,0(a0)
        vse32.v v4,0(a3)                                     vadd.vv v3,v3,v1
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v2,v1,v5
        vadd.vv v3,v1,v3                                     vse32.v v3,0(t4)
        vadd.vv v2,v2,v1                                     vadd.vv v1,v1,v6
        vadd.vv v2,v2,v1                                     vse32.v v2,0(a3)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v1,0(a6)
        vse32.v v2,0(a0)                                     
        vse32.v v3,0(t3)                                     
        vle32.v v2,0(t0)                                     
        vsetvli a7,zero,e32,m1,ta,ma                                     
        vadd.vv v3,v3,v1                                     
        vsetvli zero,a4,e32,m1,ta,ma                                     
        vse32.v v3,0(t4)                                     
        vsetvli a7,zero,e32,m1,ta,ma                                     
        slli    a7,a4,2                                     
        vadd.vv v1,v1,v2                                     
        sub     t1,t1,a4                                     
        vsetvli zero,a4,e32,m1,ta,ma                                     
        vse32.v v1,0(a6)                                     
 
It's quite obvious, all heavy && redundant vsetvls inside loop body are eliminated.
 
2. Epilogue:
    Before this patch:                                          After this patch:
 
     .L5:                                                      .L5:                                          
        ld      s0,8(sp)                                         ret
        addi    sp,sp,16                                        
        jr      ra                                        
 
This is the benefit we do the AVL propation before RA since we eliminate the use of 'a7' register
which is used by the redudant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'
 
The final codegen after this patch:
 
foo2:
lw t1,56(sp)
ld t6,0(sp)
ld t3,8(sp)
ld t0,16(sp)
ld t2,24(sp)
ld t4,32(sp)
ld t5,40(sp)
ble t1,zero,.L5
.L3:
vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2)
vle32.v v3,0(t2)
vle32.v v4,0(a1)
vle32.v v1,0(t0)
vadd.vv v4,v2,v4
vadd.vv v1,v3,v1
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v2
vadd.vv v2,v1,v2
vse32.v v2,0(t5)
vadd.vv v2,v2,v1
vadd.vv v2,v2,v1
slli a7,a4,2
vadd.vv v3,v1,v3
vle32.v v5,0(a5)
vle32.v v6,0(t6)
vse32.v v3,0(t3)
vse32.v v2,0(a0)
vadd.vv v3,v3,v1
vadd.vv v2,v1,v5
vse32.v v3,0(t4)
vadd.vv v1,v1,v6
vse32.v v2,0(a3)
vse32.v v1,0(a6)
sub t1,t1,a4
add a1,a1,a7
add a2,a2,a7
add a5,a5,a7
add t6,t6,a7
add t0,t0,a7
add t2,t2,a7
add t5,t5,a7
add a3,a3,a7
add a6,a6,a7
add t3,t3,a7
add t4,t4,a7
add a0,a0,a7
bne t1,zero,.L3
.L5:
ret
 
PR target/111888
 
gcc/ChangeLog:
 
* config.gcc: Add AVL propgatation PASS.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
(has_vtype_op): Export as global.
(has_vl_op): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(const_vlmax_p): Ditto.
* config/riscv/riscv-v.cc (has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(get_vlmul): Ditto.
* config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
(has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_sew): Ditto.
(get_vlmul): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
* config/riscv/t-riscv: Add AVL propagation PASS.
* config/riscv/vector.md: Fix VLS modes attribute.
* config/riscv/riscv-avlprop.cc: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
* gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
* gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.
 
---
gcc/config.gcc                                |   2 +-
gcc/config/riscv/riscv-avlprop.cc             | 350 ++++++++++++++++++
gcc/config/riscv/riscv-passes.def             |   1 +
gcc/config/riscv/riscv-protos.h               |  10 +
gcc/config/riscv/riscv-v.cc                   |  84 ++++-
gcc/config/riscv/riscv-vsetvl.cc              |  82 +---
gcc/config/riscv/t-riscv                      |   6 +
gcc/config/riscv/vector.md                    |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +-
.../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +-
.../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 -
.../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
.../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
15 files changed, 514 insertions(+), 84 deletions(-)
create mode 100644 gcc/config/riscv/riscv-avlprop.cc
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 606d3a8513e..efd53965c9a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -544,7 +544,7 @@ pru-*-*)
riscv*)
cpu_type=riscv
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
- extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
+ extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o riscv-avlprop.o"
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o"
d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-avlprop.cc b/gcc/config/riscv/riscv-avlprop.cc
new file mode 100644
index 00000000000..bf3becd8371
--- /dev/null
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -0,0 +1,350 @@
+/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2023-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI Technologies Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or(at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions.
+   A standalone AVL propagation pass is designed because:
+
+     - Better code maintain:
+       Current LCM-based VSETVL pass is so complicated that codes
+       there will become even harder to maintain. A straight forward
+       AVL propagation PASS is much easier to maintain.
+
+     - Reduce scalar register pressure:
+       A type of AVL propagation is we propagate AVL from NON-VLMAX
+       instruction to VLMAX instruction.
+       Note: VLMAX instruction should be ignore tail elements (TA)
+       and the result should be used by the NON-VLMAX instruction.
+       This optimization is mostly for auto-vectorization codes:
+
+   vsetvli r136, r137      --- SELECT_VL
+   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
+   vadd.vv (use VLMAX)     --- PLUS_EXPR
+   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
+
+ NO AVL propation:
+
+   vsetvli a5, a4, ta
+   vle8.v v1
+   vsetvli t0, zero, ta
+   vadd.vv v2, v1, v1
+   vse8.v v2
+
+ We can propagate the AVL to 'vadd.vv' since its result
+ is consumed by a 'vse8.v' which has AVL = a5 and its
+ tail elements are agnostic.
+
+       We DON'T do this optimization on VSETVL pass since it is a
+       post-RA pass that consumed 't0' already wheras a standalone
+       pre-RA AVL propagation pass allows us elide the consumption
+       of the pseudo register of 't0' then we can reduce scalar
+       register pressure.
+
+     - More AVL propagation opportunities:
+       A pre-RA pass is more flexible for AVL REG def-use chain,
+       thus we will get more potential AVL propagation as long as
+       it doesn't increase the scalar register pressure.
+*/
+
+#define IN_TARGET_CODE 1
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "target.h"
+#include "tree-pass.h"
+#include "df.h"
+#include "rtl-ssa.h"
+#include "cfgcleanup.h"
+#include "insn-attr.h"
+
+using namespace rtl_ssa;
+using namespace riscv_vector;
+
+/* The AVL propagation instructions and corresponding preferred AVL.
+   It will be updated during the analysis.  */
+static hash_map<insn_info *, rtx> *avlprops;
+
+const pass_data pass_data_avlprop = {
+  RTL_PASS, /* type */
+  "avlprop", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_avlprop : public rtl_opt_pass
+{
+public:
+  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) final override
+  {
+    return TARGET_VECTOR && optimize > 0;
+  }
+  virtual unsigned int execute (function *) final override;
+}; // class pass_avlprop
+
+static void
+avlprop_init (void)
+{
+  calculate_dominance_info (CDI_DOMINATORS);
+  df_analyze ();
+  crtl->ssa = new function_info (cfun);
+  avlprops = new hash_map<insn_info *, rtx>;
+}
+
+static void
+avlprop_done (void)
+{
+  free_dominance_info (CDI_DOMINATORS);
+  if (crtl->ssa->perform_pending_updates ())
+    cleanup_cfg (0);
+  delete crtl->ssa;
+  crtl->ssa = nullptr;
+  delete avlprops;
+  avlprops = NULL;
+}
+
+/* Helper function to get AVL operand.  */
+static rtx
+get_avl (insn_info *insn, bool avlprop_p)
+{
+  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
+      || get_attr_avl_type (insn->rtl ()) == VLS)
+    return NULL_RTX;
+  if (avlprop_p)
+    {
+      if (avlprops->get (insn))
+ return (*avlprops->get (insn));
+      else if (vlmax_avl_type_p (insn->rtl ()))
+ return RVV_VLMAX;
+    }
+  extract_insn_cached (insn->rtl ());
+  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
+}
+
+/* This is a straight forward pattern ALWAYS in paritial auto-vectorization:
+
+     VL = SELECT_AVL (AVL, ...)
+     V0 = MASK_LEN_LOAD (..., VL)
+     V1 = MASK_LEN_LOAD (..., VL)
+     V2 = V0 + V1 --- Missed LEN information.
+     MASK_LEN_STORE (..., V2, VL)
+
+   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1, dummy LEN)
+   because:
+
+     - Few code changes in Loop Vectorizer.
+     - Reuse the current clean flow of partial vectorization, That is, apply
+       predicate LEN or MASK into LOAD/STORE operations and other special
+       arithmetic operations (e.d. DIV), then do the whole vector register
+       operation if it DON'T affect the correctness.
+       Such flow is used by all other targets like x86, sve, s390, ... etc.
+     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
+
+   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like PLUS_EXPR which
+   generates the VLMAX instruction due to missed LEN information. The later
+   VSETVL PASS will elided the redundant vsetvls.
+*/
+
+static rtx
+get_autovectorize_preferred_avl (insn_info *insn)
+{
+  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p (insn->rtl ()))
+    return NULL_RTX;
+
+  rtx use_avl = NULL_RTX;
+  insn_info *avl_use_insn = nullptr;
+  unsigned int ratio
+    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul (insn->rtl ()));
+  for (def_info *def : insn->defs ())
+    {
+      auto set = safe_dyn_cast<set_info *> (def);
+      if (!set || !set->is_reg ())
+ return NULL_RTX;
+      for (use_info *use : set->all_uses ())
+ {
+   if (!use->is_in_nondebug_insn ())
+     return NULL_RTX;
+   insn_info *use_insn = use->insn ();
+   /* FIXME: Stop AVL propagation if any USE is not a RVV real
+      instruction. It should be totally enough for vectorized codes since
+      they always locate at extended blocks.
+
+      TODO: We can extend PHI checking for intrinsic codes if it
+      necessary in the future.  */
+   if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!has_vl_op (use_insn->rtl ()))
+     continue;
+
+   rtx new_use_avl = get_avl (use_insn, true);
+   if (!new_use_avl)
+     return NULL_RTX;
+   if (!use_avl)
+     use_avl = new_use_avl;
+   if (!rtx_equal_p (use_avl, new_use_avl)
+       || calculate_ratio (get_sew (use_insn->rtl ()),
+   get_vlmul (use_insn->rtl ()))
+    != ratio
+       || vlmax_avl_p (new_use_avl)
+       || !tail_agnostic_p (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!avl_use_insn)
+     avl_use_insn = use_insn;
+ }
+    }
+
+  if (use_avl && register_operand (use_avl, Pmode))
+    {
+      gcc_assert (avl_use_insn);
+      // Find a definition at or neighboring INSN.
+      resource_info resource = full_register (REGNO (use_avl));
+      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
+      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
+      if (dl1.matching_set () || dl2.matching_set ())
+ return NULL_RTX;
+      def_info *def1 = dl1.last_def_of_prev_group ();
+      def_info *def2 = dl2.last_def_of_prev_group ();
+      if (def1 != def2)
+ return NULL_RTX;
+      /* FIXME: We only all AVL propation within a block which should
+ be totally enough for vectorized codes.
+
+ TODO: We can enhance it here for intrinsic codes in the future
+ if it is necessary.  */
+      if (def1->insn ()->bb () != insn->bb ()
+   || def1->insn ()->compare_with (insn) >= 0)
+ return NULL_RTX;
+    }
+  return use_avl;
+}
+
+/* If we have a preferred AVL to propagate, return the AVL.
+   Otherwise, return NULL_RTX as we don't need have any preferred
+   AVL.  */
+
+static rtx
+get_preferred_avl (insn_info *insn)
+{
+  /* TODO: We only do AVL propagation for missed-LEN partial
+     autovectorization for now.  We could add more more AVL
+     propagation for intrinsic codes in the future.  */
+  return get_autovectorize_preferred_avl (insn);
+}
+
+/* Return the AVL TYPE operand index.  */
+static int
+get_avl_type_index (insn_info *insn)
+{
+  extract_insn_cached (insn->rtl ());
+  /* Except rounding mode patterns, AVL TYPE operand
+     is always the last operand.  */
+  if (find_access (insn->uses (), VXRM_REGNUM)
+      || find_access (insn->uses (), FRM_REGNUM))
+    return recog_data.n_operands - 2;
+  return recog_data.n_operands - 1;
+}
+
+/* Main entry point for this pass.  */
+unsigned int
+pass_avlprop::execute (function *)
+{
+  avlprop_init ();
+
+  /* Go through all the instructions looking for AVL that we could propagate. */
+
+  insn_info *next;
+  bool change_p = true;
+
+  while (change_p)
+    {
+      /* Iterate on each instruction until no more change need.  */
+      change_p = false;
+      for (insn_info *insn = crtl->ssa->first_insn (); insn; insn = next)
+ {
+   next = insn->next_any_insn ();
+   /* We only forward AVL to the instruction that has AVL/VL operand
+      and can be optimized in RTL_SSA level.  */
+   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
+     continue;
+
+   rtx new_avl = get_preferred_avl (insn);
+   if (new_avl)
+     {
+       gcc_assert (!vlmax_avl_p (new_avl));
+       auto &update = avlprops->get_or_insert (insn);
+       change_p = !rtx_equal_p (update, new_avl);
+       update = new_avl;
+     }
+ }
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "\nNumber of successful AVL propagations: %d\n\n",
+      (int) avlprops->elements ());
+
+  for (const auto iter : *avlprops)
+    {
+      rtx_insn *rinsn = iter.first->rtl ();
+      if (dump_file)
+ {
+   fprintf (dump_file, "\nPropagating AVL: ");
+   print_rtl_single (dump_file, iter.second);
+   fprintf (dump_file, "into: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+      /* Replace AVL operand.  */
+      rtx new_pat
+ = simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first, false),
+ iter.second);
+      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
+
+      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
+      if (vlmax_avl_type_p (rinsn))
+ validate_change_or_fail (
+   rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)],
+   get_avl_type_rtx (avl_type::NONVLMAX), false);
+      if (dump_file)
+ {
+   fprintf (dump_file, "Successfully to match this instruction: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+    }
+
+  avlprop_done ();
+  return 0;
+}
+
+rtl_opt_pass *
+make_pass_avlprop (gcc::context *ctxt)
+{
+  return new pass_avlprop (ctxt);
+}
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
index 4084122cf0a..b6260939d5c 100644
--- a/gcc/config/riscv/riscv-passes.def
+++ b/gcc/config/riscv/riscv-passes.def
@@ -18,4 +18,5 @@
    <http://www.gnu.org/licenses/>.  */
INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
+INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6cb9d459ee9..2b09ec9ea9e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const char *, struct gcc_options *, locatio
extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
/* Routines implemented in riscv-string.c.  */
@@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
bool cmp_lmul_gt_one (machine_mode);
bool gather_scatter_valid_offset_mode_p (machine_mode);
bool vls_mode_valid_p (machine_mode);
+bool has_vtype_op (rtx_insn *);
+bool has_vl_op (rtx_insn *);
+bool tail_agnostic_p (rtx_insn *);
+void validate_change_or_fail (rtx, rtx *, rtx, bool);
+bool vlmax_avl_type_p (rtx_insn *);
+bool vlmax_avl_p (rtx);
+uint8_t get_sew (rtx_insn *);
+enum vlmul_type get_vlmul (rtx_insn *);
+bool const_vlmax_p (machine_mode);
}
/* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e39a9507803..473622ac321 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -56,7 +56,7 @@ using namespace riscv_vector;
namespace riscv_vector {
/* Return true if vlmax is constant value and can be used in vsetivl.  */
-static bool
+bool
const_vlmax_p (machine_mode mode)
{
   poly_uint64 nuints = GET_MODE_NUNITS (mode);
@@ -298,14 +298,6 @@ public:
      len = force_reg (Pmode, len);
    vls_p = true;
  }
- else if (const_vlmax_p (vtype_mode))
-   {
-     /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
-        the vsetvli to obtain the value of vlmax.  */
-     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
-     len = gen_int_mode (nunits, Pmode);
-     vls_p = true;
-   }
else if (can_create_pseudo_p ())
  {
    len = gen_reg_rtx (Pmode);
@@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
   emit_move_insn (dst, x4);
}
+/* Return true if it is an RVV instruction depends on VTYPE global
+   status register.  */
+bool
+has_vtype_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
+}
+
+/* Return true if it is an RVV instruction depends on VL global
+   status register.  */
+bool
+has_vl_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
+}
+
+/* Get default tail policy.  */
+static bool
+get_default_ta ()
+{
+  /* For the instruction that doesn't require TA, we still need a default value
+     to emit vsetvl. We pick up the default value according to prefer policy. */
+  return (bool) (get_prefer_tail_policy () & 0x1
+ || (get_prefer_tail_policy () >> 1 & 0x1));
+}
+
+/* Helper function to get TA operand.  */
+bool
+tail_agnostic_p (rtx_insn *rinsn)
+{
+  /* If it doesn't have TA, we return agnostic by default.  */
+  extract_insn_cached (rinsn);
+  int ta = get_attr_ta (rinsn);
+  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
+}
+
+/* Change insn and Assert the change always happens.  */
+void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
+
+/* Return true if it is VLMAX AVL TYPE.  */
+bool
+vlmax_avl_type_p (rtx_insn *rinsn)
+{
+  return get_attr_avl_type (rinsn) == VLMAX;
+}
+
+/* Return true if RTX is RVV VLMAX AVL.  */
+bool
+vlmax_avl_p (rtx x)
+{
+  return x && rtx_equal_p (x, RVV_VLMAX);
+}
+
+/* Helper function to get SEW operand. We always have SEW value for
+   all RVV instructions that have VTYPE OP.  */
+uint8_t
+get_sew (rtx_insn *rinsn)
+{
+  return get_attr_sew (rinsn);
+}
+
+/* Helper function to get VLMUL operand. We always have VLMUL value for
+   all RVV instructions that have VTYPE OP. */
+enum vlmul_type
+get_vlmul (rtx_insn *rinsn)
+{
+  return (enum vlmul_type) get_attr_vlmul (rinsn);
+}
+
} // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index e9dd669de98..f2f19e423bf 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
   return agnostic_p ? "agnostic" : "undisturbed";
}
-static bool
-vlmax_avl_p (rtx x)
-{
-  return x && rtx_equal_p (x, RVV_VLMAX);
-}
-
-/* Return true if it is an RVV instruction depends on VTYPE global
-   status register.  */
-static bool
-has_vtype_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
-}
-
-/* Return true if it is an RVV instruction depends on VL global
-   status register.  */
-static bool
-has_vl_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
-}
-
/* Return true if the instruction ignores VLMUL field of VTYPE.  */
static bool
ignore_vlmul_insn_p (rtx_insn *rinsn)
@@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
   if (!has_vl_op (rinsn))
     return NULL_RTX;
-  if (get_attr_avl_type (rinsn) == VLMAX)
-    return RVV_VLMAX;
-  extract_insn_cached (rinsn);
-  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
-}
-/* Helper function to get SEW operand. We always have SEW value for
-   all RVV instructions that have VTYPE OP.  */
-static uint8_t
-get_sew (rtx_insn *rinsn)
-{
-  return get_attr_sew (rinsn);
-}
-
-/* Helper function to get VLMUL operand. We always have VLMUL value for
-   all RVV instructions that have VTYPE OP. */
-static enum vlmul_type
-get_vlmul (rtx_insn *rinsn)
-{
-  return (enum vlmul_type) get_attr_vlmul (rinsn);
-}
+  extract_insn_cached (rinsn);
+  if (vlmax_avl_type_p (rinsn))
+    {
+      if (BYTES_PER_RISCV_VECTOR.is_constant ())
+ {
+   for (int i = 0; i < recog_data.n_operands; i++)
+     if (GET_MODE_CLASS (recog_data.operand_mode[i]) == MODE_VECTOR_BOOL
+ && const_vlmax_p (recog_data.operand_mode[i]))
+       return gen_int_mode (GET_MODE_NUNITS (recog_data.operand_mode[i]),
+    Pmode);
+ }
+      return RVV_VLMAX;
+    }
-/* Get default tail policy.  */
-static bool
-get_default_ta ()
-{
-  /* For the instruction that doesn't require TA, we still need a default value
-     to emit vsetvl. We pick up the default value according to prefer policy. */
-  return (bool) (get_prefer_tail_policy () & 0x1
- || (get_prefer_tail_policy () >> 1 & 0x1));
+  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
}
/* Get default mask policy.  */
@@ -407,16 +371,6 @@ get_default_ma ()
|| (get_prefer_mask_policy () >> 1 & 0x1));
}
-/* Helper function to get TA operand.  */
-static bool
-tail_agnostic_p (rtx_insn *rinsn)
-{
-  /* If it doesn't have TA, we return agnostic by default.  */
-  extract_insn_cached (rinsn);
-  int ta = get_attr_ta (rinsn);
-  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
-}
-
/* Helper function to get MA operand.  */
static bool
mask_agnostic_p (rtx_insn *rinsn)
@@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
   return true;
}
-/* Change insn and Assert the change always happens.  */
-static void
-validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
-{
-  bool change_p = validate_change (object, loc, new_rtx, in_group);
-  gcc_assert (change_p);
-}
-
/* This flags indicates the minimum demand of the vl and vtype values by the
    RVV instruction. For example, DEMAND_RATIO_P indicates that this RVV
    instruction only needs the SEW/LMUL ratio to remain the same, and does not
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index dd17056fe82..08de62853a6 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -69,6 +69,12 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/riscv/riscv-vsetvl.cc
+riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
+  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h
+ $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+ $(srcdir)/config/riscv/riscv-avlprop.cc
+
riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
   $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ef91950178f..0c59d1b90bc 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -809,7 +809,7 @@
  V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
  V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
  V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
-    (symbol_ref "riscv_vector::NONVLMAX")
+    (symbol_ref "riscv_vector::VLS")
(eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
  vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
  vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
index 928a507a363..5278e4aa38f 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
@@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
     }
}
-/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler {e16,m2} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
index a50265fc1ec..1db2e073846 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
@@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict b, int n)
     a[i] = a[i] + b[i];
}
-/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler {e16,m4} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
index eac7cbc757b..ca88d42cdf4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
@@ -7,10 +7,11 @@
/*
** foo:
** vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
-** vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
-** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
+** vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
*/
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
index 965365da4bb..13367423751 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
@@ -3,7 +3,6 @@
#include "ternop-2.c"
-/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
/* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
/* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized" } } */
/* { dg-final { scan-assembler-not {\tvmv} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
new file mode 100644
index 00000000000..b0d21650c3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
new file mode 100644
index 00000000000..f2d8aa54b88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c,
+     int *__restrict a2, int *__restrict b2, int *__restrict c2,
+     int *__restrict a3, int *__restrict b3, int *__restrict c3,
+     int *__restrict a4, int *__restrict b4, int *__restrict c4,
+     int *__restrict a5, int *__restrict b5, int *__restrict c5,
+     int *__restrict d, int *__restrict d2, int *__restrict d3,
+     int *__restrict d4, int *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 674ba0d72b4..fc830f2cd4d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
"" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \
"-O3 -ftree-vectorize" $CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/avlprop/*.\[cS\]]] \
+ "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
"-O3 -ftree-vectorize --param riscv-autovec-preference=scalable" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
--
2.36.3
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-26  1:27               ` juzhe.zhong
@ 2023-10-26  7:33                 ` Li, Pan2
  2023-10-26  7:48                   ` juzhe.zhong
  0 siblings, 1 reply; 13+ messages in thread
From: Li, Pan2 @ 2023-10-26  7:33 UTC (permalink / raw)
  To: juzhe.zhong, Patrick O'Neill, gcc-patches
  Cc: kito.cheng, Kito.cheng, jeffreyalaw, Robin Dapp

Just apply v2 version for RV32 with spike riscv-sim for confirmation.

This patch only increased 2 popcount run failures as well as 2 dump failures, and the mask_gather_load_run-11.c is PASS within spike.

Pan

-----Original Message-----
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai> 
Sent: Thursday, October 26, 2023 9:27 AM
To: Patrick O'Neill <patrick@rivosinc.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: kito.cheng <kito.cheng@gmail.com>; Kito.cheng <kito.cheng@sifive.com>; jeffreyalaw <jeffreyalaw@gmail.com>; Robin Dapp <rdapp.gcc@gmail.com>
Subject: Re: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization

I think it's QEMU issue:

line 15: 1520161 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test

I use SPIKE works fine. This is my SPIKE configuration

spike \
    --isa=rv64gcv_zvfh_zfh \
    --misaligned \
    ${PK_PATH}/pk${xlen} "$@"



juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-26 09:22
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization

On 10/25/23 17:49, juzhe.zhong@rivai.ai wrote:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8

These 2 FAILs are bogus. Testcases need to be adapted, I notice I didn't include this in this patch.

FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

These 2 already exist on the trunk for RV32.

FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test 
This FAIL for RV64 is odd. I don't have it.  Could you share me the debug log ?
rv64gcv debug log:

Executing on host: /scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany      -lm  -o ./mask_gather_load_run-11.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany -lm -o ./mask_gather_load_run-11.exe
PASS: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./mask_gather_load_run-11.exe
mask_gather_load_run-11.exe: /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:98: main: Assertion `dest_uint16_t_uint8_t[i * 2] == dest2_uint16_t_uint8_t[i * 2]' failed.
/scratch/tc-testing/tc-avl/build-rv64gcv/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 1520161 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test

rv32gcv debug log:

Executing on host: /scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany      -lm  -o ./mask_gather_load_run-11.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany -lm -o ./mask_gather_load_run-11.exe
PASS: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./mask_gather_load_run-11.exe
mask_gather_load_run-11.exe: /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:98: main: Assertion `dest_uint16_t_uint8_t[i * 2] == dest2_uint16_t_uint8_t[i * 2]' failed.
/scratch/tc-testing/tc-avl/build-rv32gcv/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 2593314 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test

Patrick
juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-26 08:37
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
Hi Juzhe,

I tested on glibc rv32/64gcv qemu.
Applied patch to/comparing with 668c4c3783970e7adf0591396b6d0d5286cc0541.

V2 results look much better! I don't see any new fortran failures but I am seeing new gcc failures:

rv64gcv:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test

rv32gcv:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

The popcount-run-1.c test doesn't show up for me on 668c4c3783970e7adf0591396b6d0d5286cc0541 rv32gcv or rv64gcv.
After applying your patch it only shows up on rv32gcv (rv64gcv still does not have the failure). This might be due to a difference in our testing setups.

Thanks,
Patrick

On 10/25/23 05:20, juzhe.zhong@rivai.ai wrote:
Hi, Patrick.

I have fixed on V2 patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634267.html

I have tested on RV32/RV64 C/C++, no regression. But I am not able to test on Fortran.

The failures you showed have been fixed. Except this one:
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
This FAIL is not because of this patch since I confirmed it already existed without this patch.
We will fix that on stage 3.

Could you verify with Fortran test ? 

Thanks.

juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-24 23:03
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
I'm seeing a variety of new failures, constrained to rv32gcv:

Tested using newlib/linux:
rv32gcv/ ilp32d/ medlow
rv64gcv/  lp64d/ medlow
rv64gcv_zvbb_zvbc_zvkg_zvkn_zvknc_zvkned_zvkng_zvknha_zvknhb_zvks_zvksc_zvksed_zvksg_zvksh_zvkt/  lp64d/ medlow
rv64imafdcv_zicond_zawrs_zbc_zvkng_zvksg_zvbb_zvbc_zicsr_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt/  lp64d/ medlow

Newlib failures:
rv32gcv:
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test

Debug log for testcases that aren't pr110557.c look like this:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-lmul=m4      -lm  -o ./popcount-run-1.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o ./popcount-run-1.exe
PASS: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c (test for excess errors)
spawn riscv64-unknown-elf-run ./popcount-run-1.exe
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
Debug log for pr110557.c:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs  -lm  -o ./pr110557.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++14 (test for excess errors)
spawn riscv64-unknown-elf-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-newlib/../scripts/wrapper/qemu/riscv64-unknown-elf-run: line 15: 3449805 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
Linux failures:
rv32gcv:
FAIL: gcc.dg/nextafter-2.c execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
FAIL: gfortran.dg/default_format_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_2.f90   -Os  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -Os  execution test
FAIL: gfortran.dg/large_real_kind_2.F90   -O0  execution test
FAIL: gfortran.dg/round_4.f90   -O0  execution test
FAIL: gfortran.dg/zero_sized_3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test
FAIL: gfortran.dg/ieee/large_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O1  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O2  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -Os  execution test
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_sum.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Some (not all) debug log outputs:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions        -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
PASS: gfortran.fortran-torture/execute/intrinsic_count.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
STOP 2
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions -funroll-loops       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -funroll-loops -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
STOP 3
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops

Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output    -O0   -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o ./large_2.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o ./large_2.exe
PASS: gfortran.dg/ieee/large_2.f90   -O0  (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./large_2.exe
  0.333333333333333333333333333333333317         2.24271998593667819112500193394291495E+1644
STOP 1
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm  -o ./pr110557.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++98 (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 323485 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-reduc-dot-21.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe
PASS: gcc.dg/vect/vect-reduc-dot-21.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-reduc-dot-21.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3484803 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-alias-check-16.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-alias-check-16.exe
PASS: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-alias-check-16.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3431975 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "flags: *RAW\\n"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "using an address-based overlap test"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump-not vect "using an index-based"
I've observed nextafter-2.c being flaky on the CI so that particular failure might not be real.

If you want any particular testcase's debug logs please let me know.

Patrick

On 10/23/23 21:30, Patrick O'Neill wrote:
The CI just picked it up: https://github.com/ewlu/gcc-precommit-ci/issues/449#issue-1958483272
Since it doesn't apply to the CI's baseline hash it's only performing a build.
I'll re-run it in the morning once the baseline has been updated.

In the meantime I started a full build+test run on my local machine.
I'll send you the results in ~10 hours - morning my time :-)

Patrick
On 10/23/23 20:44, juzhe.zhong@rivai.ai wrote:
CCing Patrick...

Hi, @Patrick.
Could you apply this patch and trigger your regression CI?

I don't have an environment to test fortran for now (I only test it on C/C++).

Thanks. 

juzhe.zhong@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-24 11:32
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization
which is a known issue for a long time and I finally find the time to address it.
 
Consider a simple vector addition operation:
 
https://godbolt.org/z/7hfGfEjW3
 
void
foo (int *__restrict a,
     int *__restrict b,
     int *__restrict n)
{
  for (int i = 0; i < n; i++)
      a[i] = a[i] + b[i];
}
 
Optimized IR:
 
Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);    -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);   -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                              -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
 
We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment:
 
vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
 
GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization.
Such flow are used by all other targets like ARM SVE (RVV also uses such flow):
 
ARM SVE:
  
.L3:
        ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
        add     z31.s, z31.s, z30.s            -> un-predicated add
        st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store
 
Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it.
 
Also, It's very unlikely that we can apply predicated operations on all vectorization for following reasons:
 
1. It's very heavy workload to support them on all vectorization and we don't see any benefits if we can handle that on targets backend.
2. Changing Loop vectorizer for it will make code base ugly and hard to maintain.
3. We will need so many patterns for all operations. Not only COND_LEN_ADD, COND_LEN_SUB, ....
   We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over 100+ patterns, unreasonable number of patterns.
 
To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS for it to elide the redundant vsetvls
due to AVL/VL toggling.
 
The second question is that why we separate a PASS called AVL propagation. Why not optimize it in VSETVL PASS (We definitetly can optimize AVL in VSETVL PASS)
 
Frankly, I was planning to address such issue in VSETVL PASS that's why we recently refactored VSETVL PASS. However, I changed my mind recently after several
experiments and tries.
 
The reasons as follows:
 
1. For code base management and maintainience. Current VSETVL PASS is complicated enough and aleady has enough aggressive and fancy optimizations which
   turns out it can always generate optimal codegen in most of the cases. It's not a good idea keep adding more features into VSETVL PASS to make VSETVL
PASS become heavy and heavy again, then we will need to refactor it again in the future.
Actuall, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change VSETVL PASS any more except the minor
fixes.
 
2. vsetvl insertion (VSETVL PASS does this thing) and AVL propagation are 2 different things,  I don't think we should fuse them into same PASS.
 
3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be done before RA which can reduce register allocation.
 
4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
   This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements.
We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate
VSETVL PASS again which is already so complicated.)
 
Here is an example to demonstrate more:
 
https://godbolt.org/z/bE86sv3q5
 
void foo2 (int *__restrict a,
          int *__restrict b,
          int *__restrict c,
          int *__restrict a2,
          int *__restrict b2,
          int *__restrict c2,
          int *__restrict a3,
          int *__restrict b3,
          int *__restrict c3,
          int *__restrict a4,
          int *__restrict b4,
          int *__restrict c4,
          int *__restrict a5,
          int *__restrict b5,
          int *__restrict c5,
          int n)
{
    for (int i = 0; i < n; i++){
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i]+ a[i];
 
      a[i] = a[i] + c[i];
      b5[i] = a[i] + c[i];
      a2[i] = a[i] + c2[i];
      a3[i] = a[i] + c3[i];
      a4[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i]+ a[i];
    }
}
 
1. Loop Body:
 
Before this patch:                                          After this patch:
 
      vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli a4,t1,e32,m1,ta,ma                                    
        vle32.v v2,0(a2)                                     vle32.v v2,0(a2)
        vle32.v v4,0(a1)                                     vle32.v v3,0(t2)
        vle32.v v1,0(t2)                                     vle32.v v4,0(a1)
        vsetvli a7,zero,e32,m1,ta,ma                         vle32.v v1,0(t0)
        vadd.vv v4,v2,v4                                     vadd.vv v4,v2,v4
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v1,v3,v1
        vle32.v v3,0(s0)                                     vadd.vv v1,v1,v4
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v1,v1,v4
        vadd.vv v1,v3,v1                                     vadd.vv v1,v1,v4
        vadd.vv v1,v1,v4                                     vadd.vv v1,v1,v2
        vadd.vv v1,v1,v4                                     vadd.vv v2,v1,v2
        vadd.vv v1,v1,v4                                     vse32.v v2,0(t5)
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v2,v2,v1
        vle32.v v4,0(a5)                                     vadd.vv v2,v2,v1
        vsetvli a7,zero,e32,m1,ta,ma                         slli a7,a4,2
        vadd.vv v1,v1,v2                                     vadd.vv v3,v1,v3
        vadd.vv v2,v1,v2                                     vle32.v v5,0(a5)
        vadd.vv v4,v1,v4                                     vle32.v v6,0(t6)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v3,0(t3)
        vse32.v v2,0(t5)                                     vse32.v v2,0(a0)
        vse32.v v4,0(a3)                                     vadd.vv v3,v3,v1
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v2,v1,v5
        vadd.vv v3,v1,v3                                     vse32.v v3,0(t4)
        vadd.vv v2,v2,v1                                     vadd.vv v1,v1,v6
        vadd.vv v2,v2,v1                                     vse32.v v2,0(a3)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v1,0(a6)
        vse32.v v2,0(a0)                                     
        vse32.v v3,0(t3)                                     
        vle32.v v2,0(t0)                                     
        vsetvli a7,zero,e32,m1,ta,ma                                     
        vadd.vv v3,v3,v1                                     
        vsetvli zero,a4,e32,m1,ta,ma                                     
        vse32.v v3,0(t4)                                     
        vsetvli a7,zero,e32,m1,ta,ma                                     
        slli    a7,a4,2                                     
        vadd.vv v1,v1,v2                                     
        sub     t1,t1,a4                                     
        vsetvli zero,a4,e32,m1,ta,ma                                     
        vse32.v v1,0(a6)                                     
 
It's quite obvious, all heavy && redundant vsetvls inside loop body are eliminated.
 
2. Epilogue:
    Before this patch:                                          After this patch:
 
     .L5:                                                      .L5:                                          
        ld      s0,8(sp)                                         ret
        addi    sp,sp,16                                        
        jr      ra                                        
 
This is the benefit we do the AVL propation before RA since we eliminate the use of 'a7' register
which is used by the redudant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'
 
The final codegen after this patch:
 
foo2:
lw t1,56(sp)
ld t6,0(sp)
ld t3,8(sp)
ld t0,16(sp)
ld t2,24(sp)
ld t4,32(sp)
ld t5,40(sp)
ble t1,zero,.L5
.L3:
vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2)
vle32.v v3,0(t2)
vle32.v v4,0(a1)
vle32.v v1,0(t0)
vadd.vv v4,v2,v4
vadd.vv v1,v3,v1
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v2
vadd.vv v2,v1,v2
vse32.v v2,0(t5)
vadd.vv v2,v2,v1
vadd.vv v2,v2,v1
slli a7,a4,2
vadd.vv v3,v1,v3
vle32.v v5,0(a5)
vle32.v v6,0(t6)
vse32.v v3,0(t3)
vse32.v v2,0(a0)
vadd.vv v3,v3,v1
vadd.vv v2,v1,v5
vse32.v v3,0(t4)
vadd.vv v1,v1,v6
vse32.v v2,0(a3)
vse32.v v1,0(a6)
sub t1,t1,a4
add a1,a1,a7
add a2,a2,a7
add a5,a5,a7
add t6,t6,a7
add t0,t0,a7
add t2,t2,a7
add t5,t5,a7
add a3,a3,a7
add a6,a6,a7
add t3,t3,a7
add t4,t4,a7
add a0,a0,a7
bne t1,zero,.L3
.L5:
ret
 
PR target/111888
 
gcc/ChangeLog:
 
* config.gcc: Add AVL propgatation PASS.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
(has_vtype_op): Export as global.
(has_vl_op): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(const_vlmax_p): Ditto.
* config/riscv/riscv-v.cc (has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(get_vlmul): Ditto.
* config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
(has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_sew): Ditto.
(get_vlmul): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
* config/riscv/t-riscv: Add AVL propagation PASS.
* config/riscv/vector.md: Fix VLS modes attribute.
* config/riscv/riscv-avlprop.cc: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
* gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
* gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.
 
---
gcc/config.gcc                                |   2 +-
gcc/config/riscv/riscv-avlprop.cc             | 350 ++++++++++++++++++
gcc/config/riscv/riscv-passes.def             |   1 +
gcc/config/riscv/riscv-protos.h               |  10 +
gcc/config/riscv/riscv-v.cc                   |  84 ++++-
gcc/config/riscv/riscv-vsetvl.cc              |  82 +---
gcc/config/riscv/t-riscv                      |   6 +
gcc/config/riscv/vector.md                    |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +-
.../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +-
.../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 -
.../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
.../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
15 files changed, 514 insertions(+), 84 deletions(-)
create mode 100644 gcc/config/riscv/riscv-avlprop.cc
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 606d3a8513e..efd53965c9a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -544,7 +544,7 @@ pru-*-*)
riscv*)
cpu_type=riscv
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
- extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
+ extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o riscv-avlprop.o"
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o"
d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-avlprop.cc b/gcc/config/riscv/riscv-avlprop.cc
new file mode 100644
index 00000000000..bf3becd8371
--- /dev/null
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -0,0 +1,350 @@
+/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2023-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI Technologies Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or(at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions.
+   A standalone AVL propagation pass is designed because:
+
+     - Better code maintain:
+       Current LCM-based VSETVL pass is so complicated that codes
+       there will become even harder to maintain. A straight forward
+       AVL propagation PASS is much easier to maintain.
+
+     - Reduce scalar register pressure:
+       A type of AVL propagation is we propagate AVL from NON-VLMAX
+       instruction to VLMAX instruction.
+       Note: VLMAX instruction should be ignore tail elements (TA)
+       and the result should be used by the NON-VLMAX instruction.
+       This optimization is mostly for auto-vectorization codes:
+
+   vsetvli r136, r137      --- SELECT_VL
+   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
+   vadd.vv (use VLMAX)     --- PLUS_EXPR
+   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
+
+ NO AVL propation:
+
+   vsetvli a5, a4, ta
+   vle8.v v1
+   vsetvli t0, zero, ta
+   vadd.vv v2, v1, v1
+   vse8.v v2
+
+ We can propagate the AVL to 'vadd.vv' since its result
+ is consumed by a 'vse8.v' which has AVL = a5 and its
+ tail elements are agnostic.
+
+       We DON'T do this optimization on VSETVL pass since it is a
+       post-RA pass that consumed 't0' already wheras a standalone
+       pre-RA AVL propagation pass allows us elide the consumption
+       of the pseudo register of 't0' then we can reduce scalar
+       register pressure.
+
+     - More AVL propagation opportunities:
+       A pre-RA pass is more flexible for AVL REG def-use chain,
+       thus we will get more potential AVL propagation as long as
+       it doesn't increase the scalar register pressure.
+*/
+
+#define IN_TARGET_CODE 1
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "target.h"
+#include "tree-pass.h"
+#include "df.h"
+#include "rtl-ssa.h"
+#include "cfgcleanup.h"
+#include "insn-attr.h"
+
+using namespace rtl_ssa;
+using namespace riscv_vector;
+
+/* The AVL propagation instructions and corresponding preferred AVL.
+   It will be updated during the analysis.  */
+static hash_map<insn_info *, rtx> *avlprops;
+
+const pass_data pass_data_avlprop = {
+  RTL_PASS, /* type */
+  "avlprop", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_avlprop : public rtl_opt_pass
+{
+public:
+  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) final override
+  {
+    return TARGET_VECTOR && optimize > 0;
+  }
+  virtual unsigned int execute (function *) final override;
+}; // class pass_avlprop
+
+static void
+avlprop_init (void)
+{
+  calculate_dominance_info (CDI_DOMINATORS);
+  df_analyze ();
+  crtl->ssa = new function_info (cfun);
+  avlprops = new hash_map<insn_info *, rtx>;
+}
+
+static void
+avlprop_done (void)
+{
+  free_dominance_info (CDI_DOMINATORS);
+  if (crtl->ssa->perform_pending_updates ())
+    cleanup_cfg (0);
+  delete crtl->ssa;
+  crtl->ssa = nullptr;
+  delete avlprops;
+  avlprops = NULL;
+}
+
+/* Helper function to get AVL operand.  */
+static rtx
+get_avl (insn_info *insn, bool avlprop_p)
+{
+  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
+      || get_attr_avl_type (insn->rtl ()) == VLS)
+    return NULL_RTX;
+  if (avlprop_p)
+    {
+      if (avlprops->get (insn))
+ return (*avlprops->get (insn));
+      else if (vlmax_avl_type_p (insn->rtl ()))
+ return RVV_VLMAX;
+    }
+  extract_insn_cached (insn->rtl ());
+  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
+}
+
+/* This is a straight forward pattern ALWAYS in paritial auto-vectorization:
+
+     VL = SELECT_AVL (AVL, ...)
+     V0 = MASK_LEN_LOAD (..., VL)
+     V1 = MASK_LEN_LOAD (..., VL)
+     V2 = V0 + V1 --- Missed LEN information.
+     MASK_LEN_STORE (..., V2, VL)
+
+   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1, dummy LEN)
+   because:
+
+     - Few code changes in Loop Vectorizer.
+     - Reuse the current clean flow of partial vectorization, That is, apply
+       predicate LEN or MASK into LOAD/STORE operations and other special
+       arithmetic operations (e.d. DIV), then do the whole vector register
+       operation if it DON'T affect the correctness.
+       Such flow is used by all other targets like x86, sve, s390, ... etc.
+     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
+
+   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like PLUS_EXPR which
+   generates the VLMAX instruction due to missed LEN information. The later
+   VSETVL PASS will elided the redundant vsetvls.
+*/
+
+static rtx
+get_autovectorize_preferred_avl (insn_info *insn)
+{
+  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p (insn->rtl ()))
+    return NULL_RTX;
+
+  rtx use_avl = NULL_RTX;
+  insn_info *avl_use_insn = nullptr;
+  unsigned int ratio
+    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul (insn->rtl ()));
+  for (def_info *def : insn->defs ())
+    {
+      auto set = safe_dyn_cast<set_info *> (def);
+      if (!set || !set->is_reg ())
+ return NULL_RTX;
+      for (use_info *use : set->all_uses ())
+ {
+   if (!use->is_in_nondebug_insn ())
+     return NULL_RTX;
+   insn_info *use_insn = use->insn ();
+   /* FIXME: Stop AVL propagation if any USE is not a RVV real
+      instruction. It should be totally enough for vectorized codes since
+      they always locate at extended blocks.
+
+      TODO: We can extend PHI checking for intrinsic codes if it
+      necessary in the future.  */
+   if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!has_vl_op (use_insn->rtl ()))
+     continue;
+
+   rtx new_use_avl = get_avl (use_insn, true);
+   if (!new_use_avl)
+     return NULL_RTX;
+   if (!use_avl)
+     use_avl = new_use_avl;
+   if (!rtx_equal_p (use_avl, new_use_avl)
+       || calculate_ratio (get_sew (use_insn->rtl ()),
+   get_vlmul (use_insn->rtl ()))
+    != ratio
+       || vlmax_avl_p (new_use_avl)
+       || !tail_agnostic_p (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!avl_use_insn)
+     avl_use_insn = use_insn;
+ }
+    }
+
+  if (use_avl && register_operand (use_avl, Pmode))
+    {
+      gcc_assert (avl_use_insn);
+      // Find a definition at or neighboring INSN.
+      resource_info resource = full_register (REGNO (use_avl));
+      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
+      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
+      if (dl1.matching_set () || dl2.matching_set ())
+ return NULL_RTX;
+      def_info *def1 = dl1.last_def_of_prev_group ();
+      def_info *def2 = dl2.last_def_of_prev_group ();
+      if (def1 != def2)
+ return NULL_RTX;
+      /* FIXME: We only all AVL propation within a block which should
+ be totally enough for vectorized codes.
+
+ TODO: We can enhance it here for intrinsic codes in the future
+ if it is necessary.  */
+      if (def1->insn ()->bb () != insn->bb ()
+   || def1->insn ()->compare_with (insn) >= 0)
+ return NULL_RTX;
+    }
+  return use_avl;
+}
+
+/* If we have a preferred AVL to propagate, return the AVL.
+   Otherwise, return NULL_RTX as we don't need have any preferred
+   AVL.  */
+
+static rtx
+get_preferred_avl (insn_info *insn)
+{
+  /* TODO: We only do AVL propagation for missed-LEN partial
+     autovectorization for now.  We could add more more AVL
+     propagation for intrinsic codes in the future.  */
+  return get_autovectorize_preferred_avl (insn);
+}
+
+/* Return the AVL TYPE operand index.  */
+static int
+get_avl_type_index (insn_info *insn)
+{
+  extract_insn_cached (insn->rtl ());
+  /* Except rounding mode patterns, AVL TYPE operand
+     is always the last operand.  */
+  if (find_access (insn->uses (), VXRM_REGNUM)
+      || find_access (insn->uses (), FRM_REGNUM))
+    return recog_data.n_operands - 2;
+  return recog_data.n_operands - 1;
+}
+
+/* Main entry point for this pass.  */
+unsigned int
+pass_avlprop::execute (function *)
+{
+  avlprop_init ();
+
+  /* Go through all the instructions looking for AVL that we could propagate. */
+
+  insn_info *next;
+  bool change_p = true;
+
+  while (change_p)
+    {
+      /* Iterate on each instruction until no more change need.  */
+      change_p = false;
+      for (insn_info *insn = crtl->ssa->first_insn (); insn; insn = next)
+ {
+   next = insn->next_any_insn ();
+   /* We only forward AVL to the instruction that has AVL/VL operand
+      and can be optimized in RTL_SSA level.  */
+   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
+     continue;
+
+   rtx new_avl = get_preferred_avl (insn);
+   if (new_avl)
+     {
+       gcc_assert (!vlmax_avl_p (new_avl));
+       auto &update = avlprops->get_or_insert (insn);
+       change_p = !rtx_equal_p (update, new_avl);
+       update = new_avl;
+     }
+ }
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "\nNumber of successful AVL propagations: %d\n\n",
+      (int) avlprops->elements ());
+
+  for (const auto iter : *avlprops)
+    {
+      rtx_insn *rinsn = iter.first->rtl ();
+      if (dump_file)
+ {
+   fprintf (dump_file, "\nPropagating AVL: ");
+   print_rtl_single (dump_file, iter.second);
+   fprintf (dump_file, "into: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+      /* Replace AVL operand.  */
+      rtx new_pat
+ = simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first, false),
+ iter.second);
+      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
+
+      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
+      if (vlmax_avl_type_p (rinsn))
+ validate_change_or_fail (
+   rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)],
+   get_avl_type_rtx (avl_type::NONVLMAX), false);
+      if (dump_file)
+ {
+   fprintf (dump_file, "Successfully to match this instruction: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+    }
+
+  avlprop_done ();
+  return 0;
+}
+
+rtl_opt_pass *
+make_pass_avlprop (gcc::context *ctxt)
+{
+  return new pass_avlprop (ctxt);
+}
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
index 4084122cf0a..b6260939d5c 100644
--- a/gcc/config/riscv/riscv-passes.def
+++ b/gcc/config/riscv/riscv-passes.def
@@ -18,4 +18,5 @@
    <http://www.gnu.org/licenses/>.  */
INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
+INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6cb9d459ee9..2b09ec9ea9e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const char *, struct gcc_options *, locatio
extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
/* Routines implemented in riscv-string.c.  */
@@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
bool cmp_lmul_gt_one (machine_mode);
bool gather_scatter_valid_offset_mode_p (machine_mode);
bool vls_mode_valid_p (machine_mode);
+bool has_vtype_op (rtx_insn *);
+bool has_vl_op (rtx_insn *);
+bool tail_agnostic_p (rtx_insn *);
+void validate_change_or_fail (rtx, rtx *, rtx, bool);
+bool vlmax_avl_type_p (rtx_insn *);
+bool vlmax_avl_p (rtx);
+uint8_t get_sew (rtx_insn *);
+enum vlmul_type get_vlmul (rtx_insn *);
+bool const_vlmax_p (machine_mode);
}
/* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e39a9507803..473622ac321 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -56,7 +56,7 @@ using namespace riscv_vector;
namespace riscv_vector {
/* Return true if vlmax is constant value and can be used in vsetivl.  */
-static bool
+bool
const_vlmax_p (machine_mode mode)
{
   poly_uint64 nuints = GET_MODE_NUNITS (mode);
@@ -298,14 +298,6 @@ public:
      len = force_reg (Pmode, len);
    vls_p = true;
  }
- else if (const_vlmax_p (vtype_mode))
-   {
-     /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
-        the vsetvli to obtain the value of vlmax.  */
-     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
-     len = gen_int_mode (nunits, Pmode);
-     vls_p = true;
-   }
else if (can_create_pseudo_p ())
  {
    len = gen_reg_rtx (Pmode);
@@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
   emit_move_insn (dst, x4);
}
+/* Return true if it is an RVV instruction depends on VTYPE global
+   status register.  */
+bool
+has_vtype_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
+}
+
+/* Return true if it is an RVV instruction depends on VL global
+   status register.  */
+bool
+has_vl_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
+}
+
+/* Get default tail policy.  */
+static bool
+get_default_ta ()
+{
+  /* For the instruction that doesn't require TA, we still need a default value
+     to emit vsetvl. We pick up the default value according to prefer policy. */
+  return (bool) (get_prefer_tail_policy () & 0x1
+ || (get_prefer_tail_policy () >> 1 & 0x1));
+}
+
+/* Helper function to get TA operand.  */
+bool
+tail_agnostic_p (rtx_insn *rinsn)
+{
+  /* If it doesn't have TA, we return agnostic by default.  */
+  extract_insn_cached (rinsn);
+  int ta = get_attr_ta (rinsn);
+  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
+}
+
+/* Change insn and Assert the change always happens.  */
+void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
+
+/* Return true if it is VLMAX AVL TYPE.  */
+bool
+vlmax_avl_type_p (rtx_insn *rinsn)
+{
+  return get_attr_avl_type (rinsn) == VLMAX;
+}
+
+/* Return true if RTX is RVV VLMAX AVL.  */
+bool
+vlmax_avl_p (rtx x)
+{
+  return x && rtx_equal_p (x, RVV_VLMAX);
+}
+
+/* Helper function to get SEW operand. We always have SEW value for
+   all RVV instructions that have VTYPE OP.  */
+uint8_t
+get_sew (rtx_insn *rinsn)
+{
+  return get_attr_sew (rinsn);
+}
+
+/* Helper function to get VLMUL operand. We always have VLMUL value for
+   all RVV instructions that have VTYPE OP. */
+enum vlmul_type
+get_vlmul (rtx_insn *rinsn)
+{
+  return (enum vlmul_type) get_attr_vlmul (rinsn);
+}
+
} // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index e9dd669de98..f2f19e423bf 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
   return agnostic_p ? "agnostic" : "undisturbed";
}
-static bool
-vlmax_avl_p (rtx x)
-{
-  return x && rtx_equal_p (x, RVV_VLMAX);
-}
-
-/* Return true if it is an RVV instruction depends on VTYPE global
-   status register.  */
-static bool
-has_vtype_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
-}
-
-/* Return true if it is an RVV instruction depends on VL global
-   status register.  */
-static bool
-has_vl_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
-}
-
/* Return true if the instruction ignores VLMUL field of VTYPE.  */
static bool
ignore_vlmul_insn_p (rtx_insn *rinsn)
@@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
   if (!has_vl_op (rinsn))
     return NULL_RTX;
-  if (get_attr_avl_type (rinsn) == VLMAX)
-    return RVV_VLMAX;
-  extract_insn_cached (rinsn);
-  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
-}
-/* Helper function to get SEW operand. We always have SEW value for
-   all RVV instructions that have VTYPE OP.  */
-static uint8_t
-get_sew (rtx_insn *rinsn)
-{
-  return get_attr_sew (rinsn);
-}
-
-/* Helper function to get VLMUL operand. We always have VLMUL value for
-   all RVV instructions that have VTYPE OP. */
-static enum vlmul_type
-get_vlmul (rtx_insn *rinsn)
-{
-  return (enum vlmul_type) get_attr_vlmul (rinsn);
-}
+  extract_insn_cached (rinsn);
+  if (vlmax_avl_type_p (rinsn))
+    {
+      if (BYTES_PER_RISCV_VECTOR.is_constant ())
+ {
+   for (int i = 0; i < recog_data.n_operands; i++)
+     if (GET_MODE_CLASS (recog_data.operand_mode[i]) == MODE_VECTOR_BOOL
+ && const_vlmax_p (recog_data.operand_mode[i]))
+       return gen_int_mode (GET_MODE_NUNITS (recog_data.operand_mode[i]),
+    Pmode);
+ }
+      return RVV_VLMAX;
+    }
-/* Get default tail policy.  */
-static bool
-get_default_ta ()
-{
-  /* For the instruction that doesn't require TA, we still need a default value
-     to emit vsetvl. We pick up the default value according to prefer policy. */
-  return (bool) (get_prefer_tail_policy () & 0x1
- || (get_prefer_tail_policy () >> 1 & 0x1));
+  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
}
/* Get default mask policy.  */
@@ -407,16 +371,6 @@ get_default_ma ()
|| (get_prefer_mask_policy () >> 1 & 0x1));
}
-/* Helper function to get TA operand.  */
-static bool
-tail_agnostic_p (rtx_insn *rinsn)
-{
-  /* If it doesn't have TA, we return agnostic by default.  */
-  extract_insn_cached (rinsn);
-  int ta = get_attr_ta (rinsn);
-  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
-}
-
/* Helper function to get MA operand.  */
static bool
mask_agnostic_p (rtx_insn *rinsn)
@@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
   return true;
}
-/* Change insn and Assert the change always happens.  */
-static void
-validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
-{
-  bool change_p = validate_change (object, loc, new_rtx, in_group);
-  gcc_assert (change_p);
-}
-
/* This flags indicates the minimum demand of the vl and vtype values by the
    RVV instruction. For example, DEMAND_RATIO_P indicates that this RVV
    instruction only needs the SEW/LMUL ratio to remain the same, and does not
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index dd17056fe82..08de62853a6 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -69,6 +69,12 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/riscv/riscv-vsetvl.cc
+riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
+  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h
+ $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+ $(srcdir)/config/riscv/riscv-avlprop.cc
+
riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
   $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ef91950178f..0c59d1b90bc 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -809,7 +809,7 @@
  V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
  V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
  V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
-    (symbol_ref "riscv_vector::NONVLMAX")
+    (symbol_ref "riscv_vector::VLS")
(eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
  vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
  vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
index 928a507a363..5278e4aa38f 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
@@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
     }
}
-/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler {e16,m2} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
index a50265fc1ec..1db2e073846 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
@@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict b, int n)
     a[i] = a[i] + b[i];
}
-/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler {e16,m4} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
index eac7cbc757b..ca88d42cdf4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
@@ -7,10 +7,11 @@
/*
** foo:
** vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
-** vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
-** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
+** vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
*/
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
index 965365da4bb..13367423751 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
@@ -3,7 +3,6 @@
#include "ternop-2.c"
-/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
/* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
/* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized" } } */
/* { dg-final { scan-assembler-not {\tvmv} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
new file mode 100644
index 00000000000..b0d21650c3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
new file mode 100644
index 00000000000..f2d8aa54b88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c,
+     int *__restrict a2, int *__restrict b2, int *__restrict c2,
+     int *__restrict a3, int *__restrict b3, int *__restrict c3,
+     int *__restrict a4, int *__restrict b4, int *__restrict c4,
+     int *__restrict a5, int *__restrict b5, int *__restrict c5,
+     int *__restrict d, int *__restrict d2, int *__restrict d3,
+     int *__restrict d4, int *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 674ba0d72b4..fc830f2cd4d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
"" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \
"-O3 -ftree-vectorize" $CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/avlprop/*.\[cS\]]] \
+ "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
"-O3 -ftree-vectorize --param riscv-autovec-preference=scalable" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
--
2.36.3
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RE: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-26  7:33                 ` Li, Pan2
@ 2023-10-26  7:48                   ` juzhe.zhong
  2023-10-26  7:50                     ` Robin Dapp
  0 siblings, 1 reply; 13+ messages in thread
From: juzhe.zhong @ 2023-10-26  7:48 UTC (permalink / raw)
  To: pan2.li, Patrick O'Neill, gcc-patches
  Cc: kito.cheng, Kito.cheng, jeffreyalaw, Robin Dapp

Yes. I just checked again.

Before this patch:
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test


After this patch:
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

Increased FAILS are LMUL = M4. I have analyzed the codegen. Looks reasonable.

Moreover, When I removed 'popcount_64' and test, all passed no matter apply this patch or not.

I think it is because popcount64 is buggy in RV32, this patch trigger LMUL = 4 bug already existed that we were lucky.

So I suggest this patch should go ahead and ignore popcount issue for now. (I will send V3 with fixing dump FAILs).

I am not familiar  with popcount, Robin. Any suggestions?


juzhe.zhong@rivai.ai
 
From: Li, Pan2
Date: 2023-10-26 15:33
To: juzhe.zhong@rivai.ai; Patrick O'Neill; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: RE: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
Just apply v2 version for RV32 with spike riscv-sim for confirmation.
 
This patch only increased 2 popcount run failures as well as 2 dump failures, and the mask_gather_load_run-11.c is PASS within spike.
 
Pan
 
-----Original Message-----
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Thursday, October 26, 2023 9:27 AM
To: Patrick O'Neill <patrick@rivosinc.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: kito.cheng <kito.cheng@gmail.com>; Kito.cheng <kito.cheng@sifive.com>; jeffreyalaw <jeffreyalaw@gmail.com>; Robin Dapp <rdapp.gcc@gmail.com>
Subject: Re: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
 
I think it's QEMU issue:
 
line 15: 1520161 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
 
I use SPIKE works fine. This is my SPIKE configuration
 
spike \
    --isa=rv64gcv_zvfh_zfh \
    --misaligned \
    ${PK_PATH}/pk${xlen} "$@"
 
 
 
juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-26 09:22
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
 
On 10/25/23 17:49, juzhe.zhong@rivai.ai wrote:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
 
These 2 FAILs are bogus. Testcases need to be adapted, I notice I didn't include this in this patch.
 
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
 
These 2 already exist on the trunk for RV32.
 
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test 
This FAIL for RV64 is odd. I don't have it.  Could you share me the debug log ?
rv64gcv debug log:
 
Executing on host: /scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany      -lm  -o ./mask_gather_load_run-11.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv64gcv/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany -lm -o ./mask_gather_load_run-11.exe
PASS: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./mask_gather_load_run-11.exe
mask_gather_load_run-11.exe: /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:98: main: Assertion `dest_uint16_t_uint8_t[i * 2] == dest2_uint16_t_uint8_t[i * 2]' failed.
/scratch/tc-testing/tc-avl/build-rv64gcv/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 1520161 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
 
rv32gcv debug log:
 
Executing on host: /scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany      -lm  -o ./mask_gather_load_run-11.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-avl/build-rv32gcv/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math -mcmodel=medany -lm -o ./mask_gather_load_run-11.exe
PASS: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./mask_gather_load_run-11.exe
mask_gather_load_run-11.exe: /scratch/tc-testing/tc-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:98: main: Assertion `dest_uint16_t_uint8_t[i * 2] == dest2_uint16_t_uint8_t[i * 2]' failed.
/scratch/tc-testing/tc-avl/build-rv32gcv/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 2593314 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
 
Patrick
juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-26 08:37
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
Hi Juzhe,
 
I tested on glibc rv32/64gcv qemu.
Applied patch to/comparing with 668c4c3783970e7adf0591396b6d0d5286cc0541.
 
V2 results look much better! I don't see any new fortran failures but I am seeing new gcc failures:
 
rv64gcv:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
 
rv32gcv:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m4
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic  scan-assembler e32,m8
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
 
The popcount-run-1.c test doesn't show up for me on 668c4c3783970e7adf0591396b6d0d5286cc0541 rv32gcv or rv64gcv.
After applying your patch it only shows up on rv32gcv (rv64gcv still does not have the failure). This might be due to a difference in our testing setups.
 
Thanks,
Patrick
 
On 10/25/23 05:20, juzhe.zhong@rivai.ai wrote:
Hi, Patrick.
 
I have fixed on V2 patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634267.html
 
I have tested on RV32/RV64 C/C++, no regression. But I am not able to test on Fortran.
 
The failures you showed have been fixed. Except this one:
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
This FAIL is not because of this patch since I confirmed it already existed without this patch.
We will fix that on stage 3.
 
Could you verify with Fortran test ? 
 
Thanks.
 
juzhe.zhong@rivai.ai
 
From: Patrick O'Neill
Date: 2023-10-24 23:03
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
I'm seeing a variety of new failures, constrained to rv32gcv:
 
Tested using newlib/linux:
rv32gcv/ ilp32d/ medlow
rv64gcv/  lp64d/ medlow
rv64gcv_zvbb_zvbc_zvkg_zvkn_zvknc_zvkned_zvkng_zvknha_zvknhb_zvks_zvksc_zvksed_zvksg_zvksh_zvkt/  lp64d/ medlow
rv64imafdcv_zicond_zawrs_zbc_zvkng_zvksg_zvbb_zvbc_zicsr_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt/  lp64d/ medlow
 
Newlib failures:
rv32gcv:
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
 
Debug log for testcases that aren't pr110557.c look like this:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   -ftree-vectorize -O3 --param riscv-autovec-lmul=m4      -lm  -o ./popcount-run-1.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -ftree-vectorize -O3 --param riscv-autovec-lmul=m4 -lm -o ./popcount-run-1.exe
PASS: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c (test for excess errors)
spawn riscv64-unknown-elf-run ./popcount-run-1.exe
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
Debug log for pr110557.c:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs  -lm  -o ./pr110557.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/gcc/testsuite/g++5/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include/riscv64-unknown-elf -I/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-newlib/build-gcc-newlib-stage2/riscv64-unknown-elf/rv32imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/ilp32d/libstdc++-v3/src/experimental/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++14 (test for excess errors)
spawn riscv64-unknown-elf-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-newlib/../scripts/wrapper/qemu/riscv64-unknown-elf-run: line 15: 3449805 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
Linux failures:
rv32gcv:
FAIL: gcc.dg/nextafter-2.c execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-10.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-19.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-20.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-reduc-dot-22.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 execution test
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
FAIL: gfortran.dg/default_format_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_2.f90   -Os  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -Os  execution test
FAIL: gfortran.dg/large_real_kind_2.F90   -O0  execution test
FAIL: gfortran.dg/round_4.f90   -O0  execution test
FAIL: gfortran.dg/zero_sized_3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test
FAIL: gfortran.dg/ieee/large_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O1  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O2  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/large_2.f90   -Os  execution test
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_sum.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
 
Some (not all) debug log outputs:
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions        -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_count.f90 -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/./libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
PASS: gfortran.fortran-torture/execute/intrinsic_count.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran8/intrinsic_count.x
STOP 2
FAIL: gfortran.fortran-torture/execute/intrinsic_count.f90 execution,  -O2 -fomit-frame-pointer -finline-functions
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output  -w  -O2 -fomit-frame-pointer -finline-functions -funroll-loops       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_matmul.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -w -O2 -fomit-frame-pointer -finline-functions -funroll-loops -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
spawn riscv64-unknown-linux-gnu-run /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/intrinsic_matmul.x
STOP 3
FAIL: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
 
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -fdiagnostics-plain-output    -O0   -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans       -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs  -lm  -o ./large_2.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../gfortran -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/gfortran10/../../ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/ -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libgfortran/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libatomic/.libs -lm -o ./large_2.exe
PASS: gfortran.dg/ieee/large_2.f90   -O0  (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./large_2.exe
  0.333333333333333333333333333333333317         2.24271998593667819112500193394291495E+1644
STOP 1
FAIL: gfortran.dg/ieee/large_2.f90   -O0  execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output  -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0  -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details        -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs  -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm  -o ./pr110557.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../xg++ -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/testsuite/g++8/../../ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/g++.dg/vect/pr110557.cc -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -nostdinc++ -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include/riscv64-unknown-linux-gnu -I/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/include -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/libsupc++ -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/include/backward -I/scratch/tc-testing/tc-oct-23-avl/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=c++98 -O2 -ftree-vectorize -fno-vect-cost-model --param riscv-autovec-preference=scalable --param riscv-vector-abi -fdump-tree-vect-details -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/.libs -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libstdc++-v3/src/experimental/.libs -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/ -L/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/lib32/ilp32d/libitm/.libs -lm -o ./pr110557.exe
PASS: g++.dg/vect/pr110557.cc  -std=c++98 (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./pr110557.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 323485 Trace/breakpoint trap   (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-reduc-dot-21.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-reduc-dot-21.exe
PASS: gcc.dg/vect/vect-reduc-dot-21.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-reduc-dot-21.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3484803 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-reduc-dot-21.c execution test
Executing on host: /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/  /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details      -lm  -o ./vect-alias-check-16.exe    (timeout = 600)
spawn -ignore SIGHUP /scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/xgcc -B/scratch/tc-testing/tc-oct-23-avl/build-linux/build-gcc-linux-stage2/gcc/ /scratch/tc-testing/tc-oct-23-avl/gcc/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output --param riscv-autovec-preference=scalable --param riscv-vector-abi -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./vect-alias-check-16.exe
PASS: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-alias-check-16.exe
/scratch/tc-testing/tc-oct-23-avl/build-linux/../scripts/wrapper/qemu/riscv64-unknown-linux-gnu-run: line 15: 3431975 Aborted                 (core dumped) QEMU_CPU="$(march-to-cpu-opt --get-riscv-tag $1)" qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L ${RISC_V_SYSROOT} "$@"
FAIL: gcc.dg/vect/vect-alias-check-16.c execution test
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "flags: *RAW\\n"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump vect "using an address-based overlap test"
PASS: gcc.dg/vect/vect-alias-check-16.c scan-tree-dump-not vect "using an index-based"
I've observed nextafter-2.c being flaky on the CI so that particular failure might not be real.
 
If you want any particular testcase's debug logs please let me know.
 
Patrick
 
On 10/23/23 21:30, Patrick O'Neill wrote:
The CI just picked it up: https://github.com/ewlu/gcc-precommit-ci/issues/449#issue-1958483272
Since it doesn't apply to the CI's baseline hash it's only performing a build.
I'll re-run it in the morning once the baseline has been updated.
 
In the meantime I started a full build+test run on my local machine.
I'll send you the results in ~10 hours - morning my time :-)
 
Patrick
On 10/23/23 20:44, juzhe.zhong@rivai.ai wrote:
CCing Patrick...
 
Hi, @Patrick.
Could you apply this patch and trigger your regression CI?
 
I don't have an environment to test fortran for now (I only test it on C/C++).
 
Thanks. 
 
juzhe.zhong@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-24 11:32
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization
which is a known issue for a long time and I finally find the time to address it.
 
Consider a simple vector addition operation:
 
https://godbolt.org/z/7hfGfEjW3
 
void
foo (int *__restrict a,
     int *__restrict b,
     int *__restrict n)
{
  for (int i = 0; i < n; i++)
      a[i] = a[i] + b[i];
}
 
Optimized IR:
 
Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);    -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);   -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                              -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
 
We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment:
 
vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
 
GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization.
Such flow are used by all other targets like ARM SVE (RVV also uses such flow):
 
ARM SVE:
  
.L3:
        ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
        add     z31.s, z31.s, z30.s            -> un-predicated add
        st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store
 
Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it.
 
Also, It's very unlikely that we can apply predicated operations on all vectorization for following reasons:
 
1. It's very heavy workload to support them on all vectorization and we don't see any benefits if we can handle that on targets backend.
2. Changing Loop vectorizer for it will make code base ugly and hard to maintain.
3. We will need so many patterns for all operations. Not only COND_LEN_ADD, COND_LEN_SUB, ....
   We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over 100+ patterns, unreasonable number of patterns.
 
To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS for it to elide the redundant vsetvls
due to AVL/VL toggling.
 
The second question is that why we separate a PASS called AVL propagation. Why not optimize it in VSETVL PASS (We definitetly can optimize AVL in VSETVL PASS)
 
Frankly, I was planning to address such issue in VSETVL PASS that's why we recently refactored VSETVL PASS. However, I changed my mind recently after several
experiments and tries.
 
The reasons as follows:
 
1. For code base management and maintainience. Current VSETVL PASS is complicated enough and aleady has enough aggressive and fancy optimizations which
   turns out it can always generate optimal codegen in most of the cases. It's not a good idea keep adding more features into VSETVL PASS to make VSETVL
PASS become heavy and heavy again, then we will need to refactor it again in the future.
Actuall, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change VSETVL PASS any more except the minor
fixes.
 
2. vsetvl insertion (VSETVL PASS does this thing) and AVL propagation are 2 different things,  I don't think we should fuse them into same PASS.
 
3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be done before RA which can reduce register allocation.
 
4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
   This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements.
We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate
VSETVL PASS again which is already so complicated.)
 
Here is an example to demonstrate more:
 
https://godbolt.org/z/bE86sv3q5
 
void foo2 (int *__restrict a,
          int *__restrict b,
          int *__restrict c,
          int *__restrict a2,
          int *__restrict b2,
          int *__restrict c2,
          int *__restrict a3,
          int *__restrict b3,
          int *__restrict c3,
          int *__restrict a4,
          int *__restrict b4,
          int *__restrict c4,
          int *__restrict a5,
          int *__restrict b5,
          int *__restrict c5,
          int n)
{
    for (int i = 0; i < n; i++){
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i]+ a[i];
 
      a[i] = a[i] + c[i];
      b5[i] = a[i] + c[i];
      a2[i] = a[i] + c2[i];
      a3[i] = a[i] + c3[i];
      a4[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i]+ a[i];
    }
}
 
1. Loop Body:
 
Before this patch:                                          After this patch:
 
      vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli a4,t1,e32,m1,ta,ma                                    
        vle32.v v2,0(a2)                                     vle32.v v2,0(a2)
        vle32.v v4,0(a1)                                     vle32.v v3,0(t2)
        vle32.v v1,0(t2)                                     vle32.v v4,0(a1)
        vsetvli a7,zero,e32,m1,ta,ma                         vle32.v v1,0(t0)
        vadd.vv v4,v2,v4                                     vadd.vv v4,v2,v4
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v1,v3,v1
        vle32.v v3,0(s0)                                     vadd.vv v1,v1,v4
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v1,v1,v4
        vadd.vv v1,v3,v1                                     vadd.vv v1,v1,v4
        vadd.vv v1,v1,v4                                     vadd.vv v1,v1,v2
        vadd.vv v1,v1,v4                                     vadd.vv v2,v1,v2
        vadd.vv v1,v1,v4                                     vse32.v v2,0(t5)
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v2,v2,v1
        vle32.v v4,0(a5)                                     vadd.vv v2,v2,v1
        vsetvli a7,zero,e32,m1,ta,ma                         slli a7,a4,2
        vadd.vv v1,v1,v2                                     vadd.vv v3,v1,v3
        vadd.vv v2,v1,v2                                     vle32.v v5,0(a5)
        vadd.vv v4,v1,v4                                     vle32.v v6,0(t6)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v3,0(t3)
        vse32.v v2,0(t5)                                     vse32.v v2,0(a0)
        vse32.v v4,0(a3)                                     vadd.vv v3,v3,v1
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v2,v1,v5
        vadd.vv v3,v1,v3                                     vse32.v v3,0(t4)
        vadd.vv v2,v2,v1                                     vadd.vv v1,v1,v6
        vadd.vv v2,v2,v1                                     vse32.v v2,0(a3)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v1,0(a6)
        vse32.v v2,0(a0)                                     
        vse32.v v3,0(t3)                                     
        vle32.v v2,0(t0)                                     
        vsetvli a7,zero,e32,m1,ta,ma                                     
        vadd.vv v3,v3,v1                                     
        vsetvli zero,a4,e32,m1,ta,ma                                     
        vse32.v v3,0(t4)                                     
        vsetvli a7,zero,e32,m1,ta,ma                                     
        slli    a7,a4,2                                     
        vadd.vv v1,v1,v2                                     
        sub     t1,t1,a4                                     
        vsetvli zero,a4,e32,m1,ta,ma                                     
        vse32.v v1,0(a6)                                     
 
It's quite obvious, all heavy && redundant vsetvls inside loop body are eliminated.
 
2. Epilogue:
    Before this patch:                                          After this patch:
 
     .L5:                                                      .L5:                                          
        ld      s0,8(sp)                                         ret
        addi    sp,sp,16                                        
        jr      ra                                        
 
This is the benefit we do the AVL propation before RA since we eliminate the use of 'a7' register
which is used by the redudant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'
 
The final codegen after this patch:
 
foo2:
lw t1,56(sp)
ld t6,0(sp)
ld t3,8(sp)
ld t0,16(sp)
ld t2,24(sp)
ld t4,32(sp)
ld t5,40(sp)
ble t1,zero,.L5
.L3:
vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2)
vle32.v v3,0(t2)
vle32.v v4,0(a1)
vle32.v v1,0(t0)
vadd.vv v4,v2,v4
vadd.vv v1,v3,v1
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v2
vadd.vv v2,v1,v2
vse32.v v2,0(t5)
vadd.vv v2,v2,v1
vadd.vv v2,v2,v1
slli a7,a4,2
vadd.vv v3,v1,v3
vle32.v v5,0(a5)
vle32.v v6,0(t6)
vse32.v v3,0(t3)
vse32.v v2,0(a0)
vadd.vv v3,v3,v1
vadd.vv v2,v1,v5
vse32.v v3,0(t4)
vadd.vv v1,v1,v6
vse32.v v2,0(a3)
vse32.v v1,0(a6)
sub t1,t1,a4
add a1,a1,a7
add a2,a2,a7
add a5,a5,a7
add t6,t6,a7
add t0,t0,a7
add t2,t2,a7
add t5,t5,a7
add a3,a3,a7
add a6,a6,a7
add t3,t3,a7
add t4,t4,a7
add a0,a0,a7
bne t1,zero,.L3
.L5:
ret
 
PR target/111888
 
gcc/ChangeLog:
 
* config.gcc: Add AVL propgatation PASS.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
(has_vtype_op): Export as global.
(has_vl_op): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(const_vlmax_p): Ditto.
* config/riscv/riscv-v.cc (has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(vlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(get_vlmul): Ditto.
* config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
(has_vtype_op): Ditto.
(has_vl_op): Ditto.
(get_sew): Ditto.
(get_vlmul): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
* config/riscv/t-riscv: Add AVL propagation PASS.
* config/riscv/vector.md: Fix VLS modes attribute.
* config/riscv/riscv-avlprop.cc: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add AVL propagation.
* gcc.target/riscv/rvv/avlprop/pr111888-1.c: New test.
* gcc.target/riscv/rvv/avlprop/pr111888-2.c: New test.
 
---
gcc/config.gcc                                |   2 +-
gcc/config/riscv/riscv-avlprop.cc             | 350 ++++++++++++++++++
gcc/config/riscv/riscv-passes.def             |   1 +
gcc/config/riscv/riscv-protos.h               |  10 +
gcc/config/riscv/riscv-v.cc                   |  84 ++++-
gcc/config/riscv/riscv-vsetvl.cc              |  82 +---
gcc/config/riscv/t-riscv                      |   6 +
gcc/config/riscv/vector.md                    |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul4-5.c     |   2 +-
.../costmodel/riscv/rvv/dynamic-lmul8-2.c     |   2 +-
.../riscv/rvv/autovec/partial/select_vl-2.c   |   5 +-
.../riscv/rvv/autovec/ternop/ternop_nofm-2.c  |   1 -
.../gcc.target/riscv/rvv/avlprop/pr111888-1.c |  16 +
.../gcc.target/riscv/rvv/avlprop/pr111888-2.c |  33 ++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
15 files changed, 514 insertions(+), 84 deletions(-)
create mode 100644 gcc/config/riscv/riscv-avlprop.cc
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 606d3a8513e..efd53965c9a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -544,7 +544,7 @@ pru-*-*)
riscv*)
cpu_type=riscv
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
- extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
+ extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o riscv-avlprop.o"
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o"
d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-avlprop.cc b/gcc/config/riscv/riscv-avlprop.cc
new file mode 100644
index 00000000000..bf3becd8371
--- /dev/null
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -0,0 +1,350 @@
+/* AVL propagation pass for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2023-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI Technologies Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or(at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Pre-RA RTL_SSA-based pass propagates AVL for RVV instructions.
+   A standalone AVL propagation pass is designed because:
+
+     - Better code maintain:
+       Current LCM-based VSETVL pass is so complicated that codes
+       there will become even harder to maintain. A straight forward
+       AVL propagation PASS is much easier to maintain.
+
+     - Reduce scalar register pressure:
+       A type of AVL propagation is we propagate AVL from NON-VLMAX
+       instruction to VLMAX instruction.
+       Note: VLMAX instruction should be ignore tail elements (TA)
+       and the result should be used by the NON-VLMAX instruction.
+       This optimization is mostly for auto-vectorization codes:
+
+   vsetvli r136, r137      --- SELECT_VL
+   vle8.v (use avl = r136) --- IFN_MASK_LEN_LOAD
+   vadd.vv (use VLMAX)     --- PLUS_EXPR
+   vse8.v (use avl = r136) --- IFN_MASK_LEN_STORE
+
+ NO AVL propation:
+
+   vsetvli a5, a4, ta
+   vle8.v v1
+   vsetvli t0, zero, ta
+   vadd.vv v2, v1, v1
+   vse8.v v2
+
+ We can propagate the AVL to 'vadd.vv' since its result
+ is consumed by a 'vse8.v' which has AVL = a5 and its
+ tail elements are agnostic.
+
+       We DON'T do this optimization on VSETVL pass since it is a
+       post-RA pass that consumed 't0' already wheras a standalone
+       pre-RA AVL propagation pass allows us elide the consumption
+       of the pseudo register of 't0' then we can reduce scalar
+       register pressure.
+
+     - More AVL propagation opportunities:
+       A pre-RA pass is more flexible for AVL REG def-use chain,
+       thus we will get more potential AVL propagation as long as
+       it doesn't increase the scalar register pressure.
+*/
+
+#define IN_TARGET_CODE 1
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "target.h"
+#include "tree-pass.h"
+#include "df.h"
+#include "rtl-ssa.h"
+#include "cfgcleanup.h"
+#include "insn-attr.h"
+
+using namespace rtl_ssa;
+using namespace riscv_vector;
+
+/* The AVL propagation instructions and corresponding preferred AVL.
+   It will be updated during the analysis.  */
+static hash_map<insn_info *, rtx> *avlprops;
+
+const pass_data pass_data_avlprop = {
+  RTL_PASS, /* type */
+  "avlprop", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_avlprop : public rtl_opt_pass
+{
+public:
+  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) final override
+  {
+    return TARGET_VECTOR && optimize > 0;
+  }
+  virtual unsigned int execute (function *) final override;
+}; // class pass_avlprop
+
+static void
+avlprop_init (void)
+{
+  calculate_dominance_info (CDI_DOMINATORS);
+  df_analyze ();
+  crtl->ssa = new function_info (cfun);
+  avlprops = new hash_map<insn_info *, rtx>;
+}
+
+static void
+avlprop_done (void)
+{
+  free_dominance_info (CDI_DOMINATORS);
+  if (crtl->ssa->perform_pending_updates ())
+    cleanup_cfg (0);
+  delete crtl->ssa;
+  crtl->ssa = nullptr;
+  delete avlprops;
+  avlprops = NULL;
+}
+
+/* Helper function to get AVL operand.  */
+static rtx
+get_avl (insn_info *insn, bool avlprop_p)
+{
+  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
+      || get_attr_avl_type (insn->rtl ()) == VLS)
+    return NULL_RTX;
+  if (avlprop_p)
+    {
+      if (avlprops->get (insn))
+ return (*avlprops->get (insn));
+      else if (vlmax_avl_type_p (insn->rtl ()))
+ return RVV_VLMAX;
+    }
+  extract_insn_cached (insn->rtl ());
+  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
+}
+
+/* This is a straight forward pattern ALWAYS in paritial auto-vectorization:
+
+     VL = SELECT_AVL (AVL, ...)
+     V0 = MASK_LEN_LOAD (..., VL)
+     V1 = MASK_LEN_LOAD (..., VL)
+     V2 = V0 + V1 --- Missed LEN information.
+     MASK_LEN_STORE (..., V2, VL)
+
+   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1, dummy LEN)
+   because:
+
+     - Few code changes in Loop Vectorizer.
+     - Reuse the current clean flow of partial vectorization, That is, apply
+       predicate LEN or MASK into LOAD/STORE operations and other special
+       arithmetic operations (e.d. DIV), then do the whole vector register
+       operation if it DON'T affect the correctness.
+       Such flow is used by all other targets like x86, sve, s390, ... etc.
+     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
+
+   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like PLUS_EXPR which
+   generates the VLMAX instruction due to missed LEN information. The later
+   VSETVL PASS will elided the redundant vsetvls.
+*/
+
+static rtx
+get_autovectorize_preferred_avl (insn_info *insn)
+{
+  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p (insn->rtl ()))
+    return NULL_RTX;
+
+  rtx use_avl = NULL_RTX;
+  insn_info *avl_use_insn = nullptr;
+  unsigned int ratio
+    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul (insn->rtl ()));
+  for (def_info *def : insn->defs ())
+    {
+      auto set = safe_dyn_cast<set_info *> (def);
+      if (!set || !set->is_reg ())
+ return NULL_RTX;
+      for (use_info *use : set->all_uses ())
+ {
+   if (!use->is_in_nondebug_insn ())
+     return NULL_RTX;
+   insn_info *use_insn = use->insn ();
+   /* FIXME: Stop AVL propagation if any USE is not a RVV real
+      instruction. It should be totally enough for vectorized codes since
+      they always locate at extended blocks.
+
+      TODO: We can extend PHI checking for intrinsic codes if it
+      necessary in the future.  */
+   if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!has_vl_op (use_insn->rtl ()))
+     continue;
+
+   rtx new_use_avl = get_avl (use_insn, true);
+   if (!new_use_avl)
+     return NULL_RTX;
+   if (!use_avl)
+     use_avl = new_use_avl;
+   if (!rtx_equal_p (use_avl, new_use_avl)
+       || calculate_ratio (get_sew (use_insn->rtl ()),
+   get_vlmul (use_insn->rtl ()))
+    != ratio
+       || vlmax_avl_p (new_use_avl)
+       || !tail_agnostic_p (use_insn->rtl ()))
+     return NULL_RTX;
+   if (!avl_use_insn)
+     avl_use_insn = use_insn;
+ }
+    }
+
+  if (use_avl && register_operand (use_avl, Pmode))
+    {
+      gcc_assert (avl_use_insn);
+      // Find a definition at or neighboring INSN.
+      resource_info resource = full_register (REGNO (use_avl));
+      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
+      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
+      if (dl1.matching_set () || dl2.matching_set ())
+ return NULL_RTX;
+      def_info *def1 = dl1.last_def_of_prev_group ();
+      def_info *def2 = dl2.last_def_of_prev_group ();
+      if (def1 != def2)
+ return NULL_RTX;
+      /* FIXME: We only all AVL propation within a block which should
+ be totally enough for vectorized codes.
+
+ TODO: We can enhance it here for intrinsic codes in the future
+ if it is necessary.  */
+      if (def1->insn ()->bb () != insn->bb ()
+   || def1->insn ()->compare_with (insn) >= 0)
+ return NULL_RTX;
+    }
+  return use_avl;
+}
+
+/* If we have a preferred AVL to propagate, return the AVL.
+   Otherwise, return NULL_RTX as we don't need have any preferred
+   AVL.  */
+
+static rtx
+get_preferred_avl (insn_info *insn)
+{
+  /* TODO: We only do AVL propagation for missed-LEN partial
+     autovectorization for now.  We could add more more AVL
+     propagation for intrinsic codes in the future.  */
+  return get_autovectorize_preferred_avl (insn);
+}
+
+/* Return the AVL TYPE operand index.  */
+static int
+get_avl_type_index (insn_info *insn)
+{
+  extract_insn_cached (insn->rtl ());
+  /* Except rounding mode patterns, AVL TYPE operand
+     is always the last operand.  */
+  if (find_access (insn->uses (), VXRM_REGNUM)
+      || find_access (insn->uses (), FRM_REGNUM))
+    return recog_data.n_operands - 2;
+  return recog_data.n_operands - 1;
+}
+
+/* Main entry point for this pass.  */
+unsigned int
+pass_avlprop::execute (function *)
+{
+  avlprop_init ();
+
+  /* Go through all the instructions looking for AVL that we could propagate. */
+
+  insn_info *next;
+  bool change_p = true;
+
+  while (change_p)
+    {
+      /* Iterate on each instruction until no more change need.  */
+      change_p = false;
+      for (insn_info *insn = crtl->ssa->first_insn (); insn; insn = next)
+ {
+   next = insn->next_any_insn ();
+   /* We only forward AVL to the instruction that has AVL/VL operand
+      and can be optimized in RTL_SSA level.  */
+   if (!insn->can_be_optimized () || !has_vl_op (insn->rtl ()))
+     continue;
+
+   rtx new_avl = get_preferred_avl (insn);
+   if (new_avl)
+     {
+       gcc_assert (!vlmax_avl_p (new_avl));
+       auto &update = avlprops->get_or_insert (insn);
+       change_p = !rtx_equal_p (update, new_avl);
+       update = new_avl;
+     }
+ }
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "\nNumber of successful AVL propagations: %d\n\n",
+      (int) avlprops->elements ());
+
+  for (const auto iter : *avlprops)
+    {
+      rtx_insn *rinsn = iter.first->rtl ();
+      if (dump_file)
+ {
+   fprintf (dump_file, "\nPropagating AVL: ");
+   print_rtl_single (dump_file, iter.second);
+   fprintf (dump_file, "into: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+      /* Replace AVL operand.  */
+      rtx new_pat
+ = simplify_replace_rtx (PATTERN (rinsn), get_avl (iter.first, false),
+ iter.second);
+      validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
+
+      /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
+      if (vlmax_avl_type_p (rinsn))
+ validate_change_or_fail (
+   rinsn, recog_data.operand_loc[get_avl_type_index (iter.first)],
+   get_avl_type_rtx (avl_type::NONVLMAX), false);
+      if (dump_file)
+ {
+   fprintf (dump_file, "Successfully to match this instruction: ");
+   print_rtl_single (dump_file, rinsn);
+ }
+    }
+
+  avlprop_done ();
+  return 0;
+}
+
+rtl_opt_pass *
+make_pass_avlprop (gcc::context *ctxt)
+{
+  return new pass_avlprop (ctxt);
+}
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
index 4084122cf0a..b6260939d5c 100644
--- a/gcc/config/riscv/riscv-passes.def
+++ b/gcc/config/riscv/riscv-passes.def
@@ -18,4 +18,5 @@
    <http://www.gnu.org/licenses/>.  */
INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
+INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6cb9d459ee9..2b09ec9ea9e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -156,6 +156,7 @@ extern void riscv_parse_arch_string (const char *, struct gcc_options *, locatio
extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
/* Routines implemented in riscv-string.c.  */
@@ -559,6 +560,15 @@ bool cmp_lmul_le_one (machine_mode);
bool cmp_lmul_gt_one (machine_mode);
bool gather_scatter_valid_offset_mode_p (machine_mode);
bool vls_mode_valid_p (machine_mode);
+bool has_vtype_op (rtx_insn *);
+bool has_vl_op (rtx_insn *);
+bool tail_agnostic_p (rtx_insn *);
+void validate_change_or_fail (rtx, rtx *, rtx, bool);
+bool vlmax_avl_type_p (rtx_insn *);
+bool vlmax_avl_p (rtx);
+uint8_t get_sew (rtx_insn *);
+enum vlmul_type get_vlmul (rtx_insn *);
+bool const_vlmax_p (machine_mode);
}
/* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e39a9507803..473622ac321 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -56,7 +56,7 @@ using namespace riscv_vector;
namespace riscv_vector {
/* Return true if vlmax is constant value and can be used in vsetivl.  */
-static bool
+bool
const_vlmax_p (machine_mode mode)
{
   poly_uint64 nuints = GET_MODE_NUNITS (mode);
@@ -298,14 +298,6 @@ public:
      len = force_reg (Pmode, len);
    vls_p = true;
  }
- else if (const_vlmax_p (vtype_mode))
-   {
-     /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
-        the vsetvli to obtain the value of vlmax.  */
-     poly_uint64 nunits = GET_MODE_NUNITS (vtype_mode);
-     len = gen_int_mode (nunits, Pmode);
-     vls_p = true;
-   }
else if (can_create_pseudo_p ())
  {
    len = gen_reg_rtx (Pmode);
@@ -4435,4 +4427,78 @@ expand_popcount (rtx *ops)
   emit_move_insn (dst, x4);
}
+/* Return true if it is an RVV instruction depends on VTYPE global
+   status register.  */
+bool
+has_vtype_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
+}
+
+/* Return true if it is an RVV instruction depends on VL global
+   status register.  */
+bool
+has_vl_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
+}
+
+/* Get default tail policy.  */
+static bool
+get_default_ta ()
+{
+  /* For the instruction that doesn't require TA, we still need a default value
+     to emit vsetvl. We pick up the default value according to prefer policy. */
+  return (bool) (get_prefer_tail_policy () & 0x1
+ || (get_prefer_tail_policy () >> 1 & 0x1));
+}
+
+/* Helper function to get TA operand.  */
+bool
+tail_agnostic_p (rtx_insn *rinsn)
+{
+  /* If it doesn't have TA, we return agnostic by default.  */
+  extract_insn_cached (rinsn);
+  int ta = get_attr_ta (rinsn);
+  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
+}
+
+/* Change insn and Assert the change always happens.  */
+void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
+
+/* Return true if it is VLMAX AVL TYPE.  */
+bool
+vlmax_avl_type_p (rtx_insn *rinsn)
+{
+  return get_attr_avl_type (rinsn) == VLMAX;
+}
+
+/* Return true if RTX is RVV VLMAX AVL.  */
+bool
+vlmax_avl_p (rtx x)
+{
+  return x && rtx_equal_p (x, RVV_VLMAX);
+}
+
+/* Helper function to get SEW operand. We always have SEW value for
+   all RVV instructions that have VTYPE OP.  */
+uint8_t
+get_sew (rtx_insn *rinsn)
+{
+  return get_attr_sew (rinsn);
+}
+
+/* Helper function to get VLMUL operand. We always have VLMUL value for
+   all RVV instructions that have VTYPE OP. */
+enum vlmul_type
+get_vlmul (rtx_insn *rinsn)
+{
+  return (enum vlmul_type) get_attr_vlmul (rinsn);
+}
+
} // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index e9dd669de98..f2f19e423bf 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -255,28 +255,6 @@ policy_to_str (bool agnostic_p)
   return agnostic_p ? "agnostic" : "undisturbed";
}
-static bool
-vlmax_avl_p (rtx x)
-{
-  return x && rtx_equal_p (x, RVV_VLMAX);
-}
-
-/* Return true if it is an RVV instruction depends on VTYPE global
-   status register.  */
-static bool
-has_vtype_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
-}
-
-/* Return true if it is an RVV instruction depends on VL global
-   status register.  */
-static bool
-has_vl_op (rtx_insn *rinsn)
-{
-  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
-}
-
/* Return true if the instruction ignores VLMUL field of VTYPE.  */
static bool
ignore_vlmul_insn_p (rtx_insn *rinsn)
@@ -365,36 +343,22 @@ get_avl (rtx_insn *rinsn)
   if (!has_vl_op (rinsn))
     return NULL_RTX;
-  if (get_attr_avl_type (rinsn) == VLMAX)
-    return RVV_VLMAX;
-  extract_insn_cached (rinsn);
-  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
-}
-/* Helper function to get SEW operand. We always have SEW value for
-   all RVV instructions that have VTYPE OP.  */
-static uint8_t
-get_sew (rtx_insn *rinsn)
-{
-  return get_attr_sew (rinsn);
-}
-
-/* Helper function to get VLMUL operand. We always have VLMUL value for
-   all RVV instructions that have VTYPE OP. */
-static enum vlmul_type
-get_vlmul (rtx_insn *rinsn)
-{
-  return (enum vlmul_type) get_attr_vlmul (rinsn);
-}
+  extract_insn_cached (rinsn);
+  if (vlmax_avl_type_p (rinsn))
+    {
+      if (BYTES_PER_RISCV_VECTOR.is_constant ())
+ {
+   for (int i = 0; i < recog_data.n_operands; i++)
+     if (GET_MODE_CLASS (recog_data.operand_mode[i]) == MODE_VECTOR_BOOL
+ && const_vlmax_p (recog_data.operand_mode[i]))
+       return gen_int_mode (GET_MODE_NUNITS (recog_data.operand_mode[i]),
+    Pmode);
+ }
+      return RVV_VLMAX;
+    }
-/* Get default tail policy.  */
-static bool
-get_default_ta ()
-{
-  /* For the instruction that doesn't require TA, we still need a default value
-     to emit vsetvl. We pick up the default value according to prefer policy. */
-  return (bool) (get_prefer_tail_policy () & 0x1
- || (get_prefer_tail_policy () >> 1 & 0x1));
+  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
}
/* Get default mask policy.  */
@@ -407,16 +371,6 @@ get_default_ma ()
|| (get_prefer_mask_policy () >> 1 & 0x1));
}
-/* Helper function to get TA operand.  */
-static bool
-tail_agnostic_p (rtx_insn *rinsn)
-{
-  /* If it doesn't have TA, we return agnostic by default.  */
-  extract_insn_cached (rinsn);
-  int ta = get_attr_ta (rinsn);
-  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
-}
-
/* Helper function to get MA operand.  */
static bool
mask_agnostic_p (rtx_insn *rinsn)
@@ -696,14 +650,6 @@ has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
   return true;
}
-/* Change insn and Assert the change always happens.  */
-static void
-validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
-{
-  bool change_p = validate_change (object, loc, new_rtx, in_group);
-  gcc_assert (change_p);
-}
-
/* This flags indicates the minimum demand of the vl and vtype values by the
    RVV instruction. For example, DEMAND_RATIO_P indicates that this RVV
    instruction only needs the SEW/LMUL ratio to remain the same, and does not
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index dd17056fe82..08de62853a6 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -69,6 +69,12 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/riscv/riscv-vsetvl.cc
+riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
+  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h
+ $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+ $(srcdir)/config/riscv/riscv-avlprop.cc
+
riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
   $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ef91950178f..0c59d1b90bc 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -809,7 +809,7 @@
  V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
  V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
  V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
-    (symbol_ref "riscv_vector::NONVLMAX")
+    (symbol_ref "riscv_vector::VLS")
(eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
  vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
  vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
index 928a507a363..5278e4aa38f 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
@@ -39,7 +39,7 @@ void foo2 (int16_t *__restrict a,
     }
}
-/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler {e16,m2} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
index a50265fc1ec..1db2e073846 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
@@ -10,7 +10,7 @@ foo (int32_t *__restrict a, int16_t *__restrict b, int n)
     a[i] = a[i] + b[i];
}
-/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler {e16,m4} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
index eac7cbc757b..ca88d42cdf4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-2.c
@@ -7,10 +7,11 @@
/*
** foo:
** vsetivli\t[a-x0-9]+,\s*8,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
-** vsetvli\t[a-x0-9]+,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
-** add\t[a-x0-9]+,[a-x0-9]+,[a-x0-9]+
+** vsetvli\tzero,\s*[a-x0-9]+,\s*e(8?|16?|32?|64),\s*m(1?|2?|4?|8?|f2?|f4?|f8),\s*t[au],\s*m[au]
+** ...
** vle32\.v\tv[0-9]+,0\([a-x0-9]+\)
** ...
*/
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
index 965365da4bb..13367423751 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c
@@ -3,7 +3,6 @@
#include "ternop-2.c"
-/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
/* { dg-final { scan-assembler-times {\tvfma[c-d][c-d]\.vv} 9 } } */
/* { dg-final { scan-tree-dump-times "COND_LEN_FMA" 9 "optimized" } } */
/* { dg-final { scan-assembler-not {\tvmv} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
new file mode 100644
index 00000000000..b0d21650c3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
new file mode 100644
index 00000000000..f2d8aa54b88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/avlprop/pr111888-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3" } */
+
+void
+foo (int *__restrict a, int *__restrict b, int *__restrict c,
+     int *__restrict a2, int *__restrict b2, int *__restrict c2,
+     int *__restrict a3, int *__restrict b3, int *__restrict c3,
+     int *__restrict a4, int *__restrict b4, int *__restrict c4,
+     int *__restrict a5, int *__restrict b5, int *__restrict c5,
+     int *__restrict d, int *__restrict d2, int *__restrict d3,
+     int *__restrict d4, int *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
+/* { dg-final { scan-assembler-times {vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*[a-x0-9]+,\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetvli\s*zero} } } */
+/* { dg-final { scan-assembler-not {vsetivli\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 674ba0d72b4..fc830f2cd4d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -48,6 +48,8 @@ gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
"" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \
"-O3 -ftree-vectorize" $CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/avlprop/*.\[cS\]]] \
+ "-O3 -ftree-vectorize -fno-vect-cost-model" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls/*.\[cS\]]] \
"-O3 -ftree-vectorize --param riscv-autovec-preference=scalable" $CFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/struct/*.\[cS\]]] \
--
2.36.3
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-26  7:48                   ` juzhe.zhong
@ 2023-10-26  7:50                     ` Robin Dapp
  0 siblings, 0 replies; 13+ messages in thread
From: Robin Dapp @ 2023-10-26  7:50 UTC (permalink / raw)
  To: juzhe.zhong, pan2.li, Patrick O'Neill, gcc-patches
  Cc: rdapp.gcc, kito.cheng, Kito.cheng, jeffreyalaw


> Increased FAILS are LMUL = M4. I have analyzed the codegen. Looks
> reasonable.
> 
> Moreover, When I removed 'popcount_64' and test, all passed no matter
> apply this patch or not.
> 
> I think it is because popcount64 is buggy in RV32, this patch trigger
> LMUL = 4 bug already existed that we were lucky.
> 
> So I suggest this patch should go ahead and ignore popcount issue for
> now. (I will send V3 with fixing dump FAILs).
> 
> I am not familiar  with popcount, Robin. Any suggestions?
Yeah, agree.  popcount_64 might be wrong and it's unlikely that your
patch causes it.  Will have a look.

Regards
 Robin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization
  2023-10-24  3:32 [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization Juzhe-Zhong
  2023-10-24  3:44 ` juzhe.zhong
@ 2023-10-24 15:26 ` Kito Cheng
  1 sibling, 0 replies; 13+ messages in thread
From: Kito Cheng @ 2023-10-24 15:26 UTC (permalink / raw)
  To: Juzhe-Zhong; +Cc: gcc-patches, kito.cheng, jeffreyalaw, rdapp.gcc

> +using namespace rtl_ssa;
> +using namespace riscv_vector;
> +
> +/* The AVL propagation instructions and corresponding preferred AVL.
> +   It will be updated during the analysis.  */
> +static hash_map<insn_info *, rtx> *avlprops;

Maybe put into member data of pass_avlprop?

> +
> +const pass_data pass_data_avlprop = {
> +  RTL_PASS,     /* type */
> +  "avlprop",    /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE,      /* tv_id */
> +  0,            /* properties_required */
> +  0,            /* properties_provided */
> +  0,            /* properties_destroyed */
> +  0,            /* todo_flags_start */
> +  0,            /* todo_flags_finish */
> +};
> +
> +class pass_avlprop : public rtl_opt_pass
> +{
> +public:
> +  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) final override
> +  {
> +    return TARGET_VECTOR && optimize > 0;
> +  }
> +  virtual unsigned int execute (function *) final override;
> +}; // class pass_avlprop
> +
> +static void
> +avlprop_init (void)

Maybe put into member function of pass_avlprop?

> +{
> +  calculate_dominance_info (CDI_DOMINATORS);
> +  df_analyze ();
> +  crtl->ssa = new function_info (cfun);

And take function * from incomping parameter of execute

> +  avlprops = new hash_map<insn_info *, rtx>;
> +}
> +
> +static void
> +avlprop_done (void)
> +{
> +  free_dominance_info (CDI_DOMINATORS);
> +  if (crtl->ssa->perform_pending_updates ())
> +    cleanup_cfg (0);
> +  delete crtl->ssa;
> +  crtl->ssa = nullptr;
> +  delete avlprops;
> +  avlprops = NULL;
> +}
> +
> +/* Helper function to get AVL operand.  */
> +static rtx
> +get_avl (insn_info *insn, bool avlprop_p)
> +{
> +  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
> +      || get_attr_avl_type (insn->rtl ()) == VLS)
> +    return NULL_RTX;
> +  if (avlprop_p)
> +    {
> +      if (avlprops->get (insn))
> +       return (*avlprops->get (insn));
> +      else if (vlmax_avl_type_p (insn->rtl ()))
> +       return RVV_VLMAX;

I guess I didn't get why we need handle vlmax_avl_type_p here?

> +    }
> +  extract_insn_cached (insn->rtl ());
> +  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
> +}
> +
> +/* This is a straight forward pattern ALWAYS in paritial auto-vectorization:
> +
> +     VL = SELECT_AVL (AVL, ...)
> +     V0 = MASK_LEN_LOAD (..., VL)
> +     V1 = MASK_LEN_LOAD (..., VL)
> +     V2 = V0 + V1 --- Missed LEN information.
> +     MASK_LEN_STORE (..., V2, VL)
> +
> +   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1, dummy LEN)
> +   because:
> +
> +     - Few code changes in Loop Vectorizer.
> +     - Reuse the current clean flow of partial vectorization, That is, apply
> +       predicate LEN or MASK into LOAD/STORE operations and other special
> +       arithmetic operations (e.d. DIV), then do the whole vector register
> +       operation if it DON'T affect the correctness.
> +       Such flow is used by all other targets like x86, sve, s390, ... etc.
> +     - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
> +
> +   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like PLUS_EXPR which
> +   generates the VLMAX instruction due to missed LEN information. The later
> +   VSETVL PASS will elided the redundant vsetvls.
> +*/
> +
> +static rtx
> +get_autovectorize_preferred_avl (insn_info *insn)
> +{
> +  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p (insn->rtl ()))
> +    return NULL_RTX;

I would prefer adding new attribute to let this become simpler.

> +
> +  rtx use_avl = NULL_RTX;
> +  insn_info *avl_use_insn = nullptr;
> +  unsigned int ratio
> +    = calculate_ratio (get_sew (insn->rtl ()), get_vlmul (insn->rtl ()));
> +  for (def_info *def : insn->defs ())
> +    {
> +      auto set = safe_dyn_cast<set_info *> (def);
> +      if (!set || !set->is_reg ())
> +       return NULL_RTX;
> +      for (use_info *use : set->all_uses ())
> +       {
> +         if (!use->is_in_nondebug_insn ())
> +           return NULL_RTX;
> +         insn_info *use_insn = use->insn ();
> +         /* FIXME: Stop AVL propagation if any USE is not a RVV real
> +            instruction. It should be totally enough for vectorized codes since
> +            they always locate at extended blocks.
> +
> +            TODO: We can extend PHI checking for intrinsic codes if it
> +            necessary in the future.  */
> +         if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl ()))
> +           return NULL_RTX;
> +         if (!has_vl_op (use_insn->rtl ()))
> +           continue;
> +
> +         rtx new_use_avl = get_avl (use_insn, true);
> +         if (!new_use_avl)
> +           return NULL_RTX;
> +         if (!use_avl)
> +           use_avl = new_use_avl;
> +         if (!rtx_equal_p (use_avl, new_use_avl)
> +             || calculate_ratio (get_sew (use_insn->rtl ()),
> +                                 get_vlmul (use_insn->rtl ()))
> +                  != ratio
> +             || vlmax_avl_p (new_use_avl)
> +             || !tail_agnostic_p (use_insn->rtl ()))
> +           return NULL_RTX;
> +         if (!avl_use_insn)
> +           avl_use_insn = use_insn;
> +       }
> +    }
> +
> +  if (use_avl && register_operand (use_avl, Pmode))
> +    {
> +      gcc_assert (avl_use_insn);
> +      // Find a definition at or neighboring INSN.
> +      resource_info resource = full_register (REGNO (use_avl));
> +      def_lookup dl1 = crtl->ssa->find_def (resource, insn);
> +      def_lookup dl2 = crtl->ssa->find_def (resource, avl_use_insn);
> +      if (dl1.matching_set () || dl2.matching_set ())
> +       return NULL_RTX;
> +      def_info *def1 = dl1.last_def_of_prev_group ();
> +      def_info *def2 = dl2.last_def_of_prev_group ();
> +      if (def1 != def2)
> +       return NULL_RTX;
> +      /* FIXME: We only all AVL propation within a block which should
> +        be totally enough for vectorized codes.
> +
> +        TODO: We can enhance it here for intrinsic codes in the future
> +        if it is necessary.  */
> +      if (def1->insn ()->bb () != insn->bb ()
> +         || def1->insn ()->compare_with (insn) >= 0)
> +       return NULL_RTX;
> +    }
> +  return use_avl;
> +}
> +
> +/* If we have a preferred AVL to propagate, return the AVL.
> +   Otherwise, return NULL_RTX as we don't need have any preferred
> +   AVL.  */
> +
> +static rtx
> +get_preferred_avl (insn_info *insn)
> +{
> +  /* TODO: We only do AVL propagation for missed-LEN partial
> +     autovectorization for now.  We could add more more AVL
> +     propagation for intrinsic codes in the future.  */
> +  return get_autovectorize_preferred_avl (insn);
> +}
> +
> +/* Return the AVL TYPE operand index.  */
> +static int
> +get_avl_type_index (insn_info *insn)
> +{
> +  extract_insn_cached (insn->rtl ());
> +  /* Except rounding mode patterns, AVL TYPE operand
> +     is always the last operand.  */
> +  if (find_access (insn->uses (), VXRM_REGNUM)
> +      || find_access (insn->uses (), FRM_REGNUM))
> +    return recog_data.n_operands - 2;
> +  return recog_data.n_operands - 1;

Could we add some attribute like `vl_op_idx`? maintain this magic here
is not good idea IMO.

> +}
> +
> +/* Main entry point for this pass.  */
> +unsigned int
> +pass_avlprop::execute (function *)
> +{
> +  avlprop_init ();
> +
> +  /* Go through all the instructions looking for AVL that we could propagate. */
> +
> +  insn_info *next;
> +  bool change_p = true;
> +
> +  while (change_p)
> +    {
> +      /* Iterate on each instruction until no more change need.  */
> +      change_p = false;
> +      for (insn_info *insn = crtl->ssa->first_insn (); insn; insn = next)

Backward should converge faster, also I suggest add a pre-scan pass to
collect all candidate, and then iterate those candidate only.

Maybe something like this:

for each insn in reverse order:
   if insn is candidate:
      put insn to candidate list

while (change_p)
{
  for each insn in candidate:
    ...
}

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-10-26  7:50 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-24  3:32 [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization Juzhe-Zhong
2023-10-24  3:44 ` juzhe.zhong
2023-10-24  4:30   ` Patrick O'Neill
2023-10-24 15:03     ` Patrick O'Neill
2023-10-25 12:20       ` juzhe.zhong
2023-10-26  0:37         ` Patrick O'Neill
2023-10-26  0:49           ` juzhe.zhong
2023-10-26  1:22             ` Patrick O'Neill
2023-10-26  1:27               ` juzhe.zhong
2023-10-26  7:33                 ` Li, Pan2
2023-10-26  7:48                   ` juzhe.zhong
2023-10-26  7:50                     ` Robin Dapp
2023-10-24 15:26 ` Kito Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).