public inbox for gcc-patches@gcc.gnu.org
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, palmer@dabbelt.com,
	palmer@rivosinc.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com,
	Juzhe-Zhong <juzhe.zhong@rivai.ai>
Subject: [PATCH] RISC-V: Refactor the framework of RVV auto-vectorization
Date: Tue, 23 May 2023 14:08:04 +0800	[thread overview]
Message-ID: <20230523060804.61556-1-juzhe.zhong@rivai.ai> (raw)

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

This patch refactors the framework of RVV auto-vectorization.
We found that we keep adding helpers && wrappers when implementing
auto-vectorization, which makes the RVV auto-vectorization code very messy.

After double-checking my downstream RVV GCC, I assembled all the
auto-vectorization patterns we are going to need.  Based on this information,
I refactored the RVV framework to make it easier and more flexible for
future use.

For example, we will definitely implement len_mask_load/len_mask_store
patterns, which have both a length && a mask operand and use an undefined
merge operand.

len_cond_div or cond_div will have a length or a mask operand and use a real
merge operand instead of an undefined merge operand.

Also, we will have some patterns that use tail undisturbed and mask any.

And so on; we will definitely have various feature combinations.

Based on these circumstances, we add the following private members:
  
  int m_op_num;
  /* It's true when the pattern has a dest operand.  Most of the patterns have
     a dest operand whereas some patterns like STOREs do not have one.
  */
  bool m_has_dest_p;
  /* It's true if the pattern uses an all-trues mask operand.  */
  bool m_use_all_trues_mask_p;
  /* It's true if the pattern uses undefined merge operand.  */
  bool m_use_undef_merge_p;
  bool m_has_avl_p;
  bool m_vlmax_p;
  bool m_has_tail_policy_p;
  bool m_has_mask_policy_p;
  enum tail_policy m_tail_policy;
  enum mask_policy m_mask_policy;
  machine_mode m_dest_mode;
  machine_mode m_mask_mode;

I believe these variables can cover all potential situations.
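The way these flags drive operand construction can be illustrated with a small
self-contained sketch (plain C++ with hypothetical names; the real class is the
templated insn_expander, which works on expand_operand arrays rather than
strings):

```cpp
#include <cstddef>
#include <string>
#include <vector>

/* Hypothetical miniature of the flag-driven expander idea: the flags record
   which implicit operands a pattern wants, and emit () appends them in a
   fixed order around the explicit operands, mirroring how insn_expander adds
   mask/merge/avl/policy operands.  */
struct mini_expander
{
  bool use_all_trues_mask_p;
  bool use_undef_merge_p;
  bool has_avl_p;
  bool has_tail_policy_p;
  bool has_mask_policy_p;

  std::vector<std::string>
  emit (const std::vector<std::string> &explicit_ops) const
  {
    std::vector<std::string> ops;
    ops.push_back (explicit_ops[0]);       /* destination operand  */
    if (use_all_trues_mask_p)
      ops.push_back ("all-ones-mask");     /* implicit mask operand  */
    if (use_undef_merge_p)
      ops.push_back ("undef-merge");       /* implicit merge operand  */
    for (std::size_t i = 1; i < explicit_ops.size (); i++)
      ops.push_back (explicit_ops[i]);     /* source operands  */
    if (has_avl_p)
      ops.push_back ("avl");               /* length operand  */
    if (has_tail_policy_p)
      ops.push_back ("tail-policy");
    if (has_mask_policy_p)
      ops.push_back ("mask-policy");
    return ops;
  }
};
```

With all flags set, a binary op's three explicit operands get surrounded by
mask, merge, AVL and policy operands, roughly mirroring the operand order of
the predicated patterns in vector.md.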

The instruction generator wrapper is "emit_insn", which adds the operands and
emits the instruction according to the variables mentioned above.

Once this is done, we can easily add helpers without changing the base class
"insn_expander".

Currently, we have "emit_vlmax_tany_many" and "emit_nonvlmax_tany_many".

For example, when we want to emit a binary operation, we have:

#define RVV_BINOP_NUM 3 (operand count, including the output)

Then we just use emit_vlmax_tany_many (...RVV_BINOP_NUM...).

So if we support ternary operations in the future, it's quite simple:
#define RVV_TERNOP_NUM 4 (operand count, including the output)
emit_vlmax_tany_many (...RVV_TERNOP_NUM...)
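
Concretely, a call site for a binary operation then collapses to the shape
used throughout the hunks below (a GCC-internals fragment, not self-contained;
icode, dest, src1 and src2 are assumed to be in scope):

```
  rtx ops[RVV_BINOP_NUM] = {dest, src1, src2};
  riscv_vector::emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops);
```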

"*_tany_many" means we are using tail any and mask any.

We will definitely need tail undisturbed or mask undisturbed once we support
such patterns in the middle-end.  It's very simple to extend such a helper
based on the current framework.

We can do that in the future like this:

void
emit_nonvlmax_tu_mu (unsigned icode, int op_num, rtx *ops)
{
  machine_mode data_mode = GET_MODE (ops[0]);
  machine_mode mask_mode = get_mask_mode (data_mode).require ();
  /* The number 11 is because we have a maximum of 11 operands for
     RVV instruction patterns according to vector.md.  */
  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
		       /*USE_ALL_TRUES_MASK_P*/ true,
		       /*USE_UNDEF_MERGE_P*/ true, /*HAS_AVL_P*/ true,
		       /*VLMAX_P*/ false,
		       /*HAS_TAIL_POLICY_P*/ true, /*HAS_MASK_POLICY_P*/ true,
		       /*TAIL_POLICY*/ TAIL_UNDISTURBED, /*MASK_POLICY*/ MASK_UNDISTURBED,
		       /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
  e.emit_insn ((enum insn_code) icode, ops);
}

That's enough (I have fully tested it in my downstream RVV GCC).
I didn't add it in this patch.

Thanks.

gcc/ChangeLog:

        * config/riscv/autovec.md: Refactor the framework of RVV auto-vectorization.
        * config/riscv/riscv-protos.h (RVV_MISC_OP_NUM): New macro.
        (RVV_UNOP_NUM): Ditto.
        (RVV_BINOP_NUM): Ditto.
        (legitimize_move): Refactor the framework of RVV auto-vectorization.
        (emit_vlmax_op): Ditto.
        (emit_vlmax_reg_op): Ditto.
        (emit_len_op): Ditto.
        (emit_len_binop): Ditto.
        (emit_vlmax_tany_many): Ditto.
        (emit_nonvlmax_tany_many): Ditto.
        (sew64_scalar_helper): Ditto.
        (expand_tuple_move): Ditto.
        * config/riscv/riscv-v.cc (emit_pred_op): Ditto.
        (emit_pred_binop): Ditto.
        (emit_vlmax_op): Ditto.
        (emit_vlmax_tany_many): New function.
        (emit_len_op): Remove.
        (emit_nonvlmax_tany_many): New function.
        (emit_vlmax_reg_op): Remove.
        (emit_len_binop): Ditto.
        (emit_index_op): Ditto.
        (expand_vec_series): Refactor the framework of RVV auto-vectorization.
        (expand_const_vector): Ditto.
        (legitimize_move): Ditto.
        (sew64_scalar_helper): Ditto.
        (expand_tuple_move): Ditto.
        (expand_vector_init_insert_elems): Ditto.
        * config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
        * config/riscv/vector.md: Ditto.

---
 gcc/config/riscv/autovec.md     |  40 ++--
 gcc/config/riscv/riscv-protos.h |  16 +-
 gcc/config/riscv/riscv-v.cc     | 341 +++++++++++++++++---------------
 gcc/config/riscv/riscv.cc       |   8 +-
 gcc/config/riscv/vector.md      |  40 +---
 5 files changed, 216 insertions(+), 229 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index ce0b46537ad..24405b869fa 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -31,8 +31,8 @@
    (match_operand 3 "const_0_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::emit_len_op (code_for_pred_mov (<MODE>mode), operands[0],
-			     operands[1], operands[2], <VM>mode);
+  riscv_vector::emit_nonvlmax_tany_many (code_for_pred_mov (<MODE>mode),
+  					 RVV_UNOP_NUM, operands);
   DONE;
 })
 
@@ -43,8 +43,8 @@
    (match_operand 3 "const_0_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::emit_len_op (code_for_pred_mov (<MODE>mode), operands[0],
-			     operands[1], operands[2], <VM>mode);
+  riscv_vector::emit_nonvlmax_tany_many (code_for_pred_mov (<MODE>mode),
+  					 RVV_UNOP_NUM, operands);
   DONE;
 })
 
@@ -118,21 +118,8 @@
      (match_operand:VI 2 "<binop_rhs2_predicate>")))]
   "TARGET_VECTOR"
 {
-  if (!register_operand (operands[2], <MODE>mode))
-    {
-      rtx cst;
-      gcc_assert (const_vec_duplicate_p(operands[2], &cst));
-      riscv_vector::emit_len_binop (code_for_pred_scalar
-				    (<CODE>, <MODE>mode),
-				    operands[0], operands[1], cst,
-				    NULL, <VM>mode,
-				    <VEL>mode);
-    }
-  else
-    riscv_vector::emit_len_binop (code_for_pred
-				  (<CODE>, <MODE>mode),
-				  operands[0], operands[1], operands[2],
-				  NULL, <VM>mode);
+  riscv_vector::emit_vlmax_tany_many (code_for_pred (<CODE>, <MODE>mode),
+				      RVV_BINOP_NUM, operands);
   DONE;
 })
 
@@ -151,12 +138,9 @@
      (match_operand:<VEL> 2 "csr_operand")))]
   "TARGET_VECTOR"
 {
-  if (!CONST_SCALAR_INT_P (operands[2]))
-      operands[2] = gen_lowpart (Pmode, operands[2]);
-  riscv_vector::emit_len_binop (code_for_pred_scalar
-				(<CODE>, <MODE>mode),
-				operands[0], operands[1], operands[2],
-				NULL_RTX, <VM>mode, Pmode);
+  operands[2] = gen_lowpart (Pmode, operands[2]);
+  riscv_vector::emit_vlmax_tany_many (code_for_pred_scalar (<CODE>, <MODE>mode),
+				      RVV_BINOP_NUM, operands);
   DONE;
 })
 
@@ -174,9 +158,7 @@
      (match_operand:VI 2 "vector_shift_operand")))]
   "TARGET_VECTOR"
 {
-  riscv_vector::emit_len_binop (code_for_pred
-				(<CODE>, <MODE>mode),
-				operands[0], operands[1], operands[2],
-				NULL_RTX, <VM>mode);
+  riscv_vector::emit_vlmax_tany_many (code_for_pred (<CODE>, <MODE>mode),
+				      RVV_BINOP_NUM, operands);
   DONE;
 })
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 12634d0ac1a..ba6d56517d3 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -132,6 +132,9 @@ namespace riscv_vector {
 #define RVV_VUNDEF(MODE)                                                       \
   gen_rtx_UNSPEC (MODE, gen_rtvec (1, gen_rtx_REG (SImode, X0_REGNUM)),        \
 		  UNSPEC_VUNDEF)
+#define RVV_MISC_OP_NUM 1
+#define RVV_UNOP_NUM 2
+#define RVV_BINOP_NUM 3
 enum vlmul_type
 {
   LMUL_1 = 0,
@@ -163,14 +166,11 @@ rtx expand_builtin (unsigned int, tree, rtx);
 bool check_builtin_call (location_t, vec<location_t>, unsigned int,
 			   tree, unsigned int, tree *);
 bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
-bool legitimize_move (rtx, rtx, machine_mode);
+bool legitimize_move (rtx, rtx);
 void emit_vlmax_vsetvl (machine_mode, rtx);
 void emit_hard_vlmax_vsetvl (machine_mode, rtx);
-void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);
-void emit_vlmax_reg_op (unsigned, rtx, rtx, rtx, machine_mode);
-void emit_len_op (unsigned, rtx, rtx, rtx, machine_mode);
-void emit_len_binop (unsigned, rtx, rtx, rtx, rtx, machine_mode,
-		     machine_mode = VOIDmode);
+void emit_vlmax_tany_many (unsigned, int, rtx *);
+void emit_nonvlmax_tany_many (unsigned, int, rtx *);
 enum vlmul_type get_vlmul (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
@@ -202,7 +202,7 @@ bool neg_simm5_p (rtx);
 #ifdef RTX_CODE
 bool has_vi_variant_p (rtx_code, rtx);
 #endif
-bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode, machine_mode,
+bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
 			  bool, void (*)(rtx *, rtx));
 rtx gen_scalar_move_mask (machine_mode);
 
@@ -218,7 +218,7 @@ enum vlen_enum
 bool slide1_sew64_helper (int, machine_mode, machine_mode,
 			  machine_mode, rtx *);
 rtx gen_avl_for_scalar_move (rtx);
-void expand_tuple_move (machine_mode, rtx *);
+void expand_tuple_move (rtx *);
 machine_mode preferred_simd_mode (scalar_mode);
 opt_machine_mode get_mask_mode (machine_mode);
 void expand_vec_series (rtx, rtx, rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e0b19bc1754..980928c8aff 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -66,7 +66,29 @@ const_vlmax_p (machine_mode mode)
 template <int MAX_OPERANDS> class insn_expander
 {
 public:
-  insn_expander () : m_opno (0), m_has_dest_p(false) {}
+  insn_expander ()
+    : m_opno (0), m_op_num (0), m_has_dest_p (false),
+      m_use_all_trues_mask_p (false), m_use_undef_merge_p (false),
+      m_has_avl_p (false), m_vlmax_p (false), m_has_tail_policy_p (false),
+      m_has_mask_policy_p (false), m_tail_policy (TAIL_ANY),
+      m_mask_policy (MASK_ANY), m_dest_mode (VOIDmode), m_mask_mode (VOIDmode)
+  {}
+
+  /* Initializer for various configurations.  */
+  insn_expander (int op_num, bool has_dest_p, bool use_all_trues_mask_p,
+		 bool use_undef_merge_p, bool has_avl_p, bool vlmax_p,
+		 bool has_tail_policy_p, bool has_mask_policy_p,
+		 enum tail_policy tail_policy, enum mask_policy mask_policy,
+		 machine_mode dest_mode, machine_mode mask_mode)
+    : m_opno (0), m_op_num (op_num), m_has_dest_p (has_dest_p),
+      m_use_all_trues_mask_p (use_all_trues_mask_p),
+      m_use_undef_merge_p (use_undef_merge_p), m_has_avl_p (has_avl_p),
+      m_vlmax_p (vlmax_p), m_has_tail_policy_p (has_tail_policy_p),
+      m_has_mask_policy_p (has_mask_policy_p), m_tail_policy (tail_policy),
+      m_mask_policy (mask_policy), m_dest_mode (dest_mode),
+      m_mask_mode (mask_mode)
+  {}
+
   void add_output_operand (rtx x, machine_mode mode)
   {
     create_output_operand (&m_ops[m_opno++], x, mode);
@@ -77,67 +99,94 @@ public:
     create_input_operand (&m_ops[m_opno++], x, mode);
     gcc_assert (m_opno <= MAX_OPERANDS);
   }
-  void add_all_one_mask_operand (machine_mode mode)
+  void add_all_one_mask_operand ()
   {
-    add_input_operand (CONSTM1_RTX (mode), mode);
+    add_input_operand (CONSTM1_RTX (m_mask_mode), m_mask_mode);
   }
-  void add_vundef_operand (machine_mode mode)
+  void add_vundef_operand ()
   {
-    add_input_operand (RVV_VUNDEF (mode), mode);
+    add_input_operand (RVV_VUNDEF (m_dest_mode), m_dest_mode);
   }
-  void add_policy_operand (enum tail_policy vta, enum mask_policy vma)
+  void add_policy_operand ()
   {
-    rtx tail_policy_rtx = gen_int_mode (vta, Pmode);
-    rtx mask_policy_rtx = gen_int_mode (vma, Pmode);
-    add_input_operand (tail_policy_rtx, Pmode);
-    add_input_operand (mask_policy_rtx, Pmode);
+    if (m_has_tail_policy_p)
+      {
+	rtx tail_policy_rtx = gen_int_mode (m_tail_policy, Pmode);
+	add_input_operand (tail_policy_rtx, Pmode);
+      }
+    if (m_has_mask_policy_p)
+      {
+	rtx mask_policy_rtx = gen_int_mode (m_mask_policy, Pmode);
+	add_input_operand (mask_policy_rtx, Pmode);
+      }
   }
   void add_avl_type_operand (avl_type type)
   {
     add_input_operand (gen_int_mode (type, Pmode), Pmode);
   }
 
-  void set_dest_and_mask (rtx mask, rtx dest, machine_mode mask_mode)
+  void emit_insn (enum insn_code icode, rtx *ops)
   {
-    m_dest_mode = GET_MODE (dest);
-    m_has_dest_p = true;
-
-    add_output_operand (dest, m_dest_mode);
-
-    if (mask)
-      add_input_operand (mask, GET_MODE (mask));
-    else
-      add_all_one_mask_operand (mask_mode);
-
-    add_vundef_operand (m_dest_mode);
-  }
+    int opno = 0;
+    /* It's true if any operand is memory operand.  */
+    bool any_mem_p = false;
+    /* It's true if all operands are mask operand.  */
+    bool all_mask_p = true;
+    if (m_has_dest_p)
+      {
+	any_mem_p |= MEM_P (ops[opno]);
+	all_mask_p &= GET_MODE_CLASS (GET_MODE (ops[opno])) == MODE_VECTOR_BOOL;
+	add_output_operand (ops[opno++], m_dest_mode);
+      }
 
-  void set_len_and_policy (rtx len, bool force_vlmax = false)
-    {
-      bool vlmax_p = force_vlmax || !len;
-      gcc_assert (m_has_dest_p);
+    if (m_use_all_trues_mask_p)
+      add_all_one_mask_operand ();
 
-      if (vlmax_p && const_vlmax_p (m_dest_mode))
-	{
-	  /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of the
-	     vsetvli to obtain the value of vlmax.  */
-	  poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
-	  len = gen_int_mode (nunits, Pmode);
-	  vlmax_p = false; /* It has became NONVLMAX now.  */
-	}
-      else if (!len)
-	{
-	  len = gen_reg_rtx (Pmode);
-	  emit_vlmax_vsetvl (m_dest_mode, len);
-	}
+    if (m_use_undef_merge_p)
+      add_vundef_operand ();
 
-      add_input_operand (len, Pmode);
+    for (; opno < m_op_num; opno++)
+      {
+	any_mem_p |= MEM_P (ops[opno]);
+	all_mask_p &= GET_MODE_CLASS (GET_MODE (ops[opno])) == MODE_VECTOR_BOOL;
+	machine_mode mode = insn_data[(int) icode].operand[m_opno].mode;
+	/* create_input_operand doesn't allow VOIDmode.
+	   According to vector.md, we may have some patterns that do not have
+	   an explicit machine mode specifying the operand.  Such operands are
+	   always Pmode.  */
+	if (mode == VOIDmode)
+	  mode = Pmode;
+	add_input_operand (ops[opno], mode);
+      }
 
-      if (GET_MODE_CLASS (m_dest_mode) != MODE_VECTOR_BOOL)
-	add_policy_operand (get_prefer_tail_policy (), get_prefer_mask_policy ());
+    if (m_has_avl_p)
+      {
+	rtx len = ops[m_op_num];
+	if (m_vlmax_p)
+	  {
+	    if (const_vlmax_p (m_dest_mode))
+	      {
+		/* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
+		   the vsetvli to obtain the value of vlmax.  */
+		poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
+		len = gen_int_mode (nunits, Pmode);
+		m_vlmax_p = false; /* It has become NONVLMAX now.  */
+	      }
+	    else if (can_create_pseudo_p ())
+	      {
+		len = gen_reg_rtx (Pmode);
+		emit_vlmax_vsetvl (m_dest_mode, len);
+	      }
+	  }
+	add_input_operand (len, Pmode);
+      }
 
-      add_avl_type_operand (vlmax_p ? avl_type::VLMAX : avl_type::NONVLMAX);
-    }
+    if (!all_mask_p)
+      add_policy_operand ();
+    if (m_has_avl_p)
+      add_avl_type_operand (m_vlmax_p ? avl_type::VLMAX : avl_type::NONVLMAX);
+    expand (icode, any_mem_p);
+  }
 
   void expand (enum insn_code icode, bool temporary_volatile_p = false)
   {
@@ -152,8 +201,23 @@ public:
 
 private:
   int m_opno;
+  int m_op_num;
+  /* It's true when the pattern has a dest operand.  Most of the patterns have
+     a dest operand whereas some patterns like STOREs do not have one.
+  */
   bool m_has_dest_p;
+  /* It's true if the pattern uses an all-trues mask operand.  */
+  bool m_use_all_trues_mask_p;
+  /* It's true if the pattern uses undefined merge operand.  */
+  bool m_use_undef_merge_p;
+  bool m_has_avl_p;
+  bool m_vlmax_p;
+  bool m_has_tail_policy_p;
+  bool m_has_mask_policy_p;
+  enum tail_policy m_tail_policy;
+  enum mask_policy m_mask_policy;
   machine_mode m_dest_mode;
+  machine_mode m_mask_mode;
   expand_operand m_ops[MAX_OPERANDS];
 };
 
@@ -246,49 +310,6 @@ autovec_use_vlmax_p (void)
 	  || riscv_autovec_preference == RVV_FIXED_VLMAX);
 }
 
-/* Emit an RVV unmask && vl mov from SRC to DEST.  */
-static void
-emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
-	      machine_mode mask_mode, bool force_vlmax = false)
-{
-  insn_expander<8> e;
-  e.set_dest_and_mask (mask, dest, mask_mode);
-
-  e.add_input_operand (src, GET_MODE (src));
-
-  e.set_len_and_policy (len, force_vlmax);
-
-  e.expand ((enum insn_code) icode, MEM_P (dest) || MEM_P (src));
-}
-
-/* Emit an RVV binop.  If one of SRC1 and SRC2 is a scalar operand, its mode is
-   specified using SCALAR_MODE.  */
-static void
-emit_pred_binop (unsigned icode, rtx mask, rtx dest, rtx src1, rtx src2,
-		 rtx len, machine_mode mask_mode,
-		 machine_mode scalar_mode = VOIDmode)
-{
-  insn_expander<9> e;
-  e.set_dest_and_mask (mask, dest, mask_mode);
-
-  gcc_assert (VECTOR_MODE_P (GET_MODE (src1))
-	      || VECTOR_MODE_P (GET_MODE (src2)));
-
-  if (VECTOR_MODE_P (GET_MODE (src1)))
-    e.add_input_operand (src1, GET_MODE (src1));
-  else
-    e.add_input_operand (src1, scalar_mode);
-
-  if (VECTOR_MODE_P (GET_MODE (src2)))
-    e.add_input_operand (src2, GET_MODE (src2));
-  else
-    e.add_input_operand (src2, scalar_mode);
-
-  e.set_len_and_policy (len);
-
-  e.expand ((enum insn_code) icode, MEM_P (dest) || MEM_P (src1) || MEM_P (src2));
-}
-
 /* The RISC-V vsetvli pass uses "known vlmax" operations for optimization.
    Whether or not an instruction actually is a vlmax operation is not
    recognizable from the length operand alone but the avl_type operand
@@ -305,52 +326,42 @@ emit_pred_binop (unsigned icode, rtx mask, rtx dest, rtx src1, rtx src2,
     For that case we also allow to set the avl_type to VLMAX.
 */
 
-/* This function emits a VLMAX vsetvli followed by the actual operation.  */
+/* This function emits a {VLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
+ * actual operation.  */
 void
-emit_vlmax_op (unsigned icode, rtx dest, rtx src, machine_mode mask_mode)
+emit_vlmax_tany_many (unsigned icode, int op_num, rtx *ops)
 {
-  emit_pred_op (icode, NULL_RTX, dest, src, NULL_RTX, mask_mode);
+  machine_mode data_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (data_mode).require ();
+  /* The number 11 is because we have a maximum of 11 operands for
+     RVV instruction patterns according to vector.md.  */
+  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
+		       /*USE_ALL_TRUES_MASK_P*/ true,
+		       /*USE_UNDEF_MERGE_P*/ true, /*HAS_AVL_P*/ true,
+		       /*VLMAX_P*/ true,
+		       /*HAS_TAIL_POLICY_P*/ true, /*HAS_MASK_POLICY_P*/ true,
+		       /*TAIL_POLICY*/ TAIL_ANY, /*MASK_POLICY*/ MASK_ANY,
+		       /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
+  e.emit_insn ((enum insn_code) icode, ops);
 }
 
-/* This function emits an operation with a given LEN that is determined
-   by a previously emitted VLMAX vsetvli.  */
+/* This function emits a {NONVLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
+ * actual operation.  */
 void
-emit_len_op (unsigned icode, rtx dest, rtx src, rtx len,
-	     machine_mode mask_mode)
+emit_nonvlmax_tany_many (unsigned icode, int op_num, rtx *ops)
 {
-  emit_pred_op (icode, NULL_RTX, dest, src, len, mask_mode);
-}
-
-/* This function emits an operation with a given LEN that is known to be
-   a preceding VLMAX.  It also sets the VLMAX flag which allows further
-   optimization in the vsetvli pass.  */
-void
-emit_vlmax_reg_op (unsigned icode, rtx dest, rtx src, rtx len,
-		   machine_mode mask_mode)
-{
-  emit_pred_op (icode, NULL_RTX, dest, src, len, mask_mode,
-		/* Force VLMAX */ true);
-}
-
-void
-emit_len_binop (unsigned icode, rtx dest, rtx src1, rtx src2, rtx len,
-		machine_mode mask_mode, machine_mode scalar_mode)
-{
-  emit_pred_binop (icode, NULL_RTX, dest, src1, src2, len,
-		   mask_mode, scalar_mode);
-}
-
-/* Emit vid.v instruction.  */
-
-static void
-emit_index_op (rtx dest, machine_mode mask_mode)
-{
-  insn_expander<7> e;
-  e.set_dest_and_mask (NULL, dest, mask_mode);
-
-  e.set_len_and_policy (NULL, true);
-
-  e.expand (code_for_pred_series (GET_MODE (dest)), false);
+  machine_mode data_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (data_mode).require ();
+  /* The number 11 is because we have a maximum of 11 operands for
+     RVV instruction patterns according to vector.md.  */
+  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
+		       /*USE_ALL_TRUES_MASK_P*/ true,
+		       /*USE_UNDEF_MERGE_P*/ true, /*HAS_AVL_P*/ true,
+		       /*VLMAX_P*/ false,
+		       /*HAS_TAIL_POLICY_P*/ true, /*HAS_MASK_POLICY_P*/ true,
+		       /*TAIL_POLICY*/ TAIL_ANY, /*MASK_POLICY*/ MASK_ANY,
+		       /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
+  e.emit_insn ((enum insn_code) icode, ops);
 }
 
 /* Expand series const vector.  */
@@ -359,7 +370,6 @@ void
 expand_vec_series (rtx dest, rtx base, rtx step)
 {
   machine_mode mode = GET_MODE (dest);
-  machine_mode inner_mode = GET_MODE_INNER (mode);
   machine_mode mask_mode;
   gcc_assert (get_mask_mode (mode).exists (&mask_mode));
 
@@ -367,7 +377,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
 
   /* Step 1: Generate I = { 0, 1, 2, ... } by vid.v.  */
   rtx vid = gen_reg_rtx (mode);
-  emit_index_op (vid, mask_mode);
+  rtx op[1] = {vid};
+  emit_vlmax_tany_many (code_for_pred_series (mode), RVV_MISC_OP_NUM, op);
 
   /* Step 2: Generate I * STEP.
      - STEP is 1, we don't emit any instructions.
@@ -385,14 +396,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
 	  int shift = exact_log2 (INTVAL (step));
 	  rtx shift_amount = gen_int_mode (shift, Pmode);
 	  insn_code icode = code_for_pred_scalar (ASHIFT, mode);
-	  emit_len_binop (icode, step_adj, vid, shift_amount,
-			  NULL, mask_mode, Pmode);
+	  rtx ops[3] = {step_adj, vid, shift_amount};
+	  emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops);
 	}
       else
 	{
 	  insn_code icode = code_for_pred_scalar (MULT, mode);
-	  emit_len_binop (icode, step_adj, vid, step,
-			  NULL, mask_mode, inner_mode);
+	  rtx ops[3] = {step_adj, vid, step};
+	  emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops);
 	}
     }
 
@@ -407,14 +418,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
     {
       rtx result = gen_reg_rtx (mode);
       insn_code icode = code_for_pred_scalar (PLUS, mode);
-      emit_len_binop (icode, result, step_adj, base,
-			   NULL, mask_mode, inner_mode);
+      rtx ops[3] = {result, step_adj, base};
+      emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops);
       emit_move_insn (dest, result);
     }
 }
 
 static void
-expand_const_vector (rtx target, rtx src, machine_mode mask_mode)
+expand_const_vector (rtx target, rtx src)
 {
   machine_mode mode = GET_MODE (target);
   scalar_mode elt_mode = GET_MODE_INNER (mode);
@@ -424,7 +435,8 @@ expand_const_vector (rtx target, rtx src, machine_mode mask_mode)
       gcc_assert (
 	const_vec_duplicate_p (src, &elt)
 	&& (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
-      emit_vlmax_op (code_for_pred_mov (mode), target, src, mask_mode);
+      rtx ops[2] = {target, src};
+      emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops);
       return;
     }
 
@@ -435,10 +447,16 @@ expand_const_vector (rtx target, rtx src, machine_mode mask_mode)
       /* Element in range -16 ~ 15 integer or 0.0 floating-point,
 	 we use vmv.v.i instruction.  */
       if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
-	emit_vlmax_op (code_for_pred_mov (mode), tmp, src, mask_mode);
+	{
+	  rtx ops[2] = {tmp, src};
+	  emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops);
+	}
       else
-	emit_vlmax_op (code_for_pred_broadcast (mode), tmp,
-		       force_reg (elt_mode, elt), mask_mode);
+	{
+	  elt = force_reg (elt_mode, elt);
+	  rtx ops[2] = {tmp, elt};
+	  emit_vlmax_tany_many (code_for_pred_broadcast (mode), RVV_UNOP_NUM, ops);
+	}
 
       if (tmp != target)
 	emit_move_insn (target, tmp);
@@ -463,12 +481,12 @@ expand_const_vector (rtx target, rtx src, machine_mode mask_mode)
 /* Expand a pre-RA RVV data move from SRC to DEST.
    It expands move for RVV fractional vector modes.  */
 bool
-legitimize_move (rtx dest, rtx src, machine_mode mask_mode)
+legitimize_move (rtx dest, rtx src)
 {
   machine_mode mode = GET_MODE (dest);
   if (CONST_VECTOR_P (src))
     {
-      expand_const_vector (dest, src, mask_mode);
+      expand_const_vector (dest, src);
       return true;
     }
 
@@ -505,7 +523,10 @@ legitimize_move (rtx dest, rtx src, machine_mode mask_mode)
     {
       rtx tmp = gen_reg_rtx (mode);
       if (MEM_P (src))
-	emit_vlmax_op (code_for_pred_mov (mode), tmp, src, mask_mode);
+	{
+	  rtx ops[2] = {tmp, src};
+	  emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops);
+	}
       else
 	emit_move_insn (tmp, src);
       src = tmp;
@@ -514,7 +535,8 @@ legitimize_move (rtx dest, rtx src, machine_mode mask_mode)
   if (satisfies_constraint_vu (src))
     return false;
 
-  emit_vlmax_op (code_for_pred_mov (mode), dest, src, mask_mode);
+  rtx ops[2] = {dest, src};
+  emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops);
   return true;
 }
 
@@ -748,8 +770,7 @@ has_vi_variant_p (rtx_code code, rtx x)
 
 bool
 sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx vl,
-		     machine_mode vector_mode, machine_mode mask_mode,
-		     bool has_vi_variant_p,
+		     machine_mode vector_mode, bool has_vi_variant_p,
 		     void (*emit_vector_func) (rtx *, rtx))
 {
   machine_mode scalar_mode = GET_MODE_INNER (vector_mode);
@@ -779,8 +800,9 @@ sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx vl,
     *scalar_op = force_reg (scalar_mode, *scalar_op);
 
   rtx tmp = gen_reg_rtx (vector_mode);
-  riscv_vector::emit_len_op (code_for_pred_broadcast (vector_mode), tmp,
-			     *scalar_op, vl, mask_mode);
+  rtx ops[3] = {tmp, *scalar_op, vl};
+  riscv_vector::emit_nonvlmax_tany_many (code_for_pred_broadcast (vector_mode),
+					 RVV_UNOP_NUM, ops);
   emit_vector_func (operands, tmp);
 
   return true;
@@ -990,7 +1012,7 @@ gen_avl_for_scalar_move (rtx avl)
 
 /* Expand tuple modes data movement for.  */
 void
-expand_tuple_move (machine_mode mask_mode, rtx *ops)
+expand_tuple_move (rtx *ops)
 {
   unsigned int i;
   machine_mode tuple_mode = GET_MODE (ops[0]);
@@ -1086,8 +1108,11 @@ expand_tuple_move (machine_mode mask_mode, rtx *ops)
 	      rtx mem = gen_rtx_MEM (subpart_mode, ops[3]);
 
 	      if (fractional_p)
-		emit_vlmax_reg_op (code_for_pred_mov (subpart_mode), subreg, mem,
-			       ops[4], mask_mode);
+		{
+		  rtx operands[3] = {subreg, mem, ops[4]};
+		  emit_vlmax_tany_many (code_for_pred_mov (subpart_mode),
+					RVV_UNOP_NUM, operands);
+		}
 	      else
 		emit_move_insn (subreg, mem);
 	    }
@@ -1108,8 +1133,11 @@ expand_tuple_move (machine_mode mask_mode, rtx *ops)
 	      rtx mem = gen_rtx_MEM (subpart_mode, ops[3]);
 
 	      if (fractional_p)
-		emit_vlmax_reg_op (code_for_pred_mov (subpart_mode), mem, subreg,
-			       ops[4], mask_mode);
+		{
+		  rtx operands[3] = {mem, subreg, ops[4]};
+		  emit_vlmax_tany_many (code_for_pred_mov (subpart_mode),
+					RVV_UNOP_NUM, operands);
+		}
 	      else
 		emit_move_insn (mem, subreg);
 	    }
@@ -1230,7 +1258,6 @@ expand_vector_init_insert_elems (rtx target, const rvv_builder &builder,
 				 int nelts_reqd)
 {
   machine_mode mode = GET_MODE (target);
-  scalar_mode elem_mode = GET_MODE_INNER (mode);
   machine_mode mask_mode;
   gcc_assert (get_mask_mode (mode).exists (&mask_mode));
   rtx dup = expand_vector_broadcast (mode, builder.elt (0));
@@ -1241,8 +1268,8 @@ expand_vector_init_insert_elems (rtx target, const rvv_builder &builder,
       unsigned int unspec
 	= FLOAT_MODE_P (mode) ? UNSPEC_VFSLIDE1DOWN : UNSPEC_VSLIDE1DOWN;
       insn_code icode = code_for_pred_slide (unspec, mode);
-      emit_len_binop (icode, target, target, builder.elt (i), NULL, mask_mode,
-		      elem_mode);
+      rtx ops[3] = {target, target, builder.elt (i)};
+      emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops);
     }
 }
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5ac187c1b1b..109483c8b1c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7389,9 +7389,6 @@ vector_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 	{
 	  rtx target = regno_reg_rtx[regno];
 	  machine_mode mode = GET_MODE (target);
-	  poly_uint16 nunits = GET_MODE_NUNITS (mode);
-	  machine_mode mask_mode
-	    = riscv_vector::get_vector_mode (BImode, nunits).require ();
 
 	  if (!emitted_vlmax_vsetvl)
 	    {
@@ -7399,8 +7396,9 @@ vector_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 	      emitted_vlmax_vsetvl = true;
 	    }
 
-	  riscv_vector::emit_vlmax_reg_op (code_for_pred_mov (mode), target,
-					   CONST0_RTX (mode), vl, mask_mode);
+	  rtx ops[3] = {target, CONST0_RTX (mode), vl};
+	  riscv_vector::emit_vlmax_tany_many (code_for_pred_mov (mode),
+					      RVV_UNOP_NUM, ops);
 
 	  SET_HARD_REG_BIT (zeroed_hardregs, regno);
 	}
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index e8bb7c5dec1..b6663973ba1 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -662,7 +662,7 @@
 	 before spilling. The clobber scratch is used by spilling fractional
 	 registers in IRA/LRA so it's too early.  */
 
-  if (riscv_vector::legitimize_move (operands[0], operands[1], <VM>mode))
+  if (riscv_vector::legitimize_move (operands[0], operands[1]))
     DONE;
 })
 
@@ -718,7 +718,7 @@
 	(match_operand:VB 1 "general_operand"))]
   "TARGET_VECTOR"
 {
-  if (riscv_vector::legitimize_move (operands[0], operands[1], <MODE>mode))
+  if (riscv_vector::legitimize_move (operands[0], operands[1]))
     DONE;
 })
 
@@ -760,9 +760,8 @@
   else
     {
       riscv_vector::emit_vlmax_vsetvl (<V_FRACT:MODE>mode, operands[2]);
-      riscv_vector::emit_vlmax_reg_op (code_for_pred_mov (<V_FRACT:MODE>mode),
-				       operands[0], operands[1], operands[2],
-				       <VM>mode);
+      riscv_vector::emit_vlmax_tany_many (code_for_pred_mov (<V_FRACT:MODE>mode),
+					  RVV_UNOP_NUM, operands);
     }
   DONE;
 })
@@ -781,9 +780,8 @@
   else
     {
       riscv_vector::emit_vlmax_vsetvl (<VB:MODE>mode, operands[2]);
-      riscv_vector::emit_vlmax_reg_op (code_for_pred_mov (<VB:MODE>mode),
-				       operands[0], operands[1], operands[2],
-				       <VB:MODE>mode);
+      riscv_vector::emit_vlmax_tany_many (code_for_pred_mov (<VB:MODE>mode),
+					  RVV_UNOP_NUM, operands);
     }
   DONE;
 })
@@ -806,7 +804,7 @@
 
     if (GET_CODE (operands[1]) == CONST_VECTOR)
       {
-        riscv_vector::expand_tuple_move (<VM>mode, operands);
+        riscv_vector::expand_tuple_move (operands);
         DONE;
       }
 
@@ -826,7 +824,7 @@
   "&& reload_completed"
   [(const_int 0)]
   {
-    riscv_vector::expand_tuple_move (<VM>mode, operands);
+    riscv_vector::expand_tuple_move (operands);
     DONE;
   }
   [(set_attr "type" "vmov,vlde,vste")
@@ -846,8 +844,8 @@
 	  (match_operand:<VEL> 1 "direct_broadcast_operand")))]
   "TARGET_VECTOR"
   {
-    riscv_vector::emit_vlmax_op (code_for_pred_broadcast (<MODE>mode),
-				 operands[0], operands[1], <VM>mode);
+    riscv_vector::emit_vlmax_tany_many (code_for_pred_broadcast (<MODE>mode),
+					RVV_UNOP_NUM, operands);
     DONE;
   }
 )
@@ -1272,7 +1270,6 @@
 	/* scalar op */&operands[3],
 	/* vl */operands[5],
 	<MODE>mode,
-	<VM>mode,
 	riscv_vector::simm5_p (operands[3]),
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_merge<mode> (operands[0], operands[1],
@@ -1983,7 +1980,6 @@
 	/* scalar op */&operands[4],
 	/* vl */operands[5],
 	<MODE>mode,
-	<VM>mode,
 	riscv_vector::has_vi_variant_p (<CODE>, operands[4]),
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_<optab><mode> (operands[0], operands[1],
@@ -2059,7 +2055,6 @@
 	/* scalar op */&operands[4],
 	/* vl */operands[5],
 	<MODE>mode,
-	<VM>mode,
 	riscv_vector::has_vi_variant_p (<CODE>, operands[4]),
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_<optab><mode> (operands[0], operands[1],
@@ -2135,7 +2130,6 @@
 	/* scalar op */&operands[4],
 	/* vl */operands[5],
 	<MODE>mode,
-	<VM>mode,
 	riscv_vector::neg_simm5_p (operands[4]),
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_sub<mode> (operands[0], operands[1],
@@ -2253,7 +2247,6 @@
 	/* scalar op */&operands[4],
 	/* vl */operands[5],
 	<MODE>mode,
-	<VM>mode,
 	false,
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_mulh<v_su><mode> (operands[0], operands[1],
@@ -2428,7 +2421,6 @@
 	/* scalar op */&operands[3],
 	/* vl */operands[5],
 	<MODE>mode,
-	<VM>mode,
 	riscv_vector::simm5_p (operands[3]),
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_adc<mode> (operands[0], operands[1],
@@ -2512,7 +2504,6 @@
 	/* scalar op */&operands[3],
 	/* vl */operands[5],
 	<MODE>mode,
-	<VM>mode,
 	false,
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_sbc<mode> (operands[0], operands[1],
@@ -2671,7 +2662,6 @@
 	/* scalar op */&operands[2],
 	/* vl */operands[4],
 	<MODE>mode,
-	<VM>mode,
 	riscv_vector::simm5_p (operands[2]),
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_madc<mode> (operands[0], operands[1],
@@ -2741,7 +2731,6 @@
 	/* scalar op */&operands[2],
 	/* vl */operands[4],
 	<MODE>mode,
-	<VM>mode,
 	false,
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_msbc<mode> (operands[0], operands[1],
@@ -2884,7 +2873,6 @@
 	/* scalar op */&operands[2],
 	/* vl */operands[3],
 	<MODE>mode,
-	<VM>mode,
 	riscv_vector::simm5_p (operands[2]),
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_madc<mode>_overflow (operands[0], operands[1],
@@ -2951,7 +2939,6 @@
 	/* scalar op */&operands[2],
 	/* vl */operands[3],
 	<MODE>mode,
-	<VM>mode,
 	false,
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_msbc<mode>_overflow (operands[0], operands[1],
@@ -3449,7 +3436,6 @@
 	/* scalar op */&operands[4],
 	/* vl */operands[5],
 	<MODE>mode,
-	<VM>mode,
 	riscv_vector::has_vi_variant_p (<CODE>, operands[4]),
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_<optab><mode> (operands[0], operands[1],
@@ -3531,7 +3517,6 @@
 	/* scalar op */&operands[4],
 	/* vl */operands[5],
 	<MODE>mode,
-	<VM>mode,
 	riscv_vector::has_vi_variant_p (<CODE>, operands[4]),
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_<optab><mode> (operands[0], operands[1],
@@ -3681,7 +3666,6 @@
 	/* scalar op */&operands[4],
 	/* vl */operands[5],
 	<MODE>mode,
-	<VM>mode,
 	false,
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_<sat_op><mode> (operands[0], operands[1],
@@ -4141,7 +4125,6 @@
 	/* scalar op */&operands[5],
 	/* vl */operands[6],
 	<MODE>mode,
-	<VM>mode,
 	riscv_vector::has_vi_variant_p (code, operands[5]),
 	code == LT || code == LTU ?
 	  [] (rtx *operands, rtx boardcast_scalar) {
@@ -4181,7 +4164,6 @@
 	/* scalar op */&operands[5],
 	/* vl */operands[6],
 	<MODE>mode,
-	<VM>mode,
 	riscv_vector::has_vi_variant_p (code, operands[5]),
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_cmp<mode> (operands[0], operands[1],
@@ -4880,7 +4862,6 @@
 	/* scalar op */&operands[2],
 	/* vl */operands[6],
 	<MODE>mode,
-	<VM>mode,
 	false,
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_mul_plus<mode> (operands[0], operands[1],
@@ -5301,7 +5282,6 @@
 	/* scalar op */&operands[2],
 	/* vl */operands[6],
 	<MODE>mode,
-	<VM>mode,
 	false,
 	[] (rtx *operands, rtx boardcast_scalar) {
 	  emit_insn (gen_pred_minus_mul<mode> (operands[0], operands[1],
-- 
2.36.3
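For readers skimming the diff above: the repeated change is that per-case helpers (`emit_vlmax_op`, `emit_vlmax_reg_op`, explicit `<VM>mode` arguments) collapse into one configurable emitter (`emit_vlmax_tany_many`) driven by the flags listed in the cover letter. A much-simplified, hypothetical model of that idea is sketched below; the member names mirror the cover letter, but the class body and `build_operands` logic are illustrative only, not the actual GCC implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of the patch's configurable insn expander: one
// emitter whose flags decide which implicit operands (mask, merge,
// avl) get appended, replacing the old one-wrapper-per-case helpers.
enum class tail_policy { TAIL_UNDISTURBED, TAIL_AGNOSTIC, TAIL_ANY };
enum class mask_policy { MASK_UNDISTURBED, MASK_AGNOSTIC, MASK_ANY };

struct insn_expander {
  int  m_op_num;                // number of explicit operands passed in
  bool m_has_dest_p;            // stores have no dest operand
  bool m_use_all_trues_mask_p;  // unmasked pattern: mask is all-ones
  bool m_use_undef_merge_p;     // merge operand is undefined
  bool m_has_avl_p;
  bool m_vlmax_p;
  tail_policy m_tail_policy;
  mask_policy m_mask_policy;

  // Build the full operand list a predicated RVV pattern expects,
  // inserting the implicit operands the flags ask for.
  std::vector<std::string>
  build_operands (const std::vector<std::string> &ops) const
  {
    assert (ops.size () == (size_t) m_op_num);
    std::vector<std::string> out;
    size_t i = 0;
    if (m_has_dest_p)
      out.push_back (ops[i++]);                 // destination
    if (m_use_all_trues_mask_p)
      out.push_back ("all_trues_mask");         // implicit mask
    if (m_use_undef_merge_p)
      out.push_back ("undef_merge");            // implicit merge
    for (; i < ops.size (); ++i)
      out.push_back (ops[i]);                   // source operands
    if (m_has_avl_p)
      out.push_back (m_vlmax_p ? "vlmax_avl" : "avl");
    return out;
  }
};
```

Under this model, a VLMAX tail-any/mask-any unary op (the `RVV_UNOP_NUM` case in the diff) is just one flag configuration rather than a dedicated wrapper, which is the flexibility the new `len_mask_load`/`cond_div` style patterns need.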


Thread overview: 6+ messages
2023-05-23  6:08 juzhe.zhong [this message]
2023-05-23  8:06 ` Robin Dapp
2023-05-23  8:25   ` juzhe.zhong
2023-05-23  8:45     ` Kito Cheng
2023-05-23  9:00       ` juzhe.zhong
2023-05-23 12:14         ` Richard Sandiford
