Thanks Robin. Address comment.



juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-05-22 16:07
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; palmer; jeffreyalaw; Richard Sandiford
Subject: Re: [PATCH] RISC-V: Add RVV comparison autovectorization
Hi Juzhe,
 
thanks.  Some remarks inline.
 
> +;; Integer (signed) vcond.  Don't enforce an immediate range here, since it
> +;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
> +(define_expand "vcond<V:mode><VI:mode>"
> +  [(set (match_operand:V 0 "register_operand")
> + (if_then_else:V
> +   (match_operator 3 "comparison_operator"
> +     [(match_operand:VI 4 "register_operand")
> +      (match_operand:VI 5 "nonmemory_operand")])
> +   (match_operand:V 1 "nonmemory_operand")
> +   (match_operand:V 2 "nonmemory_operand")))]
> +  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (<V:MODE>mode),
> +  GET_MODE_NUNITS (<VI:MODE>mode))"
> +  {
> +    riscv_vector::expand_vcond (<VI:MODE>mode, operands);
> +    DONE;
> +  }
> +)
> +
> +;; Integer vcondu.  Don't enforce an immediate range here, since it
> +;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
> +(define_expand "vcondu<V:mode><VI:mode>"
> +  [(set (match_operand:V 0 "register_operand")
> + (if_then_else:V
> +   (match_operator 3 "comparison_operator"
> +     [(match_operand:VI 4 "register_operand")
> +      (match_operand:VI 5 "nonmemory_operand")])
> +   (match_operand:V 1 "nonmemory_operand")
> +   (match_operand:V 2 "nonmemory_operand")))]
> +  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (<V:MODE>mode),
> +  GET_MODE_NUNITS (<VI:MODE>mode))"
> +  {
> +    riscv_vector::expand_vcond (<VI:MODE>mode, operands);
> +    DONE;
> +  }
> +)
 
These do exactly the same (as do their aarch64 heirs).  As you are a friend
of iterators usually I guess you didn't use one for clarity here?  Also, I
didn't see that we do much of immediate-range enforcement in expand_vcond.
 
> +
> +;; Floating-point vcond.  Don't enforce an immediate range here, since it
> +;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
> +(define_expand "vcond<V:mode><VF:mode>"
> +  [(set (match_operand:V 0 "register_operand")
> + (if_then_else:V
> +   (match_operator 3 "comparison_operator"
> +     [(match_operand:VF 4 "register_operand")
> +      (match_operand:VF 5 "nonmemory_operand")])
> +   (match_operand:V 1 "nonmemory_operand")
> +   (match_operand:V 2 "nonmemory_operand")))]
> +  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (<V:MODE>mode),
> +  GET_MODE_NUNITS (<VF:MODE>mode))"
> +  {
> +    riscv_vector::expand_vcond (<VF:MODE>mode, operands);
> +    DONE;
> +  }
> +)
 
It comes a bit as a surprise to add float comparisons before any other
float autovec patterns are in.  I'm not against it but would wait for
other comments here.  If the tests are source from aarch64 they have
been reviewed often enough that we can be fairly sure to do the right
thing though.  I haven't checked the expander and inversion things
closely now though.
 
> +
> +;; -------------------------------------------------------------------------
> +;; ---- [INT,FP] Select based on masks
> +;; -------------------------------------------------------------------------
> +;; Includes merging patterns for:
> +;; - vmerge.vv
> +;; - vmerge.vx
> +;; - vfmerge.vf
> +;; -------------------------------------------------------------------------
> +
> +(define_expand "vcond_mask_<mode><vm>"
> +  [(match_operand:V 0 "register_operand")
> +   (match_operand:<VM> 3 "register_operand")
> +   (match_operand:V 1 "nonmemory_operand")
> +   (match_operand:V 2 "register_operand")]
> +  "TARGET_VECTOR"
> +  {
> +    riscv_vector::emit_merge_op (operands[0], operands[2],
> +    operands[1], operands[3]);
> +    DONE;
> +  }
> +)
 
Order of operands is a bit surprising, see below.
 
> +  void add_fixed_operand (rtx x)
> +  {
> +    create_fixed_operand (&m_ops[m_opno++], x);
> +    gcc_assert (m_opno <= MAX_OPERANDS);
> +  }
> +  void add_integer_operand (rtx x)
> +  {
> +    create_integer_operand (&m_ops[m_opno++], INTVAL (x));
> +    gcc_assert (m_opno <= MAX_OPERANDS);
> +  }
>    void add_all_one_mask_operand (machine_mode mode)
>    {
>      add_input_operand (CONSTM1_RTX (mode), mode);
> @@ -85,11 +95,14 @@ public:
>    {
>      add_input_operand (RVV_VUNDEF (mode), mode);
>    }
> -  void add_policy_operand (enum tail_policy vta, enum mask_policy vma)
> +  void add_policy_operand (enum tail_policy vta)
>    {
>      rtx tail_policy_rtx = gen_int_mode (vta, Pmode);
> -    rtx mask_policy_rtx = gen_int_mode (vma, Pmode);
>      add_input_operand (tail_policy_rtx, Pmode);
> +  }
> +  void add_policy_operand (enum mask_policy vma)
> +  {
> +    rtx mask_policy_rtx = gen_int_mode (vma, Pmode);
>      add_input_operand (mask_policy_rtx, Pmode);
>    }
>    void add_avl_type_operand (avl_type type)
> @@ -97,7 +110,8 @@ public:
>      add_input_operand (gen_int_mode (type, Pmode), Pmode);
>    }
 
My idea would be to have the policy operands hidden a bit more as
in my last patch.  It comes down to a matter of taste.  We can discuss
once this is in and I rebased my suggestion.  
 
> -  void set_dest_and_mask (rtx mask, rtx dest, machine_mode mask_mode)
> +  void set_dest_and_mask (rtx mask, rtx dest, rtx maskoff,
> +   machine_mode mask_mode)
>    {
>      dest_mode = GET_MODE (dest);
>      has_dest = true;
> @@ -109,35 +123,73 @@ public:
>      else
>        add_all_one_mask_operand (mask_mode);
>  
> -    add_vundef_operand (dest_mode);
> +    if (maskoff)
> +      add_input_operand (maskoff, GET_MODE (maskoff));
> +    else
> +      add_vundef_operand (dest_mode);
> +  }
 
Please describe/comment what maskoff is.
 
> +
> +  bool set_len (rtx len, bool force_vlmax = false)
> +  {
> +    bool vlmax_p = force_vlmax || !len;
> +    gcc_assert (has_dest);
> +
> +    if (vlmax_p && const_vlmax_p (dest_mode))
> +      {
> + /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of the
> +    vsetvli to obtain the value of vlmax.  */
> + poly_uint64 nunits = GET_MODE_NUNITS (dest_mode);
> + len = gen_int_mode (nunits, Pmode);
> + vlmax_p = false; /* It has became NONVLMAX now.  */
> +      }
> +    else if (!len)
> +      {
> + len = gen_reg_rtx (Pmode);
> + emit_vlmax_vsetvl (dest_mode, len);
> +      }
> +
> +    add_input_operand (len, Pmode);
> +    return vlmax_p;
>    }
 
I feel the whole vlmax/non-vlmax/len thing gets more confusing by the day.
I don't have an immediate suggestion how to untagle it but we need to
think about this.  Should not block this patch but something to keep in
mind.
 
> +static void
> +emit_len_unop (unsigned icode, rtx dest, rtx src, rtx len,
> +        machine_mode mask_mode)
> +{
> +  emit_pred_unop (icode, NULL_RTX, dest, src, len, TAIL_ANY, MASK_ANY,
> +   mask_mode);
> +}
 
How does this differ from emit_len_op?  Do we need both?
 
> +/* Emit merge instruction.  */
> +
> +void
> +emit_merge_op (rtx dest, rtx src1, rtx src2, rtx mask)
> +{
> +  insn_expander<8> e;
> +  machine_mode mode = GET_MODE (dest);
> +  e.set_dest_merge (dest);
> +  e.add_input_operand (src1, mode);
> +  if (VECTOR_MODE_P (GET_MODE (src2)))
> +    e.add_input_operand (src2, mode);
> +  else
> +    e.add_input_operand (src2, GET_MODE_INNER (mode));
> +
> +  e.add_input_operand (mask, GET_MODE (mask));
> +  e.set_len_and_policy (NULL_RTX, TAIL_ANY, true);
> +  if (VECTOR_MODE_P (GET_MODE (src2)))
> +    e.expand (code_for_pred_merge (mode), false);
> +  else
> +    e.expand (code_for_pred_merge_scalar (mode), false);
> +}
 
See below for the treatment of SRC2.  At least add a comment
here that we canonicalize according to RVV intrinsics.
 
> +/* Expand an RVV vcond pattern with operands OPS.  DATA_MODE is the mode
> +   of the data being merged and CMP_MODE is the mode of the values being
> +   compared.  */
 
I see not data_mode as opposed to the aarch version ;)
 
> +void
> +expand_vcond (machine_mode cmp_mode, rtx *ops)
> +{
> +  machine_mode mask_mode = get_mask_mode (cmp_mode).require ();
> +  rtx mask = gen_reg_rtx (mask_mode);
> +  if (FLOAT_MODE_P (cmp_mode))
> +    {
> +      if (expand_vec_cmp_float (mask, GET_CODE (ops[3]), ops[4], ops[5], true))
> + std::swap (ops[1], ops[2]);
> +    }
> +  else
> +    expand_vec_cmp_int (mask, GET_CODE (ops[3]), ops[4], ops[5]);
> +
> +  if (!CONST_VECTOR_P (ops[1]))
> +    {
> +      rtx elt;
> +      if (const_vec_duplicate_p (ops[1], &elt))
> + ops[1] = elt;
> +    }
 
Can't we do
if (const_vect_duplicate_p (ops[1],...)
   ops[1] = elt;
here?
 
Besides, there also is unwrap_const_vec_duplicate.
 
> +  emit_merge_op (ops[0], ops[2], ops[1], mask);
> +}
 
I find it a bit odd to swap the operands here just because emit_merge_op
treats SRC2 differently.  Can't we swap the commutative operands there
and add a comment that we canonicalize merge to always have the scalar
as second source?
On top, what if we inverted ops[1] and ops[2] for float?  We just swap
back here?
 
> +  if (!insn_operand_matches ((enum insn_code) icode, e.opno () + 1, src1))
> +    src1 = force_reg (data_mode, src1);
> +  if (!insn_operand_matches ((enum insn_code) icode, e.opno () + 2, src2))
> +    {
> +      if (VECTOR_MODE_P (GET_MODE (src2)))
> + src2 = force_reg (data_mode, src2);
> +      else
> + src2 = force_reg (scalar_mode, src2);
> +    }
 
e.opno () + 1/2 might be a bit confusing.  We actually want to refer to both
source operands which are always at the same position except if we don't have
a merge operand?  Can we do something like
const int src1_opno = have_merge_op ? 3 : 4;?
 
> +/* Expand an RVV integer comparison using the RVV equivalent of:
> +
> +     (set TARGET (CODE OP0 OP1)).  */
> +
 
Should be float.  This function is called the same as the one without mask.
I'd suggest to either rename to _masked or comment the parameters.  
 
Regards
Robin