Re: [PATCH] RISC-V: Add RVV comparison autovectorization

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Robin Dapp <rdapp.gcc@gmail.com>
To: juzhe.zhong@rivai.ai, gcc-patches@gcc.gnu.org
Cc: rdapp.gcc@gmail.com, kito.cheng@sifive.com, palmer@rivosinc.com,
	jeffreyalaw@gmail.com,
	Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [PATCH] RISC-V: Add RVV comparison autovectorization
Date: Mon, 22 May 2023 10:07:28 +0200	[thread overview]
Message-ID: <e5afcfb8-369d-f8ff-4022-308ee4929f04@gmail.com> (raw)
In-Reply-To: <20230520045447.3276232-1-juzhe.zhong@rivai.ai>

Hi Juzhe,

thanks.  Some remarks inline.

> +;; Integer (signed) vcond.  Don't enforce an immediate range here, since it
> +;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
> +(define_expand "vcond<V:mode><VI:mode>"
> +  [(set (match_operand:V 0 "register_operand")
> +	(if_then_else:V
> +	  (match_operator 3 "comparison_operator"
> +	    [(match_operand:VI 4 "register_operand")
> +	     (match_operand:VI 5 "nonmemory_operand")])
> +	  (match_operand:V 1 "nonmemory_operand")
> +	  (match_operand:V 2 "nonmemory_operand")))]
> +  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (<V:MODE>mode),
> +  		GET_MODE_NUNITS (<VI:MODE>mode))"
> +  {
> +    riscv_vector::expand_vcond (<VI:MODE>mode, operands);
> +    DONE;
> +  }
> +)
> +
> +;; Integer vcondu.  Don't enforce an immediate range here, since it
> +;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
> +(define_expand "vcondu<V:mode><VI:mode>"
> +  [(set (match_operand:V 0 "register_operand")
> +	(if_then_else:V
> +	  (match_operator 3 "comparison_operator"
> +	    [(match_operand:VI 4 "register_operand")
> +	     (match_operand:VI 5 "nonmemory_operand")])
> +	  (match_operand:V 1 "nonmemory_operand")
> +	  (match_operand:V 2 "nonmemory_operand")))]
> +  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (<V:MODE>mode),
> +  		GET_MODE_NUNITS (<VI:MODE>mode))"
> +  {
> +    riscv_vector::expand_vcond (<VI:MODE>mode, operands);
> +    DONE;
> +  }
> +)

These do exactly the same (as do their aarch64 heirs).  As you are a friend
of iterators usually I guess you didn't use one for clarity here?  Also, I
didn't see that we do much of immediate-range enforcement in expand_vcond.

> +
> +;; Floating-point vcond.  Don't enforce an immediate range here, since it
> +;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
> +(define_expand "vcond<V:mode><VF:mode>"
> +  [(set (match_operand:V 0 "register_operand")
> +	(if_then_else:V
> +	  (match_operator 3 "comparison_operator"
> +	    [(match_operand:VF 4 "register_operand")
> +	     (match_operand:VF 5 "nonmemory_operand")])
> +	  (match_operand:V 1 "nonmemory_operand")
> +	  (match_operand:V 2 "nonmemory_operand")))]
> +  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (<V:MODE>mode),
> +  		GET_MODE_NUNITS (<VF:MODE>mode))"
> +  {
> +    riscv_vector::expand_vcond (<VF:MODE>mode, operands);
> +    DONE;
> +  }
> +)

It comes a bit as a surprise to add float comparisons before any other
float autovec patterns are in.  I'm not against it but would wait for
other comments here.  If the tests are source from aarch64 they have
been reviewed often enough that we can be fairly sure to do the right
thing though.  I haven't checked the expander and inversion things
closely now though.

> +
> +;; -------------------------------------------------------------------------
> +;; ---- [INT,FP] Select based on masks
> +;; -------------------------------------------------------------------------
> +;; Includes merging patterns for:
> +;; - vmerge.vv
> +;; - vmerge.vx
> +;; - vfmerge.vf
> +;; -------------------------------------------------------------------------
> +
> +(define_expand "vcond_mask_<mode><vm>"
> +  [(match_operand:V 0 "register_operand")
> +   (match_operand:<VM> 3 "register_operand")
> +   (match_operand:V 1 "nonmemory_operand")
> +   (match_operand:V 2 "register_operand")]
> +  "TARGET_VECTOR"
> +  {
> +    riscv_vector::emit_merge_op (operands[0], operands[2],
> +    				 operands[1], operands[3]);
> +    DONE;
> +  }
> +)

Order of operands is a bit surprising, see below.

> +  void add_fixed_operand (rtx x)
> +  {
> +    create_fixed_operand (&m_ops[m_opno++], x);
> +    gcc_assert (m_opno <= MAX_OPERANDS);
> +  }
> +  void add_integer_operand (rtx x)
> +  {
> +    create_integer_operand (&m_ops[m_opno++], INTVAL (x));
> +    gcc_assert (m_opno <= MAX_OPERANDS);
> +  }
>    void add_all_one_mask_operand (machine_mode mode)
>    {
>      add_input_operand (CONSTM1_RTX (mode), mode);
> @@ -85,11 +95,14 @@ public:
>    {
>      add_input_operand (RVV_VUNDEF (mode), mode);
>    }
> -  void add_policy_operand (enum tail_policy vta, enum mask_policy vma)
> +  void add_policy_operand (enum tail_policy vta)
>    {
>      rtx tail_policy_rtx = gen_int_mode (vta, Pmode);
> -    rtx mask_policy_rtx = gen_int_mode (vma, Pmode);
>      add_input_operand (tail_policy_rtx, Pmode);
> +  }
> +  void add_policy_operand (enum mask_policy vma)
> +  {
> +    rtx mask_policy_rtx = gen_int_mode (vma, Pmode);
>      add_input_operand (mask_policy_rtx, Pmode);
>    }
>    void add_avl_type_operand (avl_type type)
> @@ -97,7 +110,8 @@ public:
>      add_input_operand (gen_int_mode (type, Pmode), Pmode);
>    }

My idea would be to have the policy operands hidden a bit more as
in my last patch.  It comes down to a matter of taste.  We can discuss
once this is in and I rebased my suggestion.  

> -  void set_dest_and_mask (rtx mask, rtx dest, machine_mode mask_mode)
> +  void set_dest_and_mask (rtx mask, rtx dest, rtx maskoff,
> +			  machine_mode mask_mode)
>    {
>      dest_mode = GET_MODE (dest);
>      has_dest = true;
> @@ -109,35 +123,73 @@ public:
>      else
>        add_all_one_mask_operand (mask_mode);
>  
> -    add_vundef_operand (dest_mode);
> +    if (maskoff)
> +      add_input_operand (maskoff, GET_MODE (maskoff));
> +    else
> +      add_vundef_operand (dest_mode);
> +  }

Please describe/comment what maskoff is.

> +
> +  bool set_len (rtx len, bool force_vlmax = false)
> +  {
> +    bool vlmax_p = force_vlmax || !len;
> +    gcc_assert (has_dest);
> +
> +    if (vlmax_p && const_vlmax_p (dest_mode))
> +      {
> +	/* Optimize VLS-VLMAX code gen, we can use vsetivli instead of the
> +	   vsetvli to obtain the value of vlmax.  */
> +	poly_uint64 nunits = GET_MODE_NUNITS (dest_mode);
> +	len = gen_int_mode (nunits, Pmode);
> +	vlmax_p = false; /* It has became NONVLMAX now.  */
> +      }
> +    else if (!len)
> +      {
> +	len = gen_reg_rtx (Pmode);
> +	emit_vlmax_vsetvl (dest_mode, len);
> +      }
> +
> +    add_input_operand (len, Pmode);
> +    return vlmax_p;
>    }

I feel the whole vlmax/non-vlmax/len thing gets more confusing by the day.
I don't have an immediate suggestion how to untagle it but we need to
think about this.  Should not block this patch but something to keep in
mind.

> +static void
> +emit_len_unop (unsigned icode, rtx dest, rtx src, rtx len,
> +	       machine_mode mask_mode)
> +{
> +  emit_pred_unop (icode, NULL_RTX, dest, src, len, TAIL_ANY, MASK_ANY,
> +		  mask_mode);
> +}

How does this differ from emit_len_op?  Do we need both?

> +/* Emit merge instruction.  */
> +
> +void
> +emit_merge_op (rtx dest, rtx src1, rtx src2, rtx mask)
> +{
> +  insn_expander<8> e;
> +  machine_mode mode = GET_MODE (dest);
> +  e.set_dest_merge (dest);
> +  e.add_input_operand (src1, mode);
> +  if (VECTOR_MODE_P (GET_MODE (src2)))
> +    e.add_input_operand (src2, mode);
> +  else
> +    e.add_input_operand (src2, GET_MODE_INNER (mode));
> +
> +  e.add_input_operand (mask, GET_MODE (mask));
> +  e.set_len_and_policy (NULL_RTX, TAIL_ANY, true);
> +  if (VECTOR_MODE_P (GET_MODE (src2)))
> +    e.expand (code_for_pred_merge (mode), false);
> +  else
> +    e.expand (code_for_pred_merge_scalar (mode), false);
> +}

See below for the treatment of SRC2.  At least add a comment
here that we canonicalize according to RVV intrinsics.

> +/* Expand an RVV vcond pattern with operands OPS.  DATA_MODE is the mode
> +   of the data being merged and CMP_MODE is the mode of the values being
> +   compared.  */

I see not data_mode as opposed to the aarch version ;)

> +void
> +expand_vcond (machine_mode cmp_mode, rtx *ops)
> +{
> +  machine_mode mask_mode = get_mask_mode (cmp_mode).require ();
> +  rtx mask = gen_reg_rtx (mask_mode);
> +  if (FLOAT_MODE_P (cmp_mode))
> +    {
> +      if (expand_vec_cmp_float (mask, GET_CODE (ops[3]), ops[4], ops[5], true))
> +	std::swap (ops[1], ops[2]);
> +    }
> +  else
> +    expand_vec_cmp_int (mask, GET_CODE (ops[3]), ops[4], ops[5]);
> +
> +  if (!CONST_VECTOR_P (ops[1]))
> +    {
> +      rtx elt;
> +      if (const_vec_duplicate_p (ops[1], &elt))
> +	ops[1] = elt;
> +    }

Can't we do
 if (const_vect_duplicate_p (ops[1],...)
   ops[1] = elt;
here?

Besides, there also is unwrap_const_vec_duplicate.

> +  emit_merge_op (ops[0], ops[2], ops[1], mask);
> +}

I find it a bit odd to swap the operands here just because emit_merge_op
treats SRC2 differently.  Can't we swap the commutative operands there
and add a comment that we canonicalize merge to always have the scalar
as second source?
On top, what if we inverted ops[1] and ops[2] for float?  We just swap
back here?

> +  if (!insn_operand_matches ((enum insn_code) icode, e.opno () + 1, src1))
> +    src1 = force_reg (data_mode, src1);
> +  if (!insn_operand_matches ((enum insn_code) icode, e.opno () + 2, src2))
> +    {
> +      if (VECTOR_MODE_P (GET_MODE (src2)))
> +	src2 = force_reg (data_mode, src2);
> +      else
> +	src2 = force_reg (scalar_mode, src2);
> +    }

e.opno () + 1/2 might be a bit confusing.  We actually want to refer to both
source operands which are always at the same position except if we don't have
a merge operand?  Can we do something like
const int src1_opno = have_merge_op ? 3 : 4;?

> +/* Expand an RVV integer comparison using the RVV equivalent of:
> +
> +     (set TARGET (CODE OP0 OP1)).  */
> +

Should be float.  This function is called the same as the one without mask.
I'd suggest to either rename to _masked or comment the parameters.  

Regards
 Robin

next prev parent reply	other threads:[~2023-05-22  8:07 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-20  4:54 juzhe.zhong
2023-05-22  8:07 ` Robin Dapp [this message]
2023-05-22  9:01   ` juzhe.zhong
2023-05-22 12:14     ` Robin Dapp
2023-05-22 12:18       ` juzhe.zhong
2023-05-22 12:26         ` Robin Dapp
2023-05-22 12:30           ` juzhe.zhong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e5afcfb8-369d-f8ff-4022-308ee4929f04@gmail.com \
    --to=rdapp.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=juzhe.zhong@rivai.ai \
    --cc=kito.cheng@sifive.com \
    --cc=palmer@rivosinc.com \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).