public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
To: rguenther <rguenther@suse.de>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
	 richard.sandiford <richard.sandiford@arm.com>,
	 "Robin Dapp" <rdapp.gcc@gmail.com>
Subject: Re: Re: [PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs
Date: Fri, 16 Jun 2023 17:06:20 +0800	[thread overview]
Message-ID: <10CADA563DEE0F00+202306161706196619590@rivai.ai> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2306160903440.4723@jbgna.fhfr.qr>

[-- Attachment #1: Type: text/plain, Size: 17381 bytes --]

Thanks a lot! I will wait for Richard final approve.



juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-06-16 17:04
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford; rdapp.gcc
Subject: Re: [PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs
On Thu, 15 Jun 2023, juzhe.zhong@rivai.ai wrote:
 
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> This patch bootstrap pass on X86, ok for trunk ?
 
OK with me, please give Richard S. a chance to comment before pushing.
 
Thanks,
Richard.
 
> Accoding to comments from Richi, split the first patch to add ifn && optabs
> of LEN_MASK_{LOAD,STORE} only, we don't apply them into vectorizer in this
> patch. And also add BIAS argument for possible s390's future use.
> 
> The description of the patterns in doc are coming Robin.
> 
> After this patch is approved, will send the second patch to apply len_mask_*
> patterns into vectorizer.
> 
> Target like ARM SVE in GCC has an elegant way to handle both loop control
> and flow control simultaneously:
> 
> loop_control_mask = WHILE_ULT
> flow_control_mask = comparison
> control_mask = loop_control_mask & flow_control_mask;
> MASK_LOAD (control_mask)
> MASK_STORE (control_mask)
> 
> However, targets like RVV (RISC-V Vector) can not use this approach in
> auto-vectorization since RVV use length in loop control.
> 
> This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
> like RISC-V that uses length in loop control.
> Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length
> or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
> Mask is the outcome of comparison.
> 
> LEN_MASK_ LOAD/STORE format is defined as follows:
> 1). LEN_MASK_LOAD (ptr, align, length, mask).
> 2). LEN_MASK_STORE (ptr, align, length, mask, vec).
> 
> Consider these 4 following cases:
> 
> VLA: Variable-length auto-vectorization
> VLS: Specific-length auto-vectorization
> 
> Case 1 (VLS): -mrvv-vector-bits=128   IR (Does not use LEN_MASK_*):
> Code: v1 = MEM (...)
>   for (int i = 0; i < 4; i++)           v2 = MEM (...)
>     a[i] = b[i] + c[i];                 v3 = v1 + v2 
>                                         MEM[...] = v3
> 
> Case 2 (VLS): -mrvv-vector-bits=128   IR (LEN_MASK_* with length = VF, mask = comparison):
> Code:                                   mask = comparison
>   for (int i = 0; i < 4; i++)           v1 = LEN_MASK_LOAD (length = VF, mask)
>     if (cond[i])                        v2 = LEN_MASK_LOAD (length = VF, mask) 
>       a[i] = b[i] + c[i];               v3 = v1 + v2
>                                         LEN_MASK_STORE (length = VF, mask, v3)
>            
> Case 3 (VLA):
> Code:                                   loop_len = SELECT_VL or MIN
>   for (int i = 0; i < n; i++)           v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
>       a[i] = b[i] + c[i];               v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
>                                         v3 = v1 + v2                            
>                                         LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)
> 
> Case 4 (VLA):
> Code:                                   loop_len = SELECT_VL or MIN
>   for (int i = 0; i < n; i++)           mask = comparison
>       if (cond[i])                      v1 = LEN_MASK_LOAD (length = loop_len, mask)
>       a[i] = b[i] + c[i];               v2 = LEN_MASK_LOAD (length = loop_len, mask)
>                                         v3 = v1 + v2                            
>                                         LEN_MASK_STORE (length = loop_len, mask, v3)
> 
> Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
> 
> gcc/ChangeLog:
> 
>         * doc/md.texi: Add len_mask{load,store}.
>         * genopinit.cc (main): Ditto.
>         (CMP_NAME): Ditto.
>         * internal-fn.cc (len_maskload_direct): Ditto.
>         (len_maskstore_direct): Ditto.
>         (expand_call_mem_ref): Ditto.
>         (expand_partial_load_optab_fn): Ditto.
>         (expand_len_maskload_optab_fn): Ditto.
>         (expand_partial_store_optab_fn): Ditto.
>         (expand_len_maskstore_optab_fn): Ditto.
>         (direct_len_maskload_optab_supported_p): Ditto.
>         (direct_len_maskstore_optab_supported_p): Ditto.
>         * internal-fn.def (LEN_MASK_LOAD): Ditto.
>         (LEN_MASK_STORE): Ditto.
>         * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi     | 46 +++++++++++++++++++++++++++++++++++++++++++++
>  gcc/genopinit.cc    |  6 ++++--
>  gcc/internal-fn.cc  | 43 ++++++++++++++++++++++++++++++++++++++----
>  gcc/internal-fn.def |  4 ++++
>  gcc/optabs.def      |  2 ++
>  5 files changed, 95 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index a43fd65a2b2..af23ec938d6 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5136,6 +5136,52 @@ of @code{QI} elements.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
> +@item @samp{len_maskload@var{m}@var{n}}
> +Perform a masked load (operand 2 - operand 4) elements from vector memory
> +operand 1 into vector register operand 0, setting the other elements of
> +operand 0 to undefined values.  This is a combination of len_load and maskload. 
> +Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2
> +has whichever integer mode the target prefers.  A secondary mask is specified in
> +operand 3 which must be of type @var{n}.  Operand 4 conceptually has mode @code{QI}.
> +
> +Operand 2 can be a variable or a constant amount.  Operand 4 specifies a
> +constant bias: it is either a constant 0 or a constant -1.  The predicate on
> +operand 4 must only accept the bias values that the target actually supports.
> +GCC handles a bias of 0 more efficiently than a bias of -1.
> +
> +If (operand 2 - operand 4) exceeds the number of elements in mode
> +@var{m}, the behavior is undefined.
> +
> +If the target prefers the length to be measured in bytes
> +rather than elements, it should only implement this pattern for vectors
> +of @code{QI} elements.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> +@cindex @code{len_maskstore@var{m}@var{n}} instruction pattern
> +@item @samp{len_maskstore@var{m}@var{n}}
> +Perform a masked store (operand 2 - operand 4) vector elements from vector register
> +operand 1 into memory operand 0, leaving the other elements of operand 0 unchanged.
> +This is a combination of len_store and maskstore.
> +Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2 has whichever
> +integer mode the target prefers.  A secondary mask is specified in operand 3 which must be
> +of type @var{n}.  Operand 4 conceptually has mode @code{QI}.
> +
> +Operand 2 can be a variable or a constant amount.  Operand 3 specifies a
> +constant bias: it is either a constant 0 or a constant -1.  The predicate on
> +operand 4 must only accept the bias values that the target actually supports.
> +GCC handles a bias of 0 more efficiently than a bias of -1.
> +
> +If (operand 2 - operand 4) exceeds the number of elements in mode
> +@var{m}, the behavior is undefined.
> +
> +If the target prefers the length to be measured in bytes
> +rather than elements, it should only implement this pattern for vectors
> +of @code{QI} elements.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{vec_perm@var{m}} instruction pattern
>  @item @samp{vec_perm@var{m}}
>  Output a (variable) vector permutation.  Operand 0 is the destination
> diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
> index 0c1b6859ca0..6bd8858a1d9 100644
> --- a/gcc/genopinit.cc
> +++ b/gcc/genopinit.cc
> @@ -376,7 +376,8 @@ main (int argc, const char **argv)
>  
>    fprintf (s_file,
>     "/* Returns TRUE if the target supports any of the partial vector\n"
> -    "   optabs: while_ult_optab, len_load_optab or len_store_optab,\n"
> +    "   optabs: while_ult_optab, len_load_optab, len_store_optab,\n"
> +    "   len_maskload_optab or len_maskstore_optab,\n"
>     "   for any mode.  */\n"
>     "bool\npartial_vectors_supported_p (void)\n{\n");
>    bool any_match = false;
> @@ -386,7 +387,8 @@ main (int argc, const char **argv)
>      {
>  #define CMP_NAME(N) !strncmp (p->name, (N), strlen ((N)))
>        if (CMP_NAME("while_ult") || CMP_NAME ("len_load")
> -   || CMP_NAME ("len_store"))
> +   || CMP_NAME ("len_store")|| CMP_NAME ("len_maskload")
> +   || CMP_NAME ("len_maskstore"))
>  {
>    if (first)
>      fprintf (s_file, " HAVE_%s", p->name);
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 208bdf497eb..c911ae790cb 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -165,6 +165,7 @@ init_internal_fns ()
>  #define mask_load_lanes_direct { -1, -1, false }
>  #define gather_load_direct { 3, 1, false }
>  #define len_load_direct { -1, -1, false }
> +#define len_maskload_direct { -1, 3, false }
>  #define mask_store_direct { 3, 2, false }
>  #define store_lanes_direct { 0, 0, false }
>  #define mask_store_lanes_direct { 0, 0, false }
> @@ -172,6 +173,7 @@ init_internal_fns ()
>  #define vec_cond_direct { 2, 0, false }
>  #define scatter_store_direct { 3, 1, false }
>  #define len_store_direct { 3, 3, false }
> +#define len_maskstore_direct { 4, 3, false }
>  #define vec_set_direct { 3, 3, false }
>  #define unary_direct { 0, 0, true }
>  #define unary_convert_direct { -1, 0, true }
> @@ -2873,12 +2875,13 @@ expand_call_mem_ref (tree type, gcall *stmt, int index)
>    return fold_build2 (MEM_REF, type, addr, build_int_cst (alias_ptr_type, 0));
>  }
>  
> -/* Expand MASK_LOAD{,_LANES} or LEN_LOAD call STMT using optab OPTAB.  */
> +/* Expand MASK_LOAD{,_LANES}, LEN_MASK_LOAD or LEN_LOAD call STMT using optab
> + * OPTAB.  */
>  
>  static void
>  expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  {
> -  class expand_operand ops[4];
> +  class expand_operand ops[5];
>    tree type, lhs, rhs, maskt, biast;
>    rtx mem, target, mask, bias;
>    insn_code icode;
> @@ -2913,6 +2916,20 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>        create_input_operand (&ops[3], bias, QImode);
>        expand_insn (icode, 4, ops);
>      }
> +  else if (optab == len_maskload_optab)
> +    {
> +      create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)),
> +    TYPE_UNSIGNED (TREE_TYPE (maskt)));
> +      maskt = gimple_call_arg (stmt, 3);
> +      mask = expand_normal (maskt);
> +      create_input_operand (&ops[3], mask, TYPE_MODE (TREE_TYPE (maskt)));
> +      icode = convert_optab_handler (optab, TYPE_MODE (type),
> +      TYPE_MODE (TREE_TYPE (maskt)));
> +      biast = gimple_call_arg (stmt, 4);
> +      bias = expand_normal (biast);
> +      create_input_operand (&ops[4], bias, QImode);
> +      expand_insn (icode, 5, ops);
> +    }
>    else
>      {
>        create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> @@ -2926,13 +2943,15 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  #define expand_mask_load_optab_fn expand_partial_load_optab_fn
>  #define expand_mask_load_lanes_optab_fn expand_mask_load_optab_fn
>  #define expand_len_load_optab_fn expand_partial_load_optab_fn
> +#define expand_len_maskload_optab_fn expand_partial_load_optab_fn
>  
> -/* Expand MASK_STORE{,_LANES} or LEN_STORE call STMT using optab OPTAB.  */
> +/* Expand MASK_STORE{,_LANES}, LEN_MASK_STORE or LEN_STORE call STMT using optab
> + * OPTAB.  */
>  
>  static void
>  expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  {
> -  class expand_operand ops[4];
> +  class expand_operand ops[5];
>    tree type, lhs, rhs, maskt, biast;
>    rtx mem, reg, mask, bias;
>    insn_code icode;
> @@ -2965,6 +2984,19 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>        create_input_operand (&ops[3], bias, QImode);
>        expand_insn (icode, 4, ops);
>      }
> +  else if (optab == len_maskstore_optab)
> +    {
> +      create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)),
> +    TYPE_UNSIGNED (TREE_TYPE (maskt)));
> +      maskt = gimple_call_arg (stmt, 3);
> +      mask = expand_normal (maskt);
> +      create_input_operand (&ops[3], mask, TYPE_MODE (TREE_TYPE (maskt)));
> +      biast = gimple_call_arg (stmt, 4);
> +      bias = expand_normal (biast);
> +      create_input_operand (&ops[4], bias, QImode);
> +      icode = convert_optab_handler (optab, TYPE_MODE (type), GET_MODE (mask));
> +      expand_insn (icode, 5, ops);
> +    }
>    else
>      {
>        create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> @@ -2975,6 +3007,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  #define expand_mask_store_optab_fn expand_partial_store_optab_fn
>  #define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn
>  #define expand_len_store_optab_fn expand_partial_store_optab_fn
> +#define expand_len_maskstore_optab_fn expand_partial_store_optab_fn
>  
>  /* Expand VCOND, VCONDU and VCONDEQ optab internal functions.
>     The expansion of STMT happens based on OPTAB table associated.  */
> @@ -3928,6 +3961,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_gather_load_optab_supported_p convert_optab_supported_p
>  #define direct_len_load_optab_supported_p direct_optab_supported_p
> +#define direct_len_maskload_optab_supported_p convert_optab_supported_p
>  #define direct_mask_store_optab_supported_p convert_optab_supported_p
>  #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
> @@ -3935,6 +3969,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  #define direct_vec_cond_optab_supported_p convert_optab_supported_p
>  #define direct_scatter_store_optab_supported_p convert_optab_supported_p
>  #define direct_len_store_optab_supported_p direct_optab_supported_p
> +#define direct_len_maskstore_optab_supported_p convert_optab_supported_p
>  #define direct_while_optab_supported_p convert_optab_supported_p
>  #define direct_fold_extract_optab_supported_p direct_optab_supported_p
>  #define direct_fold_left_optab_supported_p direct_optab_supported_p
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 9da5f31636e..bc947c0fde7 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -50,12 +50,14 @@ along with GCC; see the file COPYING3.  If not see
>     - mask_load_lanes: currently just vec_mask_load_lanes
>     - gather_load: used for {mask_,}gather_load
>     - len_load: currently just len_load
> +   - len_maskload: currently just len_maskload
>  
>     - mask_store: currently just maskstore
>     - store_lanes: currently just vec_store_lanes
>     - mask_store_lanes: currently just vec_mask_store_lanes
>     - scatter_store: used for {mask_,}scatter_store
>     - len_store: currently just len_store
> +   - len_maskstore: currently just len_maskstore
>  
>     - unary: a normal unary optab, such as vec_reverse_<mode>
>     - binary: a normal binary optab, such as vec_interleave_lo_<mode>
> @@ -157,6 +159,7 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
>         mask_gather_load, gather_load)
>  
>  DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
>  
>  DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
>  DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
> @@ -175,6 +178,7 @@ DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
>  DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
>  
>  DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_STORE, 0, len_maskstore, len_maskstore)
>  
>  DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
>  DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 22b31be0f72..9533eb11565 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -91,6 +91,8 @@ OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
>  OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")
>  OPTAB_CD(maskload_optab, "maskload$a$b")
>  OPTAB_CD(maskstore_optab, "maskstore$a$b")
> +OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
> +OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
>  OPTAB_CD(gather_load_optab, "gather_load$a$b")
>  OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
>  OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
 

  reply	other threads:[~2023-06-16  9:06 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-15 13:14 juzhe.zhong
2023-06-16  9:04 ` Richard Biener
2023-06-16  9:06   ` juzhe.zhong [this message]
2023-06-16  9:21 ` Robin Dapp
2023-06-16  9:33   ` juzhe.zhong
2023-06-16 10:10     ` Robin Dapp
2023-06-16 10:29       ` juzhe.zhong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=10CADA563DEE0F00+202306161706196619590@rivai.ai \
    --to=juzhe.zhong@rivai.ai \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rdapp.gcc@gmail.com \
    --cc=rguenther@suse.de \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).