From: Jeff Law <jeffreyalaw@gmail.com>
To: juzhe.zhong@rivai.ai, gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, rdapp.gcc@gmail.com
Subject: Re: [PATCH V6] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs
Date: Sat, 17 Jun 2023 08:19:45 -0600 [thread overview]
Message-ID: <43e33266-e449-4ddf-4ea4-938dcd458665@gmail.com> (raw)
In-Reply-To: <20230616102912.262207-1-juzhe.zhong@rivai.ai>
On 6/16/23 04:29, juzhe.zhong@rivai.ai wrote:
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
>
> This patch bootstrap pass on X86, ok for trunk ?
>
> Accoding to comments from Richi, split the first patch to add ifn && optabs
> of LEN_MASK_{LOAD,STORE} only, we don't apply them into vectorizer in this
> patch. And also add BIAS argument for possible s390's future use.
>
> The description of the patterns in doc are coming Robin.
>
> After this patch is approved, will send the second patch to apply len_mask_*
> patterns into vectorizer.
>
> Target like ARM SVE in GCC has an elegant way to handle both loop control
> and flow control simultaneously:
>
> loop_control_mask = WHILE_ULT
> flow_control_mask = comparison
> control_mask = loop_control_mask & flow_control_mask;
> MASK_LOAD (control_mask)
> MASK_STORE (control_mask)
>
> However, targets like RVV (RISC-V Vector) can not use this approach in
> auto-vectorization since RVV use length in loop control.
>
> This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
> like RISC-V that uses length in loop control.
> Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length
> or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
> Mask is the outcome of comparison.
>
> LEN_MASK_ LOAD/STORE format is defined as follows:
> 1). LEN_MASK_LOAD (ptr, align, length, mask).
> 2). LEN_MASK_STORE (ptr, align, length, mask, vec).
>
> Consider these 4 following cases:
>
> VLA: Variable-length auto-vectorization
> VLS: Specific-length auto-vectorization
>
> Case 1 (VLS): -mrvv-vector-bits=128 IR (Does not use LEN_MASK_*):
> Code: v1 = MEM (...)
> for (int i = 0; i < 4; i++) v2 = MEM (...)
> a[i] = b[i] + c[i]; v3 = v1 + v2
> MEM[...] = v3
>
> Case 2 (VLS): -mrvv-vector-bits=128 IR (LEN_MASK_* with length = VF, mask = comparison):
> Code: mask = comparison
> for (int i = 0; i < 4; i++) v1 = LEN_MASK_LOAD (length = VF, mask)
> if (cond[i]) v2 = LEN_MASK_LOAD (length = VF, mask)
> a[i] = b[i] + c[i]; v3 = v1 + v2
> LEN_MASK_STORE (length = VF, mask, v3)
>
> Case 3 (VLA):
> Code: loop_len = SELECT_VL or MIN
> for (int i = 0; i < n; i++) v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
> a[i] = b[i] + c[i]; v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
> v3 = v1 + v2
> LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)
>
> Case 4 (VLA):
> Code: loop_len = SELECT_VL or MIN
> for (int i = 0; i < n; i++) mask = comparison
> if (cond[i]) v1 = LEN_MASK_LOAD (length = loop_len, mask)
> a[i] = b[i] + c[i]; v2 = LEN_MASK_LOAD (length = loop_len, mask)
> v3 = v1 + v2
> LEN_MASK_STORE (length = loop_len, mask, v3)
>
> Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
>
> gcc/ChangeLog:
>
> * doc/md.texi: Add len_mask{load,store}.
> * genopinit.cc (main): Ditto.
> (CMP_NAME): Ditto.
> * internal-fn.cc (len_maskload_direct): Ditto.
> (len_maskstore_direct): Ditto.
> (expand_call_mem_ref): Ditto.
> (expand_partial_load_optab_fn): Ditto.
> (expand_len_maskload_optab_fn): Ditto.
> (expand_partial_store_optab_fn): Ditto.
> (expand_len_maskstore_optab_fn): Ditto.
> (direct_len_maskload_optab_supported_p): Ditto.
> (direct_len_maskstore_optab_supported_p): Ditto.
> * internal-fn.def (LEN_MASK_LOAD): Ditto.
> (LEN_MASK_STORE): Ditto.
> * optabs.def (OPTAB_CD): Ditto.
You should probably mention the change to len_load in the ChangeLog
entry for md.texi.
> @@ -5136,6 +5136,57 @@ of @code{QI} elements.
>
> This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
> +@item @samp{len_maskload@var{m}@var{n}}
> +Perform a masked load of (operand 2 + operand 4) elements from vector memory
> +operand 1 into vector register operand 0, setting the other elements of
> +operand 0 to undefined values. This is a combination of len_load and maskload.
> +Operands 0 and 1 have mode @var{m}, which must be a vector mode. Operand 2
> +has whichever integer mode the target prefers. A mask is specified in
> +operand 3 which must be of type @var{n}. The mask has lower precedence than
> +the length and is itself subject to length masking,
> +i.e. only mask indices < (operand 2 + operand 4) are used.
> +Operand 4 conceptually has mode @code{QI}.s/vector memory/memory/
That first sentence is a bit difficult to parse. Perhaps break it into
two sentences. Something like this?
Perform a masked load from the memory location pointed to be operand 1
into register operand 0. operand 2 + operand 4 elements are loaded from
memory and other elements in operand 0 are set to undefined values.
Presumably the element length is encoded by the modes? Is this worth
mentioning?
> +
> +This pattern is not allowed to @code{FAIL}.
If the pattern is not allowed to fail, then what code enforces the bias
argument's restrictions? I don't see it in the generic expander code.
> +
> +@cindex @code{len_maskstore@var{m}@var{n}} instruction pattern
> +@item @samp{len_maskstore@var{m}@var{n}}
> +Perform a masked store of (operand 2 + operand 4) vector elements from vector register
> +operand 1 into memory operand 0, leaving the other elements of operand 0 unchanged.
> +This is a combination of len_store and maskstore.
I'd think this could be adjusted in roughly the same manner as I
suggested for loads.
> diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
> index 0c1b6859ca0..6bd8858a1d9 100644
> --- a/gcc/genopinit.cc
> +++ b/gcc/genopinit.cc
> @@ -386,7 +387,8 @@ main (int argc, const char **argv)
> {
> #define CMP_NAME(N) !strncmp (p->name, (N), strlen ((N)))
> if (CMP_NAME("while_ult") || CMP_NAME ("len_load")
> - || CMP_NAME ("len_store"))
> + || CMP_NAME ("len_store")|| CMP_NAME ("len_maskload")
Whitespace nit. Insert a space before the "||".
Looks reasonable to me. I think we're going to need a V2, but I don't
see major conceptual issues, just minor details to work through.
jeff
next prev parent reply other threads:[~2023-06-17 14:19 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-16 10:29 juzhe.zhong
2023-06-17 14:19 ` Jeff Law [this message]
2023-06-17 22:57 ` 钟居哲
2023-06-19 6:56 ` Robin Dapp
2023-06-19 11:43 ` Jeff Law
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=43e33266-e449-4ddf-4ea4-938dcd458665@gmail.com \
--to=jeffreyalaw@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=juzhe.zhong@rivai.ai \
--cc=rdapp.gcc@gmail.com \
--cc=rguenther@suse.de \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).