From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 2119) id 480983858415; Mon, 19 Jun 2023 11:42:42 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 480983858415 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1687174962; bh=3qvPTdf8irlfaHp92Rzzl8ekEGc5M3MnUh9yeFrWZ+s=; h=From:To:Subject:Date:From; b=BAuvk+WGrdwNNm4S3wI7PK9Akj+956uZzj0MSyqCLAmIva6Tzk8FEHmmOz2jSknpo 0mU2Fb6ufGgqFeYDb/igsg8DWmXOhhJEU0McWpbivGirw1Ad3r7tZkkQuZX2GSuG0c Zv3mA68aoMWlGqdKj3+1E09OCQ1naaJaxsa369aw= Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: Jeff Law To: gcc-cvs@gcc.gnu.org Subject: [gcc(refs/vendors/riscv/heads/gcc-13-with-riscv-opts)] VECT: Support LEN_MASK_{LOAD, STORE} ifn && optabs X-Act-Checkin: gcc X-Git-Author: Ju-Zhe Zhong X-Git-Refname: refs/vendors/riscv/heads/gcc-13-with-riscv-opts X-Git-Oldrev: 4885d06f546231f4806349ab3c31d6c6a54bbeb2 X-Git-Newrev: 3398e8c78476fd5ccf2162fbafcc26ccd3e17121 Message-Id: <20230619114242.480983858415@sourceware.org> Date: Mon, 19 Jun 2023 11:42:42 +0000 (GMT) List-Id: https://gcc.gnu.org/g:3398e8c78476fd5ccf2162fbafcc26ccd3e17121 commit 3398e8c78476fd5ccf2162fbafcc26ccd3e17121 Author: Ju-Zhe Zhong Date: Mon Jun 19 10:07:09 2023 +0200 VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets like RISC-V that uses length in loop control. Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR. Mask is the outcome of comparison. LEN_MASK_ LOAD/STORE format is defined as follows: 1). LEN_MASK_LOAD (ptr, align, length, mask). 2). LEN_MASK_STORE (ptr, align, length, mask, vec). Consider these 4 following cases: VLA: Variable-length auto-vectorization VLS: Specific-length auto-vectorization Case 1 (VLS): -mrvv-vector-bits=128 IR (Does not use LEN_MASK_*): Code: v1 = MEM (...) for (int i = 0; i < 4; i++) v2 = MEM (...) a[i] = b[i] + c[i]; v3 = v1 + v2 MEM[...] = v3 Case 2 (VLS): -mrvv-vector-bits=128 IR (LEN_MASK_* with length = VF, mask = comparison): Code: mask = comparison for (int i = 0; i < 4; i++) v1 = LEN_MASK_LOAD (length = VF, mask) if (cond[i]) v2 = LEN_MASK_LOAD (length = VF, mask) a[i] = b[i] + c[i]; v3 = v1 + v2 LEN_MASK_STORE (length = VF, mask, v3) Case 3 (VLA): Code: loop_len = SELECT_VL or MIN for (int i = 0; i < n; i++) v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...}) a[i] = b[i] + c[i]; v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...}) v3 = v1 + v2 LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3) Case 4 (VLA): Code: loop_len = SELECT_VL or MIN for (int i = 0; i < n; i++) mask = comparison if (cond[i]) v1 = LEN_MASK_LOAD (length = loop_len, mask) a[i] = b[i] + c[i]; v2 = LEN_MASK_LOAD (length = loop_len, mask) v3 = v1 + v2 LEN_MASK_STORE (length = loop_len, mask, v3) Co-authored-by: Robin Dapp gcc/ChangeLog: * doc/md.texi: Add len_mask{load,store}. * genopinit.cc (main): Ditto. (CMP_NAME): Ditto. * internal-fn.cc (len_maskload_direct): Ditto. (len_maskstore_direct): Ditto. (expand_call_mem_ref): Ditto. (expand_partial_load_optab_fn): Ditto. (expand_len_maskload_optab_fn): Ditto. (expand_partial_store_optab_fn): Ditto. (expand_len_maskstore_optab_fn): Ditto. (direct_len_maskload_optab_supported_p): Ditto. (direct_len_maskstore_optab_supported_p): Ditto. * internal-fn.def (LEN_MASK_LOAD): Ditto. (LEN_MASK_STORE): Ditto. * optabs.def (OPTAB_CD): Ditto. Diff: --- gcc/doc/md.texi | 61 +++++++++++++++++++++++++++++++++++++++++++++++++---- gcc/genopinit.cc | 6 ++++-- gcc/internal-fn.cc | 43 +++++++++++++++++++++++++++++++++---- gcc/internal-fn.def | 4 ++++ gcc/optabs.def | 2 ++ 5 files changed, 106 insertions(+), 10 deletions(-) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 909ec6b7728..61df82b203d 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5094,7 +5094,7 @@ This pattern is not allowed to @code{FAIL}. @cindex @code{len_load_@var{m}} instruction pattern @item @samp{len_load_@var{m}} -Load (operand 2 - operand 3) elements from vector memory operand 1 +Load (operand 2 + operand 3) elements from memory operand 1 into vector register operand 0, setting the other elements of operand 0 to undefined values. Operands 0 and 1 have mode @var{m}, which must be a vector mode. Operand 2 has whichever integer mode the @@ -5105,7 +5105,7 @@ constant bias: it is either a constant 0 or a constant -1. The predicate on operand 3 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1. -If (operand 2 - operand 3) exceeds the number of elements in mode +If (operand 2 + operand 3) exceeds the number of elements in mode @var{m}, the behavior is undefined. If the target prefers the length to be measured in bytes rather than @@ -5116,7 +5116,7 @@ This pattern is not allowed to @code{FAIL}. @cindex @code{len_store_@var{m}} instruction pattern @item @samp{len_store_@var{m}} -Store (operand 2 - operand 3) vector elements from vector register operand 1 +Store (operand 2 + operand 3) vector elements from vector register operand 1 into memory operand 0, leaving the other elements of operand 0 unchanged. Operands 0 and 1 have mode @var{m}, which must be a vector mode. Operand 2 has whichever integer mode the target prefers. @@ -5127,7 +5127,60 @@ constant bias: it is either a constant 0 or a constant -1. The predicate on operand 3 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1. -If (operand 2 - operand 3) exceeds the number of elements in mode +If (operand 2 + operand 3) exceeds the number of elements in mode +@var{m}, the behavior is undefined. + +If the target prefers the length to be measured in bytes +rather than elements, it should only implement this pattern for vectors +of @code{QI} elements. + +This pattern is not allowed to @code{FAIL}. + +@cindex @code{len_maskload@var{m}@var{n}} instruction pattern +@item @samp{len_maskload@var{m}@var{n}} +Perform a masked load from the memory location pointed to by operand 1 +into register operand 0. (operand 2 + operand 4) elements are loaded from +memory and other elements in operand 0 are set to undefined values. +This is a combination of len_load and maskload. +Operands 0 and 1 have mode @var{m}, which must be a vector mode. Operand 2 +has whichever integer mode the target prefers. A mask is specified in +operand 3 which must be of type @var{n}. The mask has lower precedence than +the length and is itself subject to length masking, +i.e. only mask indices < (operand 2 + operand 4) are used. +Operand 4 conceptually has mode @code{QI}. + +Operand 2 can be a variable or a constant amount. Operand 4 specifies a +constant bias: it is either a constant 0 or a constant -1. The predicate on +operand 4 must only accept the bias values that the target actually supports. +GCC handles a bias of 0 more efficiently than a bias of -1. + +If (operand 2 + operand 4) exceeds the number of elements in mode +@var{m}, the behavior is undefined. + +If the target prefers the length to be measured in bytes +rather than elements, it should only implement this pattern for vectors +of @code{QI} elements. + +This pattern is not allowed to @code{FAIL}. + +@cindex @code{len_maskstore@var{m}@var{n}} instruction pattern +@item @samp{len_maskstore@var{m}@var{n}} +Perform a masked store from vector register operand 1 into memory operand 0. +(operand 2 + operand 4) elements are stored to memory +and leave the other elements of operand 0 unchanged. +This is a combination of len_store and maskstore. +Operands 0 and 1 have mode @var{m}, which must be a vector mode. Operand 2 has whichever +integer mode the target prefers. A mask is specified in operand 3 which must be +of type @var{n}. The mask has lower precedence than the length and is itself subject to +length masking, i.e. only mask indices < (operand 2 + operand 4) are used. +Operand 4 conceptually has mode @code{QI}. + +Operand 2 can be a variable or a constant amount. Operand 3 specifies a +constant bias: it is either a constant 0 or a constant -1. The predicate on +operand 4 must only accept the bias values that the target actually supports. +GCC handles a bias of 0 more efficiently than a bias of -1. + +If (operand 2 + operand 4) exceeds the number of elements in mode @var{m}, the behavior is undefined. If the target prefers the length to be measured in bytes diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc index 0c1b6859ca0..6bd8858a1d9 100644 --- a/gcc/genopinit.cc +++ b/gcc/genopinit.cc @@ -376,7 +376,8 @@ main (int argc, const char **argv) fprintf (s_file, "/* Returns TRUE if the target supports any of the partial vector\n" - " optabs: while_ult_optab, len_load_optab or len_store_optab,\n" + " optabs: while_ult_optab, len_load_optab, len_store_optab,\n" + " len_maskload_optab or len_maskstore_optab,\n" " for any mode. */\n" "bool\npartial_vectors_supported_p (void)\n{\n"); bool any_match = false; @@ -386,7 +387,8 @@ main (int argc, const char **argv) { #define CMP_NAME(N) !strncmp (p->name, (N), strlen ((N))) if (CMP_NAME("while_ult") || CMP_NAME ("len_load") - || CMP_NAME ("len_store")) + || CMP_NAME ("len_store")|| CMP_NAME ("len_maskload") + || CMP_NAME ("len_maskstore")) { if (first) fprintf (s_file, " HAVE_%s", p->name); diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 6e81dc05e0e..d0e29103e3b 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -111,6 +111,7 @@ init_internal_fns () #define mask_load_lanes_direct { -1, -1, false } #define gather_load_direct { 3, 1, false } #define len_load_direct { -1, -1, false } +#define len_maskload_direct { -1, 3, false } #define mask_store_direct { 3, 2, false } #define store_lanes_direct { 0, 0, false } #define mask_store_lanes_direct { 0, 0, false } @@ -118,6 +119,7 @@ init_internal_fns () #define vec_cond_direct { 2, 0, false } #define scatter_store_direct { 3, 1, false } #define len_store_direct { 3, 3, false } +#define len_maskstore_direct { 4, 3, false } #define vec_set_direct { 3, 3, false } #define unary_direct { 0, 0, true } #define unary_convert_direct { -1, 0, true } @@ -2780,12 +2782,13 @@ expand_call_mem_ref (tree type, gcall *stmt, int index) return fold_build2 (MEM_REF, type, addr, build_int_cst (alias_ptr_type, 0)); } -/* Expand MASK_LOAD{,_LANES} or LEN_LOAD call STMT using optab OPTAB. */ +/* Expand MASK_LOAD{,_LANES}, LEN_MASK_LOAD or LEN_LOAD call STMT using optab + * OPTAB. */ static void expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) { - class expand_operand ops[4]; + class expand_operand ops[5]; tree type, lhs, rhs, maskt, biast; rtx mem, target, mask, bias; insn_code icode; @@ -2820,6 +2823,20 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) create_input_operand (&ops[3], bias, QImode); expand_insn (icode, 4, ops); } + else if (optab == len_maskload_optab) + { + create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)), + TYPE_UNSIGNED (TREE_TYPE (maskt))); + maskt = gimple_call_arg (stmt, 3); + mask = expand_normal (maskt); + create_input_operand (&ops[3], mask, TYPE_MODE (TREE_TYPE (maskt))); + icode = convert_optab_handler (optab, TYPE_MODE (type), + TYPE_MODE (TREE_TYPE (maskt))); + biast = gimple_call_arg (stmt, 4); + bias = expand_normal (biast); + create_input_operand (&ops[4], bias, QImode); + expand_insn (icode, 5, ops); + } else { create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); @@ -2833,13 +2850,15 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) #define expand_mask_load_optab_fn expand_partial_load_optab_fn #define expand_mask_load_lanes_optab_fn expand_mask_load_optab_fn #define expand_len_load_optab_fn expand_partial_load_optab_fn +#define expand_len_maskload_optab_fn expand_partial_load_optab_fn -/* Expand MASK_STORE{,_LANES} or LEN_STORE call STMT using optab OPTAB. */ +/* Expand MASK_STORE{,_LANES}, LEN_MASK_STORE or LEN_STORE call STMT using optab + * OPTAB. */ static void expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) { - class expand_operand ops[4]; + class expand_operand ops[5]; tree type, lhs, rhs, maskt, biast; rtx mem, reg, mask, bias; insn_code icode; @@ -2872,6 +2891,19 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) create_input_operand (&ops[3], bias, QImode); expand_insn (icode, 4, ops); } + else if (optab == len_maskstore_optab) + { + create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)), + TYPE_UNSIGNED (TREE_TYPE (maskt))); + maskt = gimple_call_arg (stmt, 3); + mask = expand_normal (maskt); + create_input_operand (&ops[3], mask, TYPE_MODE (TREE_TYPE (maskt))); + biast = gimple_call_arg (stmt, 4); + bias = expand_normal (biast); + create_input_operand (&ops[4], bias, QImode); + icode = convert_optab_handler (optab, TYPE_MODE (type), GET_MODE (mask)); + expand_insn (icode, 5, ops); + } else { create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); @@ -2882,6 +2914,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) #define expand_mask_store_optab_fn expand_partial_store_optab_fn #define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn #define expand_len_store_optab_fn expand_partial_store_optab_fn +#define expand_len_maskstore_optab_fn expand_partial_store_optab_fn /* Expand VCOND, VCONDU and VCONDEQ optab internal functions. The expansion of STMT happens based on OPTAB table associated. */ @@ -3835,6 +3868,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types, #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_gather_load_optab_supported_p convert_optab_supported_p #define direct_len_load_optab_supported_p direct_optab_supported_p +#define direct_len_maskload_optab_supported_p convert_optab_supported_p #define direct_mask_store_optab_supported_p convert_optab_supported_p #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p @@ -3842,6 +3876,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types, #define direct_vec_cond_optab_supported_p convert_optab_supported_p #define direct_scatter_store_optab_supported_p convert_optab_supported_p #define direct_len_store_optab_supported_p direct_optab_supported_p +#define direct_len_maskstore_optab_supported_p convert_optab_supported_p #define direct_while_optab_supported_p convert_optab_supported_p #define direct_fold_extract_optab_supported_p direct_optab_supported_p #define direct_fold_left_optab_supported_p direct_optab_supported_p diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 6f6fa7d37f9..7df681d6f87 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -50,12 +50,14 @@ along with GCC; see the file COPYING3. If not see - mask_load_lanes: currently just vec_mask_load_lanes - gather_load: used for {mask_,}gather_load - len_load: currently just len_load + - len_maskload: currently just len_maskload - mask_store: currently just maskstore - store_lanes: currently just vec_store_lanes - mask_store_lanes: currently just vec_mask_store_lanes - scatter_store: used for {mask_,}scatter_store - len_store: currently just len_store + - len_maskstore: currently just len_maskstore - unary: a normal unary optab, such as vec_reverse_ - binary: a normal binary optab, such as vec_interleave_lo_ @@ -133,6 +135,7 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE, mask_gather_load, gather_load) DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load) +DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload) DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store) DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0, @@ -151,6 +154,7 @@ DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask) DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set) DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store) +DEF_INTERNAL_OPTAB_FN (LEN_MASK_STORE, 0, len_maskstore, len_maskstore) DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while) DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary) diff --git a/gcc/optabs.def b/gcc/optabs.def index b637471b76e..934275a51bd 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -91,6 +91,8 @@ OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b") OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b") OPTAB_CD(maskload_optab, "maskload$a$b") OPTAB_CD(maskstore_optab, "maskstore$a$b") +OPTAB_CD(len_maskload_optab, "len_maskload$a$b") +OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b") OPTAB_CD(gather_load_optab, "gather_load$a$b") OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b") OPTAB_CD(scatter_store_optab, "scatter_store$a$b")