public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed
* [gcc(refs/vendors/riscv/heads/gcc-13-with-riscv-opts)] VECT: Support LEN_MASK_{LOAD, STORE} ifn && optabs
@ 2023-06-19 11:42 Jeff Law
  0 siblings, 0 replies; 2+ messages in thread
From: Jeff Law @ 2023-06-19 11:42 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:3398e8c78476fd5ccf2162fbafcc26ccd3e17121

commit 3398e8c78476fd5ccf2162fbafcc26ccd3e17121
Author: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Date:   Mon Jun 19 10:07:09 2023 +0200

    VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs
    
    This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
    like RISC-V that uses length in loop control.
    Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length
    or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
    Mask is the outcome of comparison.
    
    LEN_MASK_ LOAD/STORE format is defined as follows:
    1). LEN_MASK_LOAD (ptr, align, length, mask).
    2). LEN_MASK_STORE (ptr, align, length, mask, vec).
    
    Consider these 4 following cases:
    
    VLA: Variable-length auto-vectorization
    VLS: Specific-length auto-vectorization
    
    Case 1 (VLS): -mrvv-vector-bits=128   IR (Does not use LEN_MASK_*):
    Code:                                   v1 = MEM (...)
      for (int i = 0; i < 4; i++)           v2 = MEM (...)
        a[i] = b[i] + c[i];                 v3 = v1 + v2
                                            MEM[...] = v3
    
    Case 2 (VLS): -mrvv-vector-bits=128   IR (LEN_MASK_* with length = VF, mask = comparison):
    Code:                                   mask = comparison
      for (int i = 0; i < 4; i++)           v1 = LEN_MASK_LOAD (length = VF, mask)
        if (cond[i])                        v2 = LEN_MASK_LOAD (length = VF, mask)
          a[i] = b[i] + c[i];               v3 = v1 + v2
                                            LEN_MASK_STORE (length = VF, mask, v3)
    
    Case 3 (VLA):
    Code:                                   loop_len = SELECT_VL or MIN
      for (int i = 0; i < n; i++)           v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
          a[i] = b[i] + c[i];               v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
                                            v3 = v1 + v2
                                            LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)
    
    Case 4 (VLA):
    Code:                                   loop_len = SELECT_VL or MIN
      for (int i = 0; i < n; i++)           mask = comparison
          if (cond[i])                      v1 = LEN_MASK_LOAD (length = loop_len, mask)
          a[i] = b[i] + c[i];               v2 = LEN_MASK_LOAD (length = loop_len, mask)
                                            v3 = v1 + v2
                                            LEN_MASK_STORE (length = loop_len, mask, v3)
    
    Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
    
    gcc/ChangeLog:
    
            * doc/md.texi: Add len_mask{load,store}.
            * genopinit.cc (main): Ditto.
            (CMP_NAME): Ditto.
            * internal-fn.cc (len_maskload_direct): Ditto.
            (len_maskstore_direct): Ditto.
            (expand_call_mem_ref): Ditto.
            (expand_partial_load_optab_fn): Ditto.
            (expand_len_maskload_optab_fn): Ditto.
            (expand_partial_store_optab_fn): Ditto.
            (expand_len_maskstore_optab_fn): Ditto.
            (direct_len_maskload_optab_supported_p): Ditto.
            (direct_len_maskstore_optab_supported_p): Ditto.
            * internal-fn.def (LEN_MASK_LOAD): Ditto.
            (LEN_MASK_STORE): Ditto.
            * optabs.def (OPTAB_CD): Ditto.

Diff:
---
 gcc/doc/md.texi     | 61 +++++++++++++++++++++++++++++++++++++++++++++++++----
 gcc/genopinit.cc    |  6 ++++--
 gcc/internal-fn.cc  | 43 +++++++++++++++++++++++++++++++++----
 gcc/internal-fn.def |  4 ++++
 gcc/optabs.def      |  2 ++
 5 files changed, 106 insertions(+), 10 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 909ec6b7728..61df82b203d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,7 +5094,7 @@ This pattern is not allowed to @code{FAIL}.
 
 @cindex @code{len_load_@var{m}} instruction pattern
 @item @samp{len_load_@var{m}}
-Load (operand 2 - operand 3) elements from vector memory operand 1
+Load (operand 2 + operand 3) elements from memory operand 1
 into vector register operand 0, setting the other elements of
 operand 0 to undefined values.  Operands 0 and 1 have mode @var{m},
 which must be a vector mode.  Operand 2 has whichever integer mode the
@@ -5105,7 +5105,7 @@ constant bias: it is either a constant 0 or a constant -1.  The predicate on
 operand 3 must only accept the bias values that the target actually supports.
 GCC handles a bias of 0 more efficiently than a bias of -1.
 
-If (operand 2 - operand 3) exceeds the number of elements in mode
+If (operand 2 + operand 3) exceeds the number of elements in mode
 @var{m}, the behavior is undefined.
 
 If the target prefers the length to be measured in bytes rather than
@@ -5116,7 +5116,7 @@ This pattern is not allowed to @code{FAIL}.
 
 @cindex @code{len_store_@var{m}} instruction pattern
 @item @samp{len_store_@var{m}}
-Store (operand 2 - operand 3) vector elements from vector register operand 1
+Store (operand 2 + operand 3) vector elements from vector register operand 1
 into memory operand 0, leaving the other elements of
 operand 0 unchanged.  Operands 0 and 1 have mode @var{m}, which must be
 a vector mode.  Operand 2 has whichever integer mode the target prefers.
@@ -5127,7 +5127,60 @@ constant bias: it is either a constant 0 or a constant -1.  The predicate on
 operand 3 must only accept the bias values that the target actually supports.
 GCC handles a bias of 0 more efficiently than a bias of -1.
 
-If (operand 2 - operand 3) exceeds the number of elements in mode
+If (operand 2 + operand 3) exceeds the number of elements in mode
+@var{m}, the behavior is undefined.
+
+If the target prefers the length to be measured in bytes
+rather than elements, it should only implement this pattern for vectors
+of @code{QI} elements.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
+@item @samp{len_maskload@var{m}@var{n}}
+Perform a masked load from the memory location pointed to by operand 1
+into register operand 0.  (operand 2 + operand 4) elements are loaded from
+memory and other elements in operand 0 are set to undefined values.
+This is a combination of len_load and maskload.
+Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2
+has whichever integer mode the target prefers.  A mask is specified in
+operand 3 which must be of type @var{n}.  The mask has lower precedence than
+the length and is itself subject to length masking,
+i.e. only mask indices < (operand 2 + operand 4) are used.
+Operand 4 conceptually has mode @code{QI}.
+
+Operand 2 can be a variable or a constant amount.  Operand 4 specifies a
+constant bias: it is either a constant 0 or a constant -1.  The predicate on
+operand 4 must only accept the bias values that the target actually supports.
+GCC handles a bias of 0 more efficiently than a bias of -1.
+
+If (operand 2 + operand 4) exceeds the number of elements in mode
+@var{m}, the behavior is undefined.
+
+If the target prefers the length to be measured in bytes
+rather than elements, it should only implement this pattern for vectors
+of @code{QI} elements.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{len_maskstore@var{m}@var{n}} instruction pattern
+@item @samp{len_maskstore@var{m}@var{n}}
+Perform a masked store from vector register operand 1 into memory operand 0.
+(operand 2 + operand 4) elements are stored to memory
+and leave the other elements of operand 0 unchanged.
+This is a combination of len_store and maskstore.
+Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2 has whichever
+integer mode the target prefers.  A mask is specified in operand 3 which must be
+of type @var{n}.  The mask has lower precedence than the length and is itself subject to
+length masking, i.e. only mask indices < (operand 2 + operand 4) are used.
+Operand 4 conceptually has mode @code{QI}.
+
+Operand 2 can be a variable or a constant amount.  Operand 3 specifies a
+constant bias: it is either a constant 0 or a constant -1.  The predicate on
+operand 4 must only accept the bias values that the target actually supports.
+GCC handles a bias of 0 more efficiently than a bias of -1.
+
+If (operand 2 + operand 4) exceeds the number of elements in mode
 @var{m}, the behavior is undefined.
 
 If the target prefers the length to be measured in bytes
diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index 0c1b6859ca0..6bd8858a1d9 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc
@@ -376,7 +376,8 @@ main (int argc, const char **argv)
 
   fprintf (s_file,
 	   "/* Returns TRUE if the target supports any of the partial vector\n"
-	   "   optabs: while_ult_optab, len_load_optab or len_store_optab,\n"
+	   "   optabs: while_ult_optab, len_load_optab, len_store_optab,\n"
+	   "   len_maskload_optab or len_maskstore_optab,\n"
 	   "   for any mode.  */\n"
 	   "bool\npartial_vectors_supported_p (void)\n{\n");
   bool any_match = false;
@@ -386,7 +387,8 @@ main (int argc, const char **argv)
     {
 #define CMP_NAME(N) !strncmp (p->name, (N), strlen ((N)))
       if (CMP_NAME("while_ult") || CMP_NAME ("len_load")
-	  || CMP_NAME ("len_store"))
+	  || CMP_NAME ("len_store")|| CMP_NAME ("len_maskload")
+	  || CMP_NAME ("len_maskstore"))
 	{
 	  if (first)
 	    fprintf (s_file, " HAVE_%s", p->name);
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 6e81dc05e0e..d0e29103e3b 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -111,6 +111,7 @@ init_internal_fns ()
 #define mask_load_lanes_direct { -1, -1, false }
 #define gather_load_direct { 3, 1, false }
 #define len_load_direct { -1, -1, false }
+#define len_maskload_direct { -1, 3, false }
 #define mask_store_direct { 3, 2, false }
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
@@ -118,6 +119,7 @@ init_internal_fns ()
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
 #define len_store_direct { 3, 3, false }
+#define len_maskstore_direct { 4, 3, false }
 #define vec_set_direct { 3, 3, false }
 #define unary_direct { 0, 0, true }
 #define unary_convert_direct { -1, 0, true }
@@ -2780,12 +2782,13 @@ expand_call_mem_ref (tree type, gcall *stmt, int index)
   return fold_build2 (MEM_REF, type, addr, build_int_cst (alias_ptr_type, 0));
 }
 
-/* Expand MASK_LOAD{,_LANES} or LEN_LOAD call STMT using optab OPTAB.  */
+/* Expand MASK_LOAD{,_LANES}, LEN_MASK_LOAD or LEN_LOAD call STMT using optab
+ * OPTAB.  */
 
 static void
 expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  class expand_operand ops[4];
+  class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
   rtx mem, target, mask, bias;
   insn_code icode;
@@ -2820,6 +2823,20 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
       create_input_operand (&ops[3], bias, QImode);
       expand_insn (icode, 4, ops);
     }
+  else if (optab == len_maskload_optab)
+    {
+      create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)),
+				   TYPE_UNSIGNED (TREE_TYPE (maskt)));
+      maskt = gimple_call_arg (stmt, 3);
+      mask = expand_normal (maskt);
+      create_input_operand (&ops[3], mask, TYPE_MODE (TREE_TYPE (maskt)));
+      icode = convert_optab_handler (optab, TYPE_MODE (type),
+				     TYPE_MODE (TREE_TYPE (maskt)));
+      biast = gimple_call_arg (stmt, 4);
+      bias = expand_normal (biast);
+      create_input_operand (&ops[4], bias, QImode);
+      expand_insn (icode, 5, ops);
+    }
   else
     {
       create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
@@ -2833,13 +2850,15 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 #define expand_mask_load_optab_fn expand_partial_load_optab_fn
 #define expand_mask_load_lanes_optab_fn expand_mask_load_optab_fn
 #define expand_len_load_optab_fn expand_partial_load_optab_fn
+#define expand_len_maskload_optab_fn expand_partial_load_optab_fn
 
-/* Expand MASK_STORE{,_LANES} or LEN_STORE call STMT using optab OPTAB.  */
+/* Expand MASK_STORE{,_LANES}, LEN_MASK_STORE or LEN_STORE call STMT using optab
+ * OPTAB.  */
 
 static void
 expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  class expand_operand ops[4];
+  class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
   rtx mem, reg, mask, bias;
   insn_code icode;
@@ -2872,6 +2891,19 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
       create_input_operand (&ops[3], bias, QImode);
       expand_insn (icode, 4, ops);
     }
+  else if (optab == len_maskstore_optab)
+    {
+      create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)),
+				   TYPE_UNSIGNED (TREE_TYPE (maskt)));
+      maskt = gimple_call_arg (stmt, 3);
+      mask = expand_normal (maskt);
+      create_input_operand (&ops[3], mask, TYPE_MODE (TREE_TYPE (maskt)));
+      biast = gimple_call_arg (stmt, 4);
+      bias = expand_normal (biast);
+      create_input_operand (&ops[4], bias, QImode);
+      icode = convert_optab_handler (optab, TYPE_MODE (type), GET_MODE (mask));
+      expand_insn (icode, 5, ops);
+    }
   else
     {
       create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
@@ -2882,6 +2914,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 #define expand_mask_store_optab_fn expand_partial_store_optab_fn
 #define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn
 #define expand_len_store_optab_fn expand_partial_store_optab_fn
+#define expand_len_maskstore_optab_fn expand_partial_store_optab_fn
 
 /* Expand VCOND, VCONDU and VCONDEQ optab internal functions.
    The expansion of STMT happens based on OPTAB table associated.  */
@@ -3835,6 +3868,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_gather_load_optab_supported_p convert_optab_supported_p
 #define direct_len_load_optab_supported_p direct_optab_supported_p
+#define direct_len_maskload_optab_supported_p convert_optab_supported_p
 #define direct_mask_store_optab_supported_p convert_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
@@ -3842,6 +3876,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_vec_cond_optab_supported_p convert_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
 #define direct_len_store_optab_supported_p direct_optab_supported_p
+#define direct_len_maskstore_optab_supported_p convert_optab_supported_p
 #define direct_while_optab_supported_p convert_optab_supported_p
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
 #define direct_fold_left_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 6f6fa7d37f9..7df681d6f87 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -50,12 +50,14 @@ along with GCC; see the file COPYING3.  If not see
    - mask_load_lanes: currently just vec_mask_load_lanes
    - gather_load: used for {mask_,}gather_load
    - len_load: currently just len_load
+   - len_maskload: currently just len_maskload
 
    - mask_store: currently just maskstore
    - store_lanes: currently just vec_store_lanes
    - mask_store_lanes: currently just vec_mask_store_lanes
    - scatter_store: used for {mask_,}scatter_store
    - len_store: currently just len_store
+   - len_maskstore: currently just len_maskstore
 
    - unary: a normal unary optab, such as vec_reverse_<mode>
    - binary: a normal binary optab, such as vec_interleave_lo_<mode>
@@ -133,6 +135,7 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
 		       mask_gather_load, gather_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
 
 DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
@@ -151,6 +154,7 @@ DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
 DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
 
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_STORE, 0, len_maskstore, len_maskstore)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
 DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b637471b76e..934275a51bd 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -91,6 +91,8 @@ OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
 OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")
 OPTAB_CD(maskload_optab, "maskload$a$b")
 OPTAB_CD(maskstore_optab, "maskstore$a$b")
+OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
+OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
 OPTAB_CD(gather_load_optab, "gather_load$a$b")
 OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
 OPTAB_CD(scatter_store_optab, "scatter_store$a$b")

^ permalink raw reply	[flat|nested] 2+ messages in thread

* [gcc(refs/vendors/riscv/heads/gcc-13-with-riscv-opts)] VECT: Support LEN_MASK_{LOAD, STORE} ifn && optabs
@ 2023-07-14  2:47 Jeff Law
  0 siblings, 0 replies; 2+ messages in thread
From: Jeff Law @ 2023-07-14  2:47 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:81e1062e35b9bcc79561b3645a1f97b2e801c1a7

commit 81e1062e35b9bcc79561b3645a1f97b2e801c1a7
Author: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Date:   Mon Jun 19 10:07:09 2023 +0200

    VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs
    
    This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
    like RISC-V that uses length in loop control.
    Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length
    or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
    Mask is the outcome of comparison.
    
    LEN_MASK_ LOAD/STORE format is defined as follows:
    1). LEN_MASK_LOAD (ptr, align, length, mask).
    2). LEN_MASK_STORE (ptr, align, length, mask, vec).
    
    Consider these 4 following cases:
    
    VLA: Variable-length auto-vectorization
    VLS: Specific-length auto-vectorization
    
    Case 1 (VLS): -mrvv-vector-bits=128   IR (Does not use LEN_MASK_*):
    Code:                                   v1 = MEM (...)
      for (int i = 0; i < 4; i++)           v2 = MEM (...)
        a[i] = b[i] + c[i];                 v3 = v1 + v2
                                            MEM[...] = v3
    
    Case 2 (VLS): -mrvv-vector-bits=128   IR (LEN_MASK_* with length = VF, mask = comparison):
    Code:                                   mask = comparison
      for (int i = 0; i < 4; i++)           v1 = LEN_MASK_LOAD (length = VF, mask)
        if (cond[i])                        v2 = LEN_MASK_LOAD (length = VF, mask)
          a[i] = b[i] + c[i];               v3 = v1 + v2
                                            LEN_MASK_STORE (length = VF, mask, v3)
    
    Case 3 (VLA):
    Code:                                   loop_len = SELECT_VL or MIN
      for (int i = 0; i < n; i++)           v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
          a[i] = b[i] + c[i];               v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
                                            v3 = v1 + v2
                                            LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)
    
    Case 4 (VLA):
    Code:                                   loop_len = SELECT_VL or MIN
      for (int i = 0; i < n; i++)           mask = comparison
          if (cond[i])                      v1 = LEN_MASK_LOAD (length = loop_len, mask)
          a[i] = b[i] + c[i];               v2 = LEN_MASK_LOAD (length = loop_len, mask)
                                            v3 = v1 + v2
                                            LEN_MASK_STORE (length = loop_len, mask, v3)
    
    Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
    
    gcc/ChangeLog:
    
            * doc/md.texi: Add len_mask{load,store}.
            * genopinit.cc (main): Ditto.
            (CMP_NAME): Ditto.
            * internal-fn.cc (len_maskload_direct): Ditto.
            (len_maskstore_direct): Ditto.
            (expand_call_mem_ref): Ditto.
            (expand_partial_load_optab_fn): Ditto.
            (expand_len_maskload_optab_fn): Ditto.
            (expand_partial_store_optab_fn): Ditto.
            (expand_len_maskstore_optab_fn): Ditto.
            (direct_len_maskload_optab_supported_p): Ditto.
            (direct_len_maskstore_optab_supported_p): Ditto.
            * internal-fn.def (LEN_MASK_LOAD): Ditto.
            (LEN_MASK_STORE): Ditto.
            * optabs.def (OPTAB_CD): Ditto.

Diff:
---
 gcc/doc/md.texi     | 61 +++++++++++++++++++++++++++++++++++++++++++++++++----
 gcc/genopinit.cc    |  6 ++++--
 gcc/internal-fn.cc  | 43 +++++++++++++++++++++++++++++++++----
 gcc/internal-fn.def |  4 ++++
 gcc/optabs.def      |  2 ++
 5 files changed, 106 insertions(+), 10 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 909ec6b7728..61df82b203d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,7 +5094,7 @@ This pattern is not allowed to @code{FAIL}.
 
 @cindex @code{len_load_@var{m}} instruction pattern
 @item @samp{len_load_@var{m}}
-Load (operand 2 - operand 3) elements from vector memory operand 1
+Load (operand 2 + operand 3) elements from memory operand 1
 into vector register operand 0, setting the other elements of
 operand 0 to undefined values.  Operands 0 and 1 have mode @var{m},
 which must be a vector mode.  Operand 2 has whichever integer mode the
@@ -5105,7 +5105,7 @@ constant bias: it is either a constant 0 or a constant -1.  The predicate on
 operand 3 must only accept the bias values that the target actually supports.
 GCC handles a bias of 0 more efficiently than a bias of -1.
 
-If (operand 2 - operand 3) exceeds the number of elements in mode
+If (operand 2 + operand 3) exceeds the number of elements in mode
 @var{m}, the behavior is undefined.
 
 If the target prefers the length to be measured in bytes rather than
@@ -5116,7 +5116,7 @@ This pattern is not allowed to @code{FAIL}.
 
 @cindex @code{len_store_@var{m}} instruction pattern
 @item @samp{len_store_@var{m}}
-Store (operand 2 - operand 3) vector elements from vector register operand 1
+Store (operand 2 + operand 3) vector elements from vector register operand 1
 into memory operand 0, leaving the other elements of
 operand 0 unchanged.  Operands 0 and 1 have mode @var{m}, which must be
 a vector mode.  Operand 2 has whichever integer mode the target prefers.
@@ -5127,7 +5127,60 @@ constant bias: it is either a constant 0 or a constant -1.  The predicate on
 operand 3 must only accept the bias values that the target actually supports.
 GCC handles a bias of 0 more efficiently than a bias of -1.
 
-If (operand 2 - operand 3) exceeds the number of elements in mode
+If (operand 2 + operand 3) exceeds the number of elements in mode
+@var{m}, the behavior is undefined.
+
+If the target prefers the length to be measured in bytes
+rather than elements, it should only implement this pattern for vectors
+of @code{QI} elements.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
+@item @samp{len_maskload@var{m}@var{n}}
+Perform a masked load from the memory location pointed to by operand 1
+into register operand 0.  (operand 2 + operand 4) elements are loaded from
+memory and other elements in operand 0 are set to undefined values.
+This is a combination of len_load and maskload.
+Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2
+has whichever integer mode the target prefers.  A mask is specified in
+operand 3 which must be of type @var{n}.  The mask has lower precedence than
+the length and is itself subject to length masking,
+i.e. only mask indices < (operand 2 + operand 4) are used.
+Operand 4 conceptually has mode @code{QI}.
+
+Operand 2 can be a variable or a constant amount.  Operand 4 specifies a
+constant bias: it is either a constant 0 or a constant -1.  The predicate on
+operand 4 must only accept the bias values that the target actually supports.
+GCC handles a bias of 0 more efficiently than a bias of -1.
+
+If (operand 2 + operand 4) exceeds the number of elements in mode
+@var{m}, the behavior is undefined.
+
+If the target prefers the length to be measured in bytes
+rather than elements, it should only implement this pattern for vectors
+of @code{QI} elements.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{len_maskstore@var{m}@var{n}} instruction pattern
+@item @samp{len_maskstore@var{m}@var{n}}
+Perform a masked store from vector register operand 1 into memory operand 0.
+(operand 2 + operand 4) elements are stored to memory
+and leave the other elements of operand 0 unchanged.
+This is a combination of len_store and maskstore.
+Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2 has whichever
+integer mode the target prefers.  A mask is specified in operand 3 which must be
+of type @var{n}.  The mask has lower precedence than the length and is itself subject to
+length masking, i.e. only mask indices < (operand 2 + operand 4) are used.
+Operand 4 conceptually has mode @code{QI}.
+
+Operand 2 can be a variable or a constant amount.  Operand 3 specifies a
+constant bias: it is either a constant 0 or a constant -1.  The predicate on
+operand 4 must only accept the bias values that the target actually supports.
+GCC handles a bias of 0 more efficiently than a bias of -1.
+
+If (operand 2 + operand 4) exceeds the number of elements in mode
 @var{m}, the behavior is undefined.
 
 If the target prefers the length to be measured in bytes
diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index 0c1b6859ca0..6bd8858a1d9 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc
@@ -376,7 +376,8 @@ main (int argc, const char **argv)
 
   fprintf (s_file,
 	   "/* Returns TRUE if the target supports any of the partial vector\n"
-	   "   optabs: while_ult_optab, len_load_optab or len_store_optab,\n"
+	   "   optabs: while_ult_optab, len_load_optab, len_store_optab,\n"
+	   "   len_maskload_optab or len_maskstore_optab,\n"
 	   "   for any mode.  */\n"
 	   "bool\npartial_vectors_supported_p (void)\n{\n");
   bool any_match = false;
@@ -386,7 +387,8 @@ main (int argc, const char **argv)
     {
 #define CMP_NAME(N) !strncmp (p->name, (N), strlen ((N)))
       if (CMP_NAME("while_ult") || CMP_NAME ("len_load")
-	  || CMP_NAME ("len_store"))
+	  || CMP_NAME ("len_store")|| CMP_NAME ("len_maskload")
+	  || CMP_NAME ("len_maskstore"))
 	{
 	  if (first)
 	    fprintf (s_file, " HAVE_%s", p->name);
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 6e81dc05e0e..d0e29103e3b 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -111,6 +111,7 @@ init_internal_fns ()
 #define mask_load_lanes_direct { -1, -1, false }
 #define gather_load_direct { 3, 1, false }
 #define len_load_direct { -1, -1, false }
+#define len_maskload_direct { -1, 3, false }
 #define mask_store_direct { 3, 2, false }
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
@@ -118,6 +119,7 @@ init_internal_fns ()
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
 #define len_store_direct { 3, 3, false }
+#define len_maskstore_direct { 4, 3, false }
 #define vec_set_direct { 3, 3, false }
 #define unary_direct { 0, 0, true }
 #define unary_convert_direct { -1, 0, true }
@@ -2780,12 +2782,13 @@ expand_call_mem_ref (tree type, gcall *stmt, int index)
   return fold_build2 (MEM_REF, type, addr, build_int_cst (alias_ptr_type, 0));
 }
 
-/* Expand MASK_LOAD{,_LANES} or LEN_LOAD call STMT using optab OPTAB.  */
+/* Expand MASK_LOAD{,_LANES}, LEN_MASK_LOAD or LEN_LOAD call STMT using optab
+ * OPTAB.  */
 
 static void
 expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  class expand_operand ops[4];
+  class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
   rtx mem, target, mask, bias;
   insn_code icode;
@@ -2820,6 +2823,20 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
       create_input_operand (&ops[3], bias, QImode);
       expand_insn (icode, 4, ops);
     }
+  else if (optab == len_maskload_optab)
+    {
+      create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)),
+				   TYPE_UNSIGNED (TREE_TYPE (maskt)));
+      maskt = gimple_call_arg (stmt, 3);
+      mask = expand_normal (maskt);
+      create_input_operand (&ops[3], mask, TYPE_MODE (TREE_TYPE (maskt)));
+      icode = convert_optab_handler (optab, TYPE_MODE (type),
+				     TYPE_MODE (TREE_TYPE (maskt)));
+      biast = gimple_call_arg (stmt, 4);
+      bias = expand_normal (biast);
+      create_input_operand (&ops[4], bias, QImode);
+      expand_insn (icode, 5, ops);
+    }
   else
     {
       create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
@@ -2833,13 +2850,15 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 #define expand_mask_load_optab_fn expand_partial_load_optab_fn
 #define expand_mask_load_lanes_optab_fn expand_mask_load_optab_fn
 #define expand_len_load_optab_fn expand_partial_load_optab_fn
+#define expand_len_maskload_optab_fn expand_partial_load_optab_fn
 
-/* Expand MASK_STORE{,_LANES} or LEN_STORE call STMT using optab OPTAB.  */
+/* Expand MASK_STORE{,_LANES}, LEN_MASK_STORE or LEN_STORE call STMT using optab
+ * OPTAB.  */
 
 static void
 expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  class expand_operand ops[4];
+  class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
   rtx mem, reg, mask, bias;
   insn_code icode;
@@ -2872,6 +2891,19 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
       create_input_operand (&ops[3], bias, QImode);
       expand_insn (icode, 4, ops);
     }
+  else if (optab == len_maskstore_optab)
+    {
+      create_convert_operand_from (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)),
+				   TYPE_UNSIGNED (TREE_TYPE (maskt)));
+      maskt = gimple_call_arg (stmt, 3);
+      mask = expand_normal (maskt);
+      create_input_operand (&ops[3], mask, TYPE_MODE (TREE_TYPE (maskt)));
+      biast = gimple_call_arg (stmt, 4);
+      bias = expand_normal (biast);
+      create_input_operand (&ops[4], bias, QImode);
+      icode = convert_optab_handler (optab, TYPE_MODE (type), GET_MODE (mask));
+      expand_insn (icode, 5, ops);
+    }
   else
     {
       create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
@@ -2882,6 +2914,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 #define expand_mask_store_optab_fn expand_partial_store_optab_fn
 #define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn
 #define expand_len_store_optab_fn expand_partial_store_optab_fn
+#define expand_len_maskstore_optab_fn expand_partial_store_optab_fn
 
 /* Expand VCOND, VCONDU and VCONDEQ optab internal functions.
    The expansion of STMT happens based on OPTAB table associated.  */
@@ -3835,6 +3868,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_gather_load_optab_supported_p convert_optab_supported_p
 #define direct_len_load_optab_supported_p direct_optab_supported_p
+#define direct_len_maskload_optab_supported_p convert_optab_supported_p
 #define direct_mask_store_optab_supported_p convert_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
@@ -3842,6 +3876,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_vec_cond_optab_supported_p convert_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
 #define direct_len_store_optab_supported_p direct_optab_supported_p
+#define direct_len_maskstore_optab_supported_p convert_optab_supported_p
 #define direct_while_optab_supported_p convert_optab_supported_p
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
 #define direct_fold_left_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 6f6fa7d37f9..7df681d6f87 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -50,12 +50,14 @@ along with GCC; see the file COPYING3.  If not see
    - mask_load_lanes: currently just vec_mask_load_lanes
    - gather_load: used for {mask_,}gather_load
    - len_load: currently just len_load
+   - len_maskload: currently just len_maskload
 
    - mask_store: currently just maskstore
    - store_lanes: currently just vec_store_lanes
    - mask_store_lanes: currently just vec_mask_store_lanes
    - scatter_store: used for {mask_,}scatter_store
    - len_store: currently just len_store
+   - len_maskstore: currently just len_maskstore
 
    - unary: a normal unary optab, such as vec_reverse_<mode>
    - binary: a normal binary optab, such as vec_interleave_lo_<mode>
@@ -133,6 +135,7 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
 		       mask_gather_load, gather_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
 
 DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
@@ -151,6 +154,7 @@ DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
 DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
 
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_STORE, 0, len_maskstore, len_maskstore)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
 DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b637471b76e..934275a51bd 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -91,6 +91,8 @@ OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
 OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")
 OPTAB_CD(maskload_optab, "maskload$a$b")
 OPTAB_CD(maskstore_optab, "maskstore$a$b")
+OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
+OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
 OPTAB_CD(gather_load_optab, "gather_load$a$b")
 OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
 OPTAB_CD(scatter_store_optab, "scatter_store$a$b")

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-07-14  2:47 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-19 11:42 [gcc(refs/vendors/riscv/heads/gcc-13-with-riscv-opts)] VECT: Support LEN_MASK_{LOAD, STORE} ifn && optabs Jeff Law
2023-07-14  2:47 Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).