public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN
@ 2023-10-31  9:59 Juzhe-Zhong
  2023-10-31 14:08 ` Li, Pan2
  2023-11-02 13:52 ` Richard Biener
  0 siblings, 2 replies; 13+ messages in thread
From: Juzhe-Zhong @ 2023-10-31  9:59 UTC (permalink / raw)
  To: gcc-patches
  Cc: rguenther, jeffreyalaw, richard.sandiford, rdapp.gcc, Juzhe-Zhong

As previous Richard's suggested, we should support strided load/store in
loop vectorizer instead hacking RISC-V backend.

This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.

The GIMPLE IR is same as mask_len_gather_load/mask_len_scatter_store but with
changing vector offset into scalar stride.

We don't have strided_load/strided_store and mask_strided_load/mask_strided_store since
it't unlikely RVV will have such optabs and we can't add the patterns that we can't test them.


gcc/ChangeLog:

	* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
	* internal-fn.cc (internal_load_fn_p): Ditto.
	(internal_strided_fn_p): Ditto.
	(internal_fn_len_index): Ditto.
	(internal_fn_mask_index): Ditto.
	(internal_fn_stored_value_index): Ditto.
	(internal_strided_fn_supported_p): Ditto.
	* internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
	(MASK_LEN_STRIDED_STORE): Ditto.
	* internal-fn.h (internal_strided_fn_p): Ditto.
	(internal_strided_fn_supported_p): Ditto.
	* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi     | 51 +++++++++++++++++++++++++++++++++++++++++++++
 gcc/internal-fn.cc  | 44 ++++++++++++++++++++++++++++++++++++++
 gcc/internal-fn.def |  4 ++++
 gcc/internal-fn.h   |  2 ++
 gcc/optabs.def      |  2 ++
 5 files changed, 103 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fab2513105a..5bac713a0dd 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
 be loaded from memory and clear if element @var{i} of the result should be undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}@var{n}}
+Load several separate memory locations into a destination vector of mode @var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of mode @var{n}.
+The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
+For each element index i:
+
+@itemize @bullet
+@item
+extend the stride to address width, using zero
+extension if operand 3 is 1 and sign extension if operand 3 is zero;
+@item
+multiply the extended stride by operand 4;
+@item
+add the result to the base; and
+@item
+load the value at that address (operand 1 + @var{i} * multiplied and extended stride) into element @var{i} of operand 0.
+@end itemize
+
+Similar to mask_len_load, the instruction loads at most (operand 6 + operand 7) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be undefined.
+Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5157,31 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_store@var{m}@var{n}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}@var{n}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of mode @var{n}.
+Operand 2 is the vector of values that should be stored, which is of mode @var{m}.
+The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
+For each element index i:
+
+@itemize @bullet
+@item
+extend the stride to address width, using zero
+extension if operand 2 is 1 and sign extension if operand 2 is zero;
+@item
+multiply the extended stride by operand 3;
+@item
+add the result to the base; and
+@item
+store element @var{i} of operand 4 to that address (operand 1 + @var{i} * multiplied and extended stride).
+@end itemize
+
+Similar to mask_len_store, the instruction stores at most (operand 6 + operand 7) elements of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
+Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index e7451b96353..f7f85aa7dde 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4596,6 +4596,7 @@ internal_load_fn_p (internal_fn fn)
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_LEN_GATHER_LOAD:
+    case IFN_MASK_LEN_STRIDED_LOAD:
     case IFN_LEN_LOAD:
     case IFN_MASK_LEN_LOAD:
       return true;
@@ -4648,6 +4649,22 @@ internal_gather_scatter_fn_p (internal_fn fn)
     }
 }
 
+/* Return true if IFN is some form of strided load or strided store.  */
+
+bool
+internal_strided_fn_p (internal_fn fn)
+{
+  switch (fn)
+    {
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
 /* If FN takes a vector len argument, return the index of that argument,
    otherwise return -1.  */
 
@@ -4662,6 +4679,8 @@ internal_fn_len_index (internal_fn fn)
 
     case IFN_MASK_LEN_GATHER_LOAD:
     case IFN_MASK_LEN_SCATTER_STORE:
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
     case IFN_COND_LEN_FMA:
     case IFN_COND_LEN_FMS:
     case IFN_COND_LEN_FNMA:
@@ -4719,6 +4738,8 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_SCATTER_STORE:
     case IFN_MASK_LEN_GATHER_LOAD:
     case IFN_MASK_LEN_SCATTER_STORE:
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
       return 4;
 
     default:
@@ -4740,6 +4761,7 @@ internal_fn_stored_value_index (internal_fn fn)
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
     case IFN_MASK_LEN_SCATTER_STORE:
+    case IFN_MASK_LEN_STRIDED_STORE:
       return 3;
 
     case IFN_LEN_STORE:
@@ -4784,6 +4806,28 @@ internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type,
 	  && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
 }
 
+/* Return true if the target supports strided load or strided store function
+   IFN.  For loads, VECTOR_TYPE is the vector type of the load result,
+   while for stores it is the vector type of the stored data argument.
+   STRIDE_TYPE is the type that holds the stride from the previous element
+   memory address of each loaded or stored element.
+   SCALE is the amount by which these stride should be multiplied
+   *after* they have been extended to address width.  */
+
+bool
+internal_strided_fn_supported_p (internal_fn ifn, tree vector_type,
+				 tree offset_type, int scale)
+{
+  optab optab = direct_internal_fn_optab (ifn);
+  insn_code icode = convert_optab_handler (optab, TYPE_MODE (vector_type),
+					   TYPE_MODE (offset_type));
+  int output_ops = internal_load_fn_p (ifn) ? 1 : 0;
+  bool unsigned_p = TYPE_UNSIGNED (offset_type);
+  return (icode != CODE_FOR_nothing
+	  && insn_operand_matches (icode, 2 + output_ops, GEN_INT (unsigned_p))
+	  && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
+}
+
 /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN
    for pointers of type TYPE when the accesses have LENGTH bytes and their
    common byte alignment is ALIGN.  */
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a2023ab9c3d..0fa532e8f6b 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -199,6 +199,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
 		       mask_gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
 		       mask_len_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
+		       mask_len_strided_load, gather_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
@@ -208,6 +210,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
 		       mask_scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
 		       mask_len_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
+		       mask_len_strided_store, scatter_store)
 
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99de13a0199..8379c61dff7 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -235,11 +235,13 @@ extern bool can_interpret_as_conditional_op_p (gimple *, tree *,
 extern bool internal_load_fn_p (internal_fn);
 extern bool internal_store_fn_p (internal_fn);
 extern bool internal_gather_scatter_fn_p (internal_fn);
+extern bool internal_strided_fn_p (internal_fn);
 extern int internal_fn_mask_index (internal_fn);
 extern int internal_fn_len_index (internal_fn);
 extern int internal_fn_stored_value_index (internal_fn);
 extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
 						    tree, tree, int);
+extern bool internal_strided_fn_supported_p (internal_fn, tree, tree, int);
 extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree,
 						poly_uint64, unsigned int);
 #define VECT_PARTIAL_BIAS_UNSUPPORTED 127
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..3d85ac5f678 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -98,9 +98,11 @@ OPTAB_CD(mask_len_store_optab, "mask_len_store$a$b")
 OPTAB_CD(gather_load_optab, "gather_load$a$b")
 OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
 OPTAB_CD(mask_len_gather_load_optab, "mask_len_gather_load$a$b")
+OPTAB_CD(mask_len_strided_load_optab, "mask_len_strided_load$a$b")
 OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
 OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
 OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
+OPTAB_CD(mask_len_strided_store_optab, "mask_len_strided_store$a$b")
 OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
 OPTAB_CD(vec_init_optab, "vec_init$a$b")
 
-- 
2.36.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-11-03  8:29 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-31  9:59 [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN Juzhe-Zhong
2023-10-31 14:08 ` Li, Pan2
2023-11-02 13:52 ` Richard Biener
2023-11-02 14:23   ` 钟居哲
2023-11-02 14:27     ` Richard Biener
2023-11-02 14:29       ` 钟居哲
2023-11-02 14:34         ` Richard Biener
2023-11-03  7:10           ` juzhe.zhong
2023-11-03  7:40             ` Richard Biener
2023-11-03  7:49               ` juzhe.zhong
2023-11-03  8:27                 ` Richard Biener
2023-11-03  8:18               ` juzhe.zhong
2023-11-03  8:29                 ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).