public inbox for gcc-patches@gcc.gnu.org
* [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store
@ 2024-05-28  3:14 pan2.li
  2024-06-04 13:22 ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: pan2.li @ 2024-05-28  3:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, tamar.christina, richard.guenther, Pan Li

From: Pan Li <pan2.li@intel.com>

This patch adds two new internal functions (IFNs):
* mask_len_strided_load
* mask_len_strided_store

The GIMPLE v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) will
be expanded into v = mask_len_strided_load (ptr, stride, mask, len, bias).

The GIMPLE MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias) will
be expanded into mask_len_strided_store (ptr, stride, v, mask, len, bias).
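
As a rough scalar model of the semantics described above (an illustrative
sketch, not code from this patch): the helper names model_strided_load and
model_strided_store, the fixed 32-bit element type, and the one-byte-per-element
mask are all assumptions made for the example. The stride is taken in bytes,
matching "operand 1 + i * operand 2" in the md.texi text, and elements at or
beyond len + bias are simply left untouched here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of
     v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)
   for 32-bit elements.  STRIDE is a byte distance between elements.  */
static void
model_strided_load (int32_t *dst, const void *ptr, ptrdiff_t stride,
                    const unsigned char *mask, size_t len, ptrdiff_t bias)
{
  size_t active = (size_t) ((ptrdiff_t) len + bias);
  for (size_t i = 0; i < active; i++)
    {
      if (mask[i])
        /* Active element: read from ptr + i * stride.  */
        memcpy (&dst[i], (const char *) ptr + (ptrdiff_t) i * stride,
                sizeof (int32_t));
      else
        /* Masked-off element: zero, as the proposed md.texi text says.  */
        dst[i] = 0;
    }
}

/* Hypothetical scalar model of
     MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias).  */
static void
model_strided_store (void *ptr, ptrdiff_t stride, const int32_t *src,
                     const unsigned char *mask, size_t len, ptrdiff_t bias)
{
  size_t active = (size_t) ((ptrdiff_t) len + bias);
  for (size_t i = 0; i < active; i++)
    if (mask[i])
      /* Only active, unmasked elements touch memory.  */
      memcpy ((char *) ptr + (ptrdiff_t) i * stride, &src[i],
              sizeof (int32_t));
}
```

With a stride of 2 * sizeof (int32_t), the load model reads every other
element of an int32_t array and the store model writes every other slot.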

The following test suites pass with this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression test.

gcc/ChangeLog:

	* doc/md.texi: Add description for mask_len_strided_load/store.
	* internal-fn.cc (strided_load_direct): New internal_fn define
	for strided_load_direct.
	(strided_store_direct): Ditto but for store.
	(expand_strided_load_optab_fn): New expand func for
	mask_len_strided_load.
	(expand_strided_store_optab_fn): Ditto but for store.
	(direct_strided_load_optab_supported_p): New define for load
	direct optab supported.
	(direct_strided_store_optab_supported_p): Ditto but for store.
	(internal_fn_len_index): Add len index for both load and store.
	(internal_fn_mask_index): Ditto but for mask index.
	(internal_fn_stored_value_index): Add stored index.
	* internal-fn.def (MASK_LEN_STRIDED_LOAD): New direct fn define
	for strided_load.
	(MASK_LEN_STRIDED_STORE): Ditto but for strided_store.
	* optabs.def (OPTAB_D): New optab define for load and store.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
---
 gcc/doc/md.texi     | 27 ++++++++++++++++
 gcc/internal-fn.cc  | 75 +++++++++++++++++++++++++++++++++++++++++++++
 gcc/internal-fn.def |  6 ++++
 gcc/optabs.def      |  2 ++
 4 files changed, 110 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..3d242675c63 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5138,6 +5138,20 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
 be loaded from memory and clear if element @var{i} of the result should be undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode @var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand.
+The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
+For each element index @var{i} the load address is operand 1 + @var{i} * operand 2.
+Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be zero.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5175,6 +5189,19 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode @var{m} into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is a scalar stride of Pmode.
+Operand 2 is the vector of values that should be stored, which is of mode @var{m}.
+Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand.
+The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 0 as base and operand 1 as step.
+For each element index @var{i} the store address is operand 0 + @var{i} * operand 1.
+Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of (operand 2) to memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of (operand 2) should be stored.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9c09026793f..f6e5329cd84 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -159,6 +159,7 @@ init_internal_fns ()
 #define load_lanes_direct { -1, -1, false }
 #define mask_load_lanes_direct { -1, -1, false }
 #define gather_load_direct { 3, 1, false }
+#define strided_load_direct { -1, -1, false }
 #define len_load_direct { -1, -1, false }
 #define mask_len_load_direct { -1, 4, false }
 #define mask_store_direct { 3, 2, false }
@@ -168,6 +169,7 @@ init_internal_fns ()
 #define vec_cond_mask_len_direct { 1, 1, false }
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
+#define strided_store_direct { 1, 1, false }
 #define len_store_direct { 3, 3, false }
 #define mask_len_store_direct { 4, 5, false }
 #define vec_set_direct { 3, 3, false }
@@ -3668,6 +3670,68 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
     emit_move_insn (lhs_rtx, ops[0].value);
 }
 
+/* Expand MASK_LEN_STRIDED_LOAD call CALL by optab OPTAB.  */
+
+static void
+expand_strided_load_optab_fn (ATTRIBUTE_UNUSED internal_fn, gcall *stmt,
+			      direct_optab optab)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+
+  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+
+  unsigned i = 0;
+  class expand_operand ops[6];
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+
+  create_output_operand (&ops[i++], lhs_rtx, mode);
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+
+  insn_code icode = direct_optab_handler (optab, mode);
+
+  i = add_mask_and_len_args (ops, i, stmt);
+  expand_insn (icode, i, ops);
+
+  if (!rtx_equal_p (lhs_rtx, ops[0].value))
+    emit_move_insn (lhs_rtx, ops[0].value);
+}
+
+/* Expand MASK_LEN_STRIDED_STORE call CALL by optab OPTAB.  */
+
+static void
+expand_strided_store_optab_fn (ATTRIBUTE_UNUSED internal_fn, gcall *stmt,
+			       direct_optab optab)
+{
+  internal_fn fn = gimple_call_internal_fn (stmt);
+  int rhs_index = internal_fn_stored_value_index (fn);
+
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+  tree rhs = gimple_call_arg (stmt, rhs_index);
+
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+  rtx rhs_rtx = expand_normal (rhs);
+
+  unsigned i = 0;
+  class expand_operand ops[6];
+  machine_mode mode = TYPE_MODE (TREE_TYPE (rhs));
+
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+  create_input_operand (&ops[i++], rhs_rtx, mode);
+
+  insn_code icode = direct_optab_handler (optab, mode);
+  i = add_mask_and_len_args (ops, i, stmt);
+
+  expand_insn (icode, i, ops);
+}
+
 /* Helper for expand_DIVMOD.  Return true if the sequence starting with
    INSN contains any call insns or insns with {,U}{DIV,MOD} rtxes.  */
 
@@ -4058,6 +4122,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_gather_load_optab_supported_p convert_optab_supported_p
+#define direct_strided_load_optab_supported_p direct_optab_supported_p
 #define direct_len_load_optab_supported_p direct_optab_supported_p
 #define direct_mask_len_load_optab_supported_p convert_optab_supported_p
 #define direct_mask_store_optab_supported_p convert_optab_supported_p
@@ -4066,6 +4131,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
 #define direct_vec_cond_optab_supported_p convert_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
+#define direct_strided_store_optab_supported_p direct_optab_supported_p
 #define direct_len_store_optab_supported_p direct_optab_supported_p
 #define direct_mask_len_store_optab_supported_p convert_optab_supported_p
 #define direct_while_optab_supported_p convert_optab_supported_p
@@ -4723,6 +4789,8 @@ internal_fn_len_index (internal_fn fn)
     case IFN_COND_LEN_XOR:
     case IFN_COND_LEN_SHL:
     case IFN_COND_LEN_SHR:
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
       return 4;
 
     case IFN_COND_LEN_NEG:
@@ -4817,6 +4885,10 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
       return 2;
 
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 3;
+
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_SCATTER_STORE:
     case IFN_MASK_LEN_GATHER_LOAD:
@@ -4840,6 +4912,9 @@ internal_fn_stored_value_index (internal_fn fn)
 {
   switch (fn)
     {
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 2;
+
     case IFN_MASK_STORE:
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 25badbb86e5..b30a7a5b009 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
    - mask_load_lanes: currently just vec_mask_load_lanes
    - mask_len_load_lanes: currently just vec_mask_len_load_lanes
    - gather_load: used for {mask_,mask_len_,}gather_load
+   - strided_load: currently just mask_len_strided_load
    - len_load: currently just len_load
    - mask_len_load: currently just mask_len_load
 
@@ -64,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
    - mask_store_lanes: currently just vec_mask_store_lanes
    - mask_len_store_lanes: currently just vec_mask_len_store_lanes
    - scatter_store: used for {mask_,mask_len_,}scatter_store
+   - strided_store: currently just mask_len_strided_store
    - len_store: currently just len_store
    - mask_len_store: currently just mask_len_store
 
@@ -212,6 +214,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
 		       mask_gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
 		       mask_len_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
+		       mask_len_strided_load, strided_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
@@ -221,6 +225,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
 		       mask_scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
 		       mask_len_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
+		       mask_len_strided_store, strided_store)
 
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 3f2cb46aff8..630b1de8f97 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -539,4 +539,6 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (mask_len_strided_load_optab, "mask_len_strided_load_$a")
+OPTAB_D (mask_len_strided_store_optab, "mask_len_strided_store_$a")
 OPTAB_D (select_vl_optab, "select_vl$a")
-- 
2.34.1



* Re: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store
  2024-05-28  3:14 [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store pan2.li
@ 2024-06-04 13:22 ` Richard Biener
  2024-06-05  1:18   ` Li, Pan2
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2024-06-04 13:22 UTC (permalink / raw)
  To: pan2.li, Richard Sandiford
  Cc: gcc-patches, juzhe.zhong, kito.cheng, tamar.christina

On Tue, May 28, 2024 at 5:15 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> This patch adds two new internal functions (IFNs):
> * mask_len_strided_load
> * mask_len_strided_store
>
> The GIMPLE v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) will
> be expanded into v = mask_len_strided_load (ptr, stride, mask, len, bias).
>
> The GIMPLE MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias) will
> be expanded into mask_len_strided_store (ptr, stride, v, mask, len, bias).
>
> The following test suites pass with this patch:
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The riscv full regression test.

Sorry if we have discussed this last year already - is there anything wrong
with using a gather/scatter with a VEC_SERIES gimple/rtl def for the offset?
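
To make the question concrete, here is a minimal scalar sketch (an
illustration under assumptions, not code from the patch or from GCC) of why a
strided load can be viewed as a gather whose offset vector is a VEC_SERIES.
The names series_offsets, gather_load_unmasked and strided_load_unmasked, the
fixed vector length NUNITS, and the 32-bit element type are all hypothetical;
mask, len and bias are left out to keep the comparison short, and offsets are
byte offsets with a scale of 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUNITS 4 /* hypothetical number of vector elements */

/* Offset vector { base, base + step, base + 2 * step, ... },
   i.e. the value a VEC_SERIES (base, step) would produce.  */
static void
series_offsets (ptrdiff_t *offsets, ptrdiff_t base, ptrdiff_t step)
{
  for (size_t i = 0; i < NUNITS; i++)
    offsets[i] = base + (ptrdiff_t) i * step;
}

/* Gather: element i comes from ptr + offsets[i].  */
static void
gather_load_unmasked (int32_t *dst, const void *ptr,
                      const ptrdiff_t *offsets)
{
  for (size_t i = 0; i < NUNITS; i++)
    memcpy (&dst[i], (const char *) ptr + offsets[i], sizeof (int32_t));
}

/* Strided: element i comes from ptr + i * stride, with no offset
   vector materialized.  */
static void
strided_load_unmasked (int32_t *dst, const void *ptr, ptrdiff_t stride)
{
  for (size_t i = 0; i < NUNITS; i++)
    memcpy (&dst[i], (const char *) ptr + (ptrdiff_t) i * stride,
            sizeof (int32_t));
}
```

Calling gather_load_unmasked with offsets from series_offsets (offsets, 0,
stride) reads the same addresses as strided_load_unmasked with that stride;
the only difference in this model is whether the offset vector exists as a
separate value.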

Richard.



* RE: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store
  2024-06-04 13:22 ` Richard Biener
@ 2024-06-05  1:18   ` Li, Pan2
  2024-06-05  7:50     ` Li, Pan2
  0 siblings, 1 reply; 4+ messages in thread
From: Li, Pan2 @ 2024-06-05  1:18 UTC (permalink / raw)
  To: Richard Biener, Richard Sandiford
  Cc: gcc-patches, juzhe.zhong, kito.cheng, tamar.christina

> Sorry if we have discussed this last year already - is there anything wrong
> with using a gather/scatter with a VEC_SERIES gimple/rtl def for the offset?

Thanks for the comments. It has been quite a while since the last discussion; let me recall the details and keep you posted.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Tuesday, June 4, 2024 9:22 PM
To: Li, Pan2 <pan2.li@intel.com>; Richard Sandiford <richard.sandiford@arm.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com
Subject: Re: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

On Tue, May 28, 2024 at 5:15 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> This patch would like to add new internal fun for the below 2 IFN.
> * mask_len_strided_load
> * mask_len_strided_store
>
> The GIMPLE v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) will
> be expanded into v = mask_len_strided_load (ptr, stried, mask, len, bias).
>
> The GIMPLE MASK_LEN_STRIED_STORE (ptr, stride, v, mask, len, bias)
> be expanded into mask_len_stried_store (ptr, stride, v, mask, len, bias).
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.

Sorry if we have discussed this last year already - is there anything wrong
with using a gather/scatter with a VEC_SERIES gimple/rtl def for the offset?

Richard.

> gcc/ChangeLog:
>
>         * doc/md.texi: Add description for mask_len_strided_load/store.
>         * internal-fn.cc (strided_load_direct): New internal_fn define
>         for strided_load_direct.
>         (strided_store_direct): Ditto but for store.
>         (expand_strided_load_optab_fn): New expand func for
>         mask_len_strided_load.
>         (expand_strided_store_optab_fn): Ditto but for store.
>         (direct_strided_load_optab_supported_p): New define for load
>         direct optab supported.
>         (direct_strided_store_optab_supported_p): Ditto but for store.
>         (internal_fn_len_index): Add len index for both load and store.
>         (internal_fn_mask_index): Ditto but for mask index.
>         (internal_fn_stored_value_index): Add stored index.
>         * internal-fn.def (MASK_LEN_STRIDED_LOAD): New direct fn define
>         for strided_load.
>         (MASK_LEN_STRIDED_STORE): Ditto but for stride_store.
>         * optabs.def (OPTAB_D): New optab define for load and store.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
> ---
>  gcc/doc/md.texi     | 27 ++++++++++++++++
>  gcc/internal-fn.cc  | 75 +++++++++++++++++++++++++++++++++++++++++++++
>  gcc/internal-fn.def |  6 ++++
>  gcc/optabs.def      |  2 ++
>  4 files changed, 110 insertions(+)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5730bda80dc..3d242675c63 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5138,6 +5138,20 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
>  be loaded from memory and clear if element @var{i} of the result should be undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand.
> +The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
> +For each element index i load address is operand 1 + @var{i} * operand 2.
> +Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory.
> +Element @var{i} of the mask (operand 3) is set if element @var{i} of the result should
> +be loaded from memory and clear if element @var{i} of the result should be zero.
> +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5175,6 +5189,19 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>
> +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}}
> +Store a vector of mode @var{m} into several distinct memory locations.
> +Operand 0 is a scalar base address and operand 1 is a scalar stride of Pmode.
> +Operand 2 is the vector of values that should be stored, which is of mode @var{m}.
> +Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand.
> +The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 0 as base and operand 1 as step.
> +For each element index @var{i}, the store address is operand 0 + @var{i} * operand 1.
> +Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of (operand 2) to memory.
> +Element @var{i} of the mask (operand 3) is set if element @var{i} of (operand 2) should be stored.
> +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 9c09026793f..f6e5329cd84 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -159,6 +159,7 @@ init_internal_fns ()
>  #define load_lanes_direct { -1, -1, false }
>  #define mask_load_lanes_direct { -1, -1, false }
>  #define gather_load_direct { 3, 1, false }
> +#define strided_load_direct { -1, -1, false }
>  #define len_load_direct { -1, -1, false }
>  #define mask_len_load_direct { -1, 4, false }
>  #define mask_store_direct { 3, 2, false }
> @@ -168,6 +169,7 @@ init_internal_fns ()
>  #define vec_cond_mask_len_direct { 1, 1, false }
>  #define vec_cond_direct { 2, 0, false }
>  #define scatter_store_direct { 3, 1, false }
> +#define strided_store_direct { 1, 1, false }
>  #define len_store_direct { 3, 3, false }
>  #define mask_len_store_direct { 4, 5, false }
>  #define vec_set_direct { 3, 3, false }
> @@ -3668,6 +3670,68 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>      emit_move_insn (lhs_rtx, ops[0].value);
>  }
>
> +/* Expand MASK_LEN_STRIDED_LOAD call CALL by optab OPTAB.  */
> +
> +static void
> +expand_strided_load_optab_fn (ATTRIBUTE_UNUSED internal_fn, gcall *stmt,
> +                             direct_optab optab)
> +{
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree base = gimple_call_arg (stmt, 0);
> +  tree stride = gimple_call_arg (stmt, 1);
> +
> +  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  rtx base_rtx = expand_normal (base);
> +  rtx stride_rtx = expand_normal (stride);
> +
> +  unsigned i = 0;
> +  class expand_operand ops[6];
> +  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> +
> +  create_output_operand (&ops[i++], lhs_rtx, mode);
> +  create_address_operand (&ops[i++], base_rtx);
> +  create_address_operand (&ops[i++], stride_rtx);
> +
> +  insn_code icode = direct_optab_handler (optab, mode);
> +
> +  i = add_mask_and_len_args (ops, i, stmt);
> +  expand_insn (icode, i, ops);
> +
> +  if (!rtx_equal_p (lhs_rtx, ops[0].value))
> +    emit_move_insn (lhs_rtx, ops[0].value);
> +}
> +
> +/* Expand MASK_LEN_STRIDED_STORE call CALL by optab OPTAB.  */
> +
> +static void
> +expand_strided_store_optab_fn (ATTRIBUTE_UNUSED internal_fn, gcall *stmt,
> +                              direct_optab optab)
> +{
> +  internal_fn fn = gimple_call_internal_fn (stmt);
> +  int rhs_index = internal_fn_stored_value_index (fn);
> +
> +  tree base = gimple_call_arg (stmt, 0);
> +  tree stride = gimple_call_arg (stmt, 1);
> +  tree rhs = gimple_call_arg (stmt, rhs_index);
> +
> +  rtx base_rtx = expand_normal (base);
> +  rtx stride_rtx = expand_normal (stride);
> +  rtx rhs_rtx = expand_normal (rhs);
> +
> +  unsigned i = 0;
> +  class expand_operand ops[6];
> +  machine_mode mode = TYPE_MODE (TREE_TYPE (rhs));
> +
> +  create_address_operand (&ops[i++], base_rtx);
> +  create_address_operand (&ops[i++], stride_rtx);
> +  create_input_operand (&ops[i++], rhs_rtx, mode);
> +
> +  insn_code icode = direct_optab_handler (optab, mode);
> +  i = add_mask_and_len_args (ops, i, stmt);
> +
> +  expand_insn (icode, i, ops);
> +}
> +
>  /* Helper for expand_DIVMOD.  Return true if the sequence starting with
>     INSN contains any call insns or insns with {,U}{DIV,MOD} rtxes.  */
>
> @@ -4058,6 +4122,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_gather_load_optab_supported_p convert_optab_supported_p
> +#define direct_strided_load_optab_supported_p direct_optab_supported_p
>  #define direct_len_load_optab_supported_p direct_optab_supported_p
>  #define direct_mask_len_load_optab_supported_p convert_optab_supported_p
>  #define direct_mask_store_optab_supported_p convert_optab_supported_p
> @@ -4066,6 +4131,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
>  #define direct_vec_cond_optab_supported_p convert_optab_supported_p
>  #define direct_scatter_store_optab_supported_p convert_optab_supported_p
> +#define direct_strided_store_optab_supported_p direct_optab_supported_p
>  #define direct_len_store_optab_supported_p direct_optab_supported_p
>  #define direct_mask_len_store_optab_supported_p convert_optab_supported_p
>  #define direct_while_optab_supported_p convert_optab_supported_p
> @@ -4723,6 +4789,8 @@ internal_fn_len_index (internal_fn fn)
>      case IFN_COND_LEN_XOR:
>      case IFN_COND_LEN_SHL:
>      case IFN_COND_LEN_SHR:
> +    case IFN_MASK_LEN_STRIDED_LOAD:
> +    case IFN_MASK_LEN_STRIDED_STORE:
>        return 4;
>
>      case IFN_COND_LEN_NEG:
> @@ -4817,6 +4885,10 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_MASK_LEN_STORE:
>        return 2;
>
> +    case IFN_MASK_LEN_STRIDED_LOAD:
> +    case IFN_MASK_LEN_STRIDED_STORE:
> +      return 3;
> +
>      case IFN_MASK_GATHER_LOAD:
>      case IFN_MASK_SCATTER_STORE:
>      case IFN_MASK_LEN_GATHER_LOAD:
> @@ -4840,6 +4912,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  {
>    switch (fn)
>      {
> +    case IFN_MASK_LEN_STRIDED_STORE:
> +      return 2;
> +
>      case IFN_MASK_STORE:
>      case IFN_MASK_STORE_LANES:
>      case IFN_SCATTER_STORE:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 25badbb86e5..b30a7a5b009 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
>     - mask_load_lanes: currently just vec_mask_load_lanes
>     - mask_len_load_lanes: currently just vec_mask_len_load_lanes
>     - gather_load: used for {mask_,mask_len_,}gather_load
> +   - strided_load: currently just mask_len_strided_load
>     - len_load: currently just len_load
>     - mask_len_load: currently just mask_len_load
>
> @@ -64,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
>     - mask_store_lanes: currently just vec_mask_store_lanes
>     - mask_len_store_lanes: currently just vec_mask_len_store_lanes
>     - scatter_store: used for {mask_,mask_len_,}scatter_store
> +   - strided_store: currently just mask_len_strided_store
>     - len_store: currently just len_store
>     - mask_len_store: currently just mask_len_store
>
> @@ -212,6 +214,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
>                        mask_gather_load, gather_load)
>  DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
>                        mask_len_gather_load, gather_load)
> +DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
> +                      mask_len_strided_load, strided_load)
>
>  DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
>  DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
> @@ -221,6 +225,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
>                        mask_scatter_store, scatter_store)
>  DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
>                        mask_len_scatter_store, scatter_store)
> +DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
> +                      mask_len_strided_store, strided_store)
>
>  DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
>  DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 3f2cb46aff8..630b1de8f97 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -539,4 +539,6 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
>  OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
>  OPTAB_D (len_load_optab, "len_load_$a")
>  OPTAB_D (len_store_optab, "len_store_$a")
> +OPTAB_D (mask_len_strided_load_optab, "mask_len_strided_load_$a")
> +OPTAB_D (mask_len_strided_store_optab, "mask_len_strided_store_$a")
>  OPTAB_D (select_vl_optab, "select_vl$a")
> --
> 2.34.1
>


* RE: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store
  2024-06-05  1:18   ` Li, Pan2
@ 2024-06-05  7:50     ` Li, Pan2
  0 siblings, 0 replies; 4+ messages in thread
From: Li, Pan2 @ 2024-06-05  7:50 UTC (permalink / raw)
  To: Richard Biener, Richard Sandiford
  Cc: gcc-patches, juzhe.zhong, kito.cheng, tamar.christina

It is not easy to recover the original context/history; I only caught some traces in the patch below, not the full picture.

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634683.html

Using gather/scatter with a VEC_SERIES seems reasonable to me, for example as below; I will give this a try.

operand_0 = mask_gather_loadmn (ptr, offset, 1/0(sign/unsign), multiply, mask)
  offset = (vec_series:m base step) => base + i * step
  op_0[i] = memory[ptr + offset[i] * multiply] && mask[i]

operand_0 = mask_len_strided_load (ptr, stride, mask, len, bias).
  op_0[i] = memory[ptr + stride * i] && mask[i] && i < (len + bias)
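
To make the comparison concrete, here is a scalar reference model of the two forms in plain C. It is only a sketch under assumptions: NUNITS, the int32_t element type, the function names, and the choice to leave inactive lanes untouched are all made up for illustration and are not taken from the patch or from any target. The stride is taken in bytes, matching the "base + i * stride" address computation above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUNITS 8   /* hypothetical number of vector lanes */

/* Model of a masked gather whose offset vector is
   vec_series (base, step), i.e. offset[i] = base + i * step.
   Inactive lanes are left untouched (an assumption of this model).  */
static void
model_gather_series (int32_t *dest, const char *ptr, ptrdiff_t base,
                     ptrdiff_t step, ptrdiff_t scale, const bool *mask)
{
  for (int i = 0; i < NUNITS; i++)
    {
      ptrdiff_t offset = base + i * step;
      if (mask[i])
        memcpy (&dest[i], ptr + offset * scale, sizeof (int32_t));
    }
}

/* Model of mask_len_strided_load: lane i is loaded from
   ptr + i * stride when mask[i] is set and i < len + bias.  */
static void
model_strided_load (int32_t *dest, const char *ptr, ptrdiff_t stride,
                    const bool *mask, int len, int bias)
{
  assert (len + bias >= 0 && len + bias <= NUNITS);
  for (int i = 0; i < NUNITS; i++)
    if (mask[i] && i < len + bias)
      memcpy (&dest[i], ptr + i * stride, sizeof (int32_t));
}
```

In this model the two agree whenever the gather uses base 0, step * scale equal to the stride, and a mask that already has the "i < len + bias" condition folded in, which is the equivalence the question about VEC_SERIES relies on.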

Pan

-----Original Message-----
From: Li, Pan2 
Sent: Wednesday, June 5, 2024 9:18 AM
To: Richard Biener <richard.guenther@gmail.com>; Richard Sandiford <richard.sandiford@arm.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com
Subject: RE: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

> Sorry if we have discussed this last year already - is there anything wrong
> with using a gather/scatter with a VEC_SERIES gimple/rtl def for the offset?

Thanks for the comments, it has been quite a while since the last discussion. Let me recall a little about it and keep you posted.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Tuesday, June 4, 2024 9:22 PM
To: Li, Pan2 <pan2.li@intel.com>; Richard Sandiford <richard.sandiford@arm.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com
Subject: Re: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

On Tue, May 28, 2024 at 5:15 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> This patch would like to add new internal functions for the below 2 IFNs.
> * mask_len_strided_load
> * mask_len_strided_store
>
> The GIMPLE v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) will
> be expanded into v = mask_len_strided_load (ptr, stride, mask, len, bias).
>
> The GIMPLE MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias) will
> be expanded into mask_len_strided_store (ptr, stride, v, mask, len, bias).
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The riscv full regression test.

Sorry if we have discussed this last year already - is there anything wrong
with using a gather/scatter with a VEC_SERIES gimple/rtl def for the offset?

Richard.



end of thread, other threads:[~2024-06-05  7:50 UTC | newest]

Thread overview: 4+ messages
2024-05-28  3:14 [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store pan2.li
2024-06-04 13:22 ` Richard Biener
2024-06-05  1:18   ` Li, Pan2
2024-06-05  7:50     ` Li, Pan2
