public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN
@ 2023-11-14  3:39 Juzhe-Zhong
  2023-11-16  7:21 ` juzhe.zhong
       [not found] ` <202311161521015436725@rivai.ai>
  0 siblings, 2 replies; 3+ messages in thread
From: Juzhe-Zhong @ 2023-11-14  3:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, rguenther, Juzhe-Zhong

This patch adds mask_len_strided_load/mask_len_strided_store.

Document already has been reviewed.

This patch adds OPTAB/IFN support as follows:

1. strided load
GIMPLE level:

v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)

be expand (by internal-fn.cc) into:

v = mask_len_strided_load (ptr, stried, mask, len, bias)

2. strided store

GIMPLE leve:

MASK_LEN_STRIED_STORE (ptr, stride, v, mask, len, bias)

be expand (by internal-fn.cc) into:

mask_len_stried_store (ptr, stride, v, mask, len, bias)

Bootstrap and regression on X86 no regression.

Ok for trunk ?
 
gcc/ChangeLog:

	* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
	* internal-fn.cc (strided_load_direct): Ditto.
	(strided_store_direct): Ditto.
	(expand_strided_store_optab_fn): Ditto.
	(expand_strided_load_optab_fn): Ditto.
	(direct_strided_load_optab_supported_p): Ditto.
	(direct_strided_store_optab_supported_p): Ditto.
	(internal_fn_len_index): Ditto.
	(internal_fn_mask_index): Ditto.
	(internal_fn_stored_value_index): Ditto.
	* internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
	(MASK_LEN_STRIDED_STORE): Ditto.
	* optabs.def (OPTAB_D): Ditto.

---
 gcc/doc/md.texi     | 27 +++++++++++++++++++
 gcc/internal-fn.cc  | 63 +++++++++++++++++++++++++++++++++++++++++++++
 gcc/internal-fn.def |  6 +++++
 gcc/optabs.def      |  2 ++
 4 files changed, 98 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5d86152e5dd..5dc76a1183c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
 be loaded from memory and clear if element @var{i} of the result should be undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode @var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand.
+The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
+For each element index i load address is operand 1 + @var{i} * operand 2.
+Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be zero.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
+Operand 2 is the vector of values that should be stored, which is of mode @var{m}.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand.
+The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 1 as step.
+For each element index i store address is operand 0 + @var{i} * operand 1.
+Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of mask (operand 3) to memory.
+Element @var{i} of the mask is set if element @var{i} of (operand 3) should be stored.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5a998e794ad..bfb307684a9 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -164,6 +164,7 @@ init_internal_fns ()
 #define load_lanes_direct { -1, -1, false }
 #define mask_load_lanes_direct { -1, -1, false }
 #define gather_load_direct { 3, 1, false }
+#define strided_load_direct { -1, -1, false }
 #define len_load_direct { -1, -1, false }
 #define mask_len_load_direct { -1, 4, false }
 #define mask_store_direct { 3, 2, false }
@@ -173,6 +174,7 @@ init_internal_fns ()
 #define vec_cond_mask_len_direct { 1, 1, false }
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
+#define strided_store_direct { 1, 1, false }
 #define len_store_direct { 3, 3, false }
 #define mask_len_store_direct { 4, 5, false }
 #define vec_set_direct { 3, 3, false }
@@ -3593,6 +3595,31 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   expand_insn (icode, i, ops);
 }
 
+/* Expand MASK_LEN_STRIDED_STORE call CALL using optab OPTAB.  */
+
+static void
+expand_strided_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
+{
+  internal_fn ifn = gimple_call_internal_fn (stmt);
+  int rhs_index = internal_fn_stored_value_index (ifn);
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+  tree rhs = gimple_call_arg (stmt, rhs_index);
+
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+  rtx rhs_rtx = expand_normal (rhs);
+
+  class expand_operand ops[6];
+  int i = 0;
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+  create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
+  i = add_mask_and_len_args (ops, i, stmt);
+  insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)));
+  expand_insn (icode, i, ops);
+}
+
 /* Expand {MASK_,}GATHER_LOAD call CALL using optab OPTAB.  */
 
 static void
@@ -3623,6 +3650,31 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
     emit_move_insn (lhs_rtx, ops[0].value);
 }
 
+/* Expand MASK_LEN_STRIDED_LOAD call CALL using optab OPTAB.  */
+
+static void
+expand_strided_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+
+  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+
+  int i = 0;
+  class expand_operand ops[6];
+  create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+  i = add_mask_and_len_args (ops, i, stmt);
+  insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)));
+  expand_insn (icode, i, ops);
+  if (!rtx_equal_p (lhs_rtx, ops[0].value))
+    emit_move_insn (lhs_rtx, ops[0].value);
+}
+
 /* Helper for expand_DIVMOD.  Return true if the sequence starting with
    INSN contains any call insns or insns with {,U}{DIV,MOD} rtxes.  */
 
@@ -4013,6 +4065,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_gather_load_optab_supported_p convert_optab_supported_p
+#define direct_strided_load_optab_supported_p direct_optab_supported_p
 #define direct_len_load_optab_supported_p direct_optab_supported_p
 #define direct_mask_len_load_optab_supported_p convert_optab_supported_p
 #define direct_mask_store_optab_supported_p convert_optab_supported_p
@@ -4021,6 +4074,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
 #define direct_vec_cond_optab_supported_p convert_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
+#define direct_strided_store_optab_supported_p direct_optab_supported_p
 #define direct_len_store_optab_supported_p direct_optab_supported_p
 #define direct_mask_len_store_optab_supported_p convert_optab_supported_p
 #define direct_while_optab_supported_p convert_optab_supported_p
@@ -4684,6 +4738,8 @@ internal_fn_len_index (internal_fn fn)
     case IFN_COND_LEN_XOR:
     case IFN_COND_LEN_SHL:
     case IFN_COND_LEN_SHR:
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
       return 4;
 
     case IFN_COND_LEN_NEG:
@@ -4778,6 +4834,10 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
       return 2;
 
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 3;
+
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_SCATTER_STORE:
     case IFN_MASK_LEN_GATHER_LOAD:
@@ -4801,6 +4861,9 @@ internal_fn_stored_value_index (internal_fn fn)
 {
   switch (fn)
     {
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 2;
+
     case IFN_MASK_STORE:
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 319e0ab26cd..8a380df20f1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
    - mask_load_lanes: currently just vec_mask_load_lanes
    - mask_len_load_lanes: currently just vec_mask_len_load_lanes
    - gather_load: used for {mask_,mask_len_,}gather_load
+   - strided_load: currently just mask_len_strided_load
    - len_load: currently just len_load
    - mask_len_load: currently just mask_len_load
 
@@ -60,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
    - mask_store_lanes: currently just vec_mask_store_lanes
    - mask_len_store_lanes: currently just vec_mask_len_store_lanes
    - scatter_store: used for {mask_,mask_len_,}scatter_store
+   - strided_store: currently just mask_len_strided_store
    - len_store: currently just len_store
    - mask_len_store: currently just mask_len_store
 
@@ -199,6 +201,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
 		       mask_gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
 		       mask_len_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
+		       mask_len_strided_load, strided_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
@@ -208,6 +212,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
 		       mask_scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
 		       mask_len_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
+		       mask_len_strided_store, strided_store)
 
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 07c06ba8cbb..fa7a51b4906 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -539,4 +539,6 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (mask_len_strided_load_optab, "mask_len_strided_load_$a")
+OPTAB_D (mask_len_strided_store_optab, "mask_len_strided_store_$a")
 OPTAB_D (select_vl_optab, "select_vl$a")
-- 
2.36.3


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN
  2023-11-14  3:39 [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN Juzhe-Zhong
@ 2023-11-16  7:21 ` juzhe.zhong
       [not found] ` <202311161521015436725@rivai.ai>
  1 sibling, 0 replies; 3+ messages in thread
From: juzhe.zhong @ 2023-11-16  7:21 UTC (permalink / raw)
  To: 钟居哲, gcc-patches; +Cc: richard.sandiford, rguenther

[-- Attachment #1: Type: text/plain, Size: 12433 bytes --]

Update just finished test CI.

Tested on aarch64 QEMU no regression.



juzhe.zhong@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-11-14 11:39
To: gcc-patches
CC: richard.sandiford; rguenther; Juzhe-Zhong
Subject: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN
This patch adds mask_len_strided_load/mask_len_strided_store.
 
Document already has been reviewed.
 
This patch adds OPTAB/IFN support as follows:
 
1. strided load
GIMPLE level:
 
v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)
 
be expand (by internal-fn.cc) into:
 
v = mask_len_strided_load (ptr, stried, mask, len, bias)
 
2. strided store
 
GIMPLE leve:
 
MASK_LEN_STRIED_STORE (ptr, stride, v, mask, len, bias)
 
be expand (by internal-fn.cc) into:
 
mask_len_stried_store (ptr, stride, v, mask, len, bias)
 
Bootstrap and regression on X86 no regression.
 
Ok for trunk ?
gcc/ChangeLog:
 
* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
* internal-fn.cc (strided_load_direct): Ditto.
(strided_store_direct): Ditto.
(expand_strided_store_optab_fn): Ditto.
(expand_strided_load_optab_fn): Ditto.
(direct_strided_load_optab_supported_p): Ditto.
(direct_strided_store_optab_supported_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
(MASK_LEN_STRIDED_STORE): Ditto.
* optabs.def (OPTAB_D): Ditto.
 
---
gcc/doc/md.texi     | 27 +++++++++++++++++++
gcc/internal-fn.cc  | 63 +++++++++++++++++++++++++++++++++++++++++++++
gcc/internal-fn.def |  6 +++++
gcc/optabs.def      |  2 ++
4 files changed, 98 insertions(+)
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5d86152e5dd..5dc76a1183c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
be loaded from memory and clear if element @var{i} of the result should be undefined.
Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode @var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand.
+The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
+For each element index i load address is operand 1 + @var{i} * operand 2.
+Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be zero.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
@cindex @code{scatter_store@var{m}@var{n}} instruction pattern
@item @samp{scatter_store@var{m}@var{n}}
Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
+Operand 2 is the vector of values that should be stored, which is of mode @var{m}.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand.
+The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 1 as step.
+For each element index i store address is operand 0 + @var{i} * operand 1.
+Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of mask (operand 3) to memory.
+Element @var{i} of the mask is set if element @var{i} of (operand 3) should be stored.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
@cindex @code{vec_set@var{m}} instruction pattern
@item @samp{vec_set@var{m}}
Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5a998e794ad..bfb307684a9 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -164,6 +164,7 @@ init_internal_fns ()
#define load_lanes_direct { -1, -1, false }
#define mask_load_lanes_direct { -1, -1, false }
#define gather_load_direct { 3, 1, false }
+#define strided_load_direct { -1, -1, false }
#define len_load_direct { -1, -1, false }
#define mask_len_load_direct { -1, 4, false }
#define mask_store_direct { 3, 2, false }
@@ -173,6 +174,7 @@ init_internal_fns ()
#define vec_cond_mask_len_direct { 1, 1, false }
#define vec_cond_direct { 2, 0, false }
#define scatter_store_direct { 3, 1, false }
+#define strided_store_direct { 1, 1, false }
#define len_store_direct { 3, 3, false }
#define mask_len_store_direct { 4, 5, false }
#define vec_set_direct { 3, 3, false }
@@ -3593,6 +3595,31 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   expand_insn (icode, i, ops);
}
+/* Expand MASK_LEN_STRIDED_STORE call CALL using optab OPTAB.  */
+
+static void
+expand_strided_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
+{
+  internal_fn ifn = gimple_call_internal_fn (stmt);
+  int rhs_index = internal_fn_stored_value_index (ifn);
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+  tree rhs = gimple_call_arg (stmt, rhs_index);
+
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+  rtx rhs_rtx = expand_normal (rhs);
+
+  class expand_operand ops[6];
+  int i = 0;
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+  create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
+  i = add_mask_and_len_args (ops, i, stmt);
+  insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)));
+  expand_insn (icode, i, ops);
+}
+
/* Expand {MASK_,}GATHER_LOAD call CALL using optab OPTAB.  */
static void
@@ -3623,6 +3650,31 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
     emit_move_insn (lhs_rtx, ops[0].value);
}
+/* Expand MASK_LEN_STRIDED_LOAD call CALL using optab OPTAB.  */
+
+static void
+expand_strided_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+
+  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+
+  int i = 0;
+  class expand_operand ops[6];
+  create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+  i = add_mask_and_len_args (ops, i, stmt);
+  insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)));
+  expand_insn (icode, i, ops);
+  if (!rtx_equal_p (lhs_rtx, ops[0].value))
+    emit_move_insn (lhs_rtx, ops[0].value);
+}
+
/* Helper for expand_DIVMOD.  Return true if the sequence starting with
    INSN contains any call insns or insns with {,U}{DIV,MOD} rtxes.  */
@@ -4013,6 +4065,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
#define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
#define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
#define direct_gather_load_optab_supported_p convert_optab_supported_p
+#define direct_strided_load_optab_supported_p direct_optab_supported_p
#define direct_len_load_optab_supported_p direct_optab_supported_p
#define direct_mask_len_load_optab_supported_p convert_optab_supported_p
#define direct_mask_store_optab_supported_p convert_optab_supported_p
@@ -4021,6 +4074,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
#define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
#define direct_vec_cond_optab_supported_p convert_optab_supported_p
#define direct_scatter_store_optab_supported_p convert_optab_supported_p
+#define direct_strided_store_optab_supported_p direct_optab_supported_p
#define direct_len_store_optab_supported_p direct_optab_supported_p
#define direct_mask_len_store_optab_supported_p convert_optab_supported_p
#define direct_while_optab_supported_p convert_optab_supported_p
@@ -4684,6 +4738,8 @@ internal_fn_len_index (internal_fn fn)
     case IFN_COND_LEN_XOR:
     case IFN_COND_LEN_SHL:
     case IFN_COND_LEN_SHR:
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
       return 4;
     case IFN_COND_LEN_NEG:
@@ -4778,6 +4834,10 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
       return 2;
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 3;
+
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_SCATTER_STORE:
     case IFN_MASK_LEN_GATHER_LOAD:
@@ -4801,6 +4861,9 @@ internal_fn_stored_value_index (internal_fn fn)
{
   switch (fn)
     {
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 2;
+
     case IFN_MASK_STORE:
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 319e0ab26cd..8a380df20f1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
    - mask_load_lanes: currently just vec_mask_load_lanes
    - mask_len_load_lanes: currently just vec_mask_len_load_lanes
    - gather_load: used for {mask_,mask_len_,}gather_load
+   - strided_load: currently just mask_len_strided_load
    - len_load: currently just len_load
    - mask_len_load: currently just mask_len_load
@@ -60,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
    - mask_store_lanes: currently just vec_mask_store_lanes
    - mask_len_store_lanes: currently just vec_mask_len_store_lanes
    - scatter_store: used for {mask_,mask_len_,}scatter_store
+   - strided_store: currently just mask_len_strided_store
    - len_store: currently just len_store
    - mask_len_store: currently just mask_len_store
@@ -199,6 +201,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
       mask_gather_load, gather_load)
DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
       mask_len_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
+        mask_len_strided_load, strided_load)
DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
@@ -208,6 +212,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
       mask_scatter_store, scatter_store)
DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
       mask_len_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
+        mask_len_strided_store, strided_store)
DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 07c06ba8cbb..fa7a51b4906 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -539,4 +539,6 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
OPTAB_D (len_load_optab, "len_load_$a")
OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (mask_len_strided_load_optab, "mask_len_strided_load_$a")
+OPTAB_D (mask_len_strided_store_optab, "mask_len_strided_store_$a")
OPTAB_D (select_vl_optab, "select_vl$a")
-- 
2.36.3
 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* 回复: Re: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN
       [not found] ` <202311161521015436725@rivai.ai>
@ 2023-11-20  8:46   ` juzhe.zhong
  0 siblings, 0 replies; 3+ messages in thread
From: juzhe.zhong @ 2023-11-20  8:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, rguenther

[-- Attachment #1: Type: text/plain, Size: 12835 bytes --]

Hi, Richi.

strided load/store has been posted for a while.
Can this feature be available on GCC-14 ?
Or postpone it to GCC-15 ? 

Thanks.



juzhe.zhong@rivai.ai
 
发件人: juzhe.zhong@rivai.ai
发送时间: 2023-11-16 15:21
收件人: 钟居哲; gcc-patches
抄送: richard.sandiford; rguenther
主题: Re: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN
Update just finished test CI.

Tested on aarch64 QEMU no regression.



juzhe.zhong@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-11-14 11:39
To: gcc-patches
CC: richard.sandiford; rguenther; Juzhe-Zhong
Subject: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN
This patch adds mask_len_strided_load/mask_len_strided_store.
 
Document already has been reviewed.
 
This patch adds OPTAB/IFN support as follows:
 
1. strided load
GIMPLE level:
 
v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)
 
be expand (by internal-fn.cc) into:
 
v = mask_len_strided_load (ptr, stried, mask, len, bias)
 
2. strided store
 
GIMPLE leve:
 
MASK_LEN_STRIED_STORE (ptr, stride, v, mask, len, bias)
 
be expand (by internal-fn.cc) into:
 
mask_len_stried_store (ptr, stride, v, mask, len, bias)
 
Bootstrap and regression on X86 no regression.
 
Ok for trunk ?
gcc/ChangeLog:
 
* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
* internal-fn.cc (strided_load_direct): Ditto.
(strided_store_direct): Ditto.
(expand_strided_store_optab_fn): Ditto.
(expand_strided_load_optab_fn): Ditto.
(direct_strided_load_optab_supported_p): Ditto.
(direct_strided_store_optab_supported_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
(MASK_LEN_STRIDED_STORE): Ditto.
* optabs.def (OPTAB_D): Ditto.
 
---
gcc/doc/md.texi     | 27 +++++++++++++++++++
gcc/internal-fn.cc  | 63 +++++++++++++++++++++++++++++++++++++++++++++
gcc/internal-fn.def |  6 +++++
gcc/optabs.def      |  2 ++
4 files changed, 98 insertions(+)
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5d86152e5dd..5dc76a1183c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
be loaded from memory and clear if element @var{i} of the result should be undefined.
Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode @var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand.
+The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
+For each element index i load address is operand 1 + @var{i} * operand 2.
+Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be zero.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
@cindex @code{scatter_store@var{m}@var{n}} instruction pattern
@item @samp{scatter_store@var{m}@var{n}}
Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
+Operand 2 is the vector of values that should be stored, which is of mode @var{m}.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand.
+The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 1 as step.
+For each element index i store address is operand 0 + @var{i} * operand 1.
+Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of mask (operand 3) to memory.
+Element @var{i} of the mask is set if element @var{i} of (operand 3) should be stored.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
@cindex @code{vec_set@var{m}} instruction pattern
@item @samp{vec_set@var{m}}
Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5a998e794ad..bfb307684a9 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -164,6 +164,7 @@ init_internal_fns ()
#define load_lanes_direct { -1, -1, false }
#define mask_load_lanes_direct { -1, -1, false }
#define gather_load_direct { 3, 1, false }
+#define strided_load_direct { -1, -1, false }
#define len_load_direct { -1, -1, false }
#define mask_len_load_direct { -1, 4, false }
#define mask_store_direct { 3, 2, false }
@@ -173,6 +174,7 @@ init_internal_fns ()
#define vec_cond_mask_len_direct { 1, 1, false }
#define vec_cond_direct { 2, 0, false }
#define scatter_store_direct { 3, 1, false }
+#define strided_store_direct { 1, 1, false }
#define len_store_direct { 3, 3, false }
#define mask_len_store_direct { 4, 5, false }
#define vec_set_direct { 3, 3, false }
@@ -3593,6 +3595,31 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   expand_insn (icode, i, ops);
}
+/* Expand MASK_LEN_STRIDED_STORE call CALL using optab OPTAB.  */
+
+static void
+expand_strided_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
+{
+  internal_fn ifn = gimple_call_internal_fn (stmt);
+  int rhs_index = internal_fn_stored_value_index (ifn);
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+  tree rhs = gimple_call_arg (stmt, rhs_index);
+
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+  rtx rhs_rtx = expand_normal (rhs);
+
+  class expand_operand ops[6];
+  int i = 0;
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+  create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
+  i = add_mask_and_len_args (ops, i, stmt);
+  insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)));
+  expand_insn (icode, i, ops);
+}
+
/* Expand {MASK_,}GATHER_LOAD call CALL using optab OPTAB.  */
static void
@@ -3623,6 +3650,31 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
     emit_move_insn (lhs_rtx, ops[0].value);
}
+/* Expand MASK_LEN_STRIDED_LOAD call CALL using optab OPTAB.  */
+
+static void
+expand_strided_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+
+  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+
+  int i = 0;
+  class expand_operand ops[6];
+  create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+  i = add_mask_and_len_args (ops, i, stmt);
+  insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)));
+  expand_insn (icode, i, ops);
+  if (!rtx_equal_p (lhs_rtx, ops[0].value))
+    emit_move_insn (lhs_rtx, ops[0].value);
+}
+
/* Helper for expand_DIVMOD.  Return true if the sequence starting with
    INSN contains any call insns or insns with {,U}{DIV,MOD} rtxes.  */
@@ -4013,6 +4065,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
#define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
#define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
#define direct_gather_load_optab_supported_p convert_optab_supported_p
+#define direct_strided_load_optab_supported_p direct_optab_supported_p
#define direct_len_load_optab_supported_p direct_optab_supported_p
#define direct_mask_len_load_optab_supported_p convert_optab_supported_p
#define direct_mask_store_optab_supported_p convert_optab_supported_p
@@ -4021,6 +4074,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
#define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
#define direct_vec_cond_optab_supported_p convert_optab_supported_p
#define direct_scatter_store_optab_supported_p convert_optab_supported_p
+#define direct_strided_store_optab_supported_p direct_optab_supported_p
#define direct_len_store_optab_supported_p direct_optab_supported_p
#define direct_mask_len_store_optab_supported_p convert_optab_supported_p
#define direct_while_optab_supported_p convert_optab_supported_p
@@ -4684,6 +4738,8 @@ internal_fn_len_index (internal_fn fn)
     case IFN_COND_LEN_XOR:
     case IFN_COND_LEN_SHL:
     case IFN_COND_LEN_SHR:
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
       return 4;
     case IFN_COND_LEN_NEG:
@@ -4778,6 +4834,10 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
       return 2;
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 3;
+
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_SCATTER_STORE:
     case IFN_MASK_LEN_GATHER_LOAD:
@@ -4801,6 +4861,9 @@ internal_fn_stored_value_index (internal_fn fn)
{
   switch (fn)
     {
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 2;
+
     case IFN_MASK_STORE:
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 319e0ab26cd..8a380df20f1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
    - mask_load_lanes: currently just vec_mask_load_lanes
    - mask_len_load_lanes: currently just vec_mask_len_load_lanes
    - gather_load: used for {mask_,mask_len_,}gather_load
+   - strided_load: currently just mask_len_strided_load
    - len_load: currently just len_load
    - mask_len_load: currently just mask_len_load
@@ -60,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
    - mask_store_lanes: currently just vec_mask_store_lanes
    - mask_len_store_lanes: currently just vec_mask_len_store_lanes
    - scatter_store: used for {mask_,mask_len_,}scatter_store
+   - strided_store: currently just mask_len_strided_store
    - len_store: currently just len_store
    - mask_len_store: currently just mask_len_store
@@ -199,6 +201,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
       mask_gather_load, gather_load)
DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
       mask_len_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
+        mask_len_strided_load, strided_load)
DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
@@ -208,6 +212,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
       mask_scatter_store, scatter_store)
DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
       mask_len_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
+        mask_len_strided_store, strided_store)
DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 07c06ba8cbb..fa7a51b4906 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -539,4 +539,6 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
OPTAB_D (len_load_optab, "len_load_$a")
OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (mask_len_strided_load_optab, "mask_len_strided_load_$a")
+OPTAB_D (mask_len_strided_store_optab, "mask_len_strided_store_$a")
OPTAB_D (select_vl_optab, "select_vl$a")
-- 
2.36.3
 

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-11-20  8:46 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-14  3:39 [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN Juzhe-Zhong
2023-11-16  7:21 ` juzhe.zhong
     [not found] ` <202311161521015436725@rivai.ai>
2023-11-20  8:46   ` 回复: " juzhe.zhong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).