public inbox for gcc-patches@gcc.gnu.org
* [PATCH] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN
@ 2023-10-30 10:39 Juzhe-Zhong
  2023-10-31  8:45 ` Robin Dapp
  0 siblings, 1 reply; 3+ messages in thread
From: Juzhe-Zhong @ 2023-10-30 10:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, rguenther, jeffreyalaw, Juzhe-Zhong

As Richard suggested previously, we should support strided load/store in
the loop vectorizer instead of hacking the RISC-V backend.

This patch adds the MASK_LEN_STRIDED_LOAD and MASK_LEN_STRIDED_STORE
optabs and internal functions (IFNs).

The GIMPLE IR is:

v = mask_len_strided_load (ptr, stride, mask, len, bias)
mask_len_strided_store (ptr, stride, v, mask, len, bias)

This patch is a prerequisite for the follow-up loop vectorizer patch.
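
As a rough illustration of the semantics of the two calls above, here is a
scalar sketch in C.  It is not GCC code: the helper names
model_strided_load and model_strided_store, the int32_t element type, the
byte-granular stride and the one-byte-per-lane mask are all assumptions
made for this example only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of
     v = mask_len_strided_load (ptr, stride, mask, len, bias)
   Assumes STRIDE is in bytes and MASK holds one byte per lane.
   Lanes that are masked off or at index >= LEN + BIAS are left
   untouched here; the real result is undefined for them.  */
static void
model_strided_load (int32_t *v, const void *ptr, ptrdiff_t stride,
		    const unsigned char *mask, ptrdiff_t len, ptrdiff_t bias)
{
  const char *base = (const char *) ptr;
  for (ptrdiff_t i = 0; i < len + bias; i++)
    if (mask[i])
      memcpy (&v[i], base + i * stride, sizeof (int32_t));
}

/* Hypothetical scalar model of
     mask_len_strided_store (ptr, stride, v, mask, len, bias)
   Same assumptions as above; inactive lanes do not touch memory.  */
static void
model_strided_store (void *ptr, ptrdiff_t stride, const int32_t *v,
		     const unsigned char *mask, ptrdiff_t len, ptrdiff_t bias)
{
  char *base = (char *) ptr;
  for (ptrdiff_t i = 0; i < len + bias; i++)
    if (mask[i])
      memcpy (base + i * stride, &v[i], sizeof (int32_t));
}
```

In this model a lane takes part only if its index is below len + bias and
its mask entry is nonzero, which mirrors how the mask_len_* patterns are
described in md.texi.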

gcc/ChangeLog:

	* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
	* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
	(expand_gather_load_optab_fn): Ditto.
	(internal_load_fn_p): Ditto.
	(internal_strided_fn_p): Ditto.
	(internal_fn_len_index): Ditto.
	(internal_fn_mask_index): Ditto.
	(internal_fn_stored_value_index): Ditto.
	* internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
	(MASK_LEN_STRIDED_STORE): Ditto.
	* internal-fn.h (internal_strided_fn_p): Ditto.
	* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi     | 23 +++++++++++++++++++++
 gcc/internal-fn.cc  | 49 +++++++++++++++++++++++++++++++++++++--------
 gcc/internal-fn.def |  4 ++++
 gcc/internal-fn.h   |  1 +
 gcc/optabs.def      |  2 ++
 5 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fab2513105a..f27148c3a3c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,18 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
 be loaded from memory and clear if element @var{i} of the result should be undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}@var{n}}
+Load several separate memory locations into a vector of mode m.
+Operand 1 is a scalar base address and operand 2 is mode @var{n}
+specifying each uniform stride between consecutive element.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is
+bias operand.  Similar to mask_len_load, the instruction loads at most
+(operand 4 + operand 5) elements from memory.  Bit @var{i} of the mask is set
+if element @var{i} of the result should be loaded from memory and clear if
+element @var{i} of the result should be undefined.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5143,17 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_store@var{m}@var{n}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}@var{n}}
+Store a vector of mode @var{m} into several distinct memory locations.
+Operand 0 is a scalar base address, operand 2 is the vector to be stored,
+and operand 1 is mode @var{n} specifying each uniform stride between consecutive element.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is
+bias operand.  Similar to mask_len_store, the instruction stores at most
+(operand 4 + operand 5) elements to memory.  Bit @var{i} of the mask is set
+if element @var{i} of the result should be storeed.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index e7451b96353..5c1a6015de4 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3570,20 +3570,23 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   int rhs_index = internal_fn_stored_value_index (ifn);
   tree base = gimple_call_arg (stmt, 0);
   tree offset = gimple_call_arg (stmt, 1);
-  tree scale = gimple_call_arg (stmt, 2);
   tree rhs = gimple_call_arg (stmt, rhs_index);
 
   rtx base_rtx = expand_normal (base);
   rtx offset_rtx = expand_normal (offset);
-  HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
   class expand_operand ops[8];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
-  create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
-  create_integer_operand (&ops[i++], scale_int);
+  if (!internal_strided_fn_p (ifn))
+    {
+      create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
+      tree scale = gimple_call_arg (stmt, 2);
+      HOST_WIDE_INT scale_int = tree_to_shwi (scale);
+      create_integer_operand (&ops[i++], scale_int);
+    }
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
   i = add_mask_and_len_args (ops, i, stmt);
 
@@ -3597,23 +3600,27 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
 static void
 expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
 {
+  internal_fn ifn = gimple_call_internal_fn (stmt);
   tree lhs = gimple_call_lhs (stmt);
   tree base = gimple_call_arg (stmt, 0);
   tree offset = gimple_call_arg (stmt, 1);
-  tree scale = gimple_call_arg (stmt, 2);
 
   rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   rtx base_rtx = expand_normal (base);
   rtx offset_rtx = expand_normal (offset);
-  HOST_WIDE_INT scale_int = tree_to_shwi (scale);
 
   int i = 0;
   class expand_operand ops[8];
   create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
-  create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
-  create_integer_operand (&ops[i++], scale_int);
+  if (!internal_strided_fn_p (ifn))
+    {
+      create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
+      tree scale = gimple_call_arg (stmt, 2);
+      HOST_WIDE_INT scale_int = tree_to_shwi (scale);
+      create_integer_operand (&ops[i++], scale_int);
+    }
   i = add_mask_and_len_args (ops, i, stmt);
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)),
 					   TYPE_MODE (TREE_TYPE (offset)));
@@ -4596,6 +4603,7 @@ internal_load_fn_p (internal_fn fn)
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_LEN_GATHER_LOAD:
+    case IFN_MASK_LEN_STRIDED_LOAD:
     case IFN_LEN_LOAD:
     case IFN_MASK_LEN_LOAD:
       return true;
@@ -4648,6 +4656,22 @@ internal_gather_scatter_fn_p (internal_fn fn)
     }
 }
 
+/* Return true if IFN is some form of strided load or strided store.  */
+
+bool
+internal_strided_fn_p (internal_fn fn)
+{
+  switch (fn)
+    {
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
 /* If FN takes a vector len argument, return the index of that argument,
    otherwise return -1.  */
 
@@ -4683,6 +4707,8 @@ internal_fn_len_index (internal_fn fn)
     case IFN_COND_LEN_XOR:
     case IFN_COND_LEN_SHL:
     case IFN_COND_LEN_SHR:
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
       return 4;
 
     case IFN_COND_LEN_NEG:
@@ -4715,6 +4741,10 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
       return 2;
 
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 3;
+
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_SCATTER_STORE:
     case IFN_MASK_LEN_GATHER_LOAD:
@@ -4735,6 +4765,9 @@ internal_fn_stored_value_index (internal_fn fn)
 {
   switch (fn)
     {
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return 2;
+
     case IFN_MASK_STORE:
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a2023ab9c3d..0fa532e8f6b 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -199,6 +199,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
 		       mask_gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
 		       mask_len_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
+		       mask_len_strided_load, gather_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
@@ -208,6 +210,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
 		       mask_scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
 		       mask_len_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
+		       mask_len_strided_store, scatter_store)
 
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99de13a0199..d25925b9a10 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -235,6 +235,7 @@ extern bool can_interpret_as_conditional_op_p (gimple *, tree *,
 extern bool internal_load_fn_p (internal_fn);
 extern bool internal_store_fn_p (internal_fn);
 extern bool internal_gather_scatter_fn_p (internal_fn);
+extern bool internal_strided_fn_p (internal_fn);
 extern int internal_fn_mask_index (internal_fn);
 extern int internal_fn_len_index (internal_fn);
 extern int internal_fn_stored_value_index (internal_fn);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..3d85ac5f678 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -98,9 +98,11 @@ OPTAB_CD(mask_len_store_optab, "mask_len_store$a$b")
 OPTAB_CD(gather_load_optab, "gather_load$a$b")
 OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
 OPTAB_CD(mask_len_gather_load_optab, "mask_len_gather_load$a$b")
+OPTAB_CD(mask_len_strided_load_optab, "mask_len_strided_load$a$b")
 OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
 OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
 OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
+OPTAB_CD(mask_len_strided_store_optab, "mask_len_strided_store$a$b")
 OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
 OPTAB_CD(vec_init_optab, "vec_init$a$b")
 
-- 
2.36.3



* Re: [PATCH] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN
  2023-10-30 10:39 [PATCH] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN Juzhe-Zhong
@ 2023-10-31  8:45 ` Robin Dapp
  2023-10-31  9:55   ` juzhe.zhong
  0 siblings, 1 reply; 3+ messages in thread
From: Robin Dapp @ 2023-10-31  8:45 UTC (permalink / raw)
  To: Juzhe-Zhong, gcc-patches
  Cc: rdapp.gcc, richard.sandiford, rguenther, jeffreyalaw

Hi Juzhe,

> +@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}@var{n}}
> +Load several separate memory locations into a vector of mode m.
> +Operand 1 is a scalar base address and operand 2 is mode @var{n}
> +specifying each uniform stride between consecutive element.

How about:

"into a destination vector of mode @var{m} (operand 0).  Operand 1
is a scalar base address.  Operand 2 is a scalar stride of mode @var{n}
such that element @var{i} of the destination is loaded from
(operand 1) + @var{i} * (operand 2).  The instruction can be seen
as a special case of @code{mask_len_gather_load@var{m}@var{n}} with
an offset vector that is a @code{vec_series} with (operand 1) as base
and (operand 2) as step."
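
The "special case of a gather" reading can be sketched in scalar C as
follows.  This is only an illustration, not GCC code: the names
gather_load_model, strided_load_model and NLANES are hypothetical, offsets
are assumed to be in bytes, and mask/len/bias handling is left out.  The
sketch passes the base address separately and builds the offset series
starting at 0 with the stride as step, which is one way to read the
vec_series analogy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NLANES 4  /* Hypothetical fixed vector length for the sketch.  */

/* Gather: lane I is loaded from BASE + OFFSETS[I] (offsets in bytes).  */
static void
gather_load_model (int32_t *v, const char *base, const ptrdiff_t *offsets)
{
  for (int i = 0; i < NLANES; i++)
    memcpy (&v[i], base + offsets[i], sizeof (int32_t));
}

/* Strided: lane I is loaded from BASE + I * STRIDE.  Expressed here as a
   gather whose offset vector is the series { 0, STRIDE, 2 * STRIDE, ... }.  */
static void
strided_load_model (int32_t *v, const char *base, ptrdiff_t stride)
{
  ptrdiff_t offsets[NLANES];
  for (int i = 0; i < NLANES; i++)
    offsets[i] = i * stride;
  gather_load_model (v, base, offsets);
}
```

A dedicated strided pattern lets a target take the scalar stride directly
instead of materialising the offset vector first.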

> +operand 3 is mask operand, operand 4 is length operand and operand 5 is
> +bias operand.  

Maybe: Similar to mask_len_load, operand 3 contains the mask, operand 4
the length and operand 5 the bias.  The instruction loads...

> +@cindex @code{mask_len_strided_store@var{m}@var{n}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}@var{n}}
> +Store a vector of mode @var{m} into several distinct memory locations.
> +Operand 0 is a scalar base address, operand 2 is the vector to be stored,
> +and operand 1 is mode @var{n} specifying each uniform stride between consecutive element.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is
> +bias operand.  Similar to mask_len_store, the instruction stores at most
> +(operand 4 + operand 5) elements to memory.  Bit @var{i} of the mask is set
> +if element @var{i} of the result should be storeed.
> +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.

Same here.

Regards
 Robin



* Re: Re: [PATCH] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN
  2023-10-31  8:45 ` Robin Dapp
@ 2023-10-31  9:55   ` juzhe.zhong
  0 siblings, 0 replies; 3+ messages in thread
From: juzhe.zhong @ 2023-10-31  9:55 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches
  Cc: Robin Dapp, richard.sandiford, rguenther, jeffreyalaw


Thanks, Robin.  I will address the comments in V2.



juzhe.zhong@rivai.ai
 
