public inbox for gcc-patches@gcc.gnu.org
* [PATCH V7] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern
@ 2023-07-03 11:09 juzhe.zhong
  2023-07-03 13:49 ` Richard Sandiford
  0 siblings, 1 reply; 3+ messages in thread
From: juzhe.zhong @ 2023-07-03 11:09 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, rguenther, Ju-Zhe Zhong

From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

Hi, Richi and Richard.

Based on the review comments from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html

I changed the len_mask_gather_load/len_mask_scatter_store operand order to:
{len,bias,mask}

The len and mask operands are now added with add_len_and_mask_args,
the same way as for partial_load/partial_store.

The code is now more reasonable and easier to maintain.
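
To make the {len,bias,mask} order concrete, here is a small stand-alone C
sketch of the call-argument layout for the scatter store. It is not GCC code.
The slots for base, offset, scale, the stored value, len and mask follow the
indices used in the patch below; the bias slot (5) is an assumption inferred
from the order above, and scatter_store_args, arg_index and check_layout are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Call-argument slots, following the patched helpers in internal-fn.cc.  */
enum
{
  STORED_VALUE_INDEX = 3, /* internal_fn_stored_value_index  */
  LEN_INDEX = 4,          /* internal_fn_len_index  */
  BIAS_INDEX = 5,         /* assumption: bias directly follows len  */
  MASK_INDEX = 6          /* internal_fn_mask_index  */
};

/* Hypothetical table naming each argument slot of the call.  */
static const char *const scatter_store_args[] =
  { "base", "offset", "scale", "value", "len", "bias", "mask" };

/* Return the slot of NAME in the table, or -1 if it is not present.  */
static int
arg_index (const char *name)
{
  int n = (int) (sizeof scatter_store_args / sizeof scatter_store_args[0]);
  for (int i = 0; i < n; ++i)
    if (strcmp (scatter_store_args[i], name) == 0)
      return i;
  return -1;
}

/* Check the table against the indices listed above.  */
static void
check_layout (void)
{
  assert (arg_index ("value") == STORED_VALUE_INDEX);
  assert (arg_index ("len") == LEN_INDEX);
  assert (arg_index ("bias") == BIAS_INDEX);
  assert (arg_index ("mask") == MASK_INDEX);
}
```

Keeping len, bias and mask as a trailing group is what lets one helper append
them for every len/mask variant.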

This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider the following case:

#include <stdint.h>
void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}

We hope RVV can vectorize such a case into the following IR:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, bias, control_mask)
LEN_MASK_SCATTER_STORE (... v, ..., loop_len, bias, control_mask)
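
As a rough scalar model of what this IR is meant to compute (a sketch under
stated assumptions, not the vectorizer's output): lane i is active only if
i < len + bias and mask[i] is set. VF, model_gather_load, model_scatter_store
and f_model are hypothetical names, SELECT_VL is approximated by
min (remaining, VF), and the bias is taken to be 0.

```c
#include <assert.h>
#include <stdint.h>

enum { VF = 4 }; /* hypothetical number of lanes per vector  */

/* Model of v = LEN_MASK_GATHER_LOAD (..., len, bias, mask).
   Inactive lanes are left untouched here; the pattern leaves them
   undefined.  */
static void
model_gather_load (uint8_t *v, const uint8_t *base, const long *offset,
                   long scale, long len, long bias, const uint8_t *mask)
{
  assert (len + bias >= 0 && len + bias <= VF);
  for (long i = 0; i < len + bias; ++i)
    if (mask[i])
      v[i] = base[offset[i] * scale];
}

/* Model of the matching scatter store (..., v, len, bias, mask).  */
static void
model_scatter_store (uint8_t *base, const long *offset, long scale,
                     const uint8_t *v, long len, long bias,
                     const uint8_t *mask)
{
  assert (len + bias >= 0 && len + bias <= VF);
  for (long i = 0; i < len + bias; ++i)
    if (mask[i])
      base[offset[i] * scale] = v[i];
}

/* Strip-mined version of f from the example above.  */
static void
f_model (uint8_t *a, const uint8_t *b, int n, int base, int step,
         const int *cond)
{
  for (int i = 0; i < n; i += VF)
    {
      long loop_len = n - i < VF ? n - i : VF; /* stand-in for SELECT_VL  */
      long offset[VF] = { 0 };
      uint8_t mask[VF] = { 0 }, v[VF] = { 0 };
      for (long j = 0; j < loop_len; ++j)
        {
          offset[j] = (long) (i + j) * step + base;
          mask[j] = cond[i + j] != 0;          /* control_mask  */
        }
      model_gather_load (v, b, offset, 1, loop_len, 0, mask);
      model_scatter_store (a, offset, 1, v, loop_len, 0, mask);
    }
}
```

The length handles the loop tail (the last iteration may cover fewer than VF
elements), while the mask handles the if (cond[i]) inside the loop body.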

This patch doesn't use these patterns in the vectorizer yet; it only adds the
patterns and updates the documentation.

I will send a patch that uses these patterns in the vectorizer soon after this
patch is approved.

Ok for trunk?

gcc/ChangeLog:

        * doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store.
        * internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
        (expand_gather_load_optab_fn): Ditto.
        (internal_load_fn_p): Ditto.
        (internal_store_fn_p): Ditto.
        (internal_gather_scatter_fn_p): Ditto.
        (internal_fn_len_index): Ditto.
        (internal_fn_mask_index): Ditto.
        (internal_fn_stored_value_index): Ditto.
        * internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
        (LEN_MASK_SCATTER_STORE): Ditto.
        * optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi     | 17 +++++++++++++++++
 gcc/internal-fn.cc  | 32 +++++++++++++++++---------------
 gcc/internal-fn.def |  8 ++++++--
 gcc/optabs.def      |  2 ++
 4 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5e5482265cd..f14dd32b2dc 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5),
+a bias operand (operand 6) as well as a mask operand (operand 7).  Similar to len_maskload,
+the instruction loads at most (operand 5 + operand 6) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be undefined.
+Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as
 operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be stored to memory.
 
+@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_scatter_store@var{m}@var{n}}
+Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand (operand 5),
+a bias operand (operand 6) as well as a mask operand (operand 7).  The instruction stores
+at most (operand 5 + operand 6) elements of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
+Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c1fcb38b17b..303df102d81 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3507,7 +3507,6 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
 {
   internal_fn ifn = gimple_call_internal_fn (stmt);
   int rhs_index = internal_fn_stored_value_index (ifn);
-  int mask_index = internal_fn_mask_index (ifn);
   tree base = gimple_call_arg (stmt, 0);
   tree offset = gimple_call_arg (stmt, 1);
   tree scale = gimple_call_arg (stmt, 2);
@@ -3518,19 +3517,14 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
-  class expand_operand ops[6];
+  class expand_operand ops[8];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
   create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
   create_integer_operand (&ops[i++], scale_int);
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
-  if (mask_index >= 0)
-    {
-      tree mask = gimple_call_arg (stmt, mask_index);
-      rtx mask_rtx = expand_normal (mask);
-      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
-    }
+  i = add_len_and_mask_args (ops, i, stmt);
 
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)),
 					   TYPE_MODE (TREE_TYPE (offset)));
@@ -3553,18 +3547,13 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
 
   int i = 0;
-  class expand_operand ops[6];
+  class expand_operand ops[8];
   create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
   create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
   create_integer_operand (&ops[i++], scale_int);
-  if (optab == mask_gather_load_optab)
-    {
-      tree mask = gimple_call_arg (stmt, 4);
-      rtx mask_rtx = expand_normal (mask);
-      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
-    }
+  i = add_len_and_mask_args (ops, i, stmt);
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)),
 					   TYPE_MODE (TREE_TYPE (offset)));
   expand_insn (icode, i, ops);
@@ -4415,6 +4404,7 @@ internal_load_fn_p (internal_fn fn)
     case IFN_MASK_LOAD_LANES:
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_GATHER_LOAD:
     case IFN_LEN_LOAD:
     case IFN_LEN_MASK_LOAD:
       return true;
@@ -4436,6 +4426,7 @@ internal_store_fn_p (internal_fn fn)
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
     case IFN_LEN_STORE:
     case IFN_LEN_MASK_STORE:
       return true;
@@ -4454,8 +4445,10 @@ internal_gather_scatter_fn_p (internal_fn fn)
     {
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_GATHER_LOAD:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
       return true;
 
     default:
@@ -4477,6 +4470,10 @@ internal_fn_len_index (internal_fn fn)
     case IFN_LEN_MASK_STORE:
       return 2;
 
+    case IFN_LEN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_SCATTER_STORE:
+      return 4;
+
     default:
       return -1;
     }
@@ -4502,6 +4499,10 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_LEN_MASK_STORE:
       return 4;
 
+    case IFN_LEN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_SCATTER_STORE:
+      return 6;
+
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
 	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
@@ -4520,6 +4521,7 @@ internal_fn_stored_value_index (internal_fn fn)
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
       return 3;
 
     case IFN_LEN_STORE:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index d9fcca8430f..9b73e540d55 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -48,14 +48,14 @@ along with GCC; see the file COPYING3.  If not see
    - mask_load: currently just maskload
    - load_lanes: currently just vec_load_lanes
    - mask_load_lanes: currently just vec_mask_load_lanes
-   - gather_load: used for {mask_,}gather_load
+   - gather_load: used for {mask_,len_mask,}gather_load
    - len_load: currently just len_load
    - len_maskload: currently just len_maskload
 
    - mask_store: currently just maskstore
    - store_lanes: currently just vec_store_lanes
    - mask_store_lanes: currently just vec_mask_store_lanes
-   - scatter_store: used for {mask_,}scatter_store
+   - scatter_store: used for {mask_,len_mask,}scatter_store
    - len_store: currently just len_store
    - len_maskstore: currently just len_maskstore
 
@@ -157,6 +157,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
 DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
 		       mask_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_GATHER_LOAD, ECF_PURE,
+		       len_mask_gather_load, gather_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
 DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
@@ -164,6 +166,8 @@ DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
 DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
 		       mask_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_SCATTER_STORE, 0,
+		       len_mask_scatter_store, scatter_store)
 
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index a901b68c538..73c9a0c760f 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -95,8 +95,10 @@ OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
 OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
 OPTAB_CD(gather_load_optab, "gather_load$a$b")
 OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
+OPTAB_CD(len_mask_gather_load_optab, "len_mask_gather_load$a$b")
 OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
 OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
+OPTAB_CD(len_mask_scatter_store_optab, "len_mask_scatter_store$a$b")
 OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
 OPTAB_CD(vec_init_optab, "vec_init$a$b")
 
-- 
2.36.1



* Re: [PATCH V7] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern
  2023-07-03 11:09 [PATCH V7] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern juzhe.zhong
@ 2023-07-03 13:49 ` Richard Sandiford
  2023-07-04  3:14   ` Li, Pan2
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Sandiford @ 2023-07-03 13:49 UTC (permalink / raw)
  To: juzhe.zhong; +Cc: gcc-patches, rguenther

juzhe.zhong@rivai.ai writes:
> Ok for trunk?

Nice!  OK, thanks.

Richard


* RE: [PATCH V7] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern
  2023-07-03 13:49 ` Richard Sandiford
@ 2023-07-04  3:14   ` Li, Pan2
  0 siblings, 0 replies; 3+ messages in thread
From: Li, Pan2 @ 2023-07-04  3:14 UTC (permalink / raw)
  To: Richard Sandiford, juzhe.zhong; +Cc: gcc-patches, rguenther

Committed as both the bootstrap and regression tests passed, thanks Richard.

Pan

> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_GATHER_LOAD, ECF_PURE,
> +		       len_mask_gather_load, gather_load)
>  
>  DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
>  DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
> @@ -164,6 +166,8 @@ DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
>  DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
>  DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
>  		       mask_scatter_store, scatter_store)
> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_SCATTER_STORE, 0,
> +		       len_mask_scatter_store, scatter_store)
>  
>  DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
>  DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index a901b68c538..73c9a0c760f 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -95,8 +95,10 @@ OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
>  OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
>  OPTAB_CD(gather_load_optab, "gather_load$a$b")
>  OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
> +OPTAB_CD(len_mask_gather_load_optab, "len_mask_gather_load$a$b")
>  OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
>  OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
> +OPTAB_CD(len_mask_scatter_store_optab, "len_mask_scatter_store$a$b")
>  OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
>  OPTAB_CD(vec_init_optab, "vec_init$a$b")
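
For readers following the operand order above, here is a minimal scalar sketch of what the new patterns are meant to compute, assuming byte elements and the {len, bias, mask} order described in the cover letter. The function names model_len_mask_gather_load and model_len_mask_scatter_store are hypothetical and are not part of GCC. The sketch also assumes that bias is zero or negative and that inactive lanes are left untouched; the patch text quoted here does not state what inactive lanes hold, so treat that as an assumption of the model only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, not GCC code.  Lane I is active when
   I < LEN + BIAS and MASK[I] is nonzero.  Active lanes load one byte
   from BASE + OFFSET[I] * SCALE; inactive lanes are left unchanged
   (an assumption of this sketch).  */
static void
model_len_mask_gather_load (uint8_t *dst, const uint8_t *base,
                            const int32_t *offset, int scale,
                            size_t len, int bias, const uint8_t *mask)
{
  assert (bias <= 0);
  ptrdiff_t active = (ptrdiff_t) len + bias;
  for (ptrdiff_t i = 0; i < active; ++i)
    if (mask[i])
      dst[i] = base[(ptrdiff_t) offset[i] * scale];
}

/* Hypothetical scalar model of the matching store.  Active lanes store
   VALUE[I] to BASE + OFFSET[I] * SCALE; inactive lanes write nothing.  */
static void
model_len_mask_scatter_store (uint8_t *base, const int32_t *offset,
                              int scale, const uint8_t *value,
                              size_t len, int bias, const uint8_t *mask)
{
  assert (bias <= 0);
  ptrdiff_t active = (ptrdiff_t) len + bias;
  for (ptrdiff_t i = 0; i < active; ++i)
    if (mask[i])
      base[(ptrdiff_t) offset[i] * scale] = value[i];
}
```

In this model the length bounds the loop (loop control) and the mask selects lanes within it (flow control), which matches the split the cover letter describes. The trailing arguments follow the same order as the indices returned by internal_fn_len_index (4) and internal_fn_mask_index (6) in the patch, with bias between them.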


end of thread, other threads:[~2023-07-04  3:14 UTC | newest]

Thread overview: 3+ messages
2023-07-03 11:09 [PATCH V7] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern juzhe.zhong
2023-07-03 13:49 ` Richard Sandiford
2023-07-04  3:14   ` Li, Pan2
