[PATCH] VECT: Enhance SLP of MASK_LEN_GATHER

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
@ 2023-10-11 12:27 Juzhe-Zhong
  2023-10-11 12:50 ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Juzhe-Zhong @ 2023-10-11 12:27 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, rguenther, Juzhe-Zhong

This patch fixes this following FAILs in RISC-V regression:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.

Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
situation as GATHER_LOAD (no conditional mask).

So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.

gcc/ChangeLog:

	* tree-vect-slp.cc (vect_get_operand_map):
	(vect_build_slp_tree_1):
	(vect_build_slp_tree_2):
	* tree-vect-stmts.cc (vectorizable_load):

---
 gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
 gcc/tree-vect-stmts.cc |  4 ++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fa098f9ff4e..712c04ec278 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
 	  case IFN_MASK_GATHER_LOAD:
 	    return arg1_arg4_map;
 
+	  case IFN_MASK_LEN_GATHER_LOAD:
+	    /* In tree-vect-patterns.cc, we will have these 2 situations:
+
+		- Unconditional gather load transforms
+		  into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
+
+		- Conditional gather load transforms
+		  into MASK_LEN_GATHER_LOAD with real conditional mask.*/
+	    return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
+								  : nullptr;
+
 	  case IFN_MASK_STORE:
 	    return arg3_arg2_map;
 
@@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 
 	  if (cfn == CFN_MASK_LOAD
 	      || cfn == CFN_GATHER_LOAD
-	      || cfn == CFN_MASK_GATHER_LOAD)
+	      || cfn == CFN_MASK_GATHER_LOAD
+	      || cfn == CFN_MASK_LEN_GATHER_LOAD)
 	    ldst_p = true;
 	  else if (cfn == CFN_MASK_STORE)
 	    {
@@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 	  if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
 	      && rhs_code != CFN_GATHER_LOAD
 	      && rhs_code != CFN_MASK_GATHER_LOAD
+	      && rhs_code != CFN_MASK_LEN_GATHER_LOAD
 	      /* Not grouped loads are handled as externals for BB
 		 vectorization.  For loop vectorization we can handle
 		 splats the same we handle single element interleaving.  */
@@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
       if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
 	gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
 		    || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
-		    || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
+		    || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
+		    || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
       else
 	{
 	  *max_nunits = this_max_nunits;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cd7c1090d88..263acf5d3cd 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
 	return false;
 
       mask_index = internal_fn_mask_index (ifn);
-      if (mask_index >= 0 && slp_node)
+      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
 	mask_index = vect_slp_child_index_for_operand (call, mask_index);
-      if (mask_index >= 0
+      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
 	  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
 				      &mask, NULL, &mask_dt, &mask_vectype))
 	return false;
-- 
2.36.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-11 12:27 [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721] Juzhe-Zhong
@ 2023-10-11 12:50 ` Richard Biener
  2023-10-11 16:02   ` 钟居哲
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2023-10-11 12:50 UTC (permalink / raw)
  To: Juzhe-Zhong; +Cc: gcc-patches, richard.sandiford

On Wed, 11 Oct 2023, Juzhe-Zhong wrote:

> This patch fixes this following FAILs in RISC-V regression:
> 
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> 
> The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> 
> Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> situation as GATHER_LOAD (no conditional mask).
> 
> So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-slp.cc (vect_get_operand_map):
> 	(vect_build_slp_tree_1):
> 	(vect_build_slp_tree_2):
> 	* tree-vect-stmts.cc (vectorizable_load):
> 
> ---
>  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
>  gcc/tree-vect-stmts.cc |  4 ++--
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index fa098f9ff4e..712c04ec278 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
>  	  case IFN_MASK_GATHER_LOAD:
>  	    return arg1_arg4_map;
>  
> +	  case IFN_MASK_LEN_GATHER_LOAD:
> +	    /* In tree-vect-patterns.cc, we will have these 2 situations:
> +
> +		- Unconditional gather load transforms
> +		  into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> +
> +		- Conditional gather load transforms
> +		  into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> +	    return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> +								  : nullptr;
> +
>  	  case IFN_MASK_STORE:
>  	    return arg3_arg2_map;
>  
> @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>  
>  	  if (cfn == CFN_MASK_LOAD
>  	      || cfn == CFN_GATHER_LOAD
> -	      || cfn == CFN_MASK_GATHER_LOAD)
> +	      || cfn == CFN_MASK_GATHER_LOAD
> +	      || cfn == CFN_MASK_LEN_GATHER_LOAD)
>  	    ldst_p = true;
>  	  else if (cfn == CFN_MASK_STORE)
>  	    {
> @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>  	  if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
>  	      && rhs_code != CFN_GATHER_LOAD
>  	      && rhs_code != CFN_MASK_GATHER_LOAD
> +	      && rhs_code != CFN_MASK_LEN_GATHER_LOAD
>  	      /* Not grouped loads are handled as externals for BB
>  		 vectorization.  For loop vectorization we can handle
>  		 splats the same we handle single element interleaving.  */
> @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
>  	gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
>  		    || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> -		    || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> +		    || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> +		    || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
>        else
>  	{
>  	  *max_nunits = this_max_nunits;
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index cd7c1090d88..263acf5d3cd 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
>  	return false;
>  
>        mask_index = internal_fn_mask_index (ifn);
> -      if (mask_index >= 0 && slp_node)
> +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
>  	mask_index = vect_slp_child_index_for_operand (call, mask_index);
> -      if (mask_index >= 0
> +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
>  	  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
>  				      &mask, NULL, &mask_dt, &mask_vectype))
>  	return false;

You are ignoring the mask argument and here only handle it when the
IFN doesn't have a _LEN.  This doesn't seem to be forward looking
to the point where you want to actually handle masked (aka conditional)
gather.

Did you check that SLP is actually used to vectorize this?

Richard.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-11 12:50 ` Richard Biener
@ 2023-10-11 16:02   ` 钟居哲
  2023-10-12  9:44     ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: 钟居哲 @ 2023-10-11 16:02 UTC (permalink / raw)
  To: rguenther; +Cc: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 5260 bytes --]

Thanks Richi point it out.

I found this patch can't make conditional gather load succeed on SLP.

I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:

If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.

If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.

Is it reasonable ?


juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-10-11 20:50
To: Juzhe-Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
 
> This patch fixes this following FAILs in RISC-V regression:
> 
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> 
> The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> 
> Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> situation as GATHER_LOAD (no conditional mask).
> 
> So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> 
> gcc/ChangeLog:
> 
> * tree-vect-slp.cc (vect_get_operand_map):
> (vect_build_slp_tree_1):
> (vect_build_slp_tree_2):
> * tree-vect-stmts.cc (vectorizable_load):
> 
> ---
>  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
>  gcc/tree-vect-stmts.cc |  4 ++--
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index fa098f9ff4e..712c04ec278 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
>    case IFN_MASK_GATHER_LOAD:
>      return arg1_arg4_map;
>  
> +   case IFN_MASK_LEN_GATHER_LOAD:
> +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> +
> + - Unconditional gather load transforms
> +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> +
> + - Conditional gather load transforms
> +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> +   : nullptr;
> +
>    case IFN_MASK_STORE:
>      return arg3_arg2_map;
>  
> @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>  
>    if (cfn == CFN_MASK_LOAD
>        || cfn == CFN_GATHER_LOAD
> -       || cfn == CFN_MASK_GATHER_LOAD)
> +       || cfn == CFN_MASK_GATHER_LOAD
> +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
>      ldst_p = true;
>    else if (cfn == CFN_MASK_STORE)
>      {
> @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
>        && rhs_code != CFN_GATHER_LOAD
>        && rhs_code != CFN_MASK_GATHER_LOAD
> +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
>        /* Not grouped loads are handled as externals for BB
>  vectorization.  For loop vectorization we can handle
>  splats the same we handle single element interleaving.  */
> @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
>  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
>      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
>        else
>  {
>    *max_nunits = this_max_nunits;
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index cd7c1090d88..263acf5d3cd 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
>  return false;
>  
>        mask_index = internal_fn_mask_index (ifn);
> -      if (mask_index >= 0 && slp_node)
> +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
>  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> -      if (mask_index >= 0
> +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
>    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
>        &mask, NULL, &mask_dt, &mask_vectype))
>  return false;
 
You are ignoring the mask argument and here only handle it when the
IFN doesn't have a _LEN.  This doesn't seem to be forward looking
to the point where you want to actually handle masked (aka conditional)
gather.
 
Did you check that SLP is actually used to vectorize this?
 
Richard.
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-11 16:02   ` 钟居哲
@ 2023-10-12  9:44     ` Richard Biener
  2023-10-12  9:50       ` juzhe.zhong
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2023-10-12  9:44 UTC (permalink / raw)
  To: 钟居哲; +Cc: gcc-patches, richard.sandiford

On Thu, 12 Oct 2023, ??? wrote:

> Thanks Richi point it out.
> 
> I found this patch can't make conditional gather load succeed on SLP.
> 
> I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> 
> If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.
> 
> If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.
> 
> Is it reasonable ?

What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
even when the mask is -1?

> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-11 20:50
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
>  
> > This patch fixes this following FAILs in RISC-V regression:
> > 
> > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> > 
> > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > 
> > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> > situation as GATHER_LOAD (no conditional mask).
> > 
> > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> > 
> > gcc/ChangeLog:
> > 
> > * tree-vect-slp.cc (vect_get_operand_map):
> > (vect_build_slp_tree_1):
> > (vect_build_slp_tree_2):
> > * tree-vect-stmts.cc (vectorizable_load):
> > 
> > ---
> >  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
> >  gcc/tree-vect-stmts.cc |  4 ++--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index fa098f9ff4e..712c04ec278 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> >    case IFN_MASK_GATHER_LOAD:
> >      return arg1_arg4_map;
> >  
> > +   case IFN_MASK_LEN_GATHER_LOAD:
> > +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> > +
> > + - Unconditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > +
> > + - Conditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > +   : nullptr;
> > +
> >    case IFN_MASK_STORE:
> >      return arg3_arg2_map;
> >  
> > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> >  
> >    if (cfn == CFN_MASK_LOAD
> >        || cfn == CFN_GATHER_LOAD
> > -       || cfn == CFN_MASK_GATHER_LOAD)
> > +       || cfn == CFN_MASK_GATHER_LOAD
> > +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
> >      ldst_p = true;
> >    else if (cfn == CFN_MASK_STORE)
> >      {
> > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> >    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> >        && rhs_code != CFN_GATHER_LOAD
> >        && rhs_code != CFN_MASK_GATHER_LOAD
> > +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> >        /* Not grouped loads are handled as externals for BB
> >  vectorization.  For loop vectorization we can handle
> >  splats the same we handle single element interleaving.  */
> > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> >        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> >      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> >        else
> >  {
> >    *max_nunits = this_max_nunits;
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index cd7c1090d88..263acf5d3cd 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> >  return false;
> >  
> >        mask_index = internal_fn_mask_index (ifn);
> > -      if (mask_index >= 0 && slp_node)
> > +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > -      if (mask_index >= 0
> > +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> >        &mask, NULL, &mask_dt, &mask_vectype))
> >  return false;
>  
> You are ignoring the mask argument and here only handle it when the
> IFN doesn't have a _LEN.  This doesn't seem to be forward looking
> to the point where you want to actually handle masked (aka conditional)
> gather.
>  
> Did you check that SLP is actually used to vectorize this?
>  
> Richard.
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-12  9:44     ` Richard Biener
@ 2023-10-12  9:50       ` juzhe.zhong
  2023-10-12  9:55         ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: juzhe.zhong @ 2023-10-12  9:50 UTC (permalink / raw)
  To: rguenther; +Cc: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 6995 bytes --]

I tree-vect-slp.cc:
vect_get_and_check_slp_defs
711: 

          tree type = TREE_TYPE (oprnd);
          dt = dts[i];
          if ((dt == vect_constant_def
               || dt == vect_external_def)
              && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
              && (TREE_CODE (type) == BOOLEAN_TYPE
                  || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
                                                      type)))
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                 "Build SLP failed: invalid type of def "
                                 "for variable-length SLP %T\n", oprnd);
              return -1;
            }

Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this condition, then SLP failed:
Build SLP failed: invalid type of def




juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:44
To: 钟居哲
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, ??? wrote:
 
> Thanks Richi point it out.
> 
> I found this patch can't make conditional gather load succeed on SLP.
> 
> I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> 
> If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.
> 
> If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.
> 
> Is it reasonable ?
 
What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
even when the mask is -1?
 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-11 20:50
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
>  
> > This patch fixes this following FAILs in RISC-V regression:
> > 
> > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> > 
> > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > 
> > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> > situation as GATHER_LOAD (no conditional mask).
> > 
> > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> > 
> > gcc/ChangeLog:
> > 
> > * tree-vect-slp.cc (vect_get_operand_map):
> > (vect_build_slp_tree_1):
> > (vect_build_slp_tree_2):
> > * tree-vect-stmts.cc (vectorizable_load):
> > 
> > ---
> >  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
> >  gcc/tree-vect-stmts.cc |  4 ++--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index fa098f9ff4e..712c04ec278 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> >    case IFN_MASK_GATHER_LOAD:
> >      return arg1_arg4_map;
> >  
> > +   case IFN_MASK_LEN_GATHER_LOAD:
> > +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> > +
> > + - Unconditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > +
> > + - Conditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > +   : nullptr;
> > +
> >    case IFN_MASK_STORE:
> >      return arg3_arg2_map;
> >  
> > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> >  
> >    if (cfn == CFN_MASK_LOAD
> >        || cfn == CFN_GATHER_LOAD
> > -       || cfn == CFN_MASK_GATHER_LOAD)
> > +       || cfn == CFN_MASK_GATHER_LOAD
> > +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
> >      ldst_p = true;
> >    else if (cfn == CFN_MASK_STORE)
> >      {
> > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> >    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> >        && rhs_code != CFN_GATHER_LOAD
> >        && rhs_code != CFN_MASK_GATHER_LOAD
> > +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> >        /* Not grouped loads are handled as externals for BB
> >  vectorization.  For loop vectorization we can handle
> >  splats the same we handle single element interleaving.  */
> > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> >        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> >      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> >        else
> >  {
> >    *max_nunits = this_max_nunits;
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index cd7c1090d88..263acf5d3cd 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> >  return false;
> >  
> >        mask_index = internal_fn_mask_index (ifn);
> > -      if (mask_index >= 0 && slp_node)
> > +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > -      if (mask_index >= 0
> > +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> >        &mask, NULL, &mask_dt, &mask_vectype))
> >  return false;
>  
> You are ignoring the mask argument and here only handle it when the
> IFN doesn't have a _LEN.  This doesn't seem to be forward looking
> to the point where you want to actually handle masked (aka conditional)
> gather.
>  
> Did you check that SLP is actually used to vectorize this?
>  
> Richard.
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-12  9:50       ` juzhe.zhong
@ 2023-10-12  9:55         ` Richard Biener
  2023-10-12 10:18           ` juzhe.zhong
                             ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Richard Biener @ 2023-10-12  9:55 UTC (permalink / raw)
  To: juzhe.zhong; +Cc: gcc-patches, richard.sandiford

On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:

> I tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>           tree type = TREE_TYPE (oprnd);
>           dt = dts[i];
>           if ((dt == vect_constant_def
>                || dt == vect_external_def)
>               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>               && (TREE_CODE (type) == BOOLEAN_TYPE
>                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>                                                       type)))
>             {
>               if (dump_enabled_p ())
>                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                                  "Build SLP failed: invalid type of def "
>                                  "for variable-length SLP %T\n", oprnd);
>               return -1;
>             }
> 
> Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this condition, then SLP failed:
> Build SLP failed: invalid type of def

I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
 
>
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>  
> > Thanks Richi point it out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > 
> > If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.
> > 
> > If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.
> > 
> > Is it reasonable ?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes this following FAILs in RISC-V regression:
> > > 
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > 
> > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > > 
> > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> > > situation as GATHER_LOAD (no conditional mask).
> > > 
> > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-vect-slp.cc (vect_get_operand_map):
> > > (vect_build_slp_tree_1):
> > > (vect_build_slp_tree_2):
> > > * tree-vect-stmts.cc (vectorizable_load):
> > > 
> > > ---
> > >  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
> > >  gcc/tree-vect-stmts.cc |  4 ++--
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index fa098f9ff4e..712c04ec278 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> > >    case IFN_MASK_GATHER_LOAD:
> > >      return arg1_arg4_map;
> > >  
> > > +   case IFN_MASK_LEN_GATHER_LOAD:
> > > +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> > > +
> > > + - Unconditional gather load transforms
> > > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > > +
> > > + - Conditional gather load transforms
> > > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > > +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > > +   : nullptr;
> > > +
> > >    case IFN_MASK_STORE:
> > >      return arg3_arg2_map;
> > >  
> > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > >  
> > >    if (cfn == CFN_MASK_LOAD
> > >        || cfn == CFN_GATHER_LOAD
> > > -       || cfn == CFN_MASK_GATHER_LOAD)
> > > +       || cfn == CFN_MASK_GATHER_LOAD
> > > +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
> > >      ldst_p = true;
> > >    else if (cfn == CFN_MASK_STORE)
> > >      {
> > > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > >    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> > >        && rhs_code != CFN_GATHER_LOAD
> > >        && rhs_code != CFN_MASK_GATHER_LOAD
> > > +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> > >        /* Not grouped loads are handled as externals for BB
> > >  vectorization.  For loop vectorization we can handle
> > >  splats the same we handle single element interleaving.  */
> > > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> > >        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> > >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> > >      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > > -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > > +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > > +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> > >        else
> > >  {
> > >    *max_nunits = this_max_nunits;
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index cd7c1090d88..263acf5d3cd 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> > >  return false;
> > >  
> > >        mask_index = internal_fn_mask_index (ifn);
> > > -      if (mask_index >= 0 && slp_node)
> > > +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> > >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > > -      if (mask_index >= 0
> > > +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> > >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> > >        &mask, NULL, &mask_dt, &mask_vectype))
> > >  return false;
> >  
> > You are ignoring the mask argument and here only handle it when the
> > IFN doesn't have a _LEN.  This doesn't seem to be forward looking
> > to the point where you want to actually handle masked (aka conditional)
> > gather.
> >  
> > Did you check that SLP is actually used to vectorize this?
> >  
> > Richard.
> >  
> > 
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-12  9:55         ` Richard Biener
@ 2023-10-12 10:18           ` juzhe.zhong
  2023-10-12 11:12             ` Richard Biener
  2023-10-12 10:36           ` juzhe.zhong
  2023-10-12 10:57           ` juzhe.zhong
  2 siblings, 1 reply; 14+ messages in thread
From: juzhe.zhong @ 2023-10-12 10:18 UTC (permalink / raw)
  To: rguenther; +Cc: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 9525 bytes --]

Hi, Richi.

I restrict as you said into vect_external_def.

Then this condition made SLP failed:

-      if (mask_index >= 0
+      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
          && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
                                      &mask, NULL, &mask_dt, &mask_vectype))
        return false;

So I add 'internal_fn_len_index (ifn) < 0' for MASK_LEN_GATHER_LOAD does not check scalar mask.

Then ICE here:

vect_slp_analyze_node_operations
if (child
          && (SLP_TREE_DEF_TYPE (child) == vect_constant_def
              || SLP_TREE_DEF_TYPE (child) == vect_external_def)
          /* Perform usual caching, note code-generation still
             code-gens these nodes multiple times but we expect
             to CSE them later.  */
          && !visited_set.add (child))
        {
          visited_vec.safe_push (child);
          /* ???  After auditing more code paths make a "default"
             and push the vector type from NODE to all children
             if it is not already set.  */
          /* Compute the number of vectors to be generated.  */
          tree vector_type = SLP_TREE_VECTYPE (child);
          if (!vector_type)
            {
              /* For shifts with a scalar argument we don't need
                 to cost or code-generate anything.
                 ???  Represent this more explicitely.  */
              gcc_assert ((STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node)) ----> assert FAILed.
                           == shift_vec_info_type)
                          && j == 1);
              continue;
            }

Could you help me with that?


juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:55
To: juzhe.zhong@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:
 
> I tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>           tree type = TREE_TYPE (oprnd);
>           dt = dts[i];
>           if ((dt == vect_constant_def
>                || dt == vect_external_def)
>               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>               && (TREE_CODE (type) == BOOLEAN_TYPE
>                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>                                                       type)))
>             {
>               if (dump_enabled_p ())
>                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                                  "Build SLP failed: invalid type of def "
>                                  "for variable-length SLP %T\n", oprnd);
>               return -1;
>             }
> 
> Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this condition, then SLP failed:
> Build SLP failed: invalid type of def
 
I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
>
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>  
> > Thanks Richi point it out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > 
> > If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.
> > 
> > If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.
> > 
> > Is it reasonable ?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes this following FAILs in RISC-V regression:
> > > 
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > 
> > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > > 
> > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> > > situation as GATHER_LOAD (no conditional mask).
> > > 
> > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-vect-slp.cc (vect_get_operand_map):
> > > (vect_build_slp_tree_1):
> > > (vect_build_slp_tree_2):
> > > * tree-vect-stmts.cc (vectorizable_load):
> > > 
> > > ---
> > >  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
> > >  gcc/tree-vect-stmts.cc |  4 ++--
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index fa098f9ff4e..712c04ec278 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> > >    case IFN_MASK_GATHER_LOAD:
> > >      return arg1_arg4_map;
> > >  
> > > +   case IFN_MASK_LEN_GATHER_LOAD:
> > > +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> > > +
> > > + - Unconditional gather load transforms
> > > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > > +
> > > + - Conditional gather load transforms
> > > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > > +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > > +   : nullptr;
> > > +
> > >    case IFN_MASK_STORE:
> > >      return arg3_arg2_map;
> > >  
> > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > >  
> > >    if (cfn == CFN_MASK_LOAD
> > >        || cfn == CFN_GATHER_LOAD
> > > -       || cfn == CFN_MASK_GATHER_LOAD)
> > > +       || cfn == CFN_MASK_GATHER_LOAD
> > > +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
> > >      ldst_p = true;
> > >    else if (cfn == CFN_MASK_STORE)
> > >      {
> > > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > >    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> > >        && rhs_code != CFN_GATHER_LOAD
> > >        && rhs_code != CFN_MASK_GATHER_LOAD
> > > +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> > >        /* Not grouped loads are handled as externals for BB
> > >  vectorization.  For loop vectorization we can handle
> > >  splats the same we handle single element interleaving.  */
> > > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> > >        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> > >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> > >      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > > -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > > +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > > +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> > >        else
> > >  {
> > >    *max_nunits = this_max_nunits;
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index cd7c1090d88..263acf5d3cd 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> > >  return false;
> > >  
> > >        mask_index = internal_fn_mask_index (ifn);
> > > -      if (mask_index >= 0 && slp_node)
> > > +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> > >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > > -      if (mask_index >= 0
> > > +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> > >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> > >        &mask, NULL, &mask_dt, &mask_vectype))
> > >  return false;
> >  
> > You are ignoring the mask argument and here only handle it when the
> > IFN doesn't have a _LEN.  This doesn't seem to be forward looking
> > to the point where you want to actually handle masked (aka conditional)
> > gather.
> >  
> > Did you check that SLP is actually used to vectorize this?
> >  
> > Richard.
> >  
> > 
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-12 10:18           ` juzhe.zhong
@ 2023-10-12 11:12             ` Richard Biener
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Biener @ 2023-10-12 11:12 UTC (permalink / raw)
  To: juzhe.zhong; +Cc: gcc-patches, richard.sandiford

On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:

> Hi, Richi.
> 
> I restrict as you said into vect_external_def.
> 
> Then this condition made SLP failed:
> 
> -      if (mask_index >= 0
> +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
>           && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
>                                       &mask, NULL, &mask_dt, &mask_vectype))
>         return false;
>
> So I add 'internal_fn_len_index (ifn) < 0' for MASK_LEN_GATHER_LOAD does not check scalar mask.

Rather figure why.
 
> Then ICE here:
> 
> vect_slp_analyze_node_operations
> if (child
>           && (SLP_TREE_DEF_TYPE (child) == vect_constant_def
>               || SLP_TREE_DEF_TYPE (child) == vect_external_def)
>           /* Perform usual caching, note code-generation still
>              code-gens these nodes multiple times but we expect
>              to CSE them later.  */
>           && !visited_set.add (child))
>         {
>           visited_vec.safe_push (child);
>           /* ???  After auditing more code paths make a "default"
>              and push the vector type from NODE to all children
>              if it is not already set.  */
>           /* Compute the number of vectors to be generated.  */
>           tree vector_type = SLP_TREE_VECTYPE (child);
>           if (!vector_type)
>             {
>               /* For shifts with a scalar argument we don't need
>                  to cost or code-generate anything.
>                  ???  Represent this more explicitely.  */
>               gcc_assert ((STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node)) ----> assert FAILed.
>                            == shift_vec_info_type)
>                           && j == 1);
>               continue;
>             }
> 
> Could you help me with that?
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:
>  
> > I tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >           tree type = TREE_TYPE (oprnd);
> >           dt = dts[i];
> >           if ((dt == vect_constant_def
> >                || dt == vect_external_def)
> >               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >               && (TREE_CODE (type) == BOOLEAN_TYPE
> >                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> >                                                       type)))
> >             {
> >               if (dump_enabled_p ())
> >                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >                                  "Build SLP failed: invalid type of def "
> >                                  "for variable-length SLP %T\n", oprnd);
> >               return -1;
> >             }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: ???
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, ??? wrote:
> >  
> > > Thanks Richi point it out.
> > > 
> > > I found this patch can't make conditional gather load succeed on SLP.
> > > 
> > > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > > 
> > > If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.
> > > 
> > > If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.
> > > 
> > > Is it reasonable ?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zhong@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > > 
> > > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > > > 
> > > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> > > > situation as GATHER_LOAD (no conditional mask).
> > > > 
> > > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * tree-vect-slp.cc (vect_get_operand_map):
> > > > (vect_build_slp_tree_1):
> > > > (vect_build_slp_tree_2):
> > > > * tree-vect-stmts.cc (vectorizable_load):
> > > > 
> > > > ---
> > > >  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
> > > >  gcc/tree-vect-stmts.cc |  4 ++--
> > > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > > index fa098f9ff4e..712c04ec278 100644
> > > > --- a/gcc/tree-vect-slp.cc
> > > > +++ b/gcc/tree-vect-slp.cc
> > > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> > > >    case IFN_MASK_GATHER_LOAD:
> > > >      return arg1_arg4_map;
> > > >  
> > > > +   case IFN_MASK_LEN_GATHER_LOAD:
> > > > +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> > > > +
> > > > + - Unconditional gather load transforms
> > > > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > > > +
> > > > + - Conditional gather load transforms
> > > > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > > > +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > > > +   : nullptr;
> > > > +
> > > >    case IFN_MASK_STORE:
> > > >      return arg3_arg2_map;
> > > >  
> > > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > > >  
> > > >    if (cfn == CFN_MASK_LOAD
> > > >        || cfn == CFN_GATHER_LOAD
> > > > -       || cfn == CFN_MASK_GATHER_LOAD)
> > > > +       || cfn == CFN_MASK_GATHER_LOAD
> > > > +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
> > > >      ldst_p = true;
> > > >    else if (cfn == CFN_MASK_STORE)
> > > >      {
> > > > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > > >    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> > > >        && rhs_code != CFN_GATHER_LOAD
> > > >        && rhs_code != CFN_MASK_GATHER_LOAD
> > > > +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> > > >        /* Not grouped loads are handled as externals for BB
> > > >  vectorization.  For loop vectorization we can handle
> > > >  splats the same we handle single element interleaving.  */
> > > > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> > > >        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> > > >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> > > >      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > > > -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > > > +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > > > +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> > > >        else
> > > >  {
> > > >    *max_nunits = this_max_nunits;
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index cd7c1090d88..263acf5d3cd 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> > > >  return false;
> > > >  
> > > >        mask_index = internal_fn_mask_index (ifn);
> > > > -      if (mask_index >= 0 && slp_node)
> > > > +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> > > >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > > > -      if (mask_index >= 0
> > > > +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> > > >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> > > >        &mask, NULL, &mask_dt, &mask_vectype))
> > > >  return false;
> > >  
> > > You are ignoring the mask argument and here only handle it when the
> > > IFN doesn't have a _LEN.  This doesn't seem to be forward looking
> > > to the point where you want to actually handle masked (aka conditional)
> > > gather.
> > >  
> > > Did you check that SLP is actually used to vectorize this?
> > >  
> > > Richard.
> > >  
> > > 
> >  
> > 
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-12  9:55         ` Richard Biener
  2023-10-12 10:18           ` juzhe.zhong
@ 2023-10-12 10:36           ` juzhe.zhong
  2023-10-12 11:13             ` Richard Biener
  2023-10-12 10:57           ` juzhe.zhong
  2 siblings, 1 reply; 14+ messages in thread
From: juzhe.zhong @ 2023-10-12 10:36 UTC (permalink / raw)
  To: rguenther; +Cc: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 8395 bytes --]

Oh. I see.

Here make vect_constant_def failed to SLP:

tree-vect-slp.cc:
vect_build_slp_tree_2
line 2354:

      if (oprnd_info->first_dt == vect_external_def
          || oprnd_info->first_dt == vect_constant_def)
        {
          slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops);
          SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt;
          oprnd_info->ops = vNULL;
          children.safe_push (invnode);
          continue;
        }

It seems that we handle vect_constant_def same as vect_external_def.
So failed to SLP ?



juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:55
To: juzhe.zhong@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:
 
> I tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>           tree type = TREE_TYPE (oprnd);
>           dt = dts[i];
>           if ((dt == vect_constant_def
>                || dt == vect_external_def)
>               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>               && (TREE_CODE (type) == BOOLEAN_TYPE
>                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>                                                       type)))
>             {
>               if (dump_enabled_p ())
>                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                                  "Build SLP failed: invalid type of def "
>                                  "for variable-length SLP %T\n", oprnd);
>               return -1;
>             }
> 
> Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this condition, then SLP failed:
> Build SLP failed: invalid type of def
 
I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
>
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>  
> > Thanks Richi point it out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > 
> > If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.
> > 
> > If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.
> > 
> > Is it reasonable ?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes this following FAILs in RISC-V regression:
> > > 
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > 
> > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > > 
> > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> > > situation as GATHER_LOAD (no conditional mask).
> > > 
> > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-vect-slp.cc (vect_get_operand_map):
> > > (vect_build_slp_tree_1):
> > > (vect_build_slp_tree_2):
> > > * tree-vect-stmts.cc (vectorizable_load):
> > > 
> > > ---
> > >  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
> > >  gcc/tree-vect-stmts.cc |  4 ++--
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index fa098f9ff4e..712c04ec278 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> > >    case IFN_MASK_GATHER_LOAD:
> > >      return arg1_arg4_map;
> > >  
> > > +   case IFN_MASK_LEN_GATHER_LOAD:
> > > +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> > > +
> > > + - Unconditional gather load transforms
> > > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > > +
> > > + - Conditional gather load transforms
> > > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > > +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > > +   : nullptr;
> > > +
> > >    case IFN_MASK_STORE:
> > >      return arg3_arg2_map;
> > >  
> > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > >  
> > >    if (cfn == CFN_MASK_LOAD
> > >        || cfn == CFN_GATHER_LOAD
> > > -       || cfn == CFN_MASK_GATHER_LOAD)
> > > +       || cfn == CFN_MASK_GATHER_LOAD
> > > +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
> > >      ldst_p = true;
> > >    else if (cfn == CFN_MASK_STORE)
> > >      {
> > > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > >    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> > >        && rhs_code != CFN_GATHER_LOAD
> > >        && rhs_code != CFN_MASK_GATHER_LOAD
> > > +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> > >        /* Not grouped loads are handled as externals for BB
> > >  vectorization.  For loop vectorization we can handle
> > >  splats the same we handle single element interleaving.  */
> > > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> > >        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> > >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> > >      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > > -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > > +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > > +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> > >        else
> > >  {
> > >    *max_nunits = this_max_nunits;
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index cd7c1090d88..263acf5d3cd 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> > >  return false;
> > >  
> > >        mask_index = internal_fn_mask_index (ifn);
> > > -      if (mask_index >= 0 && slp_node)
> > > +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> > >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > > -      if (mask_index >= 0
> > > +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> > >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> > >        &mask, NULL, &mask_dt, &mask_vectype))
> > >  return false;
> >  
> > You are ignoring the mask argument and here only handle it when the
> > IFN doesn't have a _LEN.  This doesn't seem to be forward looking
> > to the point where you want to actually handle masked (aka conditional)
> > gather.
> >  
> > Did you check that SLP is actually used to vectorize this?
> >  
> > Richard.
> >  
> > 
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-12 10:36           ` juzhe.zhong
@ 2023-10-12 11:13             ` Richard Biener
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Biener @ 2023-10-12 11:13 UTC (permalink / raw)
  To: juzhe.zhong; +Cc: gcc-patches, richard.sandiford

On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:

> Oh. I see.
> 
> Here make vect_constant_def failed to SLP:
> 
> tree-vect-slp.cc:
> vect_build_slp_tree_2
> line 2354:
> 
>       if (oprnd_info->first_dt == vect_external_def
>           || oprnd_info->first_dt == vect_constant_def)
>         {
>           slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops);
>           SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt;
>           oprnd_info->ops = vNULL;
>           children.safe_push (invnode);
>           continue;
>         }
> 
> It seems that we handle vect_constant_def same as vect_external_def.
> So failed to SLP ?

Why?  We _should_ see a SLP node for the all-true mask operand.

> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:
>  
> > I tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >           tree type = TREE_TYPE (oprnd);
> >           dt = dts[i];
> >           if ((dt == vect_constant_def
> >                || dt == vect_external_def)
> >               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >               && (TREE_CODE (type) == BOOLEAN_TYPE
> >                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> >                                                       type)))
> >             {
> >               if (dump_enabled_p ())
> >                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >                                  "Build SLP failed: invalid type of def "
> >                                  "for variable-length SLP %T\n", oprnd);
> >               return -1;
> >             }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: ???
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, ??? wrote:
> >  
> > > Thanks Richi point it out.
> > > 
> > > I found this patch can't make conditional gather load succeed on SLP.
> > > 
> > > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > > 
> > > If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.
> > > 
> > > If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.
> > > 
> > > Is it reasonable ?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zhong@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > > 
> > > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > > > 
> > > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> > > > situation as GATHER_LOAD (no conditional mask).
> > > > 
> > > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * tree-vect-slp.cc (vect_get_operand_map):
> > > > (vect_build_slp_tree_1):
> > > > (vect_build_slp_tree_2):
> > > > * tree-vect-stmts.cc (vectorizable_load):
> > > > 
> > > > ---
> > > >  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
> > > >  gcc/tree-vect-stmts.cc |  4 ++--
> > > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > > index fa098f9ff4e..712c04ec278 100644
> > > > --- a/gcc/tree-vect-slp.cc
> > > > +++ b/gcc/tree-vect-slp.cc
> > > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> > > >    case IFN_MASK_GATHER_LOAD:
> > > >      return arg1_arg4_map;
> > > >  
> > > > +   case IFN_MASK_LEN_GATHER_LOAD:
> > > > +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> > > > +
> > > > + - Unconditional gather load transforms
> > > > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > > > +
> > > > + - Conditional gather load transforms
> > > > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > > > +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > > > +   : nullptr;
> > > > +
> > > >    case IFN_MASK_STORE:
> > > >      return arg3_arg2_map;
> > > >  
> > > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > > >  
> > > >    if (cfn == CFN_MASK_LOAD
> > > >        || cfn == CFN_GATHER_LOAD
> > > > -       || cfn == CFN_MASK_GATHER_LOAD)
> > > > +       || cfn == CFN_MASK_GATHER_LOAD
> > > > +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
> > > >      ldst_p = true;
> > > >    else if (cfn == CFN_MASK_STORE)
> > > >      {
> > > > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > > >    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> > > >        && rhs_code != CFN_GATHER_LOAD
> > > >        && rhs_code != CFN_MASK_GATHER_LOAD
> > > > +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> > > >        /* Not grouped loads are handled as externals for BB
> > > >  vectorization.  For loop vectorization we can handle
> > > >  splats the same we handle single element interleaving.  */
> > > > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> > > >        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> > > >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> > > >      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > > > -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > > > +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > > > +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> > > >        else
> > > >  {
> > > >    *max_nunits = this_max_nunits;
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index cd7c1090d88..263acf5d3cd 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> > > >  return false;
> > > >  
> > > >        mask_index = internal_fn_mask_index (ifn);
> > > > -      if (mask_index >= 0 && slp_node)
> > > > +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> > > >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > > > -      if (mask_index >= 0
> > > > +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> > > >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> > > >        &mask, NULL, &mask_dt, &mask_vectype))
> > > >  return false;
> > >  
> > > You are ignoring the mask argument and here only handle it when the
> > > IFN doesn't have a _LEN.  This doesn't seem to be forward looking
> > > to the point where you want to actually handle masked (aka conditional)
> > > gather.
> > >  
> > > Did you check that SLP is actually used to vectorize this?
> > >  
> > > Richard.
> > >  
> > > 
> >  
> > 
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-12  9:55         ` Richard Biener
  2023-10-12 10:18           ` juzhe.zhong
  2023-10-12 10:36           ` juzhe.zhong
@ 2023-10-12 10:57           ` juzhe.zhong
  2023-10-12 11:14             ` Richard Biener
  2 siblings, 1 reply; 14+ messages in thread
From: juzhe.zhong @ 2023-10-12 10:57 UTC (permalink / raw)
  To: rguenther; +Cc: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 8443 bytes --]

In tree-vect-stmts.cc

vect_check_scalar_mask

Failed here:

  /* If the caller is not prepared for adjusting an external/constant
     SLP mask vector type fail.  */
  if (slp_node
      && !mask_node
      && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "SLP mask argument is not vectorized.\n");
      return false;
    }

If we allow vect_constant_def, we should adjust constant SLP mask ? in the caller "vectorizable_load" ?

But I don't know how to adjust that.



juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 17:55
To: juzhe.zhong@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:
 
> I tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>           tree type = TREE_TYPE (oprnd);
>           dt = dts[i];
>           if ((dt == vect_constant_def
>                || dt == vect_external_def)
>               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>               && (TREE_CODE (type) == BOOLEAN_TYPE
>                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>                                                       type)))
>             {
>               if (dump_enabled_p ())
>                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                                  "Build SLP failed: invalid type of def "
>                                  "for variable-length SLP %T\n", oprnd);
>               return -1;
>             }
> 
> Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this condition, then SLP failed:
> Build SLP failed: invalid type of def
 
I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
>
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: ???
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>  
> > Thanks Richi point it out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > 
> > If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.
> > 
> > If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.
> > 
> > Is it reasonable ?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes this following FAILs in RISC-V regression:
> > > 
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > 
> > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > > 
> > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> > > situation as GATHER_LOAD (no conditional mask).
> > > 
> > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-vect-slp.cc (vect_get_operand_map):
> > > (vect_build_slp_tree_1):
> > > (vect_build_slp_tree_2):
> > > * tree-vect-stmts.cc (vectorizable_load):
> > > 
> > > ---
> > >  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
> > >  gcc/tree-vect-stmts.cc |  4 ++--
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index fa098f9ff4e..712c04ec278 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> > >    case IFN_MASK_GATHER_LOAD:
> > >      return arg1_arg4_map;
> > >  
> > > +   case IFN_MASK_LEN_GATHER_LOAD:
> > > +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> > > +
> > > + - Unconditional gather load transforms
> > > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > > +
> > > + - Conditional gather load transforms
> > > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > > +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > > +   : nullptr;
> > > +
> > >    case IFN_MASK_STORE:
> > >      return arg3_arg2_map;
> > >  
> > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > >  
> > >    if (cfn == CFN_MASK_LOAD
> > >        || cfn == CFN_GATHER_LOAD
> > > -       || cfn == CFN_MASK_GATHER_LOAD)
> > > +       || cfn == CFN_MASK_GATHER_LOAD
> > > +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
> > >      ldst_p = true;
> > >    else if (cfn == CFN_MASK_STORE)
> > >      {
> > > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > >    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> > >        && rhs_code != CFN_GATHER_LOAD
> > >        && rhs_code != CFN_MASK_GATHER_LOAD
> > > +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> > >        /* Not grouped loads are handled as externals for BB
> > >  vectorization.  For loop vectorization we can handle
> > >  splats the same we handle single element interleaving.  */
> > > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> > >        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> > >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> > >      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > > -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > > +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > > +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> > >        else
> > >  {
> > >    *max_nunits = this_max_nunits;
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index cd7c1090d88..263acf5d3cd 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> > >  return false;
> > >  
> > >        mask_index = internal_fn_mask_index (ifn);
> > > -      if (mask_index >= 0 && slp_node)
> > > +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> > >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > > -      if (mask_index >= 0
> > > +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> > >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> > >        &mask, NULL, &mask_dt, &mask_vectype))
> > >  return false;
> >  
> > You are ignoring the mask argument and here only handle it when the
> > IFN doesn't have a _LEN.  This doesn't seem to be forward looking
> > to the point where you want to actually handle masked (aka conditional)
> > gather.
> >  
> > Did you check that SLP is actually used to vectorize this?
> >  
> > Richard.
> >  
> > 
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-12 10:57           ` juzhe.zhong
@ 2023-10-12 11:14             ` Richard Biener
  2023-10-12 11:24               ` juzhe.zhong
  2023-10-13  4:33               ` juzhe.zhong
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Biener @ 2023-10-12 11:14 UTC (permalink / raw)
  To: juzhe.zhong; +Cc: gcc-patches, richard.sandiford

On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:

> In tree-vect-stmts.cc
> 
> vect_check_scalar_mask
> 
> Failed here:
> 
>   /* If the caller is not prepared for adjusting an external/constant
>      SLP mask vector type fail.  */
>   if (slp_node
>       && !mask_node

^^^

where's the mask_node?

>       && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "SLP mask argument is not vectorized.\n");
>       return false;
>     }
> 
> If we allow vect_constant_def, we should adjust constant SLP mask ? in the caller "vectorizable_load" ?
> 
> But I don't know how to adjust that.
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:
>  
> > I tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >           tree type = TREE_TYPE (oprnd);
> >           dt = dts[i];
> >           if ((dt == vect_constant_def
> >                || dt == vect_external_def)
> >               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >               && (TREE_CODE (type) == BOOLEAN_TYPE
> >                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> >                                                       type)))
> >             {
> >               if (dump_enabled_p ())
> >                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >                                  "Build SLP failed: invalid type of def "
> >                                  "for variable-length SLP %T\n", oprnd);
> >               return -1;
> >             }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: ???
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, ??? wrote:
> >  
> > > Thanks Richi point it out.
> > > 
> > > I found this patch can't make conditional gather load succeed on SLP.
> > > 
> > > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > > 
> > > If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.
> > > 
> > > If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.
> > > 
> > > Is it reasonable ?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zhong@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > > 
> > > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > > > 
> > > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> > > > situation as GATHER_LOAD (no conditional mask).
> > > > 
> > > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * tree-vect-slp.cc (vect_get_operand_map):
> > > > (vect_build_slp_tree_1):
> > > > (vect_build_slp_tree_2):
> > > > * tree-vect-stmts.cc (vectorizable_load):
> > > > 
> > > > ---
> > > >  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
> > > >  gcc/tree-vect-stmts.cc |  4 ++--
> > > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > > index fa098f9ff4e..712c04ec278 100644
> > > > --- a/gcc/tree-vect-slp.cc
> > > > +++ b/gcc/tree-vect-slp.cc
> > > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> > > >    case IFN_MASK_GATHER_LOAD:
> > > >      return arg1_arg4_map;
> > > >  
> > > > +   case IFN_MASK_LEN_GATHER_LOAD:
> > > > +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> > > > +
> > > > + - Unconditional gather load transforms
> > > > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > > > +
> > > > + - Conditional gather load transforms
> > > > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > > > +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > > > +   : nullptr;
> > > > +
> > > >    case IFN_MASK_STORE:
> > > >      return arg3_arg2_map;
> > > >  
> > > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > > >  
> > > >    if (cfn == CFN_MASK_LOAD
> > > >        || cfn == CFN_GATHER_LOAD
> > > > -       || cfn == CFN_MASK_GATHER_LOAD)
> > > > +       || cfn == CFN_MASK_GATHER_LOAD
> > > > +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
> > > >      ldst_p = true;
> > > >    else if (cfn == CFN_MASK_STORE)
> > > >      {
> > > > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > > >    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> > > >        && rhs_code != CFN_GATHER_LOAD
> > > >        && rhs_code != CFN_MASK_GATHER_LOAD
> > > > +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> > > >        /* Not grouped loads are handled as externals for BB
> > > >  vectorization.  For loop vectorization we can handle
> > > >  splats the same we handle single element interleaving.  */
> > > > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> > > >        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> > > >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> > > >      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > > > -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > > > +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > > > +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> > > >        else
> > > >  {
> > > >    *max_nunits = this_max_nunits;
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index cd7c1090d88..263acf5d3cd 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> > > >  return false;
> > > >  
> > > >        mask_index = internal_fn_mask_index (ifn);
> > > > -      if (mask_index >= 0 && slp_node)
> > > > +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> > > >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > > > -      if (mask_index >= 0
> > > > +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> > > >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> > > >        &mask, NULL, &mask_dt, &mask_vectype))
> > > >  return false;
> > >  
> > > You are ignoring the mask argument and here only handle it when the
> > > IFN doesn't have a _LEN.  This doesn't seem to be forward looking
> > > to the point where you want to actually handle masked (aka conditional)
> > > gather.
> > >  
> > > Did you check that SLP is actually used to vectorize this?
> > >  
> > > Richard.
> > >  
> > > 
> >  
> > 
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-12 11:14             ` Richard Biener
@ 2023-10-12 11:24               ` juzhe.zhong
  2023-10-13  4:33               ` juzhe.zhong
  1 sibling, 0 replies; 14+ messages in thread
From: juzhe.zhong @ 2023-10-12 11:24 UTC (permalink / raw)
  To: rguenther; +Cc: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 9444 bytes --]

The mask node is NULL since the caller :

      if (mask_index >= 0
          && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
                                      &mask, NULL, &mask_dt, &mask_vectype))
        return false;

pass NULL to mask_node.


juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 19:14
To: juzhe.zhong@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:
 
> In tree-vect-stmts.cc
> 
> vect_check_scalar_mask
> 
> Failed here:
> 
>   /* If the caller is not prepared for adjusting an external/constant
>      SLP mask vector type fail.  */
>   if (slp_node
>       && !mask_node
 
^^^
 
where's the mask_node?
 
>       && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "SLP mask argument is not vectorized.\n");
>       return false;
>     }
> 
> If we allow vect_constant_def, we should adjust constant SLP mask ? in the caller "vectorizable_load" ?
> 
> But I don't know how to adjust that.
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:
>  
> > I tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >           tree type = TREE_TYPE (oprnd);
> >           dt = dts[i];
> >           if ((dt == vect_constant_def
> >                || dt == vect_external_def)
> >               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >               && (TREE_CODE (type) == BOOLEAN_TYPE
> >                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> >                                                       type)))
> >             {
> >               if (dump_enabled_p ())
> >                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >                                  "Build SLP failed: invalid type of def "
> >                                  "for variable-length SLP %T\n", oprnd);
> >               return -1;
> >             }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: ???
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, ??? wrote:
> >  
> > > Thanks Richi point it out.
> > > 
> > > I found this patch can't make conditional gather load succeed on SLP.
> > > 
> > > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > > 
> > > If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.
> > > 
> > > If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.
> > > 
> > > Is it reasonable ?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zhong@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > > 
> > > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > > > 
> > > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> > > > situation as GATHER_LOAD (no conditional mask).
> > > > 
> > > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * tree-vect-slp.cc (vect_get_operand_map):
> > > > (vect_build_slp_tree_1):
> > > > (vect_build_slp_tree_2):
> > > > * tree-vect-stmts.cc (vectorizable_load):
> > > > 
> > > > ---
> > > >  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
> > > >  gcc/tree-vect-stmts.cc |  4 ++--
> > > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > > index fa098f9ff4e..712c04ec278 100644
> > > > --- a/gcc/tree-vect-slp.cc
> > > > +++ b/gcc/tree-vect-slp.cc
> > > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> > > >    case IFN_MASK_GATHER_LOAD:
> > > >      return arg1_arg4_map;
> > > >  
> > > > +   case IFN_MASK_LEN_GATHER_LOAD:
> > > > +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> > > > +
> > > > + - Unconditional gather load transforms
> > > > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > > > +
> > > > + - Conditional gather load transforms
> > > > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > > > +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > > > +   : nullptr;
> > > > +
> > > >    case IFN_MASK_STORE:
> > > >      return arg3_arg2_map;
> > > >  
> > > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > > >  
> > > >    if (cfn == CFN_MASK_LOAD
> > > >        || cfn == CFN_GATHER_LOAD
> > > > -       || cfn == CFN_MASK_GATHER_LOAD)
> > > > +       || cfn == CFN_MASK_GATHER_LOAD
> > > > +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
> > > >      ldst_p = true;
> > > >    else if (cfn == CFN_MASK_STORE)
> > > >      {
> > > > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > > >    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> > > >        && rhs_code != CFN_GATHER_LOAD
> > > >        && rhs_code != CFN_MASK_GATHER_LOAD
> > > > +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> > > >        /* Not grouped loads are handled as externals for BB
> > > >  vectorization.  For loop vectorization we can handle
> > > >  splats the same we handle single element interleaving.  */
> > > > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> > > >        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> > > >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> > > >      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > > > -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > > > +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > > > +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> > > >        else
> > > >  {
> > > >    *max_nunits = this_max_nunits;
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index cd7c1090d88..263acf5d3cd 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> > > >  return false;
> > > >  
> > > >        mask_index = internal_fn_mask_index (ifn);
> > > > -      if (mask_index >= 0 && slp_node)
> > > > +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> > > >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > > > -      if (mask_index >= 0
> > > > +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> > > >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> > > >        &mask, NULL, &mask_dt, &mask_vectype))
> > > >  return false;
> > >  
> > > You are ignoring the mask argument and here only handle it when the
> > > IFN doesn't have a _LEN.  This doesn't seem to be forward looking
> > > to the point where you want to actually handle masked (aka conditional)
> > > gather.
> > >  
> > > Did you check that SLP is actually used to vectorize this?
> > >  
> > > Richard.
> > >  
> > > 
> >  
> > 
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
  2023-10-12 11:14             ` Richard Biener
  2023-10-12 11:24               ` juzhe.zhong
@ 2023-10-13  4:33               ` juzhe.zhong
  1 sibling, 0 replies; 14+ messages in thread
From: juzhe.zhong @ 2023-10-13  4:33 UTC (permalink / raw)
  To: rguenther; +Cc: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 9353 bytes --]

Hi, Richi. 

As you suggest, I keep MAK_LEN_GATHER_LOAD (...,-1) format and support SLP for that in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632846.html 

Thanks.


juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-10-12 19:14
To: juzhe.zhong@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:
 
> In tree-vect-stmts.cc
> 
> vect_check_scalar_mask
> 
> Failed here:
> 
>   /* If the caller is not prepared for adjusting an external/constant
>      SLP mask vector type fail.  */
>   if (slp_node
>       && !mask_node
 
^^^
 
where's the mask_node?
 
>       && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "SLP mask argument is not vectorized.\n");
>       return false;
>     }
> 
> If we allow vect_constant_def, we should adjust constant SLP mask ? in the caller "vectorizable_load" ?
> 
> But I don't know how to adjust that.
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote:
>  
> > I tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >           tree type = TREE_TYPE (oprnd);
> >           dt = dts[i];
> >           if ((dt == vect_constant_def
> >                || dt == vect_external_def)
> >               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >               && (TREE_CODE (type) == BOOLEAN_TYPE
> >                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> >                                                       type)))
> >             {
> >               if (dump_enabled_p ())
> >                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >                                  "Build SLP failed: invalid type of def "
> >                                  "for variable-length SLP %T\n", oprnd);
> >               return -1;
> >             }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: ???
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, ??? wrote:
> >  
> > > Thanks Richi point it out.
> > > 
> > > I found this patch can't make conditional gather load succeed on SLP.
> > > 
> > > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > > 
> > > If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally.
> > > 
> > > If has condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally.
> > > 
> > > Is it reasonable ?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zhong@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
> > > > 
> > > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > > > 
> > > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same
> > > > situation as GATHER_LOAD (no conditional mask).
> > > > 
> > > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * tree-vect-slp.cc (vect_get_operand_map):
> > > > (vect_build_slp_tree_1):
> > > > (vect_build_slp_tree_2):
> > > > * tree-vect-stmts.cc (vectorizable_load):
> > > > 
> > > > ---
> > > >  gcc/tree-vect-slp.cc   | 18 ++++++++++++++++--
> > > >  gcc/tree-vect-stmts.cc |  4 ++--
> > > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > > index fa098f9ff4e..712c04ec278 100644
> > > > --- a/gcc/tree-vect-slp.cc
> > > > +++ b/gcc/tree-vect-slp.cc
> > > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> > > >    case IFN_MASK_GATHER_LOAD:
> > > >      return arg1_arg4_map;
> > > >  
> > > > +   case IFN_MASK_LEN_GATHER_LOAD:
> > > > +     /* In tree-vect-patterns.cc, we will have these 2 situations:
> > > > +
> > > > + - Unconditional gather load transforms
> > > > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > > > +
> > > > + - Conditional gather load transforms
> > > > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > > > +     return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > > > +   : nullptr;
> > > > +
> > > >    case IFN_MASK_STORE:
> > > >      return arg3_arg2_map;
> > > >  
> > > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > > >  
> > > >    if (cfn == CFN_MASK_LOAD
> > > >        || cfn == CFN_GATHER_LOAD
> > > > -       || cfn == CFN_MASK_GATHER_LOAD)
> > > > +       || cfn == CFN_MASK_GATHER_LOAD
> > > > +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
> > > >      ldst_p = true;
> > > >    else if (cfn == CFN_MASK_STORE)
> > > >      {
> > > > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > > >    if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> > > >        && rhs_code != CFN_GATHER_LOAD
> > > >        && rhs_code != CFN_MASK_GATHER_LOAD
> > > > +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> > > >        /* Not grouped loads are handled as externals for BB
> > > >  vectorization.  For loop vectorization we can handle
> > > >  splats the same we handle single element interleaving.  */
> > > > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> > > >        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> > > >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> > > >      || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > > > -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > > > +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> > > > +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> > > >        else
> > > >  {
> > > >    *max_nunits = this_max_nunits;
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index cd7c1090d88..263acf5d3cd 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
> > > >  return false;
> > > >  
> > > >        mask_index = internal_fn_mask_index (ifn);
> > > > -      if (mask_index >= 0 && slp_node)
> > > > +      if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
> > > >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > > > -      if (mask_index >= 0
> > > > +      if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> > > >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> > > >        &mask, NULL, &mask_dt, &mask_vectype))
> > > >  return false;
> > >  
> > > You are ignoring the mask argument and here only handle it when the
> > > IFN doesn't have a _LEN.  This doesn't seem to be forward looking
> > > to the point where you want to actually handle masked (aka conditional)
> > > gather.
> > >  
> > > Did you check that SLP is actually used to vectorize this?
> > >  
> > > Richard.
> > >  
> > > 
> >  
> > 
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-10-13  4:33 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-11 12:27 [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721] Juzhe-Zhong
2023-10-11 12:50 ` Richard Biener
2023-10-11 16:02   ` 钟居哲
2023-10-12  9:44     ` Richard Biener
2023-10-12  9:50       ` juzhe.zhong
2023-10-12  9:55         ` Richard Biener
2023-10-12 10:18           ` juzhe.zhong
2023-10-12 11:12             ` Richard Biener
2023-10-12 10:36           ` juzhe.zhong
2023-10-12 11:13             ` Richard Biener
2023-10-12 10:57           ` juzhe.zhong
2023-10-12 11:14             ` Richard Biener
2023-10-12 11:24               ` juzhe.zhong
2023-10-13  4:33               ` juzhe.zhong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).