public inbox for gcc-patches@gcc.gnu.org
* [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat
@ 2018-08-07 19:25 Will Schmidt
  2018-08-09 13:53 ` Bill Schmidt
  2018-08-17 14:11 ` [PATCH, RFC, rs6000] enable GIMPLE folding " Richard Biener
  0 siblings, 2 replies; 17+ messages in thread
From: Will Schmidt @ 2018-08-07 19:25 UTC (permalink / raw)
  To: Richard Biener, Segher Boessenkool, Bill Schmidt, David Edelsohn
  Cc: GCC Patches

Hi,
Enable GIMPLE folding of the vec_splat() intrinsic.
    
For review; feedback is expected. :-)
    
I came up with the following after spending some time poking around
at the tree_vec_extract() and vector_element() functions as seen
in tree-vect-generic.c looking for insights.  Some of this seems a
bit clunky to me yet, but this is functional as far as make-check
can tell, and is as far as I can get without additional input.
    
This uses the tree_vec_extract() function out of tree-vect-generic.c
to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
made non-static as part of this change.

In review of the .gimple output, this folding takes a sample testcase
of 
	vector bool int testb_0  (vector bool int x)
	{
	  return vec_splat (x, 0b00000); 
	}
from:
testb_0 (__vector __bool int x)
{
  __vector __bool intD.1486 D.2855;

  _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
  _2 = __builtin_altivec_vspltwD.1919 (_1, 0);
  D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
  return D.2855;
}
to:
testb_0 (__vector __bool int x)
{
  __vector __bool intD.1486 D.2855;

  _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
  D.2856 = BIT_FIELD_REF <_1, 32, 0>;
  _2 = {D.2856, D.2856, D.2856, D.2856};
  D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
  return D.2855;
}


Testcases are being posted as a separate patch.
    
OK for trunk?   
Thanks,
-Will
    
[gcc]
    
2018-08-07  Will Schmidt  <will_schmidt@vnet.ibm.com>
    
	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
	  early gimple folding of vec_splat().
	* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
	* gimple-fold.h: Add an extern declaration for tree_vec_extract().

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 35c32be..acc6b49 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15764,10 +15764,56 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 	 tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
 	 g = gimple_build_assign (lhs, splat_tree);
 	 gimple_set_location (g, gimple_location (stmt));
 	 gsi_replace (gsi, g, true);
 	 return true;
+	}
+
+    /* Flavors of vec_splat.  */
+    // a = vec_splat (b, 0x3) ;  // a = { b[3],b[3],b[3],...};
+    case ALTIVEC_BUILTIN_VSPLTB:
+    case ALTIVEC_BUILTIN_VSPLTH:
+    case ALTIVEC_BUILTIN_VSPLTW:
+    case VSX_BUILTIN_XXSPLTD_V2DI:
+    case VSX_BUILTIN_XXSPLTD_V2DF:
+      {
+	arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
+	arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
+	/* Only fold the vec_splat_*() if arg1 is a constant
+	   5-bit unsigned literal.  */
+	if (TREE_CODE (arg1) != INTEGER_CST || TREE_INT_CST_LOW (arg1) > 0x1f)
+	  return false;
+
+	lhs = gimple_call_lhs (stmt);
+	tree lhs_type = TREE_TYPE (lhs);
+
+	tree splat;
+	if (TREE_CODE (arg0) == VECTOR_CST)
+	  splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (arg1));
+	else
+	  {
+	    /* Determine (in bits) the length and start location of the
+	       splat value for a call to the tree_vec_extract helper.  */
+	    int tree_size_in_bits = TREE_INT_CST_LOW (size_in_bytes (lhs_type))
+				    * BITS_PER_UNIT;
+	    int splat_elem_size = tree_size_in_bits / VECTOR_CST_NELTS (arg0);
+	    int splat_start_bit = TREE_INT_CST_LOW (arg1) * splat_elem_size;
+	    /* Do not attempt to early-fold if the size + specified offset into
+	       the vector would touch outside of the source vector.  */
+	    if ((splat_start_bit + splat_elem_size) > tree_size_in_bits)
+	      return false;
+	    tree len = build_int_cst (bitsizetype, splat_elem_size);
+	    tree start = build_int_cst (bitsizetype, splat_start_bit);
+	    splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
+				      len, start);
+	}
+	/* And finally, build the new vector.  */
+	tree splat_tree = build_vector_from_val (lhs_type, splat);
+	g = gimple_build_assign (lhs, splat_tree);
+	gimple_set_location (g, gimple_location (stmt));
+	gsi_replace (gsi, g, true);
+	return true;
       }
 
     /* vec_mergel (integrals).  */
     case ALTIVEC_BUILTIN_VMRGLH:
     case ALTIVEC_BUILTIN_VMRGLW:
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 04e9bfa..e634180 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -59,10 +59,11 @@ extern tree gimple_fold_indirect_ref (tree);
 extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
 extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
 extern bool arith_code_with_undefined_signed_overflow (tree_code);
 extern gimple_seq rewrite_to_defined_overflow (gimple *);
 extern void replace_call_with_value (gimple_stmt_iterator *, tree);
+extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, tree);
 
 /* gimple_build, functionally matching fold_buildN, outputs stmts
    int the provided sequence, matching and simplifying them on-the-fly.
    Supposed to replace force_gimple_operand (fold_buildN (...), ...).  */
 extern tree gimple_build (gimple_seq *, location_t,
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index 909f790..1c9701d 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -118,11 +118,11 @@ build_word_mode_vector_type (int nunits)
 
 typedef tree (*elem_op_func) (gimple_stmt_iterator *,
 			      tree, tree, tree, tree, tree, enum tree_code,
 			      tree);
 
-static inline tree
+tree
 tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
 		  tree t, tree bitsize, tree bitpos)
 {
   if (TREE_CODE (t) == SSA_NAME)
     {



* Re: [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat
  2018-08-07 19:25 [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat Will Schmidt
@ 2018-08-09 13:53 ` Bill Schmidt
  2018-08-09 21:07   ` Will Schmidt
  2018-08-16 15:50   ` [PATCH, RFC, rs6000, v2] " Will Schmidt
  2018-08-17 14:11 ` [PATCH, RFC, rs6000] enable GIMPLE folding " Richard Biener
  1 sibling, 2 replies; 17+ messages in thread
From: Bill Schmidt @ 2018-08-09 13:53 UTC (permalink / raw)
  To: will_schmidt
  Cc: Richard Biener, Segher Boessenkool, Bill Schmidt, David Edelsohn,
	GCC Patches


> On Aug 7, 2018, at 2:25 PM, Will Schmidt <will_schmidt@vnet.ibm.com> wrote:
> 
> Hi
> Enable GIMPLE folding of the vec_splat() intrinsic.
> 
> For review.. feedback is expected. :-)
> 
> I came up with the following after spending some time poking around
> at the tree_vec_extract() and vector_element() functions as seen
> in tree-vect-generic.c looking for insights.  Some of this seems a
> bit clunky to me yet, but this is functional as far as make-check
> can tell, and is as far as I can get without additional input.
> 
> This uses the tree_vec_extract() function out of tree-vect-generic.c
> to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
> made non-static as part of this change.
> 
> In review of the .gimple output, this folding takes a sample testcase
> of 
> 	vector bool int testb_0  (vector bool int x)
> 	{
> 	  return vec_splat (x, 0b00000); 
> 	}
> from:
> testb_0 (__vector __bool int x)
> {
>  __vector __bool intD.1486 D.2855;
> 
>  _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
>  _2 = __builtin_altivec_vspltwD.1919 (_1, 0);
>  D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
>  return D.2855;
> }
> to:
> testb_0 (__vector __bool int x)
> {
>  __vector __bool intD.1486 D.2855;
> 
>  _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
>  D.2856 = BIT_FIELD_REF <_1, 32, 0>;
>  _2 = {D.2856, D.2856, D.2856, D.2856};
>  D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
>  return D.2855;
> }
> 
This looks okay.
> 
> Testcases are being posted as a separate patch.
> 
> OK for trunk?   
> Thanks,
> -Will
> 
> [gcc]
> 
> 2018-08-07  Will Schmidt  <will_schmidt@vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
> 	  early gimple folding of vec_splat().
> 	* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
> 	* gimple-fold.h:  Add an extern define for tree_vec_extract().
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 35c32be..acc6b49 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15764,10 +15764,56 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> 	 tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
> 	 g = gimple_build_assign (lhs, splat_tree);
> 	 gimple_set_location (g, gimple_location (stmt));
> 	 gsi_replace (gsi, g, true);
> 	 return true;
> +	}
> +
> +    /* Flavors of vec_splat.  */
> +    // a = vec_splat (b, 0x3) ;  // a = { b[3],b[3],b[3],...};
> +    case ALTIVEC_BUILTIN_VSPLTB:
> +    case ALTIVEC_BUILTIN_VSPLTH:
> +    case ALTIVEC_BUILTIN_VSPLTW:
> +    case VSX_BUILTIN_XXSPLTD_V2DI:
> +    case VSX_BUILTIN_XXSPLTD_V2DF:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
> +	arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
> +	/* Only fold the vec_splat_*() if arg1 is a constant
> +	   5-bit unsigned literal.  */
> +	if (TREE_CODE (arg1) != INTEGER_CST || TREE_INT_CST_LOW (arg1) > 0x1f)
> +	  return false;
> +
> +	lhs = gimple_call_lhs (stmt);
> +	tree lhs_type = TREE_TYPE (lhs);
> +
> +	tree splat;
> +	if (TREE_CODE (arg0) == VECTOR_CST)
> +	  splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (arg1));

You should force arg1 into range before this.  It should be adjusted modulo
VECTOR_CST_NELTS (arg0).  Yes, the underlying vsplt* instructions will
handle the modulo itself, but when expanding early we should make the
correction early, else something is likely to explode.  Do you have a test
where arg1 is less than 32 but greater than the number of elements?

> +	else
> +	  {
> +	    /* Determine (in bits) the length and start location of the
> +	       splat value for a call to the tree_vec_extract helper.  */
> +	    int tree_size_in_bits = TREE_INT_CST_LOW (size_in_bytes (lhs_type))
> +				    * BITS_PER_UNIT;

I expect you should use arg0's type rather than lhs_type here.  They
should be the same, of course; it's just clearer.

> +	    int splat_elem_size = tree_size_in_bits / VECTOR_CST_NELTS (arg0);
> +	    int splat_start_bit = TREE_INT_CST_LOW (arg1) * splat_elem_size;
> +	    /* Do not attempt to early-fold if the size + specified offset into
> +	       the vector would touch outside of the source vector.  */
> +	    if ((splat_start_bit + splat_elem_size) > tree_size_in_bits)
> +	      return false;

This won't be necessary once you fix the modulo issue.

The rest LGTM.

Thanks,
Bill

> +	    tree len = build_int_cst (bitsizetype, splat_elem_size);
> +	    tree start = build_int_cst (bitsizetype, splat_start_bit);
> +	    splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
> +				      len, start);
> +	}
> +	/* And finally, build the new vector.  */
> +	tree splat_tree = build_vector_from_val (lhs_type, splat);
> +	g = gimple_build_assign (lhs, splat_tree);
> +	gimple_set_location (g, gimple_location (stmt));
> +	gsi_replace (gsi, g, true);
> +	return true;
>       }
> 
>     /* vec_mergel (integrals).  */
>     case ALTIVEC_BUILTIN_VMRGLH:
>     case ALTIVEC_BUILTIN_VMRGLW:
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index 04e9bfa..e634180 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -59,10 +59,11 @@ extern tree gimple_fold_indirect_ref (tree);
> extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
> extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
> extern bool arith_code_with_undefined_signed_overflow (tree_code);
> extern gimple_seq rewrite_to_defined_overflow (gimple *);
> extern void replace_call_with_value (gimple_stmt_iterator *, tree);
> +extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, tree);
> 
> /* gimple_build, functionally matching fold_buildN, outputs stmts
>    int the provided sequence, matching and simplifying them on-the-fly.
>    Supposed to replace force_gimple_operand (fold_buildN (...), ...).  */
> extern tree gimple_build (gimple_seq *, location_t,
> diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
> index 909f790..1c9701d 100644
> --- a/gcc/tree-vect-generic.c
> +++ b/gcc/tree-vect-generic.c
> @@ -118,11 +118,11 @@ build_word_mode_vector_type (int nunits)
> 
> typedef tree (*elem_op_func) (gimple_stmt_iterator *,
> 			      tree, tree, tree, tree, tree, enum tree_code,
> 			      tree);
> 
> -static inline tree
> +tree
> tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
> 		  tree t, tree bitsize, tree bitpos)
> {
>   if (TREE_CODE (t) == SSA_NAME)
>     {
> 
> 


* Re: [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat
  2018-08-09 13:53 ` Bill Schmidt
@ 2018-08-09 21:07   ` Will Schmidt
  2018-08-16 15:50   ` [PATCH, RFC, rs6000, v2] " Will Schmidt
  1 sibling, 0 replies; 17+ messages in thread
From: Will Schmidt @ 2018-08-09 21:07 UTC (permalink / raw)
  To: Bill Schmidt
  Cc: Richard Biener, Segher Boessenkool, Bill Schmidt, David Edelsohn,
	GCC Patches

On Thu, 2018-08-09 at 08:53 -0500, Bill Schmidt wrote:
> > On Aug 7, 2018, at 2:25 PM, Will Schmidt <will_schmidt@vnet.ibm.com> wrote:
> > 
> > Hi
> > Enable GIMPLE folding of the vec_splat() intrinsic.
> > 
> > For review.. feedback is expected. :-)
> > 
> > I came up with the following after spending some time poking around
> > at the tree_vec_extract() and vector_element() functions as seen
> > in tree-vect-generic.c looking for insights.  Some of this seems a
> > bit clunky to me yet, but this is functional as far as make-check
> > can tell, and is as far as I can get without additional input.
> > 
> > This uses the tree_vec_extract() function out of tree-vect-generic.c
> > to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
> > made non-static as part of this change.
> > 
> > In review of the .gimple output, this folding takes a sample testcase
> > of 
> > 	vector bool int testb_0  (vector bool int x)
> > 	{
> > 	  return vec_splat (x, 0b00000); 
> > 	}
> > from:
> > testb_0 (__vector __bool int x)
> > {
> >  __vector __bool intD.1486 D.2855;
> > 
> >  _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
> >  _2 = __builtin_altivec_vspltwD.1919 (_1, 0);
> >  D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
> >  return D.2855;
> > }
> > to:
> > testb_0 (__vector __bool int x)
> > {
> >  __vector __bool intD.1486 D.2855;
> > 
> >  _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
> >  D.2856 = BIT_FIELD_REF <_1, 32, 0>;
> >  _2 = {D.2856, D.2856, D.2856, D.2856};
> >  D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
> >  return D.2855;
> > }
> > 
> This looks okay.
> > 
> > Testcases are being posted as a separate patch.
> > 
> > OK for trunk?   
> > Thanks,
> > -Will
> > 
> > [gcc]
> > 
> > 2018-08-07  Will Schmidt  <will_schmidt@vnet.ibm.com>
> > 
> > 	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
> > 	  early gimple folding of vec_splat().
> > 	* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
> > 	* gimple-fold.h:  Add an extern define for tree_vec_extract().
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 35c32be..acc6b49 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -15764,10 +15764,56 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> > 	 tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
> > 	 g = gimple_build_assign (lhs, splat_tree);
> > 	 gimple_set_location (g, gimple_location (stmt));
> > 	 gsi_replace (gsi, g, true);
> > 	 return true;
> > +	}
> > +
> > +    /* Flavors of vec_splat.  */
> > +    // a = vec_splat (b, 0x3) ;  // a = { b[3],b[3],b[3],...};
> > +    case ALTIVEC_BUILTIN_VSPLTB:
> > +    case ALTIVEC_BUILTIN_VSPLTH:
> > +    case ALTIVEC_BUILTIN_VSPLTW:
> > +    case VSX_BUILTIN_XXSPLTD_V2DI:
> > +    case VSX_BUILTIN_XXSPLTD_V2DF:
> > +      {
> > +	arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
> > +	arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
> > +	/* Only fold the vec_splat_*() if arg1 is a constant
> > +	   5-bit unsigned literal.  */
> > +	if (TREE_CODE (arg1) != INTEGER_CST || TREE_INT_CST_LOW (arg1) > 0x1f)
> > +	  return false;
> > +
> > +	lhs = gimple_call_lhs (stmt);
> > +	tree lhs_type = TREE_TYPE (lhs);
> > +
> > +	tree splat;
> > +	if (TREE_CODE (arg0) == VECTOR_CST)
> > +	  splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (arg1));
> 
> You should force arg1 into range before this.  It should be adjusted modulo
> VECTOR_CST_NELTS (arg0).  Yes, the underlying vsplt* instructions will

OK.

> handle the modulo itself, but when expanding early we should make the
> correction early, else something is likely to explode.  Do you have a test
> where arg1 is less than 32 but greater than the number of elements?

Almost. :-)   I test for arg1 < 32 and > #elms, but not with the
arg0-is-constant scenario.  I'll craft up a couple more testcases for
that.

> 
> > +	else
> > +	  {
> > +	    /* Determine (in bits) the length and start location of the
> > +	       splat value for a call to the tree_vec_extract helper.  */
> > +	    int tree_size_in_bits = TREE_INT_CST_LOW (size_in_bytes (lhs_type))
> > +				    * BITS_PER_UNIT;
> 
> I expect you should use arg0's type rather than lhs_type here.  They
> should be the same, of course; it's just clearer.

ok.

> 
> > +	    int splat_elem_size = tree_size_in_bits / VECTOR_CST_NELTS (arg0);
> > +	    int splat_start_bit = TREE_INT_CST_LOW (arg1) * splat_elem_size;
> > +	    /* Do not attempt to early-fold if the size + specified offset into
> > +	       the vector would touch outside of the source vector.  */
> > +	    if ((splat_start_bit + splat_elem_size) > tree_size_in_bits)
> > +	      return false;
> 
> This won't be necessary once you fix the modulo issue.

ok.

> The rest LGTM.

thanks for the review and feedback. :-)

-Will



> 
> Thanks,
> Bill
> 
> > +	    tree len = build_int_cst (bitsizetype, splat_elem_size);
> > +	    tree start = build_int_cst (bitsizetype, splat_start_bit);
> > +	    splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
> > +				      len, start);
> > +	}
> > +	/* And finally, build the new vector.  */
> > +	tree splat_tree = build_vector_from_val (lhs_type, splat);
> > +	g = gimple_build_assign (lhs, splat_tree);
> > +	gimple_set_location (g, gimple_location (stmt));
> > +	gsi_replace (gsi, g, true);
> > +	return true;
> >       }
> > 
> >     /* vec_mergel (integrals).  */
> >     case ALTIVEC_BUILTIN_VMRGLH:
> >     case ALTIVEC_BUILTIN_VMRGLW:
> > diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> > index 04e9bfa..e634180 100644
> > --- a/gcc/gimple-fold.h
> > +++ b/gcc/gimple-fold.h
> > @@ -59,10 +59,11 @@ extern tree gimple_fold_indirect_ref (tree);
> > extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
> > extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
> > extern bool arith_code_with_undefined_signed_overflow (tree_code);
> > extern gimple_seq rewrite_to_defined_overflow (gimple *);
> > extern void replace_call_with_value (gimple_stmt_iterator *, tree);
> > +extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, tree);
> > 
> > /* gimple_build, functionally matching fold_buildN, outputs stmts
> >    int the provided sequence, matching and simplifying them on-the-fly.
> >    Supposed to replace force_gimple_operand (fold_buildN (...), ...).  */
> > extern tree gimple_build (gimple_seq *, location_t,
> > diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
> > index 909f790..1c9701d 100644
> > --- a/gcc/tree-vect-generic.c
> > +++ b/gcc/tree-vect-generic.c
> > @@ -118,11 +118,11 @@ build_word_mode_vector_type (int nunits)
> > 
> > typedef tree (*elem_op_func) (gimple_stmt_iterator *,
> > 			      tree, tree, tree, tree, tree, enum tree_code,
> > 			      tree);
> > 
> > -static inline tree
> > +tree
> > tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
> > 		  tree t, tree bitsize, tree bitpos)
> > {
> >   if (TREE_CODE (t) == SSA_NAME)
> >     {
> > 
> > 
> 



* [PATCH, RFC, rs6000, v2] folding of vec_splat
  2018-08-09 13:53 ` Bill Schmidt
  2018-08-09 21:07   ` Will Schmidt
@ 2018-08-16 15:50   ` Will Schmidt
  2018-08-16 20:51     ` Segher Boessenkool
  1 sibling, 1 reply; 17+ messages in thread
From: Will Schmidt @ 2018-08-16 15:50 UTC (permalink / raw)
  To: Bill Schmidt
  Cc: Richard Biener, Segher Boessenkool, Bill Schmidt, David Edelsohn,
	GCC Patches

Hi,
  Enable GIMPLE folding of the vec_splat() intrinsic (v2).
    
This uses the tree_vec_extract() function out of tree-vect-generic.c
to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
made non-static as part of this change.
    
Testcases are already in-tree.
    
V2 updates, per feedback previously received.
Forced arg1 into range (modulo #elements) before attempting to extract
the splat value.
Removed the (now unnecessary) code that did bounds-checking before calling
the tree_vec_extract helper.
Used arg0_type rather than lhs_type for calculating the tree size.

OK for trunk?
    
Thanks,
-Will
    
[gcc]

2018-08-16  Will Schmidt  <will_schmidt@vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
	  early gimple folding of vec_splat().
	* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
	* gimple-fold.h: Add an extern declaration for tree_vec_extract().

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 97b922f..ec92e6a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15766,10 +15766,58 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 	 tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
 	 g = gimple_build_assign (lhs, splat_tree);
 	 gimple_set_location (g, gimple_location (stmt));
 	 gsi_replace (gsi, g, true);
 	 return true;
+	}
+
+    /* Flavors of vec_splat.  */
+    /* a = vec_splat (b, 0x3) becomes a = { b[3],b[3],b[3],...};  */
+    case ALTIVEC_BUILTIN_VSPLTB:
+    case ALTIVEC_BUILTIN_VSPLTH:
+    case ALTIVEC_BUILTIN_VSPLTW:
+    case VSX_BUILTIN_XXSPLTD_V2DI:
+    case VSX_BUILTIN_XXSPLTD_V2DF:
+      {
+	arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
+	arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
+	/* Only fold the vec_splat_*() if arg1 is a constant
+	   5-bit unsigned literal.  */
+	if (TREE_CODE (arg1) != INTEGER_CST || TREE_INT_CST_LOW (arg1) > 0x1f)
+	  return false;
+	lhs = gimple_call_lhs (stmt);
+	tree lhs_type = TREE_TYPE (lhs);
+	tree arg0_type = TREE_TYPE (arg0);
+	tree arg1_type = TREE_TYPE (arg1);
+	int n_elts = VECTOR_CST_NELTS (arg0);
+	/* force arg1 into range.  */
+	tree new_arg1 = build_int_cst (arg1_type,
+				       TREE_INT_CST_LOW (arg1) % n_elts);
+	tree splat;
+	if (TREE_CODE (arg0) == VECTOR_CST)
+	  splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (new_arg1));
+	else
+	  {
+	    /* Determine (in bits) the length and start location of the
+	       splat value for a call to the tree_vec_extract helper.  */
+	    int tree_size_in_bits = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
+				    * BITS_PER_UNIT;
+	    int splat_elem_size = tree_size_in_bits / n_elts;
+	    int splat_start_bit = TREE_INT_CST_LOW (new_arg1) * splat_elem_size;
+	    /* Compute the width and starting offset arguments for the
+	       call to the tree_vec_extract helper.  */
+	    tree len = build_int_cst (bitsizetype, splat_elem_size);
+	    tree start = build_int_cst (bitsizetype, splat_start_bit);
+	    splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
+				      len, start);
+	}
+	/* And finally, build the new vector.  */
+	tree splat_tree = build_vector_from_val (lhs_type, splat);
+	g = gimple_build_assign (lhs, splat_tree);
+	gimple_set_location (g, gimple_location (stmt));
+	gsi_replace (gsi, g, true);
+	return true;
       }
 
     /* vec_mergel (integrals).  */
     case ALTIVEC_BUILTIN_VMRGLH:
     case ALTIVEC_BUILTIN_VMRGLW:
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 04e9bfa..e634180 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -59,10 +59,11 @@ extern tree gimple_fold_indirect_ref (tree);
 extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
 extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
 extern bool arith_code_with_undefined_signed_overflow (tree_code);
 extern gimple_seq rewrite_to_defined_overflow (gimple *);
 extern void replace_call_with_value (gimple_stmt_iterator *, tree);
+extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, tree);
 
 /* gimple_build, functionally matching fold_buildN, outputs stmts
    int the provided sequence, matching and simplifying them on-the-fly.
    Supposed to replace force_gimple_operand (fold_buildN (...), ...).  */
 extern tree gimple_build (gimple_seq *, location_t,
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index 909f790..1c9701d 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -118,11 +118,11 @@ build_word_mode_vector_type (int nunits)
 
 typedef tree (*elem_op_func) (gimple_stmt_iterator *,
 			      tree, tree, tree, tree, tree, enum tree_code,
 			      tree);
 
-static inline tree
+tree
 tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
 		  tree t, tree bitsize, tree bitpos)
 {
   if (TREE_CODE (t) == SSA_NAME)
     {



* Re: [PATCH, RFC, rs6000, v2] folding of vec_splat
  2018-08-16 15:50   ` [PATCH, RFC, rs6000, v2] " Will Schmidt
@ 2018-08-16 20:51     ` Segher Boessenkool
  2018-08-16 23:21       ` Will Schmidt
  2018-08-20 21:44       ` [PATCH, RFC, rs6000, v3] enable early gimple-folding " Will Schmidt
  0 siblings, 2 replies; 17+ messages in thread
From: Segher Boessenkool @ 2018-08-16 20:51 UTC (permalink / raw)
  To: Will Schmidt
  Cc: Bill Schmidt, Richard Biener, Bill Schmidt, David Edelsohn, GCC Patches

Hi Will,

On Thu, Aug 16, 2018 at 10:50:45AM -0500, Will Schmidt wrote:
> 2018-08-16  Will Schmidt  <will_schmidt@vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
> 	  early gimple folding of vec_splat().

Continuation lines should be indented to the *, not to the text after it.

> +	/* Only fold the vec_splat_*() if arg1 is a constant
> +	   5-bit unsigned literal.  */
> +	if (TREE_CODE (arg1) != INTEGER_CST || TREE_INT_CST_LOW (arg1) > 0x1f)
> +	  return false;

Does this need to check for negative as well?  So something with IN_RANGE
then.

> +	/* force arg1 into range.  */
> +	tree new_arg1 = build_int_cst (arg1_type,
> +				       TREE_INT_CST_LOW (arg1) % n_elts);

Or is the range check unnecessary completely, since you have this?

> +	tree splat;
> +	if (TREE_CODE (arg0) == VECTOR_CST)
> +	  splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (new_arg1));
> +	else
> +	  {
> +	    /* Determine (in bits) the length and start location of the
> +	       splat value for a call to the tree_vec_extract helper.  */
> +	    int tree_size_in_bits = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
> +				    * BITS_PER_UNIT;
> +	    int splat_elem_size = tree_size_in_bits / n_elts;
> +	    int splat_start_bit = TREE_INT_CST_LOW (new_arg1) * splat_elem_size;
> +	    /* Do not attempt to early-fold if the size + specified offset into
> +	       the vector would touch outside of the source vector.  */
> +	    tree len = build_int_cst (bitsizetype, splat_elem_size);
> +	    tree start = build_int_cst (bitsizetype, splat_start_bit);
> +	    splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
> +				      len, start);
> +	}

This closing brace should be indented two spaces more.

> -static inline tree
> +tree
>  tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
>  		  tree t, tree bitsize, tree bitpos)

It could use a comment, too (what the args are, etc.)

Other than those nits, looks fine to me.  Maybe Richard or Bill have
more comments?


Segher


* Re: [PATCH, RFC, rs6000, v2] folding of vec_splat
  2018-08-16 20:51     ` Segher Boessenkool
@ 2018-08-16 23:21       ` Will Schmidt
  2018-08-20 21:44       ` [PATCH, RFC, rs6000, v3] enable early gimple-folding " Will Schmidt
  1 sibling, 0 replies; 17+ messages in thread
From: Will Schmidt @ 2018-08-16 23:21 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Bill Schmidt, Richard Biener, Bill Schmidt, David Edelsohn, GCC Patches

On Thu, 2018-08-16 at 15:51 -0500, Segher Boessenkool wrote:
> Hi Will,
> 
> On Thu, Aug 16, 2018 at 10:50:45AM -0500, Will Schmidt wrote:
> > 2018-08-16  Will Schmidt  <will_schmidt@vnet.ibm.com>
> > 
> > 	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
> > 	  early gimple folding of vec_splat().
> 
> Continuation lines should be indented to the *, not to the text after it.
OK

> 
> > +	/* Only fold the vec_splat_*() if arg1 is a constant
> > +	   5-bit unsigned literal.  */
> > +	if (TREE_CODE (arg1) != INTEGER_CST || TREE_INT_CST_LOW (arg1) > 0x1f)
> > +	  return false;
> 
> Does this need to check for negative as well?  So something with IN_RANGE
> then.

> 
> > +	/* force arg1 into range.  */
> > +	tree new_arg1 = build_int_cst (arg1_type,
> > +				       TREE_INT_CST_LOW (arg1) % n_elts);
> 
> Or is the range check unnecessary completely, since you have this?

I can be convinced either way. :-)
I think i still want both.  The first rejects (from gimple-folding) any
values that are outside of the 5-bit range as specified by the function
definition.
The second (modulo) then maps any 'valid' values into what is reasonable
for the data type being splatted.

While trying to convince myself one way or another, I do notice that
today (pre-folding), I can get out-of-range errors such as 
  /tmp/ccP0s6iJ.s:781: Error: operand out of range (-1 is not between
  0 and 3)
while with folding enabled, and this modulo in place, we compile without
warning.  So there is a behavior change, for which I have mixed
feelings.  :-)


> > +	tree splat;
> > +	if (TREE_CODE (arg0) == VECTOR_CST)
> > +	  splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (new_arg1));
> > +	else
> > +	  {
> > +	    /* Determine (in bits) the length and start location of the
> > +	       splat value for a call to the tree_vec_extract helper.  */
> > +	    int tree_size_in_bits = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
> > +				    * BITS_PER_UNIT;
> > +	    int splat_elem_size = tree_size_in_bits / n_elts;
> > +	    int splat_start_bit = TREE_INT_CST_LOW (new_arg1) * splat_elem_size;
> > +	    /* Do not attempt to early-fold if the size + specified offset into
> > +	       the vector would touch outside of the source vector.  */
> > +	    tree len = build_int_cst (bitsizetype, splat_elem_size);
> > +	    tree start = build_int_cst (bitsizetype, splat_start_bit);
> > +	    splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
> > +				      len, start);
> > +	}
> 
> This closing brace should be indented two spaces more.
Ok

> > -static inline tree
> > +tree
> >  tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
> >  		  tree t, tree bitsize, tree bitpos)
> 
> It could use a comment, too (what the args are, etc.)

I can definitely add some commentary to my call into this function.
At a glance, the functions nearby tree_vec_extract do have a couple
lines of description each, so I'll look and see if I can come up with
anything reasonable here.

> Other than those nits, looks fine to me.  Maybe Richard or Bill have
> more comments?

Thanks for the review. 
-Will

> 
> 
> Segher
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat
  2018-08-07 19:25 [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat Will Schmidt
  2018-08-09 13:53 ` Bill Schmidt
@ 2018-08-17 14:11 ` Richard Biener
  2018-08-18 17:01   ` VEC_DUPLICATE_EXPR options (was Re: [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat) Richard Sandiford
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2018-08-17 14:11 UTC (permalink / raw)
  To: will_schmidt, Richard Sandiford
  Cc: Segher Boessenkool, William J. Schmidt, David Edelsohn, GCC Patches

On Tue, Aug 7, 2018 at 9:25 PM Will Schmidt <will_schmidt@vnet.ibm.com> wrote:
>
> Hi
> Enable GIMPLE folding of the vec_splat() intrinsic.
>
> For review.. feedback is expected. :-)
>
> I came up with the following after spending some time poking around
> at the tree_vec_extract() and vector_element() functions as seen
> in tree-vect-generic.c looking for insights.  Some of this seems a
> bit clunky to me yet, but this is functional as far as make-check
> can tell, and is as far as I can get without additional input.
>
> This uses the tree_vec_extract() function out of tree-vect-generic.c
> to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
> made non-static as part of this change.
>
> In review of the .gimple output, this folding takes a sample testcase
> of
>         vector bool int testb_0  (vector bool int x)
>         {
>           return vec_splat (x, 0b00000);
>         }
> from:
> testb_0 (__vector __bool int x)
> {
>   __vector __bool intD.1486 D.2855;
>
>   _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
>   _2 = __builtin_altivec_vspltwD.1919 (_1, 0);
>   D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
>   return D.2855;
> }
> to:
>  testb_0 (__vector __bool int x)
> {
>   __vector __bool intD.1486 D.2855;
>
>   _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
>   D.2856 = BIT_FIELD_REF <_1, 32, 0>;
>   _2 = {D.2856, D.2856, D.2856, D.2856};
>   D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
>   return D.2855;
> }
>
>
> Testcases are being posted as a separate patch.
>
> OK for trunk?

It certainly works, nowadays we have VEC_DUPLICATE_EXPR though
so you could simply do

 _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
 D.2856 = BIT_FIELD_REF <_1, 32, 0>;
 D.2855 = VEC_DUPLICATE_EXPR <__vector __bool intD.1486> (D.2856);

not sure what variant is better for followup optimizations and whether
we already fold that { D.2856, D.2856, D.2856, D.2856 }; to VEC_DUPLICATE_EXPR.

Richard may know more.

Richard.

> Thanks,
> -Will
>
> [gcc]
>
> 2018-08-07  Will Schmidt  <will_schmidt@vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
>           early gimple folding of vec_splat().
>         * tree-vect-generic.c: Remove static from tree_vec_extract() definition.
>         * gimple-fold.h:  Add an extern define for tree_vec_extract().
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 35c32be..acc6b49 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15764,10 +15764,56 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>          tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
>          g = gimple_build_assign (lhs, splat_tree);
>          gimple_set_location (g, gimple_location (stmt));
>          gsi_replace (gsi, g, true);
>          return true;
> +       }
> +
> +    /* Flavors of vec_splat.  */
> +    // a = vec_splat (b, 0x3) ;  // a = { b[3],b[3],b[3],...};
> +    case ALTIVEC_BUILTIN_VSPLTB:
> +    case ALTIVEC_BUILTIN_VSPLTH:
> +    case ALTIVEC_BUILTIN_VSPLTW:
> +    case VSX_BUILTIN_XXSPLTD_V2DI:
> +    case VSX_BUILTIN_XXSPLTD_V2DF:
> +      {
> +       arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
> +       arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
> +       /* Only fold the vec_splat_*() if arg1 is a constant
> +          5-bit unsigned literal.  */
> +       if (TREE_CODE (arg1) != INTEGER_CST || TREE_INT_CST_LOW (arg1) > 0x1f)
> +         return false;
> +
> +       lhs = gimple_call_lhs (stmt);
> +       tree lhs_type = TREE_TYPE (lhs);
> +
> +       tree splat;
> +       if (TREE_CODE (arg0) == VECTOR_CST)
> +         splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (arg1));
> +       else
> +         {
> +           /* Determine (in bits) the length and start location of the
> +              splat value for a call to the tree_vec_extract helper.  */
> +           int tree_size_in_bits = TREE_INT_CST_LOW (size_in_bytes (lhs_type))
> +                                   * BITS_PER_UNIT;
> +           int splat_elem_size = tree_size_in_bits / VECTOR_CST_NELTS (arg0);
> +           int splat_start_bit = TREE_INT_CST_LOW (arg1) * splat_elem_size;
> +           /* Do not attempt to early-fold if the size + specified offset into
> +              the vector would touch outside of the source vector.  */
> +           if ((splat_start_bit + splat_elem_size) > tree_size_in_bits)
> +             return false;
> +           tree len = build_int_cst (bitsizetype, splat_elem_size);
> +           tree start = build_int_cst (bitsizetype, splat_start_bit);
> +           splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
> +                                     len, start);
> +       }
> +       /* And finally, build the new vector.  */
> +       tree splat_tree = build_vector_from_val (lhs_type, splat);
> +       g = gimple_build_assign (lhs, splat_tree);
> +       gimple_set_location (g, gimple_location (stmt));
> +       gsi_replace (gsi, g, true);
> +       return true;
>        }
>
>      /* vec_mergel (integrals).  */
>      case ALTIVEC_BUILTIN_VMRGLH:
>      case ALTIVEC_BUILTIN_VMRGLW:
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index 04e9bfa..e634180 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -59,10 +59,11 @@ extern tree gimple_fold_indirect_ref (tree);
>  extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
>  extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
>  extern bool arith_code_with_undefined_signed_overflow (tree_code);
>  extern gimple_seq rewrite_to_defined_overflow (gimple *);
>  extern void replace_call_with_value (gimple_stmt_iterator *, tree);
> +extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, tree);
>
>  /* gimple_build, functionally matching fold_buildN, outputs stmts
>     int the provided sequence, matching and simplifying them on-the-fly.
>     Supposed to replace force_gimple_operand (fold_buildN (...), ...).  */
>  extern tree gimple_build (gimple_seq *, location_t,
> diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
> index 909f790..1c9701d 100644
> --- a/gcc/tree-vect-generic.c
> +++ b/gcc/tree-vect-generic.c
> @@ -118,11 +118,11 @@ build_word_mode_vector_type (int nunits)
>
>  typedef tree (*elem_op_func) (gimple_stmt_iterator *,
>                               tree, tree, tree, tree, tree, enum tree_code,
>                               tree);
>
> -static inline tree
> +tree
>  tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
>                   tree t, tree bitsize, tree bitpos)
>  {
>    if (TREE_CODE (t) == SSA_NAME)
>      {
>
>


* VEC_DUPLICATE_EXPR options (was Re: [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat)
  2018-08-17 14:11 ` [PATCH, RFC, rs6000] enable GIMPLE folding " Richard Biener
@ 2018-08-18 17:01   ` Richard Sandiford
  2018-08-23 13:50     ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Sandiford @ 2018-08-18 17:01 UTC (permalink / raw)
  To: Richard Biener
  Cc: will_schmidt, Segher Boessenkool, William J. Schmidt,
	David Edelsohn, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Aug 7, 2018 at 9:25 PM Will Schmidt <will_schmidt@vnet.ibm.com> wrote:
>>
>> Hi
>> Enable GIMPLE folding of the vec_splat() intrinsic.
>>
>> For review.. feedback is expected. :-)
>>
>> I came up with the following after spending some time poking around
>> at the tree_vec_extract() and vector_element() functions as seen
>> in tree-vect-generic.c looking for insights.  Some of this seems a
>> bit clunky to me yet, but this is functional as far as make-check
>> can tell, and is as far as I can get without additional input.
>>
>> This uses the tree_vec_extract() function out of tree-vect-generic.c
>> to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
>> made non-static as part of this change.
>>
>> In review of the .gimple output, this folding takes a sample testcase
>> of
>>         vector bool int testb_0  (vector bool int x)
>>         {
>>           return vec_splat (x, 0b00000);
>>         }
>> from:
>> testb_0 (__vector __bool int x)
>> {
>>   __vector __bool intD.1486 D.2855;
>>
>>   _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
>>   _2 = __builtin_altivec_vspltwD.1919 (_1, 0);
>>   D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
>>   return D.2855;
>> }
>> to:
>>  testb_0 (__vector __bool int x)
>> {
>>   __vector __bool intD.1486 D.2855;
>>
>>   _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
>>   D.2856 = BIT_FIELD_REF <_1, 32, 0>;
>>   _2 = {D.2856, D.2856, D.2856, D.2856};
>>   D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
>>   return D.2855;
>> }
>>
>>
>> Testcases are being posted as a separate patch.
>>
>> OK for trunk?
>
> It certainly works, nowadays we have VEC_DUPLICATE_EXPR though
> so you could simply do
>
>  _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
>  D.2856 = BIT_FIELD_REF <_1, 32, 0>;
>  D.2855 = VEC_DUPLICATE_EXPR <__vector __bool intD.1486> (D.2856);
>
> not sure what variant is better for followup optimizations and whether
> we already fold that { D.2856, D.2856, D.2856, D.2856 }; to VEC_DUPLICATE_EXPR.
>
> Richard may know more.

We only use VEC_DUPLICATE_EXPR for variable-length vectors.  I still owe
you a GCC 9 update for:

  https://gcc.gnu.org/ml/gcc-patches/2017-12/msg01039.html

(hadn't forgotten!)

It's a diversion from the patch, sorry, but I think the main alternatives are:

(1) Allow either operand of a VEC_PERM_EXPR to be a scalar
    ------------------------------------------------------

    Handles VEC_DUPLICATE_EXPR <a> as:
        VEC_PERM_EXPR <a, a, { 0, ... }>

    Handles IFN_VEC_SHL_INSERT <avec, b> as:
        VEC_PERM_EXPR <b, avec, { 0, 1, 2, 3, ... }>
    or: VEC_PERM_EXPR <avec, b, { n, 0, 1, 2, ... }>

    Can also insert a single "a" at the beginning of a vector of "b":
        VEC_PERM_EXPR <a, b, { 0, 1, 1, 1, ... }>

    Disadvantages:

    (a) It's still a bit limited.  E.g. it can't insert two scalars at
        the beginning of a vector of "b", which might be useful for SLP.

    (b) Either all targets would need to handle this case explicitly
        or something would need to massage the tree permute mask before
        the targets see it.  E.g.:

           VEC_PERM_EXPR <a, b, { 0, 1, 1, ... }>
        -> permute of { 0, n, n, ... } on two vector broadcast rtxes

    (c) There'd still be an inconsistency.  Either:
        (i) we'd only use VEC_PERM_EXPRs for variable-length vectors or
        (ii) we'd have to check before creating a CONSTRUCTOR whether
             it should be represented as a VEC_PERM_EXPR instead.

(2) Allow only the second operand of a VEC_PERM_EXPR to be a scalar
    ---------------------------------------------------------------

    Handles VEC_DUPLICATE_EXPR <a> as:
        VEC_PERM_EXPR <{ 0, ... }, a, { n, ... }>

    Handles IFN_VEC_SHL_INSERT <avec, b> as:
        VEC_PERM_EXPR <avec, b, { n, 0, 1, 2, ... }>;

    Advantages:

    - Avoids the need for target changes: the mask is the same if the scalar
      is replaced by a vector.

    Disadvantages:

    - (a) still holds (and now we can't do a two-scalar permute).

    - (c) still holds.

(3) Keep VEC_DUPLICATE_EXPR but use it for everything
    -------------------------------------------------

    Advantages:

    - The handling of duplicates becomes consistent across vector types.

    - There are no changes to VEC_PERM_EXPR or targets.

    - We can still replace IFN_VEC_SHL_INSERT with a VEC_PERM_EXPR,
      as a separate change.

    Disadvantages:

    - (c) still holds.

    (d) Vector duplication stays a bit of a special case (which was the
        original problem).

(4) Use the VECTOR_CST encoding for CONSTRUCTORs too
    ------------------------------------------------

    Advantages:

    - Reduces memory overhead of CONSTRUCTORs for vector broadcasts
      in constant-length vectors.

    - There's only one tree code to check.

    - Many more vectors have the same representation for constant- and
      variable-length vectors.

    - There's more consistency between constant element values and
      variable element values.

    - We can still replace IFN_VEC_SHL_INSERT with a VEC_PERM_EXPR,
      as a separate change.

    Disadvantages:

    (e) Completely following the VECTOR_CST encoding would mean having
        to encode the stepped forms based on two values in the series.
        E.g. a 3-pattern:

          { a : b : c : ... }

        (ad-hoc syntax) would be: 

          { a, b, c, c * 2 - b, c * 3 - b, ... }

        so the CONSTRUCTOR would include hidden arithmetic.  This means
        that the value of any given element (rather than just a directly-
        encoded one) would in general not be a gimple value.  E.g. in
        the above case, asking for element 3 would give a GENERIC tree.

    (f) If vector CONSTRUCTORs seem like a mistake then this will too.

(5) Like (4), but don't allow the stepped case
    ------------------------------------------

    Advantages:

    - As for (4), and avoids the hidden arithmetic in (e).

    Disadvantages:

    - (f) still holds.

    (g) Stepped vectors need to be handled separately.

For (f), we could (and might need to) replace CONSTRUCTOR with a new
tree code.  It would still have to be GIMPLE_SINGLE_RHS though, if we
keep the current "gassigns have 4 operands" assumption.

I'm not sure (g) is really a disadvantage.  We could replace
VEC_SERIES_EXPR <a, b> with:

    basev = { a, ... }
    stepv = { b, ... }
    tmp = stepv * { 0, 1, 2, ... };
    series = basev + tmp

and match it as an IFN_VEC_SERIES for targets that have it.
So code constructing vectors would never have to know about
VEC_SERIES_EXPR/IFN_VEC_SERIES or IFN_VEC_SHL_INSERT.
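The decomposition above can be sketched in plain C (a fixed length N is chosen purely for illustration):

```c
#include <assert.h>

/* Sketch of the VEC_SERIES_EXPR <a, b> lowering described above:
   basev = { a, ... }, stepv = { b, ... },
   series = basev + stepv * { 0, 1, 2, ... }.  */
#define N 8

static void
vec_series (long a, long b, long out[N])
{
  long basev[N], stepv[N], iota[N];
  for (int i = 0; i < N; i++)
    {
      basev[i] = a;	/* { a, ... } */
      stepv[i] = b;	/* { b, ... } */
      iota[i] = i;	/* { 0, 1, 2, ... } */
    }
  for (int i = 0; i < N; i++)
    out[i] = basev[i] + stepv[i] * iota[i];
}
```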

So I'm leaning towards (5) at the moment.  What do you think?

Thanks,
Richard


* [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat
  2018-08-16 20:51     ` Segher Boessenkool
  2018-08-16 23:21       ` Will Schmidt
@ 2018-08-20 21:44       ` Will Schmidt
  2018-08-25 17:16         ` Bill Schmidt
  2018-09-12 13:23         ` Segher Boessenkool
  1 sibling, 2 replies; 17+ messages in thread
From: Will Schmidt @ 2018-08-20 21:44 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Bill Schmidt, Richard Biener, Bill Schmidt, David Edelsohn, GCC Patches

Hi
Enable GIMPLE folding of the vec_splat() intrinsic. (v3).
    
This uses the tree_vec_extract() function out of tree-vect-generic.c
to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
made non-static as part of this change.
    
Testcases are already in-tree.
    
V2 updates, per feedback previously received.
Forced arg1 into range (modulo #elements) before attempting to extract
the splat value.
Removed the (now unnecessary) code that did bounds-checking before calling
the tree_vec_extract helper.
Used arg0_type rather than lhs_type for calculating the tree size.
    
V3 updates, inspired by additional offline discussion with Segher.
Noting that for vec_splat (arg0, arg1), the ABI describes ARG1 as an element
number less than the number of elements supported by the respective ARG0
type, we do not attempt to gimple-fold the intrinsic if we determine the
index is out of range.  Thus, the initial check ensures that ARG1 is both
constant and a valid index into ARG0.
The subsequent modulo operation is no longer necessary and has been removed.
Also eliminated a few now-unused variables.
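For reference, the bit-offset arithmetic handed to tree_vec_extract can be sketched as follows (hypothetical helper names; the patch computes these values inline):

```c
#include <assert.h>

/* Sketch of the extraction arithmetic: for a vector of vec_size_bytes
   bytes holding n_elts elements, element idx is elem_size_bits wide
   and starts at bit idx * elem_size_bits.  */

static unsigned
splat_elem_size_bits (unsigned vec_size_bytes, unsigned n_elts)
{
  return vec_size_bytes * 8 / n_elts;
}

static unsigned
splat_start_bit (unsigned idx, unsigned elem_size_bits)
{
  return idx * elem_size_bits;
}
```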
    
Sniff-tests pass.  Full regtest against powerpc platforms is in the queue.
OK for trunk with passing test results?
    
Thanks,
-Will
    
[gcc]
    
2018-08-20  Will Schmidt  <will_schmidt@vnet.ibm.com>
    
    	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
    	  early gimple folding of vec_splat().
    	* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
    	* gimple-fold.h:  Add an extern define for tree_vec_extract().

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5f77afd..4c3257f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15768,10 +15768,52 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 	 tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
 	 g = gimple_build_assign (lhs, splat_tree);
 	 gimple_set_location (g, gimple_location (stmt));
 	 gsi_replace (gsi, g, true);
 	 return true;
+	}
+
+    /* Flavors of vec_splat.  */
+    /* a = vec_splat (b, 0x3) becomes a = { b[3],b[3],b[3],...};  */
+    case ALTIVEC_BUILTIN_VSPLTB:
+    case ALTIVEC_BUILTIN_VSPLTH:
+    case ALTIVEC_BUILTIN_VSPLTW:
+    case VSX_BUILTIN_XXSPLTD_V2DI:
+    case VSX_BUILTIN_XXSPLTD_V2DF:
+      {
+	arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
+	arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
+	/* Only fold the vec_splat_*() if arg1 is both a constant value, and a valid
+	 index into the arg0 vector.  */
+	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
+	if (TREE_CODE (arg1) != INTEGER_CST
+	    || TREE_INT_CST_LOW (arg1) > (n_elts -1))
+	  return false;
+	lhs = gimple_call_lhs (stmt);
+	tree lhs_type = TREE_TYPE (lhs);
+	tree arg0_type = TREE_TYPE (arg0);
+	tree splat;
+	if (TREE_CODE (arg0) == VECTOR_CST)
+	  splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (arg1));
+	else
+	  {
+	    /* Determine (in bits) the length and start location of the
+	       splat value for a call to the tree_vec_extract helper.  */
+	    int splat_elem_size = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
+				    * BITS_PER_UNIT / n_elts;
+	    int splat_start_bit = TREE_INT_CST_LOW (arg1) * splat_elem_size;
+	    tree len = build_int_cst (bitsizetype, splat_elem_size);
+	    tree start = build_int_cst (bitsizetype, splat_start_bit);
+	    splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
+				      len, start);
+	  }
+	/* And finally, build the new vector.  */
+	tree splat_tree = build_vector_from_val (lhs_type, splat);
+	g = gimple_build_assign (lhs, splat_tree);
+	gimple_set_location (g, gimple_location (stmt));
+	gsi_replace (gsi, g, true);
+	return true;
       }
 
     /* vec_mergel (integrals).  */
     case ALTIVEC_BUILTIN_VMRGLH:
     case ALTIVEC_BUILTIN_VMRGLW:
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 04e9bfa..e634180 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -59,10 +59,11 @@ extern tree gimple_fold_indirect_ref (tree);
 extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
 extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
 extern bool arith_code_with_undefined_signed_overflow (tree_code);
 extern gimple_seq rewrite_to_defined_overflow (gimple *);
 extern void replace_call_with_value (gimple_stmt_iterator *, tree);
+extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, tree);
 
 /* gimple_build, functionally matching fold_buildN, outputs stmts
    int the provided sequence, matching and simplifying them on-the-fly.
    Supposed to replace force_gimple_operand (fold_buildN (...), ...).  */
 extern tree gimple_build (gimple_seq *, location_t,
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index 909f790..1c9701d 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -118,11 +118,11 @@ build_word_mode_vector_type (int nunits)
 
 typedef tree (*elem_op_func) (gimple_stmt_iterator *,
 			      tree, tree, tree, tree, tree, enum tree_code,
 			      tree);
 
-static inline tree
+tree
 tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
 		  tree t, tree bitsize, tree bitpos)
 {
   if (TREE_CODE (t) == SSA_NAME)
     {



* Re: VEC_DUPLICATE_EXPR options (was Re: [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat)
  2018-08-18 17:01   ` VEC_DUPLICATE_EXPR options (was Re: [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat) Richard Sandiford
@ 2018-08-23 13:50     ` Richard Biener
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Biener @ 2018-08-23 13:50 UTC (permalink / raw)
  To: will_schmidt, Segher Boessenkool, William J. Schmidt,
	David Edelsohn, GCC Patches, Richard Sandiford

On Sat, Aug 18, 2018 at 7:01 PM Richard Sandiford
<rdsandiford@googlemail.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Tue, Aug 7, 2018 at 9:25 PM Will Schmidt <will_schmidt@vnet.ibm.com> wrote:
> >>
> >> Hi
> >> Enable GIMPLE folding of the vec_splat() intrinsic.
> >>
> >> For review.. feedback is expected. :-)
> >>
> >> I came up with the following after spending some time poking around
> >> at the tree_vec_extract() and vector_element() functions as seen
> >> in tree-vect-generic.c looking for insights.  Some of this seems a
> >> bit clunky to me yet, but this is functional as far as make-check
> >> can tell, and is as far as I can get without additional input.
> >>
> >> This uses the tree_vec_extract() function out of tree-vect-generic.c
> >> to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
> >> made non-static as part of this change.
> >>
> >> In review of the .gimple output, this folding takes a sample testcase
> >> of
> >>         vector bool int testb_0  (vector bool int x)
> >>         {
> >>           return vec_splat (x, 0b00000);
> >>         }
> >> from:
> >> testb_0 (__vector __bool int x)
> >> {
> >>   __vector __bool intD.1486 D.2855;
> >>
> >>   _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
> >>   _2 = __builtin_altivec_vspltwD.1919 (_1, 0);
> >>   D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
> >>   return D.2855;
> >> }
> >> to:
> >>  testb_0 (__vector __bool int x)
> >> {
> >>   __vector __bool intD.1486 D.2855;
> >>
> >>   _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
> >>   D.2856 = BIT_FIELD_REF <_1, 32, 0>;
> >>   _2 = {D.2856, D.2856, D.2856, D.2856};
> >>   D.2855 = VIEW_CONVERT_EXPR<__vector __bool intD.1486>(_2);
> >>   return D.2855;
> >> }
> >>
> >>
> >> Testcases are being posted as a separate patch.
> >>
> >> OK for trunk?
> >
> > It certainly works, nowadays we have VEC_DUPLICATE_EXPR though
> > so you could simply do
> >
> >  _1 = VIEW_CONVERT_EXPR<__vector signed intD.1468>(xD.2778);
> >  D.2856 = BIT_FIELD_REF <_1, 32, 0>;
> >  D.2855 = VEC_DUPLICATE_EXPR <__vector __bool intD.1486> (D.2856);
> >
> > not sure what variant is better for followup optimizations and whether
> > we already fold that { D.2856, D.2856, D.2856, D.2856 }; to VEC_DUPLICATE_EXPR.
> >
> > Richard may know more.
>
> We only use VEC_DUPLICATE_EXPR for variable-length vectors.  I still owe
> you a GCC 9 update for:
>
>   https://gcc.gnu.org/ml/gcc-patches/2017-12/msg01039.html
>
> (hadn't forgotten!)
>
> It's a diversion from the patch, sorry, but I think the main alternatives are:
>
> (1) Allow either operand of a VEC_PERM_EXPR to be a scalar
>     ------------------------------------------------------
>
>     Handles VEC_DUPLICATE_EXPR <a> as:
>         VEC_PERM_EXPR <a, a, { 0, ... }>
>
>     Handles IFN_VEC_SHL_INSERT <avec, b> as:
>         VEC_PERM_EXPR <b, avec, { 0, 1, 2, 3, ... }>
>     or: VEC_PERM_EXPR <avec, b, { n, 0, 1, 2, ... }>
>
>     Can also insert a single "a" at the beginning of a vector of "b":
>         VEC_PERM_EXPR <a, b, { 0, 1, 1, 1, ... }>
>
>     Disadvantages:
>
>     (a) It's still a bit limited.  E.g. it can't insert two scalars at
>         the beginning of a vector of "b", which might be useful for SLP.
>
>     (b) Either all targets would need to handle this case explicitly
>         or something would need to massage the tree permute mask before
>         the targets see it.  E.g.:
>
>            VEC_PERM_EXPR <a, b, { 0, 1, 1, ... }>
>         -> permute of { 0, n, n, ... } on two vector broadcast rtxes
>
>     (c) There'd still be an inconsistency.  Either:
>         (i) we'd only use VEC_PERM_EXPRs for variable-length vectors or
>         (ii) we'd have to check before creating a CONSTRUCTOR whether
>              it should be represented as a VEC_PERM_EXPR instead.
>
> (2) Allow only the second operand of a VEC_PERM_EXPR to be a scalar
>     ---------------------------------------------------------------
>
>     Handles VEC_DUPLICATE_EXPR <a> as:
>         VEC_PERM_EXPR <{ 0, ... }, a, { n, ... }>
>
>     Handles IFN_VEC_SHL_INSERT <avec, b> as:
>         VEC_PERM_EXPR <avec, b, { n, 0, 1, 2, ... }>;
>
>     Advantages:
>
>     - Avoids the need for target changes: the mask is the same if the scalar
>       is replaced by a vector.

Interesting idea ;)

>     Disadvantages:
>
>     - (a) still holds (and now we can't do a two-scalar permute).
>
>     - (c) still holds.
>
> (3) Keep VEC_DUPLICATE_EXPR but use it for everything
>     -------------------------------------------------
>
>     Advantages:
>
>     - The handling of duplicates becomes consistent across vector types.

I think this is definitely important - I don't like too much the idea of
having some stuff only used for the variable-size case.

>     - There are no changes to VEC_PERM_EXPR or targets.
>
>     - We can still replace IFN_VEC_SHL_INSERT with a VEC_PERM_EXPR,
>       as a separate change.
>
>     Disadvantages:
>
>     - (c) still holds.
>
>     (d) Vector duplication stays a bit of a special case (which was the
>         original problem).
>
> (4) Use the VECTOR_CST encoding for CONSTRUCTORs too
>     ------------------------------------------------
>
>     Advantages:
>
>     - Reduces memory overhead of CONSTRUCTORs for vector broadcasts
>       in constant-length vectors.
>
>     - There's only one tree code to check.
>
>     - Many more vectors have the same representation for constant- and
>       variable-length vectors.
>
>     - There's more consistency between constant element values and
>       variable element values.
>
>     - We can still replace IFN_VEC_SHL_INSERT with a VEC_PERM_EXPR,
>       as a separate change.
>
>     Disadvantages:
>
>     (e) Completely following the VECTOR_CST encoding would mean having
>         to encode the stepped forms based on two values in the series.
>         E.g. a 3-pattern:
>
>           { a : b : c : ... }
>
>         (ad-hoc syntax) would be:
>
>           { a, b, c, c * 2 - b, c * 3 - b, ... }
>
>         so the CONSTRUCTOR would include hidden arithmetic.  This means
>         that the value of any given element (rather than just a directly-
>         encoded one) would in general not be a gimple value.  E.g. in
>         the above case, asking for element 3 would give a GENERIC tree.

Eh.  The "advantage" would be it handles the VEC_SERIES_EXPR case as well?

>
>     (f) If vector CONSTRUCTORs seem like a mistake then this will too.

I still think they are ;)

> (5) Like (4), but don't allow the stepped case
>     ------------------------------------------
>
>     Advantages:
>
>     - As for (4), and avoids the hidden arithmetic in (e).
>
>     Disadvantages:
>
>     - (f) still holds.
>
>     (g) Stepped vectors need to be handled separately.
>
> For (f), we could (and might need to) replace CONSTRUCTOR with a new
> tree code.  It would still have to be GIMPLE_SINGLE_RHS though, if we
> keep the current "gassigns have 4 operands" assumption.
>
> I'm not sure (g) is really a disadvantage.  We could replace
> VEC_SERIES_EXPR <a, b> with:
>
>     basev = { a, ... }
>     stepv = { b, ... }
>     tmp = stepv * { 0, 1, 2, ... };
>     series = basev + tmp
>
> and match it as an IFN_VEC_SERIES for targets that have it.
> So code constructing vectors would never have to know about
> VEC_SERIES_EXPR/IFN_VEC_SERIES or IFN_VEC_SHL_INSERT.
>
> So I'm leaning towards (5) at the moment.  What do you think?

Generally I dislike CONSTRUCTOR because it hides the implementation
complexity.  That is, for most vector operations we only allow things
in the IL that match what the target is willing to expand (which doesn't
necessarily mean it expands to one target instruction ...).  At least after
vector lowering.

So this would advocate for exposing all HW instructions as IFN_* but of
course that's too messy to handle in optimizers.  Instead what we do
for VEC_PERM_EXPR via can_vec_perm_const_p looks like a good
way forward to me.

That means (2) looks appealing to me given a two-scalar generic permute
isn't likely to be present in HW.  Still, VEC_PERM_EXPR <scalar,
scalar, { 0, 1 }>
as a way to concat two scalars to a vector makes sense to me.

We currently have

/* Vector permutation expression.  A = VEC_PERM_EXPR<v0, v1, mask> means

   N = length(mask)
   foreach i in N:
     M = mask[i] % (2*N)
     A = M < N ? v0[M] : v1[M-N]

   V0 and V1 are vectors of the same type.  MASK is an integer-typed
   vector.  The number of MASK elements must be the same with the
   number of elements in V0 and V1.  The size of the inner type
   of the MASK and of the V0 and V1 must be the same.
*/
DEFTREECODE (VEC_PERM_EXPR, "vec_perm_expr", tcc_expression, 3)
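For reference, those semantics can be sketched directly in C (illustrative code, not the GCC implementation):

```c
#include <assert.h>

/* C rendering of the VEC_PERM_EXPR semantics quoted above: each
   result element selects from V0 when the (wrapped) mask value is
   below N, otherwise from V1.  */

static void
vec_perm (unsigned n, const long *v0, const long *v1,
	  const unsigned *mask, long *a)
{
  for (unsigned i = 0; i < n; i++)
    {
      unsigned m = mask[i] % (2 * n);	/* mask values wrap modulo 2*N */
      a[i] = m < n ? v0[m] : v1[m - n];
    }
}
```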

and GIMPLE checking enforces same types for the permutation result
and the two permuted vectors.

I'd relax that a bit to allow scalar / vector V0 / V1 and not require
the result to be the same as either vector type but just have the
same element type and element count as the mask vector.  That
would allow concats.

The goal would be that all primitive permute/concat (and extract?)
operations ISAs have can be expressed as a single VEC_PERM_EXPR.

And vector CONSTRUCTORs could be lowered using some automated
generator (as described by the "rewrite vectorizer" talks at the last
Cauldrons) that is seeded by some kind of machine description for
permutation operations.

That said, this would leave VEC_SERIES_EXPR alone unless we'd want
to split it up as you say and it wouldn't complicate CONSTRUCTORs
but walk in the direction of exposing some more target details
at the expense of target complication in their vec_perm_const_ok hook
(until that automated generator thing materializes).

Richard.

>
> Thanks,
> Richard


* Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat
  2018-08-20 21:44       ` [PATCH, RFC, rs6000, v3] enable early gimple-folding " Will Schmidt
@ 2018-08-25 17:16         ` Bill Schmidt
  2018-08-25 18:09           ` Segher Boessenkool
  2018-09-12 13:23         ` Segher Boessenkool
  1 sibling, 1 reply; 17+ messages in thread
From: Bill Schmidt @ 2018-08-25 17:16 UTC (permalink / raw)
  To: will_schmidt, Segher Boessenkool
  Cc: Richard Biener, Bill Schmidt, David Edelsohn, GCC Patches

On 8/20/18 4:44 PM, Will Schmidt wrote:
> Hi
> Enable GIMPLE folding of the vec_splat() intrinsic. (v3).
>     
> This uses the tree_vec_extract() function out of tree-vect-generic.c
> to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
> made non-static as part of this change.
>     
> Testcases are already in-tree.
>     
> V2 updates, per feedback previously received.
> Forced arg1 into range (modulo #elements) before attempting to extract
> the splat value.
> Removed the (now unnecessary) code that did bounds-checking before calling
> the tree_vec_extract helper.
> Used arg0_type rather than lhs_type for calculating the tree size.
>     
> V3 updates, inspired by additional offline discussion with Segher.
> Noting that for vec_splat(arg1,arg2), the ABI describes ARG2 as an element
> number less than the number of elements supported by the respective ARG1
> type, so we do not attempt to gimple-fold the intrinsic if we determine our
> value is out of range. Thus, the initial check ensures that ARG1 is both
> constant and has a valid index into ARG0.
> The subsequent modulo operation is no longer necessary, and has been removed.
> Also eliminated a few now-unused variables.
>     
> Sniff-tests pass.  full regtest against powerpc platforms is in the queue.
> OK for trunk with passing test results?
>     
> Thanks,
> -Will
>     
> [gcc]
>     
> 2018-08-20  Will Schmidt  <will_schmidt@vnet.ibm.com>
>     
>     	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
>     	  early gimple folding of vec_splat().
>     	* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
>     	* gimple-fold.h:  Add an extern define for tree_vec_extract().
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 5f77afd..4c3257f 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15768,10 +15768,52 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  	 tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
>  	 g = gimple_build_assign (lhs, splat_tree);
>  	 gimple_set_location (g, gimple_location (stmt));
>  	 gsi_replace (gsi, g, true);
>  	 return true;
> +	}
> +
> +    /* Flavors of vec_splat.  */
> +    /* a = vec_splat (b, 0x3) becomes a = { b[3],b[3],b[3],...};  */
> +    case ALTIVEC_BUILTIN_VSPLTB:
> +    case ALTIVEC_BUILTIN_VSPLTH:
> +    case ALTIVEC_BUILTIN_VSPLTW:
> +    case VSX_BUILTIN_XXSPLTD_V2DI:
> +    case VSX_BUILTIN_XXSPLTD_V2DF:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
> +	arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
> +	/* Only fold the vec_splat_*() if arg1 is both a constant value, and a valid
> +	 index into the arg0 vector.  */
> +	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
> +	if (TREE_CODE (arg1) != INTEGER_CST
> +	    || TREE_INT_CST_LOW (arg1) > (n_elts -1))

Space between - and 1.

I'm still not convinced we should do this.  The semantics of vec_splat are
to select the element as arg1 modulo n_elts.  Odd as it seems, it's historically
valid for values between 0 and 31 to be used regardless of the value of n_elts.
Since arg1 is a constant we can easily bring it into valid range and expand it
early, rather than losing optimization opportunities by keeping this as a
built-in call.

Do we have test cases for arg1 in {n_elts, ..., 31}?  What's GCC's current
behavior?  Maybe we already aren't honoring this technically legal case?
If so, it doesn't matter.  (I think we may have had this discussion on IRC
but I have lost track of it.  Doesn't hurt to have it again in a permanent
location. ;-)

Otherwise looks fine to me, though of course I can't approve.

Thanks,
Bill

> +	  return false;
> +	lhs = gimple_call_lhs (stmt);
> +	tree lhs_type = TREE_TYPE (lhs);
> +	tree arg0_type = TREE_TYPE (arg0);
> +	tree splat;
> +	if (TREE_CODE (arg0) == VECTOR_CST)
> +	  splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (arg1));
> +	else
> +	  {
> +	    /* Determine (in bits) the length and start location of the
> +	       splat value for a call to the tree_vec_extract helper.  */
> +	    int splat_elem_size = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
> +				    * BITS_PER_UNIT / n_elts;
> +	    int splat_start_bit = TREE_INT_CST_LOW (arg1) * splat_elem_size;
> +	    tree len = build_int_cst (bitsizetype, splat_elem_size);
> +	    tree start = build_int_cst (bitsizetype, splat_start_bit);
> +	    splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
> +				      len, start);
> +	  }
> +	/* And finally, build the new vector.  */
> +	tree splat_tree = build_vector_from_val (lhs_type, splat);
> +	g = gimple_build_assign (lhs, splat_tree);
> +	gimple_set_location (g, gimple_location (stmt));
> +	gsi_replace (gsi, g, true);
> +	return true;
>        }
>
>      /* vec_mergel (integrals).  */
>      case ALTIVEC_BUILTIN_VMRGLH:
>      case ALTIVEC_BUILTIN_VMRGLW:
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index 04e9bfa..e634180 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -59,10 +59,11 @@ extern tree gimple_fold_indirect_ref (tree);
>  extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
>  extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
>  extern bool arith_code_with_undefined_signed_overflow (tree_code);
>  extern gimple_seq rewrite_to_defined_overflow (gimple *);
>  extern void replace_call_with_value (gimple_stmt_iterator *, tree);
> +extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, tree);
>
>  /* gimple_build, functionally matching fold_buildN, outputs stmts
>     int the provided sequence, matching and simplifying them on-the-fly.
>     Supposed to replace force_gimple_operand (fold_buildN (...), ...).  */
>  extern tree gimple_build (gimple_seq *, location_t,
> diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
> index 909f790..1c9701d 100644
> --- a/gcc/tree-vect-generic.c
> +++ b/gcc/tree-vect-generic.c
> @@ -118,11 +118,11 @@ build_word_mode_vector_type (int nunits)
>
>  typedef tree (*elem_op_func) (gimple_stmt_iterator *,
>  			      tree, tree, tree, tree, tree, enum tree_code,
>  			      tree);
>
> -static inline tree
> +tree
>  tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
>  		  tree t, tree bitsize, tree bitpos)
>  {
>    if (TREE_CODE (t) == SSA_NAME)
>      {
>
>

* Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat
  2018-08-25 17:16         ` Bill Schmidt
@ 2018-08-25 18:09           ` Segher Boessenkool
  2018-08-25 18:55             ` Bill Schmidt
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2018-08-25 18:09 UTC (permalink / raw)
  To: Bill Schmidt
  Cc: will_schmidt, Richard Biener, Bill Schmidt, David Edelsohn, GCC Patches

Hey,

On Sat, Aug 25, 2018 at 12:15:16PM -0500, Bill Schmidt wrote:
> On 8/20/18 4:44 PM, Will Schmidt wrote:
> > Enable GIMPLE folding of the vec_splat() intrinsic. (v3).
> >     
> > This uses the tree_vec_extract() function out of tree-vect-generic.c
> > to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
> > made non-static as part of this change.
> >     
> > Testcases are already in-tree.
> >     
> > V2 updates, per feedback previously received.
> > Forced arg1 into range (modulo #elements) before attempting to extract
> > the splat value.
> > Removed the (now unnecessary) code that did bounds-checking before calling
> > the tree_vec_extract helper.
> > Used arg0_type rather than lhs_type for calculating the tree size.
> >     
> > V3 updates, inspired by additional offline discussion with Segher.
> > Noting that for vec_splat(arg1,arg2), the ABI describes ARG2 as an element
> > number less than the number of elements supported by the respective ARG1
> > type, so we do not attempt to gimple-fold the intrinsic if we determine our
> > value is out of range. Thus, the initial check ensures that ARG1 is both
> > constant and has a valid index into ARG0.
> > The subsequent modulo operation is no longer necessary, and has been removed.

> > 2018-08-20  Will Schmidt  <will_schmidt@vnet.ibm.com>
> >     
> >     	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
> >     	  early gimple folding of vec_splat().

s/  //

> >     	* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
> >     	* gimple-fold.h:  Add an extern define for tree_vec_extract().

s/  / /

> > +	/* Only fold the vec_splat_*() if arg1 is both a constant value, and a valid
> > +	 index into the arg0 vector.  */

Needs two more spaces indent?

> > +	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
> > +	if (TREE_CODE (arg1) != INTEGER_CST
> > +	    || TREE_INT_CST_LOW (arg1) > (n_elts -1))
> 
> Space between - and 1.
> 
> I'm still not convinced we should do this.  The semantics of vec_splat are
> to select the element as arg1 modulo n_elts.  Odd as it seems, it's historically
> valid for values between 0 and 31 to be used regardless of the value of n_elts.

But, the assembler complains.  So it never worked, and it seems better to
not allow it in the compiler either.

A lot of existing testcases generate assembler code the assembler chokes
on -- but the testcases are dg-do compile, so the assembler isn't run.

This is also PR86987; I have patches, but it snowballs because of the
aforementioned existing wrong testcases.

> Since arg1 is a constant we can easily bring it into valid range and expand it
> early, rather than losing optimization opportunities by keeping this as a
> built-in call.

We should just error on invalid input.

> Do we have test cases for arg1 in {n_elts, ..., 31}?  What's GCC's current
> behavior?  Maybe we already aren't honoring this technically legal case?

We do, lots.  Sometimes we compile it; the assembler chokes on the output
(but the assembler is not run in those testcases).  Sometimes we ICE.

> Otherwise looks fine to me, though of course I can't approve.

Looks fine to me, too.

> > +	    int splat_elem_size = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
> > +				    * BITS_PER_UNIT / n_elts;

(Well one thing then...  Wrong indent?)


Segher

* Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat
  2018-08-25 18:09           ` Segher Boessenkool
@ 2018-08-25 18:55             ` Bill Schmidt
  0 siblings, 0 replies; 17+ messages in thread
From: Bill Schmidt @ 2018-08-25 18:55 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: will_schmidt, Richard Biener, Bill Schmidt, David Edelsohn, GCC Patches

On 8/25/18 1:09 PM, Segher Boessenkool wrote:
> Hey,
>
> On Sat, Aug 25, 2018 at 12:15:16PM -0500, Bill Schmidt wrote:
>> On 8/20/18 4:44 PM, Will Schmidt wrote:
>>> Enable GIMPLE folding of the vec_splat() intrinsic. (v3).
>>>     
>>> This uses the tree_vec_extract() function out of tree-vect-generic.c
>>> to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
>>> made non-static as part of this change.
>>>     
>>> Testcases are already in-tree.
>>>     
>>> V2 updates, per feedback previously received.
>>> Forced arg1 into range (modulo #elements) before attempting to extract
>>> the splat value.
>>> Removed the (now unnecessary) code that did bounds-checking before calling
>>> the tree_vec_extract helper.
>>> Used arg0_type rather than lhs_type for calculating the tree size.
>>>     
>>> V3 updates, inspired by additional offline discussion with Segher.
>>> Noting that for vec_splat(arg1,arg2), the ABI describes ARG2 as an element
>>> number less than the number of elements supported by the respective ARG1
>>> type, so we do not attempt to gimple-fold the intrinsic if we determine our
>>> value is out of range. Thus, the initial check ensures that ARG1 is both
>>> constant and has a valid index into ARG0.
>>> The subsequent modulo operation is no longer necessary, and has been removed.
>>> 2018-08-20  Will Schmidt  <will_schmidt@vnet.ibm.com>
>>>     
>>>     	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
>>>     	  early gimple folding of vec_splat().
> s/  //
>
>>>     	* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
>>>     	* gimple-fold.h:  Add an extern define for tree_vec_extract().
> s/  / /
>
>>> +	/* Only fold the vec_splat_*() if arg1 is both a constant value, and a valid
>>> +	 index into the arg0 vector.  */
> Needs two more spaces indent?
>
>>> +	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
>>> +	if (TREE_CODE (arg1) != INTEGER_CST
>>> +	    || TREE_INT_CST_LOW (arg1) > (n_elts -1))
>> Space between - and 1.
>>
>> I'm still not convinced we should do this.  The semantics of vec_splat are
>> to select the element as arg1 modulo n_elts.  Odd as it seems, it's historically
>> valid for values between 0 and 31 to be used regardless of the value of n_elts.
> But, the assembler complains.  So it never worked, and it seems better to
> not allow it in the compiler either.
>
> A lot of existing testcases generate assembler code the assembler chokes
> on -- but the testcases are dg-do compile, so the assembler isn't run.
>
> This is also PR86987; I have patches, but it snowballs because of the
> aforementioned existing wrong testcases.
>
>> Since arg1 is a constant we can easily bring it into valid range and expand it
>> early, rather than losing optimization opportunities by keeping this as a
>> built-in call.
> We should just error on invalid input.
>
>> Do we have test cases for arg1 in {n_elts, ..., 31}?  What's GCC's current
>> behavior?  Maybe we already aren't honoring this technically legal case?
> We do, lots.  Sometimes we compile it; the assembler chokes on the output
> (but the assembler is not run in those testcases).  Sometimes we ICE.

Thanks for the reminders. :-)  No concerns, then.

Bill

>
>> Otherwise looks fine to me, though of course I can't approve.
> Looks fine to me, too.
>
>>> +	    int splat_elem_size = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
>>> +				    * BITS_PER_UNIT / n_elts;
> (Well one thing then...  Wrong indent?)
>
>
> Segher
>

* Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat
  2018-08-20 21:44       ` [PATCH, RFC, rs6000, v3] enable early gimple-folding " Will Schmidt
  2018-08-25 17:16         ` Bill Schmidt
@ 2018-09-12 13:23         ` Segher Boessenkool
  2018-09-12 13:47           ` Bill Schmidt
  2018-09-18 15:54           ` Will Schmidt
  1 sibling, 2 replies; 17+ messages in thread
From: Segher Boessenkool @ 2018-09-12 13:23 UTC (permalink / raw)
  To: Will Schmidt
  Cc: Bill Schmidt, Richard Biener, Bill Schmidt, David Edelsohn, GCC Patches

Hi!

Sorry this is all taking so long.

On Mon, Aug 20, 2018 at 04:44:30PM -0500, Will Schmidt wrote:
> 2018-08-20  Will Schmidt  <will_schmidt@vnet.ibm.com>
>     
>     	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
>     	  early gimple folding of vec_splat().

"early" should indent together with the "*" above, not "config".

>     	* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
>     	* gimple-fold.h:  Add an extern define for tree_vec_extract().

One space after : .

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15768,10 +15768,52 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  	 tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
>  	 g = gimple_build_assign (lhs, splat_tree);
>  	 gimple_set_location (g, gimple_location (stmt));
>  	 gsi_replace (gsi, g, true);
>  	 return true;
> +	}

Something is wrong with the indent here (in the original code already).

> +	arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
> +	arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
> +	/* Only fold the vec_splat_*() if arg1 is both a constant value, and a valid
> +	 index into the arg0 vector.  */

First comment line wraps, please break it a little earlier?

> +	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
> +	if (TREE_CODE (arg1) != INTEGER_CST
> +	    || TREE_INT_CST_LOW (arg1) > (n_elts -1))
> +	  return false;

I think you should check lower bound 0 as well.  Maybe use IN_RANGE?

The rest looks fine, thanks!  Okay for trunk (with the trivialities fixed).


Segher

* Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat
  2018-09-12 13:23         ` Segher Boessenkool
@ 2018-09-12 13:47           ` Bill Schmidt
  2018-09-18 15:54           ` Will Schmidt
  1 sibling, 0 replies; 17+ messages in thread
From: Bill Schmidt @ 2018-09-12 13:47 UTC (permalink / raw)
  To: Segher Boessenkool, Will Schmidt
  Cc: Richard Biener, Bill Schmidt, David Edelsohn, GCC Patches

On 9/12/18 8:23 AM, Segher Boessenkool wrote:
> Hi!
>
> Sorry this is all taking so long.
>
> On Mon, Aug 20, 2018 at 04:44:30PM -0500, Will Schmidt wrote:
>> 2018-08-20  Will Schmidt  <will_schmidt@vnet.ibm.com>
>>     
>>     	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
>>     	  early gimple folding of vec_splat().
> "early" should indent together with the "*" above, not "config".
>
>>     	* tree-vect-generic.c: Remove static from tree_vec_extract() definition.
>>     	* gimple-fold.h:  Add an extern define for tree_vec_extract().
> One space after : .
>
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -15768,10 +15768,52 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>  	 tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
>>  	 g = gimple_build_assign (lhs, splat_tree);
>>  	 gimple_set_location (g, gimple_location (stmt));
>>  	 gsi_replace (gsi, g, true);
>>  	 return true;
>> +	}
> Something is wrong with the indent here (in the original code already).
>
>> +	arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
>> +	arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
>> +	/* Only fold the vec_splat_*() if arg1 is both a constant value, and a valid
>> +	 index into the arg0 vector.  */
> First comment line wraps, please break it a little earlier?
>
>> +	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
>> +	if (TREE_CODE (arg1) != INTEGER_CST
>> +	    || TREE_INT_CST_LOW (arg1) > (n_elts -1))
>> +	  return false;
> I think you should check lower bound 0 as well.  Maybe use IN_RANGE?

And "-1" should be "- 1".  Thanks!

Bill

>
> The rest looks fine, thanks!  Okay for trunk (with the trivialities fixed).
>
>
> Segher
>

* Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat
  2018-09-12 13:23         ` Segher Boessenkool
  2018-09-12 13:47           ` Bill Schmidt
@ 2018-09-18 15:54           ` Will Schmidt
  2018-09-18 16:29             ` Segher Boessenkool
  1 sibling, 1 reply; 17+ messages in thread
From: Will Schmidt @ 2018-09-18 15:54 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Bill Schmidt, Richard Biener, Bill Schmidt, David Edelsohn, GCC Patches

On Wed, 2018-09-12 at 08:23 -0500, Segher Boessenkool wrote: 
> Hi!

> > +	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
> > +	if (TREE_CODE (arg1) != INTEGER_CST
> > +	    || TREE_INT_CST_LOW (arg1) > (n_elts -1))
> > +	  return false;
> 
> I think you should check lower bound 0 as well.  Maybe use IN_RANGE?

I'm comparing the IN_RANGE usage here with elsewhere in rs6000.c.
Do I need the sext_hwi() adjustment for arg1 within the IN_RANGE check?
As reference, this is for the vec_splat intrinsic, where arg1 is defined
as 'const int', and we want to return false if arg1 is outside the range
> of 0..#elements.
	vector bool char vec_splat (vector bool char, const int);

So.. this?
	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
	if (TREE_CODE (arg1) != INTEGER_CST
	    || !IN_RANGE (TREE_INT_CST_LOW (arg1), 0, (n_elts - 1)))
	  return false;

OR something like this with sext_hwi(...) ?:
	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
	if (TREE_CODE (arg1) != INTEGER_CST
	    || !IN_RANGE (sext_hwi (TREE_INT_CST_LOW (arg1), n_elts),
			  0, (n_elts - 1)))
	  return false;


thanks
-Will


> 
> The rest looks fine, thanks!  Okay for trunk (with the trivialities fixed).
> 
> 
> Segher
> 



* Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat
  2018-09-18 15:54           ` Will Schmidt
@ 2018-09-18 16:29             ` Segher Boessenkool
  0 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2018-09-18 16:29 UTC (permalink / raw)
  To: Will Schmidt
  Cc: Bill Schmidt, Richard Biener, Bill Schmidt, David Edelsohn, GCC Patches

On Tue, Sep 18, 2018 at 10:52:17AM -0500, Will Schmidt wrote:
> On Wed, 2018-09-12 at 08:23 -0500, Segher Boessenkool wrote: 
> > Hi!
> 
> > > +	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
> > > +	if (TREE_CODE (arg1) != INTEGER_CST
> > > +	    || TREE_INT_CST_LOW (arg1) > (n_elts -1))
> > > +	  return false;
> > 
> > I think you should check lower bound 0 as well.  Maybe use IN_RANGE?
> 
> I'm comparing the IN_RANGE usage here with elsewhere in rs6000.c.
> Do I need the sext_hwi() adjustment for arg1 within the IN_RANGE check?

No, it will work fine (it casts to unsigned HWI internally).

> As reference, this is for the vec_splat intrinsic, where arg1 is defined
> as 'const int', and we want to return false if arg1 is outside the range
> of 0..#elements . 
> 	vector bool char vec_splat (vector bool char, const int);
> 
> So.. this?
> 	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
> 	if (TREE_CODE (arg1) != INTEGER_CST
> 	    || !IN_RANGE (TREE_INT_CST_LOW (arg1), 0, (n_elts - 1)))
> 	  return false;

That looks fine, but you can lose the parens around "n_elts - 1".

...

It turns out TREE_INT_CST_LOW returns an unsigned HWI, so this is all
moot, sorry for the misdirection; just the > is fine, as you had.

But write:

	if (TREE_CODE (arg1) != INTEGER_CST
	    || TREE_INT_CST_LOW (arg1) >= n_elts)

or use tree_to_uhwi perhaps...  Ask someone who knows trees :-)


Segher

end of thread, other threads:[~2018-09-18 16:12 UTC | newest]

Thread overview: 17+ messages
2018-08-07 19:25 [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat Will Schmidt
2018-08-09 13:53 ` Bill Schmidt
2018-08-09 21:07   ` Will Schmidt
2018-08-16 15:50   ` [PATCH, RFC, rs6000, v2] " Will Schmidt
2018-08-16 20:51     ` Segher Boessenkool
2018-08-16 23:21       ` Will Schmidt
2018-08-20 21:44       ` [PATCH, RFC, rs6000, v3] enable early gimple-folding " Will Schmidt
2018-08-25 17:16         ` Bill Schmidt
2018-08-25 18:09           ` Segher Boessenkool
2018-08-25 18:55             ` Bill Schmidt
2018-09-12 13:23         ` Segher Boessenkool
2018-09-12 13:47           ` Bill Schmidt
2018-09-18 15:54           ` Will Schmidt
2018-09-18 16:29             ` Segher Boessenkool
2018-08-17 14:11 ` [PATCH, RFC, rs6000] enable GIMPLE folding " Richard Biener
2018-08-18 17:01   ` VEC_DUPLICATE_EXPR options (was Re: [PATCH, RFC, rs6000] enable GIMPLE folding of vec_splat) Richard Sandiford
2018-08-23 13:50     ` Richard Biener
