public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, rs6000] Fix PR80695 (vec_construct cost modeling issue)
@ 2017-05-11 16:23 Bill Schmidt
  2017-05-11 17:40 ` Bill Schmidt
  2017-05-11 20:01 ` Segher Boessenkool
  0 siblings, 2 replies; 3+ messages in thread
From: Bill Schmidt @ 2017-05-11 16:23 UTC (permalink / raw)
  To: GCC Patches; +Cc: Segher Boessenkool, David Edelsohn

Hi,

PR80695 identifies a case (similar to several others we've seen) where SLP
vectorization is too aggressive about vectorizing stores.  The problem is
that we undervalue the cost of a vec_construct operation.  vec_construct
is the vectorizer's representation for building a vector from scalar
elements.  When we construct an integer vector type from its constituent
parts, it requires a direct move from two GPRs (one instruction on P9,
two direct moves and a merge on P8).  The high cost of this is not
reflected in the current cost calculation, which only counts the cost
of combining the elements using N-1 inserts.  This patch provides a higher
estimate that is closer to reality.  Note that all cost estimation for
vectorization is a bit rough, so this should be viewed as a heuristic.

The patch treats all integer vectors separately from the default case.
There is already special handling for V4SFmode, so this leaves only
V2DFmode in the default case.  It was previously established heuristically
that a cost factor of 2 was appropriate for V2DFmode, so that is left
unchanged here; but since V2DFmode is the only default, we can simplify
the calculation to just return 2.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
Is this ok for trunk?

Thanks,
Bill


[gcc]

2017-05-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/80695
	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
	Account for direct move costs for vec_construct of integer
	vectors.

[gcc/testsuite]

2017-05-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/80695
	* gcc.target/powerpc/pr80695-p8.c: New file.
	* gcc.target/powerpc/pr80695-p9.c: New file.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 247809)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5849,8 +5849,20 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
 	if (SCALAR_FLOAT_TYPE_P (elem_type)
 	    && TYPE_PRECISION (elem_type) == 32)
 	  return 5;
+	/* On POWER9, integer vector types are built up in GPRs and then
+           use a direct move (2 cycles).  For POWER8 this is even worse,
+           as we need two direct moves and a merge, and the direct moves
+	   are five cycles.  */
+	else if (INTEGRAL_TYPE_P (elem_type))
+	  {
+	    if (TARGET_P9_VECTOR)
+	      return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 2;
+	    else
+	      return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 11;
+	  }
 	else
-	  return max (2, TYPE_VECTOR_SUBPARTS (vectype) - 1);
+	  /* V2DFmode doesn't need a direct move.  */
+	  return 2;
 
       default:
         gcc_unreachable ();
Index: gcc/testsuite/gcc.target/powerpc/pr80695-p8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr80695-p8.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr80695-p8.c	(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-mcpu=power8 -O3 -fdump-tree-slp-details" } */
+
+/* PR80695: Verify cost model for vec_construct on POWER8.  */
+
+long a[10] __attribute__((aligned(16)));
+
+void foo (long i, long j, long k, long l)
+{
+  a[6] = i;
+  a[7] = j;
+  a[8] = k;
+  a[9] = l;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 "slp2" } } */
Index: gcc/testsuite/gcc.target/powerpc/pr80695-p9.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr80695-p9.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr80695-p9.c	(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-mcpu=power9 -O3 -fdump-tree-slp-details" } */
+
+/* PR80695: Verify cost model for vec_construct on POWER9.  */
+
+long a[10] __attribute__((aligned(16)));
+
+void foo (long i, long j, long k, long l)
+{
+  a[6] = i;
+  a[7] = j;
+  a[8] = k;
+  a[9] = l;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 "slp2" } } */

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH, rs6000] Fix PR80695 (vec_construct cost modeling issue)
  2017-05-11 16:23 [PATCH, rs6000] Fix PR80695 (vec_construct cost modeling issue) Bill Schmidt
@ 2017-05-11 17:40 ` Bill Schmidt
  2017-05-11 20:01 ` Segher Boessenkool
  1 sibling, 0 replies; 3+ messages in thread
From: Bill Schmidt @ 2017-05-11 17:40 UTC (permalink / raw)
  To: GCC Patches; +Cc: Segher Boessenkool, David Edelsohn

I meant to mention that I regstrapped this on both POWER8 and POWER9.

Bill

> On May 11, 2017, at 10:59 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
> 
> Hi,
> 
> PR80695 identifies a case (similar to several others we've seen) where SLP
> vectorization is too aggressive about vectorizing stores.  The problem is
> that we undervalue the cost of a vec_construct operation.  vec_construct
> is the vectorizer's representation for building a vector from scalar
> elements.  When we construct an integer vector type from its constituent
> parts, it requires a direct move from two GPRs (one instruction on P9,
> two direct moves and a merge on P8).  The high cost of this is not
> reflected in the current cost calculation, which only counts the cost
> of combining the elements using N-1 inserts.  This patch provides a higher
> estimate that is closer to reality.  Note that all cost estimation for
> vectorization is a bit rough, so this should be viewed as a heuristic.
> 
> The patch treats all integer vectors separately from the default case.
> There is already special handling for V4SFmode, so this leaves only
> V2DFmode in the default case.  It was previously established heuristically
> that a cost factor of 2 was appropriate for V2DFmode, so that is left
> unchanged here; but since V2DFmode is the only default, we can simplify
> the calculation to just return 2.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this ok for trunk?
> 
> Thanks,
> Bill
> 
> 
> [gcc]
> 
> 2017-05-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	PR target/80695
> 	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
> 	Account for direct move costs for vec_construct of integer
> 	vectors.
> 
> [gcc/testsuite]
> 
> 2017-05-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	PR target/80695
> 	* gcc.target/powerpc/pr80695-p8.c: New file.
> 	* gcc.target/powerpc/pr80695-p9.c: New file.
> 
> 
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c	(revision 247809)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -5849,8 +5849,20 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
> 	if (SCALAR_FLOAT_TYPE_P (elem_type)
> 	    && TYPE_PRECISION (elem_type) == 32)
> 	  return 5;
> +	/* On POWER9, integer vector types are built up in GPRs and then
> +           use a direct move (2 cycles).  For POWER8 this is even worse,
> +           as we need two direct moves and a merge, and the direct moves
> +	   are five cycles.  */
> +	else if (INTEGRAL_TYPE_P (elem_type))
> +	  {
> +	    if (TARGET_P9_VECTOR)
> +	      return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 2;
> +	    else
> +	      return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 11;
> +	  }
> 	else
> -	  return max (2, TYPE_VECTOR_SUBPARTS (vectype) - 1);
> +	  /* V2DFmode doesn't need a direct move.  */
> +	  return 2;
> 
>       default:
>         gcc_unreachable ();
> Index: gcc/testsuite/gcc.target/powerpc/pr80695-p8.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr80695-p8.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr80695-p8.c	(working copy)
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-options "-mcpu=power8 -O3 -fdump-tree-slp-details" } */
> +
> +/* PR80695: Verify cost model for vec_construct on POWER8.  */
> +
> +long a[10] __attribute__((aligned(16)));
> +
> +void foo (long i, long j, long k, long l)
> +{
> +  a[6] = i;
> +  a[7] = j;
> +  a[8] = k;
> +  a[9] = l;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 "slp2" } } */
> Index: gcc/testsuite/gcc.target/powerpc/pr80695-p9.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr80695-p9.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr80695-p9.c	(working copy)
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-options "-mcpu=power9 -O3 -fdump-tree-slp-details" } */
> +
> +/* PR80695: Verify cost model for vec_construct on POWER9.  */
> +
> +long a[10] __attribute__((aligned(16)));
> +
> +void foo (long i, long j, long k, long l)
> +{
> +  a[6] = i;
> +  a[7] = j;
> +  a[8] = k;
> +  a[9] = l;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 "slp2" } } */
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH, rs6000] Fix PR80695 (vec_construct cost modeling issue)
  2017-05-11 16:23 [PATCH, rs6000] Fix PR80695 (vec_construct cost modeling issue) Bill Schmidt
  2017-05-11 17:40 ` Bill Schmidt
@ 2017-05-11 20:01 ` Segher Boessenkool
  1 sibling, 0 replies; 3+ messages in thread
From: Segher Boessenkool @ 2017-05-11 20:01 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: GCC Patches, David Edelsohn

On Thu, May 11, 2017 at 10:59:41AM -0500, Bill Schmidt wrote:
> PR80695 identifies a case (similar to several others we've seen) where SLP
> vectorization is too aggressive about vectorizing stores.  The problem is
> that we undervalue the cost of a vec_construct operation.  vec_construct
> is the vectorizer's representation for building a vector from scalar
> elements.  When we construct an integer vector type from its constituent
> parts, it requires a direct move from two GPRs (one instruction on P9,
> two direct moves and a merge on P8).  The high cost of this is not
> reflected in the current cost calculation, which only counts the cost
> of combining the elements using N-1 inserts.  This patch provides a higher
> estimate that is closer to reality.  Note that all cost estimation for
> vectorization is a bit rough, so this should be viewed as a heuristic.
> 
> The patch treats all integer vectors separately from the default case.
> There is already special handling for V4SFmode, so this leaves only
> V2DFmode in the default case.  It was previously established heuristically
> that a cost factor of 2 was appropriate for V2DFmode, so that is left
> unchanged here; but since V2DFmode is the only default, we can simplify
> the calculation to just return 2.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this ok for trunk?

Seems fine to me (well, minor stuff below).  Thanks,


Segher


> 2017-05-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	PR target/80695
> 	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
> 	Account for direct move costs for vec_construct of integer
> 	vectors.
> 
> [gcc/testsuite]
> 
> 2017-05-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	PR target/80695
> 	* gcc.target/powerpc/pr80695-p8.c: New file.
> 	* gcc.target/powerpc/pr80695-p9.c: New file.


> @@ -5849,8 +5849,20 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>  	if (SCALAR_FLOAT_TYPE_P (elem_type)
>  	    && TYPE_PRECISION (elem_type) == 32)
>  	  return 5;
> +	/* On POWER9, integer vector types are built up in GPRs and then
> +           use a direct move (2 cycles).  For POWER8 this is even worse,
> +           as we need two direct moves and a merge, and the direct moves
> +	   are five cycles.  */

You're mixing tabs and spaces here.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-05-11 19:49 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-11 16:23 [PATCH, rs6000] Fix PR80695 (vec_construct cost modeling issue) Bill Schmidt
2017-05-11 17:40 ` Bill Schmidt
2017-05-11 20:01 ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).