[PATCH] tree-optimization/110381 - preserve SLP permutation with in-order reductions

public inbox for gcc-patches@gcc.gnu.org

* [PATCH] tree-optimization/110381 - preserve SLP permutation with in-order reductions
@ 2023-06-26 12:17 Richard Biener
From: Richard Biener @ 2023-06-26 12:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford

The following fixes a bug that manifests itself during the fold-left
reduction transform, which picks a scalar def other than the last one
to replace and thus double-counts some elements.  But the underlying
issue is that we merge a load permutation into the in-order reduction,
which is of course wrong.
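
To illustrate why the permutation cannot be folded into an in-order
reduction, here is a minimal scalar sketch using the same values as the
testcase.  It is not vectorizer code: `fold_left_sum`, `source_order_vals`,
`as_written` and `permutation_dropped` are hypothetical names introduced
only for this illustration, and it assumes IEEE 754 double arithmetic
with round-to-nearest and no -ffast-math style reassociation.

```c
#include <float.h>
#include <stddef.h>

/* Strict left-to-right (fold-left) sum of vals[order[0]], vals[order[1]], ...
   Each addition is rounded before the next one starts, so the visiting
   order is part of the result.  */
double
fold_left_sum (const double *vals, const size_t *order, size_t n)
{
  double sum = 0;
  for (size_t i = 0; i < n; ++i)
    sum += vals[order[i]];
  return sum;
}

/* The values the testcase stores in foos[0], listed in the order the
   source adds them: a, c, b.  */
const double source_order_vals[3] = { DBL_MAX, -DBL_MAX, 5 };

/* Order of the additions as written in the source (a, c, b).  */
const size_t as_written[3] = { 0, 1, 2 };

/* Memory order (a, b, c), i.e. what the reduction sees if the load
   permutation is not materialized in front of it.  */
const size_t permutation_dropped[3] = { 0, 2, 1 };
```

With `as_written` the sum is (DBL_MAX + -DBL_MAX) + 5 = 5, which is what
the testcase expects.  With `permutation_dropped` the partial sum
DBL_MAX + 5 rounds back to DBL_MAX, so the final result is 0.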

Now, reduction analysis has not yet been performed when optimizing
permutations, so we have to resort to checking that ourselves.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	PR tree-optimization/110381
	* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
	Materialize permutes before fold-left reductions.

	* gcc.dg/vect/pr110381.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr110381.c | 40 ++++++++++++++++++++++++++++
 gcc/tree-vect-slp.cc                 | 18 +++++++++++--
 2 files changed, 56 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr110381.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr110381.c b/gcc/testsuite/gcc.dg/vect/pr110381.c
new file mode 100644
index 00000000000..2313dbf11ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr110381.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+
+struct FOO {
+   double a;
+   double b;
+   double c;
+};
+
+double __attribute__((noipa))
+sum_8_foos(const struct FOO* foos)
+{
+  double sum = 0;
+
+  for (int i = 0; i < 8; ++i)
+    {
+      struct FOO foo = foos[i];
+
+      /* Need to use an in-order reduction here, preserving
+         the load permutation.  */
+      sum += foo.a;
+      sum += foo.c;
+      sum += foo.b;
+    }
+
+  return sum;
+}
+
+int main()
+{
+  struct FOO foos[8];
+
+  __builtin_memset (foos, 0, sizeof (foos));
+  foos[0].a = __DBL_MAX__;
+  foos[0].b = 5;
+  foos[0].c = -__DBL_MAX__;
+
+  if (sum_8_foos (foos) != 5)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4481d43e3d7..8cb1ac1f319 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4682,14 +4682,28 @@ vect_optimize_slp_pass::start_choosing_layouts ()
   m_partition_layout_costs.safe_grow_cleared (m_partitions.length ()
 					      * m_perms.length ());
 
-  /* We have to mark outgoing permutations facing non-reduction graph
-     entries that are not represented as to be materialized.  */
+  /* We have to mark outgoing permutations facing non-associating-reduction
+     graph entries that are not represented as to be materialized.
+     slp_inst_kind_bb_reduc currently only covers associatable reductions.  */
   for (slp_instance instance : m_vinfo->slp_instances)
     if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor)
       {
 	unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
 	m_partitions[m_vertices[node_i].partition].layout = 0;
       }
+    else if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_reduc_chain)
+      {
+	stmt_vec_info stmt_info
+	  = SLP_TREE_REPRESENTATIVE (SLP_INSTANCE_TREE (instance));
+	stmt_vec_info reduc_info = info_for_reduction (m_vinfo, stmt_info);
+	if (needs_fold_left_reduction_p (TREE_TYPE
+					   (gimple_get_lhs (stmt_info->stmt)),
+					 STMT_VINFO_REDUC_CODE (reduc_info)))
+	  {
+	    unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
+	    m_partitions[m_vertices[node_i].partition].layout = 0;
+	  }
+      }
 
   /* Check which layouts each node and partition can handle.  Calculate the
      weights associated with inserting layout changes on edges.  */
-- 
2.35.3


* Re: [PATCH] tree-optimization/110381 - preserve SLP permutation with in-order reductions
@ 2023-06-26 12:57 ` Richard Sandiford
From: Richard Sandiford @ 2023-06-26 12:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Richard Biener <rguenther@suse.de> writes:
> The following fixes a bug that manifests itself during fold-left
> reduction transform in picking not the last scalar def to replace
> and thus double-counting some elements.  But the underlying issue
> is that we merge a load permutation into the in-order reduction
> which is of course wrong.
>
> Now, reduction analysis has not yet been performed when optimizing
> permutations, so we have to resort to checking that ourselves.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>
> 	PR tree-optimization/110381
> 	* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
> 	Materialize permutes before fold-left reductions.
>
> 	* gcc.dg/vect/pr110381.c: New testcase.

Thanks, LGTM FWIW.

Richard

> ---
>  gcc/testsuite/gcc.dg/vect/pr110381.c | 40 ++++++++++++++++++++++++++++
>  gcc/tree-vect-slp.cc                 | 18 +++++++++++--
>  2 files changed, 56 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr110381.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr110381.c b/gcc/testsuite/gcc.dg/vect/pr110381.c
> new file mode 100644
> index 00000000000..2313dbf11ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr110381.c
> @@ -0,0 +1,40 @@
> +/* { dg-do run } */
> +
> +struct FOO {
> +   double a;
> +   double b;
> +   double c;
> +};
> +
> +double __attribute__((noipa))
> +sum_8_foos(const struct FOO* foos)
> +{
> +  double sum = 0;
> +
> +  for (int i = 0; i < 8; ++i)
> +    {
> +      struct FOO foo = foos[i];
> +
> +      /* Need to use an in-order reduction here, preserving
> +         the load permutation.  */
> +      sum += foo.a;
> +      sum += foo.c;
> +      sum += foo.b;
> +    }
> +
> +  return sum;
> +}
> +
> +int main()
> +{
> +  struct FOO foos[8];
> +
> +  __builtin_memset (foos, 0, sizeof (foos));
> +  foos[0].a = __DBL_MAX__;
> +  foos[0].b = 5;
> +  foos[0].c = -__DBL_MAX__;
> +
> +  if (sum_8_foos (foos) != 5)
> +    __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 4481d43e3d7..8cb1ac1f319 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -4682,14 +4682,28 @@ vect_optimize_slp_pass::start_choosing_layouts ()
>    m_partition_layout_costs.safe_grow_cleared (m_partitions.length ()
>  					      * m_perms.length ());
>  
> -  /* We have to mark outgoing permutations facing non-reduction graph
> -     entries that are not represented as to be materialized.  */
> +  /* We have to mark outgoing permutations facing non-associating-reduction
> +     graph entries that are not represented as to be materialized.
> +     slp_inst_kind_bb_reduc currently only covers associatable reductions.  */
>    for (slp_instance instance : m_vinfo->slp_instances)
>      if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor)
>        {
>  	unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
>  	m_partitions[m_vertices[node_i].partition].layout = 0;
>        }
> +    else if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_reduc_chain)
> +      {
> +	stmt_vec_info stmt_info
> +	  = SLP_TREE_REPRESENTATIVE (SLP_INSTANCE_TREE (instance));
> +	stmt_vec_info reduc_info = info_for_reduction (m_vinfo, stmt_info);
> +	if (needs_fold_left_reduction_p (TREE_TYPE
> +					   (gimple_get_lhs (stmt_info->stmt)),
> +					 STMT_VINFO_REDUC_CODE (reduc_info)))
> +	  {
> +	    unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
> +	    m_partitions[m_vertices[node_i].partition].layout = 0;
> +	  }
> +      }
>  
>    /* Check which layouts each node and partition can handle.  Calculate the
>       weights associated with inserting layout changes on edges.  */
