[PATCH] Fix latent SLP bugs and a memleak

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Fix latent SLP bugs and a memleak
@ 2019-01-21 14:52 Richard Biener
  2019-01-21 21:56 ` Richard Sandiford
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2019-01-21 14:52 UTC (permalink / raw)
  To: gcc-patches


When investigating some issues I ran into two SLP op gathering issues.
*You may not ask for SLP vector def ops in the wrong order*

Both vectorizable_comparison and vectorize_fold_left_reduction fall
into this trap.

I've also fixed a memleak uncovered by valgrind that I ran on the
above fix and a testcase.  I'm too lazy to try generate a testcase
that also fails with unpatched trunk but I think this qualifies
for this stage (at least the vectorizable_comparison one should
be easy to create).

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

From b036b1ac618d9deb0efff89051f6ba1036d84ea3 Mon Sep 17 00:00:00 2001
From: Richard Guenther <rguenther@suse.de>
Date: Mon, 21 Jan 2019 15:40:49 +0100
Subject: [PATCH] fix-slp-op-gatherings-and-memleak

	* tree-vect-loop.c (vect_analyze_loop_operations): Use
	auto_vec for cost vector to fix memleak.
	(vectorize_fold_left_reduction): Properly gather SLP defs.
	(vectorizable_comparison): Do not swap operands to properly
	gather SLP defs.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 6ec8e72cd99..50d19427e54 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1470,8 +1470,7 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
 
   DUMP_VECT_SCOPE ("vect_analyze_loop_operations");
 
-  stmt_vector_for_cost cost_vec;
-  cost_vec.create (2);
+  auto_vec<stmt_info_for_cost> cost_vec;
 
   for (i = 0; i < nbbs; i++)
     {
@@ -1581,7 +1580,6 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
     } /* bbs */
 
   add_stmt_costs (loop_vinfo->target_cost_data, &cost_vec);
-  cost_vec.release ();
 
   /* All operations in the loop are either irrelevant (deal with loop
      control, or dead), or only used outside the loop and can be moved
@@ -5742,8 +5740,14 @@ vectorize_fold_left_reduction (stmt_vec_info stmt_info,
   auto_vec<tree> vec_oprnds0;
   if (slp_node)
     {
-      vect_get_vec_defs (op0, NULL_TREE, stmt_info, &vec_oprnds0, NULL,
-			 slp_node);
+      auto_vec<vec<tree> > vec_defs (2);
+      auto_vec<tree> sops(2);
+      sops.quick_push (ops[0]);
+      sops.quick_push (ops[1]);
+      vect_get_slp_defs (sops, slp_node, &vec_defs);
+      vec_oprnds0.safe_splice (vec_defs[1 - reduc_index]);
+      vec_defs[0].release ();
+      vec_defs[1].release ();
       group_size = SLP_TREE_SCALAR_STMTS (slp_node).length ();
       scalar_dest_def_info = SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1];
     }
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index c4f645e6d7d..b93097d6a6f 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9274,6 +9274,7 @@ vectorizable_comparison (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
        BITOP2 (rhs1 BITOP1 rhs2) or
        rhs1 BITOP2 (BITOP1 rhs2)
      depending on bitop1 and bitop2 arity.  */
+  bool swap_p = false;
   if (VECTOR_BOOLEAN_TYPE_P (vectype))
     {
       if (code == GT_EXPR)
@@ -9290,15 +9291,13 @@ vectorizable_comparison (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 	{
 	  bitop1 = BIT_NOT_EXPR;
 	  bitop2 = BIT_AND_EXPR;
-	  std::swap (rhs1, rhs2);
-	  std::swap (dts[0], dts[1]);
+	  swap_p = true;
 	}
       else if (code == LE_EXPR)
 	{
 	  bitop1 = BIT_NOT_EXPR;
 	  bitop2 = BIT_IOR_EXPR;
-	  std::swap (rhs1, rhs2);
-	  std::swap (dts[0], dts[1]);
+	  swap_p = true;
 	}
       else
 	{
@@ -9365,6 +9364,8 @@ vectorizable_comparison (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 	      vect_get_slp_defs (ops, slp_node, &vec_defs);
 	      vec_oprnds1 = vec_defs.pop ();
 	      vec_oprnds0 = vec_defs.pop ();
+	      if (swap_p)
+		std::swap (vec_oprnds0, vec_oprnds1);
 	    }
 	  else
 	    {
@@ -9384,6 +9385,8 @@ vectorizable_comparison (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 
       if (!slp_node)
 	{
+	  if (swap_p)
+	    std::swap (vec_rhs1, vec_rhs2);
 	  vec_oprnds0.quick_push (vec_rhs1);
 	  vec_oprnds1.quick_push (vec_rhs2);
 	}

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Fix latent SLP bugs and a memleak
  2019-01-21 14:52 [PATCH] Fix latent SLP bugs and a memleak Richard Biener
@ 2019-01-21 21:56 ` Richard Sandiford
  2019-01-22  8:28   ` Richard Biener
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Sandiford @ 2019-01-21 21:56 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Richard Biener <rguenther@suse.de> writes:
> @@ -5742,8 +5740,14 @@ vectorize_fold_left_reduction (stmt_vec_info stmt_info,
>    auto_vec<tree> vec_oprnds0;
>    if (slp_node)
>      {
> -      vect_get_vec_defs (op0, NULL_TREE, stmt_info, &vec_oprnds0, NULL,
> -			 slp_node);
> +      auto_vec<vec<tree> > vec_defs (2);
> +      auto_vec<tree> sops(2);
> +      sops.quick_push (ops[0]);
> +      sops.quick_push (ops[1]);
> +      vect_get_slp_defs (sops, slp_node, &vec_defs);
> +      vec_oprnds0.safe_splice (vec_defs[1 - reduc_index]);
> +      vec_defs[0].release ();
> +      vec_defs[1].release ();
>        group_size = SLP_TREE_SCALAR_STMTS (slp_node).length ();
>        scalar_dest_def_info = SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1];
>      }

Ewww. :-)  Would be nice if it was easier to do the right thing.

But are you sure we want this for fold-left reductions?  The other
operand isn't supposed to be vectorised -- it stays a scalar even
in the vector loop.

Thanks,
Richard


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Fix latent SLP bugs and a memleak
  2019-01-21 21:56 ` Richard Sandiford
@ 2019-01-22  8:28   ` Richard Biener
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Biener @ 2019-01-22  8:28 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

On Mon, 21 Jan 2019, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > @@ -5742,8 +5740,14 @@ vectorize_fold_left_reduction (stmt_vec_info stmt_info,
> >    auto_vec<tree> vec_oprnds0;
> >    if (slp_node)
> >      {
> > -      vect_get_vec_defs (op0, NULL_TREE, stmt_info, &vec_oprnds0, NULL,
> > -			 slp_node);
> > +      auto_vec<vec<tree> > vec_defs (2);
> > +      auto_vec<tree> sops(2);
> > +      sops.quick_push (ops[0]);
> > +      sops.quick_push (ops[1]);
> > +      vect_get_slp_defs (sops, slp_node, &vec_defs);
> > +      vec_oprnds0.safe_splice (vec_defs[1 - reduc_index]);
> > +      vec_defs[0].release ();
> > +      vec_defs[1].release ();
> >        group_size = SLP_TREE_SCALAR_STMTS (slp_node).length ();
> >        scalar_dest_def_info = SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1];
> >      }
> 
> Ewww. :-)  Would be nice if it was easier to do the right thing.

Working on that ;)

> But are you sure we want this for fold-left reductions?  The other
> operand isn't supposed to be vectorised -- it stays a scalar even
> in the vector loop.

It seems to "work" for me.  The issue appears with MINUS_EXPR
fold-left reductions (you may have guessed that).  I guess
it's possible to create a SLP testcase even without patched trunk
(I have group_size == 1 SLP instances "enabled").

In the end vect_get_slp_defs will, if no actual vectorized stmts
are recorded, not push any to the defs vector which means
vec_defs[0] will remain empty.

Richard.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-01-22  8:28 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-21 14:52 [PATCH] Fix latent SLP bugs and a memleak Richard Biener
2019-01-21 21:56 ` Richard Sandiford
2019-01-22  8:28   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).