public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch V2] Fix SLP PR58135.
@ 2016-05-17 11:56 Kumar, Venkataramanan
  2016-05-17 12:09 ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Kumar, Venkataramanan @ 2016-05-17 11:56 UTC (permalink / raw)
  To: Richard Beiner (richard.guenther@gmail.com), gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1517 bytes --]

Hi Richard, 

I created the patch by passing -b option to git. Now the patch is more readable.

As per your suggestion I tried to fix the PR by splitting the SLP store group at vector boundary after the SLP tree is built.

Boot strap PASSED on x86_64.
Checked the patch with check_GNU_style.sh.

The gfortran.dg/pr46519-1.f test now does SLP vectorization. Hence it  generated 2 more vzeroupper.  
As recommended I adjusted the test case by adding -fno-tree-slp-vectorize to make it as expected after loop vectorization.

The following tests are now passing.

------ Snip-----
Tests that now work, but didn't before:

gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "basic block vectorized" 1

gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block vectorized" 1

New tests that PASS:

gcc.dg/vect/pr58135.c (test for excess errors) gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess errors)

------ Snip-----

ChangeLog

2016-05-14  Venkataramanan Kumar  <Venkataramanan.kumar@amd.com>
     PR tree-optimization/58135
    * tree-vect-slp.c:  When group size is not multiple of vector size, 
     allow splitting of store group at vector boundary. 

Test suite  ChangeLog
2016-05-14  Venkataramanan Kumar  <Venkataramanan.kumar@amd.com>
    * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL. 
    * gcc.dg/vect/pr58135.c:  Add new.
    * gfortran.dg/pr46519-1.f: Adjust test case.

The attached patch Ok for trunk?

Regards,
Venkat.


[-- Attachment #2: pr58135.diff.txt --]
[-- Type: text/plain, Size: 5890 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-19.c b/gcc/testsuite/gcc.dg/vect/bb-slp-19.c
index 42cd294..c282155 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-19.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-19.c
@@ -53,5 +53,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2"  { xfail *-*-* }  } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/pr58135.c b/gcc/testsuite/gcc.dg/vect/pr58135.c
new file mode 100644
index 0000000..ca25000
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr58135.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+int a[100];
+void foo ()
+{
+  a[0] = a[1] = a[2] = a[3] = a[4]= 0;
+}
+
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */
diff --git a/gcc/testsuite/gfortran.dg/pr46519-1.f b/gcc/testsuite/gfortran.dg/pr46519-1.f
index 51c64b8..46be9f5 100644
--- a/gcc/testsuite/gfortran.dg/pr46519-1.f
+++ b/gcc/testsuite/gfortran.dg/pr46519-1.f
@@ -1,5 +1,5 @@
 ! { dg-do compile { target i?86-*-* x86_64-*-* } }
-! { dg-options "-O3 -mavx -mvzeroupper -mtune=generic -dp" }
+! { dg-options "-O3 -mavx -mvzeroupper -fno-tree-slp-vectorize -mtune=generic -dp" }
 
       PROGRAM MG3XDEMO 
       INTEGER LM, NM, NV, NR, NIT
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index d713848..23a127f 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1754,18 +1754,6 @@ vect_analyze_slp_instance (vec_info *vinfo,
     }
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
-  /* Calculate the unrolling factor.  */
-  unrolling_factor = least_common_multiple (nunits, group_size) / group_size;
-  if (unrolling_factor != 1 && is_a <bb_vec_info> (vinfo))
-    {
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "Build SLP failed: unrolling required in basic"
-			 " block SLP\n");
-
-      return false;
-    }
-
   /* Create a node (a root of the SLP tree) for the packed grouped stores.  */
   scalar_stmts.create (group_size);
   next = stmt;
@@ -1801,21 +1789,43 @@ vect_analyze_slp_instance (vec_info *vinfo,
   /* Build the tree for the SLP instance.  */
   bool *matches = XALLOCAVEC (bool, group_size);
   unsigned npermutes = 0;
-  if ((node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
+
+  node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
 			      &max_nunits, &loads, matches, &npermutes,
-				   NULL, max_tree_size)) != NULL)
+			      NULL, max_tree_size);
+
+  if (node != NULL)
+    {
+      /* Calculate the unrolling factor.  */
+      unrolling_factor = least_common_multiple
+			  (nunits, group_size) / group_size;
+
+      if (is_a <bb_vec_info> (vinfo)
+	  && nunits < group_size
+	  && unrolling_factor != 1
+	  && is_a <bb_vec_info> (vinfo))
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "Build SLP failed: store group "
+			   "size not a multiple of the vector size "
+			   "in basic block SLP\n");
+	  /* Fatal mismatch.  */
+	  matches[nunits] = false;
+	}
+      else
 	{
 	  /* Calculate the unrolling factor based on the smallest type.  */
 	  if (max_nunits > nunits)
-        unrolling_factor = least_common_multiple (max_nunits, group_size)
-                           / group_size;
+	    unrolling_factor
+		= least_common_multiple (max_nunits, group_size)/group_size;
 
 	  if (unrolling_factor != 1 && is_a <bb_vec_info> (vinfo))
 	    {
 	      if (dump_enabled_p ())
-            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "Build SLP failed: unrolling required in basic"
-			     " block SLP\n");
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+				 vect_location,
+				 "Build SLP failed: unrolling "
+				 "required in basic block SLP\n");
 	      vect_free_slp_tree (node);
 	      loads.release ();
 	      return false;
@@ -1842,8 +1852,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
 		(vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (load_node)[0]));
 	      FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (load_node), j, load)
 		{
-	      int load_place
-		= vect_get_place_in_interleaving_chain (load, first_stmt);
+		  int load_place = vect_get_place_in_interleaving_chain
+				     (load, first_stmt);
 		  gcc_assert (load_place != -1);
 		  if (load_place != j)
 		    this_load_permuted = true;
@@ -1873,7 +1883,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
 		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 				       "Build SLP failed: unsupported load "
 				       "permutation ");
-                  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+		      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,
+					TDF_SLIM, stmt, 0);
 		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
 		    }
 		  vect_free_slp_instance (new_instance);
@@ -1881,7 +1892,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
 		}
 	    }
 
-      /* If the loads and stores can be handled with load/store-lane
+	  /* If the loads and stores can be handled with load/store-lan
 	     instructions do not generate this SLP instance.  */
 	  if (is_a <loop_vec_info> (vinfo)
 	      && loads_permuted
@@ -1893,7 +1904,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
 		  gimple *first_stmt = GROUP_FIRST_ELEMENT
 		    (vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (load_node)[0]));
 		  stmt_vec_info stmt_vinfo = vinfo_for_stmt (first_stmt);
-	      /* Use SLP for strided accesses (or if we can't load-lanes).  */
+		  /* Use SLP for strided accesses (or if we
+		     can't load-lanes).  */
 		  if (STMT_VINFO_STRIDED_P (stmt_vinfo)
 		    || ! vect_load_lanes_supported
 			  (STMT_VINFO_VECTYPE (stmt_vinfo),
@@ -1922,6 +1934,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
 
 	  return true;
 	}
+    }
 
   /* Failed to SLP.  */
   /* Free the allocated memory.  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Patch V2] Fix SLP PR58135.
  2016-05-17 11:56 [Patch V2] Fix SLP PR58135 Kumar, Venkataramanan
@ 2016-05-17 12:09 ` Richard Biener
  2016-05-18 15:30   ` Kumar, Venkataramanan
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2016-05-17 12:09 UTC (permalink / raw)
  To: Kumar, Venkataramanan; +Cc: gcc-patches

On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi Richard,
>
> I created the patch by passing -b option to git. Now the patch is more readable.
>
> As per your suggestion I tried to fix the PR by splitting the SLP store group at vector boundary after the SLP tree is built.
>
> Boot strap PASSED on x86_64.
> Checked the patch with check_GNU_style.sh.
>
> The gfortran.dg/pr46519-1.f test now does SLP vectorization. Hence it  generated 2 more vzeroupper.
> As recommended I adjusted the test case by adding -fno-tree-slp-vectorize to make it as expected after loop vectorization.
>
> The following tests are now passing.
>
> ------ Snip-----
> Tests that now work, but didn't before:
>
> gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "basic block vectorized" 1
>
> gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block vectorized" 1
>
> New tests that PASS:
>
> gcc.dg/vect/pr58135.c (test for excess errors) gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess errors)
>
> ------ Snip-----
>
> ChangeLog
>
> 2016-05-14  Venkataramanan Kumar  <Venkataramanan.kumar@amd.com>
>      PR tree-optimization/58135
>     * tree-vect-slp.c:  When group size is not multiple of vector size,
>      allow splitting of store group at vector boundary.
>
> Test suite  ChangeLog
> 2016-05-14  Venkataramanan Kumar  <Venkataramanan.kumar@amd.com>
>     * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL.
>     * gcc.dg/vect/pr58135.c:  Add new.
>     * gfortran.dg/pr46519-1.f: Adjust test case.
>
> The attached patch Ok for trunk?


Please avoid the excessive vertical space around the vect_build_slp_tree call.

+      /* Calculate the unrolling factor.  */
+      unrolling_factor = least_common_multiple
+                         (nunits, group_size) / group_size;
...
+      else
        {
          /* Calculate the unrolling factor based on the smallest type.  */
          if (max_nunits > nunits)
-        unrolling_factor = least_common_multiple (max_nunits, group_size)
-                           / group_size;
+           unrolling_factor
+               = least_common_multiple (max_nunits, group_size)/group_size;

please compute the "correct" unroll factor immediately and move the
"unrolling of BB required" error into the if() case by post-poning the
nunits < group_size check (and use max_nunits here).

+      if (is_a <bb_vec_info> (vinfo)
+         && nunits < group_size
+         && unrolling_factor != 1
+         && is_a <bb_vec_info> (vinfo))
+       {
+         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                          "Build SLP failed: store group "
+                          "size not a multiple of the vector size "
+                          "in basic block SLP\n");
+         /* Fatal mismatch.  */
+         matches[nunits] = false;

this is too pessimistic - you want to add the extra 'false' at
group_size / max_nunits * max_nunits.

It looks like you leak 'node' in the if () path as well.  You need

          vect_free_slp_tree (node);
          loads.release ();

thus treat it as a failure case.

Thanks,
Richard.

> Regards,
> Venkat.
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [Patch V2] Fix SLP PR58135.
  2016-05-17 12:09 ` Richard Biener
@ 2016-05-18 15:30   ` Kumar, Venkataramanan
  2016-05-19 10:38     ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Kumar, Venkataramanan @ 2016-05-18 15:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4096 bytes --]

Hi Richard,

> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Tuesday, May 17, 2016 5:40 PM
> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [Patch V2] Fix SLP PR58135.
> 
> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan
> <Venkataramanan.Kumar@amd.com> wrote:
> > Hi Richard,
> >
> > I created the patch by passing -b option to git. Now the patch is more
> readable.
> >
> > As per your suggestion I tried to fix the PR by splitting the SLP store group at
> vector boundary after the SLP tree is built.
> >
> > Boot strap PASSED on x86_64.
> > Checked the patch with check_GNU_style.sh.
> >
> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. Hence it
> generated 2 more vzeroupper.
> > As recommended I adjusted the test case by adding -fno-tree-slp-vectorize
> to make it as expected after loop vectorization.
> >
> > The following tests are now passing.
> >
> > ------ Snip-----
> > Tests that now work, but didn't before:
> >
> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects  scan-tree-dump-times
> > slp2 "basic block vectorized" 1
> >
> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block
> > vectorized" 1
> >
> > New tests that PASS:
> >
> > gcc.dg/vect/pr58135.c (test for excess errors) gcc.dg/vect/pr58135.c
> > -flto -ffat-lto-objects (test for excess errors)
> >
> > ------ Snip-----
> >
> > ChangeLog
> >
> > 2016-05-14  Venkataramanan Kumar
> <Venkataramanan.kumar@amd.com>
> >      PR tree-optimization/58135
> >     * tree-vect-slp.c:  When group size is not multiple of vector size,
> >      allow splitting of store group at vector boundary.
> >
> > Test suite  ChangeLog
> > 2016-05-14  Venkataramanan Kumar
> <Venkataramanan.kumar@amd.com>
> >     * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL.
> >     * gcc.dg/vect/pr58135.c:  Add new.
> >     * gfortran.dg/pr46519-1.f: Adjust test case.
> >
> > The attached patch Ok for trunk?
> 
> 
> Please avoid the excessive vertical space around the vect_build_slp_tree call.
Yes fixed in the attached patch.
> 
> +      /* Calculate the unrolling factor.  */
> +      unrolling_factor = least_common_multiple
> +                         (nunits, group_size) / group_size;
> ...
> +      else
>         {
>           /* Calculate the unrolling factor based on the smallest type.  */
>           if (max_nunits > nunits)
> -        unrolling_factor = least_common_multiple (max_nunits, group_size)
> -                           / group_size;
> +           unrolling_factor
> +               = least_common_multiple (max_nunits,
> + group_size)/group_size;
> 
> please compute the "correct" unroll factor immediately and move the
> "unrolling of BB required" error into the if() case by post-poning the nunits <
> group_size check (and use max_nunits here).
> 
Yes fixed in the attached patch.

> +      if (is_a <bb_vec_info> (vinfo)
> +         && nunits < group_size
> +         && unrolling_factor != 1
> +         && is_a <bb_vec_info> (vinfo))
> +       {
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "Build SLP failed: store group "
> +                          "size not a multiple of the vector size "
> +                          "in basic block SLP\n");
> +         /* Fatal mismatch.  */
> +         matches[nunits] = false;
> 
> this is too pessimistic - you want to add the extra 'false' at group_size /
> max_nunits * max_nunits.
Yes fixed in attached patch. 

> 
> It looks like you leak 'node' in the if () path as well.  You need
> 
>           vect_free_slp_tree (node);
>           loads.release ();
> 
> thus treat it as a failure case.

Yes fixed. I added an else part before scalar_stmts.release call for the case when SLP tree is not built. This avoids double freeing.
Bootstrapped and reg tested on X86_64.

Ok for trunk ?
> 
> Thanks,
> Richard.
> 
> > Regards,
> > Venkat.
> >

Regards,
Venkat.

[-- Attachment #2: pr58135.diff.txt --]
[-- Type: text/plain, Size: 6302 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-19.c b/gcc/testsuite/gcc.dg/vect/bb-slp-19.c
index 42cd294..c282155 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-19.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-19.c
@@ -53,5 +53,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2"  { xfail *-*-* }  } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/pr58135.c b/gcc/testsuite/gcc.dg/vect/pr58135.c
new file mode 100644
index 0000000..ca25000
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr58135.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+int a[100];
+void foo ()
+{
+  a[0] = a[1] = a[2] = a[3] = a[4]= 0;
+}
+
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */
diff --git a/gcc/testsuite/gfortran.dg/pr46519-1.f b/gcc/testsuite/gfortran.dg/pr46519-1.f
index 51c64b8..46be9f5 100644
--- a/gcc/testsuite/gfortran.dg/pr46519-1.f
+++ b/gcc/testsuite/gfortran.dg/pr46519-1.f
@@ -1,5 +1,5 @@
 ! { dg-do compile { target i?86-*-* x86_64-*-* } }
-! { dg-options "-O3 -mavx -mvzeroupper -mtune=generic -dp" }
+! { dg-options "-O3 -mavx -mvzeroupper -fno-tree-slp-vectorize -mtune=generic -dp" }
 
       PROGRAM MG3XDEMO 
       INTEGER LM, NM, NV, NR, NIT
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index d713848..3babfcd 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1754,18 +1754,6 @@ vect_analyze_slp_instance (vec_info *vinfo,
     }
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
-  /* Calculate the unrolling factor.  */
-  unrolling_factor = least_common_multiple (nunits, group_size) / group_size;
-  if (unrolling_factor != 1 && is_a <bb_vec_info> (vinfo))
-    {
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "Build SLP failed: unrolling required in basic"
-			 " block SLP\n");
-
-      return false;
-    }
-
   /* Create a node (a root of the SLP tree) for the packed grouped stores.  */
   scalar_stmts.create (group_size);
   next = stmt;
@@ -1801,26 +1789,44 @@ vect_analyze_slp_instance (vec_info *vinfo,
   /* Build the tree for the SLP instance.  */
   bool *matches = XALLOCAVEC (bool, group_size);
   unsigned npermutes = 0;
-  if ((node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
+  node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
 			      &max_nunits, &loads, matches, &npermutes,
-				   NULL, max_tree_size)) != NULL)
+			      NULL, max_tree_size);
+  if (node != NULL)
     {
-      /* Calculate the unrolling factor based on the smallest type.  */
-      if (max_nunits > nunits)
-        unrolling_factor = least_common_multiple (max_nunits, group_size)
-                           / group_size;
+      /*Calculate the unrolling factor based on the smallest type.  */
+      unrolling_factor
+	= least_common_multiple (max_nunits, group_size)/group_size;
 
-      if (unrolling_factor != 1 && is_a <bb_vec_info> (vinfo))
+      if (max_nunits > nunits
+	  && unrolling_factor != 1
+	  && is_a <bb_vec_info> (vinfo))
 	{
 	  if (dump_enabled_p ())
-            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "Build SLP failed: unrolling required in basic"
-			     " block SLP\n");
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+			       vect_location,
+			       "Build SLP failed: unrolling "
+			       "required in basic block SLP\n");
 	  vect_free_slp_tree (node);
 	  loads.release ();
 	  return false;
 	}
 
+      if (is_a <bb_vec_info> (vinfo)
+	  && max_nunits < group_size
+	  && unrolling_factor != 1)
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "Build SLP failed: store group "
+			   "size not a multiple of the vector size "
+			   "in basic block SLP\n");
+	  /* Fatal mismatch.  */
+	  matches[group_size/max_nunits * max_nunits] = false;
+	  vect_free_slp_tree (node);
+	  loads.release ();
+	}
+      else
+	{
 	  /* Create a new SLP instance.  */
 	  new_instance = XNEW (struct _slp_instance);
 	  SLP_INSTANCE_TREE (new_instance) = node;
@@ -1842,8 +1848,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
 		(vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (load_node)[0]));
 	      FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (load_node), j, load)
 		{
-	      int load_place
-		= vect_get_place_in_interleaving_chain (load, first_stmt);
+		  int load_place = vect_get_place_in_interleaving_chain
+				     (load, first_stmt);
 		  gcc_assert (load_place != -1);
 		  if (load_place != j)
 		    this_load_permuted = true;
@@ -1873,7 +1879,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
 		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 				       "Build SLP failed: unsupported load "
 				       "permutation ");
-                  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+		      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,
+					TDF_SLIM, stmt, 0);
 		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
 		    }
 		  vect_free_slp_instance (new_instance);
@@ -1881,7 +1888,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
 		}
 	    }
 
-      /* If the loads and stores can be handled with load/store-lane
+	  /* If the loads and stores can be handled with load/store-lan
 	     instructions do not generate this SLP instance.  */
 	  if (is_a <loop_vec_info> (vinfo)
 	      && loads_permuted
@@ -1893,7 +1900,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
 		  gimple *first_stmt = GROUP_FIRST_ELEMENT
 		    (vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (load_node)[0]));
 		  stmt_vec_info stmt_vinfo = vinfo_for_stmt (first_stmt);
-	      /* Use SLP for strided accesses (or if we can't load-lanes).  */
+		  /* Use SLP for strided accesses (or if we
+		     can't load-lanes).  */
 		  if (STMT_VINFO_STRIDED_P (stmt_vinfo)
 		    || ! vect_load_lanes_supported
 			  (STMT_VINFO_VECTYPE (stmt_vinfo),
@@ -1922,11 +1930,14 @@ vect_analyze_slp_instance (vec_info *vinfo,
 
 	  return true;
 	}
-
+    }
+  else
+    {
       /* Failed to SLP.  */
       /* Free the allocated memory.  */
       scalar_stmts.release ();
       loads.release ();
+    }
 
   /* For basic block SLP, try to break the group up into multiples of the
      vector size.  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Patch V2] Fix SLP PR58135.
  2016-05-18 15:30   ` Kumar, Venkataramanan
@ 2016-05-19 10:38     ` Richard Biener
  2016-05-23  9:55       ` Kumar, Venkataramanan
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2016-05-19 10:38 UTC (permalink / raw)
  To: Kumar, Venkataramanan; +Cc: gcc-patches

On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi Richard,
>
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: Tuesday, May 17, 2016 5:40 PM
>> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [Patch V2] Fix SLP PR58135.
>>
>> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan
>> <Venkataramanan.Kumar@amd.com> wrote:
>> > Hi Richard,
>> >
>> > I created the patch by passing -b option to git. Now the patch is more
>> readable.
>> >
>> > As per your suggestion I tried to fix the PR by splitting the SLP store group at
>> vector boundary after the SLP tree is built.
>> >
>> > Boot strap PASSED on x86_64.
>> > Checked the patch with check_GNU_style.sh.
>> >
>> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. Hence it
>> generated 2 more vzeroupper.
>> > As recommended I adjusted the test case by adding -fno-tree-slp-vectorize
>> to make it as expected after loop vectorization.
>> >
>> > The following tests are now passing.
>> >
>> > ------ Snip-----
>> > Tests that now work, but didn't before:
>> >
>> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects  scan-tree-dump-times
>> > slp2 "basic block vectorized" 1
>> >
>> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block
>> > vectorized" 1
>> >
>> > New tests that PASS:
>> >
>> > gcc.dg/vect/pr58135.c (test for excess errors) gcc.dg/vect/pr58135.c
>> > -flto -ffat-lto-objects (test for excess errors)
>> >
>> > ------ Snip-----
>> >
>> > ChangeLog
>> >
>> > 2016-05-14  Venkataramanan Kumar
>> <Venkataramanan.kumar@amd.com>
>> >      PR tree-optimization/58135
>> >     * tree-vect-slp.c:  When group size is not multiple of vector size,
>> >      allow splitting of store group at vector boundary.
>> >
>> > Test suite  ChangeLog
>> > 2016-05-14  Venkataramanan Kumar
>> <Venkataramanan.kumar@amd.com>
>> >     * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL.
>> >     * gcc.dg/vect/pr58135.c:  Add new.
>> >     * gfortran.dg/pr46519-1.f: Adjust test case.
>> >
>> > The attached patch Ok for trunk?
>>
>>
>> Please avoid the excessive vertical space around the vect_build_slp_tree call.
> Yes fixed in the attached patch.
>>
>> +      /* Calculate the unrolling factor.  */
>> +      unrolling_factor = least_common_multiple
>> +                         (nunits, group_size) / group_size;
>> ...
>> +      else
>>         {
>>           /* Calculate the unrolling factor based on the smallest type.  */
>>           if (max_nunits > nunits)
>> -        unrolling_factor = least_common_multiple (max_nunits, group_size)
>> -                           / group_size;
>> +           unrolling_factor
>> +               = least_common_multiple (max_nunits,
>> + group_size)/group_size;
>>
>> please compute the "correct" unroll factor immediately and move the
>> "unrolling of BB required" error into the if() case by post-poning the nunits <
>> group_size check (and use max_nunits here).
>>
> Yes fixed in the attached patch.
>
>> +      if (is_a <bb_vec_info> (vinfo)
>> +         && nunits < group_size
>> +         && unrolling_factor != 1
>> +         && is_a <bb_vec_info> (vinfo))
>> +       {
>> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +                          "Build SLP failed: store group "
>> +                          "size not a multiple of the vector size "
>> +                          "in basic block SLP\n");
>> +         /* Fatal mismatch.  */
>> +         matches[nunits] = false;
>>
>> this is too pessimistic - you want to add the extra 'false' at group_size /
>> max_nunits * max_nunits.
> Yes fixed in attached patch.
>
>>
>> It looks like you leak 'node' in the if () path as well.  You need
>>
>>           vect_free_slp_tree (node);
>>           loads.release ();
>>
>> thus treat it as a failure case.
>
> Yes fixed. I added an else part before scalar_stmts.release call for the case when SLP tree is not built. This avoids double freeing.
> Bootstrapped and reg tested on X86_64.
>
> Ok for trunk ?

+      /*Calculate the unrolling factor based on the smallest type.  */
+      unrolling_factor

Space after /* missing.

+      unrolling_factor
+       = least_common_multiple (max_nunits, group_size)/group_size;

spaces before/after the '/'

+      if (max_nunits > nunits
+         && unrolling_factor != 1
+         && is_a <bb_vec_info> (vinfo))
        {
          if (dump_enabled_p ())
-            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                            "Build SLP failed: unrolling required in basic"
-                            " block SLP\n");
+             dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+                              vect_location,
+                              "Build SLP failed: unrolling "
+                              "required in basic block SLP\n");
          vect_free_slp_tree (node);
          loads.release ();
          return false;
        }

+      if (is_a <bb_vec_info> (vinfo)
+         && max_nunits < group_size
+         && unrolling_factor != 1)

please merge these.  I think you have a not covered case of max_nunits
== nunits, max_nunits > group_size.
So do

    if (unrolling_factor != 1
        && is_a <...>)
     {
        if (max_nunits >= group_size)
          {
             first error
          }
        ... signal fatal mismatch instead...
     }
    else
     {

Ok with that change.

Thanks,
Richard.

>>
>> Thanks,
>> Richard.
>>
>> > Regards,
>> > Venkat.
>> >
>
> Regards,
> Venkat.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [Patch V2] Fix SLP PR58135.
  2016-05-19 10:38     ` Richard Biener
@ 2016-05-23  9:55       ` Kumar, Venkataramanan
  2016-05-24 16:17         ` Christophe Lyon
  0 siblings, 1 reply; 8+ messages in thread
From: Kumar, Venkataramanan @ 2016-05-23  9:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Hi Richard,

> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Thursday, May 19, 2016 4:08 PM
> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [Patch V2] Fix SLP PR58135.
> 
> On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan
> <Venkataramanan.Kumar@amd.com> wrote:
> > Hi Richard,
> >
> >> -----Original Message-----
> >> From: Richard Biener [mailto:richard.guenther@gmail.com]
> >> Sent: Tuesday, May 17, 2016 5:40 PM
> >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
> >> Cc: gcc-patches@gcc.gnu.org
> >> Subject: Re: [Patch V2] Fix SLP PR58135.
> >>
> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan
> >> <Venkataramanan.Kumar@amd.com> wrote:
> >> > Hi Richard,
> >> >
> >> > I created the patch by passing -b option to git. Now the patch is
> >> > more
> >> readable.
> >> >
> >> > As per your suggestion I tried to fix the PR by splitting the SLP
> >> > store group at
> >> vector boundary after the SLP tree is built.
> >> >
> >> > Boot strap PASSED on x86_64.
> >> > Checked the patch with check_GNU_style.sh.
> >> >
> >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. Hence
> >> > it
> >> generated 2 more vzeroupper.
> >> > As recommended I adjusted the test case by adding
> >> > -fno-tree-slp-vectorize
> >> to make it as expected after loop vectorization.
> >> >
> >> > The following tests are now passing.
> >> >
> >> > ------ Snip-----
> >> > Tests that now work, but didn't before:
> >> >
> >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects
> >> > scan-tree-dump-times
> >> > slp2 "basic block vectorized" 1
> >> >
> >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block
> >> > vectorized" 1
> >> >
> >> > New tests that PASS:
> >> >
> >> > gcc.dg/vect/pr58135.c (test for excess errors)
> >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess
> >> > errors)
> >> >
> >> > ------ Snip-----
> >> >
> >> > ChangeLog
> >> >
> >> > 2016-05-14  Venkataramanan Kumar
> >> <Venkataramanan.kumar@amd.com>
> >> >      PR tree-optimization/58135
> >> >     * tree-vect-slp.c:  When group size is not multiple of vector size,
> >> >      allow splitting of store group at vector boundary.
> >> >
> >> > Test suite  ChangeLog
> >> > 2016-05-14  Venkataramanan Kumar
> >> <Venkataramanan.kumar@amd.com>
> >> >     * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL.
> >> >     * gcc.dg/vect/pr58135.c:  Add new.
> >> >     * gfortran.dg/pr46519-1.f: Adjust test case.
> >> >
> >> > The attached patch Ok for trunk?
> >>
> >>
> >> Please avoid the excessive vertical space around the vect_build_slp_tree
> call.
> > Yes fixed in the attached patch.
> >>
> >> +      /* Calculate the unrolling factor.  */
> >> +      unrolling_factor = least_common_multiple
> >> +                         (nunits, group_size) / group_size;
> >> ...
> >> +      else
> >>         {
> >>           /* Calculate the unrolling factor based on the smallest type.  */
> >>           if (max_nunits > nunits)
> >> -        unrolling_factor = least_common_multiple (max_nunits, group_size)
> >> -                           / group_size;
> >> +           unrolling_factor
> >> +               = least_common_multiple (max_nunits,
> >> + group_size)/group_size;
> >>
> >> please compute the "correct" unroll factor immediately and move the
> >> "unrolling of BB required" error into the if() case by post-poning
> >> the nunits < group_size check (and use max_nunits here).
> >>
> > Yes fixed in the attached patch.
> >
> >> +      if (is_a <bb_vec_info> (vinfo)
> >> +         && nunits < group_size
> >> +         && unrolling_factor != 1
> >> +         && is_a <bb_vec_info> (vinfo))
> >> +       {
> >> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >> +                          "Build SLP failed: store group "
> >> +                          "size not a multiple of the vector size "
> >> +                          "in basic block SLP\n");
> >> +         /* Fatal mismatch.  */
> >> +         matches[nunits] = false;
> >>
> >> this is too pessimistic - you want to add the extra 'false' at
> >> group_size / max_nunits * max_nunits.
> > Yes fixed in attached patch.
> >
> >>
> >> It looks like you leak 'node' in the if () path as well.  You need
> >>
> >>           vect_free_slp_tree (node);
> >>           loads.release ();
> >>
> >> thus treat it as a failure case.
> >
> > Yes fixed. I added an else part before scalar_stmts.release call for the case
> when SLP tree is not built. This avoids double freeing.
> > Bootstrapped and reg tested on X86_64.
> >
> > Ok for trunk ?
> 
> +      /*Calculate the unrolling factor based on the smallest type.  */
> +      unrolling_factor
> 
> Space after /* missing.
> 
> +      unrolling_factor
> +       = least_common_multiple (max_nunits, group_size)/group_size;
> 
> spaces before/after the '/'
> 
> +      if (max_nunits > nunits
> +         && unrolling_factor != 1
> +         && is_a <bb_vec_info> (vinfo))
>         {
>           if (dump_enabled_p ())
> -            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                            "Build SLP failed: unrolling required in basic"
> -                            " block SLP\n");
> +             dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +                              vect_location,
> +                              "Build SLP failed: unrolling "
> +                              "required in basic block SLP\n");
>           vect_free_slp_tree (node);
>           loads.release ();
>           return false;
>         }
> 
> +      if (is_a <bb_vec_info> (vinfo)
> +         && max_nunits < group_size
> +         && unrolling_factor != 1)
> 
> please merge these.  I think you have a not covered case of max_nunits ==
> nunits, max_nunits > group_size.
> So do
> 
>     if (unrolling_factor != 1
>         && is_a <...>)
>      {
>         if (max_nunits >= group_size)
>           {
>              first error
>           }
>         ... signal fatal mismatch instead...
>      }
>     else
>      {
> 
> Ok with that change.

Thank you for the review. I updated and patch and up streamed it to trunk.
Ref: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=236582

Regards,
Venkat.

> 
> Thanks,
> Richard.
> 
> >>
> >> Thanks,
> >> Richard.
> >>
> >> > Regards,
> >> > Venkat.
> >> >
> >
> > Regards,
> > Venkat.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Patch V2] Fix SLP PR58135.
  2016-05-23  9:55       ` Kumar, Venkataramanan
@ 2016-05-24 16:17         ` Christophe Lyon
  2016-05-25  8:21           ` Kumar, Venkataramanan
  0 siblings, 1 reply; 8+ messages in thread
From: Christophe Lyon @ 2016-05-24 16:17 UTC (permalink / raw)
  To: Kumar, Venkataramanan; +Cc: Richard Biener, gcc-patches

Hi Venkat,


On 23 May 2016 at 11:54, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi Richard,
>
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: Thursday, May 19, 2016 4:08 PM
>> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [Patch V2] Fix SLP PR58135.
>>
>> On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan
>> <Venkataramanan.Kumar@amd.com> wrote:
>> > Hi Richard,
>> >
>> >> -----Original Message-----
>> >> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> >> Sent: Tuesday, May 17, 2016 5:40 PM
>> >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
>> >> Cc: gcc-patches@gcc.gnu.org
>> >> Subject: Re: [Patch V2] Fix SLP PR58135.
>> >>
>> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan
>> >> <Venkataramanan.Kumar@amd.com> wrote:
>> >> > Hi Richard,
>> >> >
>> >> > I created the patch by passing -b option to git. Now the patch is
>> >> > more
>> >> readable.
>> >> >
>> >> > As per your suggestion I tried to fix the PR by splitting the SLP
>> >> > store group at
>> >> vector boundary after the SLP tree is built.
>> >> >
>> >> > Boot strap PASSED on x86_64.
>> >> > Checked the patch with check_GNU_style.sh.
>> >> >
>> >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. Hence
>> >> > it
>> >> generated 2 more vzeroupper.
>> >> > As recommended I adjusted the test case by adding
>> >> > -fno-tree-slp-vectorize
>> >> to make it as expected after loop vectorization.
>> >> >
>> >> > The following tests are now passing.
>> >> >
>> >> > ------ Snip-----
>> >> > Tests that now work, but didn't before:
>> >> >
>> >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects
>> >> > scan-tree-dump-times
>> >> > slp2 "basic block vectorized" 1
>> >> >
>> >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block
>> >> > vectorized" 1
>> >> >
>> >> > New tests that PASS:
>> >> >
>> >> > gcc.dg/vect/pr58135.c (test for excess errors)
>> >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess
>> >> > errors)
>> >> >
>> >> > ------ Snip-----
>> >> >
>> >> > ChangeLog
>> >> >
>> >> > 2016-05-14  Venkataramanan Kumar
>> >> <Venkataramanan.kumar@amd.com>
>> >> >      PR tree-optimization/58135
>> >> >     * tree-vect-slp.c:  When group size is not multiple of vector size,
>> >> >      allow splitting of store group at vector boundary.
>> >> >
>> >> > Test suite  ChangeLog
>> >> > 2016-05-14  Venkataramanan Kumar
>> >> <Venkataramanan.kumar@amd.com>
>> >> >     * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL.
>> >> >     * gcc.dg/vect/pr58135.c:  Add new.
>> >> >     * gfortran.dg/pr46519-1.f: Adjust test case.
>> >> >
>> >> > The attached patch Ok for trunk?
>> >>
>> >>
>> >> Please avoid the excessive vertical space around the vect_build_slp_tree
>> call.
>> > Yes fixed in the attached patch.
>> >>
>> >> +      /* Calculate the unrolling factor.  */
>> >> +      unrolling_factor = least_common_multiple
>> >> +                         (nunits, group_size) / group_size;
>> >> ...
>> >> +      else
>> >>         {
>> >>           /* Calculate the unrolling factor based on the smallest type.  */
>> >>           if (max_nunits > nunits)
>> >> -        unrolling_factor = least_common_multiple (max_nunits, group_size)
>> >> -                           / group_size;
>> >> +           unrolling_factor
>> >> +               = least_common_multiple (max_nunits,
>> >> + group_size)/group_size;
>> >>
>> >> please compute the "correct" unroll factor immediately and move the
>> >> "unrolling of BB required" error into the if() case by post-poning
>> >> the nunits < group_size check (and use max_nunits here).
>> >>
>> > Yes fixed in the attached patch.
>> >
>> >> +      if (is_a <bb_vec_info> (vinfo)
>> >> +         && nunits < group_size
>> >> +         && unrolling_factor != 1
>> >> +         && is_a <bb_vec_info> (vinfo))
>> >> +       {
>> >> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >> +                          "Build SLP failed: store group "
>> >> +                          "size not a multiple of the vector size "
>> >> +                          "in basic block SLP\n");
>> >> +         /* Fatal mismatch.  */
>> >> +         matches[nunits] = false;
>> >>
>> >> this is too pessimistic - you want to add the extra 'false' at
>> >> group_size / max_nunits * max_nunits.
>> > Yes fixed in attached patch.
>> >
>> >>
>> >> It looks like you leak 'node' in the if () path as well.  You need
>> >>
>> >>           vect_free_slp_tree (node);
>> >>           loads.release ();
>> >>
>> >> thus treat it as a failure case.
>> >
>> > Yes fixed. I added an else part before scalar_stmts.release call for the case
>> when SLP tree is not built. This avoids double freeing.
>> > Bootstrapped and reg tested on X86_64.
>> >

This patch is causing regressions on armeb:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/236591/report-build-info.html
The following fortran tests now fail at runtime:

  gfortran.dg/c_f_pointer_logical.f03   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
  gfortran.dg/c_f_pointer_logical.f03   -O3 -g  execution test
  gfortran.dg/intrinsic_pack_1.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
  gfortran.dg/intrinsic_pack_1.f90   -O3 -g  execution test

Can you have a look?

Thanks,

Christophe.



>> > Ok for trunk ?
>>
>> +      /*Calculate the unrolling factor based on the smallest type.  */
>> +      unrolling_factor
>>
>> Space after /* missing.
>>
>> +      unrolling_factor
>> +       = least_common_multiple (max_nunits, group_size)/group_size;
>>
>> spaces before/after the '/'
>>
>> +      if (max_nunits > nunits
>> +         && unrolling_factor != 1
>> +         && is_a <bb_vec_info> (vinfo))
>>         {
>>           if (dump_enabled_p ())
>> -            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> -                            "Build SLP failed: unrolling required in basic"
>> -                            " block SLP\n");
>> +             dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>> +                              vect_location,
>> +                              "Build SLP failed: unrolling "
>> +                              "required in basic block SLP\n");
>>           vect_free_slp_tree (node);
>>           loads.release ();
>>           return false;
>>         }
>>
>> +      if (is_a <bb_vec_info> (vinfo)
>> +         && max_nunits < group_size
>> +         && unrolling_factor != 1)
>>
>> please merge these.  I think you have a not covered case of max_nunits ==
>> nunits, max_nunits > group_size.
>> So do
>>
>>     if (unrolling_factor != 1
>>         && is_a <...>)
>>      {
>>         if (max_nunits >= group_size)
>>           {
>>              first error
>>           }
>>         ... signal fatal mismatch instead...
>>      }
>>     else
>>      {
>>
>> Ok with that change.
>
> Thank you for the review. I updated and patch and up streamed it to trunk.
> Ref: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=236582
>
> Regards,
> Venkat.
>
>>
>> Thanks,
>> Richard.
>>
>> >>
>> >> Thanks,
>> >> Richard.
>> >>
>> >> > Regards,
>> >> > Venkat.
>> >> >
>> >
>> > Regards,
>> > Venkat.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [Patch V2] Fix SLP PR58135.
  2016-05-24 16:17         ` Christophe Lyon
@ 2016-05-25  8:21           ` Kumar, Venkataramanan
  2016-05-25 10:40             ` Christophe Lyon
  0 siblings, 1 reply; 8+ messages in thread
From: Kumar, Venkataramanan @ 2016-05-25  8:21 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Richard Biener, gcc-patches

Hi Christophe, 

> -----Original Message-----
> From: Christophe Lyon [mailto:christophe.lyon@linaro.org]
> Sent: Tuesday, May 24, 2016 8:45 PM
> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
> Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch V2] Fix SLP PR58135.
> 
> Hi Venkat,
> 
> 
> On 23 May 2016 at 11:54, Kumar, Venkataramanan
> <Venkataramanan.Kumar@amd.com> wrote:
> > Hi Richard,
> >
> >> -----Original Message-----
> >> From: Richard Biener [mailto:richard.guenther@gmail.com]
> >> Sent: Thursday, May 19, 2016 4:08 PM
> >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
> >> Cc: gcc-patches@gcc.gnu.org
> >> Subject: Re: [Patch V2] Fix SLP PR58135.
> >>
> >> On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan
> >> <Venkataramanan.Kumar@amd.com> wrote:
> >> > Hi Richard,
> >> >
> >> >> -----Original Message-----
> >> >> From: Richard Biener [mailto:richard.guenther@gmail.com]
> >> >> Sent: Tuesday, May 17, 2016 5:40 PM
> >> >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
> >> >> Cc: gcc-patches@gcc.gnu.org
> >> >> Subject: Re: [Patch V2] Fix SLP PR58135.
> >> >>
> >> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan
> >> >> <Venkataramanan.Kumar@amd.com> wrote:
> >> >> > Hi Richard,
> >> >> >
> >> >> > I created the patch by passing -b option to git. Now the patch
> >> >> > is more
> >> >> readable.
> >> >> >
> >> >> > As per your suggestion I tried to fix the PR by splitting the
> >> >> > SLP store group at
> >> >> vector boundary after the SLP tree is built.
> >> >> >
> >> >> > Boot strap PASSED on x86_64.
> >> >> > Checked the patch with check_GNU_style.sh.
> >> >> >
> >> >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization.
> >> >> > Hence it
> >> >> generated 2 more vzeroupper.
> >> >> > As recommended I adjusted the test case by adding
> >> >> > -fno-tree-slp-vectorize
> >> >> to make it as expected after loop vectorization.
> >> >> >
> >> >> > The following tests are now passing.
> >> >> >
> >> >> > ------ Snip-----
> >> >> > Tests that now work, but didn't before:
> >> >> >
> >> >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects
> >> >> > scan-tree-dump-times
> >> >> > slp2 "basic block vectorized" 1
> >> >> >
> >> >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block
> >> >> > vectorized" 1
> >> >> >
> >> >> > New tests that PASS:
> >> >> >
> >> >> > gcc.dg/vect/pr58135.c (test for excess errors)
> >> >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess
> >> >> > errors)
> >> >> >
> >> >> > ------ Snip-----
> >> >> >
> >> >> > ChangeLog
> >> >> >
> >> >> > 2016-05-14  Venkataramanan Kumar
> >> >> <Venkataramanan.kumar@amd.com>
> >> >> >      PR tree-optimization/58135
> >> >> >     * tree-vect-slp.c:  When group size is not multiple of vector size,
> >> >> >      allow splitting of store group at vector boundary.
> >> >> >
> >> >> > Test suite  ChangeLog
> >> >> > 2016-05-14  Venkataramanan Kumar
> >> >> <Venkataramanan.kumar@amd.com>
> >> >> >     * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL.
> >> >> >     * gcc.dg/vect/pr58135.c:  Add new.
> >> >> >     * gfortran.dg/pr46519-1.f: Adjust test case.
> >> >> >
> >> >> > The attached patch Ok for trunk?
> >> >>
> >> >>
> >> >> Please avoid the excessive vertical space around the
> >> >> vect_build_slp_tree
> >> call.
> >> > Yes fixed in the attached patch.
> >> >>
> >> >> +      /* Calculate the unrolling factor.  */
> >> >> +      unrolling_factor = least_common_multiple
> >> >> +                         (nunits, group_size) / group_size;
> >> >> ...
> >> >> +      else
> >> >>         {
> >> >>           /* Calculate the unrolling factor based on the smallest type.  */
> >> >>           if (max_nunits > nunits)
> >> >> -        unrolling_factor = least_common_multiple (max_nunits,
> group_size)
> >> >> -                           / group_size;
> >> >> +           unrolling_factor
> >> >> +               = least_common_multiple (max_nunits,
> >> >> + group_size)/group_size;
> >> >>
> >> >> please compute the "correct" unroll factor immediately and move
> >> >> the "unrolling of BB required" error into the if() case by
> >> >> post-poning the nunits < group_size check (and use max_nunits here).
> >> >>
> >> > Yes fixed in the attached patch.
> >> >
> >> >> +      if (is_a <bb_vec_info> (vinfo)
> >> >> +         && nunits < group_size
> >> >> +         && unrolling_factor != 1
> >> >> +         && is_a <bb_vec_info> (vinfo))
> >> >> +       {
> >> >> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >> >> +                          "Build SLP failed: store group "
> >> >> +                          "size not a multiple of the vector size "
> >> >> +                          "in basic block SLP\n");
> >> >> +         /* Fatal mismatch.  */
> >> >> +         matches[nunits] = false;
> >> >>
> >> >> this is too pessimistic - you want to add the extra 'false' at
> >> >> group_size / max_nunits * max_nunits.
> >> > Yes fixed in attached patch.
> >> >
> >> >>
> >> >> It looks like you leak 'node' in the if () path as well.  You need
> >> >>
> >> >>           vect_free_slp_tree (node);
> >> >>           loads.release ();
> >> >>
> >> >> thus treat it as a failure case.
> >> >
> >> > Yes fixed. I added an else part before scalar_stmts.release call
> >> > for the case
> >> when SLP tree is not built. This avoids double freeing.
> >> > Bootstrapped and reg tested on X86_64.
> >> >
> 
> This patch is causing regressions on armeb:
> http://people.linaro.org/~christophe.lyon/cross-
> validation/gcc/trunk/236591/report-build-info.html
> The following fortran tests now fail at runtime:
> 
>   gfortran.dg/c_f_pointer_logical.f03   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
>   gfortran.dg/c_f_pointer_logical.f03   -O3 -g  execution test
>   gfortran.dg/intrinsic_pack_1.f90   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
>   gfortran.dg/intrinsic_pack_1.f90   -O3 -g  execution test
> 
> Can you have a look?
> 

Sure. I will take a look.

> Thanks,
> 
> Christophe.
> 
> 
> 
> >> > Ok for trunk ?
> >>
> >> +      /*Calculate the unrolling factor based on the smallest type.  */
> >> +      unrolling_factor
> >>
> >> Space after /* missing.
> >>
> >> +      unrolling_factor
> >> +       = least_common_multiple (max_nunits, group_size)/group_size;
> >>
> >> spaces before/after the '/'
> >>
> >> +      if (max_nunits > nunits
> >> +         && unrolling_factor != 1
> >> +         && is_a <bb_vec_info> (vinfo))
> >>         {
> >>           if (dump_enabled_p ())
> >> -            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >> -                            "Build SLP failed: unrolling required in basic"
> >> -                            " block SLP\n");
> >> +             dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> >> +                              vect_location,
> >> +                              "Build SLP failed: unrolling "
> >> +                              "required in basic block SLP\n");
> >>           vect_free_slp_tree (node);
> >>           loads.release ();
> >>           return false;
> >>         }
> >>
> >> +      if (is_a <bb_vec_info> (vinfo)
> >> +         && max_nunits < group_size
> >> +         && unrolling_factor != 1)
> >>
> >> please merge these.  I think you have a not covered case of
> >> max_nunits == nunits, max_nunits > group_size.
> >> So do
> >>
> >>     if (unrolling_factor != 1
> >>         && is_a <...>)
> >>      {
> >>         if (max_nunits >= group_size)
> >>           {
> >>              first error
> >>           }
> >>         ... signal fatal mismatch instead...
> >>      }
> >>     else
> >>      {
> >>
> >> Ok with that change.
> >
> > Thank you for the review. I updated and patch and up streamed it to trunk.
> > Ref: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=236582
> >
> > Regards,
> > Venkat.

Regards,
Venkat.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Patch V2] Fix SLP PR58135.
  2016-05-25  8:21           ` Kumar, Venkataramanan
@ 2016-05-25 10:40             ` Christophe Lyon
  0 siblings, 0 replies; 8+ messages in thread
From: Christophe Lyon @ 2016-05-25 10:40 UTC (permalink / raw)
  To: Kumar, Venkataramanan; +Cc: Richard Biener, gcc-patches

On 25 May 2016 at 05:29, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi Christophe,
>
>> -----Original Message-----
>> From: Christophe Lyon [mailto:christophe.lyon@linaro.org]
>> Sent: Tuesday, May 24, 2016 8:45 PM
>> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
>> Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org
>> Subject: Re: [Patch V2] Fix SLP PR58135.
>>
>> Hi Venkat,
>>
>>
>> On 23 May 2016 at 11:54, Kumar, Venkataramanan
>> <Venkataramanan.Kumar@amd.com> wrote:
>> > Hi Richard,
>> >
>> >> -----Original Message-----
>> >> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> >> Sent: Thursday, May 19, 2016 4:08 PM
>> >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
>> >> Cc: gcc-patches@gcc.gnu.org
>> >> Subject: Re: [Patch V2] Fix SLP PR58135.
>> >>
>> >> On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan
>> >> <Venkataramanan.Kumar@amd.com> wrote:
>> >> > Hi Richard,
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> >> >> Sent: Tuesday, May 17, 2016 5:40 PM
>> >> >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
>> >> >> Cc: gcc-patches@gcc.gnu.org
>> >> >> Subject: Re: [Patch V2] Fix SLP PR58135.
>> >> >>
>> >> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan
>> >> >> <Venkataramanan.Kumar@amd.com> wrote:
>> >> >> > Hi Richard,
>> >> >> >
>> >> >> > I created the patch by passing -b option to git. Now the patch
>> >> >> > is more
>> >> >> readable.
>> >> >> >
>> >> >> > As per your suggestion I tried to fix the PR by splitting the
>> >> >> > SLP store group at
>> >> >> vector boundary after the SLP tree is built.
>> >> >> >
>> >> >> > Boot strap PASSED on x86_64.
>> >> >> > Checked the patch with check_GNU_style.sh.
>> >> >> >
>> >> >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization.
>> >> >> > Hence it
>> >> >> generated 2 more vzeroupper.
>> >> >> > As recommended I adjusted the test case by adding
>> >> >> > -fno-tree-slp-vectorize
>> >> >> to make it as expected after loop vectorization.
>> >> >> >
>> >> >> > The following tests are now passing.
>> >> >> >
>> >> >> > ------ Snip-----
>> >> >> > Tests that now work, but didn't before:
>> >> >> >
>> >> >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects
>> >> >> > scan-tree-dump-times
>> >> >> > slp2 "basic block vectorized" 1
>> >> >> >
>> >> >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block
>> >> >> > vectorized" 1
>> >> >> >
>> >> >> > New tests that PASS:
>> >> >> >
>> >> >> > gcc.dg/vect/pr58135.c (test for excess errors)
>> >> >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess
>> >> >> > errors)
>> >> >> >
>> >> >> > ------ Snip-----
>> >> >> >
>> >> >> > ChangeLog
>> >> >> >
>> >> >> > 2016-05-14  Venkataramanan Kumar
>> >> >> <Venkataramanan.kumar@amd.com>
>> >> >> >      PR tree-optimization/58135
>> >> >> >     * tree-vect-slp.c:  When group size is not multiple of vector size,
>> >> >> >      allow splitting of store group at vector boundary.
>> >> >> >
>> >> >> > Test suite  ChangeLog
>> >> >> > 2016-05-14  Venkataramanan Kumar
>> >> >> <Venkataramanan.kumar@amd.com>
>> >> >> >     * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL.
>> >> >> >     * gcc.dg/vect/pr58135.c:  Add new.
>> >> >> >     * gfortran.dg/pr46519-1.f: Adjust test case.
>> >> >> >
>> >> >> > The attached patch Ok for trunk?
>> >> >>
>> >> >>
>> >> >> Please avoid the excessive vertical space around the
>> >> >> vect_build_slp_tree
>> >> call.
>> >> > Yes fixed in the attached patch.
>> >> >>
>> >> >> +      /* Calculate the unrolling factor.  */
>> >> >> +      unrolling_factor = least_common_multiple
>> >> >> +                         (nunits, group_size) / group_size;
>> >> >> ...
>> >> >> +      else
>> >> >>         {
>> >> >>           /* Calculate the unrolling factor based on the smallest type.  */
>> >> >>           if (max_nunits > nunits)
>> >> >> -        unrolling_factor = least_common_multiple (max_nunits,
>> group_size)
>> >> >> -                           / group_size;
>> >> >> +           unrolling_factor
>> >> >> +               = least_common_multiple (max_nunits,
>> >> >> + group_size)/group_size;
>> >> >>
>> >> >> please compute the "correct" unroll factor immediately and move
>> >> >> the "unrolling of BB required" error into the if() case by
>> >> >> post-poning the nunits < group_size check (and use max_nunits here).
>> >> >>
>> >> > Yes fixed in the attached patch.
>> >> >
>> >> >> +      if (is_a <bb_vec_info> (vinfo)
>> >> >> +         && nunits < group_size
>> >> >> +         && unrolling_factor != 1
>> >> >> +         && is_a <bb_vec_info> (vinfo))
>> >> >> +       {
>> >> >> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >> >> +                          "Build SLP failed: store group "
>> >> >> +                          "size not a multiple of the vector size "
>> >> >> +                          "in basic block SLP\n");
>> >> >> +         /* Fatal mismatch.  */
>> >> >> +         matches[nunits] = false;
>> >> >>
>> >> >> this is too pessimistic - you want to add the extra 'false' at
>> >> >> group_size / max_nunits * max_nunits.
>> >> > Yes fixed in attached patch.
>> >> >
>> >> >>
>> >> >> It looks like you leak 'node' in the if () path as well.  You need
>> >> >>
>> >> >>           vect_free_slp_tree (node);
>> >> >>           loads.release ();
>> >> >>
>> >> >> thus treat it as a failure case.
>> >> >
>> >> > Yes fixed. I added an else part before scalar_stmts.release call
>> >> > for the case
>> >> when SLP tree is not built. This avoids double freeing.
>> >> > Bootstrapped and reg tested on X86_64.
>> >> >
>>
>> This patch is causing regressions on armeb:
>> http://people.linaro.org/~christophe.lyon/cross-
>> validation/gcc/trunk/236591/report-build-info.html
>> The following fortran tests now fail at runtime:
>>
>>   gfortran.dg/c_f_pointer_logical.f03   -O3 -fomit-frame-pointer
>> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
>>   gfortran.dg/c_f_pointer_logical.f03   -O3 -g  execution test
>>   gfortran.dg/intrinsic_pack_1.f90   -O3 -fomit-frame-pointer
>> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
>>   gfortran.dg/intrinsic_pack_1.f90   -O3 -g  execution test
>>
>> Can you have a look?
>>
>
> Sure. I will take a look.
>
Thanks, I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71270
to keep track of this.

Christophe

>> Thanks,
>>
>> Christophe.
>>
>>
>>
>> >> > Ok for trunk ?
>> >>
>> >> +      /*Calculate the unrolling factor based on the smallest type.  */
>> >> +      unrolling_factor
>> >>
>> >> Space after /* missing.
>> >>
>> >> +      unrolling_factor
>> >> +       = least_common_multiple (max_nunits, group_size)/group_size;
>> >>
>> >> spaces before/after the '/'
>> >>
>> >> +      if (max_nunits > nunits
>> >> +         && unrolling_factor != 1
>> >> +         && is_a <bb_vec_info> (vinfo))
>> >>         {
>> >>           if (dump_enabled_p ())
>> >> -            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >> -                            "Build SLP failed: unrolling required in basic"
>> >> -                            " block SLP\n");
>> >> +             dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>> >> +                              vect_location,
>> >> +                              "Build SLP failed: unrolling "
>> >> +                              "required in basic block SLP\n");
>> >>           vect_free_slp_tree (node);
>> >>           loads.release ();
>> >>           return false;
>> >>         }
>> >>
>> >> +      if (is_a <bb_vec_info> (vinfo)
>> >> +         && max_nunits < group_size
>> >> +         && unrolling_factor != 1)
>> >>
>> >> please merge these.  I think you have a not covered case of
>> >> max_nunits == nunits, max_nunits > group_size.
>> >> So do
>> >>
>> >>     if (unrolling_factor != 1
>> >>         && is_a <...>)
>> >>      {
>> >>         if (max_nunits >= group_size)
>> >>           {
>> >>              first error
>> >>           }
>> >>         ... signal fatal mismatch instead...
>> >>      }
>> >>     else
>> >>      {
>> >>
>> >> Ok with that change.
>> >
>> > Thank you for the review. I updated and patch and up streamed it to trunk.
>> > Ref: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=236582
>> >
>> > Regards,
>> > Venkat.
>
> Regards,
> Venkat.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-05-25  7:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-17 11:56 [Patch V2] Fix SLP PR58135 Kumar, Venkataramanan
2016-05-17 12:09 ` Richard Biener
2016-05-18 15:30   ` Kumar, Venkataramanan
2016-05-19 10:38     ` Richard Biener
2016-05-23  9:55       ` Kumar, Venkataramanan
2016-05-24 16:17         ` Christophe Lyon
2016-05-25  8:21           ` Kumar, Venkataramanan
2016-05-25 10:40             ` Christophe Lyon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).