public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
@ 2023-05-17  6:09 Kewen.Lin
  2023-05-17  6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin
  2023-05-17  6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener
  0 siblings, 2 replies; 10+ messages in thread
From: Kewen.Lin @ 2023-05-17  6:09 UTC (permalink / raw)
  To: GCC Patches
  Cc: Richard Biener, Richard Sandiford, Segher Boessenkool, Peter Bergner

Hi,

This patch refactors the handling of the case (index
== count) in a loop of vect_transform_slp_perm_load_1, in
preparation for a subsequent adjustment to *n_perms.  This
patch has no functional changes.
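To illustrate why this should be behaviour-preserving, here is a hypothetical, heavily simplified model of the control flow before and after the refactoring. It is plain C, not GCC code. GIMPLE building is replaced by counters, and `struct outcome`, `old_flow`, `new_flow` and `check_flows_agree` are made-up names. The mask generation, the second_vec_index fallback and the early `return false` for an unsupported permutation are not modelled. The model only checks that both shapes bump *n_perms and build statements identically for every combination of noop_p, analyze_only and nvectors_per_build.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what one (index == count) step does; GIMPLE
   building is replaced by counters.  */
struct outcome
{
  unsigned n_perms;    /* increments of *n_perms */
  unsigned perm_stmts; /* VEC_PERM_EXPRs built */
  unsigned identity;   /* identity stmts recorded (noop_p case) */
};

/* Shape before the patch: two separate "if (index == count)" blocks.  */
static struct outcome
old_flow (bool noop_p, bool analyze_only, unsigned nvectors_per_build)
{
  struct outcome o = { 0, 0, 0 };
  if (!noop_p)
    o.n_perms++;                  /* permute check, then ++*n_perms */
  if (!analyze_only)
    for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
      {
        if (!noop_p)
          o.perm_stmts++;         /* build VEC_PERM_EXPR */
        else
          o.identity++;           /* reuse SSA_NAME_DEF_STMT */
      }
  return o;
}

/* Shape after the patch: a single block split on noop_p.  */
static struct outcome
new_flow (bool noop_p, bool analyze_only, unsigned nvectors_per_build)
{
  struct outcome o = { 0, 0, 0 };
  if (!noop_p)
    {
      o.n_perms++;                /* permute check, then ++*n_perms */
      if (!analyze_only)
        for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
          o.perm_stmts++;         /* build VEC_PERM_EXPR */
    }
  else if (!analyze_only)
    for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
      o.identity++;               /* reuse SSA_NAME_DEF_STMT */
  return o;
}

/* Compare both shapes for every flag combination and for
   nvectors_per_build in [0, MAX_NVECTORS]; assert () fires on the
   first mismatch.  */
static void
check_flows_agree (unsigned max_nvectors)
{
  for (int noop = 0; noop < 2; ++noop)
    for (int analyze = 0; analyze < 2; ++analyze)
      for (unsigned k = 0; k <= max_nvectors; ++k)
        {
          struct outcome a = old_flow (noop, analyze, k);
          struct outcome b = new_flow (noop, analyze, k);
          assert (a.n_perms == b.n_perms);
          assert (a.perm_stmts == b.perm_stmts);
          assert (a.identity == b.identity);
        }
}
```

For example, `check_flows_agree (8)` exercises all 36 combinations. `new_flow (false, true, 4)` reports one permute and builds nothing, as an analysis-only call should.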

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

BR,
Kewen
-----
gcc/ChangeLog:

	* tree-vect-slp.cc (vect_transform_slp_perm_load_1): Refactor the
	handling of the case index == count.
---
 gcc/tree-vect-slp.cc | 89 ++++++++++++++++++++++----------------------
 1 file changed, 44 insertions(+), 45 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3b7a21724ec..e5c9d7e766e 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -8230,59 +8230,50 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
 	noop_p = false;
       mask[index++] = mask_element;

-      if (index == count && !noop_p)
+      if (index == count)
 	{
-	  indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
-	  if (!can_vec_perm_const_p (mode, mode, indices))
+	  if (!noop_p)
 	    {
-	      if (dump_p)
+	      indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
+	      if (!can_vec_perm_const_p (mode, mode, indices))
 		{
-		  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
-				   vect_location,
-				   "unsupported vect permute { ");
-		  for (i = 0; i < count; ++i)
+		  if (dump_p)
 		    {
-		      dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
-		      dump_printf (MSG_MISSED_OPTIMIZATION, " ");
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				       "unsupported vect permute { ");
+		      for (i = 0; i < count; ++i)
+			{
+			  dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
+			  dump_printf (MSG_MISSED_OPTIMIZATION, " ");
+			}
+		      dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
 		    }
-		  dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
+		  gcc_assert (analyze_only);
+		  return false;
 		}
-	      gcc_assert (analyze_only);
-	      return false;
-	    }

-	  ++*n_perms;
-	}
+	      ++*n_perms;

-      if (index == count)
-	{
-	  if (!analyze_only)
-	    {
-	      tree mask_vec = NULL_TREE;
-
-	      if (! noop_p)
-		mask_vec = vect_gen_perm_mask_checked (vectype, indices);
+	      if (!analyze_only)
+		{
+		  tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);

-	      if (second_vec_index == -1)
-		second_vec_index = first_vec_index;
+		  if (second_vec_index == -1)
+		    second_vec_index = first_vec_index;

-	      for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
-		{
-		  /* Generate the permute statement if necessary.  */
-		  tree first_vec = dr_chain[first_vec_index + ri];
-		  tree second_vec = dr_chain[second_vec_index + ri];
-		  gimple *perm_stmt;
-		  if (! noop_p)
+		  for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
 		    {
-		      gassign *stmt = as_a <gassign *> (stmt_info->stmt);
+		      /* Generate the permute statement if necessary.  */
+		      tree first_vec = dr_chain[first_vec_index + ri];
+		      tree second_vec = dr_chain[second_vec_index + ri];
+		      gassign *stmt = as_a<gassign *> (stmt_info->stmt);
 		      tree perm_dest
 			= vect_create_destination_var (gimple_assign_lhs (stmt),
 						       vectype);
 		      perm_dest = make_ssa_name (perm_dest);
-		      perm_stmt
+		      gimple *perm_stmt
 			= gimple_build_assign (perm_dest, VEC_PERM_EXPR,
-					       first_vec, second_vec,
-					       mask_vec);
+					       first_vec, second_vec, mask_vec);
 		      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
 						   gsi);
 		      if (dce_chain)
@@ -8290,15 +8281,23 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
 			  bitmap_set_bit (used_defs, first_vec_index + ri);
 			  bitmap_set_bit (used_defs, second_vec_index + ri);
 			}
+
+		      /* Store the vector statement in NODE.  */
+		      SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
+			= perm_stmt;
 		    }
-		  else
-		    {
-		      /* If mask was NULL_TREE generate the requested
-			 identity transform.  */
-		      perm_stmt = SSA_NAME_DEF_STMT (first_vec);
-		      if (dce_chain)
-			bitmap_set_bit (used_defs, first_vec_index + ri);
-		    }
+		}
+	    }
+	  else if (!analyze_only)
+	    {
+	      for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
+		{
+		  tree first_vec = dr_chain[first_vec_index + ri];
+		  /* If mask was NULL_TREE generate the requested
+		     identity transform.  */
+		  gimple *perm_stmt = SSA_NAME_DEF_STMT (first_vec);
+		  if (dce_chain)
+		    bitmap_set_bit (used_defs, first_vec_index + ri);

 		  /* Store the vector statement in NODE.  */
 		  SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
--
2.39.1


* [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
  2023-05-17  6:09 [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 Kewen.Lin
@ 2023-05-17  6:15 ` Kewen.Lin
  2023-05-22 13:44   ` Richard Biener
  2023-05-17  6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener
  1 sibling, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2023-05-17  6:15 UTC (permalink / raw)
  To: GCC Patches
  Cc: Richard Biener, Richard Sandiford, Segher Boessenkool, Peter Bergner

Hi,

Following Richi's suggestion in [1], I'm working on deferring
cost evaluation so that it sits next to the transformation.
This patch enhances vect_transform_slp_perm_load_1, which
could under-cost vector permutations: the costing doesn't
take nvectors_per_build into account, so it is inconsistent
with the transformation part.
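To make the under-costing concrete, here is a hypothetical counting model in plain C. `old_n_perms`, `emitted_perms` and `check_old_costing` are made-up names, not GCC functions. The model follows the transformation loop over `ri` in the first patch and assumes that every completed non-identity mask leads to `nvectors_per_build` VEC_PERM_EXPRs. The pre-patch costing, by contrast, bumps *n_perms only once per mask.

```c
#include <assert.h>

/* Pre-patch costing: *n_perms is bumped once per completed
   non-identity mask; nvectors_per_build is not consulted.  */
static unsigned
old_n_perms (unsigned nmasks, unsigned nvectors_per_build)
{
  (void) nvectors_per_build;  /* ignored: this is the inconsistency */
  return nmasks;
}

/* Transform phase: for each such mask the loop over ri builds one
   VEC_PERM_EXPR per vector, i.e. nvectors_per_build of them.  */
static unsigned
emitted_perms (unsigned nmasks, unsigned nvectors_per_build)
{
  return nmasks * nvectors_per_build;
}

/* Assert that the old costing is exact only for nvectors_per_build == 1
   and under-counts for larger values.  */
static void
check_old_costing (unsigned max_masks, unsigned max_vectors)
{
  for (unsigned m = 1; m <= max_masks; ++m)
    for (unsigned k = 1; k <= max_vectors; ++k)
      {
        if (k == 1)
          assert (old_n_perms (m, k) == emitted_perms (m, k));
        else
          assert (old_n_perms (m, k) < emitted_perms (m, k));
      }
}
```

For instance, with 3 masks and nvectors_per_build == 2, this model's old costing reports 3 permutes while 6 would be emitted.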

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html

BR,
Kewen
-----
gcc/ChangeLog:

	* tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
	calculation of n_perms by considering nvectors_per_build.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
---
 .../vect/costmodel/ppc/costmodel-slp-perm.c   | 23 +++++++
 gcc/tree-vect-slp.cc                          | 66 ++++++++++---------
 2 files changed, 57 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
new file mode 100644
index 00000000000..e5c4dceddfb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* Specify power9 to ensure the vectorization is profitable
+   and the test point holds; otherwise it might not be
+   profitable to vectorize.  */
+/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
+
+/* Verify we cost the exact count for required vec_perm.  */
+
+int x[1024], y[1024];
+
+void
+foo ()
+{
+  for (int i = 0; i < 512; ++i)
+    {
+      x[2 * i] = y[1023 - (2 * i)];
+      x[2 * i + 1] = y[1023 - (2 * i + 1)];
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e5c9d7e766e..af9a6dd4fa9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,

   mode = TYPE_MODE (vectype);
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);

   /* Initialize the vect stmts of NODE to properly insert the generated
      stmts later.  */
   if (! analyze_only)
-    for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
-	 i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
+    for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++)
       SLP_TREE_VEC_STMTS (node).quick_push (NULL);

   /* Generate permutation masks for every NODE. Number of masks for each NODE
@@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
 	 (b) the permutes only need a single vector input.  */
       mask.new_vector (nunits, group_size, 3);
       nelts_to_build = mask.encoded_nelts ();
-      nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
+      /* It's possible to obtain zero nstmts during analyze_only, so make
+	 it at least one to ensure the later computation for n_perms
+	 proceeds.  */
+      nvectors_per_build = nstmts > 0 ? nstmts : 1;
       in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
     }
   else
@@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
 		  return false;
 		}

-	      ++*n_perms;
-
+	      tree mask_vec = NULL_TREE;
 	      if (!analyze_only)
-		{
-		  tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
+		mask_vec = vect_gen_perm_mask_checked (vectype, indices);

-		  if (second_vec_index == -1)
-		    second_vec_index = first_vec_index;
+	      if (second_vec_index == -1)
+		second_vec_index = first_vec_index;

-		  for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
+	      for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
+		{
+		  ++*n_perms;
+		  if (analyze_only)
+		    continue;
+		  /* Generate the permute statement if necessary.  */
+		  tree first_vec = dr_chain[first_vec_index + ri];
+		  tree second_vec = dr_chain[second_vec_index + ri];
+		  gassign *stmt = as_a<gassign *> (stmt_info->stmt);
+		  tree perm_dest
+		    = vect_create_destination_var (gimple_assign_lhs (stmt),
+						   vectype);
+		  perm_dest = make_ssa_name (perm_dest);
+		  gimple *perm_stmt
+		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec,
+					   second_vec, mask_vec);
+		  vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
+					       gsi);
+		  if (dce_chain)
 		    {
-		      /* Generate the permute statement if necessary.  */
-		      tree first_vec = dr_chain[first_vec_index + ri];
-		      tree second_vec = dr_chain[second_vec_index + ri];
-		      gassign *stmt = as_a<gassign *> (stmt_info->stmt);
-		      tree perm_dest
-			= vect_create_destination_var (gimple_assign_lhs (stmt),
-						       vectype);
-		      perm_dest = make_ssa_name (perm_dest);
-		      gimple *perm_stmt
-			= gimple_build_assign (perm_dest, VEC_PERM_EXPR,
-					       first_vec, second_vec, mask_vec);
-		      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
-						   gsi);
-		      if (dce_chain)
-			{
-			  bitmap_set_bit (used_defs, first_vec_index + ri);
-			  bitmap_set_bit (used_defs, second_vec_index + ri);
-			}
-
-		      /* Store the vector statement in NODE.  */
-		      SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
-			= perm_stmt;
+		      bitmap_set_bit (used_defs, first_vec_index + ri);
+		      bitmap_set_bit (used_defs, second_vec_index + ri);
 		    }
+
+		  /* Store the vector statement in NODE.  */
+		  SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
 		}
 	    }
 	  else if (!analyze_only)
--
2.39.1
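For reference, the loop in the new test is a plain reversal: for every j in [0, 1024), x[j] = y[1023 - j]. The scalar sketch below is not part of the patch, and `foo_like` and `check_reversal` are hypothetical names. It checks that reading, which suggests why the vectorized loads from y need their lanes reordered and why vec_perm statements appear in the cost dump at all. It does not attempt to derive the expected count of 2.

```c
#include <assert.h>

enum { N = 1024 };

/* The loop from costmodel-slp-perm.c, with the arrays passed in.  */
static void
foo_like (int *x, const int *y)
{
  for (int i = 0; i < N / 2; ++i)
    {
      x[2 * i] = y[1023 - (2 * i)];
      x[2 * i + 1] = y[1023 - (2 * i + 1)];
    }
}

/* Fill Y with distinct values, run the loop and assert that X ends up
   as Y reversed, i.e. x[j] == y[N - 1 - j] for every j.  */
static void
check_reversal (void)
{
  static int x[N], y[N];
  for (int j = 0; j < N; ++j)
    y[j] = 3 * j + 1;
  foo_like (x, y);
  for (int j = 0; j < N; ++j)
    assert (x[j] == y[N - 1 - j]);
}
```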



* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
  2023-05-17  6:09 [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 Kewen.Lin
  2023-05-17  6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin
@ 2023-05-17  6:34 ` Richard Biener
  2023-05-17  7:18   ` Kewen.Lin
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-05-17  6:34 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi,
>
> This patch is to refactor the handlings for the case (index
> == count) in a loop of vect_transform_slp_perm_load_1, in
> order to prepare a subsequent adjustment on *nperm.  This
> patch doesn't have any functional changes.

The diff is impossible to review - can you explain the
refactoring you have done, or also attach a patch that more
clearly shows what you changed?


* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
  2023-05-17  6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener
@ 2023-05-17  7:18   ` Kewen.Lin
  2023-05-18  6:12     ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2023-05-17  7:18 UTC (permalink / raw)
  To: Richard Biener
  Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

Hi Richi,

on 2023/5/17 14:34, Richard Biener wrote:
> On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> This patch is to refactor the handlings for the case (index
>> == count) in a loop of vect_transform_slp_perm_load_1, in
>> order to prepare a subsequent adjustment on *nperm.  This
>> patch doesn't have any functional changes.
> 
> The diff is impossible to be reviewed - can you explain the
> refactoring you have done or also attach a patch more clearly
> showing what you change?

Sorry, I should have made it clearer.
It mainly combines these two hunks:
  
  if (index == count && !noop_p)
    {  
       // A ...
       // ++*n_perms;
    }

  if (index == count)
    {
       if (!analyze_only)
         {
            if (!noop_p)
               // B1 ...
            
            // B2 ...
           
            for ...
              {
                 if (!noop_p)
                    // B3 building VEC_PERM_EXPR
                 else
                    // B4 building nothing (no uses for B2 and its seq)
              }
         }
       // B5
    }

The former can be folded into the latter, so it becomes:

  if (index == count)
    {
       if (!noop_p)
         {
           // A ...
           // ++*n_perms;

           if (!analyze_only)
             {
                // B1 ...
                // B2 ...
                for ...
                   // B3 building VEC_PERM_EXPR
             }
         }
       else if (!analyze_only)
         {
            // no B2, since it has no further uses here.
            for ...
              // B4 building nothing
         }
        // B5 ...
    }

But it mainly serves as the basis for the subsequent patch, which makes the n_perms
calculation consistent; patch 2/2 further turns it into:

  if (index == count)
    {
       if (!noop_p)
         {
           // A ...

           if (!analyze_only)
             // B1 ...
           
           // B2 ... (trivial computations during analyze_only or not)

           for ...
             {
                // ++*n_perms;  (now n_perms is consistent with building VEC_PERM_EXPR)
                if (analyze_only)
                   continue;
                // B3 building VEC_PERM_EXPR
             }
         }
       else if (!analyze_only)
         {
            // no B2, since it has no further uses here.
            for ...
              // B4 building nothing
         }
        // B5 ...
    }
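A hypothetical sketch of the shape that patch 2/2 aims for, in plain C with GIMPLE building replaced by counters. `perm_step`, `nvectors_for` and `check_phases_agree` are made-up names. Because ++*n_perms now sits inside the loop over ri, with `continue` skipping statement building during analysis, the analysis-phase count matches what the transform phase builds. `nvectors_for` mirrors the `nstmts > 0 ? nstmts : 1` clamp from patch 2/2.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors "nstmts > 0 ? nstmts : 1": during analysis the number of
   vector stmts may still be zero, but costing must iterate at least once.  */
static unsigned
nvectors_for (unsigned nstmts)
{
  return nstmts > 0 ? nstmts : 1;
}

/* One completed non-identity mask after patch 2/2.  Adds to *N_PERMS and
   returns the number of VEC_PERM_EXPRs built (zero during analysis).  */
static unsigned
perm_step (bool analyze_only, unsigned nvectors_per_build, unsigned *n_perms)
{
  unsigned built = 0;
  for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
    {
      ++*n_perms;       /* costed once per statement to be built */
      if (analyze_only)
        continue;       /* analysis only counts */
      ++built;          /* B3: build one VEC_PERM_EXPR */
    }
  return built;
}

/* Run NMASKS steps through both phases and assert that the analysis
   count equals the number of statements the transform phase builds.  */
static void
check_phases_agree (unsigned nmasks, unsigned nstmts)
{
  unsigned nvectors = nvectors_for (nstmts);
  unsigned analysis_cost = 0, transform_cost = 0, built = 0;
  for (unsigned m = 0; m < nmasks; ++m)
    {
      built += perm_step (true, nvectors, &analysis_cost);
      built += perm_step (false, nvectors, &transform_cost);
    }
  assert (analysis_cost == built);
  assert (transform_cost == built);
}
```

In this model the only difference between the two phases is whether the `continue` is taken, so the cost and the built statements cannot drift apart.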

BR,
Kewen



* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
  2023-05-17  7:18   ` Kewen.Lin
@ 2023-05-18  6:12     ` Richard Biener
  2023-05-22  5:37       ` Kewen.Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-05-18  6:12 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

On Wed, May 17, 2023 at 9:19 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Richi,
>
> on 2023/5/17 14:34, Richard Biener wrote:
> > On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> This patch is to refactor the handlings for the case (index
> >> == count) in a loop of vect_transform_slp_perm_load_1, in
> >> order to prepare a subsequent adjustment on *nperm.  This
> >> patch doesn't have any functional changes.
> >
> > The diff is impossible to be reviewed - can you explain the
> > refactoring you have done or also attach a patch more clearly
> > showing what you change?
>
> Sorry, I should have made it more clear.
> It mainly to combine these two hunks:
>
>   if (index == count && !noop_p)
>     {
>        // A ...
>        // ++*n_perms;
>     }
>
>   if (index == count)
>     {
>        if (!analyze_only)
>          {
>             if (!noop_p)
>                // B1 ...
>
>             // B2 ...
>
>             for ...
>               {
>                  if (!noop_p)
>                     // B3 building VEC_PERM_EXPR
>                  else
>                     // B4 building nothing (no uses for B2 and its seq)
>               }
>          }
>        // B5
>     }
>
> The former can be part of the latter, so it becomes to:
>
>   if (index == count)
>     {
>        if (!noop_p)
>          {
>            // A ...
>            // ++*n_perms;
>
>            if (!analyze_only)
>              {
>                 // B1 ...
>                 // B2 ...
>                 for ...
>                    // B3 building VEC_PERM_EXPR
>              }
>          }
>        else if (!analyze_only)
>          {
>             // no B2 since no any further uses here.
>             for ...
>               // B4 building nothing
>          }
>         // B5 ...
>     }

Ah, thanks - that made reviewing easy.  1/2 is OK for trunk.

Thanks,
Richard.

> But it's mainly the basic for the subsequent patch for consistent n_perms calculation,
> the patch 2/2 is to make it further become to:
>
>   if (index == count)
>     {
>        if (!noop_p)
>          {
>            // A ...
>
>            if (!analyze_only)
>              // B1 ...
>
>            // B2 ... (trivial computations during analyze_only or not)
>
>            for ...
>              {
>                 // ++*n_perms;  (now n_perms is consistent with building VEC_PERM_EXPR)
>                 if (analyze_only)
>                    continue;
>                 // B3 building VEC_PERM_EXPR
>              }
>          }
>        else if (!analyze_only)
>          {
>             // no B2 since no any further uses here.
>             for ...
>               // B4 building nothing
>          }
>         // B5 ...
>     }
>
> BR,
> Kewen
>
>
> >
> >> Bootstrapped and regtested on x86_64-redhat-linux,
> >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
> >>
> >> BR,
> >> Kewen
> >> -----
> >> gcc/ChangeLog:
> >>
> >>         * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Refactor the
> >>         handling on the case index == count.
> >> ---
> >>  gcc/tree-vect-slp.cc | 89 ++++++++++++++++++++++----------------------
> >>  1 file changed, 44 insertions(+), 45 deletions(-)
> >>
> >> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> >> index 3b7a21724ec..e5c9d7e766e 100644
> >> --- a/gcc/tree-vect-slp.cc
> >> +++ b/gcc/tree-vect-slp.cc
> >> @@ -8230,59 +8230,50 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
> >>         noop_p = false;
> >>        mask[index++] = mask_element;
> >>
> >> -      if (index == count && !noop_p)
> >> +      if (index == count)
> >>         {
> >> -         indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
> >> -         if (!can_vec_perm_const_p (mode, mode, indices))
> >> +         if (!noop_p)
> >>             {
> >> -             if (dump_p)
> >> +             indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
> >> +             if (!can_vec_perm_const_p (mode, mode, indices))
> >>                 {
> >> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> >> -                                  vect_location,
> >> -                                  "unsupported vect permute { ");
> >> -                 for (i = 0; i < count; ++i)
> >> +                 if (dump_p)
> >>                     {
> >> -                     dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
> >> -                     dump_printf (MSG_MISSED_OPTIMIZATION, " ");
> >> +                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >> +                                      "unsupported vect permute { ");
> >> +                     for (i = 0; i < count; ++i)
> >> +                       {
> >> +                         dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
> >> +                         dump_printf (MSG_MISSED_OPTIMIZATION, " ");
> >> +                       }
> >> +                     dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
> >>                     }
> >> -                 dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
> >> +                 gcc_assert (analyze_only);
> >> +                 return false;
> >>                 }
> >> -             gcc_assert (analyze_only);
> >> -             return false;
> >> -           }
> >>
> >> -         ++*n_perms;
> >> -       }
> >> +             ++*n_perms;
> >>
> >> -      if (index == count)
> >> -       {
> >> -         if (!analyze_only)
> >> -           {
> >> -             tree mask_vec = NULL_TREE;
> >> -
> >> -             if (! noop_p)
> >> -               mask_vec = vect_gen_perm_mask_checked (vectype, indices);
> >> +             if (!analyze_only)
> >> +               {
> >> +                 tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
> >>
> >> -             if (second_vec_index == -1)
> >> -               second_vec_index = first_vec_index;
> >> +                 if (second_vec_index == -1)
> >> +                   second_vec_index = first_vec_index;
> >>
> >> -             for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> >> -               {
> >> -                 /* Generate the permute statement if necessary.  */
> >> -                 tree first_vec = dr_chain[first_vec_index + ri];
> >> -                 tree second_vec = dr_chain[second_vec_index + ri];
> >> -                 gimple *perm_stmt;
> >> -                 if (! noop_p)
> >> +                 for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> >>                     {
> >> -                     gassign *stmt = as_a <gassign *> (stmt_info->stmt);
> >> +                     /* Generate the permute statement if necessary.  */
> >> +                     tree first_vec = dr_chain[first_vec_index + ri];
> >> +                     tree second_vec = dr_chain[second_vec_index + ri];
> >> +                     gassign *stmt = as_a<gassign *> (stmt_info->stmt);
> >>                       tree perm_dest
> >>                         = vect_create_destination_var (gimple_assign_lhs (stmt),
> >>                                                        vectype);
> >>                       perm_dest = make_ssa_name (perm_dest);
> >> -                     perm_stmt
> >> +                     gimple *perm_stmt
> >>                         = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> >> -                                              first_vec, second_vec,
> >> -                                              mask_vec);
> >> +                                              first_vec, second_vec, mask_vec);
> >>                       vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> >>                                                    gsi);
> >>                       if (dce_chain)
> >> @@ -8290,15 +8281,23 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
> >>                           bitmap_set_bit (used_defs, first_vec_index + ri);
> >>                           bitmap_set_bit (used_defs, second_vec_index + ri);
> >>                         }
> >> +
> >> +                     /* Store the vector statement in NODE.  */
> >> +                     SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
> >> +                       = perm_stmt;
> >>                     }
> >> -                 else
> >> -                   {
> >> -                     /* If mask was NULL_TREE generate the requested
> >> -                        identity transform.  */
> >> -                     perm_stmt = SSA_NAME_DEF_STMT (first_vec);
> >> -                     if (dce_chain)
> >> -                       bitmap_set_bit (used_defs, first_vec_index + ri);
> >> -                   }
> >> +               }
> >> +           }
> >> +         else if (!analyze_only)
> >> +           {
> >> +             for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> >> +               {
> >> +                 tree first_vec = dr_chain[first_vec_index + ri];
> >> +                 /* If mask was NULL_TREE generate the requested
> >> +                    identity transform.  */
> >> +                 gimple *perm_stmt = SSA_NAME_DEF_STMT (first_vec);
> >> +                 if (dce_chain)
> >> +                   bitmap_set_bit (used_defs, first_vec_index + ri);
> >>
> >>                   /* Store the vector statement in NODE.  */
> >>                   SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
> >> --
> >> 2.39.1
>
>


* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
  2023-05-18  6:12     ` Richard Biener
@ 2023-05-22  5:37       ` Kewen.Lin
  0 siblings, 0 replies; 10+ messages in thread
From: Kewen.Lin @ 2023-05-22  5:37 UTC
  To: Richard Biener
  Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

on 2023/5/18 14:12, Richard Biener wrote:
> On Wed, May 17, 2023 at 9:19 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi Richi,
>>
>> on 2023/5/17 14:34, Richard Biener wrote:
>>> On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> This patch refactors the handling of the case (index
>>>> == count) in a loop of vect_transform_slp_perm_load_1 in
>>>> order to prepare for a subsequent adjustment of *n_perms.
>>>> This patch doesn't have any functional changes.
>>>
>>> The diff is impossible to review - can you explain the
>>> refactoring you have done or also attach a patch more clearly
>>> showing what you change?
>>
>> Sorry, I should have made it more clear.
>> It mainly combines these two hunks:
>>
>>   if (index == count && !noop_p)
>>     {
>>        // A ...
>>        // ++*n_perms;
>>     }
>>
>>   if (index == count)
>>     {
>>        if (!analyze_only)
>>          {
>>             if (!noop_p)
>>                // B1 ...
>>
>>             // B2 ...
>>
>>             for ...
>>               {
>>                  if (!noop_p)
>>                     // B3 building VEC_PERM_EXPR
>>                  else
>>                     // B4 building nothing (no uses for B2 and its seq)
>>               }
>>          }
>>        // B5
>>     }
>>
>> The former can be folded into the latter, so it becomes:
>>
>>   if (index == count)
>>     {
>>        if (!noop_p)
>>          {
>>            // A ...
>>            // ++*n_perms;
>>
>>            if (!analyze_only)
>>              {
>>                 // B1 ...
>>                 // B2 ...
>>                 for ...
>>                    // B3 building VEC_PERM_EXPR
>>              }
>>          }
>>        else if (!analyze_only)
>>          {
>>             // no B2 since there are no further uses here.
>>             for ...
>>               // B4 building nothing
>>          }
>>         // B5 ...
>>     }
> 
> Ah, thanks - that made reviewing easy.  1/2 is OK for trunk.

Thanks for the review!  Pushed as r14-1028.

BR,
Kewen


* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
  2023-05-17  6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin
@ 2023-05-22 13:44   ` Richard Biener
  2023-05-23  3:01     ` Kewen.Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-05-22 13:44 UTC
  To: Kewen.Lin
  Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

On Wed, May 17, 2023 at 8:15 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi,
>
> Following Richi's suggestion in [1], I'm working on deferring
> cost evaluation next to the transformation.  This patch
> enhances function vect_transform_slp_perm_load_1, which
> could under-cost vector permutations: its costing doesn't
> consider nvectors_per_build, which is inconsistent with
> the transformation part.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
>         calculation on n_perms by considering nvectors_per_build.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
> ---
>  .../vect/costmodel/ppc/costmodel-slp-perm.c   | 23 +++++++
>  gcc/tree-vect-slp.cc                          | 66 ++++++++++---------
>  2 files changed, 57 insertions(+), 32 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> new file mode 100644
> index 00000000000..e5c4dceddfb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* Specify power9 to ensure the vectorization is profitable
> +   and test point stands, otherwise it could be not profitable
> +   to vectorize.  */
> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
> +
> +/* Verify we cost the exact count for required vec_perm.  */
> +
> +int x[1024], y[1024];
> +
> +void
> +foo ()
> +{
> +  for (int i = 0; i < 512; ++i)
> +    {
> +      x[2 * i] = y[1023 - (2 * i)];
> +      x[2 * i + 1] = y[1023 - (2 * i + 1)];
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index e5c9d7e766e..af9a6dd4fa9 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>
>    mode = TYPE_MODE (vectype);
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
>
>    /* Initialize the vect stmts of NODE to properly insert the generated
>       stmts later.  */
>    if (! analyze_only)
> -    for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
> -        i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
> +    for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++)
>        SLP_TREE_VEC_STMTS (node).quick_push (NULL);
>
>    /* Generate permutation masks for every NODE. Number of masks for each NODE
> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>          (b) the permutes only need a single vector input.  */
>        mask.new_vector (nunits, group_size, 3);
>        nelts_to_build = mask.encoded_nelts ();
> -      nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
> +      /* It's possible to obtain zero nstmts during analyze_only, so make
> +        it at least one to ensure the later computation for n_perms
> +        proceed.  */
> +      nvectors_per_build = nstmts > 0 ? nstmts : 1;
>        in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
>      }
>    else
> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>                   return false;
>                 }
>
> -             ++*n_perms;
> -
> +             tree mask_vec = NULL_TREE;
>               if (!analyze_only)
> -               {
> -                 tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
> +               mask_vec = vect_gen_perm_mask_checked (vectype, indices);
>
> -                 if (second_vec_index == -1)
> -                   second_vec_index = first_vec_index;
> +             if (second_vec_index == -1)
> +               second_vec_index = first_vec_index;
>
> -                 for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> +             for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> +               {
> +                 ++*n_perms;

So the "real" change is doing

      *n_perms += nvectors_per_build;

and *n_perms was unused when !analyze_only?  And since at
analysis time we (sometimes?) have zero nvectors you have to
fix up above?  Which cases are those?
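Richard's reading can be checked with a small hypothetical model in C (invented names, not GCC code): for every completed non-identity mask, the old costing added one permute, while the patched loop adds one per vector built, so the total becomes the old count times nvectors_per_build.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not GCC code.  NOOP[m] says whether the m-th
   completed mask (index == count) is an identity permutation.  */

/* Before patch 2/2: one permute counted per non-identity mask.  */
static unsigned
count_perms_old (const bool *noop, unsigned nmasks)
{
  unsigned n_perms = 0;
  for (unsigned m = 0; m < nmasks; ++m)
    if (!noop[m])
      ++n_perms;
  return n_perms;
}

/* After patch 2/2: the increment sits inside the ri loop, so each of the
   NVECTORS_PER_BUILD VEC_PERM_EXPRs built from a mask is counted.  */
static unsigned
count_perms_new (const bool *noop, unsigned nmasks,
                 unsigned nvectors_per_build)
{
  unsigned n_perms = 0;
  for (unsigned m = 0; m < nmasks; ++m)
    if (!noop[m])
      for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
        ++n_perms;
  return n_perms;
}
```

With one non-identity mask applied to two vectors, count_perms_old gives 1 and count_perms_new gives 2; in general the new count equals nvectors_per_build times the old one, which is the `*n_perms += nvectors_per_build` equivalence.  Note that nvectors_per_build == 0 would make the new count 0, which is what the clamp in the quoted hunk guards against.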

In principle the patch looks good to me.

Richard.

> +                 if (analyze_only)
> +                   continue;
> +                 /* Generate the permute statement if necessary.  */
> +                 tree first_vec = dr_chain[first_vec_index + ri];
> +                 tree second_vec = dr_chain[second_vec_index + ri];
> +                 gassign *stmt = as_a<gassign *> (stmt_info->stmt);
> +                 tree perm_dest
> +                   = vect_create_destination_var (gimple_assign_lhs (stmt),
> +                                                  vectype);
> +                 perm_dest = make_ssa_name (perm_dest);
> +                 gimple *perm_stmt
> +                   = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec,
> +                                          second_vec, mask_vec);
> +                 vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> +                                              gsi);
> +                 if (dce_chain)
>                     {
> -                     /* Generate the permute statement if necessary.  */
> -                     tree first_vec = dr_chain[first_vec_index + ri];
> -                     tree second_vec = dr_chain[second_vec_index + ri];
> -                     gassign *stmt = as_a<gassign *> (stmt_info->stmt);
> -                     tree perm_dest
> -                       = vect_create_destination_var (gimple_assign_lhs (stmt),
> -                                                      vectype);
> -                     perm_dest = make_ssa_name (perm_dest);
> -                     gimple *perm_stmt
> -                       = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> -                                              first_vec, second_vec, mask_vec);
> -                     vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> -                                                  gsi);
> -                     if (dce_chain)
> -                       {
> -                         bitmap_set_bit (used_defs, first_vec_index + ri);
> -                         bitmap_set_bit (used_defs, second_vec_index + ri);
> -                       }
> -
> -                     /* Store the vector statement in NODE.  */
> -                     SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
> -                       = perm_stmt;
> +                     bitmap_set_bit (used_defs, first_vec_index + ri);
> +                     bitmap_set_bit (used_defs, second_vec_index + ri);
>                     }
> +
> +                 /* Store the vector statement in NODE.  */
> +                 SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
>                 }
>             }
>           else if (!analyze_only)
> --
> 2.39.1
>


* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
  2023-05-22 13:44   ` Richard Biener
@ 2023-05-23  3:01     ` Kewen.Lin
  2023-05-23  6:19       ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2023-05-23  3:01 UTC
  To: Richard Biener
  Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

Hi Richi,

Thanks for the review!

on 2023/5/22 21:44, Richard Biener wrote:
> On Wed, May 17, 2023 at 8:15 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> Following Richi's suggestion in [1], I'm working on deferring
>> cost evaluation next to the transformation.  This patch
>> enhances function vect_transform_slp_perm_load_1, which
>> could under-cost vector permutations: its costing doesn't
>> consider nvectors_per_build, which is inconsistent with
>> the transformation part.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>
>> Is it ok for trunk?
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
>>
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>>         * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
>>         calculation on n_perms by considering nvectors_per_build.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
>> ---
>>  .../vect/costmodel/ppc/costmodel-slp-perm.c   | 23 +++++++
>>  gcc/tree-vect-slp.cc                          | 66 ++++++++++---------
>>  2 files changed, 57 insertions(+), 32 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
>> new file mode 100644
>> index 00000000000..e5c4dceddfb
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-require-effective-target powerpc_p9vector_ok } */
>> +/* Specify power9 to ensure the vectorization is profitable
>> +   and test point stands, otherwise it could be not profitable
>> +   to vectorize.  */
>> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
>> +
>> +/* Verify we cost the exact count for required vec_perm.  */
>> +
>> +int x[1024], y[1024];
>> +
>> +void
>> +foo ()
>> +{
>> +  for (int i = 0; i < 512; ++i)
>> +    {
>> +      x[2 * i] = y[1023 - (2 * i)];
>> +      x[2 * i + 1] = y[1023 - (2 * i + 1)];
>> +    }
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> index e5c9d7e766e..af9a6dd4fa9 100644
>> --- a/gcc/tree-vect-slp.cc
>> +++ b/gcc/tree-vect-slp.cc
>> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>>
>>    mode = TYPE_MODE (vectype);
>>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
>> +  unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
>>
>>    /* Initialize the vect stmts of NODE to properly insert the generated
>>       stmts later.  */
>>    if (! analyze_only)
>> -    for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
>> -        i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
>> +    for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++)
>>        SLP_TREE_VEC_STMTS (node).quick_push (NULL);
>>
>>    /* Generate permutation masks for every NODE. Number of masks for each NODE
>> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>>          (b) the permutes only need a single vector input.  */
>>        mask.new_vector (nunits, group_size, 3);
>>        nelts_to_build = mask.encoded_nelts ();
>> -      nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
>> +      /* It's possible to obtain zero nstmts during analyze_only, so make
>> +        it at least one to ensure the later computation for n_perms
>> +        proceed.  */
>> +      nvectors_per_build = nstmts > 0 ? nstmts : 1;
>>        in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
>>      }
>>    else
>> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>>                   return false;
>>                 }
>>
>> -             ++*n_perms;
>> -
>> +             tree mask_vec = NULL_TREE;
>>               if (!analyze_only)
>> -               {
>> -                 tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
>> +               mask_vec = vect_gen_perm_mask_checked (vectype, indices);
>>
>> -                 if (second_vec_index == -1)
>> -                   second_vec_index = first_vec_index;
>> +             if (second_vec_index == -1)
>> +               second_vec_index = first_vec_index;
>>
>> -                 for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
>> +             for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
>> +               {
>> +                 ++*n_perms;
> 
> So the "real" change is doing
> 
>       *n_perms += nvectors_per_build;
> 
> and *n_perms was unused when !analyze_only?  And since at

Yes.  Although both the !analyze_only and the analyze_only calls pass n_perms
in, currently only the analyze_only call sites use the returned n_perms.

> analysis time we (sometimes?) have zero nvectors you have to
> fix up above?  Which cases are those?

Yes, the fixup avoids getting an unexpected n_perms in function
vect_optimize_slp_pass::internal_node_cost.  One typical case is
gcc.dg/vect/bb-slp-50.c: without special-casing zero, slp2 unexpectedly
fails to optimize out one more vec_perm.

vect_optimize_slp_pass::internal_node_cost checks whether the returned n_perms
is zero (no vec_perm needed) or nonzero (some vec_perm needed):

      if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL,
					   nullptr, vf, true, false, &n_perms))
	{
	  auto rep = SLP_TREE_REPRESENTATIVE (node);
	  if (out_layout_i == 0)
	    {
	      /* Use the fallback cost if the load is an N-to-N permutation.
		 Otherwise assume that the node will be rejected later
		 and rebuilt from scalars.  */
	      if (STMT_VINFO_GROUPED_ACCESS (rep)
		  && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep))
		      == SLP_TREE_LANES (node)))
		return fallback_cost;
	      return 0;
	    }
	  return -1;
	}

      /* See the comment above the corresponding VEC_PERM_EXPR handling.  */
      return n_perms == 0 ? 0 : 1;

vect_optimize_slp_pass::forward_pass () then adds an internal cost only when
factor > 0 (i.e. some vec_perm is needed):

	      /* Accumulate the cost of using LAYOUT_I within NODE,
		 both for the inputs and the outputs.  */
	      int factor = internal_node_cost (vertex.node, layout_i,
					       layout_i);
	      if (factor < 0)
		{
		  is_possible = false;
		  break;
		}
	      else if (factor)
		layout_costs.internal_cost.add_serial_cost
		  ({ vertex.weight * factor, m_optimize_size });
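The interaction can be sketched with a hypothetical C model (invented names; it only mirrors the quoted `n_perms == 0 ? 0 : 1` and the `factor` test, not the real cost types such as add_serial_cost): if nvectors_per_build were zero, the ri loop would never increment n_perms, internal_node_cost would report 0, and forward_pass would add no cost for a layout that really needs a permute.  Clamping to 1 keeps the factor at 1.

```c
#include <assert.h>

/* Hypothetical toy model, not GCC code.  */

/* Mirrors the patch: nstmts may be zero during analyze_only.  */
static unsigned
clamp_nvectors_per_build (unsigned nstmts)
{
  return nstmts > 0 ? nstmts : 1;
}

/* n_perms contributed by one non-identity mask in the patched loop.  */
static unsigned
perms_for_mask (unsigned nvectors_per_build)
{
  unsigned n_perms = 0;
  for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
    ++n_perms;
  return n_perms;
}

/* Mirrors the tail of internal_node_cost: 0 = no permute, 1 = permute.  */
static int
node_factor (unsigned n_perms)
{
  return n_perms == 0 ? 0 : 1;
}

/* Mirrors forward_pass: only a positive factor adds weight * factor.
   A negative factor ("layout impossible") is not modelled here.  */
static int
accumulate_cost (int cost, int weight, int factor)
{
  if (factor > 0)
    cost += weight * factor;
  return cost;
}
```

Without the clamp, a zero nstmts yields factor 0 and no added cost; with it, the same node is charged weight * 1, while nonzero nstmts values pass through unchanged.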

BR,
Kewen

> 
> In principle the patch looks good to me.
> 
> Richard.
> 
>> +                 if (analyze_only)
>> +                   continue;
>> +                 /* Generate the permute statement if necessary.  */
>> +                 tree first_vec = dr_chain[first_vec_index + ri];
>> +                 tree second_vec = dr_chain[second_vec_index + ri];
>> +                 gassign *stmt = as_a<gassign *> (stmt_info->stmt);
>> +                 tree perm_dest
>> +                   = vect_create_destination_var (gimple_assign_lhs (stmt),
>> +                                                  vectype);
>> +                 perm_dest = make_ssa_name (perm_dest);
>> +                 gimple *perm_stmt
>> +                   = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec,
>> +                                          second_vec, mask_vec);
>> +                 vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
>> +                                              gsi);
>> +                 if (dce_chain)
>>                     {
>> -                     /* Generate the permute statement if necessary.  */
>> -                     tree first_vec = dr_chain[first_vec_index + ri];
>> -                     tree second_vec = dr_chain[second_vec_index + ri];
>> -                     gassign *stmt = as_a<gassign *> (stmt_info->stmt);
>> -                     tree perm_dest
>> -                       = vect_create_destination_var (gimple_assign_lhs (stmt),
>> -                                                      vectype);
>> -                     perm_dest = make_ssa_name (perm_dest);
>> -                     gimple *perm_stmt
>> -                       = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
>> -                                              first_vec, second_vec, mask_vec);
>> -                     vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
>> -                                                  gsi);
>> -                     if (dce_chain)
>> -                       {
>> -                         bitmap_set_bit (used_defs, first_vec_index + ri);
>> -                         bitmap_set_bit (used_defs, second_vec_index + ri);
>> -                       }
>> -
>> -                     /* Store the vector statement in NODE.  */
>> -                     SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
>> -                       = perm_stmt;
>> +                     bitmap_set_bit (used_defs, first_vec_index + ri);
>> +                     bitmap_set_bit (used_defs, second_vec_index + ri);
>>                     }
>> +
>> +                 /* Store the vector statement in NODE.  */
>> +                 SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
>>                 }
>>             }
>>           else if (!analyze_only)
>> --
>> 2.39.1



* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
  2023-05-23  3:01     ` Kewen.Lin
@ 2023-05-23  6:19       ` Richard Biener
  2023-05-24  5:23         ` Kewen.Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-05-23  6:19 UTC
  To: Kewen.Lin
  Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

On Tue, May 23, 2023 at 5:01 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Richi,
>
> Thanks for the review!
>
> on 2023/5/22 21:44, Richard Biener wrote:
> > On Wed, May 17, 2023 at 8:15 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> Following Richi's suggestion in [1], I'm working on deferring
> >> cost evaluation next to the transformation.  This patch
> >> enhances function vect_transform_slp_perm_load_1, which
> >> could under-cost vector permutations: its costing doesn't
> >> consider nvectors_per_build, which is inconsistent with
> >> the transformation part.
> >>
> >> Bootstrapped and regtested on x86_64-redhat-linux,
> >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
> >>
> >> Is it ok for trunk?
> >>
> >> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
> >>
> >> BR,
> >> Kewen
> >> -----
> >> gcc/ChangeLog:
> >>
> >>         * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
> >>         calculation on n_perms by considering nvectors_per_build.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>         * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
> >> ---
> >>  .../vect/costmodel/ppc/costmodel-slp-perm.c   | 23 +++++++
> >>  gcc/tree-vect-slp.cc                          | 66 ++++++++++---------
> >>  2 files changed, 57 insertions(+), 32 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> >> new file mode 100644
> >> index 00000000000..e5c4dceddfb
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> >> @@ -0,0 +1,23 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-require-effective-target vect_int } */
> >> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> >> +/* Specify power9 to ensure the vectorization is profitable
> >> +   and test point stands, otherwise it could be not profitable
> >> +   to vectorize.  */
> >> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
> >> +
> >> +/* Verify we cost the exact count for required vec_perm.  */
> >> +
> >> +int x[1024], y[1024];
> >> +
> >> +void
> >> +foo ()
> >> +{
> >> +  for (int i = 0; i < 512; ++i)
> >> +    {
> >> +      x[2 * i] = y[1023 - (2 * i)];
> >> +      x[2 * i + 1] = y[1023 - (2 * i + 1)];
> >> +    }
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
> >> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> >> index e5c9d7e766e..af9a6dd4fa9 100644
> >> --- a/gcc/tree-vect-slp.cc
> >> +++ b/gcc/tree-vect-slp.cc
> >> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
> >>
> >>    mode = TYPE_MODE (vectype);
> >>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> >> +  unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
> >>
> >>    /* Initialize the vect stmts of NODE to properly insert the generated
> >>       stmts later.  */
> >>    if (! analyze_only)
> >> -    for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
> >> -        i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
> >> +    for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++)
> >>        SLP_TREE_VEC_STMTS (node).quick_push (NULL);
> >>
> >>    /* Generate permutation masks for every NODE. Number of masks for each NODE
> >> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
> >>          (b) the permutes only need a single vector input.  */
> >>        mask.new_vector (nunits, group_size, 3);
> >>        nelts_to_build = mask.encoded_nelts ();
> >> -      nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
> >> +      /* It's possible to obtain zero nstmts during analyze_only, so make
> >> +        it at least one to ensure the later computation for n_perms
> >> +        proceed.  */
> >> +      nvectors_per_build = nstmts > 0 ? nstmts : 1;
> >>        in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
> >>      }
> >>    else
> >> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
> >>                   return false;
> >>                 }
> >>
> >> -             ++*n_perms;
> >> -
> >> +             tree mask_vec = NULL_TREE;
> >>               if (!analyze_only)
> >> -               {
> >> -                 tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
> >> +               mask_vec = vect_gen_perm_mask_checked (vectype, indices);
> >>
> >> -                 if (second_vec_index == -1)
> >> -                   second_vec_index = first_vec_index;
> >> +             if (second_vec_index == -1)
> >> +               second_vec_index = first_vec_index;
> >>
> >> -                 for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> >> +             for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> >> +               {
> >> +                 ++*n_perms;
> >
> > So the "real" change is doing
> >
> >       *n_perms += nvectors_per_build;
> >
> > and *n_perms was unused when !analyze_only?  And since at
>
> Yes.  Although both the analyze_only and !analyze_only callers pass n_perms in,
> only the call sites with analyze_only use the returned n_perms further.
>
> > analysis time we (sometimes?) have zero nvectors you have to
> > fixup above?  Which cases are that?
>
> Yes, the fixup avoids producing an unexpected n_perms in function
> vect_optimize_slp_pass::internal_node_cost.  One typical case is
> gcc.dg/vect/bb-slp-50.c: without special-casing zero, slp2 unexpectedly
> fails to optimize out one more vec_perm.
>
> vect_optimize_slp_pass::internal_node_cost checks whether the returned n_perms
> is zero (no vec_perm needed) or not (vec_perm needed).
>
>       if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL,
>                                            nullptr, vf, true, false, &n_perms))
>         {
>           auto rep = SLP_TREE_REPRESENTATIVE (node);
>           if (out_layout_i == 0)
>             {
>               /* Use the fallback cost if the load is an N-to-N permutation.
>                  Otherwise assume that the node will be rejected later
>                  and rebuilt from scalars.  */
>               if (STMT_VINFO_GROUPED_ACCESS (rep)
>                   && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep))
>                       == SLP_TREE_LANES (node)))
>                 return fallback_cost;
>               return 0;
>             }
>           return -1;
>         }
>
>       /* See the comment above the corresponding VEC_PERM_EXPR handling.  */
>       return n_perms == 0 ? 0 : 1;
>
> vect_optimize_slp_pass::forward_pass () only accumulates a cost when
> factor > 0 (some vec_perm is needed).
>
>               /* Accumulate the cost of using LAYOUT_I within NODE,
>                  both for the inputs and the outputs.  */
>               int factor = internal_node_cost (vertex.node, layout_i,
>                                                layout_i);
>               if (factor < 0)
>                 {
>                   is_possible = false;
>                   break;
>                 }
>               else if (factor)
>                 layout_costs.internal_cost.add_serial_cost
>                   ({ vertex.weight * factor, m_optimize_size });

Ah, OK - thanks for clarifying.

The patch is OK.
Richard.

> BR,
> Kewen
>
> >
> > In principle the patch looks good to me.
> >
> > Richard.
> >
> >> +                 if (analyze_only)
> >> +                   continue;
> >> +                 /* Generate the permute statement if necessary.  */
> >> +                 tree first_vec = dr_chain[first_vec_index + ri];
> >> +                 tree second_vec = dr_chain[second_vec_index + ri];
> >> +                 gassign *stmt = as_a<gassign *> (stmt_info->stmt);
> >> +                 tree perm_dest
> >> +                   = vect_create_destination_var (gimple_assign_lhs (stmt),
> >> +                                                  vectype);
> >> +                 perm_dest = make_ssa_name (perm_dest);
> >> +                 gimple *perm_stmt
> >> +                   = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec,
> >> +                                          second_vec, mask_vec);
> >> +                 vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> >> +                                              gsi);
> >> +                 if (dce_chain)
> >>                     {
> >> -                     /* Generate the permute statement if necessary.  */
> >> -                     tree first_vec = dr_chain[first_vec_index + ri];
> >> -                     tree second_vec = dr_chain[second_vec_index + ri];
> >> -                     gassign *stmt = as_a<gassign *> (stmt_info->stmt);
> >> -                     tree perm_dest
> >> -                       = vect_create_destination_var (gimple_assign_lhs (stmt),
> >> -                                                      vectype);
> >> -                     perm_dest = make_ssa_name (perm_dest);
> >> -                     gimple *perm_stmt
> >> -                       = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> >> -                                              first_vec, second_vec, mask_vec);
> >> -                     vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> >> -                                                  gsi);
> >> -                     if (dce_chain)
> >> -                       {
> >> -                         bitmap_set_bit (used_defs, first_vec_index + ri);
> >> -                         bitmap_set_bit (used_defs, second_vec_index + ri);
> >> -                       }
> >> -
> >> -                     /* Store the vector statement in NODE.  */
> >> -                     SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
> >> -                       = perm_stmt;
> >> +                     bitmap_set_bit (used_defs, first_vec_index + ri);
> >> +                     bitmap_set_bit (used_defs, second_vec_index + ri);
> >>                     }
> >> +
> >> +                 /* Store the vector statement in NODE.  */
> >> +                 SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
> >>                 }
> >>             }
> >>           else if (!analyze_only)
> >> --
> >> 2.39.1
>


* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
  2023-05-23  6:19       ` Richard Biener
@ 2023-05-24  5:23         ` Kewen.Lin
From: Kewen.Lin @ 2023-05-24  5:23 UTC (permalink / raw)
  To: Richard Biener
  Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

on 2023/5/23 14:19, Richard Biener wrote:
> On Tue, May 23, 2023 at 5:01 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi Richi,
>>
>> Thanks for the review!
>>
>> on 2023/5/22 21:44, Richard Biener wrote:
>>> On Wed, May 17, 2023 at 8:15 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Following Richi's suggestion in [1], I'm working on moving
>>>> cost evaluation next to the transformation.  This patch
>>>> enhances function vect_transform_slp_perm_load_1, which
>>>> could under-cost vector permutation: its costing doesn't
>>>> consider nvectors_per_build, so it is inconsistent with
>>>> the transformation part.
>>>>
>>>> Bootstrapped and regtested on x86_64-redhat-linux,
>>>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>>>
>>>> Is it ok for trunk?
>>>>
>>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
>>>>
>>>> BR,
>>>> Kewen
>>>> -----
>>>> gcc/ChangeLog:
>>>>
>>>>         * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
>>>>         calculation of n_perms to take nvectors_per_build into account.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>         * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
>>>> ---
>>>>  .../vect/costmodel/ppc/costmodel-slp-perm.c   | 23 +++++++
>>>>  gcc/tree-vect-slp.cc                          | 66 ++++++++++---------
>>>>  2 files changed, 57 insertions(+), 32 deletions(-)
>>>>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
>>>>
>>>> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
>>>> new file mode 100644
>>>> index 00000000000..e5c4dceddfb
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
>>>> @@ -0,0 +1,23 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-require-effective-target vect_int } */
>>>> +/* { dg-require-effective-target powerpc_p9vector_ok } */
>>>> +/* Specify power9 to ensure that vectorization is profitable
>>>> +   and the test point holds; otherwise vectorizing may not be
>>>> +   profitable.  */
>>>> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
>>>> +
>>>> +/* Verify we cost the exact count for required vec_perm.  */
>>>> +
>>>> +int x[1024], y[1024];
>>>> +
>>>> +void
>>>> +foo ()
>>>> +{
>>>> +  for (int i = 0; i < 512; ++i)
>>>> +    {
>>>> +      x[2 * i] = y[1023 - (2 * i)];
>>>> +      x[2 * i + 1] = y[1023 - (2 * i + 1)];
>>>> +    }
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
>>>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>>>> index e5c9d7e766e..af9a6dd4fa9 100644
>>>> --- a/gcc/tree-vect-slp.cc
>>>> +++ b/gcc/tree-vect-slp.cc
>>>> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>>>>
>>>>    mode = TYPE_MODE (vectype);
>>>>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
>>>> +  unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
>>>>
>>>>    /* Initialize the vect stmts of NODE to properly insert the generated
>>>>       stmts later.  */
>>>>    if (! analyze_only)
>>>> -    for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
>>>> -        i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
>>>> +    for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++)
>>>>        SLP_TREE_VEC_STMTS (node).quick_push (NULL);
>>>>
>>>>    /* Generate permutation masks for every NODE. Number of masks for each NODE
>>>> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>>>>          (b) the permutes only need a single vector input.  */
>>>>        mask.new_vector (nunits, group_size, 3);
>>>>        nelts_to_build = mask.encoded_nelts ();
>>>> -      nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
>>>> +      /* It's possible to get zero nstmts during analyze_only, so make
>>>> +        it at least one to ensure the later computation of n_perms
>>>> +        proceeds.  */
>>>> +      nvectors_per_build = nstmts > 0 ? nstmts : 1;
>>>>        in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
>>>>      }
>>>>    else
>>>> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>>>>                   return false;
>>>>                 }
>>>>
>>>> -             ++*n_perms;
>>>> -
>>>> +             tree mask_vec = NULL_TREE;
>>>>               if (!analyze_only)
>>>> -               {
>>>> -                 tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
>>>> +               mask_vec = vect_gen_perm_mask_checked (vectype, indices);
>>>>
>>>> -                 if (second_vec_index == -1)
>>>> -                   second_vec_index = first_vec_index;
>>>> +             if (second_vec_index == -1)
>>>> +               second_vec_index = first_vec_index;
>>>>
>>>> -                 for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
>>>> +             for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
>>>> +               {
>>>> +                 ++*n_perms;
>>>
>>> So the "real" change is doing
>>>
>>>       *n_perms += nvectors_per_build;
>>>
>>> and *n_perms was unused when !analyze_only?  And since at
>>
>> Yes.  Although both the analyze_only and !analyze_only callers pass n_perms in,
>> only the call sites with analyze_only use the returned n_perms further.
>>
>>> analysis time we (sometimes?) have zero nvectors you have to
>>> fixup above?  Which cases are that?
>>
>> Yes, the fixup avoids producing an unexpected n_perms in function
>> vect_optimize_slp_pass::internal_node_cost.  One typical case is
>> gcc.dg/vect/bb-slp-50.c: without special-casing zero, slp2 unexpectedly
>> fails to optimize out one more vec_perm.
>>
>> vect_optimize_slp_pass::internal_node_cost checks whether the returned n_perms
>> is zero (no vec_perm needed) or not (vec_perm needed).
>>
>>       if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL,
>>                                            nullptr, vf, true, false, &n_perms))
>>         {
>>           auto rep = SLP_TREE_REPRESENTATIVE (node);
>>           if (out_layout_i == 0)
>>             {
>>               /* Use the fallback cost if the load is an N-to-N permutation.
>>                  Otherwise assume that the node will be rejected later
>>                  and rebuilt from scalars.  */
>>               if (STMT_VINFO_GROUPED_ACCESS (rep)
>>                   && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep))
>>                       == SLP_TREE_LANES (node)))
>>                 return fallback_cost;
>>               return 0;
>>             }
>>           return -1;
>>         }
>>
>>       /* See the comment above the corresponding VEC_PERM_EXPR handling.  */
>>       return n_perms == 0 ? 0 : 1;
>>
>> vect_optimize_slp_pass::forward_pass () only accumulates a cost when
>> factor > 0 (some vec_perm is needed).
>>
>>               /* Accumulate the cost of using LAYOUT_I within NODE,
>>                  both for the inputs and the outputs.  */
>>               int factor = internal_node_cost (vertex.node, layout_i,
>>                                                layout_i);
>>               if (factor < 0)
>>                 {
>>                   is_possible = false;
>>                   break;
>>                 }
>>               else if (factor)
>>                 layout_costs.internal_cost.add_serial_cost
>>                   ({ vertex.weight * factor, m_optimize_size });
> 
> Ah, OK - thanks for clarifying.
> 
> The patch is OK.

Thanks!  Committed in r14-1151.

BR,
Kewen


end of thread (latest message: 2023-05-24  5:23 UTC)

Thread overview: 10+ messages
2023-05-17  6:09 [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 Kewen.Lin
2023-05-17  6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin
2023-05-22 13:44   ` Richard Biener
2023-05-23  3:01     ` Kewen.Lin
2023-05-23  6:19       ` Richard Biener
2023-05-24  5:23         ` Kewen.Lin
2023-05-17  6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener
2023-05-17  7:18   ` Kewen.Lin
2023-05-18  6:12     ` Richard Biener
2023-05-22  5:37       ` Kewen.Lin
