public inbox for gcc-patches@gcc.gnu.org
From: Richard Biener <richard.guenther@gmail.com>
To: "Kewen.Lin" <linkw@linux.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Richard Sandiford <richard.sandiford@arm.com>,
	 Segher Boessenkool <segher@kernel.crashing.org>,
	Peter Bergner <bergner@linux.ibm.com>
Subject: Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
Date: Mon, 22 May 2023 15:44:30 +0200
Message-ID: <CAFiYyc0t03oJ3D0rdfzrviPk7Zggek095_F9-aJS2VESP-u10g@mail.gmail.com>
In-Reply-To: <71fda837-6a92-7f74-43e1-90b046919f6a@linux.ibm.com>

On Wed, May 17, 2023 at 8:15 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi,
>
> Following Richi's suggestion in [1], I'm working on deferring
> cost evaluation next to the transformation.  This patch
> enhances function vect_transform_slp_perm_load_1, which
> could under-cost vector permutations: its costing doesn't
> consider nvectors_per_build, which is inconsistent with the
> transformation part.
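To make the under-costing concrete, here is a minimal C model. It assumes, based on the transformation loop in the patch, that each distinct permutation mask is applied once per vector in nvectors_per_build. The functions n_perms_before_patch and n_perms_after_patch are hypothetical and are not GCC code. They only contrast one increment per mask with one increment per emitted VEC_PERM_EXPR.

```c
#include <assert.h>

/* Hypothetical model, not GCC code: the transformation emits one
   VEC_PERM_EXPR per (distinct mask, vector) pair, i.e. each mask is
   applied nvectors_per_build times.  */

/* Old costing: one increment per distinct mask.  */
unsigned
n_perms_before_patch (unsigned n_masks, unsigned nvectors_per_build)
{
  (void) nvectors_per_build;  /* ignored, which causes the under-costing */
  return n_masks;
}

/* Patched costing: matches the number of statements emitted.  */
unsigned
n_perms_after_patch (unsigned n_masks, unsigned nvectors_per_build)
{
  assert (nvectors_per_build > 0);  /* the patch clamps it to at least 1 */
  return n_masks * nvectors_per_build;
}
```

For example, one mask reused across two vectors is counted once by the old scheme but twice by the patched one, matching the two VEC_PERM_EXPR statements the transformation emits.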
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
>         calculation of n_perms to take nvectors_per_build into account.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
> ---
>  .../vect/costmodel/ppc/costmodel-slp-perm.c   | 23 +++++++
>  gcc/tree-vect-slp.cc                          | 66 ++++++++++---------
>  2 files changed, 57 insertions(+), 32 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> new file mode 100644
> index 00000000000..e5c4dceddfb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* Specify power9 to ensure that vectorization is profitable and
> +   the test point holds; otherwise vectorization might not be
> +   profitable.  */
> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
> +
> +/* Verify that we cost the exact number of required vec_perms.  */
> +
> +int x[1024], y[1024];
> +
> +void
> +foo ()
> +{
> +  for (int i = 0; i < 512; ++i)
> +    {
> +      x[2 * i] = y[1023 - (2 * i)];
> +      x[2 * i + 1] = y[1023 - (2 * i + 1)];
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index e5c9d7e766e..af9a6dd4fa9 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>
>    mode = TYPE_MODE (vectype);
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
>
>    /* Initialize the vect stmts of NODE to properly insert the generated
>       stmts later.  */
>    if (! analyze_only)
> -    for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
> -        i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
> +    for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++)
>        SLP_TREE_VEC_STMTS (node).quick_push (NULL);
>
>    /* Generate permutation masks for every NODE. Number of masks for each NODE
> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>          (b) the permutes only need a single vector input.  */
>        mask.new_vector (nunits, group_size, 3);
>        nelts_to_build = mask.encoded_nelts ();
> -      nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
> +      /* It's possible to obtain zero nstmts during analyze_only, so make
> +        it at least one to ensure the later computation of n_perms
> +        proceeds.  */
> +      nvectors_per_build = nstmts > 0 ? nstmts : 1;
>        in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
>      }
>    else
> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>                   return false;
>                 }
>
> -             ++*n_perms;
> -
> +             tree mask_vec = NULL_TREE;
>               if (!analyze_only)
> -               {
> -                 tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
> +               mask_vec = vect_gen_perm_mask_checked (vectype, indices);
>
> -                 if (second_vec_index == -1)
> -                   second_vec_index = first_vec_index;
> +             if (second_vec_index == -1)
> +               second_vec_index = first_vec_index;
>
> -                 for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> +             for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> +               {
> +                 ++*n_perms;

So the "real" change is doing

      *n_perms += nvectors_per_build;

and *n_perms was unused when !analyze_only?  And since at
analysis time we (sometimes?) have zero nvectors, you have to
fix that up above?  Which cases are those?
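A minimal sketch of the equivalence raised in the question above, using a simplified stand-in for the patched loop. Because ++*n_perms runs before the analyze_only continue, the count is the same in both modes and equals *n_perms += nvectors_per_build. All function names here are hypothetical. clamp_nvectors_per_build only mirrors the patch's `nstmts > 0 ? nstmts : 1` and says nothing about which cases actually produce zero nstmts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the patch's clamp at analysis time.  */
unsigned
clamp_nvectors_per_build (unsigned nstmts)
{
  unsigned n = nstmts > 0 ? nstmts : 1;
  assert (n >= 1);
  return n;
}

/* Loop shape from the patch: ++*n_perms happens before the analyze_only
   "continue", so analysis and transformation count identically.  */
void
count_with_loop (unsigned *n_perms, unsigned nvectors_per_build,
                 bool analyze_only)
{
  for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
    {
      ++*n_perms;
      if (analyze_only)
        continue;
      /* The real code builds and inserts a VEC_PERM_EXPR here.  */
    }
}

/* The equivalent closed form suggested in the review.  */
void
count_closed_form (unsigned *n_perms, unsigned nvectors_per_build)
{
  *n_perms += nvectors_per_build;
}
```

Written this way, the clamp matters only for the analysis-time count: without it, a zero nstmts would make the loop body never run and the permutes would go uncosted.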

In principle the patch looks good to me.

Richard.

> +                 if (analyze_only)
> +                   continue;
> +                 /* Generate the permute statement if necessary.  */
> +                 tree first_vec = dr_chain[first_vec_index + ri];
> +                 tree second_vec = dr_chain[second_vec_index + ri];
> +                 gassign *stmt = as_a<gassign *> (stmt_info->stmt);
> +                 tree perm_dest
> +                   = vect_create_destination_var (gimple_assign_lhs (stmt),
> +                                                  vectype);
> +                 perm_dest = make_ssa_name (perm_dest);
> +                 gimple *perm_stmt
> +                   = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec,
> +                                          second_vec, mask_vec);
> +                 vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> +                                              gsi);
> +                 if (dce_chain)
>                     {
> -                     /* Generate the permute statement if necessary.  */
> -                     tree first_vec = dr_chain[first_vec_index + ri];
> -                     tree second_vec = dr_chain[second_vec_index + ri];
> -                     gassign *stmt = as_a<gassign *> (stmt_info->stmt);
> -                     tree perm_dest
> -                       = vect_create_destination_var (gimple_assign_lhs (stmt),
> -                                                      vectype);
> -                     perm_dest = make_ssa_name (perm_dest);
> -                     gimple *perm_stmt
> -                       = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> -                                              first_vec, second_vec, mask_vec);
> -                     vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> -                                                  gsi);
> -                     if (dce_chain)
> -                       {
> -                         bitmap_set_bit (used_defs, first_vec_index + ri);
> -                         bitmap_set_bit (used_defs, second_vec_index + ri);
> -                       }
> -
> -                     /* Store the vector statement in NODE.  */
> -                     SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
> -                       = perm_stmt;
> +                     bitmap_set_bit (used_defs, first_vec_index + ri);
> +                     bitmap_set_bit (used_defs, second_vec_index + ri);
>                     }
> +
> +                 /* Store the vector statement in NODE.  */
> +                 SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
>                 }
>             }
>           else if (!analyze_only)
> --
> 2.39.1
>
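The operand selection in the restructured permute loop can be modeled in isolation. This is a sketch, not GCC code: perm_operands and select_perm_operands are hypothetical, and dr_chain is reduced to an array of integer vector ids. It shows the single-input fallback (second_vec_index == -1 reuses first_vec_index). It also shows that the loop over ri reads consecutive entries of dr_chain, producing one permute, and hence one n_perms increment, per ri.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the operand selection in the patched loop.
   dr_chain holds vector "ids"; a permute reads dr_chain[first + ri]
   and dr_chain[second + ri], where second == -1 means "single input".  */
typedef struct
{
  int first_vec;
  int second_vec;
} perm_operands;

/* Fills OUT with one operand pair per ri and returns how many permutes
   would be emitted (and costed).  Indices are assumed non-negative.  */
size_t
select_perm_operands (const int *dr_chain, size_t chain_len,
                      int first_vec_index, int second_vec_index,
                      unsigned nvectors_per_build, perm_operands *out)
{
  if (second_vec_index == -1)
    second_vec_index = first_vec_index;  /* single-input permute */

  size_t n = 0;
  for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
    {
      size_t i1 = (size_t) first_vec_index + ri;
      size_t i2 = (size_t) second_vec_index + ri;
      assert (i1 < chain_len && i2 < chain_len);
      out[n].first_vec = dr_chain[i1];
      out[n].second_vec = dr_chain[i2];
      ++n;  /* one VEC_PERM_EXPR, hence one n_perms increment, per ri */
    }
  return n;
}
```

With a chain of four vectors, a single-input mask starting at index 0 and nvectors_per_build of 2 yields two permutes, reading vectors 0 and 1 as both operands.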

