public inbox for gcc-patches@gcc.gnu.org
From: "Kewen.Lin" <linkw@linux.ibm.com>
To: Hongtao Liu <crazylht@gmail.com>
Cc: gcc-patches@gcc.gnu.org, richard.guenther@gmail.com,
	richard.sandiford@arm.com, segher@kernel.crashing.org,
	bergner@linux.ibm.com, ubizjak@gmail.com, hongtao.liu@intel.com
Subject: Re: [PATCH 8/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE
Date: Mon, 19 Jun 2023 15:23:55 +0800	[thread overview]
Message-ID: <dbd0bff8-38b3-ee26-dbbb-02ca3e03f31b@linux.ibm.com> (raw)
In-Reply-To: <CAMZc-bzyDicGYWfEL0c-sLuWPdaXOURpywL_aAPZGxhCwNaTaA@mail.gmail.com>

Hi Hongtao,

on 2023/6/14 16:17, Hongtao Liu wrote:
> On Tue, Jun 13, 2023 at 10:07 AM Kewen Lin via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> This patch adjusts the cost handling on
>> VMAT_CONTIGUOUS_PERMUTE in function vectorizable_load.  We
>> don't call function vect_model_load_cost for it any more.
>>
>> As the affected test case gcc.target/i386/pr70021.c shows,
>> the previous costing can under-cost the total number of
>> generated vector loads, since for VMAT_CONTIGUOUS_PERMUTE
>> function vect_model_load_cost doesn't consider the group
>> size, which is treated as vec_num during the transformation.
> The original PR is for a correctness issue, and I'm not sure how
> much of a performance impact the patch would have, but the change looks
> reasonable, so the test change looks OK to me.
> I'll track the performance impact on SPEC2017 to see if there's any
> regression caused by the patch (I guess probably not).

Thanks for the feedback and further tracking!  Hope this (and
this whole series) doesn't impact SPEC2017 performance on x86. :)

BR,
Kewen

>>
>> This patch makes the count of vector loads in costing
>> consistent with what we generate during the transformation.
>> To be more specific, for the given test case, the memory
>> access b[i_20] was costed as 2 vector loads before; with
>> this patch it is costed as 8 instead, which matches the
>> final count of vector loads generated from b.  This costing
>> change makes the cost model analysis conclude that it's not
>> profitable to vectorize the first loop, so this patch adjusts
>> the test case to run without the vect cost model.
>>
>> Note that this test case also exposes something we can
>> improve further: although the numbers of vector permutations
>> we cost and generate are consistent, DCE can still optimize
>> some unused permutations away.  It would be good if we could
>> predict that and generate only the necessary permutations.
>>
>> gcc/ChangeLog:
>>
>>         * tree-vect-stmts.cc (vect_model_load_cost): Assert this function only
>>         handle memory_access_type VMAT_CONTIGUOUS, remove some
>>         VMAT_CONTIGUOUS_PERMUTE related handlings.
>>         (vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS_PERMUTE
>>         without calling vect_model_load_cost.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.target/i386/pr70021.c: Adjust with -fno-vect-cost-model.
>> ---
>>  gcc/testsuite/gcc.target/i386/pr70021.c |  2 +-
>>  gcc/tree-vect-stmts.cc                  | 88 ++++++++++++++-----------
>>  2 files changed, 51 insertions(+), 39 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/pr70021.c b/gcc/testsuite/gcc.target/i386/pr70021.c
>> index 6562c0f2bd0..d509583601e 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr70021.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr70021.c
>> @@ -1,7 +1,7 @@
>>  /* PR target/70021 */
>>  /* { dg-do run } */
>>  /* { dg-require-effective-target avx2 } */
>> -/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details -mtune=skylake" } */
>> +/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details -mtune=skylake -fno-vect-cost-model" } */
>>
>>  #include "avx2-check.h"
>>
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index 7f8d9db5363..e7a97dbe05d 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -1134,8 +1134,7 @@ vect_model_load_cost (vec_info *vinfo,
>>                       slp_tree slp_node,
>>                       stmt_vector_for_cost *cost_vec)
>>  {
>> -  gcc_assert (memory_access_type == VMAT_CONTIGUOUS
>> -             || memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
>> +  gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
>>
>>    unsigned int inside_cost = 0, prologue_cost = 0;
>>    bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
>> @@ -1174,26 +1173,6 @@ vect_model_load_cost (vec_info *vinfo,
>>       once per group anyhow.  */
>>    bool first_stmt_p = (first_stmt_info == stmt_info);
>>
>> -  /* We assume that the cost of a single load-lanes instruction is
>> -     equivalent to the cost of DR_GROUP_SIZE separate loads.  If a grouped
>> -     access is instead being provided by a load-and-permute operation,
>> -     include the cost of the permutes.  */
>> -  if (first_stmt_p
>> -      && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
>> -    {
>> -      /* Uses an even and odd extract operations or shuffle operations
>> -        for each needed permute.  */
>> -      int group_size = DR_GROUP_SIZE (first_stmt_info);
>> -      int nstmts = ncopies * ceil_log2 (group_size) * group_size;
>> -      inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm,
>> -                                      stmt_info, 0, vect_body);
>> -
>> -      if (dump_enabled_p ())
>> -        dump_printf_loc (MSG_NOTE, vect_location,
>> -                         "vect_model_load_cost: strided group_size = %d .\n",
>> -                         group_size);
>> -    }
>> -
>>    vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
>>                       misalignment, first_stmt_p, &inside_cost, &prologue_cost,
>>                       cost_vec, cost_vec, true);
>> @@ -10652,11 +10631,22 @@ vectorizable_load (vec_info *vinfo,
>>                  alignment support schemes.  */
>>               if (costing_p)
>>                 {
>> -                 if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
>> +                 /* For VMAT_CONTIGUOUS_PERMUTE if it's grouped load, we
>> +                    only need to take care of the first stmt, whose
>> +                    stmt_info is first_stmt_info, vec_num iterating on it
>> +                    will cover the cost for the remaining, it's consistent
>> +                    with transforming.  For the prologue cost for realign,
>> +                    we only need to count it once for the whole group.  */
>> +                 bool first_stmt_info_p = first_stmt_info == stmt_info;
>> +                 bool add_realign_cost = first_stmt_info_p && i == 0;
>> +                 if (memory_access_type == VMAT_CONTIGUOUS_REVERSE
>> +                     || (memory_access_type == VMAT_CONTIGUOUS_PERMUTE
>> +                         && (!grouped_load || first_stmt_info_p)))
>>                     vect_get_load_cost (vinfo, stmt_info, 1,
>>                                         alignment_support_scheme, misalignment,
>> -                                       false, &inside_cost, &prologue_cost,
>> -                                       cost_vec, cost_vec, true);
>> +                                       add_realign_cost, &inside_cost,
>> +                                       &prologue_cost, cost_vec, cost_vec,
>> +                                       true);
>>                 }
>>               else
>>                 {
>> @@ -10774,8 +10764,7 @@ vectorizable_load (vec_info *vinfo,
>>              ???  This is a hack to prevent compile-time issues as seen
>>              in PR101120 and friends.  */
>>           if (costing_p
>> -             && memory_access_type != VMAT_CONTIGUOUS
>> -             && memory_access_type != VMAT_CONTIGUOUS_PERMUTE)
>> +             && memory_access_type != VMAT_CONTIGUOUS)
>>             {
>>               vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf,
>>                                             true, &n_perms, nullptr);
>> @@ -10790,20 +10779,44 @@ vectorizable_load (vec_info *vinfo,
>>               gcc_assert (ok);
>>             }
>>         }
>> -      else if (!costing_p)
>> +      else
>>          {
>>            if (grouped_load)
>>             {
>>               if (memory_access_type != VMAT_LOAD_STORE_LANES)
>> -               vect_transform_grouped_load (vinfo, stmt_info, dr_chain,
>> -                                            group_size, gsi);
>> -             *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
>> -           }
>> -          else
>> -           {
>> -             STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>> +               {
>> +                 gcc_assert (memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
>> +                 /* We assume that the cost of a single load-lanes instruction
>> +                    is equivalent to the cost of DR_GROUP_SIZE separate loads.
>> +                    If a grouped access is instead being provided by a
>> +                    load-and-permute operation, include the cost of the
>> +                    permutes.  */
>> +                 if (costing_p && first_stmt_info == stmt_info)
>> +                   {
>> +                     /* Uses an even and odd extract operations or shuffle
>> +                        operations for each needed permute.  */
>> +                     int group_size = DR_GROUP_SIZE (first_stmt_info);
>> +                     int nstmts = ceil_log2 (group_size) * group_size;
>> +                     inside_cost
>> +                       += record_stmt_cost (cost_vec, nstmts, vec_perm,
>> +                                            stmt_info, 0, vect_body);
>> +
>> +                     if (dump_enabled_p ())
>> +                       dump_printf_loc (
>> +                         MSG_NOTE, vect_location,
>> +                         "vect_model_load_cost: strided group_size = %d .\n",
>> +                         group_size);
>> +                   }
>> +                 else if (!costing_p)
>> +                   vect_transform_grouped_load (vinfo, stmt_info, dr_chain,
>> +                                                group_size, gsi);
>> +               }
>> +             if (!costing_p)
>> +               *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
>>             }
>> -        }
>> +         else if (!costing_p)
>> +           STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>> +       }
>>        dr_chain.release ();
>>      }
>>    if (!slp && !costing_p)
>> @@ -10814,8 +10827,7 @@ vectorizable_load (vec_info *vinfo,
>>        gcc_assert (memory_access_type != VMAT_INVARIANT
>>                   && memory_access_type != VMAT_ELEMENTWISE
>>                   && memory_access_type != VMAT_STRIDED_SLP);
>> -      if (memory_access_type != VMAT_CONTIGUOUS
>> -         && memory_access_type != VMAT_CONTIGUOUS_PERMUTE)
>> +      if (memory_access_type != VMAT_CONTIGUOUS)
>>         {
>>           if (dump_enabled_p ())
>>             dump_printf_loc (MSG_NOTE, vect_location,
>> --
>> 2.31.1
>>
> 
> 



Thread overview: 27+ messages
2023-06-13  2:03 [PATCH 0/9] vect: Move costing next to the transform for vect load Kewen Lin
2023-06-13  2:03 ` [PATCH 1/9] vect: Move vect_model_load_cost next to the transform in vectorizable_load Kewen Lin
2023-06-13  2:03 ` [PATCH 2/9] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl Kewen Lin
2023-06-30 11:11   ` Richard Biener
2023-07-03  2:57     ` [PATCH 2/9 v2] " Kewen.Lin
2023-06-13  2:03 ` [PATCH 3/9] vect: Adjust vectorizable_load costing on VMAT_INVARIANT Kewen Lin
2023-06-30 11:18   ` Richard Biener
2023-07-03  2:58     ` [PATCH 3/9 v2] " Kewen.Lin
2023-06-13  2:03 ` [PATCH 4/9] vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP Kewen Lin
2023-07-02  8:58   ` Richard Sandiford
2023-07-03  3:19     ` Kewen.Lin
2023-07-22 15:58       ` Iain Sandoe
2023-07-24  1:50         ` Kewen.Lin
2023-06-13  2:03 ` [PATCH 5/9] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER Kewen Lin
2023-07-03  3:01   ` [PATCH 5/9 v2] " Kewen.Lin
2023-06-13  2:03 ` [PATCH 6/9] vect: Adjust vectorizable_load costing on VMAT_LOAD_STORE_LANES Kewen Lin
2023-06-13  2:03 ` [PATCH 7/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE Kewen Lin
2023-06-13  2:03 ` [PATCH 8/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE Kewen Lin
2023-06-14  8:17   ` Hongtao Liu
2023-06-19  7:23     ` Kewen.Lin [this message]
2023-06-13  2:03 ` [PATCH 9/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS Kewen Lin
2023-07-03  3:06   ` [PATCH 9/9 v2] " Kewen.Lin
2023-06-26  6:00 ` [PATCH 0/9] vect: Move costing next to the transform for vect load Kewen.Lin
2023-06-30 11:37 ` Richard Biener
2023-07-02  9:13   ` Richard Sandiford
2023-07-03  3:39   ` Kewen.Lin
2023-07-03  8:42     ` Richard Biener
