From: "Kewen.Lin" <linkw@linux.ibm.com>
To: Hongtao Liu <crazylht@gmail.com>
Cc: gcc-patches@gcc.gnu.org, richard.guenther@gmail.com,
richard.sandiford@arm.com, segher@kernel.crashing.org,
bergner@linux.ibm.com, ubizjak@gmail.com, hongtao.liu@intel.com
Subject: Re: [PATCH 8/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE
Date: Mon, 19 Jun 2023 15:23:55 +0800 [thread overview]
Message-ID: <dbd0bff8-38b3-ee26-dbbb-02ca3e03f31b@linux.ibm.com> (raw)
In-Reply-To: <CAMZc-bzyDicGYWfEL0c-sLuWPdaXOURpywL_aAPZGxhCwNaTaA@mail.gmail.com>
Hi Hongtao,
on 2023/6/14 16:17, Hongtao Liu wrote:
> On Tue, Jun 13, 2023 at 10:07 AM Kewen Lin via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> This patch adjusts the cost handling on
>> VMAT_CONTIGUOUS_PERMUTE in function vectorizable_load. We
>> don't call function vect_model_load_cost for it any more.
>>
>> As the affected test case gcc.target/i386/pr70021.c shows,
>> the previous costing can under-cost the total generated
>> vector loads as for VMAT_CONTIGUOUS_PERMUTE function
>> vect_model_load_cost doesn't consider the group size which
>> is considered as vec_num during the transformation.
> The original PR is for the correctness issue, and I'm not sure how
> much of a performance impact the patch would be, but the change looks
> reasonable, so the test change looks ok to me.
> I'll track performance impact on SPEC2017 to see if there's any
> regression caused by the patch(Guess probably not).
Thanks for the feedback and further tracking! Hope this (and
this whole series) doesn't impact SPEC2017 performance on x86. :)
BR,
Kewen
>>
>> This patch makes the count of vector load in costing become
>> consistent with what we generates during the transformation.
>> To be more specific, for the given test case, for memory
>> access b[i_20], it costed for 2 vector loads before,
>> with this patch it costs 8 instead, it matches the final
>> count of generated vector loads basing from b. This costing
>> change makes cost model analysis feel it's not profitable
>> to vectorize the first loop, so this patch adjusts the test
>> case without vect cost model any more.
>>
>> But note that this test case also exposes something we can
>> improve further is that although the number of vector
>> permutation what we costed and generated are consistent,
>> but DCE can further optimize some unused permutation out,
>> it would be good if we can predict that and generate only
>> those necessary permutations.
>>
>> gcc/ChangeLog:
>>
>> * tree-vect-stmts.cc (vect_model_load_cost): Assert this function only
>> handle memory_access_type VMAT_CONTIGUOUS, remove some
>> VMAT_CONTIGUOUS_PERMUTE related handlings.
>> (vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS_PERMUTE
>> without calling vect_model_load_cost.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/i386/pr70021.c: Adjust with -fno-vect-cost-model.
>> ---
>> gcc/testsuite/gcc.target/i386/pr70021.c | 2 +-
>> gcc/tree-vect-stmts.cc | 88 ++++++++++++++-----------
>> 2 files changed, 51 insertions(+), 39 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/pr70021.c b/gcc/testsuite/gcc.target/i386/pr70021.c
>> index 6562c0f2bd0..d509583601e 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr70021.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr70021.c
>> @@ -1,7 +1,7 @@
>> /* PR target/70021 */
>> /* { dg-do run } */
>> /* { dg-require-effective-target avx2 } */
>> -/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details -mtune=skylake" } */
>> +/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details -mtune=skylake -fno-vect-cost-model" } */
>>
>> #include "avx2-check.h"
>>
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index 7f8d9db5363..e7a97dbe05d 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -1134,8 +1134,7 @@ vect_model_load_cost (vec_info *vinfo,
>> slp_tree slp_node,
>> stmt_vector_for_cost *cost_vec)
>> {
>> - gcc_assert (memory_access_type == VMAT_CONTIGUOUS
>> - || memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
>> + gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
>>
>> unsigned int inside_cost = 0, prologue_cost = 0;
>> bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
>> @@ -1174,26 +1173,6 @@ vect_model_load_cost (vec_info *vinfo,
>> once per group anyhow. */
>> bool first_stmt_p = (first_stmt_info == stmt_info);
>>
>> - /* We assume that the cost of a single load-lanes instruction is
>> - equivalent to the cost of DR_GROUP_SIZE separate loads. If a grouped
>> - access is instead being provided by a load-and-permute operation,
>> - include the cost of the permutes. */
>> - if (first_stmt_p
>> - && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
>> - {
>> - /* Uses an even and odd extract operations or shuffle operations
>> - for each needed permute. */
>> - int group_size = DR_GROUP_SIZE (first_stmt_info);
>> - int nstmts = ncopies * ceil_log2 (group_size) * group_size;
>> - inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm,
>> - stmt_info, 0, vect_body);
>> -
>> - if (dump_enabled_p ())
>> - dump_printf_loc (MSG_NOTE, vect_location,
>> - "vect_model_load_cost: strided group_size = %d .\n",
>> - group_size);
>> - }
>> -
>> vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
>> misalignment, first_stmt_p, &inside_cost, &prologue_cost,
>> cost_vec, cost_vec, true);
>> @@ -10652,11 +10631,22 @@ vectorizable_load (vec_info *vinfo,
>> alignment support schemes. */
>> if (costing_p)
>> {
>> - if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
>> + /* For VMAT_CONTIGUOUS_PERMUTE if it's grouped load, we
>> + only need to take care of the first stmt, whose
>> + stmt_info is first_stmt_info, vec_num iterating on it
>> + will cover the cost for the remaining, it's consistent
>> + with transforming. For the prologue cost for realign,
>> + we only need to count it once for the whole group. */
>> + bool first_stmt_info_p = first_stmt_info == stmt_info;
>> + bool add_realign_cost = first_stmt_info_p && i == 0;
>> + if (memory_access_type == VMAT_CONTIGUOUS_REVERSE
>> + || (memory_access_type == VMAT_CONTIGUOUS_PERMUTE
>> + && (!grouped_load || first_stmt_info_p)))
>> vect_get_load_cost (vinfo, stmt_info, 1,
>> alignment_support_scheme, misalignment,
>> - false, &inside_cost, &prologue_cost,
>> - cost_vec, cost_vec, true);
>> + add_realign_cost, &inside_cost,
>> + &prologue_cost, cost_vec, cost_vec,
>> + true);
>> }
>> else
>> {
>> @@ -10774,8 +10764,7 @@ vectorizable_load (vec_info *vinfo,
>> ??? This is a hack to prevent compile-time issues as seen
>> in PR101120 and friends. */
>> if (costing_p
>> - && memory_access_type != VMAT_CONTIGUOUS
>> - && memory_access_type != VMAT_CONTIGUOUS_PERMUTE)
>> + && memory_access_type != VMAT_CONTIGUOUS)
>> {
>> vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf,
>> true, &n_perms, nullptr);
>> @@ -10790,20 +10779,44 @@ vectorizable_load (vec_info *vinfo,
>> gcc_assert (ok);
>> }
>> }
>> - else if (!costing_p)
>> + else
>> {
>> if (grouped_load)
>> {
>> if (memory_access_type != VMAT_LOAD_STORE_LANES)
>> - vect_transform_grouped_load (vinfo, stmt_info, dr_chain,
>> - group_size, gsi);
>> - *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
>> - }
>> - else
>> - {
>> - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>> + {
>> + gcc_assert (memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
>> + /* We assume that the cost of a single load-lanes instruction
>> + is equivalent to the cost of DR_GROUP_SIZE separate loads.
>> + If a grouped access is instead being provided by a
>> + load-and-permute operation, include the cost of the
>> + permutes. */
>> + if (costing_p && first_stmt_info == stmt_info)
>> + {
>> + /* Uses an even and odd extract operations or shuffle
>> + operations for each needed permute. */
>> + int group_size = DR_GROUP_SIZE (first_stmt_info);
>> + int nstmts = ceil_log2 (group_size) * group_size;
>> + inside_cost
>> + += record_stmt_cost (cost_vec, nstmts, vec_perm,
>> + stmt_info, 0, vect_body);
>> +
>> + if (dump_enabled_p ())
>> + dump_printf_loc (
>> + MSG_NOTE, vect_location,
>> + "vect_model_load_cost: strided group_size = %d .\n",
>> + group_size);
>> + }
>> + else if (!costing_p)
>> + vect_transform_grouped_load (vinfo, stmt_info, dr_chain,
>> + group_size, gsi);
>> + }
>> + if (!costing_p)
>> + *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
>> }
>> - }
>> + else if (!costing_p)
>> + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>> + }
>> dr_chain.release ();
>> }
>> if (!slp && !costing_p)
>> @@ -10814,8 +10827,7 @@ vectorizable_load (vec_info *vinfo,
>> gcc_assert (memory_access_type != VMAT_INVARIANT
>> && memory_access_type != VMAT_ELEMENTWISE
>> && memory_access_type != VMAT_STRIDED_SLP);
>> - if (memory_access_type != VMAT_CONTIGUOUS
>> - && memory_access_type != VMAT_CONTIGUOUS_PERMUTE)
>> + if (memory_access_type != VMAT_CONTIGUOUS)
>> {
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_NOTE, vect_location,
>> --
>> 2.31.1
>>
>
>
next prev parent reply other threads:[~2023-06-19 7:24 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-13 2:03 [PATCH 0/9] vect: Move costing next to the transform for vect load Kewen Lin
2023-06-13 2:03 ` [PATCH 1/9] vect: Move vect_model_load_cost next to the transform in vectorizable_load Kewen Lin
2023-06-13 2:03 ` [PATCH 2/9] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl Kewen Lin
2023-06-30 11:11 ` Richard Biener
2023-07-03 2:57 ` [PATCH 2/9 v2] " Kewen.Lin
2023-06-13 2:03 ` [PATCH 3/9] vect: Adjust vectorizable_load costing on VMAT_INVARIANT Kewen Lin
2023-06-30 11:18 ` Richard Biener
2023-07-03 2:58 ` [PATCH 3/9 v2] " Kewen.Lin
2023-06-13 2:03 ` [PATCH 4/9] vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP Kewen Lin
2023-07-02 8:58 ` Richard Sandiford
2023-07-03 3:19 ` Kewen.Lin
2023-07-22 15:58 ` Iain Sandoe
2023-07-24 1:50 ` Kewen.Lin
2023-06-13 2:03 ` [PATCH 5/9] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER Kewen Lin
2023-07-03 3:01 ` [PATCH 5/9 v2] " Kewen.Lin
2023-06-13 2:03 ` [PATCH 6/9] vect: Adjust vectorizable_load costing on VMAT_LOAD_STORE_LANES Kewen Lin
2023-06-13 2:03 ` [PATCH 7/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE Kewen Lin
2023-06-13 2:03 ` [PATCH 8/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE Kewen Lin
2023-06-14 8:17 ` Hongtao Liu
2023-06-19 7:23 ` Kewen.Lin [this message]
2023-06-13 2:03 ` [PATCH 9/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS Kewen Lin
2023-07-03 3:06 ` [PATCH 9/9 v2] " Kewen.Lin
2023-06-26 6:00 ` [PATCH 0/9] vect: Move costing next to the transform for vect load Kewen.Lin
2023-06-30 11:37 ` Richard Biener
2023-07-02 9:13 ` Richard Sandiford
2023-07-03 3:39 ` Kewen.Lin
2023-07-03 8:42 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dbd0bff8-38b3-ee26-dbbb-02ca3e03f31b@linux.ibm.com \
--to=linkw@linux.ibm.com \
--cc=bergner@linux.ibm.com \
--cc=crazylht@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=hongtao.liu@intel.com \
--cc=richard.guenther@gmail.com \
--cc=richard.sandiford@arm.com \
--cc=segher@kernel.crashing.org \
--cc=ubizjak@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).