From: "Kewen.Lin" <linkw@linux.ibm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
Bill Schmidt <wschmidt@linux.ibm.com>,
David Edelsohn <dje.gcc@gmail.com>
Subject: [PATCH] rs6000: Modify the way for extra penalized cost
Date: Thu, 16 Sep 2021 09:14:15 +0800
Message-ID: <d26e0a72-a029-c765-75ab-9b31de44f114@linux.ibm.com>
Hi,
This patch follows the discussion in [1], where Segher pointed
out that the existing way of guarding the extra penalized cost
for strided/elementwise loads with a magic bound doesn't scale.
Using nunits * stmt_cost can produce a greatly exaggerated
penalized cost, e.g. for V16QI on P8 it is 16 * 20 = 320, which
is why we needed the bound. To make it scale, this patch no
longer uses nunits * stmt_cost. It still keeps the factor
nunits, since there really are nunits scalar loads, but
multiplies it by a cost adjusted from stmt_cost. Since the
current stmt_cost already partly accounts for nunits, this lets
us stabilize the penalty for big nunits while retaining it for
small nunits. After some experiments, this patch computes the
adjusted cost as:
  stmt_cost / (log2(nunits) * log2(nunits))
For V16QI, the adjusted cost would be 1 (on P8, 20 / 16 = 1
with integer division) and the total penalized cost is 16,
which isn't exaggerated. For V2DI, the adjusted cost would be 2
and the total penalized cost is 4, the same as before. BTW, I
also tried dividing by a single log2(nunits), but the penalized
cost was still too large and didn't fix the degradation of the
benchmark blender_r.
Separate SPEC2017 evaluations on Power8, Power9 and Power10
with the option sets O2-vect and Ofast-unroll showed that this
change is neutral (i.e. it has the same effect as before).
Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
Is it ok for trunk?
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
BR,
Kewen
-----
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
the way to compute extra penalized cost.
---
gcc/config/rs6000/rs6000.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4ab23b0ab33..e08b94c0447 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5454,17 +5454,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
{
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
unsigned int nunits = vect_nunits_for_cost (vectype);
- unsigned int extra_cost = nunits * stmt_cost;
- /* As function rs6000_builtin_vectorization_cost shows, we have
- priced much on V16QI/V8HI vector construction as their units,
- if we penalize them with nunits * stmt_cost, it can result in
- an unreliable body cost, eg: for V16QI on Power8, stmt_cost
- is 20 and nunits is 16, the extra cost is 320 which looks
- much exaggerated. So let's use one maximum bound for the
- extra penalized cost for vector construction here. */
- const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
- if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
- extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
+ /* As function rs6000_builtin_vectorization_cost shows, we
+ have priced much on V16QI/V8HI vector construction by
+ considering their units, if we penalize them with nunits
+ * stmt_cost here, it can result in an unreliable body cost,
+ eg: for V16QI on Power8, stmt_cost is 20 and nunits is 16,
+ the penalty will be 320 which looks much exaggerated. But
+ there are actually nunits scalar loads, so we try to adopt
+ one reasonable penalized cost for each load rather than
+ stmt_cost. Here, with stmt_cost dividing by log2(nunits)^2,
+ we can still retain the necessary penalty for small nunits
+ meanwhile stabilize the penalty for big nunits. */
+ int nunits_log2 = exact_log2 (nunits);
+ gcc_assert (nunits_log2 > 0);
+ unsigned int nunits_sq = nunits_log2 * nunits_log2;
+ unsigned int adjusted_cost = stmt_cost / nunits_sq;
+ gcc_assert (adjusted_cost > 0);
+ unsigned int extra_cost = nunits * adjusted_cost;
data->extra_ctor_cost += extra_cost;
}
}
--
2.25.1
Thread overview: 7+ messages
2021-09-16 1:14 Kewen.Lin [this message]
2021-09-17 16:34 ` Bill Schmidt
2021-09-21 2:20 ` Kewen.Lin
2021-09-17 22:01 ` Segher Boessenkool
2021-09-21 3:24 ` Kewen.Lin
2021-09-22 22:36 ` Segher Boessenkool
2021-09-28 8:39 ` Kewen.Lin