public inbox for gcc-patches@gcc.gnu.org
* [PATCH] rs6000: Modify the way for extra penalized cost
@ 2021-09-16  1:14 Kewen.Lin
  2021-09-17 16:34 ` Bill Schmidt
  2021-09-17 22:01 ` Segher Boessenkool
  0 siblings, 2 replies; 7+ messages in thread
From: Kewen.Lin @ 2021-09-16  1:14 UTC (permalink / raw)
  To: GCC Patches; +Cc: Segher Boessenkool, Bill Schmidt, David Edelsohn

Hi,

This patch follows the discussion here[1], where Segher pointed
out that the existing way of guarding the extra penalized cost for
strided/elementwise loads with a magic bound doesn't scale.

Using nunits * stmt_cost can produce a much exaggerated
penalized cost; for example, for V16QI on P8 it's 16 * 20 = 320,
which is why we needed a bound.  To make it scale, this patch
no longer uses nunits * stmt_cost, but it still keeps the
nunits factor since there are actually nunits scalar loads.
Instead it multiplies nunits by a cost adjusted from stmt_cost:
since the current stmt_cost already takes nunits into account
to some degree, we can stabilize the cost for big nunits and
retain the cost for small nunits.  After some experiments, this
patch computes the adjusted cost as:

    stmt_cost / (log2(nunits) * log2(nunits))

For V16QI, the adjusted cost is 1 (20 / 16 in integer division)
and the total penalized cost is 16, which isn't exaggerated.
For V2DI, the adjusted cost is 2 and the total penalized cost
is 4, which is the same as before.  btw, I tried using a single
log2(nunits), but the penalized cost was still too big and
couldn't fix the degraded benchmark blender_r.
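
To make the arithmetic above easy to check, here is a small standalone
sketch in C.  It is not the rs6000.c code: log2_pow2 is a hypothetical
stand-in for GCC's exact_log2, the function names are made up for
illustration, and the only input values used are the ones quoted in
this mail (stmt_cost 20 for V16QI on P8, and the adjusted cost 2 for
V2DI, which implies stmt_cost 2 since log2(2) is 1).

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's exact_log2: returns log2 (x) when x
   is a power of two, and -1 otherwise.  */
int
log2_pow2 (unsigned int x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while (x > 1)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Previous scheme: nunits * stmt_cost, clamped to the magic bound 12.  */
unsigned int
old_penalty (unsigned int nunits, unsigned int stmt_cost)
{
  unsigned int extra_cost = nunits * stmt_cost;
  return extra_cost > 12 ? 12 : extra_cost;
}

/* Rejected variant: divide stmt_cost by a single log2 (nunits).  */
unsigned int
single_log2_penalty (unsigned int nunits, unsigned int stmt_cost)
{
  int l = log2_pow2 (nunits);
  assert (l > 0);
  return nunits * (stmt_cost / (unsigned int) l);
}

/* Scheme in this patch: divide stmt_cost by log2 (nunits) squared
   using integer division, then multiply by nunits.  */
unsigned int
new_penalty (unsigned int nunits, unsigned int stmt_cost)
{
  int l = log2_pow2 (nunits);
  assert (l > 0);
  unsigned int adjusted_cost = stmt_cost / (unsigned int) (l * l);
  assert (adjusted_cost > 0);
  return nunits * adjusted_cost;
}
```

With these inputs, new_penalty (16, 20) gives 16 and new_penalty (2, 2)
gives 4, matching the numbers above, while single_log2_penalty (16, 20)
gives 80 (20 / 4 = 5, times 16).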

Separate SPEC2017 evaluations on Power8, Power9 and Power10
with the option sets O2-vect and Ofast-unroll showed this change
is neutral (that is, it has the same effect as before).

Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html

BR,
Kewen
-----
gcc/ChangeLog:

	* config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
	the way to compute extra penalized cost.

---
 gcc/config/rs6000/rs6000.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4ab23b0ab33..e08b94c0447 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5454,17 +5454,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
 	{
 	  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
 	  unsigned int nunits = vect_nunits_for_cost (vectype);
-	  unsigned int extra_cost = nunits * stmt_cost;
-	  /* As function rs6000_builtin_vectorization_cost shows, we have
-	     priced much on V16QI/V8HI vector construction as their units,
-	     if we penalize them with nunits * stmt_cost, it can result in
-	     an unreliable body cost, eg: for V16QI on Power8, stmt_cost
-	     is 20 and nunits is 16, the extra cost is 320 which looks
-	     much exaggerated.  So let's use one maximum bound for the
-	     extra penalized cost for vector construction here.  */
-	  const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
-	  if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
-	    extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
+	  /* As function rs6000_builtin_vectorization_cost shows, we
+	     have priced much on V16QI/V8HI vector construction by
+	     considering their units, if we penalize them with nunits
+	     * stmt_cost here, it can result in an unreliable body cost,
+	     eg: for V16QI on Power8, stmt_cost is 20 and nunits is 16,
+	     the penalty will be 320 which looks much exaggerated.  But
+	     there are actually nunits scalar loads, so we try to adopt
+	     one reasonable penalized cost for each load rather than
+	     stmt_cost.  Here, with stmt_cost dividing by log2(nunits)^2,
+	     we can still retain the necessary penalty for small nunits
+	     meanwhile stabilize the penalty for big nunits.  */
+	  int nunits_log2 = exact_log2 (nunits);
+	  gcc_assert (nunits_log2 > 0);
+	  unsigned int nunits_sq = nunits_log2 * nunits_log2;
+	  unsigned int adjusted_cost = stmt_cost / nunits_sq;
+	  gcc_assert (adjusted_cost > 0);
+	  unsigned int extra_cost = nunits * adjusted_cost;
 	  data->extra_ctor_cost += extra_cost;
 	}
     }
--
2.25.1


end of thread, other threads:[~2021-09-28  8:40 UTC | newest]

Thread overview: 7+ messages
2021-09-16  1:14 [PATCH] rs6000: Modify the way for extra penalized cost Kewen.Lin
2021-09-17 16:34 ` Bill Schmidt
2021-09-21  2:20   ` Kewen.Lin
2021-09-17 22:01 ` Segher Boessenkool
2021-09-21  3:24   ` Kewen.Lin
2021-09-22 22:36     ` Segher Boessenkool
2021-09-28  8:39       ` Kewen.Lin
