Re: [PATCH] rs6000: Modify the way for extra penalized cost

From: "Kewen.Lin" <linkw@linux.ibm.com>
To: wschmidt@linux.ibm.com
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
	David Edelsohn <dje.gcc@gmail.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] rs6000: Modify the way for extra penalized cost
Date: Tue, 21 Sep 2021 10:20:30 +0800
Message-ID: <6b75c3c8-84e6-7819-a890-ffd4d32d550d@linux.ibm.com>
In-Reply-To: <53cee221-b757-9071-1e74-3c5722a27f30@linux.ibm.com>

Hi Bill,

Thanks for the review!

on 2021/9/18 12:34 AM, Bill Schmidt wrote:
> Hi Kewen,
> 
> On 9/15/21 8:14 PM, Kewen.Lin wrote:
>> Hi,
>>
>> This patch follows the discussion here[1], where Segher pointed
>> out the existing way to guard the extra penalized cost for
>> strided/elementwise loads with a magic bound doesn't scale.
>>
>> Computing the penalty as nunits * stmt_cost can greatly
>> exaggerate it.  For example, for V16QI on P8 it is 16 * 20 = 320,
>> which is why a bound was needed.  To make it scale, this patch
>> no longer uses nunits * stmt_cost, but it still keeps the nunits
>> factor since there are actually nunits scalar loads.  It
>> multiplies nunits by a cost adjusted from stmt_cost: since the
>> current stmt_cost already takes nunits into account to some
>> extent, we can stabilize the cost for big nunits and retain the
>> cost for small nunits.  After some experiments, this patch
>> computes the adjusted cost as:
>>
>>      stmt_cost / (log2(nunits) * log2(nunits))
>>
>> For V16QI, the adjusted cost would be 1 and the total penalized
>> cost is 16, which isn't exaggerated.  For V2DI, the adjusted
>> cost would be 2 and the total penalized cost is 4, which is the
>> same as before.  By the way, I tried using a single log2(nunits),
>> but the penalized cost was still too big and couldn't fix the
>> degraded benchmark blender_r.
>>
>> Separate SPEC2017 evaluations on Power8, Power9 and Power10
>> at option sets O2-vect and Ofast-unroll showed this change is
>> neutral (that is, the same effect as before).
>>
>> Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
>>
>> Is it ok for trunk?
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
>>
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>>     * config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
>>     the way to compute extra penalized cost.
>>
>> ---
>>   gcc/config/rs6000/rs6000.c | 28 +++++++++++++++++-----------
>>   1 file changed, 17 insertions(+), 11 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 4ab23b0ab33..e08b94c0447 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -5454,17 +5454,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
>>       {
>>         tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>>         unsigned int nunits = vect_nunits_for_cost (vectype);
>> -      unsigned int extra_cost = nunits * stmt_cost;
>> -      /* As function rs6000_builtin_vectorization_cost shows, we have
>> -         priced much on V16QI/V8HI vector construction as their units,
>> -         if we penalize them with nunits * stmt_cost, it can result in
>> -         an unreliable body cost, eg: for V16QI on Power8, stmt_cost
>> -         is 20 and nunits is 16, the extra cost is 320 which looks
>> -         much exaggerated.  So let's use one maximum bound for the
>> -         extra penalized cost for vector construction here.  */
>> -      const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
>> -      if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
>> -        extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
>> +      /* As function rs6000_builtin_vectorization_cost shows, we
>> +         have priced much on V16QI/V8HI vector construction by
>> +         considering their units, if we penalize them with nunits
>> +         * stmt_cost here, it can result in an unreliable body cost,
> 
> This might be confusing to the reader, since you have deleted the calculation of nunits * stmt_cost.  Could you instead write this to indicate that we used to adjust in this way, and it had this particular downside, so that's why you're choosing this heuristic? It's a minor thing but I think people reading the code will be confused otherwise.
> 

Good point!  I'll update the commentary to explain it, thanks!!

BR,
Kewen 

> I think the heuristic is generally reasonable, and certainly better than what we had before!
> 
> LGTM with adjusted commentary, so recommend maintainers approve.
> 
> Thanks for the patch!
> Bill
>> +         eg: for V16QI on Power8, stmt_cost is 20 and nunits is 16,
>> +         the penalty will be 320 which looks much exaggerated.  But
>> +         there are actually nunits scalar loads, so we try to adopt
>> +         one reasonable penalized cost for each load rather than
>> +         stmt_cost.  Here, with stmt_cost dividing by log2(nunits)^2,
>> +         we can still retain the necessary penalty for small nunits
>> +         meanwhile stabilize the penalty for big nunits.  */
>> +      int nunits_log2 = exact_log2 (nunits);
>> +      gcc_assert (nunits_log2 > 0);
>> +      unsigned int nunits_sq = nunits_log2 * nunits_log2;
>> +      unsigned int adjusted_cost = stmt_cost / nunits_sq;
>> +      gcc_assert (adjusted_cost > 0);
>> +      unsigned int extra_cost = nunits * adjusted_cost;
>>         data->extra_ctor_cost += extra_cost;
>>       }
>>       }
>> -- 
>> 2.25.1
>
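
As a rough illustration of the arithmetic discussed in the quoted patch, the standalone C sketch below compares the old bounded penalty with the new log2-based one.  It is not the GCC code: exact_log2_sketch, old_extra_cost and new_extra_cost are hypothetical names made up for this example, and the V2DI stmt_cost of 2 is inferred from the "adjusted cost would be 2" remark rather than taken from rs6000_builtin_vectorization_cost.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's exact_log2: returns log2 (x) when x
   is a power of two, and -1 otherwise.  */
int
exact_log2_sketch (unsigned int x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while (x > 1)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Old scheme from the removed lines: nunits * stmt_cost, clamped to
   the magic bound of 12.  */
unsigned int
old_extra_cost (unsigned int nunits, unsigned int stmt_cost)
{
  const unsigned int max_penalized_cost_for_ctor = 12;
  unsigned int extra_cost = nunits * stmt_cost;
  if (extra_cost > max_penalized_cost_for_ctor)
    extra_cost = max_penalized_cost_for_ctor;
  return extra_cost;
}

/* New scheme from the added lines: divide stmt_cost by log2(nunits)^2
   first, then charge that adjusted cost once per scalar load.  */
unsigned int
new_extra_cost (unsigned int nunits, unsigned int stmt_cost)
{
  int nunits_log2 = exact_log2_sketch (nunits);
  assert (nunits_log2 > 0);	/* nunits must be a power of two >= 2.  */
  unsigned int nunits_sq = nunits_log2 * nunits_log2;
  unsigned int adjusted_cost = stmt_cost / nunits_sq;
  assert (adjusted_cost > 0);	/* Integer division must not reach 0.  */
  return nunits * adjusted_cost;
}
```

With the figures from the message, new_extra_cost (16, 20) gives 16 for V16QI on Power8 (the old scheme clamps 320 down to 12), and new_extra_cost (2, 2) gives 4 for V2DI, matching the old value.  Note that both asserts mirror the gcc_assert calls in the patch, so an input such as nunits = 1 or a stmt_cost smaller than log2(nunits)^2 aborts rather than returning a penalty.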
