Re: [PATCH v4] rs6000: Add load density heuristic

From: Bill Schmidt <wschmidt@linux.ibm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>,
	"Kewen.Lin" <linkw@linux.ibm.com>
Cc: David Edelsohn <dje.gcc@gmail.com>,
	will schmidt <will_schmidt@vnet.ibm.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH v4] rs6000: Add load density heuristic
Date: Thu, 9 Sep 2021 12:19:28 -0500
Message-ID: <894f01c3-6481-0757-751f-b4239a4f0232@linux.ibm.com>
In-Reply-To: <20210909161152.GR1583@gate.crashing.org>

On 9/9/21 11:11 AM, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Sep 08, 2021 at 02:57:14PM +0800, Kewen.Lin wrote:
>>>> +      /* If we have strided or elementwise loads into a vector, it's
>>>> +	 possible to be bounded by latency and execution resources for
>>>> +	 many scalar loads.  Try to account for this by scaling the
>>>> +	 construction cost by the number of elements involved, when
>>>> +	 handling each matching statement we record the possible extra
>>>> +	 penalized cost into target cost, in the end of costing for
>>>> +	 the whole loop, we do the actual penalization once some load
>>>> +	 density heuristics are satisfied.  */
>>> The above comment is quite hard to read.  Can you please break up the last
>>> sentence into at least two sentences?
>> How about the below:
>>
>> +      /* If we have strided or elementwise loads into a vector, it's
> "strided" is not a word: it properly is "stridden", which does not read
> very well either.  "Have loads by stride, or by element, ..."?  Is that
> good English, and easier to understand?

No, this is OK.  "Strided loads" is a term of art used by the 
vectorizer; whether or not it was the Queen's English, it's what we 
have...  (And I think you might only find "bestridden" in some 18th or 
19th century English poetry... :-)
>
>> +        possible to be bounded by latency and execution resources for
>> +        many scalar loads.  Try to account for this by scaling the
>> +        construction cost by the number of elements involved.  For
>> +        each matching statement, we record the possible extra
>> +        penalized cost into the relevant field in target cost.  When
>> +        we want to finalize the whole loop costing, we will check if
>> +        those related load density heuristics are satisfied, and add
>> +        this accumulated penalized cost if yes.  */
>>
>>> Otherwise this looks good to me, and I recommend maintainers approve with
>>> that clarified.
> Does that text look good to you now Bill?  It is still kinda complex,
> maybe you see a way to make it simpler.

I think it's OK now.  The complexity at least matches the code now 
instead of exceeding it. :-P  j/k...

>
>> 	* config/rs6000/rs6000.c (struct rs6000_cost_data): New members
>> 	nstmts, nloads and extra_ctor_cost.
>> 	(rs6000_density_test): Add load density related heuristics and the
>> 	checks, do extra costing on vector construction statements if need.
> "and the checks"?  Oh, "and checks"?  It is probably fine to just leave
> out this whole phrase part :-)
>
> Don't use commas like this in changelogs.  s/, do/.  Do/  Yes this is a
> bit boring text that way, but that is the purpose: it makes it simpler
> to read (and read quickly, even merely scan).
>
>> @@ -5262,6 +5262,12 @@ typedef struct _rs6000_cost_data
> [ Btw, you can get rid of the typedef now, just have a struct with the
> non-underscore name, we have C++ now.  Such a mechanical change (as
> separate patch!) is pre-approved. ]
>
>> +  /* Check if we need to penalize the body cost for latency and
>> +     execution resources bound from strided or elementwise loads
>> +     into a vector.  */
> Bill, is that clear enough?  I'm sure something nicer would help here,
> but it's hard for me to write anything :-)

Perhaps:  "Check whether we need to penalize the body cost to account 
for excess strided or elementwise loads."
>
>> +  if (data->extra_ctor_cost > 0)
>> +    {
>> +      /* Threshold for load stmts percentage in all vectorized stmts.  */
>> +      const int DENSITY_LOAD_PCT_THRESHOLD = 45;
> Threshold for what?
>
> 45% is awfully exact.  Can you make this a param?
>
>> +      /* Threshold for total number of load stmts.  */
>> +      const int DENSITY_LOAD_NUM_THRESHOLD = 20;
> Same.


We have similar magic constants in here already.  Parameterizing is 
possible, but I'm more interested in making sure the numbers are 
appropriate for each processor.  Given that Kewen reports they work well 
for both P9 and P10, I'm pretty happy with what we have here.  (Kewen, 
thanks for running the P10 experiments!)

Perhaps a follow-up patch to add params for the magic constants would be 
reasonable, but I'd personally consider it pretty low priority.
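
For anyone following along, the check being discussed can be sketched roughly as below.  This is not the patch itself: plain integers stand in for the nstmts, nloads and extra_ctor_cost members, the function names are made up for illustration, and it assumes (based on the dump message wording) that both thresholds must be exceeded before the accumulated cost is added to the body cost.

```cpp
// Sketch only: plain integers stand in for the nstmts, nloads and
// extra_ctor_cost members the patch adds to rs6000_cost_data.

// Threshold values quoted from the patch under review (unsigned here
// to keep the comparisons free of sign-compare warnings).
constexpr unsigned DENSITY_LOAD_PCT_THRESHOLD = 45;
constexpr unsigned DENSITY_LOAD_NUM_THRESHOLD = 20;

// Percentage of vectorized statements that are loads (integer division).
// The zero guard is an addition for this sketch, not taken from the patch.
constexpr unsigned
load_pct (unsigned nloads, unsigned nstmts)
{
  return nstmts == 0 ? 0 : nloads * 100 / nstmts;
}

// Assumption: the penalty applies only when BOTH thresholds are exceeded
// and some extra construction cost was recorded.
constexpr unsigned
penalized_body_cost (unsigned body_cost, unsigned nstmts, unsigned nloads,
                     unsigned extra_ctor_cost)
{
  return (extra_ctor_cost > 0
          && nloads > DENSITY_LOAD_NUM_THRESHOLD
          && load_pct (nloads, nstmts) > DENSITY_LOAD_PCT_THRESHOLD)
         ? body_cost + extra_ctor_cost
         : body_cost;
}
```

With made-up inputs: 25 loads among 50 statements gives a load percentage of 50, so both thresholds are exceeded and the recorded extra cost is added.  With 30 loads among 100 statements the percentage is only 30, and the body cost is left alone.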

>
>> +      unsigned int load_pct = (data->nloads * 100) / (data->nstmts);
> No parens around the last thing please.  The other pair of parens is
> unneeded as well, but perhaps it is easier to read like that.
>
>> +	  if (dump_enabled_p ())
>> +	    dump_printf_loc (MSG_NOTE, vect_location,
>> +			     "Found %u loads and load pct. %u%% exceed "
>> +			     "the threshold, penalizing loop body "
>> +			     "cost by extra cost %u for ctor.\n",
>> +			     data->nloads, load_pct, data->extra_ctor_cost);
> That line does not fit.  Make it more lines?
>
> It is a pity that using these interfaces at all takes up 45 chars
> of noise already.
>
>> +/* Helper function for add_stmt_cost.  Check each statement cost
>> +   entry, gather information and update the target_cost fields
>> +   accordingly.  */
>> +static void
>> +rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
>> +				    enum vect_cost_for_stmt kind,
>> +				    struct _stmt_vec_info *stmt_info,
>> +				    enum vect_cost_model_location where,
>> +				    int stmt_cost, unsigned int orig_count)
> Please put those last two on separate lines as well?
>
>> +	  /* As function rs6000_builtin_vectorization_cost shows, we have
>> +	     priced much on V16QI/V8HI vector construction as their units,
>> +	     if we penalize them with nunits * stmt_cost, it can result in
>> +	     an unreliable body cost, eg: for V16QI on Power8, stmt_cost
>> +	     is 20 and nunits is 16, the extra cost is 320 which looks
>> +	     much exaggerated.  So let's use one maximum bound for the
>> +	     extra penalized cost for vector construction here.  */
>> +	  const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
>> +	  if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
>> +	    extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
> That is a pretty gross hack.  Can you think of any saner way to not have
> those out of scale costs in the first place?

In Kewen's defense, the whole business of "finish_cost" for these 
vectorized loops is to tweak things that don't work quite right with the 
hooks currently provided to the vectorizer to add costs on a per-stmt 
basis without looking at the overall set of statements.  It gives the 
back end a chance to massage things and exercise veto power over 
otherwise bad decisions.  By nature, that's going to be very much a 
heuristic exercise.  Personally I think the heuristics used here are 
pretty reasonable, and importantly they are designed to only be employed 
in pretty rare circumstances.  It doesn't look easy to me to avoid the 
need for a cap here without making the rest of the heuristics harder to 
understand.  But sure, he can try! :)
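
To make the arithmetic behind the cap concrete, here is a minimal sketch.  The helper name is hypothetical and the real code works on the cost data structure rather than on bare integers; only the nunits * stmt_cost product and the bound of 12 come from the quoted patch.

```cpp
// Upper bound on the per-statement extra cost, as quoted from the patch.
constexpr unsigned MAX_PENALIZED_COST_FOR_CTOR = 12;

// Hypothetical helper: the uncapped penalty is nunits * stmt_cost, which
// is then clamped to the bound above.
constexpr unsigned
capped_extra_ctor_cost (unsigned nunits, unsigned stmt_cost)
{
  return nunits * stmt_cost > MAX_PENALIZED_COST_FOR_CTOR
         ? MAX_PENALIZED_COST_FOR_CTOR
         : nunits * stmt_cost;
}
```

For the V16QI on Power8 case from the comment (nunits 16, stmt_cost 20), the uncapped value would be 320 and the helper returns 12 instead.  A small made-up case such as nunits 2 and stmt_cost 3 stays at 6, below the bound.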

Kewen, thanks for the updates!

Bill

>
> Okay for trunk with such tweaks.  Thanks!  (And please consult with Bill
> for the wordsmithing :-) )
>
>
> Segher
