PING^3 [PATCH v2] rs6000: Modify the way for extra penalized cost

public inbox for gcc-patches@gcc.gnu.org

From: "Kewen.Lin" <linkw@linux.ibm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Bill Schmidt <wschmidt@linux.ibm.com>,
	David Edelsohn <dje.gcc@gmail.com>,
	Segher Boessenkool <segher@kernel.crashing.org>
Subject: PING^3 [PATCH v2] rs6000: Modify the way for extra penalized cost
Date: Thu, 4 Nov 2021 18:56:36 +0800	[thread overview]
Message-ID: <ba3f7ee4-d531-c4ca-8b5f-7e9d4153e980@linux.ibm.com> (raw)
In-Reply-To: <d1108bc1-5829-0f47-85d2-c0f78b029caa@linux.ibm.com>

Hi,

Gentle ping for this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580358.html

BR,
Kewen

>> on 2021/9/28 4:16 PM, Kewen.Lin via Gcc-patches wrote:
>>> Hi,
>>>
>>> This patch follows the discussions here[1][2], where Segher
>>> pointed out that the existing way of guarding the extra
>>> penalized cost for strided/elementwise loads with a magic
>>> bound does not scale.
>>>
>>> Computing the penalty as nunits * stmt_cost can give a much
>>> exaggerated penalized cost: for V16QI on Power8, it is
>>> 16 * 20 = 320, which is why a bound was needed.  To make it
>>> more reasonable and readable, the penalized cost is
>>> simplified to:
>>>
>>>     unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
>>>     unsigned extra_cost = nunits * adjusted_cost;
>>>
>>> For V2DI/V2DF, each scalar load is penalized with a cost of 2,
>>> while for the other modes it is penalized with 1.  This is
>>> mainly derived from performance evaluations.  One possibly
>>> related point: the more units a constructed vector has, the
>>> more instructions are used, which gives more chances to
>>> schedule them well (even to run them in parallel when enough
>>> execution units are available at that time), so it seems
>>> reasonable not to penalize them more.
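A minimal standalone sketch of the heuristic described above. It assumes 128-bit vectors, so V2DI/V2DF have 2 units, V4SI/V4SF have 4, V8HI has 8 and V16QI has 16. The function name `penalized_ctor_cost` is hypothetical and does not exist in rs6000.c. The assert mirrors the `gcc_assert (nunits > 1)` that the patch adds.

```c
#include <assert.h>

/* Hypothetical helper, not part of rs6000.c: extra cost charged for
   constructing a vector from nunits scalar (strided/elementwise) loads.  */
unsigned int
penalized_ctor_cost (unsigned int nunits)
{
  /* Strided/elementwise loads are not expected for 1-unit vectors.  */
  assert (nunits > 1);
  /* 2 per scalar load for 2-unit vectors (V2DI/V2DF), 1 otherwise.  */
  unsigned int adjusted_cost = (nunits == 2) ? 2 : 1;
  return nunits * adjusted_cost;
}
```

With those unit counts it yields totals of 4, 4, 8 and 16. A V2DI/V2DF construction therefore ends up penalized as much as a V4SI/V4SF one, even though it needs only half the scalar loads.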
>>>
>>> The SPEC2017 evaluations on Power8/Power9/Power10 with the
>>> option sets O2-vect and Ofast-unroll show that this change is
>>> neutral.
>>>
>>> Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
>>>
>>> Is it ok for trunk?
>>>
>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
>>> [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580099.html
>>> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579529.html
>>>
>>> BR,
>>> Kewen
>>> -----
>>> gcc/ChangeLog:
>>>
>>> 	* config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
>>> 	the way to compute extra penalized cost.  Remove useless parameter.
>>> 	(rs6000_add_stmt_cost): Adjust the call to function
>>> 	rs6000_update_target_cost_per_stmt.
>>>
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000.c | 31 ++++++++++++++++++-------------
>>>  1 file changed, 18 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>> index dd42b0964f1..8200e1152c2 100644
>>> --- a/gcc/config/rs6000/rs6000.c
>>> +++ b/gcc/config/rs6000/rs6000.c
>>> @@ -5422,7 +5422,6 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
>>>  				    enum vect_cost_for_stmt kind,
>>>  				    struct _stmt_vec_info *stmt_info,
>>>  				    enum vect_cost_model_location where,
>>> -				    int stmt_cost,
>>>  				    unsigned int orig_count)
>>>  {
>>>
>>> @@ -5462,17 +5461,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
>>>  	{
>>>  	  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>>>  	  unsigned int nunits = vect_nunits_for_cost (vectype);
>>> -	  unsigned int extra_cost = nunits * stmt_cost;
>>> -	  /* As function rs6000_builtin_vectorization_cost shows, we have
>>> -	     priced much on V16QI/V8HI vector construction as their units,
>>> -	     if we penalize them with nunits * stmt_cost, it can result in
>>> -	     an unreliable body cost, eg: for V16QI on Power8, stmt_cost
>>> -	     is 20 and nunits is 16, the extra cost is 320 which looks
>>> -	     much exaggerated.  So let's use one maximum bound for the
>>> -	     extra penalized cost for vector construction here.  */
>>> -	  const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
>>> -	  if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
>>> -	    extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
>>> +	  /* Don't expect strided/elementwise loads for just 1 nunit.  */
>>> +	  gcc_assert (nunits > 1);
>>> +	  /* i386 port adopts nunits * stmt_cost as the penalized cost
>>> +	     for this kind of penalization, we used to follow it but
>>> +	     found it could result in an unreliable body cost especially
>>> +	     for V16QI/V8HI modes.  To make it better, we choose this
>>> +	     new heuristic: for each scalar load, we use 2 as penalized
>>> +	     cost for the case with 2 nunits and use 1 for the other
>>> +	     cases.  It's without much supporting theory, mainly
>>> +	     concluded from the broad performance evaluations on Power8,
>>> +	     Power9 and Power10.  One possibly related point is that:
>>> +	     vector construction for more units would use more insns,
>>> +	     it has more chances to schedule them better (even run in
>>> +	     parallelly when enough available units at that time), so
>>> +	     it seems reasonable not to penalize that much for them.  */
>>> +	  unsigned int adjusted_cost = (nunits == 2) ? 2 : 1;
>>> +	  unsigned int extra_cost = nunits * adjusted_cost;
>>>  	  data->extra_ctor_cost += extra_cost;
>>>  	}
>>>      }
>>> @@ -5510,7 +5515,7 @@ rs6000_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>>>        cost_data->cost[where] += retval;
>>>
>>>        rs6000_update_target_cost_per_stmt (cost_data, kind, stmt_info, where,
>>> -					  stmt_cost, orig_count);
>>> +					  orig_count);
>>>      }
>>>
>>>    return retval;
>>> --
>>> 2.27.0
>>>
>
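For comparison, here is a hypothetical standalone sketch that puts the bounded scheme removed by the diff next to the new one. The function names are illustrative and not taken from rs6000.c. The new cost no longer depends on stmt_cost, which is what allows the patch to drop that parameter from rs6000_update_target_cost_per_stmt.

```c
#include <assert.h>

/* Bound used by the code that the patch removes.  */
static const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;

/* Old scheme (illustrative copy): nunits * stmt_cost, clamped to
   MAX_PENALIZED_COST_FOR_CTOR.  */
unsigned int
old_extra_ctor_cost (unsigned int nunits, unsigned int stmt_cost)
{
  unsigned int extra_cost = nunits * stmt_cost;
  if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
    extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
  return extra_cost;
}

/* New scheme (illustrative copy): 2 per scalar load for 2-unit vectors,
   1 otherwise; stmt_cost is no longer an input.  */
unsigned int
new_extra_ctor_cost (unsigned int nunits)
{
  assert (nunits > 1);
  unsigned int adjusted_cost = (nunits == 2) ? 2 : 1;
  return nunits * adjusted_cost;
}
```

Plug in the V16QI-on-Power8 numbers quoted in the patch (nunits 16, stmt_cost 20). `old_extra_ctor_cost (16, 20)` computes 320 and clamps it to 12, while `new_extra_ctor_cost (16)` returns 16.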

Thread overview: 8+ messages
2021-09-28  8:16 Kewen.Lin
2021-10-13  2:30 ` PING^1 " Kewen.Lin
2021-10-20  9:29   ` PING^2 " Kewen.Lin
2021-11-04 10:56     ` Kewen.Lin [this message]
2021-11-22  2:23       ` PING^4 " Kewen.Lin
2021-11-29 22:06 ` Segher Boessenkool
2021-11-30  5:05   ` Kewen.Lin
2021-11-30 23:05     ` Segher Boessenkool