From mboxrd@z Thu Jan  1 00:00:00 1970
Reply-To: wschmidt@linux.ibm.com
Subject: Re: [PATCH] rs6000: Modify the way for extra penalized cost
To: "Kewen.Lin" <linkw@linux.ibm.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
 David Edelsohn <dje.gcc@gmail.com>
References: <d26e0a72-a029-c765-75ab-9b31de44f114@linux.ibm.com>
From: Bill Schmidt <wschmidt@linux.ibm.com>
Message-ID: <53cee221-b757-9071-1e74-3c5722a27f30@linux.ibm.com>
Date: Fri, 17 Sep 2021 11:34:18 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0)
 Gecko/20100101 Thunderbird/78.14.0
In-Reply-To: <d26e0a72-a029-c765-75ab-9b31de44f114@linux.ibm.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0

Hi Kewen,

On 9/15/21 8:14 PM, Kewen.Lin wrote:
> Hi,
>
> This patch follows the discussion here[1], where Segher pointed
> out that the existing way of guarding the extra penalized cost for
> strided/elementwise loads with a magic bound doesn't scale.
>
> Using nunits * stmt_cost can give a greatly exaggerated
> penalized cost; for example, for V16QI on P8 it's 16 * 20 = 320,
> which is why we needed a bound.  To make it scale, this patch
> no longer uses nunits * stmt_cost, but it still keeps the
> nunits factor, since there really are nunits scalar loads.
> Instead, each load is charged a cost adjusted from stmt_cost:
> since the current stmt_cost already partly accounts for nunits,
> this lets us stabilize the cost for big nunits while retaining
> the cost for small nunits.  After some experimentation, this
> patch computes the adjusted cost as:
>
>      stmt_cost / (log2(nunits) * log2(nunits))
>
> For V16QI, the adjusted cost would be 1 and the total penalized
> cost 16, which isn't exaggerated.  For V2DI, the adjusted
> cost would be 2 and the total penalized cost 4, which is the
> same as before.  BTW, I tried dividing by a single log2(nunits),
> but the penalized cost was still too big and couldn't fix the
> degraded benchmark blender_r.
>
> Separate SPEC2017 evaluations on Power8, Power9 and Power10
> with the option sets O2-vect and Ofast-unroll showed this change
> is neutral (that is, the same effect as before).
>
> Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
>
> Is it ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
> 	* config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
> 	the way to compute extra penalized cost.
>
> ---
>   gcc/config/rs6000/rs6000.c | 28 +++++++++++++++++-----------
>   1 file changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 4ab23b0ab33..e08b94c0447 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5454,17 +5454,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
>   	{
>   	  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>   	  unsigned int nunits = vect_nunits_for_cost (vectype);
> -	  unsigned int extra_cost = nunits * stmt_cost;
> -	  /* As function rs6000_builtin_vectorization_cost shows, we have
> -	     priced much on V16QI/V8HI vector construction as their units,
> -	     if we penalize them with nunits * stmt_cost, it can result in
> -	     an unreliable body cost, eg: for V16QI on Power8, stmt_cost
> -	     is 20 and nunits is 16, the extra cost is 320 which looks
> -	     much exaggerated.  So let's use one maximum bound for the
> -	     extra penalized cost for vector construction here.  */
> -	  const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
> -	  if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
> -	    extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
> +	  /* As function rs6000_builtin_vectorization_cost shows, we
> +	     have priced much on V16QI/V8HI vector construction by
> +	     considering their units, if we penalize them with nunits
> +	     * stmt_cost here, it can result in an unreliable body cost,

This might be confusing to the reader, since you have deleted the 
calculation of nunits * stmt_cost.  Could you instead write this to 
indicate that we used to adjust in this way, and it had this particular 
downside, so that's why you're choosing this heuristic? It's a minor 
thing, but I think people reading the code will be confused otherwise.

I think the heuristic is generally reasonable, and certainly better than 
what we had before!

LGTM with adjusted commentary, so recommend maintainers approve.

Thanks for the patch!
Bill
> +	     eg: for V16QI on Power8, stmt_cost is 20 and nunits is 16,
> +	     the penalty will be 320 which looks much exaggerated.  But
> +	     there are actually nunits scalar loads, so we try to adopt
> +	     one reasonable penalized cost for each load rather than
> +	     stmt_cost.  Here, with stmt_cost dividing by log2(nunits)^2,
> +	     we can still retain the necessary penalty for small nunits
> +	     meanwhile stabilize the penalty for big nunits.  */
> +	  int nunits_log2 = exact_log2 (nunits);
> +	  gcc_assert (nunits_log2 > 0);
> +	  unsigned int nunits_sq = nunits_log2 * nunits_log2;
> +	  unsigned int adjusted_cost = stmt_cost / nunits_sq;
> +	  gcc_assert (adjusted_cost > 0);
> +	  unsigned int extra_cost = nunits * adjusted_cost;
>   	  data->extra_ctor_cost += extra_cost;
>   	}
>       }
> --
> 2.25.1