Subject: Re: [PATCH] rs6000: Modify the way for extra penalized cost
From: Bill Schmidt
Reply-To: wschmidt@linux.ibm.com
To: "Kewen.Lin", GCC Patches
Cc: Segher Boessenkool, David Edelsohn
Date: Fri, 17 Sep 2021 11:34:18 -0500
Message-ID: <53cee221-b757-9071-1e74-3c5722a27f30@linux.ibm.com>
Hi Kewen,

On 9/15/21 8:14 PM, Kewen.Lin wrote:
> Hi,
>
> This patch follows the discussion here[1], where Segher pointed
> out the existing way to guard the extra penalized cost for
> strided/elementwise loads with a magic bound doesn't scale.
>
> The way with nunits * stmt_cost can get one much exaggerated
> penalized cost, such as: for V16QI on P8, it's 16 * 20 = 320,
> that's why we need one bound.  To make it scale, this patch
> doesn't use nunits * stmt_cost any more, but it still keeps
> nunits since there are actually nunits scalar loads there.  So
> it uses one cost adjusted from stmt_cost, since the current
> stmt_cost sort of considers nunits, we can stabilize the cost
> for big nunits and retain the cost for small nunits.  After
> some tries, this patch gets the adjusted cost as:
>
>     stmt_cost / (log2(nunits) * log2(nunits))
>
> For V16QI, the adjusted cost would be 1 and total penalized
> cost is 16, it isn't exaggerated.  For V2DI, the adjusted
> cost would be 2 and total penalized cost is 4, which is the
> same as before.  btw, I tried to use one single log2(nunits),
> but the penalized cost is still big enough and can't fix the
> degraded bmk blender_r.
>
> The separated SPEC2017 evaluations on Power8, Power9 and Power10
> at option sets O2-vect and Ofast-unroll showed this change is
> neutral (that is same effect as before).
>
> Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
>
> Is it ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>     * config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
>     the way to compute extra penalized cost.
>
> ---
>  gcc/config/rs6000/rs6000.c | 28 +++++++++++++++++-----------
>  1 file changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 4ab23b0ab33..e08b94c0447 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5454,17 +5454,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
>          {
>            tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>            unsigned int nunits = vect_nunits_for_cost (vectype);
> -          unsigned int extra_cost = nunits * stmt_cost;
> -          /* As function rs6000_builtin_vectorization_cost shows, we have
> -             priced much on V16QI/V8HI vector construction as their units,
> -             if we penalize them with nunits * stmt_cost, it can result in
> -             an unreliable body cost, eg: for V16QI on Power8, stmt_cost
> -             is 20 and nunits is 16, the extra cost is 320 which looks
> -             much exaggerated. So let's use one maximum bound for the
> -             extra penalized cost for vector construction here. */
> -          const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
> -          if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
> -            extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
> +          /* As function rs6000_builtin_vectorization_cost shows, we
> +             have priced much on V16QI/V8HI vector construction by
> +             considering their units, if we penalize them with nunits
> +             * stmt_cost here, it can result in an unreliable body cost,

This might be confusing to the reader, since you have deleted the
calculation of nunits * stmt_cost.  Could you instead write this to
indicate that we used to adjust in this way, that it had this
particular downside, and that this is why you're choosing this
heuristic?  It's a minor thing, but I think people reading the code
will be confused otherwise.

I think the heuristic is generally reasonable, and certainly better
than what we had before!

LGTM with adjusted commentary, so I recommend maintainers approve.

Thanks for the patch!
Bill

> +             eg: for V16QI on Power8, stmt_cost is 20 and nunits is 16,
> +             the penalty will be 320 which looks much exaggerated. But
> +             there are actually nunits scalar loads, so we try to adopt
> +             one reasonable penalized cost for each load rather than
> +             stmt_cost. Here, with stmt_cost dividing by log2(nunits)^2,
> +             we can still retain the necessary penalty for small nunits
> +             meanwhile stabilize the penalty for big nunits. */
> +          int nunits_log2 = exact_log2 (nunits);
> +          gcc_assert (nunits_log2 > 0);
> +          unsigned int nunits_sq = nunits_log2 * nunits_log2;
> +          unsigned int adjusted_cost = stmt_cost / nunits_sq;
> +          gcc_assert (adjusted_cost > 0);
> +          unsigned int extra_cost = nunits * adjusted_cost;
>            data->extra_ctor_cost += extra_cost;
>          }
>      }
> --
> 2.25.1