Subject: PING^1 [PATCH v2] rs6000: Modify the way for extra penalized cost
To: GCC Patches
Cc: Bill Schmidt, David Edelsohn, Segher Boessenkool
From: "Kewen.Lin"
Date: Wed, 13 Oct 2021 10:30:43 +0800

Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580358.html

BR,
Kewen

on 2021/9/28 4:16 PM, Kewen.Lin via Gcc-patches wrote:
> Hi,
>
> This patch follows the discussions here[1][2], where Segher
> pointed out that the existing way of guarding the extra penalized
> cost for strided/elementwise loads with a magic bound does
> not scale.
>
> Computing the penalty as nunits * stmt_cost can give a much
> exaggerated penalized cost; for example, for V16QI on P8 it is
> 16 * 20 = 320, which is why a bound was needed.  To make it
> better and more readable, the penalized cost is simplified
> to:
>
>     unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
>     unsigned extra_cost = nunits * adjusted_cost;
>
> For V2DI/V2DF, it uses a penalized cost of 2 for each scalar
> load, while for the other modes it uses 1.  This is mainly
> concluded from the performance evaluations.  One possibly
> related point is that the more units a vector is constructed
> from, the more instructions are used, and the more chances
> there are to schedule them well (even to run them in parallel
> when enough units are available at that time), so it seems
> reasonable not to penalize them more.
>
> The SPEC2017 evaluations on Power8/Power9/Power10 at option
> sets O2-vect and Ofast-unroll show this change is neutral.
>
> Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
>
> Is it ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580099.html
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579529.html
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
> 	* config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
> 	the way to compute extra penalized cost.  Remove useless parameter.
> 	(rs6000_add_stmt_cost): Adjust the call to function
> 	rs6000_update_target_cost_per_stmt.
>
>
> ---
>  gcc/config/rs6000/rs6000.c | 31 ++++++++++++++++++-------------
>  1 file changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index dd42b0964f1..8200e1152c2 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5422,7 +5422,6 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
>                                      enum vect_cost_for_stmt kind,
>                                      struct _stmt_vec_info *stmt_info,
>                                      enum vect_cost_model_location where,
> -                                    int stmt_cost,
>                                      unsigned int orig_count)
>  {
>
> @@ -5462,17 +5461,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
>          {
>            tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>            unsigned int nunits = vect_nunits_for_cost (vectype);
> -          unsigned int extra_cost = nunits * stmt_cost;
> -          /* As function rs6000_builtin_vectorization_cost shows, we have
> -             priced much on V16QI/V8HI vector construction as their units,
> -             if we penalize them with nunits * stmt_cost, it can result in
> -             an unreliable body cost, eg: for V16QI on Power8, stmt_cost
> -             is 20 and nunits is 16, the extra cost is 320 which looks
> -             much exaggerated.  So let's use one maximum bound for the
> -             extra penalized cost for vector construction here.  */
> -          const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
> -          if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
> -            extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
> +          /* Don't expect strided/elementwise loads for just 1 nunit.  */
> +          gcc_assert (nunits > 1);
> +          /* i386 port adopts nunits * stmt_cost as the penalized cost
> +             for this kind of penalization, we used to follow it but
> +             found it could result in an unreliable body cost especially
> +             for V16QI/V8HI modes.  To make it better, we choose this
> +             new heuristic: for each scalar load, we use 2 as penalized
> +             cost for the case with 2 nunits and use 1 for the other
> +             cases.  It's without much supporting theory, mainly
> +             concluded from the broad performance evaluations on Power8,
> +             Power9 and Power10.  One possibly related point is that:
> +             vector construction for more units would use more insns,
> +             it has more chances to schedule them better (even run in
> +             parallelly when enough available units at that time), so
> +             it seems reasonable not to penalize that much for them.  */
> +          unsigned int adjusted_cost = (nunits == 2) ? 2 : 1;
> +          unsigned int extra_cost = nunits * adjusted_cost;
>            data->extra_ctor_cost += extra_cost;
>          }
>      }
> @@ -5510,7 +5515,7 @@ rs6000_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>        cost_data->cost[where] += retval;
>
>        rs6000_update_target_cost_per_stmt (cost_data, kind, stmt_info, where,
> -                                          stmt_cost, orig_count);
> +                                          orig_count);
>      }
>
>    return retval;
> --
> 2.27.0
>