Subject: Re: [PATCH v4] rs6000: Add load density heuristic
From: Bill Schmidt
Reply-To: wschmidt@linux.ibm.com
To: Segher Boessenkool, "Kewen.Lin"
Cc: David Edelsohn, will schmidt, GCC Patches
Date: Thu, 9 Sep 2021 12:39:11 -0500
Message-ID: <7bf98af4-db16-a07f-61aa-2c9498e71935@linux.ibm.com>
In-Reply-To: <894f01c3-6481-0757-751f-b4239a4f0232@linux.ibm.com>
References: <7b9f9bdf-1ed5-139b-de9c-511ee8454b85@linux.ibm.com>
 <3424a3d3-fa4e-16f9-89c6-0b07beec957d@linux.ibm.com>
 <77fe5ac1-200f-db69-a92a-5d349642f394@linux.ibm.com>
 <4f7c5da8-75d3-2d98-b728-e1a319392097@linux.ibm.com>
 <20210909161152.GR1583@gate.crashing.org>
 <894f01c3-6481-0757-751f-b4239a4f0232@linux.ibm.com>

On 9/9/21 12:19 PM, Bill Schmidt wrote:
> On 9/9/21 11:11 AM, Segher Boessenkool wrote:
>> Hi!
>>
>> On Wed, Sep 08, 2021 at 02:57:14PM +0800, Kewen.Lin wrote:
>>>>> + /* If we have strided or elementwise loads into a vector, it's
>>>>> +    possible to be bounded by latency and execution resources for
>>>>> +    many scalar loads. Try to account for this by scaling the
>>>>> +    construction cost by the number of elements involved, when
>>>>> +    handling each matching statement we record the possible extra
>>>>> +    penalized cost into target cost, in the end of costing for
>>>>> +    the whole loop, we do the actual penalization once some load
>>>>> +    density heuristics are satisfied. */
>>>> The above comment is quite hard to read. Can you please break up the last
>>>> sentence into at least two sentences?
>>> How about the below:
>>>
>>> + /* If we have strided or elementwise loads into a vector, it's
>> "strided" is not a word: it properly is "stridden", which does not read
>> very well either. "Have loads by stride, or by element, ..."? Is that
>> good English, and easier to understand?
> No, this is OK.  "Strided loads" is a term of art used by the
> vectorizer; whether or not it was the Queen's English, it's what we
> have...  (And I think you might only find "bestridden" in some 18th or
> 19th century English poetry... :-)
>>> +    possible to be bounded by latency and execution resources for
>>> +    many scalar loads. Try to account for this by scaling the
>>> +    construction cost by the number of elements involved. For
>>> +    each matching statement, we record the possible extra
>>> +    penalized cost into the relevant field in target cost.
>>> +    When we want to finalize the whole loop costing, we will check if
>>> +    those related load density heuristics are satisfied, and add
>>> +    this accumulated penalized cost if yes. */
>>>
>>>> Otherwise this looks good to me, and I recommend maintainers approve with
>>>> that clarified.
>> Does that text look good to you now Bill? It is still kinda complex,
>> maybe you see a way to make it simpler.
> I think it's OK now.  The complexity at least matches the code now
> instead of exceeding it. :-P  j/k...

Well, let me not be lazy, and see whether I can help:

"Power processors do not currently have instructions for strided and
elementwise loads, and instead we must generate multiple scalar loads.
This leads to undercounting of the cost.  We account for this by scaling
the construction cost by the number of elements involved, and saving
this as extra cost that we may or may not need to apply.  When
finalizing the cost of the loop, the extra penalty is applied when the
load density heuristics are satisfied."

Something like that?

Bill
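
For readers following the thread, the record-then-finalize scheme that the proposed comment describes can be sketched roughly as below. This is only an illustration, not the code from the patch: the LoopCost struct, the function names, and both threshold values are hypothetical, and the real heuristic may use different conditions.

```cpp
#include <cassert>

// Hypothetical per-loop cost record. The names are illustrative only and
// are not the fields used in the rs6000 backend.
struct LoopCost {
  unsigned body_cost = 0;        // cost accumulated for the loop body
  unsigned nstmts = 0;           // number of statements costed
  unsigned nloads = 0;           // number of those statements that are loads
  unsigned extra_ctor_cost = 0;  // penalty recorded but not yet applied
};

// Assumed thresholds, picked for the example rather than taken from the patch.
constexpr unsigned kLoadPctThreshold = 45;
constexpr unsigned kLoadNumThreshold = 20;

// Cost one statement and keep the counts needed for the density check.
void add_stmt(LoopCost &c, unsigned stmt_cost, bool is_load) {
  c.body_cost += stmt_cost;
  c.nstmts += 1;
  if (is_load)
    c.nloads += 1;
}

// A vector built from strided or elementwise scalar loads: scale the
// construction cost by the number of elements and save it for later.
void record_strided_construct(LoopCost &c, unsigned ctor_cost, unsigned nunits) {
  assert(nunits > 0);
  c.extra_ctor_cost += ctor_cost * nunits;
}

// Finalize: apply the saved penalty only if the loop is load-dense.
unsigned finish_cost(const LoopCost &c) {
  if (c.nstmts == 0)
    return c.body_cost;
  unsigned load_pct = c.nloads * 100 / c.nstmts;
  bool dense = load_pct > kLoadPctThreshold && c.nloads > kLoadNumThreshold;
  return dense ? c.body_cost + c.extra_ctor_cost : c.body_cost;
}
```

Under these assumed thresholds, a loop with 30 loads among 40 unit-cost statements and one recorded construction (cost 2, 4 elements) finishes at 48, while a loop with only 5 loads among 40 statements stays at 40 because the saved penalty is never applied.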