From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 84C173858D20 for ; Wed, 31 May 2023 09:11:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 84C173858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0360072.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 34V98mvG006305; Wed, 31 May 2023 09:11:26 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : subject : to : cc : references : from : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=UL+7INeVc42+4fnqDt9EOYUn/2qH0/xCE4lDzyVgjqU=; b=Iq8N73CU/PD3wHSb3shAI/DeAlL0pNy8syZHO0MmRbd+Q4RgHADrXT502b9JCJzs5cEh s5Luo02iHy289iYNpi1KuOfltm8demvXFiVVRVf9I2nLp1mGT6m5Age4X7efol2n633v g/iaTy7UtZzkI52aLdZTMbBdGcuBRQ1VhBVVfrYDNnMO/OBYoaf7wWKRyQpcrwXe41Yo eqYLgAItsXQrtHs2FuLLfvwsJNkkWYrt1KlwrIC8rvsQTHqZIIEvbm4DaeYjOd7/5nYz CPJqvxRDwYYcDyd9X7QID9ZnOQirtrm/fNhB4ceUAmY8bRm7u3O5V9yx48fcs71Enbmy wA== Received: from ppma03fra.de.ibm.com (6b.4a.5195.ip4.static.sl-reverse.com [149.81.74.107]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3qwmx2mb68-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 31 May 2023 09:11:25 +0000 Received: from pps.filterd (ppma03fra.de.ibm.com [127.0.0.1]) by ppma03fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 34V4jvj9019046; Wed, 31 May 2023 09:11:23 GMT Received: from smtprelay02.fra02v.mail.ibm.com ([9.218.2.226]) by ppma03fra.de.ibm.com (PPS) with ESMTPS id 3qu9g59k1r-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 31 May 2023 09:11:23 +0000 Received: from smtpav03.fra02v.mail.ibm.com (smtpav03.fra02v.mail.ibm.com [10.20.54.102]) by smtprelay02.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 34V9BL8X42467796 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 31 May 2023 09:11:21 GMT Received: from smtpav03.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E8DFC20040; Wed, 31 May 2023 09:11:20 +0000 (GMT) Received: from smtpav03.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 121482004E; Wed, 31 May 2023 09:11:19 +0000 (GMT) Received: from [9.177.78.100] (unknown [9.177.78.100]) by smtpav03.fra02v.mail.ibm.com (Postfix) with ESMTP; Wed, 31 May 2023 09:11:18 +0000 (GMT) Message-ID: Date: Wed, 31 May 2023 17:11:17 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Subject: Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions Content-Language: en-US To: Carl Love , Segher Boessenkool Cc: gcc-patches@gcc.gnu.org, Peter Bergner References: <4f1af7ba04999d0258354b8e6794621ee303ec53.camel@us.ibm.com> <2fab487a-df28-0a5b-dba8-bed417e039bb@linux.ibm.com> From: "Kewen.Lin" In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: M3hEAvazD9quSV-yxoq9KnHEcDS2AO_k X-Proofpoint-ORIG-GUID: M3hEAvazD9quSV-yxoq9KnHEcDS2AO_k X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-05-31_04,2023-05-30_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 adultscore=0 lowpriorityscore=0 malwarescore=0 clxscore=1015 mlxscore=0 priorityscore=1501 bulkscore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2304280000 definitions=main-2305310078 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Carl, on 2023/5/25 23:59, Carl Love wrote: > Peter, Kewen: > > On Thu, 2023-05-25 at 13:28 +0800, Kewen.Lin wrote: >> on 2023/5/24 23:20, Carl Love wrote: >>> On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote: >>>> on 2023/5/24 06:30, Peter Bergner wrote: >>>>> On 5/23/23 12:24 AM, Kewen.Lin wrote: >>>>>> on 2023/5/23 01:31, Carl Love wrote: >>>>>>> The builtins were requested for use in GLibC. As of >>>>>>> version >>>>>>> 2.31 they >>>>>>> were added as inline asm. They requested a builtin so the >>>>>>> asm >>>>>>> could be >>>>>>> removed. >>>>>> >>>>>> So IMHO we also want the similar support for mffscrn, that is >>>>>> to >>>>>> make >>>>>> use of mffscrn and mffscrni on Power9 and later, but falls >>>>>> back >>>>>> to >>>>>> __builtin_set_fpscr_rn + mffs similar on older platforms. >>>>> >>>>> So __builtin_set_fpscr_rn everything we want (sets the RN bits) >>>>> and >>>>> uses mffscrn/mffscrni on P9 and later and uses older insns on >>>>> pre- >>>>> P9. >>>>> The only problem is we don't return the current FPSCR bits, as >>>>> the >>>>> bif >>>>> is defined to return void. >>>> >>>> Yes. >>>> >>>>> Crazy idea, but could we extend the built-in >>>>> with an overload that returns the FPSCR bits? >>>> >>>> So you agree that we should make this proposed new bif handle >>>> pre-P9 >>>> just >>>> like some other existing bifs. :) I think extending it is good >>>> and >>>> doable, >>>> but the only concern here is the bif name >>>> "__builtin_set_fpscr_rn", >>>> which >>>> matches the existing behavior (only set rounding) but doesn't >>>> match >>>> the >>>> proposed extending behavior (set rounding and get some env bits >>>> back). >>>> Maybe it's not a big deal if the documentation clarify it well. >>> >>> Extending the builtin to pre Power 9 is straight forward and I >>> agree >>> would make good sense to do. >>> >>> I am a bit concerned on how to extend __builtin_set_fpscr_rn to add >>> the >>> new functionality. Peter suggests overloading the builtin to >>> either >>> return void or returns FPSCR bits. It is my understanding that the >>> return value for a given builtin had to be the same, i.e. you can't >>> overload the return value. Maybe you can with Bill's new >>> infrastructure? I recall having problems trying to overload the >>> return >>> value in the past and Bill said you couldn't do it. I play with >>> this >>> and see if I can overload the return value. >> >> Your understanding on that we fail to overload this for just >> different >> return types is correct. But previously I interpreted the extending >> proposal as to extend >> >> void __builtin_set_fpscr_rn (int); >> >> to >> >> void __builtin_set_fpscr_rn (int, double*); >> >> The related address taken and store here can be optimized out >> normally. > > I don't think that is correct. The current definition of the builtin Just to clarify: I didn't meant to suggest it, just explained why I thought overloading is doable previously as I interpreted it with one extended argument. :) > is: > > void __builtin_set_fpscr_rn (int); > > The proposal by Peter was to change the return type to double, i.e. > > double __builtin_set_fpscr_rn (int); > > Peter also said the following: > > The built-in machinery can see that the usage is expecting a return > value or not and for the pre-P9 code, can skip generating the ending > mffs if we don't want the return value. > > Which I don't think we want. The mffscrn and mffscrni instructions > return the contents of the control bits in the FPSCR, that is, bits > 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), are placed > into the corresponding bits in register FRT. All other bits in register > FRT are set to 0. > > The instructions also updates the current RN field of the FPSCR with > the new RN supplied the second argument of the instruction. So, the > instructions update the RN field just like the __builtin_set_fpscr_rn. > So, we can use the existing __builtin_set_fpscr_rn to update the RN for > all ISAs, we just need to have __builtin_set_fpscr_rn always return a > double with the desired fields from the FPSCR (the current RN). This > will then emulate the behavior of the mffscrn and mffscrni > instructions. The current uses of __builtin_set_fpscr_rn will just > ignore the return value which is not a problem. The return value can > be stored in the places were the user is currently using the inline asm > for the mffscrn and mffscrni instructions. > > The __builtin_set_fpscr_rn builtin is currently using the mffscrn and > mffscrni on Power 9 and throwing away the result from the instruction. > We just need to change __builtin_set_fpscr_rn to return the value > instead. For the pre Power 9 code, the builtin will need to read the > full FPSCR, mask of the desired fields and return the fields. > > So, there is no need for the builtin to have to determine if the user > is storing the result of the __builtin_set_fpscr_rn. The RN bits will > always be updated by the __builtin_set_fpscr_rn builtin and the > existing fields of the FPSCR will always be returned by the builtin. Yeah, I agree, even with pre-P9 code when the returned value is unused, I'd expect DCE can eliminate the part for the FPSCR bits reading and masking, it's just like before (only setting RN bits). The only concern I mentioned before is the built-in name doesn't clearly match what it does (with extending, it returns something instead) since it's only saying "set" and setting RN bits, the return value is easily misunderstood as returning old RN bits, the documentation has to explain and note it well. Looking forward to Segher's opinion on this. BR, Kewen > > Please let me know if you agree. I think I have this sorted out > correctly. Thanks. > > Carl >