Message-ID:
<8aff621f-5a4f-4a72-bf7d-064ecf29643c@linux.vnet.ibm.com>
Date: Thu, 14 Dec 2023 12:47:53 +0530
Subject: Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673
To: Richard Earnshaw, Richard Sandiford, Peter Bergner
Cc: GCC Development, vmakarov@redhat.com
From: Surya Kumari Jangala
In-Reply-To: <6ca90437-7564-4339-b652-46587efe828e@foss.arm.com>
References: <51f4b26f-1462-45c2-8106-fbfe8dc61975@linux.vnet.ibm.com>
 <2a2060c7-5288-422d-ba1d-dfe4306b4c3f@linux.vnet.ibm.com>
 <566f7575-10c1-42b7-b006-f77a631a20cb@foss.arm.com>
 <85f7385c-b14a-4326-88fb-80bbf9b53cc4@linux.vnet.ibm.com>
 <6ca90437-7564-4339-b652-46587efe828e@foss.arm.com>

Hi Richard,
Thanks a lot for your response!

Another failure reported by the Linaro CI is as follows:

Running gcc:gcc.dg/dg.exp ...
FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
FAIL: gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"

I analyzed the failures, and the root cause is the same for both.

The test pr10474.c is as follows:

void f(int *i)
{
  if (!i)
    return;
  else
    {
      __builtin_printf("Hi");
      *i=0;
    }
}

With the patch (for PR111673), x1 (a volatile, i.e. caller-saved,
register) is assigned to hold the value of x0 (the first parameter).
Since x1 is volatile and there is a call later on, x1 is saved to the
stack. The save is generated in the LRA pass, and it is placed in the
entry basic block. Because the entry bb then uses the stack pointer,
the testcase fails to be shrink wrapped.

The reason why LRA generates the store insn in the entry bb is as
follows:
LRA emits insns to save volatile registers in the inheritance/splitting
pass. In this pass, LRA builds EBBs (Extended Basic Blocks) and
traverses the insns in each EBB in reverse order, from the last insn to
the first. When LRA sees a write to a pseudo (that has been assigned a
volatile register), and there is a read following the write with an
intervening call insn between the write and the read, LRA generates a
spill immediately after the write and a restore immediately before the
read. In the above test, there is an EBB containing the entry bb and
the bb with the printf call. In the entry bb, there is a write to x1
(basically a copy from x0 to x1), and in the printf bb, there is a read
of x1 after the call insn. So LRA generates a spill in the entry bb.

Without the patch, x19 is chosen to hold the value of x0. Since x19 is
a non-volatile register, the input RTL to the shrink wrap pass does not
have any code to save x19 to the stack. Only the insn that copies x0 to
x19 is present in the entry bb. In the shrink wrap pass, this insn is
moved down the cfg to the bb containing the call to printf, thereby
allowing the prologue to be allocated only where needed.
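The spill placement described above for the patched case can be
sketched with a toy model. This is not GCC code: the insn tuples, the
block names and the function place_spill_after_write are hypothetical
and exist only to illustrate the reverse walk over an EBB. It handles a
single write/call/read pattern and ignores everything else LRA does.

```python
# Toy model only: this is NOT GCC/LRA code. All names are hypothetical.
from typing import List, Optional, Tuple

# (basic block name, kind, register); kind is "write", "read", "call" or "other"
Insn = Tuple[str, str, str]


def place_spill_after_write(ebb: List[Insn], reg: str) -> Optional[Tuple[int, int]]:
    """Walk the EBB from the last insn to the first.

    If a write to `reg` is followed by a call and then a read of `reg`,
    return (write_idx, read_idx): the save goes right after the write,
    the restore right before the read. Otherwise return None.
    """
    pending_read = None     # earliest read of reg found so far in the reverse walk
    read_after_call = None  # a read of reg that has a call somewhere before it
    for idx in range(len(ebb) - 1, -1, -1):
        _block, kind, r = ebb[idx]
        if kind == "read" and r == reg:
            pending_read = idx
        elif kind == "call":
            if pending_read is not None:
                read_after_call = pending_read
        elif kind == "write" and r == reg:
            if read_after_call is not None:
                return idx, read_after_call
            # No call between this write and the reads seen so far.
            pending_read = None
    return None


# EBB for pr10474.c as described in the mail: entry bb + printf bb.
EBB = [
    ("entry", "write", "x1"),      # copy x0 -> x1
    ("entry", "other", ""),        # branch on !i
    ("printf_bb", "call", ""),     # call to printf
    ("printf_bb", "read", "x1"),   # *i = 0 needs the pointer again
]

write_idx, read_idx = place_spill_after_write(EBB, "x1")
print("save placed after insn", write_idx, "in block", EBB[write_idx][0])
print("restore placed before insn", read_idx, "in block", EBB[read_idx][0])
```

In this model the save lands in the "entry" block, which corresponds to
the stack pointer use in the entry bb that prevents shrink wrapping.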
Thus, without the patch, shrink wrap succeeds.

Shrink wrap can be made to succeed if the save of x1 occurs just before
the call insn, instead of being generated after the write to x1. This
ensures that the spill does not occur in the entry bb. In fact, it is
more efficient if the save occurs only on the path containing the
printf call instead of in the entry bb.

I have a patch (bootstrapped and regtested on powerpc) that changes LRA
to save volatile registers before a call instead of after the write to
the volatile register. With this patch, both of the above tests pass.

Since the patch for PR111673 has been approved by Vladimir, I plan to
commit it to trunk. I will fix the test failures after the commit.

Regards,
Surya

On 28/11/23 7:18 pm, Richard Earnshaw wrote:
>
>
> On 28/11/2023 12:52, Surya Kumari Jangala wrote:
>> Hi Richard,
>> Thanks a lot for your response!
>>
>> Another failure reported by the Linaro CI is as follows:
>> (Note: I am planning to send a separate mail for each failure, as this will make
>> the discussion easy to track)
>>
>> FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve -moverride=tune=none  check-function-bodies dup_x0_m
>>
>> Expected code:
>>
>>        ...
>>        add     (x[0-9]+), x0, #?1
>>        mov     (p[0-7])\.b, p15\.b
>>        mov     z0\.d, \2/m, \1
>>        ...
>>        ret
>>
>>
>> Code obtained w/o patch:
>>          addvl   sp, sp, #-1
>>          str     p15, [sp]
>>          add     x0, x0, 1
>>          mov     p3.b, p15.b
>>          mov     z0.d, p3/m, x0
>>          ldr     p15, [sp]
>>          addvl   sp, sp, #1
>>          ret
>>
>> Code obtained w/ patch:
>>          addvl   sp, sp, #-1
>>          str     p15, [sp]
>>          mov     p3.b, p15.b
>>          add     x0, x0, 1
>>          mov     z0.d, p3/m, x0
>>          ldr     p15, [sp]
>>          addvl   sp, sp, #1
>>          ret
>>
>> As we can see, with the patch, the following two instructions are interchanged:
>>          add     x0, x0, 1
>>          mov     p3.b, p15.b
>
> Indeed, both look acceptable results to me, especially given that we don't schedule results at -O1.
>
> There are two ways of fixing this:
> 1) Simply swap the order to what the compiler currently generates (which is a little fragile, since it might flip back someday).
> 2) Write the test as
>
>
> ** (
> **       add     (x[0-9]+), x0, #?1
> **       mov     (p[0-7])\.b, p15\.b
> **       mov     z0\.d, \2/m, \1
> ** |
> **       mov     (p[0-7])\.b, p15\.b
> **       add     (x[0-9]+), x0, #?1
> **       mov     z0\.d, \1/m, \2
> ** )
>
> Note, we need to swap the match names in the third insn to account for the different order of the earlier instructions.
>
> Neither is ideal, but the second is perhaps a little more bomb proof.
>
> I don't really have a strong feeling either way, but perhaps the second is slightly preferable.
>
> Richard S: thoughts?
>
> R.
>
>> I believe that this is fine and the test can be modified to allow it to pass on
>> aarch64. Please let me know what you think.
>>
>> Regards,
>> Surya
>>
>>
>> On 24/11/23 4:18 pm, Richard Earnshaw wrote:
>>>
>>>
>>> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
>>>> Hi Richard,
>>>> Ping.
>>>> Please let me know if the test failure that I mentioned in the mail below can be handled by changing the expected generated code. I am not conversant with arm, and hence would appreciate your help.
>>>>
>>>> Regards,
>>>> Surya
>>>>
>>>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
>>>>> Hi Richard,
>>>>> I had submitted a patch for review (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
>>>>> regarding scaling save/restore costs of callee save registers with block
>>>>> frequency in the IRA pass (PR111673).
>>>>>
>>>>> This patch has been approved by VMakarov
>>>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
>>>>>
>>>>> With this patch, we are seeing performance improvements with spec on x86
>>>>> (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%).
>>>>>
>>>>> I received a mail from Linaro about some failures seen in the CI pipeline with
>>>>> this patch. I have analyzed the failures and I wish to discuss the analysis with you.
>>>>>
>>>>> One failure reported by the Linaro CI is:
>>>>>
>>>>> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, r[0-9]+, \\[r[0-9]+\\] 2
>>>>>
>>>>> The diff in the assembly between trunk and patch is:
>>>>>
>>>>> 93c93
>>>>> <       push    {r4, r5}
>>>>> ---
>>>>> >       push    {fp}
>>>>> 95c95
>>>>> <       ldrexd  r4, r5, [r0]
>>>>> ---
>>>>> >       ldrexd  fp, ip, [r0]
>>>>> 99c99
>>>>> <       pop     {r4, r5}
>>>>> ---
>>>>> >       ldr     fp, [sp], #4
>>>>>
>>>>>
>>>>> The test fails with patch because the ldrexd insn uses fp & ip registers instead
>>>>> of r[0-9]+
>>>>>
>>>>> But the code produced by patch is better because it is pushing and restoring only
>>>>> one register (fp) instead of two registers (r4, r5). Hence, this test can be
>>>>> modified to allow it to pass on arm. Please let me know what you think.
>>>>>
>>>>> If you need more information, please let me know.
>>>>> I will be sending separate mails for the other test failures.
>>>>>
>>>
>>> Thanks for looking at this.
>>>
>>>
>>> The key part of this test is that the compiler generates LDREXD.  The registers used for that are pretty much irrelevant as we don't match them to any other operations within the test.  So I'd recommend just testing for the mnemonic and not for any of the operands (i.e. just match "ldrexd\t").
>>>
>>> R.
>>>
>>>>> Regards,
>>>>> Surya
>>>>>
>>>>>
>>>>>
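
The suggestion quoted above for pr111235.c (match only the mnemonic) can
be checked outside the testsuite with a small sketch. It assumes the
assembly separates mnemonic and operands with a tab, as the
scan-assembler pattern implies, and it uses Python's re module rather
than the Tcl regexps DejaGnu actually runs. The names STRICT,
MNEMONIC_ONLY and count_matches are made up for this illustration.

```python
import re

# Operand-specific pattern from the failing directive (Tcl escaping removed).
STRICT = re.compile(r"ldrexd\tr[0-9]+, r[0-9]+, \[r[0-9]+\]")
# Relaxed pattern from the reply: test for the mnemonic only.
MNEMONIC_ONLY = re.compile(r"ldrexd\t")

# The two ldrexd lines from the diff; a tab separator is an assumption.
trunk_line = "\tldrexd\tr4, r5, [r0]"
patched_line = "\tldrexd\tfp, ip, [r0]"


def count_matches(pattern, asm_lines):
    """Count occurrences of pattern over all given assembly lines."""
    return sum(len(pattern.findall(line)) for line in asm_lines)


for name, pattern in (("strict", STRICT), ("mnemonic only", MNEMONIC_ONLY)):
    print(name, "trunk:", count_matches(pattern, [trunk_line]),
          "patched:", count_matches(pattern, [patched_line]))
```

The strict pattern stops matching once fp and ip are allocated, while
the mnemonic-only pattern counts the instruction on both trunk and
patched output.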