From mboxrd@z Thu Jan 1 00:00:00 1970
From: Richard Sandiford
To: Richard Earnshaw
Cc: Surya Kumari Jangala, Peter Bergner, GCC Development, vmakarov@redhat.com
Subject: Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673
References:
 <51f4b26f-1462-45c2-8106-fbfe8dc61975@linux.vnet.ibm.com>
 <2a2060c7-5288-422d-ba1d-dfe4306b4c3f@linux.vnet.ibm.com>
 <566f7575-10c1-42b7-b006-f77a631a20cb@foss.arm.com>
 <85f7385c-b14a-4326-88fb-80bbf9b53cc4@linux.vnet.ibm.com>
 <6ca90437-7564-4339-b652-46587efe828e@foss.arm.com>
Date: Tue, 28 Nov 2023 15:41:14 +0000
In-Reply-To: <6ca90437-7564-4339-b652-46587efe828e@foss.arm.com> (Richard Earnshaw's message of "Tue, 28 Nov 2023 13:48:45 +0000")

Richard Earnshaw writes:
> On 28/11/2023 12:52, Surya Kumari Jangala wrote:
>> Hi Richard,
>> Thanks a lot for your response!
>>
>> Another failure reported by the Linaro CI is as follows :
>> (Note: I am planning to send a separate mail for each failure, as this will make
>> the discussion easy to track)
>>
>> FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve -moverride=tune=none check-function-bodies dup_x0_m
>>
>> Expected code:
>>
>>        ...
>>        add     (x[0-9]+), x0, #?1
>>        mov     (p[0-7])\.b, p15\.b
>>        mov     z0\.d, \2/m, \1
>>        ...
>>        ret
>>
>>
>> Code obtained w/o patch:
>>        addvl   sp, sp, #-1
>>        str     p15, [sp]
>>        add     x0, x0, 1
>>        mov     p3.b, p15.b
>>        mov     z0.d, p3/m, x0
>>        ldr     p15, [sp]
>>        addvl   sp, sp, #1
>>        ret
>>
>> Code obtained w/ patch:
>>        addvl   sp, sp, #-1
>>        str     p15, [sp]
>>        mov     p3.b, p15.b
>>        add     x0, x0, 1
>>        mov     z0.d, p3/m, x0
>>        ldr     p15, [sp]
>>        addvl   sp, sp, #1
>>        ret
>>
>> As we can see, with the patch, the following two instructions are interchanged:
>>        add     x0, x0, 1
>>        mov     p3.b, p15.b
>
> Indeed, both look acceptable results to me, especially given that we
> don't schedule results at -O1.
>
> There's two ways of fixing this:
> 1) Simply swap the order to what the compiler currently generates (which
> is a little fragile, since it might flip back someday).
> 2) Write the test as
>
>
> ** (
> **       add     (x[0-9]+), x0, #?1
> **       mov     (p[0-7])\.b, p15\.b
> **       mov     z0\.d, \2/m, \1
> ** |
> **       mov     (p[0-7])\.b, p15\.b
> **       add     (x[0-9]+), x0, #?1
> **       mov     z0\.d, \1/m, \2
> ** )
>
> Note, we need to swap the match names in the third insn to account for
> the different order of the earlier instructions.
>
> Neither is ideal, but the second is perhaps a little more bomb proof.
>
> I don't really have a strong feeling either way, but perhaps the second
> is slightly preferable.
>
> Richard S: thoughts?

Yeah, I agree the second is probably better.  The | doesn't reset
the capture numbers, so I think the final instruction needs to be:

  **       mov     z0\.d, \3/m, \4

Thanks,
Richard

>
> R.
>
>> I believe that this is fine and the test can be modified to allow it to pass on
>> aarch64. Please let me know what you think.
>>
>> Regards,
>> Surya
>>
>>
>> On 24/11/23 4:18 pm, Richard Earnshaw wrote:
>>>
>>>
>>> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
>>>> Hi Richard,
>>>> Ping. Please let me know if the test failure that I mentioned in the mail below can be handled by changing the expected generated code.
>>>> I am not conversant with arm, and hence would appreciate your help.
>>>>
>>>> Regards,
>>>> Surya
>>>>
>>>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
>>>>> Hi Richard,
>>>>> I had submitted a patch for review (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
>>>>> regarding scaling save/restore costs of callee save registers with block
>>>>> frequency in the IRA pass (PR111673).
>>>>>
>>>>> This patch has been approved by VMakarov
>>>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
>>>>>
>>>>> With this patch, we are seeing performance improvements with spec on x86
>>>>> (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%).
>>>>>
>>>>> I received a mail from Linaro about some failures seen in the CI pipeline with
>>>>> this patch. I have analyzed the failures and I wish to discuss the analysis with you.
>>>>>
>>>>> One failure reported by the Linaro CI is:
>>>>>
>>>>> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, r[0-9]+, \\[r[0-9]+\\] 2
>>>>>
>>>>> The diff in the assembly between trunk and patch is:
>>>>>
>>>>> 93c93
>>>>> <       push    {r4, r5}
>>>>> ---
>>>>>>        push    {fp}
>>>>> 95c95
>>>>> <       ldrexd  r4, r5, [r0]
>>>>> ---
>>>>>>        ldrexd  fp, ip, [r0]
>>>>> 99c99
>>>>> <       pop     {r4, r5}
>>>>> ---
>>>>>>        ldr     fp, [sp], #4
>>>>>
>>>>>
>>>>> The test fails with patch because the ldrexd insn uses fp & ip registers instead
>>>>> of r[0-9]+
>>>>>
>>>>> But the code produced by patch is better because it is pushing and restoring only
>>>>> one register (fp) instead of two registers (r4, r5). Hence, this test can be
>>>>> modified to allow it to pass on arm.
>>>>> Please let me know what you think.
>>>>>
>>>>> If you need more information, please let me know. I will be sending separate mails
>>>>> for the other test failures.
>>>>>
>>>
>>> Thanks for looking at this.
>>>
>>>
>>> The key part of this test is that the compiler generates LDREXD.  The registers used for that are pretty much irrelevant as we don't match them to any other operations within the test.  So I'd recommend just testing for the mnemonic and not for any of the operands (ie just match "ldrexd\t").
>>>
>>> R.
>>>
>>>>> Regards,
>>>>> Surya
>>>>>
>>>>>
>>>>>
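
To illustrate the capture-numbering point made in the reply (the | does not
reset the capture numbers), here is a minimal sketch using Python's re module
rather than the Tcl regexp that check-function-bodies actually runs. The names
BODY, WITHOUT_PATCH, WITH_PATCH and capture_groups are hypothetical, the
whitespace matching is simplified, and the outer "(" is assumed to be
non-capturing, which is consistent with the \3 and \4 numbering suggested in
the reply.

```python
import re

# Hypothetical Python transcription of the two-alternative test body.
# Assumption: the outer "(" is non-capturing, so only the register
# captures receive group numbers.
BODY = re.compile(
    r"(?:"
    r"\s*add\s+(x[0-9]+), x0, #?1\n"    # group 1
    r"\s*mov\s+(p[0-7])\.b, p15\.b\n"   # group 2
    r"|"
    r"\s*mov\s+(p[0-7])\.b, p15\.b\n"   # group 3, not 1
    r"\s*add\s+(x[0-9]+), x0, #?1\n"    # group 4, not 2
    r")"
)

# The two instruction orders quoted in the thread.
WITHOUT_PATCH = "\tadd\tx0, x0, 1\n\tmov\tp3.b, p15.b\n"
WITH_PATCH = "\tmov\tp3.b, p15.b\n\tadd\tx0, x0, 1\n"


def capture_groups(asm):
    """Return all capture groups for the first match, or None if no match."""
    m = BODY.search(asm)
    return m.groups() if m else None


if __name__ == "__main__":
    print(capture_groups(WITHOUT_PATCH))
    print(capture_groups(WITH_PATCH))
```

In this sketch the second alternative fills groups 3 and 4 and leaves groups 1
and 2 unset, so a final line written with \1 and \2 inside that alternative
would refer to groups that never matched. In Python, at least, such a
backreference fails to match, which is why the final instruction of the second
alternative has to use \3/m, \4.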