From mboxrd@z Thu Jan  1 00:00:00 1970
From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>
Mail-Followup-To: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>,Surya Kumari Jangala <jskumari@linux.vnet.ibm.com>,  Peter Bergner <bergner@linux.ibm.com>,  GCC Development <gcc@gcc.gnu.org>,  vmakarov@redhat.com, richard.sandiford@arm.com
Cc: Surya Kumari Jangala <jskumari@linux.vnet.ibm.com>,  Peter Bergner <bergner@linux.ibm.com>,  GCC Development <gcc@gcc.gnu.org>,  vmakarov@redhat.com
Subject: Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673
References: <51f4b26f-1462-45c2-8106-fbfe8dc61975@linux.vnet.ibm.com>
	<2a2060c7-5288-422d-ba1d-dfe4306b4c3f@linux.vnet.ibm.com>
	<566f7575-10c1-42b7-b006-f77a631a20cb@foss.arm.com>
	<85f7385c-b14a-4326-88fb-80bbf9b53cc4@linux.vnet.ibm.com>
	<6ca90437-7564-4339-b652-46587efe828e@foss.arm.com>
Date: Tue, 28 Nov 2023 15:41:14 +0000
In-Reply-To: <6ca90437-7564-4339-b652-46587efe828e@foss.arm.com> (Richard
	Earnshaw's message of "Tue, 28 Nov 2023 13:48:45 +0000")
Message-ID: <mpty1ehrilh.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
List-Id: <gcc.gcc.gnu.org>

Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:
> On 28/11/2023 12:52, Surya Kumari Jangala wrote:
>> Hi Richard,
>> Thanks a lot for your response!
>>
>> Another failure reported by the Linaro CI is as follows :
>> (Note: I am planning to send a separate mail for each failure, as this will make
>> the discussion easy to track)
>>
>> FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve -moverride=tune=none  check-function-bodies dup_x0_m
>>
>> Expected code:
>>
>>        ...
>>        add     (x[0-9]+), x0, #?1
>>        mov     (p[0-7])\.b, p15\.b
>>        mov     z0\.d, \2/m, \1
>>        ...
>>        ret
>>
>>
>> Code obtained w/o patch:
>>          addvl   sp, sp, #-1
>>          str     p15, [sp]
>>          add     x0, x0, 1
>>          mov     p3.b, p15.b
>>          mov     z0.d, p3/m, x0
>>          ldr     p15, [sp]
>>          addvl   sp, sp, #1
>>          ret
>>
>> Code obtained w/ patch:
>>          addvl   sp, sp, #-1
>>          str     p15, [sp]
>>          mov     p3.b, p15.b
>>          add     x0, x0, 1
>>          mov     z0.d, p3/m, x0
>>          ldr     p15, [sp]
>>          addvl   sp, sp, #1
>>          ret
>>
>> As we can see, with the patch, the following two instructions are interchanged:
>>          add     x0, x0, 1
>>          mov     p3.b, p15.b
>
> Indeed, both look acceptable results to me, especially given that we
> don't schedule results at -O1.
>
> There's two ways of fixing this:
> 1) Simply swap the order to what the compiler currently generates (which
> is a little fragile, since it might flip back someday).
> 2) Write the test as
>
>
> ** (
> **       add     (x[0-9]+), x0, #?1
> **       mov     (p[0-7])\.b, p15\.b
> **       mov     z0\.d, \2/m, \1
> ** |
> **       mov     (p[0-7])\.b, p15\.b
> **       add     (x[0-9]+), x0, #?1
> **       mov     z0\.d, \1/m, \2
> ** )
>
> Note, we need to swap the match names in the third insn to account for
> the different order of the earlier instructions.
>
> Neither is ideal, but the second is perhaps a little more bomb proof.
>
> I don't really have a strong feeling either way, but perhaps the second
> is slightly preferable.
>
> Richard S: thoughts?

Yeah, I agree the second is probably better.  The | doesn't reset the
capture numbers, so I think the final instruction needs to be:

**       mov     z0\.d, \3/m, \4
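
To illustrate the numbering point, here is a minimal sketch. It uses Python's
re module as a stand-in for the Tcl regexps that check-function-bodies
really uses (that substitution is an assumption on my part, as are the
helper names and the single-space layout), but both engines number capture
groups by the position of the opening parenthesis across the whole pattern:

```python
import re

# First alternative: add, then predicate move.  Its groups are 1 and 2.
FIRST = (r"add (x[0-9]+), x0, #?1\n"
         r"mov (p[0-7])\.b, p15\.b\n"
         r"mov z0\.d, \2/m, \1\n")


def second(pred, reg):
    """Second alternative (predicate move first), parameterised on which
    group numbers the final instruction refers back to."""
    return (r"mov (p[0-7])\.b, p15\.b\n"
            r"add (x[0-9]+), x0, #?1\n"
            r"mov z0\.d, \%d/m, \%d\n" % (pred, reg))


# Backreferences \1 and \2 in the second arm: these name groups from the
# first arm, which did not take part in the match.
wrong = re.compile("(?:" + FIRST + "|" + second(1, 2) + ")")
# Backreferences \3 and \4: the groups actually opened in the second arm.
right = re.compile("(?:" + FIRST + "|" + second(3, 4) + ")")

# Simplified versions of the two instruction orders from this thread.
asm_old = "add x0, x0, 1\nmov p3.b, p15.b\nmov z0.d, p3/m, x0\n"
asm_new = "mov p3.b, p15.b\nadd x0, x0, 1\nmov z0.d, p3/m, x0\n"

print(wrong.fullmatch(asm_new))           # no match for the new order
print(right.fullmatch(asm_new).groups())  # groups 3 and 4 are the set ones
print(right.fullmatch(asm_old).groups())  # groups 1 and 2 are the set ones
```

With \1/\2 in the second arm the new ordering is rejected, because those
groups are unset there; with \3/\4 both orderings are accepted.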

Thanks,
Richard

>
> R.
>
>> I believe that this is fine and the test can be modified to allow it to pass on
>> aarch64. Please let me know what you think.
>>
>> Regards,
>> Surya
>>
>>
>> On 24/11/23 4:18 pm, Richard Earnshaw wrote:
>>>
>>>
>>> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
>>>> Hi Richard,
>>>> Ping. Please let me know if the test failure that I mentioned in the mail below can be handled by changing the expected generated code. I am not conversant with arm, and hence would appreciate your help.
>>>>
>>>> Regards,
>>>> Surya
>>>>
>>>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
>>>>> Hi Richard,
>>>>> I had submitted a patch for review (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
>>>>> regarding scaling save/restore costs of callee save registers with block
>>>>> frequency in the IRA pass (PR111673).
>>>>>
>>>>> This patch has been approved by VMakarov
>>>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
>>>>>
>>>>> With this patch, we are seeing performance improvements with spec on x86
>>>>> (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%).
>>>>>
>>>>> I received a mail from Linaro about some failures seen in the CI pipeline with
>>>>> this patch. I have analyzed the failures and I wish to discuss the analysis with you.
>>>>>
>>>>> One failure reported by the Linaro CI is:
>>>>>
>>>>> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, r[0-9]+, \\[r[0-9]+\\] 2
>>>>>
>>>>> The diff in the assembly between trunk and patch is:
>>>>>
>>>>> 93c93
>>>>> <       push    {r4, r5}
>>>>> ---
>>>>> >       push    {fp}
>>>>> 95c95
>>>>> <       ldrexd  r4, r5, [r0]
>>>>> ---
>>>>> >       ldrexd  fp, ip, [r0]
>>>>> 99c99
>>>>> <       pop     {r4, r5}
>>>>> ---
>>>>> >       ldr     fp, [sp], #4
>>>>>
>>>>>
>>>>> The test fails with patch because the ldrexd insn uses fp & ip registers instead
>>>>> of r[0-9]+
>>>>>
>>>>> But the code produced by patch is better because it is pushing and restoring only
>>>>> one register (fp) instead of two registers (r4, r5). Hence, this test can be
>>>>> modified to allow it to pass on arm. Please let me know what you think.
>>>>>
>>>>> If you need more information, please let me know. I will be sending separate mails
>>>>> for the other test failures.
>>>>>
>>>
>>> Thanks for looking at this.
>>>
>>>
>>> The key part of this test is that the compiler generates LDREXD.  The registers used for that are pretty much irrelevant as we don't match them to any other operations within the test.  So I'd recommend just testing for the mnemonic and not for any of the operands (ie just match "ldrexd\t").
>>>
>>> R.
>>>
>>>>> Regards,
>>>>> Surya
>>>>>
>>>>>
>>>>>