From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
    by sourceware.org (Postfix) with ESMTP id ED3E73857004
    for ; Thu, 20 Jul 2023 05:45:01 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org ED3E73857004
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
    by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 692B52F4;
    Wed, 19 Jul 2023 22:45:44 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
    by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 0677C3F738;
    Wed, 19 Jul 2023 22:44:59 -0700 (PDT)
From: Richard Sandiford
To: Tamar Christina
Mail-Followup-To: Tamar Christina ,gcc-patches@gcc.gnu.org, nd@arm.com,
    Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com,
    Kyrylo.Tkachov@arm.com, hubicka@ucw.cz, rguenther@suse.de,
    richard.sandiford@arm.com
Cc: gcc-patches@gcc.gnu.org, nd@arm.com, Richard.Earnshaw@arm.com,
    Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, hubicka@ucw.cz,
    rguenther@suse.de
Subject: Re: [PATCH]AArch64 fix regexp for live_1.c sve test
References:
Date: Thu, 20 Jul 2023 06:44:58 +0100
In-Reply-To: (Tamar Christina's message of "Tue, 18 Jul 2023 15:43:21 +0100")
Message-ID:
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-26.7 required=5.0
    tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,
    KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,
    T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

Tamar Christina writes:
> Hi All,
>
> The resulting predicate register of a whilelo is not
> restricted to the lower half of the predicate register file.
>
> As such these tests started failing after recent changes
> because the whilelo outside the loop is getting assigned p15.

It's the whilelo in the loop for me.  We go from:

.L3:
        ld1b    z31.b, p7/z, [x4, x3]
        movprfx z30, z31
        mul     z30.b, p5/m, z30.b, z29.b
        st1b    z30.b, p7, [x4, x3]
        mov     p6.b, p7.b
        add     x3, x3, x0
        whilelo p7.b, w3, w1
        b.any   .L3

to:

.L3:
        ld1b    z31.b, p7/z, [x3, x2]
        movprfx z29, z31
        mul     z29.b, p6/m, z29.b, z30.b
        st1b    z29.b, p7, [x3, x2]
        add     x2, x2, x0
        whilelo p15.b, w2, w1
        b.any   .L4
        [...]
        .p2align 2,,3
.L4:
        mov     p7.b, p15.b
        b       .L3

This adds an extra (admittedly unconditional) branch to every non-final
vector iteration, which seems unfortunate.  I don't think we'd see
p8-p15 otherwise, since the result of the whilelo is used as a
governing predicate by the next iteration of the loop.

This happens because the scalar loop is given an 89% chance of iterating.
Previously we gave the vector loop an 83.33% chance of iterating, whereas
after 061f74c06735e1fa35b910ae we give it a 12% chance.  0.89^16 == 15.50%,
so the new probabilities definitely preserve the original probabilities
more closely.  But for purely heuristic probabilities like these, I'm
not sure we should lean so heavily into the idea that the vector
latch is unlikely.

Honza, Richi, any thoughts?  Just wanted to double-check that this
was operating as expected before making the tests accept the (arguably)
less efficient code.  It looks like the commit was more aimed at fixing
the profile counts for the epilogues, rather than the main loop.

Thanks,
Richard

> This widens the regexp.
>
> Tested on aarch64-none-linux-gnu and passes again.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/sve/live_1.c: Update assembly.
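
For anyone wanting to check the 0.89^16 figure in the reply above, here is
a minimal sketch of the arithmetic.  It assumes (my reading of the exponent,
not something the message states outright) that one vector iteration stands
in for 16 scalar iterations and that the scalar latch outcomes are treated
as independent.  The function name is made up for illustration and is not
a GCC interface.

```python
def vector_latch_probability(scalar_prob: float, scalar_iters_per_vector_iter: int) -> float:
    """Chance that the vector loop iterates again, under the assumption that
    it does so only if the scalar loop would have iterated that many more
    times in a row, each with independent probability scalar_prob."""
    return scalar_prob ** scalar_iters_per_vector_iter


# Figures quoted in the message.
scalar_prob = 0.89        # scalar loop's chance of iterating
old_vector_prob = 0.8333  # vector latch probability before the commit
new_vector_prob = 0.12    # vector latch probability after the commit

# Assumption: 16 is the exponent used in the message's 0.89^16.
derived = vector_latch_probability(scalar_prob, 16)

print(f"derived from scalar profile: {derived:.2%}")
print(f"gap to old vector probability: {abs(old_vector_prob - derived):.2%}")
print(f"gap to new vector probability: {abs(new_vector_prob - derived):.2%}")
```

The derived value sits much nearer the post-commit 12% than the earlier
83.33%, which is the sense in which the new probabilities track the scalar
profile more closely.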
>
> --- inline copy of patch --
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> index 80ee176d1807bf628ad47551d69ff5d84deda79e..2db6c3c209a9514646e92628f3d2dd58d466539c 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> @@ -27,10 +27,10 @@
>
>  TEST_ALL (EXTRACT_LAST)
>
> -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].b, } 2 } } */
> -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].h, } 4 } } */
> -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].s, } 4 } } */
> -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].d, } 4 } } */
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.b, } 2 } } */
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, } 4 } } */
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.s, } 4 } } */
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.d, } 4 } } */
>
>  /* { dg-final { scan-assembler-times {\tlastb\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
>  /* { dg-final { scan-assembler-times {\tlastb\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
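
To illustrate what the regexp change in the patch does, here is a rough
sketch that applies the old and new .b patterns to the two whilelo lines
from the listings earlier in this message.  It uses Python's re module as a
stand-in for the Tcl regexps that scan-assembler-times evaluates, on the
assumption that the two engines agree for patterns this simple, and it
assumes the instruction fields are tab-separated in the real assembly
output.  The count helper is made up for illustration.

```python
import re

# Patterns copied from the patch.  The '.' before the element size is
# unescaped there, so it matches any single character.
OLD = re.compile(r"\twhilelo\tp[0-7].b, ")
NEW = re.compile(r"\twhilelo\tp[0-9]+.b, ")

# The two whilelo instructions from the before/after loops, written with
# tab separators (an assumption about the actual assembly output).
asm = (
    "\twhilelo\tp7.b, w3, w1\n"
    "\twhilelo\tp15.b, w2, w1\n"
)


def count(pattern: re.Pattern, text: str) -> int:
    """Count non-overlapping matches, approximating scan-assembler-times."""
    return len(pattern.findall(text))


old_count = count(OLD, asm)
new_count = count(NEW, asm)
print(f"old pattern matches: {old_count}")
print(f"new pattern matches: {new_count}")
```

The old pattern accepts only a single digit 0-7 after "p", so the p15 line
is not counted; the widened pattern accepts any register number and counts
both lines.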