From: Richard Sandiford
To: Richard Biener
Cc: Tamar Christina, gcc-patches@gcc.gnu.org, nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, hubicka@ucw.cz
Subject: Re: [PATCH]AArch64 fix regexp for live_1.c sve test
Date: Thu, 20 Jul 2023 10:14:38 +0100
In-Reply-To: (Richard Biener's message of "Thu, 20 Jul 2023 07:20:35 +0000 (UTC)")

Richard Biener writes:
> On Thu, 20 Jul 2023, Richard Sandiford wrote:
>
>> Tamar Christina writes:
>> > Hi All,
>> >
>> > The resulting predicate register of a whilelo is not
>> > restricted to the lower half of the predicate register file.
>> >
>> > As such these tests started failing after recent changes
>> > because the whilelo outside the loop is getting assigned p15.
>>
>> It's the whilelo in the loop for me.  We go from:
>>
>> .L3:
>>         ld1b    z31.b, p7/z, [x4, x3]
>>         movprfx z30, z31
>>         mul     z30.b, p5/m, z30.b, z29.b
>>         st1b    z30.b, p7, [x4, x3]
>>         mov     p6.b, p7.b
>>         add     x3, x3, x0
>>         whilelo p7.b, w3, w1
>>         b.any   .L3
>>
>> to:
>>
>> .L3:
>>         ld1b    z31.b, p7/z, [x3, x2]
>>         movprfx z29, z31
>>         mul     z29.b, p6/m, z29.b, z30.b
>>         st1b    z29.b, p7, [x3, x2]
>>         add     x2, x2, x0
>>         whilelo p15.b, w2, w1
>>         b.any   .L4
>>         [...]
>>         .p2align 2,,3
>> .L4:
>>         mov     p7.b, p15.b
>>         b       .L3
>>
>> This adds an extra (admittedly unconditional) branch to every non-final
>> vector iteration, which seems unfortunate.  I don't think we'd see
>> p8-p15 otherwise, since the result of the whilelo is used as a
>> governing predicate by the next iteration of the loop.
>>
>> This happens because the scalar loop is given an 89% chance of iterating.
>> Previously we gave the vector loop an 83.33% chance of iterating, whereas
>> after 061f74c06735e1fa35b910ae we give it a 12% chance.  0.89^16 == 15.50%,
>> so the new probabilities definitely preserve the original probabilities
>> more closely.  But for purely heuristic probabilities like these, I'm
>> not sure we should lean so heavily into the idea that the vector
>> latch is unlikely.
>>
>> Honza, Richi, any thoughts?  Just wanted to double-check that this
>> was operating as expected before making the tests accept the (arguably)
>> less efficient code.  It looks like the commit was more aimed at fixing
>> the profile counts for the epilogues, rather than the main loop.
>
> The above looks like a failed coalescing, can you track down where
> that happens and why?

Ah, sorry, I shouldn't have trimmed the context.
The previous predicate (p6 in the original code) is live on exit from
the loop, while the whilelo result is live on the latch edge.  So I think
a move is needed somewhere.

Thanks,
Richard
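
The liveness point can be illustrated with a small Python model of the loop.  This is only a sketch under stated assumptions, not GCC output: `whilelo` is a hypothetical helper that mimics the instruction's lane-wise "index < limit" compare, and `run_loop` and the lane count of 16 are made up for illustration.  It shows that the old predicate has to be copied before the new one is computed, because the two values are needed at the same time.

```python
def whilelo(start, limit, lanes):
    """Hypothetical model of WHILELO: lane k is active while start + k < limit."""
    return [start + k < limit for k in range(lanes)]


def run_loop(limit, lanes=16):
    """Model of the vector loop; `lanes` stands in for the vector length (assumed)."""
    index = 0
    pred = whilelo(index, limit, lanes)   # governing predicate for the first iteration
    prev = None                           # predicate of the last executed iteration
    while any(pred):                      # b.any: iterate while some lane is active
        # ... loop body governed by `pred` would go here ...
        prev = pred                       # the "move": keep the old predicate for use after the loop
        index += lanes
        pred = whilelo(index, limit, lanes)  # new predicate, live on the latch edge
    # On exit `pred` is all-false, so only `prev` identifies the last active lanes.
    return prev, pred


prev, pred = run_loop(20)
print(prev.count(True), any(pred))
```

Without the `prev = pred` copy, the predicate that governed the final iteration would be overwritten by the all-false result, which is the same reason a predicate move has to appear somewhere in the generated code.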