From: Richard Sandiford
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com
Subject: Re: [PATCH]AArch64: Fix codegen regressions around tbz.
Date: Fri, 27 Jan 2023 12:25:48 +0000
In-Reply-To: Tamar Christina's message of "Fri, 27 Jan 2023 10:39:25 +0000"

Tamar Christina writes:
> Hi All,
>
> We were analyzing code quality after recent changes and noticed that the
> tbz support somehow managed to increase the overall number of branches
> rather than decreasing them.
>
> While investigating this we found that the problem occurs when an explicit
> & of a constant already exists in gimple and the tbz is generated from the
> range information of the ANDed result: we then end up with a redundant
> (no-op) AND in the RTL expansion.
>
> This is not a problem as CSE will normally take care of it.  The issue is
> when the original AND was done in a location where PRE or FRE "lift" the
> AND to a different basic block.  This triggers a problem when the resulting
> value is not single use.  Instead of an AND and a tbz, we end up generating
> AND + TST + BR if the mode is HI or QI.
>
> This CSE across basic blocks was a problem before, but this change made it
> worse.  Our branch patterns rely on combine being able to fold the AND or
> zero_extend into the instructions.
>
> To work around this (since a proper fix is outside the scope of stage 4) we
> are limiting the new tbranch optab to HI and QI mode values only.  This
> isn't a problem because these are the two modes for which we don't have CBZ
> support, so they are the problematic cases to begin with.  Additionally,
> booleans are QI.
>
> The second thing we're doing is limiting the only legal bitpos to position
> 0, i.e. only the bottom bit.  This is so that we prevent the double ANDs as
> much as possible.
>
> Most other cases, i.e. where we had an explicit & in the source code, are
> still handled correctly by the anonymous (*tb1) pattern that was added
> along with tbranch support.
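For illustration only, a minimal sketch of the shape being described (the
function below is invented for this example and is not taken from the
original report):

  void h (void);

  void
  example (int x)
  {
    /* In gimple this is roughly:  _1 = x & 1;  if (_1 != 0) goto ...
       Since _1 is known to be in the range [0, 1], the branch can be
       expanded as a test of bit 0 of _1.  But _1 is itself the result
       of an AND, so the expansion contains a masking operation that is
       redundant and normally relies on CSE/combine to clean it up.  */
    if (x & 1)
      h ();
  }

When that AND has been hoisted to another basic block and has more than one
use, the clean-up does not happen, which is the AND + TST + BR case for
QI/HI values described above.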
>
> This means we don't expand the superfluous AND here, and while it doesn't
> fix the problem that we lose the tbz in the cross-BB case, it also doesn't
> make things worse.
>
> With these tweaks we've now reduced the number of insns uniformly, as
> originally expected.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64.md (tbranch_3): Restrict to SHORT
> 	and bottom bit only.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/tbz_2.c: New test.

Agreed that reducing the scope of the new optimisation seems like a safe
compromise for GCC 13.  But could you add a testcase that shows the effect
of both changes (reducing the mode selection and the bit selection)?  The
test above passes even without the patch.

It would be good to have a PR tracking this too.

Personally, I think we should try to get to the stage where gimple does the
CSE optimisations we need, and where the tbranch optab can generate a tbz
directly (rather than splitting it apart and hoping that combine will put
it back together later).

Thanks,
Richard

> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 2c1367977a68fc8e4289118e07bb61398856791e..aa09e93d85e9628e8944e03498697eb9597ef867 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -949,8 +949,8 @@ (define_insn "*cb1"
> 
>  (define_expand "tbranch_3"
>    [(set (pc) (if_then_else
> -	      (EQL (match_operand:ALLI 0 "register_operand")
> -		   (match_operand 1 "aarch64_simd_shift_imm_"))
> +	      (EQL (match_operand:SHORT 0 "register_operand")
> +		   (match_operand 1 "const0_operand"))
> 	      (label_ref (match_operand 2 ""))
> 	      (pc)))]
>    ""
> diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_2.c b/gcc/testsuite/gcc.target/aarch64/tbz_2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ec128b58f35276a7c5452685a65c73f95f2d5f9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/tbz_2.c
> @@ -0,0 +1,130 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=c99 -fno-unwind-tables -fno-asynchronous-unwind-tables" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#include <stdbool.h>
> +
> +void h(void);
> +
> +/*
> +** g1:
> +**	cbnz	w0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g1(int x)
> +{
> +  if (__builtin_expect (x, 0))
> +    h ();
> +}
> +
> +/*
> +** g2:
> +**	tbnz	x0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g2(int x)
> +{
> +  if (__builtin_expect (x & 1, 0))
> +    h ();
> +}
> +
> +/*
> +** g3:
> +**	tbnz	x0, 3, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g3(int x)
> +{
> +  if (__builtin_expect (x & 8, 0))
> +    h ();
> +}
> +
> +/*
> +** g4:
> +**	tbnz	w0, #31, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g4(int x)
> +{
> +  if (__builtin_expect (x & (1 << 31), 0))
> +    h ();
> +}
> +
> +/*
> +** g5:
> +**	tst	w0, 255
> +**	bne	.L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g5(char x)
> +{
> +  if (__builtin_expect (x, 0))
> +    h ();
> +}
> +
> +/*
> +** g6:
> +**	tbnz	w0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g6(char x)
> +{
> +  if (__builtin_expect (x & 1, 0))
> +    h ();
> +}
> +
> +/*
> +** g7:
> +**	tst	w0, 3
> +**	bne	.L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g7(char x)
> +{
> +  if (__builtin_expect (x & 3, 0))
> +    h ();
> +}
> +
> +/*
> +** g8:
> +**	tbnz	w0, 7, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g8(char x)
> +{
> +  if (__builtin_expect (x & (1 << 7), 0))
> +    h ();
> +}
> +
> +/*
> +** g9:
> +**	tbnz	w0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g9(bool x)
> +{
> +  if (__builtin_expect (x, 0))
> +    h ();
> +}
> +
> +/*
> +** g10:
> +**	tbnz	w0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g10(bool x)
> +{
> +  if (__builtin_expect (x & 1, 0))
> +    h ();
> +}
> +
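As an illustration of the kind of distinguishing test asked for above, one
possible shape is sketched below.  This is only a sketch: the function
names are invented, and no check-function-bodies or scan-assembler
directives are included, since the expected assembly would have to be taken
from actual before/after builds rather than guessed here.

  void use (int);

  /* Intended to exercise the mode restriction: the masked int value is
     computed once, lives across the branch and has a second use, i.e.
     the cross-BB, multi-use shape from the patch description.  */
  void
  f1 (int x, int n)
  {
    int bit = x & 1;
    for (int i = 0; i < n; i++)
      {
        if (__builtin_expect (bit, 0))
          use (i);
        use (bit);
      }
  }

  /* Intended to exercise the bit-position restriction: the mask selects
     a non-bottom bit and the result again has more than one use.  */
  void
  f2 (short x, int n)
  {
    short bit = x & 8;
    for (int i = 0; i < n; i++)
      {
        if (__builtin_expect (bit, 0))
          use (i);
        use (bit);
      }
  }

Whether these two functions really separate the two changes would need to
be confirmed against the generated code for the patched and unpatched
compilers.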