From: Richard Sandiford
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com
Subject: Re: [PATCH]AArch64: Fix codegen regressions around tbz.
Date: Fri, 27 Jan 2023 12:25:48 +0000
In-Reply-To: Tamar Christina's message of "Fri, 27 Jan 2023 10:39:25 +0000"

Tamar Christina writes:
> Hi All,
>
> We were analyzing code quality after recent changes and noticed that the
> tbz support somehow managed to increase the overall number of branches
> rather than decreasing them.
>
> While investigating this we found that the problem occurs when an explicit
> & of a constant already exists in gimple and the tbz is generated from the
> range information of the ANDed result: we then end up with a redundant
> (no-op) AND in the RTL expansion.
>
> This is not a problem as CSE will normally take care of it.  The issue is
> when the original AND was done in a location where PRE or FRE "lift" the
> AND to a different basic block.  This triggers a problem when the resulting
> value is not single use.  Instead of an AND and a tbz, we end up generating
> AND + TST + BR if the mode is HI or QI.
>
> This CSE across basic blocks was a problem before, but this change made it
> worse.  Our branch patterns rely on combine being able to fold the AND or
> zero_extend into the instructions.
>
> To work around this (since a proper fix is outside the scope of stage 4) we
> are limiting the new tbranch optab to HI and QI mode values only.  This
> isn't a problem because these are the two modes for which we don't have CBZ
> support, so they are the problematic cases to begin with.  Additionally,
> booleans are QI.
>
> The second thing we're doing is limiting the only legal bitpos to position
> 0, i.e. only the bottom bit.  This is so that we prevent the double ANDs as
> much as possible.
>
> Most other cases, i.e. where we had an explicit & in the source code, are
> still handled correctly by the anonymous (*tb1) pattern that was added
> along with tbranch support.
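For illustration only, a minimal sketch of the shape being described (the
function below is invented for this example and is not taken from the
original report):

  void h (void);

  void
  example (int x)
  {
    /* In gimple this is roughly:  _1 = x & 1;  if (_1 != 0) goto ...
       Since _1 is known to be in the range [0, 1], the branch can be
       expanded as a test of bit 0 of _1.  But _1 is itself the result
       of an AND, so the expansion contains a masking operation that is
       redundant and normally relies on CSE/combine to clean it up.  */
    if (x & 1)
      h ();
  }

When that AND has been hoisted to another basic block and has more than one
use, the clean-up does not happen, which is the AND + TST + BR case for
QI/HI values described above.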
>
> This means we don't expand the superfluous AND here, and while it doesn't
> fix the problem that we lose the tbz in the cross-BB case, it also doesn't
> make things worse.
>
> With these tweaks we've now reduced the number of insns uniformly, as
> originally expected.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64.md (tbranch_3): Restrict to SHORT
> 	and bottom bit only.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/tbz_2.c: New test.

Agreed that reducing the scope of the new optimisation seems like a safe
compromise for GCC 13.  But could you add a testcase that shows the effect
of both changes (reducing the mode selection and the bit selection)?  The
test above passes even without the patch.

It would be good to have a PR tracking this too.

Personally, I think we should try to get to the stage where gimple does the
CSE optimisations we need, and where the tbranch optab can generate a tbz
directly (rather than splitting it apart and hoping that combine will put
it back together later).

Thanks,
Richard

> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 2c1367977a68fc8e4289118e07bb61398856791e..aa09e93d85e9628e8944e03498697eb9597ef867 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -949,8 +949,8 @@ (define_insn "*cb1"
> 
>  (define_expand "tbranch_3"
>    [(set (pc) (if_then_else
> -	      (EQL (match_operand:ALLI 0 "register_operand")
> -		   (match_operand 1 "aarch64_simd_shift_imm_"))
> +	      (EQL (match_operand:SHORT 0 "register_operand")
> +		   (match_operand 1 "const0_operand"))
> 	      (label_ref (match_operand 2 ""))
> 	      (pc)))]
>    ""
> diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_2.c b/gcc/testsuite/gcc.target/aarch64/tbz_2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ec128b58f35276a7c5452685a65c73f95f2d5f9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/tbz_2.c
> @@ -0,0 +1,130 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=c99 -fno-unwind-tables -fno-asynchronous-unwind-tables" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#include <stdbool.h>
> +
> +void h(void);
> +
> +/*
> +** g1:
> +**	cbnz	w0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g1(int x)
> +{
> +  if (__builtin_expect (x, 0))
> +    h ();
> +}
> +
> +/*
> +** g2:
> +**	tbnz	x0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g2(int x)
> +{
> +  if (__builtin_expect (x & 1, 0))
> +    h ();
> +}
> +
> +/*
> +** g3:
> +**	tbnz	x0, 3, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g3(int x)
> +{
> +  if (__builtin_expect (x & 8, 0))
> +    h ();
> +}
> +
> +/*
> +** g4:
> +**	tbnz	w0, #31, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g4(int x)
> +{
> +  if (__builtin_expect (x & (1 << 31), 0))
> +    h ();
> +}
> +
> +/*
> +** g5:
> +**	tst	w0, 255
> +**	bne	.L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g5(char x)
> +{
> +  if (__builtin_expect (x, 0))
> +    h ();
> +}
> +
> +/*
> +** g6:
> +**	tbnz	w0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g6(char x)
> +{
> +  if (__builtin_expect (x & 1, 0))
> +    h ();
> +}
> +
> +/*
> +** g7:
> +**	tst	w0, 3
> +**	bne	.L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g7(char x)
> +{
> +  if (__builtin_expect (x & 3, 0))
> +    h ();
> +}
> +
> +/*
> +** g8:
> +**	tbnz	w0, 7, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g8(char x)
> +{
> +  if (__builtin_expect (x & (1 << 7), 0))
> +    h ();
> +}
> +
> +/*
> +** g9:
> +**	tbnz	w0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g9(bool x)
> +{
> +  if (__builtin_expect (x, 0))
> +    h ();
> +}
> +
> +/*
> +** g10:
> +**	tbnz	w0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g10(bool x)
> +{
> +  if (__builtin_expect (x & 1, 0))
> +    h ();
> +}
> +
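As an illustration of the kind of distinguishing test asked for above, one
possible shape is sketched below.  This is only a sketch: the function
names are invented, and no check-function-bodies or scan-assembler
directives are included, since the expected assembly would have to be taken
from actual before/after builds rather than guessed here.

  void use (int);

  /* Intended to exercise the mode restriction: the masked int value is
     computed once, lives across the branch and has a second use, i.e.
     the cross-BB, multi-use shape from the patch description.  */
  void
  f1 (int x, int n)
  {
    int bit = x & 1;
    for (int i = 0; i < n; i++)
      {
        if (__builtin_expect (bit, 0))
          use (i);
        use (bit);
      }
  }

  /* Intended to exercise the bit-position restriction: the mask selects
     a non-bottom bit and the result again has more than one use.  */
  void
  f2 (short x, int n)
  {
    short bit = x & 8;
    for (int i = 0; i < n; i++)
      {
        if (__builtin_expect (bit, 0))
          use (i);
        use (bit);
      }
  }

Whether these two functions really separate the two changes would need to
be confirmed against the generated code for the patched and unpatched
compilers.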