From: Richard Sandiford
To: Tamar Christina
Cc: "gcc-patches@gcc.gnu.org", Richard Earnshaw, nd, Marcus Shawcroft
Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
Date: Mon, 05 Dec 2022 14:06:37 +0000
In-Reply-To: (Tamar Christina's message of "Thu, 1 Dec 2022 16:44:07 +0000")

Tamar Christina writes:
> Hi,
>
> I hadn't received any reply so I had implemented various ways to do
> this (about 8 of them in fact).
>
> The conclusion is that no, we cannot emit one big RTL for the final
> instruction immediately.
> The reason that all comparisons in the AArch64 backend expand to
> separate CC compares, and separate testing of the operands, is for
> ifcvt.
>
> The separate CC compare is needed so ifcvt can produce csel, cset etc.
> from the compares.  Unlike, say, combine, ifcvt cannot do recog on a
> parallel with a clobber.  Should we emit the instruction directly then
> ifcvt will not be able to, say, make a csel, because we have no
> patterns which handle zero_extract and compare.  (Unlike combine,
> ifcvt cannot transform the extract into an AND.)
>
> While you could provide various patterns for this (and I did try) you
> end up with broken patterns because you can't add the clobber to the
> CC register.  If you do, ifcvt recog fails.
>
> i.e.
>
> int
> f1 (int x)
> {
>   if (x & 1)
>     return 1;
>   return x;
> }
>
> We lose csel here.
>
> Secondly, the reason the compare with an explicit CC mode is needed is
> so that ifcvt can transform the operation into a version that doesn't
> require the flags to be set.  But it only does so if it knows the
> explicit usage of the CC reg.
>
> For instance
>
> int
> foo (int a, int b)
> {
>   return ((a & (1 << 25)) ? 5 : 4);
> }
>
> doesn't require a comparison; the optimal form is:
>
> foo(int, int):
>         ubfx    x0, x0, 25, 1
>         add     w0, w0, 4
>         ret
>
> and no compare is actually needed.  If you represent the instruction
> using an ANDS instead of a zero_extract then you get close, but you
> end up with an ands followed by an add, which is a slower operation.
>
> These two reasons are the main reasons why all comparisons in AArch64
> expand the way they do, so tbranch shouldn't do anything differently
> here.

Thanks for the (useful) investigation.  Makes sense.

> Additionally the reason for the optab was to pass range information
> to the backend during expansion.

Yeah.  But I think the fundamental reason that AArch64 defines the
optab is still that it has native support for the associated operation
(which is a good thing, an honest reason).
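The foo example above can be checked in plain C: extracting bit 25 into the low bit and adding 4 yields exactly the value of the ternary, which is what the ubfx/add sequence computes.  The sketch below is an ordinary C model of that equivalence, not compiler code; `foo_ternary` and `foo_ubfx_add` are names invented here for illustration.

```c
#include <stdint.h>

/* The conditional form: 5 if bit 25 of a is set, else 4.  */
static int foo_ternary (int a)
{
  return (a & (1 << 25)) ? 5 : 4;
}

/* The branch-free form the backend can emit: extract bit 25 into
   the low bit (ubfx x0, x0, 25, 1), then add 4 (add w0, w0, 4).  */
static int foo_ubfx_add (int a)
{
  return (int) (((uint32_t) a >> 25) & 1u) + 4;
}
```

For every 32-bit input the two functions agree, which is why no compare (and no flag-setting ANDS) is needed for this pattern.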
The fact that we split it apart for if-conversion---in a different form
from normal comparisons---is an implementation detail.  So it still
seems like a proper optab, rather than a crutch to convey tree info.

> In this version however I have represented the expand using an ANDS
> instead.  This allows us not to regress on -O0 as the previous version
> did.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Note that this patch relies on
> https://patchwork.sourceware.org/project/gcc/patch/Y1+4qItMrQHbdqqD@arm.com/
> which has yet to be reviewed but which cleans up extensions so they
> can be used like this.
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64.md (*tb1): Rename to...
> 	(*tb1): ... this.
> 	(tbranch_4): New.
> 	(zero_extend2,
> 	zero_extend2,
> 	zero_extend2): Make dynamic calls with @.
> 	* config/aarch64/iterators.md (ZEROM, zerom): New.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/tbz_1.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 4c181a96e555c2a58c59fc991000b2a2fa9bd244..7ee1d01e050004e42cd2d0049f0200da71d918bb 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -946,12 +946,33 @@ (define_insn "*cb1"
>  		      (const_int 1)))]
>  )
>  
> -(define_insn "*tb1"
> +(define_expand "tbranch_4"
>    [(set (pc) (if_then_else
> -	      (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
> -				    (const_int 1)
> -				    (match_operand 1
> -				      "aarch64_simd_shift_imm_" "n"))
> +	       (EQL (match_operand:ALLI 0 "register_operand")
> +		    (match_operand 1 "aarch64_simd_shift_imm_"))
> +	       (label_ref (match_operand 2 ""))
> +	       (pc)))]
> +  ""
> +{
> +  rtx bitvalue = gen_reg_rtx (mode);
> +  rtx reg = gen_reg_rtx (mode);
> +  if (mode == mode)
> +    reg = operands[0];
> +  else
> +    emit_insn (gen_zero_extend2 (mode, mode, reg, operands[0]));

I think the last five lines should just be:

  rtx reg = gen_lowpart (mode, operands[0]);

using paradoxical subregs for the QI and HI cases.  Using subregs should
generate better code, since the temporary runs the risk of having the
same value live in two different pseudos at the same time (one pseudo
with the original mode, one pseudo with the extended mode).

OK with that change and without the changes to the zero_extend pattern
names.

Thanks,
Richard

> +  rtx val = GEN_INT (1UL << UINTVAL (operands[1]));
> +  emit_insn (gen_and3 (bitvalue, reg, val));
> +  operands[1] = const0_rtx;
> +  operands[0] = aarch64_gen_compare_reg (, bitvalue,
> +					 operands[1]);
> +})
> +
> +(define_insn "*tb1"
>    [(set (pc) (if_then_else
> +	      (EQL (zero_extract:GPI (match_operand:ALLI 0 "register_operand" "r")
> +				     (const_int 1)
> +				     (match_operand 1
> +				       "aarch64_simd_shift_imm_" "n"))
>  		   (const_int 0))
>  	      (label_ref (match_operand 2 "" ""))
>  	      (pc)))
> @@ -962,15 +983,15 @@ (define_insn "*tb1"
>  {
>    if (get_attr_far_branch (insn) == 1)
>      return aarch64_gen_far_branch (operands, 2, "Ltb",
> -				   "\\t%0, %1, ");
> +				   "\\t%0, %1, ");
>    else
>      {
>        operands[1] = GEN_INT (HOST_WIDE_INT_1U << UINTVAL (operands[1]));
> -      return "tst\t%0, %1\;\t%l2";
> +      return "tst\t%0, %1\;\t%l2";
>      }
>  }
>  else
> -  return "\t%0, %1, %l2";
> +  return "\t%0, %1, %l2";
> }
> [(set_attr "type" "branch")
>  (set (attr "length")
> @@ -1962,7 +1983,7 @@ (define_insn "extend2"
>    (set_attr "arch" "*,*,fp")]
>  )
>  
> -(define_insn "zero_extend2"
> +(define_insn "@zero_extend2"
>    [(set (match_operand:SD_HSDI 0 "register_operand" "=r,r,w,w,r,w")
> 	 (zero_extend:SD_HSDI
> 	   (match_operand:SI_ONLY 1 "nonimmediate_operand" "r,m,r,m,w,w")))]
> @@ -1978,7 +1999,7 @@ (define_insn "zero_extend2"
>    (set_attr "arch" "*,*,fp,fp,fp,fp")]
>  )
>  
> -(define_insn "zero_extend2"
> +(define_insn "@zero_extend2"
>    [(set (match_operand:SD_HSDI 0 "register_operand" "=r,r,w,w,r,w")
> 	 (zero_extend:SD_HSDI
> 	   (match_operand:HI_ONLY 1 "nonimmediate_operand" "r,m,r,m,w,w")))]
> @@ -1994,7 +2015,7 @@ (define_insn "zero_extend2"
>    (set_attr "arch" "*,*,fp16,fp,fp,fp16")]
>  )
>  
> -(define_insn "zero_extend2"
> +(define_insn "@zero_extend2"
>    [(set (match_operand:SD_HSDI 0 "register_operand" "=r,r,w,r,w")
> 	 (zero_extend:SD_HSDI
> 	   (match_operand:QI_ONLY 1 "nonimmediate_operand" "r,m,m,w,w")))]
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index df72c079f218db9727a96924cab496e91ce6df59..816e44753fb9f6245f3abdb6d3e689a36986ac99 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -1107,6 +1107,8 @@ (define_mode_attr s [(HF "h") (SF "s") (DF "d") (SI "s") (DI "d")])
>  ;; Give the length suffix letter for a sign- or zero-extension.
>  (define_mode_attr size [(QI "b") (HI "h") (SI "w")])
>  (define_mode_attr sizel [(QI "b") (HI "h") (SI "")])
> +(define_mode_attr ZEROM [(QI "SI") (HI "SI") (SI "SI") (DI "DI")])
> +(define_mode_attr zerom [(QI "si") (HI "si") (SI "si") (DI "di")])
>  
>  ;; Give the number of bits in the mode
>  (define_mode_attr sizen [(QI "8") (HI "16") (SI "32") (DI "64")])
> diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_1.c b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..39deb58e278e2180ab270b5a999cac62cb17c682
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
> @@ -0,0 +1,95 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=c99 -fno-unwind-tables -fno-asynchronous-unwind-tables" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#include
> +
> +void h(void);
> +
> +/*
> +** g1:
> +**	tbnz	w[0-9]+, #?0, .L([0-9]+)
> +**	ret
> +**	...
> +*/
> +void g1(bool x)
> +{
> +  if (__builtin_expect (x, 0))
> +    h ();
> +}
> +
> +/*
> +** g2:
> +**	tbz	w[0-9]+, #?0, .L([0-9]+)
> +**	b	h
> +**	...
> +*/
> +void g2(bool x)
> +{
> +  if (__builtin_expect (x, 1))
> +    h ();
> +}
> +
> +/*
> +** g3_ge:
> +**	tbnz	w[0-9]+, #?31, .L[0-9]+
> +**	b	h
> +**	...
> +*/
> +void g3_ge(int x)
> +{
> +  if (__builtin_expect (x >= 0, 1))
> +    h ();
> +}
> +
> +/*
> +** g3_gt:
> +**	cmp	w[0-9]+, 0
> +**	ble	.L[0-9]+
> +**	b	h
> +**	...
> +*/
> +void g3_gt(int x)
> +{
> +  if (__builtin_expect (x > 0, 1))
> +    h ();
> +}
> +
> +/*
> +** g3_lt:
> +**	tbz	w[0-9]+, #?31, .L[0-9]+
> +**	b	h
> +**	...
> +*/
> +void g3_lt(int x)
> +{
> +  if (__builtin_expect (x < 0, 1))
> +    h ();
> +}
> +
> +/*
> +** g3_le:
> +**	cmp	w[0-9]+, 0
> +**	bgt	.L[0-9]+
> +**	b	h
> +**	...
> +*/
> +void g3_le(int x)
> +{
> +  if (__builtin_expect (x <= 0, 1))
> +    h ();
> +}
> +
> +/*
> +** g5:
> +**	mov	w[0-9]+, 65279
> +**	tst	w[0-9]+, w[0-9]+
> +**	beq	.L[0-9]+
> +**	b	h
> +**	...
> +*/
> +void g5(int x)
> +{
> +  if (__builtin_expect (x & 0xfeff, 1))
> +    h ();
> +}
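The g3_ge/g3_lt tests expect a single tbnz/tbz on bit 31 while g3_gt/g3_le expect a cmp: for a 32-bit two's-complement value, x < 0 holds exactly when the sign bit is set, so "x >= 0" and "x < 0" each reduce to testing one bit, whereas "x > 0" and "x <= 0" also depend on whether x is zero.  A plain C model of that reasoning (ordinary C, not compiler code; the function names are invented here):

```c
#include <stdint.h>

/* Extract bit 31 (the sign bit) of a 32-bit value -- the bit that
   tbz/tbnz #31 tests directly.  */
static int sign_bit_set (int32_t x)
{
  return (int) (((uint32_t) x >> 31) & 1u);
}

/* "x >= 0" decided purely from the sign bit, as g3_ge's expected
   tbnz-based code does.  */
static int ge_zero_via_bit (int32_t x)
{
  return !sign_bit_set (x);
}
```

No such single-bit shortcut exists for > or <=, which is why those tests still expect a full compare.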