From: Andrew Pinski
Date: Wed, 26 Oct 2022 20:15:23 -0700
Subject: Re: [PATCH 2/3] AArch64 Promote function arguments using a paradoxical subreg when beneficial.
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com

On Fri, May 13, 2022 at 10:14 AM Tamar Christina via Gcc-patches wrote:
>
> Hi All,
>
> The PROMOTE_MODE macro always promotes 8- and 16-bit parameters to 32 bits.
> This promotion is not required by the ABI, which states:
>
> ```
> C.9 If the argument is an Integral or Pointer Type, the size of the argument is
> less than or equal to 8 bytes and the NGRN is less than 8, the argument is
> copied to the least significant bits in x[NGRN]. The NGRN is incremented by one.
> The argument has now been allocated.
>
> C.16 If the size of the argument is less than 8 bytes then the size of the
> argument is set to 8 bytes. The effect is as if the argument was copied to the
> least significant bits of a 64-bit register and the remaining bits filled with
> unspecified values.
> ```
>
> That is, the bits in the registers are unspecified and callees cannot assume
> any particular state.
>
> This means that we can avoid the promotion and still get correct code, as the
> language-level promotion rules require values to be extended when the bits
> are significant.
>
> So if we are e.g. OR-ing two 8-bit values, no extend is needed as the top bits
> are irrelevant. If we are doing e.g. an addition, then the top bits *might* be
> relevant depending on the result type. But the middle end will always
> contain the appropriate extend in those cases.
>
> The mid-end also has optimizations built around this assumption, and the
> AArch64 port actively undoes them.
>
> So for instance
>
> uint16_t fd (uint8_t xr){
>     return xr + 1;
> }
>
> uint8_t fd2 (uint8_t xr){
>     return xr + 1;
> }
>
> should produce
>
> fd:                             // @fd
>         and     w8, w0, #0xff
>         add     w0, w8, #1
>         ret
> fd2:                            // @fd2
>         add     w0, w0, #1
>         ret
>
> like clang does, instead of
>
> fd:
>         and     w0, w0, 255
>         add     w0, w0, 1
>         ret
> fd2:
>         and     w0, w0, 255
>         add     w0, w0, 1
>         ret
>
> like we do now. Removing this forced extension maintains correctness and
> fixes a number of codegen defects. It also brings us in line with clang.
>
> Note that C, C++, Fortran etc. all correctly specify what should happen
> w.r.t. extends for e.g. array indexing, pointer arithmetic etc., so we never
> get incorrect code.
>
> There is however a second reason for doing this promotion: RTL efficiency.
> The promotion stops us from having to promote the values to SI to be able
> to use them in instructions and then truncate again afterwards.
>
> To get both the efficiency and the simpler RTL we can instead promote to a
> paradoxical subreg. This patch implements the hook for AArch64 and adds an
> explicit opt-out for values that feed into comparisons. This is done because:
>
> 1. Our comparison patterns already allow us to absorb the zero extend.
> 2. The extension allows us to use cbz/cbnz/tbz etc. In some cases such as
>
> int foo (char a, char b)
> {
>   if (a)
>     if (b)
>       bar1 ();
>     else
>       ...
>   else
>     if (b)
>       bar2 ();
>     else
>       ...
> }
>
> by zero extending the value we can avoid having to repeatedly test the value
> before each branch. Allowing the zero extend also allows our existing `ands`
> patterns to work as expected.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> This and the last patch have to be committed together, but for ease of
> review I have split them up here. However, 209 missed-optimization xfails
> are fixed.
>
> No performance difference on SPECCPU 2017, and no failures.
>
> Ok for master?

Did this patch ever get approved? It is a nice improvement that would be
good to get into GCC 13 before the close of stage 1.

Thanks,
Andrew

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64.cc (aarch64_promote_function_args_subreg_p):
>         New.
>         (TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P): New.
>         * config/aarch64/aarch64.h (PROMOTE_MODE): Expand doc.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/apc-subreg.c: New test.
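To make the mechanism concrete, here is a sketch of the difference between the
two promotion strategies (my own illustration, not part of the patch; the RTL
forms in the comment are schematic, not dumps from a patched compiler):

```c
#include <stdint.h>

/* With plain PROMOTE_MODE, the incoming QImode argument is widened with a
   real extension at expand time, roughly:
       (set (reg:SI tmp) (zero_extend:SI (reg:QI w0)))
   which materialises as an 'and w0, w0, 255' at function entry.  With the
   new hook returning true, the widening is instead a paradoxical subreg:
       (set (reg:SI tmp) (subreg:SI (reg:QI w0) 0))
   leaving the upper 24 bits undefined -- which is exactly what the AAPCS64
   rules C.9/C.16 quoted above already say about them.  */
uint8_t negate8 (uint8_t x)
{
  /* Only the low 8 bits of the result are observed, so no mask is needed
     and this can compile to a single 'mvn w0, w0' (compare f0 in the new
     test below).  */
  return (uint8_t) ~x;
}
```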
>
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index efa46ac0b8799b5849b609d591186e26e5cb37ff..cc74a816fcc6458aa065246a30a4d2184692ad74 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -34,7 +34,8 @@
>
>  #define REGISTER_TARGET_PRAGMAS() aarch64_register_pragmas ()
>
> -/* Target machine storage layout.  */
> +/* Target machine storage layout.  See also
> +   TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P.  */
>
>  #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)	\
>    if (GET_MODE_CLASS (MODE) == MODE_INT	\
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 2f559600cff55af9d468e8d0810545583cc986f5..252d6c2af72afc1dfee1a86644a5753784b41f59 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -3736,6 +3736,57 @@ aarch64_array_mode_supported_p (machine_mode mode,
>    return false;
>  }
>
> +/* Implement target hook TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P to complement
> +   PROMOTE_MODE.  If any argument promotion was done, do it as a subreg.  */
> +static bool
> +aarch64_promote_function_args_subreg_p (machine_mode mode,
> +					machine_mode promoted_mode,
> +					int /* unsignedp */, tree parm)
> +{
> +  bool candidate_p = GET_MODE_CLASS (mode) == MODE_INT
> +		     && GET_MODE_CLASS (promoted_mode) == MODE_INT
> +		     && known_lt (GET_MODE_SIZE (mode), 4)
> +		     && promoted_mode == SImode;
> +
> +  if (!candidate_p)
> +    return false;
> +
> +  if (!parm || !is_gimple_reg (parm))
> +    return true;
> +
> +  tree var = parm;
> +  if (!VAR_P (var))
> +    {
> +      if (TREE_CODE (parm) == SSA_NAME
> +	  && !(var = SSA_NAME_VAR (var)))
> +	return true;
> +      else if (TREE_CODE (parm) != PARM_DECL)
> +	return true;
> +    }
> +
> +  /* If the variable is used inside a comparison which sets CC then we
> +     should still promote using an extend.  Doing so makes it easier to use
> +     cbz/cbnz and also avoids repeatedly testing the value in circumstances
> +     such as nested ifs that test the same value with calls in between.  */
> +  tree ssa_var = ssa_default_def (cfun, var);
> +  if (!ssa_var)
> +    return true;
> +
> +  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (ssa_var));
> +  const ssa_use_operand_t *ptr;
> +
> +  for (ptr = head->next; ptr != head; ptr = ptr->next)
> +    if (USE_STMT (ptr) && is_gimple_assign (USE_STMT (ptr)))
> +      {
> +	tree_code code = gimple_assign_rhs_code (USE_STMT (ptr));
> +	if (TREE_CODE_CLASS (code) == tcc_comparison)
> +	  return false;
> +      }
> +
> +  return true;
> +}
> +
>  /* MODE is some form of SVE vector mode.  For data modes, return the number
>     of vector register bits that each element of MODE occupies, such as 64
>     for both VNx2DImode and VNx2SImode (where each 32-bit value is stored
> @@ -27490,6 +27541,10 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_ARRAY_MODE_SUPPORTED_P
>  #define TARGET_ARRAY_MODE_SUPPORTED_P aarch64_array_mode_supported_p
>
> +#undef TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P
> +#define TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P \
> +  aarch64_promote_function_args_subreg_p
> +
>  #undef TARGET_VECTORIZE_CREATE_COSTS
>  #define TARGET_VECTORIZE_CREATE_COSTS aarch64_vectorize_create_costs
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/apc-subreg.c b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..2d7563a11ce11fa677f7ad4bf2a090e6a136e4d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
> @@ -0,0 +1,103 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#include <stdint.h>
> +
> +/*
> +** f0:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +uint8_t f0 (uint8_t xr){
> +    return (uint8_t) (0xff - xr);
> +}
> +
> +/*
> +** f1:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +int8_t f1 (int8_t xr){
> +    return (int8_t) (0xff - xr);
> +}
> +
> +/*
> +** f2:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +uint16_t f2 (uint16_t xr){
> +    return (uint16_t) (0xffFF - xr);
> +}
> +
> +/*
> +** f3:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +uint32_t f3 (uint32_t xr){
> +    return (uint32_t) (0xffFFffff - xr);
> +}
> +
> +/*
> +** f4:
> +**	mvn	x0, x0
> +**	ret
> +*/
> +uint64_t f4 (uint64_t xr){
> +    return (uint64_t) (0xffFFffffffffffff - xr);
> +}
> +
> +/*
> +** f5:
> +**	mvn	w0, w0
> +**	sub	w0, w0, w1
> +**	ret
> +*/
> +uint8_t f5 (uint8_t xr, uint8_t xc){
> +    return (uint8_t) (0xff - xr - xc);
> +}
> +
> +/*
> +** f6:
> +**	mvn	w0, w0
> +**	and	w0, w0, 255
> +**	and	w1, w1, 255
> +**	mul	w0, w0, w1
> +**	ret
> +*/
> +uint16_t f6 (uint8_t xr, uint8_t xc){
> +    return ((uint8_t) (0xff - xr)) * xc;
> +}
> +
> +/*
> +** f7:
> +**	and	w0, w0, 255
> +**	and	w1, w1, 255
> +**	mul	w0, w0, w1
> +**	ret
> +*/
> +uint16_t f7 (uint8_t xr, uint8_t xc){
> +    return xr * xc;
> +}
> +
> +/*
> +** f8:
> +**	mul	w0, w0, w1
> +**	and	w0, w0, 255
> +**	ret
> +*/
> +uint16_t f8 (uint8_t xr, uint8_t xc){
> +    return (uint8_t)(xr * xc);
> +}
> +
> +/*
> +** f9:
> +**	and	w0, w0, 255
> +**	add	w0, w0, w1
> +**	ret
> +*/
> +uint16_t f9 (uint8_t xr, uint16_t xc){
> +    return xr + xc;
> +}
>
> --
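As a closing illustration of the comparison opt-out in the hook above (again a
sketch of my own, assuming the behaviour the cover letter describes; the
expected code sequences are not verified compiler output):

```c
#include <stdint.h>

extern void bar1 (void);
extern void bar2 (void);

/* Because 'b' feeds comparisons, aarch64_promote_function_args_subreg_p
   returns false for it and the argument is still promoted with a real zero
   extend.  That costs one 'and w0, w0, 255' at entry, but both tests can
   then be a single cbnz on the already-extended value; with a bare
   paradoxical subreg, the low 8 bits would have to be re-tested (e.g.
   'tst wN, 255') before each branch.  */
void f (uint8_t b)
{
  if (b)
    bar1 ();
  if (b)
    bar2 ();
}
```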