From: Andrew Pinski
Date: Wed, 26 Oct 2022 20:15:23 -0700
Subject: Re: [PATCH 2/3] AArch64 Promote function arguments using a paradoxical subreg when beneficial.
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com

On Fri, May 13, 2022 at 10:14 AM Tamar Christina via Gcc-patches wrote:
>
> Hi All,
>
> The PROMOTE_MODE macro always promotes 8- and 16-bit parameters to 32 bits.
> This promotion is not required by the ABI, which states:
>
> ```
> C.9 If the argument is an Integral or Pointer Type, the size of the argument is
> less than or equal to 8 bytes and the NGRN is less than 8, the argument is
> copied to the least significant bits in x[NGRN]. The NGRN is incremented by one.
> The argument has now been allocated.
>
> C.16 If the size of the argument is less than 8 bytes then the size of the
> argument is set to 8 bytes. The effect is as if the argument was copied to the
> least significant bits of a 64-bit register and the remaining bits filled with
> unspecified values.
> ```
>
> That is, the bits in the registers are unspecified and callees cannot assume
> any particular state.
>
> This means that we can avoid the promotion and still get correct code, as the
> language-level promotion rules require values to be extended when the bits
> are significant.
>
> So if we are e.g. OR-ing two 8-bit values, no extend is needed as the top bits
> are irrelevant. If we are doing e.g. an addition, then the top bits *might* be
> relevant depending on the result type. But the middle end will always
> contain the appropriate extend in those cases.
>
> The mid-end also has optimizations built around this assumption, and the
> AArch64 port actively undoes them.
>
> So for instance
>
> uint16_t fd (uint8_t xr){
>     return xr + 1;
> }
>
> uint8_t fd2 (uint8_t xr){
>     return xr + 1;
> }
>
> should produce
>
> fd:                             // @fd
>         and     w8, w0, #0xff
>         add     w0, w8, #1
>         ret
> fd2:                            // @fd2
>         add     w0, w0, #1
>         ret
>
> like clang does, instead of
>
> fd:
>         and     w0, w0, 255
>         add     w0, w0, 1
>         ret
> fd2:
>         and     w0, w0, 255
>         add     w0, w0, 1
>         ret
>
> like we do now. Removing this forced extension maintains correctness and
> fixes a number of codegen defects. It also brings us in line with clang.
>
> Note that C, C++, Fortran etc. all correctly specify what should happen
> w.r.t. extends for e.g. array indexing, pointer arithmetic etc., so we never
> get incorrect code.
>
> There is however a second reason for doing this promotion: RTL efficiency.
> The promotion stops us from having to promote the values to SI to be able
> to use them in instructions and then truncate again afterwards.
>
> To get both the efficiency and the simpler RTL we can instead promote to a
> paradoxical subreg. This patch implements the hook for AArch64 and adds an
> explicit opt-out for values that feed into comparisons. This is done because:
>
> 1. Our comparison patterns already allow us to absorb the zero extend.
> 2. The extension allows us to use cbz/cbnz/tbz etc. In some cases such as
>
> int foo (char a, char b)
> {
>   if (a)
>     if (b)
>       bar1 ();
>     else
>       ...
>   else
>     if (b)
>       bar2 ();
>     else
>       ...
> }
>
> by zero extending the value we can avoid having to repeatedly test the value
> before each branch. Allowing the zero extend also allows our existing `ands`
> patterns to work as expected.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> This and the last patch have to be committed together, but for ease of
> review I have split them up here. However, 209 missed-optimization xfails
> are fixed.
>
> No performance difference on SPECCPU 2017, and no failures.
>
> Ok for master?

Did this patch ever get approved? It is a nice improvement that would be
good to get into GCC 13 before the close of stage 1.

Thanks,
Andrew

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64.cc (aarch64_promote_function_args_subreg_p):
>         New.
>         (TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P): New.
>         * config/aarch64/aarch64.h (PROMOTE_MODE): Expand doc.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/apc-subreg.c: New test.
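To make the mechanism concrete, here is a sketch of the difference between the
two promotion strategies (my own illustration, not part of the patch; the RTL
forms in the comment are schematic, not dumps from a patched compiler):

```c
#include <stdint.h>

/* With plain PROMOTE_MODE, the incoming QImode argument is widened with a
   real extension at expand time, roughly:
       (set (reg:SI tmp) (zero_extend:SI (reg:QI w0)))
   which materialises as an 'and w0, w0, 255' at function entry.  With the
   new hook returning true, the widening is instead a paradoxical subreg:
       (set (reg:SI tmp) (subreg:SI (reg:QI w0) 0))
   leaving the upper 24 bits undefined -- which is exactly what the AAPCS64
   rules C.9/C.16 quoted above already say about them.  */
uint8_t negate8 (uint8_t x)
{
  /* Only the low 8 bits of the result are observed, so no mask is needed
     and this can compile to a single 'mvn w0, w0' (compare f0 in the new
     test below).  */
  return (uint8_t) ~x;
}
```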
>
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index efa46ac0b8799b5849b609d591186e26e5cb37ff..cc74a816fcc6458aa065246a30a4d2184692ad74 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -34,7 +34,8 @@
>
>  #define REGISTER_TARGET_PRAGMAS() aarch64_register_pragmas ()
>
> -/* Target machine storage layout.  */
> +/* Target machine storage layout.  See also
> +   TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P.  */
>
>  #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)	\
>    if (GET_MODE_CLASS (MODE) == MODE_INT	\
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 2f559600cff55af9d468e8d0810545583cc986f5..252d6c2af72afc1dfee1a86644a5753784b41f59 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -3736,6 +3736,57 @@ aarch64_array_mode_supported_p (machine_mode mode,
>    return false;
>  }
>
> +/* Implement target hook TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P to complement
> +   PROMOTE_MODE.  If any argument promotion was done, do it as a subreg.  */
> +static bool
> +aarch64_promote_function_args_subreg_p (machine_mode mode,
> +					machine_mode promoted_mode,
> +					int /* unsignedp */, tree parm)
> +{
> +  bool candidate_p = GET_MODE_CLASS (mode) == MODE_INT
> +		     && GET_MODE_CLASS (promoted_mode) == MODE_INT
> +		     && known_lt (GET_MODE_SIZE (mode), 4)
> +		     && promoted_mode == SImode;
> +
> +  if (!candidate_p)
> +    return false;
> +
> +  if (!parm || !is_gimple_reg (parm))
> +    return true;
> +
> +  tree var = parm;
> +  if (!VAR_P (var))
> +    {
> +      if (TREE_CODE (parm) == SSA_NAME
> +	  && !(var = SSA_NAME_VAR (var)))
> +	return true;
> +      else if (TREE_CODE (parm) != PARM_DECL)
> +	return true;
> +    }
> +
> +  /* If the variable is used inside a comparison which sets CC then we
> +     should still promote using an extend.  Doing so makes it easier to use
> +     cbz/cbnz and also avoids repeatedly testing the value in circumstances
> +     such as nested ifs that test the same value with calls in between.  */
> +  tree ssa_var = ssa_default_def (cfun, var);
> +  if (!ssa_var)
> +    return true;
> +
> +  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (ssa_var));
> +  const ssa_use_operand_t *ptr;
> +
> +  for (ptr = head->next; ptr != head; ptr = ptr->next)
> +    if (USE_STMT (ptr) && is_gimple_assign (USE_STMT (ptr)))
> +      {
> +	tree_code code = gimple_assign_rhs_code (USE_STMT (ptr));
> +	if (TREE_CODE_CLASS (code) == tcc_comparison)
> +	  return false;
> +      }
> +
> +  return true;
> +}
> +
>  /* MODE is some form of SVE vector mode.  For data modes, return the number
>     of vector register bits that each element of MODE occupies, such as 64
>     for both VNx2DImode and VNx2SImode (where each 32-bit value is stored
> @@ -27490,6 +27541,10 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_ARRAY_MODE_SUPPORTED_P
>  #define TARGET_ARRAY_MODE_SUPPORTED_P aarch64_array_mode_supported_p
>
> +#undef TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P
> +#define TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P \
> +  aarch64_promote_function_args_subreg_p
> +
>  #undef TARGET_VECTORIZE_CREATE_COSTS
>  #define TARGET_VECTORIZE_CREATE_COSTS aarch64_vectorize_create_costs
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/apc-subreg.c b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..2d7563a11ce11fa677f7ad4bf2a090e6a136e4d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
> @@ -0,0 +1,103 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#include <stdint.h>
> +
> +/*
> +** f0:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +uint8_t f0 (uint8_t xr){
> +    return (uint8_t) (0xff - xr);
> +}
> +
> +/*
> +** f1:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +int8_t f1 (int8_t xr){
> +    return (int8_t) (0xff - xr);
> +}
> +
> +/*
> +** f2:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +uint16_t f2 (uint16_t xr){
> +    return (uint16_t) (0xffFF - xr);
> +}
> +
> +/*
> +** f3:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +uint32_t f3 (uint32_t xr){
> +    return (uint32_t) (0xffFFffff - xr);
> +}
> +
> +/*
> +** f4:
> +**	mvn	x0, x0
> +**	ret
> +*/
> +uint64_t f4 (uint64_t xr){
> +    return (uint64_t) (0xffFFffffffffffff - xr);
> +}
> +
> +/*
> +** f5:
> +**	mvn	w0, w0
> +**	sub	w0, w0, w1
> +**	ret
> +*/
> +uint8_t f5 (uint8_t xr, uint8_t xc){
> +    return (uint8_t) (0xff - xr - xc);
> +}
> +
> +/*
> +** f6:
> +**	mvn	w0, w0
> +**	and	w0, w0, 255
> +**	and	w1, w1, 255
> +**	mul	w0, w0, w1
> +**	ret
> +*/
> +uint16_t f6 (uint8_t xr, uint8_t xc){
> +    return ((uint8_t) (0xff - xr)) * xc;
> +}
> +
> +/*
> +** f7:
> +**	and	w0, w0, 255
> +**	and	w1, w1, 255
> +**	mul	w0, w0, w1
> +**	ret
> +*/
> +uint16_t f7 (uint8_t xr, uint8_t xc){
> +    return xr * xc;
> +}
> +
> +/*
> +** f8:
> +**	mul	w0, w0, w1
> +**	and	w0, w0, 255
> +**	ret
> +*/
> +uint16_t f8 (uint8_t xr, uint8_t xc){
> +    return (uint8_t)(xr * xc);
> +}
> +
> +/*
> +** f9:
> +**	and	w0, w0, 255
> +**	add	w0, w0, w1
> +**	ret
> +*/
> +uint16_t f9 (uint8_t xr, uint16_t xc){
> +    return xr + xc;
> +}
>
> --
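As a closing illustration of the comparison opt-out in the hook above (again a
sketch of my own, assuming the behaviour the cover letter describes; the
expected code sequences are not verified compiler output):

```c
#include <stdint.h>

extern void bar1 (void);
extern void bar2 (void);

/* Because 'b' feeds comparisons, aarch64_promote_function_args_subreg_p
   returns false for it and the argument is still promoted with a real zero
   extend.  That costs one 'and w0, w0, 255' at entry, but both tests can
   then be a single cbnz on the already-extended value; with a bare
   paradoxical subreg, the low 8 bits would have to be re-tested (e.g.
   'tst wN, 255') before each branch.  */
void f (uint8_t b)
{
  if (b)
    bar1 ();
  if (b)
    bar2 ();
}
```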