From: Philipp Tomsich
Date: Wed, 27 Mar 2024 11:55:32 +0100
Subject: Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split
To: Jeff Law
Cc: gcc-patches@gcc.gnu.org, Vineet Gupta, Kito Cheng, Jeff Law, Manolis Tsamis, Palmer Dabbelt, Christoph Muellner
References: <20221109231006.3240799-1-philipp.tomsich@vrull.eu> <12bcc671-c96b-aa78-9a2d-9a832b389148@gmail.com>

Jeff,

just a heads-up that trunk (i.e., the soon-to-be GCC 14) still
generates the suboptimal sequence:
  https://godbolt.org/z/K9YYEPsvY

Thanks,
Philipp.

On Mon, 21 Nov 2022 at 18:00, Philipp Tomsich wrote:
>
> On Sun, 20 Nov 2022 at 17:38, Jeff Law wrote:
> >
> >
> > On 11/9/22 16:10, Philipp Tomsich wrote:
> > > The current method of treating shifts of extended values on RISC-V
> > > frequently causes sequences of 3 shifts, despite the presence of the
> > > 'zero_extendsidi2_shifted' pattern.
> > >
> > > Consider:
> > >     unsigned long f(unsigned int a, unsigned long b)
> > >     {
> > >             a = a << 1;
> > >             unsigned long c = (unsigned long) a;
> > >             c = b + (c<<4);
> > >             return c;
> > >     }
> > > which will present at combine-time as:
> > >     Trying 7, 8 -> 9:
> > >         7: r78:SI=r81:DI#0<<0x1
> > >           REG_DEAD r81:DI
> > >         8: r79:DI=zero_extend(r78:SI)
> > >           REG_DEAD r78:SI
> > >         9: r72:DI=r79:DI<<0x4
> > >           REG_DEAD r79:DI
> > >     Failed to match this instruction:
> > >     (set (reg:DI 72 [ _1 ])
> > >         (and:DI (ashift:DI (reg:DI 81)
> > >                 (const_int 5 [0x5]))
> > >             (const_int 68719476704 [0xfffffffe0])))
> > > and produce the following (optimized) assembly:
> > >     f:
> > >         slliw   a5,a0,1
> > >         slli    a5,a5,32
> > >         srli    a5,a5,28
> > >         add     a0,a5,a1
> > >         ret
> > >
> > > The current way of handling this (in 'zero_extendsidi2_shifted')
> > > doesn't apply for two reasons:
> > > - this is seen before reload, and
> > > - (more importantly) the constant mask is not 0xfffffffful.
> > >
> > > To address this, we introduce a generalized version of shifting
> > > zero-extended values that supports any mask of consecutive ones as
> > > long as the number of trailing zeros is the inner shift-amount.
> > >
> > > With this new split, we generate the following assembly for the
> > > aforementioned function:
> > >     f:
> > >         slli    a0,a0,33
> > >         srli    a0,a0,28
> > >         add     a0,a0,a1
> > >         ret
> > >
> > > Unfortunately, all of this causes some fallout (especially in how it
> > > interacts with Zb* extensions and zero_extract expressions formed
> > > during combine): this is addressed through additional instruction
> > > splitting and handling of zero_extract.
> > >
> > > gcc/ChangeLog:
> > >
> > >       * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed
> > >         as an and:DI.
> > >       (*andi_add.uw): New pattern.
> > >       (*slli_slli_uw): New pattern.
> > >       (*shift_then_shNadd.uw): New pattern.
> > >       (*slliuw): Rename to riscv_slli_uw.
> > >       (riscv_slli_uw): Renamed from *slliuw.
> > >       (*zeroextract2_highbits): New pattern.
> > >       (*zero_extract): New pattern, which will be split to
> > >         shift-left + shift-right.
> > >       * config/riscv/predicates.md (dimode_shift_operand):
> > >       * config/riscv/riscv.md (*zero_extract_lowbits):
> > >       (zero_extendsidi2_shifted): Rename.
> > >       (*zero_extendsidi2_shifted): Generalize.
> > >       (*shift_truthvalue): New pattern.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >       * gcc.target/riscv/shift-shift-6.c: New test.
> > >       * gcc.target/riscv/shift-shift-7.c: New test.
> > >       * gcc.target/riscv/shift-shift-8.c: New test.
> > >       * gcc.target/riscv/shift-shift-9.c: New test.
> > >       * gcc.target/riscv/snez.c: New test.
> > >
> > > Commit notes:
> > > - Depends on a predicate posted in "RISC-V: Optimize branches testing
> > >   a bit-range or a shifted immediate".  Depending on the order of
> > >   applying these, I'll take care to pull that part out of the other
> > >   patch if needed.
> > >
> > > Version-changes: 2
> > > - refactor
> > > - optimise for additional corner cases and deal with fallout
> > >
> > > Version-changes: 3
> > > - removed the [WIP] from the commit message (no other changes)
> > >
> > > Signed-off-by: Philipp Tomsich
> > > ---
> > >
> > > (no changes since v1)
> > >
> > >  gcc/config/riscv/bitmanip.md                  | 142 ++++++++++++++----
> > >  gcc/config/riscv/predicates.md                |   5 +
> > >  gcc/config/riscv/riscv.md                     |  75 +++++++--
> > >  .../gcc.target/riscv/shift-shift-6.c          |  14 ++
> > >  .../gcc.target/riscv/shift-shift-7.c          |  16 ++
> > >  .../gcc.target/riscv/shift-shift-8.c          |  20 +++
> > >  .../gcc.target/riscv/shift-shift-9.c          |  15 ++
> > >  gcc/testsuite/gcc.target/riscv/snez.c         |  14 ++
> > >  8 files changed, 261 insertions(+), 40 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-6.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-7.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-8.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-9.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/snez.c
> > >
> > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > > index 78fdf02c2ec..06126ac4819 100644
> > > --- a/gcc/config/riscv/bitmanip.md
> > > +++ b/gcc/config/riscv/bitmanip.md
> > > @@ -29,7 +29,20 @@
> > >    [(set_attr "type" "bitmanip,load")
> > >     (set_attr "mode" "DI")])
> > >
> > > -(define_insn "riscv_shNadd"
> > > +;; We may end up forming a slli.uw with an immediate of 0 (while
> > > +;; splitting through "*slli_slli_uw", below).
> > > +;; Match this back to a zext.w
> > > +(define_insn "*zext.w"
> > > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > > +       (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > +                          (const_int 0))
> > > +               (const_int 4294967295)))]
> > > +  "TARGET_64BIT && TARGET_ZBA"
> > > +  "zext.w\t%0,%1"
> > > +  [(set_attr "type" "bitmanip")
> > > +   (set_attr "mode" "DI")])
> >
> > Would it be better to detect that we're going to create a shift count of
> > zero and a -1 mask and emit RTL for a SI->DI zero extension directly
> > rather than having this pattern?
>
> This is an attempt to use the RTL template in the slli_slli_uw insn-and-split.
> Of course, we can emit the pattern directly ... it is a question of
> what is more readable (which may come down to personal preference).
>
> Let's try it for the next version and we can still go back to what we had…
>
> > It overall looks sensible -- I didn't check all the conditions and such,
> > just the overall structure.
> >
> >
> > Jeff
> >
> >
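
For reference, a minimal C sketch (not part of the patch) of why the
two-shift sequence quoted above computes the same value as the
and-of-shift expression that combine fails to match. The helper names
masked_shift and two_shifts are made up for illustration, and the
sketch assumes a 64-bit register, a mask of 'len' consecutive ones
whose number of trailing zeros equals the shift amount, len >= 1 and
shift + len <= 64.

```c
#include <assert.h>
#include <stdint.h>

/* The shape combine produces: (x << shift) & mask, where mask is a run
   of 'len' consecutive ones with exactly 'shift' trailing zeros.
   Hypothetical helper, for illustration only.  */
uint64_t masked_shift(uint64_t x, unsigned shift, unsigned len)
{
  uint64_t ones = (len == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << len) - 1);
  return (x << shift) & (ones << shift);
}

/* The two-shift form: a left shift by (64 - len) discards the bits
   above the low 'len' bits of x, and a logical right shift by
   (64 - len - shift) places those bits at position 'shift'.  */
uint64_t two_shifts(uint64_t x, unsigned shift, unsigned len)
{
  unsigned left = 64 - len;
  return (x << left) >> (left - shift);
}

/* Spot-check the case from the thread: shift 5, mask 0xfffffffe0
   (31 ones), which corresponds to slli by 33 followed by srli by 28.  */
int check_example(void)
{
  uint64_t x = UINT64_C(0x123456789abcdef0);
  assert(masked_shift(x, 5, 31) == two_shifts(x, 5, 31));
  return 0;
}
```

With shift = 5 and len = 31 the left and right shift amounts come out
as 33 and 28, matching the slli/srli pair in the assembly above.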