From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20220524214703.4022737-1-philipp.tomsich@vrull.eu>
 <20220524214703.4022737-4-philipp.tomsich@vrull.eu>
 <CA+yXCZDTgMh1o_fbnPZY+UGVG8Q9-Buupk7iiS7JiW=+fJHeqQ@mail.gmail.com>
 <CAAeLtUALEtY0+FfXp2w50JBaJ2LdDxh5zj34q8tUD1x9RX4SXg@mail.gmail.com>
In-Reply-To: <CAAeLtUALEtY0+FfXp2w50JBaJ2LdDxh5zj34q8tUD1x9RX4SXg@mail.gmail.com>
From: Kito Cheng <kito.cheng@gmail.com>
Date: Tue, 7 Jun 2022 21:18:37 +0800
Message-ID: <CA+yXCZA+u1_J2J-VQ6R=uP11CQnt=8NGrGDN8MV+DzNG01w6vg@mail.gmail.com>
Subject: Re: [PATCH v1 3/3] RISC-V: Replace zero_extendsidi2_shifted with
 generalized split
To: Philipp Tomsich <philipp.tomsich@vrull.eu>
Cc: Andrew Waterman <andrew@sifive.com>, Vineet Gupta <vineetg@rivosinc.com>, 
 GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"

Using the same pseudo register creates one longer live range instead of
two shorter ones, which is not good when the instruction scheduler tries
to separate those two instructions. I also think the register allocator
has more complete knowledge to decide which way is better (reusing the
same register or using a different one), so I prefer to use another
pseudo here if possible.

That's also what the AArch64/ARM/x86 ports do: use a new pseudo as the
temporary if possible.


On Tue, Jun 7, 2022 at 6:50 PM Philipp Tomsich <philipp.tomsich@vrull.eu> wrote:
>
> On Tue, 7 Jun 2022 at 12:24, Kito Cheng <kito.cheng@gmail.com> wrote:
> >
> > On Wed, May 25, 2022 at 5:47 AM Philipp Tomsich
> > <philipp.tomsich@vrull.eu> wrote:
> > >
> > > The current method of treating shifts of extended values on RISC-V
> > > frequently causes sequences of 3 shifts, despite the presence of the
> > > 'zero_extendsidi2_shifted' pattern.
> > >
> > > Consider:
> > >     unsigned long f(unsigned int a, unsigned long b)
> > >     {
> > >             a = a << 1;
> > >             unsigned long c = (unsigned long) a;
> > >             c = b + (c<<4);
> > >             return c;
> > >     }
> > > which will present at combine-time as:
> > >     Trying 7, 8 -> 9:
> > >         7: r78:SI=r81:DI#0<<0x1
> > >           REG_DEAD r81:DI
> > >         8: r79:DI=zero_extend(r78:SI)
> > >           REG_DEAD r78:SI
> > >         9: r72:DI=r79:DI<<0x4
> > >           REG_DEAD r79:DI
> > >     Failed to match this instruction:
> > >     (set (reg:DI 72 [ _1 ])
> > >         (and:DI (ashift:DI (reg:DI 81)
> > >                 (const_int 5 [0x5]))
> > >         (const_int 68719476704 [0xfffffffe0])))
> > > and produce the following (optimized) assembly:
> > >     f:
> > >         slliw   a5,a0,1
> > >         slli    a5,a5,32
> > >         srli    a5,a5,28
> > >         add     a0,a5,a1
> > >         ret
> > >
> > > The current way of handling this (in 'zero_extendsidi2_shifted')
> > > doesn't apply for two reasons:
> > > - this is seen before reload, and
> > > - (more importantly) the constant mask is not 0xfffffffful.
> > >
> > > To address this, we introduce a generalized version of shifting
> > > zero-extended values that supports any mask of consecutive ones as
> > > long as the number of trailing zeros is the inner shift-amount.
> > >
> > > With this new split, we generate the following assembly for the
> > > aforementioned function:
> > >     f:
> > >         slli    a0,a0,33
> > >         srli    a0,a0,28
> > >         add     a0,a0,a1
> > >         ret
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/riscv/riscv.md (zero_extendsidi2_shifted): Replace
> > >           with a generalized split that requires no clobber, runs
> > >           before reload and works for smaller masks.
> > >
> > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > > ---
> > >
> > >  gcc/config/riscv/riscv.md | 37 ++++++++++++++++++++-----------------
> > >  1 file changed, 20 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > > index b8ab0cf169a..cc10cd90a74 100644
> > > --- a/gcc/config/riscv/riscv.md
> > > +++ b/gcc/config/riscv/riscv.md
> > > @@ -2119,23 +2119,26 @@ (define_split
> > >  ;; occur when unsigned int is used for array indexing.  Split this into two
> > >  ;; shifts.  Otherwise we can get 3 shifts.
> > >
> > > -(define_insn_and_split "zero_extendsidi2_shifted"
> > > -  [(set (match_operand:DI 0 "register_operand" "=r")
> > > -       (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > -                          (match_operand:QI 2 "immediate_operand" "I"))
> > > -               (match_operand 3 "immediate_operand" "")))
> > > -   (clobber (match_scratch:DI 4 "=&r"))]
> > > -  "TARGET_64BIT && !TARGET_ZBA
> > > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
> > > -  "#"
> > > -  "&& reload_completed"
> > > -  [(set (match_dup 4)
> > > -       (ashift:DI (match_dup 1) (const_int 32)))
> > > -   (set (match_dup 0)
> > > -       (lshiftrt:DI (match_dup 4) (match_dup 5)))]
> > > -  "operands[5] = GEN_INT (32 - (INTVAL (operands [2])));"
> > > -  [(set_attr "type" "shift")
> > > -   (set_attr "mode" "DI")])
> > > +(define_split
> > > +  [(set (match_operand:DI 0 "register_operand")
> > > +       (and:DI (ashift:DI (match_operand:DI 1 "register_operand")
> > > +                          (match_operand:QI 2 "immediate_operand"))
> > > +               (match_operand:DI 3 "consecutive_bits_operand")))]
> > > +  "TARGET_64BIT"
> > > +  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 4)))
> > > +   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (match_dup 5)))]
> >
> > I would prefer to keep using another register if possible:
> >
> > like this:
> > +  [(set (match_dup 6) (ashift:DI (match_dup 1) (match_dup 4)))
> > +   (set (match_dup 0) (lshiftrt:DI (match_dup 6) (match_dup 5)))]
> >
> > if (can_create_pseudo_p ())
> >   operands[6] = gen_reg_rtx (DImode);
> > else
> >   operands[6] = operands[0];
>
> I don't see the benefit to this (unless you expect opportunities for
> CSE), as there will be a linear dependency chain anyway.  I'd like to
> understand your reasoning behind this a bit better, as our style
> currently generally tries to avoid introducing temporaries if it
> is avoidable.
>
> Thanks,
> Philipp.
>
> >
> > > +{
> > > +       unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
> > > +       int leading = clz_hwi (mask);
> > > +       int trailing = ctz_hwi (mask);
> > > +
> > > +       /* The shift-amount must match the number of trailing bits */
> > > +       if (trailing != UINTVAL (operands[2]))
> > > +          FAIL;
> > > +
> > > +       operands[4] = GEN_INT (leading + trailing);
> > > +       operands[5] = GEN_INT (leading);
> > > +})
> > >
> > >  ;;
> > >  ;;  ....................
> > > --
> > > 2.34.1
> > >
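
For reference, the arithmetic behind the proposed split can be checked
outside the compiler. The following is only a minimal sketch in plain C,
not code from the patch: the function names are hypothetical, and it
assumes the GCC/Clang builtins __builtin_clzll and __builtin_ctzll in
place of clz_hwi and ctz_hwi. It computes the two shift amounts from the
mask and shows that shift-then-mask and the two-shift sequence agree.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the patch): if MASK is one run of
   consecutive ones whose trailing-zero count equals SHIFT, store the
   left and right shift amounts and return 1; otherwise return 0.  */
static int
split_shift_amounts (uint64_t mask, unsigned shift,
                     unsigned *left, unsigned *right)
{
  if (mask == 0)
    return 0;

  /* Assumption: GCC/Clang builtins stand in for clz_hwi/ctz_hwi.  */
  unsigned leading = (unsigned) __builtin_clzll (mask);
  unsigned trailing = (unsigned) __builtin_ctzll (mask);

  /* After dropping the trailing zeros, a run of consecutive ones has
     the form 2^n - 1, so ANDing it with its successor yields zero.  */
  uint64_t run = mask >> trailing;
  if ((run & (run + 1)) != 0)
    return 0;

  /* The shift amount must match the number of trailing zero bits.  */
  if (trailing != shift)
    return 0;

  *left = leading + trailing;
  *right = leading;
  return 1;
}

/* The form combine presents: (x << shift) & mask.  */
static uint64_t
shift_and_mask (uint64_t x, unsigned shift, uint64_t mask)
{
  return (x << shift) & mask;
}

/* The form the split emits: a left shift followed by a logical right
   shift.  */
static uint64_t
two_shifts (uint64_t x, unsigned left, unsigned right)
{
  return (x << left) >> right;
}
```

For the mask 0xfffffffe0 and shift amount 5 from the example above, the
helper yields a left shift of 33 and a right shift of 28, which matches
the slli/srli pair in the generated assembly.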