From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20221108195434.2701247-1-philipp.tomsich@vrull.eu> <mhng-3a1b1869-3786-43ae-b543-e5e245ded6d4@palmer-ri-x1c9a>
In-Reply-To: <mhng-3a1b1869-3786-43ae-b543-e5e245ded6d4@palmer-ri-x1c9a>
From: Philipp Tomsich <philipp.tomsich@vrull.eu>
Date: Thu, 10 Nov 2022 16:09:35 +0100
Message-ID: <CAAeLtUBG_CBWLFpmcWgA+Tg4hVesovrtD_=_78gusn=-ChnTmA@mail.gmail.com>
Subject: Re: [PATCH] RISC-V: costs: support shift-and-add in strength-reduction
To: Palmer Dabbelt <palmer@rivosinc.com>
Cc: gcc-patches@gcc.gnu.org, Kito Cheng <kito.cheng@gmail.com>, 
	Vineet Gupta <vineetg@rivosinc.com>, christoph.muellner@vrull.eu, jlaw@ventanamicro.com
Content-Type: text/plain; charset="UTF-8"
List-Id: <gcc-patches.gcc.gnu.org>

On Thu, 10 Nov 2022 at 02:46, Palmer Dabbelt <palmer@rivosinc.com> wrote:
>
> On Tue, 08 Nov 2022 11:54:34 PST (-0800), philipp.tomsich@vrull.eu wrote:
> > The strength-reduction implementation in expmed.c will assess the
> > profitability of using shift-and-add using a RTL expression that wraps
> > a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
> > function recognizes this as expressing a sh[123]add instruction, we
> > will return an inflated cost---thus defeating the optimization.
> >
> > This change adds the necessary idiom recognition to provide an
> > accurate cost for this form of expressing sh[123]add.
> >
> > Instead of expanding to
> >       li      a5,200
> >       mulw    a0,a5,a0
> > with this change, the expression 'a * 200' is synthesized as:
> >       sh2add  a0,a0,a0   // *5 = a + 4 * a
> >       sh2add  a0,a0,a0   // *5 = a + 4 * a
> >       slli    a0,a0,3    // *8
>
> That's more instructions, but multiplication is generally expensive.  At
> some point I remember the SiFive cores getting very fast integer
> multipliers, but I don't see that reflected in the cost model anywhere
> so maybe I'm just wrong?  Andrew or Kito might remember...
>
> If the mul-based sequences are still faster on the SiFive cores then we
> should probably find a way to keep emitting them, which may just be a
> matter of adjusting those multiply costs.  Moving to the shift-based
> sequences seems reasonable for a generic target, though.

The cost for a regular MULT is COSTS_N_INSNS(4) for the series-7 (see
the SImode and DImode entries in the int_mul line):
/* Costs to use when optimizing for Sifive 7 Series.  */
static const struct riscv_tune_param sifive_7_tune_info = {
  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},       /* fp_add */
  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},       /* fp_mul */
  {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},     /* fp_div */
  {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},       /* int_mul */
  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},       /* int_div */
  2,                                            /* issue_rate */
  4,                                            /* branch_cost */
  3,                                            /* memory_cost */
  8,                                            /* fmv_cost */
  true,                                         /* slow_unaligned_access */
};

So the break-even is at COSTS_N_INSNS(4) + rtx_cost(immediate).
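
To make the arithmetic explicit, here is a rough sketch of that
break-even in C.  COSTS_N_INSNS is defined the same way as in GCC's
rtl.h; everything else (the macro and function names, and the
assumption that the constant is materialized by a single li costing
one instruction) is made up for illustration and is not how expmed
actually structures the comparison.

```c
/* Same definition as in GCC's rtl.h: one fast instruction is 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* int_mul entry of sifive_7_tune_info (identical for SImode and DImode).  */
#define SIFIVE7_INT_MUL_COST COSTS_N_INSNS (4)

/* Assumption: the constant is loaded with a single li.  */
#define LI_COST COSTS_N_INSNS (1)

/* Cost of the li + mul sequence that strength reduction competes with.  */
static int mul_sequence_cost (void)
{
  return SIFIVE7_INT_MUL_COST + LI_COST;
}

/* Cost of a shift-and-add sequence made of N_INSNS single-cost
   instructions (shNadd, slli, add, sub).  */
static int shift_add_sequence_cost (int n_insns)
{
  return COSTS_N_INSNS (n_insns);
}
```

Under these assumptions a five-instruction shift-and-add sequence
costs the same 20 units as li + mul, and a sixth instruction tips it
over.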

Testing against series-7, we get sequences of up to 5 instructions
(4 for the mul + 1 for the li) from strength reduction:

val * 783
=>
sh1add a5,a0,a0
slli a5,a5,4
add a5,a5,a0
slli a5,a5,4
sub a0,a5,a0

but fall back to a mul, once the cost exceeds this:

val * 1574
=>
li a5,1574
mul a0,a0,a5
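
For anyone wanting to double-check the five-instruction sequence, a
small C model along these lines should do.  It is only a sketch: the
helper names are invented for illustration, and registers are modeled
as uint64_t so that the arithmetic wraps the same way as on RV64.

```c
#include <stdint.h>

/* Model of the Zba sh1add instruction: rd = (rs1 << 1) + rs2.  */
static uint64_t sh1add (uint64_t rs1, uint64_t rs2)
{
  return (rs1 << 1) + rs2;
}

/* Mirrors the sequence emitted for val * 783 (illustrative name).  */
static uint64_t mul783_strength_reduced (uint64_t a0)
{
  uint64_t a5;

  a5 = sh1add (a0, a0);   /* sh1add a5,a0,a0 : a5 =   3 * val */
  a5 = a5 << 4;           /* slli   a5,a5,4  : a5 =  48 * val */
  a5 = a5 + a0;           /* add    a5,a5,a0 : a5 =  49 * val */
  a5 = a5 << 4;           /* slli   a5,a5,4  : a5 = 784 * val */
  return a5 - a0;         /* sub    a0,a5,a0 : a0 = 783 * val */
}
```

The decomposition is 783 = ((3 * 16) + 1) * 16 - 1, so the result
should agree with a plain 64-bit multiply by 783 for every input,
including values that wrap.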

> Either way, it probably warrants a test case to make sure we don't
> regress in the future.

Ack. Will be added for v2.

>
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd,
> >       if expressed as a plus and multiplication with a power-of-2.

The ChangeLog entry still needs to be regenerated (it refers to the
old '.c' extension).

> >
> > ---
> >
> >  gcc/config/riscv/riscv.cc | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index ab6c745c722..0b2c4b3599d 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -2451,6 +2451,19 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
> >         *total = COSTS_N_INSNS (1);
> >         return true;
> >       }
> > +      /* Before strength-reduction, the shNadd can be expressed as the addition
> > +      of a multiplication with a power-of-two.  If this case is not handled,
> > +      the strength-reduction in expmed.c will calculate an inflated cost. */
> > +      if (TARGET_ZBA
> > +       && mode == word_mode
> > +       && GET_CODE (XEXP (x, 0)) == MULT
> > +       && REG_P (XEXP (XEXP (x, 0), 0))
> > +       && CONST_INT_P (XEXP (XEXP (x, 0), 1))
> > +       && IN_RANGE (exact_log2 (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
>
> IIUC the fall-through is biting us here and this matches power-of-2 +1
> and power-of-2 -1.  That looks to be the case for the one below, though,
> so not sure if I'm just missing something?

The strength-reduction in expmed.cc uses "(PLUS (MULT (reg) <pow2>)
(reg))" to express a shift-then-add.
Here's one of the relevant snippets (from the internal costing in expmed.cc;
the second operand of the MULT is later replaced with each power-of-2
constant when the costs are computed):
  all.shift_mult = gen_rtx_MULT (mode, all.reg, all.reg);
  all.shift_add = gen_rtx_PLUS (mode, all.shift_mult, all.reg);

So while we normally encounter a "(PLUS (ASHIFT (reg) <shamt>)
(reg))", for the strength-reduction we also need to provide the
cost for the expression with a MULT.
The other idioms (those matching above and below the new one) always
require an ASHIFT for the inner expression.
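
To illustrate the shape being matched, here is a toy model in C.
None of this is GCC code: the toy_* types and functions are
hypothetical stand-ins for rtx, GET_CODE, XEXP and exact_log2, and the
sketch ignores the TARGET_ZBA and mode checks that the real condition
also needs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for GCC's RTL codes and rtx; not the real types.  */
enum toy_code { TOY_REG, TOY_CONST_INT, TOY_PLUS, TOY_MULT, TOY_ASHIFT };

struct toy_rtx
{
  enum toy_code code;
  int64_t value;                 /* Used by TOY_CONST_INT only.  */
  const struct toy_rtx *op[2];   /* Operands of binary codes.  */
};

/* Log2 of X if X is a positive power of two, else -1 (loosely
   mirrors what exact_log2 provides).  */
static int toy_exact_log2 (int64_t x)
{
  if (x <= 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* True if X has the form (plus (mult (reg) (const_int 2|4|8)) ...),
   i.e. the MULT-based way of writing a sh[123]add.  */
static bool toy_is_shnadd_mult_form (const struct toy_rtx *x)
{
  if (x->code != TOY_PLUS)
    return false;

  const struct toy_rtx *inner = x->op[0];
  if (inner->code != TOY_MULT
      || inner->op[0]->code != TOY_REG
      || inner->op[1]->code != TOY_CONST_INT)
    return false;

  /* Only multipliers 2, 4 and 8 map to sh1add, sh2add and sh3add.  */
  int shamt = toy_exact_log2 (inner->op[1]->value);
  return shamt >= 1 && shamt <= 3;
}
```

A multiplier of 4 is accepted (sh2add), whereas 16 or 3 is rejected,
and so is an inner ASHIFT, which the existing idioms already cover.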

>
> > +     {
> > +       *total = COSTS_N_INSNS (1);
> > +       return true;
> > +     }
> >        /* shNadd.uw pattern for zba.
> >        [(set (match_operand:DI 0 "register_operand" "=r")
> >              (plus:DI