From: Philipp Tomsich
Date: Thu, 10 Nov 2022 16:09:35 +0100
Subject: Re: [PATCH] RISC-V: costs: support shift-and-add in strength-reduction
To: Palmer Dabbelt
Cc: gcc-patches@gcc.gnu.org, Kito Cheng, Vineet Gupta, christoph.muellner@vrull.eu, jlaw@ventanamicro.com
References: <20221108195434.2701247-1-philipp.tomsich@vrull.eu>

On Thu, 10 Nov 2022 at 02:46, Palmer Dabbelt wrote:
>
> On Tue, 08 Nov 2022 11:54:34 PST (-0800), philipp.tomsich@vrull.eu wrote:
> > The strength-reduction implementation in expmed.c will assess the
> > profitability of using shift-and-add using an RTL expression that wraps
> > a MULT (with a power-of-2) in a PLUS. Unless the RISC-V rtx_costs
> > function recognizes this as expressing a sh[123]add instruction, we
> > will return an inflated cost---thus defeating the optimization.
> >
> > This change adds the necessary idiom recognition to provide an
> > accurate cost for this form of expressing sh[123]add.
> >
> > Instead of expanding to
> >       li      a5,200
> >       mulw    a0,a5,a0
> > with this change, the expression 'a * 200' is synthesized as:
> >       sh2add  a0,a0,a0   // *5 = a + 4 * a
> >       sh2add  a0,a0,a0   // *5 = a + 4 * a
> >       slli    a0,a0,3    // *8
>
> That's more instructions, but multiplication is generally expensive.
> At some point I remember the SiFive cores getting very fast integer
> multipliers, but I don't see that reflected in the cost model anywhere
> so maybe I'm just wrong? Andrew or Kito might remember...
>
> If the mul-based sequences are still faster on the SiFive cores then we
> should probably find a way to keep emitting them, which may just be a
> matter of adjusting those multiply costs. Moving to the shift-based
> sequences seems reasonable for a generic target, though.

The cost for a regular MULT is COSTS_N_INSNS(4) for the series-7 (see
the SImode and DImode entries in the int_mul line):

/* Costs to use when optimizing for Sifive 7 Series.  */
static const struct riscv_tune_param sifive_7_tune_info = {
  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},    /* fp_add */
  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},    /* fp_mul */
  {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},  /* fp_div */
  {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},    /* int_mul */
  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},    /* int_div */
  2,                                         /* issue_rate */
  4,                                         /* branch_cost */
  3,                                         /* memory_cost */
  8,                                         /* fmv_cost */
  true,                                      /* slow_unaligned_access */
};

So the break-even is at COSTS_N_INSNS(4) + rtx_cost(immediate).

Testing against series-7, we get up to 5 (4 for the mul + 1 for the
li) instructions from strength reduction:

val * 783
=>
  sh1add a5,a0,a0
  slli a5,a5,4
  add a5,a5,a0
  slli a5,a5,4
  sub a0,a5,a0

but fall back to a mul, once the cost exceeds this:

val * 1574
=>
  li a5,1574
  mul a0,a0,a5

> Either way, it probably warrants a test case to make sure we don't
> regress in the future.

Ack. Will be added for v2.

> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd,
> >       if expressed as a plus and multiplication with a power-of-2.

This will still need to be regenerated (it's referring to a '.c'
extension still).
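
As a sanity check on the two sequences quoted above ('a * 200' from the
patch description and 'val * 783' from the series-7 test), here is a
minimal C sketch that emulates them step by step. Everything named here
(shnadd, mul200, mul783, within_budget, check_sequences, INT_MUL_INSNS,
LI_INSNS) is a hypothetical helper for illustration and is not part of
GCC. The budget check is a simplification that assumes every shift, add
and li costs one instruction; it does not model the actual comparison
done in expmed.cc.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the Zba shNadd instructions: rd = rs2 + (rs1 << n), n in 1..3. */
static uint64_t shnadd(unsigned n, uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << n);
}

/* 'a * 200' as synthesized in the patch description: 200 = 5 * 5 * 8. */
static uint64_t mul200(uint64_t a)
{
    a = shnadd(2, a, a);    /* sh2add: a * 5  */
    a = shnadd(2, a, a);    /* sh2add: a * 25 */
    return a << 3;          /* slli 3: a * 200 */
}

/* 'val * 783' as quoted above: ((3 * 16 + 1) * 16) - 1 = 783. */
static uint64_t mul783(uint64_t a)
{
    uint64_t t = shnadd(1, a, a);   /* sh1add: 3a   */
    t <<= 4;                        /* slli 4: 48a  */
    t += a;                         /* add:    49a  */
    t <<= 4;                        /* slli 4: 784a */
    return t - a;                   /* sub:    783a */
}

/* Assumption: instruction counts taken from the discussion above
   (int_mul = 4 for series-7, plus one li to load the constant). */
enum { INT_MUL_INSNS = 4, LI_INSNS = 1 };

/* Simplified break-even test: a shift-and-add sequence is kept while
   its length does not exceed the cost of li + mul. */
static int within_budget(int seq_len)
{
    return seq_len <= INT_MUL_INSNS + LI_INSNS;
}

/* Compare the emulated sequences against plain multiplication. */
static void check_sequences(void)
{
    for (uint64_t a = 0; a < 1000; a++) {
        assert(mul200(a) == a * 200);
        assert(mul783(a) == a * 783);
    }
    assert(within_budget(3));   /* the 3-instruction 'a * 200' sequence   */
    assert(within_budget(5));   /* the 5-instruction 'val * 783' sequence */
    assert(!within_budget(6));  /* anything longer falls back to li + mul */
}
```

Calling check_sequences() from a small main() is enough to exercise the
sketch; the asserts abort if any step of either sequence is wrong.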

> >
> > ---
> >
> >  gcc/config/riscv/riscv.cc | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index ab6c745c722..0b2c4b3599d 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -2451,6 +2451,19 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
> >           *total = COSTS_N_INSNS (1);
> >           return true;
> >         }
> > +      /* Before strength-reduction, the shNadd can be expressed as the addition
> > +         of a multiplication with a power-of-two.  If this case is not handled,
> > +         the strength-reduction in expmed.c will calculate an inflated cost.  */
> > +      if (TARGET_ZBA
> > +          && mode == word_mode
> > +          && GET_CODE (XEXP (x, 0)) == MULT
> > +          && REG_P (XEXP (XEXP (x, 0), 0))
> > +          && CONST_INT_P (XEXP (XEXP (x, 0), 1))
> > +          && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
>
> IIUC the fall-through is biting us here and this matches power-of-2 +1
> and power-of-2 -1. That looks to be the case for the one below, though,
> so not sure if I'm just missing something?

The strength-reduction in expmed.cc uses "(PLUS (reg) (MULT (reg) ))"
to express a shift-then-add.
Here's one of the relevant snippets (from the internal costing in expmed.cc):

      all.shift_mult = gen_rtx_MULT (mode, all.reg, all.reg);
      all.shift_add = gen_rtx_PLUS (mode, all.shift_mult, all.reg);

So while we normally encounter a "(PLUS (reg) (ASHIFT (reg) ))", for
the strength-reduction we also need to provide the cost for the
expression with a MULT.
The other idioms (those matching above and below the new one) always
require an ASHIFT for the inner.

> > +        {
> > +          *total = COSTS_N_INSNS (1);
> > +          return true;
> > +        }
> >        /* shNadd.uw pattern for zba.
> >           [(set (match_operand:DI 0 "register_operand" "=r")
> >                 (plus:DI