From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Z3Ow=6Y=orcam.me.uk=macro@sourceware.org>
Received: from angie.orcam.me.uk (angie.orcam.me.uk [IPv6:2001:4190:8020::34])
	by sourceware.org (Postfix) with ESMTP id 36E273858D33
	for <gcc-patches@gcc.gnu.org>; Tue, 28 Feb 2023 12:54:08 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 36E273858D33
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=orcam.me.uk
Authentication-Results: sourceware.org; spf=none smtp.mailfrom=orcam.me.uk
Received: by angie.orcam.me.uk (Postfix, from userid 500)
	id AC98492009C; Tue, 28 Feb 2023 13:54:04 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
	by angie.orcam.me.uk (Postfix) with ESMTP id A577992009B;
	Tue, 28 Feb 2023 12:54:04 +0000 (GMT)
Date: Tue, 28 Feb 2023 12:54:04 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@orcam.me.uk>
To: Alexander Monakov <amonakov@ispras.ru>
cc: Richard Biener <richard.guenther@gmail.com>, 
    Jeff Law <jeffreyalaw@gmail.com>, Andrew Pinski <pinskia@gmail.com>, 
    Palmer Dabbelt <palmer@dabbelt.com>, gcc-patches@gcc.gnu.org
Subject: Re: RISC-V: Add divmod instruction support
In-Reply-To: <72c83a06-6733-a982-04e8-64dbd754cb5e@ispras.ru>
Message-ID: <alpine.DEB.2.21.2302231911420.48569@angie.orcam.me.uk>
References: <CALXbNshUoKPUqJiL2Nit7t4Hahg0wYMPMxcWRrwWZf8=7BADng@mail.gmail.com> <mhng-80df8fcb-f95d-4060-bbd1-a14cb4854250@palmer-ri-x1c9a> <CA+=Sn1m3TBcuPPLhy+EOyejAfKaK6SaD-8LeBqYdNCLzp_K2PA@mail.gmail.com> <alpine.DEB.2.21.2302181848430.25434@angie.orcam.me.uk>
 <26ca669c-70ce-e475-717e-3c36f1e1c703@gmail.com> <alpine.DEB.2.21.2302190110550.25434@angie.orcam.me.uk> <CAFiYyc2o0Gtu63ENpnULpovNevUm6Oxb3cCBDc+Ag=GOHhauhA@mail.gmail.com> <72c83a06-6733-a982-04e8-64dbd754cb5e@ispras.ru>
User-Agent: Alpine 2.21 (DEB 202 2017-01-01)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-3488.9 required=5.0 tests=BAYES_00,KAM_DMARC_STATUS,KAM_INFOUSMEBIZ,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Mon, 20 Feb 2023, Alexander Monakov wrote:

> > >  That's the kind of stuff I'd expect to happen at the tree level though,
> > > before expand.
> > 
> > The GIMPLE pass forming divmod could indeed choose to emit the
> > div + mul/sub sequence instead if an actual divmod pattern isn't available.
> > It could even generate some fake mul/sub/mod RTXen to cost the two
> > variants against each other but I seriously doubt any uarch that implements
> > division/modulo has a slower mul/sub.
> 
> Making a correct decision requires knowing to which degree the divider is
> pipelined, and costs won't properly reflect that. If the divider accepts
> a new div/mod instruction every couple of cycles, it's faster to just issue
> a div followed by a mod with the same operands.

 I guess there exist microarchitectures that have their divider pipelined, 
but I haven't come across one.  I think division is not an operation that 
is commonly optimised for in hardware RTL, whether integer or FP, at least 
not in embedded applications, and given the nature of the operation I 
believe pipelining it would be particularly costly in terms of silicon.  
I've seen latencies and repeat rates of up to 50 cycles quoted in CPU 
documentation even for 32-bit integer division, while multiplication had a 
latency of 5 and a repeat rate of 1 (i.e. fully pipelined) on the same 
microarchitecture.

 So for said microarchitecture a DIV + MOD pair would have a latency of 100 
and a repeat rate of 100 as well, because the MOD, although independent of 
the DIV, cannot enter the divider until the DIV has left it.  A DIV + 
MUL/SUB sequence, taking its data dependency into account, would have a 
latency of 56 (50 + 5 + 1) and a repeat rate of 50.  Quite an improvement.
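 To spell the arithmetic out, here is a small C sketch.  The struct timing 
type and the helper names are made up for illustration, and the 1-cycle 
SUB is an assumption implied by the total of 56 rather than a quoted 
figure.

```c
#include <assert.h>

/* Timing of one instruction class, in cycles: LATENCY until the result is
   available, REPEAT until the unit accepts the next instruction.  */
struct timing
{
  int latency;
  int repeat;
};

/* DIV and MOD with the same operands do not depend on each other, but
   with a single divider the MOD cannot issue until the DIV's repeat
   interval has passed, and the pair keeps the divider busy for two
   intervals.  Assumes MOD has the same timing as DIV.  */
static struct timing
div_mod_timing (struct timing div)
{
  assert (div.repeat > 0);
  struct timing t = { div.repeat + div.latency, 2 * div.repeat };
  return t;
}

/* Remainder computed as a - (a / b) * b: the MUL waits for the quotient
   and the SUB for the product, but only the DIV occupies the divider.
   The sequence can repeat as soon as the slowest unit is free again.  */
static struct timing
div_mul_sub_timing (struct timing div, struct timing mul, struct timing sub)
{
  struct timing t;
  int repeat = div.repeat;

  if (mul.repeat > repeat)
    repeat = mul.repeat;
  if (sub.repeat > repeat)
    repeat = sub.repeat;
  t.latency = div.latency + mul.latency + sub.latency;
  t.repeat = repeat;
  return t;
}
```

 With DIV at { 50, 50 }, MUL at { 5, 1 } and SUB at { 1, 1 }, 
div_mod_timing () yields { 100, 100 } and div_mul_sub_timing () yields 
{ 56, 50 }, matching the figures above.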

> Therefore I think in this case it's fair for GIMPLE level to just check if
> the divmod pattern is available, and let the target do the fine tuning via
> the divmod expander.

 Hmm, we have DFA pipeline descriptions available for scheduling that are 
supposed to give latencies and repeat rates for individual operations.  
Wouldn't it be possible to get hold of this information here?

 If we cannot make use of this information at the GIMPLE level (which I'd 
consider regrettable), then maybe we need a target hook, perhaps off by 
default, to say that division is cheap, which would prevent the DIV + MOD 
to DIV + MUL/SUB transformation from happening.  I think it would be 
regrettable too if every backend for targets/subtargets that have neither 
a hardware DIVMOD operation (e.g. MIPS has one at certain ISA levels only) 
nor a pipelined divider had to code the transformation by hand.
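 As a minimal sketch of the kind of check such a hook, or the GIMPLE pass 
itself given the DFA figures, might make: keep_div_mod_p below is 
hypothetical, not an existing GCC interface.  It compares latency only and 
assumes the MOD has the same latency as the DIV.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost check: return true if issuing a separate MOD after
   the DIV is expected to complete sooner than computing the remainder
   as DIV + MUL/SUB.  All figures are in cycles; DIV_REPEAT is how soon
   the divider accepts another operation (equal to DIV_LATENCY when it
   is not pipelined at all).  */
static bool
keep_div_mod_p (int div_latency, int div_repeat,
		int mul_latency, int sub_latency)
{
  assert (div_repeat > 0);
  /* The MOD does not need the quotient, only a free divider.  */
  int div_mod = div_repeat + div_latency;
  /* The MUL needs the quotient and the SUB needs the product.  */
  int div_mul_sub = div_latency + mul_latency + sub_latency;
  return div_mod < div_mul_sub;
}
```

 With the figures quoted above (latency and repeat rate of 50 for DIV, 5 
for MUL, 1 for SUB) it returns false (100 vs 56), whereas a divider 
accepting a new operation every 2 cycles, as in Alexander's scenario, 
makes it return true (52 vs 56).  A real implementation would presumably 
want to weigh throughput as well.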

> It would make sense for tree-ssa-mathopts to emit div + mul/sub when neither
> 'divmod' nor 'mod' patterns are available, because RTL expansion will do the
> same, just later, and we'll rely on RTL CSE to clean up the redundant div.

 That as well.
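 For reference, this is what the DIV + MUL/SUB form computes, written as 
plain C; div_and_rem is just an illustrative name, not anything in GCC.

```c
#include <assert.h>
#include <limits.h>

/* Return A / B and store A % B in *REM using a single division, the way
   a DIV + MUL/SUB sequence does.  C division truncates toward zero, so
   A == (A / B) * B + A % B holds for negative operands as well.  */
static long
div_and_rem (long a, long b, long *rem)
{
  /* Division by zero and LONG_MIN / -1 are undefined in C.  */
  assert (b != 0 && !(a == LONG_MIN && b == -1));
  long q = a / b;	/* DIV */
  *rem = a - q * b;	/* MUL, then SUB */
  return q;
}
```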

> But RISC-V has both 'div' and 'mod', so as I tried to explain in the first
> paragraph we should let the target decide.

 Still I think it would be best if RISC-V supplied a divmod pattern only 
where fused DIV/MOD execution is present in the microarchitecture.

  Maciej