public inbox for gcc-patches@gcc.gnu.org
From: "Maciej W. Rozycki" <macro@orcam.me.uk>
To: Alexander Monakov <amonakov@ispras.ru>
Cc: Richard Biener <richard.guenther@gmail.com>,
	 Jeff Law <jeffreyalaw@gmail.com>,
	Andrew Pinski <pinskia@gmail.com>,
	 Palmer Dabbelt <palmer@dabbelt.com>,
	gcc-patches@gcc.gnu.org
Subject: Re: RISC-V: Add divmod instruction support
Date: Tue, 28 Feb 2023 12:54:04 +0000 (GMT)
Message-ID: <alpine.DEB.2.21.2302231911420.48569@angie.orcam.me.uk>
In-Reply-To: <72c83a06-6733-a982-04e8-64dbd754cb5e@ispras.ru>

On Mon, 20 Feb 2023, Alexander Monakov wrote:

> > >  That's the kind of stuff I'd expect to happen at the tree level though,
> > > before expand.
> > 
> > The GIMPLE pass forming divmod could indeed choose to emit the
> > div + mul/sub sequence instead if an actual divmod pattern isn't available.
> > It could even generate some fake mul/sub/mod RTXen to cost the two
> > variants against each other but I seriously doubt any uarch that implements
> > division/modulo has a slower mul/sub.
> 
> Making a correct decision requires knowing to which degree the divider is
> pipelined, and costs won't properly reflect that. If the divider accepts
> a new div/mod instruction every couple of cycles, it's faster to just issue
> a div followed by a mod with the same operands.

 I guess there exist microarchitectures that have their divider pipelined, 
but I haven't come across one.  I think division is not an operation that 
is commonly optimised for in hw RTL, whether integer or FP, at least not 
in embedded applications, and given the nature of the operation I believe 
pipelining it would be particularly costly in terms of silicon.  I've seen latencies 
and repeat rates of up to 50 quoted in CPU documentation even for 32-bit 
integer division while multiplication had a latency of 5 and a repeat rate 
of 1 (i.e. fully pipelined) for the same microarchitecture.

 So for said microarchitecture a DIV + MOD pair would have a latency of 
100 (the MOD has to wait for the divider to become free) and the same 
repeat rate, while a DIV + MUL/SUB sequence would have, taking the data 
dependency into account, a latency of 56 and a repeat rate of 50.  Quite 
an improvement.
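
 To spell the arithmetic out, here is a minimal sketch of the model behind 
those figures.  It assumes a non-pipelined divider and a single-cycle SUB 
(an assumption implied by the figure of 56, not stated in any CPU manual 
quoted here).  All the names are hypothetical and are not GCC interfaces.

```c
/* Toy latency model; every name here is hypothetical, not a GCC interface.  */
struct uarch_costs
{
  int div_latency;  /* cycles until a DIV or MOD result is available */
  int div_repeat;   /* cycles before the divider accepts another operation */
  int mul_latency;  /* cycles until a MUL result is available */
  int sub_latency;  /* cycles until a SUB result is available (assumed 1) */
};

/* DIV followed by MOD on the same operands.  The MOD does not depend on
   the DIV result, but it cannot issue until the divider is free again.  */
int
div_mod_latency (const struct uarch_costs *c)
{
  return c->div_repeat + c->div_latency;
}

/* DIV, then MUL, then SUB, each consuming the previous result, so the
   latencies simply add up along the dependency chain.  */
int
div_mul_sub_latency (const struct uarch_costs *c)
{
  return c->div_latency + c->mul_latency + c->sub_latency;
}
```

 Plugging in the figures above (50, 50, 5, 1) gives 100 and 56.  With a 
hypothetical repeat rate of 2 instead the DIV + MOD pair comes out at 52, 
which is the situation Alexander describes.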

> Therefore I think in this case it's fair for GIMPLE level to just check if
> the divmod pattern is available, and let the target do the fine tuning via
> the divmod expander.

 Hmm, we have DFA scheduling available that is supposed to give latencies 
and repeat rates for individual operations.  Wouldn't it be possible to 
get hold of this information?

 If we cannot make use of this information at the GIMPLE level (which I'd 
consider regrettable), then I think that maybe we need a target hook to 
say division is cheap that would prevent a DIV + MOD to DIV + MUL/SUB 
transformation from happening, perhaps off by default.  I think it would 
be regrettable too if every backend for targets/subtargets that have 
neither a hardware DIVMOD operation (e.g. MIPS has one at certain ISA 
levels only) nor a pipelined divider had to code the transformation by 
hand.
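
 As a rough illustration of the decision I have in mind, assuming such a 
hook existed, the logic could look like the sketch below.  None of these 
names exist in GCC; the structure, the enumeration and the function are 
made up for the sake of the example.

```c
#include <stdbool.h>

/* Hypothetical target description; none of these names exist in GCC.  */
struct target_info
{
  bool has_divmod_pattern;  /* target supplies a combined divmod pattern */
  bool division_is_cheap;   /* the proposed hook, off (false) by default */
};

enum divmod_strategy
{
  USE_DIVMOD,       /* emit the target's combined divmod operation */
  USE_DIV_MOD,      /* emit separate DIV and MOD instructions */
  USE_DIV_MUL_SUB   /* emit DIV, then derive the remainder with MUL/SUB */
};

/* Choose how to compute both the quotient and the remainder.  */
enum divmod_strategy
choose_divmod_strategy (const struct target_info *t)
{
  if (t->has_divmod_pattern)
    return USE_DIVMOD;
  if (t->division_is_cheap)
    return USE_DIV_MOD;
  return USE_DIV_MUL_SUB;
}
```

 A target that sets neither flag gets the DIV + MUL/SUB sequence without 
its backend having to code anything by hand.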

> It would make sense for tree-ssa-mathopts to emit div + mul/sub when neither
> 'divmod' nor 'mod' patterns are available, because RTL expansion will do the
> same, just later, and we'll rely on RTL CSE to clean up the redundant div.

 That as well.
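
 For reference, the div + mul/sub sequence relies on the usual identity 
between the truncating quotient and the remainder.  A minimal sketch in C 
follows; the function name is hypothetical, and it assumes a nonzero 
divisor and excludes the INT_MIN / -1 case, which overflows.

```c
/* Compute the remainder from the quotient, as the DIV + MUL/SUB sequence
   does.  C division truncates toward zero, so a == q * b + r holds with
   r == a % b.  The caller must ensure b != 0 and must not pass
   a == INT_MIN together with b == -1.  */
int
rem_via_mul_sub (int a, int b, int *quot)
{
  int q = a / b;     /* the single DIV */
  *quot = q;
  return a - q * b;  /* MUL followed by SUB replaces the MOD */
}
```

 The result matches the `%' operator for either sign of the operands, 
e.g. -7 and 2 give a quotient of -3 and a remainder of -1.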

> But RISC-V has both 'div' and 'mod', so as I tried to explain in the first
> paragraph we should let the target decide.

 Still I think it would be best if RISC-V supplied a divmod pattern 
only where fused DIV/MOD execution is present in the microarchitecture.

  Maciej
