From: "Maciej W. Rozycki" <macro@orcam.me.uk>
To: Alexander Monakov <amonakov@ispras.ru>
Cc: Richard Biener <richard.guenther@gmail.com>,
Jeff Law <jeffreyalaw@gmail.com>,
Andrew Pinski <pinskia@gmail.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
gcc-patches@gcc.gnu.org
Subject: Re: RISC-V: Add divmod instruction support
Date: Tue, 28 Feb 2023 12:54:04 +0000 (GMT)
Message-ID: <alpine.DEB.2.21.2302231911420.48569@angie.orcam.me.uk>
In-Reply-To: <72c83a06-6733-a982-04e8-64dbd754cb5e@ispras.ru>
On Mon, 20 Feb 2023, Alexander Monakov wrote:
> > > That's the kind of stuff I'd expect to happen at the tree level though,
> > > before expand.
> >
> > The GIMPLE pass forming divmod could indeed choose to emit the
> > div + mul/sub sequence instead if an actual divmod pattern isn't available.
> > It could even generate some fake mul/sub/mod RTXen to cost the two
> > variants against each other but I seriously doubt any uarch that implements
> > division/modulo has a slower mul/sub.
>
> Making a correct decision requires knowing to which degree the divider is
> pipelined, and costs won't properly reflect that. If the divider accepts
> a new div/mod instruction every couple of cycles, it's faster to just issue
> a div followed by a mod with the same operands.
I guess there exist microarchitectures that have their divider pipelined,
but I haven't come across one. I think division is not an operation that
is commonly optimised for in hardware RTL, whether integer or FP, at least
not in embedded applications, and given the nature of the operation I
believe it would be particularly costly in terms of silicon. I've seen
latencies and repeat rates of up to 50 cycles quoted in CPU documentation
even for 32-bit integer division, while multiplication had a latency of 5
and a repeat rate of 1 (i.e. fully pipelined) for the same
microarchitecture.
So the latency of a DIV + MOD pair would be 100 for said microarchitecture,
with a repeat rate of 100 as well, as both operations have to go through
the unpipelined divider one after the other. A DIV + MUL/SUB sequence,
taking the data dependency into account, would have a latency of 56
(50 + 5 + 1 for the SUB) and a repeat rate of 50. Quite an improvement.
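The arithmetic above can be sketched as a tiny timing model. This is only an illustration under stated assumptions: the figures are the hypothetical ones quoted above (divider latency and repeat rate of 50, multiplier latency of 5), the single-cycle SUB is inferred from the 56-cycle total, and the function names are made up for the example.

```c
/* Hypothetical figures from the example above: an unpipelined divider
   (latency = repeat rate = 50), a fully pipelined multiplier (latency 5)
   and an assumed single-cycle SUB. */
enum { DIV_LAT = 50, DIV_REPEAT = 50, MUL_LAT = 5, SUB_LAT = 1 };

/* DIV then MOD on the same operands: no data dependency, but MOD cannot
   enter the divider until DIV has released it (structural hazard). */
int div_mod_latency(void)
{
    int div_issue = 0;
    int mod_issue = div_issue + DIV_REPEAT;
    return mod_issue + DIV_LAT;
}

/* DIV then MUL then SUB: each step waits for the previous result. */
int div_mul_sub_latency(void)
{
    int q_ready = DIV_LAT;            /* quotient available */
    int p_ready = q_ready + MUL_LAT;  /* q * b available */
    return p_ready + SUB_LAT;         /* a - q * b available */
}

/* Repeat rate of the pair is bounded by how long the divider stays busy. */
int div_mod_repeat(void)     { return 2 * DIV_REPEAT; }
int div_mul_sub_repeat(void) { return DIV_REPEAT; }
```

With these assumed numbers the model gives 100/100 for DIV + MOD and 56/50 for DIV + MUL/SUB, matching the figures in the text; a pipelined divider (smaller `DIV_REPEAT`) is exactly the case where the balance would tip the other way.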
> Therefore I think in this case it's fair for GIMPLE level to just check if
> the divmod pattern is available, and let the target do the fine tuning via
> the divmod expander.
Hmm, we have DFA scheduler descriptions available that are supposed to
give latencies and repeat rates for individual operations. Wouldn't it be
possible to get hold of this information?
If we cannot make use of this information at the GIMPLE level (which I'd
consider regrettable), then I think we may need a target hook to say that
division is cheap, which would prevent the DIV + MOD to DIV + MUL/SUB
transformation from happening, perhaps off by default. I think it would
also be regrettable if every backend for targets/subtargets that have
neither a hardware DIVMOD operation (e.g. MIPS has one at certain ISA
levels only) nor a pipelined divider had to code the transformation by
hand.
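For reference, the DIV + MOD to DIV + MUL/SUB transformation relies on the identity a % b == a - (a / b) * b, which C99 and later guarantee for signed integers with truncating division whenever a / b is representable. A minimal C sketch of what the rewritten sequence computes (the function names are illustrative only, not GCC internals):

```c
/* Remainder derived from an already computed quotient: one MUL and one
   SUB instead of a second trip through the divider.  Valid for b != 0,
   excluding the overflowing INT_MIN / -1 case, which a % b does not
   define either. */
int rem_from_quot(int a, int b, int q)
{
    return a - q * b;
}

/* Quotient and remainder the way a DIV + MUL/SUB expansion would
   produce them: a single division only. */
void divmod_via_mul_sub(int a, int b, int *q, int *r)
{
    *q = a / b;                    /* the only DIV */
    *r = rem_from_quot(a, b, *q);  /* MUL + SUB */
}
```

Because `|q * b| <= |a|`, the multiplication cannot overflow for any operands the original `a / b` and `a % b` were defined for, so the rewrite is exact, including for negative operands.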
> It would make sense for tree-ssa-mathopts to emit div + mul/sub when neither
> 'divmod' nor 'mod' patterns are available, because RTL expansion will do the
> same, just later, and we'll rely on RTL CSE to clean up the redundant div.
That as well.
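The expansion policy argued for in this message can be summarised as a small decision function. This is a hypothetical sketch only: none of these names exist in GCC, and `div_is_cheap` stands in for the proposed, off-by-default target hook.

```c
#include <stdbool.h>

/* Hypothetical classification of how a DIV + MOD pair gets expanded. */
enum divmod_expansion {
    EXPAND_DIVMOD,       /* target provides a divmod pattern */
    EXPAND_DIV_MOD,      /* separate DIV and MOD instructions */
    EXPAND_DIV_MUL_SUB   /* DIV, then remainder via MUL/SUB */
};

/* div_is_cheap models the proposed target hook, false by default. */
enum divmod_expansion
choose_divmod_expansion(bool has_divmod_pattern, bool has_mod_pattern,
                        bool div_is_cheap)
{
    if (has_divmod_pattern)
        return EXPAND_DIVMOD;
    /* Keep DIV + MOD only if the target says division is cheap,
       e.g. because its divider is pipelined. */
    if (has_mod_pattern && div_is_cheap)
        return EXPAND_DIV_MOD;
    /* Otherwise, including when there is no mod pattern at all (where
       RTL expansion would do the same later anyway), use DIV + MUL/SUB. */
    return EXPAND_DIV_MUL_SUB;
}
```

Under this sketch a target with separate `div` and `mod` patterns but an unpipelined divider gets the DIV + MUL/SUB sequence without any backend-specific code, which is the outcome argued for above.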
> But RISC-V has both 'div' and 'mod', so as I tried to explain in the first
> paragraph we should let the target decide.
Still, I think RISC-V ought to supply a divmod pattern only where fused
DIV/MOD execution is present in the microarchitecture.
Maciej