public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC/RFA][PATCH 0/2] SVE intrinsics: Add strength reduction for division by constant.
@ 2024-07-17  6:58 Jennifer Schmitz
  2024-07-17  7:29 ` Richard Sandiford
  0 siblings, 1 reply; 3+ messages in thread
From: Jennifer Schmitz @ 2024-07-17  6:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]

This patch series is part of an ongoing effort to replace the SVE intrinsic svdiv
by lower-strength instructions for division by constant. To that end, we
implemented svdiv_impl::fold to perform the following transformation in gimple:
- Division where all divisors are the same power of 2 --> svasrd
- Division where all divisors are powers of 2 --> svasr
We chose svdiv_impl::fold as location for the implementation to have the
transform applied as early as possible, such that other (existing or future)
gimple optimizations can be applied on the result.
Currently, the transform to is only applied for signed integers, because
there do not exist an unsigned svasrd and svasr. The transform has not (yet)
been implemented for svdivr.

Please also comment/advise on the following:
In a next patch, we would like to replace SVE division by constants (other
than powers of 2) by multiply and shifts, similar as for scalar division.
This is planned to be implemented in the gimple_folder as well. Thoughts?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 4641 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC/RFA][PATCH 0/2] SVE intrinsics: Add strength reduction for division by constant.
  2024-07-17  6:58 [RFC/RFA][PATCH 0/2] SVE intrinsics: Add strength reduction for division by constant Jennifer Schmitz
@ 2024-07-17  7:29 ` Richard Sandiford
  2024-07-29 13:51   ` Jennifer Schmitz
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Sandiford @ 2024-07-17  7:29 UTC (permalink / raw)
  To: Jennifer Schmitz; +Cc: gcc-patches, Kyrylo Tkachov

Jennifer Schmitz <jschmitz@nvidia.com> writes:
> This patch series is part of an ongoing effort to replace the SVE intrinsic svdiv
> by lower-strength instructions for division by constant. To that end, we
> implemented svdiv_impl::fold to perform the following transformation in gimple:
> - Division where all divisors are the same power of 2 --> svasrd

Sounds good.

> - Division where all divisors are powers of 2 --> svasr

I don't think this is correct for negative dividends (which is why
ASRD exists).  E.g. -1 / 4 is 0 as computed by svdiv (round towards zero),
but -1 as computed by svasr (round towards -Inf).

> We chose svdiv_impl::fold as location for the implementation to have the
> transform applied as early as possible, such that other (existing or future)
> gimple optimizations can be applied on the result.
> Currently, the transform to is only applied for signed integers, because
> there do not exist an unsigned svasrd and svasr. The transform has not (yet)
> been implemented for svdivr.

FWIW, using svlsr for unsigned divisions should be OK.

> Please also comment/advise on the following:
> In a next patch, we would like to replace SVE division by constants (other
> than powers of 2) by multiply and shifts, similar as for scalar division.
> This is planned to be implemented in the gimple_folder as well. Thoughts?

I'm a bit uneasy about going that far.  I suppose it comes down to a
question about what intrinsics are for.  Are they for describing an
algorithm, or for hand-optimising a specific implementation of the
algorithm?  IMO it's the latter.

If people want to write out a calculation in natural arithmetic, it
would be better to write the algorithm in scalar code and let the
vectoriser handle it.  That gives the opportunity for many more
optimisations than just this one.

Intrinsics are about giving programmers direct, architecture-level
control over how something is implemented.  I've seen Arm's library
teams go to great lengths to work out which out of a choice of
instruction sequences is the best one, even though the sequences in
question would look functionally equivalent to a smart-enough compiler.

So part of the work of using intrinsics is to figure out what the best
sequence is.  And IMO, part of the contract is that the compiler
shouldn't interfere with the programmer's choices too much.  If the
compiler makes a change, it must very confident that it is a win for
the function as a whole.

Replacing one division with one shift is fine, as an aid to the programmer.
It removes the need for (say) templated functions to check for that case
manually.  Constant folding is fine too, for similar reasons.  In these
cases, there's not really a cost/benefit choice to be made between
different expansions.  One choice is objectively better in all
realistic situations.

But when it comes to general constants, there are many different choices
that could be made when deciding which constants should be open-coded
and which shouldn't.  IMO we should leave the choice to the programmer
in those cases.  If the compiler gets it wrong, there will be no way
for the programmer to force the compiler's hand ("no, when I say svdiv,
I really do mean svdiv").

Thanks,
Richard

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC/RFA][PATCH 0/2] SVE intrinsics: Add strength reduction for division by constant.
  2024-07-17  7:29 ` Richard Sandiford
@ 2024-07-29 13:51   ` Jennifer Schmitz
  0 siblings, 0 replies; 3+ messages in thread
From: Jennifer Schmitz @ 2024-07-29 13:51 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 3829 bytes --]

On 17 Jul 2024, at 09:29, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>> This patch series is part of an ongoing effort to replace the SVE intrinsic svdiv
>> by lower-strength instructions for division by constant. To that end, we
>> implemented svdiv_impl::fold to perform the following transformation in gimple:
>> - Division where all divisors are the same power of 2 --> svasrd
> 
> Sounds good.
> 
>> - Division where all divisors are powers of 2 --> svasr
> 
> I don't think this is correct for negative dividends (which is why
> ASRD exists).  E.g. -1 / 4 is 0 as computed by svdiv (round towards zero),
> but -1 as computed by svasr (round towards -Inf).
You’re right, I dropped the second patch.
> 
>> We chose svdiv_impl::fold as location for the implementation to have the
>> transform applied as early as possible, such that other (existing or future)
>> gimple optimizations can be applied on the result.
>> Currently, the transform to is only applied for signed integers, because
>> there do not exist an unsigned svasrd and svasr. The transform has not (yet)
>> been implemented for svdivr.
> 
> FWIW, using svlsr for unsigned divisions should be OK.
Thanks for pointing that out, I adapted the patch to transform unsigned division by a power of 2 to svlsr.
> 
>> Please also comment/advise on the following:
>> In a next patch, we would like to replace SVE division by constants (other
>> than powers of 2) by multiply and shifts, similar as for scalar division.
>> This is planned to be implemented in the gimple_folder as well. Thoughts?
> 
> I'm a bit uneasy about going that far.  I suppose it comes down to a
> question about what intrinsics are for.  Are they for describing an
> algorithm, or for hand-optimising a specific implementation of the
> algorithm?  IMO it's the latter.
> 
> If people want to write out a calculation in natural arithmetic, it
> would be better to write the algorithm in scalar code and let the
> vectoriser handle it.  That gives the opportunity for many more
> optimisations than just this one.
> 
> Intrinsics are about giving programmers direct, architecture-level
> control over how something is implemented.  I've seen Arm's library
> teams go to great lengths to work out which out of a choice of
> instruction sequences is the best one, even though the sequences in
> question would look functionally equivalent to a smart-enough compiler.
> 
> So part of the work of using intrinsics is to figure out what the best
> sequence is.  And IMO, part of the contract is that the compiler
> shouldn't interfere with the programmer's choices too much.  If the
> compiler makes a change, it must very confident that it is a win for
> the function as a whole.
> 
> Replacing one division with one shift is fine, as an aid to the programmer.
> It removes the need for (say) templated functions to check for that case
> manually.  Constant folding is fine too, for similar reasons.  In these
> cases, there's not really a cost/benefit choice to be made between
> different expansions.  One choice is objectively better in all
> realistic situations.
> 
> But when it comes to general constants, there are many different choices
> that could be made when deciding which constants should be open-coded
> and which shouldn't.  IMO we should leave the choice to the programmer
> in those cases.  If the compiler gets it wrong, there will be no way
> for the programmer to force the compiler's hand ("no, when I say svdiv,
> I really do mean svdiv”).
Makes sense, then we will not pursue this further and only leave the current optimization.
Best, Jennifer
> 
> Thanks,
> Richard


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 4312 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-07-29 13:51 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-17  6:58 [RFC/RFA][PATCH 0/2] SVE intrinsics: Add strength reduction for division by constant Jennifer Schmitz
2024-07-17  7:29 ` Richard Sandiford
2024-07-29 13:51   ` Jennifer Schmitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).