public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/97738] New: Optimizing division by value & - value for HAKMEM 175
From: tkoenig at gcc dot gnu.org @ 2020-11-06 8:23 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97738
Bug ID: 97738
Summary: Optimizing division by value & - value for HAKMEM 175
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tkoenig at gcc dot gnu.org
Target Milestone: ---
A straightforward implementation of HAKMEM 175 (returning
the next larger number with the same number of set bits) is
unsigned int
next_same_bit (unsigned int value)
{
unsigned int lowest_bit;
unsigned int left_bits;
unsigned int changed_bits;
unsigned int right_bits;
lowest_bit = value & - value;
left_bits = value + lowest_bit;
changed_bits = value ^ left_bits;
right_bits = (changed_bits / lowest_bit) >> 2;
return left_bits | right_bits;
}
In two's complement, this can be replaced by
unsigned int
next_s_bit (unsigned int value)
{
unsigned int lowest_bit;
unsigned int ctz;
unsigned int left_bits;
unsigned int changed_bits;
unsigned int right_bits;
ctz = __builtin_ctz (value);
lowest_bit = 1u << ctz;
left_bits = value + lowest_bit;
changed_bits = value ^ left_bits;
right_bits = changed_bits >> (ctz + 2);
return left_bits | right_bits;
}
to replace the expensive division by what is known to be a
power of two by a shift.
That transformation is counter-productive (and might be done
the other way) if there is no division by lowest_bit.
* [Bug middle-end/97738] Optimizing division by value & - value for HAKMEM 175
From: rguenth at gcc dot gnu.org @ 2020-11-06 8:45 UTC
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Component|rtl-optimization |middle-end
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, I guess we should look at replacing
x / y
with y known to have exactly one bit set by
x >> ctz (y)
Note I'm quite sure this isn't faster for all power-of-two y.
It's also not canonically simpler. In the end, this is something for
instruction selection / RTL expansion, I guess.
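The identity behind this replacement can be checked with a small sketch. The names div_by_pow2 and count_div_mismatches are hypothetical helpers made up for this illustration, and the code assumes a 32-bit unsigned int and the GCC builtin __builtin_ctz; it is not a proposal for how the compiler should implement the transformation.

```c
/* Hypothetical helper: divide x by y, assuming y is a nonzero power of
   two.  __builtin_ctz (y) is then the exponent, so the shift count is
   at most 31 and the shift is well defined for a 32-bit unsigned int.  */
unsigned int
div_by_pow2 (unsigned int x, unsigned int y)
{
  return x >> __builtin_ctz (y);
}

/* Compare the shift against a real division for every power of two
   and a handful of sample dividends; returns the number of mismatches.  */
unsigned int
count_div_mismatches (void)
{
  static const unsigned int samples[] =
    { 0u, 1u, 2u, 3u, 1000u, 0x12345678u, 0x80000000u, 0xffffffffu };
  unsigned int mismatches = 0;

  for (unsigned int k = 0; k < 32; k++)
    {
      unsigned int y = 1u << k;
      for (unsigned int i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (samples[i] / y != div_by_pow2 (samples[i], y))
          mismatches++;
    }
  return mismatches;
}
```

Calling count_div_mismatches () from a test program should return 0 under these assumptions; whether the shift is actually cheaper than the division is the target-dependent question raised above.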
* [Bug middle-end/97738] Optimizing division by value & - value for HAKMEM 175
From: tkoenig at gcc dot gnu.org @ 2020-11-06 14:21 UTC
--- Comment #2 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Created attachment 49516
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49516&action=edit
Small benchmark
Here's a small benchmark for counting all 32-bit numbers with 16 bits set
according to the HAKMEM source.
Timing is as follows (on each line, the float is the elapsed time in seconds and
the integer is the computed count; the first line is the version with division,
the second line is the version with the shift):
2.319526 601080391
1.147284 601080391
with -O3 -march=native on an AMD Ryzen 7 1700X,
4.539288 601080391
2.700514 601080391
on POWER9.
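The attachment itself is not reproduced in this thread. As a rough sketch of what such a counting loop could look like (not the attached benchmark), the following walks all values with a given number of set bits below a limit, using either variant. The names next_div, next_shift and count_from are hypothetical, a 32-bit unsigned int is assumed, and the start value must be nonzero.

```c
/* Division variant, as in the original report.  */
unsigned int
next_div (unsigned int value)
{
  unsigned int lowest_bit = value & -value;
  unsigned int left_bits = value + lowest_bit;
  unsigned int changed_bits = value ^ left_bits;
  return left_bits | ((changed_bits / lowest_bit) >> 2);
}

/* Shift variant; the two shifts are kept separate so that the shift
   count never reaches the width of unsigned int.  */
unsigned int
next_shift (unsigned int value)
{
  unsigned int ctz = __builtin_ctz (value);
  unsigned int lowest_bit = value & -value;
  unsigned int left_bits = value + lowest_bit;
  unsigned int changed_bits = value ^ left_bits;
  return left_bits | ((changed_bits >> ctz) >> 2);
}

/* Count the values reached from START (nonzero) that stay below LIMIT,
   stepping with NEXT.  This sketch does not handle a limit of 2^32;
   a full 32-bit run would have to detect wraparound instead.  */
unsigned long
count_from (unsigned int start, unsigned int limit,
            unsigned int (*next) (unsigned int))
{
  unsigned long count = 0;
  for (unsigned int v = start; v < limit; v = next (v))
    count++;
  return count;
}
```

For example, count_from (0x3ffu, 1u << 20, next_div) enumerates the 20-bit values with 10 bits set, and wrapping each call in clock () from <time.h> gives a simple way to time the two variants against each other.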
* [Bug middle-end/97738] Optimizing division by value & - value for HAKMEM 175
From: tkoenig at gcc dot gnu.org @ 2020-11-06 17:52 UTC
--- Comment #3 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Even faster code:
ctz = __builtin_ctz (value);
lowest_bit = value & - value;
left_bits = value + lowest_bit;
changed_bits = value ^ left_bits;
right_bits = changed_bits >> (ctz + 2);
return left_bits | right_bits;
The first two statements get compiled directly (with -march=native)
to
blsi %edi, %edx
tzcntl %edi, %eax
* [Bug middle-end/97738] Optimizing division by value & - value for HAKMEM 175
From: jakub at gcc dot gnu.org @ 2020-11-06 18:27 UTC
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
What about a version that still sets lowest_bit to value & -value; rather than
1 << ctz?
Also, I'm not sure you can safely do the (changed_bits >> ctz) >> 2 to
changed_bits >> (ctz + 2) transformation, while because of the division one can
count on value not being 0 (otherwise UB), value & -value can still be e.g. 1U
<< 31 and then ctz 31 too, and changed_bits >> (31 + 2) being UB, while
(changed_bits >> 31) >> 2 well defined returning 0.
So, I think we could e.g. during expansion (or isel) based on target cost
optimize
x / (y & -y) to x >> __builtin_ctz (y) (also assuming the optab for ctz
exists), but anything else looks complicated.
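To make the edge case concrete, here is a minimal sketch of a variant that keeps lowest_bit as value & -value and replaces only the division, leaving the >> 2 as a separate shift. The function names are hypothetical and a 32-bit unsigned int is assumed; this illustrates the source-level rewrite, not what the expander would emit.

```c
/* Reference: the division form from the original report.  */
unsigned int
next_same_bit_div (unsigned int value)
{
  unsigned int lowest_bit = value & -value;
  unsigned int left_bits = value + lowest_bit;
  unsigned int changed_bits = value ^ left_bits;
  unsigned int right_bits = (changed_bits / lowest_bit) >> 2;
  return left_bits | right_bits;
}

/* Only x / (y & -y) is replaced by x >> __builtin_ctz (y).  The shift
   by 2 stays separate: with value == 1U << 31, ctz is 31, and a single
   shift by 31 + 2 would be undefined, while two shifts are fine.
   value == 0 is undefined here, just as it is for the division.  */
unsigned int
next_same_bit_two_shifts (unsigned int value)
{
  unsigned int ctz = __builtin_ctz (value);
  unsigned int lowest_bit = value & -value;
  unsigned int left_bits = value + lowest_bit;
  unsigned int changed_bits = value ^ left_bits;
  unsigned int right_bits = (changed_bits >> ctz) >> 2;
  return left_bits | right_bits;
}
```

For value == 1U << 31 both functions return 0, because left_bits wraps to 0 and right_bits is 1 >> 2.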
* [Bug middle-end/97738] Optimizing division by value & - value for HAKMEM 175
From: tkoenig at gcc dot gnu.org @ 2020-11-07 10:21 UTC
--- Comment #5 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #4)
> What about a version that still sets lowest_bit to value & -value; rather
> than 1 << ctz?
I think this would be ideal, or close to it.
> Also, I'm not sure you can safely do the (changed_bits >> ctz) >> 2 to
> changed_bits >> (ctz + 2) transformation, while because of the division one
> can count on value not being 0 (otherwise UB), value & -value can still be
> e.g. 1U << 31 and then ctz 31 too, and changed_bits >> (31 + 2) being UB,
> while (changed_bits >> 31) >> 2 well defined returning 0.
OK.
> So, I think we could e.g. during expansion (or isel) based on target cost
> optimize
> x / (y & -y) to x >> __builtin_ctz (y) (also assuming the optab for ctz
> exists), but anything else looks complicated.
I think this would solve the issue for the original code (which is
what people will find on the web if they google for HAKMEM 175).
* [Bug middle-end/97738] Optimizing division by value & - value for HAKMEM 175
From: pinskia at gcc dot gnu.org @ 2021-09-26 8:23 UTC
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Last reconfirmed| |2021-09-26
Severity|normal |enhancement
Status|UNCONFIRMED |NEW
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
I think x/(y&-y) should be expanded as x >> ctz (y) (if ctz is an
opcode), but this should be done only at expand time (unless we get a "lower"
gimple phase).
* [Bug middle-end/97738] Optimizing division by value & - value for HAKMEM 175
From: pinskia at gcc dot gnu.org @ 2021-09-26 8:24 UTC
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that we would not need to match y&-y specifically if we kept track of the
popcount of the SSA_NAME. But we don't have that yet.