public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
@ 2024-06-19 15:15 burnus at gcc dot gnu.org
2024-06-20 1:58 ` [Bug middle-end/115551] " xry111 at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: burnus at gcc dot gnu.org @ 2024-06-19 15:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551
Bug ID: 115551
Summary: [missed optimization] "c1 << (a + c2)" not optimized
into "(c1 << c2) << a"
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: burnus at gcc dot gnu.org
CC: pinskia at gcc dot gnu.org
Target Milestone: ---
Created attachment 58468
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58468&action=edit
patch to show how to get a nice output – but doesn't actually use it. Not to be
used..
"c1 << (a + c2)" not optimized into "(c1 << c2) << a"
Example:
int f(int ch) {
unsigned long mask1 = ((((1UL))) << (1 + 4 * ((1) - 1))) << (ch * 4);
unsigned long mask2 = ((((1UL))) << (1 + 4 * ((ch + 1) - 1)));
return mask1-mask2;
}
GCC converts this currently to:
mask1 = 2 << (ch * 4)
mask2 = 1 << (ch * 4 + 1)
* * *
Related to
https://lore.kernel.org/lkml/d7ef7a6158df4ba6687233b0e00d37796b069fb3.1718791090.git.u.kleine-koenig@baylibre.com/
Result:
* With the 2nd form the resulting binary gets ~25% smaller
* Saving nearly 500 bytes!
* * *
On ARM, the generated code for mask1 is:
lsls r0, r0, #2
movs r3, #2
lsl.w r0, r3, r0
and for mask2:
lsls r0, r0, #2
adds r0, #1 // additional 'adds' instruction
movs r3, #1
lsl.w r0, r3, r0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
@ 2024-06-20 1:58 ` xry111 at gcc dot gnu.org
2024-06-20 6:57 ` burnus at gcc dot gnu.org
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: xry111 at gcc dot gnu.org @ 2024-06-20 1:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551
Xi Ruoyao <xry111 at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |xry111 at gcc dot gnu.org
--- Comment #1 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
"c1 << (-5 + 5)" is fine but "(c1 << -5) << 5" invokes undefined behavior.
Thus we need some range info to do this optimization.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
2024-06-20 1:58 ` [Bug middle-end/115551] " xry111 at gcc dot gnu.org
@ 2024-06-20 6:57 ` burnus at gcc dot gnu.org
2024-06-20 7:21 ` burnus at gcc dot gnu.org
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: burnus at gcc dot gnu.org @ 2024-06-20 6:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551
--- Comment #2 from Tobias Burnus <burnus at gcc dot gnu.org> ---
> Thus we need some range info to do this optimization.
Good point.
It seems as if for c1 << (c2 * a + c3), C requires a >= -c3/c2 (read as float
division; c2 ≠ 0)
And the suggested optimization requires c2*a >= 0 and c3 >= 0 to fulfill C
requirement of nonnegative shifts.
Thus, this is fulfilled for any value of 'a' if c3 >= 0 and abs(c2) > c3.
The optimization can also be done for any value of 'a', if the hardware
supports c1 << (negative value) (as right shift, fillung with zeros) and
popcount(c1) == popcount(c1 << c3).
The first condition is fulfilled in this example.
I don't know about the second, but observed that Clang/LLVM optimizes the diff
mask1-mask2 to 0 on ARM but not x86_64 (not checked why nor whether ARM handles
negative shifts in a well-defined way or not).
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
2024-06-20 1:58 ` [Bug middle-end/115551] " xry111 at gcc dot gnu.org
2024-06-20 6:57 ` burnus at gcc dot gnu.org
@ 2024-06-20 7:21 ` burnus at gcc dot gnu.org
2024-06-20 7:32 ` rguenth at gcc dot gnu.org
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: burnus at gcc dot gnu.org @ 2024-06-20 7:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551
--- Comment #3 from Tobias Burnus <burnus at gcc dot gnu.org> ---
As we want to have a >= 0, I tried to convey it differently for the example in
comment 0:
(a) __attribute__((assume(ch >= 0)));
(b) 'unsigned ch' (instead of 'int ch')
but it didn't help. Thus, it looks as if it could be implemented for:
c1 << (a + c2)
if (a >= 0 and c2 >= 0) doing then the conversion
(c1 << c2) << a
Whether Ranger handles abs(c2) > c3 >= 0 → a >= 0 for 'c1 << (a*c2 + c3)', I
don't know, but the variant above with unsigned and 'assume(a >= 0)' should be
implementable and seems to make sense.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
` (2 preceding siblings ...)
2024-06-20 7:21 ` burnus at gcc dot gnu.org
@ 2024-06-20 7:32 ` rguenth at gcc dot gnu.org
2024-06-20 7:55 ` jakub at gcc dot gnu.org
2024-06-20 7:58 ` burnus at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-20 7:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
all same for right-shifts (both logical and arithmetic).
Note that 1 << (a + 5) might be cheaper than (1<<5) << a due to constraints
on immediates but for GIMPLE the latter is definitely more canonical.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
` (3 preceding siblings ...)
2024-06-20 7:32 ` rguenth at gcc dot gnu.org
@ 2024-06-20 7:55 ` jakub at gcc dot gnu.org
2024-06-20 7:58 ` burnus at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-06-20 7:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
We should also verify that it doesn't stand in the way of shift sanitization,
because
unsigned int a;
...
1 << (5 + a);
is well defined only for a in [0, 26] while once we optimize it to
(1 << 5) << a;
that would be well defined for a in [0, 31]. I think the shift sanitization is
done early, so just something to be verified in a testcase next to the patch.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/115551] [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
` (4 preceding siblings ...)
2024-06-20 7:55 ` jakub at gcc dot gnu.org
@ 2024-06-20 7:58 ` burnus at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: burnus at gcc dot gnu.org @ 2024-06-20 7:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551
--- Comment #6 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Crossref: New Bug 115555 is for the range analysis to deduce from 'x << a' that
'a' must be nonnegative.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-06-20 7:58 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-19 15:15 [Bug middle-end/115551] New: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a" burnus at gcc dot gnu.org
2024-06-20 1:58 ` [Bug middle-end/115551] " xry111 at gcc dot gnu.org
2024-06-20 6:57 ` burnus at gcc dot gnu.org
2024-06-20 7:21 ` burnus at gcc dot gnu.org
2024-06-20 7:32 ` rguenth at gcc dot gnu.org
2024-06-20 7:55 ` jakub at gcc dot gnu.org
2024-06-20 7:58 ` burnus at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).