public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/107772] New: [missed optimization] function prologue generated even though it's only needed in an unlikely path
@ 2022-11-20 18:48 avi at scylladb dot com
2022-11-20 19:23 ` [Bug rtl-optimization/107772] " pinskia at gcc dot gnu.org
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: avi at scylladb dot com @ 2022-11-20 18:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772
Bug ID: 107772
Summary: [missed optimization] function prologue generated even
though it's only needed in an unlikely path
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: avi at scylladb dot com
Target Milestone: ---
Consider
int g(int);
void f(int* b, int* e) {
    while (b != e) {
        if (__builtin_expect(*b != 0, false)) [[unlikely]] {
            *b = g(*b);
        }
        ++b;
    }
}
If we believe the __builtin_expect and/or [[unlikely]] annotations (I used both
for extra safety), the loop body usually does nothing. So we would expect any
register saving and restoring to be pushed to the unlikely path. Yet (-O3):
f(int*, int*):
        cmp     rdi, rsi
        je      .L10
        push    rbp
        mov     rbp, rsi
        push    rbx
        mov     rbx, rdi
        sub     rsp, 8
.L4:
        mov     edi, DWORD PTR [rbx]
        test    edi, edi
        jne     .L14
.L3:
        add     rbx, 4
        cmp     rbp, rbx
        jne     .L4
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret
.L14:
        call    g(int)
        mov     DWORD PTR [rbx], eax
        jmp     .L3
.L10:
        ret
I count 8 instructions that could/should have been pushed to .L14.
* [Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path
From: pinskia at gcc dot gnu.org @ 2022-11-20 19:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-11-20
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed, though this is more than just your normal shrink-wrapping case, as
you need to split the loop into two.
Though maybe having the prologue and epilogue around the function call instead
might be better ...
Anyway, this is still a more complex case for shrink wrapping.
I noticed that LLVM does not even do shrink wrapping for the early return if
b == e on entering the function.
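
For contrast, here is a minimal sketch of what the "normal" shrink-wrapping case
looks like at the source level. The function sum_g_twice and the body of g are
hypothetical stand-ins, not code from this report. Only the path after the early
return keeps values live across calls, so a compiler that shrink-wraps may emit
the register saves after the null check rather than at function entry. Whether
it actually does depends on the compiler, target, and options.

```cpp
#include <cassert>

// Stand-in for an opaque external function (an assumption for this sketch).
[[gnu::noinline]] int g(int x) { return x * 2 + 1; }

// Hypothetical example of the simple shrink-wrapping case.
int sum_g_twice(const int* p) {
    if (p == nullptr)
        return 0;            // fast path: no calls, nothing to save
    int first = g(*p);       // `p` must survive this call...
    int second = g(*p + 1);  // ...and `first` must survive this one
    return first + second;
}
```

Compiling this with -O2 and comparing the output with and without
-fno-shrink-wrap is one way to see where the saves end up.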
* [Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path
From: avi at scylladb dot com @ 2022-11-28 18:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772
--- Comment #2 from Avi Kivity <avi at scylladb dot com> ---
I expect something like this:
f(int*, int*):
        cmp     rdi, rsi
        je      .L10
.L4:
        cmp     DWORD PTR [rdi], 0      # b arrives in rdi, e in rsi
        jne     .L14
.L3:
        add     rdi, 4
        cmp     rdi, rsi
        jne     .L4
.L10:
        ret
        .section .text.cold
.L14:
        push    rsi
        push    rdi
        sub     rsp, 8                  # keep rsp 16-byte aligned at the call
        mov     edi, DWORD PTR [rdi]
        call    g(int)
        add     rsp, 8
        pop     rdi
        pop     rsi
        mov     DWORD PTR [rdi], eax
        jmp     .L3
* [Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path
From: pinskia at gcc dot gnu.org @ 2022-11-28 18:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Avi Kivity from comment #2)
> I expect something like this:
Right, doing shrink wrapping like that is really "hard", and someone would need
to add a full infrastructure for it. I doubt it will be implemented any time
soon.
* [Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path
From: amonakov at gcc dot gnu.org @ 2022-11-28 18:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772
Alexander Monakov <amonakov at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org
--- Comment #4 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
You'll get better results from outlining a rare path manually: the
prologue/epilogue won't be re-executed for each invocation of 'g':
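int g(int);
/* Cold, out-of-line loop: entered only once a nonzero element is found. */
__attribute__((noinline,cold))
static void f_slowpath(int* b, int* e)
{
    /* switch (0) jumps straight to "default:", skipping the first test,
       because the caller already knows that *b != 0. */
    switch (0)
    do {
        if (*b != 0)
    default:    *b = g(*b);
    } while (++b != e);
}
void f(int* b, int* e)
{
    for (; b != e; b++)
        if (*b != 0) {
            f_slowpath(b, e);   /* processes the rest of the range */
            return;
        }
}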
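
The manual outlining above changes only where the work happens, not the result.
A quick way to convince oneself is to run both versions on the same input and
compare. The sketch below assumes a stand-in definition of g, and the names
f_original, f_outlined, and same_result are hypothetical helpers introduced only
for this check.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the external g (an assumption for this sketch).
int g(int x) { return x + 100; }

// The loop as written in the original report.
void f_original(int* b, int* e) {
    while (b != e) {
        if (__builtin_expect(*b != 0, 0)) {
            *b = g(*b);
        }
        ++b;
    }
}

// The manually outlined version from the comment above.
__attribute__((noinline, cold))
static void f_slowpath(int* b, int* e) {
    switch (0)
    do {
        if (*b != 0)
    default:    *b = g(*b);
    } while (++b != e);
}

void f_outlined(int* b, int* e) {
    for (; b != e; b++)
        if (*b != 0) {
            f_slowpath(b, e);
            return;
        }
}

// Runs both versions on copies of the same data and compares the results.
bool same_result(std::vector<int> v) {
    std::vector<int> w = v;
    f_original(v.data(), v.data() + v.size());
    f_outlined(w.data(), w.data() + w.size());
    return v == w;
}
```

Calling same_result on empty, all-zero, and mixed inputs exercises the early
exit, the pure fast path, and the hand-off to the slow path respectively.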
* [Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path
From: avi at scylladb dot com @ 2022-11-28 20:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772
--- Comment #5 from Avi Kivity <avi at scylladb dot com> ---
It indeed generates better code. However, it requires that I duplicate the
function body, which can be hard at times (consider f == std::transform and "if
(*b != 0) { *b = g(*b); }" as a lambda input).
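
To illustrate the concern, here is one possible reading of that scenario. It
uses std::for_each rather than std::transform, since for_each is the closer
match for an in-place conditional update. The name f_generic and the body of g
are assumptions made for this sketch. The loop now lives inside the standard
library, so there is no loop body in user code to split into fast and slow
halves by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for the external g (an assumption for this sketch).
int g(int x) { return -x; }

// Hypothetical generic formulation: the iteration is owned by the algorithm,
// and only the per-element body is supplied by the caller.
void f_generic(int* b, int* e) {
    std::for_each(b, e, [](int& x) {
        if (__builtin_expect(x != 0, 0)) {
            x = g(x);
        }
    });
}
```

Applying the manual outlining here would mean re-implementing the algorithm's
loop, which is the duplication the comment above objects to.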