public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/107772] New: [missed optimization] function prologue generated even though it's only needed in an unlikely path
From: avi at scylladb dot com @ 2022-11-20 18:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772

            Bug ID: 107772
           Summary: [missed optimization] function prologue generated even
                    though it's only needed in an unlikely path
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: avi at scylladb dot com
  Target Milestone: ---

Consider


int g(int);

void f(int* b, int* e) {
    while (b != e) {
        if (__builtin_expect(*b != 0, false)) [[unlikely]] {
            *b = g(*b);
        }
        ++b;
    }
}

If we believe the __builtin_expect and/or [[unlikely]] annotations (I used both
for extra safety), the loop usually does nothing. So we would expect any register
saving and restoring to be pushed to the unlikely path. Yet (-O3):

f(int*, int*):
        cmp     rdi, rsi
        je      .L10
        push    rbp
        mov     rbp, rsi
        push    rbx
        mov     rbx, rdi
        sub     rsp, 8
.L4:
        mov     edi, DWORD PTR [rbx]
        test    edi, edi
        jne     .L14
.L3:
        add     rbx, 4
        cmp     rbp, rbx
        jne     .L4
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret
.L14:
        call    g(int)
        mov     DWORD PTR [rbx], eax
        jmp     .L3
.L10:
        ret


I count 8 instructions that could/should have been pushed to .L14.


* [Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path
From: pinskia at gcc dot gnu.org @ 2022-11-20 19:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-11-20

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed, though this is more than just your normal shrink-wrapping case,
as you need to split the loop into two.

Though maybe having the prologue and epilogue around the function call instead
might be better ....

Anyway, this is still a more complex case for shrink wrapping.

I noticed that LLVM does not even do shrink wrapping for the early return if
b == e on entering the function.
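
For contrast, here is a minimal sketch of what the simpler, loop-free shrink-wrapping case might look like. It is an illustrative assumption, not taken from the report: the function h and the stand-in body for g are hypothetical. The early return needs no callee-saved registers, so the register saves could in principle be sunk below the null check. Whether a particular compiler version actually does this has not been verified here.

```cpp
// Stand-in for the external g() from the report; the real g is unknown.
[[gnu::noinline]] int g(int x) { return x + 1; }

// Hypothetical example of the simple case.
// Early exit: touches no callee-saved registers, so it needs no prologue.
// Slow part: p and first must survive a call to g, which is what
// forces callee-saved registers (and hence a prologue) on this path only.
int h(const int* p)
{
    if (p == nullptr)
        return 0;
    int first = g(p[0]);
    return first + g(p[1]);
}
```

In the reported function the calls sit inside the loop, so there is no single point below which the prologue can be sunk without re-running it on each iteration that reaches the call.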


* [Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path
From: avi at scylladb dot com @ 2022-11-28 18:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772

--- Comment #2 from Avi Kivity <avi at scylladb dot com> ---
I expect something like this:


f(int*, int*):
        cmp     rdi, rsi
        je      .L10
.L4:
        cmp     DWORD PTR [rdi], 0      # rdi = b, rsi = e
        jne     .L14
.L3:
        add     rdi, 4
        cmp     rdi, rsi
        jne     .L4
.L10:
        ret

.section .text.cold

.L14:
        push    rsi
        push    rdi
        sub     rsp, 8                  # keep the stack 16-byte aligned for the call
        mov     edi, DWORD PTR [rdi]    # argument for g
        call    g(int)
        add     rsp, 8
        pop     rdi
        pop     rsi
        mov     DWORD PTR [rdi], eax
        jmp     .L3


* [Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path
From: pinskia at gcc dot gnu.org @ 2022-11-28 18:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Avi Kivity from comment #2)
> I expect something like this:

Right, doing shrink wrapping like that is hard, and someone would need to add
full infrastructure for it. I doubt it will be implemented any time soon.


* [Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path
From: amonakov at gcc dot gnu.org @ 2022-11-28 18:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #4 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
You'll get better results from outlining a rare path manually: the
prologue/epilogue won't be re-executed for each invocation of 'g':

int g(int);

__attribute__((noinline,cold))
static void f_slowpath(int* b, int* e)
{
    switch (0)
    do {
        if (*b != 0)
            default: *b = g(*b);
    } while (++b != e);
}

void f(int* b, int* e)
{
    for (; b != e; b++)
        if (*b != 0) {
            f_slowpath(b, e);
            return;
        }
}
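
The `switch (0)` ... `default:` construct above jumps into the middle of the loop body, so the first element (already known to be nonzero) skips the test. A sketch of an equivalent form without the jump is given below. The body of g is a stand-in added only so the example is self-contained; the real g is external and unknown.

```cpp
#include <cassert>

// Stand-in for the external g() from the report; the real g is unknown.
[[gnu::noinline]] int g(int x) { return x + 1; }

// Cold continuation: entered only once f() has found a nonzero element at *b.
[[gnu::noinline, gnu::cold]]
static void f_slowpath(int* b, int* e)
{
    assert(b != e && *b != 0);  // precondition established by the caller
    *b = g(*b);                 // first element: the test was already done in f()
    for (++b; b != e; ++b)
        if (*b != 0)
            *b = g(*b);
}

// Hot loop: the only call is the hand-off to the slow path, after which f returns.
void f(int* b, int* e)
{
    for (; b != e; ++b)
        if (*b != 0) {
            f_slowpath(b, e);
            return;
        }
}
```

Both forms process the remainder of the range inside f_slowpath, so any register saving happens once per call to f rather than once per call to g.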


* [Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path
From: avi at scylladb dot com @ 2022-11-28 20:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772

--- Comment #5 from Avi Kivity <avi at scylladb dot com> ---
It indeed generates better code. However, it requires that I duplicate the
function body, which can be hard at times (consider f == std::transform and "if
(*b != 0) { *b = g(*b); }" as a lambda input).
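
One way the duplication might be confined to a single reusable helper is sketched below. The names for_each_rare and for_each_rare_slowpath are hypothetical (they are not standard-library functions), and the split into a predicate and an action is an assumption about how the lambda would be factored. The code it generates has not been checked here.

```cpp
#include <cassert>

// Hypothetical helper: cold, non-inlined continuation that processes the
// rest of the range once an element satisfying pred has been found at *b.
template <typename It, typename Pred, typename Slow>
[[gnu::noinline, gnu::cold]]
void for_each_rare_slowpath(It b, It e, Pred pred, Slow slow)
{
    assert(b != e);             // the caller found a matching element at *b
    slow(*b);                   // first element: pred was already checked
    for (++b; b != e; ++b)
        if (pred(*b))
            slow(*b);
}

// Hypothetical helper: hot loop that only evaluates pred and hands the
// remainder of the range to the cold continuation on the first match.
template <typename It, typename Pred, typename Slow>
void for_each_rare(It b, It e, Pred pred, Slow slow)
{
    for (; b != e; ++b)
        if (pred(*b)) {
            for_each_rare_slowpath(b, e, pred, slow);
            return;
        }
}
```

A caller would then write the loop body once, for example for_each_rare(b, e, [](int v) { return v != 0; }, [](int& v) { v = g(v); }). The loop is still written twice inside the helper, so this moves the duplication rather than removing it.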

