public inbox for gcc-bugs@sourceware.org
* [Bug target/115204] New: unnecessary stack usage and copies (of temporaries)
@ 2024-05-23 12:19 mkretz at gcc dot gnu.org
  2024-05-23 12:21 ` [Bug target/115204] " mkretz at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: mkretz at gcc dot gnu.org @ 2024-05-23 12:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115204

            Bug ID: 115204
           Summary: unnecessary stack usage and copies (of temporaries)
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mkretz at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

Test case (https://compiler-explorer.com/z/P7s75EhMr):

struct A {
  int data[8];
};

struct A gen();

void g(struct A);

void f()
{
  g(gen());
}

GCC places the A object returned by 'gen()' on the stack, copies it to a second
stack location, and then calls 'g'. Why? So instead of

f:
        sub     rsp, 40
        xor     eax, eax
        mov     rdi, rsp
        call    gen
        sub     rsp, 32
        movdqa  xmm0, XMMWORD PTR [rsp+32]
        movups  XMMWORD PTR [rsp], xmm0
        movdqa  xmm0, XMMWORD PTR [rsp+48]
        movups  XMMWORD PTR [rsp+16], xmm0
        call    g
        add     rsp, 72
        ret

can GCC just elide the copy? Like this:

f:
        sub     rsp, 40
        xor     eax, eax
        mov     rdi, rsp
        call    gen
        call    g
        add     rsp, 40
        ret


I understand that this optimization requires that the caller never reads from
the object again. So if 'g' is called a second time with the same object
returned from 'gen' (as in https://compiler-explorer.com/z/6rMYdnb34), the
first call to 'g' must get a copy. The second call, however, does not need one.
I.e.

int f()
{
  struct A a = gen();
  g(a);
  g(a);
  return 1;
}

compiles to

f:
        sub     rsp, 40
        xor     eax, eax
        mov     rdi, rsp
        call    gen
        sub     rsp, 32
        movdqa  xmm0, XMMWORD PTR [rsp+32]
        movups  XMMWORD PTR [rsp], xmm0
        movdqa  xmm0, XMMWORD PTR [rsp+48]
        movups  XMMWORD PTR [rsp+16], xmm0
        call    g
        movdqa  xmm0, XMMWORD PTR [rsp+32]
        movups  XMMWORD PTR [rsp], xmm0
        movdqa  xmm0, XMMWORD PTR [rsp+48]
        movups  XMMWORD PTR [rsp+16], xmm0
        call    g
        mov     eax, 1
        add     rsp, 72
        ret

but could be

f:
        sub     rsp, 40
        xor     eax, eax
        mov     rdi, rsp
        call    gen
        sub     rsp, 32
        movdqa  xmm0, XMMWORD PTR [rsp+32]
        movups  XMMWORD PTR [rsp], xmm0
        movdqa  xmm0, XMMWORD PTR [rsp+48]
        movups  XMMWORD PTR [rsp+16], xmm0
        call    g
        add     rsp, 32
        call    g
        mov     eax, 1
        add     rsp, 40
        ret
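
To illustrate why the first call needs its own copy, here is a minimal,
self-contained sketch. The bodies of 'gen' and 'g' are hypothetical stand-ins
(the report only declares them), and this 'f' returns a check result instead
of 1. It assumes nothing beyond standard C by-value semantics: the callee owns
its parameter object and may overwrite it, so the caller's 'a' must stay
intact for as long as the caller still reads it.

```c
#include <assert.h>

struct A {
  int data[8];
};

/* Hypothetical stand-in for the report's gen(): fills the object with 0..7. */
struct A gen(void)
{
  struct A a;
  for (int i = 0; i < 8; ++i)
    a.data[i] = i;
  return a;
}

/* Hypothetical stand-in for the report's g(): sums its by-value argument
   and then clobbers it, which is legal because the parameter is the
   callee's own object. */
int g(struct A a)
{
  int sum = 0;
  for (int i = 0; i < 8; ++i) {
    sum += a.data[i];
    a.data[i] = -1;
  }
  return sum;
}

/* Mirrors the two-call f() above, but returns whether both calls saw the
   same, unmodified object. */
int f(void)
{
  struct A a = gen();
  int first = g(a);   /* 'a' is read again below, so g must get a copy */
  int second = g(a);  /* last use of 'a': the copy the report wants to drop */
  return first == 28 && second == 28;
}
```

If the first call were handed the storage of 'a' directly, the second call
would observe the -1 values written by the first. After the last use, no such
observer remains, which is the case the report is about.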

IIUC, the second change would be significantly harder to implement because it
needs to shrink the stack between the two calls. However, I don't believe this
second case is as important. The first one should be sufficiently common
because of temporaries passed as function arguments. So the following variation

void f()
{
  g(gen(), gen());
}

is something I see often, leading to many unnecessary stack copies. Instead of

f:
        sub     rsp, 72
        xor     eax, eax
        mov     rdi, rsp
        call    gen
        lea     rdi, [rsp+32]
        xor     eax, eax
        call    gen
        sub     rsp, 64
        movdqa  xmm0, XMMWORD PTR [rsp+64]
        movups  XMMWORD PTR [rsp+32], xmm0
        movdqa  xmm0, XMMWORD PTR [rsp+80]
        movups  XMMWORD PTR [rsp+48], xmm0
        movdqa  xmm0, XMMWORD PTR [rsp+96]
        movups  XMMWORD PTR [rsp], xmm0
        movdqa  xmm0, XMMWORD PTR [rsp+112]
        movups  XMMWORD PTR [rsp+16], xmm0
        call    g
        add     rsp, 136
        ret

I think it should be:

f:
        sub     rsp, 72
        xor     eax, eax
        mov     rdi, rsp
        call    gen
        lea     rdi, [rsp+32]
        xor     eax, eax
        call    gen
        call    g
        add     rsp, 72
        ret
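
As background for the psABI remark that follows, here is a rough sketch of why
'struct A' goes through memory at all on x86_64. It assumes the x86-64 System V
psABI, under which an aggregate made only of integer fields is passed in memory
once it exceeds 16 bytes (two eightbytes). The helper name
'int_aggregate_in_memory' is hypothetical and deliberately ignores the
vector-type exceptions in the real classification rules.

```c
#include <assert.h>
#include <stddef.h>

struct A {
  int data[8];
};

/* Hypothetical helper: simplified size rule for aggregates that contain
   only integer fields. Not the full psABI classification algorithm. */
int int_aggregate_in_memory(size_t size)
{
  return size > 16;
}

/* An array of int has no padding, so the struct is exactly 8 ints wide
   (32 bytes with a 4-byte int, as on x86_64). */
_Static_assert(sizeof(struct A) == 8 * sizeof(int),
               "struct A holds 8 ints without padding");
```

With a 4-byte int, 'int_aggregate_in_memory(sizeof(struct A))' is true, which
matches the listings above: 'gen' receives the address of the result slot in
rdi, and 'g' expects its argument on the stack.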

IIUC, this depends on the psABI and I don't know how target-dependent such an
optimization is. That's why I


* [Bug target/115204] unnecessary stack usage and copies (of temporaries)
  2024-05-23 12:19 [Bug target/115204] New: unnecessary stack usage and copies (of temporaries) mkretz at gcc dot gnu.org
@ 2024-05-23 12:21 ` mkretz at gcc dot gnu.org
  2024-05-23 12:21 ` pinskia at gcc dot gnu.org
  2024-05-23 12:29 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: mkretz at gcc dot gnu.org @ 2024-05-23 12:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115204

--- Comment #1 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
That's why I tagged it as 'target'. I'd be happy to learn that it can be
resolved target-independently.


* [Bug target/115204] unnecessary stack usage and copies (of temporaries)
  2024-05-23 12:19 [Bug target/115204] New: unnecessary stack usage and copies (of temporaries) mkretz at gcc dot gnu.org
  2024-05-23 12:21 ` [Bug target/115204] " mkretz at gcc dot gnu.org
@ 2024-05-23 12:21 ` pinskia at gcc dot gnu.org
  2024-05-23 12:29 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-05-23 12:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115204

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I am 99% sure there is a dup of this bug already.


* [Bug target/115204] unnecessary stack usage and copies (of temporaries)
  2024-05-23 12:19 [Bug target/115204] New: unnecessary stack usage and copies (of temporaries) mkretz at gcc dot gnu.org
  2024-05-23 12:21 ` [Bug target/115204] " mkretz at gcc dot gnu.org
  2024-05-23 12:21 ` pinskia at gcc dot gnu.org
@ 2024-05-23 12:29 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-05-23 12:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115204

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Dup.

*** This bug has been marked as a duplicate of bug 28831 ***
