public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/111844] New: missed optimization
@ 2023-10-17  4:17 113245 at gmail dot com
  2023-10-17  4:23 ` [Bug tree-optimization/111844] " pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: 113245 at gmail dot com @ 2023-10-17  4:17 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111844

            Bug ID: 111844
           Summary: missed optimization
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: 113245 at gmail dot com
  Target Milestone: ---

Hello,

The following code compiles and optimizes to something reasonable under -O2
-std=c++14 with gcc trunk (Oct 16, d5cfabc677b08f38ea5d5f85deeda746b4fabb88):


#include <cstring>

extern void bar();

struct P {
    unsigned int x;
    unsigned int y;
    unsigned int z[20];
};

void foo(void* buf, int inc) {
    P p;
    memcpy(&p, buf, sizeof(p));
    p.x += inc;
    memcpy(buf, &p, sizeof(p));

    // bar();
}


This results in assembly that loads only the first 16 bytes of 'buf', i.e. the
chunk that contains p.x:

foo(void*, int):
        movdqu  xmm0, XMMWORD PTR [rdi]
        movaps  XMMWORD PTR [rsp-104], xmm0
        add     DWORD PTR [rsp-104], esi
        movdqa  xmm0, XMMWORD PTR [rsp-104]
        movups  XMMWORD PTR [rdi], xmm0
        ret

However, reintroducing the call to bar() results in significantly worse
assembly: it appears to want to copy the entire struct `p` out of buf onto the
stack, even though only the first 16 bytes are written back, so almost all of
the movdqu/movaps pairs are not useful.

foo(void*, int):
        movdqu  xmm0, XMMWORD PTR [rdi]
        mov     rax, QWORD PTR [rdi+80]
        movaps  XMMWORD PTR [rsp-104], xmm0
        movdqu  xmm0, XMMWORD PTR [rdi+16]
        add     DWORD PTR [rsp-104], esi
        movaps  XMMWORD PTR [rsp-88], xmm0
        movdqu  xmm0, XMMWORD PTR [rdi+32]
        mov     QWORD PTR [rsp-24], rax
        movaps  XMMWORD PTR [rsp-72], xmm0
        movdqu  xmm0, XMMWORD PTR [rdi+48]
        movaps  XMMWORD PTR [rsp-56], xmm0
        movdqu  xmm0, XMMWORD PTR [rdi+64]
        movaps  XMMWORD PTR [rsp-40], xmm0
        movdqa  xmm0, XMMWORD PTR [rsp-104]
        movups  XMMWORD PTR [rdi], xmm0
        jmp     bar()

For comparison, several versions of clang with the same flags will optimize
this to:

foo(void*, int):
        add     dword ptr [rdi], esi
        jmp     bar()
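
For reference, clang's output is what one would expect from source that only
touches the bytes of p.x. A minimal sketch of such a hand-narrowed version
follows; foo_narrow and the bar() stub with its counter are made up for
illustration and are not part of the test case above, and I have not checked
what GCC emits for it.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for the external bar() from the test case. The counter is
// hypothetical and exists only so that the call is observable.
static int bar_calls = 0;
void bar() { ++bar_calls; }

struct P {
    unsigned int x;
    unsigned int y;
    unsigned int z[20];
};

// Hand-narrowed variant (hypothetical name): copy only the bytes of P::x
// in and out of buf instead of round-tripping the whole struct.
void foo_narrow(void* buf, int inc) {
    unsigned char* field = static_cast<unsigned char*>(buf) + offsetof(P, x);

    unsigned int x;
    std::memcpy(&x, field, sizeof(x));
    x += inc;
    std::memcpy(field, &x, sizeof(x));

    bar();
}
```

This has the same observable effect on buf as foo() for any buffer of at least
sizeof(P) bytes, since foo() writes every other byte back unchanged.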

I am not sure why the loads to the stack-local `P p` are not elided. My first
thought was that perhaps escape analysis on &p forces the full load in case
memcpy "saves" the address of `p` for use by bar(). I would have expected that
wrapping the {decl/memcpy/increment/memcpy} in its own scope would address
that, but it seems to have no effect.
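
The scoped variant mentioned above looks roughly like the following. This is a
reconstruction of the idea rather than the exact source that was compiled;
foo_scoped and the bar() stub with its counter are hypothetical names used
only for illustration.

```cpp
#include <cstring>

// Stand-in for the external bar() from the test case. The counter is
// hypothetical and exists only so that the call is observable.
static int bar_calls = 0;
void bar() { ++bar_calls; }

struct P {
    unsigned int x;
    unsigned int y;
    unsigned int z[20];
};

// Same body as foo(), but `p` lives in an inner block so that its lifetime
// has already ended by the time bar() is called.
void foo_scoped(void* buf, int inc) {
    {
        P p;
        std::memcpy(&p, buf, sizeof(p));
        p.x += inc;
        std::memcpy(buf, &p, sizeof(p));
    }  // p goes out of scope here

    bar();
}
```

Since `p` no longer exists when bar() runs, nothing bar() does could
legitimately observe its contents, which is why I expected the extra stores to
the stack to disappear.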

Thanks


end of thread, other threads:[~2023-11-16 19:40 UTC | newest]

Thread overview: 4+ messages
2023-10-17  4:17 [Bug tree-optimization/111844] New: missed optimization 113245 at gmail dot com
2023-10-17  4:23 ` [Bug tree-optimization/111844] " pinskia at gcc dot gnu.org
2023-10-17  6:54 ` rguenth at gcc dot gnu.org
2023-11-16 19:40 ` jamborm at gcc dot gnu.org
