public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/111844] New: missed optimization
@ 2023-10-17  4:17 113245 at gmail dot com
  2023-10-17  4:23 ` [Bug tree-optimization/111844] " pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: 113245 at gmail dot com @ 2023-10-17  4:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111844

            Bug ID: 111844
           Summary: missed optimization
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: 113245 at gmail dot com
  Target Milestone: ---

Hello,

The following code compiles and optimizes to something reasonable under -O2
-std=c++14 with gcc trunk (Oct 16, d5cfabc677b08f38ea5d5f85deeda746b4fabb88)


#include <cstring>

extern void bar();

struct P {
    unsigned int x;
    unsigned int y;
    unsigned int z[20];
};

void foo(void* buf, int inc) {
    P p;
    memcpy(&p, buf, sizeof(p)) ;
    p.x += inc;
    memcpy(buf, &p, sizeof(p)) ;

    // bar();
}


Results in assembly that only loads the portion of data from 'buf' that
corresponds to p.x.

foo(void*, int):
        movdqu  xmm0, XMMWORD PTR [rdi]
        movaps  XMMWORD PTR [rsp-104], xmm0
        add     DWORD PTR [rsp-104], esi
        movdqa  xmm0, XMMWORD PTR [rsp-104]
        movups  XMMWORD PTR [rdi], xmm0
        ret

However, reintroducing the call to bar() results in significantly worse
assembly; it appears to want to copy the entire struct `p` out of buf, even
though almost all of the movaps instructions are not useful.

foo(void*, int):
        movdqu  xmm0, XMMWORD PTR [rdi]
        mov     rax, QWORD PTR [rdi+80]
        movaps  XMMWORD PTR [rsp-104], xmm0
        movdqu  xmm0, XMMWORD PTR [rdi+16]
        add     DWORD PTR [rsp-104], esi
        movaps  XMMWORD PTR [rsp-88], xmm0
        movdqu  xmm0, XMMWORD PTR [rdi+32]
        mov     QWORD PTR [rsp-24], rax
        movaps  XMMWORD PTR [rsp-72], xmm0
        movdqu  xmm0, XMMWORD PTR [rdi+48]
        movaps  XMMWORD PTR [rsp-56], xmm0
        movdqu  xmm0, XMMWORD PTR [rdi+64]
        movaps  XMMWORD PTR [rsp-40], xmm0
        movdqa  xmm0, XMMWORD PTR [rsp-104]
        movups  XMMWORD PTR [rdi], xmm0
        jmp     bar()

For comparison, several versions of clang with the same flags will optimize
this to:

foo(void*, int):
        add     dword ptr [rdi], esi
        jmp     bar()

I am not sure why the loads to the stack-local `P p` are not elided; my first
thought was that perhaps escape analysis on &p forces the full load in case
memcpy "saves" the address of `p` for use by bar(); I would have expected that
wrapping the {decl/memcpy/increment/memcpy} in it's own scope would address
that but it seems to have no effect.

Thanks

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug tree-optimization/111844] missed optimization
  2023-10-17  4:17 [Bug tree-optimization/111844] New: missed optimization 113245 at gmail dot com
@ 2023-10-17  4:23 ` pinskia at gcc dot gnu.org
  2023-10-17  6:54 ` rguenth at gcc dot gnu.org
  2023-11-16 19:40 ` jamborm at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-17  4:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111844

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-10-17
             Status|UNCONFIRMED                 |NEW
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect LLVM is able to just optimize it to:
#include <cstddef>


void foo1(void* buf, int inc) {
    unsigned int px;
    memcpy(&px, ((char*)buf)+offsetof(P, x), sizeof(px)) ;
    px += inc;
    memcpy(((char*)buf)+offsetof(P, x), &px, sizeof(px)) ;

   //  bar();
}

As it can see that is the only location is changed ...

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug tree-optimization/111844] missed optimization
  2023-10-17  4:17 [Bug tree-optimization/111844] New: missed optimization 113245 at gmail dot com
  2023-10-17  4:23 ` [Bug tree-optimization/111844] " pinskia at gcc dot gnu.org
@ 2023-10-17  6:54 ` rguenth at gcc dot gnu.org
  2023-11-16 19:40 ` jamborm at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-10-17  6:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111844

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We are not optimizing the code at all on the GIMPLE level but expand from

  <bb 2> [local count: 1073741824]:
  memcpy (&p, buf_5(D), 88);
  _1 = p.x;
  inc.0_2 = (unsigned int) inc_7(D);
  _3 = _1 + inc.0_2;
  p.x = _3;
  memcpy (buf_5(D), &p, 88);
  p ={v} {CLOBBER(eol)};
  return;

where when expanding memcpy inline during RTL expanding we seem to be able to
clean up after that.

It seems to me this is a task for SRA (again...) which should be more
forgiving to select stmts requiring address-taking of locals but only
when they are not rewritten plus analyzing memcpy, memset (and other
select builtins) as to their effect.

SRA handles the following by means of totally scalarizing 'p':

void foo(P* buf, int inc) {
    P p;
    p = *buf;
    p.x += inc;
    *buf = p;
}

and you get

_Z3fooP1Pi:
.LFB16:
        .cfi_startproc
        addl    %esi, (%rdi)
        ret

with or without the call to bar ().  You could argue more aggressive
"inline expanding" memcpy (to char[] = char[] in this case) would
be asked for but I think this might confuse SRA and I'm not sure we
apply the same costing as to whether to inline-expand the "memcpy"
at RTL expansion time.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug tree-optimization/111844] missed optimization
  2023-10-17  4:17 [Bug tree-optimization/111844] New: missed optimization 113245 at gmail dot com
  2023-10-17  4:23 ` [Bug tree-optimization/111844] " pinskia at gcc dot gnu.org
  2023-10-17  6:54 ` rguenth at gcc dot gnu.org
@ 2023-11-16 19:40 ` jamborm at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-11-16 19:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111844

--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)

> It seems to me this is a task for SRA (again...) which should be more
> forgiving to select stmts requiring address-taking of locals but only
> when they are not rewritten plus analyzing memcpy, memset (and other
> select builtins) as to their effect.

With https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636836.html

SRA is more forgiving to address-taken candidates like this but indeed
it also needs to understand memcpy and similar builtins, which it does
not.  Representing it in GIMPLE might almost work, though we probably
need to handle that case in build_ref_for_offset to create MEM_REFs
with the appropriate alias access type.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-11-16 19:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-17  4:17 [Bug tree-optimization/111844] New: missed optimization 113245 at gmail dot com
2023-10-17  4:23 ` [Bug tree-optimization/111844] " pinskia at gcc dot gnu.org
2023-10-17  6:54 ` rguenth at gcc dot gnu.org
2023-11-16 19:40 ` jamborm at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).