[Bug c/108255] New: Repeated address-of (lea) not optimized for size.


From: "witold.baryluk+gcc at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/108255] New: Repeated address-of (lea) not optimized for size.
Date: Fri, 30 Dec 2022 22:30:00 +0000
Message-ID: <bug-108255-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108255

            Bug ID: 108255
           Summary: Repeated address-of (lea) not optimized for size.
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: witold.baryluk+gcc at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/q5sx9e49j

void f(int *);

int g(int of) {
    int x = 13;
    f(&x);
    f(&x);
    f(&x);
    f(&x);
    f(&x);
    f(&x);
    f(&x);
    f(&x);
    return 0;
}

Got:

g(int):
        sub     rsp, 24
        lea     rdi, [rsp+12]             # compute, 5 bytes
        mov     DWORD PTR [rsp+12], 13
        call    f(int*)
        lea     rdi, [rsp+12]             # recompute, 5 bytes
        call    f(int*)
        lea     rdi, [rsp+12]             # recompute, 5 bytes
        call    f(int*)
        lea     rdi, [rsp+12]             # recompute, 5 bytes
        call    f(int*)
        lea     rdi, [rsp+12]
        call    f(int*)
        lea     rdi, [rsp+12]
        call    f(int*)
        lea     rdi, [rsp+12]
        call    f(int*)
        lea     rdi, [rsp+12]
        call    f(int*)
        xor     eax, eax
        add     rsp, 24
        ret

Note that each lea is 5 bytes.

Expected (generated by clang 3.0 - 15.0):

g(int):                                  # @g(int)
        push    rbx                              # extra, but just 1 byte
        sub     rsp, 16
        mov     dword ptr [rsp + 12], 13
        lea     rbx, [rsp + 12]                  # CSE temp
        mov     rdi, rbx                         # use
        call    f(int*)@PLT
        mov     rdi, rbx                         # reuse, 3 bytes
        call    f(int*)@PLT
        mov     rdi, rbx                         # reuse, 3 bytes
        call    f(int*)@PLT
        mov     rdi, rbx
        call    f(int*)@PLT
        mov     rdi, rbx
        call    f(int*)@PLT
        mov     rdi, rbx
        call    f(int*)@PLT
        mov     rdi, rbx
        call    f(int*)@PLT
        mov     rdi, rbx
        call    f(int*)@PLT
        xor     eax, eax
        add     rsp, 16
        pop     rbx                          # extra, but just 1 byte
        ret

Technically this is more instructions, but `mov rdi, rbx` is 3 bytes, which is
shorter than the 5 bytes of lea. This comes at the minor expense of needing to
save and restore rbx.

PS. The same happens when using a temporary `int *const y = &x;`

The same also happens when optimizing for size (`-Os`).
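
For reference, a sketch of the temporary-variable variant mentioned in the PS.
In the report f is only declared; the definition of f below and the counters
`calls` and `sum` are hypothetical additions so that the snippet links and its
behaviour can be checked. To reproduce the code generation issue, f should stay
an external declaration as in the original test case, otherwise the compiler
may inline it.

```c
#include <assert.h>

/* Hypothetical stand-in for the external f() of the report. */
static int calls;   /* number of times f was called */
static int sum;     /* sum of the values seen through the pointer */

void f(int *p) {
    assert(p != 0);
    calls++;
    sum += *p;
}

int g(int of) {
    (void)of;              /* unused, as in the original test case */
    int x = 13;
    int *const y = &x;     /* address taken once and named explicitly */
    f(y);
    f(y);
    f(y);
    f(y);
    f(y);
    f(y);
    f(y);
    f(y);
    return 0;
}
```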

It looks like gcc 4.8.5 produced the expected code, but gcc 4.9.0 does not.

It is possible that the code produced by gcc 4.9.0 is faster, but it also
likely contributes quite a bit to binary size.

clang uses CSE even when there are just two uses of `&x` in the above
example. A slightly higher threshold (3 or 4) is likely optimal; it can be
calculated from the encoding sizes.
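
A rough sketch of that calculation, using only the byte counts quoted in the
x86-64 listings above (lea 5, mov 3, push/pop 1 each). The function names are
made up for illustration, and the model is an assumption: it ignores anything
else that may differ between the two forms, such as stack adjustment or
alignment padding.

```c
#include <assert.h>

/* Encoding sizes in bytes, taken from the x86-64 listings in the report. */
enum {
    LEA_BYTES  = 5,  /* lea rdi, [rsp+12] or lea rbx, [rsp+12] */
    MOV_BYTES  = 3,  /* mov rdi, rbx */
    PUSH_BYTES = 1,  /* push rbx */
    POP_BYTES  = 1   /* pop rbx */
};

/* Bytes spent materialising &x when it is recomputed before every call. */
int bytes_recompute(int uses) {
    assert(uses >= 0);
    return uses * LEA_BYTES;
}

/* Bytes spent when &x is computed once into rbx and copied before every
   call, including saving and restoring rbx. */
int bytes_cse(int uses) {
    assert(uses >= 0);
    return PUSH_BYTES + LEA_BYTES + uses * MOV_BYTES + POP_BYTES;
}

/* Smallest number of uses at which the CSE form is strictly smaller. */
int break_even_uses(void) {
    int n = 1;
    while (bytes_cse(n) >= bytes_recompute(n))
        n++;
    return n;
}
```

Under this simplified model the CSE form becomes smaller from 4 uses onward
(19 vs. 20 bytes), and for the 8 calls in the example it is 31 vs. 40 bytes.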

Weirdly though, gcc -m32 does this:

g():
        push    ebp
        mov     ebp, esp
        push    ebx
        lea     ebx, [ebp-12]
        sub     esp, 32
        mov     DWORD PTR [ebp-12], 13
        push    ebx
        call    f(int*)
        mov     DWORD PTR [esp], ebx
        call    f(int*)
        mov     DWORD PTR [esp], ebx
        call    f(int*)
        mov     ebx, DWORD PTR [ebp-4]
        xor     eax, eax
        leave
        ret

Here it does compute the address once and keeps it in a temporary, but it does
so on the stack instead of in a register (my guess is that there is no free
register to hold it, so it is spilled). In fact, lea would likely be faster
here: `mov DWORD PTR [esp], ebx` requires a memory/cache access, while lea is
5 bytes but does not require a memory access.
