public inbox for gcc-bugs@sourceware.org
* [Bug target/47949] New: Missed optimization for -Os using xchg instead of mov.
@ 2011-03-02  5:30 svfuerst at gmail dot com
  2011-03-02 10:23 ` [Bug target/47949] " rguenth at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: svfuerst at gmail dot com @ 2011-03-02  5:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949

           Summary: Missed optimization for -Os using xchg instead of mov.
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: svfuerst@gmail.com
            Target: x86 / amd64


xchg %eax, reg is a one-byte instruction.  If reg is dead, this instruction
could replace the two-byte mov reg, %eax for a one-byte savings.

For example:

int foo(int x)
{
    return x;
}

currently compiles to

mov    %edi,%eax
retq

with -Os, whereas the following may be better:

xchg %eax, %edi
retq

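(Similar cases exist with mov reg, %rax and mov reg, %ax, where the prefixed
xchg is still one byte shorter than the prefixed mov.  There is no short form
of xchg for %al, so mov reg, %al does not benefit.)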

Note that xchg is slower than mov, so this is only an optimization when size is
more important than speed.
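
The size difference follows directly from the instruction encodings.  As a
minimal sketch (the helper names encode_mov_r32_to_eax and encode_xchg_eax_r32
are made up for illustration, and only the 32-bit forms with the legacy
registers %ecx..%edi are covered), the two encodings can be produced like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hardware register numbers used in the x86 encodings (legacy 32-bit set). */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3,
       REG_ESP = 4, REG_EBP = 5, REG_ESI = 6, REG_EDI = 7 };

/* Hypothetical helper: encode "mov %src, %eax" using opcode 0x89 /r
   (MOV r/m32, r32).  Returns the number of bytes written to buf. */
size_t encode_mov_r32_to_eax(unsigned src, uint8_t *buf)
{
    buf[0] = 0x89;
    /* ModRM: mod=11 (register direct), reg=source, rm=%eax. */
    buf[1] = (uint8_t)(0xC0 | (src << 3) | REG_EAX);
    return 2;
}

/* Hypothetical helper: encode "xchg %eax, %reg" using the short form
   0x90+reg.  reg must not be REG_EAX, because 0x90 itself is NOP. */
size_t encode_xchg_eax_r32(unsigned reg, uint8_t *buf)
{
    buf[0] = (uint8_t)(0x90 + reg);
    return 1;
}
```

For %edi this gives the bytes 89 f8 for the mov and the single byte 97 for the
xchg.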



* [Bug target/47949] Missed optimization for -Os using xchg instead of mov.
  2011-03-02  5:30 [Bug target/47949] New: Missed optimization for -Os using xchg instead of mov svfuerst at gmail dot com
@ 2011-03-02 10:23 ` rguenth at gcc dot gnu.org
  2011-03-02 10:53 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-03-02 10:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|x86 / amd64                 |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011.03.02 10:23:29
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-02 10:23:29 UTC ---
Interesting idea.



* [Bug target/47949] Missed optimization for -Os using xchg instead of mov.
  2011-03-02  5:30 [Bug target/47949] New: Missed optimization for -Os using xchg instead of mov svfuerst at gmail dot com
  2011-03-02 10:23 ` [Bug target/47949] " rguenth at gcc dot gnu.org
@ 2011-03-02 10:53 ` jakub at gcc dot gnu.org
  2011-03-02 21:51 ` svfuerst at gmail dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-03-02 10:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-03-02 10:53:28 UTC ---
I'm afraid this will upset Linux kernel people and others who use -Os for
performance reasons, to reduce their code's cache footprint.  If their code
slows down too much, they won't be happy.



* [Bug target/47949] Missed optimization for -Os using xchg instead of mov.
  2011-03-02  5:30 [Bug target/47949] New: Missed optimization for -Os using xchg instead of mov svfuerst at gmail dot com
  2011-03-02 10:23 ` [Bug target/47949] " rguenth at gcc dot gnu.org
  2011-03-02 10:53 ` jakub at gcc dot gnu.org
@ 2011-03-02 21:51 ` svfuerst at gmail dot com
  2021-06-08  9:52 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: svfuerst at gmail dot com @ 2011-03-02 21:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949

--- Comment #3 from Steven Fuerst <svfuerst at gmail dot com> 2011-03-02 21:51:12 UTC ---
Having a quick look at generated code... it appears that this pattern doesn't
come up all that often.  However, there is one case where it does: the epilogue
of a function. i.e. gcc tends to generate code looking like:

movl    %ebp, %eax
movq    8(%rsp), %rbx
movq    16(%rsp), %rbp
movq    24(%rsp), %r12
movq    32(%rsp), %r13
addq    $40, %rsp
ret

Replacing the move to %eax with an exchange with %ebp is a win in this
particular case.  The extra cycle or two of latency that xchg takes doesn't
matter, as the other moves and the ret instruction overlap in execution with
it.  Benchmarking on an Opteron in 64-bit mode confirms this hypothesis even in
the degenerate case where no other moves exist:

foo1:
    mov %edi, %eax
    retq

foo2:
    xchg %eax, %edi
    retq

foo1 and foo2 take the same time to execute.
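
A comparison along these lines can be tried with GNU extended inline assembly.
The following is only a sketch and not the benchmark used above: the function
names and the timing helper are assumptions, it falls back to plain C on
non-x86 targets, and the indirect call in the loop adds overhead of its own, so
any timings are rough and will vary by microarchitecture.

```c
#include <time.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Copy x into %eax with a plain mov. */
int ret_via_mov(int x)
{
    int r;
    __asm__ volatile ("movl %1, %0" : "=a"(r) : "r"(x));
    return r;
}

/* Swap x's register with %eax.  The old %eax contents land in x's
   register and are never read again, i.e. x is treated as dead.
   The early-clobber (&) keeps x from being allocated to %eax. */
int ret_via_xchg(int x)
{
    int r;
    __asm__ volatile ("xchgl %0, %1" : "=&a"(r), "+r"(x));
    return r;
}
#else
/* Portable fallback so the sketch still compiles elsewhere. */
int ret_via_mov(int x)  { return x; }
int ret_via_xchg(int x) { return x; }
#endif

/* Call f n times and return the elapsed CPU time in seconds. */
double time_calls(int (*f)(int), long n)
{
    volatile int sink = 0;   /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        sink = f((int)i);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_calls(ret_via_mov, n) and time_calls(ret_via_xchg, n) with the
same large n gives two figures to compare.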



* [Bug target/47949] Missed optimization for -Os using xchg instead of mov.
  2011-03-02  5:30 [Bug target/47949] New: Missed optimization for -Os using xchg instead of mov svfuerst at gmail dot com
                   ` (2 preceding siblings ...)
  2011-03-02 21:51 ` svfuerst at gmail dot com
@ 2021-06-08  9:52 ` pinskia at gcc dot gnu.org
  2022-08-03  8:11 ` cvs-commit at gcc dot gnu.org
  2022-08-04 18:23 ` roger at nextmovesoftware dot com
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-08  9:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This might already have been fixed by the commit referenced in PR92549.


* [Bug target/47949] Missed optimization for -Os using xchg instead of mov.
  2011-03-02  5:30 [Bug target/47949] New: Missed optimization for -Os using xchg instead of mov svfuerst at gmail dot com
                   ` (3 preceding siblings ...)
  2021-06-08  9:52 ` pinskia at gcc dot gnu.org
@ 2022-08-03  8:11 ` cvs-commit at gcc dot gnu.org
  2022-08-04 18:23 ` roger at nextmovesoftware dot com
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-08-03  8:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:fc6ef90173478521982e9df3831a06ea85b4f41e

commit r13-1945-gfc6ef90173478521982e9df3831a06ea85b4f41e
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Wed Aug 3 09:07:36 2022 +0100

    PR target/47949: Use xchg to move from/to AX_REG with -Oz on x86.

    This patch adds a peephole2 to i386.md to implement the suggestion in
    PR target/47949, of using xchg instead of mov for moving values to/from
    the %rax/%eax register, controlled by -Oz, as the xchg instruction is
    one byte shorter than the move it is replacing.

    The new test case is taken from the PR:
    int foo(int x) { return x; }

    where previously we'd generate:
    foo:    mov %edi,%eax  // 2 bytes
            ret

    but with this patch, using -Oz, we generate:
    foo:    xchg %eax,%edi  // 1 byte
            ret

    On the CSiBE benchmark, this saves a total of 10238 bytes (reducing
    the -Oz total from 3661796 bytes to 3651558 bytes, a 0.28% saving).

    Interestingly, some modern architectures (such as Zen 3) implement
    xchg using zero latency register renaming (just like mov), so in theory
    this transformation could be enabled when optimizing for speed, if
    benchmarking shows the improved code density produces consistently
    better performance.  However, this is architecture dependent, and
    there may be interactions using xchg (instead of a single_set) in the
    late RTL passes (such as cprop_hardreg), so for now I've restricted
    this to -Oz.

    2022-08-03  Roger Sayle  <roger@nextmovesoftware.com>
                Uroš Bizjak  <ubizjak@gmail.com>

    gcc/ChangeLog
            PR target/47949
            * config/i386/i386.md (peephole2): New peephole2 to convert
            SWI48 moves to/from %rax/%eax where the src is dead to xchg,
            when optimizing for minimal size with -Oz.

    gcc/testsuite/ChangeLog
            PR target/47949
            * gcc.target/i386/pr47949.c: New test case.
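
The condition described in the ChangeLog entry can be modelled in plain C.
This is an illustrative model only, not the actual peephole2 from i386.md: the
struct, the function name can_use_xchg, and the boolean flags are hypothetical
stand-ins for what GCC tracks internally, and the src == dst check is an added
assumption.

```c
#include <stdbool.h>

/* Hypothetical description of a register-to-register move. */
struct reg_move {
    unsigned src;         /* hardware number of the source register */
    unsigned dst;         /* hardware number of the destination register */
    unsigned bits;        /* operand width: 32 or 64 (the SWI48 modes) */
    bool src_dead_after;  /* true if src is not read again afterwards */
};

enum { AX_REGNO = 0 };    /* %eax/%rax has hardware register number 0 */

/* Model of when a mov could be replaced by xchg, following the
   ChangeLog wording: SWI48 moves to/from %rax/%eax, source dead,
   optimizing for minimal size (-Oz). */
bool can_use_xchg(const struct reg_move *m, bool optimizing_for_min_size)
{
    if (!optimizing_for_min_size)
        return false;                 /* restricted to -Oz */
    if (m->bits != 32 && m->bits != 64)
        return false;                 /* only 32- and 64-bit moves */
    if (!m->src_dead_after)
        return false;                 /* xchg overwrites the source */
    if (m->src == m->dst)
        return false;                 /* assumption: skip no-op moves */
    return m->src == AX_REGNO || m->dst == AX_REGNO;
}
```

For the test case from the PR, the move from %edi (register 7) to %eax with
%edi dead afterwards satisfies this model when -Oz is in effect.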


* [Bug target/47949] Missed optimization for -Os using xchg instead of mov.
  2011-03-02  5:30 [Bug target/47949] New: Missed optimization for -Os using xchg instead of mov svfuerst at gmail dot com
                   ` (4 preceding siblings ...)
  2022-08-03  8:11 ` cvs-commit at gcc dot gnu.org
@ 2022-08-04 18:23 ` roger at nextmovesoftware dot com
  5 siblings, 0 replies; 7+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-08-04 18:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |13.0
         Resolution|---                         |FIXED

--- Comment #6 from Roger Sayle <roger at nextmovesoftware dot com> ---
This suggestion has now been implemented on mainline (when using -Oz).

