public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/96034] New: missed optimization with extended registers
@ 2020-07-02 15:37 sshannin at gmail dot com
  2020-07-02 16:48 ` [Bug tree-optimization/96034] " jakub at gcc dot gnu.org
  2020-07-02 18:10 ` sshannin at gmail dot com
  0 siblings, 2 replies; 3+ messages in thread
From: sshannin at gmail dot com @ 2020-07-02 15:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96034

            Bug ID: 96034
           Summary: missed optimization with extended registers
           Product: gcc
           Version: 9.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sshannin at gmail dot com
  Target Milestone: ---

Noticed in example for PR96009.

Consider this simple function:

double bar(char i) {
    return i;
}

Compiled with -O3, we get:

movsbl  %dil, %edi
pxor  %xmm0, %xmm0
cvtsi2sdl %edi, %xmm0
ret

But aren't the movsbl and the pxor unnecessary? I think we should be able to just
do the cvtsi2sdl and then ret.

Interestingly, compiling with -Os instead of -O3 manages to remove the pxor:
movsbl  %dil, %edi
cvtsi2sdl %edi, %xmm0
ret

Which is one instruction better (unless -O3 is trying to keep the pxor for
alignment?), but even here I think the movsbl could still go too.

The closest thing I can find is PR48701, in that it also doesn't seem to understand
which registers are the same.

seth@fr-dev3:$ /toolchain14/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/toolchain14/bin/gcc
COLLECT_LTO_WRAPPER=/toolchain14/libexec/gcc/x86_64-pc-linux-gnu/9.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc_9_1_0/configure --prefix=/toolchain14
--enable-languages=c,c++,fortran --enable-lto --disable-plugin
--program-suffix=-9.1.0 --disable-multilib
Thread model: posix
gcc version 9.1.0 (GCC)


* [Bug tree-optimization/96034] missed optimization with extended registers
  2020-07-02 15:37 [Bug tree-optimization/96034] New: missed optimization with extended registers sshannin at gmail dot com
@ 2020-07-02 16:48 ` jakub at gcc dot gnu.org
  2020-07-02 18:10 ` sshannin at gmail dot com
  1 sibling, 0 replies; 3+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-07-02 16:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96034

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The pxor is in there for performance reasons:
;; Break partial SSE register dependency stall.  This splitter should split
;; late in the pass sequence (after register rename pass), so allocated
;; registers won't change anymore
So sure, at -Os it is not present, because it makes the code larger (but
faster).
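
To illustrate the dependency the pxor breaks, here is a small portable C model
of the register semantics. It is only a sketch, not real SSE code: the
xmm_model type and both function names are made up for this example. The
assumption it encodes is that cvtsi2sd writes only the low 64 bits of its
destination and keeps the upper 64 bits, so its result depends on whatever
last wrote that register. Zeroing the register first swaps that for a
dependency on a constant.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit SSE register holding two doubles. */
typedef struct {
    double lo;
    double hi;
} xmm_model;

/* Models cvtsi2sdl: only the low lane is written. The high lane is
   carried over from the old destination value, which is the input
   dependency on the register's previous contents. */
static xmm_model model_cvtsi2sd(xmm_model dst, int32_t src) {
    dst.lo = (double)src;
    return dst;
}

/* Models pxor %xmm0, %xmm0: the result is all zeros regardless of the
   previous contents, so nothing upstream has to finish first. */
static xmm_model model_pxor_self(xmm_model dst) {
    (void)dst;
    xmm_model zero = {0.0, 0.0};
    return zero;
}
```

In this model, model_cvtsi2sd(stale, v) still carries stale.hi in its result,
while model_cvtsi2sd(model_pxor_self(stale), v) does not depend on stale at
all. The model shows only the data dependency, not the timing effect.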

I also don't understand why you think the 8-bit to 32-bit sign extension is
unnecessary; the x86_64 ABI says that the upper bits have unspecified values.
It is true that LLVM ignores that part of ABI and makes its own incompatible
one and so doesn't add that, but it is wrong.
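
The point about the upper bits can be sketched in portable C. This is a model
under the reading of the ABI given above, not compiler output: the incoming
%edi is represented as a 32-bit value whose low byte holds the char argument,
and bits 8 to 31 may hold anything. Both function names are hypothetical.

```c
#include <stdint.h>

/* Models movsbl %dil, %edi followed by cvtsi2sdl: look only at the low
   byte, sign-extend it, then convert. */
static double convert_with_extension(uint32_t edi) {
    int32_t low = (int32_t)(edi & 0xFFu);
    if (low >= 128) {
        low -= 256; /* sign-extend the 8-bit value */
    }
    return (double)low;
}

/* Models cvtsi2sdl alone: trusts all 32 bits of the register and
   interprets them as a signed 32-bit integer. */
static double convert_without_extension(uint32_t edi) {
    double value = (double)edi;
    if (edi >> 31) {
        value -= 4294967296.0; /* two's complement wraparound, 2^32 */
    }
    return value;
}
```

With a register value such as 0x12345605, the first function yields 5.0 while
the second yields a large unrelated number, so skipping the extension is only
safe if the caller is known to have extended the argument already.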


* [Bug tree-optimization/96034] missed optimization with extended registers
  2020-07-02 15:37 [Bug tree-optimization/96034] New: missed optimization with extended registers sshannin at gmail dot com
  2020-07-02 16:48 ` [Bug tree-optimization/96034] " jakub at gcc dot gnu.org
@ 2020-07-02 18:10 ` sshannin at gmail dot com
  1 sibling, 0 replies; 3+ messages in thread
From: sshannin at gmail dot com @ 2020-07-02 18:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96034

sshannin at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from sshannin at gmail dot com ---
Ah, yes, you're correct on both counts.

For future reference, if anybody comes across this, I can confirm on both a
Sandy Bridge and a Skylake that the pxor does actually make it faster. I
should've checked first; I got too excited by "fewer instructions = better".

As for the ABI, I'm certainly not an expert, and if you say that the upper
bits are undefined I certainly defer to you. As you intuited, I was checking
against LLVM output (and it does omit the sign extend).

Sorry for the bother and thanks for such a helpful response.
