public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/105079] New: _mm_storeu_si16 inefficiently uses pextrw to an integer reg (without SSE4.1)
@ 2022-03-28  9:26 peter at cordes dot ca
  2022-03-28 15:02 ` [Bug target/105079] " ubizjak at gmail dot com
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: peter at cordes dot ca @ 2022-03-28  9:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105079

            Bug ID: 105079
           Summary: _mm_storeu_si16 inefficiently uses pextrw to an
                    integer reg (without SSE4.1)
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

With PR105066 fixed, we do _mm_loadu_si16 with pinsrw from memory, because
that's available with just SSE2.  (And the cause wasn't tuning choices, it was
a typo in what insns GCC thought were available.)  Related: PR105072 re:
folding such 16-bit loads into memory source operands for PMOVZX/SXBQ.

But the famously non-orthogonal SSE2 only includes pextrw $imm, %xmm, reg.  Not
reg/mem until SSE4.1 (with a longer opcode for no apparent reason, instead of
just allowing mem addressing modes for the existing one.  But same mnemonic so
the assembler takes care of it.  https://www.felixcloutier.com/x86/pextrw)

So we do need to care about tuning for _mm_storeu_si16(p, v) without the option
of PEXTRW to memory.  Currently we do this, which is obviously bad:

    pextrw  $0, %xmm0, %eax      # 2 uops
    movw    %ax, (%rdi)

we should be doing this

    movd    %xmm0, %eax          # 1 uop
    mov     %ax, (%rdi)

https://godbolt.org/z/Ee3Ez174M

This is especially true if we don't need the integer value zero-extended into
EAX.

If we *did* also want the value zero-extended in an integer register, the extra
uop in PEXTRW (in addition to the port 0 uop like MOVD) is a port-5 shuffle to
extract an arbitrary 16-bit element, vs. a separate integer movzwl %cx, %eax
could run on any integer ALU port.  (Including port 6 on HSW/SKL, which doesn't
compete with any vector ALUs).

Mov-elimination for movzwl doesn't work on any current CPUs, only movzbl on
Intel, and movl / movq on both Intel and AMD.  So currently there's no benefit
to picking a different register like %ecx, instead of just using movzwl %ax,
%eax

When we both store and use the integer value:

int store16_and_use(void *p, __m128i v){
    _mm_storeu_si16( p, v );
    return 123 + *(unsigned short*)p;
}

https://godbolt.org/z/zq6TMo1oE current trunk GCC does this, which is not bad:

# -O3 with or without -msse4.1
        pextrw  $0, %xmm0, %eax
        movw    %ax, (%rdi)
        addl    $123, %eax
        ret

Clang13 uses MOVD + MOVZX like I was suggesting, even though it costs more code
size.  That's not necessarily better

        movd    %xmm0, %eax
        movw    %ax, (%rdi)
        movzwl  %ax, %eax
        addl    $123, %eax
        retq

In this case it's not obviously wrong to use PEXTRW to an integer reg, but it's
also fine to do it clang's way.  So however that corner case shakes out in the
process of fixing the main bug (using movd / movw without SSE4.1 when we don't
reload) is fine.

If SSE4.1 is available, the no-reload case should probably use PEXTRW to memory
instead of movd + movw.  On some CPUs, the ALU op that's part of PEXTRW has
more choice of ALU port than xmm->gp_int operations.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/105079] _mm_storeu_si16 inefficiently uses pextrw to an integer reg (without SSE4.1)
  2022-03-28  9:26 [Bug target/105079] New: _mm_storeu_si16 inefficiently uses pextrw to an integer reg (without SSE4.1) peter at cordes dot ca
@ 2022-03-28 15:02 ` ubizjak at gmail dot com
  2022-05-03 16:00 ` cvs-commit at gcc dot gnu.org
  2022-05-03 16:03 ` ubizjak at gmail dot com
  2 siblings, 0 replies; 4+ messages in thread
From: ubizjak at gmail dot com @ 2022-03-28 15:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105079

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 52700
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52700&action=edit
Proposed patch

The attached patch handles the case from Comment #0.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/105079] _mm_storeu_si16 inefficiently uses pextrw to an integer reg (without SSE4.1)
  2022-03-28  9:26 [Bug target/105079] New: _mm_storeu_si16 inefficiently uses pextrw to an integer reg (without SSE4.1) peter at cordes dot ca
  2022-03-28 15:02 ` [Bug target/105079] " ubizjak at gmail dot com
@ 2022-05-03 16:00 ` cvs-commit at gcc dot gnu.org
  2022-05-03 16:03 ` ubizjak at gmail dot com
  2 siblings, 0 replies; 4+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-05-03 16:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105079

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:2680f5eec23805ab8a344f942ca5a7e180d57905

commit r13-86-g2680f5eec23805ab8a344f942ca5a7e180d57905
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Tue May 3 17:59:40 2022 +0200

    i386: Optimize _mm_storeu_si16 w/o SSE4 [PR105079]

    Optimize _mm_storeu_si16 to use MOVD from a SSE to an integer register
    instead of PEXTRW from a low word of the SSE register to an integer reg.

    Avoid the transformation when optimizing for size for targets without
    TARGET_INTER_UNIT_MOVES_FROM_VEC capability, where the transformation
    results in two moves via a memory location.

    2022-05-03  Uroš Bizjak  <ubizjak@gmail.com>

    gcc/ChangeLog:

            PR target/105079
            * config/i386/sse.md (*vec_extract<mode>_0_mem): New pre-reload
            define_insn_and_split pattern.

    gcc/testsuite/ChangeLog:

            PR target/105079
            * gcc.target/i386/pr105079.c: New test.
            * gcc.target/i386/pr95483-1.c (dg-options): Use -msse4.1.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/105079] _mm_storeu_si16 inefficiently uses pextrw to an integer reg (without SSE4.1)
  2022-03-28  9:26 [Bug target/105079] New: _mm_storeu_si16 inefficiently uses pextrw to an integer reg (without SSE4.1) peter at cordes dot ca
  2022-03-28 15:02 ` [Bug target/105079] " ubizjak at gmail dot com
  2022-05-03 16:00 ` cvs-commit at gcc dot gnu.org
@ 2022-05-03 16:03 ` ubizjak at gmail dot com
  2 siblings, 0 replies; 4+ messages in thread
From: ubizjak at gmail dot com @ 2022-05-03 16:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105079

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED
   Target Milestone|---                         |13.0
           Assignee|unassigned at gcc dot gnu.org      |ubizjak at gmail dot com

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
Implemented for gcc-13.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-05-03 16:03 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-28  9:26 [Bug target/105079] New: _mm_storeu_si16 inefficiently uses pextrw to an integer reg (without SSE4.1) peter at cordes dot ca
2022-03-28 15:02 ` [Bug target/105079] " ubizjak at gmail dot com
2022-05-03 16:00 ` cvs-commit at gcc dot gnu.org
2022-05-03 16:03 ` ubizjak at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).