public inbox for gcc-bugs@sourceware.org
* [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd
@ 2011-10-22 14:15 marc.glisse at normalesup dot org
  2011-10-22 16:10 ` [Bug target/50829] " ubizjak at gmail dot com
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: marc.glisse at normalesup dot org @ 2011-10-22 14:15 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

             Bug #: 50829
           Summary: avx extra copy for _mm256_insertf128_pd
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: marc.glisse@normalesup.org
            Target: x86_64-linux-gnu


With -Ofast -mavx (or -Os -mavx), this code:

#include <immintrin.h>

__m256d concat(__m128d x){
    __m256d z=_mm256_castpd128_pd256(x);
    return _mm256_insertf128_pd(z,x,1);
}

is compiled (by a snapshot from Oct 10) to:

    .cfi_startproc
    pushq    %rbp
    .cfi_def_cfa_offset 16
    vmovapd    %xmm0, %xmm1
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    andq    $-32, %rsp
    addq    $16, %rsp
    vinsertf128    $0x1, %xmm0, %ymm1, %ymm0
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

Apart from all the fun with stack manipulation, this boils down to:
    vmovapd    %xmm0, %xmm1
    vinsertf128    $0x1, %xmm0, %ymm1, %ymm0

when it looks like this would be enough (and I tested it):
    vinsertf128    $0x1, %xmm0, %ymm0, %ymm0

I am not sure whether gcc thinks that vinsertf128 shouldn't use the same
register for all its operands, or whether it doesn't realize that the upper
128 bits of the ymm register need not be zeroed before the insert. I understand
that the avx support is young, but avxintrin.h contains a comment saying that
_mm256_castpd128_pd256 "shouldn't generate any extra moves".

(I am not using broadcast because going through memory looks like a waste.)
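
To make the reasoning about the upper 128 bits concrete, here is a minimal
scalar sketch of the lane semantics. It is only a model: the vec128/vec256
types, the model_* helpers and the "junk" parameter are hypothetical names
introduced for illustration and do not use the real intrinsics. It assumes
that the cast leaves the upper half unspecified and that the insert with
index 1 overwrites exactly that half.

```c
#include <string.h>

/* Hypothetical scalar stand-ins for __m128d and __m256d (illustration only). */
typedef struct { double lane[2]; } vec128;
typedef struct { double lane[4]; } vec256;

/* Model of _mm256_castpd128_pd256: the low half is x, the high half is
   unspecified. `junk` stands in for whatever the register held before. */
static vec256 model_cast128_to_256(vec128 x, double junk)
{
    vec256 z = { { x.lane[0], x.lane[1], junk, junk } };
    return z;
}

/* Model of _mm256_insertf128_pd(z, x, 1): lanes 0-1 of z are kept,
   lanes 2-3 are overwritten with x. */
static vec256 model_insert_high(vec256 z, vec128 x)
{
    z.lane[2] = x.lane[0];
    z.lane[3] = x.lane[1];
    return z;
}

/* Model of the concat() function from the report. */
static vec256 model_concat(vec128 x, double junk)
{
    return model_insert_high(model_cast128_to_256(x, junk), x);
}

/* Returns 1 if the result is bit-identical for two different contents of
   the unspecified upper half, i.e. the upper half never reaches the output. */
int concat_ignores_upper_half(double a, double b)
{
    vec128 x = { { a, b } };
    vec256 r0 = model_concat(x, 0.0);
    vec256 r1 = model_concat(x, 12345.0);
    return memcmp(r0.lane, r1.lane, sizeof r0.lane) == 0;
}
```

In this model the previous contents of the upper half never influence the
result, which is why a separate copy (or zeroing) before the insert looks
unnecessary.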


Thread overview: 15+ messages
2011-10-22 14:15 [Bug target/50829] New: avx extra copy for _mm256_insertf128_pd marc.glisse at normalesup dot org
2011-10-22 16:10 ` [Bug target/50829] " ubizjak at gmail dot com
2011-10-22 17:02 ` marc.glisse at normalesup dot org
2011-10-23  8:21 ` marc.glisse at normalesup dot org
2011-10-23  8:33 ` [Bug rtl-optimization/50829] " ubizjak at gmail dot com
2011-11-24  3:48 ` vmakarov at redhat dot com
2011-11-24  5:26 ` [Bug target/50829] " pinskia at gcc dot gnu.org
2011-11-24  7:20 ` vmakarov at redhat dot com
2011-11-24  7:23 ` pinskia at gcc dot gnu.org
2012-12-01 16:30 ` glisse at gcc dot gnu.org
2012-12-01 19:50 ` glisse at gcc dot gnu.org
2012-12-01 20:26 ` hjl.tools at gmail dot com
2012-12-01 22:22 ` hjl.tools at gmail dot com
2013-03-30 10:13 ` glisse at gcc dot gnu.org
2020-08-10 17:31 ` glisse at gcc dot gnu.org