public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/49064] New: [x86/x64]: broken alias analysis leads vectorizer to emit poor code
@ 2011-05-19 14:40 piotr.wyderski at gmail dot com
  2011-05-19 14:41 ` [Bug tree-optimization/49064] " piotr.wyderski at gmail dot com
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: piotr.wyderski at gmail dot com @ 2011-05-19 14:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064

           Summary: [x86/x64]: broken alias analysis leads vectorizer to
                    emit poor code
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: piotr.wyderski@gmail.com


On an x86 capable of SSE2 or x64 (which has SSE2 by definition) GCC tries
to vectorize as much integer code as possible, but ends up with code much
worse than without vectorization. The SSE2-based version unnecessarily
recomputes all the m_Data pointers, as demonstrated by the following C++
snippet. I guess the reason is unsophisticated alias analysis, but the
actual reason may in fact be different.


struct X {

    __m128i*    m_Data;
    std::size_t m_Len;

    void xor_all(const X& v1, const X& v2);    
    void xor_all2(const X& v1, const X& v2);    
};


void X::xor_all(const X& v1, const X& v2) {

    for(std::size_t i = 0; i != m_Len; ++i) {

        m_Data[i] = v1.m_Data[i] ^ v2.m_Data[i];
    }
}

void X::xor_all2(const X& v1, const X& v2) {

    __m128i* p0 = m_Data;
    __m128i* p1 = v1.m_Data;
    __m128i* p2 = v2.m_Data;

    for(std::size_t i = 0; i != m_Len; ++i) {

        p0[i] = p1[i] ^ p2[i];
    }
}

As can be seen, xor_all2 produces nice code and xor_all doesn't:

0000000000447c70 <_ZN1X7xor_allERKS_S1_>:
  447c70:    48 83 7f 08 00           cmpq   $0x0,0x8(%rdi)
  447c75:    74 35                    je     447cac <_ZN1X7xor_allERKS_S1_+0x3c>
  447c77:    31 c0                    xor    %eax,%eax
  447c79:    0f 1f 80 00 00 00 00     nopl   0x0(%rax)
  447c80:    4c 8b 12                 mov    (%rdx),%r10
  447c83:    48 89 c1                 mov    %rax,%rcx
  447c86:    48 83 c0 01              add    $0x1,%rax
  447c8a:    4c 8b 0e                 mov    (%rsi),%r9
  447c8d:    48 c1 e1 04              shl    $0x4,%rcx
  447c91:    4c 8b 07                 mov    (%rdi),%r8
  447c94:    66 41 0f 6f 04 0a        movdqa (%r10,%rcx,1),%xmm0
  447c9a:    66 41 0f ef 04 09        pxor   (%r9,%rcx,1),%xmm0
  447ca0:    66 41 0f 7f 04 08        movdqa %xmm0,(%r8,%rcx,1)
  447ca6:    48 39 47 08              cmp    %rax,0x8(%rdi)
  447caa:    75 d4                    jne    447c80 <_ZN1X7xor_allERKS_S1_+0x10>
  447cac:    f3 c3                    repz retq 


0000000000447cb0 <_ZN1X8xor_all2ERKS_S1_>:
  447cb0:    48 83 7f 08 00           cmpq   $0x0,0x8(%rdi)
  447cb5:    48 8b 0f                 mov    (%rdi),%rcx
  447cb8:    48 8b 36                 mov    (%rsi),%rsi
  447cbb:    4c 8b 02                 mov    (%rdx),%r8
  447cbe:    74 26                    je     447ce6 <_ZN1X8xor_all2ERKS_S1_+0x36>
  447cc0:    31 c0                    xor    %eax,%eax
  447cc2:    31 d2                    xor    %edx,%edx
  447cc4:    0f 1f 40 00              nopl   0x0(%rax)
  447cc8:    66 41 0f 6f 04 00        movdqa (%r8,%rax,1),%xmm0
  447cce:    48 83 c2 01              add    $0x1,%rdx
  447cd2:    66 0f ef 04 06           pxor   (%rsi,%rax,1),%xmm0
  447cd7:    66 0f 7f 04 01           movdqa %xmm0,(%rcx,%rax,1)
  447cdc:    48 83 c0 10              add    $0x10,%rax
  447ce0:    48 39 57 08              cmp    %rdx,0x8(%rdi)
  447ce4:    75 e2                    jne    447cc8 <_ZN1X8xor_all2ERKS_S1_+0x18>
  447ce6:    f3 c3                    repz retq
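
Note that the xor_all2 listing still compares against 0x8(%rdi) on every
iteration, i.e. m_Len is reloaded from memory each time around the loop.
The following is only a sketch of a variant that also copies the length
into a local. The name xor_all3 is hypothetical and not part of the
report, the generated code was not measured here, and the variant is only
equivalent under the assumption that none of the buffers overlaps an X
object (otherwise a store could change m_Len or m_Data mid-loop, and the
hoisted copies would be stale).

```cpp
#include <cstddef>
#include <emmintrin.h>

struct X {
    __m128i*    m_Data;
    std::size_t m_Len;

    void xor_all3(const X& v1, const X& v2);
};

// Hypothetical variant: read every member exactly once, before the loop.
// Assumes the data buffers do not overlap *this, v1 or v2.
void X::xor_all3(const X& v1, const X& v2) {
    __m128i* const       p0 = m_Data;
    const __m128i* const p1 = v1.m_Data;
    const __m128i* const p2 = v2.m_Data;
    const std::size_t    n  = m_Len;   // hoisted: no reload per iteration

    for (std::size_t i = 0; i != n; ++i) {
        // _mm_xor_si128 is the SSE2 intrinsic for a 128-bit bitwise XOR.
        p0[i] = _mm_xor_si128(p1[i], p2[i]);
    }
}
```

Using the intrinsic instead of operator^ keeps the sketch independent of
GCC's vector-extension operators.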



* [Bug tree-optimization/49064] [x86/x64]: broken alias analysis leads vectorizer to emit poor code
From: piotr.wyderski at gmail dot com @ 2011-05-19 14:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064

--- Comment #1 from Piotr Wyderski <piotr.wyderski at gmail dot com> 2011-05-19 14:27:28 UTC ---
This is caused by the following definition in emmintrin.h:

/* The Intel API is flexible enough that we must allow aliasing with other
   vector types, and their scalar components.  */
typedef long long __m128i __attribute__ ((__vector_size__ (16),
__may_alias__)); 

Without __may_alias__ the generated assembly code is OK.
It's wrong to blindly assume a type aliases everything;
proper analysis should be performed.

Because the headers are intended to provide seamless integration
with MSVC and ICC vectorized code, it's a good practice to use SSE
that way. Most bona fide users will step into that trap assuming
GCC produces comparably good code, which in this case it obviously
doesn't, as can be seen above.



* [Bug tree-optimization/49064] [x86/x64]: broken alias analysis leads vectorizer to emit poor code
From: rguenth at gcc dot gnu.org @ 2011-05-20 10:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-05-20 10:03:32 UTC ---
Only type-based aliasing is disabled (which is required).  The testcase
does not compile for me; please provide something complete.



* [Bug tree-optimization/49064] [x86/x64]: broken alias analysis leads vectorizer to emit poor code
From: piotr.wyderski at gmail dot com @ 2011-05-20 13:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064

--- Comment #3 from Piotr Wyderski <piotr.wyderski at gmail dot com> 2011-05-20 12:50:49 UTC ---

#include <cstddef>
#include <emmintrin.h>

struct X {

    __m128i*    m_Data;
    std::size_t m_Len;

    void xor_all(const X& v1, const X& v2);    
    void xor_all2(const X& v1, const X& v2);    
};


void X::xor_all(const X& v1, const X& v2) {

    for(std::size_t i = 0; i != m_Len; ++i) {

        m_Data[i] = v1.m_Data[i] ^ v2.m_Data[i];
    }
}

void X::xor_all2(const X& v1, const X& v2) {

    __m128i* p0 = m_Data;
    __m128i* p1 = v1.m_Data;
    __m128i* p2 = v2.m_Data;

    for(std::size_t i = 0; i != m_Len; ++i) {

        p0[i] = p1[i] ^ p2[i];
    }
}

$ g++ -std=gnu++0x -msse2 -O2 -DNDEBUG -c testcase.cpp



* [Bug tree-optimization/49064] [x86/x64]: broken alias analysis leads vectorizer to emit poor code
From: jakub at gcc dot gnu.org @ 2011-05-20 13:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-20 13:24:38 UTC ---
The code has to reread this->m_Data, v1.m_Data and v2.m_Data on every loop
iteration, because writes through __m128i * could very well clobber X.
Points-to analysis can't figure out anything in this case; only TBAA could,
but the __m{64,128}* types as designed can alias anything.  It would really
surprise me if Intel's __m128i can alias ints, longs and many other things,
but can't alias X in this case.
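
The clobbering scenario described above can be reproduced portably.  The
following is a minimal sketch, not code from this report: it uses
unsigned char as a stand-in for __m128i, since unsigned char is likewise
allowed to alias any object.  The names Bytes, store_then_read_len and
demo_self_alias are hypothetical.

```cpp
#include <cstddef>

// Stand-in for the X in the report, with an unsigned char buffer.
struct Bytes {
    unsigned char* m_Data;
    std::size_t    m_Len;
};

// Stores one byte through b.m_Data, then reads b.m_Len.
// Because the store may legally hit b itself, the compiler cannot
// reuse a value of b.m_Len loaded before the store.
std::size_t store_then_read_len(Bytes& b) {
    b.m_Data[0] = 0;
    return b.m_Len;
}

// Makes the buffer pointer refer to the struct's own m_Len member,
// so the store above really does change the length.
std::size_t demo_self_alias() {
    Bytes b;
    b.m_Len  = static_cast<std::size_t>(-1);                 // all bits set
    b.m_Data = reinterpret_cast<unsigned char*>(&b.m_Len);   // self-alias
    return store_then_read_len(b);                           // one byte cleared
}
```

demo_self_alias() returns a value different from the all-ones length it
started with, which is exactly the case the per-iteration reloads in
xor_all have to allow for.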



* [Bug target/49064] [x86/x64]: broken alias analysis leads vectorizer to emit poor code
From: pinskia at gcc dot gnu.org @ 2021-08-14 21:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |alias, missed-optimization
             Status|UNCONFIRMED                 |RESOLVED
          Component|tree-optimization           |target
         Resolution|---                         |INVALID

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
All compilers I could test that support __m128i and _mm_xor_si128 show the
same aliasing behavior, so this is invalid.

