public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/49064] New: [x86/x64]: broken alias analysis leads vectorizer to emit poor code
@ 2011-05-19 14:40 piotr.wyderski at gmail dot com
2011-05-19 14:41 ` [Bug tree-optimization/49064] " piotr.wyderski at gmail dot com
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: piotr.wyderski at gmail dot com @ 2011-05-19 14:40 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064
Summary: [x86/x64]: broken alias analysis leads vectorizer to
emit poor code
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: piotr.wyderski@gmail.com
On an x86 capable of SSE2 or x64 (which has SSE2 by definition) GCC tries
to vectorize as much integer code as possible, but ends up with code much
worse than without vectorization. The SSE2-based version unnecessarily
reloads all the m_Data pointers on every loop iteration, as demonstrated
by the following C++ snippet. I guess the reason is unsophisticated alias
analysis, but the actual reason may in fact be different.
struct X {
    __m128i* m_Data;
    std::size_t m_Len;
    void xor_all(const X& v1, const X& v2);
    void xor_all2(const X& v1, const X& v2);
};
void X::xor_all(const X& v1, const X& v2) {
    for(std::size_t i = 0; i != m_Len; ++i) {
        m_Data[i] = v1.m_Data[i] ^ v2.m_Data[i];
    }
}
void X::xor_all2(const X& v1, const X& v2) {
    __m128i* p0 = m_Data;
    __m128i* p1 = v1.m_Data;
    __m128i* p2 = v2.m_Data;
    for(std::size_t i = 0; i != m_Len; ++i) {
        p0[i] = p1[i] ^ p2[i];
    }
}
As can be seen, xor_all2 produces nice code and xor_all doesn't:
0000000000447c70 <_ZN1X7xor_allERKS_S1_>:
447c70: 48 83 7f 08 00 cmpq $0x0,0x8(%rdi)
447c75: 74 35 je 447cac <_ZN1X7xor_allERKS_S1_+0x3c>
447c77: 31 c0 xor %eax,%eax
447c79: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
447c80: 4c 8b 12 mov (%rdx),%r10
447c83: 48 89 c1 mov %rax,%rcx
447c86: 48 83 c0 01 add $0x1,%rax
447c8a: 4c 8b 0e mov (%rsi),%r9
447c8d: 48 c1 e1 04 shl $0x4,%rcx
447c91: 4c 8b 07 mov (%rdi),%r8
447c94: 66 41 0f 6f 04 0a movdqa (%r10,%rcx,1),%xmm0
447c9a: 66 41 0f ef 04 09 pxor (%r9,%rcx,1),%xmm0
447ca0: 66 41 0f 7f 04 08 movdqa %xmm0,(%r8,%rcx,1)
447ca6: 48 39 47 08 cmp %rax,0x8(%rdi)
447caa: 75 d4 jne 447c80 <_ZN1X7xor_allERKS_S1_+0x10>
447cac: f3 c3 repz retq
0000000000447cb0 <_ZN1X8xor_all2ERKS_S1_>:
447cb0: 48 83 7f 08 00 cmpq $0x0,0x8(%rdi)
447cb5: 48 8b 0f mov (%rdi),%rcx
447cb8: 48 8b 36 mov (%rsi),%rsi
447cbb: 4c 8b 02 mov (%rdx),%r8
447cbe: 74 26 je 447ce6 <_ZN1X8xor_all2ERKS_S1_+0x36>
447cc0: 31 c0 xor %eax,%eax
447cc2: 31 d2 xor %edx,%edx
447cc4: 0f 1f 40 00 nopl 0x0(%rax)
447cc8: 66 41 0f 6f 04 00 movdqa (%r8,%rax,1),%xmm0
447cce: 48 83 c2 01 add $0x1,%rdx
447cd2: 66 0f ef 04 06 pxor (%rsi,%rax,1),%xmm0
447cd7: 66 0f 7f 04 01 movdqa %xmm0,(%rcx,%rax,1)
447cdc: 48 83 c0 10 add $0x10,%rax
447ce0: 48 39 57 08 cmp %rdx,0x8(%rdi)
447ce4: 75 e2 jne 447cc8 <_ZN1X8xor_all2ERKS_S1_+0x18>
447ce6: f3 c3 repz retq
* [Bug tree-optimization/49064] [x86/x64]: broken alias analysis leads vectorizer to emit poor code
From: piotr.wyderski at gmail dot com @ 2011-05-19 14:41 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064
--- Comment #1 from Piotr Wyderski <piotr.wyderski at gmail dot com> 2011-05-19 14:27:28 UTC ---
This is caused by the following definition in emmintrin.h:
/* The Intel API is flexible enough that we must allow aliasing with other
vector types, and their scalar components. */
typedef long long __m128i __attribute__ ((__vector_size__ (16),
__may_alias__));
Without __may_alias__ the generated assembly code is OK.
It's wrong to blindly assume a type aliases everything;
proper analysis should be performed.
Because the headers are intended to provide seamless integration
with MSVC and ICC vectorized code, it's good practice to use SSE
that way. Most bona fide users will step into that trap, assuming
GCC produces comparably good code, which in this case it obviously
doesn't, as can be seen above.
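To make the "without __may_alias__" observation concrete, here is a minimal sketch using GCC's generic vector extension. The names v2di_strict, XStrict and check_xor_strict are hypothetical and exist only for this illustration; the typedef is the one quoted above minus the attribute. It is only safe when the buffers really are declared with this type, since ordinary strict-aliasing rules apply to it again.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type for illustration: same layout as __m128i, but
// without __may_alias__, so type-based alias analysis applies to it.
typedef long long v2di_strict __attribute__((__vector_size__(16)));

struct XStrict {
    v2di_strict* m_Data;
    std::size_t  m_Len;

    // Same loop as X::xor_all, written against the stricter type.
    void xor_all(const XStrict& v1, const XStrict& v2) {
        for (std::size_t i = 0; i != m_Len; ++i) {
            m_Data[i] = v1.m_Data[i] ^ v2.m_Data[i];
        }
    }
};

// Small usage check on buffers declared with the vector type itself.
void check_xor_strict() {
    v2di_strict a[2] = {{1, 2}, {3, 4}};
    v2di_strict b[2] = {{1, 0}, {0, 4}};
    v2di_strict out[2];
    XStrict xa = {a, 2}, xb = {b, 2}, xo = {out, 2};
    xo.xor_all(xa, xb);
    assert(out[0][0] == 0 && out[0][1] == 2);
    assert(out[1][0] == 3 && out[1][1] == 0);
}
```

Whether the member loads are actually hoisted out of the loop depends on the compiler version and flags, so the generated assembly should be inspected rather than assumed.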
* [Bug tree-optimization/49064] [x86/x64]: broken alias analysis leads vectorizer to emit poor code
From: rguenth at gcc dot gnu.org @ 2011-05-20 10:29 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-05-20 10:03:32 UTC ---
Only type-based aliasing is disabled (which is required). The testcase
does not compile for me; please provide something complete.
* [Bug tree-optimization/49064] [x86/x64]: broken alias analysis leads vectorizer to emit poor code
From: piotr.wyderski at gmail dot com @ 2011-05-20 13:02 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064
--- Comment #3 from Piotr Wyderski <piotr.wyderski at gmail dot com> 2011-05-20 12:50:49 UTC ---
#include <cstddef>    // std::size_t
#include <emmintrin.h>
struct X {
    __m128i* m_Data;
    std::size_t m_Len;
    void xor_all(const X& v1, const X& v2);
    void xor_all2(const X& v1, const X& v2);
};
// Accesses the m_Data members directly inside the loop.
void X::xor_all(const X& v1, const X& v2) {
    for(std::size_t i = 0; i != m_Len; ++i) {
        m_Data[i] = v1.m_Data[i] ^ v2.m_Data[i];
    }
}
// Copies the three pointers into locals before the loop.
void X::xor_all2(const X& v1, const X& v2) {
    __m128i* p0 = m_Data;
    __m128i* p1 = v1.m_Data;
    __m128i* p2 = v2.m_Data;
    for(std::size_t i = 0; i != m_Len; ++i) {
        p0[i] = p1[i] ^ p2[i];
    }
}
$ g++ -std=gnu++0x -msse2 -O2 -DNDEBUG -c testcase.cpp
* [Bug tree-optimization/49064] [x86/x64]: broken alias analysis leads vectorizer to emit poor code
From: jakub at gcc dot gnu.org @ 2011-05-20 13:43 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-20 13:24:38 UTC ---
The code has to reread this->m_Data, v1.m_Data and v2.m_Data on every loop
iteration, because writes through __m128i * could very well clobber X.
Points-to analysis can't figure out anything in this case; only TBAA could,
but the __m{64,128}* types as designed can alias anything. It would really
surprise me if Intel's __m128i can alias ints, longs and many other things,
but can't alias X in this case.
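The clobbering scenario can be sketched as follows. This is an illustration only: Holder, v2di_alias, store_clobbers_pointer and check_clobber are hypothetical names, and the typedef is assumed to mirror the emmintrin.h definition quoted in comment #1. The code relies on GCC's documented __may_alias__ semantics rather than on ISO C++ guarantees, and it assumes a null pointer is represented by all-zero bits.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: mirrors the emmintrin.h typedef quoted in comment #1.
typedef long long v2di_alias
    __attribute__((__vector_size__(16), __may_alias__));

// Hypothetical stand-in for X, padded and aligned to one 16-byte vector.
struct alignas(16) Holder {
    v2di_alias* m_Data;
    std::size_t m_Len;
};
static_assert(sizeof(Holder) >= 16, "store below must stay inside Holder");

// Returns true if a store through m_Data changed m_Data itself.
bool store_clobbers_pointer() {
    Holder h;
    h.m_Data = reinterpret_cast<v2di_alias*>(&h);  // points at h itself
    h.m_Len = 1;
    v2di_alias* before = h.m_Data;
    const v2di_alias zero = {0, 0};
    h.m_Data[0] = zero;         // 16-byte store that overwrites h
    return h.m_Data != before;  // the member has to be read again here
}

// Usage check: the change is observable, so the reload cannot be skipped.
void check_clobber() {
    assert(store_clobbers_pointer());
}
```

Because such a store is permitted for a may_alias type, the compiler cannot keep m_Data in a register across the store in xor_all unless it can prove the pointers do not overlap the object.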
* [Bug target/49064] [x86/x64]: broken alias analysis leads vectorizer to emit poor code
From: pinskia at gcc dot gnu.org @ 2021-08-14 21:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |alias, missed-optimization
Status|UNCONFIRMED |RESOLVED
Component|tree-optimization |target
Resolution|--- |INVALID
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
All compilers I could test that support __m128i and _mm_xor_si128 cause an
aliasing issue. So this is invalid.
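For code that has to stay on __m128i, a minimal sketch of the xor_all2 idea written with the _mm_xor_si128 intrinsic mentioned above. The function names xor_all_by_value and check_xor_by_value are hypothetical; the pointers and the length are passed by value, so they live in locals that a store through __m128i* cannot modify. It assumes an x86 target with SSE2 and 16-byte aligned buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>

// Hypothetical helper: operates on copies of the pointers and length.
void xor_all_by_value(__m128i* dst, const __m128i* a, const __m128i* b,
                      std::size_t len) {
    for (std::size_t i = 0; i != len; ++i) {
        const __m128i va = _mm_load_si128(a + i);
        const __m128i vb = _mm_load_si128(b + i);
        _mm_store_si128(dst + i, _mm_xor_si128(va, vb));
    }
}

// True when all 16 bytes of x and y are equal.
bool same_bits(__m128i x, __m128i y) {
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, y)) == 0xFFFF;
}

// Small usage check on naturally aligned __m128i arrays.
void check_xor_by_value() {
    __m128i a[2] = {_mm_set1_epi32(0x0F0F0F0F), _mm_set1_epi32(-1)};
    __m128i b[2] = {_mm_set1_epi32(0x00FF00FF), _mm_set1_epi32(-1)};
    __m128i out[2];
    xor_all_by_value(out, a, b, 2);
    assert(same_bits(out[0], _mm_set1_epi32(0x0FF00FF0)));
    assert(same_bits(out[1], _mm_setzero_si128()));
}
```

A member function could call such a helper as xor_all_by_value(m_Data, v1.m_Data, v2.m_Data, m_Len), which reads each member once before the loop starts.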