public inbox for gcc-bugs@sourceware.org
* [Bug target/114943] New: X86 AVX2: inefficient code generated to convert SIMD Vectors
@ 2024-05-04 13:10 vincenzo.innocente at cern dot ch
2024-05-06 3:57 ` [Bug target/114943] " liuhongt at gcc dot gnu.org
From: vincenzo.innocente at cern dot ch @ 2024-05-04 13:10 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114943
Bug ID: 114943
Summary: X86 AVX2: inefficient code generated to convert SIMD Vectors
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
Target Milestone: ---
In the example below (see https://godbolt.org/z/qnfT4fE5G), convert and convert3
produce code that looks inefficient to me compared with convert2 (and with
clang) for target x86-64-v3.
#define VECTOR_EXT(N) __attribute__((vector_size(N)))
typedef float VECTOR_EXT(16) float32x4_t;   /* 4 floats,  128 bits */
typedef double VECTOR_EXT(32) float64x4_t;  /* 4 doubles, 256 bits */

float32x4_t f1, f2, f3, f4, f;
float64x4_t d1, d2, d3, d4, d;

/* One loop that widens all four vectors in the same iteration. */
void convert() {
    for (int i = 0; i < 4; ++i) {
        d1[i] = f1[i];
        d2[i] = f2[i];
        d3[i] = f3[i];
        d4[i] = f4[i];
    }
}

/* One separate loop per vector. */
void convert2() {
    for (int i = 0; i < 4; ++i)
        d1[i] = f1[i];
    for (int i = 0; i < 4; ++i)
        d2[i] = f2[i];
    for (int i = 0; i < 4; ++i)
        d3[i] = f3[i];
    for (int i = 0; i < 4; ++i)
        d4[i] = f4[i];
}

/* Whole-vector conversion with the builtin. */
void convert3() {
    d1 = __builtin_convertvector(f1, float64x4_t);
}
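The functions above are meant to compute the same float-to-double widening and
to differ only in the code GCC emits. As a sketch (the helper names widen_loop
and widen_builtin are hypothetical, not from the report), the element-wise and
builtin formulations can be written as pure functions on a single vector so that
their results can be compared independently of code quality:

```c
#define VECTOR_EXT(N) __attribute__((vector_size(N)))
typedef float  VECTOR_EXT(16) float32x4_t;  /* 4 x float  = 128 bits */
typedef double VECTOR_EXT(32) float64x4_t;  /* 4 x double = 256 bits */

/* Hypothetical helper: element-wise widening, as in the loop bodies of
   convert and convert2. */
float64x4_t widen_loop(float32x4_t f) {
    float64x4_t d = {0};
    for (int i = 0; i < 4; ++i)
        d[i] = f[i];          /* float -> double is exact */
    return d;
}

/* Hypothetical helper: whole-vector widening, as in convert3. */
float64x4_t widen_builtin(float32x4_t f) {
    return __builtin_convertvector(f, float64x4_t);
}
```

Both should return identical values for any input. Compiling them with
-O3 -march=x86-64-v3 on Compiler Explorer separates the codegen question from
correctness; with AVX one would expect each to reduce to a single vcvtps2pd.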
* [Bug target/114943] X86 AVX2: inefficient code generated to convert SIMD Vectors
From: liuhongt at gcc dot gnu.org @ 2024-05-06 3:57 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114943
Hongtao Liu <liuhongt at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org
--- Comment #1 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
For convert3, we already have a patch, which will be posted soon.
For convert, the loop vectorizer currently has the limitation that it must keep
the same vector length throughout the loop, so it generates extra
packing/unpacking instructions compared with convert2. The basic-block (SLP)
vectorizer has no such limitation, so with -O3 -march=x86-64-v3
-fno-tree-loop-vectorize, convert is as good as convert2.
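To try this suggestion on convert alone, without changing the flags for the
whole translation unit, one option is GCC's optimize function attribute. This is
a sketch under the assumption that the attribute has the same effect as passing
-fno-tree-loop-vectorize on the command line; the function name
convert_no_loop_vect is hypothetical, and the GCC manual states that the
optimize attribute is intended for debugging rather than production code.

```c
#define VECTOR_EXT(N) __attribute__((vector_size(N)))
typedef float  VECTOR_EXT(16) float32x4_t;
typedef double VECTOR_EXT(32) float64x4_t;

float32x4_t f1, f2, f3, f4;
float64x4_t d1, d2, d3, d4;

/* Same body as convert() from the report. The attribute asks GCC to disable
   the loop vectorizer for this function only (assumed equivalent to
   -fno-tree-loop-vectorize), leaving the work to the basic-block (SLP)
   vectorizer as described in Comment #1. Compilers other than GCC may
   ignore the attribute with a warning. */
__attribute__((optimize("no-tree-loop-vectorize")))
void convert_no_loop_vect(void) {
    for (int i = 0; i < 4; ++i) {
        d1[i] = f1[i];
        d2[i] = f2[i];
        d3[i] = f3[i];
        d4[i] = f4[i];
    }
}
```

Compiling this with -O3 -march=x86-64-v3 and comparing its assembly against
convert2 is a quick way to check whether the per-function setting reproduces
the command-line result.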