public inbox for gcc-bugs@sourceware.org
* [Bug target/96654] New: Failure to optimize vectorized conversion to `int` with AVX
From: gabravier at gmail dot com @ 2020-08-17 11:57 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96654
Bug ID: 96654
Summary: Failure to optimize vectorized conversion to `int` with AVX
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
void f(double *src, int *dst)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (int)src[i];
}
With -O3 -mavx, LLVM outputs this:
f(double*, int*):
        vcvttpd2dq xmm0, ymmword ptr [rdi]
        vmovupd xmmword ptr [rsi], xmm0
        ret
GCC outputs this:
f(double*, int*):
        push rbp
        vmovupd xmm1, XMMWORD PTR [rdi]
        vinsertf128 ymm0, ymm1, XMMWORD PTR [rdi+16], 0x1
        mov rbp, rsp
        vcvttpd2dq xmm0, ymm0
        vmovdqu XMMWORD PTR [rsi], xmm0
        vzeroupper
        pop rbp
        ret
* [Bug tree-optimization/96654] Failure to optimize vectorized conversion to `int` with AVX
From: ubizjak at gmail dot com @ 2020-08-17 19:04 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96654
Uroš Bizjak <ubizjak at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
          Component|target                      |tree-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-08-17
--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
The relevant pattern is present in sse.md:
(define_insn "fix_truncv4dfv4si2<mask_name>"
  [(set (match_operand:V4SI 0 "register_operand" "=v")
        (fix:V4SI (match_operand:V4DF 1 "nonimmediate_operand" "vm")))]
  "TARGET_AVX || (TARGET_AVX512VL && TARGET_AVX512F)"
  "vcvttpd2dq{y}\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
but for some reason it is not exercised by the target-independent part of the compiler.
Confirmed as a tree optimization problem.
* [Bug tree-optimization/96654] Failure to optimize vectorized conversion to `int` with AVX
From: glisse at gcc dot gnu.org @ 2020-08-22 16:32 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96654
--- Comment #2 from Marc Glisse <glisse at gcc dot gnu.org> ---
GCC doesn't seem very fond of using two different vector bit sizes at the same
time, so VEC_PACK_FIX_TRUNC_EXPR takes two vectors of 2 doubles and produces one
vector of 4 ints. At the RTL level, we have a vec_concat:V4DF of two V2DF values
adjacent in memory, but nothing knows how to turn that into a single load. (The
conversion of 4 doubles to int is itself fine.)
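As a rough illustration of the shape described above, the following is a scalar C model of a pack-and-truncate step that consumes two 2-double halves and produces one 4-int result. The names v2df_model, v4si_model, pack_fix_trunc_model and f_model are hypothetical and chosen only for this sketch; they are not GCC internals, and the real VEC_PACK_FIX_TRUNC_EXPR operates on vector types inside the compiler rather than on structs.

```c
/* Scalar model only: illustrative names, not GCC's implementation. */
typedef struct { double v[2]; } v2df_model;  /* stands in for a V2DF value */
typedef struct { int v[4]; } v4si_model;     /* stands in for a V4SI value */

/* Two 2-double inputs in, one 4-int output out, truncating toward zero. */
static v4si_model pack_fix_trunc_model(v2df_model lo, v2df_model hi)
{
    v4si_model out;
    out.v[0] = (int)lo.v[0];
    out.v[1] = (int)lo.v[1];
    out.v[2] = (int)hi.v[0];
    out.v[3] = (int)hi.v[1];
    return out;
}

/* The loop from the report expressed in that shape: the two halves are
   adjacent in memory, yet they are loaded as two separate values. */
void f_model(const double *src, int *dst)
{
    v2df_model lo = { { src[0], src[1] } };
    v2df_model hi = { { src[2], src[3] } };
    v4si_model r = pack_fix_trunc_model(lo, hi);
    for (int i = 0; i < 4; i++)
        dst[i] = r.v[i];
}
```

In this model the two half loads correspond to the vmovupd/vinsertf128 pair in the GCC output, while the single conversion corresponds to vcvttpd2dq.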
* [Bug tree-optimization/96654] Failure to optimize vectorized conversion to `int` with AVX
From: rguenth at gcc dot gnu.org @ 2020-08-25 10:40 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96654
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
             Blocks|                            |53947
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The pattern is exercised directly only by BB vectorization; loop vectorization
still uses a fixed vector size. Even so, the assembly is basically the same
when doing BB vectorization only:
f:
.LFB0:
        .cfi_startproc
        vmovupd (%rdi), %xmm1
        vinsertf128 $0x1, 16(%rdi), %ymm1, %ymm0
        vcvttpd2dqy %ymm0, %xmm0
        vmovdqu %xmm0, (%rsi)
        vzeroupper
        ret
This is probably because of some tuning (splitting unaligned loads, not using
a memory operand for vcvttpd2dqy).
With -O3 -fno-tree-loop-vectorize -march=core-avx2 I get:
f:
.LFB0:
        .cfi_startproc
        vcvttpd2dqy (%rdi), %xmm0
        vmovdqu %xmm0, (%rsi)
        ret
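For comparison, the single-load form can also be written by hand with AVX intrinsics. This is only a sketch: f_single_load is a hypothetical name, the scalar fallback is there so the code still builds without AVX enabled, and whether the compiler folds the 256-bit load into a memory operand of the conversion presumably still depends on the tuning in effect.

```c
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical hand-written variant of f from the report. */
void f_single_load(const double *src, int *dst)
{
#if defined(__AVX__)
    /* One unaligned 256-bit load of all four doubles. */
    __m256d v = _mm256_loadu_pd(src);
    /* Truncating conversion of 4 doubles to 4 ints (vcvttpd2dq). */
    __m128i r = _mm256_cvttpd_epi32(v);
    /* Unaligned 128-bit store of the four ints. */
    _mm_storeu_si128((__m128i *)dst, r);
#else
    /* Scalar fallback when AVX is not enabled at compile time. */
    for (int i = 0; i < 4; i++)
        dst[i] = (int)src[i];
#endif
}
```

Building this with -O3 -mavx and inspecting the assembly is one way to check whether the split load comes from the vectorizer or from the tuning defaults.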
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations