public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/54939] New: Very poor vectorization of loops with complex arithmetic
@ 2012-10-16 14:22 ysrumyan at gmail dot com
2012-10-16 14:37 ` [Bug tree-optimization/54939] " rguenth at gcc dot gnu.org
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: ysrumyan at gmail dot com @ 2012-10-16 14:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
Bug #: 54939
Summary: Very poor vectorization of loops with complex
arithmetic
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: ysrumyan@gmail.com
While analyzing a performance anomaly in spec2000, I found that 168.wupwise
is slower with vectorization than without it on x86. The main problem is that
gcc does not recognize some special idioms of complex addition and
multiplication during loop vectorization. For example, for a simple
zaxpy loop icc generates code that is 1.6X faster than gcc's. Here is the
assembly for the zaxpy loop produced by icc:
..B1.4: # Preds ..B1.2 ..B1.4
movups (%rsi,%rdx), %xmm2 #7.28
movups 16(%rsi,%rdx), %xmm5 #7.28
movups (%rsi,%rcx), %xmm4 #7.17
movups 16(%rsi,%rcx), %xmm7 #7.17
movddup (%rsi,%rdx), %xmm3 #7.27
incq %r8 #6.10
movddup 16(%rsi,%rdx), %xmm6 #7.27
unpckhpd %xmm2, %xmm2 #7.27
unpckhpd %xmm5, %xmm5 #7.27
mulpd %xmm1, %xmm3 #7.27
mulpd %xmm0, %xmm2 #7.27
mulpd %xmm1, %xmm6 #7.27
mulpd %xmm0, %xmm5 #7.27
addsubpd %xmm2, %xmm3 #7.27
addsubpd %xmm5, %xmm6 #7.27
addpd %xmm3, %xmm4 #7.9
addpd %xmm6, %xmm7 #7.9
movups %xmm4, (%rsi,%rcx) #7.9
movups %xmm7, 16(%rsi,%rcx) #7.9
addq $32, %rsi #6.10
cmpq %rdi, %r8 #6.10
jb ..B1.4 # Prob 64% #6.10
(I got it with the -xSSE4.2 -O3 options.) For the gcc compiler, the following
options were used: -m64 -mfpmath=sse -march=corei7 -O3 -ffast-math.
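For readers without access to the attachment, here is a minimal sketch of what
a zaxpy loop computes (y[i] += a * x[i] on double-precision complex values).
It is an assumption written in C99 for illustration, not the reproducer from
this report; the function name and signature are hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical zaxpy kernel: y[i] += a * x[i] for n complex doubles.
   The restrict qualifiers state the assumption that x and y do not overlap. */
void zaxpy(size_t n, double complex a,
           const double complex *restrict x,
           double complex *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Compiling a loop of this shape with the gcc options listed above and
inspecting the assembly is one way to compare against the icc output.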
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
2012-10-16 14:22 [Bug tree-optimization/54939] New: Very poor vectorization of loops with complex arithmetic ysrumyan at gmail dot com
@ 2012-10-16 14:37 ` rguenth at gcc dot gnu.org
2012-10-16 14:55 ` ysrumyan at gmail dot com
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-10-16 14:37 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-10-16
Blocks| |53947
Ever Confirmed|0 |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> 2012-10-16 14:36:52 UTC ---
Can you please post the zaxpy source here? Also please see the list of bugs
referenced from PR53947; there is likely a duplicate of this issue.
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
2012-10-16 14:22 [Bug tree-optimization/54939] New: Very poor vectorization of loops with complex arithmetic ysrumyan at gmail dot com
2012-10-16 14:37 ` [Bug tree-optimization/54939] " rguenth at gcc dot gnu.org
@ 2012-10-16 14:55 ` ysrumyan at gmail dot com
2012-10-16 15:06 ` ysrumyan at gmail dot com
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: ysrumyan at gmail dot com @ 2012-10-16 14:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
--- Comment #2 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2012-10-16 14:54:50 UTC ---
Created attachment 28455
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28455
test reproducer
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
2012-10-16 14:22 [Bug tree-optimization/54939] New: Very poor vectorization of loops with complex arithmetic ysrumyan at gmail dot com
2012-10-16 14:37 ` [Bug tree-optimization/54939] " rguenth at gcc dot gnu.org
2012-10-16 14:55 ` ysrumyan at gmail dot com
@ 2012-10-16 15:06 ` ysrumyan at gmail dot com
2012-10-16 15:32 ` rguenth at gcc dot gnu.org
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: ysrumyan at gmail dot com @ 2012-10-16 15:06 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
--- Comment #3 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2012-10-16 15:06:19 UTC ---
I looked through the list of all issues related to vectorization but could not
find a duplicate.
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
2012-10-16 14:22 [Bug tree-optimization/54939] New: Very poor vectorization of loops with complex arithmetic ysrumyan at gmail dot com
` (2 preceding siblings ...)
2012-10-16 15:06 ` ysrumyan at gmail dot com
@ 2012-10-16 15:32 ` rguenth at gcc dot gnu.org
2013-03-27 11:19 ` rguenth at gcc dot gnu.org
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-10-16 15:32 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> 2012-10-16 15:31:52 UTC ---
Thanks.
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
2012-10-16 14:22 [Bug tree-optimization/54939] New: Very poor vectorization of loops with complex arithmetic ysrumyan at gmail dot com
` (3 preceding siblings ...)
2012-10-16 15:32 ` rguenth at gcc dot gnu.org
@ 2013-03-27 11:19 ` rguenth at gcc dot gnu.org
2023-07-21 12:28 ` rguenth at gcc dot gnu.org
2023-07-21 12:31 ` rguenth at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-27 11:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Blocks| |37021
AssignedTo|unassigned at gcc dot |rguenth at gcc dot gnu.org
|gnu.org |
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-27 11:19:49 UTC ---
Confirmed. GCC vectorizes this using hybrid SLP - it unrolls the loop once
so that it can vectorize the two subtractions and two additions that result
from the complex multiplication.
The PR is kind-of a duplicate of PR37021 where also a reduction and
a variable stride is involved. So fixing this bug is required to more
efficiently vectorize PR37021.
Note that even this bug has multiple issues that need to be tackled.
I happen to work on them.
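To make the "two subtractions and two additions" concrete, here is a sketch
of the scalar form after complex lowering and one unroll step. It assumes the
complex values are stored as interleaved (re, im) doubles; the function name
and parameters are hypothetical and are not taken from GCC's internals.

```c
#include <assert.h>
#include <stddef.h>

/* x and y each hold n complex values as interleaved (re, im) doubles.
   The scalar a is passed as its real part ar and imaginary part ai.
   n is assumed to be even so the body can be unrolled once. */
void zaxpy_unrolled(size_t n, double ar, double ai,
                    const double *x, double *y)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < 2 * n; i += 4) {
        /* first complex element: one subtraction (re), one addition (im) */
        y[i]     += ar * x[i]     - ai * x[i + 1];
        y[i + 1] += ar * x[i + 1] + ai * x[i];
        /* second complex element: again one subtraction and one addition */
        y[i + 2] += ar * x[i + 2] - ai * x[i + 3];
        y[i + 3] += ar * x[i + 3] + ai * x[i + 2];
    }
}
```

The alternating minus/plus pattern across adjacent lanes is what an addsub
style instruction can cover in a single operation.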
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
2012-10-16 14:22 [Bug tree-optimization/54939] New: Very poor vectorization of loops with complex arithmetic ysrumyan at gmail dot com
` (4 preceding siblings ...)
2013-03-27 11:19 ` rguenth at gcc dot gnu.org
@ 2023-07-21 12:28 ` rguenth at gcc dot gnu.org
2023-07-21 12:31 ` rguenth at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-21 12:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
CC| |crazylht at gmail dot com
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
With SSE4.2 we now get
.L3:
movupd (%rdx,%rax), %xmm0
movupd (%rcx,%rax), %xmm4
movapd %xmm0, %xmm1
palignr $8, %xmm0, %xmm0
mulpd %xmm3, %xmm1
mulpd %xmm2, %xmm0
addpd %xmm4, %xmm1
addsubpd %xmm0, %xmm1
movups %xmm1, (%rcx,%rax)
addq $16, %rax
cmpq %rsi, %rax
jne .L3
with AVX and FMA
.L4:
vmovupd (%rdx,%rax), %ymm0
vmovapd %ymm4, %ymm1
vfmadd213pd (%rcx,%rax), %ymm0, %ymm1
vpermilpd $5, %ymm0, %ymm0
vmulpd %ymm3, %ymm0, %ymm0
vaddsubpd %ymm0, %ymm1, %ymm1
vmovupd %ymm1, (%rcx,%rax)
addq $32, %rax
cmpq %rsi, %rax
jne .L4
so I'd say fixed. But. With AVX512 we now get
.L4:
vmovupd (%rdi,%rax), %zmm0
vmovapd %zmm7, %zmm2
vmovapd %zmm4, %zmm6
vfmadd213pd (%rcx,%rax), %zmm0, %zmm2
vpermilpd $85, %zmm0, %zmm0
vfmadd132pd %zmm0, %zmm2, %zmm6
vfnmadd132pd %zmm4, %zmm2, %zmm0
vmovapd %zmm6, %zmm0{%k1}
vmovupd %zmm0, (%rcx,%rax)
addq $64, %rax
cmpq %rax, %rsi
jne .L4
it's odd that this only happens with -mprefer-vector-width=512 though. Do
we possibly miss vec_{fm,}{addsub,subadd} for those? Looks like so.
Tracking in PR110767. The vectorizer side is fixed.
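As a reading aid for the SSE4.2 loop above, here is a lane-by-lane emulation
in portable C of what one iteration appears to compute for a single complex
element. It is a sketch under assumptions: the loop setup is not shown in this
comment, so the contents of %xmm3 (real part of a, broadcast) and %xmm2
(imaginary part of a, broadcast) are guesses, and the type and helper names
are hypothetical.

```c
/* Stand-in for one 128-bit XMM register holding two doubles. */
typedef struct { double lane[2]; } vec2d;

/* palignr $8 with the same source and destination swaps the two lanes. */
static vec2d swap_lanes(vec2d v)
{
    vec2d r = {{v.lane[1], v.lane[0]}};
    return r;
}

/* mulpd: lane-wise multiply. */
static vec2d mul(vec2d a, vec2d b)
{
    vec2d r = {{a.lane[0] * b.lane[0], a.lane[1] * b.lane[1]}};
    return r;
}

/* addpd: lane-wise add. */
static vec2d add(vec2d a, vec2d b)
{
    vec2d r = {{a.lane[0] + b.lane[0], a.lane[1] + b.lane[1]}};
    return r;
}

/* addsubpd: subtract in the low lane, add in the high lane. */
static vec2d addsub(vec2d a, vec2d b)
{
    vec2d r = {{a.lane[0] - b.lane[0], a.lane[1] + b.lane[1]}};
    return r;
}

/* One iteration of the loop: returns y + a * x for x = (re, im). */
vec2d zaxpy_step(vec2d x, vec2d y, double ar, double ai)
{
    vec2d bre = {{ar, ar}};              /* assumed content of %xmm3 */
    vec2d bim = {{ai, ai}};              /* assumed content of %xmm2 */
    vec2d t1 = mul(x, bre);              /* (xr*ar, xi*ar) */
    vec2d t0 = mul(swap_lanes(x), bim);  /* (xi*ai, xr*ai) */
    t1 = add(t1, y);                     /* add y before the addsub */
    return addsub(t1, t0);               /* (re - xi*ai, im + xr*ai) */
}
```

The AVX variant follows the same shape, with vpermilpd doing the swap within
each 128-bit half and the first multiply and add fused into vfmadd213pd.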
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
2012-10-16 14:22 [Bug tree-optimization/54939] New: Very poor vectorization of loops with complex arithmetic ysrumyan at gmail dot com
` (5 preceding siblings ...)
2023-07-21 12:28 ` rguenth at gcc dot gnu.org
@ 2023-07-21 12:31 ` rguenth at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-21 12:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
Bug 54939 depends on bug 84361, which changed state.
Bug 84361 Summary: Fails to use vfmaddsub* for complex multiplication
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84361
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |DUPLICATE