public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/54939] New: Very poor vectorization of loops with complex arithmetic
From: ysrumyan at gmail dot com @ 2012-10-16 14:22 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939

             Bug #: 54939
           Summary: Very poor vectorization of loops with complex arithmetic
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: ysrumyan@gmail.com

While analyzing a performance anomaly in spec2000, I found that 168.wupwise
is slower with vectorization than without it on x86. The main problem is that
gcc does not recognize some special idioms of complex addition and
multiplication during loop vectorization. For example, for a simple zaxpy loop
icc generates code that is 1.6X faster than the code gcc generates.

Here is the assembly for the zaxpy loop produced by icc:

..B1.4:                         # Preds ..B1.2 ..B1.4
        movups    (%rsi,%rdx), %xmm2                            #7.28
        movups    16(%rsi,%rdx), %xmm5                          #7.28
        movups    (%rsi,%rcx), %xmm4                            #7.17
        movups    16(%rsi,%rcx), %xmm7                          #7.17
        movddup   (%rsi,%rdx), %xmm3                            #7.27
        incq      %r8                                           #6.10
        movddup   16(%rsi,%rdx), %xmm6                          #7.27
        unpckhpd  %xmm2, %xmm2                                  #7.27
        unpckhpd  %xmm5, %xmm5                                  #7.27
        mulpd     %xmm1, %xmm3                                  #7.27
        mulpd     %xmm0, %xmm2                                  #7.27
        mulpd     %xmm1, %xmm6                                  #7.27
        mulpd     %xmm0, %xmm5                                  #7.27
        addsubpd  %xmm2, %xmm3                                  #7.27
        addsubpd  %xmm5, %xmm6                                  #7.27
        addpd     %xmm3, %xmm4                                  #7.9
        addpd     %xmm6, %xmm7                                  #7.9
        movups    %xmm4, (%rsi,%rcx)                            #7.9
        movups    %xmm7, 16(%rsi,%rcx)                          #7.9
        addq      $32, %rsi                                     #6.10
        cmpq      %rdi, %r8                                     #6.10
        jb        ..B1.4        # Prob 64%                      #6.10

(I got this with the -xSSE4.2 -O3 options.)

For the gcc compiler, the following options were used:
-m64 -mfpmath=sse -march=corei7 -O3 -ffast-math.
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
From: rguenth at gcc dot gnu.org @ 2012-10-16 14:37 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-10-16
             Blocks|                            |53947
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> 2012-10-16 14:36:52 UTC ---
Can you post the zaxpy source here, please? Also, please see the list of bugs
referenced from PR53947; there is likely a duplicate of this issue.
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
From: ysrumyan at gmail dot com @ 2012-10-16 14:55 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939

--- Comment #2 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2012-10-16 14:54:50 UTC ---
Created attachment 28455
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28455
test reproducer
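The attachment itself is not reproduced in this archive. For readers who want something concrete to look at, here is a minimal sketch of what a zaxpy-style loop could look like in C99, i.e. y[i] += a * x[i] over double-precision complex numbers. This is an assumption for illustration only: the function name `zaxpy_sketch`, its signature, and the choice of C are hypothetical, and the real reproducer (and the 168.wupwise source) may well differ.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical illustration, not the attached reproducer:
   y[i] += a * x[i] for n double-precision complex elements.
   'restrict' tells the compiler that x and y do not overlap. */
void zaxpy_sketch(size_t n, double complex a,
                  const double complex *restrict x,
                  double complex *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Compiling a loop of this shape with the gcc options quoted in the report should show whether the inner loop uses an addsubpd-style idiom; no particular output is claimed here.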
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
From: ysrumyan at gmail dot com @ 2012-10-16 15:06 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939

--- Comment #3 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2012-10-16 15:06:19 UTC ---
I looked through the list of all issues related to vectorization but could not
find a duplicate.
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
From: rguenth at gcc dot gnu.org @ 2012-10-16 15:32 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> 2012-10-16 15:31:52 UTC ---
Thanks.
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
From: rguenth at gcc dot gnu.org @ 2013-03-27 11:19 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
             Blocks|                            |37021
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-27 11:19:49 UTC ---
Confirmed. GCC vectorizes this using hybrid SLP: it unrolls the loop once to
be able to vectorize the two subtractions and two additions resulting from the
complex multiplication.

The PR is kind of a duplicate of PR37021, where a reduction and a variable
stride are also involved. So fixing this bug is required to vectorize PR37021
more efficiently.

Note that even this bug has multiple issues that need to be tackled. I happen
to be working on them.
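To make the subtractions and additions that comment #5 attributes to the complex multiplication easier to see, the following sketch expands the update y += a * x into real arithmetic on interleaved (re, im) pairs. It is an illustration under stated assumptions, not GCC's internal representation: the function name `zaxpy_interleaved` and the interleaved layout are hypothetical. Each iteration contributes one subtraction (real part) and one addition (imaginary part), so two iterations taken together contain two of each.

```c
#include <stddef.h>

/* Hypothetical illustration: x and y hold n complex numbers stored as
   interleaved (re, im) pairs; (ar, ai) is the complex scalar a. */
void zaxpy_interleaved(size_t n, double ar, double ai,
                       const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double xr = x[2 * i];
        double xi = x[2 * i + 1];
        y[2 * i]     += ar * xr - ai * xi;  /* real part: subtraction */
        y[2 * i + 1] += ar * xi + ai * xr;  /* imaginary part: addition */
    }
}
```

The alternating minus/plus pattern across adjacent elements is the shape that an instruction such as addsubpd covers in one step.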
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
From: rguenth at gcc dot gnu.org @ 2023-07-21 12:28 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
                 CC|                            |crazylht at gmail dot com

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
With SSE4.2 we now get

.L3:
        movupd  (%rdx,%rax), %xmm0
        movupd  (%rcx,%rax), %xmm4
        movapd  %xmm0, %xmm1
        palignr $8, %xmm0, %xmm0
        mulpd   %xmm3, %xmm1
        mulpd   %xmm2, %xmm0
        addpd   %xmm4, %xmm1
        addsubpd        %xmm0, %xmm1
        movups  %xmm1, (%rcx,%rax)
        addq    $16, %rax
        cmpq    %rsi, %rax
        jne     .L3

with AVX and FMA

.L4:
        vmovupd (%rdx,%rax), %ymm0
        vmovapd %ymm4, %ymm1
        vfmadd213pd     (%rcx,%rax), %ymm0, %ymm1
        vpermilpd       $5, %ymm0, %ymm0
        vmulpd  %ymm3, %ymm0, %ymm0
        vaddsubpd       %ymm0, %ymm1, %ymm1
        vmovupd %ymm1, (%rcx,%rax)
        addq    $32, %rax
        cmpq    %rsi, %rax
        jne     .L4

so I'd say fixed. But. With AVX512 we now get

.L4:
        vmovupd (%rdi,%rax), %zmm0
        vmovapd %zmm7, %zmm2
        vmovapd %zmm4, %zmm6
        vfmadd213pd     (%rcx,%rax), %zmm0, %zmm2
        vpermilpd       $85, %zmm0, %zmm0
        vfmadd132pd     %zmm0, %zmm2, %zmm6
        vfnmadd132pd    %zmm4, %zmm2, %zmm0
        vmovapd %zmm6, %zmm0{%k1}
        vmovupd %zmm0, (%rcx,%rax)
        addq    $64, %rax
        cmpq    %rax, %rsi
        jne     .L4

it's odd that this only happens with -mprefer-vector-width=512 though. Do we
possibly miss vec_{fm,}{addsub,subadd} for those? Looks like so. Tracking
in PR110767.

The vectorizer side is fixed.
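As a reading aid for the SSE4.2 listing in comment #14, the sketch below models one loop iteration with plain two-element "registers" in portable C. It rests on assumptions that the listing does not show: that %xmm3 holds the real part of the scalar in both lanes and %xmm2 holds the imaginary part in both lanes, and that (%rdx) points at x and (%rcx) at y. All type and function names here are hypothetical. The model follows the same order of operations: swap the halves of x (palignr $8 on identical operands), two multiplies, add y, then a combined subtract-low/add-high step (addsubpd).

```c
/* Hypothetical lane model: lane[0] is the low double, lane[1] the high. */
typedef struct { double lane[2]; } v2df_model;

/* Models palignr $8 with both operands equal: swaps the two halves. */
static v2df_model v2_swap(v2df_model v)
{
    v2df_model r = {{ v.lane[1], v.lane[0] }};
    return r;
}

/* Models mulpd: lane-wise multiply. */
static v2df_model v2_mul(v2df_model a, v2df_model b)
{
    v2df_model r = {{ a.lane[0] * b.lane[0], a.lane[1] * b.lane[1] }};
    return r;
}

/* Models addpd: lane-wise add. */
static v2df_model v2_add(v2df_model a, v2df_model b)
{
    v2df_model r = {{ a.lane[0] + b.lane[0], a.lane[1] + b.lane[1] }};
    return r;
}

/* Models addsubpd: subtract in the low lane, add in the high lane. */
static v2df_model v2_addsub(v2df_model a, v2df_model b)
{
    v2df_model r = {{ a.lane[0] - b.lane[0], a.lane[1] + b.lane[1] }};
    return r;
}

/* One iteration: y += a * x for a single complex element stored as (re, im). */
void zaxpy_step_model(double ar, double ai, const double x[2], double y[2])
{
    v2df_model vx  = {{ x[0], x[1] }};
    v2df_model vy  = {{ y[0], y[1] }};
    v2df_model bar = {{ ar, ar }};   /* assumed contents of %xmm3 */
    v2df_model bai = {{ ai, ai }};   /* assumed contents of %xmm2 */

    v2df_model t = v2_add(v2_mul(vx, bar), vy);   /* y + (xr*ar, xi*ar) */
    v2df_model u = v2_mul(v2_swap(vx), bai);      /* (xi*ai, xr*ai) */
    v2df_model r = v2_addsub(t, u);               /* (re - xi*ai, im + xr*ai) */

    y[0] = r.lane[0];
    y[1] = r.lane[1];
}
```

Under these assumptions the low lane ends up as yr + ar*xr - ai*xi and the high lane as yi + ar*xi + ai*xr, which is the complex update y += a * x.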
* [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
From: rguenth at gcc dot gnu.org @ 2023-07-21 12:31 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
Bug 54939 depends on bug 84361, which changed state.

Bug 84361 Summary: Fails to use vfmaddsub* for complex multiplication
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84361

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE