public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/54939] New: Very poor vectorization of loops with complex arithmetic
@ 2012-10-16 14:22 ysrumyan at gmail dot com
  2012-10-16 14:37 ` [Bug tree-optimization/54939] " rguenth at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: ysrumyan at gmail dot com @ 2012-10-16 14:22 UTC
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939

             Bug #: 54939
           Summary: Very poor vectorization of loops with complex
                    arithmetic
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: ysrumyan@gmail.com


While analyzing a performance anomaly in spec2000, I found that 168.wupwise
runs slower with vectorization than without it on x86. The main problem is that
gcc does not recognize certain idioms of complex addition and multiplication
during loop vectorization. For example, for a simple zaxpy loop icc generates
code that is 1.6X faster than the code gcc generates. Here is the assembly for
the zaxpy loop produced by icc:

..B1.4:                         # Preds ..B1.2 ..B1.4
        movups    (%rsi,%rdx), %xmm2                            #7.28
        movups    16(%rsi,%rdx), %xmm5                          #7.28
        movups    (%rsi,%rcx), %xmm4                            #7.17
        movups    16(%rsi,%rcx), %xmm7                          #7.17
        movddup   (%rsi,%rdx), %xmm3                            #7.27
        incq      %r8                                           #6.10
        movddup   16(%rsi,%rdx), %xmm6                          #7.27
        unpckhpd  %xmm2, %xmm2                                  #7.27
        unpckhpd  %xmm5, %xmm5                                  #7.27
        mulpd     %xmm1, %xmm3                                  #7.27
        mulpd     %xmm0, %xmm2                                  #7.27
        mulpd     %xmm1, %xmm6                                  #7.27
        mulpd     %xmm0, %xmm5                                  #7.27
        addsubpd  %xmm2, %xmm3                                  #7.27
        addsubpd  %xmm5, %xmm6                                  #7.27
        addpd     %xmm3, %xmm4                                  #7.9
        addpd     %xmm6, %xmm7                                  #7.9
        movups    %xmm4, (%rsi,%rcx)                            #7.9
        movups    %xmm7, 16(%rsi,%rcx)                          #7.9
        addq      $32, %rsi                                     #6.10
        cmpq      %rdi, %r8                                     #6.10
        jb        ..B1.4        # Prob 64%                      #6.10
(I got it with the -xSSE4.2 -O3 options.) For the gcc compiler the following
options were used: -m64 -mfpmath=sse -march=corei7 -O3 -ffast-math.
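
For readers without the test case at hand, here is a minimal sketch of the
kind of loop being discussed and of the idiom visible in the icc listing. It
is an illustration only, not the original source: the function names
zaxpy_ref and zaxpy_lanes are hypothetical, and the roles assigned to %xmm0
and %xmm1 are inferred from the arithmetic, since the loop preamble is not
shown above. Each two-element array stands in for one 128-bit register, with
element 0 as the low lane.

```c
#include <complex.h>
#include <stddef.h>

/* Reference kernel: y[i] += a * x[i] on double-precision complex values.
   Hypothetical reconstruction; the report does not include the source. */
void zaxpy_ref(size_t n, double complex a,
               const double complex *x, double complex *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* The same kernel written lane by lane, following the instruction
   sequence in the icc listing for one complex element per iteration. */
void zaxpy_lanes(size_t n, double complex a,
                 const double complex *x, double complex *y)
{
    /* Assumed loop-invariant registers (inferred, not shown in the listing). */
    const double a_ri[2] = { creal(a), cimag(a) };  /* role of %xmm1 */
    const double a_ir[2] = { cimag(a), creal(a) };  /* role of %xmm0 */

    for (size_t i = 0; i < n; i++) {
        const double xr = creal(x[i]);
        const double xi = cimag(x[i]);

        const double re_dup[2] = { xr, xr };        /* movddup:  (re, re) */
        const double im_dup[2] = { xi, xi };        /* unpckhpd: (im, im) */

        /* Two mulpd instructions. */
        const double p[2] = { re_dup[0] * a_ri[0], re_dup[1] * a_ri[1] };
        const double q[2] = { im_dup[0] * a_ir[0], im_dup[1] * a_ir[1] };

        /* addsubpd: subtract in the low lane, add in the high lane.
           t[0] = xr*ar - xi*ai, t[1] = xr*ai + xi*ar, i.e. a * x[i]. */
        const double t[2] = { p[0] - q[0], p[1] + q[1] };

        /* addpd followed by the store back to y. */
        y[i] = (creal(y[i]) + t[0]) + (cimag(y[i]) + t[1]) * I;
    }
}
```

The point of the idiom is that one complex multiply costs two mulpd and one
addsubpd, with no shuffle needed to recombine the real and imaginary parts.
The icc loop processes two such elements per iteration.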



Thread overview: 8+ messages
2012-10-16 14:22 [Bug tree-optimization/54939] New: Very poor vectorization of loops with complex arithmetic ysrumyan at gmail dot com
2012-10-16 14:37 ` [Bug tree-optimization/54939] " rguenth at gcc dot gnu.org
2012-10-16 14:55 ` ysrumyan at gmail dot com
2012-10-16 15:06 ` ysrumyan at gmail dot com
2012-10-16 15:32 ` rguenth at gcc dot gnu.org
2013-03-27 11:19 ` rguenth at gcc dot gnu.org
2023-07-21 12:28 ` rguenth at gcc dot gnu.org
2023-07-21 12:31 ` rguenth at gcc dot gnu.org
