public inbox for gcc-bugs@sourceware.org
From: "dorit at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized
Date: Tue, 27 Jan 2009 12:40:00 -0000
Message-ID: <20090127124026.4093.qmail@sourceware.org>
In-Reply-To: <bug-37021-10053@http.gcc.gnu.org/bugzilla/>



------- Comment #9 from dorit at gcc dot gnu dot org  2009-01-27 12:40 -------
(In reply to comment #4)
> The testcase should be
> subroutine to_product_of(self,a,b,a1,a2)
>   complex(kind=8) :: self (:)
>   complex(kind=8), intent(in) :: a(:,:)
>   complex(kind=8), intent(in) :: b(:)
>   integer a1,a2
>   do i = 1,a1
>     do j = 1,a2
>       self(i) = self(i) + a(j,i)*b(j)
>     end do
>   end do
> end subroutine
> to be meaningful - otherwise we are accessing a in non-continuous ways in the
> inner loop which would prevent vectorization.

This change from a(i,j) to a(j,i) is not required if we vectorize the
outer loop instead, because a(i,j) is accessed with stride 1 along i. That is
also a better way to vectorize the reduction. A few limitations stand in the
way, though:

1) gcc must somehow be kept from creating guard code around the innermost loop
to check that it executes more than zero iterations. The guard creates a
complicated control-flow structure within the outer loop. For now you need a
constant number of iterations for the inner loop because of that, or you have
to insert a statement like "if (a2<=0) return;" before the loop...

2) use -fno-tree-sink, because otherwise the sinking pass moves the loop IV
increment to the latch block, and the vectorizer expects the latch block to be
empty...

(see also PR33113 for related reference).
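
As a rough sketch of the two workarounds above (not a tested recipe), the
testcase can keep the original a(i,j) indexing and add the explicit early exit
from point 1. The module name, the explicit declarations of i and j, the
intent(inout) on self and the small driver program are additions for
illustration only. Whether the outer loop is actually vectorized depends on
the gcc version, so treat the flags in the comment as a starting point.

```fortran
! Sketch only: compile with something like
!   gfortran -O3 -fno-tree-sink sketch.f90
! (-fno-tree-sink is the flag from point 2; -O3 enables the vectorizer).
module outer_loop_sketch
  implicit none
contains
  subroutine to_product_of(self, a, b, a1, a2)
    complex(kind=8), intent(inout) :: self(:)
    complex(kind=8), intent(in) :: a(:,:)
    complex(kind=8), intent(in) :: b(:)
    integer, intent(in) :: a1, a2
    integer :: i, j

    ! Point 1: explicit early exit, so that no zero-trip guard is needed
    ! around the inner loop.
    if (a2 <= 0) return

    ! a(i,j) is contiguous along i (column-major storage), so the outer
    ! loop over i is the stride-1 direction.
    do i = 1, a1
      do j = 1, a2
        self(i) = self(i) + a(i,j)*b(j)
      end do
    end do
  end subroutine to_product_of
end module outer_loop_sketch

program demo
  use outer_loop_sketch
  implicit none
  complex(kind=8) :: self(2), a(2,3), b(3)

  self = (0.0d0, 0.0d0)
  a = (1.0d0, 1.0d0)
  b = (2.0d0, 0.0d0)
  call to_product_of(self, a, b, 2, 3)
  ! Each element accumulates 3 * (1+i)*(2+0i) = 6+6i.
  print *, self
end program demo
```

The driver only checks that the arithmetic is unchanged by the guard. To see
what the vectorizer decided, inspect the generated assembly or the
vectorizer's diagnostic dump for your gcc version.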


> With the versioning for stride == 1 I get then
> .L13:
>         movupd  16(%rax), %xmm1
>         movupd  (%rax), %xmm3
>         incl    %ecx
>         movupd  (%rdx), %xmm4
>         addq    $32, %rax
>         movapd  %xmm3, %xmm0
>         unpckhpd        %xmm1, %xmm3
>         unpcklpd        %xmm1, %xmm0
>         movupd  16(%rdx), %xmm1
>         movapd  %xmm4, %xmm2
>         addq    $32, %rdx
>         movapd  %xmm3, %xmm9
>         cmpl    %ecx, %r8d
>         unpcklpd        %xmm1, %xmm2
>         unpckhpd        %xmm1, %xmm4
>         movapd  %xmm4, %xmm1
>         movapd  %xmm2, %xmm4
>         mulpd   %xmm1, %xmm9
>         mulpd   %xmm0, %xmm4
>         mulpd   %xmm3, %xmm2
>         mulpd   %xmm1, %xmm0
>         subpd   %xmm9, %xmm4
>         addpd   %xmm2, %xmm0
>         addpd   %xmm4, %xmm6
>         addpd   %xmm0, %xmm5
>         ja      .L13
>         haddpd  %xmm5, %xmm5
>         cmpl    %r15d, %edi
>         movl    -4(%rsp), %ecx
>         haddpd  %xmm6, %xmm6
>         addsd   %xmm5, %xmm8
>         addsd   %xmm6, %xmm7
>         jne     .L12
>         jmp     .L14
> for the innermost loop, followed by a tail loop (peel for niters).  This is
> about 15% faster on AMD K10 than the non-vectorized loop (if you disable
> the cost-model and make sure to have enough iterations in the inner loop
> to pay back for the extra guarding conditions).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021


