public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/39300]  New: vectorizer confused by predictive commoning
@ 2009-02-25 12:16 matz at gcc dot gnu dot org
  2009-02-25 13:53 ` [Bug tree-optimization/39300] " matz at gcc dot gnu dot org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: matz at gcc dot gnu dot org @ 2009-02-25 12:16 UTC (permalink / raw)
  To: gcc-bugs

The loop in this test is not vectorized if either PRE or predictive
commoning is active:
% cat vecttest2.f
      subroutine Bench_StaggeredLeapfrog2( cctk_dim,XADM_curv_stag0,
     &ADM_kzz_stag,lgxx,nx)
      implicit none
      INTEGER cctk_dim
      INTEGER XADM_curv_stag0
      REAL*8 ADM_kzz_stag(XADM_curv_stag0)

      integer :: i
      integer :: nx
      REAL*8,DIMENSION(cctk_dim):: lgxx
      do i=2,nx-1
        ADM_kzz_stag(i) = ADM_kzz_stag(i)+lgxx(i)+lgxx(i-1)+lgxx(i+1)
      end do
      end subroutine Bench_StaggeredLeapfrog2
% gfortran -c -O3 -g -ffast-math -ftree-vectorizer-verbose=2 vecttest2.f
vecttest2.f:11: note: not vectorized: unsupported use in stmt.
vecttest2.f:12: note: not vectorized: unsupported use in stmt.
% add -fno-tree-pre -fno-predictive-commoning to above command:
vecttest2.f:11: note: LOOP VECTORIZED.
% add only -fno-tree-pre (so predictive commoning is active):
vecttest2.f:11: note: LOOP VECTORIZED.
vecttest2.f:12: note: not vectorized: unsupported use in stmt.

When two loops are reported, the single vectorized one is the tail loop
of the loop produced by predictive commoning.  That tail loop doesn't
contain any loop-carried values.  Somehow the vectorizer doesn't like the
PHI nodes in the main loop created by predictive commoning.
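
To make the loop-carried values concrete, here is a rough C rendering of the
Fortran loop together with a hand-written approximation of what predictive
commoning turns it into.  This is only a sketch: the function names and the
0-based indexing are mine, and the code GCC actually generates will differ
in detail.  The scalars prev and cur are the values that become PHI nodes in
the loop header.

```c
#include <stddef.h>

/* Straightforward form: every iteration loads lgxx[i-1], lgxx[i] and
   lgxx[i+1] from memory.  Valid indices are assumed to be 0..n-1, so
   i runs over 1..n-2.  */
void stencil_plain(double *kzz, const double *lgxx, size_t n)
{
    for (size_t i = 1; i + 1 < n; ++i)
        kzz[i] = kzz[i] + lgxx[i] + lgxx[i - 1] + lgxx[i + 1];
}

/* Hand-written approximation of the predictive-commoning result (an
   assumption for illustration, not compiler output): only lgxx[i+1]
   is loaded per iteration, the other two values are carried over from
   the previous iteration in scalars.  */
void stencil_predcom(double *kzz, const double *lgxx, size_t n)
{
    if (n < 3)
        return;

    double prev = lgxx[0];   /* plays the role of lgxx[i-1] */
    double cur = lgxx[1];    /* plays the role of lgxx[i]   */

    for (size_t i = 1; i + 1 < n; ++i) {
        double next = lgxx[i + 1];   /* the only load from lgxx */
        kzz[i] = kzz[i] + cur + prev + next;
        prev = cur;                  /* rotate the carried values */
        cur = next;
    }
}
```

Both functions compute the same values; the second one saves two loads per
iteration at the price of a cross-iteration dependence on prev and cur.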

This testcase comes from 436.cactusADM, where it's very important to
vectorize a certain inner loop, and this (PRE and predcom) is one reason this
doesn't happen already.


-- 
           Summary: vectorizer confused by predictive commoning
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: matz at gcc dot gnu dot org
  GCC host triplet: x86_64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300



* [Bug tree-optimization/39300] vectorizer confused by predictive commoning
  2009-02-25 12:16 [Bug tree-optimization/39300] New: vectorizer confused by predictive commoning matz at gcc dot gnu dot org
@ 2009-02-25 13:53 ` matz at gcc dot gnu dot org
  2009-02-25 13:56 ` rguenth at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: matz at gcc dot gnu dot org @ 2009-02-25 13:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from matz at gcc dot gnu dot org  2009-02-25 13:53 -------
For reference, Intel Fortran (11.0) produces three loops.  One uses
predictive commoning and is taken when there are only a few iterations:

..B1.7:                         # Preds ..B1.6
        movsd     8(%r8), %xmm1                                 #13.52
        movsd     (%r8), %xmm0                                  #13.52
                                # LOE rax rdx rcx rbx rbp rsi rdi r8 r9 r12 r13 r14 r15 xmm0 xmm1
..B1.8:                         # Preds ..B1.8 ..B1.7
        movaps    %xmm1, %xmm2                                  #13.33
        movsd     16(%r8,%rdi,8), %xmm3                         #13.52
        addsd     %xmm3, %xmm2                                  #13.33
        addsd     %xmm0, %xmm2                                  #13.41
        movaps    %xmm1, %xmm0                                  #14.7
        movaps    %xmm3, %xmm1                                  #14.7
        addsd     8(%rdx,%rdi,8), %xmm2                         #13.9
        movsd     %xmm2, 8(%rcx,%rdi,8)                         #13.9
        incq      %rdi                                          #14.7
        cmpq      %rax, %rdi                                    #14.7
        jl        ..B1.8        # Prob 82%                      #14.7

And two others which are vectorized (plus four/eight times unrolled), but
do _not_ use something like predictive commoning (i.e. no cross iteration
values).  Both loops are just versions of each other, one for aligned
destinations and the other for unaligned.  The aligned variant is this:

..B1.15:                        # Preds ..B1.10 ..B1.15
        movsd     8(%rdx,%rax,8), %xmm1                         #13.18
        movhpd    16(%rdx,%rax,8), %xmm1                        #13.18
        movsd     8(%r8,%rax,8), %xmm0                          #13.34
        movhpd    16(%r8,%rax,8), %xmm0                         #13.34
        movsd     24(%rdx,%rax,8), %xmm4                        #13.18
        movhpd    32(%rdx,%rax,8), %xmm4                        #13.18
        movsd     24(%r8,%rax,8), %xmm2                         #13.34
        movhpd    32(%r8,%rax,8), %xmm2                         #13.34
        movsd     40(%rdx,%rax,8), %xmm7                        #13.18
        movhpd    48(%rdx,%rax,8), %xmm7                        #13.18
        movsd     40(%r8,%rax,8), %xmm5                         #13.34
        movhpd    48(%r8,%rax,8), %xmm5                         #13.34
        movsd     56(%rdx,%rax,8), %xmm10                       #13.18
        movhpd    64(%rdx,%rax,8), %xmm10                       #13.18
        movsd     56(%r8,%rax,8), %xmm8                         #13.34
        movhpd    64(%r8,%rax,8), %xmm8                         #13.34
        addpd     %xmm0, %xmm1                                  #13.33
        addpd     (%r8,%rax,8), %xmm1                           #13.41
        addpd     %xmm2, %xmm4                                  #13.33
        addpd     %xmm5, %xmm7                                  #13.33
        addpd     %xmm8, %xmm10                                 #13.33
        movaps    16(%r8,%rax,8), %xmm3                         #13.52
        addpd     %xmm3, %xmm1                                  #13.9
        movaps    32(%r8,%rax,8), %xmm6                         #13.52
        movaps    48(%r8,%rax,8), %xmm9                         #13.52
        movaps    %xmm1, 8(%rcx,%rax,8)                         #13.9
        addpd     %xmm3, %xmm4                                  #13.41
        addpd     %xmm6, %xmm4                                  #13.9
        movaps    %xmm4, 24(%rcx,%rax,8)                        #13.9
        addpd     %xmm6, %xmm7                                  #13.41
        addpd     %xmm9, %xmm7                                  #13.9
        movaps    %xmm7, 40(%rcx,%rax,8)                        #13.9
        addpd     %xmm9, %xmm10                                 #13.41
        addpd     64(%r8,%rax,8), %xmm10                        #13.9
        movaps    %xmm10, 56(%rcx,%rax,8)                       #13.9
        addq      $8, %rax                                      #14.7
        cmpq      %r9, %rax                                     #14.7
        jl        ..B1.15       # Prob 82%                      #14.7

Not optimal, since it does not reuse the cross-iteration values to save
two loads per iteration.  But still much better than what GCC produces.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300



* [Bug tree-optimization/39300] vectorizer confused by predictive commoning
  2009-02-25 12:16 [Bug tree-optimization/39300] New: vectorizer confused by predictive commoning matz at gcc dot gnu dot org
  2009-02-25 13:53 ` [Bug tree-optimization/39300] " matz at gcc dot gnu dot org
@ 2009-02-25 13:56 ` rguenth at gcc dot gnu dot org
  2009-02-25 14:07 ` [Bug tree-optimization/39300] vectorizer confused by predictive commoning and PRE rguenth at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-02-25 13:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2009-02-25 13:56 -------
Confirmed.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2009-02-25 13:56:08
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300



* [Bug tree-optimization/39300] vectorizer confused by predictive commoning and PRE
  2009-02-25 12:16 [Bug tree-optimization/39300] New: vectorizer confused by predictive commoning matz at gcc dot gnu dot org
  2009-02-25 13:53 ` [Bug tree-optimization/39300] " matz at gcc dot gnu dot org
  2009-02-25 13:56 ` rguenth at gcc dot gnu dot org
@ 2009-02-25 14:07 ` rguenth at gcc dot gnu dot org
  2009-02-25 14:08 ` irar at il dot ibm dot com
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-02-25 14:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2009-02-25 14:07 -------
Simpler C testcase:

float res[1024], data[1024];

void foo(void)
{
  int i;
  float tmp = data[0];
  for (i = 1; i < 1024; ++i)
    {
      float tmp2 = data[i];
      res[i] = tmp + tmp2;
      tmp = tmp2;
    }
}
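
For comparison, a sketch of the same computation with and without the
loop-carried scalar.  The function names and the pointer/length parameters
are assumptions added for illustration (the testcase above uses fixed-size
globals); res and data are assumed not to overlap.  The second form reloads
data[i-1] instead of carrying it in tmp, so each iteration is independent
of the previous one.

```c
/* Same shape as the testcase: tmp carries data[i-1] from one
   iteration to the next.  */
void pairsum_carried(float *res, const float *data, int n)
{
    if (n < 1)
        return;

    float tmp = data[0];
    for (int i = 1; i < n; ++i) {
        float tmp2 = data[i];
        res[i] = tmp + tmp2;
        tmp = tmp2;          /* loop-carried value */
    }
}

/* Equivalent form without a carried scalar (assuming res and data do
   not overlap): one extra load per iteration, but no dependence
   between iterations.  */
void pairsum_reload(float *res, const float *data, int n)
{
    for (int i = 1; i < n; ++i)
        res[i] = data[i - 1] + data[i];
}
```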


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300



* [Bug tree-optimization/39300] vectorizer confused by predictive commoning and PRE
  2009-02-25 12:16 [Bug tree-optimization/39300] New: vectorizer confused by predictive commoning matz at gcc dot gnu dot org
                   ` (2 preceding siblings ...)
  2009-02-25 14:07 ` [Bug tree-optimization/39300] vectorizer confused by predictive commoning and PRE rguenth at gcc dot gnu dot org
@ 2009-02-25 14:08 ` irar at il dot ibm dot com
  2009-03-08 14:26 ` dorit at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: irar at il dot ibm dot com @ 2009-02-25 14:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from irar at il dot ibm dot com  2009-02-25 14:08 -------
Looks similar to PR 35229.

We get here:
# pre.1 = PHI <D.1, D.2>
..
load D.2 
D.3 = D.2 + pre.1 + ...
store D.3


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300



* [Bug tree-optimization/39300] vectorizer confused by predictive commoning and PRE
  2009-02-25 12:16 [Bug tree-optimization/39300] New: vectorizer confused by predictive commoning matz at gcc dot gnu dot org
                   ` (3 preceding siblings ...)
  2009-02-25 14:08 ` irar at il dot ibm dot com
@ 2009-03-08 14:26 ` dorit at gcc dot gnu dot org
  2009-03-09 13:25 ` matz at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: dorit at gcc dot gnu dot org @ 2009-03-08 14:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from dorit at gcc dot gnu dot org  2009-03-08 14:25 -------
This is a known problem... Indeed when Zdenek introduced predictive-commoning
there was a discussion on whether to schedule it before or after vectorization.
AFAIR, it ended up getting scheduled before the vectorizer just because this
happened to be what Zdenek tested/experimented with, and he didn't have a
problem with scheduling it after vectorization as long as it didn't hurt
performance (of mgrid in particular). Here are related threads:
http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01383.html
http://gcc.gnu.org/ml/gcc-patches/2007-02/msg00555.html
http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00571.html

Regardless of whether we schedule predcom after vectorization, it will still
be useful to teach the vectorizer to handle such dependence patterns, as they
may (and do) appear in the source code.  
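
One possible way to handle such a pattern, sketched in plain C on the
simpler C testcase from comment #3.  This is an illustration of the general
idea only, not a description of anything the GCC vectorizer does: the name
pairsum_blocked and the factor VF are hypothetical, and the small arrays
stand in for vector registers.  Each block does one "vector load", builds
the vector of previous elements by shifting in the value carried from the
preceding block, and adds the two.

```c
enum { VF = 4 };   /* hypothetical vectorization factor */

/* Computes res[i] = data[i-1] + data[i] for i = 1..n-1, VF elements
   at a time.  res and data are assumed not to overlap.  */
void pairsum_blocked(float *res, const float *data, int n)
{
    if (n < 2)
        return;

    float carry = data[0];   /* last element of the previous block */
    int i = 1;

    for (; i + VF <= n; i += VF) {
        float cur[VF], shifted[VF];

        /* Stands in for a single vector load of data[i..i+VF-1].  */
        for (int k = 0; k < VF; ++k)
            cur[k] = data[i + k];

        /* Stands in for a permute: shift cur up by one lane and
           insert the carried element into lane 0.  */
        shifted[0] = carry;
        for (int k = 1; k < VF; ++k)
            shifted[k] = cur[k - 1];

        /* Stands in for a vector add followed by a vector store.  */
        for (int k = 0; k < VF; ++k)
            res[i + k] = shifted[k] + cur[k];

        carry = cur[VF - 1];   /* carried into the next block */
    }

    /* Scalar epilogue for the remaining elements.  */
    for (; i < n; ++i) {
        res[i] = carry + data[i];
        carry = data[i];
    }
}
```

The cross-iteration value survives, but it is carried once per block rather
than once per element, and the loads from data are not duplicated.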


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300



* [Bug tree-optimization/39300] vectorizer confused by predictive commoning and PRE
  2009-02-25 12:16 [Bug tree-optimization/39300] New: vectorizer confused by predictive commoning matz at gcc dot gnu dot org
                   ` (4 preceding siblings ...)
  2009-03-08 14:26 ` dorit at gcc dot gnu dot org
@ 2009-03-09 13:25 ` matz at gcc dot gnu dot org
  2009-07-22 15:31 ` matz at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: matz at gcc dot gnu dot org @ 2009-03-09 13:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from matz at gcc dot gnu dot org  2009-03-09 13:25 -------
It's also PRE that produces such patterns, so moving predcom behind
vectorization alone won't help this problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300



* [Bug tree-optimization/39300] vectorizer confused by predictive commoning and PRE
  2009-02-25 12:16 [Bug tree-optimization/39300] New: vectorizer confused by predictive commoning matz at gcc dot gnu dot org
                   ` (5 preceding siblings ...)
  2009-03-09 13:25 ` matz at gcc dot gnu dot org
@ 2009-07-22 15:31 ` matz at gcc dot gnu dot org
  2009-07-22 15:40 ` matz at gcc dot gnu dot org
  2009-08-17 17:39 ` jessiecute13 at aol dot com
  8 siblings, 0 replies; 10+ messages in thread
From: matz at gcc dot gnu dot org @ 2009-07-22 15:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from matz at gcc dot gnu dot org  2009-07-22 15:31 -------
Subject: Bug 39300

Author: matz
Date: Wed Jul 22 15:30:50 2009
New Revision: 149942

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=149942
Log:
        PR tree-optimization/35229
        PR tree-optimization/39300

        * tree-ssa-pre.c (includes): Include tree-scalar-evolution.h.
        (inhibit_phi_insertion): New function.
        (insert_into_preds_of_block): Call it for REFERENCEs.
        (init_pre): Initialize and finalize scalar evolutions.
        * Makefile.in (tree-ssa-pre.o): Depend on tree-scalar-evolution.h .

testsuite/
        * gcc.dg/vect/vect-pre-interact.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/vect-pre-interact.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-pre.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300



* [Bug tree-optimization/39300] vectorizer confused by predictive commoning and PRE
  2009-02-25 12:16 [Bug tree-optimization/39300] New: vectorizer confused by predictive commoning matz at gcc dot gnu dot org
                   ` (6 preceding siblings ...)
  2009-07-22 15:31 ` matz at gcc dot gnu dot org
@ 2009-07-22 15:40 ` matz at gcc dot gnu dot org
  2009-08-17 17:39 ` jessiecute13 at aol dot com
  8 siblings, 0 replies; 10+ messages in thread
From: matz at gcc dot gnu dot org @ 2009-07-22 15:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from matz at gcc dot gnu dot org  2009-07-22 15:40 -------
So, the immediate problem is now fixed, but I'd suggest leaving this
enhancement request open, in case anybody wants to work on extending the
vectorizer to deal with these loop-carried dependencies, because in that
case we wouldn't need to dumb down PRE anymore, which would be even better.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300



* [Bug tree-optimization/39300] vectorizer confused by predictive commoning and PRE
  2009-02-25 12:16 [Bug tree-optimization/39300] New: vectorizer confused by predictive commoning matz at gcc dot gnu dot org
                   ` (7 preceding siblings ...)
  2009-07-22 15:40 ` matz at gcc dot gnu dot org
@ 2009-08-17 17:39 ` jessiecute13 at aol dot com
  8 siblings, 0 replies; 10+ messages in thread
From: jessiecute13 at aol dot com @ 2009-08-17 17:39 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from jessiecute13 at aol dot com  2009-08-17 17:38 -------
$1.21


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300


