From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-155760-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 17418 invoked by alias); 4 Aug 2009 13:22:23 -0000
Received: (qmail 17405 invoked by uid 22791); 4 Aug 2009 13:22:21 -0000
X-SWARE-Spam-Status: No, hits=-1.2 required=5.0 	tests=AWL,BAYES_00,J_CHICKENPOX_21,J_CHICKENPOX_22,J_CHICKENPOX_32,J_CHICKENPOX_61
X-Spam-Check-By: sourceware.org
Received: from bromo.med.uc.edu (HELO bromo.med.uc.edu) (129.137.3.146)     by sourceware.org (qpsmtpd/0.43rc1) with SMTP; Tue, 04 Aug 2009 13:22:11 +0000
Received: from bromo.med.uc.edu (localhost.localdomain [127.0.0.1]) 	by bromo.med.uc.edu (Postfix) with ESMTP id 5231CB0074; 	Tue,  4 Aug 2009 09:22:09 -0400 (EDT)
Received: (from howarth@localhost) 	by bromo.med.uc.edu (8.14.3/8.14.3/Submit) id n74DM9ww009820; 	Tue, 4 Aug 2009 09:22:09 -0400
Date: Tue, 04 Aug 2009 13:24:00 -0000
From: Jack Howarth <howarth@bromo.med.uc.edu>
To: Dorit Nuzman <DORIT@il.ibm.com>
Cc: gcc@gcc.gnu.org, Ira Rosen <IRAR@il.ibm.com>
Subject: Re: PR33113
Message-ID: <20090804132208.GA9675@bromo.med.uc.edu>
References: <20090803142618.GA28181@bromo.med.uc.edu> <OFDC605335.8377CD5D-ONC2257608.00337691-C2257608.0034AAB7@il.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <OFDC605335.8377CD5D-ONC2257608.00337691-C2257608.0034AAB7@il.ibm.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2009-08/txt/msg00058.txt.bz2

On Tue, Aug 04, 2009 at 12:35:15PM +0300, Dorit Nuzman wrote:
> 
> Hi Jack,
> AFAIK this topic has not been addressed (closest thing is Richard
> Guenther's work on versioning unknown strides
> http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01174.html) and I don't know
> about the prospects of this being done for gcc4.5 (I'd definitely lobby for
> it... do you know which of the polyhedron benchmarks have this issue,
> and/or which other benchmarks suffer from it?)
> Thanks for the reminder...
> dorit
> 

Dorit,
   While I don't have the list of effected benchmarks handy, a testcase which Tobias Burnus
made is discussed in the previous email below. I believe the issue was supposed to be pretty
wide spread in the benchmarks.
            Jack

> > Tobias Burnus <burnus@net-b.de> wrote on 13/08/2007 22:58:54:
>
> > Hi Dorit and Jack,
> >
> > Dorit Nuzman wrote:
> > > I actually wouldn't expect much improvement yet, as this initial
> version
> > > still has quite a few limitations. But if there are loops that you
> expect
> > > to get vectorized that do not, I'd be happy to take a look.
> > >
> > If something can be improved in the frontend, please tell us/the
> > gfortraners.
> >
> > Neither of the two loops get vectorized; the first one should be easier
> > than the second one:
> >
> > subroutine sub(aa,bb,n,m)
> >   implicit none
> >   integer, intent(in) :: n,m
> >   real, intent(inout) :: aa(n,m)
> >   real, intent(in)    :: bb(n,m)
> >   integer :: i,j
> >   do i = 1,m
> >     do j= 2,n
> >       aa(i,j)= aa(i,j-1)+bb(i,j-1)
> >     enddo
> >   enddo
> >   do j= 2,n
> >     do i = 1,m
> >       aa(i,j)= aa(i,j-1)+bb(i,j-1)
> >     enddo
> >   enddo
> > end subroutine
> > end
> >
> > First loop:
> > a.f90:8: note: not vectorized: data ref analysis failed D.1462_61 =
> > (*aa_60(D))[D.1461_59]
>
> Here, AFAIU, the inner-loop accesses non-consecutive data, whereas the
> outer-loop accesses consecutive data. So the only way to vectorize this
> loop is either to interchange the loops, or to do outer-loop
vectorization
> (unless we want to gather disjoint elements in each vectorized inner-loop
> iteration, which probably we don't). What fails the outer-loop vectorizer
> (i.e. when attempting to vectorize the loop in line 7) is the fact that
the
> inner-loop bound is unknown, in which case the compiler creates a
> guard-code before the inner-loop to decide whether to skip it or not,
which
> makes the structure of the outer-loop more complicated than is currently
> unhandled. (This is also what I referred to here:
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00743.html).
> So we get:
>       a.f90:7: note: not vectorized: too many BBs in loop.
> To work around that (just to see how far we can get) I put in constant
> bound for the inner-loop. Now outer-loop vectorization fails with:
>       a.f90:7: note: not vectorized: data ref analysis failed D.1444_64 =
> (*aa_63(D))[D.1443_62]
>       a.f90:7: note: bad data references.
> This is because the stride in the inner-loop is unknown (because I guess
> the dimention size is unknown). You can see that the data-ref analyzer
> reports:
>       "failed: evolution of offset is not affine."
> I'll bring it up with Sebastian and Zdenek, see what they think can be
done
> within the data-ref analyzer in this resepct.
>
> After outer-loop vectorization fails we move on to vectorize the loop in
> line 8, and fail with the same data-ref analysis problem (unknown
stride).
> So for now, we need a testcase with a known inner-loop bound and known
> dimentions/strides.
> However, even when we are able to overcome all the above, we still won't
be
> able to vectorize the loop unless the inner-loop execution order is
> reversed (otherwise there's a true dependnece from a(i,j) to a(i,j-1)).
> ifort probably has loop reversal capabilities. We don't reverse loops yet
> in the middle-end, and also we don't vectorize loops with a reverse
access
> yet.
>
> > Second loop:
> > a.f90:13: note: Unknown alignment for access: *aa_60(D)
> > a.f90:13: note: Unknown alignment for access: *bb_68(D)
> > a.f90:13: note: Unknown alignment for access: *aa_60(D)
> > a.f90:13: note: not vectorized: can't determine dependence between
> > (*aa_60(D))[D.1461_59] and (*aa_60(D))[D.1456_53]
> >
>
> Here, the outer-loop (in line 12) accesses non-consecutive data, whereas
> the inner-loop (in line 13) accesses consecutive data, so the inner-loop
is
> a better candidate for vectorization. At first, the outer-loop vectorizer
> fails with:
>       a.f90:12: note: not vectorized: too many BBs in loop.
> (same as above), but after I change the inner-loop bound to a constnat it
> fails with:
>       a.f90:12: note: evolution of offset is not affine.
> (again, same problem as above, because the outer-loop stride is unknown).
>
> Moving on to the loop at line 13, we get:
>       a.f90:13: note: not vectorized: can't determine dependence between
> (*aa_63(D))[D.1443_91] and (*aa_63(D))[D.1438_85]
>       a.f90:13: note: bad data dependence.
>
> This is because the data-dependence analysis fails with the following
> message:
> "
> (compute_affine_dependence
>   (stmt_a =
> D.1444_92 = (*aa_63(D))[D.1443_91])
>   (stmt_b =
> (*aa_63(D))[D.1438_85] = D.1449_100)
> (subscript_dependence_tester
> (analyze_overlapping_iterations
>   (chrec_a = {pretmp.62_120 + 1, +, 1}_4)
>   (chrec_b = {pretmp.60_49 + 1, +, 1}_4)
> (analyze_siv_subscript
> siv test failed: unimplemented.
> )
>   (overlap_iterations_a = not known
> )
>   (overlap_iterations_b = not known
> )
> )
> (dependence classified: scev_not_known)
> )
> )
> "
> There's already an open PR for this. I'll ping Sebastian about this
issue.
>
> > ifort vectorizes both loops.
> >
> >
> > The following loop is also vectorized with ifort; it is from the
> > Polyhedron test suite (ac.f90); exchanging the ix and iy loop does not
> > change anything.
> >
> >       SUBROUTINE SUSCEP(L,Iz)
> >       IMPLICIT NONE
> >       INTEGER L , Iz(L,L) , iznum, ix, iy
> >       iznum = 0
> >       DO iy = 1 , L
> >          DO ix = 1 , L
> >             iznum = iznum + Iz(iy,ix)
> >          ENDDO
> >       ENDDO
> >       END subroutine
> >       end
> >
> > gcc only prints:
> > test2.f90:1: note: vectorized 0 loops in function.
>
> the reason you don't see anything here is that GCC optimizes the loop
away.
> To work around that I added a print of iznum at the end of the function
> (and indeed now you can see that the loop survives and the vectorizer
> analyzes it). With this change only, the vectorizer reports:
>       failed: evolution of offset is not affine.
> (same problem as above - the stride is unknown cause the dimention size
is
> unknown).
>
> If I interchange the loops I get:
>       b.f90:6: note: Analyze phi: iznum_lsm.74_31 = PHI
>       <iznum_lsm.74_32(4), iznum_lsm.74_12(6)>
>       b.f90:6: note: reduction: not commutative/associative: iznum.10_37
>       tobias2b.f90:6: note: Unknown def-use cycle pattern.
>       ...
>       b.f90:6: note: worklist: examine stmt: iznum.9_36 = iznum_lsm.74_31
>       b.f90:6: note: vect_is_simple_use: operand iznum_lsm.74_31
>       b.f90:6: note: def_stmt: iznum_lsm.74_31 = PHI <iznum_lsm.74_32(4),
>       iznum_lsm.74_12(6)>
>       b.f90:6: note: Unsupported pattern.
>       b.f90:6: note: not vectorized: unsupported use in stmt.
>       2b.f90:6: note: unexpected pattern.
>
> This happens because we get the following pattern:
>   # iznum_lsm.74_31 = PHI <iznum_lsm.74_32(4), iznum_lsm.74_12(6)>
>   ...
>   iznum.9_36 = iznum_lsm.74_31;
>   iznum.10_37 = D.1420_35 + iznum.9_36;
>   iznum_lsm.74_12 = iznum.10_37;
>   ...
>
> which is a "complicated" way to write:
>   # iznum_lsm.74_31 = PHI <iznum_lsm.74_32(4), iznum_lsm.74_12(6)>
>   ...
>   iznum_lsm.74_12 = D.1420_35 + iznum_lsm.74_31;
>   ...
>
> In other words, the extra assignments confuse the vectorizer. This is a
> known problem - we need to extend our reduction detection code to detect
> more "complicated" def-use cycles than just the trivial one. I hope this
> will get done in the near future... (I'll add this testcase to the
relevant
> PR).
>
> Thanks for your feedback, it helps to prioritize the million items on our
> todo list...
>
> dorit
>
>