From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 17418 invoked by alias); 4 Aug 2009 13:22:23 -0000 Received: (qmail 17405 invoked by uid 22791); 4 Aug 2009 13:22:21 -0000 X-SWARE-Spam-Status: No, hits=-1.2 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_21,J_CHICKENPOX_22,J_CHICKENPOX_32,J_CHICKENPOX_61 X-Spam-Check-By: sourceware.org Received: from bromo.med.uc.edu (HELO bromo.med.uc.edu) (129.137.3.146) by sourceware.org (qpsmtpd/0.43rc1) with SMTP; Tue, 04 Aug 2009 13:22:11 +0000 Received: from bromo.med.uc.edu (localhost.localdomain [127.0.0.1]) by bromo.med.uc.edu (Postfix) with ESMTP id 5231CB0074; Tue, 4 Aug 2009 09:22:09 -0400 (EDT) Received: (from howarth@localhost) by bromo.med.uc.edu (8.14.3/8.14.3/Submit) id n74DM9ww009820; Tue, 4 Aug 2009 09:22:09 -0400 Date: Tue, 04 Aug 2009 13:24:00 -0000 From: Jack Howarth To: Dorit Nuzman Cc: gcc@gcc.gnu.org, Ira Rosen Subject: Re: PR33113 Message-ID: <20090804132208.GA9675@bromo.med.uc.edu> References: <20090803142618.GA28181@bromo.med.uc.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.18 (2008-05-17) Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org X-SW-Source: 2009-08/txt/msg00058.txt.bz2 On Tue, Aug 04, 2009 at 12:35:15PM +0300, Dorit Nuzman wrote: > > Hi Jack, > AFAIK this topic has not been addressed (closest thing is Richard > Guenther's work on versioning unknown strides > http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01174.html) and I don't know > about the prospects of this being done for gcc4.5 (I'd definitely lobby for > it... do you know which of the polyhedron benchmarks have this issue, > and/or which other benchmarks suffer from it?) > Thanks for the reminder... > dorit > Dorit, While I don't have the list of effected benchmarks handy, a testcase which Tobias Burnus made is discussed in the previous email below. I believe the issue was supposed to be pretty wide spread in the benchmarks. Jack > > Tobias Burnus wrote on 13/08/2007 22:58:54: > > > Hi Dorit and Jack, > > > > Dorit Nuzman wrote: > > > I actually wouldn't expect much improvement yet, as this initial > version > > > still has quite a few limitations. But if there are loops that you > expect > > > to get vectorized that do not, I'd be happy to take a look. > > > > > If something can be improved in the frontend, please tell us/the > > gfortraners. > > > > Neither of the two loops get vectorized; the first one should be easier > > than the second one: > > > > subroutine sub(aa,bb,n,m) > > implicit none > > integer, intent(in) :: n,m > > real, intent(inout) :: aa(n,m) > > real, intent(in) :: bb(n,m) > > integer :: i,j > > do i = 1,m > > do j= 2,n > > aa(i,j)= aa(i,j-1)+bb(i,j-1) > > enddo > > enddo > > do j= 2,n > > do i = 1,m > > aa(i,j)= aa(i,j-1)+bb(i,j-1) > > enddo > > enddo > > end subroutine > > end > > > > First loop: > > a.f90:8: note: not vectorized: data ref analysis failed D.1462_61 = > > (*aa_60(D))[D.1461_59] > > Here, AFAIU, the inner-loop accesses non-consecutive data, whereas the > outer-loop accesses consecutive data. So the only way to vectorize this > loop is either to interchange the loops, or to do outer-loop vectorization > (unless we want to gather disjoint elements in each vectorized inner-loop > iteration, which probably we don't). What fails the outer-loop vectorizer > (i.e. when attempting to vectorize the loop in line 7) is the fact that the > inner-loop bound is unknown, in which case the compiler creates a > guard-code before the inner-loop to decide whether to skip it or not, which > makes the structure of the outer-loop more complicated than is currently > unhandled. (This is also what I referred to here: > http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00743.html). > So we get: > a.f90:7: note: not vectorized: too many BBs in loop. > To work around that (just to see how far we can get) I put in constant > bound for the inner-loop. Now outer-loop vectorization fails with: > a.f90:7: note: not vectorized: data ref analysis failed D.1444_64 = > (*aa_63(D))[D.1443_62] > a.f90:7: note: bad data references. > This is because the stride in the inner-loop is unknown (because I guess > the dimention size is unknown). You can see that the data-ref analyzer > reports: > "failed: evolution of offset is not affine." > I'll bring it up with Sebastian and Zdenek, see what they think can be done > within the data-ref analyzer in this resepct. > > After outer-loop vectorization fails we move on to vectorize the loop in > line 8, and fail with the same data-ref analysis problem (unknown stride). > So for now, we need a testcase with a known inner-loop bound and known > dimentions/strides. > However, even when we are able to overcome all the above, we still won't be > able to vectorize the loop unless the inner-loop execution order is > reversed (otherwise there's a true dependnece from a(i,j) to a(i,j-1)). > ifort probably has loop reversal capabilities. We don't reverse loops yet > in the middle-end, and also we don't vectorize loops with a reverse access > yet. > > > Second loop: > > a.f90:13: note: Unknown alignment for access: *aa_60(D) > > a.f90:13: note: Unknown alignment for access: *bb_68(D) > > a.f90:13: note: Unknown alignment for access: *aa_60(D) > > a.f90:13: note: not vectorized: can't determine dependence between > > (*aa_60(D))[D.1461_59] and (*aa_60(D))[D.1456_53] > > > > Here, the outer-loop (in line 12) accesses non-consecutive data, whereas > the inner-loop (in line 13) accesses consecutive data, so the inner-loop is > a better candidate for vectorization. At first, the outer-loop vectorizer > fails with: > a.f90:12: note: not vectorized: too many BBs in loop. > (same as above), but after I change the inner-loop bound to a constnat it > fails with: > a.f90:12: note: evolution of offset is not affine. > (again, same problem as above, because the outer-loop stride is unknown). > > Moving on to the loop at line 13, we get: > a.f90:13: note: not vectorized: can't determine dependence between > (*aa_63(D))[D.1443_91] and (*aa_63(D))[D.1438_85] > a.f90:13: note: bad data dependence. > > This is because the data-dependence analysis fails with the following > message: > " > (compute_affine_dependence > (stmt_a = > D.1444_92 = (*aa_63(D))[D.1443_91]) > (stmt_b = > (*aa_63(D))[D.1438_85] = D.1449_100) > (subscript_dependence_tester > (analyze_overlapping_iterations > (chrec_a = {pretmp.62_120 + 1, +, 1}_4) > (chrec_b = {pretmp.60_49 + 1, +, 1}_4) > (analyze_siv_subscript > siv test failed: unimplemented. > ) > (overlap_iterations_a = not known > ) > (overlap_iterations_b = not known > ) > ) > (dependence classified: scev_not_known) > ) > ) > " > There's already an open PR for this. I'll ping Sebastian about this issue. > > > ifort vectorizes both loops. > > > > > > The following loop is also vectorized with ifort; it is from the > > Polyhedron test suite (ac.f90); exchanging the ix and iy loop does not > > change anything. > > > > SUBROUTINE SUSCEP(L,Iz) > > IMPLICIT NONE > > INTEGER L , Iz(L,L) , iznum, ix, iy > > iznum = 0 > > DO iy = 1 , L > > DO ix = 1 , L > > iznum = iznum + Iz(iy,ix) > > ENDDO > > ENDDO > > END subroutine > > end > > > > gcc only prints: > > test2.f90:1: note: vectorized 0 loops in function. > > the reason you don't see anything here is that GCC optimizes the loop away. > To work around that I added a print of iznum at the end of the function > (and indeed now you can see that the loop survives and the vectorizer > analyzes it). With this change only, the vectorizer reports: > failed: evolution of offset is not affine. > (same problem as above - the stride is unknown cause the dimention size is > unknown). > > If I interchange the loops I get: > b.f90:6: note: Analyze phi: iznum_lsm.74_31 = PHI > > b.f90:6: note: reduction: not commutative/associative: iznum.10_37 > tobias2b.f90:6: note: Unknown def-use cycle pattern. > ... > b.f90:6: note: worklist: examine stmt: iznum.9_36 = iznum_lsm.74_31 > b.f90:6: note: vect_is_simple_use: operand iznum_lsm.74_31 > b.f90:6: note: def_stmt: iznum_lsm.74_31 = PHI iznum_lsm.74_12(6)> > b.f90:6: note: Unsupported pattern. > b.f90:6: note: not vectorized: unsupported use in stmt. > 2b.f90:6: note: unexpected pattern. > > This happens because we get the following pattern: > # iznum_lsm.74_31 = PHI > ... > iznum.9_36 = iznum_lsm.74_31; > iznum.10_37 = D.1420_35 + iznum.9_36; > iznum_lsm.74_12 = iznum.10_37; > ... > > which is a "complicated" way to write: > # iznum_lsm.74_31 = PHI > ... > iznum_lsm.74_12 = D.1420_35 + iznum_lsm.74_31; > ... > > In other words, the extra assignments confuse the vectorizer. This is a > known problem - we need to extend our reduction detection code to detect > more "complicated" def-use cycles than just the trivial one. I hope this > will get done in the near future... (I'll add this testcase to the relevant > PR). > > Thanks for your feedback, it helps to prioritize the million items on our > todo list... > > dorit > >