[Bug tree-optimization/33243] New: Missed opportunities for vectorization due to unhandled real

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/33243]  New: Missed opportunities for vectorization due to unhandled real_type
@ 2007-08-30  2:47 spop at gcc dot gnu dot org
  2007-08-30  3:11 ` [Bug tree-optimization/33243] " spop at gcc dot gnu dot org
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: spop at gcc dot gnu dot org @ 2007-08-30  2:47 UTC (permalink / raw)
  To: gcc-bugs

There are two time consuming routines in air.f90 of the Polyhedron
benchmark that are not vectorized: lines 1328 and 1354.  These appear
in the top counting of execution time with oprofile:

      SUBROUTINE DERIVY(D,U,Uy,Al,Np,Nd,M)
      IMPLICIT REAL*8(A-H,O-Z)
      PARAMETER (NX=150,NY=150)
      DIMENSION D(NY,33) , U(NX,NY) , Uy(NX,NY) , Al(30) , Np(30)
      DO jm = 1 , M
         jmax = 0
         jmin = 1
         DO i = 1 , Nd
            jmax = jmax + Np(i) + 1
            DO j = jmin , jmax
               uyt = 0.
               DO k = 0 , Np(i)
                  uyt = uyt + D(j,k+1)*U(jm,jmin+k)
               ENDDO
               Uy(jm,j) = uyt*Al(i)
            ENDDO
            jmin = jmin + Np(i) + 1
         ENDDO
      ENDDO
      CONTINUE
      END

./poly_air_1354.f90:12: note: def_stmt: uyt_1 = PHI <0.0(9), uyt_42(11)>
./poly_air_1354.f90:12: note: Unsupported pattern.
./poly_air_1354.f90:12: note: not vectorized: unsupported use in stmt.
./poly_air_1354.f90:12: note: unexpected pattern.
./poly_air_1354.f90:1: note: vectorized 0 loops in function.

This is due to an unsupported type, real_type, for the reduction variable uyt:
(this is on an i686-linux machine)

 <ssa_name 0xb7c47270
    type <real_type 0xb7badb64 real8 DF
        size <integer_cst 0xb7ba0738 constant invariant 64>
        unit size <integer_cst 0xb7ba0754 constant invariant 8>
        align 64 symtab 0 alias set 3 canonical type 0xb7badb64 precision 64
        pointer_to_this <pointer_type 0xb7badca8>>
    visited var <var_decl 0xb7c40000 uyt> def_stmt <phi_node 0xb7c4a380>
    version 1>

Another similar routine that also appears in the top ranked and not
vectorized due to the same unsupported real_type reasons is in air.f90:1181


      SUBROUTINE FVSPLTX2
      IMPLICIT REAL*8(A-H,O-Z)
      PARAMETER (NX=150,NY=150)
      DIMENSION DX(NX,33) , ALX(30) , NPX(30)
      DIMENSION FP1(NX,NY) , FM1(NX,NY) , FP1x(30,NX) , FM1x(30,NX)
      DIMENSION FP2(NX,NY) , FM2(NX,NY) , FP2x(30,NX) , FM2x(30,NX)
      DIMENSION FP3(NX,NY) , FM3(NX,NY) , FP3x(30,NX) , FM3x(30,NX)
      DIMENSION FP4(NX,NY) , FM4(NX,NY) , FP4x(30,NX) , FM4x(30,NX)
      DIMENSION FV2(NX,NY) , DXP2(30,NX) , DXM2(30,NX)
      DIMENSION FV3(NX,NY) , DXP3(30,NX) , DXM3(30,NX)
      DIMENSION FV4(NX,NY) , DXP4(30,NX) , DXM4(30,NX)
      COMMON /XD1   / FP1 , FM1 , FP2 , FM2 , FP3 , FM3 , FP4 , FM4 ,   &
     &                FP1x , FM1x , FP2x , FM2x , FP3x , FM3x , FP4x ,  &
     &                FM4x , FV2 , FV3 , FV4 , DXP2 , DXM2 , DXP3 ,     &
     &                DXM3 , DXP4 , DXM4 , DX , NPX , ALX , NDX , MXPy


      DO ik = 1 , MXPy
         jmax = 0
         jmin = 1
         DO i = 1 , NDX
            jmax = jmax + NPX(i) + 1
!
! INITIALIZE
!
            FP1x(i,ik) = 0.
            FM1x(i,ik) = 0.
            FP2x(i,ik) = 0.
            FM2x(i,ik) = 0.
            FP3x(i,ik) = 0.
            FM3x(i,ik) = 0.
            FP4x(i,ik) = 0.
            FM4x(i,ik) = 0.
            DXP2(i,ik) = 0.
            DXM2(i,ik) = 0.
            DXP3(i,ik) = 0.
            DXM3(i,ik) = 0.
            DXP4(i,ik) = 0.
            DXM4(i,ik) = 0.
            DO k = 0 , NPX(i)
               jk = jmin + k
               FP1x(i,ik) = FP1x(i,ik) + DX(jmax,k+1)*FP1(jk,ik)
               FM1x(i,ik) = FM1x(i,ik) + DX(jmin,k+1)*FM1(jk,ik)
               FP2x(i,ik) = FP2x(i,ik) + DX(jmax,k+1)*FP2(jk,ik)
               FM2x(i,ik) = FM2x(i,ik) + DX(jmin,k+1)*FM2(jk,ik)
               FP3x(i,ik) = FP3x(i,ik) + DX(jmax,k+1)*FP3(jk,ik)
               FM3x(i,ik) = FM3x(i,ik) + DX(jmin,k+1)*FM3(jk,ik)
               FP4x(i,ik) = FP4x(i,ik) + DX(jmax,k+1)*FP4(jk,ik)
               FM4x(i,ik) = FM4x(i,ik) + DX(jmin,k+1)*FM4(jk,ik)
               DXP2(i,ik) = DXP2(i,ik) + DX(jmax,k+1)*FV2(jk,ik)
               DXM2(i,ik) = DXM2(i,ik) + DX(jmin,k+1)*FV2(jk,ik)
               DXP3(i,ik) = DXP3(i,ik) + DX(jmax,k+1)*FV3(jk,ik)
               DXM3(i,ik) = DXM3(i,ik) + DX(jmin,k+1)*FV3(jk,ik)
               DXP4(i,ik) = DXP4(i,ik) + DX(jmax,k+1)*FV4(jk,ik)
               DXM4(i,ik) = DXM4(i,ik) + DX(jmin,k+1)*FV4(jk,ik)
            ENDDO
            FP1x(i,ik) = FP1x(i,ik)*ALX(i)
            FM1x(i,ik) = FM1x(i,ik)*ALX(i)
            FP2x(i,ik) = FP2x(i,ik)*ALX(i)
            FM2x(i,ik) = FM2x(i,ik)*ALX(i)
            FP3x(i,ik) = FP3x(i,ik)*ALX(i)
            FM3x(i,ik) = FM3x(i,ik)*ALX(i)
            FP4x(i,ik) = FP4x(i,ik)*ALX(i)
            FM4x(i,ik) = FM4x(i,ik)*ALX(i)
            DXP2(i,ik) = DXP2(i,ik)*ALX(i)
            DXM2(i,ik) = DXM2(i,ik)*ALX(i)
            DXP3(i,ik) = DXP3(i,ik)*ALX(i)
            DXM3(i,ik) = DXM3(i,ik)*ALX(i)
            DXP4(i,ik) = DXP4(i,ik)*ALX(i)
            DXM4(i,ik) = DXM4(i,ik)*ALX(i)
            jmin = jmin + NPX(i) + 1
         ENDDO
      ENDDO
      CONTINUE
      END


Here are some kernels from test_fpu.f90 that could be vectorized, 
but are not, due to the exact same problem with the real_type not 
supported.  The places where the vectorization fails are marked 
with a comment at the end of the line: !seb.

SUBROUTINE Crout (a,n)      
USE kinds
IMPLICIT NONE

INTEGER :: n                
REAL(RK8) :: a(n,n)         

INTEGER :: i, j, m, imax(1)      
INTEGER :: index(n)              
REAL(RK8) :: b(n,n), temp(n)     

index = (/(i,i=1,n)/)        

DO j = 1, n        
   DO i = 1, j-1
      b(i, j) = a(i, j)
   END DO
   DO i = j, n
      b(i, j) = a(n+1-j, i+1-j)
   END DO
END DO

DO j = 1, n   

   DO i = j, n    
      b(n-i+j,n+1-i) = b(n-i+j,n+1-i)-DOT_PRODUCT(b(n+1-i:n-i+j-1,n+1-i),
b(1:j-1,j))  !seb1
   END DO

   imax = MAXLOC(ABS( (/ (b(j+i-1,i),i=1,n-j+1) /) ))
   m = imax(1)
   b(j+m-1,m) = 1/b(j+m-1,m)

   IF (m /= n+1-j) THEN   
      index((/j,n+1-m/))     = index((/n+1-m,j/))
      b((/j,n+1-m/),n+2-m:n) = b((/n+1-m,j/),n+2-m:n)
      temp(1:n+1-m)          = b(m:n, m)
      b(m:j-1+m, m)          = b(n+1-j:n, n+1-j)
      b(j+m:n, m)            = b(j, j+1:n+1-m)
      b(n+1-j:n, n+1-j)      = temp(1:j)
      b(j, j+1:n+1-m)        = temp(j+1:n+1-m)
   END IF

   DO i = j+1, n   
      b(j,i) = b(n,n+1-j)*(b(j,i)-DOT_PRODUCT(b(n+1-j:n-1,n+1-j),b(1:j-1,i)))
!seb2
   END DO
END DO

DO j = 1, n-1     
   temp(1) = b(n, n+1-j)
   DO i = j+1, n
      b(n-i+j,n+1-i) = -DOT_PRODUCT(b(n-i+j:n-1,n+1-i),temp(1:i-j))*b(n,n+1-i) 
!seb3
      temp(i-j+1) = b(n-i+j,n+1-i)
   END DO
END DO

DO i = 1, (n+1)/3      
   temp(1:n+2-3*i) = b(2*i:n+1-i,i)
   DO j = 2*i, n+1-i
      b(j, i) = b(n+i-j, n+1-j)
   END DO
   DO j = i, n+1-2*i
      b(i+j-1, j) = b(n+1-i, n+2-i-j)
   END DO
   b(n+1-i, i+1:n+2-2*i) = temp(1:n+2-3*i)
END DO

DO i = 1, n-1      
   DO j = i+1, n
      b(i,j) = -b(i,j)-DOT_PRODUCT(temp(1:j-i-1), b(i+1:j-1,j)) !seb4
      temp(j-i) = b(i,j)
   END DO
END DO

DO i = 1, n-1      
   temp(1:n-i) = b(i,i+1:n)
   DO j = 1,i
      b(i,j) = b(i,j)+DOT_PRODUCT(temp(1:n-i),b(i+1:n,j))  !seb5
   END DO
   DO j = i+1, n
      b(i,j) = DOT_PRODUCT(temp(j-i:n-i),b(j:n,j)) !seb6
   END DO
END DO

END SUBROUTINE Crout


Here are the details about the fails:

seb6: not vectorized because of real_type problem

./test_fpu.f90:80: note: def_stmt: val.75_1012 = PHI <val.75_1028(250),
0.0(248)>
./test_fpu.f90:80: note: Unsupported pattern.
./test_fpu.f90:80: note: not vectorized: unsupported use in stmt.
./test_fpu.f90:80: note: unexpected pattern.(get_loop_exit_condition 

seb5: same real_type problem

./test_fpu.f90:77: note: def_stmt: val.73_887 = PHI <val.73_994(241), 0.0(239)>
./test_fpu.f90:77: note: Unsupported pattern.
./test_fpu.f90:77: note: not vectorized: unsupported use in stmt.

seb4: same real_type problem

./test_fpu.f90:69: note: def_stmt: val.70_980 = PHI <val.70_931(222), 0.0(220)>
./test_fpu.f90:69: note: Unsupported pattern.
./test_fpu.f90:69: note: not vectorized: unsupported use in stmt.

seb3: same real_type problem

./test_fpu.f90:51: note: def_stmt: val.66_229 = PHI <val.66_770(181), 0.0(179)>
./test_fpu.f90:51: note: Unsupported pattern.
./test_fpu.f90:51: note: not vectorized: unsupported use in stmt.

seb2: same real_type problem

./test_fpu.f90:44: note: def_stmt: val.64_260 = PHI <val.64_711(165), 0.0(163)>
./test_fpu.f90:44: note: Unsupported pattern.
./test_fpu.f90:44: note: not vectorized: unsupported use in stmt.

seb1: same real_type problem

./test_fpu.f90:26: note: def_stmt: val.18_1661 = PHI <val.18_244(53), 0.0(51)>
./test_fpu.f90:26: note: Unsupported pattern.
./test_fpu.f90:26: note: not vectorized: unsupported use in stmt.


-- 
           Summary: Missed opportunities for vectorization due to unhandled
                    real_type
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: spop at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33243


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/33243] Missed opportunities for vectorization due to unhandled real_type
  2007-08-30  2:47 [Bug tree-optimization/33243] New: Missed opportunities for vectorization due to unhandled real_type spop at gcc dot gnu dot org
@ 2007-08-30  3:11 ` spop at gcc dot gnu dot org
  2007-08-30 10:12 ` dorit at gcc dot gnu dot org
  2007-08-30 14:19 ` spop at gcc dot gnu dot org
  2 siblings, 0 replies; 5+ messages in thread
From: spop at gcc dot gnu dot org @ 2007-08-30  3:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from spop at gcc dot gnu dot org  2007-08-30 03:10 -------
Created an attachment (id=14137)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14137&action=view)
testcase

This is the last testcase that get scrambled by bugzilla.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33243


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/33243] Missed opportunities for vectorization due to unhandled real_type
  2007-08-30  2:47 [Bug tree-optimization/33243] New: Missed opportunities for vectorization due to unhandled real_type spop at gcc dot gnu dot org
  2007-08-30  3:11 ` [Bug tree-optimization/33243] " spop at gcc dot gnu dot org
@ 2007-08-30 10:12 ` dorit at gcc dot gnu dot org
  2007-08-30 14:19 ` spop at gcc dot gnu dot org
  2 siblings, 0 replies; 5+ messages in thread
From: dorit at gcc dot gnu dot org @ 2007-08-30 10:12 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #2 from dorit at gcc dot gnu dot org  2007-08-30 10:12 -------
> There are two time consuming routines in air.f90 of the Polyhedron
> benchmark that are not vectorized: lines 1328 and 1354.  These appear
> in the top counting of execution time with oprofile:
> 
>       SUBROUTINE DERIVY(D,U,Uy,Al,Np,Nd,M)
>       IMPLICIT REAL*8(A-H,O-Z)
>       PARAMETER (NX=150,NY=150)
>       DIMENSION D(NY,33) , U(NX,NY) , Uy(NX,NY) , Al(30) , Np(30)
>       DO jm = 1 , M
>          jmax = 0
>          jmin = 1
>          DO i = 1 , Nd
>             jmax = jmax + Np(i) + 1
>             DO j = jmin , jmax
>                uyt = 0.
>                DO k = 0 , Np(i)
>                   uyt = uyt + D(j,k+1)*U(jm,jmin+k)
>                ENDDO
>                Uy(jm,j) = uyt*Al(i)
>             ENDDO
>             jmin = jmin + Np(i) + 1
>          ENDDO
>       ENDDO
>       CONTINUE
>       END
> 
> ./poly_air_1354.f90:12: note: def_stmt: uyt_1 = PHI <0.0(9), uyt_42(11)>
> ./poly_air_1354.f90:12: note: Unsupported pattern.
> ./poly_air_1354.f90:12: note: not vectorized: unsupported use in stmt.
> ./poly_air_1354.f90:12: note: unexpected pattern.
> ./poly_air_1354.f90:1: note: vectorized 0 loops in function.
> 
> This is due to an unsupported type, real_type, for the reduction variable uyt:
> (this is on an i686-linux machine)

There is no "unhandled real_type" problem, you just need to use -ffast-math to
allow vectorization of summation of fp types (or the new reassociation flag):
pr33243b.f90:12: note: Analyze phi: uyt_1 = PHI <0.0(9), uyt_42(11)>
pr33243b.f90:12: note: reduction: unsafe fp math optimization: D.1386_41 +
uyt_1
pr33243b.f90:12: note: Unknown def-use cycle pattern.

If you use -ffast-math the reduction is detected:
pr33243b.f90:12: note: Analyze phi: uyt_1 = PHI <0.0(9), uyt_42(11)>
pr33243b.f90:12: note: detected reduction:D.1386_41 + uyt_1
pr33243b.f90:12: note: Detected reduction.

However, the loop will still not get vectorized because there is a
non-consecutive access in the loop: 
pr33243b.f90:12: note: === vect_analyze_data_ref_accesses ===
pr33243b.f90:12: note: not consecutive access
pr33243b.f90:12: note: not vectorized: complicated access pattern.

This is because the stride of the accesses to D(j,k+1) and U(jm,jmin+k) in the
inner-loop (k-loop) between inner-loop iterations is 1200B: 

            DO j = jmin , jmax
               uyt = 0.
               DO k = 0 , NP(i)
                  uyt = uyt + D(j,k+1)*U(jm,jmin+k)
               ENDDO
               Uy(jm,j) = uyt*Al(i)
            ENDDO

In the outer-loop (j-loop) these accesses are consecutive, and also you don't
need to use the -ffast-math flag. However there are other problems: 
1) the compiler creates a guard to control whether to enter the inner-loop or
not (cause it may execute 0 times). This creates a more involved control-flow
than the outer-loop vectorizer is willing to work with. A solution would be to
create this guard outside the outer-loop (in case it is invariant, as is the
case here), which is like versioning the loop (or unswichting the loop).
2) if you change the loop count to something constant (just to bypass the above
problem), then indeed no guard code is generated, but there is a computation
(advancing an iv) in the latch block of the outer-loop (so it is not empty, and
we are not willing to work with such loops). We need to clean that away.
3) After these problems are solved, we still need to deal with a
non-consecutive access in the outer-loop - the store to Uy(jm,j). AFAICS, this
requires either transposing the Uy array in advance, or teaching the vectorizer
to "scatter" the results to the non-adjacent locations (which would be quite
expensive, but we could give it a try).

Alternatively, vectorizing the inner-loop would require transposing the D and U
matrices.

Another option is to interchange the jm loop with the j loop - I think this way
all accesses would be consecutive, and we could vectorize the jm loop (which
would now be a doubly-nested loop that the outer-loop vectorizer could handle).

So, the PR for this testcase would be better classified under one of the above
problems/missed-optimizations rather than "unhandled real_type". 

> 
> Another similar routine that also appears in the top ranked and not
> vectorized due to the same unsupported real_type reasons is in air.f90:1181
> 
> 
>       SUBROUTINE FVSPLTX2
>       IMPLICIT REAL*8(A-H,O-Z)
>       PARAMETER (NX=150,NY=150)
>       DIMENSION DX(NX,33) , ALX(30) , NPX(30)
>       DIMENSION FP1(NX,NY) , FM1(NX,NY) , FP1x(30,NX) , FM1x(30,NX)
>       DIMENSION FP2(NX,NY) , FM2(NX,NY) , FP2x(30,NX) , FM2x(30,NX)
>       DIMENSION FP3(NX,NY) , FM3(NX,NY) , FP3x(30,NX) , FM3x(30,NX)
>       DIMENSION FP4(NX,NY) , FM4(NX,NY) , FP4x(30,NX) , FM4x(30,NX)
>       DIMENSION FV2(NX,NY) , DXP2(30,NX) , DXM2(30,NX)
>       DIMENSION FV3(NX,NY) , DXP3(30,NX) , DXM3(30,NX)
>       DIMENSION FV4(NX,NY) , DXP4(30,NX) , DXM4(30,NX)
>       COMMON /XD1   / FP1 , FM1 , FP2 , FM2 , FP3 , FM3 , FP4 , FM4 ,   &
>      &                FP1x , FM1x , FP2x , FM2x , FP3x , FM3x , FP4x ,  &
>      &                FM4x , FV2 , FV3 , FV4 , DXP2 , DXM2 , DXP3 ,     &
>      &                DXM3 , DXP4 , DXM4 , DX , NPX , ALX , NDX , MXPy
> 
> 
>       DO ik = 1 , MXPy
>          jmax = 0
>          jmin = 1
>          DO i = 1 , NDX
>             jmax = jmax + NPX(i) + 1
> !
> ! INITIALIZE
> !
>             FP1x(i,ik) = 0.
>             FM1x(i,ik) = 0.
>             FP2x(i,ik) = 0.
>             FM2x(i,ik) = 0.
>             FP3x(i,ik) = 0.
>             FM3x(i,ik) = 0.
>             FP4x(i,ik) = 0.
>             FM4x(i,ik) = 0.
>             DXP2(i,ik) = 0.
>             DXM2(i,ik) = 0.
>             DXP3(i,ik) = 0.
>             DXM3(i,ik) = 0.
>             DXP4(i,ik) = 0.
>             DXM4(i,ik) = 0.
>             DO k = 0 , NPX(i)
>                jk = jmin + k
>                FP1x(i,ik) = FP1x(i,ik) + DX(jmax,k+1)*FP1(jk,ik)
>                FM1x(i,ik) = FM1x(i,ik) + DX(jmin,k+1)*FM1(jk,ik)
>                FP2x(i,ik) = FP2x(i,ik) + DX(jmax,k+1)*FP2(jk,ik)
>                FM2x(i,ik) = FM2x(i,ik) + DX(jmin,k+1)*FM2(jk,ik)
>                FP3x(i,ik) = FP3x(i,ik) + DX(jmax,k+1)*FP3(jk,ik)
>                FM3x(i,ik) = FM3x(i,ik) + DX(jmin,k+1)*FM3(jk,ik)
>                FP4x(i,ik) = FP4x(i,ik) + DX(jmax,k+1)*FP4(jk,ik)
>                FM4x(i,ik) = FM4x(i,ik) + DX(jmin,k+1)*FM4(jk,ik)
>                DXP2(i,ik) = DXP2(i,ik) + DX(jmax,k+1)*FV2(jk,ik)
>                DXM2(i,ik) = DXM2(i,ik) + DX(jmin,k+1)*FV2(jk,ik)
>                DXP3(i,ik) = DXP3(i,ik) + DX(jmax,k+1)*FV3(jk,ik)
>                DXM3(i,ik) = DXM3(i,ik) + DX(jmin,k+1)*FV3(jk,ik)
>                DXP4(i,ik) = DXP4(i,ik) + DX(jmax,k+1)*FV4(jk,ik)
>                DXM4(i,ik) = DXM4(i,ik) + DX(jmin,k+1)*FV4(jk,ik)
>             ENDDO
>             FP1x(i,ik) = FP1x(i,ik)*ALX(i)
>             FM1x(i,ik) = FM1x(i,ik)*ALX(i)
>             FP2x(i,ik) = FP2x(i,ik)*ALX(i)
>             FM2x(i,ik) = FM2x(i,ik)*ALX(i)
>             FP3x(i,ik) = FP3x(i,ik)*ALX(i)
>             FM3x(i,ik) = FM3x(i,ik)*ALX(i)
>             FP4x(i,ik) = FP4x(i,ik)*ALX(i)
>             FM4x(i,ik) = FM4x(i,ik)*ALX(i)
>             DXP2(i,ik) = DXP2(i,ik)*ALX(i)
>             DXM2(i,ik) = DXM2(i,ik)*ALX(i)
>             DXP3(i,ik) = DXP3(i,ik)*ALX(i)
>             DXM3(i,ik) = DXM3(i,ik)*ALX(i)
>             DXP4(i,ik) = DXP4(i,ik)*ALX(i)
>             DXM4(i,ik) = DXM4(i,ik)*ALX(i)
>             jmin = jmin + NPX(i) + 1
>          ENDDO
>       ENDDO
>       CONTINUE
>       END
>

Here again, it's not an issue with unhandling real_types, but a problem to
detect the reduction because of extra copy stmts that lim leaves behind:

  # xd1__dxm4xd1_I_I_lsm.42_89 = PHI <xd1__dxm4xd1_I_I_lsm.42_429(7),
xd1__dxm4xd1_I_I_lsm.42_320(9)>
  D.1487_302 = xd1__dxm4xd1_I_I_lsm.42_89;
  D.1489_313 = D.1488_312 + D.1487_302;
  xd1__dxm4xd1_I_I_lsm.42_320 = D.1489_313;

This problem with detecting only a restricted form of a reduction (that
consists of a single stmt + phi node) is already reported under PR32824. It
would be solved by a more general reduction detection utility that Razya is
planning to implement, and/or by cleaning away the stuff that lim leaves
behind. (By the way, you'd still need to use -ffast-math).

However, even when this is fixed, we have non consecutive accesses to array X,
which would need to be transposed.

If you look at the outer-loop (i-loop), I think everything is consecutive
there. The problem is that the inner-loop bound is not invariant in the
outer-loop, so that would prevent outer-loop vectorization:

pr33243c.f90:21: note: Symbolic number of iterations is (<unnamed-unsigned:32>)
D.1420_13 + 1
pr33243c.f90:21: note: not vectorized: inner-loop count not invariant.
pr33243c.f90:21: note: bad loop form.

> 
> Here are some kernels from test_fpu.f90 that could be vectorized, 
> but are not, due to the exact same problem with the real_type not 
> supported.  The places where the vectorization fails are marked 
> with a comment at the end of the line: !seb.
>

I couldn't compile the rest of the kernels because of the "USE kinds" - I get:
Fatal Error: Can't open module file 'kinds.mod' for reading
How do I get around that?

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33243

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/33243] Missed opportunities for vectorization due to unhandled real_type
  2007-08-30  2:47 [Bug tree-optimization/33243] New: Missed opportunities for vectorization due to unhandled real_type spop at gcc dot gnu dot org
  2007-08-30  3:11 ` [Bug tree-optimization/33243] " spop at gcc dot gnu dot org
  2007-08-30 10:12 ` dorit at gcc dot gnu dot org
@ 2007-08-30 14:19 ` spop at gcc dot gnu dot org
  2 siblings, 0 replies; 5+ messages in thread
From: spop at gcc dot gnu dot org @ 2007-08-30 14:19 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from spop at gcc dot gnu dot org  2007-08-30 14:19 -------
Subject: Re:  Missed opportunities for vectorization due to unhandled real_type

Thanks for the detailed comments and further investigation.

On 30 Aug 2007 10:12:26 -0000, dorit at gcc dot gnu dot org >
> I couldn't compile the rest of the kernels because of the "USE kinds" - I get:
> Fatal Error: Can't open module file 'kinds.mod' for reading
> How do I get around that?
>

You can try to install the gfortran compiler, and then just build with
gfortran and not with the f951 driver:

~/usr/bin/gfortran-4.3  -c -O2 -msse2 -ftree-vectorize
-fdump-tree-vect-all test_fpu.f90

works fine for me.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33243


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/33243] Missed opportunities for vectorization due to unhandled real_type
       [not found] <bug-33243-4@http.gcc.gnu.org/bugzilla/>
@ 2021-07-21  2:41 ` pinskia at gcc dot gnu.org
  0 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-21  2:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33243

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization
             Blocks|                            |53947


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-07-21  2:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-30  2:47 [Bug tree-optimization/33243] New: Missed opportunities for vectorization due to unhandled real_type spop at gcc dot gnu dot org
2007-08-30  3:11 ` [Bug tree-optimization/33243] " spop at gcc dot gnu dot org
2007-08-30 10:12 ` dorit at gcc dot gnu dot org
2007-08-30 14:19 ` spop at gcc dot gnu dot org
     [not found] <bug-33243-4@http.gcc.gnu.org/bugzilla/>
2021-07-21  2:41 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).