[Bug fortran/38318] New: moving the allocation of temps out of loops.

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug fortran/38318]  New: moving the allocation of temps out of loops.
@ 2008-11-29 16:17 jv244 at cam dot ac dot uk
  2008-12-06 20:29 ` [Bug fortran/38318] " tkoenig at gcc dot gnu dot org
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-11-29 16:17 UTC (permalink / raw)
  To: gcc-bugs

consider the following source and timings, were a natural form of a subroutine
S1, and two hand optimized forms are timed:

> cat test.f90
SUBROUTINE S1(N,A)
 REAL :: A(3)
 DO I=1,N
   CALL S2(-A)
 ENDDO
END SUBROUTINE

SUBROUTINE S1_opt1(N,A)
 REAL :: A(3)
 REAL, ALLOCATABLE :: B(:)
 ALLOCATE(B(SIZE(A,1)))
 DO I=1,N
   B=-A
   CALL S2(B)
 ENDDO
END SUBROUTINE


SUBROUTINE S1_opt2(N,A)
 REAL :: A(3),B(3)
 DO I=1,N
   B=-A
   CALL S2(B)
 ENDDO
END SUBROUTINE

> cat main.f90

SUBROUTINE S2(A)
 REAL :: A(*),D
 COMMON /F/D
 D=D+A(1)+A(2)+A(3)
END SUBROUTINE

INTEGER, PARAMETER :: N=100000
REAL :: A(3),T1,T2,T3,T4,D
COMMON /F/D
D=0.0
A=0.0
CALL CPU_TIME(T1)
DO I=1,10000
  CALL S1(N,A)
ENDDO
CALL CPU_TIME(T2)
DO I=1,10000
  CALL S1_opt1(N,A)
ENDDO
CALL CPU_TIME(T3)
DO I=1,10000
  CALL S1_opt2(N,A)
ENDDO
CALL CPU_TIME(T4)

write(6,*) "Default [s]:",T2-T1
write(6,*) "OPT1 [s]:",T3-T2
write(6,*) "OPT2 [s]:",T4-T3
write(6,*) D
END

gfortran-4.4 -O3 test.f90 main.f90
 Default [s]:   18.293142
 OPT1 [s]:   6.2603912
 OPT2 [s]:   6.2563915

ifort -O3 test.f90 main.f90
 Default [s]:   6.256391
 OPT1 [s]:   6.252390
 OPT2 [s]:   6.256390

so, gfortran by default is about 3x slower than ifort, which by default moves
the generation of the temporaries out of the loop. 

FYI, allowing for multi file IPO, I hope LTO gets that far...

ifort -O3 -fast test.f90 main.f90 (includes ipo)
 Default [s]:   3.752234
 OPT1 [s]:   1.276080
 OPT2 [s]:   3.752234


-- 
           Summary: moving the allocation of temps out of loops.
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jv244 at cam dot ac dot uk
OtherBugsDependingO 36854
             nThis:


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug fortran/38318] moving the allocation of temps out of loops.
  2008-11-29 16:17 [Bug fortran/38318] New: moving the allocation of temps out of loops jv244 at cam dot ac dot uk
@ 2008-12-06 20:29 ` tkoenig at gcc dot gnu dot org
  2010-02-21  9:12 ` jv244 at cam dot ac dot uk
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2008-12-06 20:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from tkoenig at gcc dot gnu dot org  2008-12-06 20:25 -------
This could also be useful when done in the middle-end,
see PR 21046.


-- 

tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |21046
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-12-06 20:25:54
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug fortran/38318] moving the allocation of temps out of loops.
  2008-11-29 16:17 [Bug fortran/38318] New: moving the allocation of temps out of loops jv244 at cam dot ac dot uk
  2008-12-06 20:29 ` [Bug fortran/38318] " tkoenig at gcc dot gnu dot org
@ 2010-02-21  9:12 ` jv244 at cam dot ac dot uk
  2010-02-21 11:07 ` burnus at gcc dot gnu dot org
  2010-02-21 12:12 ` [Bug middle-end/38318] " jv244 at cam dot ac dot uk
  3 siblings, 0 replies; 13+ messages in thread
From: jv244 at cam dot ac dot uk @ 2010-02-21  9:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from jv244 at cam dot ac dot uk  2010-02-21 09:12 -------
seemingly being discussed, since useful for tonto

http://gcc.gnu.org/ml/fortran/2010-02/msg00157.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug fortran/38318] moving the allocation of temps out of loops.
  2008-11-29 16:17 [Bug fortran/38318] New: moving the allocation of temps out of loops jv244 at cam dot ac dot uk
  2008-12-06 20:29 ` [Bug fortran/38318] " tkoenig at gcc dot gnu dot org
  2010-02-21  9:12 ` jv244 at cam dot ac dot uk
@ 2010-02-21 11:07 ` burnus at gcc dot gnu dot org
  2010-02-21 12:12 ` [Bug middle-end/38318] " jv244 at cam dot ac dot uk
  3 siblings, 0 replies; 13+ messages in thread
From: burnus at gcc dot gnu dot org @ 2010-02-21 11:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from burnus at gcc dot gnu dot org  2010-02-21 11:06 -------
(In reply to comment #2)
> seemingly being discussed, since useful for tonto
> 
> http://gcc.gnu.org/ml/fortran/2010-02/msg00157.html
> 

But there: "it's unfortunately not possible to avoid the temporary creation
without serious data-flow analysis work - too late for the frontend"

Thus, this seems to be more a middle-end item.

Regarding the current timing, I get with ifort -O3 -fast (v 11.1) vs.
gfortran -flto -fwhole-program -O3 --fast-math -march=native (today's 4.5) It
is also interesting that gfortran is much faster for the optimized version than
ifort.

gfortran                     ifort
 Default [s]:   24.881554     Default [s]:   5.108319
 OPT1 [s]:   1.6641045        OPT1 [s]:   3.280205
 OPT2 [s]:   1.6641045        OPT2 [s]:   4.988311
   0.0000000                   0.0000000E+00

real    0m28.420s            real    0m13.400s


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/38318] moving the allocation of temps out of loops.
  2008-11-29 16:17 [Bug fortran/38318] New: moving the allocation of temps out of loops jv244 at cam dot ac dot uk
                   ` (2 preceding siblings ...)
  2010-02-21 11:07 ` burnus at gcc dot gnu dot org
@ 2010-02-21 12:12 ` jv244 at cam dot ac dot uk
  3 siblings, 0 replies; 13+ messages in thread
From: jv244 at cam dot ac dot uk @ 2010-02-21 12:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from jv244 at cam dot ac dot uk  2010-02-21 12:11 -------
(In reply to comment #3)
> (In reply to comment #2)
> > seemingly being discussed, since useful for tonto
> > 
> > http://gcc.gnu.org/ml/fortran/2010-02/msg00157.html
> > 
> 
> But there: "it's unfortunately not possible to avoid the temporary creation
> without serious data-flow analysis work - too late for the frontend"
> 
> Thus, this seems to be more a middle-end item.

right, changing component as such. This would actually be much more powerful as
a middle-end thing, since it would also capture the case where a programmer
would explicitly allocate/deallocate stuff in a loop.


-- 

jv244 at cam dot ac dot uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|fortran                     |middle-end


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/38318] moving the allocation of temps out of loops.
       [not found] <bug-38318-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2013-11-10 18:40 ` glisse at gcc dot gnu.org
@ 2013-11-10 19:13 ` Joost.VandeVondele at mat dot ethz.ch
  7 siblings, 0 replies; 13+ messages in thread
From: Joost.VandeVondele at mat dot ethz.ch @ 2013-11-10 19:13 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318

--- Comment #12 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Marc Glisse from comment #11)
> So S2 cannot call free (or realloc) on the pointer and then exit or call
> longjmp or do an infinite loop or anything like that in fortran? Maybe we'll
> need a flag set by the front-end that says whether (all) functions are safe.

well, not free or realloc or longjmp, but infinite loops are allowed in S2. The
point is, one is really not passing a pointer to S2 (from a Fortran point of
view).


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/38318] moving the allocation of temps out of loops.
       [not found] <bug-38318-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2013-11-10 18:23 ` Joost.VandeVondele at mat dot ethz.ch
@ 2013-11-10 18:40 ` glisse at gcc dot gnu.org
  2013-11-10 19:13 ` Joost.VandeVondele at mat dot ethz.ch
  7 siblings, 0 replies; 13+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-11-10 18:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318

--- Comment #11 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Joost VandeVondele from comment #10)
> (In reply to Marc Glisse from comment #9)
> > Ok. If you used __builtin_abort instead of _gfortran_os_error, I think my
> > current patch would handle it. It is hard for gcc to guess that
> > _gfortran_os_error is safe. 
> 
> For the Fortran FE people (not me, I'm a user), but  _gfortran_os_error
> should have an attribute like 'abort' or 'noreturn'.

abort doesn't exist, and noreturn is not sufficient, as a function that calls
free on the pointer then exits is noreturn but unsafe.

> However, the compiler
> should also be able to figure out this can never be called (if B is
> 'allocated on the stack') in this subroutine.

Yes.

> > Er, no, I missed the call to s2. I would also need some attribute on s2 so
> > the compiler knows that s2 doesn't do anything too weird. 
> 
> Actually, in Fortran, S2 can't do anything 'weird' with B, in the sense that
> your optimization should certainly apply. Not so sure about the correct
> terms here, but in approximate C-speak, B 'as a pointer' is guaranteed to be
> pointing to exactly the same address, nothing has happened to its target,
> and no pointer can be pointing to whatever B was pointing to....

So S2 cannot call free (or realloc) on the pointer and then exit or call
longjmp or do an infinite loop or anything like that in fortran? Maybe we'll
need a flag set by the front-end that says whether (all) functions are safe.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/38318] moving the allocation of temps out of loops.
       [not found] <bug-38318-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2013-11-10 18:05 ` glisse at gcc dot gnu.org
@ 2013-11-10 18:23 ` Joost.VandeVondele at mat dot ethz.ch
  2013-11-10 18:40 ` glisse at gcc dot gnu.org
  2013-11-10 19:13 ` Joost.VandeVondele at mat dot ethz.ch
  7 siblings, 0 replies; 13+ messages in thread
From: Joost.VandeVondele at mat dot ethz.ch @ 2013-11-10 18:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318

--- Comment #10 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Marc Glisse from comment #9)

> 
> Ok. If you used __builtin_abort instead of _gfortran_os_error, I think my
> current patch would handle it. It is hard for gcc to guess that
> _gfortran_os_error is safe. 

For the Fortran FE people (not me, I'm a user), but  _gfortran_os_error should
have an attribute like 'abort' or 'noreturn'. However, the compiler should also
be able to figure out this can never be called (if B is 'allocated on the
stack') in this subroutine.

> Er, no, I missed the call to s2. I would also need some attribute on s2 so
> the compiler knows that s2 doesn't do anything too weird. 

Actually, in Fortran, S2 can't do anything 'weird' with B, in the sense that
your optimization should certainly apply. Not so sure about the correct terms
here, but in approximate C-speak, B 'as a pointer' is guaranteed to be pointing
to exactly the same address, nothing has happened to its target, and no pointer
can be pointing to whatever B was pointing to....

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/38318] moving the allocation of temps out of loops.
       [not found] <bug-38318-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2013-11-10 16:52 ` Joost.VandeVondele at mat dot ethz.ch
@ 2013-11-10 18:05 ` glisse at gcc dot gnu.org
  2013-11-10 18:23 ` Joost.VandeVondele at mat dot ethz.ch
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-11-10 18:05 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318

--- Comment #9 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Joost VandeVondele from comment #8)
> Marc, looks like the fortran FE changed a lot since this bug was filed, and
> there is no explicit allocate anymore, in fact the variable is created on
> stack by the frontend...

Cool, the best optimizations are those you don't need to do ;-)

> So now, the proper testcase would be:
> > cat PR38318-3.f90
> SUBROUTINE S1(N,A)
>  REAL :: A(3)
>  REAL, DIMENSION(:), ALLOCATABLE :: B
>  DO I=1,N
>    ALLOCATE(B(3))
>    B=-A
>    CALL S2(B)
>    DEALLOCATE(B)
>  ENDDO
> END SUBROUTINE
> 
> which really should contain any call to _gfortran_runtime_error_at,
> _gfortran_os_error, __builtin_malloc, __builtin_free if all were perfect,
> and certainly not in the loop

Ok. If you used __builtin_abort instead of _gfortran_os_error, I think my
current patch would handle it. It is hard for gcc to guess that
_gfortran_os_error is safe. On the other hand, if I special case the test
if(VAR==0) as mentioned in a comment in my patch, it won't look at that branch
anymore and the optimization should apply.

Er, no, I missed the call to s2. I would also need some attribute on s2 so the
compiler knows that s2 doesn't do anything too weird. Hopefully, when the
compiler has the sources for s2, we could later let it guess those
attributes...


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/38318] moving the allocation of temps out of loops.
       [not found] <bug-38318-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2013-11-10 16:26 ` glisse at gcc dot gnu.org
@ 2013-11-10 16:52 ` Joost.VandeVondele at mat dot ethz.ch
  2013-11-10 18:05 ` glisse at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Joost.VandeVondele at mat dot ethz.ch @ 2013-11-10 16:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318

--- Comment #8 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Marc Glisse from comment #7)
> (In reply to Joost VandeVondele from comment #6)
> > Marc, I think your recently posted patch:
> > http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01049.html
> > could fix the problem with the testcase subroutine S1, even though 'moving
> > allocations out of loops' is more or less a side effect.
> 
> I don't speak fortran fluently so I tried compiling S1 with an unpatched
> compiler and -O2 -fdump-tree-optimized, but I don't see any call to malloc
> in there. Could you explain, with references to a dump, what the internal
> functions mean and where my patch might help?

Marc, looks like the fortran FE changed a lot since this bug was filed, and
there is no explicit allocate anymore, in fact the variable is created on stack
by the frontend... this is controlled by -fmax-stack-var-size=0 (putting it to
zero, will yield your __builtin_malloc() that I recalled, in the
PR38318.f90.003t.original dump). You have a precedent for getting the a
reasonable size (32768 for fortran).

The _gfortran_internal_(un)pack is a fortran FE thing, that guarantees that
memory is contiguous... clearly a missed frontend optimization in this case.

So now, the proper testcase would be:
> cat PR38318-3.f90
SUBROUTINE S1(N,A)
 REAL :: A(3)
 REAL, DIMENSION(:), ALLOCATABLE :: B
 DO I=1,N
   ALLOCATE(B(3))
   B=-A
   CALL S2(B)
   DEALLOCATE(B)
 ENDDO
END SUBROUTINE

which really should contain any call to _gfortran_runtime_error_at,
_gfortran_os_error, __builtin_malloc, __builtin_free if all were perfect, and
certainly not in the loop


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/38318] moving the allocation of temps out of loops.
       [not found] <bug-38318-4@http.gcc.gnu.org/bugzilla/>
  2010-10-07 15:04 ` dominiq at lps dot ens.fr
  2013-11-10 16:02 ` Joost.VandeVondele at mat dot ethz.ch
@ 2013-11-10 16:26 ` glisse at gcc dot gnu.org
  2013-11-10 16:52 ` Joost.VandeVondele at mat dot ethz.ch
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-11-10 16:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318

--- Comment #7 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Joost VandeVondele from comment #6)
> Marc, I think your recently posted patch:
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01049.html
> could fix the problem with the testcase subroutine S1, even though 'moving
> allocations out of loops' is more or less a side effect.

I don't speak fortran fluently so I tried compiling S1 with an unpatched
compiler and -O2 -fdump-tree-optimized, but I don't see any call to malloc in
there. Could you explain, with references to a dump, what the internal
functions mean and where my patch might help?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/38318] moving the allocation of temps out of loops.
       [not found] <bug-38318-4@http.gcc.gnu.org/bugzilla/>
  2010-10-07 15:04 ` dominiq at lps dot ens.fr
@ 2013-11-10 16:02 ` Joost.VandeVondele at mat dot ethz.ch
  2013-11-10 16:26 ` glisse at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Joost.VandeVondele at mat dot ethz.ch @ 2013-11-10 16:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318

Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org,
                   |                            |Joost.VandeVondele at mat dot ethz
                   |                            |.ch

--- Comment #6 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
Marc, I think your recently posted patch:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01049.html
could fix the problem with the testcase subroutine S1, even though 'moving
allocations out of loops' is more or less a side effect.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/38318] moving the allocation of temps out of loops.
       [not found] <bug-38318-4@http.gcc.gnu.org/bugzilla/>
@ 2010-10-07 15:04 ` dominiq at lps dot ens.fr
  2013-11-10 16:02 ` Joost.VandeVondele at mat dot ethz.ch
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: dominiq at lps dot ens.fr @ 2010-10-07 15:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jh at suse dot cz,
                   |                            |rguenther at suse dot de

--- Comment #5 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2010-10-07 15:04:34 UTC ---
Another case of interest is "automatic arrays". An interesting example is the
polyhedron test nf.f90.
On Core2 Duo and Darwin the following patch


--- nf.f90    2005-10-11 22:53:32.000000000 +0200
+++ nf_v2.f90    2010-10-07 16:49:38.000000000 +0200
@@ -153,7 +153,7 @@ integer :: nx , nxy , nxyz , maxiter
 real(dpkind),dimension(nxyz):: ad,au1,au2,au3,x,b
 real(dpkind)::targrms

-real(dpkind),allocatable,dimension(:) :: r,q,p,z,g,gi
+real(dpkind),allocatable,dimension(:) :: r,q,p,z,g,gi,t,u
 real(dpkind):: alpha,beta,qr,qrp,rmserr
 integer :: iter , tbase , tgi , tcg , tickspersec , maxticks

@@ -163,7 +163,7 @@ call GetGI3D(1,nxyz)                 ! c
 call system_clock(tgi,tickspersec,maxticks)
 deallocate(g)

-allocate (r(nxyz),q(nxyz),p(nxyz),z(nxyz))
+allocate (r(nxyz),q(nxyz),p(nxyz),z(nxyz),t(nxyz),u(nxyz))
 CALL SPMMULT(x,r) ; r = b - r        ! compute initial residual vector

 write(*,'(A)') ' Iter      Alpha        Beta     RMS Residual   Sum of
Residuals'
@@ -171,12 +171,12 @@ write(*,'(I4,24X,2G18.7)') 0,sqrt(DOT_PR

                                      !  Do a single iteration with alpha =1 
                                      !  to reduce sum of residuals to 0
-p = r ; CALL NF3DPrecon(p,1,nxyz) ; CALL SPMMULT(p,z)
+p = r ; CALL NF3DPrecon(p,t,u,1,nxyz) ; CALL SPMMULT(p,z)
 x = x + p ; r = r - z
 write(*,'(I4,F12.5,12X,2G18.7)') 0,1.0,sqrt(DOT_PRODUCT(r,r)/nxyz),sum(r)

 do iter = 1 , maxiter
-   q = r ; CALL NF3DPrecon(q,1,nxyz)
+   q = r ; CALL NF3DPrecon(q,t,u,1,nxyz)
    qr = DOT_PRODUCT(q,r)
    if ( iter==1 ) then
       beta = 0.0
@@ -197,7 +197,7 @@ call system_clock(tcg,tickspersec,maxtic
 write(*,'(/A,F10.3/A,F10.3/A,F10.3)') ' Time for setup    
',REAL(tgi-tbase)/REAL(tickspersec) , &
                                       ' Time per iteration
',REAL(tcg-tgi)/REAL(tickspersec*min(iter,maxiter)) , &
                                       ' Total Time        
',REAL(tcg-tbase)/REAL(tickspersec)
-deallocate(r,q,p,z,gi)
+deallocate(r,q,p,z,gi,t,u)
 contains
                                     
!=========================================
                                      ! Banded matrix multiply b = A.x
=========                                     
@@ -253,7 +253,7 @@ end subroutine GetGI2D               !==

                                     
!=========================================
                                      ! solve for a plane of cells using 
======
-subroutine NF2DPrecon(x,i1,i2)       ! 2D NF Preconditioning matrix
+subroutine NF2DPrecon(x,t,i1,i2)       ! 2D NF Preconditioning matrix
 integer :: i1 , i2
 real(dpkind),dimension(i2)::x,t
 integer :: i
@@ -272,11 +272,12 @@ end subroutine NF2DPrecon            !==
 subroutine GetGI3D(i1,i2)            ! compute gi for a 3D block of cells
=====
 integer :: i1 , i2
 integer :: i
+real(dpkind),dimension(nxyz)::t
 g = ad
 do i = i1 , i2 , nxy                 ! advance one plane at a time
    if ( i>i1 ) then                  ! get contribution from previous plane 
       g(i-nxy:i-1) = au3(i-nxy:i-1)
-      call NF2DPrecon(g,i-nxy,i-1)
+      call NF2DPrecon(g,t,i-nxy,i-1)
       g(i:i+nxy-1) = g(i:i+nxy-1) - au3(i-nxy:i-1)*g(i-nxy:i-1)
    endif
    call GetGI2D(i,i+nxy-1)           ! get contribution from this plane
@@ -285,17 +286,17 @@ end subroutine GetGI3D               !==

                                     
!=========================================
                                      ! solve for a 3D block of cells using 
-subroutine NF3DPrecon(x,i1,i2)       ! 3D Preconditioning matrix
+subroutine NF3DPrecon(x,t,u,i1,i2)       ! 3D Preconditioning matrix
 integer :: i1 , i2
-real(dpkind),dimension(i2)::x,t
+real(dpkind),dimension(i2)::x,t,u
 integer :: i
 do i = i1 , i2 , nxy
    if ( i>i1 ) x(i:i+nxy-1) = x(i:i+nxy-1) - au3(i-nxy:i-1)*x(i-nxy:i-1)
-   call NF2DPrecon(x,i,i+nxy-1)
+   call NF2DPrecon(x,u,i,i+nxy-1)
 enddo   
 do i = i2-2*nxy+1 , i1 , -nxy
    t(i:i+nxy-1) = au3(i:i+nxy-1)*x(i+nxy:i+2*nxy-1)
-   call NF2DPrecon(t,i,i+nxy-1)
+   call NF2DPrecon(t,u,i,i+nxy-1)
    x(i:i+nxy-1) = x(i:i+nxy-1) - t(i:i+nxy-1)
 enddo
 end subroutine NF3DPrecon           
!=========================================

cuts the execution time from ~28s to ~20s (Note that with the options I use all
the procs are inlined).


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-11-10 19:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-29 16:17 [Bug fortran/38318] New: moving the allocation of temps out of loops jv244 at cam dot ac dot uk
2008-12-06 20:29 ` [Bug fortran/38318] " tkoenig at gcc dot gnu dot org
2010-02-21  9:12 ` jv244 at cam dot ac dot uk
2010-02-21 11:07 ` burnus at gcc dot gnu dot org
2010-02-21 12:12 ` [Bug middle-end/38318] " jv244 at cam dot ac dot uk
     [not found] <bug-38318-4@http.gcc.gnu.org/bugzilla/>
2010-10-07 15:04 ` dominiq at lps dot ens.fr
2013-11-10 16:02 ` Joost.VandeVondele at mat dot ethz.ch
2013-11-10 16:26 ` glisse at gcc dot gnu.org
2013-11-10 16:52 ` Joost.VandeVondele at mat dot ethz.ch
2013-11-10 18:05 ` glisse at gcc dot gnu.org
2013-11-10 18:23 ` Joost.VandeVondele at mat dot ethz.ch
2013-11-10 18:40 ` glisse at gcc dot gnu.org
2013-11-10 19:13 ` Joost.VandeVondele at mat dot ethz.ch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).