public inbox for gcc-bugs@sourceware.org
* [Bug fortran/59345] New: _gfortran_internal_pack on compiler generated temps
From: Joost.VandeVondele at mat dot ethz.ch @ 2013-11-29 14:31 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59345
Bug ID: 59345
Summary: _gfortran_internal_pack on compiler generated temps
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: Joost.VandeVondele at mat dot ethz.ch
There is a missed optimization on compiler generated temporaries. Basically:
SUBROUTINE S1(A)
  REAL :: A(3)
  CALL S2(-A)
END SUBROUTINE
leads to an optimized tree that contains calls to
_gfortran_internal_pack
_gfortran_internal_unpack
__builtin_free
which should not be needed, since compiler-generated temporaries are known to
be contiguous (in particular here, where the temporary is created on the stack).
This would help to fully resolve PR38318.
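One way to check for the calls listed above is to inspect gfortran's optimized tree dump. The following is a minimal sketch, assuming a POSIX shell and a gfortran on PATH; the -O2 level, the s1.f90 file name and the scratch directory are assumptions, not taken from the report, and the counts will depend on the compiler version.

```shell
#!/bin/sh
# Sketch: look for pack/unpack calls in gfortran's optimized tree dump.
# The file name, scratch directory and -O2 are illustrative assumptions.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# The testcase from the report; S2 is an external procedure, so -c is enough.
cat > s1.f90 <<'EOF'
SUBROUTINE S1(A)
  REAL :: A(3)
  CALL S2(-A)
END SUBROUTINE
EOF

if command -v gfortran >/dev/null 2>&1; then
  # -fdump-tree-optimized writes the final GIMPLE to s1.f90.<NNN>t.optimized
  gfortran -O2 -c -fdump-tree-optimized s1.f90
  for sym in _gfortran_internal_pack _gfortran_internal_unpack __builtin_free; do
    # grep -c exits non-zero when the count is 0, hence the "|| true"
    n=$(cat s1.f90.*.optimized 2>/dev/null | grep -c "$sym" || true)
    echo "$sym: $n"
  done
else
  echo "gfortran not found; skipping"
fi
```

A count of 0 for all three symbols would indicate that the temporary is passed without packing.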
* [Bug fortran/59345] _gfortran_internal_pack on compiler generated temps
From: dominiq at lps dot ens.fr @ 2013-12-22 21:00 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59345
Dominique d'Humieres <dominiq at lps dot ens.fr> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-12-22
     Ever confirmed|0                           |1
--- Comment #1 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
Confirmed at r206155.
* [Bug fortran/59345] _gfortran_internal_pack on compiler generated temps
From: Joost.VandeVondele at mat dot ethz.ch @ 2014-12-06 10:05 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59345
Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2013-12-22 00:00:00         |2014-12-6
                 CC|                            |Joost.VandeVondele at mat dot ethz.ch
      Known to fail|                            |4.9.2, 5.0
--- Comment #2 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
Still happens with trunk.
In the microbenchmark below, packing costs roughly a factor of 3.7 (1.82 s
vs. 0.49 s). This is similar to using an assumed-shape dummy argument as the
temporary (S3), except that there the overhead can be removed with the
CONTIGUOUS attribute (S4). Could the solution be as simple as somehow giving
compiler-generated temporaries the 'contiguous' attribute?
> gfortran -Ofast -fno-inline t.f90
> ./a.out
with packing: 1.8157229999999998 sec.
without packing: 0.49092599999999997 sec.
assumed shape, no contiguous : 1.9047100000000006 sec.
assumed shape, contiguous : 0.46692899999999948 sec.
total calls to foo: 400000000 expected 200000000
> cat t.f90
MODULE M
  INTEGER, SAVE :: count=0
CONTAINS
  ! S1: the actual argument is the expression -A (compiler-generated temporary)
  SUBROUTINE S1(A,foo)
    REAL :: A(3)
    CALL foo(-A)
  END SUBROUTINE
  ! S2: explicit local temporary B
  SUBROUTINE S2(A,foo)
    REAL :: A(3)
    REAL :: B(3)
    B=-A
    CALL foo(B)
  END SUBROUTINE
  ! S3: assumed-shape dummy B used as the temporary
  SUBROUTINE S3(A,B,foo)
    REAL :: A(3)
    REAL :: B(:)
    B=-A
    CALL foo(B)
  END SUBROUTINE
  ! S4: as S3, but B is declared CONTIGUOUS
  SUBROUTINE S4(A,B,foo)
    REAL :: A(3)
    REAL, CONTIGUOUS :: B(:)
    B=-A
    CALL foo(B)
  END SUBROUTINE
  ! foo: only counts how often it is called
  SUBROUTINE foo(A)
    REAL :: A(3)
    count=count+1
  END SUBROUTINE
END MODULE
PROGRAM TEST
  USE M
  IMPLICIT NONE
  REAL :: A(3),B(3)
  INTEGER :: i
  REAL*8 :: t1,t2,t3,t4,t5,t6,t7,t8
  INTEGER :: N
  A=0
  N=100000000
  ! Time N calls of each variant
  CALL CPU_TIME(t1)
  DO i=1,N
    CALL S1(A,foo)
  ENDDO
  CALL CPU_TIME(t2)
  CALL CPU_TIME(t3)
  DO i=1,N
    CALL S2(A,foo)
  ENDDO
  CALL CPU_TIME(t4)
  CALL CPU_TIME(t5)
  DO i=1,N
    CALL S3(A,B,foo)
  ENDDO
  CALL CPU_TIME(t6)
  CALL CPU_TIME(t7)
  DO i=1,N
    CALL S4(A,B,foo)
  ENDDO
  CALL CPU_TIME(t8)
  WRITE(6,*) "with packing:", t2-t1, " sec."
  WRITE(6,*) "without packing:", t4-t3, "sec. "
  WRITE(6,*) "assumed shape, no contiguous :", t6-t5, "sec. "
  WRITE(6,*) "assumed shape, contiguous :", t8-t7, "sec. "
  ! NOTE: the four loops make 4*N calls to foo in total, so count ends up
  ! at twice the 2*N printed as "expected" here.
  WRITE(6,*) "total calls to foo:", count, "expected", 2*N
END
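To see which of S1 to S4 actually contain a pack call, the per-function headers in the optimized tree dump can be used. This is a rough sketch, assuming t.f90 from above is in the current directory and that each function in the dump starts with a line of the form ";; Function <name> (...)"; that layout may differ between GCC versions, and packs_per_function is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: attribute _gfortran_internal_pack calls to the enclosing function.
# Assumes dump sections start with ";; Function <name> (...)" (may vary by GCC version).
set -eu

# Hypothetical helper: prints "<function> <count>" for every function
# in the given dump file(s) that contains at least one pack call.
packs_per_function() {
  awk '
    /^;; Function /           { fn = $3 }   # remember the current function name
    /_gfortran_internal_pack/ { n[fn]++ }   # does not match ..._internal_unpack
    END { for (f in n) printf "%s %d\n", f, n[f] }
  ' "$@"
}

if [ -f t.f90 ] && command -v gfortran >/dev/null 2>&1; then
  # Same flags as the benchmark run, plus the dump; -c leaves t.o and m.mod behind.
  gfortran -Ofast -fno-inline -c -fdump-tree-optimized t.f90
  for dump in t.f90.*.optimized; do
    if [ -e "$dump" ]; then
      packs_per_function "$dump"
    fi
  done
else
  echo "t.f90 or gfortran not found; skipping"
fi
```

Functions that do not appear in the output contain no pack call in that dump.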
* [Bug fortran/59345] _gfortran_internal_pack on compiler generated temps
From: Joost.VandeVondele at mat dot ethz.ch @ 2014-12-06 15:49 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59345
--- Comment #3 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
I'm pasting some more testcases here, since I think they are related.
This one works as it should (i.e. no pack/unpack): an allocatable array as
function result:
> cat tt.f90
SUBROUTINE S1(A)
  INTERFACE
    FUNCTION CONTIGUOUS_F1() RESULT(res)
      INTEGER, ALLOCATABLE :: res(:)
    END FUNCTION
  END INTERFACE
  CALL S2(CONTIGUOUS_F1())
END SUBROUTINE
This one does generate a pack/unpack: an explicit-shape array as function result:
> cat tt.f90
SUBROUTINE S1(A)
  INTERFACE
    FUNCTION CONTIGUOUS_F1() RESULT(res)
      INTEGER :: res(5)
    END FUNCTION
  END INTERFACE
  CALL S2(CONTIGUOUS_F1())
END SUBROUTINE
This also leads to a pack: a function that returns an allocatable, but called
via a procedure pointer:
> cat tt.f90
SUBROUTINE S1(A)
  INTERFACE
    FUNCTION CONTIGUOUS_F1() RESULT(res)
      INTEGER, ALLOCATABLE :: res(:)
    END FUNCTION
  END INTERFACE
  PROCEDURE(CONTIGUOUS_F1), POINTER :: A
  CALL S2(A())
END SUBROUTINE
In these cases, the issue seems to be that gfc_is_simply_contiguous returns
false, while maybe it should return true?
I think this is also why things go wrong with the testcase in the original
report (comment #0): -A is an EXPR_OP, yet it might be simply contiguous
nevertheless.
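The three variants above can be compared side by side by counting pack calls in each optimized tree dump. A minimal sketch, assuming a POSIX shell and a gfortran on PATH; the file names tt1.f90 to tt3.f90 (the report reuses tt.f90 for all three) and the -O2 level are assumptions, and the counts depend on the compiler version.

```shell
#!/bin/sh
# Sketch: count _gfortran_internal_pack calls for the three variants.
# tt1..tt3 and -O2 are illustrative assumptions, not from the report.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Variant 1: allocatable function result (reported: no pack/unpack)
cat > tt1.f90 <<'EOF'
SUBROUTINE S1(A)
  INTERFACE
    FUNCTION CONTIGUOUS_F1() RESULT(res)
      INTEGER, ALLOCATABLE :: res(:)
    END FUNCTION
  END INTERFACE
  CALL S2(CONTIGUOUS_F1())
END SUBROUTINE
EOF

# Variant 2: explicit-shape function result (reported: pack/unpack)
cat > tt2.f90 <<'EOF'
SUBROUTINE S1(A)
  INTERFACE
    FUNCTION CONTIGUOUS_F1() RESULT(res)
      INTEGER :: res(5)
    END FUNCTION
  END INTERFACE
  CALL S2(CONTIGUOUS_F1())
END SUBROUTINE
EOF

# Variant 3: allocatable result, called via a procedure pointer (reported: pack)
cat > tt3.f90 <<'EOF'
SUBROUTINE S1(A)
  INTERFACE
    FUNCTION CONTIGUOUS_F1() RESULT(res)
      INTEGER, ALLOCATABLE :: res(:)
    END FUNCTION
  END INTERFACE
  PROCEDURE(CONTIGUOUS_F1), POINTER :: A
  CALL S2(A())
END SUBROUTINE
EOF

if command -v gfortran >/dev/null 2>&1; then
  for f in tt1 tt2 tt3; do
    gfortran -O2 -c -fdump-tree-optimized "$f.f90" || { echo "$f: compile failed"; continue; }
    # grep -c exits non-zero when the count is 0, hence the "|| true"
    n=$(cat "$f".f90.*.optimized 2>/dev/null | grep -c _gfortran_internal_pack || true)
    echo "$f: $n call(s) to _gfortran_internal_pack"
  done
else
  echo "gfortran not found; skipping"
fi
```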