* [patch, libfortran] Speed up / orthogonalize in_pack and in_unpack
@ 2008-03-19 9:51 Thomas Koenig
2008-03-19 10:53 ` FX
0 siblings, 1 reply; 5+ messages in thread
From: Thomas Koenig @ 2008-03-19 9:51 UTC
To: fortran, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 2701 bytes --]
Hello world,
now that Real World (TM) time constraints have lessened somewhat, I
finally have some time again to hack on gfortran :-)
Here's a patch that adds a few missing types (kind=1 and 2 integer,
kind=10 and kind=16 complex and real) to the internal routines that are
called by internal_pack and internal_unpack, so we don't have to call
memcpy on these. It also provides a routine for each of the real types
(I favored orthogonality over library code size for this).
The test cases check that there are no regressions for any
of the types that were touched.
Regression-tested on i686-pc-linux-gnu (although I couldn't test
the large-integer case). OK?
Thomas
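A rough illustration of the idea described above, not the libgfortran code itself: when the element type is known only as a byte count, every element of a strided section has to be moved with memcpy, whereas a type-specific routine can use plain assignments. The names pack_generic and pack_i2 are hypothetical, and int16_t stands in for a kind=2 integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic path: the element type is only known as a byte
   count at run time, so each element is copied with memcpy.  */
static void *
pack_generic (const void *base, size_t n, ptrdiff_t stride, size_t elem_size)
{
  const char *src = base;
  char *dest;

  assert (base != NULL && elem_size > 0);
  dest = malloc (n * elem_size);
  if (dest == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    memcpy (dest + i * elem_size,
            src + (ptrdiff_t) i * stride * (ptrdiff_t) elem_size,
            elem_size);
  return dest;
}

/* Hypothetical typed path: the element type is fixed at compile time,
   so the copy is a plain assignment.  */
static int16_t *
pack_i2 (const int16_t *base, size_t n, ptrdiff_t stride)
{
  int16_t *dest;

  assert (base != NULL);
  dest = malloc (n * sizeof *dest);
  if (dest == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    dest[i] = base[(ptrdiff_t) i * stride];
  return dest;
}
```

Both functions return a freshly allocated contiguous copy that the caller must free; for the section i2(1:3:2) used in the test cases, n is 2 and stride is 2.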
2008-03-19 Thomas Koenig <tkoenig@gcc.gnu.org>
PR libfortran/32972
* Makefile.am (in_pack_c): Add in_pack_i1.c, in_pack_i2.c,
in_pack_r4.c, in_pack_r8.c, in_pack_r10.c and in_pack_r16.c.
(in_unpack_c): Add in_unpack_i1.c, in_unpack_i2.c,
in_unpack_r4.c, in_unpack_r8.c, in_unpack_r10.c and
in_unpack_r16.c.
* Makefile.in: Regenerate.
* libgfortran.h: Add prototypes for internal_pack_1,
internal_pack_2, internal_pack_16, internal_pack_r4,
internal_pack_r8, internal_pack_r10, internal_pack_r16,
internal_pack_c10 and internal_pack_c16. Add prototypes for
internal_unpack_1, internal_unpack_2, internal_unpack_16,
internal_unpack_r4, internal_unpack_r8, internal_unpack_r10,
internal_unpack_r16, internal_unpack_c10 and
internal_unpack_c16.
* runtime/in_pack_generic.c (internal_pack): Use sizeof instead
of hardwired sizes.
Add calls to internal_pack_1, internal_pack_2,
internal_pack_16, internal_pack_r4, internal_pack_r8,
internal_pack_r10, internal_pack_r16, internal_pack_c10 and
internal_pack_c16.
* runtime/in_unpack_generic.c (internal_unpack): Use sizeof
instead of hardwired sizes.
Add calls to internal_unpack_1, internal_unpack_2,
internal_unpack_16, internal_unpack_r4, internal_unpack_r8,
internal_unpack_r10, internal_unpack_r16, internal_unpack_c10
and internal_unpack_c16.
* generated/in_pack_r4.c: New file.
* generated/in_pack_i2.c: New file.
* generated/in_unpack_i1.c: New file.
* generated/in_pack_r10.c: New file.
* generated/in_unpack_r4.c: New file.
* generated/in_unpack_i2.c: New file.
* generated/in_unpack_r16.c: New file.
* generated/in_pack_r8.c: New file.
* generated/in_unpack_r10.c: New file.
* generated/in_unpack_r8.c: New file.
* generated/in_pack_r16.c: New file.
* generated/in_pack_i1.c: New file.
2008-03-19 Thomas Koenig <tkoenig@gcc.gnu.org>
PR libfortran/32972
* gfortran.dg/internal_pack_1.f90: New test case.
* gfortran.dg/internal_pack_2.f90: New test case.
* gfortran.dg/internal_pack_3.f90: New test case.
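The "Use sizeof instead of hardwired sizes" entries above amount to a two-level switch: first on the element type, then on the element size, with sizeof expressions as case labels. A minimal sketch of that pattern follows. The enums and select_pack_routine are hypothetical stand-ins, not the libgfortran descriptor codes, and the sketch assumes float and double are 4 and 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the descriptor type codes.  */
enum elem_type { ELEM_INTEGER, ELEM_REAL };

/* Hypothetical tags naming the routine that would be called.  */
enum pack_routine
{
  PACK_GENERIC, PACK_I1, PACK_I2, PACK_I4, PACK_I8, PACK_R4, PACK_R8
};

/* Select a type-specific routine from the element type and size, or
   fall back to the generic one when no specialization matches.  */
static enum pack_routine
select_pack_routine (enum elem_type type, size_t size)
{
  assert (size > 0);
  switch (type)
    {
    case ELEM_INTEGER:
      switch (size)
        {
        case sizeof (int8_t):
          return PACK_I1;
        case sizeof (int16_t):
          return PACK_I2;
        case sizeof (int32_t):
          return PACK_I4;
        case sizeof (int64_t):
          return PACK_I8;
        }
      break;

    case ELEM_REAL:
      switch (size)
        {
        case sizeof (float):
          return PACK_R4;
        case sizeof (double):
          return PACK_R8;
        }
      break;
    }
  return PACK_GENERIC;
}
```

Because sizeof yields an integer constant expression, it is valid as a case label, and the labels stay correct on targets where a type's size differs from the hardwired number.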
[-- Attachment #2: internal_pack_1.f90 --]
[-- Type: text/x-fortran, Size: 1917 bytes --]
! { dg-do run }
! Test that the internal pack and unpack routines work OK
! for different data types
program main
integer(kind=1), dimension(3) :: i1
integer(kind=2), dimension(3) :: i2
integer(kind=4), dimension(3) :: i4
integer(kind=8), dimension(3) :: i8
real(kind=4), dimension(3) :: r4
real(kind=8), dimension(3) :: r8
i1 = (/ -1, 1, -3 /)
call sub_i1(i1(1:3:2))
if (any(i1 /= (/ 3, 1, 2 /))) call abort
i2 = (/ -1, 1, -3 /)
call sub_i2(i2(1:3:2))
if (any(i2 /= (/ 3, 1, 2 /))) call abort
i4 = (/ -1, 1, -3 /)
call sub_i4(i4(1:3:2))
if (any(i4 /= (/ 3, 1, 2 /))) call abort
i8 = (/ -1, 1, -3 /)
call sub_i8(i8(1:3:2))
if (any(i8 /= (/ 3, 1, 2 /))) call abort
r4 = (/ -1.0, 1.0, -3.0 /)
call sub_r4(r4(1:3:2))
if (any(r4 /= (/ 3.0, 1.0, 2.0/))) call abort
r8 = (/ -1.0_8, 1.0_8, -3.0_8 /)
call sub_r8(r8(1:3:2))
if (any(r8 /= (/ 3.0_8, 1.0_8, 2.0_8/))) call abort
end program main
subroutine sub_i1(i)
integer(kind=1), dimension(2) :: i
if (i(1) /= -1) call abort
if (i(2) /= -3) call abort
i(1) = 3
i(2) = 2
end subroutine sub_i1
subroutine sub_i2(i)
integer(kind=2), dimension(2) :: i
if (i(1) /= -1) call abort
if (i(2) /= -3) call abort
i(1) = 3
i(2) = 2
end subroutine sub_i2
subroutine sub_i4(i)
integer(kind=4), dimension(2) :: i
if (i(1) /= -1) call abort
if (i(2) /= -3) call abort
i(1) = 3
i(2) = 2
end subroutine sub_i4
subroutine sub_i8(i)
integer(kind=8), dimension(2) :: i
if (i(1) /= -1) call abort
if (i(2) /= -3) call abort
i(1) = 3
i(2) = 2
end subroutine sub_i8
subroutine sub_r4(r)
real(kind=4), dimension(2) :: r
if (r(1) /= -1.) call abort
if (r(2) /= -3.) call abort
r(1) = 3.
r(2) = 2.
end subroutine sub_r4
subroutine sub_r8(r)
real(kind=8), dimension(2) :: r
if (r(1) /= -1._8) call abort
if (r(2) /= -3._8) call abort
r(1) = 3._8
r(2) = 2._8
end subroutine sub_r8
[-- Attachment #3: internal_pack_2.f90 --]
[-- Type: text/x-fortran, Size: 678 bytes --]
! { dg-do run }
! { dg-require-effective-target fortran_large_real }
! Test that the internal pack and unpack routines work OK
! for our large real type.
program main
implicit none
integer,parameter :: k = selected_real_kind (precision (0.0_8) + 1)
real(kind=k), dimension(3) :: rk
rk = (/ -1.0_k, 1.0_k, -3.0_k /)
call sub_rk(rk(1:3:2))
if (any(rk /= (/ 3.0_k, 1.0_k, 2.0_k/))) call abort
end program main
subroutine sub_rk(r)
implicit none
integer,parameter :: k = selected_real_kind (precision (0.0_8) + 1)
real(kind=k), dimension(2) :: r
if (r(1) /= -1._k) call abort
if (r(2) /= -3._k) call abort
r(1) = 3._k
r(2) = 2._k
end subroutine sub_rk
[-- Attachment #4: internal_pack_3.f90 --]
[-- Type: text/x-fortran, Size: 541 bytes --]
! { dg-do run }
! { dg-require-effective-target fortran_large_int }
! Test that the internal pack and unpack routines work OK
! for our large integer type.
program main
integer,parameter :: k = selected_int_kind (range (0_8) + 1)
integer(kind=k), dimension(3) :: ik
ik = (/ -1, 1, -3 /)
call sub_ik(ik(1:3:2))
if (any(ik /= (/ 3, 1, 2 /))) call abort
end program main
subroutine sub_ik(i)
integer,parameter :: k = selected_int_kind (range (0_8) + 1)
integer(kind=k), dimension(2) :: i
if (i(1) /= -1) call abort
if (i(2) /= -3) call abort
i(1) = 3
i(2) = 2
end subroutine sub_ik
[-- Attachment #5: patch-1 --]
[-- Type: text/x-patch, Size: 8470 bytes --]
Index: Makefile.am
===================================================================
--- Makefile.am (revision 133308)
+++ Makefile.am (working copy)
@@ -380,18 +380,30 @@ $(srcdir)/generated/cshift1_8.c \
$(srcdir)/generated/cshift1_16.c
in_pack_c = \
+$(srcdir)/generated/in_pack_i1.c \
+$(srcdir)/generated/in_pack_i2.c \
$(srcdir)/generated/in_pack_i4.c \
$(srcdir)/generated/in_pack_i8.c \
$(srcdir)/generated/in_pack_i16.c \
+$(srcdir)/generated/in_pack_r4.c \
+$(srcdir)/generated/in_pack_r8.c \
+$(srcdir)/generated/in_pack_r10.c \
+$(srcdir)/generated/in_pack_r16.c \
$(srcdir)/generated/in_pack_c4.c \
$(srcdir)/generated/in_pack_c8.c \
$(srcdir)/generated/in_pack_c10.c \
$(srcdir)/generated/in_pack_c16.c
in_unpack_c = \
+$(srcdir)/generated/in_unpack_i1.c \
+$(srcdir)/generated/in_unpack_i2.c \
$(srcdir)/generated/in_unpack_i4.c \
$(srcdir)/generated/in_unpack_i8.c \
$(srcdir)/generated/in_unpack_i16.c \
+$(srcdir)/generated/in_unpack_r4.c \
+$(srcdir)/generated/in_unpack_r8.c \
+$(srcdir)/generated/in_unpack_r10.c \
+$(srcdir)/generated/in_unpack_r16.c \
$(srcdir)/generated/in_unpack_c4.c \
$(srcdir)/generated/in_unpack_c8.c \
$(srcdir)/generated/in_unpack_c10.c \
Index: libgfortran.h
===================================================================
--- libgfortran.h (revision 133308)
+++ libgfortran.h (working copy)
@@ -609,10 +609,15 @@ extern void reshape_packed (char *, inde
const char *, index_type);
internal_proto(reshape_packed);
-/* Repacking functions. */
+/* Repacking functions. These are called internally by internal_pack
+ and internal_unpack. */
+
+GFC_INTEGER_1 *internal_pack_1 (gfc_array_i1 *);
+internal_proto(internal_pack_1);
+
+GFC_INTEGER_2 *internal_pack_2 (gfc_array_i2 *);
+internal_proto(internal_pack_2);
-/* ??? These aren't currently used by the compiler, though we
- certainly could do so. */
GFC_INTEGER_4 *internal_pack_4 (gfc_array_i4 *);
internal_proto(internal_pack_4);
@@ -624,6 +629,22 @@ GFC_INTEGER_16 *internal_pack_16 (gfc_ar
internal_proto(internal_pack_16);
#endif
+GFC_REAL_4 *internal_pack_r4 (gfc_array_r4 *);
+internal_proto(internal_pack_r4);
+
+GFC_REAL_8 *internal_pack_r8 (gfc_array_r8 *);
+internal_proto(internal_pack_r8);
+
+#if defined HAVE_GFC_REAL_10
+GFC_REAL_10 *internal_pack_r10 (gfc_array_r10 *);
+internal_proto(internal_pack_r10);
+#endif
+
+#if defined HAVE_GFC_REAL_16
+GFC_REAL_16 *internal_pack_r16 (gfc_array_r16 *);
+internal_proto(internal_pack_r16);
+#endif
+
GFC_COMPLEX_4 *internal_pack_c4 (gfc_array_c4 *);
internal_proto(internal_pack_c4);
@@ -635,6 +656,17 @@ GFC_COMPLEX_10 *internal_pack_c10 (gfc_a
internal_proto(internal_pack_c10);
#endif
+#if defined HAVE_GFC_COMPLEX_16
+GFC_COMPLEX_16 *internal_pack_c16 (gfc_array_c16 *);
+internal_proto(internal_pack_c16);
+#endif
+
+extern void internal_unpack_1 (gfc_array_i1 *, const GFC_INTEGER_1 *);
+internal_proto(internal_unpack_1);
+
+extern void internal_unpack_2 (gfc_array_i2 *, const GFC_INTEGER_2 *);
+internal_proto(internal_unpack_2);
+
extern void internal_unpack_4 (gfc_array_i4 *, const GFC_INTEGER_4 *);
internal_proto(internal_unpack_4);
@@ -646,6 +678,22 @@ extern void internal_unpack_16 (gfc_arra
internal_proto(internal_unpack_16);
#endif
+extern void internal_unpack_r4 (gfc_array_r4 *, const GFC_REAL_4 *);
+internal_proto(internal_unpack_r4);
+
+extern void internal_unpack_r8 (gfc_array_r8 *, const GFC_REAL_8 *);
+internal_proto(internal_unpack_r8);
+
+#if defined HAVE_GFC_REAL_10
+extern void internal_unpack_r10 (gfc_array_r10 *, const GFC_REAL_10 *);
+internal_proto(internal_unpack_r10);
+#endif
+
+#if defined HAVE_GFC_REAL_16
+extern void internal_unpack_r16 (gfc_array_r16 *, const GFC_REAL_16 *);
+internal_proto(internal_unpack_r16);
+#endif
+
extern void internal_unpack_c4 (gfc_array_c4 *, const GFC_COMPLEX_4 *);
internal_proto(internal_unpack_c4);
Index: runtime/in_pack_generic.c
===================================================================
--- runtime/in_pack_generic.c (revision 133308)
+++ runtime/in_pack_generic.c (working copy)
@@ -65,25 +65,65 @@ internal_pack (gfc_array_char * source)
{
case GFC_DTYPE_INTEGER:
case GFC_DTYPE_LOGICAL:
- case GFC_DTYPE_REAL:
switch (size)
{
- case 4:
- return internal_pack_4 ((gfc_array_i4 *)source);
+ case sizeof (GFC_INTEGER_1):
+ return internal_pack_1 ((gfc_array_i1 *) source);
+
+ case sizeof (GFC_INTEGER_2):
+ return internal_pack_2 ((gfc_array_i2 *) source);
+
+ case sizeof (GFC_INTEGER_4):
+ return internal_pack_4 ((gfc_array_i4 *) source);
- case 8:
- return internal_pack_8 ((gfc_array_i8 *)source);
+ case sizeof (GFC_INTEGER_8):
+ return internal_pack_8 ((gfc_array_i8 *) source);
+
+#if defined(HAVE_GFC_INTEGER_16)
+ case sizeof (GFC_INTEGER_16):
+ return internal_pack_16 ((gfc_array_i16 *) source);
+#endif
}
break;
+ case GFC_DTYPE_REAL:
+ switch (size)
+ {
+ case sizeof (GFC_REAL_4):
+ return internal_pack_r4 ((gfc_array_r4 *) source);
+
+ case sizeof (GFC_REAL_8):
+ return internal_pack_r8 ((gfc_array_r8 *) source);
+
+#if defined (HAVE_GFC_REAL_10)
+ case sizeof (GFC_REAL_10):
+ return internal_pack_r10 ((gfc_array_r10 *) source);
+#endif
+
+#if defined (HAVE_GFC_REAL_16)
+ case sizeof (GFC_REAL_16):
+ return internal_pack_r16 ((gfc_array_r16 *) source);
+#endif
+ }
case GFC_DTYPE_COMPLEX:
switch (size)
{
- case 8:
- return internal_pack_c4 ((gfc_array_c4 *)source);
+ case sizeof (GFC_COMPLEX_4):
+ return internal_pack_c4 ((gfc_array_c4 *) source);
- case 16:
- return internal_pack_c8 ((gfc_array_c8 *)source);
+ case sizeof (GFC_COMPLEX_8):
+ return internal_pack_c8 ((gfc_array_c8 *) source);
+
+#if defined (HAVE_GFC_COMPLEX_10)
+ case sizeof (GFC_COMPLEX_10):
+ return internal_pack_c10 ((gfc_array_c10 *) source);
+#endif
+
+#if defined (HAVE_GFC_COMPLEX_16)
+ case sizeof (GFC_COMPLEX_16):
+ return internal_pack_c16 ((gfc_array_c16 *) source);
+#endif
+
}
break;
Index: runtime/in_unpack_generic.c
===================================================================
--- runtime/in_unpack_generic.c (revision 133308)
+++ runtime/in_unpack_generic.c (working copy)
@@ -62,29 +62,80 @@ internal_unpack (gfc_array_char * d, con
{
case GFC_DTYPE_INTEGER:
case GFC_DTYPE_LOGICAL:
- case GFC_DTYPE_REAL:
switch (size)
{
- case 4:
- internal_unpack_4 ((gfc_array_i4 *)d, (const GFC_INTEGER_4 *)s);
+ case sizeof (GFC_INTEGER_1):
+ internal_unpack_1 ((gfc_array_i1 *) d, (const GFC_INTEGER_1 *) s);
+ return;
+
+ case sizeof (GFC_INTEGER_2):
+ internal_unpack_2 ((gfc_array_i2 *) d, (const GFC_INTEGER_2 *) s);
+ return;
+
+ case sizeof (GFC_INTEGER_4):
+ internal_unpack_4 ((gfc_array_i4 *) d, (const GFC_INTEGER_4 *) s);
+ return;
+
+ case sizeof (GFC_INTEGER_8):
+ internal_unpack_8 ((gfc_array_i8 *) d, (const GFC_INTEGER_8 *) s);
return;
- case 8:
- internal_unpack_8 ((gfc_array_i8 *)d, (const GFC_INTEGER_8 *)s);
+#if defined (HAVE_GFC_INTEGER_16)
+ case sizeof (GFC_INTEGER_16):
+ internal_unpack_16 ((gfc_array_i16 *) d, (const GFC_INTEGER_16 *) s);
return;
+#endif
}
break;
+ case GFC_DTYPE_REAL:
+ switch (size)
+ {
+ case sizeof (GFC_REAL_4):
+ internal_unpack_r4 ((gfc_array_r4 *) d, (const GFC_REAL_4 *) s);
+ return;
+
+ case sizeof (GFC_REAL_8):
+ internal_unpack_r8 ((gfc_array_r8 *) d, (const GFC_REAL_8 *) s);
+ return;
+
+#if defined(HAVE_GFC_REAL_10)
+ case sizeof (GFC_REAL_10):
+ internal_unpack_r10 ((gfc_array_r10 *) d, (const GFC_REAL_10 *) s);
+ return;
+#endif
+
+#if defined(HAVE_GFC_REAL_16)
+ case sizeof (GFC_REAL_16):
+ internal_unpack_r16 ((gfc_array_r16 *) d, (const GFC_REAL_16 *) s);
+ return;
+#endif
+
+ }
+
case GFC_DTYPE_COMPLEX:
switch (size)
{
- case 8:
+ case sizeof (GFC_COMPLEX_4):
internal_unpack_c4 ((gfc_array_c4 *)d, (const GFC_COMPLEX_4 *)s);
return;
- case 16:
+ case sizeof (GFC_COMPLEX_8):
internal_unpack_c8 ((gfc_array_c8 *)d, (const GFC_COMPLEX_8 *)s);
return;
+
+#if defined(HAVE_GFC_COMPLEX_10)
+ case sizeof (GFC_COMPLEX_10):
+ internal_unpack_c10 ((gfc_array_c10 *) d, (const GFC_COMPLEX_10 *) s);
+ return;
+#endif
+
+#if defined(HAVE_GFC_COMPLEX_16)
+ case sizeof (GFC_COMPLEX_16):
+ internal_unpack_c16 ((gfc_array_c16 *) d, (const GFC_COMPLEX_16 *) s);
+ return;
+#endif
+
}
default:
break;
* Re: [patch, libfortran] Speed up / orthogonalize in_pack and in_unpack
2008-03-19 9:51 [patch, libfortran] Speed up / orthogonalize in_pack and in_unpack Thomas Koenig
@ 2008-03-19 10:53 ` FX
2008-03-19 19:50 ` Thomas Koenig
0 siblings, 1 reply; 5+ messages in thread
From: FX @ 2008-03-19 10:53 UTC
To: Thomas Koenig; +Cc: fortran, gcc-patches
> 2008-03-19 Thomas Koenig <tkoenig@gcc.gnu.org>
>
> PR libfortran/32972
> * Makefile.am (in_pack_c): Add in_pack_i1.c, in_pack_i2.c,
> in_pack_r4.c, in_pack_r8.c, in_pack_r10.c and in_pack_r16.c.
> (in_unpack_c): Add in_unpack_i1.c, in_unpack_i2.c,
> in_unpack_r4.c, in_unpack_r8.c, in_unpack_r10.c and
> in_unpack_r16.c.
> * Makefile.in: Regenerate.
> * libgfortran.h: Add prototypes for internal_pack_1,
> internal_pack_2, internal_pack_16, internal_pack_r4,
> internal_pack_r8, internal_pack_r10, internal_pack_r16,
> internal_pack_c10 and internal_pack_c16. Add prototypes for
> internal_unpack_1, internal_unpack_2, internal_unpack_16,
> internal_unpack_r4, internal_unpack_r8, internal_unpack_r10,
> internal_unpack_r16, internal_unpack_c10 and
> internal_unpack_c16.
> * runtime/in_pack_generic.c (internal_pack): Use sizeof instead
> of hardwired sizes.
> Add calls to internal_pack_1, internal_pack_2,
> internal_pack_16, internal_pack_r4, internal_pack_r8,
> internal_pack_r10, internal_pack_r16, internal_pack_c10 and
> internal_pack_c16.
> * runtime/in_unpack_generic.c (internal_unpack): Use sizeof
> instead of hardwired sizes.
> Add calls to internal_unpack_1, internal_unpack_2,
> internal_unpack_16, internal_unpack_r4, internal_unpack_r8,
> internal_unpack_r10, internal_unpack_r16, internal_unpack_c10
> and internal_unpack_c16.
> * generated/in_pack_r4.c: New file.
> * generated/in_pack_i2.c: New file.
> * generated/in_unpack_i1.c: New file.
> * generated/in_pack_r10.c: New file.
> * generated/in_unpack_r4.c: New file.
> * generated/in_unpack_i2.c: New file.
> * generated/in_unpack_r16.c: New file.
> * generated/in_pack_r8.c: New file.
> * generated/in_unpack_r10.c: New file.
> * generated/in_unpack_r8.c: New file.
> * generated/in_pack_r16.c: New file.
> * generated/in_pack_i1.c: New file.
OK
* Re: [patch, libfortran] Speed up / orthogonalize in_pack and in_unpack
2008-03-19 10:53 ` FX
@ 2008-03-19 19:50 ` Thomas Koenig
2008-03-20 4:32 ` Jerry DeLisle
0 siblings, 1 reply; 5+ messages in thread
From: Thomas Koenig @ 2008-03-19 19:50 UTC
To: FX; +Cc: fortran, gcc-patches
On Wed, 2008-03-19 at 10:11 +0000, FX wrote:
> OK
Committed, together with Dominique's correction of a typo as noted
in the PR. Thanks!
For the record, here is an indication of the execution speed advantage
for real(10) on an i686-pc-linux-gnu:
$ cat foo.f90
program main
real(kind=10) a(10000)
a = 100.
do i=1,10000
call foo(a(1:10000:2))
end do
end program main
subroutine foo(a)
real(kind=10) a(5000)
a(1) = 1.
a(499) = 1.
end subroutine foo
$ gfortran-4.3 -O3 -static foo.f90
$ time ./a.out
real 0m3.654s
user 0m3.644s
sys 0m0.000s
$ gfortran -O3 -static foo.f90
$ time ./a.out
real 0m0.719s
user 0m0.700s
sys 0m0.000s
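The two timings above differ by roughly a factor of five (3.654 s against 0.719 s). What each call in the benchmark loop has to do for the non-contiguous section a(1:10000:2) can be sketched in C as pack, call, unpack. This is only an illustration under assumptions: call_with_section is a hypothetical helper, and long double is assumed to correspond to real(kind=10), as it typically does on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the Fortran subroutine foo, which expects contiguous
   storage of at least 499 elements.  */
static void
foo (long double *a)
{
  a[0] = 1.0L;      /* Fortran a(1) */
  a[498] = 1.0L;    /* Fortran a(499) */
}

/* Pack a strided section into a contiguous temporary, call foo on it,
   then copy the results back.  Returns 0 on success and -1 if the
   temporary cannot be allocated.  */
static int
call_with_section (long double *base, size_t n, ptrdiff_t stride)
{
  long double *tmp;

  assert (base != NULL && n >= 499);
  tmp = malloc (n * sizeof *tmp);
  if (tmp == NULL)
    return -1;

  for (size_t i = 0; i < n; i++)        /* pack */
    tmp[i] = base[(ptrdiff_t) i * stride];

  foo (tmp);

  for (size_t i = 0; i < n; i++)        /* unpack */
    base[(ptrdiff_t) i * stride] = tmp[i];

  free (tmp);
  return 0;
}
```

Both copy loops run over all 5000 elements on every call, so the cost of moving a single element dominates the benchmark's run time.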
* Re: [patch, libfortran] Speed up / orthogonalize in_pack and in_unpack
2008-03-19 19:50 ` Thomas Koenig
@ 2008-03-20 4:32 ` Jerry DeLisle
2008-03-20 15:59 ` Thomas Koenig
0 siblings, 1 reply; 5+ messages in thread
From: Jerry DeLisle @ 2008-03-20 4:32 UTC
To: Thomas Koenig; +Cc: FX, fortran, gcc-patches
Thomas Koenig wrote:
> On Wed, 2008-03-19 at 10:11 +0000, FX wrote:
>
>> OK
>
> Committed, together with Dominique's correction of a typo as noted
> in the PR. Thanks!
>
> For the record, here is an indication of the execution speed advantage
> for real(10) on an i686-pc-linux-gnu:
>
> $ cat foo.f90
> program main
> real(kind=10) a(10000)
> a = 100.
> do i=1,10000
> call foo(a(1:10000:2))
> end do
> end program main
>
> subroutine foo(a)
> real(kind=10) a(5000)
> a(1) = 1.
> a(499) = 1.
> end subroutine foo
>
> $ gfortran-4.3 -O3 -static foo.f90
> $ time ./a.out
>
> real 0m3.654s
> user 0m3.644s
> sys 0m0.000s
> $ gfortran -O3 -static foo.f90
> $ time ./a.out
>
> real 0m0.719s
> user 0m0.700s
> sys 0m0.000s
>
>
>
That's a very nice improvement! Can we see any shifts in polyhedron or spec?
Great job.
Jerry
* Re: [patch, libfortran] Speed up / orthogonalize in_pack and in_unpack
2008-03-20 4:32 ` Jerry DeLisle
@ 2008-03-20 15:59 ` Thomas Koenig
0 siblings, 0 replies; 5+ messages in thread
From: Thomas Koenig @ 2008-03-20 15:59 UTC
To: Jerry DeLisle; +Cc: FX, fortran, gcc-patches
On Wed, 2008-03-19 at 18:39 -0700, Jerry DeLisle wrote:
> Can we see any shifts in polyhedron or spec?
I suspect not for polyhedron; the improvement is for
kind=10 and kind=16 real and complex types, plus for
kind=1 and kind=2 integers. I don't have access to SPEC
sources to see if repacking arrays makes a big difference
there.
I'll continue working along those lines for
the intrinsics (currently working on pack and
unpack).
Thomas