* [patch, libgfortran] Use memcpy in a few more places for eoshift
@ 2017-07-03 22:06 Thomas Koenig
2017-07-08 11:57 ` *Ping* " Thomas Koenig
0 siblings, 1 reply; 5+ messages in thread
From: Thomas Koenig @ 2017-07-03 22:06 UTC (permalink / raw)
To: fortran, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1443 bytes --]
Hello world,
attached are a few more speedups for special eoshift cases. This
time, nothing fancy, just use memcpy for copying in the
contiguous case.
I am still looking at eoshift2 (scalar shift, array boundary)
to see if it would be possible to duplicate the speed gains for
eoshift0 (scalar shift, scalar boundary), but it won't hurt
to do this first. At least the shift along dimension 1
should be faster by about a factor of two.
I have also added a few test cases which test eoshift in all
the variants touched by this patch.
Regression-testing as I write this. I don't expect anything bad
(because I tested all test cases containing *eoshift*).
OK for trunk if this passes?
Regards
Thomas
2017-06-03 Thomas Koenig <tkoenig@gcc.gnu.org>
* intrinsics/eoshift2.c (eoshift2): Use memcpy
for innermost copy where possible.
* m4/eoshift1.m4 (eoshift1): Likewise.
* m4/eoshift3.m4 (eoshift3): Likewise.
* generated/eoshift1_16.c: Regenerated.
* generated/eoshift1_4.c: Regenerated.
* generated/eoshift1_8.c: Regenerated.
* generated/eoshift3_16.c: Regenerated.
* generated/eoshift3_4.c: Regenerated.
* generated/eoshift3_8.c: Regenerated.
2017-06-03 Thomas Koenig <tkoenig@gcc.gnu.org>
* gfortran.dg/eoshift_4.f90: New test.
* gfortran.dg/eoshift_5.f90: New test.
* gfortran.dg/eoshift_6.f90: New test.
[-- Attachment #2: p1.diff --]
[-- Type: text/x-patch, Size: 8255 bytes --]
Index: intrinsics/eoshift2.c
===================================================================
--- intrinsics/eoshift2.c (Revision 249936)
+++ intrinsics/eoshift2.c (Arbeitskopie)
@@ -181,12 +181,23 @@ eoshift2 (gfc_array_char *ret, const gfc_array_cha
src = sptr;
dest = &rptr[-shift * roffset];
}
- for (n = 0; n < len; n++)
- {
- memcpy (dest, src, size);
- dest += roffset;
- src += soffset;
- }
+
+ /* If the elements are contiguous, perform a single block move. */
+ if (soffset == size && roffset == size)
+ {
+ size_t chunk = size * len;
+ memcpy (dest, src, chunk);
+ dest += chunk;
+ }
+ else
+ {
+ for (n = 0; n < len; n++)
+ {
+ memcpy (dest, src, size);
+ dest += roffset;
+ src += soffset;
+ }
+ }
if (shift >= 0)
{
n = shift;
Index: m4/eoshift1.m4
===================================================================
--- m4/eoshift1.m4 (Revision 249936)
+++ m4/eoshift1.m4 (Arbeitskopie)
@@ -184,12 +184,23 @@ eoshift1 (gfc_array_char * const restrict ret,
src = sptr;
dest = &rptr[delta * roffset];
}
- for (n = 0; n < len - delta; n++)
- {
- memcpy (dest, src, size);
- dest += roffset;
- src += soffset;
- }
+
+ /* If the elements are contiguous, perform a single block move. */
+ if (soffset == size && roffset == size)
+ {
+ size_t chunk = size * (len - delta);
+ memcpy (dest, src, chunk);
+ dest += chunk;
+ }
+ else
+ {
+ for (n = 0; n < len - delta; n++)
+ {
+ memcpy (dest, src, size);
+ dest += roffset;
+ src += soffset;
+ }
+ }
if (sh < 0)
dest = rptr;
n = delta;
Index: m4/eoshift3.m4
===================================================================
--- m4/eoshift3.m4 (Revision 249936)
+++ m4/eoshift3.m4 (Arbeitskopie)
@@ -199,12 +199,24 @@ eoshift3 (gfc_array_char * const restrict ret,
src = sptr;
dest = &rptr[delta * roffset];
}
- for (n = 0; n < len - delta; n++)
- {
- memcpy (dest, src, size);
- dest += roffset;
- src += soffset;
- }
+
+ /* If the elements are contiguous, perform a single block move. */
+ if (soffset == size && roffset == size)
+ {
+ size_t chunk = size * (len - delta);
+ memcpy (dest, src, chunk);
+ dest += chunk;
+ }
+ else
+ {
+ for (n = 0; n < len - delta; n++)
+ {
+ memcpy (dest, src, size);
+ dest += roffset;
+ src += soffset;
+ }
+ }
+
if (sh < 0)
dest = rptr;
n = delta;
Index: generated/eoshift1_16.c
===================================================================
--- generated/eoshift1_16.c (Revision 249936)
+++ generated/eoshift1_16.c (Arbeitskopie)
@@ -183,12 +183,23 @@ eoshift1 (gfc_array_char * const restrict ret,
src = sptr;
dest = &rptr[delta * roffset];
}
- for (n = 0; n < len - delta; n++)
- {
- memcpy (dest, src, size);
- dest += roffset;
- src += soffset;
- }
+
+ /* If the elements are contiguous, perform a single block move. */
+ if (soffset == size && roffset == size)
+ {
+ size_t chunk = size * (len - delta);
+ memcpy (dest, src, chunk);
+ dest += chunk;
+ }
+ else
+ {
+ for (n = 0; n < len - delta; n++)
+ {
+ memcpy (dest, src, size);
+ dest += roffset;
+ src += soffset;
+ }
+ }
if (sh < 0)
dest = rptr;
n = delta;
Index: generated/eoshift1_4.c
===================================================================
--- generated/eoshift1_4.c (Revision 249936)
+++ generated/eoshift1_4.c (Arbeitskopie)
@@ -183,12 +183,23 @@ eoshift1 (gfc_array_char * const restrict ret,
src = sptr;
dest = &rptr[delta * roffset];
}
- for (n = 0; n < len - delta; n++)
- {
- memcpy (dest, src, size);
- dest += roffset;
- src += soffset;
- }
+
+ /* If the elements are contiguous, perform a single block move. */
+ if (soffset == size && roffset == size)
+ {
+ size_t chunk = size * (len - delta);
+ memcpy (dest, src, chunk);
+ dest += chunk;
+ }
+ else
+ {
+ for (n = 0; n < len - delta; n++)
+ {
+ memcpy (dest, src, size);
+ dest += roffset;
+ src += soffset;
+ }
+ }
if (sh < 0)
dest = rptr;
n = delta;
Index: generated/eoshift1_8.c
===================================================================
--- generated/eoshift1_8.c (Revision 249936)
+++ generated/eoshift1_8.c (Arbeitskopie)
@@ -183,12 +183,23 @@ eoshift1 (gfc_array_char * const restrict ret,
src = sptr;
dest = &rptr[delta * roffset];
}
- for (n = 0; n < len - delta; n++)
- {
- memcpy (dest, src, size);
- dest += roffset;
- src += soffset;
- }
+
+ /* If the elements are contiguous, perform a single block move. */
+ if (soffset == size && roffset == size)
+ {
+ size_t chunk = size * (len - delta);
+ memcpy (dest, src, chunk);
+ dest += chunk;
+ }
+ else
+ {
+ for (n = 0; n < len - delta; n++)
+ {
+ memcpy (dest, src, size);
+ dest += roffset;
+ src += soffset;
+ }
+ }
if (sh < 0)
dest = rptr;
n = delta;
Index: generated/eoshift3_16.c
===================================================================
--- generated/eoshift3_16.c (Revision 249936)
+++ generated/eoshift3_16.c (Arbeitskopie)
@@ -198,12 +198,24 @@ eoshift3 (gfc_array_char * const restrict ret,
src = sptr;
dest = &rptr[delta * roffset];
}
- for (n = 0; n < len - delta; n++)
- {
- memcpy (dest, src, size);
- dest += roffset;
- src += soffset;
- }
+
+ /* If the elements are contiguous, perform a single block move. */
+ if (soffset == size && roffset == size)
+ {
+ size_t chunk = size * (len - delta);
+ memcpy (dest, src, chunk);
+ dest += chunk;
+ }
+ else
+ {
+ for (n = 0; n < len - delta; n++)
+ {
+ memcpy (dest, src, size);
+ dest += roffset;
+ src += soffset;
+ }
+ }
+
if (sh < 0)
dest = rptr;
n = delta;
Index: generated/eoshift3_4.c
===================================================================
--- generated/eoshift3_4.c (Revision 249936)
+++ generated/eoshift3_4.c (Arbeitskopie)
@@ -198,12 +198,24 @@ eoshift3 (gfc_array_char * const restrict ret,
src = sptr;
dest = &rptr[delta * roffset];
}
- for (n = 0; n < len - delta; n++)
- {
- memcpy (dest, src, size);
- dest += roffset;
- src += soffset;
- }
+
+ /* If the elements are contiguous, perform a single block move. */
+ if (soffset == size && roffset == size)
+ {
+ size_t chunk = size * (len - delta);
+ memcpy (dest, src, chunk);
+ dest += chunk;
+ }
+ else
+ {
+ for (n = 0; n < len - delta; n++)
+ {
+ memcpy (dest, src, size);
+ dest += roffset;
+ src += soffset;
+ }
+ }
+
if (sh < 0)
dest = rptr;
n = delta;
Index: generated/eoshift3_8.c
===================================================================
--- generated/eoshift3_8.c (Revision 249936)
+++ generated/eoshift3_8.c (Arbeitskopie)
@@ -198,12 +198,24 @@ eoshift3 (gfc_array_char * const restrict ret,
src = sptr;
dest = &rptr[delta * roffset];
}
- for (n = 0; n < len - delta; n++)
- {
- memcpy (dest, src, size);
- dest += roffset;
- src += soffset;
- }
+
+ /* If the elements are contiguous, perform a single block move. */
+ if (soffset == size && roffset == size)
+ {
+ size_t chunk = size * (len - delta);
+ memcpy (dest, src, chunk);
+ dest += chunk;
+ }
+ else
+ {
+ for (n = 0; n < len - delta; n++)
+ {
+ memcpy (dest, src, size);
+ dest += roffset;
+ src += soffset;
+ }
+ }
+
if (sh < 0)
dest = rptr;
n = delta;
[-- Attachment #3: eoshift_4.f90 --]
[-- Type: text/x-fortran, Size: 4830 bytes --]
! { dg-do run }
! Check that eoshift works for three-dimensional arrays.
module x
implicit none
contains
subroutine eoshift_2 (array, shift, boundary, dim, res)
real, dimension(:,:,:), intent(in) :: array
real, dimension(:,:,:), intent(out) :: res
integer, value :: shift
real, optional, dimension(:,:), intent(in) :: boundary
integer, optional, intent(in) :: dim
integer :: s1, s2, s3
integer :: n1, n2, n3
real :: b
integer :: d
if (present(dim)) then
d = dim
else
d = 1
end if
n1 = size(array,1)
n2 = size(array,2)
n3 = size(array,3)
select case(dim)
case(1)
if (shift > 0) then
shift = min(shift, n1)
do s3=1,n3
do s2=1,n2
b = boundary(s2,s3)
do s1= 1, n1 - shift
res(s1,s2,s3) = array(s1+shift,s2,s3)
end do
do s1 = n1 - shift + 1,n1
res(s1,s2,s3) = b
end do
end do
end do
else
shift = max(shift, -n1)
do s3=1,n3
do s2=1,n2
b = boundary(s2,s3)
do s1=1,-shift
res(s1,s2,s3) = b
end do
do s1= 1-shift,n1
res(s1,s2,s3) = array(s1+shift,s2,s3)
end do
end do
end do
end if
case(2)
if (shift > 0) then
shift = min(shift, n2)
do s3=1,n3
do s2=1, n2 - shift
do s1=1,n1
res(s1,s2,s3) = array(s1,s2+shift,s3)
end do
end do
do s2=n2 - shift + 1, n2
do s1=1,n1
b = boundary(s1,s3)
res(s1,s2,s3) = b
end do
end do
end do
else
shift = max(shift, -n2)
do s3=1,n3
do s2=1,-shift
do s1=1,n1
b = boundary(s1,s3)
res(s1,s2,s3) = b
end do
end do
do s2=1-shift,n2
do s1=1,n1
res(s1,s2,s3) = array(s1,s2+shift,s3)
end do
end do
end do
end if
case(3)
if (shift > 0) then
shift = min(shift, n3)
do s3=1,n3 - shift
do s2=1, n2
do s1=1,n1
res(s1,s2,s3) = array(s1,s2,s3+shift)
end do
end do
end do
do s3=n3 - shift + 1, n3
do s2=1, n2
do s1=1,n1
b = boundary(s1,s2)
res(s1,s2,s3) = b
end do
end do
end do
else
shift = max(shift, -n3)
do s3=1,-shift
do s2=1,n2
do s1=1,n1
b = boundary(s1,s2)
res(s1,s2,s3) = b
end do
end do
end do
do s3=1-shift,n3
do s2=1,n2
do s1=1,n1
res(s1,s2,s3) = array(s1,s2,s3+shift)
end do
end do
end do
end if
case default
stop "Illegal dim"
end select
end subroutine eoshift_2
end module x
program main
use x
implicit none
integer, parameter :: n1=20,n2=30,n3=40
real, dimension(n1,n2,n3) :: a,b,c
real, dimension(2*n1,n2,n3) :: a2,c2
integer :: dim, shift, shift_lim
real, dimension(n2,n3), target :: b1
real, dimension(n1,n3), target :: b2
real, dimension(n1,n2), target :: b3
real, dimension(:,:), pointer :: bp
call random_number(a)
call random_number (b1)
call random_number (b2)
call random_number (b3)
do dim=1,3
if (dim == 1) then
shift_lim = n1 + 1
bp => b1
else if (dim == 2) then
shift_lim = n2 + 1
bp => b2
else
shift_lim = n3 + 1
bp => b3
end if
do shift=-shift_lim, shift_lim
b = eoshift(a,shift,dim=dim, boundary=bp)
call eoshift_2 (a, shift=shift, dim=dim, boundary=bp, res=c)
if (any (b /= c)) then
print *,"dim = ", dim, "shift = ", shift
print *,b
print *,c
call abort
end if
a2 = 42.
a2(1:2*n1:2,:,:) = a
b = eoshift(a2(1:2*n1:2,:,:), shift, dim=dim, boundary=bp)
if (any (b /= c)) then
call abort
end if
c2 = 43.
c2(1:2*n1:2,:,:) = eoshift(a,shift,dim=dim, boundary=bp)
if (any(c2(1:2*n1:2,:,:) /= c)) then
call abort
end if
if (any(c2(2:2*n1:2,:,:) /= 43)) then
call abort
end if
end do
end do
end program main
[-- Attachment #4: eoshift_5.f90 --]
[-- Type: text/x-fortran, Size: 4645 bytes --]
! { dg-do run }
! Check that eoshift works for three-dimensional arrays.
module x
implicit none
contains
subroutine eoshift_1 (array, shift, boundary, dim, res)
real, dimension(:,:,:), intent(in) :: array
real, dimension(:,:,:), intent(out) :: res
integer, dimension(:,:), intent(in) :: shift
real, optional, intent(in) :: boundary
integer, optional, intent(in) :: dim
integer :: s1, s2, s3
integer :: n1, n2, n3
integer :: sh
real :: b
integer :: d
if (present(boundary)) then
b = boundary
else
b = 0.0
end if
if (present(dim)) then
d = dim
else
d = 1
end if
n1 = size(array,1)
n2 = size(array,2)
n3 = size(array,3)
select case(dim)
case(1)
do s3=1,n3
do s2=1,n2
sh = shift(s2,s3)
if (sh > 0) then
sh = min(sh, n1)
do s1= 1, n1 - sh
res(s1,s2,s3) = array(s1+sh,s2,s3)
end do
do s1 = n1 - sh + 1,n1
res(s1,s2,s3) = b
end do
else
sh = max(sh, -n1)
do s1=1,-sh
res(s1,s2,s3) = b
end do
do s1= 1-sh,n1
res(s1,s2,s3) = array(s1+sh,s2,s3)
end do
end if
end do
end do
case(2)
do s3=1,n3
do s1=1,n1
sh = shift(s1,s3)
if (sh > 0) then
sh = min (sh, n2)
do s2=1, n2 - sh
res(s1,s2,s3) = array(s1,s2+sh,s3)
end do
do s2=n2 - sh + 1, n2
res(s1,s2,s3) = b
end do
else
sh = max(sh, -n2)
do s2=1,-sh
res(s1,s2,s3) = b
end do
do s2=1-sh,n2
res(s1,s2,s3) = array(s1,s2+sh,s3)
end do
end if
end do
end do
case(3)
do s2=1, n2
do s1=1,n1
sh = shift(s1, s2)
if (sh > 0) then
sh = min(sh, n3)
do s3=1,n3 - sh
res(s1,s2,s3) = array(s1,s2,s3+sh)
end do
do s3=n3 - sh + 1, n3
res(s1,s2,s3) = b
end do
else
sh = max(sh, -n3)
do s3=1,-sh
res(s1,s2,s3) = b
end do
do s3=1-sh,n3
res(s1,s2,s3) = array(s1,s2,s3+sh)
end do
end if
end do
end do
case default
stop "Illegal dim"
end select
end subroutine eoshift_1
subroutine fill_shift(x, n)
integer, intent(out), dimension(:,:) :: x
integer, intent(in) :: n
integer :: n1, n2, s1, s2
integer :: v
v = -n - 1
n1 = size(x,1)
n2 = size(x,2)
do s2=1,n2
do s1=1,n1
x(s1,s2) = v
v = v + 1
if (v > n + 1) v = -n - 1
end do
end do
end subroutine fill_shift
end module x
program main
use x
implicit none
integer, parameter :: n1=20,n2=30,n3=40
real, dimension(n1,n2,n3) :: a,b,c
real, dimension(2*n1,n2,n3) :: a2, c2
integer :: dim
integer, dimension(n2,n3), target :: sh1
integer, dimension(n1,n3), target :: sh2
integer, dimension(n1,n2), target :: sh3
real, dimension(n2,n3), target :: b1
real, dimension(n1,n3), target :: b2
real, dimension(n1,n2), target :: b3
integer, dimension(:,:), pointer :: sp
real, dimension(:,:), pointer :: bp
call random_number(a)
call fill_shift(sh1, n1)
call fill_shift(sh2, n2)
call fill_shift(sh3, n3)
do dim=1,3
if (dim == 1) then
sp => sh1
else if (dim == 2) then
sp => sh2
else
sp => sh3
end if
b = eoshift(a,shift=sp,dim=dim,boundary=-0.5)
call eoshift_1 (a, shift=sp, dim=dim, boundary=-0.5,res=c)
if (any (b /= c)) then
print *,"dim = ", dim
print *,"sp = ", sp
print '(99F8.4)',b
print '(99F8.4)',c
call abort
end if
a2 = 42.
a2(1:2*n1:2,:,:) = a
b = eoshift(a2(1:2*n1:2,:,:), shift=sp, dim=dim, boundary=-0.5)
if (any(b /= c)) then
call abort
end if
c2 = 43.
c2(1:2*n1:2,:,:) = eoshift(a, shift=sp, dim=dim, boundary=-0.5)
if (any(c2(1:2*n1:2,:,:) /= c)) then
call abort
end if
if (any(c2(2:2*n1:2,:,:) /= 43.)) then
call abort
end if
end do
end program main
[-- Attachment #5: eoshift_6.f90 --]
[-- Type: text/x-fortran, Size: 4679 bytes --]
! { dg-do run }
! Check that eoshift works for three-dimensional arrays.
module x
implicit none
contains
subroutine eoshift_3 (array, shift, boundary, dim, res)
real, dimension(:,:,:), intent(in) :: array
real, dimension(:,:,:), intent(out) :: res
integer, dimension(:,:), intent(in) :: shift
real, optional, dimension(:,:), intent(in) :: boundary
integer, optional, intent(in) :: dim
integer :: s1, s2, s3
integer :: n1, n2, n3
integer :: sh
real :: b
integer :: d
if (present(dim)) then
d = dim
else
d = 1
end if
n1 = size(array,1)
n2 = size(array,2)
n3 = size(array,3)
select case(dim)
case(1)
do s3=1,n3
do s2=1,n2
sh = shift(s2,s3)
b = boundary(s2,s3)
if (sh > 0) then
sh = min(sh, n1)
do s1= 1, n1 - sh
res(s1,s2,s3) = array(s1+sh,s2,s3)
end do
do s1 = n1 - sh + 1,n1
res(s1,s2,s3) = b
end do
else
sh = max(sh, -n1)
do s1=1,-sh
res(s1,s2,s3) = b
end do
do s1= 1-sh,n1
res(s1,s2,s3) = array(s1+sh,s2,s3)
end do
end if
end do
end do
case(2)
do s3=1,n3
do s1=1,n1
sh = shift(s1,s3)
b = boundary(s1,s3)
if (sh > 0) then
sh = min (sh, n2)
do s2=1, n2 - sh
res(s1,s2,s3) = array(s1,s2+sh,s3)
end do
do s2=n2 - sh + 1, n2
res(s1,s2,s3) = b
end do
else
sh = max(sh, -n2)
do s2=1,-sh
res(s1,s2,s3) = b
end do
do s2=1-sh,n2
res(s1,s2,s3) = array(s1,s2+sh,s3)
end do
end if
end do
end do
case(3)
do s2=1, n2
do s1=1,n1
sh = shift(s1, s2)
b = boundary(s1, s2)
if (sh > 0) then
sh = min(sh, n3)
do s3=1,n3 - sh
res(s1,s2,s3) = array(s1,s2,s3+sh)
end do
do s3=n3 - sh + 1, n3
res(s1,s2,s3) = b
end do
else
sh = max(sh, -n3)
do s3=1,-sh
res(s1,s2,s3) = b
end do
do s3=1-sh,n3
res(s1,s2,s3) = array(s1,s2,s3+sh)
end do
end if
end do
end do
case default
stop "Illegal dim"
end select
end subroutine eoshift_3
subroutine fill_shift(x, n)
integer, intent(out), dimension(:,:) :: x
integer, intent(in) :: n
integer :: n1, n2, s1, s2
integer :: v
v = -n - 1
n1 = size(x,1)
n2 = size(x,2)
do s2=1,n2
do s1=1,n1
x(s1,s2) = v
v = v + 1
if (v > n + 1) v = -n - 1
end do
end do
end subroutine fill_shift
end module x
program main
use x
implicit none
integer, parameter :: n1=10,n2=30,n3=40
real, dimension(n1,n2,n3) :: a,b,c
real, dimension(2*n1,n2,n3) :: a2, c2
integer :: dim
integer, dimension(n2,n3), target :: sh1
integer, dimension(n1,n3), target :: sh2
integer, dimension(n1,n2), target :: sh3
real, dimension(n2,n3), target :: b1
real, dimension(n1,n3), target :: b2
real, dimension(n1,n2), target :: b3
integer, dimension(:,:), pointer :: sp
real, dimension(:,:), pointer :: bp
call random_number(a)
call random_number(b1)
call random_number(b2)
call random_number(b3)
call fill_shift(sh1, n1)
call fill_shift(sh2, n2)
call fill_shift(sh3, n3)
do dim=1,3
if (dim == 1) then
sp => sh1
bp => b1
else if (dim == 2) then
sp => sh2
bp => b2
else
sp => sh3
bp => b3
end if
b = eoshift(a,shift=sp,dim=dim,boundary=bp)
call eoshift_3 (a, shift=sp, dim=dim, boundary=bp,res=c)
if (any (b /= c)) then
call abort
end if
a2 = 42.
a2(1:2*n1:2,:,:) = a
b = eoshift(a2(1:2*n1:2,:,:), shift=sp, dim=dim, boundary=bp)
if (any(b /= c)) then
call abort
end if
c2 = 43.
c2(1:2*n1:2,:,:) = eoshift(a, shift=sp, dim=dim, boundary=bp)
if (any(c2(1:2*n1:2,:,:) /= c)) then
call abort
end if
if (any(c2(2:2*n1:2,:,:) /= 43.)) then
call abort
end if
end do
end program main
^ permalink raw reply [flat|nested] 5+ messages in thread
* *Ping* [patch, libgfortran] Use memcpy in a few more places for eoshift
2017-07-03 22:06 [patch, libgfortran] Use memcpy in a few more places for eoshift Thomas Koenig
@ 2017-07-08 11:57 ` Thomas Koenig
2017-07-09 11:28 ` Thomas Koenig
2017-07-09 18:10 ` Paul Richard Thomas
0 siblings, 2 replies; 5+ messages in thread
From: Thomas Koenig @ 2017-07-08 11:57 UTC (permalink / raw)
To: fortran, gcc-patches
Am 04.07.2017 um 00:06 schrieb Thomas Koenig:
> attached are a few more speedups for special eoshift cases. This
> time, nothing fancy, just use memcpy for copying in the
> contiguous case.
Ping?
Regards
Thomas
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: *Ping* [patch, libgfortran] Use memcpy in a few more places for eoshift
2017-07-08 11:57 ` *Ping* " Thomas Koenig
@ 2017-07-09 11:28 ` Thomas Koenig
2017-07-09 18:10 ` Paul Richard Thomas
1 sibling, 0 replies; 5+ messages in thread
From: Thomas Koenig @ 2017-07-09 11:28 UTC (permalink / raw)
To: fortran, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1396 bytes --]
Am 08.07.2017 um 13:57 schrieb Thomas Koenig:
> Am 04.07.2017 um 00:06 schrieb Thomas Koenig:
>
>> attached are a few more speedups for special eoshift cases. This
>> time, nothing fancy, just use memcpy for copying in the
>> contiguous case.
>
> Ping?
>
> Regards
>
> Thomas
Some benchmarks (source attached).
$ gfortran eo_bench_2.f90 && ./a.out
dim = 1 t = 0.747093916
dim = 2 t = 2.09117603
dim = 3 t = 3.07099581
$ gfortran-7 -static-libgfortran eo_bench_2.f90 && ./a.out
dim = 1 t = 1.24332905
dim = 2 t = 2.09103727
dim = 3 t = 3.05382776
$ gfortran eo_bench_3.f90 && ./a.out
dim = 1 t = 0.734890938
dim = 2 t = 2.40442204
dim = 3 t = 3.12888288
$ gfortran-7 -static-libgfortran eo_bench_3.f90 && ./a.out
dim = 1 t = 1.30460107
dim = 2 t = 2.17445374
dim = 3 t = 2.78331423
$ gfortran eo_bench_4.f90 && ./a.out
dim = 1 t = 0.777376175
dim = 2 t = 2.40524292
dim = 3 t = 3.10695219
$ gfortran-7 -static-libgfortran eo_bench_4.f90 && ./a.out
dim = 1 t = 1.39399910
dim = 2 t = 2.16738701
dim = 3 t = 3.09568548
So, we get a 65% to 78% speedup for a common use case (dim=1).
[-- Attachment #2: eo_bench_2.f90~ --]
[-- Type: text/plain, Size: 363 bytes --]
program main
implicit none
integer, parameter :: n=600
real, dimension(n,n,n) :: a, c
real, dimension(n,n) :: b
real :: t1, t2
integer :: dim
call random_number(a)
b = 0.
do dim=1,3
call cpu_time(t1)
c = eoshift(a, -3, dim=dim, boundary=b)
call cpu_time(t2)
print *,"dim = ", dim, " t = ", t2-t1
end do
end program main
[-- Attachment #3: eo_bench_3.f90 --]
[-- Type: text/x-fortran, Size: 480 bytes --]
program main
implicit none
integer, parameter :: n=600
real, dimension(n,n,n) :: a, c
real, dimension(n,n) :: b
real :: t1, t2
integer :: dim
integer, dimension(n,n) :: sh
real, dimension(n,n) :: sh_real
call random_number(sh_real)
sh = int(sh_real * 10 + 5)
call random_number(a)
b = 0.
do dim=1,3
call cpu_time(t1)
c = eoshift(a, shift=sh, dim=dim)
call cpu_time(t2)
print *,"dim = ", dim, " t = ", t2-t1
end do
end program main
[-- Attachment #4: eo_bench_4.f90 --]
[-- Type: text/x-fortran, Size: 516 bytes --]
program main
implicit none
integer, parameter :: n=600
real, dimension(n,n,n) :: a, c
real, dimension(n,n) :: b
real :: t1, t2
integer :: dim
integer, dimension(n,n) :: sh
real, dimension(n,n) :: sh_real
call random_number(sh_real)
sh = int(sh_real * 10 + 5)
call random_number(b)
call random_number(a)
b = 0.
do dim=1,3
call cpu_time(t1)
c = eoshift(a, shift=sh, dim=dim, boundary=b)
call cpu_time(t2)
print *,"dim = ", dim, " t = ", t2-t1
end do
end program main
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: *Ping* [patch, libgfortran] Use memcpy in a few more places for eoshift
2017-07-08 11:57 ` *Ping* " Thomas Koenig
2017-07-09 11:28 ` Thomas Koenig
@ 2017-07-09 18:10 ` Paul Richard Thomas
2017-07-09 19:11 ` Thomas Koenig
1 sibling, 1 reply; 5+ messages in thread
From: Paul Richard Thomas @ 2017-07-09 18:10 UTC (permalink / raw)
To: Thomas Koenig; +Cc: fortran, gcc-patches
Hi Thomas,
The patch is OK by me.
Thanks for working on speeding up these library functions. Does the
octave version, mentioned in the clf thread, translate easily into C?
I had to remind myself of how octave cell arrays function. It is
certainly a remarkably concise solution.
Cheers
Paul
On 8 July 2017 at 12:57, Thomas Koenig <tkoenig@netcologne.de> wrote:
> Am 04.07.2017 um 00:06 schrieb Thomas Koenig:
>
>> attached are a few more speedups for special eoshift cases. This
>> time, nothing fancy, just use memcpy for copying in the
>> contiguous case.
>
>
> Ping?
>
> Regards
>
> Thomas
--
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: *Ping* [patch, libgfortran] Use memcpy in a few more places for eoshift
2017-07-09 18:10 ` Paul Richard Thomas
@ 2017-07-09 19:11 ` Thomas Koenig
0 siblings, 0 replies; 5+ messages in thread
From: Thomas Koenig @ 2017-07-09 19:11 UTC (permalink / raw)
To: Paul Richard Thomas; +Cc: fortran, gcc-patches
Hi Paul,
> The patch is OK by me.
Thanks for the review. Committed as rev. 250085.
> Thanks for working on speeding up these library functions. Does the
> octave version, mentioned in the clf thread, translate easily into C?
> I had to remind myself of how octave cell arrays function. It is
> certainly a remarkably concise solution.
I have to admit that I do not yet know how ocatave does it.
A bit of cursory grepping in the source did not reveal any C (or C++)
code which uses something called "circshift", so I will have to do
some more looking.
Regards
Thomas
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-07-09 19:11 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-03 22:06 [patch, libgfortran] Use memcpy in a few more places for eoshift Thomas Koenig
2017-07-08 11:57 ` *Ping* " Thomas Koenig
2017-07-09 11:28 ` Thomas Koenig
2017-07-09 18:10 ` Paul Richard Thomas
2017-07-09 19:11 ` Thomas Koenig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).