* [patch, libfortran] Speed up cshift for dim > 1
@ 2017-06-14 19:42 Thomas Koenig
2017-06-16 17:19 ` Jerry DeLisle
0 siblings, 1 reply; 5+ messages in thread
From: Thomas Koenig @ 2017-06-14 19:42 UTC (permalink / raw)
To: fortran, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 2552 bytes --]
Hello world,
the attached patch implements a blocked algorithm for
improving the speed of cshift for dim > 1.
It uses the fact that
integer, dimension (n1,n2,n3) :: a, b
b = cshift(a,shift,3)
is identical, as far as the memory locations is concerned.
integer, dimension (n1*n2*n3) :: c, d
d = cshift(c, shift*n1*n2, 1)
The speedup is quite large; from being really slow for
dim > 1, this patch makes it go even faster.
Below there are some comparisons for the attached benchmark,
do-1.f90. gfortran-7 uses the old library version.
Interestingly, the library version is also much faster
than an implementation of straight DO loops.
Regression-tested. OK for trunk?
Regards
Thomas
$ gfortran-7 -static-libgfortran -O3 do-1.f90 && ./a.out
Testing explicit DO loops
Dim = 1 Elapsed CPU time = 5.71363878
Dim = 2 Elapsed CPU time = 5.40494061
Dim = 3 Elapsed CPU time = 5.40769291
Testing built-in cshift
Dim = 1 Elapsed CPU time = 3.43479729
Dim = 2 Elapsed CPU time = 11.7110386
Dim = 3 Elapsed CPU time = 31.0966301
$ gfortran -static-libgfortran -O3 do-1.f90 && ./a.out
Testing explicit DO loops
Dim = 1 Elapsed CPU time = 5.73881340
Dim = 2 Elapsed CPU time = 5.38435745
Dim = 3 Elapsed CPU time = 5.38971329
Testing built-in cshift
Dim = 1 Elapsed CPU time = 3.42018127
Dim = 2 Elapsed CPU time = 2.24075317
Dim = 3 Elapsed CPU time = 2.23136330
2017-06-14 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/52473
* m4/cshift0.m4: For arrays that are contiguous up to
shift, implement blocked algorighm for cshift.
* generated/cshift0_c10.c: Regenerated.
* generated/cshift0_c16.c: Regenerated.
* generated/cshift0_c4.c: Regenerated.
* generated/cshift0_c8.c: Regenerated.
* generated/cshift0_i1.c: Regenerated.
* generated/cshift0_i16.c: Regenerated.
* generated/cshift0_i2.c: Regenerated.
* generated/cshift0_i4.c: Regenerated.
* generated/cshift0_i8.c: Regenerated.
* generated/cshift0_r10.c: Regenerated.
* generated/cshift0_r16.c: Regenerated.
* generated/cshift0_r4.c: Regenerated.
* generated/cshift0_r8.c: Regenerated.
2017-06-14 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/52473
* gfortran.dg/cshift_1.f90: New test.
[-- Attachment #2: p6.diff --]
[-- Type: text/x-patch, Size: 48036 bytes --]
Index: m4/cshift0.m4
===================================================================
--- m4/cshift0.m4 (Revision 249104)
+++ m4/cshift0.m4 (Arbeitskopie)
@@ -52,6 +52,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -64,33 +67,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_c10.c
===================================================================
--- generated/cshift0_c10.c (Revision 249104)
+++ generated/cshift0_c10.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_c16.c
===================================================================
--- generated/cshift0_c16.c (Revision 249104)
+++ generated/cshift0_c16.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_c4.c
===================================================================
--- generated/cshift0_c4.c (Revision 249104)
+++ generated/cshift0_c4.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_c8.c
===================================================================
--- generated/cshift0_c8.c (Revision 249104)
+++ generated/cshift0_c8.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_i1.c
===================================================================
--- generated/cshift0_i1.c (Revision 249104)
+++ generated/cshift0_i1.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_i16.c
===================================================================
--- generated/cshift0_i16.c (Revision 249104)
+++ generated/cshift0_i16.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_i2.c
===================================================================
--- generated/cshift0_i2.c (Revision 249104)
+++ generated/cshift0_i2.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_i4.c
===================================================================
--- generated/cshift0_i4.c (Revision 249104)
+++ generated/cshift0_i4.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_i8.c
===================================================================
--- generated/cshift0_i8.c (Revision 249104)
+++ generated/cshift0_i8.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_r10.c
===================================================================
--- generated/cshift0_r10.c (Revision 249104)
+++ generated/cshift0_r10.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_r16.c
===================================================================
--- generated/cshift0_r16.c (Revision 249104)
+++ generated/cshift0_r16.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_r4.c
===================================================================
--- generated/cshift0_r4.c (Revision 249104)
+++ generated/cshift0_r4.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
Index: generated/cshift0_r8.c
===================================================================
--- generated/cshift0_r8.c (Revision 249104)
+++ generated/cshift0_r8.c (Arbeitskopie)
@@ -51,6 +51,9 @@
index_type len;
index_type n;
+ bool do_blocked;
+ index_type r_ex, a_ex;
+
which = which - 1;
sstride[0] = 0;
rstride[0] = 0;
@@ -63,33 +66,99 @@
soffset = 1;
len = 0;
- for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ r_ex = 1;
+ a_ex = 1;
+
+ if (which > 0)
{
- if (dim == which)
- {
- roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
- if (roffset == 0)
- roffset = 1;
- soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
- if (soffset == 0)
- soffset = 1;
- len = GFC_DESCRIPTOR_EXTENT(array,dim);
- }
- else
- {
- count[n] = 0;
- extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
- rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
- sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
- n++;
- }
+ /* Test if both ret and array are contiguous. */
+ do_blocked = true;
+ dim = GFC_DESCRIPTOR_RANK (array);
+ for (n = 0; n < dim; n ++)
+ {
+ index_type rs, as;
+ rs = GFC_DESCRIPTOR_STRIDE (ret, n);
+ if (rs != r_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ as = GFC_DESCRIPTOR_STRIDE (array, n);
+ if (as != a_ex)
+ {
+ do_blocked = false;
+ break;
+ }
+ r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n);
+ a_ex *= GFC_DESCRIPTOR_EXTENT (array, n);
+ }
}
- if (sstride[0] == 0)
- sstride[0] = 1;
- if (rstride[0] == 0)
- rstride[0] = 1;
+ else
+ do_blocked = false;
- dim = GFC_DESCRIPTOR_RANK (array);
+ n = 0;
+
+ if (do_blocked)
+ {
+ /* For contiguous arrays, use the relationship that
+
+ dimension(n1,n2,n3) :: a, b
+ b = cshift(a,sh,3)
+
+ can be dealt with as if
+
+ dimension(n1*n2*n3) :: an, bn
+ bn = cshift(a,sh*n1*n2,1)
+
+ we can used a more blocked algorithm for dim>1. */
+ sstride[0] = 1;
+ rstride[0] = 1;
+ roffset = 1;
+ soffset = 1;
+ len = GFC_DESCRIPTOR_STRIDE(array, which)
+ * GFC_DESCRIPTOR_EXTENT(array, which);
+ shift *= GFC_DESCRIPTOR_STRIDE(array, which);
+ for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ dim = GFC_DESCRIPTOR_RANK (array) - which;
+ }
+ else
+ {
+ for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++)
+ {
+ if (dim == which)
+ {
+ roffset = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ if (roffset == 0)
+ roffset = 1;
+ soffset = GFC_DESCRIPTOR_STRIDE(array,dim);
+ if (soffset == 0)
+ soffset = 1;
+ len = GFC_DESCRIPTOR_EXTENT(array,dim);
+ }
+ else
+ {
+ count[n] = 0;
+ extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim);
+ rstride[n] = GFC_DESCRIPTOR_STRIDE(ret,dim);
+ sstride[n] = GFC_DESCRIPTOR_STRIDE(array,dim);
+ n++;
+ }
+ }
+ if (sstride[0] == 0)
+ sstride[0] = 1;
+ if (rstride[0] == 0)
+ rstride[0] = 1;
+
+ dim = GFC_DESCRIPTOR_RANK (array);
+ }
+
rstride0 = rstride[0];
sstride0 = sstride[0];
rptr = ret->base_addr;
[-- Attachment #3: cshift_1.f90 --]
[-- Type: text/x-fortran, Size: 3148 bytes --]
! { dg-do run }
! Take cshift through its paces to make sure no boundary
! cases are wrong with the optimized cshift.
module kinds
integer, parameter :: sp = selected_real_kind(6) ! Single precision
end module kinds
module replacements
use kinds
contains
subroutine cshift_sp_3_v1 (array, shift, dim, res)
integer, parameter :: wp = sp
real(kind=wp), dimension(:,:,:), intent(in) :: array
integer, intent(in) :: shift, dim
real(kind=wp), dimension(:,:,:), intent(out) :: res
integer :: i,j,k
integer :: sh, rsh
integer :: n
integer :: n2, n3
res = 0
n3 = size(array,3)
n2 = size(array,2)
n1 = size(array,1)
if (dim == 1) then
n = n1
sh = modulo(shift, n)
rsh = n - sh
do k=1, n3
do j=1, n2
do i=1, rsh
res(i,j,k) = array(i+sh,j,k)
end do
do i=rsh+1,n
res(i,j,k) = array(i-rsh,j,k)
end do
end do
end do
else if (dim == 2) then
n = n2
sh = modulo(shift,n)
rsh = n - sh
do k=1, n3
do j=1, rsh
do i=1, n1
res(i,j,k) = array(i,j+sh, k)
end do
end do
do j=rsh+1, n
do i=1, n1
res(i,j,k) = array(i,j-rsh, k)
end do
end do
end do
else if (dim == 3) then
n = n3
sh = modulo(shift, n)
rsh = n - sh
do k=1, rsh
do j=1, n2
do i=1, n1
res(i,j,k) = array(i, j, k+sh)
end do
end do
end do
do k=rsh+1, n
do j=1, n2
do i=1, n1
res(i,j, k) = array(i, j, k-rsh)
end do
end do
end do
else
stop "Wrong argument to dim"
end if
end subroutine cshift_sp_3_v1
end module replacements
program testme
use kinds
use replacements
implicit none
integer, parameter :: wp = sp ! Working precision
INTEGER, PARAMETER :: n = 7
real(kind=wp), dimension(:,:,:), allocatable :: a,b,c
integer i, j, k
real:: t1, t2
integer, parameter :: nrep = 20
allocate (a(n,n,n), b(n,n,n),c(n,n,n))
call random_number(a)
do k = 1,3
do i=-3,3,2
call cshift_sp_3_v1 (a, i, k, b)
c = cshift(a,i,k)
if (any (c /= b)) call abort
end do
end do
deallocate (b,c)
allocate (b(n-1,n-1,n-1),c(n-1,n-1,n-1))
do k=1,3
do i=-3,3,2
call cshift_sp_3_v1 (a(1:n-1,1:n-1,1:n-1), i, k, b)
c = cshift(a(1:n-1,1:n-1,1:n-1), i, k)
if (any (c /= b)) call abort
end do
end do
deallocate (b,c)
allocate (b(n,n-1,n-1), c(n,n-1,n-1))
do k=1,3
do i=-3,3,2
call cshift_sp_3_v1 (a(:,2:n,2:n), i, k, b)
c = cshift (a(:,2:n,2:n), i, k)
if (any (c /= b)) call abort
end do
end do
deallocate (b,c)
allocate (b(n,n,n-1), c(n,n,n-1))
do k=1,3
do i=-3,3,2
call cshift_sp_3_v1 (a(:,:,2:n), i, k, b)
c = cshift (a(:,:,2:n), i, k)
if (any (c /= b)) call abort
end do
end do
end program testme
[-- Attachment #4: do-1.f90 --]
[-- Type: text/x-fortran, Size: 2681 bytes --]
module kinds
integer, parameter :: sp = selected_real_kind(6) ! Single precision
integer, parameter :: dp = selected_real_kind(15) ! Double precision
end module kinds
module replacements
use kinds
contains
subroutine cshift_sp_3_v1 (array, shift, dim, res)
integer, parameter :: wp = sp
real(kind=wp), dimension(:,:,:), intent(in), contiguous :: array
integer, intent(in) :: shift, dim
real(kind=wp), dimension(:,:,:), intent(out), contiguous :: res
integer :: i,j,k
integer :: sh, rsh
integer :: n
res = 0
if (dim == 1) then
n = size(array,1)
sh = modulo(shift, n)
rsh = n - sh
do k=1, size(array,3)
do j=1, size(array,2)
do i=1, rsh
res(i,j,k) = array(i+sh,j,k)
end do
do i=rsh+1,n
res(i,j,k) = array(i-rsh,j,k)
end do
end do
end do
else if (dim == 2) then
n = size(array,2)
sh = modulo(shift,n)
rsh = n - sh
do k=1, size(array,3)
do j=1, rsh
do i=1, size(array,1)
res(i,j,k) = array(i,j+sh, k)
end do
end do
do j=rsh+1, n
do i=1, size(array,1)
res(i,j,k) = array(i,j-rsh, k)
end do
end do
end do
else if (dim == 3) then
n = size(array,3)
sh = modulo(shift, n)
rsh = n - sh
do k=1, rsh
do j=1, size(array,2)
do i=1, size(array,1)
res(i,j,k) = array(i, j, k+sh)
end do
end do
end do
do k=rsh+1, n
do j=1, size(array,2)
do i=1, size(array,1)
res(i,j, k) = array(i, j, k-rsh)
end do
end do
end do
else
stop "Wrong argument to dim"
end if
end subroutine cshift_sp_3_v1
end module replacements
program testme
use kinds
use replacements
implicit none
integer, parameter :: wp = sp ! Working precision
INTEGER, PARAMETER :: n = 500
real(kind=wp) :: a(n,n,n), b(n,n,n)
integer i, j, k
real t1, t2
print *,"Testing explicit DO loops"
call random_number(a)
do k = 1,3
call cpu_time ( t1 )
do j = 1, 10
call cshift_sp_3_v1 (a, 1, k, b)
end do
call cpu_time ( t2 )
write ( *, * ) 'Dim = ', k, ' Elapsed CPU time = ', t2-t1
end do
print *,"Testing built-in cshift"
do k = 1,3
call cpu_time ( t1 )
do j = 1, 10
b = cshift(a,1,k)
end do
call cpu_time ( t2 )
write ( *, * ) 'Dim = ', k, ' Elapsed CPU time = ', t2-t1
end do
end program testme
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [patch, libfortran] Speed up cshift for dim > 1
2017-06-14 19:42 [patch, libfortran] Speed up cshift for dim > 1 Thomas Koenig
@ 2017-06-16 17:19 ` Jerry DeLisle
2017-06-18 18:16 ` Thomas Koenig
0 siblings, 1 reply; 5+ messages in thread
From: Jerry DeLisle @ 2017-06-16 17:19 UTC (permalink / raw)
To: Thomas Koenig, fortran, gcc-patches
On 06/14/2017 12:41 PM, Thomas Koenig wrote:
> Hello world,
>
> the attached patch implements a blocked algorithm for
> improving the speed of cshift for dim > 1.
>
> It uses the fact that
>
> integer, dimension (n1,n2,n3) :: a, b
>
> b = cshift(a,shift,3)
>
> is identical, as far as the memory locations is concerned.
>
> integer, dimension (n1*n2*n3) :: c, d
> d = cshift(c, shift*n1*n2, 1)
>
> The speedup is quite large; from being really slow for
> dim > 1, this patch makes it go even faster.
>
> Below there are some comparisons for the attached benchmark,
> do-1.f90. gfortran-7 uses the old library version.
>
> Interestingly, the library version is also much faster
> than an implementation of straight DO loops.
>
> Regression-tested. OK for trunk?
>
OK for trunk.
Thanks,
Jerry
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [patch, libfortran] Speed up cshift for dim > 1
2017-06-16 17:19 ` Jerry DeLisle
@ 2017-06-18 18:16 ` Thomas Koenig
0 siblings, 0 replies; 5+ messages in thread
From: Thomas Koenig @ 2017-06-18 18:16 UTC (permalink / raw)
To: Jerry DeLisle, fortran, gcc-patches
Hi Jerry,
> OK for trunk.
Committed as r249350.
Thanks for the review, and thanks to Dominique for testing.
Doing the same for eoshift should be straightforward, I'll do
that later (but certainly before the 8.1 release).
Next, on to cshift with an array as shift. I have some ideas
there, let's see how that works out.
Regards
Thomas
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [patch, libfortran] Speed up cshift for dim > 1
@ 2017-06-16 15:14 Dominique d'Humières
2017-06-19 18:01 ` Thomas Koenig
0 siblings, 1 reply; 5+ messages in thread
From: Dominique d'Humières @ 2017-06-16 15:14 UTC (permalink / raw)
To: Thomas Koenig; +Cc: gfortran, gcc-patches
Hi Thomas,
Your patch works as advertised! For the record, the following CSHIFT is still 4 times slower than the DO loop
td(:,:) = cshift(array=t(:,:), shift=vect(:), dim=1)
Thanks for working on this issue.
Dominique
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [patch, libfortran] Speed up cshift for dim > 1
2017-06-16 15:14 Dominique d'Humières
@ 2017-06-19 18:01 ` Thomas Koenig
0 siblings, 0 replies; 5+ messages in thread
From: Thomas Koenig @ 2017-06-19 18:01 UTC (permalink / raw)
To: Dominique d'Humières; +Cc: gfortran, gcc-patches
Hi Dominique,
> For the record, the following CSHIFT is still 4 times slower than the DO loop
I have looked into this a bit. The main reason is that, unlike cshift0
(without the array as shift) we do not generate individual functions to
call for the usual data types, we use memcpy with a size determined
at run-time by looking at the array descriptor. This is, of course,
quite slow.
So, the solution should probably be to generate functions like
cshift1_4_i4 and then call them. This would generate a bit of
bloat, but if people use this in a serios way, I think
this is OK.
This was already done for cshift0 a few years ago. What
we have there looks like (intrinsics/cshift0.c)
type_size = GFC_DTYPE_TYPE_SIZE (array);
switch(type_size)
{
case GFC_DTYPE_LOGICAL_1:
case GFC_DTYPE_INTEGER_1:
case GFC_DTYPE_DERIVED_1:
cshift0_i1 ((gfc_array_i1 *)ret, (gfc_array_i1 *) array, shift,
which);
return;
case GFC_DTYPE_LOGICAL_2:
case GFC_DTYPE_INTEGER_2:
cshift0_i2 ((gfc_array_i2 *)ret, (gfc_array_i2 *) array, shift,
which);
return;
so this is something that we could also emulate.
A bit of work, but nothing that looks un-doable.
Regards
Thomas
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-06-19 18:01 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-14 19:42 [patch, libfortran] Speed up cshift for dim > 1 Thomas Koenig
2017-06-16 17:19 ` Jerry DeLisle
2017-06-18 18:16 ` Thomas Koenig
2017-06-16 15:14 Dominique d'Humières
2017-06-19 18:01 ` Thomas Koenig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).