* [PATCH] PR libfortran/97063 - Wrong result for vector (step size is negative) * matrix
@ 2020-10-11 19:09 Harald Anlauf
2020-10-17 16:38 ` *PING* " Harald Anlauf
From: Harald Anlauf @ 2020-10-11 19:09 UTC (permalink / raw)
To: fortran, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 2134 bytes --]
PR libfortran/97063 - Wrong result for vector (step size is negative) * matrix
Dear all,
when matrix-multiplying a rank-1 array by a rank-2 array, a wrong result
was produced if a negative stride was used for the rank-1 array. In that
case, the special code for rank-2 times rank-2 (the branch guarded by
axstride < aystride) was erroneously executed.
We should never have gotten there, so move the check for a rank-1 first
argument before that case.
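To make the ordering problem concrete, here is a hypothetical, much simplified model of the branch selection, reduced to the two branches this patch reorders. It assumes (this is not visible in the diff) that a rank-1 first argument is handled as a row matrix whose `axstride` is the vector's element stride and whose `aystride` is a placeholder of 1. `select_old`, `select_new` and `enum matmul_branch` are made-up names, not libgfortran code.

```c
#include <stddef.h>

/* Hypothetical, simplified model of the branch selection in the
   generated matmul_* kernels, reduced to the two branches that this
   patch reorders.  ASSUMPTION: for a rank-1 first argument, axstride
   is the vector's element stride and aystride is a placeholder 1.  */
enum matmul_branch
{
  BRANCH_RANK2_STRIDED, /* else if (axstride < aystride) { ... } */
  BRANCH_RANK1,         /* else if (GFC_DESCRIPTOR_RANK (a) == 1) { ... } */
  BRANCH_OTHER          /* the final else { ... } */
};

/* Order of the checks before the patch: stride test first.  */
static enum matmul_branch
select_old (int rank_a, ptrdiff_t axstride, ptrdiff_t aystride)
{
  if (axstride < aystride)
    return BRANCH_RANK2_STRIDED;
  else if (rank_a == 1)
    return BRANCH_RANK1;
  return BRANCH_OTHER;
}

/* Order of the checks after the patch: rank test first.  */
static enum matmul_branch
select_new (int rank_a, ptrdiff_t axstride, ptrdiff_t aystride)
{
  if (rank_a == 1)
    return BRANCH_RANK1;
  else if (axstride < aystride)
    return BRANCH_RANK2_STRIDED;
  return BRANCH_OTHER;
}
```

With `rank_a == 1`, `axstride == -2` and `aystride == 1`, `select_old` lands in the rank-2 strided branch while `select_new` picks the rank-1 branch; for a positive stride such as 2 both orders agree, which is presumably why the problem only shows up with negative strides.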
The patch looks horrendously large because nearly 99% of it is
regenerated code.
Regtests cleanly on x86_64-pc-linux-gnu.
OK for master? And backport to all open branches where it applies?
Thanks,
Harald
The MATMUL intrinsic returned a wrong result for rank-1 times rank-2 arrays
when a negative stride was used to address the elements of the rank-1
array, because a check on strides was erroneously placed before the check
on the rank. Interchange the order of the checks.
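The rank-1 branch that is now reached first reduces each result element to a strided dot product, so a negative stride needs no special handling there. The following is a minimal sketch in that spirit, not the libgfortran code: `vecmat` and its parameters are hypothetical, elements are plain `int`, and strides are counted in elements, as in the loops of the patch.

```c
#include <stddef.h>

/* Hypothetical sketch of c = matmul(a, b) with a rank-1 and b rank-2.
   Each operand is a base pointer plus element strides, which may be
   negative: abase points at the first element of the section.  */
static void
vecmat (const int *abase, ptrdiff_t astride,
        const int *bbase, ptrdiff_t bxstride, ptrdiff_t bystride,
        int *dest, ptrdiff_t rstride,
        ptrdiff_t count, ptrdiff_t ycount)
{
  for (ptrdiff_t y = 0; y < ycount; y++)
    {
      const int *bbase_y = &bbase[y * bystride];   /* column y of b */
      int s = 0;
      for (ptrdiff_t n = 0; n < count; n++)
        s += abase[n * astride] * bbase_y[n * bxstride];
      dest[y * rstride] = s;
    }
}
```

With `a` holding 0..5 and a column-major 6x4 `b` with `b[i + 6*j] = 5*i + j`, as in the new test `matmul_20.f90`, calling `vecmat` once with `&a[0]`, stride 2 (the section `ai(1:5:2)`) and once with `&a[4]`, stride -2 (the section `ai(5:1:-2)`), together with the matching rows of `b`, gives the same result 100, 106, 112, 118 both times.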
libgfortran/ChangeLog:
* m4/matmul_internal.m4: Move check for rank-1 times rank-2 before
checks on strides for rank-2 times rank-2.
* generated/matmul_c10.c: Regenerated.
* generated/matmul_c16.c: Likewise.
* generated/matmul_c4.c: Likewise.
* generated/matmul_c8.c: Likewise.
* generated/matmul_i1.c: Likewise.
* generated/matmul_i16.c: Likewise.
* generated/matmul_i2.c: Likewise.
* generated/matmul_i4.c: Likewise.
* generated/matmul_i8.c: Likewise.
* generated/matmul_r10.c: Likewise.
* generated/matmul_r16.c: Likewise.
* generated/matmul_r4.c: Likewise.
* generated/matmul_r8.c: Likewise.
* generated/matmulavx128_c10.c: Likewise.
* generated/matmulavx128_c16.c: Likewise.
* generated/matmulavx128_c4.c: Likewise.
* generated/matmulavx128_c8.c: Likewise.
* generated/matmulavx128_i1.c: Likewise.
* generated/matmulavx128_i16.c: Likewise.
* generated/matmulavx128_i2.c: Likewise.
* generated/matmulavx128_i4.c: Likewise.
* generated/matmulavx128_i8.c: Likewise.
* generated/matmulavx128_r10.c: Likewise.
* generated/matmulavx128_r16.c: Likewise.
* generated/matmulavx128_r4.c: Likewise.
* generated/matmulavx128_r8.c: Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/matmul_20.f90: New test.
[-- Attachment #2: pr97063.patch --]
[-- Type: text/x-patch, Size: 120103 bytes --]
diff --git a/gcc/testsuite/gfortran.dg/matmul_20.f90 b/gcc/testsuite/gfortran.dg/matmul_20.f90
new file mode 100644
index 00000000000..7a211a4974d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/matmul_20.f90
@@ -0,0 +1,47 @@
+! { dg-do run }
+! PR97063 - Wrong result for vector (step size is negative) * matrix
+
+program p
+ implicit none
+ integer, parameter :: m = 3, k = 2*m, l = k-1, n = 4
+ integer :: i, j, m1, m2, ms
+ integer :: ai(k), bi(k,n), ci(n), ci_ref(n), c1, c2
+ real :: ar(k), br(k,n), cr(n), cr_ref(n)
+
+ ai(:) = [(i,i=0,k-1)]
+ bi(:,:) = reshape ([(((5*i+j),i=0,k-1),j=0,n-1)],[k,n])
+
+ ! Parameters of subscript triplet
+ m1 = 1; m2 = l; ms = 2
+
+ ! Reference values for cross-checks: integer variant
+ c1 = dot_product (ai(m1:m2: ms), bi(m1:m2: ms,1))
+ c2 = dot_product (ai(m1:m2: ms), bi(m1:m2: ms,2))
+ ci_ref = matmul (ai(m1:m2: ms), bi(m1:m2: ms,:))
+ ci = matmul (ai(m2:m1:-ms), bi(m2:m1:-ms,:))
+
+ if (ci_ref(1) /= c1 .or. ci_ref(2) /= c2) stop 1
+ if (any (ci /= ci_ref)) stop 2
+
+ ! Real variant
+ ar = real (ai)
+ br = real (bi)
+ cr_ref = matmul (ar(m1:m2: ms), br(m1:m2: ms,:))
+ cr = matmul (ar(m2:m1:-ms), br(m2:m1:-ms,:))
+
+ if (any (cr_ref /= real (ci_ref))) stop 3
+ if (any (cr /= cr_ref )) stop 4
+
+ ! Mixed variants
+ cr_ref = matmul (ar(m1:m2: ms), bi(m1:m2: ms,:))
+ cr = matmul (ar(m2:m1:-ms), bi(m2:m1:-ms,:))
+
+ if (any (cr_ref /= real (ci_ref))) stop 5
+ if (any (cr /= cr_ref )) stop 6
+
+ cr_ref = matmul (ai(m1:m2: ms), br(m1:m2: ms,:))
+ cr = matmul (ai(m2:m1:-ms), br(m2:m1:-ms,:))
+
+ if (any (cr_ref /= real (ci_ref))) stop 7
+ if (any (cr /= cr_ref )) stop 8
+end program
diff --git a/libgfortran/generated/matmul_c10.c b/libgfortran/generated/matmul_c10.c
index ce5be246ddb..5bfd61d97ce 100644
--- a/libgfortran/generated/matmul_c10.c
+++ b/libgfortran/generated/matmul_c10.c
@@ -590,20 +590,6 @@ matmul_c10_avx (gfc_array_c10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_10 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_c10_avx (gfc_array_c10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_10 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_c10_avx2 (gfc_array_c10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_10 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_c10_avx2 (gfc_array_c10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_10 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_c10_avx512f (gfc_array_c10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_10 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_c10_avx512f (gfc_array_c10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_10 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_c10_vanilla (gfc_array_c10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_10 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_c10_vanilla (gfc_array_c10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_10 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_c10 (gfc_array_c10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_10 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_c10 (gfc_array_c10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_10 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_c16.c b/libgfortran/generated/matmul_c16.c
index bf756d124ec..d7617e31b43 100644
--- a/libgfortran/generated/matmul_c16.c
+++ b/libgfortran/generated/matmul_c16.c
@@ -590,20 +590,6 @@ matmul_c16_avx (gfc_array_c16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_16 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_c16_avx (gfc_array_c16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_16 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_c16_avx2 (gfc_array_c16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_16 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_c16_avx2 (gfc_array_c16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_16 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_c16_avx512f (gfc_array_c16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_16 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_c16_avx512f (gfc_array_c16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_16 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_c16_vanilla (gfc_array_c16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_16 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_c16_vanilla (gfc_array_c16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_16 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_c16 (gfc_array_c16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_16 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_c16 (gfc_array_c16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_16 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_c4.c b/libgfortran/generated/matmul_c4.c
index 5b244104574..9303e6add20 100644
--- a/libgfortran/generated/matmul_c4.c
+++ b/libgfortran/generated/matmul_c4.c
@@ -590,20 +590,6 @@ matmul_c4_avx (gfc_array_c4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_4 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_c4_avx (gfc_array_c4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_4 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_c4_avx2 (gfc_array_c4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_4 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_c4_avx2 (gfc_array_c4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_4 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_c4_avx512f (gfc_array_c4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_4 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_c4_avx512f (gfc_array_c4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_4 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_c4_vanilla (gfc_array_c4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_4 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_c4_vanilla (gfc_array_c4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_4 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_c4 (gfc_array_c4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_4 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_c4 (gfc_array_c4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_4 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_c8.c b/libgfortran/generated/matmul_c8.c
index df3cb927e1c..d29c99a5b4a 100644
--- a/libgfortran/generated/matmul_c8.c
+++ b/libgfortran/generated/matmul_c8.c
@@ -590,20 +590,6 @@ matmul_c8_avx (gfc_array_c8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_8 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_c8_avx (gfc_array_c8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_8 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_c8_avx2 (gfc_array_c8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_8 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_c8_avx2 (gfc_array_c8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_8 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_c8_avx512f (gfc_array_c8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_8 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_c8_avx512f (gfc_array_c8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_8 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_c8_vanilla (gfc_array_c8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_8 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_c8_vanilla (gfc_array_c8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_8 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_c8 (gfc_array_c8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_8 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_c8 (gfc_array_c8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_8 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_i1.c b/libgfortran/generated/matmul_i1.c
index 49b0fbad211..72c4cee7d18 100644
--- a/libgfortran/generated/matmul_i1.c
+++ b/libgfortran/generated/matmul_i1.c
@@ -590,20 +590,6 @@ matmul_i1_avx (gfc_array_i1 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_1 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_i1_avx (gfc_array_i1 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_1 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_i1_avx2 (gfc_array_i1 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_1 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_i1_avx2 (gfc_array_i1 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_1 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_i1_avx512f (gfc_array_i1 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_1 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_i1_avx512f (gfc_array_i1 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_1 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_i1_vanilla (gfc_array_i1 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_1 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_i1_vanilla (gfc_array_i1 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_1 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_i1 (gfc_array_i1 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_1 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_i1 (gfc_array_i1 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_1 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_i16.c b/libgfortran/generated/matmul_i16.c
index 4e1d837682b..81586844098 100644
--- a/libgfortran/generated/matmul_i16.c
+++ b/libgfortran/generated/matmul_i16.c
@@ -590,20 +590,6 @@ matmul_i16_avx (gfc_array_i16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_16 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_i16_avx (gfc_array_i16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_16 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_i16_avx2 (gfc_array_i16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_16 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_i16_avx2 (gfc_array_i16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_16 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_i16_avx512f (gfc_array_i16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_16 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_i16_avx512f (gfc_array_i16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_16 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_i16_vanilla (gfc_array_i16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_16 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_i16_vanilla (gfc_array_i16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_16 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_i16 (gfc_array_i16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_16 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_i16 (gfc_array_i16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_16 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_i2.c b/libgfortran/generated/matmul_i2.c
index 191298708dc..1320a2e3ec2 100644
--- a/libgfortran/generated/matmul_i2.c
+++ b/libgfortran/generated/matmul_i2.c
@@ -590,20 +590,6 @@ matmul_i2_avx (gfc_array_i2 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_2 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_i2_avx (gfc_array_i2 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_2 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_i2_avx2 (gfc_array_i2 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_2 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_i2_avx2 (gfc_array_i2 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_2 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_i2_avx512f (gfc_array_i2 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_2 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_i2_avx512f (gfc_array_i2 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_2 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_i2_vanilla (gfc_array_i2 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_2 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_i2_vanilla (gfc_array_i2 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_2 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_i2 (gfc_array_i2 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_2 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_i2 (gfc_array_i2 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_2 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_i4.c b/libgfortran/generated/matmul_i4.c
index ab14a0a3ff3..4ee22218d2c 100644
--- a/libgfortran/generated/matmul_i4.c
+++ b/libgfortran/generated/matmul_i4.c
@@ -590,20 +590,6 @@ matmul_i4_avx (gfc_array_i4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_4 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_i4_avx (gfc_array_i4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_4 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_i4_avx2 (gfc_array_i4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_4 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_i4_avx2 (gfc_array_i4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_4 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_i4_avx512f (gfc_array_i4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_4 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_i4_avx512f (gfc_array_i4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_4 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_i4_vanilla (gfc_array_i4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_4 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_i4_vanilla (gfc_array_i4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_4 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_i4 (gfc_array_i4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_4 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_i4 (gfc_array_i4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_4 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_i8.c b/libgfortran/generated/matmul_i8.c
index bc627e189fe..b68a27f76ad 100644
--- a/libgfortran/generated/matmul_i8.c
+++ b/libgfortran/generated/matmul_i8.c
@@ -590,20 +590,6 @@ matmul_i8_avx (gfc_array_i8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_8 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_i8_avx (gfc_array_i8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_8 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_i8_avx2 (gfc_array_i8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_8 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_i8_avx2 (gfc_array_i8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_8 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_i8_avx512f (gfc_array_i8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_8 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_i8_avx512f (gfc_array_i8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_8 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_i8_vanilla (gfc_array_i8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_8 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_i8_vanilla (gfc_array_i8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_8 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_i8 (gfc_array_i8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_8 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_i8 (gfc_array_i8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_8 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_r10.c b/libgfortran/generated/matmul_r10.c
index b5e63be2448..859c5a56747 100644
--- a/libgfortran/generated/matmul_r10.c
+++ b/libgfortran/generated/matmul_r10.c
@@ -590,20 +590,6 @@ matmul_r10_avx (gfc_array_r10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_10 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_r10_avx (gfc_array_r10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_10 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_r10_avx2 (gfc_array_r10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_10 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_r10_avx2 (gfc_array_r10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_10 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_r10_avx512f (gfc_array_r10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_10 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_r10_avx512f (gfc_array_r10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_10 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_r10_vanilla (gfc_array_r10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_10 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_r10_vanilla (gfc_array_r10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_10 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_r10 (gfc_array_r10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_10 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_r10 (gfc_array_r10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_10 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_r16.c b/libgfortran/generated/matmul_r16.c
index 4e6c66bb8f3..b2fc7f86149 100644
--- a/libgfortran/generated/matmul_r16.c
+++ b/libgfortran/generated/matmul_r16.c
@@ -590,20 +590,6 @@ matmul_r16_avx (gfc_array_r16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_16 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_r16_avx (gfc_array_r16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_16 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_r16_avx2 (gfc_array_r16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_16 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_r16_avx2 (gfc_array_r16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_16 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_r16_avx512f (gfc_array_r16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_16 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_r16_avx512f (gfc_array_r16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_16 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_r16_vanilla (gfc_array_r16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_16 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_r16_vanilla (gfc_array_r16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_16 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_r16 (gfc_array_r16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_16 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_r16 (gfc_array_r16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_16 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_r4.c b/libgfortran/generated/matmul_r4.c
index 202634b55d1..11d785f04c4 100644
--- a/libgfortran/generated/matmul_r4.c
+++ b/libgfortran/generated/matmul_r4.c
@@ -590,20 +590,6 @@ matmul_r4_avx (gfc_array_r4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_4 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_r4_avx (gfc_array_r4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_4 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_r4_avx2 (gfc_array_r4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_4 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_r4_avx2 (gfc_array_r4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_4 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_r4_avx512f (gfc_array_r4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_4 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_r4_avx512f (gfc_array_r4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_4 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_r4_vanilla (gfc_array_r4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_4 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_r4_vanilla (gfc_array_r4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_4 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_r4 (gfc_array_r4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_4 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_r4 (gfc_array_r4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_4 *restrict abase_x;
diff --git a/libgfortran/generated/matmul_r8.c b/libgfortran/generated/matmul_r8.c
index 22c24e50c37..6aae02f8798 100644
--- a/libgfortran/generated/matmul_r8.c
+++ b/libgfortran/generated/matmul_r8.c
@@ -590,20 +590,6 @@ matmul_r8_avx (gfc_array_r8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_8 *restrict bbase_y;
@@ -618,6 +604,20 @@ matmul_r8_avx (gfc_array_r8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_8 *restrict abase_x;
@@ -1158,20 +1158,6 @@ matmul_r8_avx2 (gfc_array_r8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_8 *restrict bbase_y;
@@ -1186,6 +1172,20 @@ matmul_r8_avx2 (gfc_array_r8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_8 *restrict abase_x;
@@ -1726,20 +1726,6 @@ matmul_r8_avx512f (gfc_array_r8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_8 *restrict bbase_y;
@@ -1754,6 +1740,20 @@ matmul_r8_avx512f (gfc_array_r8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_8 *restrict abase_x;
@@ -2308,20 +2308,6 @@ matmul_r8_vanilla (gfc_array_r8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_8 *restrict bbase_y;
@@ -2336,6 +2322,20 @@ matmul_r8_vanilla (gfc_array_r8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_8 *restrict abase_x;
@@ -2949,20 +2949,6 @@ matmul_r8 (gfc_array_r8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_8 *restrict bbase_y;
@@ -2977,6 +2963,20 @@ matmul_r8 (gfc_array_r8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_8 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_c10.c b/libgfortran/generated/matmulavx128_c10.c
index b5ffd030d4a..d0b417c39fd 100644
--- a/libgfortran/generated/matmulavx128_c10.c
+++ b/libgfortran/generated/matmulavx128_c10.c
@@ -555,20 +555,6 @@ matmul_c10_avx128_fma3 (gfc_array_c10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_10 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_c10_avx128_fma3 (gfc_array_c10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_10 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_c10_avx128_fma4 (gfc_array_c10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_10 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_c10_avx128_fma4 (gfc_array_c10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_10 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_c16.c b/libgfortran/generated/matmulavx128_c16.c
index 32a355e424d..0137ba550e4 100644
--- a/libgfortran/generated/matmulavx128_c16.c
+++ b/libgfortran/generated/matmulavx128_c16.c
@@ -555,20 +555,6 @@ matmul_c16_avx128_fma3 (gfc_array_c16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_16 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_c16_avx128_fma3 (gfc_array_c16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_16 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_c16_avx128_fma4 (gfc_array_c16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_16 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_c16_avx128_fma4 (gfc_array_c16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_16 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_c4.c b/libgfortran/generated/matmulavx128_c4.c
index 97b53d3300f..850bd2ba1db 100644
--- a/libgfortran/generated/matmulavx128_c4.c
+++ b/libgfortran/generated/matmulavx128_c4.c
@@ -555,20 +555,6 @@ matmul_c4_avx128_fma3 (gfc_array_c4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_4 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_c4_avx128_fma3 (gfc_array_c4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_4 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_c4_avx128_fma4 (gfc_array_c4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_4 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_c4_avx128_fma4 (gfc_array_c4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_4 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_c8.c b/libgfortran/generated/matmulavx128_c8.c
index e73575e3b63..49d8b446ad9 100644
--- a/libgfortran/generated/matmulavx128_c8.c
+++ b/libgfortran/generated/matmulavx128_c8.c
@@ -555,20 +555,6 @@ matmul_c8_avx128_fma3 (gfc_array_c8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_8 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_c8_avx128_fma3 (gfc_array_c8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_8 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_c8_avx128_fma4 (gfc_array_c8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_COMPLEX_8 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_c8_avx128_fma4 (gfc_array_c8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_COMPLEX_8 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_i1.c b/libgfortran/generated/matmulavx128_i1.c
index 00885fa3139..8fc6d921b00 100644
--- a/libgfortran/generated/matmulavx128_i1.c
+++ b/libgfortran/generated/matmulavx128_i1.c
@@ -555,20 +555,6 @@ matmul_i1_avx128_fma3 (gfc_array_i1 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_1 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_i1_avx128_fma3 (gfc_array_i1 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_1 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_i1_avx128_fma4 (gfc_array_i1 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_1 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_i1_avx128_fma4 (gfc_array_i1 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_1 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_i16.c b/libgfortran/generated/matmulavx128_i16.c
index 942dc08fdb5..a3495570d52 100644
--- a/libgfortran/generated/matmulavx128_i16.c
+++ b/libgfortran/generated/matmulavx128_i16.c
@@ -555,20 +555,6 @@ matmul_i16_avx128_fma3 (gfc_array_i16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_16 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_i16_avx128_fma3 (gfc_array_i16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_16 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_i16_avx128_fma4 (gfc_array_i16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_16 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_i16_avx128_fma4 (gfc_array_i16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_16 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_i2.c b/libgfortran/generated/matmulavx128_i2.c
index baa3c9fba2e..944eaf08cd1 100644
--- a/libgfortran/generated/matmulavx128_i2.c
+++ b/libgfortran/generated/matmulavx128_i2.c
@@ -555,20 +555,6 @@ matmul_i2_avx128_fma3 (gfc_array_i2 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_2 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_i2_avx128_fma3 (gfc_array_i2 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_2 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_i2_avx128_fma4 (gfc_array_i2 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_2 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_i2_avx128_fma4 (gfc_array_i2 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_2 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_i4.c b/libgfortran/generated/matmulavx128_i4.c
index 0c69623ba72..a8e270dd97c 100644
--- a/libgfortran/generated/matmulavx128_i4.c
+++ b/libgfortran/generated/matmulavx128_i4.c
@@ -555,20 +555,6 @@ matmul_i4_avx128_fma3 (gfc_array_i4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_4 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_i4_avx128_fma3 (gfc_array_i4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_4 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_i4_avx128_fma4 (gfc_array_i4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_4 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_i4_avx128_fma4 (gfc_array_i4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_4 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_i8.c b/libgfortran/generated/matmulavx128_i8.c
index f8670020caa..9c7f4925687 100644
--- a/libgfortran/generated/matmulavx128_i8.c
+++ b/libgfortran/generated/matmulavx128_i8.c
@@ -555,20 +555,6 @@ matmul_i8_avx128_fma3 (gfc_array_i8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_8 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_i8_avx128_fma3 (gfc_array_i8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_8 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_i8_avx128_fma4 (gfc_array_i8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_INTEGER_8 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_i8_avx128_fma4 (gfc_array_i8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_INTEGER_8 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_r10.c b/libgfortran/generated/matmulavx128_r10.c
index 24fb2972d1d..e2a44cf7e0d 100644
--- a/libgfortran/generated/matmulavx128_r10.c
+++ b/libgfortran/generated/matmulavx128_r10.c
@@ -555,20 +555,6 @@ matmul_r10_avx128_fma3 (gfc_array_r10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_10 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_r10_avx128_fma3 (gfc_array_r10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_10 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_r10_avx128_fma4 (gfc_array_r10 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_10 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_r10_avx128_fma4 (gfc_array_r10 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_10)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_10 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_r16.c b/libgfortran/generated/matmulavx128_r16.c
index 231d04db0ad..186b226ebc7 100644
--- a/libgfortran/generated/matmulavx128_r16.c
+++ b/libgfortran/generated/matmulavx128_r16.c
@@ -555,20 +555,6 @@ matmul_r16_avx128_fma3 (gfc_array_r16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_16 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_r16_avx128_fma3 (gfc_array_r16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_16 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_r16_avx128_fma4 (gfc_array_r16 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_16 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_r16_avx128_fma4 (gfc_array_r16 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_16)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_16 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_r4.c b/libgfortran/generated/matmulavx128_r4.c
index c58228017bf..e21ea39f124 100644
--- a/libgfortran/generated/matmulavx128_r4.c
+++ b/libgfortran/generated/matmulavx128_r4.c
@@ -555,20 +555,6 @@ matmul_r4_avx128_fma3 (gfc_array_r4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_4 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_r4_avx128_fma3 (gfc_array_r4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_4 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_r4_avx128_fma4 (gfc_array_r4 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_4 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_r4_avx128_fma4 (gfc_array_r4 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_4)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_4 *restrict abase_x;
diff --git a/libgfortran/generated/matmulavx128_r8.c b/libgfortran/generated/matmulavx128_r8.c
index e93aeec8910..e7efd075889 100644
--- a/libgfortran/generated/matmulavx128_r8.c
+++ b/libgfortran/generated/matmulavx128_r8.c
@@ -555,20 +555,6 @@ matmul_r8_avx128_fma3 (gfc_array_r8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_8 *restrict bbase_y;
@@ -583,6 +569,20 @@ matmul_r8_avx128_fma3 (gfc_array_r8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_8 *restrict abase_x;
@@ -1124,20 +1124,6 @@ matmul_r8_avx128_fma4 (gfc_array_r8 * const restrict retarray,
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const GFC_REAL_8 *restrict bbase_y;
@@ -1152,6 +1138,20 @@ matmul_r8_avx128_fma4 (gfc_array_r8 * const restrict retarray,
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = (GFC_REAL_8)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const GFC_REAL_8 *restrict abase_x;
diff --git a/libgfortran/m4/matmul_internal.m4 b/libgfortran/m4/matmul_internal.m4
index 32a1e01e12f..13fd7696238 100644
--- a/libgfortran/m4/matmul_internal.m4
+++ b/libgfortran/m4/matmul_internal.m4
@@ -506,20 +506,6 @@ sinclude(`matmul_asm_'rtype_code`.m4')dnl
}
}
}
- else if (axstride < aystride)
- {
- for (y = 0; y < ycount; y++)
- for (x = 0; x < xcount; x++)
- dest[x*rxstride + y*rystride] = ('rtype_name`)0;
-
- for (y = 0; y < ycount; y++)
- for (n = 0; n < count; n++)
- for (x = 0; x < xcount; x++)
- /* dest[x,y] += a[x,n] * b[n,y] */
- dest[x*rxstride + y*rystride] +=
- abase[x*axstride + n*aystride] *
- bbase[n*bxstride + y*bystride];
- }
else if (GFC_DESCRIPTOR_RANK (a) == 1)
{
const 'rtype_name` *restrict bbase_y;
@@ -534,6 +520,20 @@ sinclude(`matmul_asm_'rtype_code`.m4')dnl
dest[y*rxstride] = s;
}
}
+ else if (axstride < aystride)
+ {
+ for (y = 0; y < ycount; y++)
+ for (x = 0; x < xcount; x++)
+ dest[x*rxstride + y*rystride] = ('rtype_name`)0;
+
+ for (y = 0; y < ycount; y++)
+ for (n = 0; n < count; n++)
+ for (x = 0; x < xcount; x++)
+ /* dest[x,y] += a[x,n] * b[n,y] */
+ dest[x*rxstride + y*rystride] +=
+ abase[x*axstride + n*aystride] *
+ bbase[n*bxstride + y*bystride];
+ }
else
{
const 'rtype_name` *restrict abase_x;
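The effect of the reordering in `matmul_internal.m4` can be illustrated with a toy dispatcher. This is a hypothetical, heavily simplified model, not libgfortran code. The enum and function names are invented, and the earlier fast paths of the real routine are omitted.

It also rests on an assumption taken from our reading of the libgfortran source, not from this patch: a rank-1 `a` is treated as a row matrix, with `aystride` set to 1 and `axstride` set to the vector's own stride. Under that assumption, `axstride < aystride` is true exactly when the vector's stride is negative.

```c
/* Hypothetical, simplified model of the order of checks in the matmul
   kernel (not libgfortran code); earlier fast paths are omitted.  */
typedef enum
{
  BRANCH_RANK2_NAIVE, /* the "axstride < aystride" triple loop    */
  BRANCH_RANK1,       /* the "GFC_DESCRIPTOR_RANK (a) == 1" loop  */
  BRANCH_OTHER        /* the final general "else" branch          */
} branch_t;

/* Order of checks before the patch: stride test, then rank test.  */
branch_t
dispatch_before (int rank_a, long axstride, long aystride)
{
  if (axstride < aystride)
    return BRANCH_RANK2_NAIVE;
  else if (rank_a == 1)
    return BRANCH_RANK1;
  else
    return BRANCH_OTHER;
}

/* Order of checks after the patch: rank test, then stride test.  */
branch_t
dispatch_after (int rank_a, long axstride, long aystride)
{
  if (rank_a == 1)
    return BRANCH_RANK1;
  else if (axstride < aystride)
    return BRANCH_RANK2_NAIVE;
  else
    return BRANCH_OTHER;
}
```

With the assumed `aystride == 1` for a rank-1 `a`, the two orders behave differently for a negative stride. `dispatch_before (1, -1, 1)` selects the rank-2 loop, whereas `dispatch_after (1, -1, 1)` selects the rank-1 loop. For a rank-2 `a` both orders give the same answer in this model, so moving the rank check first only changes the rank-1 case.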
* *PING* [PATCH] PR libfortran/97063 - Wrong result for vector (step size is negative) * matrix
2020-10-11 19:09 [PATCH] PR libfortran/97063 - Wrong result for vector (step size is negative) * matrix Harald Anlauf
@ 2020-10-17 16:38 ` Harald Anlauf
2020-10-18 13:10 ` Thomas Koenig
From: Harald Anlauf @ 2020-10-17 16:38 UTC
To: Harald Anlauf; +Cc: fortran, gcc-patches
Early *ping*.
> Sent: Sunday, 11 October 2020 at 21:09
> From: "Harald Anlauf" <anlauf@gmx.de>
> To: "fortran" <fortran@gcc.gnu.org>, "gcc-patches" <gcc-patches@gcc.gnu.org>
> Subject: [PATCH] PR libfortran/97063 - Wrong result for vector (step size is negative) * matrix
>
> PR libfortran/97063 - Wrong result for vector (step size is negative) * matrix
> Dear all,
>
> when matrix-multiplying rank-1 times rank-2 arrays, a wrong result was
> produced when a negative stride was used for the rank-1 array. In that
> case special code for rank-2 times rank-2 was erroneously executed.
> We should never have gotten there, so move the check for rank-1 of the
> first argument before that case.
>
> The patch looks horrendously large because it consists essentially of
> regenerated code (nearly 99%).
>
> Regtests cleanly on x86_64-pc-linux-gnu.
>
> OK for master? And backport to all open branches where it applies?
>
> Thanks,
> Harald
>
>
> The MATMUL intrinsic provided a wrong result for rank-1 times rank-2 arrays
> when a negative stride was used for addressing the elements of the rank-1
> array, because a check on strides was erroneously placed before the check
> on the rank. Interchange the order of the checks.
>
> libgfortran/ChangeLog:
>
> * m4/matmul_internal.m4: Move check for rank-1 times rank-2 before
> checks on strides for rank-2 times rank-2.
> * generated/matmul_c10.c: Regenerated.
> * generated/matmul_c16.c: Likewise.
> * generated/matmul_c4.c: Likewise.
> * generated/matmul_c8.c: Likewise.
> * generated/matmul_i1.c: Likewise.
> * generated/matmul_i16.c: Likewise.
> * generated/matmul_i2.c: Likewise.
> * generated/matmul_i4.c: Likewise.
> * generated/matmul_i8.c: Likewise.
> * generated/matmul_r10.c: Likewise.
> * generated/matmul_r16.c: Likewise.
> * generated/matmul_r4.c: Likewise.
> * generated/matmul_r8.c: Likewise.
> * generated/matmulavx128_c10.c: Likewise.
> * generated/matmulavx128_c16.c: Likewise.
> * generated/matmulavx128_c4.c: Likewise.
> * generated/matmulavx128_c8.c: Likewise.
> * generated/matmulavx128_i1.c: Likewise.
> * generated/matmulavx128_i16.c: Likewise.
> * generated/matmulavx128_i2.c: Likewise.
> * generated/matmulavx128_i4.c: Likewise.
> * generated/matmulavx128_i8.c: Likewise.
> * generated/matmulavx128_r10.c: Likewise.
> * generated/matmulavx128_r16.c: Likewise.
> * generated/matmulavx128_r4.c: Likewise.
> * generated/matmulavx128_r8.c: Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gfortran.dg/matmul_20.f90: New test.
>
>
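The quoted commit message says a negative stride on the rank-1 argument gave wrong values, not merely a slower path. The mechanism can be sketched in C.

This rests on an assumption from our reading of the libgfortran source, not from this patch: for a rank-1 `a`, the routine uses `xcount = 1` and `aystride = 1`. The two functions below are hypothetical stand-ins for the rank-1 loop and the `axstride < aystride` loop in the diff, specialised to `int` and to a contiguous result. A hypothetical vector section `v(3:1:-1)` is modelled as a base pointer to its first element with stride -1.

In the misrouted loop, `abase[x*axstride + n*aystride]` collapses to `abase[n]`. The vector is therefore walked forwards with unit stride, and its real stride is ignored.

```c
#include <stddef.h>

/* Stand-in for the rank-1 loop: honours the vector stride axstride.
   Computes dest[y] = sum_n a[n] * b[n,y] for a strided matrix b.  */
void
vecmat_rank1 (const int *abase, ptrdiff_t axstride,
              const int *bbase, ptrdiff_t bxstride, ptrdiff_t bystride,
              int *dest, ptrdiff_t count, ptrdiff_t ycount)
{
  for (ptrdiff_t y = 0; y < ycount; y++)
    {
      int s = 0;
      for (ptrdiff_t n = 0; n < count; n++)
        s += abase[n * axstride] * bbase[n * bxstride + y * bystride];
      dest[y] = s;
    }
}

/* Stand-in for the "axstride < aystride" loop applied to a rank-1 a,
   under the ASSUMED setup xcount = 1 (so x = 0) and aystride = 1.  */
void
vecmat_misrouted (const int *abase, ptrdiff_t axstride,
                  const int *bbase, ptrdiff_t bxstride, ptrdiff_t bystride,
                  int *dest, ptrdiff_t count, ptrdiff_t ycount)
{
  const ptrdiff_t aystride = 1; /* assumed rank-1 setup */
  const ptrdiff_t x = 0;        /* only column of the row matrix */
  for (ptrdiff_t y = 0; y < ycount; y++)
    dest[y] = 0;
  for (ptrdiff_t y = 0; y < ycount; y++)
    for (ptrdiff_t n = 0; n < count; n++)
      /* x*axstride is 0, so this reads abase[n]: the stride is lost.  */
      dest[y] += abase[x * axstride + n * aystride]
                 * bbase[n * bxstride + y * bystride];
}
```

As a worked example, take `storage = {1, 2, 3, 4, 5}`, base `&storage[2]`, stride -1, and a column-major 3x2 `b = {1, 1, 1, 1, 2, 3}`. The rank-1 loop reads 3, 2, 1 and yields `{6, 10}`. The misrouted loop reads 3, 4, 5 and yields `{12, 26}`.

The padding in `storage` keeps this toy in bounds. In the real case, the misrouted loop would presumably read past the end of the section.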
* Re: *PING* [PATCH] PR libfortran/97063 - Wrong result for vector (step size is negative) * matrix
2020-10-17 16:38 ` *PING* " Harald Anlauf
@ 2020-10-18 13:10 ` Thomas Koenig
From: Thomas Koenig @ 2020-10-18 13:10 UTC
To: Harald Anlauf; +Cc: gcc-patches, fortran
Hello Harald,
> Early *ping*.
> OK for master? And backport to all open branches where it applies?
OK for both.
Thanks a lot for the patch!
Best regards
Thomas