* [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
@ 2021-02-23 21:46 Harald Anlauf
2021-03-01 22:19 ` *PING* " Harald Anlauf
2021-03-04 2:16 ` Jerry DeLisle
From: Harald Anlauf @ 2021-02-23 21:46 UTC
To: fortran, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 2032 bytes --]
Dear all,
under certain circumstances, a call to MATMUL with a rank-2 first
argument and a rank-1 second argument would invoke the highly tuned
rank-2 times rank-2 algorithm, which could lead to invalid reads and
writes. The solution is to check the rank of the second argument to
MATMUL and fall back to the regular algorithm when it is rank-1.
Valgrind reports the invalid accesses, but I have not been able to
create a testcase that gives wrong results.
Regtested on x86_64-pc-linux-gnu, and verified with valgrind.
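For readers who do not want to open the generated files, here is a
rough, stand-alone sketch of the dispatch condition after the patch.
It is not the libgfortran code: toy_descriptor, TOY_DESCRIPTOR_RANK,
matmul_path and choose_path are names made up for this illustration,
and the descriptor is reduced to the two fields the check looks at.
Only the shape of the condition mirrors the patched line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an array descriptor.
   The real libgfortran descriptor carries more information.  */
typedef struct
{
  int rank;           /* 1 for a vector, 2 for a matrix.  */
  ptrdiff_t stride0;  /* Element stride along the first dimension.  */
} toy_descriptor;

/* Stand-in for GFC_DESCRIPTOR_RANK, defined only for this sketch.  */
#define TOY_DESCRIPTOR_RANK(d) ((d)->rank)

typedef enum { PATH_TUNED, PATH_GENERIC } matmul_path;

/* Decide which code path a matmul call would take.  The tuned path
   assumes B is rank-2, so a rank-1 B must use the generic path even
   when all three strides are 1.  */
static matmul_path
choose_path (ptrdiff_t rxstride, ptrdiff_t axstride,
             const toy_descriptor *b)
{
  assert (b != NULL);
  ptrdiff_t bxstride = b->stride0;

  if (rxstride == 1 && axstride == 1 && bxstride == 1
      && TOY_DESCRIPTOR_RANK (b) != 1)
    return PATH_TUNED;

  return PATH_GENERIC;
}
```

With unit strides everywhere, a descriptor of rank 1 now selects
PATH_GENERIC while a descriptor of rank 2 still selects PATH_TUNED,
so the rank-2 times rank-2 case keeps using the tuned block.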
OK for master?
As this affects all open branches down to 8, OK for backports?
Thanks,
Harald
PR libfortran/99218 - matmul on temporary array accesses invalid memory
Do not invoke tuned rank-2 times rank-2 matmul if rank(b) == 1.
libgfortran/ChangeLog:
PR libfortran/99218
* m4/matmul_internal.m4: Invoke tuned matmul only for rank(b)>1.
* generated/matmul_c10.c: Regenerated.
* generated/matmul_c16.c: Likewise.
* generated/matmul_c4.c: Likewise.
* generated/matmul_c8.c: Likewise.
* generated/matmul_i1.c: Likewise.
* generated/matmul_i16.c: Likewise.
* generated/matmul_i2.c: Likewise.
* generated/matmul_i4.c: Likewise.
* generated/matmul_i8.c: Likewise.
* generated/matmul_r10.c: Likewise.
* generated/matmul_r16.c: Likewise.
* generated/matmul_r4.c: Likewise.
* generated/matmul_r8.c: Likewise.
* generated/matmulavx128_c10.c: Likewise.
* generated/matmulavx128_c16.c: Likewise.
* generated/matmulavx128_c4.c: Likewise.
* generated/matmulavx128_c8.c: Likewise.
* generated/matmulavx128_i1.c: Likewise.
* generated/matmulavx128_i16.c: Likewise.
* generated/matmulavx128_i2.c: Likewise.
* generated/matmulavx128_i4.c: Likewise.
* generated/matmulavx128_i8.c: Likewise.
* generated/matmulavx128_r10.c: Likewise.
* generated/matmulavx128_r16.c: Likewise.
* generated/matmulavx128_r4.c: Likewise.
* generated/matmulavx128_r8.c: Likewise.
[-- Attachment #2: pr99218.patch --]
[-- Type: text/x-patch, Size: 42083 bytes --]
diff --git a/libgfortran/generated/matmul_c10.c b/libgfortran/generated/matmul_c10.c
index 3e81b491ea1..b8172e8845d 100644
--- a/libgfortran/generated/matmul_c10.c
+++ b/libgfortran/generated/matmul_c10.c
@@ -276,7 +276,8 @@ matmul_c10_avx (gfc_array_c10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_c10_avx2 (gfc_array_c10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_c10_avx512f (gfc_array_c10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_c10_vanilla (gfc_array_c10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_c10 (gfc_array_c10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_c16.c b/libgfortran/generated/matmul_c16.c
index 61a9a70b5e4..a97e06f0155 100644
--- a/libgfortran/generated/matmul_c16.c
+++ b/libgfortran/generated/matmul_c16.c
@@ -276,7 +276,8 @@ matmul_c16_avx (gfc_array_c16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_c16_avx2 (gfc_array_c16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_c16_avx512f (gfc_array_c16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_c16_vanilla (gfc_array_c16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_c16 (gfc_array_c16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_c4.c b/libgfortran/generated/matmul_c4.c
index ecbd2c11918..f884a4ba8f1 100644
--- a/libgfortran/generated/matmul_c4.c
+++ b/libgfortran/generated/matmul_c4.c
@@ -276,7 +276,8 @@ matmul_c4_avx (gfc_array_c4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_c4_avx2 (gfc_array_c4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_c4_avx512f (gfc_array_c4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_c4_vanilla (gfc_array_c4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_c4 (gfc_array_c4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_c8.c b/libgfortran/generated/matmul_c8.c
index e2b36ff5490..29fbaa2f8b5 100644
--- a/libgfortran/generated/matmul_c8.c
+++ b/libgfortran/generated/matmul_c8.c
@@ -276,7 +276,8 @@ matmul_c8_avx (gfc_array_c8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_c8_avx2 (gfc_array_c8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_c8_avx512f (gfc_array_c8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_c8_vanilla (gfc_array_c8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_c8 (gfc_array_c8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_i1.c b/libgfortran/generated/matmul_i1.c
index 24fb1619306..285d37cda71 100644
--- a/libgfortran/generated/matmul_i1.c
+++ b/libgfortran/generated/matmul_i1.c
@@ -276,7 +276,8 @@ matmul_i1_avx (gfc_array_i1 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_i1_avx2 (gfc_array_i1 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_i1_avx512f (gfc_array_i1 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_i1_vanilla (gfc_array_i1 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_i1 (gfc_array_i1 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_i16.c b/libgfortran/generated/matmul_i16.c
index 498740a48d3..d54a7d966d5 100644
--- a/libgfortran/generated/matmul_i16.c
+++ b/libgfortran/generated/matmul_i16.c
@@ -276,7 +276,8 @@ matmul_i16_avx (gfc_array_i16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_i16_avx2 (gfc_array_i16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_i16_avx512f (gfc_array_i16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_i16_vanilla (gfc_array_i16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_i16 (gfc_array_i16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_i2.c b/libgfortran/generated/matmul_i2.c
index 1d40b399e40..eca6daad2d8 100644
--- a/libgfortran/generated/matmul_i2.c
+++ b/libgfortran/generated/matmul_i2.c
@@ -276,7 +276,8 @@ matmul_i2_avx (gfc_array_i2 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_i2_avx2 (gfc_array_i2 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_i2_avx512f (gfc_array_i2 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_i2_vanilla (gfc_array_i2 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_i2 (gfc_array_i2 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_i4.c b/libgfortran/generated/matmul_i4.c
index b5f83d5453c..a33bb6afaa6 100644
--- a/libgfortran/generated/matmul_i4.c
+++ b/libgfortran/generated/matmul_i4.c
@@ -276,7 +276,8 @@ matmul_i4_avx (gfc_array_i4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_i4_avx2 (gfc_array_i4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_i4_avx512f (gfc_array_i4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_i4_vanilla (gfc_array_i4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_i4 (gfc_array_i4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_i8.c b/libgfortran/generated/matmul_i8.c
index bfaee38f2d2..f49b8c0a185 100644
--- a/libgfortran/generated/matmul_i8.c
+++ b/libgfortran/generated/matmul_i8.c
@@ -276,7 +276,8 @@ matmul_i8_avx (gfc_array_i8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_i8_avx2 (gfc_array_i8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_i8_avx512f (gfc_array_i8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_i8_vanilla (gfc_array_i8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_i8 (gfc_array_i8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_r10.c b/libgfortran/generated/matmul_r10.c
index f4851878e66..6cb59bda7ca 100644
--- a/libgfortran/generated/matmul_r10.c
+++ b/libgfortran/generated/matmul_r10.c
@@ -276,7 +276,8 @@ matmul_r10_avx (gfc_array_r10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_r10_avx2 (gfc_array_r10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_r10_avx512f (gfc_array_r10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_r10_vanilla (gfc_array_r10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_r10 (gfc_array_r10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_r16.c b/libgfortran/generated/matmul_r16.c
index 662cea13894..aca9bd2a140 100644
--- a/libgfortran/generated/matmul_r16.c
+++ b/libgfortran/generated/matmul_r16.c
@@ -276,7 +276,8 @@ matmul_r16_avx (gfc_array_r16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_r16_avx2 (gfc_array_r16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_r16_avx512f (gfc_array_r16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_r16_vanilla (gfc_array_r16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_r16 (gfc_array_r16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_r4.c b/libgfortran/generated/matmul_r4.c
index 9f435f57357..4e0caa6cfe6 100644
--- a/libgfortran/generated/matmul_r4.c
+++ b/libgfortran/generated/matmul_r4.c
@@ -276,7 +276,8 @@ matmul_r4_avx (gfc_array_r4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_r4_avx2 (gfc_array_r4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_r4_avx512f (gfc_array_r4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_r4_vanilla (gfc_array_r4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_r4 (gfc_array_r4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmul_r8.c b/libgfortran/generated/matmul_r8.c
index 5ceec71b58d..d4e825c8155 100644
--- a/libgfortran/generated/matmul_r8.c
+++ b/libgfortran/generated/matmul_r8.c
@@ -276,7 +276,8 @@ matmul_r8_avx (gfc_array_r8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -844,7 +845,8 @@ matmul_r8_avx2 (gfc_array_r8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_r8_avx512f (gfc_array_r8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_r8_vanilla (gfc_array_r8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_r8 (gfc_array_r8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_c10.c b/libgfortran/generated/matmulavx128_c10.c
index 434d327c601..e21e6cbe253 100644
--- a/libgfortran/generated/matmulavx128_c10.c
+++ b/libgfortran/generated/matmulavx128_c10.c
@@ -241,7 +241,8 @@ matmul_c10_avx128_fma3 (gfc_array_c10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_c10_avx128_fma4 (gfc_array_c10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_c16.c b/libgfortran/generated/matmulavx128_c16.c
index 27110ad17e5..1cf686a7e4b 100644
--- a/libgfortran/generated/matmulavx128_c16.c
+++ b/libgfortran/generated/matmulavx128_c16.c
@@ -241,7 +241,8 @@ matmul_c16_avx128_fma3 (gfc_array_c16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_c16_avx128_fma4 (gfc_array_c16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_c4.c b/libgfortran/generated/matmulavx128_c4.c
index 4f0f67a6d1d..64f4886399b 100644
--- a/libgfortran/generated/matmulavx128_c4.c
+++ b/libgfortran/generated/matmulavx128_c4.c
@@ -241,7 +241,8 @@ matmul_c4_avx128_fma3 (gfc_array_c4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_c4_avx128_fma4 (gfc_array_c4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_c8.c b/libgfortran/generated/matmulavx128_c8.c
index 4521103d40f..d0846d7be8a 100644
--- a/libgfortran/generated/matmulavx128_c8.c
+++ b/libgfortran/generated/matmulavx128_c8.c
@@ -241,7 +241,8 @@ matmul_c8_avx128_fma3 (gfc_array_c8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_c8_avx128_fma4 (gfc_array_c8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_i1.c b/libgfortran/generated/matmulavx128_i1.c
index e96e30293a3..aa161ba0056 100644
--- a/libgfortran/generated/matmulavx128_i1.c
+++ b/libgfortran/generated/matmulavx128_i1.c
@@ -241,7 +241,8 @@ matmul_i1_avx128_fma3 (gfc_array_i1 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_i1_avx128_fma4 (gfc_array_i1 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_i16.c b/libgfortran/generated/matmulavx128_i16.c
index a4330584a0c..a28b226a080 100644
--- a/libgfortran/generated/matmulavx128_i16.c
+++ b/libgfortran/generated/matmulavx128_i16.c
@@ -241,7 +241,8 @@ matmul_i16_avx128_fma3 (gfc_array_i16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_i16_avx128_fma4 (gfc_array_i16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_i2.c b/libgfortran/generated/matmulavx128_i2.c
index 53ebd769cfb..cd54a519417 100644
--- a/libgfortran/generated/matmulavx128_i2.c
+++ b/libgfortran/generated/matmulavx128_i2.c
@@ -241,7 +241,8 @@ matmul_i2_avx128_fma3 (gfc_array_i2 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_i2_avx128_fma4 (gfc_array_i2 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_i4.c b/libgfortran/generated/matmulavx128_i4.c
index 7feb2cf6403..ece1ddd668e 100644
--- a/libgfortran/generated/matmulavx128_i4.c
+++ b/libgfortran/generated/matmulavx128_i4.c
@@ -241,7 +241,8 @@ matmul_i4_avx128_fma3 (gfc_array_i4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_i4_avx128_fma4 (gfc_array_i4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_i8.c b/libgfortran/generated/matmulavx128_i8.c
index 65b64037861..b63a7feba50 100644
--- a/libgfortran/generated/matmulavx128_i8.c
+++ b/libgfortran/generated/matmulavx128_i8.c
@@ -241,7 +241,8 @@ matmul_i8_avx128_fma3 (gfc_array_i8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_i8_avx128_fma4 (gfc_array_i8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_r10.c b/libgfortran/generated/matmulavx128_r10.c
index eecddf4247e..bc2ea08a1b8 100644
--- a/libgfortran/generated/matmulavx128_r10.c
+++ b/libgfortran/generated/matmulavx128_r10.c
@@ -241,7 +241,8 @@ matmul_r10_avx128_fma3 (gfc_array_r10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_r10_avx128_fma4 (gfc_array_r10 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_r16.c b/libgfortran/generated/matmulavx128_r16.c
index e5042aece2f..228dde8f537 100644
--- a/libgfortran/generated/matmulavx128_r16.c
+++ b/libgfortran/generated/matmulavx128_r16.c
@@ -241,7 +241,8 @@ matmul_r16_avx128_fma3 (gfc_array_r16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_r16_avx128_fma4 (gfc_array_r16 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_r4.c b/libgfortran/generated/matmulavx128_r4.c
index 45039f89547..32f634b07c9 100644
--- a/libgfortran/generated/matmulavx128_r4.c
+++ b/libgfortran/generated/matmulavx128_r4.c
@@ -241,7 +241,8 @@ matmul_r4_avx128_fma3 (gfc_array_r4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_r4_avx128_fma4 (gfc_array_r4 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_r8.c b/libgfortran/generated/matmulavx128_r8.c
index 1d3311e833e..01bea4f0949 100644
--- a/libgfortran/generated/matmulavx128_r8.c
+++ b/libgfortran/generated/matmulavx128_r8.c
@@ -241,7 +241,8 @@ matmul_r8_avx128_fma3 (gfc_array_r8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
@@ -810,7 +811,8 @@ matmul_r8_avx128_fma4 (gfc_array_r8 * const restrict retarray,
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
diff --git a/libgfortran/m4/matmul_internal.m4 b/libgfortran/m4/matmul_internal.m4
index 13fd7696238..0e96207a0fc 100644
--- a/libgfortran/m4/matmul_internal.m4
+++ b/libgfortran/m4/matmul_internal.m4
@@ -192,7 +192,8 @@ sinclude(`matmul_asm_'rtype_code`.m4')dnl
}
}
- if (rxstride == 1 && axstride == 1 && bxstride == 1)
+ if (rxstride == 1 && axstride == 1 && bxstride == 1
+ && GFC_DESCRIPTOR_RANK (b) != 1)
{
/* This block of code implements a tuned matmul, derived from
Superscalar GEMM-based level 3 BLAS, Beta version 0.1
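The condition added in each hunk above can be read in isolation. The following is a minimal C sketch of that decision only, not the libgfortran source: `desc_t` and `use_tuned_path` are hypothetical names, and the real array descriptor and the GFC_DESCRIPTOR_RANK macro are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an array descriptor.  Only the
   rank is modelled; the real libgfortran descriptor carries much more.  */
typedef struct
{
  int rank;   /* 1 for a vector, 2 for a matrix */
} desc_t;

/* Return true when the tuned rank-2 times rank-2 kernel may be used.
   This mirrors the patched condition: all three leading strides are 1
   and the second argument b is not rank-1.  */
static bool
use_tuned_path (ptrdiff_t rxstride, ptrdiff_t axstride, ptrdiff_t bxstride,
                const desc_t *b)
{
  return rxstride == 1 && axstride == 1 && bxstride == 1 && b->rank != 1;
}
```

With unit strides and a rank-1 `b`, `use_tuned_path` returns false, so the caller would take the generic path; before the patch the rank was not part of the test.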
* *PING* [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
2021-02-23 21:46 [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory Harald Anlauf
@ 2021-03-01 22:19 ` Harald Anlauf
2021-03-04 2:16 ` Jerry DeLisle
1 sibling, 0 replies; 5+ messages in thread
From: Harald Anlauf @ 2021-03-01 22:19 UTC (permalink / raw)
To: Harald Anlauf; +Cc: fortran, gcc-patches
Early ping.
Harald
> Sent: Tuesday, 23 February 2021 at 22:46
> From: "Harald Anlauf" <anlauf@gmx.de>
> To: "fortran" <fortran@gcc.gnu.org>, "gcc-patches" <gcc-patches@gcc.gnu.org>
> Subject: [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
>
> Dear all,
>
> under certain circumstances a call to MATMUL for rank-2 times rank-1
> would invoke a highly tuned rank-2 times rank-2 algorithm which could
> lead to invalid reads and writes. The solution is to check the rank
> of the second argument to matmul and fall back to a regular algorithm
> for rank-1. The invalid accesses did show up with valgrind.
>
> I have not been able to create a testcase that gives wrong results.
>
> Regtested on x86_64-pc-linux-gnu, and verified with valgrind.
>
> OK for master?
>
> As this affects all open branches down to 8, ok for backports?
>
> Thanks,
> Harald
>
>
> PR libfortran/99218 - matmul on temporary array accesses invalid memory
>
> Do not invoke tuned rank-2 times rank-2 matmul if rank(b) == 1.
>
> libgfortran/ChangeLog:
>
> PR libfortran/99218
> * m4/matmul_internal.m4: Invoke tuned matmul only for rank(b)>1.
> * generated/matmul_c10.c: Regenerated.
> * generated/matmul_c16.c: Likewise.
> * generated/matmul_c4.c: Likewise.
> * generated/matmul_c8.c: Likewise.
> * generated/matmul_i1.c: Likewise.
> * generated/matmul_i16.c: Likewise.
> * generated/matmul_i2.c: Likewise.
> * generated/matmul_i4.c: Likewise.
> * generated/matmul_i8.c: Likewise.
> * generated/matmul_r10.c: Likewise.
> * generated/matmul_r16.c: Likewise.
> * generated/matmul_r4.c: Likewise.
> * generated/matmul_r8.c: Likewise.
> * generated/matmulavx128_c10.c: Likewise.
> * generated/matmulavx128_c16.c: Likewise.
> * generated/matmulavx128_c4.c: Likewise.
> * generated/matmulavx128_c8.c: Likewise.
> * generated/matmulavx128_i1.c: Likewise.
> * generated/matmulavx128_i16.c: Likewise.
> * generated/matmulavx128_i2.c: Likewise.
> * generated/matmulavx128_i4.c: Likewise.
> * generated/matmulavx128_i8.c: Likewise.
> * generated/matmulavx128_r10.c: Likewise.
> * generated/matmulavx128_r16.c: Likewise.
> * generated/matmulavx128_r4.c: Likewise.
> * generated/matmulavx128_r8.c: Likewise.
>
>
* Re: [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
2021-02-23 21:46 [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory Harald Anlauf
2021-03-01 22:19 ` *PING* " Harald Anlauf
@ 2021-03-04 2:16 ` Jerry DeLisle
2021-03-04 20:23 ` Aw: " Harald Anlauf
1 sibling, 1 reply; 5+ messages in thread
From: Jerry DeLisle @ 2021-03-04 2:16 UTC (permalink / raw)
To: Harald Anlauf, fortran, gcc-patches
Yes, OK. However, have you been able to test performance? I am only
curious. There was a test program in bugzilla that we used back when this
code was first implemented. I do not remember the PR number offhand.
Jerry
On 2/23/21 1:46 PM, Harald Anlauf via Fortran wrote:
> Dear all,
>
> under certain circumstances a call to MATMUL for rank-2 times rank-1
> would invoke a highly tuned rank-2 times rank-2 algorithm which could
> lead to invalid reads and writes. The solution is to check the rank
> of the second argument to matmul and fall back to a regular algorithm
> for rank-1. The invalid accesses did show up with valgrind.
>
> I have not been able to create a testcase that gives wrong results.
>
> Regtested on x86_64-pc-linux-gnu, and verified with valgrind.
>
> OK for master?
>
> As this affects all open branches down to 8, ok for backports?
>
> Thanks,
> Harald
>
>
> PR libfortran/99218 - matmul on temporary array accesses invalid memory
>
> Do not invoke tuned rank-2 times rank-2 matmul if rank(b) == 1.
>
> libgfortran/ChangeLog:
>
> PR libfortran/99218
> * m4/matmul_internal.m4: Invoke tuned matmul only for rank(b)>1.
> * generated/matmul_c10.c: Regenerated.
> * generated/matmul_c16.c: Likewise.
> * generated/matmul_c4.c: Likewise.
> * generated/matmul_c8.c: Likewise.
> * generated/matmul_i1.c: Likewise.
> * generated/matmul_i16.c: Likewise.
> * generated/matmul_i2.c: Likewise.
> * generated/matmul_i4.c: Likewise.
> * generated/matmul_i8.c: Likewise.
> * generated/matmul_r10.c: Likewise.
> * generated/matmul_r16.c: Likewise.
> * generated/matmul_r4.c: Likewise.
> * generated/matmul_r8.c: Likewise.
> * generated/matmulavx128_c10.c: Likewise.
> * generated/matmulavx128_c16.c: Likewise.
> * generated/matmulavx128_c4.c: Likewise.
> * generated/matmulavx128_c8.c: Likewise.
> * generated/matmulavx128_i1.c: Likewise.
> * generated/matmulavx128_i16.c: Likewise.
> * generated/matmulavx128_i2.c: Likewise.
> * generated/matmulavx128_i4.c: Likewise.
> * generated/matmulavx128_i8.c: Likewise.
> * generated/matmulavx128_r10.c: Likewise.
> * generated/matmulavx128_r16.c: Likewise.
> * generated/matmulavx128_r4.c: Likewise.
> * generated/matmulavx128_r8.c: Likewise.
>
* Aw: Re: [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
2021-03-04 2:16 ` Jerry DeLisle
@ 2021-03-04 20:23 ` Harald Anlauf
2021-03-05 19:56 ` Harald Anlauf
0 siblings, 1 reply; 5+ messages in thread
From: Harald Anlauf @ 2021-03-04 20:23 UTC (permalink / raw)
To: Jerry DeLisle; +Cc: fortran, gcc-patches
Hi Jerry,
> Yes, OK. However, have you been able to test performance? I am only
> curious. There was a test program in bugzilla that we used back when this
> code was first implemented. I do not remember the PR number offhand.
As you mentioned in a private mail, it was PR51119, and the timing program is
https://gcc.gnu.org/bugzilla/attachment.cgi?id=40039
I needed to fix the source code slightly to make it work with current gfortran,
by replacing the subroutine dummy with

  subroutine dummy(a,b)
    integer, parameter :: wp = selected_real_kind(4), &
                          dp = selected_real_kind(8)
    real(dp), intent(in), dimension(1) :: a
    real(dp), intent(inout), dimension(1) :: b
  end subroutine dummy

Testing it on my notebook with an Intel i5-8250U which has avx2, I found no
significant differences between the current master and the version with the
patch when compiling with
% gfc-11 -static -O2 -march=native -finline-matmul-limit=0 compare.f90
E.g. gcc-11 with patch to libfortran:
=========================================================
================           MEASURED GIGAFLOPS           =
=========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
=========================================================
    2   2000      0.025      0.139      0.025      0.026
    4   2000      0.191      0.799      0.743      0.741
    8   2000      3.272      2.437      3.280      3.311
   16   2000      7.615      2.768      8.405      7.572
   32   2000      8.492      3.063      9.733      9.521
   64   2000     14.137      3.299     14.118     14.295
  128   2000     18.838      3.128     19.149     18.893
  256    477     17.214      3.256     17.293     17.255
  512     59     17.940      3.316     17.986     17.985
 1024      7     17.672      2.665     17.691     17.698
 2048      1     17.571      2.595     17.559     17.170
With unmodified gcc-11:
=========================================================
=========================================================
================           MEASURED GIGAFLOPS           =
=========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
=========================================================
    2   2000      0.024      0.194      0.025      0.025
    4   2000      0.231      1.641      0.718      0.716
    8   2000      3.424      2.445      3.198      3.435
   16   2000      7.715      2.718      7.615      7.845
   32   2000      8.696      3.088      9.728      9.772
   64   2000     14.171      3.275     13.995     14.447
  128   2000     18.931      3.127     18.942     19.019
  256    477     17.239      3.232     17.267     17.291
  512     59     17.938      3.315     17.967     17.996
 1024      7     17.674      2.632     17.673     17.711
 2048      1     17.579      2.581     17.552     17.587
give or take. (For those too lazy to check: refMatmul is just
the naive explicit matmul).
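For reference, a naive explicit matmul of the kind the refMatmul column stands for can be sketched in C as below. This is an illustration only, not the code from the timing program: the names `ref_matmul` and `gigaflops` are hypothetical, and counting 2*n^3 floating-point operations per product is an assumed convention.

```c
#include <stddef.h>

/* Naive explicit product c = a * b for n x n matrices stored in
   column-major order (element (i,j) lives at index i + j*n), as in
   Fortran.  No blocking or other tuning is attempted.  */
static void
ref_matmul (size_t n, const double *a, const double *b, double *c)
{
  for (size_t j = 0; j < n; j++)
    for (size_t i = 0; i < n; i++)
      {
        double sum = 0.0;
        for (size_t k = 0; k < n; k++)
          sum += a[i + k * n] * b[k + j * n];
        c[i + j * n] = sum;
      }
}

/* Convert a wall-clock timing into GFLOPS, assuming 2*n^3 operations
   per product (assumption, not taken from the timing program).  */
static double
gigaflops (size_t n, size_t loops, double seconds)
{
  double flops = 2.0 * (double) n * (double) n * (double) n * (double) loops;
  return flops / seconds / 1.0e9;
}
```

Under that counting, one 1000 x 1000 product finishing in 2 seconds would correspond to 1 GFLOPS.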
However, when comparing with older gccs I got better numbers! E.g. gcc-7:
=========================================================
================           MEASURED GIGAFLOPS           =
=========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
=========================================================
    2   2000      0.113      0.199      0.126      0.150
    4   2000      0.866      0.865      0.766      0.881
    8   2000      3.551      2.750      3.371      3.852
   16   2000      7.826      3.517      7.489      7.464
   32   2000      9.989      3.859     11.811     11.903
   64   2000     16.218      4.213     16.501     16.687
  128   2000     19.971      4.006     20.070     20.049
  256    477     22.804      4.139     22.949     22.894
  512     59     23.637      4.047     23.800     23.765
 1024      7     23.051      3.065     23.177     23.152
 2048      1     22.953      2.784     22.946     22.960
So if I were worried about a performance penalty from my patch,
I'd look in other places, too.
Cheers,
Harald
* Re: [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
2021-03-04 20:23 ` Aw: " Harald Anlauf
@ 2021-03-05 19:56 ` Harald Anlauf
0 siblings, 0 replies; 5+ messages in thread
From: Harald Anlauf @ 2021-03-05 19:56 UTC (permalink / raw)
To: Harald Anlauf; +Cc: Jerry DeLisle, fortran, gcc-patches
Dear all,
I finally figured out that the array dimensions simply need to be
large enough to get invalid memory accesses that actually lead to a
crash.
I will commit the following testcase along with the fix to libfortran:
! { dg-do run }
! PR libfortran/99218 - matmul on temporary array accesses invalid memory

program p
  implicit none
  integer, parameter :: nState = 300000
  integer, parameter :: nCon = 1
  real, parameter :: ZERO = 0.0
  real :: G(nCon,nState) = ZERO
  real :: H(nState,nCon) = ZERO
  real :: lambda(nCon) = ZERO
  real :: f(nState) = ZERO
  f = matmul (transpose (G), lambda)
  if (f(1) /= ZERO) stop 1
end program

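The shape of the operation in the testcase, a transposed rank-2 array times a rank-1 array, can be mimicked in C with explicit strides. The sketch below is a generic strided matrix-vector loop under that assumption; `matvec_strided` is a hypothetical name and this is not the fallback code in libgfortran.

```c
#include <stddef.h>

/* y(i) = sum over k of a(i,k) * x(k), where a(i,k) is read from
   a[i*axstride + k*aystride].  Passing the strides of a stored matrix
   in swapped order views it as its transpose without copying.  */
static void
matvec_strided (size_t rows, size_t cols,
                const float *a, ptrdiff_t axstride, ptrdiff_t aystride,
                const float *x, float *y)
{
  for (size_t i = 0; i < rows; i++)
    {
      float sum = 0.0f;
      for (size_t k = 0; k < cols; k++)
        sum += a[(ptrdiff_t) i * axstride + (ptrdiff_t) k * aystride] * x[k];
      y[i] = sum;
    }
}
```

For a column-major G with nCon rows and nState columns, calling `matvec_strided (nState, nCon, G, nCon, 1, lambda, f)` would compute the analogue of `matmul (transpose (G), lambda)`.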
Cheers,
Harald