public inbox for fortran@gcc.gnu.org
* [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
@ 2021-02-23 21:46 Harald Anlauf
From: Harald Anlauf @ 2021-02-23 21:46 UTC
  To: fortran, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2032 bytes --]

Dear all,

Under certain circumstances, a call to MATMUL with a rank-2 first
argument and a rank-1 second argument would invoke the highly tuned
rank-2 times rank-2 algorithm, which could lead to invalid reads and
writes.  The solution is to check the rank of the second argument to
MATMUL and fall back to the regular algorithm if it is rank-1.  The
invalid accesses showed up under valgrind.
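
To illustrate the idea outside of libgfortran, here is a minimal,
standalone sketch of the guard.  The names toy_descriptor,
TOY_DESCRIPTOR_RANK and use_tuned_matmul are hypothetical stand-ins
invented for this example; the real code uses the library's own array
descriptor and GFC_DESCRIPTOR_RANK, as shown in the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a libgfortran array descriptor.
   Only the rank matters for this sketch.  */
typedef struct
{
  int rank;
} toy_descriptor;

/* Hypothetical analogue of GFC_DESCRIPTOR_RANK for the toy type.  */
#define TOY_DESCRIPTOR_RANK(d) ((d)->rank)

/* Return true if the tuned rank-2 times rank-2 kernel may be used.
   This mirrors the patched condition: all three strides are 1 AND
   the second argument b is not rank-1.  */
static bool
use_tuned_matmul (ptrdiff_t rxstride, ptrdiff_t axstride,
                  ptrdiff_t bxstride, const toy_descriptor *b)
{
  return rxstride == 1 && axstride == 1 && bxstride == 1
         && TOY_DESCRIPTOR_RANK (b) != 1;
}

/* Small self-check of the intended behaviour.  */
static void
demo_checks (void)
{
  toy_descriptor vec = { 1 };   /* rank-1 second argument */
  toy_descriptor mat = { 2 };   /* rank-2 second argument */

  assert (!use_tuned_matmul (1, 1, 1, &vec));  /* falls back */
  assert (use_tuned_matmul (1, 1, 1, &mat));   /* tuned path */
  assert (!use_tuned_matmul (1, 2, 1, &mat));  /* non-unit stride */
}
```

Before the patch, only the three stride tests were made, so a rank-1
b with unit stride would have taken the tuned path.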

I have not been able to create a testcase that gives wrong results.

Regtested on x86_64-pc-linux-gnu, and verified with valgrind.

OK for master?

As this affects all open branches down to 8, ok for backports?

Thanks,
Harald


PR libfortran/99218 - matmul on temporary array accesses invalid memory

Do not invoke tuned rank-2 times rank-2 matmul if rank(b) == 1.

libgfortran/ChangeLog:

	PR libfortran/99218
	* m4/matmul_internal.m4: Invoke tuned matmul only for rank(b)>1.
	* generated/matmul_c10.c: Regenerated.
	* generated/matmul_c16.c: Likewise.
	* generated/matmul_c4.c: Likewise.
	* generated/matmul_c8.c: Likewise.
	* generated/matmul_i1.c: Likewise.
	* generated/matmul_i16.c: Likewise.
	* generated/matmul_i2.c: Likewise.
	* generated/matmul_i4.c: Likewise.
	* generated/matmul_i8.c: Likewise.
	* generated/matmul_r10.c: Likewise.
	* generated/matmul_r16.c: Likewise.
	* generated/matmul_r4.c: Likewise.
	* generated/matmul_r8.c: Likewise.
	* generated/matmulavx128_c10.c: Likewise.
	* generated/matmulavx128_c16.c: Likewise.
	* generated/matmulavx128_c4.c: Likewise.
	* generated/matmulavx128_c8.c: Likewise.
	* generated/matmulavx128_i1.c: Likewise.
	* generated/matmulavx128_i16.c: Likewise.
	* generated/matmulavx128_i2.c: Likewise.
	* generated/matmulavx128_i4.c: Likewise.
	* generated/matmulavx128_i8.c: Likewise.
	* generated/matmulavx128_r10.c: Likewise.
	* generated/matmulavx128_r16.c: Likewise.
	* generated/matmulavx128_r4.c: Likewise.
	* generated/matmulavx128_r8.c: Likewise.


[-- Attachment #2: pr99218.patch --]
[-- Type: text/x-patch, Size: 42083 bytes --]

diff --git a/libgfortran/generated/matmul_c10.c b/libgfortran/generated/matmul_c10.c
index 3e81b491ea1..b8172e8845d 100644
--- a/libgfortran/generated/matmul_c10.c
+++ b/libgfortran/generated/matmul_c10.c
@@ -276,7 +276,8 @@ matmul_c10_avx (gfc_array_c10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_c10_avx2 (gfc_array_c10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_c10_avx512f (gfc_array_c10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_c10_vanilla (gfc_array_c10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_c10 (gfc_array_c10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_c16.c b/libgfortran/generated/matmul_c16.c
index 61a9a70b5e4..a97e06f0155 100644
--- a/libgfortran/generated/matmul_c16.c
+++ b/libgfortran/generated/matmul_c16.c
@@ -276,7 +276,8 @@ matmul_c16_avx (gfc_array_c16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_c16_avx2 (gfc_array_c16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_c16_avx512f (gfc_array_c16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_c16_vanilla (gfc_array_c16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_c16 (gfc_array_c16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_c4.c b/libgfortran/generated/matmul_c4.c
index ecbd2c11918..f884a4ba8f1 100644
--- a/libgfortran/generated/matmul_c4.c
+++ b/libgfortran/generated/matmul_c4.c
@@ -276,7 +276,8 @@ matmul_c4_avx (gfc_array_c4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_c4_avx2 (gfc_array_c4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_c4_avx512f (gfc_array_c4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_c4_vanilla (gfc_array_c4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_c4 (gfc_array_c4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_c8.c b/libgfortran/generated/matmul_c8.c
index e2b36ff5490..29fbaa2f8b5 100644
--- a/libgfortran/generated/matmul_c8.c
+++ b/libgfortran/generated/matmul_c8.c
@@ -276,7 +276,8 @@ matmul_c8_avx (gfc_array_c8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_c8_avx2 (gfc_array_c8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_c8_avx512f (gfc_array_c8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_c8_vanilla (gfc_array_c8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_c8 (gfc_array_c8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_i1.c b/libgfortran/generated/matmul_i1.c
index 24fb1619306..285d37cda71 100644
--- a/libgfortran/generated/matmul_i1.c
+++ b/libgfortran/generated/matmul_i1.c
@@ -276,7 +276,8 @@ matmul_i1_avx (gfc_array_i1 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_i1_avx2 (gfc_array_i1 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_i1_avx512f (gfc_array_i1 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_i1_vanilla (gfc_array_i1 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_i1 (gfc_array_i1 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_i16.c b/libgfortran/generated/matmul_i16.c
index 498740a48d3..d54a7d966d5 100644
--- a/libgfortran/generated/matmul_i16.c
+++ b/libgfortran/generated/matmul_i16.c
@@ -276,7 +276,8 @@ matmul_i16_avx (gfc_array_i16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_i16_avx2 (gfc_array_i16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_i16_avx512f (gfc_array_i16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_i16_vanilla (gfc_array_i16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_i16 (gfc_array_i16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_i2.c b/libgfortran/generated/matmul_i2.c
index 1d40b399e40..eca6daad2d8 100644
--- a/libgfortran/generated/matmul_i2.c
+++ b/libgfortran/generated/matmul_i2.c
@@ -276,7 +276,8 @@ matmul_i2_avx (gfc_array_i2 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_i2_avx2 (gfc_array_i2 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_i2_avx512f (gfc_array_i2 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_i2_vanilla (gfc_array_i2 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_i2 (gfc_array_i2 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_i4.c b/libgfortran/generated/matmul_i4.c
index b5f83d5453c..a33bb6afaa6 100644
--- a/libgfortran/generated/matmul_i4.c
+++ b/libgfortran/generated/matmul_i4.c
@@ -276,7 +276,8 @@ matmul_i4_avx (gfc_array_i4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_i4_avx2 (gfc_array_i4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_i4_avx512f (gfc_array_i4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_i4_vanilla (gfc_array_i4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_i4 (gfc_array_i4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_i8.c b/libgfortran/generated/matmul_i8.c
index bfaee38f2d2..f49b8c0a185 100644
--- a/libgfortran/generated/matmul_i8.c
+++ b/libgfortran/generated/matmul_i8.c
@@ -276,7 +276,8 @@ matmul_i8_avx (gfc_array_i8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_i8_avx2 (gfc_array_i8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_i8_avx512f (gfc_array_i8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_i8_vanilla (gfc_array_i8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_i8 (gfc_array_i8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_r10.c b/libgfortran/generated/matmul_r10.c
index f4851878e66..6cb59bda7ca 100644
--- a/libgfortran/generated/matmul_r10.c
+++ b/libgfortran/generated/matmul_r10.c
@@ -276,7 +276,8 @@ matmul_r10_avx (gfc_array_r10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_r10_avx2 (gfc_array_r10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_r10_avx512f (gfc_array_r10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_r10_vanilla (gfc_array_r10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_r10 (gfc_array_r10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_r16.c b/libgfortran/generated/matmul_r16.c
index 662cea13894..aca9bd2a140 100644
--- a/libgfortran/generated/matmul_r16.c
+++ b/libgfortran/generated/matmul_r16.c
@@ -276,7 +276,8 @@ matmul_r16_avx (gfc_array_r16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_r16_avx2 (gfc_array_r16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_r16_avx512f (gfc_array_r16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_r16_vanilla (gfc_array_r16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_r16 (gfc_array_r16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_r4.c b/libgfortran/generated/matmul_r4.c
index 9f435f57357..4e0caa6cfe6 100644
--- a/libgfortran/generated/matmul_r4.c
+++ b/libgfortran/generated/matmul_r4.c
@@ -276,7 +276,8 @@ matmul_r4_avx (gfc_array_r4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_r4_avx2 (gfc_array_r4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_r4_avx512f (gfc_array_r4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_r4_vanilla (gfc_array_r4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_r4 (gfc_array_r4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmul_r8.c b/libgfortran/generated/matmul_r8.c
index 5ceec71b58d..d4e825c8155 100644
--- a/libgfortran/generated/matmul_r8.c
+++ b/libgfortran/generated/matmul_r8.c
@@ -276,7 +276,8 @@ matmul_r8_avx (gfc_array_r8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -844,7 +845,8 @@ matmul_r8_avx2 (gfc_array_r8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1412,7 +1414,8 @@ matmul_r8_avx512f (gfc_array_r8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -1994,7 +1997,8 @@ matmul_r8_vanilla (gfc_array_r8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -2635,7 +2639,8 @@ matmul_r8 (gfc_array_r8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_c10.c b/libgfortran/generated/matmulavx128_c10.c
index 434d327c601..e21e6cbe253 100644
--- a/libgfortran/generated/matmulavx128_c10.c
+++ b/libgfortran/generated/matmulavx128_c10.c
@@ -241,7 +241,8 @@ matmul_c10_avx128_fma3 (gfc_array_c10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_c10_avx128_fma4 (gfc_array_c10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_c16.c b/libgfortran/generated/matmulavx128_c16.c
index 27110ad17e5..1cf686a7e4b 100644
--- a/libgfortran/generated/matmulavx128_c16.c
+++ b/libgfortran/generated/matmulavx128_c16.c
@@ -241,7 +241,8 @@ matmul_c16_avx128_fma3 (gfc_array_c16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_c16_avx128_fma4 (gfc_array_c16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_c4.c b/libgfortran/generated/matmulavx128_c4.c
index 4f0f67a6d1d..64f4886399b 100644
--- a/libgfortran/generated/matmulavx128_c4.c
+++ b/libgfortran/generated/matmulavx128_c4.c
@@ -241,7 +241,8 @@ matmul_c4_avx128_fma3 (gfc_array_c4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_c4_avx128_fma4 (gfc_array_c4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_c8.c b/libgfortran/generated/matmulavx128_c8.c
index 4521103d40f..d0846d7be8a 100644
--- a/libgfortran/generated/matmulavx128_c8.c
+++ b/libgfortran/generated/matmulavx128_c8.c
@@ -241,7 +241,8 @@ matmul_c8_avx128_fma3 (gfc_array_c8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_c8_avx128_fma4 (gfc_array_c8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_i1.c b/libgfortran/generated/matmulavx128_i1.c
index e96e30293a3..aa161ba0056 100644
--- a/libgfortran/generated/matmulavx128_i1.c
+++ b/libgfortran/generated/matmulavx128_i1.c
@@ -241,7 +241,8 @@ matmul_i1_avx128_fma3 (gfc_array_i1 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_i1_avx128_fma4 (gfc_array_i1 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_i16.c b/libgfortran/generated/matmulavx128_i16.c
index a4330584a0c..a28b226a080 100644
--- a/libgfortran/generated/matmulavx128_i16.c
+++ b/libgfortran/generated/matmulavx128_i16.c
@@ -241,7 +241,8 @@ matmul_i16_avx128_fma3 (gfc_array_i16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_i16_avx128_fma4 (gfc_array_i16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_i2.c b/libgfortran/generated/matmulavx128_i2.c
index 53ebd769cfb..cd54a519417 100644
--- a/libgfortran/generated/matmulavx128_i2.c
+++ b/libgfortran/generated/matmulavx128_i2.c
@@ -241,7 +241,8 @@ matmul_i2_avx128_fma3 (gfc_array_i2 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_i2_avx128_fma4 (gfc_array_i2 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_i4.c b/libgfortran/generated/matmulavx128_i4.c
index 7feb2cf6403..ece1ddd668e 100644
--- a/libgfortran/generated/matmulavx128_i4.c
+++ b/libgfortran/generated/matmulavx128_i4.c
@@ -241,7 +241,8 @@ matmul_i4_avx128_fma3 (gfc_array_i4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_i4_avx128_fma4 (gfc_array_i4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_i8.c b/libgfortran/generated/matmulavx128_i8.c
index 65b64037861..b63a7feba50 100644
--- a/libgfortran/generated/matmulavx128_i8.c
+++ b/libgfortran/generated/matmulavx128_i8.c
@@ -241,7 +241,8 @@ matmul_i8_avx128_fma3 (gfc_array_i8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_i8_avx128_fma4 (gfc_array_i8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_r10.c b/libgfortran/generated/matmulavx128_r10.c
index eecddf4247e..bc2ea08a1b8 100644
--- a/libgfortran/generated/matmulavx128_r10.c
+++ b/libgfortran/generated/matmulavx128_r10.c
@@ -241,7 +241,8 @@ matmul_r10_avx128_fma3 (gfc_array_r10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_r10_avx128_fma4 (gfc_array_r10 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_r16.c b/libgfortran/generated/matmulavx128_r16.c
index e5042aece2f..228dde8f537 100644
--- a/libgfortran/generated/matmulavx128_r16.c
+++ b/libgfortran/generated/matmulavx128_r16.c
@@ -241,7 +241,8 @@ matmul_r16_avx128_fma3 (gfc_array_r16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_r16_avx128_fma4 (gfc_array_r16 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_r4.c b/libgfortran/generated/matmulavx128_r4.c
index 45039f89547..32f634b07c9 100644
--- a/libgfortran/generated/matmulavx128_r4.c
+++ b/libgfortran/generated/matmulavx128_r4.c
@@ -241,7 +241,8 @@ matmul_r4_avx128_fma3 (gfc_array_r4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_r4_avx128_fma4 (gfc_array_r4 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/generated/matmulavx128_r8.c b/libgfortran/generated/matmulavx128_r8.c
index 1d3311e833e..01bea4f0949 100644
--- a/libgfortran/generated/matmulavx128_r8.c
+++ b/libgfortran/generated/matmulavx128_r8.c
@@ -241,7 +241,8 @@ matmul_r8_avx128_fma3 (gfc_array_r8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
@@ -810,7 +811,8 @@ matmul_r8_avx128_fma4 (gfc_array_r8 * const restrict retarray,
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
diff --git a/libgfortran/m4/matmul_internal.m4 b/libgfortran/m4/matmul_internal.m4
index 13fd7696238..0e96207a0fc 100644
--- a/libgfortran/m4/matmul_internal.m4
+++ b/libgfortran/m4/matmul_internal.m4
@@ -192,7 +192,8 @@ sinclude(`matmul_asm_'rtype_code`.m4')dnl
 	}
     }

-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
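
In simplified form, the guard added by this hunk can be sketched as below. This is only an illustration: `toy_descriptor` and `use_tuned_kernel` are hypothetical names, and the struct is a stand-in for the real libgfortran array descriptor, which carries considerably more information.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for an array descriptor:
   only the fields needed to illustrate the check.  */
typedef struct
{
  int rank;      /* 1 for a vector, 2 for a matrix */
  long xstride;  /* element stride along the first dimension */
} toy_descriptor;

/* Return true if the tuned rank-2 times rank-2 kernel may be used.
   Before the patch only the three strides were tested; the patch
   additionally requires that b is not rank-1.  */
static bool
use_tuned_kernel (const toy_descriptor *ret, const toy_descriptor *a,
                  const toy_descriptor *b)
{
  return ret->xstride == 1 && a->xstride == 1 && b->xstride == 1
         && b->rank != 1;
}
```

With a rank-1 `b` the function returns false even when all strides are 1, so a caller would take the general code path instead of the tuned block.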


* *PING* [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
  2021-02-23 21:46 [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory Harald Anlauf
@ 2021-03-01 22:19 ` Harald Anlauf
  2021-03-04  2:16 ` Jerry DeLisle
  1 sibling, 0 replies; 5+ messages in thread
From: Harald Anlauf @ 2021-03-01 22:19 UTC (permalink / raw)
  To: Harald Anlauf; +Cc: fortran, gcc-patches

Early ping.

Harald


> Sent: Tuesday, 23 February 2021 at 22:46
> From: "Harald Anlauf" <anlauf@gmx.de>
> To: "fortran" <fortran@gcc.gnu.org>, "gcc-patches" <gcc-patches@gcc.gnu.org>
> Subject: [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
>
> Dear all,
>
> under certain circumstances a call to MATMUL for rank-2 times rank-1
> would invoke a highly tuned rank-2 times rank-2 algorithm which could
> lead to invalid reads and writes.  The solution is to check the rank
> of the second argument to matmul and fall back to a regular algorithm
> for rank-1.  The invalid accesses did show up with valgrind.
>
> I have not been able to create a testcase that gives wrong results.
>
> Regtested on x86_64-pc-linux-gnu, and verified with valgrind.
>
> OK for master?
>
> As this affects all open branches down to 8, ok for backports?
>
> Thanks,
> Harald
>
>
> PR libfortran/99218 - matmul on temporary array accesses invalid memory
>
> Do not invoke tuned rank-2 times rank-2 matmul if rank(b) == 1.
>
> libgfortran/ChangeLog:
>
> 	PR libfortran/99218
> 	* m4/matmul_internal.m4: Invoke tuned matmul only for rank(b)>1.
> 	* generated/matmul_c10.c: Regenerated.
>         * generated/matmul_c16.c: Likewise.
>         * generated/matmul_c4.c: Likewise.
>         * generated/matmul_c8.c: Likewise.
>         * generated/matmul_i1.c: Likewise.
>         * generated/matmul_i16.c: Likewise.
>         * generated/matmul_i2.c: Likewise.
>         * generated/matmul_i4.c: Likewise.
>         * generated/matmul_i8.c: Likewise.
>         * generated/matmul_r10.c: Likewise.
>         * generated/matmul_r16.c: Likewise.
>         * generated/matmul_r4.c: Likewise.
>         * generated/matmul_r8.c: Likewise.
>         * generated/matmulavx128_c10.c: Likewise.
>         * generated/matmulavx128_c16.c: Likewise.
>         * generated/matmulavx128_c4.c: Likewise.
>         * generated/matmulavx128_c8.c: Likewise.
>         * generated/matmulavx128_i1.c: Likewise.
>         * generated/matmulavx128_i16.c: Likewise.
>         * generated/matmulavx128_i2.c: Likewise.
>         * generated/matmulavx128_i4.c: Likewise.
>         * generated/matmulavx128_i8.c: Likewise.
>         * generated/matmulavx128_r10.c: Likewise.
>         * generated/matmulavx128_r16.c: Likewise.
>         * generated/matmulavx128_r4.c: Likewise.
>         * generated/matmulavx128_r8.c: Likewise.
>
>


* Re: [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
  2021-02-23 21:46 [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory Harald Anlauf
  2021-03-01 22:19 ` *PING* " Harald Anlauf
@ 2021-03-04  2:16 ` Jerry DeLisle
  2021-03-04 20:23   ` Aw: " Harald Anlauf
  1 sibling, 1 reply; 5+ messages in thread
From: Jerry DeLisle @ 2021-03-04  2:16 UTC (permalink / raw)
  To: Harald Anlauf, fortran, gcc-patches

Yes, OK. However, have you been able to test performance? I am only
curious. Back when this code was first implemented, we used a test
program attached in bugzilla. I do not remember the PR number offhand.

Jerry

On 2/23/21 1:46 PM, Harald Anlauf via Fortran wrote:
> Dear all,
>
> under certain circumstances a call to MATMUL for rank-2 times rank-1
> would invoke a highly tuned rank-2 times rank-2 algorithm which could
> lead to invalid reads and writes.  The solution is to check the rank
> of the second argument to matmul and fall back to a regular algorithm
> for rank-1.  The invalid accesses did show up with valgrind.
>
> I have not been able to create a testcase that gives wrong results.
>
> Regtested on x86_64-pc-linux-gnu, and verified with valgrind.
>
> OK for master?
>
> As this affects all open branches down to 8, ok for backports?
>
> Thanks,
> Harald
>
>
> PR libfortran/99218 - matmul on temporary array accesses invalid memory
>
> Do not invoke tuned rank-2 times rank-2 matmul if rank(b) == 1.
>
> libgfortran/ChangeLog:
>
> 	PR libfortran/99218
> 	* m4/matmul_internal.m4: Invoke tuned matmul only for rank(b)>1.
> 	* generated/matmul_c10.c: Regenerated.
>          * generated/matmul_c16.c: Likewise.
>          * generated/matmul_c4.c: Likewise.
>          * generated/matmul_c8.c: Likewise.
>          * generated/matmul_i1.c: Likewise.
>          * generated/matmul_i16.c: Likewise.
>          * generated/matmul_i2.c: Likewise.
>          * generated/matmul_i4.c: Likewise.
>          * generated/matmul_i8.c: Likewise.
>          * generated/matmul_r10.c: Likewise.
>          * generated/matmul_r16.c: Likewise.
>          * generated/matmul_r4.c: Likewise.
>          * generated/matmul_r8.c: Likewise.
>          * generated/matmulavx128_c10.c: Likewise.
>          * generated/matmulavx128_c16.c: Likewise.
>          * generated/matmulavx128_c4.c: Likewise.
>          * generated/matmulavx128_c8.c: Likewise.
>          * generated/matmulavx128_i1.c: Likewise.
>          * generated/matmulavx128_i16.c: Likewise.
>          * generated/matmulavx128_i2.c: Likewise.
>          * generated/matmulavx128_i4.c: Likewise.
>          * generated/matmulavx128_i8.c: Likewise.
>          * generated/matmulavx128_r10.c: Likewise.
>          * generated/matmulavx128_r16.c: Likewise.
>          * generated/matmulavx128_r4.c: Likewise.
>          * generated/matmulavx128_r8.c: Likewise.
>



* Aw: Re: [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
  2021-03-04  2:16 ` Jerry DeLisle
@ 2021-03-04 20:23   ` Harald Anlauf
  2021-03-05 19:56     ` Harald Anlauf
  0 siblings, 1 reply; 5+ messages in thread
From: Harald Anlauf @ 2021-03-04 20:23 UTC (permalink / raw)
  To: Jerry DeLisle; +Cc: fortran, gcc-patches

Hi Jerry,

> Yes, OK, however, have you been able to test performance. I am only
> curious. There was a test program we used back when this code was first
> implemented in bugzilla. I do not remember the PR number off hand.

As you mentioned in a private mail, it was PR51119, and the timing program is

  https://gcc.gnu.org/bugzilla/attachment.cgi?id=40039

I needed to fix the source code slightly to make it work with current gfortran,
by replacing the subroutine dummy with

subroutine dummy(a,b)
  integer, parameter :: wp = selected_real_kind(4), &
       dp = selected_real_kind(8)
  real(dp), intent(in),    dimension(1) :: a
  real(dp), intent(inout), dimension(1) :: b
end subroutine dummy

Testing it on my notebook with an Intel i5-8250U which has avx2, I found no
significant differences between the current master and the version with the
patch when compiling with

% gfc-11 -static -O2 -march=native -finline-matmul-limit=0 compare.f90

E.g. gcc-11 with patch to libfortran:

 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000      0.025      0.139      0.025      0.026
    4  2000      0.191      0.799      0.743      0.741
    8  2000      3.272      2.437      3.280      3.311
   16  2000      7.615      2.768      8.405      7.572
   32  2000      8.492      3.063      9.733      9.521
   64  2000     14.137      3.299     14.118     14.295
  128  2000     18.838      3.128     19.149     18.893
  256   477     17.214      3.256     17.293     17.255
  512    59     17.940      3.316     17.986     17.985
 1024     7     17.672      2.665     17.691     17.698
 2048     1     17.571      2.595     17.559     17.170

With unmodified gcc-11:

 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000      0.024      0.194      0.025      0.025
    4  2000      0.231      1.641      0.718      0.716
    8  2000      3.424      2.445      3.198      3.435
   16  2000      7.715      2.718      7.615      7.845
   32  2000      8.696      3.088      9.728      9.772
   64  2000     14.171      3.275     13.995     14.447
  128  2000     18.931      3.127     18.942     19.019
  256   477     17.239      3.232     17.267     17.291
  512    59     17.938      3.315     17.967     17.996
 1024     7     17.674      2.632     17.673     17.711
 2048     1     17.579      2.581     17.552     17.587

give or take.  (For those too lazy to check: refMatmul is just
the naive explicit matmul).
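
For readers who want to relate such figures to wall-clock time, a conversion can be sketched as below. It assumes the common convention of counting 2*n^3 floating-point operations per n-by-n matrix product; whether the attached timing program counts exactly this way is an assumption, and `matmul_gflops` is a hypothetical helper name.

```c
/* Convert a measured time into GFLOPS.
   Assumption: one n-by-n times n-by-n product costs 2*n^3 flops
   (n^3 multiplications plus n^3 additions).  */
static double
matmul_gflops (double n, double loops, double seconds)
{
  return 2.0 * n * n * n * loops / seconds / 1.0e9;
}
```

Under this convention, a single 1000-by-1000 product that takes 2 seconds corresponds to 1 GFLOPS.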

However, when comparing with older gccs I got better numbers!  E.g. gcc-7:

 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000      0.113      0.199      0.126      0.150
    4  2000      0.866      0.865      0.766      0.881
    8  2000      3.551      2.750      3.371      3.852
   16  2000      7.826      3.517      7.489      7.464
   32  2000      9.989      3.859     11.811     11.903
   64  2000     16.218      4.213     16.501     16.687
  128  2000     19.971      4.006     20.070     20.049
  256   477     22.804      4.139     22.949     22.894
  512    59     23.637      4.047     23.800     23.765
 1024     7     23.051      3.065     23.177     23.152
 2048     1     22.953      2.784     22.946     22.960

So if I were worried about a performance penalty from my patch,
I'd look in other places, too.

Cheers,
Harald



* Re: [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
  2021-03-04 20:23   ` Aw: " Harald Anlauf
@ 2021-03-05 19:56     ` Harald Anlauf
  0 siblings, 0 replies; 5+ messages in thread
From: Harald Anlauf @ 2021-03-05 19:56 UTC (permalink / raw)
  To: Harald Anlauf; +Cc: Jerry DeLisle, fortran, gcc-patches

Dear all,

I finally figured out that the array dimensions simply need to be
large enough to get invalid memory accesses that actually lead to a
crash.

I will commit the following testcase along with the fix to libfortran:


! { dg-do run }
! PR libfortran/99218 - matmul on temporary array accesses invalid memory

program p
  implicit none
  integer, parameter :: nState = 300000
  integer, parameter :: nCon = 1
  real,    parameter :: ZERO = 0.0
  real :: G(nCon,nState) = ZERO
  real :: H(nState,nCon) = ZERO
  real :: lambda(nCon)   = ZERO
  real :: f(nState)      = ZERO
  f = matmul (transpose (G), lambda)
  if (f(1) /= ZERO) stop 1
end program
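
For orientation, the product in this testcase has the shape of a 300000x1 matrix times a 1-element vector. A plain rank-2 times rank-1 product, of the kind a general fallback would compute, can be sketched in C as follows. This is an illustrative sketch and not the libgfortran code: `matvec_colmajor` is a hypothetical name, and contiguous column-major storage (as Fortran uses) is assumed.

```c
#include <stddef.h>

/* y = A * x for a column-major m-by-n matrix A.
   Every access stays within a[0 .. m*n-1], x[0 .. n-1], y[0 .. m-1].  */
static void
matvec_colmajor (size_t m, size_t n, const float *a, const float *x,
                 float *y)
{
  for (size_t i = 0; i < m; i++)
    y[i] = 0.0f;

  /* Walk A column by column so the inner loop is unit-stride.  */
  for (size_t k = 0; k < n; k++)
    for (size_t i = 0; i < m; i++)
      y[i] += a[i + k * m] * x[k];
}
```

The loop bounds depend only on the extents of `a` and the length of `x`, so no element beyond the single column of the vector operand is ever read.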


Cheers,
Harald


