From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mout.gmx.net (mout.gmx.net [212.227.17.20]) by sourceware.org (Postfix) with ESMTPS id 014E73857C4E; Sun, 11 Oct 2020 19:09:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 014E73857C4E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=gmx.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=anlauf@gmx.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gmx.net; s=badeba3b8450; t=1602443361; bh=XgML5QFrf8siUN0cRdswAjzpjIBtJFoL4fwO33DdBV8=; h=X-UI-Sender-Class:From:To:Subject:Date; b=c9oHko8cNvRoCvvOgIDmdwamPBmbBtmrm2vHR0lExl+6DmD8pf1Ui3jcORMx8IohT A6Ewy18W70bQPykTQi4jANTxzRZxZKNwR8mJ98i4wmaAAVETWSfIlgKgCcT2LT6V1H bOv05zloHl/CWvQaLoN89ybXFE9JWL/oj36bbLJ8= X-UI-Sender-Class: 01bb95c1-4bf8-414a-932a-4f6e2808ef9c Received: from [93.207.86.41] ([93.207.86.41]) by web-mail.gmx.net (3c-app-gmx-bap59.server.lan [172.19.172.129]) (via HTTP); Sun, 11 Oct 2020 21:09:20 +0200 MIME-Version: 1.0 Message-ID: From: Harald Anlauf To: fortran , gcc-patches Subject: [PATCH] PR libfortran/97063 - Wrong result for vector (step size is negative) * matrix Content-Type: multipart/mixed; boundary=rekceb-4ee45564-bce9-4713-9095-c9ce18a0de9d Date: Sun, 11 Oct 2020 21:09:20 +0200 Importance: normal Sensitivity: Normal X-Priority: 3 X-Provags-ID: V03:K1:RdboqTatAN+LmBXLpZ3lT+bXHkHPzJKZHmZxvqATE0/VZD99eYRtQzIqsNafCdv/YEYZZ X4/4z2NhqWkw9rXM7iTdKLq0SZEPkkoECXbWqyQJE4XmWU2pzjwFjgYoG+cggkl/UtaX7ffc0TnO k5YQ5259p8rX3MJTAzQjOqKmCPzcArDWfOlixPxabMhgGnWh14AWGcQxdP8ym73zEhyIsig9lhNE ksDGdHXb1udP6W7na/Udbr0bZyRXx/aYna04umIF555tF4CjrpIHSOzIreesTFLTzBWiVyipc+ov dU= X-UI-Out-Filterresults: notjunk:1;V03:K0:75tX5+bDdTQ=:cvRygJ57LG7gHqtMdSKqmk cPv7J4s1wQvblnCbCyYLJ+q/AFdRn0EmvvlV13wAgls+/se9dR7Hp2ZdbYAeA1k2Df7brfdfg CwlfkI4UsBfgB2W0YH4vMpDaLou93cKPn9Flr50hCnBdeXcLKeuIQdqX55Q2qYDm6z4l/Cj6u e9D8W3m3buQjdUb2JAZY2MSMurFPY5fD37jr7Orq7fxJTjlVWIoMcOookL1hH2P6+oRKmjxIM AIsOy3eFQrZqsQj1D09sKe16OnVbP1pVH8KnsC08vT9r9VxvAbkSMkuPCMONCTPCnYlVZwT5s R89mB/x9jVKgKYQThTMfVhb27dY8LC4NV8/BKioAHW259AOD2ZBjxlJgCnJsr6SPAPrIkuT4n LppCel6PcgpzcCcbWSQ7bKeFnJSkN06B4WBGTHlY3/4Q3T9cz5zY7wAXo4Nh2htEXRGlkNGuj bzCoj75nbezrWlgK2kr6R4VPkopP/gXTnbcMZ9TteQiqP3t6dzX8gmye45ltf97Sf+jPk1lBV qt/mewVez2s/1mWya3yji9PeIMqEPH1HSpnlihSjQJdzu1JRlOrzCebfP2Xel5pmkFWrcZ7X6 gcPYYXlNBXVr7HCT49ir92ivVBYZ86zkPyC2UB7Ohe3OMJQkxfEt2q/WqKlMV3GGZzdmixEQ0 1QhE/Zt71GoMy3845K1/BF9EdKUHDvEe9sQXb4DRiOSe6zw== X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: fortran@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Fortran mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 11 Oct 2020 19:09:32 -0000 --rekceb-4ee45564-bce9-4713-9095-c9ce18a0de9d Content-Type: text/plain; charset=UTF-8 PR libfortran/97063 - Wrong result for vector (step size is negative) * matrix Dear all, when matrix-multiplying rank-1 times rank-2 arrays, a wrong result was produced when a negative stride was used for the rank-1 array. In that case special code for rank-2 times rank-2 was erroneously executed. We should never have gotten there, so move the check for rank-1 of the first argument before that case. The patch looks horrendously large because it consists essentially of regenerated code (nearly 99%). Regtests cleanly on x86_64-pc-linux-gnu. OK for master? And backport to all open branches where it applies? Thanks, Harald The MATMUL intrinsic provided a wrong result for rank-1 times rank-2 array when a negative stride was used for addressing the elements of the rank-1 array, because a check on strides was erroneously placed before the check on the rank. Interchange order of checks. libgfortran/ChangeLog: * m4/matmul_internal.m4: Move check for rank-1 times rank-2 before checks on strides for rank-2 times rank-2. * generated/matmul_c10.c: Regenerated. * generated/matmul_c16.c: Likewise. * generated/matmul_c4.c: Likewise. * generated/matmul_c8.c: Likewise. * generated/matmul_i1.c: Likewise. * generated/matmul_i16.c: Likewise. * generated/matmul_i2.c: Likewise. * generated/matmul_i4.c: Likewise. * generated/matmul_i8.c: Likewise. * generated/matmul_r10.c: Likewise. * generated/matmul_r16.c: Likewise. * generated/matmul_r4.c: Likewise. * generated/matmul_r8.c: Likewise. * generated/matmulavx128_c10.c: Likewise. * generated/matmulavx128_c16.c: Likewise. * generated/matmulavx128_c4.c: Likewise. * generated/matmulavx128_c8.c: Likewise. * generated/matmulavx128_i1.c: Likewise. * generated/matmulavx128_i16.c: Likewise. * generated/matmulavx128_i2.c: Likewise. * generated/matmulavx128_i4.c: Likewise. * generated/matmulavx128_i8.c: Likewise. * generated/matmulavx128_r10.c: Likewise. * generated/matmulavx128_r16.c: Likewise. * generated/matmulavx128_r4.c: Likewise. * generated/matmulavx128_r8.c: Likewise. gcc/testsuite/ChangeLog: * gfortran.dg/matmul_20.f90: New test. --rekceb-4ee45564-bce9-4713-9095-c9ce18a0de9d Content-Type: text/x-patch Content-Disposition: attachment; filename=pr97063.patch diff --git a/gcc/testsuite/gfortran.dg/matmul_20.f90 b/gcc/testsuite/gfortran.dg/matmul_20.f90 new file mode 100644 index 00000000000..7a211a4974d --- /dev/null +++ b/gcc/testsuite/gfortran.dg/matmul_20.f90 @@ -0,0 +1,47 @@ +! { dg-do run } +! PR97063 - Wrong result for vector (step size is negative) * matrix + +program p + implicit none + integer, parameter :: m = 3, k = 2*m, l = k-1, n = 4 + integer :: i, j, m1, m2, ms + integer :: ai(k), bi(k,n), ci(n), ci_ref(n), c1, c2 + real :: ar(k), br(k,n), cr(n), cr_ref(n) + + ai(:) = [(i,i=0,k-1)] + bi(:,:) = reshape ([(((5*i+j),i=0,k-1),j=0,n-1)],[k,n]) + + ! Parameters of subscript triplet + m1 = 1; m2 = l; ms = 2 + + ! Reference values for cross-checks: integer variant + c1 = dot_product (ai(m1:m2: ms), bi(m1:m2: ms,1)) + c2 = dot_product (ai(m1:m2: ms), bi(m1:m2: ms,2)) + ci_ref = matmul (ai(m1:m2: ms), bi(m1:m2: ms,:)) + ci = matmul (ai(m2:m1:-ms), bi(m2:m1:-ms,:)) + + if (ci_ref(1) /= c1 .or. ci_ref(2) /= c2) stop 1 + if (any (ci /= ci_ref)) stop 2 + + ! Real variant + ar = real (ai) + br = real (bi) + cr_ref = matmul (ar(m1:m2: ms), br(m1:m2: ms,:)) + cr = matmul (ar(m2:m1:-ms), br(m2:m1:-ms,:)) + + if (any (cr_ref /= real (ci_ref))) stop 3 + if (any (cr /= cr_ref )) stop 4 + + ! Mixed variants + cr_ref = matmul (ar(m1:m2: ms), bi(m1:m2: ms,:)) + cr = matmul (ar(m2:m1:-ms), bi(m2:m1:-ms,:)) + + if (any (cr_ref /= real (ci_ref))) stop 5 + if (any (cr /= cr_ref )) stop 6 + + cr_ref = matmul (ai(m1:m2: ms), br(m1:m2: ms,:)) + cr = matmul (ai(m2:m1:-ms), br(m2:m1:-ms,:)) + + if (any (cr_ref /= real (ci_ref))) stop 7 + if (any (cr /= cr_ref )) stop 8 +end program diff --git a/libgfortran/generated/matmul_c10.c b/libgfortran/generated/matmul_c10.c index ce5be246ddb..5bfd61d97ce 100644 --- a/libgfortran/generated/matmul_c10.c +++ b/libgfortran/generated/matmul_c10.c @@ -590,20 +590,6 @@ matmul_c10_avx (gfc_array_c10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_10 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_c10_avx (gfc_array_c10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_10 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_c10_avx2 (gfc_array_c10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_10 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_c10_avx2 (gfc_array_c10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_10 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_c10_avx512f (gfc_array_c10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_10 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_c10_avx512f (gfc_array_c10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_10 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_c10_vanilla (gfc_array_c10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_10 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_c10_vanilla (gfc_array_c10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_10 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_c10 (gfc_array_c10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_10 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_c10 (gfc_array_c10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_10 *restrict abase_x; diff --git a/libgfortran/generated/matmul_c16.c b/libgfortran/generated/matmul_c16.c index bf756d124ec..d7617e31b43 100644 --- a/libgfortran/generated/matmul_c16.c +++ b/libgfortran/generated/matmul_c16.c @@ -590,20 +590,6 @@ matmul_c16_avx (gfc_array_c16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_16 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_c16_avx (gfc_array_c16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_16 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_c16_avx2 (gfc_array_c16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_16 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_c16_avx2 (gfc_array_c16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_16 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_c16_avx512f (gfc_array_c16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_16 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_c16_avx512f (gfc_array_c16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_16 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_c16_vanilla (gfc_array_c16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_16 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_c16_vanilla (gfc_array_c16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_16 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_c16 (gfc_array_c16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_16 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_c16 (gfc_array_c16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_16 *restrict abase_x; diff --git a/libgfortran/generated/matmul_c4.c b/libgfortran/generated/matmul_c4.c index 5b244104574..9303e6add20 100644 --- a/libgfortran/generated/matmul_c4.c +++ b/libgfortran/generated/matmul_c4.c @@ -590,20 +590,6 @@ matmul_c4_avx (gfc_array_c4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_4 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_c4_avx (gfc_array_c4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_4 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_c4_avx2 (gfc_array_c4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_4 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_c4_avx2 (gfc_array_c4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_4 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_c4_avx512f (gfc_array_c4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_4 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_c4_avx512f (gfc_array_c4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_4 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_c4_vanilla (gfc_array_c4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_4 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_c4_vanilla (gfc_array_c4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_4 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_c4 (gfc_array_c4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_4 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_c4 (gfc_array_c4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_4 *restrict abase_x; diff --git a/libgfortran/generated/matmul_c8.c b/libgfortran/generated/matmul_c8.c index df3cb927e1c..d29c99a5b4a 100644 --- a/libgfortran/generated/matmul_c8.c +++ b/libgfortran/generated/matmul_c8.c @@ -590,20 +590,6 @@ matmul_c8_avx (gfc_array_c8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_8 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_c8_avx (gfc_array_c8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_8 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_c8_avx2 (gfc_array_c8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_8 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_c8_avx2 (gfc_array_c8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_8 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_c8_avx512f (gfc_array_c8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_8 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_c8_avx512f (gfc_array_c8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_8 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_c8_vanilla (gfc_array_c8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_8 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_c8_vanilla (gfc_array_c8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_8 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_c8 (gfc_array_c8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_8 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_c8 (gfc_array_c8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_8 *restrict abase_x; diff --git a/libgfortran/generated/matmul_i1.c b/libgfortran/generated/matmul_i1.c index 49b0fbad211..72c4cee7d18 100644 --- a/libgfortran/generated/matmul_i1.c +++ b/libgfortran/generated/matmul_i1.c @@ -590,20 +590,6 @@ matmul_i1_avx (gfc_array_i1 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_1 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_i1_avx (gfc_array_i1 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_1 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_i1_avx2 (gfc_array_i1 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_1 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_i1_avx2 (gfc_array_i1 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_1 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_i1_avx512f (gfc_array_i1 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_1 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_i1_avx512f (gfc_array_i1 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_1 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_i1_vanilla (gfc_array_i1 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_1 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_i1_vanilla (gfc_array_i1 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_1 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_i1 (gfc_array_i1 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_1 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_i1 (gfc_array_i1 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_1 *restrict abase_x; diff --git a/libgfortran/generated/matmul_i16.c b/libgfortran/generated/matmul_i16.c index 4e1d837682b..81586844098 100644 --- a/libgfortran/generated/matmul_i16.c +++ b/libgfortran/generated/matmul_i16.c @@ -590,20 +590,6 @@ matmul_i16_avx (gfc_array_i16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_16 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_i16_avx (gfc_array_i16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_16 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_i16_avx2 (gfc_array_i16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_16 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_i16_avx2 (gfc_array_i16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_16 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_i16_avx512f (gfc_array_i16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_16 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_i16_avx512f (gfc_array_i16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_16 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_i16_vanilla (gfc_array_i16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_16 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_i16_vanilla (gfc_array_i16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_16 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_i16 (gfc_array_i16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_16 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_i16 (gfc_array_i16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_16 *restrict abase_x; diff --git a/libgfortran/generated/matmul_i2.c b/libgfortran/generated/matmul_i2.c index 191298708dc..1320a2e3ec2 100644 --- a/libgfortran/generated/matmul_i2.c +++ b/libgfortran/generated/matmul_i2.c @@ -590,20 +590,6 @@ matmul_i2_avx (gfc_array_i2 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_2 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_i2_avx (gfc_array_i2 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_2 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_i2_avx2 (gfc_array_i2 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_2 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_i2_avx2 (gfc_array_i2 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_2 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_i2_avx512f (gfc_array_i2 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_2 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_i2_avx512f (gfc_array_i2 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_2 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_i2_vanilla (gfc_array_i2 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_2 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_i2_vanilla (gfc_array_i2 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_2 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_i2 (gfc_array_i2 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_2 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_i2 (gfc_array_i2 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_2 *restrict abase_x; diff --git a/libgfortran/generated/matmul_i4.c b/libgfortran/generated/matmul_i4.c index ab14a0a3ff3..4ee22218d2c 100644 --- a/libgfortran/generated/matmul_i4.c +++ b/libgfortran/generated/matmul_i4.c @@ -590,20 +590,6 @@ matmul_i4_avx (gfc_array_i4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_4 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_i4_avx (gfc_array_i4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_4 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_i4_avx2 (gfc_array_i4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_4 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_i4_avx2 (gfc_array_i4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_4 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_i4_avx512f (gfc_array_i4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_4 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_i4_avx512f (gfc_array_i4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_4 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_i4_vanilla (gfc_array_i4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_4 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_i4_vanilla (gfc_array_i4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_4 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_i4 (gfc_array_i4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_4 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_i4 (gfc_array_i4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_4 *restrict abase_x; diff --git a/libgfortran/generated/matmul_i8.c b/libgfortran/generated/matmul_i8.c index bc627e189fe..b68a27f76ad 100644 --- a/libgfortran/generated/matmul_i8.c +++ b/libgfortran/generated/matmul_i8.c @@ -590,20 +590,6 @@ matmul_i8_avx (gfc_array_i8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_8 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_i8_avx (gfc_array_i8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_8 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_i8_avx2 (gfc_array_i8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_8 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_i8_avx2 (gfc_array_i8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_8 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_i8_avx512f (gfc_array_i8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_8 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_i8_avx512f (gfc_array_i8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_8 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_i8_vanilla (gfc_array_i8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_8 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_i8_vanilla (gfc_array_i8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_8 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_i8 (gfc_array_i8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_8 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_i8 (gfc_array_i8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_8 *restrict abase_x; diff --git a/libgfortran/generated/matmul_r10.c b/libgfortran/generated/matmul_r10.c index b5e63be2448..859c5a56747 100644 --- a/libgfortran/generated/matmul_r10.c +++ b/libgfortran/generated/matmul_r10.c @@ -590,20 +590,6 @@ matmul_r10_avx (gfc_array_r10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_10 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_r10_avx (gfc_array_r10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_10 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_r10_avx2 (gfc_array_r10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_10 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_r10_avx2 (gfc_array_r10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_10 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_r10_avx512f (gfc_array_r10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_10 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_r10_avx512f (gfc_array_r10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_10 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_r10_vanilla (gfc_array_r10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_10 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_r10_vanilla (gfc_array_r10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_10 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_r10 (gfc_array_r10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_10 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_r10 (gfc_array_r10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_10 *restrict abase_x; diff --git a/libgfortran/generated/matmul_r16.c b/libgfortran/generated/matmul_r16.c index 4e6c66bb8f3..b2fc7f86149 100644 --- a/libgfortran/generated/matmul_r16.c +++ b/libgfortran/generated/matmul_r16.c @@ -590,20 +590,6 @@ matmul_r16_avx (gfc_array_r16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_16 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_r16_avx (gfc_array_r16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_16 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_r16_avx2 (gfc_array_r16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_16 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_r16_avx2 (gfc_array_r16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_16 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_r16_avx512f (gfc_array_r16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_16 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_r16_avx512f (gfc_array_r16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_16 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_r16_vanilla (gfc_array_r16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_16 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_r16_vanilla (gfc_array_r16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_16 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_r16 (gfc_array_r16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_16 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_r16 (gfc_array_r16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_16 *restrict abase_x; diff --git a/libgfortran/generated/matmul_r4.c b/libgfortran/generated/matmul_r4.c index 202634b55d1..11d785f04c4 100644 --- a/libgfortran/generated/matmul_r4.c +++ b/libgfortran/generated/matmul_r4.c @@ -590,20 +590,6 @@ matmul_r4_avx (gfc_array_r4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_4 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_r4_avx (gfc_array_r4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_4 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_r4_avx2 (gfc_array_r4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_4 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_r4_avx2 (gfc_array_r4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_4 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_r4_avx512f (gfc_array_r4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_4 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_r4_avx512f (gfc_array_r4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_4 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_r4_vanilla (gfc_array_r4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_4 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_r4_vanilla (gfc_array_r4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_4 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_r4 (gfc_array_r4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_4 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_r4 (gfc_array_r4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_4 *restrict abase_x; diff --git a/libgfortran/generated/matmul_r8.c b/libgfortran/generated/matmul_r8.c index 22c24e50c37..6aae02f8798 100644 --- a/libgfortran/generated/matmul_r8.c +++ b/libgfortran/generated/matmul_r8.c @@ -590,20 +590,6 @@ matmul_r8_avx (gfc_array_r8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_8 *restrict bbase_y; @@ -618,6 +604,20 @@ matmul_r8_avx (gfc_array_r8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_8 *restrict abase_x; @@ -1158,20 +1158,6 @@ matmul_r8_avx2 (gfc_array_r8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_8 *restrict bbase_y; @@ -1186,6 +1172,20 @@ matmul_r8_avx2 (gfc_array_r8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_8 *restrict abase_x; @@ -1726,20 +1726,6 @@ matmul_r8_avx512f (gfc_array_r8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_8 *restrict bbase_y; @@ -1754,6 +1740,20 @@ matmul_r8_avx512f (gfc_array_r8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_8 *restrict abase_x; @@ -2308,20 +2308,6 @@ matmul_r8_vanilla (gfc_array_r8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_8 *restrict bbase_y; @@ -2336,6 +2322,20 @@ matmul_r8_vanilla (gfc_array_r8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_8 *restrict abase_x; @@ -2949,20 +2949,6 @@ matmul_r8 (gfc_array_r8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_8 *restrict bbase_y; @@ -2977,6 +2963,20 @@ matmul_r8 (gfc_array_r8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_8 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_c10.c b/libgfortran/generated/matmulavx128_c10.c index b5ffd030d4a..d0b417c39fd 100644 --- a/libgfortran/generated/matmulavx128_c10.c +++ b/libgfortran/generated/matmulavx128_c10.c @@ -555,20 +555,6 @@ matmul_c10_avx128_fma3 (gfc_array_c10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_10 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_c10_avx128_fma3 (gfc_array_c10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_10 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_c10_avx128_fma4 (gfc_array_c10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_10 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_c10_avx128_fma4 (gfc_array_c10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_10 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_c16.c b/libgfortran/generated/matmulavx128_c16.c index 32a355e424d..0137ba550e4 100644 --- a/libgfortran/generated/matmulavx128_c16.c +++ b/libgfortran/generated/matmulavx128_c16.c @@ -555,20 +555,6 @@ matmul_c16_avx128_fma3 (gfc_array_c16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_16 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_c16_avx128_fma3 (gfc_array_c16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_16 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_c16_avx128_fma4 (gfc_array_c16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_16 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_c16_avx128_fma4 (gfc_array_c16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_16 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_c4.c b/libgfortran/generated/matmulavx128_c4.c index 97b53d3300f..850bd2ba1db 100644 --- a/libgfortran/generated/matmulavx128_c4.c +++ b/libgfortran/generated/matmulavx128_c4.c @@ -555,20 +555,6 @@ matmul_c4_avx128_fma3 (gfc_array_c4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_4 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_c4_avx128_fma3 (gfc_array_c4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_4 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_c4_avx128_fma4 (gfc_array_c4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_4 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_c4_avx128_fma4 (gfc_array_c4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_4 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_c8.c b/libgfortran/generated/matmulavx128_c8.c index e73575e3b63..49d8b446ad9 100644 --- a/libgfortran/generated/matmulavx128_c8.c +++ b/libgfortran/generated/matmulavx128_c8.c @@ -555,20 +555,6 @@ matmul_c8_avx128_fma3 (gfc_array_c8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_8 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_c8_avx128_fma3 (gfc_array_c8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_8 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_c8_avx128_fma4 (gfc_array_c8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_COMPLEX_8 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_c8_avx128_fma4 (gfc_array_c8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_COMPLEX_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_COMPLEX_8 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_i1.c b/libgfortran/generated/matmulavx128_i1.c index 00885fa3139..8fc6d921b00 100644 --- a/libgfortran/generated/matmulavx128_i1.c +++ b/libgfortran/generated/matmulavx128_i1.c @@ -555,20 +555,6 @@ matmul_i1_avx128_fma3 (gfc_array_i1 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_1 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_i1_avx128_fma3 (gfc_array_i1 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_1 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_i1_avx128_fma4 (gfc_array_i1 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_1 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_i1_avx128_fma4 (gfc_array_i1 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_1)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_1 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_i16.c b/libgfortran/generated/matmulavx128_i16.c index 942dc08fdb5..a3495570d52 100644 --- a/libgfortran/generated/matmulavx128_i16.c +++ b/libgfortran/generated/matmulavx128_i16.c @@ -555,20 +555,6 @@ matmul_i16_avx128_fma3 (gfc_array_i16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_16 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_i16_avx128_fma3 (gfc_array_i16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_16 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_i16_avx128_fma4 (gfc_array_i16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_16 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_i16_avx128_fma4 (gfc_array_i16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_16 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_i2.c b/libgfortran/generated/matmulavx128_i2.c index baa3c9fba2e..944eaf08cd1 100644 --- a/libgfortran/generated/matmulavx128_i2.c +++ b/libgfortran/generated/matmulavx128_i2.c @@ -555,20 +555,6 @@ matmul_i2_avx128_fma3 (gfc_array_i2 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_2 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_i2_avx128_fma3 (gfc_array_i2 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_2 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_i2_avx128_fma4 (gfc_array_i2 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_2 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_i2_avx128_fma4 (gfc_array_i2 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_2)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_2 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_i4.c b/libgfortran/generated/matmulavx128_i4.c index 0c69623ba72..a8e270dd97c 100644 --- a/libgfortran/generated/matmulavx128_i4.c +++ b/libgfortran/generated/matmulavx128_i4.c @@ -555,20 +555,6 @@ matmul_i4_avx128_fma3 (gfc_array_i4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_4 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_i4_avx128_fma3 (gfc_array_i4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_4 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_i4_avx128_fma4 (gfc_array_i4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_4 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_i4_avx128_fma4 (gfc_array_i4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_4 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_i8.c b/libgfortran/generated/matmulavx128_i8.c index f8670020caa..9c7f4925687 100644 --- a/libgfortran/generated/matmulavx128_i8.c +++ b/libgfortran/generated/matmulavx128_i8.c @@ -555,20 +555,6 @@ matmul_i8_avx128_fma3 (gfc_array_i8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_8 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_i8_avx128_fma3 (gfc_array_i8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_8 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_i8_avx128_fma4 (gfc_array_i8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_INTEGER_8 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_i8_avx128_fma4 (gfc_array_i8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_INTEGER_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_INTEGER_8 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_r10.c b/libgfortran/generated/matmulavx128_r10.c index 24fb2972d1d..e2a44cf7e0d 100644 --- a/libgfortran/generated/matmulavx128_r10.c +++ b/libgfortran/generated/matmulavx128_r10.c @@ -555,20 +555,6 @@ matmul_r10_avx128_fma3 (gfc_array_r10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_10 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_r10_avx128_fma3 (gfc_array_r10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_10 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_r10_avx128_fma4 (gfc_array_r10 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_10 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_r10_avx128_fma4 (gfc_array_r10 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_10)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_10 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_r16.c b/libgfortran/generated/matmulavx128_r16.c index 231d04db0ad..186b226ebc7 100644 --- a/libgfortran/generated/matmulavx128_r16.c +++ b/libgfortran/generated/matmulavx128_r16.c @@ -555,20 +555,6 @@ matmul_r16_avx128_fma3 (gfc_array_r16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_16 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_r16_avx128_fma3 (gfc_array_r16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_16 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_r16_avx128_fma4 (gfc_array_r16 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_16 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_r16_avx128_fma4 (gfc_array_r16 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_16)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_16 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_r4.c b/libgfortran/generated/matmulavx128_r4.c index c58228017bf..e21ea39f124 100644 --- a/libgfortran/generated/matmulavx128_r4.c +++ b/libgfortran/generated/matmulavx128_r4.c @@ -555,20 +555,6 @@ matmul_r4_avx128_fma3 (gfc_array_r4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_4 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_r4_avx128_fma3 (gfc_array_r4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_4 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_r4_avx128_fma4 (gfc_array_r4 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_4 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_r4_avx128_fma4 (gfc_array_r4 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_4)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_4 *restrict abase_x; diff --git a/libgfortran/generated/matmulavx128_r8.c b/libgfortran/generated/matmulavx128_r8.c index e93aeec8910..e7efd075889 100644 --- a/libgfortran/generated/matmulavx128_r8.c +++ b/libgfortran/generated/matmulavx128_r8.c @@ -555,20 +555,6 @@ matmul_r8_avx128_fma3 (gfc_array_r8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_8 *restrict bbase_y; @@ -583,6 +569,20 @@ matmul_r8_avx128_fma3 (gfc_array_r8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_8 *restrict abase_x; @@ -1124,20 +1124,6 @@ matmul_r8_avx128_fma4 (gfc_array_r8 * const restrict retarray, } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const GFC_REAL_8 *restrict bbase_y; @@ -1152,6 +1138,20 @@ matmul_r8_avx128_fma4 (gfc_array_r8 * const restrict retarray, dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = (GFC_REAL_8)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const GFC_REAL_8 *restrict abase_x; diff --git a/libgfortran/m4/matmul_internal.m4 b/libgfortran/m4/matmul_internal.m4 index 32a1e01e12f..13fd7696238 100644 --- a/libgfortran/m4/matmul_internal.m4 +++ b/libgfortran/m4/matmul_internal.m4 @@ -506,20 +506,6 @@ sinclude(`matmul_asm_'rtype_code`.m4')dnl } } } - else if (axstride < aystride) - { - for (y = 0; y < ycount; y++) - for (x = 0; x < xcount; x++) - dest[x*rxstride + y*rystride] = ('rtype_name`)0; - - for (y = 0; y < ycount; y++) - for (n = 0; n < count; n++) - for (x = 0; x < xcount; x++) - /* dest[x,y] += a[x,n] * b[n,y] */ - dest[x*rxstride + y*rystride] += - abase[x*axstride + n*aystride] * - bbase[n*bxstride + y*bystride]; - } else if (GFC_DESCRIPTOR_RANK (a) == 1) { const 'rtype_name` *restrict bbase_y; @@ -534,6 +520,20 @@ sinclude(`matmul_asm_'rtype_code`.m4')dnl dest[y*rxstride] = s; } } + else if (axstride < aystride) + { + for (y = 0; y < ycount; y++) + for (x = 0; x < xcount; x++) + dest[x*rxstride + y*rystride] = ('rtype_name`)0; + + for (y = 0; y < ycount; y++) + for (n = 0; n < count; n++) + for (x = 0; x < xcount; x++) + /* dest[x,y] += a[x,n] * b[n,y] */ + dest[x*rxstride + y*rystride] += + abase[x*axstride + n*aystride] * + bbase[n*bxstride + y*bystride]; + } else { const 'rtype_name` *restrict abase_x; --rekceb-4ee45564-bce9-4713-9095-c9ce18a0de9d--