From: Mikael Morin
To: fortran@gcc.gnu.org
Cc: Jack Howarth, GCC patches
Subject: Re: [Patch, fortran] [00/66] PR fortran/43829 Inline sum and product (AKA scalarization of reductions)
Date: Fri, 28 Oct 2011 17:25:00 -0000
Message-Id: <201110281830.35708.mikael.morin@sfr.fr>

On Friday 28 October 2011 15:56:36 Jack Howarth wrote:
> Mikael,
> The complete patch bootstraps current FSF gcc trunk on
> x86_64-apple-darwin11 and the resulting gfortran compiler can compile the
> Polyhedron 2005 benchmarks using...
>
> Compile Command : gfortran-fsf-4.7 -O3 -ffast-math -funroll-loops -flto
> -fwhole-program %n.f90 -o %n
>
> without runtime regressions. However I don't seem to see any particular
> performance improvements with your patches applied. In fact, a few
> benchmarks including nf and test_fpu seem to show slower runtimes
> (~8-11%). Have you done any benchmarking with and without the proposed
> patches?
> Jack

Not myself, but previous versions of the patch have been reported to give
a noticeable improvement on "tonto" here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c26
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c35

Since those versions, the array constructor handling has been improved and a
few mostly cosmetic changes have been applied, so I expect the posted patch to
be on par with the previous ones, possibly slightly better.

Now, regarding your regressions: an 8-11% slowdown is a lot, and it is quite
unexpected. I have just looked at test_fpu.f90 and nf.f90 from a Polyhedron
source archive I found at
http://www.polyhedron.com/web_images/documents/pb05.zip.
Neither of them calls PRODUCT, and both use only single-argument SUM calls.
Those are scalar cases, which are not (or should not be) affected by my patch.
Indeed, comparing the code produced with -fdump-tree-original, there is no
difference at all for nf.f90, and for test_fpu.f90 only slight variations
that are very unlikely to cause the regression you see (see the attached
diff).

Could you double-check your figures, and/or that the regressions are really
caused by my patch?
Mikael

Attachment: test_fpu.f90.003t.original.diff

--- test_fpu.f90.003t.original.master	2011-10-28 18:08:53.000000000 +0200
+++ test_fpu.f90.003t.original.patched	2011-10-28 18:22:28.000000000 +0200
@@ -1929,6 +1929,7 @@
       D.2297 = offset.65 + -1;
       atmp.64.dim[0].ubound = D.2297;
       pos.61 = D.2297 >= 0 ? 1 : 0;
+      offset.62 = 1;
       {
         integer(kind=8) S.67;
 
@@ -1936,7 +1937,6 @@
         while (1)
           {
             if (S.67 > D.2297) goto L.133;
-            offset.62 = 1;
             if (ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]> > limit.63)
               {
                 limit.63 = ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]>;
@@ -2406,14 +2406,14 @@
         integer(kind=8) D.2457;
         integer(kind=8) S.104;
 
-        D.2457 = D.2436 + D.2442;
-        D.2458 = stride.45;
+        D.2457 = stride.45;
+        D.2458 = D.2436 + D.2442;
         D.2459 = D.2443 * stride.45 + D.2439;
         S.104 = 0;
         while (1)
           {
             if (S.104 > D.2444) goto L.149;
-            (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2458 + D.2457];
+            (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2457 + D.2458];
             S.104 = S.104 + 1;
           }
         L.149:;
@@ -2486,13 +2486,13 @@
         integer(kind=8) D.2479;
         integer(kind=8) S.106;
 
-        D.2479 = D.2473 + D.2476;
-        D.2480 = stride.45;
+        D.2479 = stride.45;
+        D.2480 = D.2473 + D.2476;
         S.106 = D.2471;
         while (1)
           {
             if (S.106 > D.2472) goto L.152;
-            (*b)[(S.106 + D.2477) * D.2480 + D.2479] = (*temp)[S.106 + -1];
+            (*b)[(S.106 + D.2477) * D.2479 + D.2480] = (*temp)[S.106 + -1];
             S.106 = S.106 + 1;
           }
         L.152:;
@@ -2756,13 +2756,13 @@
         integer(kind=8) D.2549;
         integer(kind=8) S.112;
 
-        D.2549 = D.2543 + D.2546;
-        D.2550 = stride.45;
+        D.2549 = stride.45;
+        D.2550 = D.2543 + D.2546;
         S.112 = 1;
         while (1)
           {
             if (S.112 > D.2542) goto L.168;
-            (*b)[(S.112 + D.2547) * D.2550 + D.2549] = (*temp)[S.112 + -1];
+            (*b)[(S.112 + D.2547) * D.2549 + D.2550] = (*temp)[S.112 + -1];
             S.112 = S.112 + 1;
           }
         L.168:;
@@ -2885,13 +2885,13 @@
         integer(kind=8) D.2582;
         integer(kind=8) S.115;
 
-        D.2582 = D.2575 + D.2579;
-        D.2583 = stride.45;
+        D.2582 = stride.45;
+        D.2583 = D.2575 + D.2579;
         S.115 = 1;
         while (1)
           {
             if (S.115 > D.2578) goto L.176;
-            (*temp)[S.115 + -1] = (*b)[(S.115 + D.2580) * D.2583 + D.2582];
+            (*temp)[S.115 + -1] = (*b)[(S.115 + D.2580) * D.2582 + D.2583];
             S.115 = S.115 + 1;
           }
         L.176:;
@@ -3348,6 +3348,7 @@
       D.2733 = (integer(kind=8)) *n;
       D.2734 = (integer(kind=8)) k;
       pos.146 = D.2732 <= D.2733 ? 1 : 0;
+      offset.147 = 1 - D.2732;
       {
         integer(kind=8) D.2736;
         integer(kind=8) S.149;
@@ -3357,7 +3358,6 @@
         while (1)
           {
             if (S.149 > D.2733) goto L.191;
-            offset.147 = 1 - D.2732;
             if (ABS_EXPR <(*b)[S.149 + D.2736]> > limit.148)
               {
                 limit.148 = ABS_EXPR <(*b)[S.149 + D.2736]>;