From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-305894-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 1580 invoked by alias); 28 Oct 2011 16:32:30 -0000
Received: (qmail 1531 invoked by uid 22791); 28 Oct 2011 16:32:24 -0000
X-SWARE-Spam-Status: No, hits=-1.1 required=5.0	tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from smtp21.services.sfr.fr (HELO smtp21.services.sfr.fr) (93.17.128.1)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 28 Oct 2011 16:32:08 +0000
Received: from filter.sfr.fr (localhost [127.0.0.1])	by msfrf2103.sfr.fr (SMTP Server) with ESMTP id 7DC587000348;	Fri, 28 Oct 2011 18:32:04 +0200 (CEST)
Received: from gimli.local (125.123.193.77.rev.sfr.net [77.193.123.125])	by msfrf2103.sfr.fr (SMTP Server) with ESMTP id 2380370002EC;	Fri, 28 Oct 2011 18:32:04 +0200 (CEST)
X-SFR-UUID: 20111028163204145.2380370002EC@msfrf2103.sfr.fr
From: Mikael Morin <mikael.morin@sfr.fr>
To: fortran@gcc.gnu.org
Subject: Re: [Patch, fortran] [00/66] PR fortran/43829 Inline sum and product (AKA scalarization of reductions)
Date: Fri, 28 Oct 2011 17:25:00 -0000
User-Agent: KMail/1.13.5 (FreeBSD/8.2-PRERELEASE; KDE/4.5.5; amd64; ; )
Cc: Jack Howarth <howarth@bromo.med.uc.edu>, GCC patches <gcc-patches@gcc.gnu.org>
References: <20111027232818.18581.901@gimli.local> <20111028135636.GB32273@bromo.med.uc.edu>
In-Reply-To: <20111028135636.GB32273@bromo.med.uc.edu>
MIME-Version: 1.0
Content-Type: Multipart/Mixed;  boundary="Boundary-00=_ritqOtWC3RXraLM"
Message-Id: <201110281830.35708.mikael.morin@sfr.fr>
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2011-10/txt/msg02692.txt.bz2


--Boundary-00=_ritqOtWC3RXraLM
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-length: 1813

On Friday 28 October 2011 15:56:36 Jack Howarth wrote:
> Mikael,
>     The complete patch bootstraps current FSF gcc trunk on
> x86_64-apple-darwin11 and the resulting gfortran compiler can compile the
> Polyhedron 2005 benchmarks using...
> 
> Compile Command : gfortran-fsf-4.7 -O3 -ffast-math -funroll-loops -flto
> -fwhole-program %n.f90 -o %n
> 
> without runtime regressions. However I don't seem to see any particular
> performance improvements with your patches applied. In fact, a few
> benchmarks including nf and test_fpu seem to show slower runtimes
> (~8-11%). Have you done any benchmarking with and without the proposed
> patches? Jack

Not myself, but previous versions of the patch were reported to give a
noticeable improvement on "tonto" here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c26
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c35

Since those versions, the array constructor handling has been improved, and a 
few mostly cosmetic changes have been applied, so I expect the posted patch to 
be on par with the previous ones, possibly slightly better.

Now, regarding your regressions: that is a lot worse, and quite unexpected.
I have just looked at test_fpu.f90 and nf.f90 from a copy of the Polyhedron
sources that I found at
http://www.polyhedron.com/web_images/documents/pb05.zip.
Neither file calls product, and both use only single-argument sum calls,
which are not (or shouldn't be) affected by my patch (scalar cases).
Indeed, if I compare the code produced with -fdump-tree-original, there is
zero difference for nf.f90, and for test_fpu.f90 only slight variations,
which are very unlikely to cause the regression you see (see attached diff).
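
For anyone who wants to repeat the comparison, here is a minimal sketch of
one way to do it. It is not a record of the exact commands used. The
helper name compare_dumps, the directory names and the compiler paths are
placeholders made up for illustration, and the pass number in the dump
file name (003t here, as in the attached diff) may differ between compiler
versions.

```shell
#!/bin/sh
# Sketch only: compare two -fdump-tree-original dumps of the same source.
#
# The dumps are assumed to have been produced beforehand, for example
# (hypothetical compiler paths and directory names):
#   (cd master  && /path/to/master/gfortran  -c -fdump-tree-original ../test_fpu.f90)
#   (cd patched && /path/to/patched/gfortran -c -fdump-tree-original ../test_fpu.f90)

# compare_dumps FILE_A FILE_B
# Prints a unified diff of the two files followed by a one-word verdict.
# Returns 0 if the files are identical, 1 if they differ, 2 if diff failed.
compare_dumps() {
    diff -u "$1" "$2"
    case $? in
        0) echo "identical"; return 0 ;;
        1) echo "different"; return 1 ;;
        *) echo "diff failed" >&2; return 2 ;;
    esac
}
```

With the dumps in place, the call would look something like
  compare_dumps master/test_fpu.f90.003t.original patched/test_fpu.f90.003t.original
and the same for nf.f90.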

Could you double-check your figures, and/or that the regressions are really
caused by my patch?

Mikael

--Boundary-00=_ritqOtWC3RXraLM
Content-Type: text/x-patch;
  charset="utf-8";
  name="test_fpu.f90.003t.original.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="test_fpu.f90.003t.original.diff"
Content-length: 4850

--- test_fpu.f90.003t.original.master	2011-10-28 18:08:53.000000000 +0200
+++ test_fpu.f90.003t.original.patched	2011-10-28 18:22:28.000000000 +0200
@@ -1929,6 +1929,7 @@
                       D.2297 = offset.65 + -1;
                       atmp.64.dim[0].ubound = D.2297;
                       pos.61 = D.2297 >= 0 ? 1 : 0;
+                      offset.62 = 1;
                       {
                         integer(kind=8) S.67;
 
@@ -1936,7 +1937,6 @@
                         while (1)
                           {
                             if (S.67 > D.2297) goto L.133;
-                            offset.62 = 1;
                             if (ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]> > limit.63)
                               {
                                 limit.63 = ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]>;
@@ -2406,14 +2406,14 @@
                           integer(kind=8) D.2457;
                           integer(kind=8) S.104;
 
-                          D.2457 = D.2436 + D.2442;
-                          D.2458 = stride.45;
+                          D.2457 = stride.45;
+                          D.2458 = D.2436 + D.2442;
                           D.2459 = D.2443 * stride.45 + D.2439;
                           S.104 = 0;
                           while (1)
                             {
                               if (S.104 > D.2444) goto L.149;
-                              (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2458 + D.2457];
+                              (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2457 + D.2458];
                               S.104 = S.104 + 1;
                             }
                           L.149:;
@@ -2486,13 +2486,13 @@
                           integer(kind=8) D.2479;
                           integer(kind=8) S.106;
 
-                          D.2479 = D.2473 + D.2476;
-                          D.2480 = stride.45;
+                          D.2479 = stride.45;
+                          D.2480 = D.2473 + D.2476;
                           S.106 = D.2471;
                           while (1)
                             {
                               if (S.106 > D.2472) goto L.152;
-                              (*b)[(S.106 + D.2477) * D.2480 + D.2479] = (*temp)[S.106 + -1];
+                              (*b)[(S.106 + D.2477) * D.2479 + D.2480] = (*temp)[S.106 + -1];
                               S.106 = S.106 + 1;
                             }
                           L.152:;
@@ -2756,13 +2756,13 @@
                       integer(kind=8) D.2549;
                       integer(kind=8) S.112;
 
-                      D.2549 = D.2543 + D.2546;
-                      D.2550 = stride.45;
+                      D.2549 = stride.45;
+                      D.2550 = D.2543 + D.2546;
                       S.112 = 1;
                       while (1)
                         {
                           if (S.112 > D.2542) goto L.168;
-                          (*b)[(S.112 + D.2547) * D.2550 + D.2549] = (*temp)[S.112 + -1];
+                          (*b)[(S.112 + D.2547) * D.2549 + D.2550] = (*temp)[S.112 + -1];
                           S.112 = S.112 + 1;
                         }
                       L.168:;
@@ -2885,13 +2885,13 @@
                       integer(kind=8) D.2582;
                       integer(kind=8) S.115;
 
-                      D.2582 = D.2575 + D.2579;
-                      D.2583 = stride.45;
+                      D.2582 = stride.45;
+                      D.2583 = D.2575 + D.2579;
                       S.115 = 1;
                       while (1)
                         {
                           if (S.115 > D.2578) goto L.176;
-                          (*temp)[S.115 + -1] = (*b)[(S.115 + D.2580) * D.2583 + D.2582];
+                          (*temp)[S.115 + -1] = (*b)[(S.115 + D.2580) * D.2582 + D.2583];
                           S.115 = S.115 + 1;
                         }
                       L.176:;
@@ -3348,6 +3348,7 @@
                       D.2733 = (integer(kind=8)) *n;
                       D.2734 = (integer(kind=8)) k;
                       pos.146 = D.2732 <= D.2733 ? 1 : 0;
+                      offset.147 = 1 - D.2732;
                       {
                         integer(kind=8) D.2736;
                         integer(kind=8) S.149;
@@ -3357,7 +3358,6 @@
                         while (1)
                           {
                             if (S.149 > D.2733) goto L.191;
-                            offset.147 = 1 - D.2732;
                             if (ABS_EXPR <(*b)[S.149 + D.2736]> > limit.148)
                               {
                                 limit.148 = ABS_EXPR <(*b)[S.149 + D.2736]>;

--Boundary-00=_ritqOtWC3RXraLM--