From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-299774-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 21043 invoked by alias); 20 Nov 2009 13:45:46 -0000
Received: (qmail 20846 invoked by uid 48); 20 Nov 2009 13:45:10 -0000
Date: Fri, 20 Nov 2009 13:45:00 -0000
Message-ID: <20091120134510.20845.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References: <bug-42108-9410@http.gcc.gnu.org/bugzilla/>
Subject: [Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression
In-Reply-To: <bug-42108-9410@http.gcc.gnu.org/bugzilla/>
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "dominiq at lps dot ens dot fr" <gcc-bugzilla@gcc.gnu.org>
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2009-11/txt/msg01687.txt.bz2


------- Comment #9 from dominiq at lps dot ens dot fr  2009-11-20 13:45 -------
I am rather confused by some comments:

(1) Although I am not fluent in x86 assembly, I am pretty sure that no code
in eval is vectorized (assembly taken from this PR or from the original post
http://gcc.gnu.org/ml/fortran/2009-11/msg00163.html).

(2) If I am not mistaken, the k loop always handles 3 elements: i, i+n, and
i+2*n.

(3) On a 2.1 GHz Core 2 Duo, I see only small changes in the timing between
4.3.4 and trunk, between -O1 and -O3, and between 32- and 64-bit mode.

Now, if I make the following change:

--- pr42108_1_db.f90    2009-11-20 14:14:05.000000000 +0100
+++ pr42108_1_db_1.f90  2009-11-20 14:15:24.000000000 +0100
@@ -7,12 +7,10 @@ subroutine  eval(foo1,foo2,foo3,foo4,x,n
   do i=2,n
     foo3(i)=foo2*foo4(i)
     do  j=1,i-1
-      temp=0.0d0
-      jmini=j-i
-      do  k=i,nnd,n
-        temp=temp+(x(k)-x(k+jmini))**2
-      end do
-      temp = sqrt(temp+foo1)
+      temp = sqrt( (x(i) - x(j))**2 &
+                  +(x(i+n) - x(j+n))**2 &
+                  +(x(i+2*n)-x(j+2*n))**2 &
+                  +foo1)
       foo3(i)=foo3(i)+temp*foo4(j)
       foo3(j)=foo3(j)+temp*foo4(i)
     end do

I go from 9.2 s to 5.5 s for n=20000. So the k loop is not automatically
unrolled, even with -funroll-loops.
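
For anyone who wants to check that the two forms in the patch above compute
the same value, here is a minimal self-contained sketch. It assumes that
nnd = 3*n, which is consistent with point (2) but not shown in the hunk. The
program name, the small n, the value of foo1, and the contents of x are made
up for illustration and are not taken from the test case.

```fortran
program check_unroll
  implicit none
  integer, parameter :: dp  = kind(1.0d0)
  integer, parameter :: n   = 5        ! small illustrative size (assumption)
  integer, parameter :: nnd = 3*n      ! assumption: x holds 3 blocks of n values
  real(dp) :: x(nnd), foo1, t_loop, t_flat
  integer  :: i, j, k, jmini

  foo1 = 0.5d0                         ! arbitrary illustrative value
  do k = 1, nnd
    x(k) = sin(real(k, dp))            ! arbitrary illustrative data
  end do

  do i = 2, n
    do j = 1, i-1
      ! Original form: strided k loop visiting i, i+n, i+2*n
      t_loop = 0.0d0
      jmini = j - i
      do k = i, nnd, n
        t_loop = t_loop + (x(k) - x(k+jmini))**2
      end do
      t_loop = sqrt(t_loop + foo1)

      ! Manually unrolled form from the patch
      t_flat = sqrt( (x(i)     - x(j))**2     &
                    +(x(i+n)   - x(j+n))**2   &
                    +(x(i+2*n) - x(j+2*n))**2 &
                    +foo1)

      ! Allow a small tolerance in case the compiler reorders the sum
      if (abs(t_loop - t_flat) > 1.0d-12) then
        print *, 'mismatch at i, j =', i, j
        stop 1
      end if
    end do
  end do

  print *, 'loop and unrolled forms agree'
end program check_unroll
```

With i <= n and nnd = 3*n, the k loop runs exactly three times (k = i, i+n,
i+2*n) and k+jmini takes the values j, j+n, j+2*n, which is what the unrolled
expression spells out.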


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108