From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4A8BFF5F.5070708@inria.fr>
Date: Wed, 19 Aug 2009 13:56:00 -0000
From: Albert Cohen <Albert.Cohen@inria.fr>
User-Agent: Mozilla-Thunderbird 2.0.0.19 (X11/20090103)
MIME-Version: 1.0
To: Richard Guenther <richard.guenther@gmail.com>
CC: gcc@gcc.gnu.org
Subject: Re: complete_unrolli / complete_unroll
References: <E1MdhYW-0005WQ-00.aserg2004-list-ru@f143.mail.ru>	 <4A8BE7B2.7080708@inria.fr> <84fc9c000908190507x708772cdueb8193a0ff7517aa@mail.gmail.com>
In-Reply-To: <84fc9c000908190507x708772cdueb8193a0ff7517aa@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2009-08/txt/msg00339.txt.bz2

Richard Guenther wrote:
> 2009/8/19 Albert Cohen <Albert.Cohen@inria.fr>:
>> When debugging graphite, we ran into code bloat issues due to
>> pass_complete_unrolli being called very early in the non-ipa
>> optimization sequence. Much later, the full-blown pass_complete_unroll
>> is scheduled, and this one does not do any harm.
>>
>> Strangely, this early unrolling pass (tuned to only unroll inner loops)
>> is only enabled at -O3, independently of the -funroll-loops flag.
>>
>> Does anyone remember why it is there, for which platform it is useful,
>> and what are the perf regressions if we remove it?
> 
> The early loop unrolling pass is very important to remove abstraction
> penalty for C++ programs that chose not to implement manual
> unrolling by relying on the inliner and template metaprogramming.
> 
> In tramp3d, for example, you see (very much simplified, intermediate
> state after some inlining):
> 
>  foo (int i, int j, int k)
> {
>  double a[][][];
>  int index[3];
>  const int dX[3] = { 1, 0, 0 };
> ...
>  for (m=0; m<3; ++m)
>   index[m] = 0;
>  index[0] = i;
>  index[1] = j;
>  index[2] = k;
>   ... a[index[0]][index[1]][index[2]];
>  for (m=0; m<3; ++m)
>   index[m] += dX[m];
> ... a[index[0]][index[1]][index[2]];
> 
> etc. to access a[i][j][k] and a[i+1][j][k].
> 
> There is an absolute need to unroll these simple loops before
> CSE; otherwise loop optimizations have no chance of optimizing
> anything here.
> 
> Another benchmark that degrades considerably without early
> unrolling is 454.calculix (in fact that one was the reason to
> add this pass).
> 
>> My guess is that it may only harm... disabling or damaging the
>> effectiveness of the (loop-level) vectorizer and increasing compilation
>> time.
> 
> No, it definitely does not.  But it has one small issue in that it sometimes
> also unrolls an outermost loop IIRC; that could be fixed.

Thanks a lot for the quick and detailed response.

It is more difficult than I thought, then :-( We'll think more, and
maybe come up with yet another pass ordering proposal, but this tramp3d
code definitely deserves to be processed by graphite AFTER
unrolling+CSE have done their specialization trick.
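
To make the specialization concrete, here is a minimal sketch in C of the before/after shape one would hope for. It is a reconstruction for illustration only, not tramp3d source and not actual GCC output: the extent N, the function names, and the fill pattern are all assumptions, since the original example elides the array bounds and the surrounding code.

```c
#include <assert.h>

enum { N = 4 };  /* hypothetical extent; the original example elides the bounds */

/* Shape before unrolling: the index arithmetic is hidden in 3-trip loops. */
static double sum_looped(double a[N][N][N], int i, int j, int k)
{
    int index[3];
    const int dX[3] = { 1, 0, 0 };
    int m;
    double s;

    for (m = 0; m < 3; ++m)
        index[m] = 0;
    index[0] = i;
    index[1] = j;
    index[2] = k;
    s = a[index[0]][index[1]][index[2]];   /* a[i][j][k] */

    for (m = 0; m < 3; ++m)
        index[m] += dX[m];
    s += a[index[0]][index[1]][index[2]];  /* a[i+1][j][k] */
    return s;
}

/* Shape hoped for once both loops are fully unrolled and index[] and dX[]
   have been folded away by CSE and constant propagation. */
static double sum_specialized(double a[N][N][N], int i, int j, int k)
{
    return a[i][j][k] + a[i + 1][j][k];
}

/* Fill with a distinct, exactly representable value per element. */
static void fill(double a[N][N][N])
{
    int x, y, z;
    for (x = 0; x < N; ++x)
        for (y = 0; y < N; ++y)
            for (z = 0; z < N; ++z)
                a[x][y][z] = 100.0 * x + 10.0 * y + z;
}

/* Both versions must agree wherever i + 1 stays in bounds. */
static void check_equivalence(void)
{
    double a[N][N][N];
    int i, j, k;

    fill(a);
    for (i = 0; i + 1 < N; ++i)
        for (j = 0; j < N; ++j)
            for (k = 0; k < N; ++k)
                assert(sum_looped(a, i, j, k) == sum_specialized(a, i, j, k));
}
```

Only after the specialized form is reached do the two accesses appear as plain affine subscripts of i, j, k, which is presumably the form a later loop pass would want to see.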

Albert