Message-ID: <4A8BFF5F.5070708@inria.fr>
Date: Wed, 19 Aug 2009 13:56:00 -0000
From: Albert Cohen
To: Richard Guenther
CC: gcc@gcc.gnu.org
Subject: Re: complete_unrolli / complete_unroll
References: <4A8BE7B2.7080708@inria.fr> <84fc9c000908190507x708772cdueb8193a0ff7517aa@mail.gmail.com>
In-Reply-To: <84fc9c000908190507x708772cdueb8193a0ff7517aa@mail.gmail.com>

Richard Guenther wrote:
> 2009/8/19 Albert Cohen :
>> When debugging graphite, we ran into code bloat issues due to
>> pass_complete_unrolli being called very early in the non-ipa
>> optimization sequence. Much later, the full-blown pass_complete_unroll
>> is scheduled, and this one does not do any harm.
>>
>> Strangely, this early unrolling pass (tuned to only unroll inner loops)
>> is only enabled at -O3, independently of the -funroll-loops flag.
>>
>> Does anyone remember why it is there, for which platform it is useful,
>> and what are the perf regressions if we remove it?
>
> The early loop unrolling pass is very important to remove abstraction
> penalty for C++ programs that chose not to implement manual
> unrolling by relying on the inliner and template metaprogramming.
>
> In tramp3d, for example, you see (very much simplified, intermediate
> state after some inlining):
>
> foo (int i, int j, int k)
> {
>   double a[][][];
>   int index[3];
>   const int dX[3] = { 1, 0, 0 };
>   ...
>   for (m=0; m<3; ++m)
>     index[m] = 0;
>   index[0] = i;
>   index[1] = j;
>   index[2] = k;
>   ... a[index[0]][index[1]][index[2]];
>   for (m=0; m<3; ++m)
>     index[m] += dX[m];
>   ... a[index[0]][index[1]][index[2]];
>
> etc. to access a[i][j][k] and a[i+1][j][k].
>
> There is an absolute need to unroll these simple loops before
> CSE, otherwise loop optimizations have no chance of optimizing
> anything here.
>
> Another benchmark that degrades considerably without early
> unrolling is 454.calculix (in fact that one was the reason to
> add this pass).
>
>> My guess is that it may only harm... disabling or damaging the
>> effectiveness of the (loop-level) vectorizer and increasing compilation
>> time.
>
> No it definitely does not. But it has one small issue in that it sometimes
> also unrolls an outermost loop IIRC, that could be fixed.

Thanks a lot for the quick and detailed response. It is more difficult
than I thought, then :-(

We'll think more, and maybe come up with yet another pass ordering
proposal, but definitely this tramp3d code deserves to be processed by
graphite AFTER unrolling+cse has done its specialization trick.

Albert
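
For reference, the tramp3d pattern quoted above can be written out as a small self-contained sketch. This is an illustration only, not code from tramp3d or from GCC: the extent N and the function names access_rolled and access_unrolled are hypothetical, and the second function is merely the form one would expect once the two three-iteration loops are completely unrolled and constant propagation plus CSE have folded the index arithmetic.

```cpp
#include <cassert>

constexpr int N = 4;  // hypothetical array extent, chosen only for illustration

// Rolled form: roughly the shape left behind after inlining, with the
// index vector updated by tiny constant-trip-count loops.
double access_rolled(const double (&a)[N][N][N], int i, int j, int k)
{
    assert(i >= 0 && i + 1 < N && j >= 0 && j < N && k >= 0 && k < N);

    int index[3];
    const int dX[3] = {1, 0, 0};

    for (int m = 0; m < 3; ++m)
        index[m] = 0;
    index[0] = i;
    index[1] = j;
    index[2] = k;
    const double first = a[index[0]][index[1]][index[2]];   // a[i][j][k]

    for (int m = 0; m < 3; ++m)
        index[m] += dX[m];
    const double second = a[index[0]][index[1]][index[2]];  // a[i+1][j][k]

    return first + second;
}

// Hand-written equivalent: the index array and both loops are gone, so the
// two accesses are plain affine functions of i, j and k.
double access_unrolled(const double (&a)[N][N][N], int i, int j, int k)
{
    assert(i >= 0 && i + 1 < N && j >= 0 && j < N && k >= 0 && k < N);
    return a[i][j][k] + a[i + 1][j][k];
}
```

Both functions return the same value for every valid (i, j, k). Only in the second form are the subscripts visibly affine in i, j and k, which is the shape later loop passes need to see.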