From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-156033-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 16720 invoked by alias); 19 Aug 2009 11:53:34 -0000
Received: (qmail 16635 invoked by uid 22791); 19 Aug 2009 11:53:33 -0000
X-SWARE-Spam-Status: No, hits=-1.9 required=5.0 	tests=AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from mail1-relais-roc.national.inria.fr (HELO mail1-relais-roc.national.inria.fr) (192.134.164.82)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 19 Aug 2009 11:53:25 +0000
Received: from gaia.futurs.inria.fr (HELO [195.83.212.216]) ([195.83.212.216])   by mail1-relais-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 19 Aug 2009 13:53:22 +0200
Message-ID: <4A8BE7B2.7080708@inria.fr>
Date: Wed, 19 Aug 2009 12:13:00 -0000
From: Albert Cohen <Albert.Cohen@inria.fr>
User-Agent: Mozilla-Thunderbird 2.0.0.19 (X11/20090103)
MIME-Version: 1.0
To: gcc@gcc.gnu.org
Subject: complete_unrolli / complete_unroll
References: <E1MdhYW-0005WQ-00.aserg2004-list-ru@f143.mail.ru>
In-Reply-To: <E1MdhYW-0005WQ-00.aserg2004-list-ru@f143.mail.ru>
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2009-08/txt/msg00331.txt.bz2

When debugging graphite, we ran into code bloat issues due to
pass_complete_unrolli being called very early in the non-ipa
optimization sequence. Much later, the full-blown pass_complete_unroll
is scheduled, and this one does not do any harm.

Strangely, this early unrolling pass (tuned to only unroll inner loops)
is only enabled at -O3, independently of the -funroll-loops flag.

Does anyone remember why it is there, which platforms it is useful for,
and what performance regressions we should expect if we remove it?

My guess is that it can only do harm: it disables or reduces the
effectiveness of the (loop-level) vectorizer and increases compilation
time.
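
To make the concern concrete, here is a minimal sketch in C of the kind of
loop nest I have in mind. The function names and the trip count of 4 are
made up for illustration; the second function is a hand-written
approximation of what complete unrolling of the inner loop produces, not
actual compiler output. Whether the vectorizer really does worse on the
second form depends on the GCC version and the target.

```c
/* Original nest: the inner loop has a small constant trip count, which
   makes it a candidate for complete unrolling of inner loops. */
void scale_rows(float *out, const float *in, float k, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < 4; j++)
            out[4 * i + j] = in[4 * i + j] * k;
}

/* Hand-unrolled equivalent (illustrative only): the inner loop is gone,
   so a loop-level vectorizer no longer has an inner loop to work on and
   the body of the remaining loop is four times larger. */
void scale_rows_unrolled(float *out, const float *in, float k, int n)
{
    for (int i = 0; i < n; i++) {
        out[4 * i + 0] = in[4 * i + 0] * k;
        out[4 * i + 1] = in[4 * i + 1] * k;
        out[4 * i + 2] = in[4 * i + 2] * k;
        out[4 * i + 3] = in[4 * i + 3] * k;
    }
}
```

Both functions compute the same result; the difference is only in the
loop structure that later passes get to see. Compiling the first one at
-O3 with -fdump-tree-all and comparing the dumps before and after the
early unrolling pass should show whether the inner loop survives.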

Thanks,
Albert

PS: Once this question is settled, it would also be interesting to start a
serious discussion on how to improve the flexibility in customizing pass
ordering and parameterization of passes depending on the target. Grigori
Fursin's work shows the strong benefits and already provides a working
prototype. This question is independent of whether the customization is
done by experts or machine-learning/statistical techniques.