Date: Thu, 16 May 2002 07:33:00 -0000
From: Jan Hubicka
To: Robert Dewar
Cc: dberlin@dberlin.org, mark@codesourcery.com, roger@eyesopen.com, aj@suse.de, davem@redhat.com, gcc@gcc.gnu.org, rth@redhat.com
Subject: -O2 versus -O1 (Was: Re: GCSE store motion)

> > That means we shouldn't be spending much time trying to do software
> > loop pipelining when compiling GCC, so the optimization shouldn't
> > make compiling the compiler significantly slower.
>
> I don't see how you conclude this. You have to do the analysis on every
> loop. There will definitely be loops in GCC where the optimization is
> possible, there will be loops where it is not. I would expect the
> compiler to spend quite a bit of time trying to improve code for
> loops in GCC. What I am saying is that I doubt that the overall
> effect will be that beneficial for GCC.

I don't think the rule should be taken literally for every optimization.
Software pipelining, profile feedback, loop unrolling, function inlining,
prefetch code generation, and scheduling on i386 are all optimizations that
would lose in such a test and are still worthwhile to have; for numeric
code, for instance, they are a must. I think we have -O1 for "I want sane
code but don't have time to wait" and -O2 for "I can wait to save an extra
few %".

On the other hand, what I do think is worthwhile is to reconsider which
optimizations should be enabled at -O1. Currently we do:

  flag_defer_pop = 1;
  flag_thread_jumps = 1;
#ifdef DELAY_SLOTS
  flag_delayed_branch = 1;
#endif
#ifdef CAN_DEBUG_WITHOUT_FP
  flag_omit_frame_pointer = 1;
#endif
  flag_guess_branch_prob = 1;
  flag_cprop_registers = 1;
  flag_loop_optimize = 1;
  flag_crossjumping = 1;
  flag_if_conversion = 1;
  flag_if_conversion2 = 1;

I believe crossjumping, jump threading, and perhaps the second if-conversion
pass are examples of optimizations that are expensive and bring relatively
little benefit. Do you think it makes sense to run some tests and consider
disabling them? Would "bootstrap at -O1" be considered a valuable rule of
thumb?

On the other hand, at -O2 we enable some bits that are not that expensive
and might move into the -O1 category. My guesses would be:

  flag_optimize_sibling_calls = 1;
  flag_rename_registers = 1;
  flag_caller_saves = 1;
  flag_force_mem = 1;
  flag_regmove = 1;
  flag_strict_aliasing = 1;
  flag_reorder_blocks = 1;
  flag_reorder_functions = 1;

What do you think? If we reach some kind of agreement, I can run a series of
tests for these optimizations.

Another thing I believe would be worthwhile is a switch that enables the
aggressive bits, like loop unrolling or prefetching, that people can use for
benchmarks or very CPU-bound code.
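Today one can approximate such a switch by hand with something like the
following (just a sketch; -fprefetch-loop-arrays only helps on targets with
prefetch support, and the exact flag set is of course open to discussion):

  gcc -O2 -funroll-loops -fprefetch-loop-arrays <files>

but a single well-known option would make this much harder to get wrong.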
It seems to be a common problem with published GCC reviews that they use
suboptimal switches, and partly that is our own fault, I guess: it is very
difficult to set them up correctly.

Honza