Date: Thu, 16 May 2002 07:33:00 -0000
From: Jan Hubicka
To: Robert Dewar
Cc: dberlin@dberlin.org, mark@codesourcery.com, roger@eyesopen.com, aj@suse.de, davem@redhat.com, gcc@gcc.gnu.org, rth@redhat.com
Subject: -O2 versus -O1 (Was: Re: GCSE store motion)

> > That means we shouldn't be spending much time trying to do software
> > loop pipelining when compiling GCC, so the optimization shouldn't
> > make compiling the compiler significantly slower.
>
> I don't see how you conclude this. You have to do the analysis on every
> loop. There will definitely be loops in GCC where the optimization is
> possible, there will be loops where it is not. I would expect the
> compiler to spend quite a bit of time trying to improve code for
> loops in GCC. What I am saying is that I doubt that the overall
> effect will be that beneficial for GCC.

I don't think the rule should be taken literally for every optimization.
Software pipelining, profile feedback, loop unrolling, function inlining,
prefetch code generation, and scheduling on i386 are all optimizations that
would lose in such a test and are still worthwhile to have; for numeric
code, for instance, they are a must. I think we have -O1 for "I want sane
code but don't have time to wait" and -O2 for "I can wait to save an extra
few %".

On the other hand, what I do think is worthwhile is to reconsider which
optimizations should be enabled at -O1. Currently we do:

  flag_defer_pop = 1;
  flag_thread_jumps = 1;
#ifdef DELAY_SLOTS
  flag_delayed_branch = 1;
#endif
#ifdef CAN_DEBUG_WITHOUT_FP
  flag_omit_frame_pointer = 1;
#endif
  flag_guess_branch_prob = 1;
  flag_cprop_registers = 1;
  flag_loop_optimize = 1;
  flag_crossjumping = 1;
  flag_if_conversion = 1;
  flag_if_conversion2 = 1;

I believe crossjumping, jump threading, and perhaps the second if-conversion
pass are examples of optimizations that are expensive and bring relatively
little benefit. Do you think it makes sense to run some tests and consider
disabling them? Would "bootstrap at -O1" be considered a valuable rule of
thumb?

On the other hand, at -O2 we enable some bits that are not that expensive
and might move into the -O1 category. My guesses would be:

  flag_optimize_sibling_calls = 1;
  flag_rename_registers = 1;
  flag_caller_saves = 1;
  flag_force_mem = 1;
  flag_regmove = 1;
  flag_strict_aliasing = 1;
  flag_reorder_blocks = 1;
  flag_reorder_functions = 1;

What do you think? If we reach some kind of agreement, I can run a series of
tests for these optimizations.

Another thing I believe would be worthwhile is a switch that enables the
aggressive bits, like loop unrolling or prefetching, that people can use for
benchmarks or very CPU-bound code.
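Today one can approximate such a switch by hand with something like the
following (just a sketch; -fprefetch-loop-arrays only helps on targets with
prefetch support, and the exact flag set is of course open to discussion):

  gcc -O2 -funroll-loops -fprefetch-loop-arrays <files>

but a single well-known option would make this much harder to get wrong.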
It seems to be a common problem with published GCC reviews that they use
suboptimal switches, and partly that is our own fault, I guess: it is very
difficult to set them up correctly.

Honza