From: David Brown
Date: Sun, 30 Dec 2012 11:25:00 -0000
To: Ángel González
CC: Kicer, Andrew Haley, gcc-help@gcc.gnu.org
Subject: Re: problems with optimisation

On 29/12/12 23:26, Ángel González wrote:
> On 29/12/12 17:26, David Brown wrote:
>> With -Os, the compiler will obey normal "inline"
>> directives (at least,
>> that is my experience when compiling C on the avr - I have not tried
>> C++ much on it). It won't do any automatic extra inlining, except for
>> static functions that are only used once - which are always inlined as
>> this saves space. Again, I don't know how that plays with template
>> functions or other C++ features.
>>
>> As far as I know, gcc uses weighting heuristics to decide whether to
>> do something like the rcall you mentioned above, compared to using the
>> inlined code directly. It is certainly not impossible that the
>> weightings are not optimal here.
>>
>> There is currently very little use of C++ with avr-gcc. The avr port
>> maintainers and the avrlibc developers have little experience with
>> C++, and feel they have enough to do with just the C support. But
>> there are a few people on the avr-gcc mailing list that work with C++,
>> and it is certainly worth posting there too - they may be able to give
>> suggestions.
>>
>> mvh.,
>>
>> David
>
> I got good results (code apparently better) using -O3 in avr instead of
> -Os. Just the skipped instructions in the prologue and epilogues may be
> worth it. It may be that since on avr you have one cycle per instruction
> (except branches), when optimizing for speed, you indirectly also
> optimize the number of instructions. However, I was using C, not C++, so
> the different way of coding could lead to worse optimizations.

It is not always easy to guess the best choice of optimisation flags.
You are right that on the avr, small often means fast - and
optimisations that at first appear to make code larger (such as inlining
functions that are used more than once, or loop unrolling) can lead to
smaller code by avoiding prologues/epilogues, function call overheads,
and other "bookkeeping" code. Theoretically, the compiler knows this
and will pick the smaller code with -Os.
In practice, it is a very hard problem, and there is a limit to the
complexity (and accuracy) that can be achieved here. On the bright
side, gcc seems to be getting steadily smarter about these things - gcc
4.7 does partial function inlining and function specialisation in some
cases.

Personally, I would like to see the distinction between "optimise for
speed" and "optimise for size" disappear - or at least be reduced to a
specialised flag (meaning "I /really/ don't care about speed - just make
the code as small as possible", and vice-versa). There are several
reasons for this:

On modern "big" cpus, small means fast because small code fits best in
the fastest cache levels (including branch target buffers, prefetch
buffers, etc.). On an old 386 cpu it might make sense to unroll a loop
- on an i7 the fastest code will have the loop intact (unless unrolling
gives additional optimisations). And now the 386 will be deprecated...

On small cpus (like the avr), fast means small because fast means
running fewer instructions.

In cases where it makes sense to bias towards size or speed,
programmers are notoriously bad at making such decisions themselves.
Hands up all developers who always profile their code before deciding
which bits need optimisations :-) The compiler, on the other hand, can
do a reasonable job in many cases (see the -fipa-profile flag for an
example).

On big cpus, the normal optimisation choice should be "make this code as
fast as possible on this processor, maintaining all standards". Other
sensible options are "as fast as possible disregarding the fine print of
IEEE standards" (the "-Ofast" flag), and "as fast as possible but still
easy to debug" (the "-Og" flag).

On small cpus, the ideal flag would be something like "as fast as
possible, but fitting within 32K code memory" - but I don't see that
coming in the next version or two of gcc...

> I recommend giving gcc as much information as possible, and watch the
> generated code.
> I got gcc to perform a few tricky optimizations, and in
> one case, I manually unrolled a loop for it (otherwise, it didn't
> notice it could be optimized). If you see a very bad instance of code
> generation, open a bug. :)
> What difference do you have from -Os to -O3 ?
>

The more information the compiler gets, the better. In particular, you
generally get better results if you can make your functions (and data)
static - if the compiler can see that the functions don't escape (for
example, by having their addresses taken), it can do far more
optimisations.

mvh.,

David
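
To illustrate the point about static functions, here is a minimal
sketch in C. The function names and the scaling arithmetic are made up
for this example and do not come from the thread. What gcc actually
generates depends on the version, the target and the flags, so it is
worth checking the assembly output (for instance with -S) rather than
taking the comments on trust.

```c
#include <stdint.h>

/* Hypothetical helper. Because it is static and its address is never
 * taken, the compiler can see every call site in this file. With a
 * single caller it is a likely candidate for inlining, and no
 * separate out-of-line copy needs to be kept. */
static uint8_t scale(uint8_t value, uint8_t factor)
{
    /* Widen to 16 bits first so the product cannot overflow a
     * 16-bit int, which is what the avr uses. */
    return (uint8_t)(((uint16_t)value * factor) >> 4);
}

/* Same body without static. Another translation unit might call it,
 * so without link-time or whole-program optimisation the compiler
 * normally has to emit a callable copy even if it also inlines it. */
uint8_t scale_extern(uint8_t value, uint8_t factor)
{
    return (uint8_t)(((uint16_t)value * factor) >> 4);
}

/* The only caller of scale(). */
uint8_t read_sensor_scaled(uint8_t raw)
{
    return scale(raw, 12);
}
```

Both versions compute the same result. The difference is only in how
much the compiler is allowed to assume about callers, which is the kind
of extra information discussed above.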