From: Theodore Papadopoulo
To: Gabriel Dos Reis
Cc: Martin Reinecke, gcc@gcc.gnu.org
Subject: Re: std::pow implementation
Date: Mon, 04 Aug 2003 12:55:00 -0000

Sorry, I'm certainly replying a bit late...

gdr@integrable-solutions.net said:
> | > It suffices to point out that (defunct) KCC did outperform GCC on
> | > most real world code.
> |
> | But surely not due to honouring the inline keyword.
>
> Its honouring the inline keyword was most certainly part of that.

Well, to be honest, at the time the inline limit was put into gcc, gcc was much better than KCC (in terms of inlining, not in terms of overall efficiency of the compiled code). A simple program like:

    template <unsigned n> struct A {
        void f() { A<n-1>::f(); A<n-1>::f(); }
    };

    template <> struct A<0> {
        void f() { }
    };

    int main() { A<32>::f(); }

caused a real memory hog with gcc (which inlined everything, without realizing that the functions were empty!) while KCC stopped after a few inlining levels. At that time, inlining was done on RTL, was quite fast, and the inlining limit was very high...

Then a lot of things changed: garbage collection was introduced, tree-based inlining was introduced, then function-at-a-time and now even a unit-at-a-time strategy, a new standard C++ library came in, ... (this list is certainly not intended as a list of guilty changes) ... and inlining (presumably among other things) started to be really slow.

Then someone noticed that one way to reduce the time the compiler was spending on inlining was simply to reduce the inlining limit (and even to introduce new ones). And there were some convincing examples showing that the overall performance of the compiled programs did not decrease (at least not too much). Unfortunately, with more testing, this has proved to be not entirely true.

So saying that KCC was better than gcc because it inlined more is simply not true: at the time this story begins, KCC was still often better than gcc, and gcc was inlining everything it could (i.e. everything marked as inline, whether explicitly or implicitly).

The main conclusions this leads to are (IMHO):

- The simple example above clearly shows that some limit has to exist and has to be set by the compiler (of course, here the function is empty and the compiler could have figured that out, but imagine if it were not); the sketch after this list gives an idea of the numbers involved.

- At some point (egcs times, around 1998), even while inlining much, much more than today, gcc was faster and consumed less memory (and certainly also had difficult memory-management-related bugs...). So the scheme that Gaby describes worked (with some very high limits) at some point in the past, and people were not complaining...

- The current limiting strategy comes from a very practical point of view (restricting the compilation time of the released compiler) and should not be considered the definitive answer to the problem, all the more so since users have reported that the consequences of the inlining limits vary a lot depending on the language in use, the type of the code, etc.

- The inlining strategy plays an important role. In the function above, gcc, which (at that time) was doing top-down inlining (i.e. from A<32> down to A<0>), had to inline everything before being able to realize that the function was empty. I think Nathan or Zack proposed an experimental bottom-up patch around that time. From what I read here, it seems that the ideal inlining strategy has still to be found (or, more likely, approximated).

- Context (in conjunction with the optimization and inlining strategy) is also certainly important (imagine if you could prove that f() is empty in the example above). I hope that tree-based optimization will allow partial optimization of functions after inlining, so that the cost metrics will be more meaningful...
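Here is the sketch promised in the first conclusion above; it is a back-of-the-envelope count, not a compiler measurement. Each f() contains two calls, so fully inlining A<32>::f() means expanding 2^1 + 2^2 + ... + 2^32 = 2^33 - 2 call sites, roughly 8.6 billion, for a program that does nothing:

    // Sketch only: count the call sites a compiler would have to expand
    // to fully inline A<32>::f() from the example above.
    #include <cstdio>

    int main() {
        unsigned long long calls = 0;
        unsigned long long at_level = 2;   // A<32>::f() makes 2 direct calls
        for (int depth = 1; depth <= 32; ++depth) {
            calls += at_level;             // call sites appearing at this depth
            at_level *= 2;                 // each inlined body adds 2 more
        }
        std::printf("%llu call sites\n", calls);  // 8589934590 == 2^33 - 2
        return 0;
    }

No wonder the compiler turned into a memory hog when it tried to expand them all.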
Final point: it has been reported here (and it is also my experience) that for some C++ code, -Os (which I believe restricts the amount of inlining) sometimes results in more efficient code than any of the -O[0123] choices. This with timings that can go from about 12s at -O0 down to 1.2s for -Os and about 2.5s for -O2 (the numbers are approximate and pre-date the unit-at-a-time patch, but I'm not sure how that interferes; all the functions were small). This tends to show that:

- it is the metrics used to determine the efficiency of the code that need some work;

- certainly, at some point, the compiler will know better (sorry Gaby) what to inline and what to keep as a function.

And I also buy all the portability arguments that have been raised during this discussion.

It really seems that this inlining problem is much more difficult to cope with than expected... I wish I had a good idea... I just hope that SSA-based optimizations and the work Jan just did on metrics and unit-at-a-time will really make effective some of the infrastructure work (like function-at-a-time and tree-based inlining) that might need tree-based optimization. Here I certainly show how little I know about how inlining interacts with other optimizations (both now and in the past RTL-based scheme).

--------------------------------------------------------------------
Theodore Papadopoulo
Email: Theodore.Papadopoulo@sophia.inria.fr   Tel: (33) 04 92 38 76 01
--------------------------------------------------------------------