PCH, and more generally C++ parser performance

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* PCH, and more generally C++ parser performance
@ 2000-08-24 18:17 Zack Weinberg
  2000-08-24 18:45 ` Mark Mitchell
                   ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Zack Weinberg @ 2000-08-24 18:17 UTC (permalink / raw)
  To: gcc

We've been discussing the performance problems of the C++ front end on
an internal Redhat mailing list.  I was asked to open the discussion
up to a wider audience.

We are looking at a specific test case, provided by Apple.  Here's a
quote from the original letter:

> First, get SPU 0.5, which is in the sources.redhat.com CVS archive,
> under src/utils/spu.  It's configured automatically as part of a
> GDB checkout's configure, but not built, so you have to cd to
> <objdir>/utils/spu and do the make manually.  It's a very simple
> program, just consists of one file, so "cc spu.c" would work too.
> 
> Then do
> 
>   ./spu --language c++ \
>     --files 1 --functions 20 --function-length 10 \
>     --lib-enums 2000 --enumerators 8 \             
>     --lib-structs 1000 --fields 12 \
>     --lib-classes 800 --methods 20 \
>     --lib-functions 10000           

This generates a bunch of files named file<whatever>.  The ones we're
interested in are file0.cc and filelib.h.  file0.cc has 20 functions
in it, but it includes filelib.h which has ten thousand or so.  This
is said to approximate the structure of some real-world,
class-library-heavy code.

Once you have all these files, you can compile them several different
ways, but the one I've been using is just

$ time ./cc1plus -lang-c++ -quiet file0.cc -o file0.s

This cc1plus has the preprocessor built into it.  If you don't want to
muck with the 7,000 line patch to get an integrated preprocessor, you
can run file0.cc through gcc -E and then invoke cc1plus on the .ii
file; the numbers will be roughly the same.

With gcc 2.95.2 on my machine, it takes ten seconds to compile file0.
With current CVS, it takes sixteen seconds.  This is x86-linux; Stan
Shebs gets similar numbers on ppc-macosx with comparable hardware.

The top ten, from gprof:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 16.00      2.55     2.55   285012     0.01     0.01  get_identifier
 10.10      4.16     1.61 14360005     0.00     0.00  ggc_set_mark
  8.16      5.46     1.30       18    72.22   193.19  ggc_mark_trees
  4.96      6.25     0.79        1   790.00 15902.99  yyparse
  3.51      6.81     0.56     5652     0.10     0.10  lookup_tag
  3.32      7.34     0.53  1404134     0.00     0.00  memset
  3.01      7.82     0.48  1861769     0.00     0.00  ggc_alloc
  2.32      8.19     0.37  2011020     0.00     0.00  lang_mark_tree
  2.01      8.51     0.32  1222221     0.00     0.00  make_node
  1.44      8.74     0.23   201127     0.00     0.00  walk_tree

I conclude from this that we have three targets to wail on:
get_identifier, garbage collection, and the parser.

(1) get_identifier is an easy one.  

[9]     17.6    2.55    0.25  285012         get_identifier [9]
                0.10    0.00  285012/388269      strlen [104]
                0.02    0.06   89982/1222221     make_node [21]
                0.01    0.07   89982/138858      ggc_alloc_string [110]

It's been called 285,000 times.  It called make_node 90,000 times.
That means that there were 90,000 unique identifiers in the sample
code.  get_identifier's hash table has only 1000 slots; even with a
perfect hash function, which it does not have, we would get 90-element
chains.  (An ideal binary search tree would have depth 16.)

Compare the performance of cpplib's hash table lookup routine, which
is doing more or less the same thing:

[83]     1.4    0.20    0.03  179920         _cpp_lookup_with_hash [83]
                0.01    0.00   37035/322515      memcpy [105]
                0.01    0.00       4/4           expand_hash [339]
                0.00    0.00     306/395         _obstack_newchunk [538]
                0.00    0.00       4/153205      cfree [115]

This uses a specialized implementation of Vladimir Makarov's
expandable hash tables, and a sane hash function that's calculated in
parallel with scanning the identifier.  1.4% of runtime instead of
17.6%.  It isn't being called as many times, but if it scales linearly
(and it should) it'd only need 0.31 seconds to do what get_identifier
is doing.  I intend to make get_identifier use cpplib's hash table
implementation, and share the table itself with cpplib when using an
integrated preprocessor, which should wipe out this bottleneck.

(2) Garbage collection.  Here the relevant question is, how much
memory are we using?  I instrumented mmap() and brk() and I have
pretty graphs which you can have if you're interested.  But this test
case is pretty darn pathological for memory usage.  We free very
little memory over the course of the run, and at the end we're using
45 megabytes of mmapped anonymous memory, plus nine allocated with
brk.  When you count program text, libraries, daemons, and kernel
space memory, this test case is going to need more than 64 megs of
RAM.  I have 256, it didn't hit swap space on my system, but it will
on many people's.

To a first approximation, all of the memory allocated with mmap is
GCed arena:

                0.00    0.00       1/719         init_ggc [890]
                0.00    0.00       1/719         _IO_file_doallocate [971]
                0.00    0.00       3/719         read_file [1517]
                0.00    0.00       9/719         chunk_alloc [102]
                0.00    0.00     705/719         ggc_alloc [46]
[1188]   0.0    0.00    0.00     719         mmap [1188]

And there is simply no fast way to mark and sweep 45 megabytes of
data.  We could bum cycles out of ggc_set_mark and ggc_mark_trees; I
did a bit of that already, but the improvements are negligible.  Any
real improvement's got to come from reducing the amount of memory GC
has to inspect.

There's three ways I can think of to do that.  First, we could reduce
the internal fragmentation in ggc-page's allocator.  It rounds
everything to powers of two.  On 32-bit machines, RTXes are 4+4n bytes
where n is rtx_length[] of the particular type.  n=2 is very common,
leading to size 12 and a word of wasted space.  Trees are even worse;
most members of tree_union are ~20 bytes (rounded to 32), and two are
~100 bytes (rounded to 128).  We could go to a slab-ish allocation
scheme with tight packing, or we could figure out how to make the
common tree nodes smaller.  This wouldn't reduce the absolute quantity
of work to be done by the mark phase, but it would reduce the total
memory in use and so improve the cache situation or at least the paging
situation.

Second, we could change to some sort of incremental or generational
collector.  This usually requires evil system dependent hacks, because
you have to write-protect memory and intercept the faults.  I proposed
a simple two-generation scheme awhile ago, where we would explicitly
assert to GC that (for example) the saved structure for an inline
function would never again be modified nor was it to be deallocated.
The mark phase could then ignore it.  Mark Mitchell thought that the
necessary invariant didn't happen often enough to be worth it, but
maybe we could change things so it did.

Third, we could not generate so much data in the first place.  Most of
that 45 megs is saved tree representations of inline functions.  Most
of those inline functions are never used.  If we could avoid parsing
them in the first place, wouldn't that be nice?  Here's where you're
all thinking precompiled headers.  That's the right direction, but a
naive PCH implementation - like the one Cygnus had that blindly dumped
the compiler's data space to disk - won't help much here.  We'd have
to reread and possibly unpickle all 45 megs on each compiler invocation,
and once it's in, the garbage collector would have to deal with it.

Apple inherited a remarkably clever PCH implementation from NeXT.  It
can figure out which functions the compiler actually needs to see, and
present it with only those.  Alas, it only works with C; but Stan
reports that for the C version of the above example, compiler runtime
with PCH is less than a second.  (Part of that might be the simpler C
front end, though.)  We can do something similar without necessarily
implementing PCH.  The C++ front end already has logic to delay
parsing inline functions until convenient.  It is used only to work
around some of the horrors of parsing C++ at the moment, but it could
be applied to every inline function, and they could be delayed until
actually required instead of just until the context for parsing them
is better.  The unparsed structure is somewhat smaller, and much
simpler, than the tree it becomes.

I like this idea a lot.  I may not have time or sufficient understanding
of the C++ front end to implement it - I'm going back to grad school
in three weeks - but I think it's worth examining in detail.

(3) The parser is also bloody slow; we're spending .79 seconds just in
yyparse.  This is mostly the fault of C++ the language.  Improvements
in this area will come from giving it less text to grovel through -
another advantage of the above idea about delaying parsing until
needed.  (The code that saves inline functions for later uses a
trivial brace-counting algorithm.)

It's instructive to note that if you use byacc instead of bison, the
generated parser is significantly faster - by itself.  The difference
isn't enough to show up in the total wall clock time.

  4.96      6.25     0.79        1   790.00 15902.99  yyparse (bison)
  3.57      7.20     0.57        1   570.00 15916.08  yyparse (byacc)

If you're desperate, turning off YYDEBUG might also help.

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 18:17 PCH, and more generally C++ parser performance Zack Weinberg
@ 2000-08-24 18:45 ` Mark Mitchell
  2000-08-24 20:35   ` Richard Henderson
                     ` (2 more replies)
  2000-08-25 12:28 ` Stan Shebs
  2000-08-30 12:32 ` Joern Rennecke
  2 siblings, 3 replies; 42+ messages in thread
From: Mark Mitchell @ 2000-08-24 18:45 UTC (permalink / raw)
  To: zack; +Cc: gcc

>>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:

    Zack> It's been called 285,000 times.  It called make_node 90,000
    Zack> times.  That means that there were 90,000 unique identifiers
    Zack> in the sample code.  get_identifier's hash table has only
    Zack> 1000 slots; even with a perfect hash function, which it does
    Zack> not have, we would get 90-element chains.  (An ideal binary
    Zack> search tree would have depth 16.)

Funny -- Alex, Richard, Jason, Benjamin and I we were just talking
about this at lunch yesterday.  I was planning on doing something
about this -- but it's great if you will instead.

    Zack> And there is simply no fast way to mark and sweep 45
    Zack> megabytes of data.  We could bum cycles out of ggc_set_mark
    Zack> and ggc_mark_trees; I did a bit of that already, but the
    Zack> improvements are negligible.  Any real improvement's got to
    Zack> come from reducing the amount of memory GC has to inspect.

Right.  The things I have in mind are:

  - Use buckets that are the right sizes for `decl'.  That's where
    most of the memory is going.  That will reduce total memory
    usage.  It won't reduce GC time much, though -- unless you
    get to the point where you're paging.

  - Sort the pointers when marking to improve locality.

  - Reduce the total memory usage.  DECLs have gotten fat -- there's
    a lot of stuff that's only used for some DECLs, and not others.
    When inlining, we end up with lots of copies of what are
    abstractly the same VAR_DECLs.  Besides having different RTL,
    these things are pretty much the same.  Eliminating that waste
    (somehow) would account for some 30% in some of my tests.
    There is some stuff in the statement-trees that is just cruft;
    for example, COMPOUND_STMT could pretty much go at this point.
    Removing dead code on trees would win big -- there's a lot
    of code (especially where templates are involved) that is 
    dead.  TARGET_EXPR is also one of the evil things -- besides
    having rather horrid semantics, we create a VAR_DECL for
    every TARGET_EXPR -- even if we don't need it.

  - Reduce the number of branches we mark along.  For example, if
    we made it an invariant that all DECLs are reachable in some
    particular way, we can avoid marking them from the body of
    a function.  More simply, we could say that all RTL instruction
    chains have to be reachable from the front -- so we never
    have to follow PREV_INSN.

  - Doing some kind of "freeze" on memory isn't entirely unreasonable
    -- but it is hard.  That brings us back to many of the problems
    that we faced with obstacks.  You really want to ensure that
    the frozen memory cannot change, or there will be hard to find
    bugs.  The big problem here is that it's easy to know that
    some things will always be needed -- but that doesn't do you 
    much good unless you know what they point to won't change.

    Zack> never used.  If we could avoid parsing them in the first
    Zack> place, wouldn't that be nice?

The language really doesn't let you do that.  You have to parse
everything.  Example:

  template <class T> 
  struct S { 
    static int i;
  };

  template <class T>
  int S<T>::i = f();

  inline void g () { 
    S<int>.i = 7;
  }

This program requires you to instantiate `S<int>::i', even if you
don't need `g'.

    Zack> (3) The parser is also bloody slow; we're spending .79
    Zack> seconds just in yyparse.

I think this will be improved over the next year.  I can't say much
more than that, just yet.

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 18:45 ` Mark Mitchell
@ 2000-08-24 20:35   ` Richard Henderson
  2000-08-24 22:02     ` Mark Mitchell
  2000-08-24 21:54   ` PCH, and more generally C++ parser performance Zack Weinberg
  2000-08-25 22:53   ` Zack Weinberg
  2 siblings, 1 reply; 42+ messages in thread
From: Richard Henderson @ 2000-08-24 20:35 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: zack, gcc

On Thu, Aug 24, 2000 at 06:40:47PM -0700, Mark Mitchell wrote:
>   - Doing some kind of "freeze" on memory isn't entirely unreasonable
>     -- but it is hard.  That brings us back to many of the problems
>     that we faced with obstacks.

It doesn't bring back obstack horror if you use the MMU to enforce
this sort of thing, which is what Zack was mentioning.

You guess that the memory is stable, mark it read-only, and if you're
wrong you get a SEGV, mark it read-write again but now with knowledge
that it must be rescaned.

It is still hard though.


r~

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 18:45 ` Mark Mitchell
  2000-08-24 20:35   ` Richard Henderson
@ 2000-08-24 21:54   ` Zack Weinberg
  2000-08-24 22:05     ` Mark Mitchell
  2000-08-25 22:53   ` Zack Weinberg
  2 siblings, 1 reply; 42+ messages in thread
From: Zack Weinberg @ 2000-08-24 21:54 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc

On Thu, Aug 24, 2000 at 06:40:47PM -0700, Mark Mitchell wrote:
> >>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:
> 
>     Zack> It's been called 285,000 times.  It called make_node 90,000
>     Zack> times.  That means that there were 90,000 unique identifiers
>     Zack> in the sample code.  get_identifier's hash table has only
>     Zack> 1000 slots; even with a perfect hash function, which it does
>     Zack> not have, we would get 90-element chains.  (An ideal binary
>     Zack> search tree would have depth 16.)
> 
> Funny -- Alex, Richard, Jason, Benjamin and I we were just talking
> about this at lunch yesterday.  I was planning on doing something
> about this -- but it's great if you will instead.

Heh.  The sad bit is if I'd known about it before 10 AM yesterday, I'd
have been at that meeting.

I'm brutally trimming your comments where I have nothing to add.

...
>   - Sort the pointers when marking to improve locality.

I tried to do this, misunderstood the way ggc_mark_trees works, and
wound up calling qsort on every iteration.  Obviously that was worse.

Did you mean to use a priority queue or something like that, or just
process one batch, sort the resulting list, and repeat?  Either way,
I wonder if it wouldn't be wise to make the main loop work over both
trees and RTXes.  We've got ggc_mark_trees and ggc_mark_rtx_children
and they do very similar things in different fashion.

>   - Doing some kind of "freeze" on memory isn't entirely unreasonable
>     -- but it is hard.  That brings us back to many of the problems
>     that we faced with obstacks.  You really want to ensure that
>     the frozen memory cannot change, or there will be hard to find
>     bugs.  The big problem here is that it's easy to know that
>     some things will always be needed -- but that doesn't do you 
>     much good unless you know what they point to won't change.

RTH wants to use page protection to deal with this, which could work.
I'll leave that sort of hacking to people who aren't frightened of
signals :)

We might be able to do 'permanent' without 'never changes.'  Consider
IDENTIFIER_NODEs.  They're as permanent as you get, and so is the
string they point to.  But the data pointed to by their
lang_identifier structure can change.  We could add the
lang_identifiers to the GC-root table, and mark the nodes and their
strings permanently.  And there really are things that never change,
like the constant trees and RTXes.

The thing about this is, if we have the basic functionality, we can be
as conservative as we want about what it's used for.  (As long as it
remains an overall win.)

>     Zack> never used.  If we could avoid parsing them in the first
>     Zack> place, wouldn't that be nice?
> 
> The language really doesn't let you do that.  You have to parse
> everything.  Example:
> 
>   template <class T> 
>   struct S { 
>     static int i;
>   };
> 
>   template <class T>
>   int S<T>::i = f();
> 
>   inline void g () { 
>     S<int>.i = 7;
>   }
> 
> This program requires you to instantiate `S<int>::i', even if you
> don't need `g'.

I don't understand C++ well enough to know why.  In fact, I can't
parse

template <class T>
int S<T>::i = f();

at all.  Note that in this example the only text I would defer
processing would be the body of g -

{
  S<int>.i = 7;
}

And I'd think the as-if rule would let you get away with it, anyhow.
(If the only reference to S<int>::i, whatever that is, is inside g,
and g is never used, how can the program tell whether S<int>::i
exists?)

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 20:35   ` Richard Henderson
@ 2000-08-24 22:02     ` Mark Mitchell
  2000-08-24 22:14       ` Daniel Berlin
  2000-08-25 11:32       ` finding fault addresses in signal handlers Geoff Keating
  0 siblings, 2 replies; 42+ messages in thread
From: Mark Mitchell @ 2000-08-24 22:02 UTC (permalink / raw)
  To: rth; +Cc: zack, gcc

>>>>> "Richard" == Richard Henderson <rth@cygnus.com> writes:

    Richard> On Thu, Aug 24, 2000 at 06:40:47PM -0700, Mark Mitchell
    Richard> wrote:
    >> - Doing some kind of "freeze" on memory isn't entirely
    >> unreasonable -- but it is hard.  That brings us back to many of
    >> the problems that we faced with obstacks.

    Richard> It doesn't bring back obstack horror if you use the MMU
    Richard> to enforce this sort of thing, which is what Zack was
    Richard> mentioning.

Right.  But do we want to require that the OS provide that kind of
capability?  I guess we can always fall back on what we've already
got.

    Richard> You guess that the memory is stable, mark it read-only,
    Richard> and if you're wrong you get a SEGV, mark it read-write
    Richard> again but now with knowledge that it must be rescaned.

    Richard> It is still hard though.

Right.  You have to remember not only that that memory is needed, but
what all else is needed because it is referenced from there.  And if
that memory *is* written, you need to decide whether or not you want
to consider freeing the referenced memory.  If you don't want to then
go rescan everything, you have to keep reference counts.  (Ugh.)

It's not that there's nothing good down this path -- it's just that
deciding exactly how to do this is tricky.  I'm a little afraid we're
putting the cart before the horse -- the last time I looked at C++
memory usage, there looked like there were still ways to get big
percentage eliminations in the total memory consumption.  (Examples:
tree_identifier contains too many slots, most of which are unused, we
create lots of duplicate types for template parameters, lots of
duplicate template argument vectors, etc.)

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 21:54   ` PCH, and more generally C++ parser performance Zack Weinberg
@ 2000-08-24 22:05     ` Mark Mitchell
  2000-08-24 22:24       ` Daniel Berlin
  2000-08-24 23:43       ` Zack Weinberg
  0 siblings, 2 replies; 42+ messages in thread
From: Mark Mitchell @ 2000-08-24 22:05 UTC (permalink / raw)
  To: zack; +Cc: gcc

>>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:

    >> - Sort the pointers when marking to improve locality.

    Zack> I tried to do this, misunderstood the way ggc_mark_trees
    Zack> works, and wound up calling qsort on every iteration.
    Zack> Obviously that was worse.

I tried it too, using radix-sort (which has better asymptotic
complexity that qsort, although not necessarily better wall-clock
time), but I couldn't get a win, even on a case that was causing heavy
paging.  I think I just blew it somehow -- it should help.

    >>  The language really doesn't let you do that.  You have to
    >> parse everything.  Example:
    >> 
    >> template <class T> struct S { static int i; };
    >> 
    >> template <class T> int S<T>::i = f();
    >> 
    >> inline void g () { S<int>.i = 7; }
    >> 
    >> This program requires you to instantiate `S<int>::i', even if
    >> you don't need `g'.

    Zack> I don't understand C++ well enough to know why.  In fact, I
    Zack> can't parse

    Zack> template <class T> int S<T>::i = f();

Oh.  That's called a "static data member".  Think of it as a global
variable, but inside a class scope.  There's only one such thing
(that's why it's global) -- as opposed to a field, or ordinary data
member, of which there is one per object.

    Zack> at all.  Note that in this example the only text I would
    Zack> defer processing would be the body of g -

    Zack> { S<int>.i = 7; }

The point is that the instantiation -- in the body of G -- causes this
static data member to come in to existence, and I believe the standard
requires that we call `f'.  (Which of course might have side-effects.)
Skip parsing `g', and you skip instantiating `S<int>::i', and you
therefore skip the side-effects in `f' ...

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 22:02     ` Mark Mitchell
@ 2000-08-24 22:14       ` Daniel Berlin
  2000-08-25 11:32       ` finding fault addresses in signal handlers Geoff Keating
  1 sibling, 0 replies; 42+ messages in thread
From: Daniel Berlin @ 2000-08-24 22:14 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: rth, zack, gcc

>     Richard> It doesn't bring back obstack horror if you use the MMU
>     Richard> to enforce this sort of thing, which is what Zack was
>     Richard> mentioning.
> 
> Right.  But do we want to require that the OS provide that kind of
> capability?  I guess we can always fall back on what we've already
> got.

We can make ggc-page do this, since it's used on systems which are likely
to have this capability. ggc-simple could be left alone.
Does anyone know of an mmapping, or vallocing, system we support, that
can't be made to do this in a few hours?

> 
>     Richard> You guess that the memory is stable, mark it read-only,
>     Richard> and if you're wrong you get a SEGV, mark it read-write
>     Richard> again but now with knowledge that it must be rescaned.
> 
>     Richard> It is still hard though.
> 
> Right.  You have to remember not only that that memory is needed, but
> what all else is needed because it is referenced from there.  And if
> that memory *is* written, you need to decide whether or not you want
> to consider freeing the referenced memory.  If you don't want to then
> go rescan everything, you have to keep reference counts.  (Ugh.)
> 
> It's not that there's nothing good down this path -- it's just that
> deciding exactly how to do this is tricky.  I'm a little afraid we're
> putting the cart before the horse -- the last time I looked at C++
> memory usage, there looked like there were still ways to get big
> percentage eliminations in the total memory consumption.  
Absolutely.
While GC may be eating a lot of time, it's cause it has a lot of work
to do.
It's much better to reduce the amount of memory for marking to go through,
then worry about making marking quicker.

Why don't you guys (Me, i do C++ support for GDB, and so when i work on
GCC, it's removing duplicated dwarf2 info, not hacking the C++ front
end) cherry pick for a while, then start looking into difficult things.
--Dan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 22:05     ` Mark Mitchell
@ 2000-08-24 22:24       ` Daniel Berlin
  2000-08-24 23:43         ` Mark Mitchell
  2000-08-24 23:43         ` Zack Weinberg
  2000-08-24 23:43       ` Zack Weinberg
  1 sibling, 2 replies; 42+ messages in thread
From: Daniel Berlin @ 2000-08-24 22:24 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: zack, gcc

>     Zack> Obviously that was worse.
> 
> I tried it too, using radix-sort (which has better asymptotic
> complexity that qsort, although not necessarily better wall-clock
> time), but I couldn't get a win, even on a case that was causing heavy
> paging.  I think I just blew it somehow -- it should help.
Actually, since we are sorting integers, we can do it in n log log n with
nilsson's algorithm.
 http://www.nada.kth.se/~snilsson/public/code/nloglogn.c
If you look on his page, and A. Andersson's, they have papers on
implementing radix sort well.
Andersson's page is at http://www.csd.uu.se/~arnea


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 23:43         ` Zack Weinberg
@ 2000-08-24 23:43           ` Mark Mitchell
  0 siblings, 0 replies; 42+ messages in thread
From: Mark Mitchell @ 2000-08-24 23:43 UTC (permalink / raw)
  To: zack; +Cc: dan, gcc

>>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:

    Zack> I think y'all misunderstood.

Well, maybe -- but I didn't do what you had showed.  I actually sorted
the list between passes.  Perhaps not as good as a priority queue --
but should still show benefits if there are benefits to be had.

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 22:05     ` Mark Mitchell
  2000-08-24 22:24       ` Daniel Berlin
@ 2000-08-24 23:43       ` Zack Weinberg
  2000-08-25  0:19         ` Mark Mitchell
  1 sibling, 1 reply; 42+ messages in thread
From: Zack Weinberg @ 2000-08-24 23:43 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc

>     >> template <class T> struct S { static int i; };
>     >> 
>     >> template <class T> int S<T>::i = f();
>     >> 
>     >> inline void g () { S<int>.i = 7; }
>     >> 
>     >> This program requires you to instantiate `S<int>::i', even if
>     >> you don't need `g'.
> 
>     Zack> I don't understand C++ well enough to know why.  In fact, I
>     Zack> can't parse
> 
>     Zack> template <class T> int S<T>::i = f();
> 
> Oh.  That's called a "static data member".  Think of it as a global
> variable, but inside a class scope.  There's only one such thing
> (that's why it's global) -- as opposed to a field, or ordinary data
> member, of which there is one per object.

Hm.  Is int S<T>::i the same variable as the static int i inside the
declaration of struct S?  If so then I think I get it: this expression
provides the initializer for that variable.  But it still might not be
used, because we haven't said what T is yet.

> The point is that the instantiation -- in the body of G -- causes this
> static data member to come in to existence, and I believe the standard
> requires that we call `f'.  (Which of course might have side-effects.)

And here's where I'd expect as-if to come into play.  If the only
references to S<int>::i are in g and the S<T>::i = f() expression, and
g is never used, can the program tell if S<int>::i does not actually
exist as a global variable?  If not, then as-if says it doesn't have
to exist.  The program can tell that f was called, but I'd expect
there to be weasel words to the effect that the initialization
expression of a templated static data member may not be evaluated if
the template is never instantiated (or the program can't tell that it
wasn't).

Even if there are no such weasel words, couldn't we arrange to call f
and throw the value away?

If I write two normal functions, one of which refers to S<int> and the
other S<double>, is f called twice?

If your example is in a header file and two files include it, how
do we avoid initializing S<int>::i twice (hence calling f twice)?

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 22:24       ` Daniel Berlin
  2000-08-24 23:43         ` Mark Mitchell
@ 2000-08-24 23:43         ` Zack Weinberg
  2000-08-24 23:43           ` Mark Mitchell
  1 sibling, 1 reply; 42+ messages in thread
From: Zack Weinberg @ 2000-08-24 23:43 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Mark Mitchell, gcc

On Thu, Aug 24, 2000 at 10:24:05PM -0700, Daniel Berlin wrote:
> >     Zack> Obviously that was worse.
> > 
> > I tried it too, using radix-sort (which has better asymptotic
> > complexity that qsort, although not necessarily better wall-clock
> > time), but I couldn't get a win, even on a case that was causing heavy
> > paging.  I think I just blew it somehow -- it should help.
> Actually, since we are sorting integers, we can do it in n log log n with
> nilsson's algorithm.
>  http://www.nada.kth.se/~snilsson/public/code/nloglogn.c
> If you look on his page, and A. Andersson's, they have papers on
> implementing radix sort well.
> Andersson's page is at http://www.csd.uu.se/~arnea

I think y'all misunderstood.  It wasn't because you can do better than
qsort.  I thought ggc_mark_trees worked like this:

  while (old pending list is nonempty)
  {
    for (each tree in old pending list)
    {
      mark tree
      add all referenced trees to new pending list
    }
    replace old pending list with new pending list 
    clear new pending list
  }

And I put a qsort(pending list) right after the while, because it was
a quick-n-dirty hack and I would go back and use a better algorithm if
it turned out to help.  Except that's not how ggc_mark_trees works.
It's really

  while (pending stack is nonempty)
  {
    pull top element off of pending stack
    mark it
    push all referenced trees onto pending stack
  }

so it would sort the entire list every time we got done with a tree.
Making a nice O(n) algorithm into O(n*(n log n)) or worse.  glibc's
qsort appears to use mergesort, which I don't think is pathological
for almost-sorted arrays, but still.

If we were gonna do this for real, I'd suggest (a) sort the roots list
once, after we get done adding them all; (b) use a priority queue
instead of stop-everything-and-sort; (c) do trees, RTL, and everything
else all at the same time.

There's also madvise(MADV_RANDOM), which might help the kernel out a
bit if we're paging.  Or not.  The Solaris manpage seems to be talking
only about behavior of file mappings.

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 22:24       ` Daniel Berlin
@ 2000-08-24 23:43         ` Mark Mitchell
  2000-08-25  0:56           ` Daniel Berlin
  2000-08-24 23:43         ` Zack Weinberg
  1 sibling, 1 reply; 42+ messages in thread
From: Mark Mitchell @ 2000-08-24 23:43 UTC (permalink / raw)
  To: dan; +Cc: zack, gcc

>>>>> "Daniel" == Daniel Berlin <dan@cgsoftware.com> writes:

    Daniel> http://www.nada.kth.se/~snilsson/public/code/nloglogn.c

Yup.  Unfortunately, the time wasn't going to the sort -- rather, the
sorting didn't seem to reduce the paging overhead.

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 23:43       ` Zack Weinberg
@ 2000-08-25  0:19         ` Mark Mitchell
  2000-08-27 15:31           ` Zack Weinberg
  0 siblings, 1 reply; 42+ messages in thread
From: Mark Mitchell @ 2000-08-25  0:19 UTC (permalink / raw)
  To: zack; +Cc: gcc

>>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:

    Zack> Hm.  Is int S<T>::i the same variable as the static int i
    Zack> inside the declaration of struct S?  If so then I think I
    Zack> get it: this expression provides the initializer for that
    Zack> variable.  But it still might not be used, because we
    Zack> haven't said what T is yet.

Right.

    Zack> Even if there are no such weasel words, couldn't we arrange
    Zack> to call f and throw the value away?

I don't think there are any such weasel words -- but I could be
wrong.

Yes, we could call `f' and throw the value away -- but how would we
know to do that without parsing `g'?

    Zack> If I write two normal functions, one of which refers to
    Zack> S<int> and the other S<double>, is f called twice?

Yes.

    Zack> If your example is in a header file and two files include
    Zack> it, how do we avoid initializing S<int>::i twice (hence
    Zack> calling f twice)?

Magic.  Each static template variable has an associated guard
variable; the initialization code checks that the guard variable is
not set before performing the initialization.

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 23:43         ` Mark Mitchell
@ 2000-08-25  0:56           ` Daniel Berlin
  2000-08-27 17:09             ` Zack Weinberg
  0 siblings, 1 reply; 42+ messages in thread
From: Daniel Berlin @ 2000-08-25  0:56 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: zack, gcc

> >>>>> "Daniel" == Daniel Berlin <dan@cgsoftware.com> writes:
> 
>     Daniel> http://www.nada.kth.se/~snilsson/public/code/nloglogn.c
> 
> Yup.  Unfortunately, the time wasn't going to the sort -- rather, the
> sorting didn't seem to reduce the paging overhead.
This is because the average pointer distance isn't far enough away to
matter (at least for trees. This is the output of printing the address of
the tree, right after it gets popped off the pending list).

3dea00
39be80
3976c0
39ee00
3a0000
38e200
39eb00
39be00
cb480
c6f00
ccf00
cb600
a7080
a8580
aa900
a7200
29d80
aa600
a7000
<it gets *much* more clustered, i'll spare you the details>

We stay in the same 128k area for quite a long time, move to another one,
stay there, etc.
Very rarely do we get short runs going from one place to another.
So sorting it doesn't reduce the paging overhead all that much.

On the other hand, looking at various printouts of the data, including the
collection number+pointer, shows we seem to check the same trees over and
over, and they haven't changed, or at least, haven't changed for the most
part.
They seem to change during the first collection to the second, then not
change for 3 collections, then change, then stay constant the ret of the
time.
I assume this corresponds with various optimization passes.
It seems to make sense to take advantage of this, if it's the general
behavior (it was for random programs i tried), to not collect trees all
the time, but instead only after certain passes.
Especially if our problem is that we are marking too much, not the speed
of the marking itself.
This would, of course, likely require putting them in a different gc heap
than other things, but it would be helpful to figure out object lifetimes,
and optimize the GC so we collect the right things, at the right time.

However, as I said before, if you can reduce the amount of memory we have
to go through, in any way, it's a much better optimization (in terms of
usefulness) at this point than optimizing the GC in any way (which is
always optimization to try to not check as much memory as we would without
a given GC optimization).

 --Dan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* finding fault addresses in signal handlers
  2000-08-24 22:02     ` Mark Mitchell
  2000-08-24 22:14       ` Daniel Berlin
@ 2000-08-25 11:32       ` Geoff Keating
  1 sibling, 0 replies; 42+ messages in thread
From: Geoff Keating @ 2000-08-25 11:32 UTC (permalink / raw)
  To: gcc

There was some question about how portable it was to ask for the fault
address in a signal handler.

It turns out that (a) there's a Unix(tm) way of doing this, which
works on Solaris and AIX, and fails to work in an interesting way on
HPUX (of course), and which doesn't work on linux due to not being
implemented; and (b) it's not hard to make linux go with typically
10-20 minutes work.

Here's my test program.  Works on Solaris, AIX, linux/x86, linux/ppc.
You wouldn't use the #ifdefs in gcc, instead keying off host triples,
as otherwise it'd be just too ugly.  I can document the macros if
people insist :-).
-- 
- Geoffrey Keating <geoffk@cygnus.com>

===File ~/t.c===============================================
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <limits.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>

/* Yes, they all have to be 'volatile'.  Really.  */
typedef unsigned long ptrint_t;
static volatile sig_atomic_t lock = 1;
static volatile ptrint_t blockstart_g;
static volatile ptrint_t blockend_g;
static volatile unsigned long *volatile bitmask_g;
#define BITMASK_UNIT (sizeof(*bitmask_g) * CHAR_BIT)
static volatile size_t page_size_g; /* = (size_t) sysconf (_SC_PAGESIZE); */

/* Either unprotect the page containing addr_p, if it is known to the
   GC, and return 1, or return 0 indicating that there was a real
   page fault.  */
static int
maybe_mark_address (void *addr_p)
{
  const ptrint_t addr = (ptrint_t) addr_p;

  /* Local shadows of volatile variables so that performance isn't
     affected.  */
  const ptrint_t blockstart = blockstart_g;
  const ptrint_t blockend = blockend_g;
  size_t page_size = page_size_g;

  ptrint_t addroffset;
  int result;
  
  if (lock != 0)
    abort ();
  
  if (addr < blockstart || addr >= blockend)
    return 0;

  addroffset = (addr - blockstart) / page_size;

  bitmask_g [addroffset / BITMASK_UNIT] |= 1 << (addroffset % BITMASK_UNIT);

  result = mprotect ((void *)(addroffset * page_size + blockstart), 
		     page_size, PROT_READ | PROT_WRITE);
  if (result == -1)
    abort ();

  return 1;
}

/* Now for the grotty machine-dependent stuff.  */
#if defined (__linux__) && (defined (__powerpc__) || defined (__i386__))

#if defined (__powerpc__)
#define EXTRA_ARGS , struct sigcontext_struct *sigc
#define FAULT_ADDRESS ((void *)sigc->regs->dar)
#define SIGNAL_OK 					\
  (sigc->signal == SIGSEGV 				\
   && sigc->regs->trap == 0x00300			\
   && (sigc->regs->dsisr & 0x02000000) != 0)
#endif
#if defined (__i386__)
#define EXTRA_ARGS , struct sigcontext sigc
#define FAULT_ADDRESS ((void*)(sigc.cr2))
#define SIGNAL_OK (sigc.trapno == 14)
#endif


static struct sigaction segv_action;
#define CONTINUE sigaction (SIGSEGV, &segv_action, NULL); return

#define SETUP_HANDLER(handler)						\
  do {									\
    sigaction (SIGSEGV, NULL, &segv_action);				\
    segv_action.sa_handler = (sig_t)(handler);				\
    /* We want:								\
       system calls to resume if interrupted;				\
       segfaults inside the handler to be trapped, by the default	\
         technique (that is, a core dump).  */				\
    segv_action.sa_flags =						\
       SA_RESTART | SA_NODEFER | SA_RESETHAND;				\
    sigaction (SIGSEGV, &segv_action, NULL);				\
  } while (0)

#else

#define EXTRA_ARGS , siginfo_t *sigi, void *unused
#define SIGNAL_OK (sigi->si_signo == SIGSEGV && sigi->si_code == SEGV_ACCERR)
#define FAULT_ADDRESS (sigi->si_addr)

static struct sigaction segv_action;
#define CONTINUE sigaction (SIGSEGV, &segv_action, NULL); return

#define SETUP_HANDLER(handler)						\
  do {									\
    sigaction (SIGSEGV, NULL, &segv_action);				\
    segv_action.sa_sigaction = (handler);				\
    /* We want:								\
       system calls to resume if interrupted;				\
       to use the three-argument version of the signal handler; and	\
       segfaults inside the handler to be trapped, by the default	\
         technique (that is, a core dump).  */				\
    segv_action.sa_flags =						\
       SA_RESTART | SA_SIGINFO | SA_NODEFER | SA_RESETHAND;		\
    sigaction (SIGSEGV, &segv_action, NULL);				\
  } while (0)

#endif

static void
handle_segv (int signum EXTRA_ARGS)
{
  if (! SIGNAL_OK
      || ! maybe_mark_address (FAULT_ADDRESS))
    {
      /* For ease of debugging.  */
      static volatile void *fa;
      fa = FAULT_ADDRESS;
      abort ();
    }
  CONTINUE;
}

static void
setup_segv_handler (void)
{
  page_size_g = (size_t) sysconf (_SC_PAGESIZE);
  lock = 0;
  SETUP_HANDLER (handle_segv);
}


/* A simple test program.  */
int 
main(void)
{
  unsigned char *data;
  size_t toalloc;
  size_t page_size;
  size_t i;
  int devzero;
  
  setup_segv_handler ();
  page_size = page_size_g;
  
  /* Allocate some pages.  */
#define NUMPAGES 256
  toalloc = NUMPAGES*page_size;
#define NUMWORDS \
  ((NUMPAGES + BITMASK_UNIT - 1) / BITMASK_UNIT)

#ifndef MAP_ANONYMOUS
  devzero = open ("/dev/zero", O_RDWR);
  if (devzero == -1)
    perror ("open /dev/zero failed");
  data = mmap (NULL, toalloc, PROT_READ | PROT_WRITE,
	       MAP_PRIVATE, devzero, 0);
#else
  data = mmap (NULL, toalloc, PROT_READ | PROT_WRITE,
	       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

#endif

  if (data == (unsigned char *)-1)
    perror ("mmap failed");
  lock = 1;
  blockstart_g = (ptrint_t) data;
  blockend_g = ((ptrint_t)data) + toalloc;
  bitmask_g = malloc (NUMWORDS * sizeof (*bitmask_g));
  if (bitmask_g == NULL)
    perror ("malloc failed");
  for (i = 0; i < NUMWORDS; i++)
    bitmask_g[i] = 0;
  lock = 0;
  
  /* Initialise them with an interesting bit pattern.  */
  for (i = 0; i < toalloc; i++)
    data[i] = i ^ (i / page_size);
  for (i = 0; i < NUMWORDS; i++)
    if (bitmask_g[i] != 0)
      abort ();

  /* Make them read-only.  */
  mprotect (data, toalloc, PROT_READ);
  
  /* Check that everything works as it should.  */
  for (i = 0; i < toalloc; i++)
    if (data[i] != (unsigned char)(i ^ (i / page_size)))
      abort ();
  for (i = 0; i < NUMWORDS; i++)
    if (bitmask_g[i] != 0)
      abort ();

  data[0] = 7;
  if (bitmask_g[0] != 1)
    abort ();
  for (i = 1; i < NUMWORDS; i++)
    if (bitmask_g[i] != 0)
      abort ();
  lock = 1;
  bitmask_g[0] = 0;
  lock = 0;
  if (data[0] != 7)
    abort ();
  if (bitmask_g[0] != 0)
    abort ();
  data[0] = 8;
  if (bitmask_g[0] != 0)
    abort ();
  if (data[0] != 8)
    abort ();
  if (bitmask_g[0] != 0)
    abort ();
  mprotect (data, page_size, PROT_READ);
  data[0] = 9;
  if (bitmask_g[0] != 1)
    abort ();
  data[page_size] = 10;
  if (bitmask_g[0] != 3)
    abort ();

  printf ("completed ok.\n");
  exit (0);
}
============================================================

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 18:17 PCH, and more generally C++ parser performance Zack Weinberg
  2000-08-24 18:45 ` Mark Mitchell
@ 2000-08-25 12:28 ` Stan Shebs
  2000-08-27 17:59   ` Zack Weinberg
  2000-08-30 12:32 ` Joern Rennecke
  2 siblings, 1 reply; 42+ messages in thread
From: Stan Shebs @ 2000-08-25 12:28 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc

Some very interesting analysis!  Just a couple comments:

Zack Weinberg wrote:
> 
> Third, we could not generate so much data in the first place.  Most of
> that 45 megs is saved tree representations of inline functions.  Most
> of those inline functions are never used.  If we could avoid parsing
> them in the first place, wouldn't that be nice?  Here's where you're
> all thinking precompiled headers.

That probably explains the big discrepancy between C and C++ performance.
Volumewise, only a small percentage of the synthetic example is class
definitions, but it jumped the compile time by some 50-60%.  Of course,
this will be more of an issue for systems that use more C++ headers; the
basic Mac API is 98% C.  On Macs, the primary source of C++ headers is
Metrowerks' PowerPlant, which is sizeable, but nowhere near the base API.

> Apple inherited a remarkably clever PCH implementation from NeXT.  It
> can figure out which functions the compiler actually needs to see, and
> present it with only those.  Alas, it only works with C; but Stan
> reports that for the C version of the above example, compiler runtime
> with PCH is less than a second.

Yeah, it's amazing how fast GCC is when it doesn't have to parse thousands
and thousands of unreferenced enums and prototypes... :-)

Incidentally, I've been on a crash project to add C++ to cpp-precomp
(as we call it), am learning way more about dark corners of C++ syntax
than I really wanted to know. :-)  Also - but keep in mind this is not a
promise - I'm getting some favorable response to the idea of freeing the
sources to cpp-precomp.  Although it's not reasonable to try to include
any of it directly into GCC (it's written in C++, for one thing), it
should be useful as a source of ideas on how to make precompiled headers
work.

> (Part of that might be the simpler C
> front end, though.)  We can do something similar without necessarily
> implementing PCH.  The C++ front end already has logic to delay
> parsing inline functions until convenient.  It is used only to work
> around some of the horrors of parsing C++ at the moment, but it could
> be applied to every inline function, and they could be delayed until
> actually required instead of just until the context for parsing them
> is better.  The unparsed structure is somewhat smaller, and much
> simpler, than the tree it becomes.

That sounds like a good idea.  Token streams are actually very compact,
you can get down to an average of little more than 1 byte/token, which
is smaller than either source or tree structure.  cpp-precomp basically
works by having various tables indexing the sea of tokens, and pulling
out ranges of tokens for further processing.

Stan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 18:45 ` Mark Mitchell
  2000-08-24 20:35   ` Richard Henderson
  2000-08-24 21:54   ` PCH, and more generally C++ parser performance Zack Weinberg
@ 2000-08-25 22:53   ` Zack Weinberg
  2 siblings, 0 replies; 42+ messages in thread
From: Zack Weinberg @ 2000-08-25 22:53 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc

On Thu, Aug 24, 2000 at 06:40:47PM -0700, Mark Mitchell wrote:
> >>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:
> 
>     Zack> It's been called 285,000 times.  It called make_node 90,000
>     Zack> times.  That means that there were 90,000 unique identifiers
>     Zack> in the sample code.  get_identifier's hash table has only
>     Zack> 1000 slots; even with a perfect hash function, which it does
>     Zack> not have, we would get 90-element chains.  (An ideal binary
>     Zack> search tree would have depth 16.)
> 
> Funny -- Alex, Richard, Jason, Benjamin and I we were just talking
> about this at lunch yesterday.  I was planning on doing something
> about this -- but it's great if you will instead.

Okay.  I have a quick-and-dirty implementation of this (which doesn't
work except in front ends that use cpplib ... minor details, y'know)
and it is a pretty impressive win.

Before:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 17.23      2.75     2.75   285012     0.01     0.01  get_identifier
 11.47      4.58     1.83 14360005     0.00     0.00  ggc_set_mark
  8.27      5.90     1.32       18    73.33   208.88  ggc_mark_trees
  4.57      6.63     0.73     5652     0.13     0.13  lookup_tag
  3.57      7.20     0.57        1   570.00 15916.08  yyparse
  2.94      7.67     0.47  2011020     0.00     0.00  lang_mark_tree
  2.76      8.11     0.44  1404252     0.00     0.00  memset
  2.19      8.46     0.35    72526     0.00     0.03  grokdeclarator
  1.69      8.73     0.27  1861769     0.00     0.00  ggc_alloc
  1.50      8.97     0.24   201127     0.00     0.00  walk_tree
...
  0.81     10.60     0.13   179920     0.00     0.00  _cpp_lookup_with_hash

After:

 13.67      1.90     1.90 13535029     0.00     0.00  ggc_set_mark
  8.71      3.11     1.21       18    67.22   197.11  ggc_mark_trees
  5.47      3.87     0.76        1   760.00 13729.41  yyparse
  4.25      4.46     0.59     5652     0.10     0.10  lookup_tag
  3.45      4.94     0.48  2004597     0.00     0.00  lang_mark_tree
  3.45      5.42     0.48  1399863     0.00     0.00  memset
  2.52      5.77     0.35  1771787     0.00     0.00  ggc_alloc
  2.52      6.12     0.35    72526     0.00     0.02  grokdeclarator
  2.23      6.43     0.31  1222221     0.00     0.00  make_node
  2.09      6.72     0.29   286353     0.00     0.00  _cpp_lookup_with_hash

Total time as measured by gprof is 15.96sec before, 13.89sec after.  I
think we can do even better; we're presently calling
_cpp_lookup_with_hash more than we need to.

                0.11    0.03  106475/286353      cpp_lookup [102]
                0.18    0.04  179878/286353      parse_name [58]
[57]     2.6    0.29    0.07  286353         _cpp_lookup_with_hash [57]

The calls from parse_name are unavoidable, but I think we can crush
out many of the calls from cpp_lookup.  If we trace back a bit more...

                0.00    0.00      19/106475      initialize_builtins [578]
                0.00    0.00      23/106475      _cpp_init_stacks [700]
                0.00    0.00     934/106475      maybe_get_identifier [462]
                0.04    0.13  105499/106475      get_identifier [88]
[102]    1.3    0.04    0.13  106475         cpp_lookup [102]
                0.11    0.03  106475/286353      _cpp_lookup_with_hash [57]

                0.00    0.05   23586/105499      set_mangled_name_for_decl [65]
                0.00    0.05   24312/105499      declare_function_name [13]
                0.00    0.10   47875/105499      make_decl_rtl [41]
[88]     1.5    0.00    0.21  105499         get_identifier [88]
                0.04    0.13  105499/106475      cpp_lookup [102]

I don't follow make_decl_rtl real well, but it looks like we're
calling get_identifier from there to set the DECL_ASSEMBLER_NAME of a
decl - and that name often winds up being the same as the DECL_NAME,
but we go through get_identifier anyway.  I Could Be Wrong.

The calls from declare_function_name are completely avoidable; we are
looking up __FUNCTION__, etc. over and over again.  Shove 'em into
c_global_trees and forget about it...

And set_mangled_name_for_decl ... does not call get_identifier.  This
is an artifact of tailcall optimization.  It calls either mangle_decl
or build_decl_overload_real (depending on -fnew-abi), both of which
end by tailcalling get_identifier.  This is to set DECL_ASSEMBLER_NAME
and looks unavoidable - at least at first glance.

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-25  0:19         ` Mark Mitchell
@ 2000-08-27 15:31           ` Zack Weinberg
  2000-08-27 16:12             ` Mark Mitchell
  0 siblings, 1 reply; 42+ messages in thread
From: Zack Weinberg @ 2000-08-27 15:31 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc

On Fri, Aug 25, 2000 at 12:19:30AM -0700, Mark Mitchell wrote:
> >>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:
>     Zack> [Perhaps the standard has weasel words permitting us not
>     Zack> to call f]
> 
> I don't think there are any such weasel words -- but I could be
> wrong.

Hmm.  I read bits of the standard and I can make a case for us being
required *not* to call f.  Look at 14.7.1, paragraphs 1, 7, 9:

   1	... Unless a member of a class template or a member template
	has been explicitly instantiated or explicitly specialized, the
	specialization of the member is implicitly instantiated when
	the specialization is referenced in a context that requires
	the member definition to exist; in particular, the
	initialization (and any associated side-effects) of a static
	data member does not occur unless the static data member is
	itself used in a way that requires the definition of the
	static data member to exist.

   7	The implicit instantiation of a class template does not cause
	any static data members of that class to be implicitly
	instantiated.

   9	An implementation shall not instantiate a function template, a
	member template, a member class, or a static data member of a
	class template that does not require instantiation.

I'm reading that to say that S<int>::i shall be instantiated if and
only if it is "required to exist."  The question is then whether its
sole use inside an inline function which is not itself used, g,
requires it to exist.  I can't find anything in the standard about
inlines that are defined and not used in a translation unit.  I'd
argue that the as-if rule means it is not required (and is therefore
forbidden, per 14.7.1p9) because the program cannot tell whether
S<int>::i exists.

>     Zack> Even if there are no such weasel words, couldn't we arrange
>     Zack> to call f and throw the value away?
>
> Yes, we could call `f' and throw the value away -- but how would we
> know to do that without parsing `g'?

I imagined calling f once as soon as we saw 

template <class T> int S<T>::i = f();

and using its value for the first instantiation of S<T> we encounter.
But that's not right, because we might never instantiate S<T>.  

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 15:31           ` Zack Weinberg
@ 2000-08-27 16:12             ` Mark Mitchell
  2000-08-27 17:46               ` Zack Weinberg
  0 siblings, 1 reply; 42+ messages in thread
From: Mark Mitchell @ 2000-08-27 16:12 UTC (permalink / raw)
  To: zack; +Cc: gcc

>>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:

    Zack>    1 ... Unless a member of a class template or a member
    Zack> template has been explicitly instantiated or explicitly
    Zack> specialized, the specialization of the member is implicitly
    Zack> instantiated when the specialization is referenced in a
    Zack> context that requires the member definition to exist; in
    Zack> particular, the initialization (and any associated
    Zack> side-effects) of a static data member does not occur unless
    Zack> the static data member is itself used in a way that requires
    Zack> the definition of the static data member to exist.

Yes, that's the question all right.  There are lots of tricky bits in
the standard about the "point of instantiation".  It can matter
exactly *where* a template is instantiated.  So, if, for example, some
template was unequivocally needed somewhere else, we might have to
then go back and pretend it was actually instantiated inside the
inline function.

There are other questions as well:

  inline void f () { class C { friend void g() {}; }; }
  void g() {}

I believe this is illegal due to duplicate definitions of `g'.  But
you can't know that without parsing `f'.

I understand your basic idea: that the body of a function shouldn't
matter if the function isn't "needed".  But, unfortunately (from a
language cleanliness point of view), some information can escape.

After all `static' functions aren't needed either, if they're never
called.  The same kind of logic would imply that given:

  static void f () { S<int>::i = 3; }

maybe we shouldn't instantiate S<int>::i.  But that would sure confuse
a lot of people.

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-25  0:56           ` Daniel Berlin
@ 2000-08-27 17:09             ` Zack Weinberg
  2000-08-27 17:32               ` Mark Mitchell
  0 siblings, 1 reply; 42+ messages in thread
From: Zack Weinberg @ 2000-08-27 17:09 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Mark Mitchell, gcc

On Fri, Aug 25, 2000 at 12:56:19AM -0700, Daniel Berlin wrote:

> However, as I said before, if you can reduce the amount of memory we have
> to go through, in any way, it's a much better optimization (in terms of
> usefulness) at this point than optimizing the GC in any way (which is
> always optimization to try to not check as much memory as we would without
> a given GC optimization).

I just discovered ggc_print_statistics.  Here's what it says when
called at the very end of compilation - just before free_reg_info().
[I hacked it up to scale things into kbytes/Mbytes for readability.
Then I sorted the output by percentage.]

Tree                 Number            Bytes    % Total
identifier_node       97271            5.94M     16.603
parm_decl             42861            5.23M     14.632
function_decl         30924            3.77M     10.557
var_decl              24251            2.96M      8.279
const_decl            15216            1.86M      5.195
tree_list             55848            1.70M      4.766
method_type           12861            1.57M      4.391
integer_cst           48741            1.49M      4.160
field_decl            11761            1.44M      4.015
function_type          9924            1.21M      3.388
binding               37953            1.16M      3.239
result_decl            8103         1012.88k      2.766
string_cst            24251          757.84k      2.070
decl_stmt             24249          757.78k      2.070
type_decl              5704          713.00k      1.947
scope_stmt            16166          505.19k      1.380
block                 16166          505.19k      1.380
pointer_type           3706          463.25k      1.265
record_type            3650          456.25k      1.246
reference_type         3641          455.12k      1.243
tree_vec               4574          377.88k      1.032
nop_expr               8083          252.59k      0.690
indirect_ref           8083          252.59k      0.690
compound_stmt          8083          252.59k      0.690
enumeral_type          2010          251.25k      0.686
return_stmt            6054          189.19k      0.517
init_expr              6054          189.19k      0.517
overload              10920          170.62k      0.466
array_type              198           24.75k      0.068
integer_type            135           16.88k      0.046
vector_type               5          640.00       0.002
real_type                 5          640.00       0.002
void_type                 3          384.00       0.001
namespace_decl            3          384.00       0.001
lang_type                 2          256.00       0.001
complex_type              4          512.00       0.001
boolean_type              1          128.00       0.000
error_mark                1           16.00       0.000
Total                547465           35.76M

RTX                  Number            Bytes    % Total
mem                   55229          862.95k     66.454
symbol_ref            55412          432.91k     33.337
const_int               201            1.57k      0.121
const_double             21          672.00       0.051
reg                      27          432.00       0.032
code_label                1           64.00       0.005
pc                        1            4.00       0.000
cc0                       1            4.00       0.000
Total                110893            1.27M

Log        Allocated             Used
2              4.00k           60.00 
3            452.00k          437.65k
4              1.28M            1.21M
5             11.35M            8.05M
6              8.80M            8.72M
7             22.08M           21.50M
8             84.00k           83.00k

Total bytes marked: 41932084
Total bytes mapped: 46182400

It definitely looks like reducing the size of *_DECL nodes would help.
Also *_TYPE nodes and identifiers, if we can swing it.  (This report
doesn't count any strings, or the structures created by cpphash.c for
the identifier table.)

I'm a bit surprised at the size and number of INTEGER_CST nodes.
Aren't they being allocated uniquely?  (looks) No they aren't, I was
thinking of CONST_INTs.  Maybe I'll make INTEGER_CSTs be allocated
uniquely too.  A bit of perl hackery indicates that there are 14,444
integer constants in the code being processed, but only 177 different
values.

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 17:09             ` Zack Weinberg
@ 2000-08-27 17:32               ` Mark Mitchell
  2000-08-27 17:37                 ` Zack Weinberg
  0 siblings, 1 reply; 42+ messages in thread
From: Mark Mitchell @ 2000-08-27 17:32 UTC (permalink / raw)
  To: zack; +Cc: dan, gcc

>>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:

You're now gradually redoing all the research I did a few months
ago... :-)  Good science, you're duplicating my results. :-)

    Zack> I'm a bit surprised at the size and number of INTEGER_CST
    Zack> nodes.  Aren't they being allocated uniquely?  (looks) No
    Zack> they aren't, I was thinking of CONST_INTs.  Maybe I'll make
    Zack> INTEGER_CSTs be allocated uniquely too.  A bit of perl

Careful, that won't work if you do it in the obvious way.  Some parts
of the compiler treat INTEGER_CST nodes as mutable.  I started trying
to fix this, but didn't finish. :-(

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 17:32               ` Mark Mitchell
@ 2000-08-27 17:37                 ` Zack Weinberg
  0 siblings, 0 replies; 42+ messages in thread
From: Zack Weinberg @ 2000-08-27 17:37 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: dan, gcc

On Sun, Aug 27, 2000 at 05:32:02PM -0700, Mark Mitchell wrote:
> >>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:
> 
> You're now gradually redoing all the research I did a few months
> ago... :-)  Good science, you're duplicating my results. :-)

Heh.

I can't do anything but research until my outstanding patches get
reviewed (hint, hint) so I might as well do research. :)

>     Zack> I'm a bit surprised at the size and number of INTEGER_CST
>     Zack> nodes.  Aren't they being allocated uniquely?  (looks) No
>     Zack> they aren't, I was thinking of CONST_INTs.  Maybe I'll make
>     Zack> INTEGER_CSTs be allocated uniquely too.  A bit of perl
> 
> Careful, that won't work if you do it in the obvious way.  Some parts
> of the compiler treat INTEGER_CST nodes as mutable.  I started trying
> to fix this, but didn't finish. :-(

Yep, just discovered that myself.  Also that the type of an
INTEGER_CST matters.  It's a bit too big of a problem for me to tackle
right now.

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 16:12             ` Mark Mitchell
@ 2000-08-27 17:46               ` Zack Weinberg
  2000-08-27 18:01                 ` Mark Mitchell
  0 siblings, 1 reply; 42+ messages in thread
From: Zack Weinberg @ 2000-08-27 17:46 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc

On Sun, Aug 27, 2000 at 04:12:08PM -0700, Mark Mitchell wrote:
> >>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:
> 
>     Zack>    1 ... Unless a member of a class template or a member
>     Zack> template has been explicitly instantiated or explicitly
>     Zack> specialized, the specialization of the member is implicitly
>     Zack> instantiated when the specialization is referenced in a
>     Zack> context that requires the member definition to exist; in
>     Zack> particular, the initialization (and any associated
>     Zack> side-effects) of a static data member does not occur unless
>     Zack> the static data member is itself used in a way that requires
>     Zack> the definition of the static data member to exist.
> 
> Yes, that's the question all right.  There are lots of tricky bits in
> the standard about the "point of instantiation".  It can matter
> exactly *where* a template is instantiated.  So, if, for example, some
> template was unequivocally needed somewhere else, we might have to
> then go back and pretend it was actually instantiated inside the
> inline function.

I only see 14.6.4.1 defining the concept, and from that, there can be
multiple p.o.i.s for most things - and where we can't, that section
sounds like we could get away with the p.o.i. being the place where it
was unequivocally needed.  But that's just that section; I presume the
concept is used elsewhere...

> There are other questions as well:
> 
>   inline void f () { class C { friend void g() {}; }; }
>   void g() {}
> 
> I believe this is illegal due to duplicate definitions of `g'.  But
> you can't know that without parsing `f'.

But are we required to diagnose it?  The first g() is only in scope
inside f, IIUC, and so the only rule it trips that I see is the one
about all definitions of an inline function being inline - which we
aren't required to diagnose.

> I understand your basic idea: that the body of a function shouldn't
> matter if the function isn't "needed".  But, unfortunately (from a
> language cleanliness point of view), some information can escape.
> 
> After all `static' functions aren't needed either, if they're never
> called.  The same kind of logic would imply that given:
> 
>   static void f () { S<int>::i = 3; }
> 
> maybe we shouldn't instantiate S<int>::i.  But that would sure confuse
> a lot of people.

How could the program tell the difference?  We already don't emit
assembly for function bodies of unreferenced static functions...

I do recognize it may be difficult to do this properly, but it's such
a major optimization that I think we need to consider seriously how we
can do it.  Apple's even talking about doing the same trick for
*declarations* that aren't referenced...

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-25 12:28 ` Stan Shebs
@ 2000-08-27 17:59   ` Zack Weinberg
  2000-08-27 21:13     ` Geoff Keating
  0 siblings, 1 reply; 42+ messages in thread
From: Zack Weinberg @ 2000-08-27 17:59 UTC (permalink / raw)
  To: Stan Shebs; +Cc: gcc

On Fri, Aug 25, 2000 at 12:28:20PM -0700, Stan Shebs wrote:
> Some very interesting analysis!  Just a couple comments:
> 
> Zack Weinberg wrote:
> > 
> > Third, we could not generate so much data in the first place.  Most of
> > that 45 megs is saved tree representations of inline functions.  Most
> > of those inline functions are never used.  If we could avoid parsing
> > them in the first place, wouldn't that be nice?  Here's where you're
> > all thinking precompiled headers.
> 
> That probably explains the big discrepancy between C and C++ performance.
> Volumewise, only a small percentage of the synthetic example is class
> definitions, but it jumped the compile time by some 50-60%.  Of course,
> this will be more of an issue for systems that use more C++ headers; the
> basic Mac API is 98% C.  On Macs, the primary source of C++ headers is
> Metrowerks' PowerPlant, which is sizeable, but nowhere near the base API.

In fact, I spoke too soon.  Most of the memory consumption is
identifier and declaration nodes, types, and bindings (whatever those
are).  In other words, it wouldn't help memory consumption much to do
lazy parsing of inline functions.  It *would* help parse time, though
- I think.

> Incidentally, I've been on a crash project to add C++ to cpp-precomp
> (as we call it), am learning way more about dark corners of C++ syntax
> than I really wanted to know. :-)

Look over at the conversation I've been having with Mark Mitchell.  He
thinks we can't get away with lazy parsing of anything in C++, and has
some solid arguments for it.

> Also - but keep in mind this is not a promise - I'm getting some
> favorable response to the idea of freeing the sources to
> cpp-precomp.  Although it's not reasonable to try to include any of
> it directly into GCC (it's written in C++, for one thing), it should
> be useful as a source of ideas on how to make precompiled headers
> work.

It won't be me doing it, anymore, but whoever takes over the project
will probably be glad for the ideas.

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 17:46               ` Zack Weinberg
@ 2000-08-27 18:01                 ` Mark Mitchell
  2000-08-27 18:34                   ` Zack Weinberg
  0 siblings, 1 reply; 42+ messages in thread
From: Mark Mitchell @ 2000-08-27 18:01 UTC (permalink / raw)
  To: zack; +Cc: gcc

>>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:

I'll have to think harder about whether or not its legal.  Jason might
have good ideas as well regarding that.

    Zack> I do recognize it may be difficult to do this properly, but
    Zack> it's such a major optimization that I think we need to
    Zack> consider seriously how we can do it.  Apple's even talking
    Zack> about doing the same trick for *declarations* that aren't
    Zack> referenced...

The traditional approach to PCH is that you pay the cost of all the
parsing exactly once, and then restore your state.  The assumption is
that you are parsing the same header files many times when building
your app, so if you pay the cost only once you've pretty much won.

Apple's approach is to implement this by emitting for the compiler
only the bits that are needed, essentially via a magic preprocessor.
The traditional approach is to mmap in the state of the compilation
after parsing the header files.

The traditional approach will be faster (modulo GC issues, but let's
leave those aside) -- since you don't have to do any parsing at all.
It also seems less risky to me in terms of language semantics,
although there are still a few tricky issues.  It also makes it easier
(if you design it right) to use program databases, implement export,
and so forth.

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 18:01                 ` Mark Mitchell
@ 2000-08-27 18:34                   ` Zack Weinberg
  2000-08-27 21:20                     ` Mark Mitchell
  0 siblings, 1 reply; 42+ messages in thread
From: Zack Weinberg @ 2000-08-27 18:34 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc

On Sun, Aug 27, 2000 at 06:01:16PM -0700, Mark Mitchell wrote:
> >>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:
...
> The traditional approach to PCH is that you pay the cost of all the
> parsing exactly once, and then restore your state.  The assumption is
> that you are parsing the same header files many times when building
> your app, so if you pay the cost only once you've pretty much won.

Right.

> Apple's approach is to implement this by emitting for the compiler
> only the bits that are needed, essentially via a magic preprocessor.
> The traditional approach is to mmap in the state of the compilation
> after parsing the header files.
> 
> The traditional approach will be faster (modulo GC issues, but let's
> leave those aside) -- since you don't have to do any parsing at all.

I don't think you should just handwave over the GC issues.  In the
test case, there's 36 megs of trees, 1.2 megs of RTL, and 3 megs of
misc GCed data.  It takes approximately 0.5 seconds every time we do a
collection.  Apple's code takes 0.6 seconds to process the entire test
case; it's unlikely that the traditional approach will be that much
faster.  Do *one* GC run, therefore, and Apple's implementation will
be faster than the traditional one.

There's also all kinds of portability issues implied by your 'mmap in
the state of the compilation'.  First off, to mmap it at all, the
on-disk data structure has to match the in-core one exactly.  And that
means the dump files break every time the compiler changes even
slightly.  This is an acceptable constraint as long as it's right
there up front and we fail immediately instead of loading inconsistent
data and dying horribly later.

The dump file data will be modified by GCC after it's reloaded, and it
contains pointers.  Therefore both MAP_PRIVATE and MAP_FIXED have to
work reliably.  If we don't have mmap, we're stuck valloc-ing pages
until we get the virtual addresses we want and then reading in however
much data (which will exact its own cost.)

Contrariwise, if we pickle everything nicely so the on-disk format
isn't fragile, then we have to unpickle it on every run, which is
expensive, or use the pickled format in core, which is even more
expensive.

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 17:59   ` Zack Weinberg
@ 2000-08-27 21:13     ` Geoff Keating
  2000-08-29 17:32       ` Joe Buck
  2000-08-31  0:53       ` Gabriel Dos Reis
  0 siblings, 2 replies; 42+ messages in thread
From: Geoff Keating @ 2000-08-27 21:13 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Stan Shebs, gcc

Zack Weinberg <zack@wolery.cumb.org> writes:

> Look over at the conversation I've been having with Mark Mitchell.  He
> thinks we can't get away with lazy parsing of anything in C++, and has
> some solid arguments for it.

That's why my preferred scheme uses lazy _loading_, keyed by
identifier names.

I claim (and I'd be interested to know if anyone has a counterexample)
that if an identifier is never used other than to be declared or
defined, then program behaviour is the same as if those declarations
were never made.

So, for instance, if we have something like

#include <everything.h>
int foo (int bar)
{
  bar = f(g(i<3>));
}

then during parsing, everything relating to "foo", "bar", "f", "g",
"i" gets loaded from a pre-compiled header file, as are their types
and so on.

There are some other things I think I can claim:

- For a method, it is unnecessary to load it in if the name of its class
  is never referenced.

- For any procedure, it is unnecessary to load in the types of one of
  its parameters if they are not otherwise referenced.  It's not
  necessary to load in the type of the return value if any of its
  parameters have types that are not loaded in.  Of course we do need to
  have the procedure declaration itself loaded in.

- It's unnecessary to load in the definition of any procedure unless
  we would actually output it.

I think at this level it should be doable to implement this.  If I'm
lucky, we can go farther, so that if there are dozens of classes all
defining a 'f' method we don't have to load all them in (I'm thinking
here particularly of methods named things like 'clone()'); probably
for methods we don't need to load them in if the name of their class
is never referenced.

-- 
- Geoffrey Keating <geoffk@cygnus.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 18:34                   ` Zack Weinberg
@ 2000-08-27 21:20                     ` Mark Mitchell
  2000-08-27 21:53                       ` Zack Weinberg
  0 siblings, 1 reply; 42+ messages in thread
From: Mark Mitchell @ 2000-08-27 21:20 UTC (permalink / raw)
  To: zack; +Cc: gcc

>>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:

    Zack> I don't think you should just handwave over the GC issues.

Fair 'nuff.  But I do think that there are such significant things we
can do there that it won't matter much.

    Zack> There's also all kinds of portability issues implied by your
    Zack> 'mmap in the state of the compilation'.

Oh, I know.  I know partly because I have in front of me the source to
another compiler that uses the traditional approach.  The code really
isn't that ugly -- and based on the number of platforms this code has
been used on `mmap' with the right flags (or a suitable equivalent) is
available just about everywhere that matters.

In other words, I have an existence proof that this is a reasonable
way to go. :-)

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 21:20                     ` Mark Mitchell
@ 2000-08-27 21:53                       ` Zack Weinberg
  2000-08-27 21:58                         ` Mark Mitchell
  0 siblings, 1 reply; 42+ messages in thread
From: Zack Weinberg @ 2000-08-27 21:53 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc

On Sun, Aug 27, 2000 at 09:20:02PM -0700, Mark Mitchell wrote:
> >>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:
> 
>     Zack> I don't think you should just handwave over the GC issues.
> 
> Fair 'nuff.  But I do think that there are such significant things we
> can do there that it won't matter much.

This is probably true, yes.

>     Zack> There's also all kinds of portability issues implied by your
>     Zack> 'mmap in the state of the compilation'.
> 
> Oh, I know.  I know partly because I have in front of me the source to
> another compiler that uses the traditional approach.  The code really
> isn't that ugly -- and based on the number of platforms this code has
> been used on `mmap' with the right flags (or a suitable equivalent) is
> available just about everywhere that matters.
> 
> In other words, I have an existence proof that this is a reasonable
> way to go. :-)

I believe you.  I am extrapolating badly from the code I have in front
of me, which is indeed quite ugly - but then it tries to dump out and
reload the entire data segment (including libc state and other
dragons).  If it just dumped the GC arena it would be a lot cleaner.
And there's no particular need to dump things like
c_global_trees... to a first approximation, everything you need should
be reachable from the get_identifier table.  (I have this nagging
feeling that there's state in the ordinary malloc pool or global
variables that you'd need, though.)

zw

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 21:53                       ` Zack Weinberg
@ 2000-08-27 21:58                         ` Mark Mitchell
  0 siblings, 0 replies; 42+ messages in thread
From: Mark Mitchell @ 2000-08-27 21:58 UTC (permalink / raw)
  To: zack; +Cc: gcc

>>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:

    Zack> I believe you.  I am extrapolating badly from the code I
    Zack> have in front of me, which is indeed quite ugly - but then
    Zack> it tries to dump out and reload the entire data segment
    Zack> (including libc state and other dragons).

Ugh.  I probably cannot say much more about the code I have in front
of me.  But suffice it to say that it does something much more like
dumping out the GC pool.  It takes some care (more care than presently
taken in GCC!) to make that work (in fact, that compiler uses
something like obstacks, only in a much, much cleaner way than the way
they were used in GCC), but with care you can get away with doing
vastly less than dumping your entire core state.

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 21:13     ` Geoff Keating
@ 2000-08-29 17:32       ` Joe Buck
  2000-08-29 18:04         ` Geoff Keating
  2000-08-29 18:28         ` Stan Shebs
  2000-08-31  0:53       ` Gabriel Dos Reis
  1 sibling, 2 replies; 42+ messages in thread
From: Joe Buck @ 2000-08-29 17:32 UTC (permalink / raw)
  To: Geoff Keating; +Cc: Zack Weinberg, Stan Shebs, gcc

> I claim (and I'd be interested to know if anyone has a counterexample)
> that if an identifier is never used other than to be declared or
> defined, then program behaviour is the same as if those declarations
> were never made.

There are a couple of things that you're of course already aware of, like
constructors and destructors that may have side effects, and the impact
that the existence of a declaration may have on name resolution.

> So, for instance, if we have something like
> 
> #include <everything.h>
> int foo (int bar)
> {
>   bar = f(g(i<3>));
> }
> 
> then during parsing, everything relating to "foo", "bar", "f", "g",
> "i" gets loaded from a pre-compiled header file, as are their types
> and so on.

The only complication I can think of is that
#include <everything.h>
may also instantiate a static file-scope object that has a constructor.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-29 17:32       ` Joe Buck
@ 2000-08-29 18:04         ` Geoff Keating
  2000-08-29 19:34           ` Joe Buck
  2000-08-29 18:28         ` Stan Shebs
  1 sibling, 1 reply; 42+ messages in thread
From: Geoff Keating @ 2000-08-29 18:04 UTC (permalink / raw)
  To: jbuck; +Cc: zack, shebs, gcc

> From: Joe Buck <jbuck@racerx.synopsys.com>
> Date: Tue, 29 Aug 2000 17:31:27 -0700 (PDT)
> Cc: zack@wolery.cumb.org (Zack Weinberg), shebs@apple.com (Stan Shebs),
>         gcc@gcc.gnu.org
> 
> 
> > I claim (and I'd be interested to know if anyone has a counterexample)
> > that if an identifier is never used other than to be declared or
> > defined, then program behaviour is the same as if those declarations
> > were never made.
> 
> There are a couple of things that you're of course already aware of, like
> constructors and destructors that may have side effects, and the impact
> that the existence of a declaration may have on name resolution.

I did indeed think of those.

Constructors, destructors, variable definitions, non-inline procedure
definitions all have to be handled specially, because they all produce
assembler output.  You can't lazily compile

int x = 3;

even in C because even if 'x' is never used in a particular
compilation unit, some other unit might use it.  The options I see for
such code are:

- don't allow these
- don't precompile these
- save a chunk of assembler input which gets stuck in when the PCH is
  used.  This would have the disadvantage that you could have
  conflicts if any assembler was output before the PCH was included,
  so you would have to prohibit this.
- be really clever, make sure the assembler has no conflicts
  by (say) renaming internal symbols.  This would be hard.

My instinct is to start by implementing the first or second, and then
extend it later if that turns out to be too slow.

As for name resolution, I claim that if an identifier 'x' is never
used other than to be declared or defined, it is never relevant to
name resolution.  I suspect that when I get involved in this the first
thing I'll need to do is go through the standards and check that this
is true for C++.  (At least I know it's true in C :-).  The incredibly
complicated C++ name resolution rules were very near the front of my
mind when I was thinking about this.

-- 
- Geoffrey Keating <geoffk@cygnus.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-29 17:32       ` Joe Buck
  2000-08-29 18:04         ` Geoff Keating
@ 2000-08-29 18:28         ` Stan Shebs
  1 sibling, 0 replies; 42+ messages in thread
From: Stan Shebs @ 2000-08-29 18:28 UTC (permalink / raw)
  To: Joe Buck; +Cc: Geoff Keating, Zack Weinberg, gcc

Joe Buck wrote:
> 
> > #include <everything.h>
> > int foo (int bar)
> > {
> >   bar = f(g(i<3>));
> > }
> >
> > then during parsing, everything relating to "foo", "bar", "f", "g",
> > "i" gets loaded from a pre-compiled header file, as are their types
> > and so on.
> 
> The only complication I can think of is that
> #include <everything.h>
> may also instantiate a static file-scope object that has a constructor.

In Apple's scheme, anything that is static gets put out unconditionally,
as do pragmas (another reason to hate pragmas).  For extra flavor, the
preprocessor warns you about any static objects it sees while it is
constructing the precomp file.

Stan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-29 18:04         ` Geoff Keating
@ 2000-08-29 19:34           ` Joe Buck
  2000-08-29 23:30             ` Geoff Keating
  0 siblings, 1 reply; 42+ messages in thread
From: Joe Buck @ 2000-08-29 19:34 UTC (permalink / raw)
  To: Geoff Keating; +Cc: zack, shebs, gcc

I wrote:
> > There are a couple of things that you're of course already aware of, like
> > constructors and destructors that may have side effects, and the impact
> > that the existence of a declaration may have on name resolution.

Geoff Keating writes:
> I did indeed think of those.
> 
> Constructors, destructors, variable definitions, non-inline procedure
> definitions all have to be handled specially, because they all produce
> assembler output.  You can't lazily compile
> 
> int x = 3;
> 
> even in C because even if 'x' is never used in a particular
> compilation unit, some other unit might use it.  The options I see for
> such code are:
> 
> - don't allow these
> - don't precompile these
> - save a chunk of assembler input which gets stuck in when the PCH is
>   used.  This would have the disadvantage that you could have
>   conflicts if any assembler was output before the PCH was included,
>   so you would have to prohibit this.
> - be really clever, make sure the assembler has no conflicts
>   by (say) renaming internal symbols.  This would be hard.
> 
> My instinct is to start by implementing the first or second, and then
> extend it later if that turns out to be too slow.

This may not suffice, because of a few C++ idioms that won't be supported.
In particular, you can't precompile the iostreams headers, and if you
can't do that, the C++ community isn't going to be that interested.

Look in libstdc++-v3/bits/std_iostream.h.  You will find

namespace std {
	....
	static ios_base::Init __ioinit;
};

ios_base::Init is a hack to assure that cin/cout/cerr/clog get initialized
before any use even in constructors for file-scope objects.  The idea is
that one ios_base::Init object will get built for each .o file that
includes <iostream>.  A counter is maintained, incremented in the
constructor and decremented in the destructor.  The constructor and
destructor handle initialization and finalization of the standard streams.

This is actually a kludge first invented by the cfront people, and
Stroustrup is inordinately proud of it.  Since he's told people about
it in his books, you can expect to encounter it.

As a comment in ios_base.h says, we could avoid this kludge by relying
on -finit-priority to give the constructors for iostreams priority over
everything else, but that won't work on all GCC target platforms
(you need a new-enough version of the GNU linker).

So what to do instead?  Something like "save a hunk of assembler"?
Do you intend for PCH's to be machine-independent or machine-dependent?
Perhaps a limited number of common forms, corresponding to constructors
and initializers could be supported in a machine independent way.
The two most obvious things to support are the equivalent of

int x = 3;

and
static some_class some_obj;

where a note saying "set x to 3" or "include constructor and destructor
for some_obj" could be represented in a machine-independent way.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-29 19:34           ` Joe Buck
@ 2000-08-29 23:30             ` Geoff Keating
  0 siblings, 0 replies; 42+ messages in thread
From: Geoff Keating @ 2000-08-29 23:30 UTC (permalink / raw)
  To: jbuck; +Cc: zack, shebs, gcc

> From: Joe Buck <jbuck@racerx.synopsys.com>
> Date: Tue, 29 Aug 2000 19:33:25 -0700 (PDT)

> Geoff Keating writes:
> > Constructors, destructors, variable definitions, non-inline procedure
> > definitions all have to be handled specially, because they all produce
> > assembler output.  You can't lazily compile
> > 
> > int x = 3;
> > 
> > even in C because even if 'x' is never used in a particular
> > compilation unit, some other unit might use it.  The options I see for
> > such code are:
> > 
> > - don't allow these
> > - don't precompile these
> > - save a chunk of assembler input which gets stuck in when the PCH is
> >   used.  This would have the disadvantage that you could have
> >   conflicts if any assembler was output before the PCH was included,
> >   so you would have to prohibit this.
> > - be really clever, make sure the assembler has no conflicts
> >   by (say) renaming internal symbols.  This would be hard.

...
> So what to do instead?  Something like "save a hunk of assembler"?
> Do you intend for PCH's to be machine-independent or machine-dependent?

PCHs will be target-dependent because headers are target-dependent.  I
haven't thought about host-dependency yet, it will probably depend on
what is convenient.

> Perhaps a limited number of common forms, corresponding to constructors
> and initializers could be supported in a machine independent way.
> The two most obvious things to support are the equivalent of
> 
> int x = 3;
> 
> and
> static some_class some_obj;
> 
> where a note saying "set x to 3" or "include constructor and destructor
> for some_obj" could be represented in a machine-independent way.

This is something to consider after the initial implementation.  I'd
be happy to just get declarations working first.  Constructors would
be the next obvious step.
-- 
- Geoffrey Keating <geoffk@cygnus.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-24 18:17 PCH, and more generally C++ parser performance Zack Weinberg
  2000-08-24 18:45 ` Mark Mitchell
  2000-08-25 12:28 ` Stan Shebs
@ 2000-08-30 12:32 ` Joern Rennecke
  2 siblings, 0 replies; 42+ messages in thread
From: Joern Rennecke @ 2000-08-30 12:32 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc

> (2) Garbage collection.  Here the relevant question is, how much
> memory are we using?  I instrumented mmap() and brk() and I have
> pretty graphs which you can have if you're interested.  But this test
> case is pretty darn pathological for memory usage.  We free very
> little memory over the course of the run, and at the end we're using
> 45 megabytes of mmapped anonymous memory, plus nine allocated with
> brk.  When you count program text, libraries, daemons, and kernel
> space memory, this test case is going to need more than 64 megs of
> RAM.  I have 256, it didn't hit swap space on my system, but it will
> on many people's.

Well, obviously you need enough physical memory to start with.
In order to avoid unnecessary allocations, I propose to only do GC if
either:

- the amount of newly allocated memory is at least X % + Y of the total
  allocated memory after the last GC (or at program start if there was
  no GC yet).  X should be something in the range of 10..100

or:

- The total amount of allocated memory is at least A, and the wall clock
  time spent since the last GC is at least B % of the wall clock time spent
  in the last GC.  B should be something in the range of 50..500%

The idea here is that if we are running into swapping, you see the wall clock
time skyrocketing.  It then makes sense to try to eliminate some memory
use after enough memory accesses have been made that might have released
some memory.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-27 21:13     ` Geoff Keating
  2000-08-29 17:32       ` Joe Buck
@ 2000-08-31  0:53       ` Gabriel Dos Reis
  2000-08-31  2:40         ` Geoff Keating
  1 sibling, 1 reply; 42+ messages in thread
From: Gabriel Dos Reis @ 2000-08-31  0:53 UTC (permalink / raw)
  To: Geoff Keating; +Cc: Zack Weinberg, Stan Shebs, gcc

Geoff Keating <geoffk@cygnus.com> writes:

| Zack Weinberg <zack@wolery.cumb.org> writes:
| 
| > Look over at the conversation I've been having with Mark Mitchell.  He
| > thinks we can't get away with lazy parsing of anything in C++, and has
| > some solid arguments for it.

Yes, I'm also of the opinion that you can't simply do lazy parsing in
C++, unless you have indeed a very very sophisticated scheme to get
name resolution right, especially to get the two-phase name lookup in
templates (see 14.6 of the C++ Standard).  You also need to keep
track of points of instantiation. 

| That's why my preferred scheme uses lazy _loading_, keyed by
| identifier names.
| 
| I claim (and I'd be interested to know if anyone has a counterexample)
| that if an identifier is never used other than to be declared or
| defined, then program behaviour is the same as if those declarations
| were never made.

How exactly do you determine that an identifier is never used other
than to be declared or defined given separate compilation?  Consider
the nifty_counter idiom popular in the C++ community for consistent
initialization of global variables across different translation units:

	// nifty_counter.H
	class nifty_counter {
	public:
		nifty_counter();
		~nifty_counter();

	private:
		static int count;
	};

	static nifty_counter counter;

Every translation unit that #includes "nifty_counter.H" defines an
instance of nifty_counter -- at first glance there is no other use.
Actually, the constructor and destructor will be responsible for
incrementing and decrementing nifty_counter::count for proper
initialization of globals across different translation units.  

That idiom is, for example, used to initialize the global stream
objects in the C++ Standard Library.

The idea is generalized into the idiom "resource acquisition is
initialization".  And we need to support it correctly, whether we like
it or not.

| So, for instance, if we have something like
| 
| #include <everything.h>
| int foo (int bar)
| {
|   bar = f(g(i<3>));
| }
| 
| then during parsing, everything relating to "foo", "bar", "f", "g",
| "i" gets loaded from a pre-compiled header file, as are their types
| and so on.

Given that name lookup is not that simple in C++, I would like you be
more precise here: how do you handle names found by unqualified name
lookup? 

-- Gaby
CodeSourcery, LLC                             http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-31  0:53       ` Gabriel Dos Reis
@ 2000-08-31  2:40         ` Geoff Keating
  2000-08-31 13:40           ` Gabriel Dos Reis
  0 siblings, 1 reply; 42+ messages in thread
From: Geoff Keating @ 2000-08-31  2:40 UTC (permalink / raw)
  To: Gabriel.Dos-Reis; +Cc: zack, shebs, gcc

> Cc: Zack Weinberg <zack@wolery.cumb.org>, Stan Shebs <shebs@apple.com>,
>         gcc@gcc.gnu.org
> From: Gabriel Dos Reis <Gabriel.Dos-Reis@cmla.ens-cachan.fr>
> Organization: CMLA, ENS Cachan -- CNRS UMR 8536 (France)
> Date: 31 Aug 2000 06:53:22 +0200
> 
> Geoff Keating <geoffk@cygnus.com> writes:
> 
> | Zack Weinberg <zack@wolery.cumb.org> writes:
> | 
> | > Look over at the conversation I've been having with Mark Mitchell.  He
> | > thinks we can't get away with lazy parsing of anything in C++, and has
> | > some solid arguments for it.
> 
> Yes, I'm also of the opinion that you can't simply do lazy parsing in
> C++, unless you have indeed a very very sophisticated scheme to get
> name resolution right, especially to get the two-phase name lookup in
> templates (see 14.6 of the C++ Standard).  You also need to keep
> track of points of instantiation. 

...
> 	// nifty_counter.H
> 	class nifty_counter {
> 	public:
> 		nifty_counter();
> 		~nifty_counter();
> 
> 	private:
> 		static int count;
> 	};
> 	
> 	static nifty_counter counter;

This has been mentioned previously.  The construct

static nifty_counter counter;

cannot be easily precompiled, as it is a definition.  It would be
nearly as hard even if it was just

static int counter;

because the compiler must allocate memory for it.

So you simply have to save this definition away---this cannot be done
as text or as a token stream, because its meaning might change---and
output it each time the PCH is used.

In the worst case, consider someone who thinks it a good idea to feed
an entire .cc file to the precompiler, and whose `source' consists of

#include <foo.cc>
/* ha ha ha */

I don't think we have to make this work for PCH to be useful.

> That idiom is, for example, used to initialize the global stream
> objects in the C++ Standard Library.
> 
> The idea is generalized into the idiom "resource acquisition is
> initialization".  And we need to support it correctly, whether we like
> it or not.

I agree that it must be supported correctly if it supported at all.
I do not agree that PCH would be useless if it was not supported
fully, or even if it was not supported at all.  For instance, we could
make it work only for the special case of the global streams.

> | So, for instance, if we have something like
> | 
> | #include <everything.h>
> | int foo (int bar)
> | {
> |   bar = f(g(i<3>));
> | }
> | 
> | then during parsing, everything relating to "foo", "bar", "f", "g",
> | "i" gets loaded from a pre-compiled header file, as are their types
> | and so on.
> 
> Given that name lookup is not that simple in C++, I would like you be
> more precise here: how do you handle names found by unqualified name
> lookup? 

Name lookup is the determination of which, out of a set of possible
meanings for a name, is the one meant.  My scheme handles this by
loading in all the possible definitions of a name before name lookup
starts.  For instance, when loading "foo" I would load T::foo,
J::T::foo, and plain ::foo, if they were all defined.  (I might make
an exception for C::foo if C is not yet loaded; I think, but am not
sure, that under those circumstances "foo" cannot mean C::foo.)

The implementation I'm thinking of would hook into get_identifier.
It'd keep a second table of 'identifiers that are defined but not
loaded yet', and if an identifier was not found in the normal hash
table the second table would be checked, and if found there it would
be loaded in.  This approach is simple and reliable, and easily
made smarter.

-- 
- Geoffrey Keating <geoffk@cygnus.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
  2000-08-31  2:40         ` Geoff Keating
@ 2000-08-31 13:40           ` Gabriel Dos Reis
  0 siblings, 0 replies; 42+ messages in thread
From: Gabriel Dos Reis @ 2000-08-31 13:40 UTC (permalink / raw)
  To: Geoff Keating; +Cc: zack, shebs, gcc

Geoff Keating <geoffk@cygnus.com> writes:

| > Cc: Zack Weinberg <zack@wolery.cumb.org>, Stan Shebs <shebs@apple.com>,
| >         gcc@gcc.gnu.org
| > From: Gabriel Dos Reis <Gabriel.Dos-Reis@cmla.ens-cachan.fr>
| > Organization: CMLA, ENS Cachan -- CNRS UMR 8536 (France)
| > Date: 31 Aug 2000 06:53:22 +0200
| > 
| > Geoff Keating <geoffk@cygnus.com> writes:
| > 
| > | Zack Weinberg <zack@wolery.cumb.org> writes:
| > | 
| > | > Look over at the conversation I've been having with Mark Mitchell.  He
| > | > thinks we can't get away with lazy parsing of anything in C++, and has
| > | > some solid arguments for it.
| > 
| > Yes, I'm also of the opinion that you can't simply do lazy parsing in
| > C++, unless you have indeed a very very sophisticated scheme to get
| > name resolution right, especially to get the two-phase name lookup in
| > templates (see 14.6 of the C++ Standard).  You also need to keep
| > track of points of instantiation. 

[...]

| In the worst case, consider someone who thinks it a good idea to feed
| an entire .cc file to the precompiler, and whose `source' consists of
| 
| #include <foo.cc>
| /* ha ha ha */

This is not very far from what people are doing currently because of
lack of separate template compilation.

| I don't think we have to make this work for PCH to be useful.
| 
| > That idiom is, for example, used to initialize the global stream
| > objects in the C++ Standard Library.
| > 
| > The idea is generalized into the idiom "resource acquisition is
| > initialization".  And we need to support it correctly, whether we like
| > it or not.
| 
| I agree that it must be supported correctly if it supported at all.
| I do not agree that PCH would be useless if it was not supported
| fully, or even if it was not supported at all.

I was not questioning the usefulness of PCH.

I'm worried about the things affected by PCH.  And I'm even more
worried when I'm hearing the plan of lazy parsing.

[...]

| > Given that name lookup is not that simple in C++, I would like you be
| > more precise here: how do you handle names found by unqualified name
| > lookup? 
| 
| Name lookup is the determination of which, out of a set of possible
| meanings for a name, is the one meant.  My scheme handles this by
| loading in all the possible definitions of a name before name lookup
| starts.

So you're not arguing for Zack's lazy parsing?

| ...  For instance, when loading "foo" I would load T::foo,
| J::T::foo, and plain ::foo, if they were all defined.  (I might make
| an exception for C::foo if C is not yet loaded; I think, but am not
| sure, that under those circumstances "foo" cannot mean C::foo.)

I think you might need it in depedent name lookup.

-- Gaby
CodeSourcery, LLC                             http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
@ 2000-08-25  8:13 Jeff Greif
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff Greif @ 2000-08-25  8:13 UTC (permalink / raw)
  To: Zack Weinberg, Daniel Berlin, Mark Mitchell, gcc

Watch out for madvise.  If you use it on Solaris, and peculiar things
happen, it might be best to turn it off and forget it.

Many years ago I worked on the predecessor of Mathematica, called SMP,
which ran on several operating systems and used a mark-and-sweep garbage
collector.  On BSD-related systems, like SunOs, it used the madvise
system call (then called vadvise on Sun) to warn the paging system about
the upcoming behaviors in the mark and sweep phases.  It turned out that
one of the settings had a bug that caused memory to be scribbled, and
while I could prove (by commenting out the vadvise calls) that the fault
was in the code triggered by one setting of vadvise, I could not produce
a small test case for Sun.  It's sufficiently obscure that it may never
have been fixed.  I spent over a week trying to nail it down.  The
overall symptom was that on an unloaded machine, you could repeatedly
run a process doing a particular 10-minute computation, and either get
the right answer, the wrong answer, or a core dump in a seemingly random
fashion.  The only useful thing the debugger told me was that things
went haywire during garbage collection -- running the debugger was
almost useless because every memory location you examined changed the
paging behavior, producing different results.  Until I was inspired to
try commenting out the vadvise calls, the debugger only made me more and
more confused.

Jeff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
@ 2000-08-25  1:56 Geert Bosch
  0 siblings, 0 replies; 42+ messages in thread
From: Geert Bosch @ 2000-08-25  1:56 UTC (permalink / raw)
  To: Mark Mitchell, rth; +Cc: gcc, zack

On Thu, 24 Aug 2000 22:00:52 -0700, Mark Mitchell wrote:

  Right.  But do we want to require that the OS provide that kind of
  capability?  I guess we can always fall back on what we've already
  got.

We cannot require this in general, so such a scheme should always
be able to fall-back on methods not requiring the ability to
continue after a SIGSEGV signal.

  -Geert

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PCH, and more generally C++ parser performance
@ 2000-08-25  1:13 Mike Stump
  0 siblings, 0 replies; 42+ messages in thread
From: Mike Stump @ 2000-08-25  1:13 UTC (permalink / raw)
  To: zack; +Cc: gcc

> From: Zack Weinberg <zack@wolery.cumb.org>
> Date: Thu, 24 Aug 2000 18:17:01 -0700

> (1) get_identifier is an easy one.  

> I intend to make get_identifier use cpplib's hash table
> implementation, and share the table itself with cpplib when using an
> integrated preprocessor, which should wipe out this bottleneck.

Sounds good.  Might be interesting to compare with map and hash_map
from the C++ library, and see if those two are in fact worse than
cpplib.  If they aren't, maybe cpplib needs fixing.

> (3) The parser is also bloody slow; we're spending .79 seconds just in
> yyparse.  This is mostly the fault of C++ the language.  Improvements
> in this area will come from giving it less text to grovel through -
> another advantage of the above idea about delaying parsing until
> needed.  (The code that saves inline functions for later uses a
> trivial brace-counting algorithm.)

This describes the way the compiler used to work, then we fixed it.  :-)
The context was templates.  The trick is, once we look it up for
inlining, we want to be able to push into the inline, complete the
compile of it, pop back to the inlining routine and have it inlined.
One can check out older compilers to see the code that did it.

I am told by Mike Ball of Sun that the most compact representation for
saving away inlines (the context was templates as I recall) is in
fact, just source...  so this lends to the idea it might be worth
considering going that direction.  Also, export from template fame
might make use of the same technology.

> It's instructive to note that if you use byacc instead of bison, the
> generated parser is significantly faster - by itself.  The difference
> isn't enough to show up in the total wall clock time.

>   4.96      6.25     0.79        1   790.00 15902.99  yyparse (bison)
>   3.57      7.20     0.57        1   570.00 15916.08  yyparse (byacc)

If one wants to sanity check it, one can gcov it and see where things
are being hit hard, and see if that makes sense.  Sometimes one can
just look at the size of the numbers and say, oh, that's way too high,
trace it down, and find something stupid.

Sounds like some really fine work you've put in, thanks.

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2000-08-31 13:40 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-08-24 18:17 PCH, and more generally C++ parser performance Zack Weinberg
2000-08-24 18:45 ` Mark Mitchell
2000-08-24 20:35   ` Richard Henderson
2000-08-24 22:02     ` Mark Mitchell
2000-08-24 22:14       ` Daniel Berlin
2000-08-25 11:32       ` finding fault addresses in signal handlers Geoff Keating
2000-08-24 21:54   ` PCH, and more generally C++ parser performance Zack Weinberg
2000-08-24 22:05     ` Mark Mitchell
2000-08-24 22:24       ` Daniel Berlin
2000-08-24 23:43         ` Mark Mitchell
2000-08-25  0:56           ` Daniel Berlin
2000-08-27 17:09             ` Zack Weinberg
2000-08-27 17:32               ` Mark Mitchell
2000-08-27 17:37                 ` Zack Weinberg
2000-08-24 23:43         ` Zack Weinberg
2000-08-24 23:43           ` Mark Mitchell
2000-08-24 23:43       ` Zack Weinberg
2000-08-25  0:19         ` Mark Mitchell
2000-08-27 15:31           ` Zack Weinberg
2000-08-27 16:12             ` Mark Mitchell
2000-08-27 17:46               ` Zack Weinberg
2000-08-27 18:01                 ` Mark Mitchell
2000-08-27 18:34                   ` Zack Weinberg
2000-08-27 21:20                     ` Mark Mitchell
2000-08-27 21:53                       ` Zack Weinberg
2000-08-27 21:58                         ` Mark Mitchell
2000-08-25 22:53   ` Zack Weinberg
2000-08-25 12:28 ` Stan Shebs
2000-08-27 17:59   ` Zack Weinberg
2000-08-27 21:13     ` Geoff Keating
2000-08-29 17:32       ` Joe Buck
2000-08-29 18:04         ` Geoff Keating
2000-08-29 19:34           ` Joe Buck
2000-08-29 23:30             ` Geoff Keating
2000-08-29 18:28         ` Stan Shebs
2000-08-31  0:53       ` Gabriel Dos Reis
2000-08-31  2:40         ` Geoff Keating
2000-08-31 13:40           ` Gabriel Dos Reis
2000-08-30 12:32 ` Joern Rennecke
2000-08-25  1:13 Mike Stump
2000-08-25  1:56 Geert Bosch
2000-08-25  8:13 Jeff Greif

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).