* Faster compilation speed @ 2002-08-09 12:17 Mike Stump 2002-08-09 13:04 ` Noel Yap ` (6 more replies) 0 siblings, 7 replies; 173+ messages in thread From: Mike Stump @ 2002-08-09 12:17 UTC (permalink / raw) To: gcc

I'd like to introduce lots of various changes to improve compiler speed. I thought I should send out an email and see if others think this would be good to have in the tree. Also, if it is, I'd like to solicit any ideas others have for me to pursue. I'd be happy to do all the hard work, if you come up with the ideas! The target is to be 6x faster.

The first realization I came to is that the only existing control for such things is -O[123], and having thought about it, I think it would be best to retain and use those flags. For minimal user impact, I think it would be good to not perturb existing users of -O[0123] too much, or at least, not at first. If we wanted to change them, I think -O0 should be the `fast' version, -O1 should be what -O0 does now with some additions around the edges, and -O2 and -O3 also slide over (at least one). What do you think: slide them all over one or more, or just make -O0 do less, or...? Maybe we have a -O0.0 to mean compile very quickly?

Another question would be how many knobs we should have. At first, I am inclined to say just one. If we want, we can later break them out into more choices. I am mainly interested in a single knob at this point.

Another question is, what should the lower limit be on uglifying code for the sake of compilation speed.

Below are some concrete ideas so others can get a feel for the types of changes, and to comment on the flag and how it is used. While I give a specific example, I'm more interested in the upper-level comments than in discussion of not combining temp slots.

The use of a preprocessor macro allows us to replace it with 0 or 1, should we want to obtain a compiler that is unconditionally faster, or one that doesn't have any extra code in it.

This change yields a 0.9% speed improvement when compiling expr.c. Not much, but if the compiler were 6x faster, this would be a 5.5% change in compilation speed. The resulting code is worse, but not by much.

So, let the discussion begin...

Doing diffs in flags.h.~1~:
*** flags.h.~1~	Fri Aug  9 10:17:36 2002
--- flags.h	Fri Aug  9 10:37:58 2002
*************** extern int flag_signaling_nans;
*** 696,699 ****
--- 696,705 ----
  #define HONOR_SIGN_DEPENDENT_ROUNDING(MODE)  \
    (MODE_HAS_SIGN_DEPENDENT_ROUNDING (MODE) && !flag_unsafe_math_optimizations)
  
+ /* Nonzero for compiling as fast as we can.  */
+ 
+ extern int flag_speed_compile;
+ 
+ #define SPEEDCOMPILE flag_speed_compile
+ 
  #endif /* ! GCC_FLAGS_H */
--------------
Doing diffs in function.c.~1~:
*** function.c.~1~	Fri Aug  9 10:17:36 2002
--- function.c	Fri Aug  9 10:37:58 2002
*************** free_temp_slots ()
*** 1198,1203 ****
--- 1198,1206 ----
  {
    struct temp_slot *p;
  
+   if (SPEEDCOMPILE)
+     return;
+ 
    for (p = temp_slots; p; p = p->next)
      if (p->in_use && p->level == temp_slot_level
  	&& ! p->keep && p->rtl_expr == 0)
*************** free_temps_for_rtl_expr (t)
*** 1214,1219 ****
--- 1217,1225 ----
  {
    struct temp_slot *p;
  
+   if (SPEEDCOMPILE)
+     return;
+ 
    for (p = temp_slots; p; p = p->next)
      if (p->rtl_expr == t)
        {
*************** pop_temp_slots ()
*** 1301,1311 ****
  {
    struct temp_slot *p;
  
!   for (p = temp_slots; p; p = p->next)
!     if (p->in_use && p->level == temp_slot_level && p->rtl_expr == 0)
!       p->in_use = 0;
  
!   combine_temp_slots ();
  
    temp_slot_level--;
  }
--- 1307,1320 ----
  {
    struct temp_slot *p;
  
!   if (! SPEEDCOMPILE)
!     {
!       for (p = temp_slots; p; p = p->next)
!         if (p->in_use && p->level == temp_slot_level && p->rtl_expr == 0)
!           p->in_use = 0;
  
!       combine_temp_slots ();
!     }
  
    temp_slot_level--;
  }
--------------
Doing diffs in toplev.c.~1~:
*** toplev.c.~1~	Fri Aug  9 10:17:40 2002
--- toplev.c	Fri Aug  9 11:31:50 2002
*************** int flag_new_regalloc = 0;
*** 894,899 ****
--- 894,903 ----
  
  int flag_tracer = 0;
  
+ /* If nonzero, speed-up the compile as fast as we can.  */
+ 
+ int flag_speed_compile = 0;
+ 
  /* Values of the -falign-* flags: how much to align labels in code.
     0 means `use default', 1 means `don't align'.
     For each variable, there is an _log variant which is the power
*************** display_help ()
*** 3679,3684 ****
--- 3683,3689 ----
  
    printf (_("  -O[number]              Set optimization level to [number]\n"));
    printf (_("  -Os                     Optimize for space rather than speed\n"));
+   printf (_("  -Of                     Compile as fast as possible\n"));
    for (i = LAST_PARAM; i--;)
      {
        const char *description = compiler_params[i].help;
*************** parse_options_and_default_flags (argc, a
*** 4772,4777 ****
--- 4777,4786 ----
  	      /* Optimizing for size forces optimize to be 2.  */
  	      optimize = 2;
  	    }
+ 	  else if ((p[0] == 'f') && (p[1] == 0))
+ 	    {
+ 	      flag_speed_compile = 1;
+ 	    }
+ 	  else
  	    {
  	      const int optimize_val = read_integral_parameter (p, p - 2, -1);
--------------
^ permalink raw reply [flat|nested] 173+ messages in thread
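The point of routing the test through a macro rather than testing the flag variable directly is that SPEEDCOMPILE can later be hard-wired to 0 or 1, so the guard either disappears or takes over unconditionally. A minimal sketch of that pattern (not GCC's actual sources; the FORCE_* names are invented for illustration):

    /* With the default, the guard is a runtime test on flag_speed_compile;
       defining SPEEDCOMPILE to a constant lets the compiler drop either the
       guard or the skipped work entirely.  */
    #if defined FORCE_FAST
    # define SPEEDCOMPILE 1           /* unconditionally fast compiler */
    #elif defined FORCE_THOROUGH
    # define SPEEDCOMPILE 0           /* no speed-hack code left at all */
    #else
    int flag_speed_compile;           /* set by -Of; extern in flags.h in the patch */
    # define SPEEDCOMPILE flag_speed_compile
    #endif

    void
    free_temp_slots_sketch (void)
    {
      if (SPEEDCOMPILE)
        return;                       /* skip the temp-slot bookkeeping */

      /* ... the usual scan over the temp slots would go here ... */
    }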
* Re: Faster compilation speed 2002-08-09 12:17 Faster compilation speed Mike Stump @ 2002-08-09 13:04 ` Noel Yap 2002-08-09 13:10 ` Matt Austern ` (3 more replies) 2002-08-09 13:10 ` Aldy Hernandez ` (5 subsequent siblings) 6 siblings, 4 replies; 173+ messages in thread From: Noel Yap @ 2002-08-09 13:04 UTC (permalink / raw) To: Mike Stump, gcc Build speeds are most helped by minimizing the number of files opened and closed during the build. I think a good start would be to have preprocessed header files. My idea would be to add options to cpp that would have it produce preprocessed files. Doing so would allow it to be easily integrated into a build system like "make". At first, I think all that's really needed is a cpp option, say --preprocess-includes, that just goes through and preprocesses the #include directives (eg it doesn't preprocess #define's, #if's, ...). Conceivably, this would also require some other option, possibly --preprocessed-header-file-path, so that it can recognize when to use existing preprocessed header files. MTC, Noel --- Mike Stump <mrs@apple.com> wrote: > I'd like to introduce lots of various changes to > improve compiler > speed. I thought I should send out an email and see > if others think > this would be good to have in the tree. Also, if it > is, I'd like to > solicit any ideas others have for me to pursue. I'd > be happy to do all > the hard work, if you come up with the ideas! The > target is to be 6x > faster. > > The first realization I came to is that the only > existing control for > such things is -O[123], and having thought about it, > I think it would > be best to retain and use those flags. For minimal > user impact, I > think it would be good to not perturb existing users > of -O[0123] too > much, or at leaast, not at first. If we wanted to > change them, I think > -O0 should be the `fast' version, -O1 should be what > -O0 does now with > some additions around the edges, and -O2 and -O3 > also slide over (at > least one). What do you think, slide them all over > one or more, or > just make -O0 do less, or...? Maybe we have a -O0.0 > to mean compile > very quickly? > > Another question would be how many knobs should we > have? At first, I > am inclined to say just one. If we want, we can > later break them out > into more choices. I am mainly interested in a > single knob at this > point. > > Another question is, what should the lower limit be > on uglifying code > for the sake of compilation speed. > > Below are some concrete ideas so others can get a > feel for the types of > changes, and to comment on the flag and how it is > used. > While I give a specific example, I'm more interested > in the upper level > comments, than discussion of not combining temp > slots. > > The use of a macro preprocessor symbol allows us to > replace it with 0 > or 1, should we want to obtain a compiler that is > unconditionally > faster, or one that doesn't have any extra code in > it. > > This change yields a 0.9% speed improvement when > compiling expr.c. Not > much, but if the compiler were 6x faster, this would > be 5.5% change in > compilation speed. The resulting code is worse, but > not by much. > > So, let the discussion begin... 
> > > Doing diffs in flags.h.~1~: > *** flags.h.~1~ Fri Aug 9 10:17:36 2002 > --- flags.h Fri Aug 9 10:37:58 2002 > *************** extern int flag_signaling_nans; > *** 696,699 **** > --- 696,705 ---- > #define HONOR_SIGN_DEPENDENT_ROUNDING(MODE) \ > (MODE_HAS_SIGN_DEPENDENT_ROUNDING (MODE) && > !flag_unsafe_math_optimizations) > > + /* Nonzero for compiling as fast as we can. */ > + > + extern int flag_speed_compile; > + > + #define SPEEDCOMPILE flag_speed_compile > + > #endif /* ! GCC_FLAGS_H */ > -------------- > Doing diffs in function.c.~1~: > *** function.c.~1~ Fri Aug 9 10:17:36 2002 > --- function.c Fri Aug 9 10:37:58 2002 > *************** free_temp_slots () > *** 1198,1203 **** > --- 1198,1206 ---- > { > struct temp_slot *p; > > + if (SPEEDCOMPILE) > + return; > + > for (p = temp_slots; p; p = p->next) > if (p->in_use && p->level == temp_slot_level > && ! p->keep > && p->rtl_expr == 0) > *************** free_temps_for_rtl_expr (t) > *** 1214,1219 **** > --- 1217,1225 ---- > { > struct temp_slot *p; > > + if (SPEEDCOMPILE) > + return; > + > for (p = temp_slots; p; p = p->next) > if (p->rtl_expr == t) > { > *************** pop_temp_slots () > *** 1301,1311 **** > { > struct temp_slot *p; > > ! for (p = temp_slots; p; p = p->next) > ! if (p->in_use && p->level == temp_slot_level > && p->rtl_expr == 0) > ! p->in_use = 0; > > ! combine_temp_slots (); > > temp_slot_level--; > } > --- 1307,1320 ---- > { > struct temp_slot *p; > > ! if (! SPEEDCOMPILE) > ! { > ! for (p = temp_slots; p; p = p->next) > ! if (p->in_use && p->level == temp_slot_level > && p->rtl_expr == > 0) > ! p->in_use = 0; > > ! combine_temp_slots (); > ! } > > temp_slot_level--; > } > -------------- > Doing diffs in toplev.c.~1~: > *** toplev.c.~1~ Fri Aug 9 10:17:40 2002 > --- toplev.c Fri Aug 9 11:31:50 2002 > *************** int flag_new_regalloc = 0; > *** 894,899 **** > --- 894,903 ---- > > int flag_tracer = 0; > > + /* If nonzero, speed-up the compile as fast as we > can. */ > + > + int flag_speed_compile = 0; > + > /* Values of the -falign-* flags: how much to > align labels in code. > 0 means `use default', 1 means `don't align'. > For each variable, there is an _log variant > which is the power > *************** display_help () > *** 3679,3684 **** > --- 3683,3689 ---- > > printf (_(" -O[number] Set > optimization level to > [number]\n")); > printf (_(" -Os Optimize > for space rather than > speed\n")); > + printf (_(" -Of Compile as > fast as > possible\n")); > for (i = LAST_PARAM; i--;) > { > const char *description = > compiler_params[i].help; > *************** parse_options_and_default_flags > (argc, a > *** 4772,4777 **** > --- 4777,4786 ---- > /* Optimizing for size forces > optimize to be 2. */ > optimize = 2; > } > + else if ((p[0] == 'f') && (p[1] == 0)) > + { > + flag_speed_compile = 1; > + } > else > { > const int optimize_val = > read_integral_parameter (p, p - > 2, -1); > -------------- > __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
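To make the proposal concrete, here is roughly what the output of such an include-only pass might look like on a toy translation unit (the option and the header names are hypothetical; no such cpp option exists today):

    /* Before the pass, main.c reads:
           #include "config.h"     -- suppose it only #defines HAVE_FOO
           #include "api.h"        -- suppose it #includes "config.h" itself
           #if HAVE_FOO
           int use_foo (void);
           #endif
       After the hypothetical --preprocess-includes pass, the #include lines
       have been spliced in textually while #define/#if are left for the
       normal compile; note config.h already shows up twice, once directly
       and once via api.h: */
    #define HAVE_FOO 1             /* body of config.h */
    #define HAVE_FOO 1             /* body of config.h again, via api.h */
    int api_call (void);           /* body of api.h */
    #if HAVE_FOO
    int use_foo (void);
    #endif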
* Re: Faster compilation speed 2002-08-09 13:04 ` Noel Yap @ 2002-08-09 13:10 ` Matt Austern 2002-08-09 14:22 ` Neil Booth ` (2 subsequent siblings) 3 siblings, 0 replies; 173+ messages in thread From: Matt Austern @ 2002-08-09 13:10 UTC (permalink / raw) To: Noel Yap; +Cc: Mike Stump, gcc On Friday, August 9, 2002, at 01:04 PM, Noel Yap wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 13:04 ` Noel Yap 2002-08-09 13:10 ` Matt Austern @ 2002-08-09 14:22 ` Neil Booth 2002-08-09 14:44 ` Noel Yap 2002-08-09 15:13 ` Stan Shebs 2002-08-09 18:57 ` Linus Torvalds 3 siblings, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-09 14:22 UTC (permalink / raw) To: Noel Yap; +Cc: Mike Stump, gcc Noel Yap wrote:- > At first, I think all that's really needed is a cpp > option, say --preprocess-includes, that just goes > through and preprocesses the #include directives (eg > it doesn't preprocess #define's, #if's, ...). Heh, if only life were this easy. If you actually think about what CPP does, you'd realize this is a no-go. Two immediate issues: 1) #include can take a macro as argument 2) #include can appear in preprocessor conditional blocks. You only know whether they are processed if you know the correct value of the #if. This often depends on macro expansions, and correct processing of prior includes. Of course, #defines appear in conditional blocks too, so this is kind of important to get right. There are no easy shortcuts here: to preprocess something properly, you have to do *everything* the preprocessor does "normally". There are no shortcuts, not even trivial ones. We *do* do too many stats and opens though; when I get time I'll post my ideas about this. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
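Concretely, the two cases Neil lists look like this (header names invented for illustration):

    /* Case 1: the include target is computed by macro expansion, so the
       file being included cannot be known without full macro processing.  */
    #ifdef __linux__
    # define OS_HEADER "os-linux.h"
    #else
    # define OS_HEADER "os-generic.h"
    #endif
    #include OS_HEADER

    /* Case 2: whether an #include happens at all depends on an #if whose
       value comes from macros set up by earlier includes.  */
    #include "config.h"            /* may or may not define HAVE_THREADS */
    #if HAVE_THREADS
    # include "threads.h"
    #endif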
* Re: Faster compilation speed 2002-08-09 14:22 ` Neil Booth @ 2002-08-09 14:44 ` Noel Yap 2002-08-09 15:14 ` Neil Booth 0 siblings, 1 reply; 173+ messages in thread From: Noel Yap @ 2002-08-09 14:44 UTC (permalink / raw) To: Neil Booth; +Cc: Mike Stump, gcc --- Neil Booth <neil@daikokuya.co.uk> wrote: > Heh, if only life were this easy. If you actually > think about what CPP > does, you'd realize this is a no-go. Two immediate > issues: > > 1) #include can take a macro as argument Yes, what I suggest certainly won't work for this situation. OTOH, how many times is this really used? Would it be such a sin to say that one cannot do the preprocessing I suggested if one has macros for #include arguments? > 2) #include can appear in preprocessor conditional > blocks. You > only know whether they are processed if you know > the correct value > of the #if. This often depends on macro > expansions, and correct > processing of prior includes. Of course, > #defines appear in > conditional blocks too, so this is kind of > important to get right. I don't see this as too big a problem. Just output a file like: #if COND /* contents of header file #endif In fact, doing it this way has the advantage that several builds, not necessarily agreeing on the value of COND, can use the file. > There are no easy shortcuts here: to preprocess > something properly, > you have to do *everything* the preprocessor does > "normally". There > are no shortcuts, not even trivial ones. I think one needn't preprocess everything perfectly in order to gain significant advantages. Would you say that what I suggest is better than what we have now? If an ideal solution is being worked on, I'd opt for that. OTOH, I think this solution has been in the works for at least a couple of years now. I think the --preprocess-includes option should be very simple to do. > We *do* do too many stats and opens though; when I > get time I'll post > my ideas about this. I'm sure my ideas are far from ideal so I'm looking forward to yours. Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 14:44 ` Noel Yap @ 2002-08-09 15:14 ` Neil Booth 2002-08-10 15:54 ` Noel Yap 0 siblings, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-09 15:14 UTC (permalink / raw) To: Noel Yap; +Cc: Mike Stump, gcc Noel Yap wrote:- > I don't see this as too big a problem. Just output a > file like: > #if COND > /* contents of header file > #endif > > In fact, doing it this way has the advantage that > several builds, not necessarily agreeing on the value > of COND, can use the file. Hmm, and what about header guards? Infinite recursion? > I think one needn't preprocess everything perfectly in > order to gain significant advantages. Would you say > that what I suggest is better than what we have now? Correctness is paramount; if it's not correct it's no good. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:14 ` Neil Booth @ 2002-08-10 15:54 ` Noel Yap 0 siblings, 0 replies; 173+ messages in thread From: Noel Yap @ 2002-08-10 15:54 UTC (permalink / raw) To: Neil Booth; +Cc: Mike Stump, gcc --- Neil Booth <neil@daikokuya.co.uk> wrote: > Noel Yap wrote:- > > > I don't see this as too big a problem. Just > output a > > file like: > > #if COND > > /* contents of header file > > #endif > > > > In fact, doing it this way has the advantage that > > several builds, not necessarily agreeing on the > value > > of COND, can use the file. > > Hmm, and what about header guards? Infinite > recursion? Unless I'm missing something, header guards by themselves shouldn't pose a problem. You're right. Cyclic dependencies would throw this whole thing out of whack. OTOH, I think such practice needs to be avoided anyhow. Another case related to recursive includes is where each level of recursion would have side effects (eg redefining a macro whose value is used in the next recursion). Again, I've heard this usage only once and even the creator of such a header file said it was a tremendous hack for programmers with no proper education in programming (IIRC, they were physicists). > > I think one needn't preprocess everything > perfectly in > > order to gain significant advantages. Would you > say > > that what I suggest is better than what we have > now? > > Correctness is paramount; if it's not correct it's > no > good. I apologize if my post was misunderstood. What I meant to say was, if it's able to preprocess, then allow it, otherwise, don't. IOW, those already following common practices can take advantage of a new feature, those that don't have what they have now. I can certainly understand the ideals of keeping the tool and all its features pure and working for all possible uses. OTOH, doing so may prevent practicle avenues that possibly 99% of users can benefit from. Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
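The "include with side effects" case Noel mentions is essentially the X-macro idiom: the same unguarded header is deliberately included several times with a macro redefined between inclusions. A hedged sketch (file name and opcodes invented; GCC's own *.def files use the same trick):

    /* opcodes.def deliberately has no include guard; it is meant to be
       re-included with a different definition of OPCODE each time.  Its
       entire contents:

           OPCODE (ADD)
           OPCODE (SUB)
           OPCODE (MUL)
    */

    /* First inclusion: build an enum.  */
    #define OPCODE(name) OP_##name,
    enum opcode
    {
    #include "opcodes.def"
      OP_LAST
    };
    #undef OPCODE

    /* Second inclusion: build a matching name table.  */
    #define OPCODE(name) #name,
    static const char *const opcode_names[] =
    {
    #include "opcodes.def"
      "LAST"
    };
    #undef OPCODE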
* Re: Faster compilation speed 2002-08-09 13:04 ` Noel Yap 2002-08-09 13:10 ` Matt Austern 2002-08-09 14:22 ` Neil Booth @ 2002-08-09 15:13 ` Stan Shebs 2002-08-09 15:18 ` Neil Booth ` (2 more replies) 2002-08-09 18:57 ` Linus Torvalds 3 siblings, 3 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 15:13 UTC (permalink / raw) To: Noel Yap; +Cc: Mike Stump, gcc Noel Yap wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:13 ` Stan Shebs @ 2002-08-09 15:18 ` Neil Booth 2002-08-10 16:12 ` Noel Yap 2002-08-09 15:19 ` Ziemowit Laski 2002-08-10 16:07 ` Noel Yap 2 siblings, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-09 15:18 UTC (permalink / raw) To: Stan Shebs; +Cc: Noel Yap, Mike Stump, gcc Stan Shebs wrote:- > Is this assertion based on empirical measurement, and if so, for what > source code and what system? For instance, the longest source file > in GCC is about 15K lines, and at -O2, only a small percentage of > time is spent messing with files. If I use -save-temps on cp/decl.c on > one of my (Linux) machines, I get a total time of about 38 sec from > source to asm. If I just compile decl.i, it's about 37 sec, so that's > 1 sec for *all* preprocessing, including all file opening/closing. Yes, it's very rare that preprocessing is more than 2% of -O2 time; it's often less than 1%. IMO that says more about the efficiency of the rest than of CPP. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:18 ` Neil Booth @ 2002-08-10 16:12 ` Noel Yap 2002-08-10 18:00 ` Nix 0 siblings, 1 reply; 173+ messages in thread From: Noel Yap @ 2002-08-10 16:12 UTC (permalink / raw) To: Neil Booth, Stan Shebs; +Cc: Noel Yap, Mike Stump, gcc --- Neil Booth <neil@daikokuya.co.uk> wrote: > Stan Shebs wrote:- > > > Is this assertion based on empirical measurement, > and if so, for what > > source code and what system? For instance, the > longest source file > > in GCC is about 15K lines, and at -O2, only a > small percentage of > > time is spent messing with files. If I use > -save-temps on cp/decl.c on > > one of my (Linux) machines, I get a total time of > about 38 sec from > > source to asm. If I just compile decl.i, it's > about 37 sec, so that's > > 1 sec for *all* preprocessing, including all file > opening/closing. > > Yes, it's very rare that preprocessing is more than > 2% of -O2 time; > it's often less than 1%. IMO that says more about > the efficiency > of the rest than of CPP. I would agree if you're talking about complete builds spanning only a few C/C++ files. OTOH, when builds span many hundreds of these files, build-time (not just compile-time) starts getting bogged down on (mostly) reopening and repreprocessing the same files over and over. Within our system, builds on Windows are magnitudes faster since we're able to take advantage of precompiled headers. AFAIK, I legitimate study was made studying whether to use this feature or not. Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 16:12 ` Noel Yap @ 2002-08-10 18:00 ` Nix 2002-08-10 20:36 ` Noel Yap 2002-08-12 15:08 ` Mike Stump 0 siblings, 2 replies; 173+ messages in thread From: Nix @ 2002-08-10 18:00 UTC (permalink / raw) To: Noel Yap; +Cc: Neil Booth, gcc [Cc: list trimmed] On Sat, 10 Aug 2002, Noel Yap spake: > I would agree if you're talking about complete builds > spanning only a few C/C++ files. OTOH, when builds > span many hundreds of these files, build-time (not > just compile-time) starts getting bogged down on > (mostly) reopening and repreprocessing the same files > over and over. > > Within our system, builds on Windows are magnitudes > faster since we're able to take advantage of > precompiled headers. Are you sure that this isn't because GCC is having to parse the headers over and over again, while the precompiled system can avoid that overhead? Especially for C++ header files (which tend to be large, complex, interdependent, and include a lot of code), the parsing and compilation time *vastly* dominates the preprocessing time. Example, with GCC-3.1, with a `hello world' iostreams-using program... The code: #include <iostream> int main (void) { std::cout << "Hello world"; return 0; } Time spent preprocessing (distorted by the slowness of cpp's output routines): nix@loki 62 /tmp% time c++ -E -ftime-report hello.C >/dev/null real 0m1.424s user 0m0.710s sys 0m0.100s Time spent preprocessing and parsing (roughly; cpp's output routines are still slow; on the trunk much less time will be spent preprocessing because the integrated preprocessor doesn't have to do any output at all there, instead feeding a token stream to the rest of the compiler): nix@loki 60 /tmp% c++ -ftime-report -fsyntax-only hello.C Execution times (seconds) garbage collection : 1.16 (12%) usr 0.08 ( 6%) sys 2.19 (13%) wall preprocessing : 1.04 (11%) usr 0.29 (20%) sys 2.10 (12%) wall lexical analysis : 0.99 (10%) usr 0.28 (20%) sys 1.87 (11%) wall parser : 6.12 (65%) usr 0.75 (53%) sys 10.85 (63%) wall varconst : 0.08 ( 1%) usr 0.00 ( 0%) sys 0.10 ( 1%) wall TOTAL : 9.44 1.42 17.21 (oddly, preprocessing took *longer* than it did using -E, which I'd not expected; but, still parsing vastly dominates preprocessing, and this isn't going near e.g. 
the STL headers) Complete run, with optimization: nix@loki 66 /tmp% c++ -O2 -ftime-report -o hello hello.C Execution times (seconds) garbage collection : 1.10 (11%) usr 0.11 ( 9%) sys 1.74 (11%) wall cfg cleanup : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall life analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall preprocessing : 1.12 (11%) usr 0.22 (18%) sys 2.04 (13%) wall lexical analysis : 0.98 (10%) usr 0.22 (18%) sys 1.93 (12%) wall parser : 6.46 (65%) usr 0.63 (53%) sys 9.98 (62%) wall expand : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall varconst : 0.08 ( 1%) usr 0.00 ( 0%) sys 0.12 ( 1%) wall CSE : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall CSE 2 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall regmove : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall global alloc : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall flow 2 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall rename registers : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall scheduling 2 : 0.00 ( 0%) usr 0.01 ( 1%) sys 0.02 ( 0%) wall final : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall TOTAL : 9.96 1.20 16.16 Now obviously with a less toy example the time consumed optimizing would rise; but that doesn't affect my point, that the lion's share of time spent in C++ header files is parsing time, and that speeding up the preprocessor will have limited effect now (thanks to Zack and Neil speeding it up so much already :) ). -- `There's something satisfying about killing JWZ over and over again.' -- 1i, personal communication ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 18:00 ` Nix @ 2002-08-10 20:36 ` Noel Yap 2002-08-11 4:30 ` Nix 2002-08-12 15:08 ` Mike Stump 1 sibling, 1 reply; 173+ messages in thread From: Noel Yap @ 2002-08-10 20:36 UTC (permalink / raw) To: Nix; +Cc: Neil Booth, gcc --- Nix <nix@esperi.demon.co.uk> wrote: > [Cc: list trimmed] > On Sat, 10 Aug 2002, Noel Yap spake: > > I would agree if you're talking about complete > builds > > spanning only a few C/C++ files. OTOH, when > builds > > span many hundreds of these files, build-time (not > > just compile-time) starts getting bogged down on > > (mostly) reopening and repreprocessing the same > files > > over and over. > > > > Within our system, builds on Windows are > magnitudes > > faster since we're able to take advantage of > > precompiled headers. > > Are you sure that this isn't because GCC is having > to parse the headers > over and over again, while the precompiled system > can avoid that > overhead? No, I'm not sure. In any case, whether it's due to elimination of reparsing or elimination of reopening, would you agree that precompiled headers should speed up builds? > Especially for C++ header files (which tend to be > large, complex, > interdependent, and include a lot of code), the > parsing and compilation > time *vastly* dominates the preprocessing time. What about for us lowly C programmers? > Example, with GCC-3.1, with a `hello world' > iostreams-using program... > > The code: > > #include <iostream> > > int main (void) > { > std::cout << "Hello world"; > return 0; > } > > Time spent preprocessing (distorted by the slowness > of cpp's output > routines): > > nix@loki 62 /tmp% time c++ -E -ftime-report hello.C > >/dev/null > > real 0m1.424s > user 0m0.710s > sys 0m0.100s > > Time spent preprocessing and parsing (roughly; cpp's > output routines are > still slow; on the trunk much less time will be > spent preprocessing > because the integrated preprocessor doesn't have to > do any output at all > there, instead feeding a token stream to the rest of > the compiler): > > nix@loki 60 /tmp% c++ -ftime-report -fsyntax-only > hello.C > > Execution times (seconds) > garbage collection : 1.16 (12%) usr 0.08 ( > 6%) sys 2.19 (13%) wall > preprocessing : 1.04 (11%) usr 0.29 > (20%) sys 2.10 (12%) wall > lexical analysis : 0.99 (10%) usr 0.28 > (20%) sys 1.87 (11%) wall > parser : 6.12 (65%) usr 0.75 > (53%) sys 10.85 (63%) wall > varconst : 0.08 ( 1%) usr 0.00 ( > 0%) sys 0.10 ( 1%) wall > TOTAL : 9.44 1.42 > 17.21 > > (oddly, preprocessing took *longer* than it did > using -E, which I'd not > expected; but, still parsing vastly dominates > preprocessing, and this isn't > going near e.g. the STL headers) OK. Now let's say that that preprocessing can be used across several compiles. Can you see how an entire _build_ (eg comprising of many compiles) can be sped up? 
> Complete run, with optimization: > > nix@loki 66 /tmp% c++ -O2 -ftime-report -o hello > hello.C > > Execution times (seconds) > garbage collection : 1.10 (11%) usr 0.11 ( > 9%) sys 1.74 (11%) wall > cfg cleanup : 0.01 ( 0%) usr 0.00 ( > 0%) sys 0.01 ( 0%) wall > life analysis : 0.02 ( 0%) usr 0.00 ( > 0%) sys 0.02 ( 0%) wall > preprocessing : 1.12 (11%) usr 0.22 > (18%) sys 2.04 (13%) wall > lexical analysis : 0.98 (10%) usr 0.22 > (18%) sys 1.93 (12%) wall > parser : 6.46 (65%) usr 0.63 > (53%) sys 9.98 (62%) wall > expand : 0.00 ( 0%) usr 0.00 ( > 0%) sys 0.01 ( 0%) wall > varconst : 0.08 ( 1%) usr 0.00 ( > 0%) sys 0.12 ( 1%) wall > CSE : 0.02 ( 0%) usr 0.00 ( > 0%) sys 0.03 ( 0%) wall > CSE 2 : 0.01 ( 0%) usr 0.00 ( > 0%) sys 0.03 ( 0%) wall > regmove : 0.01 ( 0%) usr 0.00 ( > 0%) sys 0.02 ( 0%) wall > global alloc : 0.02 ( 0%) usr 0.00 ( > 0%) sys 0.04 ( 0%) wall > flow 2 : 0.01 ( 0%) usr 0.00 ( > 0%) sys 0.01 ( 0%) wall > rename registers : 0.02 ( 0%) usr 0.00 ( > 0%) sys 0.03 ( 0%) wall > scheduling 2 : 0.00 ( 0%) usr 0.01 ( > 1%) sys 0.02 ( 0%) wall > final : 0.01 ( 0%) usr 0.00 ( > 0%) sys 0.01 ( 0%) wall > TOTAL : 9.96 1.20 > 16.16 > > Now obviously with a less toy example the time > consumed optimizing would > rise; but that doesn't affect my point, that the > lion's share of time > spent in C++ header files is parsing time, and that > speeding up the > preprocessor will have limited effect now (thanks to > Zack and Neil > speeding it up so much already :) ). What kind of effect does it have for C? Do you think saving preprocessor output (of header files) can speed up a build consisting of many, many compiles? Thanks, Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 20:36 ` Noel Yap @ 2002-08-11 4:30 ` Nix 0 siblings, 0 replies; 173+ messages in thread From: Nix @ 2002-08-11 4:30 UTC (permalink / raw) To: Noel Yap; +Cc: Neil Booth, gcc [rewrapped my quoted text] On Sat, 10 Aug 2002, Noel Yap stated: > --- Nix <nix@esperi.demon.co.uk> wrote: >> Are you sure that this isn't because GCC is having to parse the >> headers over and over again, while the precompiled system can avoid >> that overhead? > > No, I'm not sure. In any case, whether it's due to > elimination of reparsing or elimination of reopening, > would you agree that precompiled headers should speed > up builds? Yes, but mainly (IMHO) because the `precompilation' process includes some parsing work. The preprocessing job (compilation phases 1--4) should be quite fast. So speeding up *parsing* is the point here; getting rid of bison should help fix that :) (Maybe I'm being too pedantic here.) >> Especially for C++ header files (which tend to be large, complex, >> interdependent, and include a lot of code), the parsing and >> compilation time *vastly* dominates the preprocessing time. > > What about for us lowly C programmers? (oops, sorry, I thought you were using C++, because C++ users really *notice* time spent in headers.) The disparity there isn't anywhere near so extreme, but it's still there (just). I know that even with large bodies of C code I've never been able to spot preprocessing time; even the old cccp was damned-near instantaneous (well, except on very memory-constrained boxes where even ls(1) was a hassle). [snip] >> Now obviously with a less toy example the time consumed optimizing >> would rise; but that doesn't affect my point, that the lion's share >> of time spent in C++ header files is parsing time, and that speeding >> up the preprocessor will have limited effect now (thanks to Zack and >> Neil speeding it up so much already :) ). > > What kind of effect does it have for C? Do you think Hm... ... from my quick check (so primitive that I'm not even going to post it here) preprocessing and parsing seem to consume roughly equal amounts of time, and both are far exceeded by the amount of time taken to compile the code itself. So there's not much need for preprocessor optimization in C as far as I can tell. > saving preprocessor output (of header files) can speed > up a build consisting of many, many compiles? Preprocessor *output*? In its current state, the output phase is the slowest part of the preprocessor, such that feeding token streams straight into the compiler (as 3.3-to-be will) is faster than saving it out to disk would be :) And for C code in particular I imagine that the larger size of the precompiled header lumps would cause extra disk I/O time that would exceed the time taken to parse the headers in the first place... but this is a guess: some of the people who've actually been working on precompiled headers can probably answer this better :) -- `There's something satisfying about killing JWZ over and over again.' -- 1i, personal communication ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 18:00 ` Nix 2002-08-10 20:36 ` Noel Yap @ 2002-08-12 15:08 ` Mike Stump 1 sibling, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 15:08 UTC (permalink / raw) To: Nix; +Cc: Noel Yap, Neil Booth, gcc On Saturday, August 10, 2002, at 05:49 PM, Nix wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:13 ` Stan Shebs 2002-08-09 15:18 ` Neil Booth @ 2002-08-09 15:19 ` Ziemowit Laski 2002-08-09 15:25 ` Neil Booth 2002-08-10 16:16 ` Noel Yap 2002-08-10 16:07 ` Noel Yap 2 siblings, 2 replies; 173+ messages in thread From: Ziemowit Laski @ 2002-08-09 15:19 UTC (permalink / raw) To: Stan Shebs; +Cc: Ziemowit Laski, Noel Yap, Mike Stump, gcc On Friday, August 9, 2002, at 03:12 , Stan Shebs wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:19 ` Ziemowit Laski @ 2002-08-09 15:25 ` Neil Booth 2002-08-10 16:16 ` Noel Yap 1 sibling, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-09 15:25 UTC (permalink / raw) To: Ziemowit Laski; +Cc: Stan Shebs, Noel Yap, Mike Stump, gcc Ziemowit Laski wrote:- > >Is this assertion based on empirical measurement, and if so, for what > >source code and what system? For instance, the longest source file > >in GCC is about 15K lines, and at -O2, only a small percentage of > >time is spent messing with files. If I use -save-temps on cp/decl.c on > >one of my (Linux) machines, I get a total time of about 38 sec from > >source to asm. If I just compile decl.i, it's about 37 sec, so that's > >1 sec for *all* preprocessing, including all file opening/closing. > > Since the preprocessor is integrated, I don't think you can separate > the timings in this way. :( A 'gcc3 -E cp/decl.c -o decl.i' would > probably be more meaningful. It is separated with the timing stuff. Your test is not good: it tests time to output. It is well-known that current CPP output is quite slow; on Linux this is largely a Glibc problem. CPP output can be 50% of preprocessing time, which when you think about it is quite illogical. However, it can be made much faster, and I will do this eventually. Since we use an integrated CPP, timing output is kind of irrelevant (and vastly overstates CPP time). Current CPP provides tokens to the parser far, far faster than cccp did via a temporary file and a duplicated lexer in the front end (not to mention other advantages, like precise token location information). Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:19 ` Ziemowit Laski 2002-08-09 15:25 ` Neil Booth @ 2002-08-10 16:16 ` Noel Yap 1 sibling, 0 replies; 173+ messages in thread From: Noel Yap @ 2002-08-10 16:16 UTC (permalink / raw) To: Ziemowit Laski, Stan Shebs; +Cc: Ziemowit Laski, Noel Yap, Mike Stump, gcc --- Ziemowit Laski <zlaski@apple.com> wrote: > > On Friday, August 9, 2002, at 03:12 , Stan Shebs > wrote: > > > Noel Yap wrote: > > > >> Build speeds are most helped by minimizing the > number > >> of files opened and closed during the build. > >> > > Is this assertion based on empirical measurement, > and if so, for what > > source code and what system? For instance, the > longest source file > > in GCC is about 15K lines, and at -O2, only a > small percentage of > > time is spent messing with files. If I use > -save-temps on cp/decl.c on > > one of my (Linux) machines, I get a total time of > about 38 sec from > > source to asm. If I just compile decl.i, it's > about 37 sec, so that's > > 1 sec for *all* preprocessing, including all file > opening/closing. > > Since the preprocessor is integrated, I don't think > you can separate > the timings in this way. :( A 'gcc3 -E cp/decl.c -o > decl.i' would > probably be more meaningful. This is a good point. I think an even better study would be to replicate John Lakos's study within one's own project. I'd be very interested to find out how many projects (other than the ones I've seen) fit Lakos's "largeness" and would, therefore, be able to take advantage of preprocessed headers. Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:13 ` Stan Shebs 2002-08-09 15:18 ` Neil Booth 2002-08-09 15:19 ` Ziemowit Laski @ 2002-08-10 16:07 ` Noel Yap 2002-08-10 16:18 ` Neil Booth 2 siblings, 1 reply; 173+ messages in thread From: Noel Yap @ 2002-08-10 16:07 UTC (permalink / raw) To: Stan Shebs; +Cc: Mike Stump, gcc --- Stan Shebs <shebs@apple.com> wrote: > Noel Yap wrote: > > >Build speeds are most helped by minimizing the > number > >of files opened and closed during the build. > > > Is this assertion based on empirical measurement, > and if so, for what > source code and what system? For instance, the > longest source file > in GCC is about 15K lines, and at -O2, only a small > percentage of > time is spent messing with files. If I use > -save-temps on cp/decl.c on > one of my (Linux) machines, I get a total time of > about 38 sec from > source to asm. If I just compile decl.i, it's about > 37 sec, so that's > 1 sec for *all* preprocessing, including all file > opening/closing. This is a good question. John Lakos in _Large-Scale C++ Software Design_ has performed a rudimentary case study. If the conclusions are true, then your example indicates that there wasn't much of a difference between the number of files used when compiling decl.c and decl.i. The study also indicates that having #include's within header files is the largest contributor to the problem (since nested #include's would increase the number of file accesses combinatorially). As another indication that the conclusion is true, Lakos added guards around the #include lines themselves and found compile times to dramatically decrease. For example:

    #ifndef header_h
    # include <header.h>
    #endif

I can go on, but I doubt others on this list would appreciate a reprint of the chapter. If you don't have the book, I suggest at least finding a copy and reading this chapter. > Obviously, other programs will have different > characteristics, and if > you have one for which file opening/closing > dominates compile time, > that will be very interesting. But it's bad to try > to optimize > something before you have numerical evidence. I agree. Would you agree with Lakos's findings as evidence of this? Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 16:07 ` Noel Yap @ 2002-08-10 16:18 ` Neil Booth 2002-08-10 20:27 ` Noel Yap 0 siblings, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-10 16:18 UTC (permalink / raw) To: Noel Yap; +Cc: Stan Shebs, Mike Stump, gcc Noel Yap wrote:- > The study also indicates that having #include's within > header files is the largest contributor to the problem > (since nested #include's would increase the number of > file accesses combinatorially). See below for why this isn't true for most compilers now. > As another indication that the conclusion is true, > Lakos added guards around the #include lines > themselves and found compile times to dramatically > decrease. For example: > #if header_h > # include <header.h> > #endif This isn't the case with GCC. I hope you're aware of that. The first time GCC reads <header.h> it remembers if it had header guards. If it's ever asked to #include it again, it checks if the guard is defined, and doesn't do anything. The file's contents are also not cached if it has header guards, on the assumption that the contents are unlikely to be of interest in the future. In other words, this kind of #include protection is ugly and pointless (and possibly error-prone, though that would tend to be immediately obvious). Most compilers now implement this optimization, but 5 or 6 years ago this wasn't the case. I think GCC was one of the first. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
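In other words, an ordinary internal guard is all GCC needs in order to skip the reopen and rescan. A sketch of the shape it keys on (header name invented):

    /* widget.h -- a conventionally guarded header.  On the first #include,
       GCC notices the whole file is wrapped in this #ifndef; on any later
       #include it merely checks whether WIDGET_H is defined and skips the
       file without reopening or rescanning it.  */
    #ifndef WIDGET_H
    #define WIDGET_H

    struct widget
    {
      int id;
    };

    void widget_init (struct widget *);

    #endif /* WIDGET_H */

With a compiler that does this, the Lakos-style external guard wrapped around every #include in every client buys nothing; the internal guard alone already prevents the repeated file access.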
* Re: Faster compilation speed 2002-08-10 16:18 ` Neil Booth @ 2002-08-10 20:27 ` Noel Yap 2002-08-11 0:11 ` Neil Booth 0 siblings, 1 reply; 173+ messages in thread From: Noel Yap @ 2002-08-10 20:27 UTC (permalink / raw) To: Neil Booth; +Cc: Stan Shebs, Mike Stump, gcc --- Neil Booth <neil@daikokuya.co.uk> wrote: > Noel Yap wrote:- > > > The study also indicates that having #include's > within > > header files is the largest contributor to the > problem > > (since nested #include's would increase the number > of > > file accesses combinatorially). > > See below for why this isn't true for most compilers > now. > > > As another indication that the conclusion is true, > > Lakos added guards around the #include lines > > themselves and found compile times to dramatically > > decrease. For example: > > #if header_h > > # include <header.h> > > #endif > > This isn't the case with GCC. I hope you're aware > of that. > The first time GCC reads <header.h> it remembers if > it had > header guards. If it's ever asked to #include it > again, > it checks if the guard is defined, and doesn't do > anything. > The file's contents are also not cached if it has > header > guards, on the assumption that the contents are > unlikely to > be of interest in the future. > > In other words, this kind of #include protection is > ugly and > pointless (and possibly error-prone, though that > would tend > to be immediately obvious). Most compilers now > implement > this optimization, but 5 or 6 years ago this wasn't > the case. > I think GCC was one of the first. I stand corrected. (I'm assuming gcc doesn't do this in cases where the header guard might have side effects or if there's a matching #else for the #ifndef). Do you think precompiled headers would help build speed across several compiles since it would be another source to eliminate repeated file opens? Thanks, Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 20:27 ` Noel Yap @ 2002-08-11 0:11 ` Neil Booth 2002-08-12 12:04 ` Devang Patel 0 siblings, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-11 0:11 UTC (permalink / raw) To: Noel Yap; +Cc: Stan Shebs, Mike Stump, gcc Noel Yap wrote:- > I stand corrected. (I'm assuming gcc doesn't do this > in cases where the header guard might have side > effects or if there's a matching #else for the > #ifndef). Correct. Header guards with side effects hardly exist I think. We recognize #ifndef and #if !defined with optional parentheses. Comments and whitespace do not affect the optimization. Headers with #else, #elif at the top level, and with anything outside the guards, or with a header guard that comes from a macro expansion are not optimized this way. > Do you think precompiled headers would help build > speed across several compiles since it would be > another source to eliminate repeated file opens? I don't think repeated file opens are high on the list of time eaters, particularly because of the optimization I mentioned. Tokenization and parsing probably take much longer. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
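A sketch of the shapes Neil describes, where each chunk stands for a separate header file; the first two qualify for the optimization, the last two do not (illustrative only, the precise rules live in cpplib):

    /* foo.h -- recognized: a plain #ifndef wrapper around the whole file.  */
    #ifndef FOO_H
    #define FOO_H
    /* ... declarations ... */
    #endif

    /* bar.h -- recognized: #if !defined; parentheses, comments and
       whitespace do not matter.  */
    #if !defined (BAR_H)    /* include guard */
    #define BAR_H
    /* ... declarations ... */
    #endif

    /* baz.h -- NOT recognized: #else or #elif at the top level defeats
       the optimization.  */
    #ifndef BAZ_H
    #define BAZ_H
    /* ... declarations ... */
    #else
    #error "baz.h seen twice"
    #endif

    /* qux.h -- NOT recognized: tokens outside the guarded region.  */
    extern int qux_version;
    #ifndef QUX_H
    #define QUX_H
    /* ... declarations ... */
    #endif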
* Re: Faster compilation speed 2002-08-11 0:11 ` Neil Booth @ 2002-08-12 12:04 ` Devang Patel 0 siblings, 0 replies; 173+ messages in thread From: Devang Patel @ 2002-08-12 12:04 UTC (permalink / raw) To: Noel Yap; +Cc: Neil Booth, Stan Shebs, Mike Stump, gcc On Sunday, August 11, 2002, at 12:08 AM, Neil Booth wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 13:04 ` Noel Yap ` (2 preceding siblings ...) 2002-08-09 15:13 ` Stan Shebs @ 2002-08-09 18:57 ` Linus Torvalds 2002-08-09 19:12 ` Phil Edwards ` (2 more replies) 3 siblings, 3 replies; 173+ messages in thread From: Linus Torvalds @ 2002-08-09 18:57 UTC (permalink / raw) To: yap_noel, gcc In article < 20020809200413.46719.qmail@web21403.mail.yahoo.com > you write: >Build speeds are most helped by minimizing the number >of files opened and closed during the build. I _seriously_ doubt that. Opening (and even reading) a cached file is not an expensive operation, not compared to the kinds of run-times gcc has. We're talking a few microseconds per file open at a low level. Even parsing it should not be that expensive, especially if the preprocessor is any good (and from all I've seen, these days it _is_ good). I strongly suspect that what makes gcc slow is that it has absolutely horrible cache behaviour, a big VM footprint, and chases pointers in that badly cached area all of the time. And that, in turn, is probably impossible to fix as long as gcc uses garbage collection for most of its internal memory management. There just aren't all that many worse ways to f*ck up your cache behaviour than by using lots of allocations and lazy GC to manage your memory. The problem with bad cache behaviour is that you don't get nice spikes in specific places that you can try to optimize - the cost ends up being spread all over the places that touch the data structures. The problem with trying to avoid GC is that if you do that you have to be careful about your reference counts, and I doubt the gcc people want to be that careful, especially considering that the code-base right now is not likely to be very easy to convert. (Plus the fact that GC proponents absolutely refuse to see the error of their ways, and will flame me royally for even _daring_ to say that GC sucks donkey brains through a straw from a performance standpoint. If order to work with refcounting, you need to have the mentality that every single data structure with a non-local lifetime needs to have the count as it's major member) Linus ^ permalink raw reply [flat|nested] 173+ messages in thread
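The discipline Linus is describing, reduced to its bare bones (a hedged sketch with invented types, nobody's real code): the count is a first-class member of every shared object, ownership transfers are explicit get/put pairs, and memory is released the instant the count reaches zero, while it is still likely to be cache-hot.

    #include <stdlib.h>

    typedef struct node
    {
      int count;                    /* the reference count is the major member */
      struct node *left, *right;
      int value;
    } node_t;

    static node_t *
    get_node (node_t *n)            /* take an extra reference */
    {
      if (n)
        n->count++;
      return n;
    }

    static void
    put_node (node_t *n)            /* drop a reference */
    {
      if (n && --n->count == 0)
        {
          put_node (n->left);
          put_node (n->right);
          free (n);                 /* freed immediately, no lazy sweep */
        }
    }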
* Re: Faster compilation speed 2002-08-09 18:57 ` Linus Torvalds @ 2002-08-09 19:12 ` Phil Edwards 2002-08-09 19:34 ` Kevin Atkinson 2002-08-10 19:20 ` Noel Yap 2 siblings, 0 replies; 173+ messages in thread From: Phil Edwards @ 2002-08-09 19:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: yap_noel, gcc On Fri, Aug 09, 2002 at 06:56:58PM -0700, Linus Torvalds wrote: > In article < 20020809200413.46719.qmail@web21403.mail.yahoo.com > you write: > >Build speeds are most helped by minimizing the number > >of files opened and closed during the build. > > I _seriously_ doubt that. To be fair, when listing "things we can do to speed up the build," most people don't include tinkering with the guts of the compiler. Statements like that of the original poster are correct when the compiler cannot be touched, and in fact many textbooks say exactly that: minimize the number of files opened (or more generally, system calls) to speed the build. (The lesson is typically something about multiple include guard macros or proper makefile dependencies.) So let's not be too harsh. When we're allowed to hack on the compiler source itself, of course, those statements go right out the window. :-) Phil -- I would therefore like to posit that computing's central challenge, viz. "How not to make a mess of it," has /not/ been met. - Edsger Dijkstra, 1930-2002 ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 18:57 ` Linus Torvalds 2002-08-09 19:12 ` Phil Edwards @ 2002-08-09 19:34 ` Kevin Atkinson 2002-08-09 20:28 ` Linus Torvalds 2002-08-10 19:20 ` Noel Yap 2 siblings, 1 reply; 173+ messages in thread From: Kevin Atkinson @ 2002-08-09 19:34 UTC (permalink / raw) To: gcc On Fri, 9 Aug 2002, Linus Torvalds wrote: > And that, in turn, is probably impossible to fix as long as gcc uses > garbage collection for most of its internal memory management. There > just aren't all that many worse ways to f*ck up your cache behaviour > than by using lots of allocations and lazy GC to manage your memory. Excuse the interruption, but from what I read a good generational garbage collector can be just as fast as manually managing memory? Is this not the case? If so could someone point me to some information regarding why? I am not trying to argue with anyone as I really don't know that much about GC except from what I read in a few papers. Sorry, I was reading this thread and that point struck me by surprise. --- http://kevin.atkinson.dhs.org ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 19:34 ` Kevin Atkinson @ 2002-08-09 20:28 ` Linus Torvalds 2002-08-09 21:12 ` Daniel Berlin 2002-08-10 6:32 ` Robert Lipe 0 siblings, 2 replies; 173+ messages in thread From: Linus Torvalds @ 2002-08-09 20:28 UTC (permalink / raw) To: kevin, gcc In article < Pine.LNX.4.44.0208092227500.2273-100000@kevin-pc.atkinson.dhs.org > you write: >On Fri, 9 Aug 2002, Linus Torvalds wrote: > >> And that, in turn, is probably impossible to fix as long as gcc uses >> garbage collection for most of its internal memory management. There >> just aren't all that many worse ways to f*ck up your cache behaviour >> than by using lots of allocations and lazy GC to manage your memory. > >Excuse the interruption, but from what I read a good generational garbage >collector can be just as fast as manually managing memory? All the papers I've seen on it are total jokes. But maybe I've looked at the wrong ones. One fundamental fact on modern hardware is that data cache locality is good, and not being in the cache sucks. This is not likely to change. In particular, this means that if you allocate stuff, you want to re-use the stuff you just freed _as_soon_as_possible_ - preferably before the previously dirty data has ever even been evicted from the cache, so that you can re-use the thing to avoid reading it in, but also to avoid writing out stale data. This implies that any lazy de-allocation is bad. When a piece of memory is free, you want to de-allocate it _immediately_, so that the next allocation gets to re-use it and gets the cache footprint "for free". Generational garabage collectors tend to never re-use hot objects, and often do the copying between generations making things even worse on the cache. Compaction helps subsequent use somewhat, but is in itself inherently costly, and the indirection (or fixup) implied by it can limit other optimization. Sure, by being lazy you can sometimes win in icache footprint (and in instruction count - a lot of the "GC is fast" papers seem to rely on the fact that you can do other optimizations if you're lazy), but you lose big in dirty dcache footprint. And since dcache is much more expensive than instructions, you're better off doing explicit memory management with refcounting (optionally helped by the programming language, of course. You can make exact refcounting be your "GC" with some language support). However, there's another, more fundamental issue. It's the _mindset_. The GC mindset tends to go hand-in-hand with pointer chasing, while people who use explicit allocators tend to be happier with doing things like "realloc()" and trying to use arrays and indexes instead of linked lists and just generally trying to avoid allocating lots of small things. Which tends to be better on the cache. Yes, I generalize. Don't we all? For example, if you have an _explicit_ refcounting system, then it is quite natural to have operations like "copy-on-write", where if you decide to change a tree node you do something like copy_on_write(node_t **np) { note_t *node = *np; if (node->count > 1) newnode = copy_alloc(node); *np = newnode; node->count--; node = newnode; } return node; } and then before you change a tree node you do node = copy_on_write(&tree->node); .. we now know we are the exclusive owners of "node" .. which tends to be very efficient - it allows sharing, even if sharing is often not the common case (and doesn't do any extra allocations for the common case of an access that was already exclusively owned). 
(If you want to be thread-safe you need to be more careful yet, and have thread-safe "get_node()/put_node()" actions etc. Most applications don't need to be that careful, but you'll see a _lot_ of this inside an operating system). In contrast, in a GC system where you do _not_ have access to the explicit refcounting, you tend to always copy the node, just because you don't know if the original node might be shared through another tree or not. Even if sharing ends up not being the most common case. So you do a lot of extra work, and you end up with even more cache pressure. Are the GC systems that do refcounting internally _and_ expose the information upwards to the user? I bet there are. But the fact is, the rest of them (99.9%) give those few well-done GC's a bad name. "So what about circular data structures? Refcounting doesn't work for them". Right. Don't do them. Or handle them very very carefully (ie there can be a "head" that gets special handling and keeps the others alive). Compilers certainly almost always end up working with DAG's, not cyclic structures. Make it a rule. Does it take more effort? Yes. The advantage of GC is that it is automatic. But CG apologists should just admit that it causes bad problems and often _encourages_ people to write code that performs badly. I really think it's the mindset that is the biggest problem. A GC system with explicitly visible reference counts (and immediate freeing) with language support to make it easier to get the refcounts right (things like automatically incrementing the refcounts when passing the object off to others) wouldn't necessarily be painful to use, and would clearly offer all the advantages of just doing it all by hand. That's not the world we live in, though. Linus ^ permalink raw reply [flat|nested] 173+ messages in thread
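Filling in the copy-on-write sketch above just enough to compile (field and helper names are invented, and allocation failure is ignored):

    #include <stdlib.h>
    #include <string.h>

    typedef struct node
    {
      int count;                    /* reference count */
      int payload;
    } node_t;

    static node_t *
    copy_alloc (const node_t *old)
    {
      node_t *n = malloc (sizeof *n);
      memcpy (n, old, sizeof *n);
      n->count = 1;                 /* the copy starts out exclusively owned */
      return n;
    }

    /* Make *np safe to modify in place: if it is shared, swap in a private
       copy and drop our reference to the shared original.  */
    static node_t *
    copy_on_write (node_t **np)
    {
      node_t *node = *np;
      if (node->count > 1)
        {
          node_t *newnode = copy_alloc (node);
          node->count--;
          *np = newnode;
          node = newnode;
        }
      return node;
    }

A caller writes node = copy_on_write (&tree->node); and can then modify node knowing it is exclusively owned: sharing is preserved where possible and copied only when actually needed, which is the contrast Linus draws with blindly copying under a collector that exposes no counts.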
* Re: Faster compilation speed 2002-08-09 20:28 ` Linus Torvalds @ 2002-08-09 21:12 ` Daniel Berlin 2002-08-09 21:52 ` Linus Torvalds 2002-08-10 6:32 ` Robert Lipe 1 sibling, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-09 21:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: kevin, gcc > > "So what about circular data structures? Refcounting doesn't work for > them". Right. Don't do them. Or handle them very very carefully (ie > there can be a "head" that gets special handling and keeps the others > alive). Compilers certainly almost always end up working with DAG's, not > cyclic structures. Make it a rule. Sorry, there are cases that make this impossible to do (IOW we can't make it a rule). But another option is to do what Python does. Have a reference cycle GC that just handles breaking cycles. Run it explicitly at times, or much like we do ggc_collect now. Reference cycles can only possibly occur in container objects, so you only have to deal with the overhead of cycle-breaking there. --Dan ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 21:12 ` Daniel Berlin @ 2002-08-09 21:52 ` Linus Torvalds 0 siblings, 0 replies; 173+ messages in thread From: Linus Torvalds @ 2002-08-09 21:52 UTC (permalink / raw) To: Daniel Berlin; +Cc: kevin, gcc On Sat, 10 Aug 2002, Daniel Berlin wrote: > > > > "So what about circular data structures? Refcounting doesn't work for > > them". Right. Don't do them. Or handle them very very carefully (ie > > there can be a "head" that gets special handling and keeps the others > > alive). Compilers certainly almost always end up working with DAG's, not > > cyclic structures. Make it a rule. > > Sorry, there are cases that make this impossible to do (IOW we can't make > it a rule). Hmm. I can't imagine what is there that is inherently cyclic, but breaking the cycles might be more painful than it's worth, so I'll take your word for it. Things like data structure definitions (which clearly can be cyclic thanks to pointers to themselves) can often be resolved trivially with nesting rules (ie if you can show that the lifetime of type A is a superset of the lifetime of B, then you don't actually need to refcount a backpointer from B to A). For the obvious example that I can think of (ie just a structure definition containing a pointer to itself - possibly indirectly via other structures), that type lifetime nesting is inherent in the C type scopes, for example. For type X to have been able to contain a pointer to type Y, Y must have had a larger scope than X, so the pointer from one type structure to another never needs refcounting in a C compiler. (This, btw, is why I don't believe in automated GC systems - even if they use refcounting internally. It's simply fairly hard to tell a GC system simple rules like when you need to ref-count, and when you don't. If you just always ref-count on assignment, you _will_ get the obvious circular references, simply because you miss the high-level picture). But other cases might certainly be much more painful, so I certainly agree with you: > But another option is to do what Python does. > Have a reference cycle GC that just handles breaking cycles. > Run it explicitly at times, or much like we do ggc_collect now. > Reference cycles can only possibly occur in container objects, so you > only have to deal with the overhead of cycle-breaking there. Nothing says you can't mix the two approaches, no. If the subset of allocations you need to worry about from a GC standpoint is relatively small, the cache efficiency advantages of refcounting clearly don't matter, and the disadvantages can be disproportional. Linus ^ permalink raw reply [flat|nested] 173+ messages in thread
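The "nesting rule" Linus describes is what is usually called a weak back-pointer: as long as the child can never outlive its parent, the child's pointer back to the parent is simply not counted, and the apparent cycle never exists as far as the reference counts are concerned. A hedged sketch with invented types:

    typedef struct scope scope_t;

    typedef struct type_info
    {
      int count;                    /* counted references to this type */
      scope_t *scope;               /* uncounted back-pointer: the scope is
                                       guaranteed to outlive every type
                                       declared inside it */
      struct type_info *element;    /* counted, e.g. the pointed-to type */
    } type_info_t;

    struct scope
    {
      int count;
      struct type_info **types;     /* counted references to the owned types */
      int n_types;
    };

Tearing down a scope drops the counted references to its types; the uncounted back-pointers never keep anything alive, so no cycle collector is needed for this shape of data.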
* Re: Faster compilation speed 2002-08-09 20:28 ` Linus Torvalds 2002-08-09 21:12 ` Daniel Berlin @ 2002-08-10 6:32 ` Robert Lipe 2002-08-10 14:26 ` Cyrille Chepelov 1 sibling, 1 reply; 173+ messages in thread From: Robert Lipe @ 2002-08-10 6:32 UTC (permalink / raw) To: gcc Linus Torvalds wrote: > One fundamental fact on modern hardware is that data cache locality is > good, and not being in the cache sucks. This is not likely to change. This is a fact. Measuring this sort of thing is possible. (Optimizing without measuring is seldom a good idea.) In the absence of processor pods and bus analyzers, has anyone thrown gcc at a tool like 'valgrind' or cachegrind? http://developer.kde.org/~sewardj/ RJL ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 6:32 ` Robert Lipe @ 2002-08-10 14:26 ` Cyrille Chepelov 2002-08-10 17:33 ` Daniel Berlin 2002-08-11 1:03 ` Florian Weimer 0 siblings, 2 replies; 173+ messages in thread From: Cyrille Chepelov @ 2002-08-10 14:26 UTC (permalink / raw) To: gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 4064 bytes --] Le Sat, Aug 10, 2002, à 08:32:26AM -0500, Robert Lipe a écrit: > Linus Torvalds wrote: > > > One fundamental fact on modern hardware is that data cache locality is > > good, and not being in the cache sucks. This is not likely to change. > > This is a fact. > Measuring this sort of thing is possible. (Optimizing without > measuring is seldom a good idea.) In the absence of processor pods > and bus analyzers, has anyone thrown gcc at a tool like 'valgrind' or > cachegrind? I just did (I was forming the idea while reading the thread, but you beat me in suggesting it before I implemented it). I have tried on a grand total of three files, two from today's mainline CVS (updated from anonymous about four hours ago), and one from Linux 2.5.30; as my machine is not exactly the dual-multi-gigahertz, "HT"-interconnected (HyperTransport ?) with gobs of memory bandwith (and what else? 64 bits?) monsters Linus has been bragging about recently, please bear with lack of patience to run CG over the whole aforementioned packages... Some detailed results here: http://www.chepelov.org/cyrille/gcc-valgrind Excerpt: java/parse.c ==17875== I refs: 275,598,220 ==17875== I1 misses: 43,600 ==17875== L2i misses: 41,948 ==17875== I1 miss rate: 0.1% ==17875== L2i miss rate: 0.1% ==17875== ==17875== D refs: 145,894,312 (94,095,162 rd + 51,799,150 wr) ==17875== D1 misses: 322,121 ( 259,431 rd + 62,690 wr) ==17875== L2d misses: 313,318 ( 251,817 rd + 61,501 wr) ==17875== D1 miss rate: 0.2% ( 0.2% + 0.1% ) ==17875== L2d miss rate: 0.2% ( 0.2% + 0.1% ) ==17875== ==17875== L2 refs: 365,721 ( 303,031 rd + 62,690 wr) ==17875== L2 misses: 355,266 ( 293,765 rd + 61,501 wr) ==17875== L2 miss rate: 0.0% ( 0.0% + 0.1% ) emit-rtl.c: ==17968== I refs: 2,315,492,628 ==17968== I1 misses: 5,888,264 ==17968== L2i misses: 5,481,716 ==17968== I1 miss rate: 0.25% ==17968== L2i miss rate: 0.23% ==17968== ==17968== D refs: 1,172,342,347 (702,376,465 rd + 469,965,882 wr) ==17968== D1 misses: 7,920,482 ( 6,205,391 rd + 1,715,091 wr) ==17968== L2d misses: 7,134,597 ( 5,455,816 rd + 1,678,781 wr) ==17968== D1 miss rate: 0.6% ( 0.8% + 0.3% ) ==17968== L2d miss rate: 0.6% ( 0.7% + 0.3% ) ==17968== ==17968== L2 refs: 13,808,746 ( 12,093,655 rd + 1,715,091 wr) ==17968== L2 misses: 12,616,313 ( 10,937,532 rd + 1,678,781 wr) ==17968== L2 miss rate: 0.3% ( 0.3% + 0.3% ) linux/kernel/signal.c: ==22924== ==22924== I refs: 1,020,746 ==22924== I1 misses: 1,030 ==22924== L2i misses: 946 ==22924== I1 miss rate: 0.10% ==22924== L2i miss rate: 0.9% ==22924== ==22924== D refs: 480,927 (335,166 rd + 145,761 wr) ==22924== D1 misses: 2,075 ( 1,535 rd + 540 wr) ==22924== L2d misses: 2,072 ( 1,532 rd + 540 wr) ==22924== D1 miss rate: 0.4% ( 0.4% + 0.3% ) ==22924== L2d miss rate: 0.4% ( 0.4% + 0.3% ) ==22924== ==22924== L2 refs: 3,105 ( 2,565 rd + 540 wr) ==22924== L2 misses: 3,018 ( 2,478 rd + 540 wr) ==22924== L2 miss rate: 0.2% ( 0.1% + 0.3% ) I don't want to fuel any kind of flamewars (after all, it's only software), but the miss rates above don't seem too horrible (maybe they are, after all). What cachegrind doesn't show (yet ?) 
is if the access pattern kills opportunities for the memory interface to use burst transfers; after all, SDRAM also has some form of "seek time". It is possible that something's hidden there. Also, I didn't spend much time trying to figure the proper vg_annotate include path, so some functions appear as unknown in the detailed cachegrind outputs. Well, that's a start. -- Cyrille -- Grumpf. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 14:26 ` Cyrille Chepelov @ 2002-08-10 17:33 ` Daniel Berlin 2002-08-10 18:21 ` Linus Torvalds 2002-08-10 18:28 ` Cyrille Chepelov 2002-08-11 1:03 ` Florian Weimer 1 sibling, 2 replies; 173+ messages in thread From: Daniel Berlin @ 2002-08-10 17:33 UTC (permalink / raw) To: Cyrille Chepelov; +Cc: gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 1468 bytes --] On Sat, 10 Aug 2002, Cyrille Chepelov wrote: > Le Sat, Aug 10, 2002, à 08:32:26AM -0500, Robert Lipe a écrit: > > > Linus Torvalds wrote: > > > > > One fundamental fact on modern hardware is that data cache locality is > > > good, and not being in the cache sucks. This is not likely to change. > > > > This is a fact. > > > Measuring this sort of thing is possible. (Optimizing without > > measuring is seldom a good idea.) In the absence of processor pods > > and bus analyzers, has anyone thrown gcc at a tool like 'valgrind' or > > cachegrind? > > I just did (I was forming the idea while reading the thread, but you beat me > in suggesting it before I implemented it). > > I have tried on a grand total of three files, two from today's mainline CVS > (updated from anonymous about four hours ago), and one from Linux 2.5.30; as > my machine is not exactly the dual-multi-gigahertz, "HT"-interconnected > (HyperTransport ?) with gobs of memory bandwith (and what else? 64 bits?) > monsters Linus has been bragging about recently, please bear with lack of > patience to run CG over the whole aforementioned packages... The numbers I get on a p4 with cachegrind are *much* worse in all cases. The miss rates are all >2%, which is a far cry from 0.1% and 0.0%. Are you sure you have valgrind configured right for your cache? I'm going to do this the *real* way, using the performance monitoring counters on my p4, and get *real* numbers. --Dan ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 17:33 ` Daniel Berlin @ 2002-08-10 18:21 ` Linus Torvalds 2002-08-10 18:38 ` Daniel Berlin 2002-08-10 18:39 ` Cyrille Chepelov 1 sibling, 2 replies; 173+ messages in thread From: Linus Torvalds @ 2002-08-10 18:21 UTC (permalink / raw) To: dberlin, gcc In article < Pine.LNX.4.44.0208102031550.8641-100000@dberlin.org > you write: > >The numbers I get on a p4 with cachegrind are *much* worse in all cases. > >The miss rates are all >2%, which is a far cry from 0.1% and 0.0%. One thing to look out for when looking at cache miss numbers is what they actually _mean_. That is particularly true when it comes to the percentages. Are the percentages relative to #instructions, or #memops, or #line fetches (the latter ends up being interesting especially for I$). The "percentage per instruction" number is to some degree a nonsensical number (since many instructions do not do any D$ accesses at all), but it has the advantage that it makes the I$ and D$ misses comparable, and it also allows you to make a quick estimation of how much time was actually spent on cache misses. The _best_ number to get (and in the end, the only one that really matters) is "cycles spent waiting on cache" and "cycles spent doing useful work", but I don't think valgrind gives you that. The P4 counters should do it, though. If you want to use the HW counters under Linux, get "oprofile" from sourceforge.net. (I don't think it does P4 events yet, though) Linus ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 18:21 ` Linus Torvalds @ 2002-08-10 18:38 ` Daniel Berlin 2002-08-10 18:39 ` Cyrille Chepelov 1 sibling, 0 replies; 173+ messages in thread From: Daniel Berlin @ 2002-08-10 18:38 UTC (permalink / raw) To: Linus Torvalds; +Cc: gcc On Sat, 10 Aug 2002, Linus Torvalds wrote: > In article < Pine.LNX.4.44.0208102031550.8641-100000@dberlin.org > you write: > > > >The numbers I get on a p4 with cachegrind are *much* worse in all cases. > > > >The miss rates are all >2%, which is a far cry from 0.1% and 0.0%. > > One thing to look out for when looking at cache miss numbers is what > they actually _mean_. Yeah. > The _best_ number to get (and in the end, the only one that really > matters) is "cycles spent waiting on cache" and "cycles spent doing > useful work", but I don't think valgrind gives you that. The P4 > counters should do it, though. Yuppers. > > If you wan tto use the HW counters under Linux, get "oprofile" from > sourceforge.net. (I don't think it does P4 events yet, though) brink and abyss do p4 events, which is what i'm using. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 18:21 ` Linus Torvalds 2002-08-10 18:38 ` Daniel Berlin @ 2002-08-10 18:39 ` Cyrille Chepelov 1 sibling, 0 replies; 173+ messages in thread From: Cyrille Chepelov @ 2002-08-10 18:39 UTC (permalink / raw) To: gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 1493 bytes --] Le Sat, Aug 10, 2002, à 06:20:51PM -0700, Linus Torvalds a écrit: > >The numbers I get on a p4 with cachegrind are *much* worse in all cases. > > > >The miss rates are all >2%, which is a far cry from 0.1% and 0.0%. > > One thing to look out for when looking at cache miss numbers is what > they actually _mean_. > > That is particularly true when it comes to the percentages. Are the > percentages relative to #instructions, or #memops, or #line fetches (the > latter ends up being interesting especially for I$). These are percentages relative to the number of accesses. L2 percentages are also relative to the original number of accesses, not to the number of L1 misses. > The _best_ number to get (and in the end, the only one that really > matters) is "cycles spent waiting on cache" and "cycles spent doing > useful work", but I don't think valgrind gives you that. The P4 > counters should do it, though. Indeed, cachegrind won't tell you when there was a miss but the hardware was smart enough to do something useful while it waits for the cache. Despite this limitation, shouldn't ((number_of_L1_misses * N) + (number_of_L2_misses * M)) * cycle_len [where N is roughly 10 and M roughly 200, or updated figures] be a ballpark figure of the time lost waiting for RAM to catch up? > If you wan tto use the HW counters under Linux, get "oprofile" from > sourceforge.net. (I don't think it does P4 events yet, though) The site says it doesn't yet. -- Cyrille -- Grumpf. ^ permalink raw reply [flat|nested] 173+ messages in thread
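Written out as code, that back-of-the-envelope estimate is just the following. N and M are the rough per-miss penalties quoted in the message above, not measured values, and the result ignores any overlap of misses with useful work, which is exactly the limitation noted there.

/* Ballpark seconds lost waiting for memory: N cycles per L1 miss,
   M cycles per L2 miss, times the cycle length.  N = 10 and M = 200
   are the rough guesses from the message above.  */
static double
miss_time_estimate (double l1_misses, double l2_misses, double cycle_len)
{
  const double N = 10.0, M = 200.0;
  return (l1_misses * N + l2_misses * M) * cycle_len;
}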
* Re: Faster compilation speed 2002-08-10 17:33 ` Daniel Berlin 2002-08-10 18:21 ` Linus Torvalds @ 2002-08-10 18:28 ` Cyrille Chepelov 2002-08-10 18:30 ` John Levon 1 sibling, 1 reply; 173+ messages in thread From: Cyrille Chepelov @ 2002-08-10 18:28 UTC (permalink / raw) To: gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 2235 bytes --] Le Sat, Aug 10, 2002, à 08:33:53PM -0400, Daniel Berlin a écrit: > On Sat, 10 Aug 2002, Cyrille Chepelov wrote: > > I have tried on a grand total of three files, two from today's mainline CVS > > (updated from anonymous about four hours ago), and one from Linux 2.5.30; as > > my machine is not exactly the dual-multi-gigahertz, "HT"-interconnected > > (HyperTransport ?) with gobs of memory bandwith (and what else? 64 bits?) (Some brave soul pointed to me that HT is more probably HyperThreading. I stand corrected (though being LT surely entitles one to getting cooler toys that mere mortals)). > The numbers I get on a p4 with cachegrind are *much* worse in all cases. > > The miss rates are all >2%, which is a far cry from 0.1% and 0.0%. a-ha ! This is interesting... Did you run on the same sample files as I did, or others ? Can you reproduce my numbers if you set --I1=65536,2,64 --D1=65536,2,64 --L2=65536,8,64 ? > Are you sure you have valgrind configured right for your cache? Sure, no. The cache spec numbers did look about rig... D'oh! Looks like Cachegrind trusts a little too faithfully what this old (A0-stepping) Duron says. CG believes L2 is 1 KB, whereas in fact it is 64KB. I've just re-ran the java/parser.c test with forcing --L2=65536,8,64, and uploaded the results (same place) What are the first lines of output from vg_annotate on your system ? It certainly sounds unbelievable that a Duron's cache design beats a P4's. (there is something curious about the L2 lines from the initial output (the last three ones). Saying that 355266 misses for 365721 refs means a 0.0% miss rate certainly sounds strange, I've got to ask Julian about the logic there. Looks to me that L2 failed 97% of its mission). > I'm going to do this the *real* way, using the performance monitoring > counters on my p4, and get *real* numbers. It would be very interesting to see how far off CG falls... CG does make the implicit assumption that the process runs uninterrupted (I tried welding cachegrind into UML, but that didn't bring me far). The real CPU will certainly give you a more lively picture.... (the performance monitoring counters are not per-process on Linux, are they ?) -- Cyrille -- Grumpf. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 18:28 ` Cyrille Chepelov @ 2002-08-10 18:30 ` John Levon 0 siblings, 0 replies; 173+ messages in thread From: John Levon @ 2002-08-10 18:30 UTC (permalink / raw) To: gcc On Sun, Aug 11, 2002 at 03:28:51AM +0200, Cyrille Chepelov wrote: > It would be very interesting to see how far off CG falls... CG does make the > implicit assumption that the process runs uninterrupted (I tried welding > cachegrind into UML, but that didn't bring me far). The real CPU will > certainly give you a more lively picture.... (the performance monitoring > counters are not per-process on Linux, are they ?) perfctr patch supports virtual counters (google first hit). I don't remember if it has P4 support yet. regards john -- "It is unbecoming for young men to utter maxims." - Aristotle ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 14:26 ` Cyrille Chepelov 2002-08-10 17:33 ` Daniel Berlin @ 2002-08-11 1:03 ` Florian Weimer 1 sibling, 0 replies; 173+ messages in thread From: Florian Weimer @ 2002-08-11 1:03 UTC (permalink / raw) To: Cyrille Chepelov; +Cc: gcc Cyrille Chepelov <cyrille@chepelov.org> writes: > What cachegrind doesn't show (yet ?) is if the access pattern kills > opportunities for the memory interface to use burst transfers; By the way: IIRC, there is some FUD by the author on the web page that the cache simulation might be incorrect. Maybe someone should check this before jumping to conclusions (I'm not familiar with processor cache architectures, that's why I can't do this, sorry). ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 18:57 ` Linus Torvalds 2002-08-09 19:12 ` Phil Edwards 2002-08-09 19:34 ` Kevin Atkinson @ 2002-08-10 19:20 ` Noel Yap 2 siblings, 0 replies; 173+ messages in thread From: Noel Yap @ 2002-08-10 19:20 UTC (permalink / raw) To: Linus Torvalds, gcc --- Linus Torvalds <torvalds@transmeta.com> wrote: > In article > < 20020809200413.46719.qmail@web21403.mail.yahoo.com > > you write: > >Build speeds are most helped by minimizing the > number > >of files opened and closed during the build. > > I _seriously_ doubt that. Yes, my statement is exagerated although they are not completely truthless. The study conducted by John Lakos and some testing that I have conducted point to the fact that minimizing file opens does speed up builds significantly. Of course, that's not to say that other courses of action shouldn't be pursued. > Opening (and even reading) a cached file is not an > expensive operation, > not compared to the kinds of run-times gcc has. > We're talking a few > microseconds per file open at a low level. Even > parsing it should not > be that expensive, especially if the preprocessor is > any good (and from > all I've seen, these days it _is_ good). Hmm, perhaps it's time I conducted some tests again. I'm assuming you're talking about caching at the OS level? > I strongly suspect that what makes gcc slow is that > it has absolutely > horrible cache behaviour, a big VM footprint, and > chases pointers in > that badly cached area all of the time. Maybe you're not talking about caching at the OS level. Caching at the compiler level will certainly help with header files that are included multiple times. OTOH, caching at the OS level and/or preprocessing header files will help with that /and/ header files that are included across compiles. > And that, in turn, is probably impossible to fix as > long as gcc uses > garbage collection for most of its internal memory > management. There > just aren't all that many worse ways to f*ck up your > cache behaviour > than by using lots of allocations and lazy GC to > manage your memory. > > The problem with bad cache behaviour is that you > don't get nice spikes > in specific places that you can try to optimize - > the cost ends up being > spread all over the places that touch the data > structures. > > The problem with trying to avoid GC is that if you > do that you have to > be careful about your reference counts, and I doubt > the gcc people want > to be that careful, especially considering that the > code-base right now > is not likely to be very easy to convert. > > (Plus the fact that GC proponents absolutely refuse > to see the error of > their ways, and will flame me royally for even > _daring_ to say that GC > sucks donkey brains through a straw from a > performance standpoint. If > order to work with refcounting, you need to have the > mentality that > every single data structure with a non-local > lifetime needs to have the > count as it's major member) I'll leave it to the experts to hash this area out. Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 12:17 Faster compilation speed Mike Stump 2002-08-09 13:04 ` Noel Yap @ 2002-08-09 13:10 ` Aldy Hernandez 2002-08-09 15:28 ` Mike Stump 2002-08-09 14:29 ` Neil Booth ` (4 subsequent siblings) 6 siblings, 1 reply; 173+ messages in thread From: Aldy Hernandez @ 2002-08-09 13:10 UTC (permalink / raw) To: Mike Stump; +Cc: gcc >>>>> "Mike" == Mike Stump <mrs@apple.com> writes: > + /* Nonzero for compiling as fast as we can. */ > + > + extern int flag_speed_compile; > + > + #define SPEEDCOMPILE flag_speed_compile So, you want to introduce a flag to do faster compilation? Why not spend your time making the current infrastructure faster? Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 13:10 ` Aldy Hernandez @ 2002-08-09 15:28 ` Mike Stump 2002-08-09 16:00 ` Aldy Hernandez 2002-08-09 19:07 ` David Edelsohn 0 siblings, 2 replies; 173+ messages in thread From: Mike Stump @ 2002-08-09 15:28 UTC (permalink / raw) To: Aldy Hernandez; +Cc: gcc On Friday, August 9, 2002, at 01:15 PM, Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:28 ` Mike Stump @ 2002-08-09 16:00 ` Aldy Hernandez 2002-08-09 16:26 ` Stan Shebs 2002-08-12 16:05 ` Mike Stump 2002-08-09 19:07 ` David Edelsohn 1 sibling, 2 replies; 173+ messages in thread From: Aldy Hernandez @ 2002-08-09 16:00 UTC (permalink / raw) To: Mike Stump; +Cc: gcc > Let's take my combine elision patch. This patch makes the compiler > generate worse code. The way in which it is worse, is that more stack > space is used. How much more, well, my initial guess is that it is > less than 10% worse. Not too bad. Maybe users would care, maybe they I assume you have already looked at the horrendity of the code presently generated by -O0. It's pretty unusable as it is. Who would really want to use gcc under the influence of "worse than -O0"? Really. > I hope that explains my thinking a little bit more. Comments? > Anything sound wrong? And unforeseen dangers? Off the top of my head, if you insist on this approach, at least guarantee that generated code is no worse to debug. That is the only reason *I* use -O0, to debug. Cheers. Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:00 ` Aldy Hernandez @ 2002-08-09 16:26 ` Stan Shebs 2002-08-09 16:31 ` Aldy Hernandez ` (2 more replies) 2002-08-12 16:05 ` Mike Stump 1 sibling, 3 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 16:26 UTC (permalink / raw) To: Aldy Hernandez; +Cc: Mike Stump, gcc Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:26 ` Stan Shebs @ 2002-08-09 16:31 ` Aldy Hernandez 2002-08-09 16:51 ` Stan Shebs 2002-08-09 17:36 ` Daniel Berlin 2002-08-12 16:23 ` Mike Stump 2 siblings, 1 reply; 173+ messages in thread From: Aldy Hernandez @ 2002-08-09 16:31 UTC (permalink / raw) To: Stan Shebs; +Cc: Mike Stump, gcc > OK, then to really rub it in, CW runs much faster than GCC, even on > that slow Darwin OS :-), and that's with its non-optimizing case being Hey, no fair. You know my complaints are strictly in the filesystem :). > Sacrificing -O0 optimization is just a desperation move, since > we don't seem to have many other ideas about how to make GCC as > fast as CW. Ah, the truth comes out. So... Don't you think that if we spent more time getting the infrastructure faster, -O0 will improve as well? Either way, I ain't going to vote against a faster -O0. At least it speeds up my development cycle, since I program by building cc1, inspecting assembly, and repeating cycle :). Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:31 ` Aldy Hernandez @ 2002-08-09 16:51 ` Stan Shebs 2002-08-09 16:54 ` Aldy Hernandez ` (3 more replies) 0 siblings, 4 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 16:51 UTC (permalink / raw) To: Aldy Hernandez; +Cc: Mike Stump, gcc Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:51 ` Stan Shebs @ 2002-08-09 16:54 ` Aldy Hernandez 2002-08-09 17:44 ` Daniel Berlin ` (2 subsequent siblings) 3 siblings, 0 replies; 173+ messages in thread From: Aldy Hernandez @ 2002-08-09 16:54 UTC (permalink / raw) To: Stan Shebs; +Cc: Mike Stump, gcc > I don't think Mike mentioned it, but speeding up the compiler has > become our group's top priority, and every idea is on the table > right now. The 6x goal sounds extreme, but it helps keep in mind > that one or two or even a dozen 5% improvements will not be > sufficient to attain parity with the competition. Fair enough. Game on, and good luck. And please don't keep your changes in your tree, and then have them become obsolete in 4 months when you try to merge :) Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:51 ` Stan Shebs 2002-08-09 16:54 ` Aldy Hernandez @ 2002-08-09 17:44 ` Daniel Berlin 2002-08-09 18:35 ` David S. Miller 2002-08-09 18:25 ` David S. Miller 2002-08-10 10:02 ` Neil Booth 3 siblings, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-09 17:44 UTC (permalink / raw) To: Stan Shebs; +Cc: Aldy Hernandez, Mike Stump, gcc On Fri, 9 Aug 2002, Stan Shebs wrote: > Aldy Hernandez wrote: > > >[...] > > > > So... Don't you think that if we spent more > >time getting the infrastructure faster, -O0 will improve as well? > > > Well sure, it should be part of the plan. > > One of my suspicions is that the massive use of macros in tree > and RTL is concealing excessive pointer chasing, because they > don't show up in either profile or coverage numbers Ding ding, you have another winner. I actually benched this once, by functionizing some often used macros. The timings were horrendous. But what can we do to increase cache locality, or get rid of these problems? > is taking the macros that we function-ized for debugging purposes > (Ira posted it to gcc-patches some time ago, but nobody wanted it > because dwarf2 macro debugging was going to be available RSN), and > will build a (slow) GCC that will do it all through function calls. > That should yield a much more interesting profile. > > I don't think Mike mentioned it, but speeding up the compiler has > become our group's top priority, and every idea is on the table > right now. The 6x goal sounds extreme, but it helps keep in mind > that one or two or even a dozen 5% improvements will not be > sufficient to attain parity with the competition. I think part of the problem is that the timings gcc itself outputs aren't completely accurate, because sometimes we go around the calls that would push the timevar. > Stan > > > > ^ permalink raw reply [flat|nested] 173+ messages in thread
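As an illustration of the macro-to-function experiment described above, consider the sketch below; the structure and accessor names are invented for this example, not GCC's real tree or RTL accessors.

struct node
{
  struct node *operands[4];
};

/* Macro accessor: expands inline everywhere, so the cost of the
   pointer chase is smeared across every caller in a profile.  */
#define NODE_OPERAND(N, I) ((N)->operands[(I)])

/* Function-ized accessor: identical behaviour, but now gprof or
   cachegrind can attribute time and misses to one symbol, at the
   price of a call per access.  */
static struct node *
node_operand (struct node *n, int i)
{
  return n->operands[i];
}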
* Re: Faster compilation speed 2002-08-09 17:44 ` Daniel Berlin @ 2002-08-09 18:35 ` David S. Miller 2002-08-09 18:39 ` Aldy Hernandez 0 siblings, 1 reply; 173+ messages in thread From: David S. Miller @ 2002-08-09 18:35 UTC (permalink / raw) To: dberlin; +Cc: shebs, aldyh, mrs, gcc From: Daniel Berlin <dberlin@dberlin.org> Date: Fri, 9 Aug 2002 20:44:00 -0400 (EDT) The timings were horrendous. But what can we do to increase cache locality, or get rid of these problems? And TLB locality... I propose two possible solutions. 1) Reference count these objects properly, and stop being at the mercy of the garbage collector. 2) Make RTL/TREE layout less pointer driven. I read elsewhere today someone saying that garbage collecting is for people who cannot count, and after trying to beat GCC's GC into submission for a few weeks I couldn't agree more :-) And for this reason if I had the time right now I'd probably tackle #1 first. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 18:35 ` David S. Miller @ 2002-08-09 18:39 ` Aldy Hernandez 2002-08-09 18:59 ` David S. Miller 2002-08-09 20:01 ` Per Bothner 0 siblings, 2 replies; 173+ messages in thread From: Aldy Hernandez @ 2002-08-09 18:39 UTC (permalink / raw) To: David S. Miller; +Cc: dberlin, shebs, mrs, gcc > 2) Make RTL/TREE layout less pointer driven. For the clueless, ahem me, could you go into more detail on this? Thanks. Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 18:39 ` Aldy Hernandez @ 2002-08-09 18:59 ` David S. Miller 0 siblings, 0 replies; 173+ messages in thread From: David S. Miller @ 2002-08-09 18:59 UTC (permalink / raw) To: aldyh; +Cc: dberlin, shebs, mrs, gcc From: Aldy Hernandez <aldyh@redhat.com> Date: Fri, 9 Aug 2002 18:45:00 -0700 > 2) Make RTL/TREE layout less pointer driven. For the clueless, ahem me, could you go into more detail on this? Embed RTL object info instead of using pointers to other RTL objects. It's about as far-reaching a change as reference counting RTL and killing off garbage collection. The reason #2 is so far-reaching is that it would require changing several of the semantics of shared RTL and also getting rid of the places that just randomly stick new RTL all over the place. Garbage collection is just an excuse to be lazy with how we manage RTL objects in GCC. Further consideration suggests that you can approach either solution in at least two stages. The first stage is somehow documenting in the code each spot where we rewrite existing RTL. That makes the rest of the work a bit easier. ^ permalink raw reply [flat|nested] 173+ messages in thread
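A rough sketch of the layout difference being proposed; the structures below are purely illustrative and are not GCC's actual rtx layout.

/* Pointer-driven layout: each operand is a separately allocated
   object, so walking an insn chases pointers to scattered memory.  */
struct operand
{
  int code;
  long value;
};

struct insn_pointer_style
{
  struct insn_pointer_style *next;
  struct operand *op[3];
};

/* Embedded layout: small operands live inside the insn itself, so one
   allocation (often one cache line) holds the whole thing.  The cost
   is what the message above notes: shared and rewritten RTL no longer
   comes for free.  */
struct insn_embedded_style
{
  struct insn_embedded_style *next;
  struct operand op[3];
};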
* Re: Faster compilation speed 2002-08-09 18:39 ` Aldy Hernandez 2002-08-09 18:59 ` David S. Miller @ 2002-08-09 20:01 ` Per Bothner 1 sibling, 0 replies; 173+ messages in thread From: Per Bothner @ 2002-08-09 20:01 UTC (permalink / raw) To: Aldy Hernandez; +Cc: gcc Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:51 ` Stan Shebs 2002-08-09 16:54 ` Aldy Hernandez 2002-08-09 17:44 ` Daniel Berlin @ 2002-08-09 18:25 ` David S. Miller 2002-08-13 0:50 ` Loren James Rittle 2002-08-10 10:02 ` Neil Booth 3 siblings, 1 reply; 173+ messages in thread From: David S. Miller @ 2002-08-09 18:25 UTC (permalink / raw) To: gcc All of these attempts at taking care of "low hanging fruit" are great. But these efforts should not make us ignore the real problems GCC has. For example, I'm convinced that teaching all the RTL code "how to count" and thus obviating garbage collection altogether would be the biggest win ever. (I'm saying RTL should have reference counts, if someone didn't catch what I meant) Someone, I think Stan Shebs, mentioned pointer chasing, and that's another great area of exploration. The problem is that most people don't want to, or don't have the time to, sit down and do the far-reaching changes necessary to fix these toplevel problems. This is exactly what makes things such as a "flag_go_fast" option so appealing. :-( ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 18:25 ` David S. Miller @ 2002-08-13 0:50 ` Loren James Rittle 2002-08-13 21:46 ` Fergus Henderson 0 siblings, 1 reply; 173+ messages in thread From: Loren James Rittle @ 2002-08-13 0:50 UTC (permalink / raw) To: davem; +Cc: gcc In article < 20020809.181251.63969530.davem@redhat.com > David S. Miller writes: > For example, I'm convinced that teaching all the RTL code "how to > count" and thus obviating garbage collection all together, would be > the biggest win ever. (I'm saying RTL should have reference counts, > if someone didn't catch what I meant) Hi David, (This message is in the interest of brainstorming ways to improve compilation speed, even if we can't volunteer to implement, as Mike requested.) In general, comparing RC-GC to scan-GC, I often thought along the quoted lines as well. However, I had no systematic data and my opinion softened somewhat after reading Boehm's papers. Then, for non-modern hardware, I once did compare the performance of a scan-GC-based system (using boehm-gc) verses that of an equivalent explicit-free-based system (along with all the application-level RC code). I was truly surprised at how little overhead there was for using the boehm-gc technique (off-hand, I think it was under 1% for my system, but I do doubt this study applies to modern HW and/or gcc's memory usage pattern) and, more importantly, how much code complexity was reduced. I believe that reduction in code complexity is what drove gcc switching to scan-GC RTL. If you hand-coded RC back in, how is that different than the complexity that was once removed with the introduction of scan-GC? If I recall correctly, subtle object lifetime bugs came and went with the pre-scan-GC code due to complexity (perhaps it was never formally RC'd and if that is your answer, I'd buy it ;-). Now, if I understand it right, the scan-GC technique used in gcc is not as elegant (some explicit marking is required) or high-performance (gcc's implementation doesn't use hardware dirty bits, etc.) as that used in boehm-gc. Has anyone ever tested gcc with its own GC disabled but boehm-gc enabled? OK, this is a red herring question. Even if performance was greater, portability concerns are what caused the decision to build a new custom scan-GC verses reusing boehm-gc... Assuming your (application-level) RC-GC test pans out in terms of speedup, perhaps adding explicit code to maintain counts is not the best approach to keeping the reins on complexity. This might be what you meant, but: Wouldn't it be neater if gcc itself could generally reference count underlying memory which supports C pointers (as a language extension)? According to published papers, the compiler for Inferno could do it (I read them years ago when looking at the classic Java GC model verses other VM technology thus no cite here; I think it is interesting that the latest Java JIT compilers support RC-GC now). Perhaps it is impossible to add generic RC support to C and expose it to all users (for instance, there is the classic pointer escape/ABI problem). But it seems that we could mark structs whose pointers and underlying memory representations are to be handled specially upon pointer copy/invalidation (i.e. due to failing off the end of a scope) and then rigorously check usage against whatever model we use to avoid pointer escape. 
GCC's use of pointers in this area is regular and I see no reason the RC extension couldn't be modeled off the exact needs of the RTL usage (just as scan-GC was not exposed to compiler users, this RC-GC support could be tuned for compiler implementation). How to handle bootstrap since we'd want to use the new technique to replace gcc's current scan-GC? The current GC is only slightly intrusive and could be retained to build the stage1 compiler with support for the new RC-pointer handler (and related support for struct marking in source). Current scan-GC would be disabled for stage2 and 3; the new RC-pointer handler would be enabled. Regards, Loren ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 0:50 ` Loren James Rittle @ 2002-08-13 21:46 ` Fergus Henderson 2002-08-13 22:40 ` David S. Miller 2002-08-14 7:36 ` Jeff Sturm 0 siblings, 2 replies; 173+ messages in thread From: Fergus Henderson @ 2002-08-13 21:46 UTC (permalink / raw) To: Loren James Rittle; +Cc: davem, gcc On 13-Aug-2002, Loren James Rittle <rittle@latour.rsch.comm.mot.com> wrote: > Has anyone ever tested gcc with its own GC disabled > but boehm-gc enabled? OK, this is a red herring question. Even if > performance was greater, portability concerns are what caused the > decision to build a new custom scan-GC verses reusing boehm-gc... Yes, but GCC could use the Boehm GC on systems which supported it, if the Boehm GC was faster... I think this would be a very interesting experiment. -- Fergus Henderson <fjh@cs.mu.oz.au> | "I have always known that the pursuit The University of Melbourne | of excellence is a lethal habit" WWW: < http://www.cs.mu.oz.au/~fjh > | -- the last words of T. S. Garp. ^ permalink raw reply [flat|nested] 173+ messages in thread
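For anyone who wants to try that experiment, the Boehm collector's basic interface is small. The sketch below shows only the allocation side (GC_INIT and GC_MALLOC from <gc.h> are the real libgc entry points), not the considerable work of actually wiring the collector into GCC's ggc layer.

#include <gc.h>
#include <string.h>

struct cell
{
  struct cell *next;
  char name[32];
};

int
main (void)
{
  struct cell *head = 0;
  int i;

  GC_INIT ();   /* once, at program startup */

  for (i = 0; i < 1000; i++)
    {
      /* Memory from GC_MALLOC is scanned conservatively for pointers
         and reclaimed automatically; there is no matching free.  */
      struct cell *c = GC_MALLOC (sizeof *c);
      strcpy (c->name, "node");
      c->next = head;
      head = c;
    }
  return 0;
}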
* Re: Faster compilation speed 2002-08-13 21:46 ` Fergus Henderson @ 2002-08-13 22:40 ` David S. Miller 2002-08-13 23:44 ` Fergus Henderson ` (2 more replies) 2002-08-14 7:36 ` Jeff Sturm 1 sibling, 3 replies; 173+ messages in thread From: David S. Miller @ 2002-08-13 22:40 UTC (permalink / raw) To: fjh; +Cc: rittle, gcc From: Fergus Henderson <fjh@cs.mu.OZ.AU> Date: Wed, 14 Aug 2002 14:46:37 +1000 Yes, but GCC could use the Boehm GC on systems which supported it, if the Boehm GC was faster... I think this would be a very interesting experiment. Feel free to even try it with an infinitely fast GC, even one that executed in zero time. Because for the millionth time, it's not the performance of GC itself. It's the temporal and spatial locality problems of data accesses which is a fundamental result of using GC for memory allocation. It is not an issue of "how fast" the GC is. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 22:40 ` David S. Miller @ 2002-08-13 23:44 ` Fergus Henderson 2002-08-14 7:58 ` Jeff Sturm 2002-08-14 9:52 ` Richard Henderson 2 siblings, 0 replies; 173+ messages in thread From: Fergus Henderson @ 2002-08-13 23:44 UTC (permalink / raw) To: David S. Miller; +Cc: rittle, gcc On 13-Aug-2002, David S. Miller <davem@redhat.com> wrote: > From: Fergus Henderson <fjh@cs.mu.OZ.AU> > Date: Wed, 14 Aug 2002 14:46:37 +1000 > > Yes, but GCC could use the Boehm GC on systems which supported it, > if the Boehm GC was faster... > > I think this would be a very interesting experiment. > > Feel free to even try it with an infinitely fast GC, even > one that executed in zero time. > > Because for the millionth time, it's not the performance of GC itself. > It's the temporal and spatial locality problems of data accesses which > is a fundamental result of using GC for memory allocation. > > It is not an issue of "how fast" the GC is. Look, there are a number of possible memory management strategies and implementations possible. GC using GCC's current GC implementation is one. Conservative GC using the Boehm collector is another. Reference counting is another. Reference counting has its own set of drawbacks for locality, so it's not clear it would be a win; doing the experiment would be a *lot* of work. If someone really feels strongly about RC, and has lots of time, by all means, go for it. Using the Boehm collector is less likely to be a huge win, but it might well be a significant win, and it would be much easier to carry out that experiment. -- Fergus Henderson <fjh@cs.mu.oz.au> | "I have always known that the pursuit The University of Melbourne | of excellence is a lethal habit" WWW: < http://www.cs.mu.oz.au/~fjh > | -- the last words of T. S. Garp. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 22:40 ` David S. Miller 2002-08-13 23:44 ` Fergus Henderson @ 2002-08-14 7:58 ` Jeff Sturm 2002-08-14 9:52 ` Richard Henderson 2 siblings, 0 replies; 173+ messages in thread From: Jeff Sturm @ 2002-08-14 7:58 UTC (permalink / raw) To: David S. Miller; +Cc: fjh, rittle, gcc On Tue, 13 Aug 2002, David S. Miller wrote: > I think this would be a very interesting experiment. > > Feel free to even try it with an infinitely fast GC, even > one that executed in zero time. > > Because for the millionth time, it's not the performance of GC itself. > It's the temporal and spatial locality problems of data accesses which > is a fundamental result of using GC for memory allocation. Relax. Earlier in this thread I seem to remember you were advocating certain experiments in spite of the skeptics. So give the GC experts a chance. As I understand it, generational collection ought to improve locality, since the youngest generation can be collected frequently, and may even be small enough to fit mostly in cache. (I've never observed it to work in practice, but don't let that discourage anyone :-) Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 22:40 ` David S. Miller 2002-08-13 23:44 ` Fergus Henderson 2002-08-14 7:58 ` Jeff Sturm @ 2002-08-14 9:52 ` Richard Henderson 2002-08-14 10:00 ` David Edelsohn 2002-08-14 10:15 ` David Edelsohn 2 siblings, 2 replies; 173+ messages in thread From: Richard Henderson @ 2002-08-14 9:52 UTC (permalink / raw) To: David S. Miller; +Cc: fjh, rittle, gcc On Tue, Aug 13, 2002 at 10:26:41PM -0700, David S. Miller wrote: > Because for the millionth time, it's not the performance of GC itself. > It's the temporal and spatial locality problems of data accesses which > is a fundamental result of using GC for memory allocation. You haven't shown (or even provided guesstimates) how much temporal or spatial locality could be had by moving away from GC. Exactly how much garbage is created during compilation of a function, Dave? Suppose we did do manual memory allocation and never created any garbage whatsoever. Suppose perfect temporal locality. How much spatial locality do we have, considering the pointer-chasing structure of our IL? My guess is not much. The folks that are doing cache-miss studies and concluding anything should also go back and measure gcc 2.95, before we used GC at all. That's perhaps not ideal, since it's obstacks instead of reference counting, but it's not a worthless data point. The conclusion that RC will solve all our problems is not foregone. I think we're better served trying to adjust the form of the IL so that we do less pointer chasing, as Geoff suggested elsewhere in this thread. r~ ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 9:52 ` Richard Henderson @ 2002-08-14 10:00 ` David Edelsohn 2002-08-14 12:01 ` Andreas Schwab 2002-08-14 10:15 ` David Edelsohn 1 sibling, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-14 10:00 UTC (permalink / raw) To: Richard Henderson, David S. Miller; +Cc: gcc >>>>> Richard Henderson writes: Richard> You haven't shown (or even provided guesstimates) how much temporal Richard> or spatial locality could be had by moving away from GC. Exactly Richard> how much garbage is created during compilation of a function, Dave? Richard> Suppose we did do manual memory allocation and never created any Richard> garbage whatsoever. Suppose perfect temporal locality. How much Richard> spatial locality do we have, considering the pointer-chasing structure Richard> of our IL? My guess is not much. One place where GCC could benefit from spatial locality is allocating the instruction list and pseudo registers from a large, static virtual memory array instead of allocating individual objects dynamically. I am *not* suggesting removing the linked list pointers or the pointers to the actual RTL. GCC often scans or walks through the instructions linearly. Pseudo registers are allocated consecutively. Allocating those linearly-accessed objects in contiguous memory would improve cache locality. David ^ permalink raw reply [flat|nested] 173+ messages in thread
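A hedged sketch of that suggestion: a flat, preallocated pool for the linearly scanned objects, with the usual chain pointers kept inside the elements. The names and the fixed pool size are invented for illustration.

/* All insns for a function come from one contiguous block, so a
   linear walk of the insn chain also walks memory roughly linearly.
   The next/prev and operand pointers stay; only the placement of the
   objects changes.  */
struct insn
{
  struct insn *next, *prev;
  void *operands;
};

#define INSN_POOL_SIZE 200000

static struct insn insn_pool[INSN_POOL_SIZE];
static int insn_pool_used;

static struct insn *
alloc_insn (void)
{
  if (insn_pool_used >= INSN_POOL_SIZE)
    return 0;   /* a real allocator would chain another large block */
  return &insn_pool[insn_pool_used++];
}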
* Re: Faster compilation speed 2002-08-14 10:00 ` David Edelsohn @ 2002-08-14 12:01 ` Andreas Schwab 2002-08-14 12:07 ` David Edelsohn 0 siblings, 1 reply; 173+ messages in thread From: Andreas Schwab @ 2002-08-14 12:01 UTC (permalink / raw) To: David Edelsohn; +Cc: Richard Henderson, David S. Miller, gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 1074 bytes --] David Edelsohn <dje@watson.ibm.com> writes: |> >>>>> Richard Henderson writes: |> |> Richard> You havn't shown (or even provided guesstemates) how much temporal |> Richard> or spacial locallity could be had by moving away from GC. Exactly |> Richard> how much garbage is created during compilation of a function, Dave? |> |> Richard> Suppose we did do manual memory allocation and never created any |> Richard> garbage whatsoever. Suppose perfect temporal locality. How much |> Richard> spacial locality do we have, considering the pointer-chasing structure |> Richard> of our IL? My guess is not much. |> |> Places where GCC could benefit from spacial locality is by |> allocating the instruction list and pseudo registers from a large, static |> virtual memory array instead of allocating individual objects dynamically. Obstacks? Andreas. -- Andreas Schwab, SuSE Labs, schwab@suse.de SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 12:01 ` Andreas Schwab @ 2002-08-14 12:07 ` David Edelsohn 2002-08-14 13:20 ` Michael Matz 2002-08-14 13:20 ` Faster compilation speed Jamie Lokier 0 siblings, 2 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-14 12:07 UTC (permalink / raw) To: Andreas Schwab; +Cc: Richard Henderson, David S. Miller, gcc >>>>> Andreas Schwab writes: |> Places where GCC could benefit from spacial locality is by |> allocating the instruction list and pseudo registers from a large, static |> virtual memory array instead of allocating individual objects dynamically. Andreas> Obstacks? I thought that obstacks are created dynamically, not statically. One does not want to ever copy or grow the array. Statically allocating some of the large, persistent, sequential collections of objects would help locality. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 12:07 ` David Edelsohn @ 2002-08-14 13:20 ` Michael Matz 2002-08-14 16:31 ` Faster compilation speed [zone allocation] Per Bothner 2002-08-14 13:20 ` Faster compilation speed Jamie Lokier 0 siblings, 2 replies; 173+ messages in thread From: Michael Matz @ 2002-08-14 13:20 UTC (permalink / raw) To: David Edelsohn; +Cc: gcc Hi, On Wed, 14 Aug 2002, David Edelsohn wrote: > |> Places where GCC could benefit from spacial locality is by > |> allocating the instruction list and pseudo registers from a large, static > |> virtual memory array instead of allocating individual objects dynamically. > > Andreas> Obstacks? > > I thought that obstacks are created dynamically, not statically. Sort of. Obstacks have the ability to grow an object which isn't yet finalized, and in that process there might be some copying (the canonical example is a string, which is created character by character). After finalization it doesn't change its address anymore, but still is part of that obstack. One would not use that functionality, but simply use obstacks as convenient containers for small objects, which are allocated already finalized. It allocates memory in blocks, and then gives out part of the current block as long as enough is free in it, and the request is not larger than a certain size (in which case it gets its own block). This makes for extremely fast allocation (just a pointer increment in the general case). One can't deallocate objects in an obstack (or rather, only all objects allocated after a certain one). And it creates good space locality, and needs less memory than a general allocator like malloc (in case many small objects are allocated). But that one can't free objects is a quite severe limitation (I wrote one for KDE, in which you can free objects, but it has certain restrictions). But it's still usable. E.g. I use an obstack in the new register allocator to allocate most of my small objects from it (nodes and edges of the graph), and then simply free the whole thing once at the end of that phase. But that's not possible, e.g., with the current RTL of the function; there you really don't want to use an obstack. > One does not want to ever copy or grow the array. As explained, this doesn't happen if one uses the obstack without growing objects. > Statically allocating some of the large, persistent, sequential > collections of objects would help locality. This would lead to the idea of obstacks (without growing obstacks) per data structure type, IOW to a zone allocator, which is not a bad thing. Ciao, Michael. ^ permalink raw reply [flat|nested] 173+ messages in thread
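For reference, the GNU obstack interface Michael describes is already available (obstack.h in glibc/libiberty). A minimal use of it as a per-pass bump allocator looks roughly like this; the graph-node structure is only an example.

#include <obstack.h>
#include <stdlib.h>

/* obstack.h leaves the chunk allocator up to the user.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

struct ra_node
{
  int color;
  struct ra_node *adj;
};

static struct obstack ra_obstack;

static void
example_pass (void)
{
  struct ra_node *n;

  obstack_init (&ra_obstack);

  /* Each allocation is normally just a pointer bump inside the
     current chunk; objects cannot be freed individually.  */
  n = obstack_alloc (&ra_obstack, sizeof *n);
  n->color = -1;
  n->adj = 0;

  /* ... run the pass using the nodes ... */

  /* Release everything allocated above in one shot.  */
  obstack_free (&ra_obstack, 0);
}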
* Re: Faster compilation speed [zone allocation] 2002-08-14 13:20 ` Michael Matz @ 2002-08-14 16:31 ` Per Bothner 2002-08-15 11:34 ` Aldy Hernandez 0 siblings, 1 reply; 173+ messages in thread From: Per Bothner @ 2002-08-14 16:31 UTC (permalink / raw) To: Michael Matz; +Cc: gcc Michael Matz wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-14 16:31 ` Faster compilation speed [zone allocation] Per Bothner @ 2002-08-15 11:34 ` Aldy Hernandez 2002-08-15 11:39 ` David Edelsohn ` (3 more replies) 0 siblings, 4 replies; 173+ messages in thread From: Aldy Hernandez @ 2002-08-15 11:34 UTC (permalink / raw) To: Per Bothner; +Cc: Michael Matz, gcc >>>>> "Per" == Per Bothner <per@bothner.com> writes: This is just an idea, why doesn't someone hack the GC to never collect, and then we can really find out how much is to be gained by a refcounter, or no GC at all, etc. Why go down this path, if we're not even sure it'll improve anything (well, that much anyhow). Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
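The experiment Aldy proposes needs only a throwaway hack around the collector entry point mentioned earlier in the thread. A sketch, with the flag invented for this purpose and the rest of the function's body elided:

/* Somewhere in the ggc implementation: make collection a no-op so
   memory is never reused or freed, then compare compile times and
   cache behaviour against an unmodified compiler.  */
static int never_collect = 1;

void
ggc_collect ()
{
  if (never_collect)
    return;

  /* ... existing collection code ... */
}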
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:34 ` Aldy Hernandez @ 2002-08-15 11:39 ` David Edelsohn 2002-08-15 12:01 ` Lynn Winebarger 2002-08-15 11:41 ` Michael Matz ` (2 subsequent siblings) 3 siblings, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-15 11:39 UTC (permalink / raw) To: Aldy Hernandez; +Cc: Per Bothner, Michael Matz, gcc >>>>> Aldy Hernandez writes: Aldy> This is just an idea, why doesn't someone hack the GC to never Aldy> collect, and then we can really find out how much is to be gained by a Aldy> refcounter, or no GC at all, etc. Aldy> Why go down this path, if we're not even sure it'll improve anything Aldy> (well, that much anyhow). Because the problem is not the garbage collection, its the allocation pattern. The proposal to use reference counting allows GCC to switch to an allocator with better locality -- it's a requirement for the underlying improvement, not a fix unto itself. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:39 ` David Edelsohn @ 2002-08-15 12:01 ` Lynn Winebarger 2002-08-15 12:11 ` David Edelsohn 0 siblings, 1 reply; 173+ messages in thread From: Lynn Winebarger @ 2002-08-15 12:01 UTC (permalink / raw) To: David Edelsohn, Aldy Hernandez; +Cc: Per Bothner, Michael Matz, gcc On Thursday 15 August 2002 13:39, David Edelsohn wrote: > >>>>> Aldy Hernandez writes: > > Because the problem is not the garbage collection, its the > allocation pattern. The proposal to use reference counting allows GCC to > switch to an allocator with better locality -- it's a requirement for the > underlying improvement, not a fix unto itself. > GCC's GC promotion of poor locality of reference is not proof that reference counting is the only way to improve that locality of reference. It doesn't matter what allocation/reclamation scheme you switch to, if it's not used in a way consistent with the cases it optimizes for, it's going to stink. There's just as much reason to believe there's a generational GC that will do what you need as to believe reference counting will be some kind of magic bullet (without the brittleness). Lynn ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 12:01 ` Lynn Winebarger @ 2002-08-15 12:11 ` David Edelsohn 0 siblings, 0 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-15 12:11 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Aldy Hernandez, Per Bothner, Michael Matz, gcc >>>>> Lynn Winebarger writes: Lynn> GCC's GC promotion of poor locality of reference is not proof that Lynn> reference counting is the only way to improve that locality of reference. Lynn> It doesn't matter what allocation/reclamation scheme you switch to, if it's Lynn> not used in a way consistent with the cases it optimizes for, it's going to Lynn> stink. There's just as much reason to believe there's a generational GC Lynn> that will do what you need as to believe reference counting will be some Lynn> kind of magic bullet (without the brittleness). Let me correct my sloppy wording. What I meant by "it's a requirement for the underlying improvement" is that it is a dependency for that particular proposal -- RC is a means to an end, not an end unto itself. There are many ways to address the locality problem. I am trying to encourage people participating in this discussion to stop fixating on the garbage collector itself. Somehow when GC is mentioned, people obsess on the garbage collection process without reading the entire discussion. If there is interest in discussing garbage collectors, there are other mailinglists on that specific topic where the pros and cons of various styles with and without hardware assistance are debated. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:34 ` Aldy Hernandez 2002-08-15 11:39 ` David Edelsohn @ 2002-08-15 11:41 ` Michael Matz 2002-08-16 8:44 ` Kai Henningsen 2002-08-15 11:43 ` Per Bothner 2002-08-15 11:57 ` Kevin Handy 3 siblings, 1 reply; 173+ messages in thread From: Michael Matz @ 2002-08-15 11:41 UTC (permalink / raw) To: Aldy Hernandez; +Cc: Per Bothner, gcc Hi, On 15 Aug 2002, Aldy Hernandez wrote: > This is just an idea, why doesn't someone hack the GC to never > collect, and then we can really find out how much is to be gained by a > refcounter, or no GC at all, etc. To switch off GC doesn't necessarily bring anything, except that GC isn't done. But the allocated memory still has the same locality as before (i.e. if it's the reason for bad performance now, that will still be the case if we switch off GC). I.e. it wouldn't prove anything. Ciao, Michael. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:41 ` Michael Matz @ 2002-08-16 8:44 ` Kai Henningsen 0 siblings, 0 replies; 173+ messages in thread From: Kai Henningsen @ 2002-08-16 8:44 UTC (permalink / raw) To: gcc matz@suse.de (Michael Matz) wrote on 15.08.02 in < Pine.LNX.4.33.0208152037200.13269-100000@wotan.suse.de >: > On 15 Aug 2002, Aldy Hernandez wrote: > > > This is just an idea, why doesn't someone hack the GC to never > > collect, and then we can really find out how much is to be gained by a > > refcounter, or no GC at all, etc. > > To switch off GC doesn't necessarily bring anything, except that GC isn't > done. But the allocated memory still has the same locality as before > (i.e. if it's the reason for bad performance now, that will still be the > case if we switch off GC). I.e. it wouldn't proove anything. Well, it might prove that the bad locality isn't *caused* by running the collector. (Or that it is, of course.) MfG Kai ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:34 ` Aldy Hernandez 2002-08-15 11:39 ` David Edelsohn 2002-08-15 11:41 ` Michael Matz @ 2002-08-15 11:43 ` Per Bothner 2002-08-15 11:57 ` Kevin Handy 3 siblings, 0 replies; 173+ messages in thread From: Per Bothner @ 2002-08-15 11:43 UTC (permalink / raw) To: Aldy Hernandez; +Cc: Michael Matz, gcc Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:34 ` Aldy Hernandez ` (2 preceding siblings ...) 2002-08-15 11:43 ` Per Bothner @ 2002-08-15 11:57 ` Kevin Handy 3 siblings, 0 replies; 173+ messages in thread From: Kevin Handy @ 2002-08-15 11:57 UTC (permalink / raw) To: gcc Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 12:07 ` David Edelsohn 2002-08-14 13:20 ` Michael Matz @ 2002-08-14 13:20 ` Jamie Lokier 2002-08-14 16:01 ` Nix 1 sibling, 1 reply; 173+ messages in thread From: Jamie Lokier @ 2002-08-14 13:20 UTC (permalink / raw) To: David Edelsohn; +Cc: Andreas Schwab, Richard Henderson, David S. Miller, gcc David Edelsohn wrote: > I thought that obstacks are created dynamically, not statically. > One does not want to ever copy or grow the array. Obstacks use chunks of memory to hold many contiguous objects, so they offer fairly good spatial locality. But then, so do many decent GC allocators (not ones using free lists, though). > Statically allocating some of the large, persistent, sequential > collections of objects would help locality. Linus and David are suggesting that temporal locality of short-lived objects is important -- i.e. reuse of memory from freed objects. Who knows. -- Jamie ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 13:20 ` Faster compilation speed Jamie Lokier @ 2002-08-14 16:01 ` Nix 0 siblings, 0 replies; 173+ messages in thread From: Nix @ 2002-08-14 16:01 UTC (permalink / raw) To: Jamie Lokier Cc: David Edelsohn, Andreas Schwab, Richard Henderson, David S. Miller, gcc On Wed, 14 Aug 2002, Jamie Lokier muttered drunkenly: > David Edelsohn wrote: >> I thought that obstacks are created dynamically, not statically. >> One does not want to ever copy or grow the array. > > Obstacks use chunks of memory to hold many contiguous objects, so they > offer fairly good spatial locality. But then, so do many decent GC > allocators (not ones using free lists, though). Also, surely one does not *often* want to grow or copy the array: the occasional copy isn't a problem (but you initialize it quite large so the resizing isn't required often). -- `Mips are real and bitrate earnest, shifting spam is not our goal; silicon to sand returnest, was not spoken of the soul.' --- _Eventful History: Version 1.x_, John M. Ford ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 9:52 ` Richard Henderson 2002-08-14 10:00 ` David Edelsohn @ 2002-08-14 10:15 ` David Edelsohn 2002-08-14 16:35 ` Richard Henderson 2002-08-20 4:15 ` Richard Earnshaw 1 sibling, 2 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-14 10:15 UTC (permalink / raw) To: Richard Henderson, David S. Miller; +Cc: gcc >>>>> Richard Henderson writes: Richard> The folks that are doing cache-miss studies and concluding anything Richard> should also go back and measure gcc 2.95, before we used GC at all. Richard> That's perhaps not ideal, since it's obstacks instead of reference Richard> counting, but it's not a worthless data point. Thanks for the suggestion. I think the results I got are pretty damning: gcc-2.95.3 20010315 (release)

Source         I/D$ miss -O2   I/D$ miss -O0
------         -------------   -------------
reload.c             28              36
insn-recog.c         48              36

For comparison, GCC 3.3 has values in the low 20's, especially at no optimization. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 10:15 ` David Edelsohn @ 2002-08-14 16:35 ` Richard Henderson 2002-08-14 17:02 ` David Edelsohn 2002-08-20 4:15 ` Richard Earnshaw 1 sibling, 1 reply; 173+ messages in thread From: Richard Henderson @ 2002-08-14 16:35 UTC (permalink / raw) To: David Edelsohn; +Cc: David S. Miller, gcc On Wed, Aug 14, 2002 at 01:14:53PM -0400, David Edelsohn wrote: > Thanks for the suggestion. I think the results I got are pretty damning... Try the following. Appears to cut 30 seconds (3.5%) off of an -O2 -g build of reload.c, and a small fraction of a second (3.1%) at -O0 -g. This on an 800MHz Pentium III (Coppermine). If I have rest_of_compilation dump out insn addresses before optimization (the only time we could even hope for relatively sequential nodes), INSN nodes are indeed largely coherent (even without this patch). But NOTE nodes are smaller, and get put in a different size bucket, and so are allocated from different pages. Padding out the size of NOTEs and BARRIERs make them allocated from the same pages, and the resulting initial addresses are about as sequential as one could hope. The remaining main source of non-sequentiality in the initial rtl is label = gen_label_rtx (); /* emit code */ emit_label (label); and there's really no helping that. The other change is to add allocation buckets for two important rtx sizes. On 32-bit systems, two-operand rtxs (including REG, MEM, PLUS, etc) are 12 bytes, but we were allocating 16 bytes. Similarly an INSN (9 operand) and CALL_INSN (10 operand) are 40 and 44 bytes respectively but we were allocating 64. I choose to put the bucket at 10 operand so that CALL_INSNs and JUMP_INSNs can fit. I havn't measured the overall real-life memory savings, but this is 25% for REGs and 30% for INSNs. r~ * ggc-page.c (RTL_SIZE): New. (extra_order_size_table): Add specializations for 2 and 10 rtl slots. * rtl.def (BARRIER, NOTE): Pad to 9 slots. Index: ggc-page.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/ggc-page.c,v retrieving revision 1.51 diff -c -p -d -r1.51 ggc-page.c *** ggc-page.c 4 Jun 2002 11:30:36 -0000 1.51 --- ggc-page.c 14 Aug 2002 22:38:57 -0000 *************** Software Foundation, 59 Temple Place - S *** 163,175 **** #define NUM_EXTRA_ORDERS ARRAY_SIZE (extra_order_size_table) /* The Ith entry is the maximum size of an object to be stored in the Ith extra order. Adding a new entry to this array is the *only* thing you need to do to add a new special allocation size. */ static const size_t extra_order_size_table[] = { sizeof (struct tree_decl), ! sizeof (struct tree_list) }; /* The total number of orders. */ --- 163,180 ---- #define NUM_EXTRA_ORDERS ARRAY_SIZE (extra_order_size_table) + #define RTL_SIZE(NSLOTS) \ + (sizeof (struct rtx_def) + ((NSLOTS) - 1) * sizeof (rtunion)) + /* The Ith entry is the maximum size of an object to be stored in the Ith extra order. Adding a new entry to this array is the *only* thing you need to do to add a new special allocation size. */ static const size_t extra_order_size_table[] = { sizeof (struct tree_decl), ! sizeof (struct tree_list), ! RTL_SIZE (2), /* REG, MEM, PLUS, etc. */ ! RTL_SIZE (10), /* INSN, CALL_INSN, JUMP_INSN */ }; /* The total number of orders. 
*/ Index: rtl.def =================================================================== RCS file: /cvs/gcc/gcc/gcc/rtl.def,v retrieving revision 1.58 diff -c -p -d -r1.58 rtl.def *** rtl.def 19 Jul 2002 23:11:18 -0000 1.58 --- rtl.def 14 Aug 2002 22:38:57 -0000 *************** DEF_RTL_EXPR(JUMP_INSN, "jump_insn", "iu *** 566,587 **** DEF_RTL_EXPR(CALL_INSN, "call_insn", "iuuBteieee", 'i') /* A marker that indicates that control will not flow through. */ ! DEF_RTL_EXPR(BARRIER, "barrier", "iuu", 'x') /* Holds a label that is followed by instructions. Operand: ! 4: is used in jump.c for the use-count of the label. ! 5: is used in flow.c to point to the chain of label_ref's to this label. ! 6: is a number that is unique in the entire compilation. ! 7: is the user-given name of the label, if any. */ DEF_RTL_EXPR(CODE_LABEL, "code_label", "iuuB00is", 'x') /* Say where in the code a source line starts, for symbol table's sake. Operand: ! 4: filename, if line number > 0, note-specific data otherwise. ! 5: line number if > 0, enum note_insn otherwise. ! 6: unique number if line number == note_insn_deleted_label. */ ! DEF_RTL_EXPR(NOTE, "note", "iuuB0ni", 'x') /* ---------------------------------------------------------------------- Top level constituents of INSN, JUMP_INSN and CALL_INSN. --- 566,589 ---- DEF_RTL_EXPR(CALL_INSN, "call_insn", "iuuBteieee", 'i') /* A marker that indicates that control will not flow through. */ ! DEF_RTL_EXPR(BARRIER, "barrier", "iuu000000", 'x') /* Holds a label that is followed by instructions. Operand: ! 5: is used in jump.c for the use-count of the label. ! 6: is used in flow.c to point to the chain of label_ref's to this label. ! 7: is a number that is unique in the entire compilation. ! 8: is the user-given name of the label, if any. */ DEF_RTL_EXPR(CODE_LABEL, "code_label", "iuuB00is", 'x') /* Say where in the code a source line starts, for symbol table's sake. Operand: ! 5: filename, if line number > 0, note-specific data otherwise. ! 6: line number if > 0, enum note_insn otherwise. ! 7: unique number if line number == note_insn_deleted_label. ! 8-9: padding so that notes and insns are the same size, and thus ! allocated from the same page ordering. */ ! DEF_RTL_EXPR(NOTE, "note", "iuuB0ni00", 'x') /* ---------------------------------------------------------------------- Top level constituents of INSN, JUMP_INSN and CALL_INSN. ^ permalink raw reply [flat|nested] 173+ messages in thread
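As a rough size check of the savings Richard quotes (assuming, as the patch implies, a 32-bit host where an rtunion is 4 bytes and struct rtx_def including its first rtunion is 8 bytes):

    RTL_SIZE (2)  = 8 + (2 - 1) * 4  = 12 bytes, vs. the old 16-byte bucket  (~25% saved)
    RTL_SIZE (10) = 8 + (10 - 1) * 4 = 44 bytes, vs. the old 64-byte bucket  (~31% saved)

which matches the 25% for REGs and roughly 30% for INSNs mentioned above.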
* Re: Faster compilation speed 2002-08-14 16:35 ` Richard Henderson @ 2002-08-14 17:02 ` David Edelsohn 0 siblings, 0 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-14 17:02 UTC (permalink / raw) To: Richard Henderson, David S. Miller; +Cc: gcc The patch does improve the cache behavior:

Source         I/D$ miss -O2   I/D$ miss -O0
------         -------------   -------------
reload.c        22 -> 23.4      22 -> 23.9
insn-recog.c    29 -> 30.3      23 -> 24.6

David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 10:15 ` David Edelsohn 2002-08-14 16:35 ` Richard Henderson @ 2002-08-20 4:15 ` Richard Earnshaw 2002-08-20 5:38 ` Jeff Sturm 2002-08-20 8:00 ` David Edelsohn 1 sibling, 2 replies; 173+ messages in thread From: Richard Earnshaw @ 2002-08-20 4:15 UTC (permalink / raw) To: David Edelsohn; +Cc: Richard Henderson, David S. Miller, gcc, Richard.Earnshaw > >>>>> Richard Henderson writes: > > Richard> The folks that are doing cache-miss studies and concluding anything > Richard> should also go back and measure gcc 2.95, before we used GC at all. > Richard> That's perhaps not ideal, since it's obstacks instead of reference > Richard> counting, but it's not a worthless data point. > > Thanks for the suggestion. I think the results I got are pretty > damning: > > gcc-2.95.3 20010315 (release) > > Source I/D$ miss -O2 I/D$ miss -O0 > ------ ------------- ------------- > reload.c 28 36 > insn-recog.c 48 36 > > > For comparison, GCC 3.3 has values in the low 20's, especially at > no optimization. > > David > Do you have/can you get data for TLB misses? R. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-20 4:15 ` Richard Earnshaw @ 2002-08-20 5:38 ` Jeff Sturm 2002-08-20 5:53 ` Richard Earnshaw 2002-08-20 8:00 ` David Edelsohn 1 sibling, 1 reply; 173+ messages in thread From: Jeff Sturm @ 2002-08-20 5:38 UTC (permalink / raw) To: Richard.Earnshaw; +Cc: David Edelsohn, Richard Henderson, David S. Miller, gcc On Tue, 20 Aug 2002, Richard Earnshaw wrote: > > gcc-2.95.3 20010315 (release) > > > > Source I/D$ miss -O2 I/D$ miss -O0 > > ------ ------------- ------------- > > reload.c 28 36 > > insn-recog.c 48 36 > > Do you have/can you get data for TLB misses? I had done that on alpha, but didn't initially report the figures. Would a comparison to 2.95 also be useful? gcc version 3.3 20020802 (experimental) --------------------------------------------------------------------------- cc1 -O2 reload.i issues/cycles = 0.51 issues/dcache_miss = 26.93 issues/dtb_miss = 1214.36 --------------------------------------------------------------------------- cc1 reload.i issues/cycles = 0.52 issues/dcache_miss = 31.29 issues/dtb_miss = 1854.16 Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-20 5:38 ` Jeff Sturm @ 2002-08-20 5:53 ` Richard Earnshaw 2002-08-20 13:42 ` Jeff Sturm 0 siblings, 1 reply; 173+ messages in thread From: Richard Earnshaw @ 2002-08-20 5:53 UTC (permalink / raw) To: Jeff Sturm Cc: Richard.Earnshaw, David Edelsohn, Richard Henderson, David S. Miller, gcc > > Do you have/can you get data for TLB misses? > > I had done that on alpha, but didn't initially report the figures. Would > a comparison to 2.95 also be useful? Certainly -- the numbers don't really mean anything unless we have something to compare them against. Remember, gcc-2.95 bootstrap times were about half those that we have now (*after* taking into account new languages and libraries etc). R. > > gcc version 3.3 20020802 (experimental) > > --------------------------------------------------------------------------- > cc1 -O2 reload.i > > issues/cycles = 0.51 issues/dcache_miss = 26.93 issues/dtb_miss = 1214.36 So if I understand these figures correctly, then dcache_miss/dtb_miss ~= 45 That is, one in 45 dcache fetches also requires a tlb walk. How many dtb entries does an Alpha have? > > --------------------------------------------------------------------------- > cc1 reload.i > > issues/cycles = 0.52 issues/dcache_miss = 31.29 issues/dtb_miss = 1854.16 > giving dcache_miss/dtb_miss ~= 60 ^ permalink raw reply [flat|nested] 173+ messages in thread
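(For anyone following along, the ratios above come from dividing the two reported figures: dcache misses per TLB miss = (issues/dtb_miss) / (issues/dcache_miss), i.e. 1214.36 / 26.93 ~= 45 at -O2 and 1854.16 / 31.29 ~= 60 at -O0.)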
* Re: Faster compilation speed 2002-08-20 5:53 ` Richard Earnshaw @ 2002-08-20 13:42 ` Jeff Sturm 2002-08-22 1:55 ` Richard Earnshaw 0 siblings, 1 reply; 173+ messages in thread From: Jeff Sturm @ 2002-08-20 13:42 UTC (permalink / raw) To: Richard.Earnshaw; +Cc: David Edelsohn, Richard Henderson, David S. Miller, gcc On Tue, 20 Aug 2002, Richard Earnshaw wrote: > > I had done that on alpha, but didn't initially report the figures. Would > > a comparison to 2.95 also be useful? > > Certainly -- the numbers don't really mean anything unless we have > something to compare them against. I figured so. (Wow, I hadn't built a 2.95 toolchain in a long time.) > > gcc version 3.3 20020802 (experimental) > > > > --------------------------------------------------------------------------- > > cc1 -O2 reload.i > > > > issues/cycles = 0.51 issues/dcache_miss = 26.93 issues/dtb_miss = 1214.36 gcc version 2.95.3 20010315 (release) cc1 -O2 reload.i issues/cycles = 0.54 issues/dcache_miss = 26.31 issues/dtb_miss = 2488. cc1 reload.i issues/cycles = 0.52 issues/dcache_miss = 26.30 issues/dtb_miss = 3306. Now that's interesting. No real change in L1 cache performance, but TLB misses nearly cut in half vs. 3.3. Trying L3 misses (both with -O0): 3.3: issues/bcache_miss = 370 2.95.3: issues/bcache_miss = 437 Wall-clock time is nearly 2/1 for these tests, as are TLB misses, while other stats are close. Hmm. > So if I understand these figures correctly, then > > dcache_miss/dtb_miss ~= 45 > > That is, one in 45 dcache fetches also requires a tlb walk. That's how I see it. > How many dtb entries does an Alpha have? No idea. This is an ev56. I could try grabbing the specs from Digital's site, if I can still find it... How expensive is a TLB miss, anyway? I hadn't expected it would occur often enough in gcc to be significant. Note the IPC ratio stays constant, but as I understand it, TLB is handled in software, so maybe those cycles are counted by iprobe? Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-20 13:42 ` Jeff Sturm @ 2002-08-22 1:55 ` Richard Earnshaw 2002-08-22 2:03 ` David S. Miller 2002-08-23 15:39 ` Jeff Sturm 0 siblings, 2 replies; 173+ messages in thread From: Richard Earnshaw @ 2002-08-22 1:55 UTC (permalink / raw) To: Jeff Sturm Cc: Richard.Earnshaw, David Edelsohn, Richard Henderson, David S. Miller, gcc > On Tue, 20 Aug 2002, Richard Earnshaw wrote: > > > I had done that on alpha, but didn't initially report the figures. Would > > > a comparison to 2.95 also be useful? > > > > Certainly -- the numbers don't really mean anything unless we have > > something to compare them against. > > I figured so. (Wow, I hadn't built a 2.95 toolchain in a long time.) > > > > gcc version 3.3 20020802 (experimental) > > > > > > --------------------------------------------------------------------------- > > > cc1 -O2 reload.i > > > > > > issues/cycles = 0.51 issues/dcache_miss = 26.93 issues/dtb_miss = 1214.36 > > gcc version 2.95.3 20010315 (release) > > cc1 -O2 reload.i > issues/cycles = 0.54 issues/dcache_miss = 26.31 issues/dtb_miss = 2488. > > cc1 reload.i > issues/cycles = 0.52 issues/dcache_miss = 26.30 issues/dtb_miss = 3306. > > Now that's interesting. No real change in L1 cache performance, but TLB > misses nearly cut in half vs. 3.3. > > Trying L3 misses (both with -O0): > > 3.3: issues/bcache_miss = 370 > 2.95.3: issues/bcache_miss = 437 > > Wall-clock time is nearly 2/1 for these tests, as are TLB misses, while > other stats are close. Hmm. > > > So if I understand these figures correctly, then > > > > dcache_miss/dtb_miss ~= 45 > > > > That is, one in 45 dcache fetches also requires a tlb walk. > > That's how I see it. OK, now consider it this way. Each cache line miss will cause N bytes to be fetched from memory -- I don't know the details, but lets assume that's 32 bytes, a typical value. Each tlb entry will address one page -- again I don't know the details but 4K is common on many machines. So, with gcc 2.95.3 we have -O2 dcache_miss/tlb_miss = 2488 / 26.31 ~= 95 -O0 dcache_miss/tlb_miss = 3306 / 26.30 ~= 127 Since each dcache miss represents 32 bytes of memory we have 3040 (95 * 32) and 4064 bytes fetched per tlb miss we have very nearly 75% and 100% of each page being accessed for each miss (it will be lower than this in practice, since some lines in a page will probably be fetched more than once and others not at all). However, for gcc 3 we have 1440 and 1920 bytes; that is, we *at best* access less than half the memory in each page we touch. > How expensive is a TLB miss, anyway? I hadn't expected it would occur > often enough in gcc to be significant. Note the IPC ratio stays constant, > but as I understand it, TLB is handled in software, so maybe those cycles > are counted by iprobe? A cache miss probably takes about twice as long if we also miss in the TLB, assuming tlb walking is done in hardware -- if you have a soft-loaded TLB, then it could take significantly longer. R. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-22 1:55 ` Richard Earnshaw @ 2002-08-22 2:03 ` David S. Miller 2002-08-23 15:39 ` Jeff Sturm 1 sibling, 0 replies; 173+ messages in thread From: David S. Miller @ 2002-08-22 2:03 UTC (permalink / raw) To: Richard.Earnshaw, rearnsha; +Cc: jsturm, dje, rth, gcc From: Richard Earnshaw <rearnsha@arm.com> Date: Thu, 22 Aug 2002 09:53:19 +0100 > How expensive is a TLB miss, anyway? I hadn't expected it would occur > often enough in gcc to be significant. Note the IPC ratio stays constant, > but as I understand it, TLB is handled in software, so maybe those cycles > are counted by iprobe? A cache miss probably takes about twice as long if we also miss in the TLB, assuming tlb walking is done in hardware -- if you have a soft-loaded TLB, then it could take significantly longer. A soft-loaded TLB miss on UltraSPARC can be serviced in ~38 processor cycles. At least this is how fast the Linux software TLB miss handler is. This includes all of the overhead associated with entering and leaving the trap. It also assumes that the TLB miss handler hits the L2 cache for the page table entry load (there is only one memory access necessary to service a TLB miss, bonus points to those who know how this is accomplished without looking at the sources :-). ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-22 1:55 ` Richard Earnshaw 2002-08-22 2:03 ` David S. Miller @ 2002-08-23 15:39 ` Jeff Sturm 1 sibling, 0 replies; 173+ messages in thread From: Jeff Sturm @ 2002-08-23 15:39 UTC (permalink / raw) To: Richard.Earnshaw; +Cc: David Edelsohn, Richard Henderson, David S. Miller, gcc On Thu, 22 Aug 2002, Richard Earnshaw wrote: > OK, now consider it this way. Each cache line miss will cause N bytes to > be fetched from memory -- I don't know the details, but lets assume that's > 32 bytes, a typical value. Each tlb entry will address one page -- again > I don't know the details but 4K is common on many machines. > > So, with gcc 2.95.3 we have > > -O2 dcache_miss/tlb_miss = 2488 / 26.31 ~= 95 > -O0 dcache_miss/tlb_miss = 3306 / 26.30 ~= 127 > > Since each dcache miss represents 32 bytes of memory we have 3040 (95 * > 32) and 4064 bytes fetched per tlb miss we have very nearly 75% and 100% > of each page being accessed for each miss (it will be lower than this in > practice, since some lines in a page will probably be fetched more than > once and others not at all). > > However, for gcc 3 we have 1440 and 1920 bytes; that is, we *at best* > access less than half the memory in each page we touch. Interesting analysis; thanks. It's actually worse than you say since Alpha has 8k pages. I looked up the ev56 specs to find out there are just 64 TLB entries, so for any working set larger than 512k some thrashing would be expected. For another experiment I installed one of the superpage patches available for Linux; this enables the granularity hint bits for Alpha to support pages up to 4MB. Then I modified ggc-page.c to allocate 4MB chunks by anonymous mmap. I then measured 70% fewer dtb misses for cc1, although wall clock time is reduced by only ~5%. So it would appear that TLB misses are indeed important but not the overwhelming concern in gcc's performance. Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
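For concreteness, the kind of change Jeff describes might look roughly like the sketch below (hypothetical names, not his actual patch; ggc-page.c's real allocation path differs): grab large chunks with anonymous mmap so a superpage-aware kernel can back each chunk with a single big TLB entry.

    #include <stddef.h>
    #include <sys/mman.h>

    #define GGC_CHUNK_SIZE (4 * 1024 * 1024)   /* one Alpha superpage */

    static void *
    alloc_ggc_chunk (void)
    {
      void *p = mmap (NULL, GGC_CHUNK_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      return p == MAP_FAILED ? NULL : p;
    }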
* Re: Faster compilation speed 2002-08-20 4:15 ` Richard Earnshaw 2002-08-20 5:38 ` Jeff Sturm @ 2002-08-20 8:00 ` David Edelsohn 1 sibling, 0 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-20 8:00 UTC (permalink / raw) To: Richard.Earnshaw; +Cc: Richard Henderson, David S. Miller, gcc >>>>> Richard Earnshaw writes: Richard> Do you have/can you get data for TLB misses? Yes. I didn't comment on TLB statistics because it did not vary much with optimization level or GCC versions. GCC 2.95 is a little better, but overlaps with GCC 3.3 TLB statistics. Both GCC 2.95 and GCC 3.3 statistics follow the source file size. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 21:46 ` Fergus Henderson 2002-08-13 22:40 ` David S. Miller @ 2002-08-14 7:36 ` Jeff Sturm 1 sibling, 0 replies; 173+ messages in thread From: Jeff Sturm @ 2002-08-14 7:36 UTC (permalink / raw) To: Fergus Henderson; +Cc: Loren James Rittle, davem, gcc On Wed, 14 Aug 2002, Fergus Henderson wrote: > On 13-Aug-2002, Loren James Rittle <rittle@latour.rsch.comm.mot.com> wrote: > > Has anyone ever tested gcc with its own GC disabled > > but boehm-gc enabled? OK, this is a red herring question. Even if > > performance was greater, portability concerns are what caused the > > decision to build a new custom scan-GC verses reusing boehm-gc... > > Yes, but GCC could use the Boehm GC on systems which supported it, > if the Boehm GC was faster... > > I think this would be a very interesting experiment. I tried it a year or so ago on the 3.0 sources. Had a ggc-boehm.c operating mostly conservatively. Using ggc's marking infrastructure may be possible, but seemed difficult to interface with boehm-gc. One of the difficult problems is that boehm-gc doesn't want to follow pointers through ordinary (malloc'ed) heap sections. So I overrode malloc/free to use the GC methods. I made ggc_collect() a no-op, since boehm-gc knows when it needs to collect, and overriding its heuristics doesn't really help matters anyway. Overall it seemed to shave a few minutes off the bootstrap time, but also increased memory usage considerably. I expected this. Tuning frequency of collection typically amounts to a size/speed tradeoff. I don't think conservativeness was an important factor in heap size. It could've been interesting to try incremental/generational collection. I didn't do that. My impression based partly on that experiment is that allocation & collection overhead in GCC is not all that substantial, and the real gains are going to be elsewhere, i.e. improving temporal locality as has been discussed lately. That isn't a problem that any GC is going to fix. (I also don't think it's a necessary evil of GC, rather it's how you use the allocator... e.g. creating too many short-lived objects is a bad thing.) Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
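A hypothetical sketch of the kind of shim Jeff describes (not his actual ggc-boehm.c; the header path and GCC entry-point signatures are simplified here): allocations go through the Boehm collector, and ggc_collect becomes a no-op so boehm-gc's own heuristics decide when to collect.

    #include <stddef.h>
    #include <gc.h>   /* boehm-gc; on some installs <gc/gc.h> */

    void *
    ggc_alloc (size_t size)
    {
      return GC_MALLOC (size);   /* conservatively scanned, collectable allocation */
    }

    void
    ggc_collect (void)
    {
      /* Deliberately empty: boehm-gc collects when its heuristics say so.  */
    }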
* Re: Faster compilation speed 2002-08-09 16:51 ` Stan Shebs ` (2 preceding siblings ...) 2002-08-09 18:25 ` David S. Miller @ 2002-08-10 10:02 ` Neil Booth 3 siblings, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-10 10:02 UTC (permalink / raw) To: Stan Shebs; +Cc: Aldy Hernandez, Mike Stump, gcc Stan Shebs wrote:- > One of my suspicions is that the massive use of macros in tree > and RTL is concealing excessive pointer chasing, because they > don't show up in either profile or coverage numbers. Yes. I look forward to the day when we use type-safe structures that contain only the relevant information, rather than a "tree" which is little more than the union of the universe, along with compensating macros to detect type violations. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:26 ` Stan Shebs 2002-08-09 16:31 ` Aldy Hernandez @ 2002-08-09 17:36 ` Daniel Berlin 2002-08-12 16:23 ` Mike Stump 2 siblings, 0 replies; 173+ messages in thread From: Daniel Berlin @ 2002-08-09 17:36 UTC (permalink / raw) To: Stan Shebs; +Cc: Aldy Hernandez, Mike Stump, gcc On Fri, 9 Aug 2002, Stan Shebs wrote: > Aldy Hernandez wrote: > > >>Let's take my combine elision patch. This patch makes the compiler > >>generate worse code. The way in which it is worse, is that more stack > >>space is used. How much more, well, my initial guess is that it is > >>less than 10% worse. Not too bad. Maybe users would care, maybe they > >> > > > >I assume you have already looked at the horrendity of the code > >presently generated by -O0. It's pretty unusable as it is. Who would > >really want to use gcc under the influence of "worse than -O0"? > >Really. > > > OK, then to really rub it in, CW runs much faster than GCC, even on > that slow Darwin OS :-), and that's with its non-optimizing case being > about halfway between GCC's -O0 and -O1, and works well with the > debugger still. > > Sacrificing -O0 optimization is just a desperation move, since > we don't seem to have many other ideas about how to make GCC as > fast as CW. Look, there are, in reality, two things that make our compiler slower than metrowerks, even at -O0 First is parsing. The bison parser is just not fast. It never will be. Period. The second is expansion from tree to RTL. It's not fast either. The timings don't always tell the real story. There are cases where expansion is occuring when the timevar isn't pushed (IE other things that call expand_*, where * = anything but _body, where the timevar is pushed). The solutions to the first is already in progress (give me a clean, working hand-written parser, that can compile libstdc++, and i'll happily make it go real fast. I was just starting to when the branch was abandoned.). Codewarrior, for comparison sake, uses a backtracking recursive descent parser for it's C++ compiler. The second is hard to solve in a way people would like. The fastest way to solve the problem is to do native code generation off the tree at -O0, avoiding any optimizations whatsoever. This is, of course, not easy to do with our current MD files. We really would need a *burg like tool and associated descriptions. You could do debugging output without too much difficulty. Most of the debug_* functions operate on trees anyway. PFE solves our first problem as well, but not the second one. We still have to *generate* the code. But there still have to be better answers than trying to avoid the backend entirely. If our backend is so godawfully bad that we have to start skipping entire "normal" phases (IE not optimizations to speed up code, or things that are done in plenty of other compilers at -O0), then we really *do* need to rearchitect them, and maybe more. Not just directed speed ups. At some point, it becomes easier to redo it from scratch well. Particularly when nobody today understands why anyone thought it was a good idea to do it the way it's done now. --Dan > > Stan > > > > ^ permalink raw reply [flat|nested] 173+ messages in thread
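As a toy illustration of the backtracking recursive-descent style Dan mentions (purely illustrative; nothing to do with the real C++ grammar): each production is a function, and on failure the token position is rewound so the next alternative can be tried.

    #include <stdio.h>

    static const char *input;
    static int pos;

    static int
    match (char c)
    {
      if (input[pos] == c) { pos++; return 1; }
      return 0;
    }

    /* expr: '(' expr ')' | 'x'  */
    static int
    parse_expr (void)
    {
      int save = pos;
      if (match ('(') && parse_expr () && match (')'))
        return 1;
      pos = save;               /* backtrack, try the next alternative */
      return match ('x');
    }

    int
    main (void)
    {
      input = "((x))";
      pos = 0;
      printf ("%s\n", parse_expr () && input[pos] == '\0' ? "accepted" : "rejected");
      return 0;
    }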
* Re: Faster compilation speed 2002-08-09 16:26 ` Stan Shebs 2002-08-09 16:31 ` Aldy Hernandez 2002-08-09 17:36 ` Daniel Berlin @ 2002-08-12 16:23 ` Mike Stump 2 siblings, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 16:23 UTC (permalink / raw) To: Stan Shebs; +Cc: Aldy Hernandez, gcc On Friday, August 9, 2002, at 04:25 PM, Stan Shebs wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:00 ` Aldy Hernandez 2002-08-09 16:26 ` Stan Shebs @ 2002-08-12 16:05 ` Mike Stump 1 sibling, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 16:05 UTC (permalink / raw) To: Aldy Hernandez; +Cc: gcc On Friday, August 9, 2002, at 04:05 PM, Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:28 ` Mike Stump 2002-08-09 16:00 ` Aldy Hernandez @ 2002-08-09 19:07 ` David Edelsohn 1 sibling, 0 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-09 19:07 UTC (permalink / raw) To: Mike Stump, Stan Shebs; +Cc: gcc In regard to the benefit of some optimization at -O0, please see http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00690.html ("The Death of Stupid"). Other comercial compilers are able to focus on compilation speed at -O0 with some small, appropriate optimization. They also efficiently produce extremely good code with full optimization enabled. They do not need an additional -fquick-compile flag. GCC does not have much low-hanging fruit left. IMHO, playing these speed-up games distracts interested developers from addressing the fundamental design problems which slow down GCC. The underlying problems have been mentioned in this discussion. If we begin to attack them now, we may have them ready for GCC 3.4. If we keep looking for easy solutions, GCC is going to remain at a disadvantage. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 12:17 Faster compilation speed Mike Stump 2002-08-09 13:04 ` Noel Yap 2002-08-09 13:10 ` Aldy Hernandez @ 2002-08-09 14:29 ` Neil Booth 2002-08-09 15:02 ` Nathan Sidwell 2002-08-12 12:11 ` Mike Stump 2002-08-09 14:51 ` Stan Shebs ` (3 subsequent siblings) 6 siblings, 2 replies; 173+ messages in thread From: Neil Booth @ 2002-08-09 14:29 UTC (permalink / raw) To: Mike Stump; +Cc: gcc Mike Stump wrote:- > I'd like to introduce lots of various changes to improve compiler > speed. Just my opinion, Mike, but I think a lot of current slowness is due to redo-ing too many things, and not taking advantage of ordering or whatever technique so that conclusions deduced from internal representations are made in a logical, efficient way. (e.g. I think we try to constant fold things that we've already tried to constant fold and failed, repeatedly, and we don't do the constant folding we do do in an optimal way. I could be wrong, though; I've not looked in detail). I cannot explain this clearly, or with any specific example, but IMO we work far too hard to do what we do. I'd like to see this cleaned up instead. For example, see some of Mark's recent patches. I think we could continue doing that for ages. I also believe that using Bison (and our ill-considered extensions like attributes pretty much anywhere) doesn't help efficiency. We could probably do better in the C front end with a tree representation that is closer to C than the current multi-language form of trees. What worries me about PCH and similar schemes is it's too easy to fix the symptoms, rather than the real reasons for the slowness. As a result, such things might never be fixed. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
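A toy illustration of the redundancy Neil suspects and the obvious fix (hypothetical names and a stubbed-out analysis; fold-const's real interface is different): remember on the node that a fold attempt has already been made, so later callers don't repeat work that already failed.

    struct expr
    {
      int is_constant;
      int fold_tried;           /* set after the first (possibly failed) attempt */
      /* ... code, operands, etc. ... */
    };

    /* Hypothetical stand-in for the expensive folding analysis.  */
    static int
    try_to_fold (struct expr *e)
    {
      (void) e;
      return 0;                 /* pretend folding failed */
    }

    static int
    maybe_fold (struct expr *e)
    {
      if (!e->fold_tried)
        {
          e->fold_tried = 1;
          e->is_constant = try_to_fold (e);
        }
      return e->is_constant;    /* cheap on every later call */
    }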
* Re: Faster compilation speed 2002-08-09 14:29 ` Neil Booth @ 2002-08-09 15:02 ` Nathan Sidwell 2002-08-09 17:05 ` Stan Shebs 2002-08-10 2:21 ` Gabriel Dos Reis 1 sibling, 2 replies; 173+ messages in thread From: Nathan Sidwell @ 2002-08-09 15:02 UTC (permalink / raw) To: Neil Booth; +Cc: Mike Stump, gcc Neil Booth wrote: > Just my opinion, Mike, but I think a lot of current slowness is due to > redo-ing too many things, and not taking advantage of ordering or whatever > technique so that conclusions deduced from internal representations are > made in a logical, efficent way. (e.g. I think we try to constant fold > things that we've already tried to constant fold and failed, repeatedly, > and we don't do the constant folding we do do in an optimal way. I could > be wrong, though; I've not looked in detail). I cannot explain this Yup, redoing things seems to happen a lot in the c++ front end. The type conversion machinery seems to work a lot like

  if (complicated fn to try conversion 1)
    complicated fn to do conversion 1
  else if (complicated fn to try conversion 2)
    complicated fn to do conversion 2
  ...

unifying static_cast, (cast), const_cast, implicit_conversion, overload arg resolution might be a win. I think you might be right about fold-const. That's recursive itself, so we should only need to call that when we really need to flatten a const, rather than after every new operation. As you'll have noticed I'm tweaking the coverage machinery to try and find hotspots and deadspots. My immediate plan for this is to

  a) fix .da files so they don't grow indefinitely large - nearly done
  b) add some kind of __builtin_unexpected (), to mark expected dead code
  c) write some perl scripts to munge the gcov output

I hope some of that is useful to others. nathan -- Dr Nathan Sidwell :: http://www.codesourcery.com :: CodeSourcery LLC 'But that's a lie.' - 'Yes it is. What's your point?' nathan@codesourcery.com : http://www.cs.bris.ac.uk/~nathan/ : nathan@acm.org ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:02 ` Nathan Sidwell @ 2002-08-09 17:05 ` Stan Shebs 2002-08-10 2:21 ` Gabriel Dos Reis 1 sibling, 0 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 17:05 UTC (permalink / raw) To: Nathan Sidwell; +Cc: Neil Booth, Mike Stump, gcc Nathan Sidwell wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:02 ` Nathan Sidwell 2002-08-09 17:05 ` Stan Shebs @ 2002-08-10 2:21 ` Gabriel Dos Reis 1 sibling, 0 replies; 173+ messages in thread From: Gabriel Dos Reis @ 2002-08-10 2:21 UTC (permalink / raw) To: Nathan Sidwell; +Cc: Neil Booth, Mike Stump, gcc Nathan Sidwell <nathan@codesourcery.com> writes: | unifying static_cast, (cast), const_cast, implicit_conversion, overload | arg resolution might be a win. We might get correctness at the same time. [...] | I hope some of that is useful to others. Definitely. -- Gaby ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 14:29 ` Neil Booth 2002-08-09 15:02 ` Nathan Sidwell @ 2002-08-12 12:11 ` Mike Stump 2002-08-12 12:41 ` David Edelsohn 2002-08-12 19:17 ` Mike Stump 1 sibling, 2 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 12:11 UTC (permalink / raw) To: Neil Booth; +Cc: gcc On Friday, August 9, 2002, at 02:27 PM, Neil Booth wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 12:11 ` Mike Stump @ 2002-08-12 12:41 ` David Edelsohn 2002-08-12 12:47 ` Matt Austern 2002-08-12 19:17 ` Mike Stump 1 sibling, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-12 12:41 UTC (permalink / raw) To: Mike Stump; +Cc: gcc >>>>> Mike Stump writes: Mike> Instead? Well, I cannot promise instead, but I think it is reasonable Mike> to look at it in addition to all the other stuff. If Apple wants to tackle one or more of the fundamental GCC design problems affecting compiler performance which have been mentioned during this discussion, I think that Apple will have a lot of support and help from GCC developers. This means doing the analysis of the problem, experimenting with possible approaches, designing a solution, and implementing that solution with the entire GCC development community. Fiddling around the edges, disabling functionality to save compilation time is not likely to be effective for Apple or for the GCC community. The big gains are to be found in revising the design and implementation of GCC's underlying infrastructure, not lots of little tweaks. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 12:41 ` David Edelsohn @ 2002-08-12 12:47 ` Matt Austern 2002-08-12 12:56 ` David S. Miller 0 siblings, 1 reply; 173+ messages in thread From: Matt Austern @ 2002-08-12 12:47 UTC (permalink / raw) To: David Edelsohn; +Cc: Mike Stump, gcc On Monday, August 12, 2002, at 12:40 PM, David Edelsohn wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 12:47 ` Matt Austern @ 2002-08-12 12:56 ` David S. Miller 2002-08-12 13:56 ` Matt Austern 2002-08-12 14:28 ` Stan Shebs 0 siblings, 2 replies; 173+ messages in thread From: David S. Miller @ 2002-08-12 12:56 UTC (permalink / raw) To: austern; +Cc: dje, mrs, gcc From: Matt Austern <austern@apple.com> Date: Mon, 12 Aug 2002 12:47:30 -0700 And yes, we're aware that many gains are possible only if we rewrite the parser or redesign the tree structure. The only reason we haven't started on rewriting the parser is that someone else is already doing it. So work on an attempt at RTL refcounting, the patch below is a place to start. Next you have to: 1) walk through the whole compiler and add all the proper {GET,PUT}_RTX calls. 2) find a solution for circular RTL I would suggest as a first pass (ie. to get some performance numbers), special case things like INSN_LISTs and just don't refcount for the references to INSNs they generate. Likewise for INSN dependency lists generated by the scheduler et al. 3) bring it at least to the point where you can successfully get a successful build of some non-trivial source file. Perhaps gcc/reload.i. Even if it requires some gross hacks to get it to pass through, post GC vs. refcounting performance numbers. 4) Almost certainly, in trying to refcount things correctly, you will spot real bugs in the compiler. Please keep track of these so they can be fixed independant of whether the rtx refcounting is ever used or not. 5) If you are still bored at this point, add the machinery to use the RTX walking of the current garbage collector to verify the reference counts. This will basically be required in order to make and sufficiently correctness check a final implementation. It would be enabled by default, so that if any refence counts go wrong they will be spotted with impunity. This is part of the sociological aspect of these changes, namely getting people to think about proper resource tracking when working with RTL objects. If the compiler explodes when they get it wrong, they will learn eventually :-) Because if someone else doesn't do this, I will end up doing so :-) --- ./rtl.h.~1~ Sun Aug 11 19:04:35 2002 +++ ./rtl.h Sun Aug 11 20:42:02 2002 @@ -130,6 +130,9 @@ struct rtx_def /* The kind of value the expression has. */ ENUM_BITFIELD(machine_mode) mode : 8; + /* Reference count. */ + unsigned int __count : 24; + /* 1 in a MEM if we should keep the alias set for this mem unchanged when we access a component. 1 in a CALL_INSN if it is a sibling call. @@ -184,7 +187,7 @@ struct rtx_def 1 in a REG means this reg refers to the return value of the current function. 1 in a SYMBOL_REF if the symbol is weak. */ - unsigned integrated : 1; + unsigned int integrated : 1; /* 1 in an INSN or a SET if this rtx is related to the call frame, either changing how we compute the frame address or saving and restoring registers in the prologue and epilogue. @@ -193,7 +196,7 @@ struct rtx_def 1 in a REG if the register is a pointer. 1 in a SYMBOL_REF if it addresses something in the per-function constant string pool. */ - unsigned frame_related : 1; + unsigned int frame_related : 1; /* The first element of the operands of this rtx. The number of operands and their types are controlled @@ -211,12 +214,25 @@ struct rtx_def #define GET_MODE(RTX) ((enum machine_mode) (RTX)->mode) #define PUT_MODE(RTX, MODE) ((RTX)->mode = (ENUM_BITFIELD(machine_mode)) (MODE)) +/* Define macros to get/put references to RTL objects. 
*/ + +#define GET_RTX(RTX) (((RTX)->__count)++) +#define PUT_RTX(RTX) \ +do \ + { \ + if (--((RTX)->__count) == 0) \ + __put_rtx(RTX); \ + } \ +while (0) + + /* RTL vector. These appear inside RTX's when there is a need for a variable number of things. The principle use is inside PARALLEL expressions. */ struct rtvec_def GTY(()) { int num_elem; /* number of elements */ + int __count; /* reference count */ rtx GTY ((length ("%h.num_elem"))) elem[1]; }; @@ -225,6 +241,15 @@ struct rtvec_def GTY(()) { #define GET_NUM_ELEM(RTVEC) ((RTVEC)->num_elem) #define PUT_NUM_ELEM(RTVEC, NUM) ((RTVEC)->num_elem = (NUM)) +#define GET_RTVEC(RTVEC) (((RTVEC)->__count)++) +#define PUT_RTVEC(RTVEC) \ +do \ + { \ + if (--((RTVEC)->__count) == 0) \ + __put_rtvec(RTVEC); \ + } \ +while (0) + /* Predicate yielding nonzero iff X is an rtl for a register. */ #define REG_P(X) (GET_CODE (X) == REG) @@ -1347,6 +1372,8 @@ extern rtx emit_copy_of_insn_after PARAM extern rtx rtx_alloc PARAMS ((RTX_CODE)); extern rtvec rtvec_alloc PARAMS ((int)); extern rtx copy_rtx PARAMS ((rtx)); +extern void __put_rtx PARAMS ((rtx)); +extern void __put_rtvec PARAMS ((rtvec)); /* In emit-rtl.c */ extern rtx copy_rtx_if_shared PARAMS ((rtx)); --- ./gengenrtl.c.~1~ Sun Aug 11 19:04:33 2002 +++ ./gengenrtl.c Sun Aug 11 20:45:18 2002 @@ -278,11 +278,15 @@ gendef (format) the memory and initializes it. */ puts ("{"); puts (" rtx rt;"); - printf (" rt = ggc_alloc_rtx (%d);\n", (int) strlen (format)); + puts (" int n;"); + printf (" n = (sizeof (struct rtx_def) + ((%d - 1) * sizeof(rtunion)));\n", + (int) strlen (format)); + puts (" rt = xmalloc (n);\n"); puts (" memset (rt, 0, sizeof (struct rtx_def) - sizeof (rtunion));\n"); puts (" PUT_CODE (rt, code);"); puts (" PUT_MODE (rt, mode);"); + puts (" rt->__count = 1;"); for (p = format, i = j = 0; *p ; ++p, ++i) if (*p != '0') --- ./rtl.c.~1~ Tue Jun 4 14:06:54 2002 +++ ./rtl.c Sun Aug 11 20:53:11 2002 @@ -242,14 +242,34 @@ rtvec_alloc (n) { rtvec rt; - rt = ggc_alloc_rtvec (n); + n = (sizeof(struct rtvec_def) + + ((n - 1) * sizeof (rtx))); + rt = xmalloc (n); + + PUT_NUM_ELEM (rt, n); + rt->__count = 1; + /* clear out the vector */ memset (&rt->elem[0], 0, n * sizeof (rtx)); - PUT_NUM_ELEM (rt, n); return rt; } +void +__put_rtvec (rv) + rtvec rv; +{ + int i, len = GET_NUM_ELEM (rv); + + for (i = 0; i < len; i++) + { + if (! rv->elem[i]) + abort (); + PUT_RTX (rv->elem[i]); + } + xfree (rv); +} + /* Allocate an rtx of code CODE. The CODE is stored in the rtx; all the rest is initialized to zero. */ @@ -258,9 +278,11 @@ rtx_alloc (code) RTX_CODE code; { rtx rt; - int n = GET_RTX_LENGTH (code); + int n; - rt = ggc_alloc_rtx (n); + n = (sizeof (struct rtx_def) + + ((GET_RTX_LENGTH (code) - 1) * sizeof(rtunion))); + rt = xmalloc (n); /* We want to clear everything up to the FLD array. Normally, this is one int, but we don't want to assume that and it isn't very @@ -268,7 +290,58 @@ rtx_alloc (code) memset (rt, 0, sizeof (struct rtx_def) - sizeof (rtunion)); PUT_CODE (rt, code); + rt->__count = 1; return rt; +} + +void +__put_rtx(rt) + rtx rt; +{ + char *fmt; + int i, j, len; + + fmt = GET_RTX_FORMAT (GET_CODE (rt)); + len = GET_RTX_LENGTH (GET_CODE (rt)); + for (i = 0; i < len; i++) { + switch (fmt[i]) { + case 'e': + if (! XEXP (rt, i)) + abort (); + PUT_RTX (XEXP (rt, i)); + break; + + case 'E': + case 'V': + /* XXX How to handle vectors... XXX */ + if (XVEC (rt, i) != NULL) + { + for (j = 0; j < XVECLEN (rt, i); j++) + { + if (! 
XVECEXP (rt, i, j)) + abort (); + PUT_RTX (XVECEXP (rt, i, j)); + } + } + break; + + case 't': + case 'w': + case 'i': + case 's': + case 'S': + case 'T': + case 'u': + case 'B': + case '0': + break; + + default: + abort (); + }; + } + + xfree(rt); } \f ^ permalink raw reply [flat|nested] 173+ messages in thread
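To make the intended discipline concrete, here is a sketch of how client code would use the macros from the patch above (a hypothetical example, assuming the GET_RTX/PUT_RTX definitions shown there): take a reference whenever an rtx pointer is stored somewhere long-lived, and drop the old reference when that pointer is replaced.

    /* Hypothetical pass-local cache of an insn.  */
    static rtx cached_insn;

    static void
    cache_insn (rtx insn)
    {
      GET_RTX (insn);              /* new long-lived reference to INSN */
      if (cached_insn)
        PUT_RTX (cached_insn);     /* drop the old reference; freed when the count hits 0 */
      cached_insn = insn;
    }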
* Re: Faster compilation speed 2002-08-12 12:56 ` David S. Miller @ 2002-08-12 13:56 ` Matt Austern 2002-08-12 14:27 ` Daniel Berlin ` (2 more replies) 2002-08-12 14:28 ` Stan Shebs 1 sibling, 3 replies; 173+ messages in thread From: Matt Austern @ 2002-08-12 13:56 UTC (permalink / raw) To: David S. Miller; +Cc: dje, mrs, gcc On Monday, August 12, 2002, at 12:43 PM, David S. Miller wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 13:56 ` Matt Austern @ 2002-08-12 14:27 ` Daniel Berlin 2002-08-12 15:26 ` David Edelsohn 2002-08-12 14:59 ` David S. Miller 2002-08-12 16:00 ` Geoff Keating 2 siblings, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-12 14:27 UTC (permalink / raw) To: Matt Austern; +Cc: David S. Miller, dje, mrs, gcc On Mon, 12 Aug 2002, Matt Austern wrote: > On Monday, August 12, 2002, at 12:43 PM, David S. Miller wrote: > > > From: Matt Austern <austern@apple.com> > > Date: Mon, 12 Aug 2002 12:47:30 -0700 > > > > And yes, we're aware that many gains are possible only > > if we rewrite the parser or redesign the tree structure. The > > only reason we haven't started on rewriting the parser is > > that someone else is already doing it. > > > > So work on an attempt at RTL refcounting, the patch below is a place > > to start. > > Thanks for the pointer, that's a useful starting point. > > But, at the risk of sounding like a broken record... Do > we have benchmarks showing that RTL gc is one of > the major causes of slow compile speed? > > At the moment, we're spending a lot of time doing > benchmarking and trying to figure out just where the > time is going. I realize this has its limitations, that > poorly designed data structures may end up resulting > in tiny bits of overhead everywhere even if they never > show up in a profile. But at least we can try to > understand what kinds of programs are especially > bad. (One interesting fact, for example: one file that > we care a lot about takes twice as long to compile with > the C++ front end than with the C front end.) Well, the tools for this stuff are much better on osx than on Linux, so you guys are probably ahead of others in figuring out whether GC is really bad for us. You can easily get numbers like data cache miss cycles, etc, and graph them nicely with MONster. -Dan ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 14:27 ` Daniel Berlin @ 2002-08-12 15:26 ` David Edelsohn 2002-08-13 10:49 ` David Edelsohn 0 siblings, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-12 15:26 UTC (permalink / raw) To: Daniel Berlin, Matt Austern, David S. Miller; +Cc: gcc I have IBM's hpmcount tool installed on a Power4 AIX 5.1 system which can use PMAPI to access the hardware performance counters on the chip. I would be happy to provide additional data for comparison with the x86 cache statistics which have been mentioned. So that we're all on the same page, what sourcefile is being compiled with which GCC options? I can acquire information like for cc1 -O2 hello.c: PM_DTLB_MISS (Data TLB misses) : 5538 PM_ITLB_MISS (Instruction TLB misses) : 819 PM_LD_MISS_L1 (L1 D cache load misses) : 43074 PM_ST_MISS_L1 (L1 D cache store misses) : 349240 PM_ST_REF_L1 (L1 D cache store references) : 1958037 PM_LD_REF_L1 (L1 D cache load references) : 3113549 Utilization rate : 29.438 % % TLB misses per cycle : 0.038 % Avg number of loads per TLB miss : 562.215 Load and store operations : 5.072 M Instructions per load/store : 2.899 Avg number of loads per load miss : 72.284 Avg number of stores per store miss : 5.607 Avg number of load/stores per D1 miss : 12.927 L1 cache hit rate : 92.264 % PM_DATA_FROM_L3 (Data loaded from L3) : 1420 PM_DATA_FROM_MEM (Data loaded from memory) : 144 PM_DATA_FROM_L35 (Data loaded from L3.5) : 19 PM_DATA_FROM_L2 (Data loaded from L2) : 36410 PM_DATA_FROM_L25_SHR (Data loaded from L2.5 shared) : 0 PM_DATA_FROM_L275_SHR (Data loaded from L2.75 shared) : 0 PM_DATA_FROM_L275_MOD (Data loaded from L2.75 modified) : 0 PM_DATA_FROM_L25_MOD (Data loaded from L2.5 modified) : 0 Memory traffic : 0.074 MBytes Memory bandwidth : 1.589 MBytes/sec Total loads from L3 : 0.001 M L3 traffic : 0.184 MBytes L3 bandwidth : 3.970 MBytes/sec L3 Load miss rate : 9.097 % Total loads from L2 : 0.036 M L2 traffic : 4.660 MBytes L2 bandwidth : 100.446 MBytes/sec L2 Load miss rate : 4.167 % David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 15:26 ` David Edelsohn @ 2002-08-13 10:49 ` David Edelsohn 2002-08-13 10:52 ` David S. Miller ` (2 more replies) 0 siblings, 3 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-13 10:49 UTC (permalink / raw) To: Daniel Berlin, Matt Austern, David S. Miller; +Cc: gcc

Source file     Insns / L1 D$ Miss
-----------     ------------------
reload.c                22
reload1.c               25
insn-recog.c            29

GCC 3.3 20020812 (experimental) powerpc-ibm-aix5.1.0.0 Power4 processor As one of my colleagues commented, this is the cache behavior one would see with database transaction processing. In other words, this is *really bad*. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 10:49 ` David Edelsohn @ 2002-08-13 10:52 ` David S. Miller 2002-08-13 14:03 ` David Edelsohn 2002-08-13 15:32 ` Daniel Berlin 2 siblings, 0 replies; 173+ messages in thread From: David S. Miller @ 2002-08-13 10:52 UTC (permalink / raw) To: dje; +Cc: dan, austern, gcc From: David Edelsohn <dje@watson.ibm.com> Date: Tue, 13 Aug 2002 13:49:18 -0400 As one of my colleagues commented, this is the cache behavior one would see with database transaction processing. In other words, this is *really bad*. Thanks for doing these tests David. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 10:49 ` David Edelsohn 2002-08-13 10:52 ` David S. Miller @ 2002-08-13 14:03 ` David Edelsohn 2002-08-13 14:46 ` Geoff Keating ` (2 more replies) 2002-08-13 15:32 ` Daniel Berlin 2 siblings, 3 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-13 14:03 UTC (permalink / raw) To: David S. Miller; +Cc: dan, austern, gcc Here's an interesting (aka depressing) data point. My previous cache miss statistics were for GCC -O2. At -O0, GCC's cache miss statistics stay the same or get up to 20% *worse*. In comparison, the cache statistics for IBM's compiler without optimization enabled *improve* up to 50 for the same reload.c and insn-recog.c input files compared to optimized. GCC has some sort of overhead, maybe the tree->RTL conversion as Dan mentioned, which really hurts re-use at -O0. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 14:03 ` David Edelsohn @ 2002-08-13 14:46 ` Geoff Keating 2002-08-13 15:10 ` David Edelsohn 2002-08-14 9:25 ` Kevin Handy 2002-08-18 12:58 ` Jeff Sturm 2 siblings, 1 reply; 173+ messages in thread From: Geoff Keating @ 2002-08-13 14:46 UTC (permalink / raw) To: David Edelsohn; +Cc: gcc David Edelsohn <dje@watson.ibm.com> writes: > Here's an interesting (aka depressing) data point. My previous > cache miss statistics were for GCC -O2. At -O0, GCC's cache miss > statistics stay the same or get up to 20% *worse*. In comparison, the > cache statistics for IBM's compiler without optimization enabled *improve* > up to 50 for the same reload.c and insn-recog.c input files compared to > optimized. > > GCC has some sort of overhead, maybe the tree->RTL conversion as > Dan mentioned, which really hurts re-use at -O0. Could you try with -fsyntax-only? -- - Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com> ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 14:46 ` Geoff Keating @ 2002-08-13 15:10 ` David Edelsohn 2002-08-13 15:26 ` Neil Booth 0 siblings, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-13 15:10 UTC (permalink / raw) To: Geoff Keating; +Cc: gcc >>>>> Geoff Keating writes: Geoff> Could you try with -fsyntax-only?

Source        I/D$ miss -O2   I/D$ miss -O0   I/D$ miss -fsyntax-only
------------  -------------   -------------   -----------------------
reload.c           22              22                   23
reload1.c          25              22                   23
insn-recog.c       29              23                   26

David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 15:10 ` David Edelsohn @ 2002-08-13 15:26 ` Neil Booth 0 siblings, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-13 15:26 UTC (permalink / raw) To: David Edelsohn; +Cc: Geoff Keating, gcc David Edelsohn wrote:- > >>>>> Geoff Keating writes: > > Geoff> Could you try with -fsyntax-only? > > Source I/D$ miss -O2 I/D$ miss -O0 I/D$ miss -fsyntax-only > ------------ ------------- ------------- ----------------------- > reload.c 22 22 23 > reload1.c 25 22 23 > insn-recog.c 29 23 26 And -E 8-) I'd actually be quite curious if you have time. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 14:03 ` David Edelsohn 2002-08-13 14:46 ` Geoff Keating @ 2002-08-14 9:25 ` Kevin Handy 2002-08-18 12:58 ` Jeff Sturm 2 siblings, 0 replies; 173+ messages in thread From: Kevin Handy @ 2002-08-14 9:25 UTC (permalink / raw) To: gcc David Edelsohn wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 14:03 ` David Edelsohn 2002-08-13 14:46 ` Geoff Keating 2002-08-14 9:25 ` Kevin Handy @ 2002-08-18 12:58 ` Jeff Sturm 2002-08-19 12:55 ` Mike Stump 2002-08-20 11:22 ` Will Cohen 2 siblings, 2 replies; 173+ messages in thread From: Jeff Sturm @ 2002-08-18 12:58 UTC (permalink / raw) To: David Edelsohn; +Cc: David S. Miller, dan, austern, gcc On Tue, 13 Aug 2002, David Edelsohn wrote: > Here's an interesting (aka depressing) data point. My previous > cache miss statistics were for GCC -O2. At -O0, GCC's cache miss > statistics stay the same or get up to 20% *worse*. In comparison, the > cache statistics for IBM's compiler without optimization enabled *improve* > up to 50 for the same reload.c and insn-recog.c input files compared to > optimized. Here's a data point on alpha-linux:

cc1 -quiet -O2 reload.i
issues/cycles = 0.51  issues/dcache_miss = 26.93

Without optimization:

cc1 -quiet reload.i
issues/cycles = 0.52  issues/dcache_miss = 31.29

This is on a ev56 with a direct-mapped cache. To get some idea where the misses are taking place, I experimented with iprobe's sampling mode. Omitting results below the 1% sample threshold, I get:

function                    | issues | access | misses | i/m | a/m
----------------------------+--------+--------+--------+-----+-----
yyparse                     |   2924 |    848 |    148 |  20 |  5.7
gt_ggc_mx_lang_tree_node    |   1336 |    612 |     74 |  18 |  8.2
verify_flow_info            |   1388 |    408 |    129 |  11 |  3.1
copy_rtx_if_shared          |   2120 |   1012 |     53 |  40 | 19.0
propagate_one_insn          |   3636 |    504 |     52 |  70 |  9.6
find_temp_slot_from_address |    728 |    232 |    126 |   6 |  1.8
ggc_mark_rtx_children_1     |   1580 |    316 |     40 |  40 |  7.9
extract_insn                |   1576 |    476 |     52 |  30 |  9.1
record_reg_classes          |   3848 |    944 |     65 |  59 | 14.5
reg_scan_mark_refs          |   1472 |    632 |     66 |  22 |  9.5
find_reloads                |   7680 |   3104 |    148 |  52 | 20.9
subst_reloads               |   4772 |   2736 |    169 |  28 | 16.1
side_effects_p              |   1344 |    564 |     43 |  31 | 13.1
for_each_rtx                |   4924 |   1464 |     75 |  66 | 19.5
ggc_alloc                   |   2424 |    728 |    111 |  22 |  6.5
ggc_set_mark                |   3392 |    976 |    107 |  32 |  9.1

(Each sample reported is 2^14 events.) yyparse performs badly (as would any table-driven parser), but how about verify_flow_info and find_temp_slot_from_address? Both are reporting awful cache behavior. Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-18 12:58 ` Jeff Sturm @ 2002-08-19 12:55 ` Mike Stump 2002-08-20 11:22 ` Will Cohen 1 sibling, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-19 12:55 UTC (permalink / raw) To: Jeff Sturm; +Cc: David Edelsohn, David S. Miller, dan, austern, gcc On Sunday, August 18, 2002, at 12:57 PM, Jeff Sturm wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-18 12:58 ` Jeff Sturm 2002-08-19 12:55 ` Mike Stump @ 2002-08-20 11:22 ` Will Cohen 1 sibling, 0 replies; 173+ messages in thread From: Will Cohen @ 2002-08-20 11:22 UTC (permalink / raw) To: Jeff Sturm; +Cc: David Edelsohn, David S. Miller, dan, austern, gcc How about reordering the rows and columns in the table used by yyparse to improve locality? Have a instrumented version of the yyparse to record the number of times each transition is taken and use the data to interchange rows and columns to attempt to get frequent transitions in the same cache line (or at least not conflicting memory locations). It would be a kind of feedback-directed optimization (-fprofile-arcs/-fbranch-probabilities) for bison. -Will Jeff Sturm wrote: On Tue, 13 Aug 2002, David Edelsohn wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
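A rough sketch of the instrumentation Will suggests (hypothetical table layout and names, not bison's actual generated tables): count how often each (state, token) transition fires, then use the counts offline to permute rows and columns so that frequently taken transitions share cache lines.

    #define N_STATES 1024
    #define N_TOKENS  256

    static const short yytable[N_STATES][N_TOKENS];      /* stand-in for the parser tables */
    static unsigned long yycount[N_STATES][N_TOKENS];    /* profile data, dumped at exit   */

    static int
    next_state (int state, int token)
    {
      yycount[state][token]++;        /* feedback data for reordering the tables */
      return yytable[state][token];
    }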
* Re: Faster compilation speed 2002-08-13 10:49 ` David Edelsohn 2002-08-13 10:52 ` David S. Miller 2002-08-13 14:03 ` David Edelsohn @ 2002-08-13 15:32 ` Daniel Berlin 2002-08-13 15:58 ` David Edelsohn 2 siblings, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-13 15:32 UTC (permalink / raw) To: David Edelsohn; +Cc: Daniel Berlin, Matt Austern, David S. Miller, gcc On Tue, 13 Aug 2002, David Edelsohn wrote: > Source file Insns / L1 D$ Miss > ----------- ------------------ > reload.c 22 > reload1.c 25 > insn-recog.c 29 > > GCC 3.3 20020812 (experimental) > powerpc-ibm-aix5.1.0.0 > Power4 processor > > As one of my colleagues commented, this is the cache behavior one > would see with database transaction processing. In other words, this is > *really bad*. Yup. > > David > > ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 15:32 ` Daniel Berlin @ 2002-08-13 15:58 ` David Edelsohn 2002-08-13 16:49 ` David S. Miller 0 siblings, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-13 15:58 UTC (permalink / raw) To: dberlin; +Cc: Daniel Berlin, Matt Austern, David S. Miller, gcc >>>>> Daniel Berlin writes: >> As one of my colleagues commented, this is the cache behavior one >> would see with database transaction processing. In other words, this is >> *really bad*. Daniel> Yup. The problem isn't that the number is low at optimization. 29 I/M is not horrible. Low 20's is bad. Scientific code will have a value in the low hundreds, but compilation is not that regular a computation. The problem is that the number stays the same or gets worse without optimization. Most commercial compilers will be in the same ballpark when optimizing, but use a lot fewer instructions and a lot fewer cache misses to produce minimally optimized, debuggable code. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 15:58 ` David Edelsohn @ 2002-08-13 16:49 ` David S. Miller 0 siblings, 0 replies; 173+ messages in thread
From: David S. Miller @ 2002-08-13 16:49 UTC (permalink / raw)
To: dje; +Cc: dberlin, dan, austern, gcc

   From: David Edelsohn <dje@watson.ibm.com>
   Date: Tue, 13 Aug 2002 18:58:25 -0400

   The problem isn't that the number is low at optimization.

Can you control when the performance counters start/stop monitoring?

If so, then you can figure out more precisely whether it is mostly during:

1) Front end tree or tree->rtl conversion
2) rest_of_compilation() onward
3) Both #1 and #2 about evenly, because all of our core data structures
   come out of GC, so the whole compiler has bad spatial and temporal
   locality

My money is on #3 :-)

^ permalink raw reply [flat|nested] 173+ messages in thread
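For what it's worth, here is a sketch of the kind of per-phase bracketing being asked about, written against today's Linux perf_event interface as a stand-in (nothing like this was available on the AIX and Alpha setups discussed above, and error checking is omitted); the idea is simply to enable counters around one phase and read them afterwards:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int
open_counter (unsigned int type, unsigned long long config)
{
  struct perf_event_attr attr;
  memset (&attr, 0, sizeof attr);
  attr.size = sizeof attr;
  attr.type = type;
  attr.config = config;
  attr.disabled = 1;
  attr.exclude_kernel = 1;
  return syscall (SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

void
measure_phase (void (*phase) (void))
{
  int insns = open_counter (PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS);
  int misses = open_counter (PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES);
  long long n_insns = 0, n_misses = 0;

  ioctl (insns, PERF_EVENT_IOC_ENABLE, 0);
  ioctl (misses, PERF_EVENT_IOC_ENABLE, 0);

  phase ();                     /* e.g. parsing only, or the RTL passes only */

  ioctl (insns, PERF_EVENT_IOC_DISABLE, 0);
  ioctl (misses, PERF_EVENT_IOC_DISABLE, 0);
  read (insns, &n_insns, sizeof n_insns);
  read (misses, &n_misses, sizeof n_misses);
  /* n_insns / n_misses is the instructions-per-miss figure quoted earlier. */
  close (insns);
  close (misses);
}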
* Re: Faster compilation speed 2002-08-12 13:56 ` Matt Austern 2002-08-12 14:27 ` Daniel Berlin @ 2002-08-12 14:59 ` David S. Miller 2002-08-12 16:00 ` Geoff Keating 2 siblings, 0 replies; 173+ messages in thread
From: David S. Miller @ 2002-08-12 14:59 UTC (permalink / raw)
To: austern; +Cc: dje, mrs, gcc

   From: Matt Austern <austern@apple.com>
   Date: Mon, 12 Aug 2002 13:56:32 -0700

   But, at the risk of sounding like a broken record...  Do
   we have benchmarks showing that RTL gc is one of
   the major causes of slow compile speed?

It's not the GC, it's the resulting data access patterns, and such
overhead won't show up in normal profiling since it is simply spread
all over the compiler.

That's the purpose of cobbling together a "hack" implementation of
refcounting, to get some performance comparisons.  You don't have to do
a "final" perfect implementation to realize a tree usable enough for
simple initial benchmarking.

Based upon those results, we can decide to continue or not.

But hey, if people are going to be silly enough to require
pre-benchmarking before even laying a finger on the refcounting bits,
no problem, we'll just have to wait for me to work on it then ;-)

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 13:56 ` Matt Austern 2002-08-12 14:27 ` Daniel Berlin 2002-08-12 14:59 ` David S. Miller @ 2002-08-12 16:00 ` Geoff Keating 2002-08-13 2:58 ` Nick Ing-Simmons 2002-08-13 10:47 ` Richard Henderson 2 siblings, 2 replies; 173+ messages in thread
From: Geoff Keating @ 2002-08-12 16:00 UTC (permalink / raw)
To: Matt Austern; +Cc: gcc

Matt Austern <austern@apple.com> writes:

> On Monday, August 12, 2002, at 12:43 PM, David S. Miller wrote:
>
> >    From: Matt Austern <austern@apple.com>
> >    Date: Mon, 12 Aug 2002 12:47:30 -0700
> >
> >    And yes, we're aware that many gains are possible only
> >    if we rewrite the parser or redesign the tree structure.  The
> >    only reason we haven't started on rewriting the parser is
> >    that someone else is already doing it.
> >
> > So work on an attempt at RTL refcounting, the patch below is a place
> > to start.
>
> Thanks for the pointer, that's a useful starting point.
>
> But, at the risk of sounding like a broken record...  Do
> we have benchmarks showing that RTL gc is one of
> the major causes of slow compile speed?

We happen to know that GC as a whole is 10-13% of total compile time,
even at -O0, and my expectation is that the RTL part of that is
perhaps two-thirds, say 7%.  So the benefit you can get is 7% less any
overhead in tracking the reference counts and freeing
briefly-allocated RTL.

My suggestion is to try shrinking RTL in other ways.  For instance,
once RTL is generated it should all match an insn or a splitter.  If
we could store RTL as the insn number (or a splitter number) plus the
operands, rather than the expanded form we have now, that should be
much easier to traverse.  For those operations that look at the form
of RTL, code could be generated to perform that operation knowing what
insns exist; for instance, on x86 the form of the 'add' instruction is:

(insn 15 13 17 (parallel[
            (set (reg:SI 61)
                (plus:SI (reg/v:SI 59)
                    (reg/v:SI 60)))
            (clobber (reg:CC 17 flags))
        ] ) -1 (nil)
    (nil))

we could represent this as

(packed_insn 15 13 17 207 {*addsi_1} [(reg:SI 61) (reg:SI 59) (reg:SI 60)])

which would save us, by my count, 50% of the RTL objects for this
case.  I'd expect that would then speed GC (on this object) by 50%,
speed up allocation by 50%, and hopefully would also speed up code
that uses these objects because (a) they'd better fit in cache and
(b) there would be fewer pointers to chase.

To perform operations that are now done directly on the RTL, there'd be
a switch statement, for instance:

int reg_mentioned_p (reg, in)
{
  ...
  case PACKED_INSN:
    switch (PACKED_INSN_NUMBER (in))
      {
      ...
      case 207:  /* *addsi_1 */
        if (REGNO (reg) == 17)   // deal with the clobbered register
          return 1;
        // deal with the operands
        break;
      }
  ...
}

Even combine can be handled this way, by pregenerating rules based on
the insn numbers being combined.  Relatively few insns can actually be
combined, so it shouldn't require a huge amount of generated code.

On RISCy chips, you could take even further advantage of the fact that
often an operand is guaranteed to be a register, or a constant integer
or whatever, and so eliminate some tests.

I'm not sure how much work this is to implement.  I suspect what you'd
end up doing is performing a trade-off between generating too many
routines and having to rewrite large chunks of old code to use the
routines that already exist but that they don't use.

Now, if only I could think of something that would work like this on
trees...
-- - Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com> ^ permalink raw reply [flat|nested] 173+ messages in thread
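As an illustration only (this is not GCC's actual rtx layout, and all of the field and function names below are made up), the packed form described above might look something like a variable-sized record carrying just the matched insn code and its operands, stored inline at the tail:

#include <stddef.h>
#include <stdlib.h>

struct packed_insn
{
  int uid, prev_uid, next_uid;  /* the chain, as in (insn 15 13 17 ...) */
  int insn_code;                /* e.g. 207 for *addsi_1 */
  unsigned short n_operands;
  void *operands[];             /* operands allocated inline at the tail */
};

static struct packed_insn *
make_packed_insn (int uid, int prev, int next, int code,
                  unsigned short n_ops, void **ops)
{
  struct packed_insn *pi
    = malloc (offsetof (struct packed_insn, operands)
              + n_ops * sizeof (void *));
  unsigned short i;

  if (!pi)
    return NULL;
  pi->uid = uid;
  pi->prev_uid = prev;
  pi->next_uid = next;
  pi->insn_code = code;
  pi->n_operands = n_ops;
  for (i = 0; i < n_ops; i++)
    pi->operands[i] = ops[i];   /* one pointer per operand, nothing more */
  return pi;
}

One allocation, one object for the collector to mark, and the operands sit next to the header, which is where the cache-locality win would come from.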
* Re: Faster compilation speed 2002-08-12 16:00 ` Geoff Keating @ 2002-08-13 2:58 ` Nick Ing-Simmons 2002-08-13 10:47 ` Richard Henderson 1 sibling, 0 replies; 173+ messages in thread
From: Nick Ing-Simmons @ 2002-08-13 2:58 UTC (permalink / raw)
To: geoffk; +Cc: gcc, Matt Austern

Geoff Keating <geoffk@geoffk.org> writes:
>
>We happen to know that GC as a whole is 10-13% of total compile time,
>even at -O0, and my expectation is that the RTL part of that is
>perhaps two-thirds, say 7%.  So the benefit you can get is 7% less any
>overhead in tracking the reference counts and freeing
>briefly-allocated RTL.

That does not take into account the cache/tlb locality effects that
Linus explained are caused by delayed reclamation.
-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 16:00 ` Geoff Keating 2002-08-13 2:58 ` Nick Ing-Simmons @ 2002-08-13 10:47 ` Richard Henderson 1 sibling, 0 replies; 173+ messages in thread
From: Richard Henderson @ 2002-08-13 10:47 UTC (permalink / raw)
To: Geoff Keating; +Cc: Matt Austern, gcc

On Mon, Aug 12, 2002 at 04:00:08PM -0700, Geoff Keating wrote:
> My suggestion is to try shrinking RTL in other ways.  For instance,
> once RTL is generated it should all match an insn or a splitter.  If
> we could store RTL as the insn number (or a splitter number) plus the
> operands, rather than the expanded form we have now, that should be
> much easier to traverse.

I've thought about this in passing before.

> (packed_insn 15 13 17 207 {*addsi_1} [(reg:SI 61) (reg:SI 59) (reg:SI 60)])
>
> which would save us, by my count, 50% of the RTL objects for this
> case.

A bit more than that if the packed_insn rtl is actually variable sized
so that the operands are directly at the end of the other arguments.

> To perform operations that are now done directly on the RTL, there'd be
> a switch statement, for instance:

Another possible solution, particularly for bletcherous code like
combine, is to regenerate the full instruction on demand.  After
try_combine is done with an insn, we free it immediately so that we
don't accumulate garbage.

But I suspect that most passes don't need this.  They only need to know
which operands are inputs, sets, and clobbers.  They need to know which
predicates apply.  Information which is trivial to generate off the md
file.

This idea, I think, has real potential, and could actually be
implemented without disrupting the entire compiler.

> Even combine can be handled this way, by pregenerating rules based on
> the insn numbers being combined.  Relatively few insns can actually be
> combined, so it shouldn't require a huge amount of generated code.

Pre-generating the combinations would be really cool, and probably save
quite a bit of time, but I don't really believe in that for even the
medium term.  The number of possibilities is really quite large.

> Now, if only I could think of something that would work like this on
> trees...

Having stronger typing instead of the union-of-everything would do.

r~

^ permalink raw reply [flat|nested] 173+ messages in thread
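A sketch of the kind of generated per-insn summary being described here; the names (operand_role, insn_operand_summary) and the use of code 207 for *addsi_1 are illustrative assumptions, not anything a genrecog-style generator actually emits:

enum operand_role { OP_INPUT, OP_OUTPUT, OP_CLOBBER };

struct insn_operand_summary
{
  unsigned char n_operands;
  const enum operand_role *roles;
};

/* *addsi_1: (set (reg 0) (plus (reg 1) (reg 2))) plus a flags clobber.  */
static const enum operand_role addsi_1_roles[]
  = { OP_OUTPUT, OP_INPUT, OP_INPUT, OP_CLOBBER };

/* One row per insn code, which a generator could emit from the .md file.  */
static const struct insn_operand_summary insn_summary[] =
{
  /* ... */
  [207] = { 4, addsi_1_roles },
  /* ... */
};

A pass that only needs to know whether a recognized insn clobbers a given register, or which operands it sets, could then index such a table by the insn code instead of walking the full RTL expression.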
* Re: Faster compilation speed 2002-08-12 12:56 ` David S. Miller 2002-08-12 13:56 ` Matt Austern @ 2002-08-12 14:28 ` Stan Shebs 2002-08-12 15:05 ` David S. Miller 1 sibling, 1 reply; 173+ messages in thread From: Stan Shebs @ 2002-08-12 14:28 UTC (permalink / raw) To: David S. Miller; +Cc: austern, dje, mrs, gcc David S. Miller wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 14:28 ` Stan Shebs @ 2002-08-12 15:05 ` David S. Miller 0 siblings, 0 replies; 173+ messages in thread
From: David S. Miller @ 2002-08-12 15:05 UTC (permalink / raw)
To: shebs; +Cc: austern, dje, mrs, gcc

   From: Stan Shebs <shebs@apple.com>
   Date: Mon, 12 Aug 2002 14:27:52 -0700

   So, uh, did I miss the part where refcounting is shown to be an
   improvement over the status quo?  It's plausible I suppose, but
   counting does have its overhead too.  We ought to have at least a
   back-of-the-envelope estimate before changing everything...

You can choose to do that, but I bet you can spend the same amount of
effort getting a benchmark'able refcounting tree together.

This is so frustrating that I just might stop everything else I'm doing
and put something together so I can just avoid all of this ridiculous
red tape people are putting up just to work on what amounts to a
frickin technology demo!

Have you ever implemented something solely to figure out whether it was
worthwhile or not? :-)

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 12:11 ` Mike Stump 2002-08-12 12:41 ` David Edelsohn @ 2002-08-12 19:17 ` Mike Stump 2002-08-12 23:28 ` Neil Booth 1 sibling, 1 reply; 173+ messages in thread From: Mike Stump @ 2002-08-12 19:17 UTC (permalink / raw) To: Mike Stump; +Cc: Neil Booth, gcc On Monday, August 12, 2002, at 12:11 PM, Mike Stump wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 19:17 ` Mike Stump @ 2002-08-12 23:28 ` Neil Booth 0 siblings, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-12 23:28 UTC (permalink / raw) To: Mike Stump; +Cc: gcc Mike Stump wrote:- > Ok, I looked at it. A straight forward check to see it is has been > folded first with the use of an existing unused bit in the tree speeds > it up by 1.0003, or not enough to bother with all the code and the use > of the extra bit that someone else may find more valuable. :-( That's a shame. 8-( Thanks for looking at it. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 12:17 Faster compilation speed Mike Stump ` (2 preceding siblings ...) 2002-08-09 14:29 ` Neil Booth @ 2002-08-09 14:51 ` Stan Shebs 2002-08-09 15:03 ` David Edelsohn 2002-08-09 15:26 ` Geoff Keating 2002-08-09 14:59 ` Timothy J. Wood ` (2 subsequent siblings) 6 siblings, 2 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 14:51 UTC (permalink / raw) To: Mike Stump; +Cc: gcc Mike Stump wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 14:51 ` Stan Shebs @ 2002-08-09 15:03 ` David Edelsohn 2002-08-09 15:43 ` Stan Shebs 2002-08-09 16:43 ` Alan Lehotsky 2002-08-09 15:26 ` Geoff Keating 1 sibling, 2 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-09 15:03 UTC (permalink / raw) To: Stan Shebs; +Cc: Mike Stump, gcc >>>>> Stan Shebs writes: Stan> I think it suffices to have -O0 mean "go as fast as possible". From time to Stan> time, I've noticed that there's been a temptation to try to sneak in a Stan> little Stan> optimization even at -O0, presumably with the assumption that the time Stan> penalty was negligible. (There are users who complain that -O0 should Stan> do some amount of optimization, but IMHO we should ignore them.) Saying "do not run any optimization at -O0" shows a tremendous lack of understanding or investigation. One wants minimal optimization even at -O0 to decrease the size of the IL representation of the function being compiled. The little bit of computation to perform trivial optimization more than makes up for itself with the decreased size of the IL that needs to be processed to generate the output. One needs to be careful about which optimizations are run, but with the right choices it definitely is a net win. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:03 ` David Edelsohn @ 2002-08-09 15:43 ` Stan Shebs 2002-08-09 16:43 ` Alan Lehotsky 1 sibling, 0 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 15:43 UTC (permalink / raw) To: David Edelsohn; +Cc: Mike Stump, gcc David Edelsohn wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:03 ` David Edelsohn 2002-08-09 15:43 ` Stan Shebs @ 2002-08-09 16:43 ` Alan Lehotsky 2002-08-09 16:49 ` Matt Austern 1 sibling, 1 reply; 173+ messages in thread From: Alan Lehotsky @ 2002-08-09 16:43 UTC (permalink / raw) To: David Edelsohn; +Cc: Stan Shebs, Mike Stump, gcc At 6:03 PM -0400 8/9/02, David Edelsohn wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:43 ` Alan Lehotsky @ 2002-08-09 16:49 ` Matt Austern 2002-08-10 2:24 ` Gabriel Dos Reis 0 siblings, 1 reply; 173+ messages in thread From: Matt Austern @ 2002-08-09 16:49 UTC (permalink / raw) To: Alan Lehotsky; +Cc: David Edelsohn, Stan Shebs, Mike Stump, gcc On Friday, August 9, 2002, at 04:17 PM, Alan Lehotsky wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:49 ` Matt Austern @ 2002-08-10 2:24 ` Gabriel Dos Reis 0 siblings, 0 replies; 173+ messages in thread From: Gabriel Dos Reis @ 2002-08-10 2:24 UTC (permalink / raw) To: Matt Austern; +Cc: Alan Lehotsky, David Edelsohn, Stan Shebs, Mike Stump, gcc Matt Austern <austern@apple.com> writes: | On Friday, August 9, 2002, at 04:17 PM, Alan Lehotsky wrote: | | > This is DEFINITELY TRUE! | > | > For example, the Bliss11 compiler ACTUALLY ran faster with | > optimization turned on because assembling the unoptimized code | > actually took longer than the time running FULL optimization required | > for anything but the most trivial programs. | | Shall we take it as a given that nobody is going to check | in a patch for faster compilations without benchmarking | and making sure that it really does speed things up? Some while ago, when the compiler slowdown was a hotter issue, it was suggested that no new optimization-related patches should be checked in if there were no concrete evidence that they're bringing noticeable wins. I don't know how that turns out, though. -- Gaby ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 14:51 ` Stan Shebs 2002-08-09 15:03 ` David Edelsohn @ 2002-08-09 15:26 ` Geoff Keating 2002-08-09 16:06 ` Stan Shebs 2002-08-12 15:55 ` Mike Stump 1 sibling, 2 replies; 173+ messages in thread From: Geoff Keating @ 2002-08-09 15:26 UTC (permalink / raw) To: Stan Shebs; +Cc: gcc Stan Shebs <shebs@apple.com> writes: > Mike Stump wrote: > > > > > The first realization I came to is that the only existing control > > for such things is -O[123], and having thought about it, I think it > > would be best to retain and use those flags. For minimal user > > impact, I think it would be good to not perturb existing users of > > -O[0123] too much, or at leaast, not at first. If we wanted to > > change them, I think -O0 should be the `fast' version, -O1 should be > > what -O0 does now with some additions around the edges, and -O2 and > > -O3 also slide over (at least one). What do you think, slide them > > all over one or more, or just make -O0 do less, or...? Maybe we > > have a -O0.0 to mean compile very quickly? > > I think it suffices to have -O0 mean "go as fast as possible". Note that that's different to what it means now, which is "I want the debugger to not surprise me." -- - Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com> ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:26 ` Geoff Keating @ 2002-08-09 16:06 ` Stan Shebs 2002-08-09 16:14 ` Terry Flannery 2002-08-09 16:29 ` Phil Edwards 2002-08-12 15:55 ` Mike Stump 1 sibling, 2 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 16:06 UTC (permalink / raw) To: Geoff Keating; +Cc: gcc Geoff Keating wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:06 ` Stan Shebs @ 2002-08-09 16:14 ` Terry Flannery 2002-08-09 16:29 ` Neil Booth 2002-08-09 16:29 ` Phil Edwards 1 sibling, 1 reply; 173+ messages in thread From: Terry Flannery @ 2002-08-09 16:14 UTC (permalink / raw) To: Stan Shebs, Geoff Keating; +Cc: gcc IMHO, a new flag should be introduced, for example, -Of for maximum compile speed, and no surprises when debugging. -O0 should be minimal optimizations, and -O[s1-3] should remain as they are. I use the preprocessor to generate a preprocessed version of all the system header I use, into one header, and #include that in my program's header (with the flags to dump macros) , saving some time when building. If there was some support for pre-compiled headers, I'm sure that the compiler would be much faster. Terry ----- Original Message ----- From: "Stan Shebs" <shebs@apple.com> To: "Geoff Keating" <geoffk@geoffk.org> Cc: <gcc@gcc.gnu.org> Sent: Saturday, August 10, 2002 12:05 AM Subject: Re: Faster compilation speed > Geoff Keating wrote: > > >Stan Shebs <shebs@apple.com> writes: > > > >>Mike Stump wrote: > >> > >>>The first realization I came to is that the only existing control > >>>for such things is -O[123], and having thought about it, I think it > >>>would be best to retain and use those flags. For minimal user > >>>impact, I think it would be good to not perturb existing users of > >>>-O[0123] too much, or at leaast, not at first. If we wanted to > >>>change them, I think -O0 should be the `fast' version, -O1 should be > >>>what -O0 does now with some additions around the edges, and -O2 and > >>>-O3 also slide over (at least one). What do you think, slide them > >>>all over one or more, or just make -O0 do less, or...? Maybe we > >>>have a -O0.0 to mean compile very quickly? > >>> > >>I think it suffices to have -O0 mean "go as fast as possible". > >> > > > >Note that that's different to what it means now, which is "I want the > >debugger to not surprise me." > > > There's been a little bit of a drift over the years - -O0 used to be > "no opts at all", -O1 was "not too surprising for the debugger", and > -O2 was all-out. I remember some pressure from Cygnus customers to > make -O0 do more optimization, sometimes out of stupidity, but in the > legitimate cases because the -O0 code was too slow and/or large to > fit on the target embedded system, even for debugging. > > So what *should* we do with -O0 optimizations that measurably > slow down the compiler? > > Stan > > > ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:14 ` Terry Flannery @ 2002-08-09 16:29 ` Neil Booth 0 siblings, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-09 16:29 UTC (permalink / raw) To: Terry Flannery; +Cc: Stan Shebs, Geoff Keating, gcc Terry Flannery wrote:- > IMHO, a new flag should be introduced, for example, -Of for maximum compile > speed, and no surprises when debugging. -O0 should be minimal optimizations, > and -O[s1-3] should remain as they are. > I use the preprocessor to generate a preprocessed version of all the system > header I use, into one header, and #include that in my program's header > (with the flags to dump macros) , saving some time when building. If there > was some support for pre-compiled headers, I'm sure that the compiler would > be much faster. How much time (%-wise) does it save? Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:06 ` Stan Shebs 2002-08-09 16:14 ` Terry Flannery @ 2002-08-09 16:29 ` Phil Edwards 2002-08-12 16:24 ` Mike Stump 1 sibling, 1 reply; 173+ messages in thread From: Phil Edwards @ 2002-08-09 16:29 UTC (permalink / raw) To: Stan Shebs; +Cc: gcc On Fri, Aug 09, 2002 at 04:05:16PM -0700, Stan Shebs wrote: > So what *should* we do with -O0 optimizations that measurably > slow down the compiler? How "minimal" can an optimization be, if it measurably slows down the compiler? If it slows things down, let's just move it to -O1/-O2. Personally, "fastest compile possible" usually just means -fsyntax-only. I have a hard time wanting to do anything with ad-hoc output. Phil -- I would therefore like to posit that computing's central challenge, viz. "How not to make a mess of it," has /not/ been met. - Edsger Dijkstra, 1930-2002 ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:29 ` Phil Edwards @ 2002-08-12 16:24 ` Mike Stump 2002-08-12 18:38 ` Phil Edwards 2002-08-13 5:27 ` Theodore Papadopoulo 0 siblings, 2 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 16:24 UTC (permalink / raw) To: Phil Edwards; +Cc: Stan Shebs, gcc On Friday, August 9, 2002, at 04:29 PM, Phil Edwards wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 16:24 ` Mike Stump @ 2002-08-12 18:38 ` Phil Edwards 2002-08-13 5:27 ` Theodore Papadopoulo 1 sibling, 0 replies; 173+ messages in thread From: Phil Edwards @ 2002-08-12 18:38 UTC (permalink / raw) To: Mike Stump; +Cc: Stan Shebs, gcc On Mon, Aug 12, 2002 at 04:24:46PM -0700, Mike Stump wrote: > On Friday, August 9, 2002, at 04:29 PM, Phil Edwards wrote: > > Personally, "fastest compile possible" usually just means > > -fsyntax-only. > > -fsyntax-only isn't a compile. My point, if we're nitpicking, is that almost every single time I hear a user complain that, "gcc is taking so long," it's immediately followed by, "all I want to do is check that I got the template specializations in the right order," etc. So they use -fsyntax-only while writing their code, and then fire off a "real" build at -O5.2e7 and go home for the evening. Phil -- I would therefore like to posit that computing's central challenge, viz. "How not to make a mess of it," has /not/ been met. - Edsger Dijkstra, 1930-2002 ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 16:24 ` Mike Stump 2002-08-12 18:38 ` Phil Edwards @ 2002-08-13 5:27 ` Theodore Papadopoulo 2002-08-13 10:03 ` Mike Stump 1 sibling, 1 reply; 173+ messages in thread
From: Theodore Papadopoulo @ 2002-08-13 5:27 UTC (permalink / raw)
To: Mike Stump; +Cc: Phil Edwards, Stan Shebs, gcc

OK, since this is a brainstorming about speeding up gcc, and since silly
ideas are at least discussed, let me try one.

Why not make incremental compilation a standard for gcc...  This would
mean storing some information into the object files.  Things I can see
are:

- Compilation flags (defines, optimization, code generation and
  debugging flags at least).

- A signature (eg MD5 or other) for each data_type/function/global
  (decl ?) allowing for a quick check for a change.  We may even
  differentiate between visible/invisible changes.  Eg if a function
  body changes but not its interface, there is no need to recompile the
  functions calling it.  More generally, name changes could be detected
  as non-changes, but I suspect that this will mess up the debugging
  information.

Then generate code only for the relevant symbols (ie the new ones or
those that have been changed or affected indirectly by a change, ie
depending on a function or variable that changed) and replace these in
the .o file (is there a gas option like --replace ?).

In some way this is like PCH but pushed one step further.  I can
understand that making it work reliably is quite difficult, but the
perspective of having a fast incremental compiler is tempting...  The
information to store is certainly one of the trickiest parts, so a
first step could be to add a flag stating "recompile only this symbol
and what depends on it".  Not very user friendly, but maybe an
interesting first step...

Is this a totally remote/stupid idea, or can it be done in some
eventually not too distant future ??

	Theo.

--------------------------------------------------------------------
Theodore Papadopoulo
Email: Theodore.Papadopoulo@sophia.inria.fr  Tel: (33) 04 92 38 76 01
--------------------------------------------------------------------

^ permalink raw reply [flat|nested] 173+ messages in thread
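A minimal sketch of the per-declaration signature idea, assuming the front end can produce a canonical string for a declaration's interface; FNV-1a stands in here for the MD5 mentioned above, and all of the names are hypothetical:

#include <stdint.h>

static uint64_t
decl_signature (const char *canonical_interface)
{
  /* FNV-1a, used only because it fits in a few lines.  */
  uint64_t h = 0xcbf29ce484222325ULL;
  const unsigned char *p = (const unsigned char *) canonical_interface;

  for (; *p; p++)
    {
      h ^= *p;
      h *= 0x100000001b3ULL;
    }
  return h;
}

/* The signature stored in the object file is compared against the one
   computed from the current source.  A body-only edit leaves the
   interface signature unchanged, so callers need not be recompiled.  */
static int
interface_changed_p (uint64_t stored_sig, const char *current_interface)
{
  return stored_sig != decl_signature (current_interface);
}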
* Re: Faster compilation speed 2002-08-13 5:27 ` Theodore Papadopoulo @ 2002-08-13 10:03 ` Mike Stump 0 siblings, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-13 10:03 UTC (permalink / raw) To: Theodore Papadopoulo; +Cc: Phil Edwards, Stan Shebs, gcc On Tuesday, August 13, 2002, at 05:27 AM, Theodore Papadopoulo wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:26 ` Geoff Keating 2002-08-09 16:06 ` Stan Shebs @ 2002-08-12 15:55 ` Mike Stump 1 sibling, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 15:55 UTC (permalink / raw) To: Geoff Keating; +Cc: Stan Shebs, gcc On Friday, August 9, 2002, at 03:26 PM, Geoff Keating wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 12:17 Faster compilation speed Mike Stump ` (3 preceding siblings ...) 2002-08-09 14:51 ` Stan Shebs @ 2002-08-09 14:59 ` Timothy J. Wood 2002-08-16 13:31 ` Problem with PFE approach [Was: Faster compilation speed] Timothy J. Wood 2002-08-09 16:01 ` Faster compilation speed Richard Henderson 2002-08-10 17:48 ` Aaron Lehmann 6 siblings, 1 reply; 173+ messages in thread From: Timothy J. Wood @ 2002-08-09 14:59 UTC (permalink / raw) To: Mike Stump; +Cc: gcc On Friday, August 9, 2002, at 12:17 PM, Mike Stump wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Problem with PFE approach [Was: Faster compilation speed] 2002-08-09 14:59 ` Timothy J. Wood @ 2002-08-16 13:31 ` Timothy J. Wood 2002-08-16 13:44 ` Devang Patel 2002-08-16 13:54 ` Devang Patel 0 siblings, 2 replies; 173+ messages in thread
From: Timothy J. Wood @ 2002-08-16 13:31 UTC (permalink / raw)
To: Mike Stump; +Cc: gcc

So, another point in favor of discarding the concept of 'static
precompilation', based on a problem I just ran into with PFE under
10.2...

I'm emulating some of the Win32 API for porting games to Mac OS X.
Win32 has a macro like this:

#ifndef INITGUID
#define DEFINE_GUID(name, l, w1, w2, b1, b2, b3, b4, b5, b6, b7, b8) \
    EXTERN_C const GUID FAR name
#else
#define DEFINE_GUID(name, l, w1, w2, b1, b2, b3, b4, b5, b6, b7, b8) \
    EXTERN_C const GUID name \
        = { l, w1, w2, { b1, b2, b3, b4, b5, b6, b7, b8 } }
#endif // INITGUID

If this gets stuck in a PFE and the PFE is applied as a prefix header
(the only way it can be done right now), then the file being compiled
cannot make its own decision about whether INITGUID should be defined
or not.

Clearly there are ways around this, but the current approach makes the
compiler produce different output based on whether PFE is on or not.
I consider this a bug.

This would not be a problem with an automatic precompiler that
remembered facts and didn't use the prefix header hack.

Are there problems with what I describe below, or are people just
avoiding commenting on this since it is too hard to implement? :)

-tim

On Friday, August 9, 2002, at 02:58 PM, Timothy J. Wood wrote:

2) This one is rather crazy and would involve huge amounts of work
probably....

a) Toss some or all of your PFE code in the bin (yikes!)

b) Build a precompile server that the compiler can attach to and
request precompiled headers (give a path and set of -D flags or
whatever other state is needed to uniquely identify the precompile
output).  Requests would be satisfied via shared memory (yes,
non-portable, so this whole mechanism will only work on modern
machines).

c) Inside the server, keep parsed representations of all headers that
have been imported and the -D state used when parsing the headers.  As
new headers are parsed, they should be able to **layer** on top of
existing parsed headers (so there should only be one parsed version of
std::string).  This avoids the confining requirement that you have one
big master precompiled header.

d) Details about concurrency, security, locating the server, and so on
left as an exercise for the reader.

The main advantage here is that people would get fast compiles WITHOUT
having to tune their single PFE header.  Additionally, more headers
would get precompiled than would otherwise, yielding faster builds.  If
the layering is done correctly, the memory usage of the entire system
could be lower (since if you have two projects to build, both of which
import STL, there would be only one precompiled version of STL).

At the start of a build, a special 'check filesystem' command could be
sent to the server to have it do a one-time check of timestamps of
header files.  Assuming the timestamps haven't changed, the precompiled
headers could be kept across builds!  Naturally, doing a 'clean' build
from the IDE option would need to be able to flush and probably shut
down the server, since it is inevitable that there will be bugs that
will corrupt the precomp database :(

#2 could really take many forms.  The key idea is that having a single
PFE file is non-optimal.
Developers should not have to spend time tuning such a file to get the best compile time. The compiler and IDE should handle all these details by default. Having the developer involved here just leads to extra (ongoing!) work for the developer and a sub-optimal set of precompiled headers. Your goal should be to have the developer open their project and have it build 6x faster (instead of requiring the developer to do a several hours of tweaking on their PFE file to get the best performance -- and then having to keep it up to date over the life of their project). 3) This is possibly even harder... Keep track of what facts in a header each source file cared about (macro values defined or undefined, structure layout, function signature, etc, etc, etc). If a header changes, have the precompile server keep track of the facts that have changed and then only rebuild source files that care about those changes (assuming the source file itself hasn't compiled). This could get really ugly since you'd potentially keep track of multiple fact timestamps (consider if a build fails or is aborted so some files got updated for the current state of a header and some didn't). Extra bonus points for doing this on a lower granularity basis (i.e., don't recompile a function if it wouldn't produce different output). This would clearly be very hard and a large departure from the current state of affairs :) Anyway, I think the biggest improvements lie in moving away from the current batch compile philosophy mandated by the command line tools. Instead, the command line tools should be a front end onto a much more powerful persistent compile server. (Hey, you asked for ideas and said it was OK if they were hard :) -tim ^ permalink raw reply [flat|nested] 173+ messages in thread
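For concreteness, the request and reply a compiler process might exchange with such a precompile server could be as simple as the following; the struct names, the fixed-size fields, and the use of a named shared-memory segment are all assumptions made for illustration, not a worked-out design:

struct precompile_request
{
  char header_path[1024];           /* e.g. "/usr/include/c++/string" */
  unsigned char option_digest[16];  /* hash of -D/-U/-I and codegen flags */
};

struct precompile_reply
{
  int found;                /* nonzero if a cached parse already exists */
  char shm_name[64];        /* shared-memory segment holding the parsed form */
  unsigned long length;     /* size of the region to map */
};

The server would key its cache on (header_path, option_digest), which is also what would let two projects share one parsed copy of std::string.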
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 13:31 ` Problem with PFE approach [Was: Faster compilation speed] Timothy J. Wood @ 2002-08-16 13:44 ` Devang Patel 2002-08-16 14:31 ` Timothy J. Wood 2002-08-16 13:54 ` Devang Patel 1 sibling, 1 reply; 173+ messages in thread From: Devang Patel @ 2002-08-16 13:44 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Mike Stump, gcc On Friday, August 16, 2002, at 01:31 PM, Timothy J. Wood wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 13:44 ` Devang Patel @ 2002-08-16 14:31 ` Timothy J. Wood 2002-08-16 14:39 ` Neil Booth 2002-08-16 14:46 ` Devang Patel 0 siblings, 2 replies; 173+ messages in thread From: Timothy J. Wood @ 2002-08-16 14:31 UTC (permalink / raw) To: Devang Patel; +Cc: Mike Stump, gcc On Friday, August 16, 2002, at 01:43 PM, Devang Patel wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 14:31 ` Timothy J. Wood @ 2002-08-16 14:39 ` Neil Booth 2002-08-16 14:46 ` Devang Patel 1 sibling, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-16 14:39 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Devang Patel, Mike Stump, gcc Timothy J. Wood wrote:- > The fact that you have to build this massive single header that acts > as a prefix header is the broken part -- implementation details like > this should not be exposed to the user. Just like Apple doesn't make > users manually configure their Apache server for personal web sharing, > Apple shouldn't make their developers do a bunch of work to get decent > compile speeds. It should "Just Work (TM)". I agree. Borland, MS and KAI managed this, so we should too. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 14:31 ` Timothy J. Wood 2002-08-16 14:39 ` Neil Booth @ 2002-08-16 14:46 ` Devang Patel 1 sibling, 0 replies; 173+ messages in thread From: Devang Patel @ 2002-08-16 14:46 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Mike Stump, gcc On Friday, August 16, 2002, at 02:31 PM, Timothy J. Wood wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 13:31 ` Problem with PFE approach [Was: Faster compilation speed] Timothy J. Wood 2002-08-16 13:44 ` Devang Patel @ 2002-08-16 13:54 ` Devang Patel 2002-08-16 14:42 ` Neil Booth 2002-08-16 14:45 ` Timothy J. Wood 1 sibling, 2 replies; 173+ messages in thread From: Devang Patel @ 2002-08-16 13:54 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Mike Stump, gcc On Friday, August 16, 2002, at 01:31 PM, Timothy J. Wood wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 13:54 ` Devang Patel @ 2002-08-16 14:42 ` Neil Booth 2002-08-16 14:57 ` Devang Patel 2002-08-16 14:45 ` Timothy J. Wood 1 sibling, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-16 14:42 UTC (permalink / raw) To: Devang Patel; +Cc: Timothy J. Wood, Mike Stump, gcc Devang Patel wrote:- > In your previous two queries, what you want from PFE is to discard few > things > based on macros from precompiled headers. But when PFE restores trees, > it has gone too far as far as macros are concerned. The implementation should know what its assumptions are, and if they're broken recover somehow. Have you seen KAI's documentation (online) for their PCH implementation? It seems like a good solution to me. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 14:42 ` Neil Booth @ 2002-08-16 14:57 ` Devang Patel 2002-08-17 15:31 ` Timothy J. Wood 0 siblings, 1 reply; 173+ messages in thread From: Devang Patel @ 2002-08-16 14:57 UTC (permalink / raw) To: Neil Booth; +Cc: Timothy J. Wood, Mike Stump, gcc On Friday, August 16, 2002, at 02:41 PM, Neil Booth wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 14:57 ` Devang Patel @ 2002-08-17 15:31 ` Timothy J. Wood 2002-08-17 20:04 ` Daniel Berlin ` (2 more replies) 0 siblings, 3 replies; 173+ messages in thread
From: Timothy J. Wood @ 2002-08-17 15:31 UTC (permalink / raw)
To: Devang Patel; +Cc: Mike Stump, gcc

So, another problem with PFE that I've noticed after working with it
for a while...

If you put all your commonly used headers in a PFE, then changing any
of these headers causes the PFE header to be considered changed.  And,
since this header is imported into every single file in your project,
you end up in a situation where changing any header causes the entire
project to be rebuilt.  This is clearly not good for day to day
development.

A PCH approach that was automatic and didn't have a single monolithic
file would avoid the artificial tying together of all the headers in
the world and would thus lead to faster incremental builds due to fewer
files being rebuilt.

Another approach that would work with a monolithic file would be some
sort of fact database that would allow the build system to decide early
on that the change in question didn't affect some subset of files.

-tim

^ permalink raw reply [flat|nested] 173+ messages in thread
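A rough sketch of what a per-file record in such a fact database might look like; the structure and the idea of hashing each consumed fact are assumptions made for illustration, not an existing GCC mechanism:

struct consumed_fact
{
  const char *header;      /* where the fact came from */
  const char *fact;        /* e.g. "macro:INITGUID" or "layout:GUID" */
  unsigned long digest;    /* hash of the fact's definition at compile time */
};

/* After a header edit, rebuild a file only if one of the facts it
   actually consumed now hashes differently.  */
static int
needs_rebuild (const struct consumed_fact *facts, int n,
               unsigned long (*current_digest) (const char *header,
                                                const char *fact))
{
  int i;
  for (i = 0; i < n; i++)
    if (current_digest (facts[i].header, facts[i].fact) != facts[i].digest)
      return 1;
  return 0;
}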
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-17 15:31 ` Timothy J. Wood @ 2002-08-17 20:04 ` Daniel Berlin 2002-08-17 20:07 ` Andrew Pinski 2002-08-17 20:14 ` Timothy J. Wood 2002-08-17 20:15 ` Daniel Berlin 2002-08-19 7:07 ` Stan Shebs 2 siblings, 2 replies; 173+ messages in thread From: Daniel Berlin @ 2002-08-17 20:04 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Devang Patel, Mike Stump, gcc On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > So, another problem with PFE that I've noticed after working with it > for a while... > > If you put all your commonly used headers in a PFE, then changing any > of these headers causes the PFE header to considered changed. And, > since this header is imported into every single file in your project, > you end up in a situation where changing any header causes the entire > project to be rebuilt. Um, this header should *not* be explicitly included in the files. It's *prefix* header. The only thing that would need to be rebuilt in this case is the prefix header. Everything else that would normally not be rebuilt will not be rebuilt. IE the only thing extra that gets rebuilt is the prefix header. > This is clearly not good for day to day > development. > > A PCH approach that was automatic and didn't have a single monolithic > file would avoid the artificial tying together of all the headers in > the world and would thus lead to faster incremental builds due to fewer > files being rebuilt. > > Another approach that would work with a monolithic file would be some > sort of fact database that would allow the build system to decide early > on that the change in question didn't effect some subset of files. > > -tim > > > ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-17 20:04 ` Daniel Berlin @ 2002-08-17 20:07 ` Andrew Pinski 2002-08-17 20:14 ` Timothy J. Wood 1 sibling, 0 replies; 173+ messages in thread
From: Andrew Pinski @ 2002-08-17 20:07 UTC (permalink / raw)
To: dberlin; +Cc: Timothy J. Wood, Devang Patel, Mike Stump, gcc

PFE is like the precompiled headers in CodeWarrior.

Thanks,
Andrew Pinski

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-17 20:04 ` Daniel Berlin 2002-08-17 20:07 ` Andrew Pinski @ 2002-08-17 20:14 ` Timothy J. Wood 2002-08-17 20:21 ` Daniel Berlin 2002-08-19 11:59 ` Devang Patel 1 sibling, 2 replies; 173+ messages in thread From: Timothy J. Wood @ 2002-08-17 20:14 UTC (permalink / raw) To: dberlin; +Cc: Devang Patel, Mike Stump, gcc On Saturday, August 17, 2002, at 08:04 PM, Daniel Berlin wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-17 20:14 ` Timothy J. Wood @ 2002-08-17 20:21 ` Daniel Berlin 2002-08-18 3:17 ` Kai Henningsen 2002-08-19 11:59 ` Devang Patel 1 sibling, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-17 20:21 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Devang Patel, Mike Stump, gcc On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > On Saturday, August 17, 2002, at 08:04 PM, Daniel Berlin wrote: > > > On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > > >> > >> So, another problem with PFE that I've noticed after working with > >> it > >> for a while... > >> > >> If you put all your commonly used headers in a PFE, then changing > >> any > >> of these headers causes the PFE header to considered changed. And, > >> since this header is imported into every single file in your project, > >> you end up in a situation where changing any header causes the entire > >> project to be rebuilt. > > > > Um, this header should *not* be explicitly included in the files. > > It's *prefix* header. > > I'm not saying that I'm #including it in my sources. What I'm saying > is that the IDE knows that all my files depend upon it (they all end up > including it due to it being the prefix header, regardless of whether > it is listed or not). This means that they may have depedencies on the > its contents and must be rebuilt if it or any header it includes > changes. No, they shouldn't have any dependencies on it's contents. They should include what they normally include. The fact that the prefix header stores the compiler state should prevent these includes from doing anything (since it'll know it's already processed that header) when it is present. Any build system that makes the files depend on the prefix header is broken, and needs to be fixed. Prefix headers need to be rebuilt when compilation options change, or the headers it includes change. Files only need rebuilt when some normal header they depend on changes. *Not* when the prefix header changes. > > The way I think about this is that the prefix header mess is just a > hack to avoid having a #include at the top of each file. There should > be nothing else special about the header -- it is just assumed that > there is a #include at the top of your file. > > > The only thing that would need to be rebuilt in this case is the > > prefix header. > > Everything else that would normally not be rebuilt will not be rebuilt. > > Nope... everything needs to be rebuilt. The problem is that the > prefix header might satisfy some symbol or macro that a source file > needs (assume that the source file doesn't explicitly include headers > it needs). Don't assume that. It should always do so. If not, the source code is wrong. Period. It's not a usability issue that users must have the proper includes. --Dan ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-17 20:21 ` Daniel Berlin @ 2002-08-18 3:17 ` Kai Henningsen 2002-08-18 7:36 ` Daniel Berlin 0 siblings, 1 reply; 173+ messages in thread From: Kai Henningsen @ 2002-08-18 3:17 UTC (permalink / raw) To: gcc dberlin@dberlin.org (Daniel Berlin) wrote on 17.08.02 in < Pine.LNX.4.44.0208172315090.29572-100000@dberlin.org >: > On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > > > > On Saturday, August 17, 2002, at 08:04 PM, Daniel Berlin wrote: > > > > > On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > > > > >> > > >> So, another problem with PFE that I've noticed after working with > > >> it > > >> for a while... > > >> > > >> If you put all your commonly used headers in a PFE, then changing > > >> any > > >> of these headers causes the PFE header to considered changed. And, > > >> since this header is imported into every single file in your project, > > >> you end up in a situation where changing any header causes the entire > > >> project to be rebuilt. > > > > > > Um, this header should *not* be explicitly included in the files. > > > It's *prefix* header. > > > > I'm not saying that I'm #including it in my sources. What I'm saying > > is that the IDE knows that all my files depend upon it (they all end up > > including it due to it being the prefix header, regardless of whether > > it is listed or not). This means that they may have depedencies on the > > its contents and must be rebuilt if it or any header it includes > > changes. > > No, they shouldn't have any dependencies on it's contents. They should That would be seriously broken ... > include what they normally include. The fact that the prefix header stores > the compiler state should prevent these includes from doing anything (since > it'll know it's already processed that header) when it is present. > Any build system that makes the files depend on the prefix header is > broken, and needs to be fixed. ... unless you have some mechanism to prevent them from being influenced by any change in any header which is used in the prefix header but which they do not include normally. What mechanism would that be? The dependency chain is *exactly* the same as if the prefix header was normally included at the start of every source file. MfG Kai ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 3:17 ` Kai Henningsen @ 2002-08-18 7:36 ` Daniel Berlin 2002-08-18 11:20 ` jepler 0 siblings, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-18 7:36 UTC (permalink / raw) To: Kai Henningsen; +Cc: gcc On 18 Aug 2002, Kai Henningsen wrote: > dberlin@dberlin.org (Daniel Berlin) wrote on 17.08.02 in < Pine.LNX.4.44.0208172315090.29572-100000@dberlin.org >: > > > On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > > > > > > > On Saturday, August 17, 2002, at 08:04 PM, Daniel Berlin wrote: > > > > > > > On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > > > > > > >> > > > >> So, another problem with PFE that I've noticed after working with > > > >> it > > > >> for a while... > > > >> > > > >> If you put all your commonly used headers in a PFE, then changing > > > >> any > > > >> of these headers causes the PFE header to considered changed. And, > > > >> since this header is imported into every single file in your project, > > > >> you end up in a situation where changing any header causes the entire > > > >> project to be rebuilt. > > > > > > > > Um, this header should *not* be explicitly included in the files. > > > > It's *prefix* header. > > > > > > I'm not saying that I'm #including it in my sources. What I'm saying > > > is that the IDE knows that all my files depend upon it (they all end up > > > including it due to it being the prefix header, regardless of whether > > > it is listed or not). This means that they may have depedencies on the > > > its contents and must be rebuilt if it or any header it includes > > > changes. > > > > No, they shouldn't have any dependencies on it's contents. They should > > That would be seriously broken ... > > > include what they normally include. The fact that the prefix header stores > > the compiler state should prevent these includes from doing anything (since > > it'll know it's already processed that header) when it is present. > > Any build system that makes the files depend on the prefix header is > > broken, and needs to be fixed. > > ... unless you have some mechanism to prevent them from being influenced > by any change in any header which is used in the prefix header but which > they do not include normally. Why would they be influenced by a change to something they would not normally include? Unless they don't include what they normally should. > > What mechanism would that be? Reality? > > The dependency chain is *exactly* the same as if the prefix header was > normally included at the start of every source file. This is wrong, and leads exactly to the problem Tim describes. The dependency chain should *not* include the prefix header. The fact that the prefix header exists is not something the build system should know about, except insofar that it rebuild the prefix header when the headers it includes changes. That's *it*. > > MfG Kai > > ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 7:36 ` Daniel Berlin @ 2002-08-18 11:20 ` jepler 2002-08-18 13:20 ` Daniel Berlin 0 siblings, 1 reply; 173+ messages in thread
From: jepler @ 2002-08-18 11:20 UTC (permalink / raw)
To: Daniel Berlin; +Cc: Kai Henningsen, gcc

Let me see if I understand what people are talking about.

a.h:
	/* Include header guard if appropriate */
	#define X 1

b.h:
	/* Include header guard if appropriate */
	#define Y 1

m.c:
	#include "a.h"
	int main(void) { return Y; }

If m.c is compiled using PFE, and the PFE header contains both a.h and
b.h, will the compilation complete successfully?

If yes, and b.h is later modified to remove the Y definition, will a
build system where m.c does not depend on the PFE header actually
rebuild m.c, since the output of m.c depends (erroneously) on an item
in b.h through the PFE header?

My understanding of the PFE scheme implies that m.c would see a
definition from b.h even though b.h was not the target of a #include
directive.  This means that programmers will accidentally depend on
symbols from b.h even when it's not included, and that if they do, and
the build system does not consider the PFE header a dependency of each
source file, the definitions will not only be visible when they should
not be, but the build will be wrong, since the new contents of these
accidentally referenced header files will not actually cause a rebuild.

Jeff

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 11:20 ` jepler @ 2002-08-18 13:20 ` Daniel Berlin 2002-08-18 14:31 ` Timothy J. Wood 0 siblings, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-18 13:20 UTC (permalink / raw) To: jepler; +Cc: Kai Henningsen, gcc On Sun, 18 Aug 2002 jepler@unpythonic.net wrote: > Let me see if I understand what people are talking about. > > a.h: > /* Include header guard if appropriate */ > #define X 1 > > b.h: > /* Include header guard if appropriate */ > #define Y 1 > > m.c: > #include "a.h" > int main(void) { return Y; } > > If m.c is compiled using PFE, and the PFE header contains both a.h and b.h, > will the compilation complete successfully? > > If yes, and b.h is later modified to remove the Y definition will a build > system where m.c does not depend on the PFE header actually rebuild m.c, > since the output of m.c depends (erroneously) on an item in b.h through > the PFE header? A build system where m.c does not depend on the prefix header should *not* rebuild if b.h is modified. That's my point. > > My understanding of the PFE symbol implies that m.c would see a definition > from b.h even though b.h was not the target of a #include directive. Yes, they would be existing, but this is user error. They should always include the right things. In other words, you should make sure it works without a PFE header before you try it *with* one. It's only when you *count* on the fact that the PFE header is there that you run into dependency problems. --Dan ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 13:20 ` Daniel Berlin @ 2002-08-18 14:31 ` Timothy J. Wood 2002-08-18 14:35 ` Andrew Pinski 2002-08-19 2:41 ` Michael Matz 0 siblings, 2 replies; 173+ messages in thread From: Timothy J. Wood @ 2002-08-18 14:31 UTC (permalink / raw) To: dberlin; +Cc: jepler, Kai Henningsen, gcc On Sunday, August 18, 2002, at 01:20 PM, Daniel Berlin wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 14:31 ` Timothy J. Wood @ 2002-08-18 14:35 ` Andrew Pinski 2002-08-18 14:55 ` Timothy J. Wood 2002-08-19 2:41 ` Michael Matz 1 sibling, 1 reply; 173+ messages in thread From: Andrew Pinski @ 2002-08-18 14:35 UTC (permalink / raw) To: Timothy J. Wood; +Cc: dberlin, jepler, Kai Henningsen, gcc PFE is good for headers that hardly change, like system headers. It is not good for headers that change in development. Thanks, Andrew Pinski ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 14:35 ` Andrew Pinski @ 2002-08-18 14:55 ` Timothy J. Wood 0 siblings, 0 replies; 173+ messages in thread From: Timothy J. Wood @ 2002-08-18 14:55 UTC (permalink / raw) To: Andrew Pinski; +Cc: gcc On Sunday, August 18, 2002, at 02:36 PM, Andrew Pinski wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 14:31 ` Timothy J. Wood 2002-08-18 14:35 ` Andrew Pinski @ 2002-08-19 2:41 ` Michael Matz 2002-08-19 6:26 ` jepler 2002-08-19 11:53 ` Devang Patel 1 sibling, 2 replies; 173+ messages in thread
From: Michael Matz @ 2002-08-19 2:41 UTC (permalink / raw)
To: Timothy J. Wood; +Cc: dberlin, jepler, Kai Henningsen, gcc

Hi,

On Sun, 18 Aug 2002, Timothy J. Wood wrote:

> Thus, if you are going to implicitly include the header, you damn
> well better included it in dependency analysis.

No, because the existence of that header shouldn't influence the
outcome of the compiler in any way.

> I can accept an argument of "this is too hard to do correctly right
> now", but not "the user screwed up".  The user didn't screw up -- the
> compiler just isn't smart enough to do it correctly yet.

If the source doesn't compile without the prefix header the user did
something wrong, IOW he's screwed if he doesn't want to fix it.  Period.

Ciao,
Michael.

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-19 2:41 ` Michael Matz @ 2002-08-19 6:26 ` jepler 2002-08-19 6:40 ` Daniel Berlin 2002-08-19 11:50 ` Devang Patel 1 sibling, 2 replies; 173+ messages in thread
From: jepler @ 2002-08-19 6:26 UTC (permalink / raw)
To: Michael Matz; +Cc: Timothy J. Wood, dberlin, Kai Henningsen, gcc

> On Sun, 18 Aug 2002, Timothy J. Wood wrote:
> > I can accept an argument of "this is too hard to do correctly right
> > now", but not "the user screwed up".  The user didn't screw up -- the
> > compiler just isn't smart enough to do it correctly yet.

On Mon, Aug 19, 2002 at 11:21:28AM +0200, Michael Matz wrote:
> If the source doesn't compile without the prefix header the user did
> something wrong, IOW he's screwed if he doesn't want to fix it.  Period.

PFE makes it too easy for the programmer to accidentally give his
program different meaning with or without the prefix header.  I can do
without one more way to screw up my program.

The following set of files will compile a program with or without PFE,
but using a PFE that contains both a.h and b.h, the behavior will
change.  So the suggestion that files should be checked that they
compile without PFE is not enough to ensure that there aren't
unintended changes in program meaning in the presence of PFE.

// a.h
#define DEFA

// b.h
#define DEFB

// m.c
#include "a.h"
int main(void) {
#ifdef DEFB
	return 1;
#else
	return 0;
#endif
}

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-19 6:26 ` jepler @ 2002-08-19 6:40 ` Daniel Berlin 2002-08-19 11:50 ` Devang Patel 1 sibling, 0 replies; 173+ messages in thread From: Daniel Berlin @ 2002-08-19 6:40 UTC (permalink / raw) To: jepler; +Cc: Michael Matz, Timothy J. Wood, Kai Henningsen, gcc On Mon, 19 Aug 2002 jepler@unpythonic.net wrote: > > On Sun, 18 Aug 2002, Timothy J. Wood wrote: > > > I can accept an argument of "this is too hard to do correctly right > > > now", but not "the user screwed up". The user didn't screw up -- the > > > compiler just isn't smart enough to do it correctly yet. > > On Mon, Aug 19, 2002 at 11:21:28AM +0200, Michael Matz wrote: > > If the source doesn't compile without the prefix header the user did > > something wrong, IOW he's screwed if he doesn't want to fix it. Period. > > PFE makes it too easy for the programmer to accidentally give his program > different meaning with or without the prefix header. I can do without one > more way to screw up my program. > > The following set of files will compile a program with or without PFE, but > using a PFE that contains both a.h and b.h, the behavior will change. This is an implementation problem, and one that should be fixed. As is making symbols visible without the explicit includes (Though this is slightly harder to solve, but still possible through various means). ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-19  6:26       ` jepler
  2002-08-19  6:40         ` Daniel Berlin
@ 2002-08-19 11:50         ` Devang Patel
  2002-08-19 12:55           ` Jeff Epler
  1 sibling, 1 reply; 173+ messages in thread
From: Devang Patel @ 2002-08-19 11:50 UTC (permalink / raw)
To: jepler; +Cc: dberlin, gcc

On Monday, August 19, 2002, at 06:26 AM, jepler@unpythonic.net wrote:

> The following set of files will compile a program with or without PFE,
> but using a PFE that contains both a.h and b.h, the behavior will
> change.

This is not an implementation problem or a PFE model problem.  Including
a.h and b.h in the PFE means that what you're asking the compiler to do
is to compile the following source:

/// m.c
#include "a.h"
#include "b.h"
int main(void) {
#ifdef DEFB
	return 1;
#else
	return 0;
#endif
}

And, no doubt, it can have different behavior than the following
original source:

// m.c
#include "a.h"
int main(void) {
#ifdef DEFB
	return 1;
#else
	return 0;
#endif
}

-Devang

> So the suggestion that files should be checked that they compile
> without PFE is not enough to ensure that there aren't unintended
> changes in program meaning in the presence of PFE.
>
> // a.h
> #define DEFA
>
> // b.h
> #define DEFB
>
> // m.c
> #include "a.h"
> int main(void) {
> #ifdef DEFB
> 	return 1;
> #else
> 	return 0;
> #endif
> }

^ permalink raw reply	[flat|nested] 173+ messages in thread
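[Editorial illustration, not part of the thread: the divergence Devang and
jepler describe goes away if m.c explicitly includes every header whose
macros it tests, so a PFE that happens to pre-parse a.h and b.h can only
add what the source would have pulled in anyway.  File names are reused
from jepler's example; this is a sketch of the "source must compile
without the prefix header" rule, not a description of the PFE
implementation.]

// m.c -- names both headers it relies on
#include "a.h"
#include "b.h"   /* DEFB is visible with or without the prefix header */
int main(void) {
#ifdef DEFB
	return 1;   /* same result either way */
#else
	return 0;
#endif
}

[Built plain or with such a PFE, this program returns 1 in both cases, so
the prefix header only saves parsing time and cannot change its meaning.]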
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-19 11:50       ` Devang Patel
@ 2002-08-19 12:55         ` Jeff Epler
  2002-08-19 13:03           ` Ziemowit Laski
  0 siblings, 1 reply; 173+ messages in thread
From: Jeff Epler @ 2002-08-19 12:55 UTC (permalink / raw)
To: Devang Patel; +Cc: dberlin, gcc

On Mon, Aug 19, 2002 at 11:50:24AM -0700, Devang Patel wrote:
> On Monday, August 19, 2002, at 06:26 AM, jepler@unpythonic.net wrote:
> > The following set of files will compile a program with or without
> > PFE, but using a PFE that contains both a.h and b.h, the behavior
> > will change.
>
> This is not an implementation problem or a PFE model problem.
> Including a.h and b.h in the PFE means that what you're asking the
> compiler to do is to compile the following source:
>
> /// m.c
> #include "a.h"
> #include "b.h"
> int main(void) {
> #ifdef DEFB
> 	return 1;
> #else
> 	return 0;
> #endif
> }

... then the build system must treat m.c as depending on the PFE, which
in turn depends on all the headers it contains.  But that's where this
discussion started, with the PFE cure being worse than the illness,
since it makes your whole project recompile when you touch a header
file.

Jeff

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-19 12:55       ` Jeff Epler
@ 2002-08-19 13:03         ` Ziemowit Laski
  0 siblings, 0 replies; 173+ messages in thread
From: Ziemowit Laski @ 2002-08-19 13:03 UTC (permalink / raw)
To: Jeff Epler; +Cc: Devang Patel, dberlin, gcc

On Monday, Aug 19, 2002, at 12:54 US/Pacific, Jeff Epler wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-19  2:41       ` Michael Matz
  2002-08-19  6:26         ` jepler
@ 2002-08-19 11:53         ` Devang Patel
  1 sibling, 0 replies; 173+ messages in thread
From: Devang Patel @ 2002-08-19 11:53 UTC (permalink / raw)
To: Michael Matz; +Cc: Timothy J. Wood, dberlin, jepler, Kai Henningsen, gcc

On Monday, August 19, 2002, at 02:21 AM, Michael Matz wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-17 20:14       ` Timothy J. Wood
  2002-08-17 20:21         ` Daniel Berlin
@ 2002-08-19 11:59         ` Devang Patel
  1 sibling, 0 replies; 173+ messages in thread
From: Devang Patel @ 2002-08-19 11:59 UTC (permalink / raw)
To: Timothy J. Wood; +Cc: dberlin, Mike Stump, gcc

On Saturday, August 17, 2002, at 08:14 PM, Timothy J. Wood wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-17 15:31       ` Timothy J. Wood
  2002-08-17 20:04         ` Daniel Berlin
@ 2002-08-17 20:15         ` Daniel Berlin
  2002-08-19  7:07           ` Stan Shebs
  2 siblings, 0 replies; 173+ messages in thread
From: Daniel Berlin @ 2002-08-17 20:15 UTC (permalink / raw)
To: Timothy J. Wood; +Cc: Devang Patel, Mike Stump, gcc

On Sat, 17 Aug 2002, Timothy J. Wood wrote:

> So, another problem with PFE that I've noticed after working with it
> for a while...
>
> If you put all your commonly used headers in a PFE, then changing any
> of these headers causes the PFE header to be considered changed.  And,
> since this header is imported into every single file in your project,
> you end up in a situation where changing any header causes the entire
> project to be rebuilt.  This is clearly not good for day to day
> development.
>
> A PCH approach that was automatic and didn't have a single monolithic
> file would avoid the artificial tying together of all the headers in
> the world and would thus lead to faster incremental builds due to fewer
> files being rebuilt.
>
> Another approach that would work with a monolithic file would be some
> sort of fact database that would allow the build system to decide early
> on that the change in question didn't affect some subset of files.

Also, while constructive criticism is good and all, at some point it
becomes "put up or shut up".  It's one thing to say how great something
would be, another thing to implement it.

We have heard your idea, we know how to implement it.  Everyone is aware
of it.  At this point, I'd rather you tell me how good it is when you've
got code to do it, rather than keep pointing out what you perceive to be
flaws in something that is a large improvement over what exists now.

One of the things that slows down gcc development is criticism of
patches that are large improvements over what exists now, in favor of
some "better" approach, which nobody has yet implemented.  Then this
large improvement never gets accepted, and nobody ever implements the
"better approach".

The perfect is the enemy of the good.

--Dan

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-17 15:31       ` Timothy J. Wood
  2002-08-17 20:04         ` Daniel Berlin
  2002-08-17 20:15         ` Daniel Berlin
@ 2002-08-19  7:07         ` Stan Shebs
  2002-08-19  8:52           ` Timothy J. Wood
  2 siblings, 1 reply; 173+ messages in thread
From: Stan Shebs @ 2002-08-19 7:07 UTC (permalink / raw)
To: Timothy J. Wood; +Cc: Devang Patel, Mike Stump, gcc

Timothy J. Wood wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-19  7:07       ` Stan Shebs
@ 2002-08-19  8:52         ` Timothy J. Wood
  0 siblings, 0 replies; 173+ messages in thread
From: Timothy J. Wood @ 2002-08-19 8:52 UTC (permalink / raw)
To: Stan Shebs; +Cc: Devang Patel, Mike Stump, gcc

On Monday, August 19, 2002, at 07:05 AM, Stan Shebs wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-16 13:54       ` Devang Patel
  2002-08-16 14:42         ` Neil Booth
@ 2002-08-16 14:45         ` Timothy J. Wood
  1 sibling, 0 replies; 173+ messages in thread
From: Timothy J. Wood @ 2002-08-16 14:45 UTC (permalink / raw)
To: Devang Patel; +Cc: Mike Stump, gcc

On Friday, August 16, 2002, at 01:54 PM, Devang Patel wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Faster compilation speed
  2002-08-09 12:17 Faster compilation speed Mike Stump
  ` (4 preceding siblings ...)
  2002-08-09 14:59 ` Timothy J. Wood
@ 2002-08-09 16:01 ` Richard Henderson
  2002-08-10 17:48 ` Aaron Lehmann
  6 siblings, 0 replies; 173+ messages in thread
From: Richard Henderson @ 2002-08-09 16:01 UTC (permalink / raw)
To: Mike Stump; +Cc: gcc

On Fri, Aug 09, 2002 at 12:17:32PM -0700, Mike Stump wrote:
> Another question is, what should the lower limit be on uglifying code
> for the sake of compilation speed.

You'll find that really ugly code actually compiles slower than code
that has been optimized somewhat, simply because the optimized
compilation emits less assembly and therefore does less I/O.

As for not re-using temp slots, sure, I guess that's something we can
do at -O0.  I don't see a need for the new command-line switch, though.

r~

^ permalink raw reply	[flat|nested] 173+ messages in thread
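[Editorial illustration, not part of the thread: a hedged sketch of what
the temp-slot reuse being discussed affects.  The struct and function
names below are made up; the point is only that each call returning an
aggregate needs a stack temporary, and skipping the reuse/combine logic
plausibly leaves one slot per call instead of one shared slot -- a fatter
frame at -O0 in exchange for less bookkeeping at compile time.]

/* Not GCC source -- illustration only. */
struct big { char buf[256]; };
struct big make(int);
int use(struct big);

int f(void)
{
  int s = 0;
  s += use(make(1));	/* needs a temporary for the returned struct */
  s += use(make(2));	/* with reuse: the same slot; without: a second slot */
  s += use(make(3));	/* with reuse: the same slot; without: a third slot */
  return s;
}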
* Re: Faster compilation speed
  2002-08-09 12:17 Faster compilation speed Mike Stump
  ` (5 preceding siblings ...)
  2002-08-09 16:01 ` Faster compilation speed Richard Henderson
@ 2002-08-10 17:48 ` Aaron Lehmann
  2002-08-12 10:36 ` Dale Johannesen
  6 siblings, 1 reply; 173+ messages in thread
From: Aaron Lehmann @ 2002-08-10 17:48 UTC (permalink / raw)
To: Mike Stump; +Cc: gcc

On Fri, Aug 09, 2002 at 12:17:32PM -0700, Mike Stump wrote:
> I'd like to introduce lots of various changes to improve compiler
> speed.

Just adding my two cents to the discussion: I saw many promising ideas
presented in this thread, but one thing I didn't see mentioned is gcc's
extensive sanity checking.  There are many tests that will produce an
internal compiler error when merited.  This is a great tool for
debugging, but most of these errors should be impossible to reach.
Does anyone know how much overhead this sanity checking causes in
general, and whether there are any sanity checks that are unusually
expensive and should be considered for removal?

^ permalink raw reply	[flat|nested] 173+ messages in thread
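[Editorial note, not part of the thread: much of this checking is already
conditional on GCC's configure-time --enable-checking machinery, so it
can in principle be compiled out of release builds.  Below is a hedged
sketch of the general pattern only; the macro name and message are
illustrative, not GCC's actual ones.]

#include <stdio.h>
#include <stdlib.h>

#ifdef ENABLE_CHECKING
/* Checking enabled: each use costs a test and a normally-untaken branch. */
#define SANITY_CHECK(cond)						\
  do {									\
    if (!(cond))							\
      {									\
	fprintf (stderr, "internal error: %s failed at %s:%d\n",	\
		 #cond, __FILE__, __LINE__);				\
	abort ();							\
      }									\
  } while (0)
#else
/* Checking disabled: the check compiles away entirely. */
#define SANITY_CHECK(cond) ((void) 0)
#endif

[Measuring the overhead is then a matter of timing a bootstrap with the
checks compiled in and out; individually they are cheap, but the ones
sitting inside hot tree- and RTL-walking loops are the likely candidates
Aaron is asking about.]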
* Re: Faster compilation speed
  2002-08-10 17:48       ` Aaron Lehmann
@ 2002-08-12 10:36         ` Dale Johannesen
  0 siblings, 0 replies; 173+ messages in thread
From: Dale Johannesen @ 2002-08-12 10:36 UTC (permalink / raw)
To: Aaron Lehmann; +Cc: Dale Johannesen, Mike Stump, gcc

On Saturday, August 10, 2002, at 05:48 PM, Aaron Lehmann wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread