* Faster compilation speed @ 2002-08-09 12:17 Mike Stump 2002-08-09 13:04 ` Noel Yap ` (6 more replies) 0 siblings, 7 replies; 173+ messages in thread From: Mike Stump @ 2002-08-09 12:17 UTC (permalink / raw) To: gcc

I'd like to introduce lots of various changes to improve compiler speed. I thought I should send out an email and see if others think this would be good to have in the tree. Also, if it is, I'd like to solicit any ideas others have for me to pursue. I'd be happy to do all the hard work, if you come up with the ideas! The target is to be 6x faster.

The first realization I came to is that the only existing control for such things is -O[123], and having thought about it, I think it would be best to retain and use those flags. For minimal user impact, I think it would be good to not perturb existing users of -O[0123] too much, or at least, not at first. If we wanted to change them, I think -O0 should be the `fast' version, -O1 should be what -O0 does now with some additions around the edges, and -O2 and -O3 also slide over (at least one). What do you think: slide them all over one or more, or just make -O0 do less, or...? Maybe we have a -O0.0 to mean compile very quickly?

Another question would be how many knobs we should have. At first, I am inclined to say just one. If we want, we can later break them out into more choices. I am mainly interested in a single knob at this point.

Another question is, what should the lower limit be on uglifying code for the sake of compilation speed.

Below are some concrete ideas so others can get a feel for the types of changes, and to comment on the flag and how it is used. While I give a specific example, I'm more interested in the upper-level comments than in discussion of not combining temp slots.

The use of a preprocessor macro allows us to replace it with 0 or 1, should we want to obtain a compiler that is unconditionally faster, or one that doesn't have any extra code in it.

This change yields a 0.9% speed improvement when compiling expr.c. Not much, but if the compiler were 6x faster, this would be a 5.5% change in compilation speed. The resulting code is worse, but not by much.

So, let the discussion begin...

Doing diffs in flags.h.~1~:
*** flags.h.~1~	Fri Aug  9 10:17:36 2002
--- flags.h	Fri Aug  9 10:37:58 2002
*************** extern int flag_signaling_nans;
*** 696,699 ****
--- 696,705 ----
  #define HONOR_SIGN_DEPENDENT_ROUNDING(MODE)  \
    (MODE_HAS_SIGN_DEPENDENT_ROUNDING (MODE) && !flag_unsafe_math_optimizations)
  
+ /* Nonzero for compiling as fast as we can.  */
+ 
+ extern int flag_speed_compile;
+ 
+ #define SPEEDCOMPILE flag_speed_compile
+ 
  #endif /* ! GCC_FLAGS_H */
--------------
Doing diffs in function.c.~1~:
*** function.c.~1~	Fri Aug  9 10:17:36 2002
--- function.c	Fri Aug  9 10:37:58 2002
*************** free_temp_slots ()
*** 1198,1203 ****
--- 1198,1206 ----
  {
    struct temp_slot *p;
  
+   if (SPEEDCOMPILE)
+     return;
+ 
    for (p = temp_slots; p; p = p->next)
      if (p->in_use && p->level == temp_slot_level
  	&& ! p->keep && p->rtl_expr == 0)
*************** free_temps_for_rtl_expr (t)
*** 1214,1219 ****
--- 1217,1225 ----
  {
    struct temp_slot *p;
  
+   if (SPEEDCOMPILE)
+     return;
+ 
    for (p = temp_slots; p; p = p->next)
      if (p->rtl_expr == t)
        {
*************** pop_temp_slots ()
*** 1301,1311 ****
  {
    struct temp_slot *p;
  
!   for (p = temp_slots; p; p = p->next)
!     if (p->in_use && p->level == temp_slot_level && p->rtl_expr == 0)
!       p->in_use = 0;
  
!   combine_temp_slots ();
  
    temp_slot_level--;
  }
--- 1307,1320 ----
  {
    struct temp_slot *p;
  
!   if (! SPEEDCOMPILE)
!     {
!       for (p = temp_slots; p; p = p->next)
!         if (p->in_use && p->level == temp_slot_level && p->rtl_expr == 0)
!           p->in_use = 0;
  
!       combine_temp_slots ();
!     }
  
    temp_slot_level--;
  }
--------------
Doing diffs in toplev.c.~1~:
*** toplev.c.~1~	Fri Aug  9 10:17:40 2002
--- toplev.c	Fri Aug  9 11:31:50 2002
*************** int flag_new_regalloc = 0;
*** 894,899 ****
--- 894,903 ----
  
  int flag_tracer = 0;
  
+ /* If nonzero, speed-up the compile as fast as we can.  */
+ 
+ int flag_speed_compile = 0;
+ 
  /* Values of the -falign-* flags: how much to align labels in code.
     0 means `use default', 1 means `don't align'.
     For each variable, there is an _log variant which is the power
*************** display_help ()
*** 3679,3684 ****
--- 3683,3689 ----
  
    printf (_("  -O[number]              Set optimization level to [number]\n"));
    printf (_("  -Os                     Optimize for space rather than speed\n"));
+   printf (_("  -Of                     Compile as fast as possible\n"));
    for (i = LAST_PARAM; i--;)
      {
        const char *description = compiler_params[i].help;
*************** parse_options_and_default_flags (argc, a
*** 4772,4777 ****
--- 4777,4786 ----
  	      /* Optimizing for size forces optimize to be 2.  */
  	      optimize = 2;
  	    }
+ 	  else if ((p[0] == 'f') && (p[1] == 0))
+ 	    {
+ 	      flag_speed_compile = 1;
+ 	    }
+ 	  else
  	    {
  	      const int optimize_val = read_integral_parameter (p, p - 2, -1);
--------------
^ permalink raw reply [flat|nested] 173+ messages in thread
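The point of routing the test through a macro rather than testing the flag variable directly is that SPEEDCOMPILE can later be hard-wired to 0 or 1, so the guard either disappears or takes over unconditionally. A minimal sketch of that pattern (not GCC's actual sources; the FORCE_* names are invented for illustration):

    /* With the default, the guard is a runtime test on flag_speed_compile;
       defining SPEEDCOMPILE to a constant lets the compiler drop either the
       guard or the skipped work entirely.  */
    #if defined FORCE_FAST
    # define SPEEDCOMPILE 1           /* unconditionally fast compiler */
    #elif defined FORCE_THOROUGH
    # define SPEEDCOMPILE 0           /* no speed-hack code left at all */
    #else
    int flag_speed_compile;           /* set by -Of; extern in flags.h in the patch */
    # define SPEEDCOMPILE flag_speed_compile
    #endif

    void
    free_temp_slots_sketch (void)
    {
      if (SPEEDCOMPILE)
        return;                       /* skip the temp-slot bookkeeping */

      /* ... the usual scan over the temp slots would go here ... */
    }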
* Re: Faster compilation speed 2002-08-09 12:17 Faster compilation speed Mike Stump @ 2002-08-09 13:04 ` Noel Yap 2002-08-09 13:10 ` Matt Austern ` (3 more replies) 2002-08-09 13:10 ` Aldy Hernandez ` (5 subsequent siblings) 6 siblings, 4 replies; 173+ messages in thread From: Noel Yap @ 2002-08-09 13:04 UTC (permalink / raw) To: Mike Stump, gcc Build speeds are most helped by minimizing the number of files opened and closed during the build. I think a good start would be to have preprocessed header files. My idea would be to add options to cpp that would have it produce preprocessed files. Doing so would allow it to be easily integrated into a build system like "make". At first, I think all that's really needed is a cpp option, say --preprocess-includes, that just goes through and preprocesses the #include directives (eg it doesn't preprocess #define's, #if's, ...). Conceivably, this would also require some other option, possibly --preprocessed-header-file-path, so that it can recognize when to use existing preprocessed header files. MTC, Noel --- Mike Stump <mrs@apple.com> wrote: > I'd like to introduce lots of various changes to > improve compiler > speed. I thought I should send out an email and see > if others think > this would be good to have in the tree. Also, if it > is, I'd like to > solicit any ideas others have for me to pursue. I'd > be happy to do all > the hard work, if you come up with the ideas! The > target is to be 6x > faster. > > The first realization I came to is that the only > existing control for > such things is -O[123], and having thought about it, > I think it would > be best to retain and use those flags. For minimal > user impact, I > think it would be good to not perturb existing users > of -O[0123] too > much, or at leaast, not at first. If we wanted to > change them, I think > -O0 should be the `fast' version, -O1 should be what > -O0 does now with > some additions around the edges, and -O2 and -O3 > also slide over (at > least one). What do you think, slide them all over > one or more, or > just make -O0 do less, or...? Maybe we have a -O0.0 > to mean compile > very quickly? > > Another question would be how many knobs should we > have? At first, I > am inclined to say just one. If we want, we can > later break them out > into more choices. I am mainly interested in a > single knob at this > point. > > Another question is, what should the lower limit be > on uglifying code > for the sake of compilation speed. > > Below are some concrete ideas so others can get a > feel for the types of > changes, and to comment on the flag and how it is > used. > While I give a specific example, I'm more interested > in the upper level > comments, than discussion of not combining temp > slots. > > The use of a macro preprocessor symbol allows us to > replace it with 0 > or 1, should we want to obtain a compiler that is > unconditionally > faster, or one that doesn't have any extra code in > it. > > This change yields a 0.9% speed improvement when > compiling expr.c. Not > much, but if the compiler were 6x faster, this would > be 5.5% change in > compilation speed. The resulting code is worse, but > not by much. > > So, let the discussion begin... 
> > > Doing diffs in flags.h.~1~: > *** flags.h.~1~ Fri Aug 9 10:17:36 2002 > --- flags.h Fri Aug 9 10:37:58 2002 > *************** extern int flag_signaling_nans; > *** 696,699 **** > --- 696,705 ---- > #define HONOR_SIGN_DEPENDENT_ROUNDING(MODE) \ > (MODE_HAS_SIGN_DEPENDENT_ROUNDING (MODE) && > !flag_unsafe_math_optimizations) > > + /* Nonzero for compiling as fast as we can. */ > + > + extern int flag_speed_compile; > + > + #define SPEEDCOMPILE flag_speed_compile > + > #endif /* ! GCC_FLAGS_H */ > -------------- > Doing diffs in function.c.~1~: > *** function.c.~1~ Fri Aug 9 10:17:36 2002 > --- function.c Fri Aug 9 10:37:58 2002 > *************** free_temp_slots () > *** 1198,1203 **** > --- 1198,1206 ---- > { > struct temp_slot *p; > > + if (SPEEDCOMPILE) > + return; > + > for (p = temp_slots; p; p = p->next) > if (p->in_use && p->level == temp_slot_level > && ! p->keep > && p->rtl_expr == 0) > *************** free_temps_for_rtl_expr (t) > *** 1214,1219 **** > --- 1217,1225 ---- > { > struct temp_slot *p; > > + if (SPEEDCOMPILE) > + return; > + > for (p = temp_slots; p; p = p->next) > if (p->rtl_expr == t) > { > *************** pop_temp_slots () > *** 1301,1311 **** > { > struct temp_slot *p; > > ! for (p = temp_slots; p; p = p->next) > ! if (p->in_use && p->level == temp_slot_level > && p->rtl_expr == 0) > ! p->in_use = 0; > > ! combine_temp_slots (); > > temp_slot_level--; > } > --- 1307,1320 ---- > { > struct temp_slot *p; > > ! if (! SPEEDCOMPILE) > ! { > ! for (p = temp_slots; p; p = p->next) > ! if (p->in_use && p->level == temp_slot_level > && p->rtl_expr == > 0) > ! p->in_use = 0; > > ! combine_temp_slots (); > ! } > > temp_slot_level--; > } > -------------- > Doing diffs in toplev.c.~1~: > *** toplev.c.~1~ Fri Aug 9 10:17:40 2002 > --- toplev.c Fri Aug 9 11:31:50 2002 > *************** int flag_new_regalloc = 0; > *** 894,899 **** > --- 894,903 ---- > > int flag_tracer = 0; > > + /* If nonzero, speed-up the compile as fast as we > can. */ > + > + int flag_speed_compile = 0; > + > /* Values of the -falign-* flags: how much to > align labels in code. > 0 means `use default', 1 means `don't align'. > For each variable, there is an _log variant > which is the power > *************** display_help () > *** 3679,3684 **** > --- 3683,3689 ---- > > printf (_(" -O[number] Set > optimization level to > [number]\n")); > printf (_(" -Os Optimize > for space rather than > speed\n")); > + printf (_(" -Of Compile as > fast as > possible\n")); > for (i = LAST_PARAM; i--;) > { > const char *description = > compiler_params[i].help; > *************** parse_options_and_default_flags > (argc, a > *** 4772,4777 **** > --- 4777,4786 ---- > /* Optimizing for size forces > optimize to be 2. */ > optimize = 2; > } > + else if ((p[0] == 'f') && (p[1] == 0)) > + { > + flag_speed_compile = 1; > + } > else > { > const int optimize_val = > read_integral_parameter (p, p - > 2, -1); > -------------- > __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
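To make the proposal concrete, here is roughly what the output of such an include-only pass might look like on a toy translation unit (the option and the header names are hypothetical; no such cpp option exists today):

    /* Before the pass, main.c reads:
           #include "config.h"     -- suppose it only #defines HAVE_FOO
           #include "api.h"        -- suppose it #includes "config.h" itself
           #if HAVE_FOO
           int use_foo (void);
           #endif
       After the hypothetical --preprocess-includes pass, the #include lines
       have been spliced in textually while #define/#if are left for the
       normal compile; note config.h already shows up twice, once directly
       and once via api.h: */
    #define HAVE_FOO 1             /* body of config.h */
    #define HAVE_FOO 1             /* body of config.h again, via api.h */
    int api_call (void);           /* body of api.h */
    #if HAVE_FOO
    int use_foo (void);
    #endif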
* Re: Faster compilation speed 2002-08-09 13:04 ` Noel Yap @ 2002-08-09 13:10 ` Matt Austern 2002-08-09 14:22 ` Neil Booth ` (2 subsequent siblings) 3 siblings, 0 replies; 173+ messages in thread From: Matt Austern @ 2002-08-09 13:10 UTC (permalink / raw) To: Noel Yap; +Cc: Mike Stump, gcc On Friday, August 9, 2002, at 01:04 PM, Noel Yap wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 13:04 ` Noel Yap 2002-08-09 13:10 ` Matt Austern @ 2002-08-09 14:22 ` Neil Booth 2002-08-09 14:44 ` Noel Yap 2002-08-09 15:13 ` Stan Shebs 2002-08-09 18:57 ` Linus Torvalds 3 siblings, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-09 14:22 UTC (permalink / raw) To: Noel Yap; +Cc: Mike Stump, gcc Noel Yap wrote:- > At first, I think all that's really needed is a cpp > option, say --preprocess-includes, that just goes > through and preprocesses the #include directives (eg > it doesn't preprocess #define's, #if's, ...). Heh, if only life were this easy. If you actually think about what CPP does, you'd realize this is a no-go. Two immediate issues: 1) #include can take a macro as argument 2) #include can appear in preprocessor conditional blocks. You only know whether they are processed if you know the correct value of the #if. This often depends on macro expansions, and correct processing of prior includes. Of course, #defines appear in conditional blocks too, so this is kind of important to get right. There are no easy shortcuts here: to preprocess something properly, you have to do *everything* the preprocessor does "normally". There are no shortcuts, not even trivial ones. We *do* do too many stats and opens though; when I get time I'll post my ideas about this. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
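Concretely, the two cases Neil lists look like this (header names invented for illustration):

    /* Case 1: the include target is computed by macro expansion, so the
       file being included cannot be known without full macro processing.  */
    #ifdef __linux__
    # define OS_HEADER "os-linux.h"
    #else
    # define OS_HEADER "os-generic.h"
    #endif
    #include OS_HEADER

    /* Case 2: whether an #include happens at all depends on an #if whose
       value comes from macros set up by earlier includes.  */
    #include "config.h"            /* may or may not define HAVE_THREADS */
    #if HAVE_THREADS
    # include "threads.h"
    #endif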
* Re: Faster compilation speed 2002-08-09 14:22 ` Neil Booth @ 2002-08-09 14:44 ` Noel Yap 2002-08-09 15:14 ` Neil Booth 0 siblings, 1 reply; 173+ messages in thread From: Noel Yap @ 2002-08-09 14:44 UTC (permalink / raw) To: Neil Booth; +Cc: Mike Stump, gcc --- Neil Booth <neil@daikokuya.co.uk> wrote: > Heh, if only life were this easy. If you actually > think about what CPP > does, you'd realize this is a no-go. Two immediate > issues: > > 1) #include can take a macro as argument Yes, what I suggest certainly won't work for this situation. OTOH, how many times is this really used? Would it be such a sin to say that one cannot do the preprocessing I suggested if one has macros for #include arguments? > 2) #include can appear in preprocessor conditional > blocks. You > only know whether they are processed if you know > the correct value > of the #if. This often depends on macro > expansions, and correct > processing of prior includes. Of course, > #defines appear in > conditional blocks too, so this is kind of > important to get right. I don't see this as too big a problem. Just output a file like: #if COND /* contents of header file #endif In fact, doing it this way has the advantage that several builds, not necessarily agreeing on the value of COND, can use the file. > There are no easy shortcuts here: to preprocess > something properly, > you have to do *everything* the preprocessor does > "normally". There > are no shortcuts, not even trivial ones. I think one needn't preprocess everything perfectly in order to gain significant advantages. Would you say that what I suggest is better than what we have now? If an ideal solution is being worked on, I'd opt for that. OTOH, I think this solution has been in the works for at least a couple of years now. I think the --preprocess-includes option should be very simple to do. > We *do* do too many stats and opens though; when I > get time I'll post > my ideas about this. I'm sure my ideas are far from ideal so I'm looking forward to yours. Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 14:44 ` Noel Yap @ 2002-08-09 15:14 ` Neil Booth 2002-08-10 15:54 ` Noel Yap 0 siblings, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-09 15:14 UTC (permalink / raw) To: Noel Yap; +Cc: Mike Stump, gcc Noel Yap wrote:- > I don't see this as too big a problem. Just output a > file like: > #if COND > /* contents of header file > #endif > > In fact, doing it this way has the advantage that > several builds, not necessarily agreeing on the value > of COND, can use the file. Hmm, and what about header guards? Infinite recursion? > I think one needn't preprocess everything perfectly in > order to gain significant advantages. Would you say > that what I suggest is better than what we have now? Correctness is paramount; if it's not correct it's no good. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:14 ` Neil Booth @ 2002-08-10 15:54 ` Noel Yap 0 siblings, 0 replies; 173+ messages in thread From: Noel Yap @ 2002-08-10 15:54 UTC (permalink / raw) To: Neil Booth; +Cc: Mike Stump, gcc --- Neil Booth <neil@daikokuya.co.uk> wrote: > Noel Yap wrote:- > > > I don't see this as too big a problem. Just > output a > > file like: > > #if COND > > /* contents of header file > > #endif > > > > In fact, doing it this way has the advantage that > > several builds, not necessarily agreeing on the > value > > of COND, can use the file. > > Hmm, and what about header guards? Infinite > recursion? Unless I'm missing something, header guards by themselves shouldn't pose a problem. You're right. Cyclic dependencies would throw this whole thing out of whack. OTOH, I think such practice needs to be avoided anyhow. Another case related to recursive includes is where each level of recursion would have side effects (eg redefining a macro whose value is used in the next recursion). Again, I've heard this usage only once and even the creator of such a header file said it was a tremendous hack for programmers with no proper education in programming (IIRC, they were physicists). > > I think one needn't preprocess everything > perfectly in > > order to gain significant advantages. Would you > say > > that what I suggest is better than what we have > now? > > Correctness is paramount; if it's not correct it's > no > good. I apologize if my post was misunderstood. What I meant to say was, if it's able to preprocess, then allow it, otherwise, don't. IOW, those already following common practices can take advantage of a new feature, those that don't have what they have now. I can certainly understand the ideals of keeping the tool and all its features pure and working for all possible uses. OTOH, doing so may prevent practicle avenues that possibly 99% of users can benefit from. Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
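The "include with side effects" case Noel mentions is essentially the X-macro idiom: the same unguarded header is deliberately included several times with a macro redefined between inclusions. A hedged sketch (file name and opcodes invented; GCC's own *.def files use the same trick):

    /* opcodes.def deliberately has no include guard; it is meant to be
       re-included with a different definition of OPCODE each time.  Its
       entire contents:

           OPCODE (ADD)
           OPCODE (SUB)
           OPCODE (MUL)
    */

    /* First inclusion: build an enum.  */
    #define OPCODE(name) OP_##name,
    enum opcode
    {
    #include "opcodes.def"
      OP_LAST
    };
    #undef OPCODE

    /* Second inclusion: build a matching name table.  */
    #define OPCODE(name) #name,
    static const char *const opcode_names[] =
    {
    #include "opcodes.def"
      "LAST"
    };
    #undef OPCODE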
* Re: Faster compilation speed 2002-08-09 13:04 ` Noel Yap 2002-08-09 13:10 ` Matt Austern 2002-08-09 14:22 ` Neil Booth @ 2002-08-09 15:13 ` Stan Shebs 2002-08-09 15:18 ` Neil Booth ` (2 more replies) 2002-08-09 18:57 ` Linus Torvalds 3 siblings, 3 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 15:13 UTC (permalink / raw) To: Noel Yap; +Cc: Mike Stump, gcc Noel Yap wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:13 ` Stan Shebs @ 2002-08-09 15:18 ` Neil Booth 2002-08-10 16:12 ` Noel Yap 2002-08-09 15:19 ` Ziemowit Laski 2002-08-10 16:07 ` Noel Yap 2 siblings, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-09 15:18 UTC (permalink / raw) To: Stan Shebs; +Cc: Noel Yap, Mike Stump, gcc Stan Shebs wrote:- > Is this assertion based on empirical measurement, and if so, for what > source code and what system? For instance, the longest source file > in GCC is about 15K lines, and at -O2, only a small percentage of > time is spent messing with files. If I use -save-temps on cp/decl.c on > one of my (Linux) machines, I get a total time of about 38 sec from > source to asm. If I just compile decl.i, it's about 37 sec, so that's > 1 sec for *all* preprocessing, including all file opening/closing. Yes, it's very rare that preprocessing is more than 2% of -O2 time; it's often less than 1%. IMO that says more about the efficiency of the rest than of CPP. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:18 ` Neil Booth @ 2002-08-10 16:12 ` Noel Yap 2002-08-10 18:00 ` Nix 0 siblings, 1 reply; 173+ messages in thread From: Noel Yap @ 2002-08-10 16:12 UTC (permalink / raw) To: Neil Booth, Stan Shebs; +Cc: Noel Yap, Mike Stump, gcc --- Neil Booth <neil@daikokuya.co.uk> wrote: > Stan Shebs wrote:- > > > Is this assertion based on empirical measurement, > and if so, for what > > source code and what system? For instance, the > longest source file > > in GCC is about 15K lines, and at -O2, only a > small percentage of > > time is spent messing with files. If I use > -save-temps on cp/decl.c on > > one of my (Linux) machines, I get a total time of > about 38 sec from > > source to asm. If I just compile decl.i, it's > about 37 sec, so that's > > 1 sec for *all* preprocessing, including all file > opening/closing. > > Yes, it's very rare that preprocessing is more than > 2% of -O2 time; > it's often less than 1%. IMO that says more about > the efficiency > of the rest than of CPP. I would agree if you're talking about complete builds spanning only a few C/C++ files. OTOH, when builds span many hundreds of these files, build-time (not just compile-time) starts getting bogged down on (mostly) reopening and repreprocessing the same files over and over. Within our system, builds on Windows are magnitudes faster since we're able to take advantage of precompiled headers. AFAIK, I legitimate study was made studying whether to use this feature or not. Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 16:12 ` Noel Yap @ 2002-08-10 18:00 ` Nix 2002-08-10 20:36 ` Noel Yap 2002-08-12 15:08 ` Mike Stump 0 siblings, 2 replies; 173+ messages in thread From: Nix @ 2002-08-10 18:00 UTC (permalink / raw) To: Noel Yap; +Cc: Neil Booth, gcc [Cc: list trimmed] On Sat, 10 Aug 2002, Noel Yap spake: > I would agree if you're talking about complete builds > spanning only a few C/C++ files. OTOH, when builds > span many hundreds of these files, build-time (not > just compile-time) starts getting bogged down on > (mostly) reopening and repreprocessing the same files > over and over. > > Within our system, builds on Windows are magnitudes > faster since we're able to take advantage of > precompiled headers. Are you sure that this isn't because GCC is having to parse the headers over and over again, while the precompiled system can avoid that overhead? Especially for C++ header files (which tend to be large, complex, interdependent, and include a lot of code), the parsing and compilation time *vastly* dominates the preprocessing time. Example, with GCC-3.1, with a `hello world' iostreams-using program... The code: #include <iostream> int main (void) { std::cout << "Hello world"; return 0; } Time spent preprocessing (distorted by the slowness of cpp's output routines): nix@loki 62 /tmp% time c++ -E -ftime-report hello.C >/dev/null real 0m1.424s user 0m0.710s sys 0m0.100s Time spent preprocessing and parsing (roughly; cpp's output routines are still slow; on the trunk much less time will be spent preprocessing because the integrated preprocessor doesn't have to do any output at all there, instead feeding a token stream to the rest of the compiler): nix@loki 60 /tmp% c++ -ftime-report -fsyntax-only hello.C Execution times (seconds) garbage collection : 1.16 (12%) usr 0.08 ( 6%) sys 2.19 (13%) wall preprocessing : 1.04 (11%) usr 0.29 (20%) sys 2.10 (12%) wall lexical analysis : 0.99 (10%) usr 0.28 (20%) sys 1.87 (11%) wall parser : 6.12 (65%) usr 0.75 (53%) sys 10.85 (63%) wall varconst : 0.08 ( 1%) usr 0.00 ( 0%) sys 0.10 ( 1%) wall TOTAL : 9.44 1.42 17.21 (oddly, preprocessing took *longer* than it did using -E, which I'd not expected; but, still parsing vastly dominates preprocessing, and this isn't going near e.g. 
the STL headers) Complete run, with optimization: nix@loki 66 /tmp% c++ -O2 -ftime-report -o hello hello.C Execution times (seconds) garbage collection : 1.10 (11%) usr 0.11 ( 9%) sys 1.74 (11%) wall cfg cleanup : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall life analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall preprocessing : 1.12 (11%) usr 0.22 (18%) sys 2.04 (13%) wall lexical analysis : 0.98 (10%) usr 0.22 (18%) sys 1.93 (12%) wall parser : 6.46 (65%) usr 0.63 (53%) sys 9.98 (62%) wall expand : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall varconst : 0.08 ( 1%) usr 0.00 ( 0%) sys 0.12 ( 1%) wall CSE : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall CSE 2 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall regmove : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall global alloc : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall flow 2 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall rename registers : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall scheduling 2 : 0.00 ( 0%) usr 0.01 ( 1%) sys 0.02 ( 0%) wall final : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall TOTAL : 9.96 1.20 16.16 Now obviously with a less toy example the time consumed optimizing would rise; but that doesn't affect my point, that the lion's share of time spent in C++ header files is parsing time, and that speeding up the preprocessor will have limited effect now (thanks to Zack and Neil speeding it up so much already :) ). -- `There's something satisfying about killing JWZ over and over again.' -- 1i, personal communication ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 18:00 ` Nix @ 2002-08-10 20:36 ` Noel Yap 2002-08-11 4:30 ` Nix 2002-08-12 15:08 ` Mike Stump 1 sibling, 1 reply; 173+ messages in thread From: Noel Yap @ 2002-08-10 20:36 UTC (permalink / raw) To: Nix; +Cc: Neil Booth, gcc --- Nix <nix@esperi.demon.co.uk> wrote: > [Cc: list trimmed] > On Sat, 10 Aug 2002, Noel Yap spake: > > I would agree if you're talking about complete > builds > > spanning only a few C/C++ files. OTOH, when > builds > > span many hundreds of these files, build-time (not > > just compile-time) starts getting bogged down on > > (mostly) reopening and repreprocessing the same > files > > over and over. > > > > Within our system, builds on Windows are > magnitudes > > faster since we're able to take advantage of > > precompiled headers. > > Are you sure that this isn't because GCC is having > to parse the headers > over and over again, while the precompiled system > can avoid that > overhead? No, I'm not sure. In any case, whether it's due to elimination of reparsing or elimination of reopening, would you agree that precompiled headers should speed up builds? > Especially for C++ header files (which tend to be > large, complex, > interdependent, and include a lot of code), the > parsing and compilation > time *vastly* dominates the preprocessing time. What about for us lowly C programmers? > Example, with GCC-3.1, with a `hello world' > iostreams-using program... > > The code: > > #include <iostream> > > int main (void) > { > std::cout << "Hello world"; > return 0; > } > > Time spent preprocessing (distorted by the slowness > of cpp's output > routines): > > nix@loki 62 /tmp% time c++ -E -ftime-report hello.C > >/dev/null > > real 0m1.424s > user 0m0.710s > sys 0m0.100s > > Time spent preprocessing and parsing (roughly; cpp's > output routines are > still slow; on the trunk much less time will be > spent preprocessing > because the integrated preprocessor doesn't have to > do any output at all > there, instead feeding a token stream to the rest of > the compiler): > > nix@loki 60 /tmp% c++ -ftime-report -fsyntax-only > hello.C > > Execution times (seconds) > garbage collection : 1.16 (12%) usr 0.08 ( > 6%) sys 2.19 (13%) wall > preprocessing : 1.04 (11%) usr 0.29 > (20%) sys 2.10 (12%) wall > lexical analysis : 0.99 (10%) usr 0.28 > (20%) sys 1.87 (11%) wall > parser : 6.12 (65%) usr 0.75 > (53%) sys 10.85 (63%) wall > varconst : 0.08 ( 1%) usr 0.00 ( > 0%) sys 0.10 ( 1%) wall > TOTAL : 9.44 1.42 > 17.21 > > (oddly, preprocessing took *longer* than it did > using -E, which I'd not > expected; but, still parsing vastly dominates > preprocessing, and this isn't > going near e.g. the STL headers) OK. Now let's say that that preprocessing can be used across several compiles. Can you see how an entire _build_ (eg comprising of many compiles) can be sped up? 
> Complete run, with optimization: > > nix@loki 66 /tmp% c++ -O2 -ftime-report -o hello > hello.C > > Execution times (seconds) > garbage collection : 1.10 (11%) usr 0.11 ( > 9%) sys 1.74 (11%) wall > cfg cleanup : 0.01 ( 0%) usr 0.00 ( > 0%) sys 0.01 ( 0%) wall > life analysis : 0.02 ( 0%) usr 0.00 ( > 0%) sys 0.02 ( 0%) wall > preprocessing : 1.12 (11%) usr 0.22 > (18%) sys 2.04 (13%) wall > lexical analysis : 0.98 (10%) usr 0.22 > (18%) sys 1.93 (12%) wall > parser : 6.46 (65%) usr 0.63 > (53%) sys 9.98 (62%) wall > expand : 0.00 ( 0%) usr 0.00 ( > 0%) sys 0.01 ( 0%) wall > varconst : 0.08 ( 1%) usr 0.00 ( > 0%) sys 0.12 ( 1%) wall > CSE : 0.02 ( 0%) usr 0.00 ( > 0%) sys 0.03 ( 0%) wall > CSE 2 : 0.01 ( 0%) usr 0.00 ( > 0%) sys 0.03 ( 0%) wall > regmove : 0.01 ( 0%) usr 0.00 ( > 0%) sys 0.02 ( 0%) wall > global alloc : 0.02 ( 0%) usr 0.00 ( > 0%) sys 0.04 ( 0%) wall > flow 2 : 0.01 ( 0%) usr 0.00 ( > 0%) sys 0.01 ( 0%) wall > rename registers : 0.02 ( 0%) usr 0.00 ( > 0%) sys 0.03 ( 0%) wall > scheduling 2 : 0.00 ( 0%) usr 0.01 ( > 1%) sys 0.02 ( 0%) wall > final : 0.01 ( 0%) usr 0.00 ( > 0%) sys 0.01 ( 0%) wall > TOTAL : 9.96 1.20 > 16.16 > > Now obviously with a less toy example the time > consumed optimizing would > rise; but that doesn't affect my point, that the > lion's share of time > spent in C++ header files is parsing time, and that > speeding up the > preprocessor will have limited effect now (thanks to > Zack and Neil > speeding it up so much already :) ). What kind of effect does it have for C? Do you think saving preprocessor output (of header files) can speed up a build consisting of many, many compiles? Thanks, Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 20:36 ` Noel Yap @ 2002-08-11 4:30 ` Nix 0 siblings, 0 replies; 173+ messages in thread From: Nix @ 2002-08-11 4:30 UTC (permalink / raw) To: Noel Yap; +Cc: Neil Booth, gcc [rewrapped my quoted text] On Sat, 10 Aug 2002, Noel Yap stated: > --- Nix <nix@esperi.demon.co.uk> wrote: >> Are you sure that this isn't because GCC is having to parse the >> headers over and over again, while the precompiled system can avoid >> that overhead? > > No, I'm not sure. In any case, whether it's due to > elimination of reparsing or elimination of reopening, > would you agree that precompiled headers should speed > up builds? Yes, but mainly (IMHO) because the `precompilation' process includes some parsing work. The preprocessing job (compilation phases 1--4) should be quite fast. So speeding up *parsing* is the point here; getting rid of bison should help fix that :) (Maybe I'm being too pedantic here.) >> Especially for C++ header files (which tend to be large, complex, >> interdependent, and include a lot of code), the parsing and >> compilation time *vastly* dominates the preprocessing time. > > What about for us lowly C programmers? (oops, sorry, I thought you were using C++, because C++ users really *notice* time spent in headers.) The disparity there isn't anywhere near so extreme, but it's still there (just). I know that even with large bodies of C code I've never been able to spot preprocessing time; even the old cccp was damned-near instantaneous (well, except on very memory-constrained boxes where even ls(1) was a hassle). [snip] >> Now obviously with a less toy example the time consumed optimizing >> would rise; but that doesn't affect my point, that the lion's share >> of time spent in C++ header files is parsing time, and that speeding >> up the preprocessor will have limited effect now (thanks to Zack and >> Neil speeding it up so much already :) ). > > What kind of effect does it have for C? Do you think Hm... ... from my quick check (so primitive that I'm not even going to post it here) preprocessing and parsing seem to consume roughly equal amounts of time, and both are far exceeded by the amount of time taken to compile the code itself. So there's not much need for preprocessor optimization in C as far as I can tell. > saving preprocessor output (of header files) can speed > up a build consisting of many, many compiles? Preprocessor *output*? In its current state, the output phase is the slowest part of the preprocessor, such that feeding token streams straight into the compiler (as 3.3-to-be will) is faster than saving it out to disk would be :) And for C code in particular I imagine that the larger size of the precompiled header lumps would cause extra disk I/O time that would exceed the time taken to parse the headers in the first place... but this is a guess: some of the people who've actually been working on precompiled headers can probably answer this better :) -- `There's something satisfying about killing JWZ over and over again.' -- 1i, personal communication ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 18:00 ` Nix 2002-08-10 20:36 ` Noel Yap @ 2002-08-12 15:08 ` Mike Stump 1 sibling, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 15:08 UTC (permalink / raw) To: Nix; +Cc: Noel Yap, Neil Booth, gcc On Saturday, August 10, 2002, at 05:49 PM, Nix wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:13 ` Stan Shebs 2002-08-09 15:18 ` Neil Booth @ 2002-08-09 15:19 ` Ziemowit Laski 2002-08-09 15:25 ` Neil Booth 2002-08-10 16:16 ` Noel Yap 2002-08-10 16:07 ` Noel Yap 2 siblings, 2 replies; 173+ messages in thread From: Ziemowit Laski @ 2002-08-09 15:19 UTC (permalink / raw) To: Stan Shebs; +Cc: Ziemowit Laski, Noel Yap, Mike Stump, gcc On Friday, August 9, 2002, at 03:12 , Stan Shebs wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:19 ` Ziemowit Laski @ 2002-08-09 15:25 ` Neil Booth 2002-08-10 16:16 ` Noel Yap 1 sibling, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-09 15:25 UTC (permalink / raw) To: Ziemowit Laski; +Cc: Stan Shebs, Noel Yap, Mike Stump, gcc Ziemowit Laski wrote:- > >Is this assertion based on empirical measurement, and if so, for what > >source code and what system? For instance, the longest source file > >in GCC is about 15K lines, and at -O2, only a small percentage of > >time is spent messing with files. If I use -save-temps on cp/decl.c on > >one of my (Linux) machines, I get a total time of about 38 sec from > >source to asm. If I just compile decl.i, it's about 37 sec, so that's > >1 sec for *all* preprocessing, including all file opening/closing. > > Since the preprocessor is integrated, I don't think you can separate > the timings in this way. :( A 'gcc3 -E cp/decl.c -o decl.i' would > probably be more meaningful. It is separated with the timing stuff. Your test is not good: it tests time to output. It is well-known that current CPP output is quite slow; on Linux this is largely a Glibc problem. CPP output can be 50% of preprocessing time, which when you think about it is quite illogical. However, it can be made much faster, and I will do this eventually. Since we use an integrated CPP, timing output is kind of irrelevant (and vastly overstates CPP time). Current CPP provides tokens to the parser far, far faster than cccp did via a temporary file and a duplicated lexer in the front end (not to mention other advantages, like precise token location information). Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:19 ` Ziemowit Laski 2002-08-09 15:25 ` Neil Booth @ 2002-08-10 16:16 ` Noel Yap 1 sibling, 0 replies; 173+ messages in thread From: Noel Yap @ 2002-08-10 16:16 UTC (permalink / raw) To: Ziemowit Laski, Stan Shebs; +Cc: Ziemowit Laski, Noel Yap, Mike Stump, gcc --- Ziemowit Laski <zlaski@apple.com> wrote: > > On Friday, August 9, 2002, at 03:12 , Stan Shebs > wrote: > > > Noel Yap wrote: > > > >> Build speeds are most helped by minimizing the > number > >> of files opened and closed during the build. > >> > > Is this assertion based on empirical measurement, > and if so, for what > > source code and what system? For instance, the > longest source file > > in GCC is about 15K lines, and at -O2, only a > small percentage of > > time is spent messing with files. If I use > -save-temps on cp/decl.c on > > one of my (Linux) machines, I get a total time of > about 38 sec from > > source to asm. If I just compile decl.i, it's > about 37 sec, so that's > > 1 sec for *all* preprocessing, including all file > opening/closing. > > Since the preprocessor is integrated, I don't think > you can separate > the timings in this way. :( A 'gcc3 -E cp/decl.c -o > decl.i' would > probably be more meaningful. This is a good point. I think an even better study would be to replicate John Lakos's study within one's own project. I'd be very interested to find out how many projects (other than the ones I've seen) fit Lakos's "largeness" and would, therefore, be able to take advantage of preprocessed headers. Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:13 ` Stan Shebs 2002-08-09 15:18 ` Neil Booth 2002-08-09 15:19 ` Ziemowit Laski @ 2002-08-10 16:07 ` Noel Yap 2002-08-10 16:18 ` Neil Booth 2 siblings, 1 reply; 173+ messages in thread From: Noel Yap @ 2002-08-10 16:07 UTC (permalink / raw) To: Stan Shebs; +Cc: Mike Stump, gcc --- Stan Shebs <shebs@apple.com> wrote: > Noel Yap wrote: > > >Build speeds are most helped by minimizing the > number > >of files opened and closed during the build. > > > Is this assertion based on empirical measurement, > and if so, for what > source code and what system? For instance, the > longest source file > in GCC is about 15K lines, and at -O2, only a small > percentage of > time is spent messing with files. If I use > -save-temps on cp/decl.c on > one of my (Linux) machines, I get a total time of > about 38 sec from > source to asm. If I just compile decl.i, it's about > 37 sec, so that's > 1 sec for *all* preprocessing, including all file > opening/closing. This is a good question. John Lakos in _Large-Scale C++ Software Design_ has performed a rudimentary case study. If the conclusions are true, then your example indicates that there wasn't much of a difference between the number of files used when compiling decl.c and decl.i. The study also indicates that having #include's within header files is the largest contributor to the problem (since nested #include's would increase the number of file accesses combinatorially). As another indication that the conclusion is true, Lakos added guards around the #include lines themselves and found compile times to dramatically decrease. For example:

    #ifndef header_h
    # include <header.h>
    #endif

I can go on, but I doubt others on this list would appreciate a reprint of the chapter. If you don't have the book, I suggest at least finding a copy and reading this chapter. > Obviously, other programs will have different > characteristics, and if > you have one for which file opening/closing > dominates compile time, > that will be very interesting. But it's bad to try > to optimize > something before you have numerical evidence. I agree. Would you agree with Lakos's findings as evidence of this? Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 16:07 ` Noel Yap @ 2002-08-10 16:18 ` Neil Booth 2002-08-10 20:27 ` Noel Yap 0 siblings, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-10 16:18 UTC (permalink / raw) To: Noel Yap; +Cc: Stan Shebs, Mike Stump, gcc Noel Yap wrote:- > The study also indicates that having #include's within > header files is the largest contributor to the problem > (since nested #include's would increase the number of > file accesses combinatorially). See below for why this isn't true for most compilers now. > As another indication that the conclusion is true, > Lakos added guards around the #include lines > themselves and found compile times to dramatically > decrease. For example: > #if header_h > # include <header.h> > #endif This isn't the case with GCC. I hope you're aware of that. The first time GCC reads <header.h> it remembers if it had header guards. If it's ever asked to #include it again, it checks if the guard is defined, and doesn't do anything. The file's contents are also not cached if it has header guards, on the assumption that the contents are unlikely to be of interest in the future. In other words, this kind of #include protection is ugly and pointless (and possibly error-prone, though that would tend to be immediately obvious). Most compilers now implement this optimization, but 5 or 6 years ago this wasn't the case. I think GCC was one of the first. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
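In other words, an ordinary internal guard is all GCC needs in order to skip the reopen and rescan. A sketch of the shape it keys on (header name invented):

    /* widget.h -- a conventionally guarded header.  On the first #include,
       GCC notices the whole file is wrapped in this #ifndef; on any later
       #include it merely checks whether WIDGET_H is defined and skips the
       file without reopening or rescanning it.  */
    #ifndef WIDGET_H
    #define WIDGET_H

    struct widget
    {
      int id;
    };

    void widget_init (struct widget *);

    #endif /* WIDGET_H */

With a compiler that does this, the Lakos-style external guard wrapped around every #include in every client buys nothing; the internal guard alone already prevents the repeated file access.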
* Re: Faster compilation speed 2002-08-10 16:18 ` Neil Booth @ 2002-08-10 20:27 ` Noel Yap 2002-08-11 0:11 ` Neil Booth 0 siblings, 1 reply; 173+ messages in thread From: Noel Yap @ 2002-08-10 20:27 UTC (permalink / raw) To: Neil Booth; +Cc: Stan Shebs, Mike Stump, gcc --- Neil Booth <neil@daikokuya.co.uk> wrote: > Noel Yap wrote:- > > > The study also indicates that having #include's > within > > header files is the largest contributor to the > problem > > (since nested #include's would increase the number > of > > file accesses combinatorially). > > See below for why this isn't true for most compilers > now. > > > As another indication that the conclusion is true, > > Lakos added guards around the #include lines > > themselves and found compile times to dramatically > > decrease. For example: > > #if header_h > > # include <header.h> > > #endif > > This isn't the case with GCC. I hope you're aware > of that. > The first time GCC reads <header.h> it remembers if > it had > header guards. If it's ever asked to #include it > again, > it checks if the guard is defined, and doesn't do > anything. > The file's contents are also not cached if it has > header > guards, on the assumption that the contents are > unlikely to > be of interest in the future. > > In other words, this kind of #include protection is > ugly and > pointless (and possibly error-prone, though that > would tend > to be immediately obvious). Most compilers now > implement > this optimization, but 5 or 6 years ago this wasn't > the case. > I think GCC was one of the first. I stand corrected. (I'm assuming gcc doesn't do this in cases where the header guard might have side effects or if there's a matching #else for the #ifndef). Do you think precompiled headers would help build speed across several compiles since it would be another source to eliminate repeated file opens? Thanks, Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 20:27 ` Noel Yap @ 2002-08-11 0:11 ` Neil Booth 2002-08-12 12:04 ` Devang Patel 0 siblings, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-11 0:11 UTC (permalink / raw) To: Noel Yap; +Cc: Stan Shebs, Mike Stump, gcc Noel Yap wrote:- > I stand corrected. (I'm assuming gcc doesn't do this > in cases where the header guard might have side > effects or if there's a matching #else for the > #ifndef). Correct. Header guards with side effects hardly exist I think. We recognize #ifndef and #if !defined with optional parentheses. Comments and whitespace do not affect the optimization. Headers with #else, #elif at the top level, and with anything outside the guards, or with a header guard that comes from a macro expansion are not optimized this way. > Do you think precompiled headers would help build > speed across several compiles since it would be > another source to eliminate repeated file opens? I don't think repeated file opens are high on the list of time eaters, particularly because of the optimization I mentioned. Tokenization and parsing probably take much longer. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
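A sketch of the shapes Neil describes, where each chunk stands for a separate header file; the first two qualify for the optimization, the last two do not (illustrative only, the precise rules live in cpplib):

    /* foo.h -- recognized: a plain #ifndef wrapper around the whole file.  */
    #ifndef FOO_H
    #define FOO_H
    /* ... declarations ... */
    #endif

    /* bar.h -- recognized: #if !defined; parentheses, comments and
       whitespace do not matter.  */
    #if !defined (BAR_H)    /* include guard */
    #define BAR_H
    /* ... declarations ... */
    #endif

    /* baz.h -- NOT recognized: #else or #elif at the top level defeats
       the optimization.  */
    #ifndef BAZ_H
    #define BAZ_H
    /* ... declarations ... */
    #else
    #error "baz.h seen twice"
    #endif

    /* qux.h -- NOT recognized: tokens outside the guarded region.  */
    extern int qux_version;
    #ifndef QUX_H
    #define QUX_H
    /* ... declarations ... */
    #endif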
* Re: Faster compilation speed 2002-08-11 0:11 ` Neil Booth @ 2002-08-12 12:04 ` Devang Patel 0 siblings, 0 replies; 173+ messages in thread From: Devang Patel @ 2002-08-12 12:04 UTC (permalink / raw) To: Noel Yap; +Cc: Neil Booth, Stan Shebs, Mike Stump, gcc On Sunday, August 11, 2002, at 12:08 AM, Neil Booth wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 13:04 ` Noel Yap ` (2 preceding siblings ...) 2002-08-09 15:13 ` Stan Shebs @ 2002-08-09 18:57 ` Linus Torvalds 2002-08-09 19:12 ` Phil Edwards ` (2 more replies) 3 siblings, 3 replies; 173+ messages in thread From: Linus Torvalds @ 2002-08-09 18:57 UTC (permalink / raw) To: yap_noel, gcc In article < 20020809200413.46719.qmail@web21403.mail.yahoo.com > you write: >Build speeds are most helped by minimizing the number >of files opened and closed during the build. I _seriously_ doubt that. Opening (and even reading) a cached file is not an expensive operation, not compared to the kinds of run-times gcc has. We're talking a few microseconds per file open at a low level. Even parsing it should not be that expensive, especially if the preprocessor is any good (and from all I've seen, these days it _is_ good). I strongly suspect that what makes gcc slow is that it has absolutely horrible cache behaviour, a big VM footprint, and chases pointers in that badly cached area all of the time. And that, in turn, is probably impossible to fix as long as gcc uses garbage collection for most of its internal memory management. There just aren't all that many worse ways to f*ck up your cache behaviour than by using lots of allocations and lazy GC to manage your memory. The problem with bad cache behaviour is that you don't get nice spikes in specific places that you can try to optimize - the cost ends up being spread all over the places that touch the data structures. The problem with trying to avoid GC is that if you do that you have to be careful about your reference counts, and I doubt the gcc people want to be that careful, especially considering that the code-base right now is not likely to be very easy to convert. (Plus the fact that GC proponents absolutely refuse to see the error of their ways, and will flame me royally for even _daring_ to say that GC sucks donkey brains through a straw from a performance standpoint. If order to work with refcounting, you need to have the mentality that every single data structure with a non-local lifetime needs to have the count as it's major member) Linus ^ permalink raw reply [flat|nested] 173+ messages in thread
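The discipline Linus is describing, reduced to its bare bones (a hedged sketch with invented types, nobody's real code): the count is a first-class member of every shared object, ownership transfers are explicit get/put pairs, and memory is released the instant the count reaches zero, while it is still likely to be cache-hot.

    #include <stdlib.h>

    typedef struct node
    {
      int count;                    /* the reference count is the major member */
      struct node *left, *right;
      int value;
    } node_t;

    static node_t *
    get_node (node_t *n)            /* take an extra reference */
    {
      if (n)
        n->count++;
      return n;
    }

    static void
    put_node (node_t *n)            /* drop a reference */
    {
      if (n && --n->count == 0)
        {
          put_node (n->left);
          put_node (n->right);
          free (n);                 /* freed immediately, no lazy sweep */
        }
    }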
* Re: Faster compilation speed 2002-08-09 18:57 ` Linus Torvalds @ 2002-08-09 19:12 ` Phil Edwards 2002-08-09 19:34 ` Kevin Atkinson 2002-08-10 19:20 ` Noel Yap 2 siblings, 0 replies; 173+ messages in thread From: Phil Edwards @ 2002-08-09 19:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: yap_noel, gcc On Fri, Aug 09, 2002 at 06:56:58PM -0700, Linus Torvalds wrote: > In article < 20020809200413.46719.qmail@web21403.mail.yahoo.com > you write: > >Build speeds are most helped by minimizing the number > >of files opened and closed during the build. > > I _seriously_ doubt that. To be fair, when listing "things we can do to speed up the build," most people don't include tinkering with the guts of the compiler. Statements like that of the original poster are correct when the compiler cannot be touched, and in fact many textbooks say exactly that: minimize the number of files opened (or more generally, system calls) to speed the build. (The lesson is typically something about multiple include guard macros or proper makefile dependencies.) So let's not be too harsh. When we're allowed to hack on the compiler source itself, of course, those statements go right out the window. :-) Phil -- I would therefore like to posit that computing's central challenge, viz. "How not to make a mess of it," has /not/ been met. - Edsger Dijkstra, 1930-2002 ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 18:57 ` Linus Torvalds 2002-08-09 19:12 ` Phil Edwards @ 2002-08-09 19:34 ` Kevin Atkinson 2002-08-09 20:28 ` Linus Torvalds 2002-08-10 19:20 ` Noel Yap 2 siblings, 1 reply; 173+ messages in thread From: Kevin Atkinson @ 2002-08-09 19:34 UTC (permalink / raw) To: gcc On Fri, 9 Aug 2002, Linus Torvalds wrote: > And that, in turn, is probably impossible to fix as long as gcc uses > garbage collection for most of its internal memory management. There > just aren't all that many worse ways to f*ck up your cache behaviour > than by using lots of allocations and lazy GC to manage your memory. Excuse the interruption, but from what I read a good generational garbage collector can be just as fast as manually managing memory? Is this not the case? If so could someone point me to some information regarding why? I am not trying to argue with anyone as I really don't know that much about GC except from what I read in a few papers. Sorry, I was reading this thread and that point struck me by surprise. --- http://kevin.atkinson.dhs.org ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 19:34 ` Kevin Atkinson @ 2002-08-09 20:28 ` Linus Torvalds 2002-08-09 21:12 ` Daniel Berlin 2002-08-10 6:32 ` Robert Lipe 0 siblings, 2 replies; 173+ messages in thread From: Linus Torvalds @ 2002-08-09 20:28 UTC (permalink / raw) To: kevin, gcc In article < Pine.LNX.4.44.0208092227500.2273-100000@kevin-pc.atkinson.dhs.org > you write: >On Fri, 9 Aug 2002, Linus Torvalds wrote: > >> And that, in turn, is probably impossible to fix as long as gcc uses >> garbage collection for most of its internal memory management. There >> just aren't all that many worse ways to f*ck up your cache behaviour >> than by using lots of allocations and lazy GC to manage your memory. > >Excuse the interruption, but from what I read a good generational garbage >collector can be just as fast as manually managing memory? All the papers I've seen on it are total jokes. But maybe I've looked at the wrong ones. One fundamental fact on modern hardware is that data cache locality is good, and not being in the cache sucks. This is not likely to change. In particular, this means that if you allocate stuff, you want to re-use the stuff you just freed _as_soon_as_possible_ - preferably before the previously dirty data has ever even been evicted from the cache, so that you can re-use the thing to avoid reading it in, but also to avoid writing out stale data. This implies that any lazy de-allocation is bad. When a piece of memory is free, you want to de-allocate it _immediately_, so that the next allocation gets to re-use it and gets the cache footprint "for free". Generational garabage collectors tend to never re-use hot objects, and often do the copying between generations making things even worse on the cache. Compaction helps subsequent use somewhat, but is in itself inherently costly, and the indirection (or fixup) implied by it can limit other optimization. Sure, by being lazy you can sometimes win in icache footprint (and in instruction count - a lot of the "GC is fast" papers seem to rely on the fact that you can do other optimizations if you're lazy), but you lose big in dirty dcache footprint. And since dcache is much more expensive than instructions, you're better off doing explicit memory management with refcounting (optionally helped by the programming language, of course. You can make exact refcounting be your "GC" with some language support). However, there's another, more fundamental issue. It's the _mindset_. The GC mindset tends to go hand-in-hand with pointer chasing, while people who use explicit allocators tend to be happier with doing things like "realloc()" and trying to use arrays and indexes instead of linked lists and just generally trying to avoid allocating lots of small things. Which tends to be better on the cache. Yes, I generalize. Don't we all? For example, if you have an _explicit_ refcounting system, then it is quite natural to have operations like "copy-on-write", where if you decide to change a tree node you do something like copy_on_write(node_t **np) { note_t *node = *np; if (node->count > 1) newnode = copy_alloc(node); *np = newnode; node->count--; node = newnode; } return node; } and then before you change a tree node you do node = copy_on_write(&tree->node); .. we now know we are the exclusive owners of "node" .. which tends to be very efficient - it allows sharing, even if sharing is often not the common case (and doesn't do any extra allocations for the common case of an access that was already exclusively owned). 
(If you want to be thread-safe you need to be more careful yet, and have thread-safe "get_node()/put_node()" actions etc. Most applications don't need to be that careful, but you'll see a _lot_ of this inside an operating system). In contrast, in a GC system where you do _not_ have access to the explicit refcounting, you tend to always copy the node, just because you don't know if the original node might be shared through another tree or not. Even if sharing ends up not being the most common case. So you do a lot of extra work, and you end up with even more cache pressure. Are the GC systems that do refcounting internally _and_ expose the information upwards to the user? I bet there are. But the fact is, the rest of them (99.9%) give those few well-done GC's a bad name. "So what about circular data structures? Refcounting doesn't work for them". Right. Don't do them. Or handle them very very carefully (ie there can be a "head" that gets special handling and keeps the others alive). Compilers certainly almost always end up working with DAG's, not cyclic structures. Make it a rule. Does it take more effort? Yes. The advantage of GC is that it is automatic. But CG apologists should just admit that it causes bad problems and often _encourages_ people to write code that performs badly. I really think it's the mindset that is the biggest problem. A GC system with explicitly visible reference counts (and immediate freeing) with language support to make it easier to get the refcounts right (things like automatically incrementing the refcounts when passing the object off to others) wouldn't necessarily be painful to use, and would clearly offer all the advantages of just doing it all by hand. That's not the world we live in, though. Linus ^ permalink raw reply [flat|nested] 173+ messages in thread
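Filling in the copy-on-write sketch above just enough to compile (field and helper names are invented, and allocation failure is ignored):

    #include <stdlib.h>
    #include <string.h>

    typedef struct node
    {
      int count;                    /* reference count */
      int payload;
    } node_t;

    static node_t *
    copy_alloc (const node_t *old)
    {
      node_t *n = malloc (sizeof *n);
      memcpy (n, old, sizeof *n);
      n->count = 1;                 /* the copy starts out exclusively owned */
      return n;
    }

    /* Make *np safe to modify in place: if it is shared, swap in a private
       copy and drop our reference to the shared original.  */
    static node_t *
    copy_on_write (node_t **np)
    {
      node_t *node = *np;
      if (node->count > 1)
        {
          node_t *newnode = copy_alloc (node);
          node->count--;
          *np = newnode;
          node = newnode;
        }
      return node;
    }

A caller writes node = copy_on_write (&tree->node); and can then modify node knowing it is exclusively owned: sharing is preserved where possible and copied only when actually needed, which is the contrast Linus draws with blindly copying under a collector that exposes no counts.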
* Re: Faster compilation speed 2002-08-09 20:28 ` Linus Torvalds @ 2002-08-09 21:12 ` Daniel Berlin 2002-08-09 21:52 ` Linus Torvalds 2002-08-10 6:32 ` Robert Lipe 1 sibling, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-09 21:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: kevin, gcc > > "So what about circular data structures? Refcounting doesn't work for > them". Right. Don't do them. Or handle them very very carefully (ie > there can be a "head" that gets special handling and keeps the others > alive). Compilers certainly almost always end up working with DAG's, not > cyclic structures. Make it a rule. Sorry, there are cases that make this impossible to do (IOW we can't make it a rule). But another option is to do what Python does. Have a reference cycle GC that just handles breaking cycles. Run it explicitly at times, or much like we do ggc_collect now. Reference cycles can only possibly occur in container objects, so you only have to deal with the overhead of cycle-breaking there. --Dan ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 21:12 ` Daniel Berlin @ 2002-08-09 21:52 ` Linus Torvalds 0 siblings, 0 replies; 173+ messages in thread From: Linus Torvalds @ 2002-08-09 21:52 UTC (permalink / raw) To: Daniel Berlin; +Cc: kevin, gcc On Sat, 10 Aug 2002, Daniel Berlin wrote: > > > > "So what about circular data structures? Refcounting doesn't work for > > them". Right. Don't do them. Or handle them very very carefully (ie > > there can be a "head" that gets special handling and keeps the others > > alive). Compilers certainly almost always end up working with DAG's, not > > cyclic structures. Make it a rule. > > Sorry, there are cases that make this impossible to do (IOW we can't make > it a rule). Hmm. I can't imagine what is there that is inherently cyclic, but breaking the cycles might be more painful than it's worth, so I'll take your word for it. Things like data structure definitions (which clearly can be cyclic thanks to pointers to themselves) can often be resolved trivially with nesting rules (ie if you can show that the lifetime of type A is a superset of the lifetime of B, then you don't actually need to refcount a backpointer from B to A). For the obvious example that I can think of (ie just a structure definition containing a pointer to itself - possibly indirectly via other structures), that type lifetime nesting is inherent in the C type scopes, for example. For type X to have been able to contain a pointer to type Y, Y must have had a larger scope than X, so the pointer from one type structure to another never needs refcounting in a C compiler. (This, btw, is why I don't believe in automated GC systems - even if they use refcounting internally. It's simply fairly hard to tell a GC system simple rules like when you need to ref-count, and when you don't. If you just always ref-count on assignment, you _will_ get the obvious circular references, simply because you miss the high-level picture). But other cases might certainly be much more painful, so I certainly agree with you: > But another option is to do what Python does. > Have a reference cycle GC that just handles breaking cycles. > Run it explicitly at times, or much like we do ggc_collect now. > Reference cycles can only possibly occur in container objects, so you > only have to deal with the overhead of cycle-breaking there. Nothing says you can't mix the two approaches, no. If the subset of allocations you need to worry about from a GC standpoint is relatively small, the cache efficiency advantages of refcounting clearly don't matter, and the disadvantages can be disproportional. Linus ^ permalink raw reply [flat|nested] 173+ messages in thread
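The "nesting rule" Linus describes is what is usually called a weak back-pointer: as long as the child can never outlive its parent, the child's pointer back to the parent is simply not counted, and the apparent cycle never exists as far as the reference counts are concerned. A hedged sketch with invented types:

    typedef struct scope scope_t;

    typedef struct type_info
    {
      int count;                    /* counted references to this type */
      scope_t *scope;               /* uncounted back-pointer: the scope is
                                       guaranteed to outlive every type
                                       declared inside it */
      struct type_info *element;    /* counted, e.g. the pointed-to type */
    } type_info_t;

    struct scope
    {
      int count;
      struct type_info **types;     /* counted references to the owned types */
      int n_types;
    };

Tearing down a scope drops the counted references to its types; the uncounted back-pointers never keep anything alive, so no cycle collector is needed for this shape of data.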
* Re: Faster compilation speed 2002-08-09 20:28 ` Linus Torvalds 2002-08-09 21:12 ` Daniel Berlin @ 2002-08-10 6:32 ` Robert Lipe 2002-08-10 14:26 ` Cyrille Chepelov 1 sibling, 1 reply; 173+ messages in thread From: Robert Lipe @ 2002-08-10 6:32 UTC (permalink / raw) To: gcc Linus Torvalds wrote: > One fundamental fact on modern hardware is that data cache locality is > good, and not being in the cache sucks. This is not likely to change. This is a fact. Measuring this sort of thing is possible. (Optimizing without measuring is seldom a good idea.) In the absence of processor pods and bus analyzers, has anyone thrown gcc at a tool like 'valgrind' or cachegrind? http://developer.kde.org/~sewardj/ RJL ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 6:32 ` Robert Lipe @ 2002-08-10 14:26 ` Cyrille Chepelov 2002-08-10 17:33 ` Daniel Berlin 2002-08-11 1:03 ` Florian Weimer 0 siblings, 2 replies; 173+ messages in thread From: Cyrille Chepelov @ 2002-08-10 14:26 UTC (permalink / raw) To: gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 4064 bytes --] Le Sat, Aug 10, 2002, à 08:32:26AM -0500, Robert Lipe a écrit: > Linus Torvalds wrote: > > > One fundamental fact on modern hardware is that data cache locality is > > good, and not being in the cache sucks. This is not likely to change. > > This is a fact. > Measuring this sort of thing is possible. (Optimizing without > measuring is seldom a good idea.) In the absence of processor pods > and bus analyzers, has anyone thrown gcc at a tool like 'valgrind' or > cachegrind? I just did (I was forming the idea while reading the thread, but you beat me in suggesting it before I implemented it). I have tried on a grand total of three files, two from today's mainline CVS (updated from anonymous about four hours ago), and one from Linux 2.5.30; as my machine is not exactly the dual-multi-gigahertz, "HT"-interconnected (HyperTransport ?) with gobs of memory bandwith (and what else? 64 bits?) monsters Linus has been bragging about recently, please bear with lack of patience to run CG over the whole aforementioned packages... Some detailed results here: http://www.chepelov.org/cyrille/gcc-valgrind Excerpt: java/parse.c ==17875== I refs: 275,598,220 ==17875== I1 misses: 43,600 ==17875== L2i misses: 41,948 ==17875== I1 miss rate: 0.1% ==17875== L2i miss rate: 0.1% ==17875== ==17875== D refs: 145,894,312 (94,095,162 rd + 51,799,150 wr) ==17875== D1 misses: 322,121 ( 259,431 rd + 62,690 wr) ==17875== L2d misses: 313,318 ( 251,817 rd + 61,501 wr) ==17875== D1 miss rate: 0.2% ( 0.2% + 0.1% ) ==17875== L2d miss rate: 0.2% ( 0.2% + 0.1% ) ==17875== ==17875== L2 refs: 365,721 ( 303,031 rd + 62,690 wr) ==17875== L2 misses: 355,266 ( 293,765 rd + 61,501 wr) ==17875== L2 miss rate: 0.0% ( 0.0% + 0.1% ) emit-rtl.c: ==17968== I refs: 2,315,492,628 ==17968== I1 misses: 5,888,264 ==17968== L2i misses: 5,481,716 ==17968== I1 miss rate: 0.25% ==17968== L2i miss rate: 0.23% ==17968== ==17968== D refs: 1,172,342,347 (702,376,465 rd + 469,965,882 wr) ==17968== D1 misses: 7,920,482 ( 6,205,391 rd + 1,715,091 wr) ==17968== L2d misses: 7,134,597 ( 5,455,816 rd + 1,678,781 wr) ==17968== D1 miss rate: 0.6% ( 0.8% + 0.3% ) ==17968== L2d miss rate: 0.6% ( 0.7% + 0.3% ) ==17968== ==17968== L2 refs: 13,808,746 ( 12,093,655 rd + 1,715,091 wr) ==17968== L2 misses: 12,616,313 ( 10,937,532 rd + 1,678,781 wr) ==17968== L2 miss rate: 0.3% ( 0.3% + 0.3% ) linux/kernel/signal.c: ==22924== ==22924== I refs: 1,020,746 ==22924== I1 misses: 1,030 ==22924== L2i misses: 946 ==22924== I1 miss rate: 0.10% ==22924== L2i miss rate: 0.9% ==22924== ==22924== D refs: 480,927 (335,166 rd + 145,761 wr) ==22924== D1 misses: 2,075 ( 1,535 rd + 540 wr) ==22924== L2d misses: 2,072 ( 1,532 rd + 540 wr) ==22924== D1 miss rate: 0.4% ( 0.4% + 0.3% ) ==22924== L2d miss rate: 0.4% ( 0.4% + 0.3% ) ==22924== ==22924== L2 refs: 3,105 ( 2,565 rd + 540 wr) ==22924== L2 misses: 3,018 ( 2,478 rd + 540 wr) ==22924== L2 miss rate: 0.2% ( 0.1% + 0.3% ) I don't want to fuel any kind of flamewars (after all, it's only software), but the miss rates above don't seem too horrible (maybe they are, after all). What cachegrind doesn't show (yet ?) 
is if the access pattern kills opportunities for the memory interface to use burst transfers; after all, SDRAM also has some form of "seek time". It is possible that something's hidden there. Also, I didn't spend much time trying to figure the proper vg_annotate include path, so some functions appear as unknown in the detailed cachegrind outputs. Well, that's a start. -- Cyrille -- Grumpf. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 14:26 ` Cyrille Chepelov @ 2002-08-10 17:33 ` Daniel Berlin 2002-08-10 18:21 ` Linus Torvalds 2002-08-10 18:28 ` Cyrille Chepelov 2002-08-11 1:03 ` Florian Weimer 1 sibling, 2 replies; 173+ messages in thread From: Daniel Berlin @ 2002-08-10 17:33 UTC (permalink / raw) To: Cyrille Chepelov; +Cc: gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 1468 bytes --] On Sat, 10 Aug 2002, Cyrille Chepelov wrote: > Le Sat, Aug 10, 2002, à 08:32:26AM -0500, Robert Lipe a écrit: > > > Linus Torvalds wrote: > > > > > One fundamental fact on modern hardware is that data cache locality is > > > good, and not being in the cache sucks. This is not likely to change. > > > > This is a fact. > > > Measuring this sort of thing is possible. (Optimizing without > > measuring is seldom a good idea.) In the absence of processor pods > > and bus analyzers, has anyone thrown gcc at a tool like 'valgrind' or > > cachegrind? > > I just did (I was forming the idea while reading the thread, but you beat me > in suggesting it before I implemented it). > > I have tried on a grand total of three files, two from today's mainline CVS > (updated from anonymous about four hours ago), and one from Linux 2.5.30; as > my machine is not exactly the dual-multi-gigahertz, "HT"-interconnected > (HyperTransport ?) with gobs of memory bandwith (and what else? 64 bits?) > monsters Linus has been bragging about recently, please bear with lack of > patience to run CG over the whole aforementioned packages... The numbers I get on a p4 with cachegrind are *much* worse in all cases. The miss rates are all >2%, which is a far cry from 0.1% and 0.0%. Are you sure you have valgrind configured right for your cache? I'm going to do this the *real* way, using the performance monitoring counters on my p4, and get *real* numbers. --Dan ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 17:33 ` Daniel Berlin @ 2002-08-10 18:21 ` Linus Torvalds 2002-08-10 18:38 ` Daniel Berlin 2002-08-10 18:39 ` Cyrille Chepelov 1 sibling, 2 replies; 173+ messages in thread From: Linus Torvalds @ 2002-08-10 18:21 UTC (permalink / raw) To: dberlin, gcc In article < Pine.LNX.4.44.0208102031550.8641-100000@dberlin.org > you write: > >The numbers I get on a p4 with cachegrind are *much* worse in all cases. > >The miss rates are all >2%, which is a far cry from 0.1% and 0.0%. One thing to look out for when looking at cache miss numbers is what they actually _mean_. That is particularly true when it comes to the percentages. Are the percentages relative to #instructions, or #memops, or #line fetches (the latter ends up being interesting especially for I$). The "percentage per instruction" number is to some degree a nonsensical number (since many instructions do not do any D$ accesses at all), but it has the advantage that it makes the I$ and D$ misses comparable, and it also allows you to make a quick estimation of how much time was actually spent on cache misses. The _best_ number to get (and in the end, the only one that really matters) is "cycles spent waiting on cache" and "cycles spent doing useful work", but I don't think valgrind gives you that. The P4 counters should do it, though. If you want to use the HW counters under Linux, get "oprofile" from sourceforge.net. (I don't think it does P4 events yet, though) Linus ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 18:21 ` Linus Torvalds @ 2002-08-10 18:38 ` Daniel Berlin 2002-08-10 18:39 ` Cyrille Chepelov 1 sibling, 0 replies; 173+ messages in thread From: Daniel Berlin @ 2002-08-10 18:38 UTC (permalink / raw) To: Linus Torvalds; +Cc: gcc On Sat, 10 Aug 2002, Linus Torvalds wrote: > In article < Pine.LNX.4.44.0208102031550.8641-100000@dberlin.org > you write: > > > >The numbers I get on a p4 with cachegrind are *much* worse in all cases. > > > >The miss rates are all >2%, which is a far cry from 0.1% and 0.0%. > > One thing to look out for when looking at cache miss numbers is what > they actually _mean_. Yeah. > The _best_ number to get (and in the end, the only one that really > matters) is "cycles spent waiting on cache" and "cycles spent doing > useful work", but I don't think valgrind gives you that. The P4 > counters should do it, though. Yuppers. > > If you wan tto use the HW counters under Linux, get "oprofile" from > sourceforge.net. (I don't think it does P4 events yet, though) brink and abyss do p4 events, which is what i'm using. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 18:21 ` Linus Torvalds 2002-08-10 18:38 ` Daniel Berlin @ 2002-08-10 18:39 ` Cyrille Chepelov 1 sibling, 0 replies; 173+ messages in thread From: Cyrille Chepelov @ 2002-08-10 18:39 UTC (permalink / raw) To: gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 1493 bytes --] Le Sat, Aug 10, 2002, à 06:20:51PM -0700, Linus Torvalds a écrit: > >The numbers I get on a p4 with cachegrind are *much* worse in all cases. > > > >The miss rates are all >2%, which is a far cry from 0.1% and 0.0%. > > One thing to look out for when looking at cache miss numbers is what > they actually _mean_. > > That is particularly true when it comes to the percentages. Are the > percentages relative to #instructions, or #memops, or #line fetches (the > latter ends up being interesting especially for I$). These are percentages relative to the number of accesses. L2 percentages are also relative to the original number of accesses, not to the number of L1 misses. > The _best_ number to get (and in the end, the only one that really > matters) is "cycles spent waiting on cache" and "cycles spent doing > useful work", but I don't think valgrind gives you that. The P4 > counters should do it, though. Indeed, cachegrind won't tell you when there was a miss but the hardware was smart enough to do something useful while it waits for the cache. Despite this limitation, shouldn't ((number_of_L1_misses * N) + (number_of_L2_misses * M)) * cycle_len [where N is roughly 10 and M roughly 200, or updated figures] be a ballpark figure of the time lost waiting for RAM to catch up? > If you wan tto use the HW counters under Linux, get "oprofile" from > sourceforge.net. (I don't think it does P4 events yet, though) The site says it doesn't yet. -- Cyrille -- Grumpf. ^ permalink raw reply [flat|nested] 173+ messages in thread
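Written out as code, that back-of-the-envelope estimate is just the following. N and M are the rough per-miss penalties quoted in the message above, not measured values, and the result ignores any overlap of misses with useful work, which is exactly the limitation noted there.

/* Ballpark seconds lost waiting for memory: N cycles per L1 miss,
   M cycles per L2 miss, times the cycle length.  N = 10 and M = 200
   are the rough guesses from the message above.  */
static double
miss_time_estimate (double l1_misses, double l2_misses, double cycle_len)
{
  const double N = 10.0, M = 200.0;
  return (l1_misses * N + l2_misses * M) * cycle_len;
}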
* Re: Faster compilation speed 2002-08-10 17:33 ` Daniel Berlin 2002-08-10 18:21 ` Linus Torvalds @ 2002-08-10 18:28 ` Cyrille Chepelov 2002-08-10 18:30 ` John Levon 1 sibling, 1 reply; 173+ messages in thread From: Cyrille Chepelov @ 2002-08-10 18:28 UTC (permalink / raw) To: gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 2235 bytes --] Le Sat, Aug 10, 2002, à 08:33:53PM -0400, Daniel Berlin a écrit: > On Sat, 10 Aug 2002, Cyrille Chepelov wrote: > > I have tried on a grand total of three files, two from today's mainline CVS > > (updated from anonymous about four hours ago), and one from Linux 2.5.30; as > > my machine is not exactly the dual-multi-gigahertz, "HT"-interconnected > > (HyperTransport ?) with gobs of memory bandwith (and what else? 64 bits?) (Some brave soul pointed to me that HT is more probably HyperThreading. I stand corrected (though being LT surely entitles one to getting cooler toys that mere mortals)). > The numbers I get on a p4 with cachegrind are *much* worse in all cases. > > The miss rates are all >2%, which is a far cry from 0.1% and 0.0%. a-ha ! This is interesting... Did you run on the same sample files as I did, or others ? Can you reproduce my numbers if you set --I1=65536,2,64 --D1=65536,2,64 --L2=65536,8,64 ? > Are you sure you have valgrind configured right for your cache? Sure, no. The cache spec numbers did look about rig... D'oh! Looks like Cachegrind trusts a little too faithfully what this old (A0-stepping) Duron says. CG believes L2 is 1 KB, whereas in fact it is 64KB. I've just re-ran the java/parser.c test with forcing --L2=65536,8,64, and uploaded the results (same place) What are the first lines of output from vg_annotate on your system ? It certainly sounds unbelievable that a Duron's cache design beats a P4's. (there is something curious about the L2 lines from the initial output (the last three ones). Saying that 355266 misses for 365721 refs means a 0.0% miss rate certainly sounds strange, I've got to ask Julian about the logic there. Looks to me that L2 failed 97% of its mission). > I'm going to do this the *real* way, using the performance monitoring > counters on my p4, and get *real* numbers. It would be very interesting to see how far off CG falls... CG does make the implicit assumption that the process runs uninterrupted (I tried welding cachegrind into UML, but that didn't bring me far). The real CPU will certainly give you a more lively picture.... (the performance monitoring counters are not per-process on Linux, are they ?) -- Cyrille -- Grumpf. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 18:28 ` Cyrille Chepelov @ 2002-08-10 18:30 ` John Levon 0 siblings, 0 replies; 173+ messages in thread From: John Levon @ 2002-08-10 18:30 UTC (permalink / raw) To: gcc On Sun, Aug 11, 2002 at 03:28:51AM +0200, Cyrille Chepelov wrote: > It would be very interesting to see how far off CG falls... CG does make the > implicit assumption that the process runs uninterrupted (I tried welding > cachegrind into UML, but that didn't bring me far). The real CPU will > certainly give you a more lively picture.... (the performance monitoring > counters are not per-process on Linux, are they ?) perfctr patch supports virtual counters (google first hit). I don't remember if it has P4 support yet. regards john -- "It is unbecoming for young men to utter maxims." - Aristotle ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-10 14:26 ` Cyrille Chepelov 2002-08-10 17:33 ` Daniel Berlin @ 2002-08-11 1:03 ` Florian Weimer 1 sibling, 0 replies; 173+ messages in thread From: Florian Weimer @ 2002-08-11 1:03 UTC (permalink / raw) To: Cyrille Chepelov; +Cc: gcc Cyrille Chepelov <cyrille@chepelov.org> writes: > What cachegrind doesn't show (yet ?) is if the access pattern kills > opportunities for the memory interface to use burst transfers; By the way: IIRC, there is some FUD by the author on the web page that the cache simulation might be incorrect. Maybe someone should check this before jumping to conclusions (I'm not familiar with processor cache architectures, that's why I can't do this, sorry). ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 18:57 ` Linus Torvalds 2002-08-09 19:12 ` Phil Edwards 2002-08-09 19:34 ` Kevin Atkinson @ 2002-08-10 19:20 ` Noel Yap 2 siblings, 0 replies; 173+ messages in thread From: Noel Yap @ 2002-08-10 19:20 UTC (permalink / raw) To: Linus Torvalds, gcc --- Linus Torvalds <torvalds@transmeta.com> wrote: > In article > < 20020809200413.46719.qmail@web21403.mail.yahoo.com > > you write: > >Build speeds are most helped by minimizing the > number > >of files opened and closed during the build. > > I _seriously_ doubt that. Yes, my statement is exagerated although they are not completely truthless. The study conducted by John Lakos and some testing that I have conducted point to the fact that minimizing file opens does speed up builds significantly. Of course, that's not to say that other courses of action shouldn't be pursued. > Opening (and even reading) a cached file is not an > expensive operation, > not compared to the kinds of run-times gcc has. > We're talking a few > microseconds per file open at a low level. Even > parsing it should not > be that expensive, especially if the preprocessor is > any good (and from > all I've seen, these days it _is_ good). Hmm, perhaps it's time I conducted some tests again. I'm assuming you're talking about caching at the OS level? > I strongly suspect that what makes gcc slow is that > it has absolutely > horrible cache behaviour, a big VM footprint, and > chases pointers in > that badly cached area all of the time. Maybe you're not talking about caching at the OS level. Caching at the compiler level will certainly help with header files that are included multiple times. OTOH, caching at the OS level and/or preprocessing header files will help with that /and/ header files that are included across compiles. > And that, in turn, is probably impossible to fix as > long as gcc uses > garbage collection for most of its internal memory > management. There > just aren't all that many worse ways to f*ck up your > cache behaviour > than by using lots of allocations and lazy GC to > manage your memory. > > The problem with bad cache behaviour is that you > don't get nice spikes > in specific places that you can try to optimize - > the cost ends up being > spread all over the places that touch the data > structures. > > The problem with trying to avoid GC is that if you > do that you have to > be careful about your reference counts, and I doubt > the gcc people want > to be that careful, especially considering that the > code-base right now > is not likely to be very easy to convert. > > (Plus the fact that GC proponents absolutely refuse > to see the error of > their ways, and will flame me royally for even > _daring_ to say that GC > sucks donkey brains through a straw from a > performance standpoint. If > order to work with refcounting, you need to have the > mentality that > every single data structure with a non-local > lifetime needs to have the > count as it's major member) I'll leave it to the experts to hash this area out. Noel __________________________________________________ Do You Yahoo!? HotJobs - Search Thousands of New Jobs http://www.hotjobs.com ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 12:17 Faster compilation speed Mike Stump 2002-08-09 13:04 ` Noel Yap @ 2002-08-09 13:10 ` Aldy Hernandez 2002-08-09 15:28 ` Mike Stump 2002-08-09 14:29 ` Neil Booth ` (4 subsequent siblings) 6 siblings, 1 reply; 173+ messages in thread From: Aldy Hernandez @ 2002-08-09 13:10 UTC (permalink / raw) To: Mike Stump; +Cc: gcc >>>>> "Mike" == Mike Stump <mrs@apple.com> writes: > + /* Nonzero for compiling as fast as we can. */ > + > + extern int flag_speed_compile; > + > + #define SPEEDCOMPILE flag_speed_compile So, you want to introduce a flag to do faster compilation? Why not spend your time making the current infrastructure faster? Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 13:10 ` Aldy Hernandez @ 2002-08-09 15:28 ` Mike Stump 2002-08-09 16:00 ` Aldy Hernandez 2002-08-09 19:07 ` David Edelsohn 0 siblings, 2 replies; 173+ messages in thread From: Mike Stump @ 2002-08-09 15:28 UTC (permalink / raw) To: Aldy Hernandez; +Cc: gcc On Friday, August 9, 2002, at 01:15 PM, Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:28 ` Mike Stump @ 2002-08-09 16:00 ` Aldy Hernandez 2002-08-09 16:26 ` Stan Shebs 2002-08-12 16:05 ` Mike Stump 2002-08-09 19:07 ` David Edelsohn 1 sibling, 2 replies; 173+ messages in thread From: Aldy Hernandez @ 2002-08-09 16:00 UTC (permalink / raw) To: Mike Stump; +Cc: gcc > Let's take my combine elision patch. This patch makes the compiler > generate worse code. The way in which it is worse, is that more stack > space is used. How much more, well, my initial guess is that it is > less than 10% worse. Not too bad. Maybe users would care, maybe they I assume you have already looked at the horrendity of the code presently generated by -O0. It's pretty unusable as it is. Who would really want to use gcc under the influence of "worse than -O0"? Really. > I hope that explains my thinking a little bit more. Comments? > Anything sound wrong? And unforeseen dangers? Off the top of my head, if you insist on this approach, at least guarantee that generated code is no worse to debug. That is the only reason *I* use -O0, to debug. Cheers. Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:00 ` Aldy Hernandez @ 2002-08-09 16:26 ` Stan Shebs 2002-08-09 16:31 ` Aldy Hernandez ` (2 more replies) 2002-08-12 16:05 ` Mike Stump 1 sibling, 3 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 16:26 UTC (permalink / raw) To: Aldy Hernandez; +Cc: Mike Stump, gcc Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:26 ` Stan Shebs @ 2002-08-09 16:31 ` Aldy Hernandez 2002-08-09 16:51 ` Stan Shebs 2002-08-09 17:36 ` Daniel Berlin 2002-08-12 16:23 ` Mike Stump 2 siblings, 1 reply; 173+ messages in thread From: Aldy Hernandez @ 2002-08-09 16:31 UTC (permalink / raw) To: Stan Shebs; +Cc: Mike Stump, gcc > OK, then to really rub it in, CW runs much faster than GCC, even on > that slow Darwin OS :-), and that's with its non-optimizing case being Hey, no fair. You know my complaints are strictly in the filesystem :). > Sacrificing -O0 optimization is just a desperation move, since > we don't seem to have many other ideas about how to make GCC as > fast as CW. Ah, the truth comes out. So... Don't you think that if we spent more time getting the infrastructure faster, -O0 will improve as well? Either way, I ain't going to vote against a faster -O0. At least it speeds up my development cycle, since I program by building cc1, inspecting assembly, and repeating cycle :). Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:31 ` Aldy Hernandez @ 2002-08-09 16:51 ` Stan Shebs 2002-08-09 16:54 ` Aldy Hernandez ` (3 more replies) 0 siblings, 4 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 16:51 UTC (permalink / raw) To: Aldy Hernandez; +Cc: Mike Stump, gcc Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:51 ` Stan Shebs @ 2002-08-09 16:54 ` Aldy Hernandez 2002-08-09 17:44 ` Daniel Berlin ` (2 subsequent siblings) 3 siblings, 0 replies; 173+ messages in thread From: Aldy Hernandez @ 2002-08-09 16:54 UTC (permalink / raw) To: Stan Shebs; +Cc: Mike Stump, gcc > I don't think Mike mentioned it, but speeding up the compiler has > become our group's top priority, and every idea is on the table > right now. The 6x goal sounds extreme, but it helps keep in mind > that one or two or even a dozen 5% improvements will not be > sufficient to attain parity with the competition. Fair enough. Game on, and good luck. And please don't keep your changes in your tree, and then have them become obsolete in 4 months when you try to merge :) Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:51 ` Stan Shebs 2002-08-09 16:54 ` Aldy Hernandez @ 2002-08-09 17:44 ` Daniel Berlin 2002-08-09 18:35 ` David S. Miller 2002-08-09 18:25 ` David S. Miller 2002-08-10 10:02 ` Neil Booth 3 siblings, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-09 17:44 UTC (permalink / raw) To: Stan Shebs; +Cc: Aldy Hernandez, Mike Stump, gcc On Fri, 9 Aug 2002, Stan Shebs wrote: > Aldy Hernandez wrote: > > >[...] > > > > So... Don't you think that if we spent more > >time getting the infrastructure faster, -O0 will improve as well? > > > Well sure, it should be part of the plan. > > One of my suspicions is that the massive use of macros in tree > and RTL is concealing excessive pointer chasing, because they > don't show up in either profile or coverage numbers Ding ding, you have another winner. I actually benched this once, by functionizing some often used macros. The timings were horrendous. But what can we do to increase cache locality, or get rid of these problems? > is taking the macros that we function-ized for debugging purposes > (Ira posted it to gcc-patches some time ago, but nobody wanted it > because dwarf2 macro debugging was going to be available RSN), and > will build a (slow) GCC that will do it all through function calls. > That should yield a much more interesting profile. > > I don't think Mike mentioned it, but speeding up the compiler has > become our group's top priority, and every idea is on the table > right now. The 6x goal sounds extreme, but it helps keep in mind > that one or two or even a dozen 5% improvements will not be > sufficient to attain parity with the competition. I think part of the problem is that the timings gcc itself outputs aren't completely accurate, because sometimes we go around the calls that would push the timevar. > Stan > > > > ^ permalink raw reply [flat|nested] 173+ messages in thread
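As an illustration of the macro-to-function experiment described above, consider the sketch below; the structure and accessor names are invented for this example, not GCC's real tree or RTL accessors.

struct node
{
  struct node *operands[4];
};

/* Macro accessor: expands inline everywhere, so the cost of the
   pointer chase is smeared across every caller in a profile.  */
#define NODE_OPERAND(N, I) ((N)->operands[(I)])

/* Function-ized accessor: identical behaviour, but now gprof or
   cachegrind can attribute time and misses to one symbol, at the
   price of a call per access.  */
static struct node *
node_operand (struct node *n, int i)
{
  return n->operands[i];
}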
* Re: Faster compilation speed 2002-08-09 17:44 ` Daniel Berlin @ 2002-08-09 18:35 ` David S. Miller 2002-08-09 18:39 ` Aldy Hernandez 0 siblings, 1 reply; 173+ messages in thread From: David S. Miller @ 2002-08-09 18:35 UTC (permalink / raw) To: dberlin; +Cc: shebs, aldyh, mrs, gcc From: Daniel Berlin <dberlin@dberlin.org> Date: Fri, 9 Aug 2002 20:44:00 -0400 (EDT) The timings were horrendous. But what can we do to increase cache locality, or get rid of these problems? And TLB locality... I propose two possible solutions. 1) Reference count these objects properly, and stop being at the mercy of the garbage collector. 2) Make RTL/TREE layout less pointer driven. I read elsewhere today someone saying that garbage collecting is for people who cannot count, and after trying to beat GCC's GC into submission for a few weeks I couldn't agree more :-) And for this reason if I had the time right now I'd probably tackle #1 first. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 18:35 ` David S. Miller @ 2002-08-09 18:39 ` Aldy Hernandez 2002-08-09 18:59 ` David S. Miller 2002-08-09 20:01 ` Per Bothner 0 siblings, 2 replies; 173+ messages in thread From: Aldy Hernandez @ 2002-08-09 18:39 UTC (permalink / raw) To: David S. Miller; +Cc: dberlin, shebs, mrs, gcc > 2) Make RTL/TREE layout less pointer driven. For the clueless, ahem me, could you go into more detail on this? Thanks. Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 18:39 ` Aldy Hernandez @ 2002-08-09 18:59 ` David S. Miller 0 siblings, 0 replies; 173+ messages in thread From: David S. Miller @ 2002-08-09 18:59 UTC (permalink / raw) To: aldyh; +Cc: dberlin, shebs, mrs, gcc From: Aldy Hernandez <aldyh@redhat.com> Date: Fri, 9 Aug 2002 18:45:00 -0700 > 2) Make RTL/TREE layout less pointer driven. For the clueless, ahem me, could you go into more detail on this? Embed RTL object info instead of using pointers to other RTL objects. It's about as far-reaching a change as reference counting RTL and killing off garbage collection. The reason #2 is so far-reaching is that it would require changing several of the semantics of shared RTL and also getting rid of the places that just randomly stick new RTL all over the place. Garbage collection is just an excuse to be lazy with how we manage RTL objects in GCC. Further consideration suggests that you can approach either solution in at least two stages. The first stage is somehow documenting in the code each spot where we rewrite existing RTL. That makes the rest of the work a bit easier. ^ permalink raw reply [flat|nested] 173+ messages in thread
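A rough sketch of the layout difference being proposed; the structures below are purely illustrative and are not GCC's actual rtx layout.

/* Pointer-driven layout: each operand is a separately allocated
   object, so walking an insn chases pointers to scattered memory.  */
struct operand
{
  int code;
  long value;
};

struct insn_pointer_style
{
  struct insn_pointer_style *next;
  struct operand *op[3];
};

/* Embedded layout: small operands live inside the insn itself, so one
   allocation (often one cache line) holds the whole thing.  The cost
   is what the message above notes: shared and rewritten RTL no longer
   comes for free.  */
struct insn_embedded_style
{
  struct insn_embedded_style *next;
  struct operand op[3];
};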
* Re: Faster compilation speed 2002-08-09 18:39 ` Aldy Hernandez 2002-08-09 18:59 ` David S. Miller @ 2002-08-09 20:01 ` Per Bothner 1 sibling, 0 replies; 173+ messages in thread From: Per Bothner @ 2002-08-09 20:01 UTC (permalink / raw) To: Aldy Hernandez; +Cc: gcc Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:51 ` Stan Shebs 2002-08-09 16:54 ` Aldy Hernandez 2002-08-09 17:44 ` Daniel Berlin @ 2002-08-09 18:25 ` David S. Miller 2002-08-13 0:50 ` Loren James Rittle 2002-08-10 10:02 ` Neil Booth 3 siblings, 1 reply; 173+ messages in thread From: David S. Miller @ 2002-08-09 18:25 UTC (permalink / raw) To: gcc All of these attempts at taking care of "low hanging fruit" are great. But these efforts should not make us ignore the real problems GCC has. For example, I'm convinced that teaching all the RTL code "how to count" and thus obviating garbage collection altogether would be the biggest win ever. (I'm saying RTL should have reference counts, if someone didn't catch what I meant) Someone, I think Stan Shebs, mentioned pointer chasing, and that's another great area of exploration. The problem is that most people don't want to, or don't have the time to, sit down and do the far-reaching changes necessary to fix these toplevel problems. This is exactly what makes things such as a "flag_go_fast" option so appealing. :-( ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 18:25 ` David S. Miller @ 2002-08-13 0:50 ` Loren James Rittle 2002-08-13 21:46 ` Fergus Henderson 0 siblings, 1 reply; 173+ messages in thread From: Loren James Rittle @ 2002-08-13 0:50 UTC (permalink / raw) To: davem; +Cc: gcc In article < 20020809.181251.63969530.davem@redhat.com > David S. Miller writes: > For example, I'm convinced that teaching all the RTL code "how to > count" and thus obviating garbage collection all together, would be > the biggest win ever. (I'm saying RTL should have reference counts, > if someone didn't catch what I meant) Hi David, (This message is in the interest of brainstorming ways to improve compilation speed, even if we can't volunteer to implement, as Mike requested.) In general, comparing RC-GC to scan-GC, I often thought along the quoted lines as well. However, I had no systematic data and my opinion softened somewhat after reading Boehm's papers. Then, for non-modern hardware, I once did compare the performance of a scan-GC-based system (using boehm-gc) verses that of an equivalent explicit-free-based system (along with all the application-level RC code). I was truly surprised at how little overhead there was for using the boehm-gc technique (off-hand, I think it was under 1% for my system, but I do doubt this study applies to modern HW and/or gcc's memory usage pattern) and, more importantly, how much code complexity was reduced. I believe that reduction in code complexity is what drove gcc switching to scan-GC RTL. If you hand-coded RC back in, how is that different than the complexity that was once removed with the introduction of scan-GC? If I recall correctly, subtle object lifetime bugs came and went with the pre-scan-GC code due to complexity (perhaps it was never formally RC'd and if that is your answer, I'd buy it ;-). Now, if I understand it right, the scan-GC technique used in gcc is not as elegant (some explicit marking is required) or high-performance (gcc's implementation doesn't use hardware dirty bits, etc.) as that used in boehm-gc. Has anyone ever tested gcc with its own GC disabled but boehm-gc enabled? OK, this is a red herring question. Even if performance was greater, portability concerns are what caused the decision to build a new custom scan-GC verses reusing boehm-gc... Assuming your (application-level) RC-GC test pans out in terms of speedup, perhaps adding explicit code to maintain counts is not the best approach to keeping the reins on complexity. This might be what you meant, but: Wouldn't it be neater if gcc itself could generally reference count underlying memory which supports C pointers (as a language extension)? According to published papers, the compiler for Inferno could do it (I read them years ago when looking at the classic Java GC model verses other VM technology thus no cite here; I think it is interesting that the latest Java JIT compilers support RC-GC now). Perhaps it is impossible to add generic RC support to C and expose it to all users (for instance, there is the classic pointer escape/ABI problem). But it seems that we could mark structs whose pointers and underlying memory representations are to be handled specially upon pointer copy/invalidation (i.e. due to failing off the end of a scope) and then rigorously check usage against whatever model we use to avoid pointer escape. 
GCC's use of pointers in this area is regular and I see no reason the RC extension couldn't be modeled off the exact needs of the RTL usage (just as scan-GC was not exposed to compiler users, this RC-GC support could be tuned for compiler implementation). How to handle bootstrap since we'd want to use the new technique to replace gcc's current scan-GC? The current GC is only slightly intrusive and could be retained to build the stage1 compiler with support for the new RC-pointer handler (and related support for struct marking in source). Current scan-GC would be disabled for stage2 and 3; the new RC-pointer handler would be enabled. Regards, Loren ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 0:50 ` Loren James Rittle @ 2002-08-13 21:46 ` Fergus Henderson 2002-08-13 22:40 ` David S. Miller 2002-08-14 7:36 ` Jeff Sturm 0 siblings, 2 replies; 173+ messages in thread From: Fergus Henderson @ 2002-08-13 21:46 UTC (permalink / raw) To: Loren James Rittle; +Cc: davem, gcc On 13-Aug-2002, Loren James Rittle <rittle@latour.rsch.comm.mot.com> wrote: > Has anyone ever tested gcc with its own GC disabled > but boehm-gc enabled? OK, this is a red herring question. Even if > performance was greater, portability concerns are what caused the > decision to build a new custom scan-GC verses reusing boehm-gc... Yes, but GCC could use the Boehm GC on systems which supported it, if the Boehm GC was faster... I think this would be a very interesting experiment. -- Fergus Henderson <fjh@cs.mu.oz.au> | "I have always known that the pursuit The University of Melbourne | of excellence is a lethal habit" WWW: < http://www.cs.mu.oz.au/~fjh > | -- the last words of T. S. Garp. ^ permalink raw reply [flat|nested] 173+ messages in thread
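For anyone who wants to try that experiment, the Boehm collector's basic interface is small. The sketch below shows only the allocation side (GC_INIT and GC_MALLOC from <gc.h> are the real libgc entry points), not the considerable work of actually wiring the collector into GCC's ggc layer.

#include <gc.h>
#include <string.h>

struct cell
{
  struct cell *next;
  char name[32];
};

int
main (void)
{
  struct cell *head = 0;
  int i;

  GC_INIT ();   /* once, at program startup */

  for (i = 0; i < 1000; i++)
    {
      /* Memory from GC_MALLOC is scanned conservatively for pointers
         and reclaimed automatically; there is no matching free.  */
      struct cell *c = GC_MALLOC (sizeof *c);
      strcpy (c->name, "node");
      c->next = head;
      head = c;
    }
  return 0;
}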
* Re: Faster compilation speed 2002-08-13 21:46 ` Fergus Henderson @ 2002-08-13 22:40 ` David S. Miller 2002-08-13 23:44 ` Fergus Henderson ` (2 more replies) 2002-08-14 7:36 ` Jeff Sturm 1 sibling, 3 replies; 173+ messages in thread From: David S. Miller @ 2002-08-13 22:40 UTC (permalink / raw) To: fjh; +Cc: rittle, gcc From: Fergus Henderson <fjh@cs.mu.OZ.AU> Date: Wed, 14 Aug 2002 14:46:37 +1000 Yes, but GCC could use the Boehm GC on systems which supported it, if the Boehm GC was faster... I think this would be a very interesting experiment. Feel free to even try it with an infinitely fast GC, even one that executed in zero time. Because for the millionth time, it's not the performance of GC itself. It's the temporal and spatial locality problems of data accesses which is a fundamental result of using GC for memory allocation. It is not an issue of "how fast" the GC is. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 22:40 ` David S. Miller @ 2002-08-13 23:44 ` Fergus Henderson 2002-08-14 7:58 ` Jeff Sturm 2002-08-14 9:52 ` Richard Henderson 2 siblings, 0 replies; 173+ messages in thread From: Fergus Henderson @ 2002-08-13 23:44 UTC (permalink / raw) To: David S. Miller; +Cc: rittle, gcc On 13-Aug-2002, David S. Miller <davem@redhat.com> wrote: > From: Fergus Henderson <fjh@cs.mu.OZ.AU> > Date: Wed, 14 Aug 2002 14:46:37 +1000 > > Yes, but GCC could use the Boehm GC on systems which supported it, > if the Boehm GC was faster... > > I think this would be a very interesting experiment. > > Feel free to even try it with an infinitely fast GC, even > one that executed in zero time. > > Because for the millionth time, it's not the performance of GC itself. > It's the temporal and spatial locality problems of data accesses which > is a fundamental result of using GC for memory allocation. > > It is not an issue of "how fast" the GC is. Look, there are a number of possible memory management strategies and implementations possible. GC using GCC's current GC implementation is one. Conservative GC using the Boehm collector is another. Reference counting is another. Reference counting has its own set of drawbacks for locality, so it's not clear it would be a win; doing the experiment would be a *lot* of work. If someone really feels strongly about RC, and has lots of time, by all means, go for it. Using the Boehm collector is less likely to be a huge win, but it might well be a significant win, and it would be much easier to carry out that experiment. -- Fergus Henderson <fjh@cs.mu.oz.au> | "I have always known that the pursuit The University of Melbourne | of excellence is a lethal habit" WWW: < http://www.cs.mu.oz.au/~fjh > | -- the last words of T. S. Garp. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 22:40 ` David S. Miller 2002-08-13 23:44 ` Fergus Henderson @ 2002-08-14 7:58 ` Jeff Sturm 2002-08-14 9:52 ` Richard Henderson 2 siblings, 0 replies; 173+ messages in thread From: Jeff Sturm @ 2002-08-14 7:58 UTC (permalink / raw) To: David S. Miller; +Cc: fjh, rittle, gcc On Tue, 13 Aug 2002, David S. Miller wrote: > I think this would be a very interesting experiment. > > Feel free to even try it with an infinitely fast GC, even > one that executed in zero time. > > Because for the millionth time, it's not the performance of GC itself. > It's the temporal and spatial locality problems of data accesses which > is a fundamental result of using GC for memory allocation. Relax. Earlier in this thread I seem to remember you were advocating certain experiments in spite of the skeptics. So give the GC experts a chance. As I understand it, generational collection ought to improve locality, since the youngest generation can be collected frequently, and may even be small enough to fit mostly in cache. (I've never observed it to work in practice, but don't let that discourage anyone :-) Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 22:40 ` David S. Miller 2002-08-13 23:44 ` Fergus Henderson 2002-08-14 7:58 ` Jeff Sturm @ 2002-08-14 9:52 ` Richard Henderson 2002-08-14 10:00 ` David Edelsohn 2002-08-14 10:15 ` David Edelsohn 2 siblings, 2 replies; 173+ messages in thread From: Richard Henderson @ 2002-08-14 9:52 UTC (permalink / raw) To: David S. Miller; +Cc: fjh, rittle, gcc On Tue, Aug 13, 2002 at 10:26:41PM -0700, David S. Miller wrote: > Because for the millionth time, it's not the performance of GC itself. > It's the temporal and spatial locality problems of data accesses which > is a fundamental result of using GC for memory allocation. You haven't shown (or even provided guesstimates) how much temporal or spatial locality could be had by moving away from GC. Exactly how much garbage is created during compilation of a function, Dave? Suppose we did do manual memory allocation and never created any garbage whatsoever. Suppose perfect temporal locality. How much spatial locality do we have, considering the pointer-chasing structure of our IL? My guess is not much. The folks that are doing cache-miss studies and concluding anything should also go back and measure gcc 2.95, before we used GC at all. That's perhaps not ideal, since it's obstacks instead of reference counting, but it's not a worthless data point. The conclusion that RC will solve all our problems is not foregone. I think we're better served trying to adjust the form of the IL so that we do less pointer chasing, as Geoff suggested elsewhere in this thread. r~ ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 9:52 ` Richard Henderson @ 2002-08-14 10:00 ` David Edelsohn 2002-08-14 12:01 ` Andreas Schwab 2002-08-14 10:15 ` David Edelsohn 1 sibling, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-14 10:00 UTC (permalink / raw) To: Richard Henderson, David S. Miller; +Cc: gcc >>>>> Richard Henderson writes: Richard> You haven't shown (or even provided guesstimates) how much temporal Richard> or spatial locality could be had by moving away from GC. Exactly Richard> how much garbage is created during compilation of a function, Dave? Richard> Suppose we did do manual memory allocation and never created any Richard> garbage whatsoever. Suppose perfect temporal locality. How much Richard> spatial locality do we have, considering the pointer-chasing structure Richard> of our IL? My guess is not much. One place where GCC could benefit from spatial locality is allocating the instruction list and pseudo registers from a large, static virtual memory array instead of allocating individual objects dynamically. I am *not* suggesting removing the linked list pointers or the pointers to the actual RTL. GCC often scans or walks through the instructions linearly. Pseudo registers are allocated consecutively. Allocating those linearly-accessed objects in contiguous memory would improve cache locality. David ^ permalink raw reply [flat|nested] 173+ messages in thread
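A hedged sketch of that suggestion: a flat, preallocated pool for the linearly scanned objects, with the usual chain pointers kept inside the elements. The names and the fixed pool size are invented for illustration.

/* All insns for a function come from one contiguous block, so a
   linear walk of the insn chain also walks memory roughly linearly.
   The next/prev and operand pointers stay; only the placement of the
   objects changes.  */
struct insn
{
  struct insn *next, *prev;
  void *operands;
};

#define INSN_POOL_SIZE 200000

static struct insn insn_pool[INSN_POOL_SIZE];
static int insn_pool_used;

static struct insn *
alloc_insn (void)
{
  if (insn_pool_used >= INSN_POOL_SIZE)
    return 0;   /* a real allocator would chain another large block */
  return &insn_pool[insn_pool_used++];
}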
* Re: Faster compilation speed 2002-08-14 10:00 ` David Edelsohn @ 2002-08-14 12:01 ` Andreas Schwab 2002-08-14 12:07 ` David Edelsohn 0 siblings, 1 reply; 173+ messages in thread From: Andreas Schwab @ 2002-08-14 12:01 UTC (permalink / raw) To: David Edelsohn; +Cc: Richard Henderson, David S. Miller, gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 1074 bytes --] David Edelsohn <dje@watson.ibm.com> writes: |> >>>>> Richard Henderson writes: |> |> Richard> You havn't shown (or even provided guesstemates) how much temporal |> Richard> or spacial locallity could be had by moving away from GC. Exactly |> Richard> how much garbage is created during compilation of a function, Dave? |> |> Richard> Suppose we did do manual memory allocation and never created any |> Richard> garbage whatsoever. Suppose perfect temporal locality. How much |> Richard> spacial locality do we have, considering the pointer-chasing structure |> Richard> of our IL? My guess is not much. |> |> Places where GCC could benefit from spacial locality is by |> allocating the instruction list and pseudo registers from a large, static |> virtual memory array instead of allocating individual objects dynamically. Obstacks? Andreas. -- Andreas Schwab, SuSE Labs, schwab@suse.de SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 12:01 ` Andreas Schwab @ 2002-08-14 12:07 ` David Edelsohn 2002-08-14 13:20 ` Michael Matz 2002-08-14 13:20 ` Faster compilation speed Jamie Lokier 0 siblings, 2 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-14 12:07 UTC (permalink / raw) To: Andreas Schwab; +Cc: Richard Henderson, David S. Miller, gcc >>>>> Andreas Schwab writes: |> Places where GCC could benefit from spacial locality is by |> allocating the instruction list and pseudo registers from a large, static |> virtual memory array instead of allocating individual objects dynamically. Andreas> Obstacks? I thought that obstacks are created dynamically, not statically. One does not want to ever copy or grow the array. Statically allocating some of the large, persistent, sequential collections of objects would help locality. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 12:07 ` David Edelsohn @ 2002-08-14 13:20 ` Michael Matz 2002-08-14 16:31 ` Faster compilation speed [zone allocation] Per Bothner 2002-08-14 13:20 ` Faster compilation speed Jamie Lokier 0 siblings, 2 replies; 173+ messages in thread From: Michael Matz @ 2002-08-14 13:20 UTC (permalink / raw) To: David Edelsohn; +Cc: gcc Hi, On Wed, 14 Aug 2002, David Edelsohn wrote: > |> Places where GCC could benefit from spacial locality is by > |> allocating the instruction list and pseudo registers from a large, static > |> virtual memory array instead of allocating individual objects dynamically. > > Andreas> Obstacks? > > I thought that obstacks are created dynamically, not statically. Sort of. Obstacks have the ability to grow an object which isn't yet finalized, and in that process there might be some copying (the canonical example is a string, which is created character by character). After finalization it doesn't change its address anymore, but still is part of that obstack. One would not use that functionality, but simply use obstacks as convenient containers for small objects, which are allocated already finalized. It allocates memory in blocks, and then gives out part of the current block as long as enough is free in it, and the request is not larger than a certain size (in which case it gets its own block). This makes for extremely fast allocation (just a pointer increment in the general case). One can't deallocate objects in an obstack (or rather, only all objects allocated after a certain one). And it creates good space locality, and needs less memory than a general allocator like malloc (in case many small objects are allocated). But that one can't free objects is a quite severe limitation (I wrote one for KDE, in which you can free objects, but it has certain restrictions). But it's still usable. E.g. I use an obstack in the new register allocator to allocate most of my small objects from it (nodes and edges of the graph), and then simply free the whole thing once at the end of that phase. But that's not possible, e.g., with the current RTL of the function; there you really don't want to use an obstack. > One does not want to ever copy or grow the array. As explained, this doesn't happen if one uses the obstack without growing objects. > Statically allocating some of the large, persistent, sequential > collections of objects would help locality. This would lead to the idea of obstacks (without growing obstacks) per data structure type, IOW to a zone allocator, which is not a bad thing. Ciao, Michael. ^ permalink raw reply [flat|nested] 173+ messages in thread
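For reference, the GNU obstack interface Michael describes is already available (obstack.h in glibc/libiberty). A minimal use of it as a per-pass bump allocator looks roughly like this; the graph-node structure is only an example.

#include <obstack.h>
#include <stdlib.h>

/* obstack.h leaves the chunk allocator up to the user.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

struct ra_node
{
  int color;
  struct ra_node *adj;
};

static struct obstack ra_obstack;

static void
example_pass (void)
{
  struct ra_node *n;

  obstack_init (&ra_obstack);

  /* Each allocation is normally just a pointer bump inside the
     current chunk; objects cannot be freed individually.  */
  n = obstack_alloc (&ra_obstack, sizeof *n);
  n->color = -1;
  n->adj = 0;

  /* ... run the pass using the nodes ... */

  /* Release everything allocated above in one shot.  */
  obstack_free (&ra_obstack, 0);
}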
* Re: Faster compilation speed [zone allocation] 2002-08-14 13:20 ` Michael Matz @ 2002-08-14 16:31 ` Per Bothner 2002-08-15 11:34 ` Aldy Hernandez 0 siblings, 1 reply; 173+ messages in thread From: Per Bothner @ 2002-08-14 16:31 UTC (permalink / raw) To: Michael Matz; +Cc: gcc Michael Matz wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-14 16:31 ` Faster compilation speed [zone allocation] Per Bothner @ 2002-08-15 11:34 ` Aldy Hernandez 2002-08-15 11:39 ` David Edelsohn ` (3 more replies) 0 siblings, 4 replies; 173+ messages in thread From: Aldy Hernandez @ 2002-08-15 11:34 UTC (permalink / raw) To: Per Bothner; +Cc: Michael Matz, gcc >>>>> "Per" == Per Bothner <per@bothner.com> writes: This is just an idea, why doesn't someone hack the GC to never collect, and then we can really find out how much is to be gained by a refcounter, or no GC at all, etc. Why go down this path, if we're not even sure it'll improve anything (well, that much anyhow). Aldy ^ permalink raw reply [flat|nested] 173+ messages in thread
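The experiment Aldy proposes needs only a throwaway hack around the collector entry point mentioned earlier in the thread. A sketch, with the flag invented for this purpose and the rest of the function's body elided:

/* Somewhere in the ggc implementation: make collection a no-op so
   memory is never reused or freed, then compare compile times and
   cache behaviour against an unmodified compiler.  */
static int never_collect = 1;

void
ggc_collect ()
{
  if (never_collect)
    return;

  /* ... existing collection code ... */
}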
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:34 ` Aldy Hernandez @ 2002-08-15 11:39 ` David Edelsohn 2002-08-15 12:01 ` Lynn Winebarger 2002-08-15 11:41 ` Michael Matz ` (2 subsequent siblings) 3 siblings, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-15 11:39 UTC (permalink / raw) To: Aldy Hernandez; +Cc: Per Bothner, Michael Matz, gcc >>>>> Aldy Hernandez writes: Aldy> This is just an idea, why doesn't someone hack the GC to never Aldy> collect, and then we can really find out how much is to be gained by a Aldy> refcounter, or no GC at all, etc. Aldy> Why go down this path, if we're not even sure it'll improve anything Aldy> (well, that much anyhow). Because the problem is not the garbage collection, its the allocation pattern. The proposal to use reference counting allows GCC to switch to an allocator with better locality -- it's a requirement for the underlying improvement, not a fix unto itself. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:39 ` David Edelsohn @ 2002-08-15 12:01 ` Lynn Winebarger 2002-08-15 12:11 ` David Edelsohn 0 siblings, 1 reply; 173+ messages in thread From: Lynn Winebarger @ 2002-08-15 12:01 UTC (permalink / raw) To: David Edelsohn, Aldy Hernandez; +Cc: Per Bothner, Michael Matz, gcc On Thursday 15 August 2002 13:39, David Edelsohn wrote: > >>>>> Aldy Hernandez writes: > > Because the problem is not the garbage collection, its the > allocation pattern. The proposal to use reference counting allows GCC to > switch to an allocator with better locality -- it's a requirement for the > underlying improvement, not a fix unto itself. > GCC's GC promotion of poor locality of reference is not proof that reference counting is the only way to improve that locality of reference. It doesn't matter what allocation/reclamation scheme you switch to, if it's not used in a way consistent with the cases it optimizes for, it's going to stink. There's just as much reason to believe there's a generational GC that will do what you need as to believe reference counting will be some kind of magic bullet (without the brittleness). Lynn ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 12:01 ` Lynn Winebarger @ 2002-08-15 12:11 ` David Edelsohn 0 siblings, 0 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-15 12:11 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Aldy Hernandez, Per Bothner, Michael Matz, gcc >>>>> Lynn Winebarger writes: Lynn> GCC's GC promotion of poor locality of reference is not proof that Lynn> reference counting is the only way to improve that locality of reference. Lynn> It doesn't matter what allocation/reclamation scheme you switch to, if it's Lynn> not used in a way consistent with the cases it optimizes for, it's going to Lynn> stink. There's just as much reason to believe there's a generational GC Lynn> that will do what you need as to believe reference counting will be some Lynn> kind of magic bullet (without the brittleness). Let me correct my sloppy wording. What I meant by "it's a requirement for the underlying improvement" is that it is a dependency for that particular proposal -- RC is a means to an end, not an end unto itself. There are many ways to address the locality problem. I am trying to encourage people participating in this discussion to stop fixating on the garbage collector itself. Somehow when GC is mentioned, people obsess on the garbage collection process without reading the entire discussion. If there is interest in discussing garbage collectors, there are other mailinglists on that specific topic where the pros and cons of various styles with and without hardware assistance are debated. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:34 ` Aldy Hernandez 2002-08-15 11:39 ` David Edelsohn @ 2002-08-15 11:41 ` Michael Matz 2002-08-16 8:44 ` Kai Henningsen 2002-08-15 11:43 ` Per Bothner 2002-08-15 11:57 ` Kevin Handy 3 siblings, 1 reply; 173+ messages in thread From: Michael Matz @ 2002-08-15 11:41 UTC (permalink / raw) To: Aldy Hernandez; +Cc: Per Bothner, gcc Hi, On 15 Aug 2002, Aldy Hernandez wrote: > This is just an idea, why doesn't someone hack the GC to never > collect, and then we can really find out how much is to be gained by a > refcounter, or no GC at all, etc. To switch off GC doesn't necessarily bring anything, except that GC isn't done. But the allocated memory still has the same locality as before (i.e. if it's the reason for bad performance now, that will still be the case if we switch off GC). I.e. it wouldn't prove anything. Ciao, Michael. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:41 ` Michael Matz @ 2002-08-16 8:44 ` Kai Henningsen 0 siblings, 0 replies; 173+ messages in thread From: Kai Henningsen @ 2002-08-16 8:44 UTC (permalink / raw) To: gcc matz@suse.de (Michael Matz) wrote on 15.08.02 in < Pine.LNX.4.33.0208152037200.13269-100000@wotan.suse.de >: > On 15 Aug 2002, Aldy Hernandez wrote: > > > This is just an idea, why doesn't someone hack the GC to never > > collect, and then we can really find out how much is to be gained by a > > refcounter, or no GC at all, etc. > > To switch off GC doesn't necessarily bring anything, except that GC isn't > done. But the allocated memory still has the same locality as before > (i.e. if it's the reason for bad performance now, that will still be the > case if we switch off GC). I.e. it wouldn't proove anything. Well, it might prove that the bad locality isn't *caused* by running the collector. (Or that it is, of course.) MfG Kai ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:34 ` Aldy Hernandez 2002-08-15 11:39 ` David Edelsohn 2002-08-15 11:41 ` Michael Matz @ 2002-08-15 11:43 ` Per Bothner 2002-08-15 11:57 ` Kevin Handy 3 siblings, 0 replies; 173+ messages in thread From: Per Bothner @ 2002-08-15 11:43 UTC (permalink / raw) To: Aldy Hernandez; +Cc: Michael Matz, gcc Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed [zone allocation] 2002-08-15 11:34 ` Aldy Hernandez ` (2 preceding siblings ...) 2002-08-15 11:43 ` Per Bothner @ 2002-08-15 11:57 ` Kevin Handy 3 siblings, 0 replies; 173+ messages in thread From: Kevin Handy @ 2002-08-15 11:57 UTC (permalink / raw) To: gcc Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 12:07 ` David Edelsohn 2002-08-14 13:20 ` Michael Matz @ 2002-08-14 13:20 ` Jamie Lokier 2002-08-14 16:01 ` Nix 1 sibling, 1 reply; 173+ messages in thread From: Jamie Lokier @ 2002-08-14 13:20 UTC (permalink / raw) To: David Edelsohn; +Cc: Andreas Schwab, Richard Henderson, David S. Miller, gcc David Edelsohn wrote: > I thought that obstacks are created dynamically, not statically. > One does not want to ever copy or grow the array. Obstacks use chunks of memory to hold many contiguous objects, so they offer fairly good spatial locality. But then, so do many decent GC allocators (not ones using free lists, though). > Statically allocating some of the large, persistent, sequential > collections of objects would help locality. Linus and David are suggesting that temporal locality of short-lived objects is important -- i.e. reuse of memory from freed objects. Who knows. -- Jamie ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 13:20 ` Faster compilation speed Jamie Lokier @ 2002-08-14 16:01 ` Nix 0 siblings, 0 replies; 173+ messages in thread From: Nix @ 2002-08-14 16:01 UTC (permalink / raw) To: Jamie Lokier Cc: David Edelsohn, Andreas Schwab, Richard Henderson, David S. Miller, gcc On Wed, 14 Aug 2002, Jamie Lokier muttered drunkenly: > David Edelsohn wrote: >> I thought that obstacks are created dynamically, not statically. >> One does not want to ever copy or grow the array. > > Obstacks use chunks of memory to hold many contiguous objects, so they > offer fairly good spatial locality. But then, so do many decent GC > allocators (not ones using free lists, though). Also, surely one does not *often* want to grow or copy the array: the occasional copy isn't a problem (but you initialize it quite large so the resizing isn't required often). -- `Mips are real and bitrate earnest, shifting spam is not our goal; silicon to sand returnest, was not spoken of the soul.' --- _Eventful History: Version 1.x_, John M. Ford ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 9:52 ` Richard Henderson 2002-08-14 10:00 ` David Edelsohn @ 2002-08-14 10:15 ` David Edelsohn 2002-08-14 16:35 ` Richard Henderson 2002-08-20 4:15 ` Richard Earnshaw 1 sibling, 2 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-14 10:15 UTC (permalink / raw) To: Richard Henderson, David S. Miller; +Cc: gcc >>>>> Richard Henderson writes: Richard> The folks that are doing cache-miss studies and concluding anything Richard> should also go back and measure gcc 2.95, before we used GC at all. Richard> That's perhaps not ideal, since it's obstacks instead of reference Richard> counting, but it's not a worthless data point. Thanks for the suggestion. I think the results I got are pretty damning: gcc-2.95.3 20010315 (release)

Source         I/D$ miss -O2   I/D$ miss -O0
------         -------------   -------------
reload.c             28              36
insn-recog.c         48              36

For comparison, GCC 3.3 has values in the low 20's, especially at no optimization. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 10:15 ` David Edelsohn @ 2002-08-14 16:35 ` Richard Henderson 2002-08-14 17:02 ` David Edelsohn 2002-08-20 4:15 ` Richard Earnshaw 1 sibling, 1 reply; 173+ messages in thread From: Richard Henderson @ 2002-08-14 16:35 UTC (permalink / raw) To: David Edelsohn; +Cc: David S. Miller, gcc On Wed, Aug 14, 2002 at 01:14:53PM -0400, David Edelsohn wrote: > Thanks for the suggestion. I think the results I got are pretty damning... Try the following. Appears to cut 30 seconds (3.5%) off of an -O2 -g build of reload.c, and a small fraction of a second (3.1%) at -O0 -g. This on an 800MHz Pentium III (Coppermine). If I have rest_of_compilation dump out insn addresses before optimization (the only time we could even hope for relatively sequential nodes), INSN nodes are indeed largely coherent (even without this patch). But NOTE nodes are smaller, and get put in a different size bucket, and so are allocated from different pages. Padding out the size of NOTEs and BARRIERs make them allocated from the same pages, and the resulting initial addresses are about as sequential as one could hope. The remaining main source of non-sequentiality in the initial rtl is label = gen_label_rtx (); /* emit code */ emit_label (label); and there's really no helping that. The other change is to add allocation buckets for two important rtx sizes. On 32-bit systems, two-operand rtxs (including REG, MEM, PLUS, etc) are 12 bytes, but we were allocating 16 bytes. Similarly an INSN (9 operand) and CALL_INSN (10 operand) are 40 and 44 bytes respectively but we were allocating 64. I choose to put the bucket at 10 operand so that CALL_INSNs and JUMP_INSNs can fit. I havn't measured the overall real-life memory savings, but this is 25% for REGs and 30% for INSNs. r~ * ggc-page.c (RTL_SIZE): New. (extra_order_size_table): Add specializations for 2 and 10 rtl slots. * rtl.def (BARRIER, NOTE): Pad to 9 slots. Index: ggc-page.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/ggc-page.c,v retrieving revision 1.51 diff -c -p -d -r1.51 ggc-page.c *** ggc-page.c 4 Jun 2002 11:30:36 -0000 1.51 --- ggc-page.c 14 Aug 2002 22:38:57 -0000 *************** Software Foundation, 59 Temple Place - S *** 163,175 **** #define NUM_EXTRA_ORDERS ARRAY_SIZE (extra_order_size_table) /* The Ith entry is the maximum size of an object to be stored in the Ith extra order. Adding a new entry to this array is the *only* thing you need to do to add a new special allocation size. */ static const size_t extra_order_size_table[] = { sizeof (struct tree_decl), ! sizeof (struct tree_list) }; /* The total number of orders. */ --- 163,180 ---- #define NUM_EXTRA_ORDERS ARRAY_SIZE (extra_order_size_table) + #define RTL_SIZE(NSLOTS) \ + (sizeof (struct rtx_def) + ((NSLOTS) - 1) * sizeof (rtunion)) + /* The Ith entry is the maximum size of an object to be stored in the Ith extra order. Adding a new entry to this array is the *only* thing you need to do to add a new special allocation size. */ static const size_t extra_order_size_table[] = { sizeof (struct tree_decl), ! sizeof (struct tree_list), ! RTL_SIZE (2), /* REG, MEM, PLUS, etc. */ ! RTL_SIZE (10), /* INSN, CALL_INSN, JUMP_INSN */ }; /* The total number of orders. 
*/ Index: rtl.def =================================================================== RCS file: /cvs/gcc/gcc/gcc/rtl.def,v retrieving revision 1.58 diff -c -p -d -r1.58 rtl.def *** rtl.def 19 Jul 2002 23:11:18 -0000 1.58 --- rtl.def 14 Aug 2002 22:38:57 -0000 *************** DEF_RTL_EXPR(JUMP_INSN, "jump_insn", "iu *** 566,587 **** DEF_RTL_EXPR(CALL_INSN, "call_insn", "iuuBteieee", 'i') /* A marker that indicates that control will not flow through. */ ! DEF_RTL_EXPR(BARRIER, "barrier", "iuu", 'x') /* Holds a label that is followed by instructions. Operand: ! 4: is used in jump.c for the use-count of the label. ! 5: is used in flow.c to point to the chain of label_ref's to this label. ! 6: is a number that is unique in the entire compilation. ! 7: is the user-given name of the label, if any. */ DEF_RTL_EXPR(CODE_LABEL, "code_label", "iuuB00is", 'x') /* Say where in the code a source line starts, for symbol table's sake. Operand: ! 4: filename, if line number > 0, note-specific data otherwise. ! 5: line number if > 0, enum note_insn otherwise. ! 6: unique number if line number == note_insn_deleted_label. */ ! DEF_RTL_EXPR(NOTE, "note", "iuuB0ni", 'x') /* ---------------------------------------------------------------------- Top level constituents of INSN, JUMP_INSN and CALL_INSN. --- 566,589 ---- DEF_RTL_EXPR(CALL_INSN, "call_insn", "iuuBteieee", 'i') /* A marker that indicates that control will not flow through. */ ! DEF_RTL_EXPR(BARRIER, "barrier", "iuu000000", 'x') /* Holds a label that is followed by instructions. Operand: ! 5: is used in jump.c for the use-count of the label. ! 6: is used in flow.c to point to the chain of label_ref's to this label. ! 7: is a number that is unique in the entire compilation. ! 8: is the user-given name of the label, if any. */ DEF_RTL_EXPR(CODE_LABEL, "code_label", "iuuB00is", 'x') /* Say where in the code a source line starts, for symbol table's sake. Operand: ! 5: filename, if line number > 0, note-specific data otherwise. ! 6: line number if > 0, enum note_insn otherwise. ! 7: unique number if line number == note_insn_deleted_label. ! 8-9: padding so that notes and insns are the same size, and thus ! allocated from the same page ordering. */ ! DEF_RTL_EXPR(NOTE, "note", "iuuB0ni00", 'x') /* ---------------------------------------------------------------------- Top level constituents of INSN, JUMP_INSN and CALL_INSN. ^ permalink raw reply [flat|nested] 173+ messages in thread
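As a rough size check of the savings Richard quotes (assuming, as the patch implies, a 32-bit host where an rtunion is 4 bytes and struct rtx_def including its first rtunion is 8 bytes):

    RTL_SIZE (2)  = 8 + (2 - 1) * 4  = 12 bytes, vs. the old 16-byte bucket  (~25% saved)
    RTL_SIZE (10) = 8 + (10 - 1) * 4 = 44 bytes, vs. the old 64-byte bucket  (~31% saved)

which matches the 25% for REGs and roughly 30% for INSNs mentioned above.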
* Re: Faster compilation speed 2002-08-14 16:35 ` Richard Henderson @ 2002-08-14 17:02 ` David Edelsohn 0 siblings, 0 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-14 17:02 UTC (permalink / raw) To: Richard Henderson, David S. Miller; +Cc: gcc The patch does improve the cache behavior:

Source         I/D$ miss -O2   I/D$ miss -O0
------         -------------   -------------
reload.c        22 -> 23.4      22 -> 23.9
insn-recog.c    29 -> 30.3      23 -> 24.6

David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-14 10:15 ` David Edelsohn 2002-08-14 16:35 ` Richard Henderson @ 2002-08-20 4:15 ` Richard Earnshaw 2002-08-20 5:38 ` Jeff Sturm 2002-08-20 8:00 ` David Edelsohn 1 sibling, 2 replies; 173+ messages in thread From: Richard Earnshaw @ 2002-08-20 4:15 UTC (permalink / raw) To: David Edelsohn; +Cc: Richard Henderson, David S. Miller, gcc, Richard.Earnshaw > >>>>> Richard Henderson writes: > > Richard> The folks that are doing cache-miss studies and concluding anything > Richard> should also go back and measure gcc 2.95, before we used GC at all. > Richard> That's perhaps not ideal, since it's obstacks instead of reference > Richard> counting, but it's not a worthless data point. > > Thanks for the suggestion. I think the results I got are pretty > damning: > > gcc-2.95.3 20010315 (release) > > Source I/D$ miss -O2 I/D$ miss -O0 > ------ ------------- ------------- > reload.c 28 36 > insn-recog.c 48 36 > > > For comparison, GCC 3.3 has values in the low 20's, especially at > no optimization. > > David > Do you have/can you get data for TLB misses? R. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-20 4:15 ` Richard Earnshaw @ 2002-08-20 5:38 ` Jeff Sturm 2002-08-20 5:53 ` Richard Earnshaw 2002-08-20 8:00 ` David Edelsohn 1 sibling, 1 reply; 173+ messages in thread From: Jeff Sturm @ 2002-08-20 5:38 UTC (permalink / raw) To: Richard.Earnshaw; +Cc: David Edelsohn, Richard Henderson, David S. Miller, gcc On Tue, 20 Aug 2002, Richard Earnshaw wrote: > > gcc-2.95.3 20010315 (release) > > > > Source I/D$ miss -O2 I/D$ miss -O0 > > ------ ------------- ------------- > > reload.c 28 36 > > insn-recog.c 48 36 > > Do you have/can you get data for TLB misses? I had done that on alpha, but didn't initially report the figures. Would a comparison to 2.95 also be useful? gcc version 3.3 20020802 (experimental) --------------------------------------------------------------------------- cc1 -O2 reload.i issues/cycles = 0.51 issues/dcache_miss = 26.93 issues/dtb_miss = 1214.36 --------------------------------------------------------------------------- cc1 reload.i issues/cycles = 0.52 issues/dcache_miss = 31.29 issues/dtb_miss = 1854.16 Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-20 5:38 ` Jeff Sturm @ 2002-08-20 5:53 ` Richard Earnshaw 2002-08-20 13:42 ` Jeff Sturm 0 siblings, 1 reply; 173+ messages in thread From: Richard Earnshaw @ 2002-08-20 5:53 UTC (permalink / raw) To: Jeff Sturm Cc: Richard.Earnshaw, David Edelsohn, Richard Henderson, David S. Miller, gcc > > Do you have/can you get data for TLB misses? > > I had done that on alpha, but didn't initially report the figures. Would > a comparison to 2.95 also be useful? Certainly -- the numbers don't really mean anything unless we have something to compare them against. Remember, gcc-2.95 bootstrap times were about half those that we have now (*after* taking into account new languages and libraries etc). R. > > gcc version 3.3 20020802 (experimental) > > --------------------------------------------------------------------------- > cc1 -O2 reload.i > > issues/cycles = 0.51 issues/dcache_miss = 26.93 issues/dtb_miss = 1214.36 So if I understand these figures correctly, then dcache_miss/dtb_miss ~= 45 That is, one in 45 dcache fetches also requires a tlb walk. How many dtb entries does an Alpha have? > > --------------------------------------------------------------------------- > cc1 reload.i > > issues/cycles = 0.52 issues/dcache_miss = 31.29 issues/dtb_miss = 1854.16 > giving dcache_miss/dtb_miss ~= 60 ^ permalink raw reply [flat|nested] 173+ messages in thread
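(For anyone following along, the ratios above come from dividing the two reported figures: dcache misses per TLB miss = (issues/dtb_miss) / (issues/dcache_miss), i.e. 1214.36 / 26.93 ~= 45 at -O2 and 1854.16 / 31.29 ~= 60 at -O0.)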
* Re: Faster compilation speed 2002-08-20 5:53 ` Richard Earnshaw @ 2002-08-20 13:42 ` Jeff Sturm 2002-08-22 1:55 ` Richard Earnshaw 0 siblings, 1 reply; 173+ messages in thread From: Jeff Sturm @ 2002-08-20 13:42 UTC (permalink / raw) To: Richard.Earnshaw; +Cc: David Edelsohn, Richard Henderson, David S. Miller, gcc On Tue, 20 Aug 2002, Richard Earnshaw wrote: > > I had done that on alpha, but didn't initially report the figures. Would > > a comparison to 2.95 also be useful? > > Certainly -- the numbers don't really mean anything unless we have > something to compare them against. I figured so. (Wow, I hadn't built a 2.95 toolchain in a long time.) > > gcc version 3.3 20020802 (experimental) > > > > --------------------------------------------------------------------------- > > cc1 -O2 reload.i > > > > issues/cycles = 0.51 issues/dcache_miss = 26.93 issues/dtb_miss = 1214.36 gcc version 2.95.3 20010315 (release) cc1 -O2 reload.i issues/cycles = 0.54 issues/dcache_miss = 26.31 issues/dtb_miss = 2488. cc1 reload.i issues/cycles = 0.52 issues/dcache_miss = 26.30 issues/dtb_miss = 3306. Now that's interesting. No real change in L1 cache performance, but TLB misses nearly cut in half vs. 3.3. Trying L3 misses (both with -O0): 3.3: issues/bcache_miss = 370 2.95.3: issues/bcache_miss = 437 Wall-clock time is nearly 2/1 for these tests, as are TLB misses, while other stats are close. Hmm. > So if I understand these figures correctly, then > > dcache_miss/dtb_miss ~= 45 > > That is, one in 45 dcache fetches also requires a tlb walk. That's how I see it. > How many dtb entries does an Alpha have? No idea. This is an ev56. I could try grabbing the specs from Digital's site, if I can still find it... How expensive is a TLB miss, anyway? I hadn't expected it would occur often enough in gcc to be significant. Note the IPC ratio stays constant, but as I understand it, TLB is handled in software, so maybe those cycles are counted by iprobe? Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-20 13:42 ` Jeff Sturm @ 2002-08-22 1:55 ` Richard Earnshaw 2002-08-22 2:03 ` David S. Miller 2002-08-23 15:39 ` Jeff Sturm 0 siblings, 2 replies; 173+ messages in thread From: Richard Earnshaw @ 2002-08-22 1:55 UTC (permalink / raw) To: Jeff Sturm Cc: Richard.Earnshaw, David Edelsohn, Richard Henderson, David S. Miller, gcc > On Tue, 20 Aug 2002, Richard Earnshaw wrote: > > > I had done that on alpha, but didn't initially report the figures. Would > > > a comparison to 2.95 also be useful? > > > > Certainly -- the numbers don't really mean anything unless we have > > something to compare them against. > > I figured so. (Wow, I hadn't built a 2.95 toolchain in a long time.) > > > > gcc version 3.3 20020802 (experimental) > > > > > > --------------------------------------------------------------------------- > > > cc1 -O2 reload.i > > > > > > issues/cycles = 0.51 issues/dcache_miss = 26.93 issues/dtb_miss = 1214.36 > > gcc version 2.95.3 20010315 (release) > > cc1 -O2 reload.i > issues/cycles = 0.54 issues/dcache_miss = 26.31 issues/dtb_miss = 2488. > > cc1 reload.i > issues/cycles = 0.52 issues/dcache_miss = 26.30 issues/dtb_miss = 3306. > > Now that's interesting. No real change in L1 cache performance, but TLB > misses nearly cut in half vs. 3.3. > > Trying L3 misses (both with -O0): > > 3.3: issues/bcache_miss = 370 > 2.95.3: issues/bcache_miss = 437 > > Wall-clock time is nearly 2/1 for these tests, as are TLB misses, while > other stats are close. Hmm. > > > So if I understand these figures correctly, then > > > > dcache_miss/dtb_miss ~= 45 > > > > That is, one in 45 dcache fetches also requires a tlb walk. > > That's how I see it. OK, now consider it this way. Each cache line miss will cause N bytes to be fetched from memory -- I don't know the details, but lets assume that's 32 bytes, a typical value. Each tlb entry will address one page -- again I don't know the details but 4K is common on many machines. So, with gcc 2.95.3 we have -O2 dcache_miss/tlb_miss = 2488 / 26.31 ~= 95 -O0 dcache_miss/tlb_miss = 3306 / 26.30 ~= 127 Since each dcache miss represents 32 bytes of memory we have 3040 (95 * 32) and 4064 bytes fetched per tlb miss we have very nearly 75% and 100% of each page being accessed for each miss (it will be lower than this in practice, since some lines in a page will probably be fetched more than once and others not at all). However, for gcc 3 we have 1440 and 1920 bytes; that is, we *at best* access less than half the memory in each page we touch. > How expensive is a TLB miss, anyway? I hadn't expected it would occur > often enough in gcc to be significant. Note the IPC ratio stays constant, > but as I understand it, TLB is handled in software, so maybe those cycles > are counted by iprobe? A cache miss probably takes about twice as long if we also miss in the TLB, assuming tlb walking is done in hardware -- if you have a soft-loaded TLB, then it could take significantly longer. R. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-22 1:55 ` Richard Earnshaw @ 2002-08-22 2:03 ` David S. Miller 2002-08-23 15:39 ` Jeff Sturm 1 sibling, 0 replies; 173+ messages in thread From: David S. Miller @ 2002-08-22 2:03 UTC (permalink / raw) To: Richard.Earnshaw, rearnsha; +Cc: jsturm, dje, rth, gcc From: Richard Earnshaw <rearnsha@arm.com> Date: Thu, 22 Aug 2002 09:53:19 +0100 > How expensive is a TLB miss, anyway? I hadn't expected it would occur > often enough in gcc to be significant. Note the IPC ratio stays constant, > but as I understand it, TLB is handled in software, so maybe those cycles > are counted by iprobe? A cache miss probably takes about twice as long if we also miss in the TLB, assuming tlb walking is done in hardware -- if you have a soft-loaded TLB, then it could take significantly longer. A soft-loaded TLB miss on UltraSPARC can be serviced in ~38 processor cycles. At least this is how fast the Linux software TLB miss handler is. This includes all of the overhead associated with entering and leaving the trap. It also assumes that the TLB miss handler hits the L2 cache for the page table entry load (there is only one memory access necessary to service a TLB miss, bonus points to those who know how this is accomplished without looking at the sources :-). ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-22 1:55 ` Richard Earnshaw 2002-08-22 2:03 ` David S. Miller @ 2002-08-23 15:39 ` Jeff Sturm 1 sibling, 0 replies; 173+ messages in thread From: Jeff Sturm @ 2002-08-23 15:39 UTC (permalink / raw) To: Richard.Earnshaw; +Cc: David Edelsohn, Richard Henderson, David S. Miller, gcc On Thu, 22 Aug 2002, Richard Earnshaw wrote: > OK, now consider it this way. Each cache line miss will cause N bytes to > be fetched from memory -- I don't know the details, but lets assume that's > 32 bytes, a typical value. Each tlb entry will address one page -- again > I don't know the details but 4K is common on many machines. > > So, with gcc 2.95.3 we have > > -O2 dcache_miss/tlb_miss = 2488 / 26.31 ~= 95 > -O0 dcache_miss/tlb_miss = 3306 / 26.30 ~= 127 > > Since each dcache miss represents 32 bytes of memory we have 3040 (95 * > 32) and 4064 bytes fetched per tlb miss we have very nearly 75% and 100% > of each page being accessed for each miss (it will be lower than this in > practice, since some lines in a page will probably be fetched more than > once and others not at all). > > However, for gcc 3 we have 1440 and 1920 bytes; that is, we *at best* > access less than half the memory in each page we touch. Interesting analysis; thanks. It's actually worse than you say since Alpha has 8k pages. I looked up the ev56 specs to find out there are just 64 TLB entries, so for any working set larger than 512k some thrashing would be expected. For another experiment I installed one of the superpage patches available for Linux; this enables the granularity hint bits for Alpha to support pages up to 4MB. Then I modified ggc-page.c to allocate 4MB chunks by anonymous mmap. I then measured 70% fewer dtb misses for cc1, although wall clock time is reduced by only ~5%. So it would appear that TLB misses are indeed important but not the overwhelming concern in gcc's performance. Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
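For concreteness, the kind of change Jeff describes might look roughly like the sketch below (hypothetical names, not his actual patch; ggc-page.c's real allocation path differs): grab large chunks with anonymous mmap so a superpage-aware kernel can back each chunk with a single big TLB entry.

    #include <stddef.h>
    #include <sys/mman.h>

    #define GGC_CHUNK_SIZE (4 * 1024 * 1024)   /* one Alpha superpage */

    static void *
    alloc_ggc_chunk (void)
    {
      void *p = mmap (NULL, GGC_CHUNK_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      return p == MAP_FAILED ? NULL : p;
    }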
* Re: Faster compilation speed 2002-08-20 4:15 ` Richard Earnshaw 2002-08-20 5:38 ` Jeff Sturm @ 2002-08-20 8:00 ` David Edelsohn 1 sibling, 0 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-20 8:00 UTC (permalink / raw) To: Richard.Earnshaw; +Cc: Richard Henderson, David S. Miller, gcc >>>>> Richard Earnshaw writes: Richard> Do you have/can you get data for TLB misses? Yes. I didn't comment on TLB statistics because it did not vary much with optimization level or GCC versions. GCC 2.95 is a little better, but overlaps with GCC 3.3 TLB statistics. Both GCC 2.95 and GCC 3.3 statistics follow the source file size. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 21:46 ` Fergus Henderson 2002-08-13 22:40 ` David S. Miller @ 2002-08-14 7:36 ` Jeff Sturm 1 sibling, 0 replies; 173+ messages in thread From: Jeff Sturm @ 2002-08-14 7:36 UTC (permalink / raw) To: Fergus Henderson; +Cc: Loren James Rittle, davem, gcc On Wed, 14 Aug 2002, Fergus Henderson wrote: > On 13-Aug-2002, Loren James Rittle <rittle@latour.rsch.comm.mot.com> wrote: > > Has anyone ever tested gcc with its own GC disabled > > but boehm-gc enabled? OK, this is a red herring question. Even if > > performance was greater, portability concerns are what caused the > > decision to build a new custom scan-GC verses reusing boehm-gc... > > Yes, but GCC could use the Boehm GC on systems which supported it, > if the Boehm GC was faster... > > I think this would be a very interesting experiment. I tried it a year or so ago on the 3.0 sources. Had a ggc-boehm.c operating mostly conservatively. Using ggc's marking infrastructure may be possible, but seemed difficult to interface with boehm-gc. One of the difficult problems is that boehm-gc doesn't want to follow pointers through ordinary (malloc'ed) heap sections. So I overrode malloc/free to use the GC methods. I made ggc_collect() a no-op, since boehm-gc knows when it needs to collect, and overriding its heuristics doesn't really help matters anyway. Overall it seemed to shave a few minutes off the bootstrap time, but also increased memory usage considerably. I expected this. Tuning frequency of collection typically amounts to a size/speed tradeoff. I don't think conservativeness was an important factor in heap size. It could've been interesting to try incremental/generational collection. I didn't do that. My impression based partly on that experiment is that allocation & collection overhead in GCC is not all that substantial, and the real gains are going to be elsewhere, i.e. improving temporal locality as has been discussed lately. That isn't a problem that any GC is going to fix. (I also don't think it's a necessary evil of GC, rather it's how you use the allocator... e.g. creating too many short-lived objects is a bad thing.) Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
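A hypothetical sketch of the kind of shim Jeff describes (not his actual ggc-boehm.c; the header path and GCC entry-point signatures are simplified here): allocations go through the Boehm collector, and ggc_collect becomes a no-op so boehm-gc's own heuristics decide when to collect.

    #include <stddef.h>
    #include <gc.h>   /* boehm-gc; on some installs <gc/gc.h> */

    void *
    ggc_alloc (size_t size)
    {
      return GC_MALLOC (size);   /* conservatively scanned, collectable allocation */
    }

    void
    ggc_collect (void)
    {
      /* Deliberately empty: boehm-gc collects when its heuristics say so.  */
    }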
* Re: Faster compilation speed 2002-08-09 16:51 ` Stan Shebs ` (2 preceding siblings ...) 2002-08-09 18:25 ` David S. Miller @ 2002-08-10 10:02 ` Neil Booth 3 siblings, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-10 10:02 UTC (permalink / raw) To: Stan Shebs; +Cc: Aldy Hernandez, Mike Stump, gcc Stan Shebs wrote:- > One of my suspicions is that the massive use of macros in tree > and RTL is concealing excessive pointer chasing, because they > don't show up in either profile or coverage numbers. Yes. I look forward to the day when we use type-safe structures that contain only the relevant information, rather than a "tree" which is little more than the union of the universe, along with compensating macros to detect type violations. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:26 ` Stan Shebs 2002-08-09 16:31 ` Aldy Hernandez @ 2002-08-09 17:36 ` Daniel Berlin 2002-08-12 16:23 ` Mike Stump 2 siblings, 0 replies; 173+ messages in thread From: Daniel Berlin @ 2002-08-09 17:36 UTC (permalink / raw) To: Stan Shebs; +Cc: Aldy Hernandez, Mike Stump, gcc On Fri, 9 Aug 2002, Stan Shebs wrote: > Aldy Hernandez wrote: > > >>Let's take my combine elision patch. This patch makes the compiler > >>generate worse code. The way in which it is worse, is that more stack > >>space is used. How much more, well, my initial guess is that it is > >>less than 10% worse. Not too bad. Maybe users would care, maybe they > >> > > > >I assume you have already looked at the horrendity of the code > >presently generated by -O0. It's pretty unusable as it is. Who would > >really want to use gcc under the influence of "worse than -O0"? > >Really. > > > OK, then to really rub it in, CW runs much faster than GCC, even on > that slow Darwin OS :-), and that's with its non-optimizing case being > about halfway between GCC's -O0 and -O1, and works well with the > debugger still. > > Sacrificing -O0 optimization is just a desperation move, since > we don't seem to have many other ideas about how to make GCC as > fast as CW. Look, there are, in reality, two things that make our compiler slower than metrowerks, even at -O0 First is parsing. The bison parser is just not fast. It never will be. Period. The second is expansion from tree to RTL. It's not fast either. The timings don't always tell the real story. There are cases where expansion is occuring when the timevar isn't pushed (IE other things that call expand_*, where * = anything but _body, where the timevar is pushed). The solutions to the first is already in progress (give me a clean, working hand-written parser, that can compile libstdc++, and i'll happily make it go real fast. I was just starting to when the branch was abandoned.). Codewarrior, for comparison sake, uses a backtracking recursive descent parser for it's C++ compiler. The second is hard to solve in a way people would like. The fastest way to solve the problem is to do native code generation off the tree at -O0, avoiding any optimizations whatsoever. This is, of course, not easy to do with our current MD files. We really would need a *burg like tool and associated descriptions. You could do debugging output without too much difficulty. Most of the debug_* functions operate on trees anyway. PFE solves our first problem as well, but not the second one. We still have to *generate* the code. But there still have to be better answers than trying to avoid the backend entirely. If our backend is so godawfully bad that we have to start skipping entire "normal" phases (IE not optimizations to speed up code, or things that are done in plenty of other compilers at -O0), then we really *do* need to rearchitect them, and maybe more. Not just directed speed ups. At some point, it becomes easier to redo it from scratch well. Particularly when nobody today understands why anyone thought it was a good idea to do it the way it's done now. --Dan > > Stan > > > > ^ permalink raw reply [flat|nested] 173+ messages in thread
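As a toy illustration of the backtracking recursive-descent style Dan mentions (purely illustrative; nothing to do with the real C++ grammar): each production is a function, and on failure the token position is rewound so the next alternative can be tried.

    #include <stdio.h>

    static const char *input;
    static int pos;

    static int
    match (char c)
    {
      if (input[pos] == c) { pos++; return 1; }
      return 0;
    }

    /* expr: '(' expr ')' | 'x'  */
    static int
    parse_expr (void)
    {
      int save = pos;
      if (match ('(') && parse_expr () && match (')'))
        return 1;
      pos = save;               /* backtrack, try the next alternative */
      return match ('x');
    }

    int
    main (void)
    {
      input = "((x))";
      pos = 0;
      printf ("%s\n", parse_expr () && input[pos] == '\0' ? "accepted" : "rejected");
      return 0;
    }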
* Re: Faster compilation speed 2002-08-09 16:26 ` Stan Shebs 2002-08-09 16:31 ` Aldy Hernandez 2002-08-09 17:36 ` Daniel Berlin @ 2002-08-12 16:23 ` Mike Stump 2 siblings, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 16:23 UTC (permalink / raw) To: Stan Shebs; +Cc: Aldy Hernandez, gcc On Friday, August 9, 2002, at 04:25 PM, Stan Shebs wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:00 ` Aldy Hernandez 2002-08-09 16:26 ` Stan Shebs @ 2002-08-12 16:05 ` Mike Stump 1 sibling, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 16:05 UTC (permalink / raw) To: Aldy Hernandez; +Cc: gcc On Friday, August 9, 2002, at 04:05 PM, Aldy Hernandez wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:28 ` Mike Stump 2002-08-09 16:00 ` Aldy Hernandez @ 2002-08-09 19:07 ` David Edelsohn 1 sibling, 0 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-09 19:07 UTC (permalink / raw) To: Mike Stump, Stan Shebs; +Cc: gcc In regard to the benefit of some optimization at -O0, please see http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00690.html ("The Death of Stupid"). Other comercial compilers are able to focus on compilation speed at -O0 with some small, appropriate optimization. They also efficiently produce extremely good code with full optimization enabled. They do not need an additional -fquick-compile flag. GCC does not have much low-hanging fruit left. IMHO, playing these speed-up games distracts interested developers from addressing the fundamental design problems which slow down GCC. The underlying problems have been mentioned in this discussion. If we begin to attack them now, we may have them ready for GCC 3.4. If we keep looking for easy solutions, GCC is going to remain at a disadvantage. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 12:17 Faster compilation speed Mike Stump 2002-08-09 13:04 ` Noel Yap 2002-08-09 13:10 ` Aldy Hernandez @ 2002-08-09 14:29 ` Neil Booth 2002-08-09 15:02 ` Nathan Sidwell 2002-08-12 12:11 ` Mike Stump 2002-08-09 14:51 ` Stan Shebs ` (3 subsequent siblings) 6 siblings, 2 replies; 173+ messages in thread From: Neil Booth @ 2002-08-09 14:29 UTC (permalink / raw) To: Mike Stump; +Cc: gcc Mike Stump wrote:- > I'd like to introduce lots of various changes to improve compiler > speed. Just my opinion, Mike, but I think a lot of current slowness is due to redo-ing too many things, and not taking advantage of ordering or whatever technique so that conclusions deduced from internal representations are made in a logical, efficient way. (e.g. I think we try to constant fold things that we've already tried to constant fold and failed, repeatedly, and we don't do the constant folding we do do in an optimal way. I could be wrong, though; I've not looked in detail). I cannot explain this clearly, or with any specific example, but IMO we work far too hard to do what we do. I'd like to see this cleaned up instead. For example, see some of Mark's recent patches. I think we could continue doing that for ages. I also believe that using Bison (and our ill-considered extensions like attributes pretty much anywhere) doesn't help efficiency. We could probably do better in the C front end with a tree representation that is closer to C than the current multi-language form of trees. What worries me about PCH and similar schemes is it's too easy to fix the symptoms, rather than the real reasons for the slowness. As a result, such things might never be fixed. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
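A toy illustration of the redundancy Neil suspects and the obvious fix (hypothetical names and a stubbed-out analysis; fold-const's real interface is different): remember on the node that a fold attempt has already been made, so later callers don't repeat work that already failed.

    struct expr
    {
      int is_constant;
      int fold_tried;           /* set after the first (possibly failed) attempt */
      /* ... code, operands, etc. ... */
    };

    /* Hypothetical stand-in for the expensive folding analysis.  */
    static int
    try_to_fold (struct expr *e)
    {
      (void) e;
      return 0;                 /* pretend folding failed */
    }

    static int
    maybe_fold (struct expr *e)
    {
      if (!e->fold_tried)
        {
          e->fold_tried = 1;
          e->is_constant = try_to_fold (e);
        }
      return e->is_constant;    /* cheap on every later call */
    }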
* Re: Faster compilation speed 2002-08-09 14:29 ` Neil Booth @ 2002-08-09 15:02 ` Nathan Sidwell 2002-08-09 17:05 ` Stan Shebs 2002-08-10 2:21 ` Gabriel Dos Reis 1 sibling, 2 replies; 173+ messages in thread From: Nathan Sidwell @ 2002-08-09 15:02 UTC (permalink / raw) To: Neil Booth; +Cc: Mike Stump, gcc Neil Booth wrote: > Just my opinion, Mike, but I think a lot of current slowness is due to > redo-ing too many things, and not taking advantage of ordering or whatever > technique so that conclusions deduced from internal representations are > made in a logical, efficent way. (e.g. I think we try to constant fold > things that we've already tried to constant fold and failed, repeatedly, > and we don't do the constant folding we do do in an optimal way. I could > be wrong, though; I've not looked in detail). I cannot explain this Yup, redoing things seems to happen a lot in the c++ front end. The type conversion machinery seems to work a lot like

  if (complicated fn to try conversion 1)
    complicated fn to do conversion 1
  else if (complicated fn to try conversion 2)
    complicated fn to do conversion 2
  ...

unifying static_cast, (cast), const_cast, implicit_conversion, overload arg resolution might be a win. I think you might be right about fold-const. That's recursive itself, so we should only need to call that when we really need to flatten a const, rather than after every new operation. As you'll have noticed I'm tweaking the coverage machinery to try and find hotspots and deadspots. My immediate plan for this is to

  a) fix .da files so they don't grow indefinitely large - nearly done
  b) add some kind of __builtin_unexpected (), to mark expected dead code
  c) write some perl scripts to munge the gcov output

I hope some of that is useful to others. nathan -- Dr Nathan Sidwell :: http://www.codesourcery.com :: CodeSourcery LLC 'But that's a lie.' - 'Yes it is. What's your point?' nathan@codesourcery.com : http://www.cs.bris.ac.uk/~nathan/ : nathan@acm.org ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:02 ` Nathan Sidwell @ 2002-08-09 17:05 ` Stan Shebs 2002-08-10 2:21 ` Gabriel Dos Reis 1 sibling, 0 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 17:05 UTC (permalink / raw) To: Nathan Sidwell; +Cc: Neil Booth, Mike Stump, gcc Nathan Sidwell wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:02 ` Nathan Sidwell 2002-08-09 17:05 ` Stan Shebs @ 2002-08-10 2:21 ` Gabriel Dos Reis 1 sibling, 0 replies; 173+ messages in thread From: Gabriel Dos Reis @ 2002-08-10 2:21 UTC (permalink / raw) To: Nathan Sidwell; +Cc: Neil Booth, Mike Stump, gcc Nathan Sidwell <nathan@codesourcery.com> writes: | unifying static_cast, (cast), const_cast, implicit_conversion, overload | arg resolution might be a win. We might get correctness at the same time. [...] | I hope some of that is useful to others. Definitely. -- Gaby ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 14:29 ` Neil Booth 2002-08-09 15:02 ` Nathan Sidwell @ 2002-08-12 12:11 ` Mike Stump 2002-08-12 12:41 ` David Edelsohn 2002-08-12 19:17 ` Mike Stump 1 sibling, 2 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 12:11 UTC (permalink / raw) To: Neil Booth; +Cc: gcc On Friday, August 9, 2002, at 02:27 PM, Neil Booth wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 12:11 ` Mike Stump @ 2002-08-12 12:41 ` David Edelsohn 2002-08-12 12:47 ` Matt Austern 2002-08-12 19:17 ` Mike Stump 1 sibling, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-12 12:41 UTC (permalink / raw) To: Mike Stump; +Cc: gcc >>>>> Mike Stump writes: Mike> Instead? Well, I cannot promise instead, but I think it is reasonable Mike> to look at it in addition to all the other stuff. If Apple wants to tackle one or more of the fundamental GCC design problems affecting compiler performance which have been mentioned during this discussion, I think that Apple will have a lot of support and help from GCC developers. This means doing the analysis of the problem, experimenting with possible approaches, designing a solution, and implementing that solution with the entire GCC development community. Fiddling around the edges, disabling functionality to save compilation time is not likely to be effective for Apple or for the GCC community. The big gains are to be found in revising the design and implementation of GCC's underlying infrastructure, not lots of little tweaks. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 12:41 ` David Edelsohn @ 2002-08-12 12:47 ` Matt Austern 2002-08-12 12:56 ` David S. Miller 0 siblings, 1 reply; 173+ messages in thread From: Matt Austern @ 2002-08-12 12:47 UTC (permalink / raw) To: David Edelsohn; +Cc: Mike Stump, gcc On Monday, August 12, 2002, at 12:40 PM, David Edelsohn wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 12:47 ` Matt Austern @ 2002-08-12 12:56 ` David S. Miller 2002-08-12 13:56 ` Matt Austern 2002-08-12 14:28 ` Stan Shebs 0 siblings, 2 replies; 173+ messages in thread From: David S. Miller @ 2002-08-12 12:56 UTC (permalink / raw) To: austern; +Cc: dje, mrs, gcc From: Matt Austern <austern@apple.com> Date: Mon, 12 Aug 2002 12:47:30 -0700 And yes, we're aware that many gains are possible only if we rewrite the parser or redesign the tree structure. The only reason we haven't started on rewriting the parser is that someone else is already doing it. So work on an attempt at RTL refcounting, the patch below is a place to start. Next you have to: 1) walk through the whole compiler and add all the proper {GET,PUT}_RTX calls. 2) find a solution for circular RTL I would suggest as a first pass (ie. to get some performance numbers), special case things like INSN_LISTs and just don't refcount for the references to INSNs they generate. Likewise for INSN dependency lists generated by the scheduler et al. 3) bring it at least to the point where you can successfully get a successful build of some non-trivial source file. Perhaps gcc/reload.i. Even if it requires some gross hacks to get it to pass through, post GC vs. refcounting performance numbers. 4) Almost certainly, in trying to refcount things correctly, you will spot real bugs in the compiler. Please keep track of these so they can be fixed independant of whether the rtx refcounting is ever used or not. 5) If you are still bored at this point, add the machinery to use the RTX walking of the current garbage collector to verify the reference counts. This will basically be required in order to make and sufficiently correctness check a final implementation. It would be enabled by default, so that if any refence counts go wrong they will be spotted with impunity. This is part of the sociological aspect of these changes, namely getting people to think about proper resource tracking when working with RTL objects. If the compiler explodes when they get it wrong, they will learn eventually :-) Because if someone else doesn't do this, I will end up doing so :-) --- ./rtl.h.~1~ Sun Aug 11 19:04:35 2002 +++ ./rtl.h Sun Aug 11 20:42:02 2002 @@ -130,6 +130,9 @@ struct rtx_def /* The kind of value the expression has. */ ENUM_BITFIELD(machine_mode) mode : 8; + /* Reference count. */ + unsigned int __count : 24; + /* 1 in a MEM if we should keep the alias set for this mem unchanged when we access a component. 1 in a CALL_INSN if it is a sibling call. @@ -184,7 +187,7 @@ struct rtx_def 1 in a REG means this reg refers to the return value of the current function. 1 in a SYMBOL_REF if the symbol is weak. */ - unsigned integrated : 1; + unsigned int integrated : 1; /* 1 in an INSN or a SET if this rtx is related to the call frame, either changing how we compute the frame address or saving and restoring registers in the prologue and epilogue. @@ -193,7 +196,7 @@ struct rtx_def 1 in a REG if the register is a pointer. 1 in a SYMBOL_REF if it addresses something in the per-function constant string pool. */ - unsigned frame_related : 1; + unsigned int frame_related : 1; /* The first element of the operands of this rtx. The number of operands and their types are controlled @@ -211,12 +214,25 @@ struct rtx_def #define GET_MODE(RTX) ((enum machine_mode) (RTX)->mode) #define PUT_MODE(RTX, MODE) ((RTX)->mode = (ENUM_BITFIELD(machine_mode)) (MODE)) +/* Define macros to get/put references to RTL objects. 
*/ + +#define GET_RTX(RTX) (((RTX)->__count)++) +#define PUT_RTX(RTX) \ +do \ + { \ + if (--((RTX)->__count) == 0) \ + __put_rtx(RTX); \ + } \ +while (0) + + /* RTL vector. These appear inside RTX's when there is a need for a variable number of things. The principle use is inside PARALLEL expressions. */ struct rtvec_def GTY(()) { int num_elem; /* number of elements */ + int __count; /* reference count */ rtx GTY ((length ("%h.num_elem"))) elem[1]; }; @@ -225,6 +241,15 @@ struct rtvec_def GTY(()) { #define GET_NUM_ELEM(RTVEC) ((RTVEC)->num_elem) #define PUT_NUM_ELEM(RTVEC, NUM) ((RTVEC)->num_elem = (NUM)) +#define GET_RTVEC(RTVEC) (((RTVEC)->__count)++) +#define PUT_RTVEC(RTVEC) \ +do \ + { \ + if (--((RTVEC)->__count) == 0) \ + __put_rtvec(RTVEC); \ + } \ +while (0) + /* Predicate yielding nonzero iff X is an rtl for a register. */ #define REG_P(X) (GET_CODE (X) == REG) @@ -1347,6 +1372,8 @@ extern rtx emit_copy_of_insn_after PARAM extern rtx rtx_alloc PARAMS ((RTX_CODE)); extern rtvec rtvec_alloc PARAMS ((int)); extern rtx copy_rtx PARAMS ((rtx)); +extern void __put_rtx PARAMS ((rtx)); +extern void __put_rtvec PARAMS ((rtvec)); /* In emit-rtl.c */ extern rtx copy_rtx_if_shared PARAMS ((rtx)); --- ./gengenrtl.c.~1~ Sun Aug 11 19:04:33 2002 +++ ./gengenrtl.c Sun Aug 11 20:45:18 2002 @@ -278,11 +278,15 @@ gendef (format) the memory and initializes it. */ puts ("{"); puts (" rtx rt;"); - printf (" rt = ggc_alloc_rtx (%d);\n", (int) strlen (format)); + puts (" int n;"); + printf (" n = (sizeof (struct rtx_def) + ((%d - 1) * sizeof(rtunion)));\n", + (int) strlen (format)); + puts (" rt = xmalloc (n);\n"); puts (" memset (rt, 0, sizeof (struct rtx_def) - sizeof (rtunion));\n"); puts (" PUT_CODE (rt, code);"); puts (" PUT_MODE (rt, mode);"); + puts (" rt->__count = 1;"); for (p = format, i = j = 0; *p ; ++p, ++i) if (*p != '0') --- ./rtl.c.~1~ Tue Jun 4 14:06:54 2002 +++ ./rtl.c Sun Aug 11 20:53:11 2002 @@ -242,14 +242,34 @@ rtvec_alloc (n) { rtvec rt; - rt = ggc_alloc_rtvec (n); + n = (sizeof(struct rtvec_def) + + ((n - 1) * sizeof (rtx))); + rt = xmalloc (n); + + PUT_NUM_ELEM (rt, n); + rt->__count = 1; + /* clear out the vector */ memset (&rt->elem[0], 0, n * sizeof (rtx)); - PUT_NUM_ELEM (rt, n); return rt; } +void +__put_rtvec (rv) + rtvec rv; +{ + int i, len = GET_NUM_ELEM (rv); + + for (i = 0; i < len; i++) + { + if (! rv->elem[i]) + abort (); + PUT_RTX (rv->elem[i]); + } + xfree (rv); +} + /* Allocate an rtx of code CODE. The CODE is stored in the rtx; all the rest is initialized to zero. */ @@ -258,9 +278,11 @@ rtx_alloc (code) RTX_CODE code; { rtx rt; - int n = GET_RTX_LENGTH (code); + int n; - rt = ggc_alloc_rtx (n); + n = (sizeof (struct rtx_def) + + ((GET_RTX_LENGTH (code) - 1) * sizeof(rtunion))); + rt = xmalloc (n); /* We want to clear everything up to the FLD array. Normally, this is one int, but we don't want to assume that and it isn't very @@ -268,7 +290,58 @@ rtx_alloc (code) memset (rt, 0, sizeof (struct rtx_def) - sizeof (rtunion)); PUT_CODE (rt, code); + rt->__count = 1; return rt; +} + +void +__put_rtx(rt) + rtx rt; +{ + char *fmt; + int i, j, len; + + fmt = GET_RTX_FORMAT (GET_CODE (rt)); + len = GET_RTX_LENGTH (GET_CODE (rt)); + for (i = 0; i < len; i++) { + switch (fmt[i]) { + case 'e': + if (! XEXP (rt, i)) + abort (); + PUT_RTX (XEXP (rt, i)); + break; + + case 'E': + case 'V': + /* XXX How to handle vectors... XXX */ + if (XVEC (rt, i) != NULL) + { + for (j = 0; j < XVECLEN (rt, i); j++) + { + if (! 
XVECEXP (rt, i, j)) + abort (); + PUT_RTX (XVECEXP (rt, i, j)); + } + } + break; + + case 't': + case 'w': + case 'i': + case 's': + case 'S': + case 'T': + case 'u': + case 'B': + case '0': + break; + + default: + abort (); + }; + } + + xfree(rt); } \f ^ permalink raw reply [flat|nested] 173+ messages in thread
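To make the intended discipline concrete, here is a sketch of how client code would use the macros from the patch above (a hypothetical example, assuming the GET_RTX/PUT_RTX definitions shown there): take a reference whenever an rtx pointer is stored somewhere long-lived, and drop the old reference when that pointer is replaced.

    /* Hypothetical pass-local cache of an insn.  */
    static rtx cached_insn;

    static void
    cache_insn (rtx insn)
    {
      GET_RTX (insn);              /* new long-lived reference to INSN */
      if (cached_insn)
        PUT_RTX (cached_insn);     /* drop the old reference; freed when the count hits 0 */
      cached_insn = insn;
    }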
* Re: Faster compilation speed 2002-08-12 12:56 ` David S. Miller @ 2002-08-12 13:56 ` Matt Austern 2002-08-12 14:27 ` Daniel Berlin ` (2 more replies) 2002-08-12 14:28 ` Stan Shebs 1 sibling, 3 replies; 173+ messages in thread From: Matt Austern @ 2002-08-12 13:56 UTC (permalink / raw) To: David S. Miller; +Cc: dje, mrs, gcc On Monday, August 12, 2002, at 12:43 PM, David S. Miller wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 13:56 ` Matt Austern @ 2002-08-12 14:27 ` Daniel Berlin 2002-08-12 15:26 ` David Edelsohn 2002-08-12 14:59 ` David S. Miller 2002-08-12 16:00 ` Geoff Keating 2 siblings, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-12 14:27 UTC (permalink / raw) To: Matt Austern; +Cc: David S. Miller, dje, mrs, gcc On Mon, 12 Aug 2002, Matt Austern wrote: > On Monday, August 12, 2002, at 12:43 PM, David S. Miller wrote: > > > From: Matt Austern <austern@apple.com> > > Date: Mon, 12 Aug 2002 12:47:30 -0700 > > > > And yes, we're aware that many gains are possible only > > if we rewrite the parser or redesign the tree structure. The > > only reason we haven't started on rewriting the parser is > > that someone else is already doing it. > > > > So work on an attempt at RTL refcounting, the patch below is a place > > to start. > > Thanks for the pointer, that's a useful starting point. > > But, at the risk of sounding like a broken record... Do > we have benchmarks showing that RTL gc is one of > the major causes of slow compile speed? > > At the moment, we're spending a lot of time doing > benchmarking and trying to figure out just where the > time is going. I realize this has its limitations, that > poorly designed data structures may end up resulting > in tiny bits of overhead everywhere even if they never > show up in a profile. But at least we can try to > understand what kinds of programs are especially > bad. (One interesting fact, for example: one file that > we care a lot about takes twice as long to compile with > the C++ front end than with the C front end.) Well, the tools for this stuff are much better on osx than on Linux, so you guys are probably ahead of others in figuring out whether GC is really bad for us. You can easily get numbers like data cache miss cycles, etc, and graph them nicely with MONster. -Dan ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 14:27 ` Daniel Berlin @ 2002-08-12 15:26 ` David Edelsohn 2002-08-13 10:49 ` David Edelsohn 0 siblings, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-12 15:26 UTC (permalink / raw) To: Daniel Berlin, Matt Austern, David S. Miller; +Cc: gcc I have IBM's hpmcount tool installed on a Power4 AIX 5.1 system which can use PMAPI to access the hardware performance counters on the chip. I would be happy to provide additional data for comparison with the x86 cache statistics which have been mentioned. So that we're all on the same page, what sourcefile is being compiled with which GCC options? I can acquire information like for cc1 -O2 hello.c: PM_DTLB_MISS (Data TLB misses) : 5538 PM_ITLB_MISS (Instruction TLB misses) : 819 PM_LD_MISS_L1 (L1 D cache load misses) : 43074 PM_ST_MISS_L1 (L1 D cache store misses) : 349240 PM_ST_REF_L1 (L1 D cache store references) : 1958037 PM_LD_REF_L1 (L1 D cache load references) : 3113549 Utilization rate : 29.438 % % TLB misses per cycle : 0.038 % Avg number of loads per TLB miss : 562.215 Load and store operations : 5.072 M Instructions per load/store : 2.899 Avg number of loads per load miss : 72.284 Avg number of stores per store miss : 5.607 Avg number of load/stores per D1 miss : 12.927 L1 cache hit rate : 92.264 % PM_DATA_FROM_L3 (Data loaded from L3) : 1420 PM_DATA_FROM_MEM (Data loaded from memory) : 144 PM_DATA_FROM_L35 (Data loaded from L3.5) : 19 PM_DATA_FROM_L2 (Data loaded from L2) : 36410 PM_DATA_FROM_L25_SHR (Data loaded from L2.5 shared) : 0 PM_DATA_FROM_L275_SHR (Data loaded from L2.75 shared) : 0 PM_DATA_FROM_L275_MOD (Data loaded from L2.75 modified) : 0 PM_DATA_FROM_L25_MOD (Data loaded from L2.5 modified) : 0 Memory traffic : 0.074 MBytes Memory bandwidth : 1.589 MBytes/sec Total loads from L3 : 0.001 M L3 traffic : 0.184 MBytes L3 bandwidth : 3.970 MBytes/sec L3 Load miss rate : 9.097 % Total loads from L2 : 0.036 M L2 traffic : 4.660 MBytes L2 bandwidth : 100.446 MBytes/sec L2 Load miss rate : 4.167 % David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 15:26 ` David Edelsohn @ 2002-08-13 10:49 ` David Edelsohn 2002-08-13 10:52 ` David S. Miller ` (2 more replies) 0 siblings, 3 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-13 10:49 UTC (permalink / raw) To: Daniel Berlin, Matt Austern, David S. Miller; +Cc: gcc

Source file     Insns / L1 D$ Miss
-----------     ------------------
reload.c                22
reload1.c               25
insn-recog.c            29

GCC 3.3 20020812 (experimental) powerpc-ibm-aix5.1.0.0 Power4 processor As one of my colleagues commented, this is the cache behavior one would see with database transaction processing. In other words, this is *really bad*. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 10:49 ` David Edelsohn @ 2002-08-13 10:52 ` David S. Miller 2002-08-13 14:03 ` David Edelsohn 2002-08-13 15:32 ` Daniel Berlin 2 siblings, 0 replies; 173+ messages in thread From: David S. Miller @ 2002-08-13 10:52 UTC (permalink / raw) To: dje; +Cc: dan, austern, gcc From: David Edelsohn <dje@watson.ibm.com> Date: Tue, 13 Aug 2002 13:49:18 -0400 As one of my colleagues commented, this is the cache behavior one would see with database transaction processing. In other words, this is *really bad*. Thanks for doing these tests David. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 10:49 ` David Edelsohn 2002-08-13 10:52 ` David S. Miller @ 2002-08-13 14:03 ` David Edelsohn 2002-08-13 14:46 ` Geoff Keating ` (2 more replies) 2002-08-13 15:32 ` Daniel Berlin 2 siblings, 3 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-13 14:03 UTC (permalink / raw) To: David S. Miller; +Cc: dan, austern, gcc Here's an interesting (aka depressing) data point. My previous cache miss statistics were for GCC -O2. At -O0, GCC's cache miss statistics stay the same or get up to 20% *worse*. In comparison, the cache statistics for IBM's compiler without optimization enabled *improve* up to 50 for the same reload.c and insn-recog.c input files compared to optimized. GCC has some sort of overhead, maybe the tree->RTL conversion as Dan mentioned, which really hurts re-use at -O0. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 14:03 ` David Edelsohn @ 2002-08-13 14:46 ` Geoff Keating 2002-08-13 15:10 ` David Edelsohn 2002-08-14 9:25 ` Kevin Handy 2002-08-18 12:58 ` Jeff Sturm 2 siblings, 1 reply; 173+ messages in thread From: Geoff Keating @ 2002-08-13 14:46 UTC (permalink / raw) To: David Edelsohn; +Cc: gcc David Edelsohn <dje@watson.ibm.com> writes: > Here's an interesting (aka depressing) data point. My previous > cache miss statistics were for GCC -O2. At -O0, GCC's cache miss > statistics stay the same or get up to 20% *worse*. In comparison, the > cache statistics for IBM's compiler without optimization enabled *improve* > up to 50 for the same reload.c and insn-recog.c input files compared to > optimized. > > GCC has some sort of overhead, maybe the tree->RTL conversion as > Dan mentioned, which really hurts re-use at -O0. Could you try with -fsyntax-only? -- - Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com> ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 14:46 ` Geoff Keating @ 2002-08-13 15:10 ` David Edelsohn 2002-08-13 15:26 ` Neil Booth 0 siblings, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-13 15:10 UTC (permalink / raw) To: Geoff Keating; +Cc: gcc >>>>> Geoff Keating writes: Geoff> Could you try with -fsyntax-only?

Source        I/D$ miss -O2   I/D$ miss -O0   I/D$ miss -fsyntax-only
------------  -------------   -------------   -----------------------
reload.c           22              22                   23
reload1.c          25              22                   23
insn-recog.c       29              23                   26

David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 15:10 ` David Edelsohn @ 2002-08-13 15:26 ` Neil Booth 0 siblings, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-13 15:26 UTC (permalink / raw) To: David Edelsohn; +Cc: Geoff Keating, gcc David Edelsohn wrote:- > >>>>> Geoff Keating writes: > > Geoff> Could you try with -fsyntax-only? > > Source I/D$ miss -O2 I/D$ miss -O0 I/D$ miss -fsyntax-only > ------------ ------------- ------------- ----------------------- > reload.c 22 22 23 > reload1.c 25 22 23 > insn-recog.c 29 23 26 And -E 8-) I'd actually be quite curious if you have time. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 14:03 ` David Edelsohn 2002-08-13 14:46 ` Geoff Keating @ 2002-08-14 9:25 ` Kevin Handy 2002-08-18 12:58 ` Jeff Sturm 2 siblings, 0 replies; 173+ messages in thread From: Kevin Handy @ 2002-08-14 9:25 UTC (permalink / raw) To: gcc David Edelsohn wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 14:03 ` David Edelsohn 2002-08-13 14:46 ` Geoff Keating 2002-08-14 9:25 ` Kevin Handy @ 2002-08-18 12:58 ` Jeff Sturm 2002-08-19 12:55 ` Mike Stump 2002-08-20 11:22 ` Will Cohen 2 siblings, 2 replies; 173+ messages in thread From: Jeff Sturm @ 2002-08-18 12:58 UTC (permalink / raw) To: David Edelsohn; +Cc: David S. Miller, dan, austern, gcc On Tue, 13 Aug 2002, David Edelsohn wrote: > Here's an interesting (aka depressing) data point. My previous > cache miss statistics were for GCC -O2. At -O0, GCC's cache miss > statistics stay the same or get up to 20% *worse*. In comparison, the > cache statistics for IBM's compiler without optimization enabled *improve* > up to 50 for the same reload.c and insn-recog.c input files compared to > optimized. Here's a data point on alpha-linux:

cc1 -quiet -O2 reload.i
issues/cycles = 0.51  issues/dcache_miss = 26.93

Without optimization:

cc1 -quiet reload.i
issues/cycles = 0.52  issues/dcache_miss = 31.29

This is on a ev56 with a direct-mapped cache. To get some idea where the misses are taking place, I experimented with iprobe's sampling mode. Omitting results below the 1% sample threshold, I get:

function                    | issues | access | misses | i/m | a/m
----------------------------+--------+--------+--------+-----+-----
yyparse                     |   2924 |    848 |    148 |  20 |  5.7
gt_ggc_mx_lang_tree_node    |   1336 |    612 |     74 |  18 |  8.2
verify_flow_info            |   1388 |    408 |    129 |  11 |  3.1
copy_rtx_if_shared          |   2120 |   1012 |     53 |  40 | 19.0
propagate_one_insn          |   3636 |    504 |     52 |  70 |  9.6
find_temp_slot_from_address |    728 |    232 |    126 |   6 |  1.8
ggc_mark_rtx_children_1     |   1580 |    316 |     40 |  40 |  7.9
extract_insn                |   1576 |    476 |     52 |  30 |  9.1
record_reg_classes          |   3848 |    944 |     65 |  59 | 14.5
reg_scan_mark_refs          |   1472 |    632 |     66 |  22 |  9.5
find_reloads                |   7680 |   3104 |    148 |  52 | 20.9
subst_reloads               |   4772 |   2736 |    169 |  28 | 16.1
side_effects_p              |   1344 |    564 |     43 |  31 | 13.1
for_each_rtx                |   4924 |   1464 |     75 |  66 | 19.5
ggc_alloc                   |   2424 |    728 |    111 |  22 |  6.5
ggc_set_mark                |   3392 |    976 |    107 |  32 |  9.1

(Each sample reported is 2^14 events.) yyparse performs badly (as would any table-driven parser), but how about verify_flow_info and find_temp_slot_from_address? Both are reporting awful cache behavior. Jeff ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-18 12:58 ` Jeff Sturm @ 2002-08-19 12:55 ` Mike Stump 2002-08-20 11:22 ` Will Cohen 1 sibling, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-19 12:55 UTC (permalink / raw) To: Jeff Sturm; +Cc: David Edelsohn, David S. Miller, dan, austern, gcc On Sunday, August 18, 2002, at 12:57 PM, Jeff Sturm wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-18 12:58 ` Jeff Sturm 2002-08-19 12:55 ` Mike Stump @ 2002-08-20 11:22 ` Will Cohen 1 sibling, 0 replies; 173+ messages in thread From: Will Cohen @ 2002-08-20 11:22 UTC (permalink / raw) To: Jeff Sturm; +Cc: David Edelsohn, David S. Miller, dan, austern, gcc How about reordering the rows and columns in the table used by yyparse to improve locality? Have a instrumented version of the yyparse to record the number of times each transition is taken and use the data to interchange rows and columns to attempt to get frequent transitions in the same cache line (or at least not conflicting memory locations). It would be a kind of feedback-directed optimization (-fprofile-arcs/-fbranch-probabilities) for bison. -Will Jeff Sturm wrote: On Tue, 13 Aug 2002, David Edelsohn wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
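A rough sketch of the instrumentation Will suggests (hypothetical table layout and names, not bison's actual generated tables): count how often each (state, token) transition fires, then use the counts offline to permute rows and columns so that frequently taken transitions share cache lines.

    #define N_STATES 1024
    #define N_TOKENS  256

    static const short yytable[N_STATES][N_TOKENS];      /* stand-in for the parser tables */
    static unsigned long yycount[N_STATES][N_TOKENS];    /* profile data, dumped at exit   */

    static int
    next_state (int state, int token)
    {
      yycount[state][token]++;        /* feedback data for reordering the tables */
      return yytable[state][token];
    }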
* Re: Faster compilation speed 2002-08-13 10:49 ` David Edelsohn 2002-08-13 10:52 ` David S. Miller 2002-08-13 14:03 ` David Edelsohn @ 2002-08-13 15:32 ` Daniel Berlin 2002-08-13 15:58 ` David Edelsohn 2 siblings, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-13 15:32 UTC (permalink / raw) To: David Edelsohn; +Cc: Daniel Berlin, Matt Austern, David S. Miller, gcc On Tue, 13 Aug 2002, David Edelsohn wrote: > Source file Insns / L1 D$ Miss > ----------- ------------------ > reload.c 22 > reload1.c 25 > insn-recog.c 29 > > GCC 3.3 20020812 (experimental) > powerpc-ibm-aix5.1.0.0 > Power4 processor > > As one of my colleagues commented, this is the cache behavior one > would see with database transaction processing. In other words, this is > *really bad*. Yup. > > David > > ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 15:32 ` Daniel Berlin @ 2002-08-13 15:58 ` David Edelsohn 2002-08-13 16:49 ` David S. Miller 0 siblings, 1 reply; 173+ messages in thread From: David Edelsohn @ 2002-08-13 15:58 UTC (permalink / raw) To: dberlin; +Cc: Daniel Berlin, Matt Austern, David S. Miller, gcc >>>>> Daniel Berlin writes: >> As one of my colleagues commented, this is the cache behavior one >> would see with database transaction processing. In other words, this is >> *really bad*. Daniel> Yup. The problem isn't that the number is low at optimization. 29 I/M is not horrible. Low 20's is bad. Scientific code will have a value in the low hundreds, but compilation is not that regular a computation. The problem is that the number stays the same or gets worse without optimization. Most commercial compilers will be in the same ballpark when optimizing, but use a lot fewer instructions and a lot fewer cache misses to produce minimally optimized, debuggable code. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-13 15:58 ` David Edelsohn @ 2002-08-13 16:49 ` David S. Miller 0 siblings, 0 replies; 173+ messages in thread
From: David S. Miller @ 2002-08-13 16:49 UTC (permalink / raw)
To: dje; +Cc: dberlin, dan, austern, gcc

   From: David Edelsohn <dje@watson.ibm.com>
   Date: Tue, 13 Aug 2002 18:58:25 -0400

   The problem isn't that the number is low at optimization.

Can you control when the performance counters start/stop monitoring?

If so, then you can figure out more precisely whether it is mostly during:

1) Front end tree or tree->rtl conversion
2) rest_of_compilation() onward
3) Both #1 and #2 about evenly, because all of our core data structures
   come out of GC, so the whole compiler has bad spatial and temporal
   locality

My money is on #3 :-)

^ permalink raw reply [flat|nested] 173+ messages in thread
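For what it's worth, here is a sketch of the kind of per-phase bracketing being asked about, written against today's Linux perf_event interface as a stand-in (nothing like this was available on the AIX and Alpha setups discussed above, and error checking is omitted); the idea is simply to enable counters around one phase and read them afterwards:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int
open_counter (unsigned int type, unsigned long long config)
{
  struct perf_event_attr attr;
  memset (&attr, 0, sizeof attr);
  attr.size = sizeof attr;
  attr.type = type;
  attr.config = config;
  attr.disabled = 1;
  attr.exclude_kernel = 1;
  return syscall (SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

void
measure_phase (void (*phase) (void))
{
  int insns = open_counter (PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS);
  int misses = open_counter (PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES);
  long long n_insns = 0, n_misses = 0;

  ioctl (insns, PERF_EVENT_IOC_ENABLE, 0);
  ioctl (misses, PERF_EVENT_IOC_ENABLE, 0);

  phase ();                     /* e.g. parsing only, or the RTL passes only */

  ioctl (insns, PERF_EVENT_IOC_DISABLE, 0);
  ioctl (misses, PERF_EVENT_IOC_DISABLE, 0);
  read (insns, &n_insns, sizeof n_insns);
  read (misses, &n_misses, sizeof n_misses);
  /* n_insns / n_misses is the instructions-per-miss figure quoted earlier. */
  close (insns);
  close (misses);
}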
* Re: Faster compilation speed 2002-08-12 13:56 ` Matt Austern 2002-08-12 14:27 ` Daniel Berlin @ 2002-08-12 14:59 ` David S. Miller 2002-08-12 16:00 ` Geoff Keating 2 siblings, 0 replies; 173+ messages in thread
From: David S. Miller @ 2002-08-12 14:59 UTC (permalink / raw)
To: austern; +Cc: dje, mrs, gcc

   From: Matt Austern <austern@apple.com>
   Date: Mon, 12 Aug 2002 13:56:32 -0700

   But, at the risk of sounding like a broken record...  Do
   we have benchmarks showing that RTL gc is one of
   the major causes of slow compile speed?

It's not the GC, it's the resulting data access patterns, and such
overhead won't show up in normal profiling since it is simply spread
all over the compiler.

That's the purpose of cobbling together a "hack" implementation of
refcounting, to get some performance comparisons.  You don't have to do
a "final" perfect implementation to realize a tree usable enough for
simple initial benchmarking.

Based upon those results, we can decide to continue or not.

But hey, if people are going to be silly enough to require
pre-benchmarking before even laying a finger on the refcounting bits,
no problem, we'll just have to wait for me to work on it then ;-)

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 13:56 ` Matt Austern 2002-08-12 14:27 ` Daniel Berlin 2002-08-12 14:59 ` David S. Miller @ 2002-08-12 16:00 ` Geoff Keating 2002-08-13 2:58 ` Nick Ing-Simmons 2002-08-13 10:47 ` Richard Henderson 2 siblings, 2 replies; 173+ messages in thread
From: Geoff Keating @ 2002-08-12 16:00 UTC (permalink / raw)
To: Matt Austern; +Cc: gcc

Matt Austern <austern@apple.com> writes:

> On Monday, August 12, 2002, at 12:43 PM, David S. Miller wrote:
>
> >    From: Matt Austern <austern@apple.com>
> >    Date: Mon, 12 Aug 2002 12:47:30 -0700
> >
> >    And yes, we're aware that many gains are possible only
> >    if we rewrite the parser or redesign the tree structure.  The
> >    only reason we haven't started on rewriting the parser is
> >    that someone else is already doing it.
> >
> > So work on an attempt at RTL refcounting, the patch below is a place
> > to start.
>
> Thanks for the pointer, that's a useful starting point.
>
> But, at the risk of sounding like a broken record...  Do
> we have benchmarks showing that RTL gc is one of
> the major causes of slow compile speed?

We happen to know that GC as a whole is 10-13% of total compile time,
even at -O0, and my expectation is that the RTL part of that is
perhaps two-thirds, say 7%.  So the benefit you can get is 7% less any
overhead in tracking the reference counts and freeing
briefly-allocated RTL.

My suggestion is to try shrinking RTL in other ways.  For instance,
once RTL is generated it should all match an insn or a splitter.  If
we could store RTL as the insn number (or a splitter number) plus the
operands, rather than the expanded form we have now, that should be
much easier to traverse.  For those operations that look at the form
of RTL, code could be generated to perform that operation knowing what
insns exist; for instance, on x86 the form of the 'add' instruction is:

(insn 15 13 17 (parallel[
            (set (reg:SI 61)
                (plus:SI (reg/v:SI 59)
                    (reg/v:SI 60)))
            (clobber (reg:CC 17 flags))
        ] ) -1 (nil)
    (nil))

we could represent this as

(packed_insn 15 13 17 207 {*addsi_1} [(reg:SI 61) (reg:SI 59) (reg:SI 60)])

which would save us, by my count, 50% of the RTL objects for this
case.  I'd expect that would then speed GC (on this object) by 50%,
speed up allocation by 50%, and hopefully would also speed up code
that uses these objects because (a) they'd better fit in cache and
(b) there would be fewer pointers to chase.

To perform operations that are now done directly on the RTL, there'd be
a switch statement, for instance:

int reg_mentioned_p (reg, in)
{
  ...
  case PACKED_INSN:
    switch (PACKED_INSN_NUMBER (in))
      {
      ...
      case 207:  /* *addsi_1 */
        if (REGNO (reg) == 17)   // deal with the clobbered register
          return 1;
        // deal with the operands
        break;
      }
  ...
}

Even combine can be handled this way, by pregenerating rules based on
the insn numbers being combined.  Relatively few insns can actually be
combined, so it shouldn't require a huge amount of generated code.

On RISCy chips, you could take even further advantage of the fact that
often an operand is guaranteed to be a register, or a constant integer
or whatever, and so eliminate some tests.

I'm not sure how much work this is to implement.  I suspect what you'd
end up doing is performing a trade-off between generating too many
routines and having to rewrite large chunks of old code to use the
routines that already exist but that they don't use.

Now, if only I could think of something that would work like this on
trees...
-- - Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com> ^ permalink raw reply [flat|nested] 173+ messages in thread
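As an illustration only (this is not GCC's actual rtx layout, and all of the field and function names below are made up), the packed form described above might look something like a variable-sized record carrying just the matched insn code and its operands, stored inline at the tail:

#include <stddef.h>
#include <stdlib.h>

struct packed_insn
{
  int uid, prev_uid, next_uid;  /* the chain, as in (insn 15 13 17 ...) */
  int insn_code;                /* e.g. 207 for *addsi_1 */
  unsigned short n_operands;
  void *operands[];             /* operands allocated inline at the tail */
};

static struct packed_insn *
make_packed_insn (int uid, int prev, int next, int code,
                  unsigned short n_ops, void **ops)
{
  struct packed_insn *pi
    = malloc (offsetof (struct packed_insn, operands)
              + n_ops * sizeof (void *));
  unsigned short i;

  if (!pi)
    return NULL;
  pi->uid = uid;
  pi->prev_uid = prev;
  pi->next_uid = next;
  pi->insn_code = code;
  pi->n_operands = n_ops;
  for (i = 0; i < n_ops; i++)
    pi->operands[i] = ops[i];   /* one pointer per operand, nothing more */
  return pi;
}

One allocation, one object for the collector to mark, and the operands sit next to the header, which is where the cache-locality win would come from.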
* Re: Faster compilation speed 2002-08-12 16:00 ` Geoff Keating @ 2002-08-13 2:58 ` Nick Ing-Simmons 2002-08-13 10:47 ` Richard Henderson 1 sibling, 0 replies; 173+ messages in thread
From: Nick Ing-Simmons @ 2002-08-13 2:58 UTC (permalink / raw)
To: geoffk; +Cc: gcc, Matt Austern

Geoff Keating <geoffk@geoffk.org> writes:
>
>We happen to know that GC as a whole is 10-13% of total compile time,
>even at -O0, and my expectation is that the RTL part of that is
>perhaps two-thirds, say 7%.  So the benefit you can get is 7% less any
>overhead in tracking the reference counts and freeing
>briefly-allocated RTL.

That does not take into account the cache/tlb locality effects that
Linus explained are caused by delayed reclamation.
-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 16:00 ` Geoff Keating 2002-08-13 2:58 ` Nick Ing-Simmons @ 2002-08-13 10:47 ` Richard Henderson 1 sibling, 0 replies; 173+ messages in thread
From: Richard Henderson @ 2002-08-13 10:47 UTC (permalink / raw)
To: Geoff Keating; +Cc: Matt Austern, gcc

On Mon, Aug 12, 2002 at 04:00:08PM -0700, Geoff Keating wrote:
> My suggestion is to try shrinking RTL in other ways.  For instance,
> once RTL is generated it should all match an insn or a splitter.  If
> we could store RTL as the insn number (or a splitter number) plus the
> operands, rather than the expanded form we have now, that should be
> much easier to traverse.

I've thought about this in passing before.

> (packed_insn 15 13 17 207 {*addsi_1} [(reg:SI 61) (reg:SI 59) (reg:SI 60)])
>
> which would save us, by my count, 50% of the RTL objects for this
> case.

A bit more than that if the packed_insn rtl is actually variable sized
so that the operands are directly at the end of the other arguments.

> To perform operations that are now done directly on the RTL, there'd be
> a switch statement, for instance:

Another possible solution, particularly for bletcherous code like
combine, is to regenerate the full instruction on demand.  After
try_combine is done with an insn, we free it immediately so that we
don't accumulate garbage.

But I suspect that most passes don't need this.  They only need to know
which operands are inputs, sets, and clobbers.  They need to know which
predicates apply.  Information which is trivial to generate off the md
file.

This idea, I think, has real potential, and could actually be
implemented without disrupting the entire compiler.

> Even combine can be handled this way, by pregenerating rules based on
> the insn numbers being combined.  Relatively few insns can actually be
> combined, so it shouldn't require a huge amount of generated code.

Pre-generating the combinations would be really cool, and probably save
quite a bit of time, but I don't really believe in that for even the
medium term.  The number of possibilities is really quite large.

> Now, if only I could think of something that would work like this on
> trees...

Having stronger typing instead of the union-of-everything would do.

r~

^ permalink raw reply [flat|nested] 173+ messages in thread
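A sketch of the kind of generated per-insn summary being described here; the names (operand_role, insn_operand_summary) and the use of code 207 for *addsi_1 are illustrative assumptions, not anything a genrecog-style generator actually emits:

enum operand_role { OP_INPUT, OP_OUTPUT, OP_CLOBBER };

struct insn_operand_summary
{
  unsigned char n_operands;
  const enum operand_role *roles;
};

/* *addsi_1: (set (reg 0) (plus (reg 1) (reg 2))) plus a flags clobber.  */
static const enum operand_role addsi_1_roles[]
  = { OP_OUTPUT, OP_INPUT, OP_INPUT, OP_CLOBBER };

/* One row per insn code, which a generator could emit from the .md file.  */
static const struct insn_operand_summary insn_summary[] =
{
  /* ... */
  [207] = { 4, addsi_1_roles },
  /* ... */
};

A pass that only needs to know whether a recognized insn clobbers a given register, or which operands it sets, could then index such a table by the insn code instead of walking the full RTL expression.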
* Re: Faster compilation speed 2002-08-12 12:56 ` David S. Miller 2002-08-12 13:56 ` Matt Austern @ 2002-08-12 14:28 ` Stan Shebs 2002-08-12 15:05 ` David S. Miller 1 sibling, 1 reply; 173+ messages in thread From: Stan Shebs @ 2002-08-12 14:28 UTC (permalink / raw) To: David S. Miller; +Cc: austern, dje, mrs, gcc David S. Miller wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 14:28 ` Stan Shebs @ 2002-08-12 15:05 ` David S. Miller 0 siblings, 0 replies; 173+ messages in thread
From: David S. Miller @ 2002-08-12 15:05 UTC (permalink / raw)
To: shebs; +Cc: austern, dje, mrs, gcc

   From: Stan Shebs <shebs@apple.com>
   Date: Mon, 12 Aug 2002 14:27:52 -0700

   So, uh, did I miss the part where refcounting is shown to be an
   improvement over the status quo?  It's plausible I suppose, but
   counting does have its overhead too.  We ought to have at least a
   back-of-the-envelope estimate before changing everything...

You can choose to do that, but I bet you can spend the same amount of
effort getting a benchmark'able refcounting tree together.

This is so frustrating that I just might stop everything else I'm doing
and put something together so I can just avoid all of this ridiculous
red tape people are putting up just to work on what amounts to a
frickin technology demo!

Have you ever implemented something solely to figure out whether it was
worthwhile or not? :-)

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 12:11 ` Mike Stump 2002-08-12 12:41 ` David Edelsohn @ 2002-08-12 19:17 ` Mike Stump 2002-08-12 23:28 ` Neil Booth 1 sibling, 1 reply; 173+ messages in thread From: Mike Stump @ 2002-08-12 19:17 UTC (permalink / raw) To: Mike Stump; +Cc: Neil Booth, gcc On Monday, August 12, 2002, at 12:11 PM, Mike Stump wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 19:17 ` Mike Stump @ 2002-08-12 23:28 ` Neil Booth 0 siblings, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-12 23:28 UTC (permalink / raw) To: Mike Stump; +Cc: gcc Mike Stump wrote:- > Ok, I looked at it. A straight forward check to see it is has been > folded first with the use of an existing unused bit in the tree speeds > it up by 1.0003, or not enough to bother with all the code and the use > of the extra bit that someone else may find more valuable. :-( That's a shame. 8-( Thanks for looking at it. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 12:17 Faster compilation speed Mike Stump ` (2 preceding siblings ...) 2002-08-09 14:29 ` Neil Booth @ 2002-08-09 14:51 ` Stan Shebs 2002-08-09 15:03 ` David Edelsohn 2002-08-09 15:26 ` Geoff Keating 2002-08-09 14:59 ` Timothy J. Wood ` (2 subsequent siblings) 6 siblings, 2 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 14:51 UTC (permalink / raw) To: Mike Stump; +Cc: gcc Mike Stump wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 14:51 ` Stan Shebs @ 2002-08-09 15:03 ` David Edelsohn 2002-08-09 15:43 ` Stan Shebs 2002-08-09 16:43 ` Alan Lehotsky 2002-08-09 15:26 ` Geoff Keating 1 sibling, 2 replies; 173+ messages in thread From: David Edelsohn @ 2002-08-09 15:03 UTC (permalink / raw) To: Stan Shebs; +Cc: Mike Stump, gcc >>>>> Stan Shebs writes: Stan> I think it suffices to have -O0 mean "go as fast as possible". From time to Stan> time, I've noticed that there's been a temptation to try to sneak in a Stan> little Stan> optimization even at -O0, presumably with the assumption that the time Stan> penalty was negligible. (There are users who complain that -O0 should Stan> do some amount of optimization, but IMHO we should ignore them.) Saying "do not run any optimization at -O0" shows a tremendous lack of understanding or investigation. One wants minimal optimization even at -O0 to decrease the size of the IL representation of the function being compiled. The little bit of computation to perform trivial optimization more than makes up for itself with the decreased size of the IL that needs to be processed to generate the output. One needs to be careful about which optimizations are run, but with the right choices it definitely is a net win. David ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:03 ` David Edelsohn @ 2002-08-09 15:43 ` Stan Shebs 2002-08-09 16:43 ` Alan Lehotsky 1 sibling, 0 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 15:43 UTC (permalink / raw) To: David Edelsohn; +Cc: Mike Stump, gcc David Edelsohn wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:03 ` David Edelsohn 2002-08-09 15:43 ` Stan Shebs @ 2002-08-09 16:43 ` Alan Lehotsky 2002-08-09 16:49 ` Matt Austern 1 sibling, 1 reply; 173+ messages in thread From: Alan Lehotsky @ 2002-08-09 16:43 UTC (permalink / raw) To: David Edelsohn; +Cc: Stan Shebs, Mike Stump, gcc At 6:03 PM -0400 8/9/02, David Edelsohn wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:43 ` Alan Lehotsky @ 2002-08-09 16:49 ` Matt Austern 2002-08-10 2:24 ` Gabriel Dos Reis 0 siblings, 1 reply; 173+ messages in thread From: Matt Austern @ 2002-08-09 16:49 UTC (permalink / raw) To: Alan Lehotsky; +Cc: David Edelsohn, Stan Shebs, Mike Stump, gcc On Friday, August 9, 2002, at 04:17 PM, Alan Lehotsky wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:49 ` Matt Austern @ 2002-08-10 2:24 ` Gabriel Dos Reis 0 siblings, 0 replies; 173+ messages in thread From: Gabriel Dos Reis @ 2002-08-10 2:24 UTC (permalink / raw) To: Matt Austern; +Cc: Alan Lehotsky, David Edelsohn, Stan Shebs, Mike Stump, gcc Matt Austern <austern@apple.com> writes: | On Friday, August 9, 2002, at 04:17 PM, Alan Lehotsky wrote: | | > This is DEFINITELY TRUE! | > | > For example, the Bliss11 compiler ACTUALLY ran faster with | > optimization turned on because assembling the unoptimized code | > actually took longer than the time running FULL optimization required | > for anything but the most trivial programs. | | Shall we take it as a given that nobody is going to check | in a patch for faster compilations without benchmarking | and making sure that it really does speed things up? Some while ago, when the compiler slowdown was a hotter issue, it was suggested that no new optimization-related patches should be checked in if there were no concrete evidence that they're bringing noticeable wins. I don't know how that turns out, though. -- Gaby ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 14:51 ` Stan Shebs 2002-08-09 15:03 ` David Edelsohn @ 2002-08-09 15:26 ` Geoff Keating 2002-08-09 16:06 ` Stan Shebs 2002-08-12 15:55 ` Mike Stump 1 sibling, 2 replies; 173+ messages in thread From: Geoff Keating @ 2002-08-09 15:26 UTC (permalink / raw) To: Stan Shebs; +Cc: gcc Stan Shebs <shebs@apple.com> writes: > Mike Stump wrote: > > > > > The first realization I came to is that the only existing control > > for such things is -O[123], and having thought about it, I think it > > would be best to retain and use those flags. For minimal user > > impact, I think it would be good to not perturb existing users of > > -O[0123] too much, or at leaast, not at first. If we wanted to > > change them, I think -O0 should be the `fast' version, -O1 should be > > what -O0 does now with some additions around the edges, and -O2 and > > -O3 also slide over (at least one). What do you think, slide them > > all over one or more, or just make -O0 do less, or...? Maybe we > > have a -O0.0 to mean compile very quickly? > > I think it suffices to have -O0 mean "go as fast as possible". Note that that's different to what it means now, which is "I want the debugger to not surprise me." -- - Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com> ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:26 ` Geoff Keating @ 2002-08-09 16:06 ` Stan Shebs 2002-08-09 16:14 ` Terry Flannery 2002-08-09 16:29 ` Phil Edwards 2002-08-12 15:55 ` Mike Stump 1 sibling, 2 replies; 173+ messages in thread From: Stan Shebs @ 2002-08-09 16:06 UTC (permalink / raw) To: Geoff Keating; +Cc: gcc Geoff Keating wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:06 ` Stan Shebs @ 2002-08-09 16:14 ` Terry Flannery 2002-08-09 16:29 ` Neil Booth 2002-08-09 16:29 ` Phil Edwards 1 sibling, 1 reply; 173+ messages in thread From: Terry Flannery @ 2002-08-09 16:14 UTC (permalink / raw) To: Stan Shebs, Geoff Keating; +Cc: gcc IMHO, a new flag should be introduced, for example, -Of for maximum compile speed, and no surprises when debugging. -O0 should be minimal optimizations, and -O[s1-3] should remain as they are. I use the preprocessor to generate a preprocessed version of all the system header I use, into one header, and #include that in my program's header (with the flags to dump macros) , saving some time when building. If there was some support for pre-compiled headers, I'm sure that the compiler would be much faster. Terry ----- Original Message ----- From: "Stan Shebs" <shebs@apple.com> To: "Geoff Keating" <geoffk@geoffk.org> Cc: <gcc@gcc.gnu.org> Sent: Saturday, August 10, 2002 12:05 AM Subject: Re: Faster compilation speed > Geoff Keating wrote: > > >Stan Shebs <shebs@apple.com> writes: > > > >>Mike Stump wrote: > >> > >>>The first realization I came to is that the only existing control > >>>for such things is -O[123], and having thought about it, I think it > >>>would be best to retain and use those flags. For minimal user > >>>impact, I think it would be good to not perturb existing users of > >>>-O[0123] too much, or at leaast, not at first. If we wanted to > >>>change them, I think -O0 should be the `fast' version, -O1 should be > >>>what -O0 does now with some additions around the edges, and -O2 and > >>>-O3 also slide over (at least one). What do you think, slide them > >>>all over one or more, or just make -O0 do less, or...? Maybe we > >>>have a -O0.0 to mean compile very quickly? > >>> > >>I think it suffices to have -O0 mean "go as fast as possible". > >> > > > >Note that that's different to what it means now, which is "I want the > >debugger to not surprise me." > > > There's been a little bit of a drift over the years - -O0 used to be > "no opts at all", -O1 was "not too surprising for the debugger", and > -O2 was all-out. I remember some pressure from Cygnus customers to > make -O0 do more optimization, sometimes out of stupidity, but in the > legitimate cases because the -O0 code was too slow and/or large to > fit on the target embedded system, even for debugging. > > So what *should* we do with -O0 optimizations that measurably > slow down the compiler? > > Stan > > > ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:14 ` Terry Flannery @ 2002-08-09 16:29 ` Neil Booth 0 siblings, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-09 16:29 UTC (permalink / raw) To: Terry Flannery; +Cc: Stan Shebs, Geoff Keating, gcc Terry Flannery wrote:- > IMHO, a new flag should be introduced, for example, -Of for maximum compile > speed, and no surprises when debugging. -O0 should be minimal optimizations, > and -O[s1-3] should remain as they are. > I use the preprocessor to generate a preprocessed version of all the system > header I use, into one header, and #include that in my program's header > (with the flags to dump macros) , saving some time when building. If there > was some support for pre-compiled headers, I'm sure that the compiler would > be much faster. How much time (%-wise) does it save? Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:06 ` Stan Shebs 2002-08-09 16:14 ` Terry Flannery @ 2002-08-09 16:29 ` Phil Edwards 2002-08-12 16:24 ` Mike Stump 1 sibling, 1 reply; 173+ messages in thread From: Phil Edwards @ 2002-08-09 16:29 UTC (permalink / raw) To: Stan Shebs; +Cc: gcc On Fri, Aug 09, 2002 at 04:05:16PM -0700, Stan Shebs wrote: > So what *should* we do with -O0 optimizations that measurably > slow down the compiler? How "minimal" can an optimization be, if it measurably slows down the compiler? If it slows things down, let's just move it to -O1/-O2. Personally, "fastest compile possible" usually just means -fsyntax-only. I have a hard time wanting to do anything with ad-hoc output. Phil -- I would therefore like to posit that computing's central challenge, viz. "How not to make a mess of it," has /not/ been met. - Edsger Dijkstra, 1930-2002 ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 16:29 ` Phil Edwards @ 2002-08-12 16:24 ` Mike Stump 2002-08-12 18:38 ` Phil Edwards 2002-08-13 5:27 ` Theodore Papadopoulo 0 siblings, 2 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 16:24 UTC (permalink / raw) To: Phil Edwards; +Cc: Stan Shebs, gcc On Friday, August 9, 2002, at 04:29 PM, Phil Edwards wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 16:24 ` Mike Stump @ 2002-08-12 18:38 ` Phil Edwards 2002-08-13 5:27 ` Theodore Papadopoulo 1 sibling, 0 replies; 173+ messages in thread From: Phil Edwards @ 2002-08-12 18:38 UTC (permalink / raw) To: Mike Stump; +Cc: Stan Shebs, gcc On Mon, Aug 12, 2002 at 04:24:46PM -0700, Mike Stump wrote: > On Friday, August 9, 2002, at 04:29 PM, Phil Edwards wrote: > > Personally, "fastest compile possible" usually just means > > -fsyntax-only. > > -fsyntax-only isn't a compile. My point, if we're nitpicking, is that almost every single time I hear a user complain that, "gcc is taking so long," it's immediately followed by, "all I want to do is check that I got the template specializations in the right order," etc. So they use -fsyntax-only while writing their code, and then fire off a "real" build at -O5.2e7 and go home for the evening. Phil -- I would therefore like to posit that computing's central challenge, viz. "How not to make a mess of it," has /not/ been met. - Edsger Dijkstra, 1930-2002 ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-12 16:24 ` Mike Stump 2002-08-12 18:38 ` Phil Edwards @ 2002-08-13 5:27 ` Theodore Papadopoulo 2002-08-13 10:03 ` Mike Stump 1 sibling, 1 reply; 173+ messages in thread
From: Theodore Papadopoulo @ 2002-08-13 5:27 UTC (permalink / raw)
To: Mike Stump; +Cc: Phil Edwards, Stan Shebs, gcc

OK, since this is a brainstorming about speeding up gcc, and since silly
ideas are at least discussed, let me try one.

Why not make incremental compilation a standard for gcc...  This would
mean storing some information into the object files.  Things I can see
are:

- Compilation flags (defines, optimization, code generation and
  debugging flags at least).

- A signature (eg MD5 or other) for each data_type/function/global
  (decl ?) allowing for a quick check for a change.  We may even
  differentiate between visible/invisible changes.  Eg if a function
  body changes but not its interface, there is no need to recompile the
  functions calling it.  More generally, name changes could be detected
  as non-changes, but I suspect that this will mess up the debugging
  information.

Then generate code only for the relevant symbols (ie the new ones or
those that have been changed or affected indirectly by a change, ie
depending on a function or variable that changed) and replace these in
the .o file (is there a gas option like --replace ?).

In some way this is like PCH but pushed one step further.  I can
understand that making it work reliably is quite difficult, but the
perspective of having a fast incremental compiler is tempting...  The
information to store is certainly one of the trickiest parts, so a
first step could be to add a flag stating "recompile only this symbol
and what depends on it".  Not very user friendly, but maybe an
interesting first step...

Is this a totally remote/stupid idea, or can it be done in some
eventually not too distant future ??

	Theo.

--------------------------------------------------------------------
Theodore Papadopoulo
Email: Theodore.Papadopoulo@sophia.inria.fr  Tel: (33) 04 92 38 76 01
--------------------------------------------------------------------

^ permalink raw reply [flat|nested] 173+ messages in thread
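A minimal sketch of the per-declaration signature idea, assuming the front end can produce a canonical string for a declaration's interface; FNV-1a stands in here for the MD5 mentioned above, and all of the names are hypothetical:

#include <stdint.h>

static uint64_t
decl_signature (const char *canonical_interface)
{
  /* FNV-1a, used only because it fits in a few lines.  */
  uint64_t h = 0xcbf29ce484222325ULL;
  const unsigned char *p = (const unsigned char *) canonical_interface;

  for (; *p; p++)
    {
      h ^= *p;
      h *= 0x100000001b3ULL;
    }
  return h;
}

/* The signature stored in the object file is compared against the one
   computed from the current source.  A body-only edit leaves the
   interface signature unchanged, so callers need not be recompiled.  */
static int
interface_changed_p (uint64_t stored_sig, const char *current_interface)
{
  return stored_sig != decl_signature (current_interface);
}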
* Re: Faster compilation speed 2002-08-13 5:27 ` Theodore Papadopoulo @ 2002-08-13 10:03 ` Mike Stump 0 siblings, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-13 10:03 UTC (permalink / raw) To: Theodore Papadopoulo; +Cc: Phil Edwards, Stan Shebs, gcc On Tuesday, August 13, 2002, at 05:27 AM, Theodore Papadopoulo wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 15:26 ` Geoff Keating 2002-08-09 16:06 ` Stan Shebs @ 2002-08-12 15:55 ` Mike Stump 1 sibling, 0 replies; 173+ messages in thread From: Mike Stump @ 2002-08-12 15:55 UTC (permalink / raw) To: Geoff Keating; +Cc: Stan Shebs, gcc On Friday, August 9, 2002, at 03:26 PM, Geoff Keating wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Faster compilation speed 2002-08-09 12:17 Faster compilation speed Mike Stump ` (3 preceding siblings ...) 2002-08-09 14:51 ` Stan Shebs @ 2002-08-09 14:59 ` Timothy J. Wood 2002-08-16 13:31 ` Problem with PFE approach [Was: Faster compilation speed] Timothy J. Wood 2002-08-09 16:01 ` Faster compilation speed Richard Henderson 2002-08-10 17:48 ` Aaron Lehmann 6 siblings, 1 reply; 173+ messages in thread From: Timothy J. Wood @ 2002-08-09 14:59 UTC (permalink / raw) To: Mike Stump; +Cc: gcc On Friday, August 9, 2002, at 12:17 PM, Mike Stump wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Problem with PFE approach [Was: Faster compilation speed] 2002-08-09 14:59 ` Timothy J. Wood @ 2002-08-16 13:31 ` Timothy J. Wood 2002-08-16 13:44 ` Devang Patel 2002-08-16 13:54 ` Devang Patel 0 siblings, 2 replies; 173+ messages in thread
From: Timothy J. Wood @ 2002-08-16 13:31 UTC (permalink / raw)
To: Mike Stump; +Cc: gcc

So, another point in favor of discarding the concept of 'static
precompilation', based on a problem I just ran into with PFE under
10.2...

I'm emulating some of the Win32 API for porting games to Mac OS X.
Win32 has a macro like this:

#ifndef INITGUID
#define DEFINE_GUID(name, l, w1, w2, b1, b2, b3, b4, b5, b6, b7, b8) \
    EXTERN_C const GUID FAR name
#else
#define DEFINE_GUID(name, l, w1, w2, b1, b2, b3, b4, b5, b6, b7, b8) \
    EXTERN_C const GUID name \
        = { l, w1, w2, { b1, b2, b3, b4, b5, b6, b7, b8 } }
#endif // INITGUID

If this gets stuck in a PFE and the PFE is applied as a prefix header
(the only way it can be done right now), then the file being compiled
cannot make its own decision about whether INITGUID should be defined
or not.

Clearly there are ways around this, but the current approach makes the
compiler produce different output based on whether PFE is on or not.
I consider this a bug.

This would not be a problem with an automatic precompiler that
remembered facts and didn't use the prefix header hack.

Are there problems with what I describe below, or are people just
avoiding commenting on this since it is too hard to implement? :)

-tim

On Friday, August 9, 2002, at 02:58 PM, Timothy J. Wood wrote:

2) This one is rather crazy and would involve huge amounts of work
probably....

a) Toss some or all of your PFE code in the bin (yikes!)

b) Build a precompile server that the compiler can attach to and
request precompiled headers (give a path and set of -D flags or
whatever other state is needed to uniquely identify the precompile
output).  Requests would be satisfied via shared memory (yes,
non-portable, so this whole mechanism will only work on modern
machines).

c) Inside the server, keep parsed representations of all headers that
have been imported and the -D state used when parsing the headers.  As
new headers are parsed, they should be able to **layer** on top of
existing parsed headers (so there should only be one parsed version of
std::string).  This avoids the confining requirement that you have one
big master precompiled header.

d) Details about concurrency, security, locating the server, and so on
left as an exercise for the reader.

The main advantage here is that people would get fast compiles WITHOUT
having to tune their single PFE header.  Additionally, more headers
would get precompiled than would otherwise, yielding faster builds.  If
the layering is done correctly, the memory usage of the entire system
could be lower (since if you have two projects to build, both of which
import STL, there would be only one precompiled version of STL).

At the start of a build, a special 'check filesystem' command could be
sent to the server to have it do a one-time check of timestamps of
header files.  Assuming the timestamps haven't changed, the precompiled
headers could be kept across builds!  Naturally, doing a 'clean' build
from the IDE option would need to be able to flush and probably shut
down the server, since it is inevitable that there will be bugs that
will corrupt the precomp database :(

#2 could really take many forms.  The key idea is that having a single
PFE file is non-optimal.
Developers should not have to spend time tuning such a file to get the best compile time. The compiler and IDE should handle all these details by default. Having the developer involved here just leads to extra (ongoing!) work for the developer and a sub-optimal set of precompiled headers. Your goal should be to have the developer open their project and have it build 6x faster (instead of requiring the developer to do a several hours of tweaking on their PFE file to get the best performance -- and then having to keep it up to date over the life of their project). 3) This is possibly even harder... Keep track of what facts in a header each source file cared about (macro values defined or undefined, structure layout, function signature, etc, etc, etc). If a header changes, have the precompile server keep track of the facts that have changed and then only rebuild source files that care about those changes (assuming the source file itself hasn't compiled). This could get really ugly since you'd potentially keep track of multiple fact timestamps (consider if a build fails or is aborted so some files got updated for the current state of a header and some didn't). Extra bonus points for doing this on a lower granularity basis (i.e., don't recompile a function if it wouldn't produce different output). This would clearly be very hard and a large departure from the current state of affairs :) Anyway, I think the biggest improvements lie in moving away from the current batch compile philosophy mandated by the command line tools. Instead, the command line tools should be a front end onto a much more powerful persistent compile server. (Hey, you asked for ideas and said it was OK if they were hard :) -tim ^ permalink raw reply [flat|nested] 173+ messages in thread
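For concreteness, the request and reply a compiler process might exchange with such a precompile server could be as simple as the following; the struct names, the fixed-size fields, and the use of a named shared-memory segment are all assumptions made for illustration, not a worked-out design:

struct precompile_request
{
  char header_path[1024];           /* e.g. "/usr/include/c++/string" */
  unsigned char option_digest[16];  /* hash of -D/-U/-I and codegen flags */
};

struct precompile_reply
{
  int found;                /* nonzero if a cached parse already exists */
  char shm_name[64];        /* shared-memory segment holding the parsed form */
  unsigned long length;     /* size of the region to map */
};

The server would key its cache on (header_path, option_digest), which is also what would let two projects share one parsed copy of std::string.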
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 13:31 ` Problem with PFE approach [Was: Faster compilation speed] Timothy J. Wood @ 2002-08-16 13:44 ` Devang Patel 2002-08-16 14:31 ` Timothy J. Wood 2002-08-16 13:54 ` Devang Patel 1 sibling, 1 reply; 173+ messages in thread From: Devang Patel @ 2002-08-16 13:44 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Mike Stump, gcc On Friday, August 16, 2002, at 01:31 PM, Timothy J. Wood wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 13:44 ` Devang Patel @ 2002-08-16 14:31 ` Timothy J. Wood 2002-08-16 14:39 ` Neil Booth 2002-08-16 14:46 ` Devang Patel 0 siblings, 2 replies; 173+ messages in thread From: Timothy J. Wood @ 2002-08-16 14:31 UTC (permalink / raw) To: Devang Patel; +Cc: Mike Stump, gcc On Friday, August 16, 2002, at 01:43 PM, Devang Patel wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 14:31 ` Timothy J. Wood @ 2002-08-16 14:39 ` Neil Booth 2002-08-16 14:46 ` Devang Patel 1 sibling, 0 replies; 173+ messages in thread From: Neil Booth @ 2002-08-16 14:39 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Devang Patel, Mike Stump, gcc Timothy J. Wood wrote:- > The fact that you have to build this massive single header that acts > as a prefix header is the broken part -- implementation details like > this should not be exposed to the user. Just like Apple doesn't make > users manually configure their Apache server for personal web sharing, > Apple shouldn't make their developers do a bunch of work to get decent > compile speeds. It should "Just Work (TM)". I agree. Borland, MS and KAI managed this, so we should too. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 14:31 ` Timothy J. Wood 2002-08-16 14:39 ` Neil Booth @ 2002-08-16 14:46 ` Devang Patel 1 sibling, 0 replies; 173+ messages in thread From: Devang Patel @ 2002-08-16 14:46 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Mike Stump, gcc On Friday, August 16, 2002, at 02:31 PM, Timothy J. Wood wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 13:31 ` Problem with PFE approach [Was: Faster compilation speed] Timothy J. Wood 2002-08-16 13:44 ` Devang Patel @ 2002-08-16 13:54 ` Devang Patel 2002-08-16 14:42 ` Neil Booth 2002-08-16 14:45 ` Timothy J. Wood 1 sibling, 2 replies; 173+ messages in thread From: Devang Patel @ 2002-08-16 13:54 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Mike Stump, gcc On Friday, August 16, 2002, at 01:31 PM, Timothy J. Wood wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 13:54 ` Devang Patel @ 2002-08-16 14:42 ` Neil Booth 2002-08-16 14:57 ` Devang Patel 2002-08-16 14:45 ` Timothy J. Wood 1 sibling, 1 reply; 173+ messages in thread From: Neil Booth @ 2002-08-16 14:42 UTC (permalink / raw) To: Devang Patel; +Cc: Timothy J. Wood, Mike Stump, gcc Devang Patel wrote:- > In your previous two queries, what you want from PFE is to discard few > things > based on macros from precompiled headers. But when PFE restores trees, > it has gone too far as far as macros are concerned. The implementation should know what its assumptions are, and if they're broken recover somehow. Have you seen KAI's documentation (online) for their PCH implementation? It seems like a good solution to me. Neil. ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 14:42 ` Neil Booth @ 2002-08-16 14:57 ` Devang Patel 2002-08-17 15:31 ` Timothy J. Wood 0 siblings, 1 reply; 173+ messages in thread From: Devang Patel @ 2002-08-16 14:57 UTC (permalink / raw) To: Neil Booth; +Cc: Timothy J. Wood, Mike Stump, gcc On Friday, August 16, 2002, at 02:41 PM, Neil Booth wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-16 14:57 ` Devang Patel @ 2002-08-17 15:31 ` Timothy J. Wood 2002-08-17 20:04 ` Daniel Berlin ` (2 more replies) 0 siblings, 3 replies; 173+ messages in thread
From: Timothy J. Wood @ 2002-08-17 15:31 UTC (permalink / raw)
To: Devang Patel; +Cc: Mike Stump, gcc

So, another problem with PFE that I've noticed after working with it
for a while...

If you put all your commonly used headers in a PFE, then changing any
of these headers causes the PFE header to be considered changed.  And,
since this header is imported into every single file in your project,
you end up in a situation where changing any header causes the entire
project to be rebuilt.  This is clearly not good for day to day
development.

A PCH approach that was automatic and didn't have a single monolithic
file would avoid the artificial tying together of all the headers in
the world and would thus lead to faster incremental builds due to fewer
files being rebuilt.

Another approach that would work with a monolithic file would be some
sort of fact database that would allow the build system to decide early
on that the change in question didn't affect some subset of files.

-tim

^ permalink raw reply [flat|nested] 173+ messages in thread
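A rough sketch of what a per-file record in such a fact database might look like; the structure and the idea of hashing each consumed fact are assumptions made for illustration, not an existing GCC mechanism:

struct consumed_fact
{
  const char *header;      /* where the fact came from */
  const char *fact;        /* e.g. "macro:INITGUID" or "layout:GUID" */
  unsigned long digest;    /* hash of the fact's definition at compile time */
};

/* After a header edit, rebuild a file only if one of the facts it
   actually consumed now hashes differently.  */
static int
needs_rebuild (const struct consumed_fact *facts, int n,
               unsigned long (*current_digest) (const char *header,
                                                const char *fact))
{
  int i;
  for (i = 0; i < n; i++)
    if (current_digest (facts[i].header, facts[i].fact) != facts[i].digest)
      return 1;
  return 0;
}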
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-17 15:31 ` Timothy J. Wood @ 2002-08-17 20:04 ` Daniel Berlin 2002-08-17 20:07 ` Andrew Pinski 2002-08-17 20:14 ` Timothy J. Wood 2002-08-17 20:15 ` Daniel Berlin 2002-08-19 7:07 ` Stan Shebs 2 siblings, 2 replies; 173+ messages in thread From: Daniel Berlin @ 2002-08-17 20:04 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Devang Patel, Mike Stump, gcc On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > So, another problem with PFE that I've noticed after working with it > for a while... > > If you put all your commonly used headers in a PFE, then changing any > of these headers causes the PFE header to considered changed. And, > since this header is imported into every single file in your project, > you end up in a situation where changing any header causes the entire > project to be rebuilt. Um, this header should *not* be explicitly included in the files. It's *prefix* header. The only thing that would need to be rebuilt in this case is the prefix header. Everything else that would normally not be rebuilt will not be rebuilt. IE the only thing extra that gets rebuilt is the prefix header. > This is clearly not good for day to day > development. > > A PCH approach that was automatic and didn't have a single monolithic > file would avoid the artificial tying together of all the headers in > the world and would thus lead to faster incremental builds due to fewer > files being rebuilt. > > Another approach that would work with a monolithic file would be some > sort of fact database that would allow the build system to decide early > on that the change in question didn't effect some subset of files. > > -tim > > > ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-17 20:04 ` Daniel Berlin @ 2002-08-17 20:07 ` Andrew Pinski 2002-08-17 20:14 ` Timothy J. Wood 1 sibling, 0 replies; 173+ messages in thread
From: Andrew Pinski @ 2002-08-17 20:07 UTC (permalink / raw)
To: dberlin; +Cc: Timothy J. Wood, Devang Patel, Mike Stump, gcc

PFE is like the precompiled headers in CodeWarrior.

Thanks,
Andrew Pinski

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-17 20:04 ` Daniel Berlin 2002-08-17 20:07 ` Andrew Pinski @ 2002-08-17 20:14 ` Timothy J. Wood 2002-08-17 20:21 ` Daniel Berlin 2002-08-19 11:59 ` Devang Patel 1 sibling, 2 replies; 173+ messages in thread From: Timothy J. Wood @ 2002-08-17 20:14 UTC (permalink / raw) To: dberlin; +Cc: Devang Patel, Mike Stump, gcc On Saturday, August 17, 2002, at 08:04 PM, Daniel Berlin wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-17 20:14 ` Timothy J. Wood @ 2002-08-17 20:21 ` Daniel Berlin 2002-08-18 3:17 ` Kai Henningsen 2002-08-19 11:59 ` Devang Patel 1 sibling, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-17 20:21 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Devang Patel, Mike Stump, gcc On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > On Saturday, August 17, 2002, at 08:04 PM, Daniel Berlin wrote: > > > On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > > >> > >> So, another problem with PFE that I've noticed after working with > >> it > >> for a while... > >> > >> If you put all your commonly used headers in a PFE, then changing > >> any > >> of these headers causes the PFE header to considered changed. And, > >> since this header is imported into every single file in your project, > >> you end up in a situation where changing any header causes the entire > >> project to be rebuilt. > > > > Um, this header should *not* be explicitly included in the files. > > It's *prefix* header. > > I'm not saying that I'm #including it in my sources. What I'm saying > is that the IDE knows that all my files depend upon it (they all end up > including it due to it being the prefix header, regardless of whether > it is listed or not). This means that they may have depedencies on the > its contents and must be rebuilt if it or any header it includes > changes. No, they shouldn't have any dependencies on it's contents. They should include what they normally include. The fact that the prefix header stores the compiler state should prevent these includes from doing anything (since it'll know it's already processed that header) when it is present. Any build system that makes the files depend on the prefix header is broken, and needs to be fixed. Prefix headers need to be rebuilt when compilation options change, or the headers it includes change. Files only need rebuilt when some normal header they depend on changes. *Not* when the prefix header changes. > > The way I think about this is that the prefix header mess is just a > hack to avoid having a #include at the top of each file. There should > be nothing else special about the header -- it is just assumed that > there is a #include at the top of your file. > > > The only thing that would need to be rebuilt in this case is the > > prefix header. > > Everything else that would normally not be rebuilt will not be rebuilt. > > Nope... everything needs to be rebuilt. The problem is that the > prefix header might satisfy some symbol or macro that a source file > needs (assume that the source file doesn't explicitly include headers > it needs). Don't assume that. It should always do so. If not, the source code is wrong. Period. It's not a usability issue that users must have the proper includes. --Dan ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-17 20:21 ` Daniel Berlin @ 2002-08-18 3:17 ` Kai Henningsen 2002-08-18 7:36 ` Daniel Berlin 0 siblings, 1 reply; 173+ messages in thread From: Kai Henningsen @ 2002-08-18 3:17 UTC (permalink / raw) To: gcc dberlin@dberlin.org (Daniel Berlin) wrote on 17.08.02 in < Pine.LNX.4.44.0208172315090.29572-100000@dberlin.org >: > On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > > > > On Saturday, August 17, 2002, at 08:04 PM, Daniel Berlin wrote: > > > > > On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > > > > >> > > >> So, another problem with PFE that I've noticed after working with > > >> it > > >> for a while... > > >> > > >> If you put all your commonly used headers in a PFE, then changing > > >> any > > >> of these headers causes the PFE header to considered changed. And, > > >> since this header is imported into every single file in your project, > > >> you end up in a situation where changing any header causes the entire > > >> project to be rebuilt. > > > > > > Um, this header should *not* be explicitly included in the files. > > > It's *prefix* header. > > > > I'm not saying that I'm #including it in my sources. What I'm saying > > is that the IDE knows that all my files depend upon it (they all end up > > including it due to it being the prefix header, regardless of whether > > it is listed or not). This means that they may have depedencies on the > > its contents and must be rebuilt if it or any header it includes > > changes. > > No, they shouldn't have any dependencies on it's contents. They should That would be seriously broken ... > include what they normally include. The fact that the prefix header stores > the compiler state should prevent these includes from doing anything (since > it'll know it's already processed that header) when it is present. > Any build system that makes the files depend on the prefix header is > broken, and needs to be fixed. ... unless you have some mechanism to prevent them from being influenced by any change in any header which is used in the prefix header but which they do not include normally. What mechanism would that be? The dependency chain is *exactly* the same as if the prefix header was normally included at the start of every source file. MfG Kai ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 3:17 ` Kai Henningsen @ 2002-08-18 7:36 ` Daniel Berlin 2002-08-18 11:20 ` jepler 0 siblings, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-18 7:36 UTC (permalink / raw) To: Kai Henningsen; +Cc: gcc On 18 Aug 2002, Kai Henningsen wrote: > dberlin@dberlin.org (Daniel Berlin) wrote on 17.08.02 in < Pine.LNX.4.44.0208172315090.29572-100000@dberlin.org >: > > > On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > > > > > > > On Saturday, August 17, 2002, at 08:04 PM, Daniel Berlin wrote: > > > > > > > On Sat, 17 Aug 2002, Timothy J. Wood wrote: > > > > > > > >> > > > >> So, another problem with PFE that I've noticed after working with > > > >> it > > > >> for a while... > > > >> > > > >> If you put all your commonly used headers in a PFE, then changing > > > >> any > > > >> of these headers causes the PFE header to considered changed. And, > > > >> since this header is imported into every single file in your project, > > > >> you end up in a situation where changing any header causes the entire > > > >> project to be rebuilt. > > > > > > > > Um, this header should *not* be explicitly included in the files. > > > > It's *prefix* header. > > > > > > I'm not saying that I'm #including it in my sources. What I'm saying > > > is that the IDE knows that all my files depend upon it (they all end up > > > including it due to it being the prefix header, regardless of whether > > > it is listed or not). This means that they may have depedencies on the > > > its contents and must be rebuilt if it or any header it includes > > > changes. > > > > No, they shouldn't have any dependencies on it's contents. They should > > That would be seriously broken ... > > > include what they normally include. The fact that the prefix header stores > > the compiler state should prevent these includes from doing anything (since > > it'll know it's already processed that header) when it is present. > > Any build system that makes the files depend on the prefix header is > > broken, and needs to be fixed. > > ... unless you have some mechanism to prevent them from being influenced > by any change in any header which is used in the prefix header but which > they do not include normally. Why would they be influenced by a change to something they would not normally include? Unless they don't include what they normally should. > > What mechanism would that be? Reality? > > The dependency chain is *exactly* the same as if the prefix header was > normally included at the start of every source file. This is wrong, and leads exactly to the problem Tim describes. The dependency chain should *not* include the prefix header. The fact that the prefix header exists is not something the build system should know about, except insofar that it rebuild the prefix header when the headers it includes changes. That's *it*. > > MfG Kai > > ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 7:36 ` Daniel Berlin @ 2002-08-18 11:20 ` jepler 2002-08-18 13:20 ` Daniel Berlin 0 siblings, 1 reply; 173+ messages in thread
From: jepler @ 2002-08-18 11:20 UTC (permalink / raw)
To: Daniel Berlin; +Cc: Kai Henningsen, gcc

Let me see if I understand what people are talking about.

a.h:
	/* Include header guard if appropriate */
	#define X 1

b.h:
	/* Include header guard if appropriate */
	#define Y 1

m.c:
	#include "a.h"
	int main(void) { return Y; }

If m.c is compiled using PFE, and the PFE header contains both a.h and
b.h, will the compilation complete successfully?

If yes, and b.h is later modified to remove the Y definition, will a
build system where m.c does not depend on the PFE header actually
rebuild m.c, since the output of m.c depends (erroneously) on an item
in b.h through the PFE header?

My understanding of the PFE scheme implies that m.c would see a
definition from b.h even though b.h was not the target of a #include
directive.  This means that programmers will accidentally depend on
symbols from b.h even when it's not included, and that if they do, and
the build system does not consider the PFE header a dependency of each
source file, the definitions will not only be visible when they should
not be, but the build will be wrong, since the new contents of these
accidentally referenced header files will not actually cause a rebuild.

Jeff

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 11:20 ` jepler @ 2002-08-18 13:20 ` Daniel Berlin 2002-08-18 14:31 ` Timothy J. Wood 0 siblings, 1 reply; 173+ messages in thread From: Daniel Berlin @ 2002-08-18 13:20 UTC (permalink / raw) To: jepler; +Cc: Kai Henningsen, gcc On Sun, 18 Aug 2002 jepler@unpythonic.net wrote: > Let me see if I understand what people are talking about. > > a.h: > /* Include header guard if appropriate */ > #define X 1 > > b.h: > /* Include header guard if appropriate */ > #define Y 1 > > m.c: > #include "a.h" > int main(void) { return Y; } > > If m.c is compiled using PFE, and the PFE header contains both a.h and b.h, > will the compilation complete successfully? > > If yes, and b.h is later modified to remove the Y definition will a build > system where m.c does not depend on the PFE header actually rebuild m.c, > since the output of m.c depends (erroneously) on an item in b.h through > the PFE header? A build system where m.c does not depend on the prefix header should *not* rebuild if b.h is modified. That's my point. > > My understanding of the PFE symbol implies that m.c would see a definition > from b.h even though b.h was not the target of a #include directive. Yes, they would be existing, but this is user error. They should always include the right things. In other words, you should make sure it works without a PFE header before you try it *with* one. It's only when you *count* on the fact that the PFE header is there that you run into dependency problems. --Dan ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 13:20 ` Daniel Berlin @ 2002-08-18 14:31 ` Timothy J. Wood 2002-08-18 14:35 ` Andrew Pinski 2002-08-19 2:41 ` Michael Matz 0 siblings, 2 replies; 173+ messages in thread From: Timothy J. Wood @ 2002-08-18 14:31 UTC (permalink / raw) To: dberlin; +Cc: jepler, Kai Henningsen, gcc On Sunday, August 18, 2002, at 01:20 PM, Daniel Berlin wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 14:31 ` Timothy J. Wood @ 2002-08-18 14:35 ` Andrew Pinski 2002-08-18 14:55 ` Timothy J. Wood 2002-08-19 2:41 ` Michael Matz 1 sibling, 1 reply; 173+ messages in thread From: Andrew Pinski @ 2002-08-18 14:35 UTC (permalink / raw) To: Timothy J. Wood; +Cc: dberlin, jepler, Kai Henningsen, gcc PFE is good for headers that hardly change, like system headers. It is not good for headers that change in development. Thanks, Andrew Pinski ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 14:35 ` Andrew Pinski @ 2002-08-18 14:55 ` Timothy J. Wood 0 siblings, 0 replies; 173+ messages in thread From: Timothy J. Wood @ 2002-08-18 14:55 UTC (permalink / raw) To: Andrew Pinski; +Cc: gcc On Sunday, August 18, 2002, at 02:36 PM, Andrew Pinski wrote: ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-18 14:31 ` Timothy J. Wood 2002-08-18 14:35 ` Andrew Pinski @ 2002-08-19 2:41 ` Michael Matz 2002-08-19 6:26 ` jepler 2002-08-19 11:53 ` Devang Patel 1 sibling, 2 replies; 173+ messages in thread
From: Michael Matz @ 2002-08-19 2:41 UTC (permalink / raw)
To: Timothy J. Wood; +Cc: dberlin, jepler, Kai Henningsen, gcc

Hi,

On Sun, 18 Aug 2002, Timothy J. Wood wrote:

> Thus, if you are going to implicitly include the header, you damn
> well better included it in dependency analysis.

No, because the existence of that header shouldn't influence the
outcome of the compiler in any way.

> I can accept an argument of "this is too hard to do correctly right
> now", but not "the user screwed up".  The user didn't screw up -- the
> compiler just isn't smart enough to do it correctly yet.

If the source doesn't compile without the prefix header the user did
something wrong, IOW he's screwed if he doesn't want to fix it.  Period.

Ciao,
Michael.

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-19 2:41 ` Michael Matz @ 2002-08-19 6:26 ` jepler 2002-08-19 6:40 ` Daniel Berlin 2002-08-19 11:50 ` Devang Patel 1 sibling, 2 replies; 173+ messages in thread
From: jepler @ 2002-08-19 6:26 UTC (permalink / raw)
To: Michael Matz; +Cc: Timothy J. Wood, dberlin, Kai Henningsen, gcc

> On Sun, 18 Aug 2002, Timothy J. Wood wrote:
> > I can accept an argument of "this is too hard to do correctly right
> > now", but not "the user screwed up".  The user didn't screw up -- the
> > compiler just isn't smart enough to do it correctly yet.

On Mon, Aug 19, 2002 at 11:21:28AM +0200, Michael Matz wrote:
> If the source doesn't compile without the prefix header the user did
> something wrong, IOW he's screwed if he doesn't want to fix it.  Period.

PFE makes it too easy for the programmer to accidentally give his
program different meaning with or without the prefix header.  I can do
without one more way to screw up my program.

The following set of files will compile a program with or without PFE,
but using a PFE that contains both a.h and b.h, the behavior will
change.  So the suggestion that files should be checked that they
compile without PFE is not enough to ensure that there aren't
unintended changes in program meaning in the presence of PFE.

// a.h
#define DEFA

// b.h
#define DEFB

// m.c
#include "a.h"
int main(void) {
#ifdef DEFB
	return 1;
#else
	return 0;
#endif
}

^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed] 2002-08-19 6:26 ` jepler @ 2002-08-19 6:40 ` Daniel Berlin 2002-08-19 11:50 ` Devang Patel 1 sibling, 0 replies; 173+ messages in thread From: Daniel Berlin @ 2002-08-19 6:40 UTC (permalink / raw) To: jepler; +Cc: Michael Matz, Timothy J. Wood, Kai Henningsen, gcc On Mon, 19 Aug 2002 jepler@unpythonic.net wrote: > > On Sun, 18 Aug 2002, Timothy J. Wood wrote: > > > I can accept an argument of "this is too hard to do correctly right > > > now", but not "the user screwed up". The user didn't screw up -- the > > > compiler just isn't smart enough to do it correctly yet. > > On Mon, Aug 19, 2002 at 11:21:28AM +0200, Michael Matz wrote: > > If the source doesn't compile without the prefix header the user did > > something wrong, IOW he's screwed if he doesn't want to fix it. Period. > > PFE makes it too easy for the programmer to accidentally give his program > different meaning with or without the prefix header. I can do without one > more way to screw up my program. > > The following set of files will compile a program with or without PFE, but > using a PFE that contains both a.h and b.h, the behavior will change. This is an implementation problem, and one that should be fixed. As is making symbols visible without the explicit includes (Though this is slightly harder to solve, but still possible through various means). ^ permalink raw reply [flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-19  6:26       ` jepler
  2002-08-19  6:40         ` Daniel Berlin
@ 2002-08-19 11:50         ` Devang Patel
  2002-08-19 12:55           ` Jeff Epler
  1 sibling, 1 reply; 173+ messages in thread
From: Devang Patel @ 2002-08-19 11:50 UTC (permalink / raw)
To: jepler; +Cc: dberlin, gcc

On Monday, August 19, 2002, at 06:26 AM, jepler@unpythonic.net wrote:

> The following set of files will compile a program with or without PFE,
> but using a PFE that contains both a.h and b.h, the behavior will
> change.

This is not an implementation problem or a PFE model problem.  Including
a.h and b.h in the PFE means that what you're asking the compiler to do
is to compile the following source:

/// m.c
#include "a.h"
#include "b.h"
int main(void) {
#ifdef DEFB
	return 1;
#else
	return 0;
#endif
}

And, no doubt, it can have different behavior than the following
original source:

// m.c
#include "a.h"
int main(void) {
#ifdef DEFB
	return 1;
#else
	return 0;
#endif
}

-Devang

> So the suggestion that files should be checked that they compile
> without PFE is not enough to ensure that there aren't unintended
> changes in program meaning in the presence of PFE.
>
> // a.h
> #define DEFA
>
> // b.h
> #define DEFB
>
> // m.c
> #include "a.h"
> int main(void) {
> #ifdef DEFB
> 	return 1;
> #else
> 	return 0;
> #endif
> }

^ permalink raw reply	[flat|nested] 173+ messages in thread
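[Editorial illustration, not part of the thread: the divergence Devang and
jepler describe goes away if m.c explicitly includes every header whose
macros it tests, so a PFE that happens to pre-parse a.h and b.h can only
add what the source would have pulled in anyway.  File names are reused
from jepler's example; this is a sketch of the "source must compile
without the prefix header" rule, not a description of the PFE
implementation.]

// m.c -- names both headers it relies on
#include "a.h"
#include "b.h"   /* DEFB is visible with or without the prefix header */
int main(void) {
#ifdef DEFB
	return 1;   /* same result either way */
#else
	return 0;
#endif
}

[Built plain or with such a PFE, this program returns 1 in both cases, so
the prefix header only saves parsing time and cannot change its meaning.]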
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-19 11:50       ` Devang Patel
@ 2002-08-19 12:55         ` Jeff Epler
  2002-08-19 13:03           ` Ziemowit Laski
  0 siblings, 1 reply; 173+ messages in thread
From: Jeff Epler @ 2002-08-19 12:55 UTC (permalink / raw)
To: Devang Patel; +Cc: dberlin, gcc

On Mon, Aug 19, 2002 at 11:50:24AM -0700, Devang Patel wrote:
> On Monday, August 19, 2002, at 06:26 AM, jepler@unpythonic.net wrote:
> > The following set of files will compile a program with or without
> > PFE, but using a PFE that contains both a.h and b.h, the behavior
> > will change.
>
> This is not an implementation problem or a PFE model problem.
> Including a.h and b.h in the PFE means that what you're asking the
> compiler to do is to compile the following source:
>
> /// m.c
> #include "a.h"
> #include "b.h"
> int main(void) {
> #ifdef DEFB
> 	return 1;
> #else
> 	return 0;
> #endif
> }

... then the build system must treat m.c as depending on the PFE, which
in turn depends on all the headers it contains.  But that's where this
discussion started, with the PFE cure being worse than the illness,
since it makes your whole project recompile when you touch a header
file.

Jeff

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-19 12:55       ` Jeff Epler
@ 2002-08-19 13:03         ` Ziemowit Laski
  0 siblings, 0 replies; 173+ messages in thread
From: Ziemowit Laski @ 2002-08-19 13:03 UTC (permalink / raw)
To: Jeff Epler; +Cc: Devang Patel, dberlin, gcc

On Monday, Aug 19, 2002, at 12:54 US/Pacific, Jeff Epler wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-19  2:41       ` Michael Matz
  2002-08-19  6:26         ` jepler
@ 2002-08-19 11:53         ` Devang Patel
  1 sibling, 0 replies; 173+ messages in thread
From: Devang Patel @ 2002-08-19 11:53 UTC (permalink / raw)
To: Michael Matz; +Cc: Timothy J. Wood, dberlin, jepler, Kai Henningsen, gcc

On Monday, August 19, 2002, at 02:21 AM, Michael Matz wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-17 20:14       ` Timothy J. Wood
  2002-08-17 20:21         ` Daniel Berlin
@ 2002-08-19 11:59         ` Devang Patel
  1 sibling, 0 replies; 173+ messages in thread
From: Devang Patel @ 2002-08-19 11:59 UTC (permalink / raw)
To: Timothy J. Wood; +Cc: dberlin, Mike Stump, gcc

On Saturday, August 17, 2002, at 08:14 PM, Timothy J. Wood wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-17 15:31       ` Timothy J. Wood
  2002-08-17 20:04         ` Daniel Berlin
@ 2002-08-17 20:15         ` Daniel Berlin
  2002-08-19  7:07           ` Stan Shebs
  2 siblings, 0 replies; 173+ messages in thread
From: Daniel Berlin @ 2002-08-17 20:15 UTC (permalink / raw)
To: Timothy J. Wood; +Cc: Devang Patel, Mike Stump, gcc

On Sat, 17 Aug 2002, Timothy J. Wood wrote:

> So, another problem with PFE that I've noticed after working with it
> for a while...
>
> If you put all your commonly used headers in a PFE, then changing any
> of these headers causes the PFE header to be considered changed.  And,
> since this header is imported into every single file in your project,
> you end up in a situation where changing any header causes the entire
> project to be rebuilt.  This is clearly not good for day to day
> development.
>
> A PCH approach that was automatic and didn't have a single monolithic
> file would avoid the artificial tying together of all the headers in
> the world and would thus lead to faster incremental builds due to fewer
> files being rebuilt.
>
> Another approach that would work with a monolithic file would be some
> sort of fact database that would allow the build system to decide early
> on that the change in question didn't affect some subset of files.

Also, while constructive criticism is good and all, at some point it
becomes "put up or shut up".  It's one thing to say how great something
would be, another thing to implement it.

We have heard your idea, we know how to implement it.  Everyone is aware
of it.  At this point, I'd rather you tell me how good it is when you've
got code to do it, rather than keep pointing out what you perceive to be
flaws in something that is a large improvement over what exists now.

One of the things that slows down gcc development is criticism of
patches that are large improvements over what exists now, in favor of
some "better" approach, which nobody has yet implemented.  Then this
large improvement never gets accepted, and nobody ever implements the
"better approach".

The perfect is the enemy of the good.

--Dan

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-17 15:31       ` Timothy J. Wood
  2002-08-17 20:04         ` Daniel Berlin
  2002-08-17 20:15         ` Daniel Berlin
@ 2002-08-19  7:07         ` Stan Shebs
  2002-08-19  8:52           ` Timothy J. Wood
  2 siblings, 1 reply; 173+ messages in thread
From: Stan Shebs @ 2002-08-19 7:07 UTC (permalink / raw)
To: Timothy J. Wood; +Cc: Devang Patel, Mike Stump, gcc

Timothy J. Wood wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-19  7:07       ` Stan Shebs
@ 2002-08-19  8:52         ` Timothy J. Wood
  0 siblings, 0 replies; 173+ messages in thread
From: Timothy J. Wood @ 2002-08-19 8:52 UTC (permalink / raw)
To: Stan Shebs; +Cc: Devang Patel, Mike Stump, gcc

On Monday, August 19, 2002, at 07:05 AM, Stan Shebs wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Problem with PFE approach [Was: Faster compilation speed]
  2002-08-16 13:54       ` Devang Patel
  2002-08-16 14:42         ` Neil Booth
@ 2002-08-16 14:45         ` Timothy J. Wood
  1 sibling, 0 replies; 173+ messages in thread
From: Timothy J. Wood @ 2002-08-16 14:45 UTC (permalink / raw)
To: Devang Patel; +Cc: Mike Stump, gcc

On Friday, August 16, 2002, at 01:54 PM, Devang Patel wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread
* Re: Faster compilation speed
  2002-08-09 12:17 Faster compilation speed Mike Stump
  ` (4 preceding siblings ...)
  2002-08-09 14:59 ` Timothy J. Wood
@ 2002-08-09 16:01 ` Richard Henderson
  2002-08-10 17:48 ` Aaron Lehmann
  6 siblings, 0 replies; 173+ messages in thread
From: Richard Henderson @ 2002-08-09 16:01 UTC (permalink / raw)
To: Mike Stump; +Cc: gcc

On Fri, Aug 09, 2002 at 12:17:32PM -0700, Mike Stump wrote:
> Another question is, what should the lower limit be on uglifying code
> for the sake of compilation speed.

You'll find that really ugly code actually compiles slower than code
that has been optimized somewhat, simply because the optimized
compilation emits less assembly and therefore does less I/O.

As for not re-using temp slots, sure, I guess that's something we can
do at -O0.  I don't see a need for the new command-line switch, though.

r~

^ permalink raw reply	[flat|nested] 173+ messages in thread
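[Editorial illustration, not part of the thread: a hedged sketch of what
the temp-slot reuse being discussed affects.  The struct and function
names below are made up; the point is only that each call returning an
aggregate needs a stack temporary, and skipping the reuse/combine logic
plausibly leaves one slot per call instead of one shared slot -- a fatter
frame at -O0 in exchange for less bookkeeping at compile time.]

/* Not GCC source -- illustration only. */
struct big { char buf[256]; };
struct big make(int);
int use(struct big);

int f(void)
{
  int s = 0;
  s += use(make(1));	/* needs a temporary for the returned struct */
  s += use(make(2));	/* with reuse: the same slot; without: a second slot */
  s += use(make(3));	/* with reuse: the same slot; without: a third slot */
  return s;
}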
* Re: Faster compilation speed
  2002-08-09 12:17 Faster compilation speed Mike Stump
  ` (5 preceding siblings ...)
  2002-08-09 16:01 ` Faster compilation speed Richard Henderson
@ 2002-08-10 17:48 ` Aaron Lehmann
  2002-08-12 10:36 ` Dale Johannesen
  6 siblings, 1 reply; 173+ messages in thread
From: Aaron Lehmann @ 2002-08-10 17:48 UTC (permalink / raw)
To: Mike Stump; +Cc: gcc

On Fri, Aug 09, 2002 at 12:17:32PM -0700, Mike Stump wrote:
> I'd like to introduce lots of various changes to improve compiler
> speed.

Just adding my two cents to the discussion: I saw many promising ideas
presented in this thread, but one thing I didn't see mentioned is gcc's
extensive sanity checking.  There are many tests that will produce an
internal compiler error when merited.  This is a great tool for
debugging, but most of these errors should be impossible to reach.
Does anyone know how much overhead this sanity checking causes in
general, and whether there are any sanity checks that are unusually
expensive and should be considered for removal?

^ permalink raw reply	[flat|nested] 173+ messages in thread
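[Editorial note, not part of the thread: much of this checking is already
conditional on GCC's configure-time --enable-checking machinery, so it
can in principle be compiled out of release builds.  Below is a hedged
sketch of the general pattern only; the macro name and message are
illustrative, not GCC's actual ones.]

#include <stdio.h>
#include <stdlib.h>

#ifdef ENABLE_CHECKING
/* Checking enabled: each use costs a test and a normally-untaken branch. */
#define SANITY_CHECK(cond)						\
  do {									\
    if (!(cond))							\
      {									\
	fprintf (stderr, "internal error: %s failed at %s:%d\n",	\
		 #cond, __FILE__, __LINE__);				\
	abort ();							\
      }									\
  } while (0)
#else
/* Checking disabled: the check compiles away entirely. */
#define SANITY_CHECK(cond) ((void) 0)
#endif

[Measuring the overhead is then a matter of timing a bootstrap with the
checks compiled in and out; individually they are cheap, but the ones
sitting inside hot tree- and RTL-walking loops are the likely candidates
Aaron is asking about.]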
* Re: Faster compilation speed
  2002-08-10 17:48       ` Aaron Lehmann
@ 2002-08-12 10:36         ` Dale Johannesen
  0 siblings, 0 replies; 173+ messages in thread
From: Dale Johannesen @ 2002-08-12 10:36 UTC (permalink / raw)
To: Aaron Lehmann; +Cc: Dale Johannesen, Mike Stump, gcc

On Saturday, August 10, 2002, at 05:48 PM, Aaron Lehmann wrote:

^ permalink raw reply	[flat|nested] 173+ messages in thread