public inbox for gcc@gcc.gnu.org
* Re: C compile time
  2003-06-19 20:16 C compile time Dara Hazeghi
@ 2003-06-19 20:16 ` Andrew Pinski
  2003-06-19 20:22 ` Diego Novillo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 71+ messages in thread
From: Andrew Pinski @ 2003-06-19 20:16 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: Andrew Pinski, gcc


On Thursday, Jun 19, 2003, at 15:21 US/Eastern, Dara Hazeghi wrote:

> 1) Andrew Pinski's patch gets us back on average about
> 10% on mainline and 7.5% on branch. Hopefully his
> copyright assignment arrives soon!

It just came and I will be signing it soon, but I am having problems
with my patch for C++ for some reason, which I am looking at right now.

Thanks,
Andrew Pinski

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
@ 2003-06-19 20:16 Dara Hazeghi
  2003-06-19 20:16 ` Andrew Pinski
                   ` (3 more replies)
  0 siblings, 4 replies; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-19 20:16 UTC (permalink / raw)
  To: gcc; +Cc: pinskia

Hello,

I've now updated results at
http://www.myownlittleworld.com/computers/gcctable.html
to include a table of the percentage change for
compile time between the various compilers.

Some things which stick out:

1) Andrew Pinski's patch gets us back on average about
10% on mainline and 7.5% on branch. Hopefully his
copyright assignment arrives soon!

2) Kaveh's work on the garbage collection algorithm means
that gcc 3.3 is the first major gcc release since egcs
1.0.3a that's not more than 5% slower than the last
major release on the preceding branch. gcc 3.2.3 for
instance is between 14% and 42% slower than gcc 3.0.4
depending on optimizations.

3) The biggest slowdown in gcc was between gcc 3.0.X
and 3.1.X (i.e. different branches), but between 3.1.1
and 3.2.3 there was quite a slowdown too, ~6-9%. I
didn't realize that such big changes occurred on
release branches.

4) -funit-at-a-time is expensive!

Some (possibly impractical) suggestions:

1) At this point, the SPEC testers are keeping track
of runtime. We already have compile regression testers
for the testsuite which automatically report
regressions to the list. Should the same be done for
compile time, i.e. a compile-time increase of more than
1% prompts a message to the list?

1b) This may not be practical because of noise between
runs. Possibly do multiple runs, and take their mean,
or some such to reduce noise? Maybe not practical with
SPEC2K, but with SPEC95?
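One way to damp that noise, sketched here as a hypothetical harness (the function names and the 1% threshold are illustrative only, not an actual tester script):

```python
import statistics
import subprocess
import time

def time_command(cmd, runs=5):
    """Time a command several times; the median damps outliers
    caused by background load on the test machine."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def regressed(old_secs, new_secs, threshold=0.01):
    """True when compile time grew by more than `threshold` (1% by default)."""
    return (new_secs - old_secs) / old_secs > threshold
```

A tester could call `regressed()` on the medians of consecutive days' runs and mail the list only when it returns true.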

2) Establish clear criteria for new optimizations,
and where they fit. For instance, -O1 according to the
manual means avoiding time-consuming optimizations.
Yet as of 3.3, it's 55% slower than in 2.95.3 and 35%
slower than in 3.0.4. Perhaps state that certain
optimization levels aren't allowed to slow down by more
than a certain percentage between releases on certain
important benchmarks (e.g. SPEC2K, linux-kernel, etc.)

2b) Set criteria for new optimizations to be added.
Mandate a certain amount of runtime improvement on a
given benchmark before an optimization is included
in -O2, for example.

Cheers,

Dara

__________________________________
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com


* Re: C compile time
  2003-06-19 20:16 C compile time Dara Hazeghi
  2003-06-19 20:16 ` Andrew Pinski
@ 2003-06-19 20:22 ` Diego Novillo
  2003-06-19 21:58   ` Dara Hazeghi
  2003-06-19 20:44 ` Jan Hubicka
  2003-06-19 22:10 ` Steven Bosscher
  3 siblings, 1 reply; 71+ messages in thread
From: Diego Novillo @ 2003-06-19 20:22 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: gcc, pinskia

On Thu, 2003-06-19 at 15:21, Dara Hazeghi wrote:

> 1) At this point, the SPEC testers are keeping track
> of runtime. We already have compile regression testers
> for the testsuite which automatically report
> regressions to the list. Should the same be done for
> compile time, i.e. a compile-time increase of more than
> 1% prompts a message to the list?
> 

SPEC already uses a geometric mean when doing more than 1 run.  The
problem with SPEC is that it uses wall-clock timers.  Variations in the
1-4% range aren't uncommon (even if the compiler didn't change).
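As a minimal sketch of the geometric-mean aggregation mentioned here (illustrative only; SPEC's actual tooling is more involved):

```python
import math

def geometric_mean(values):
    """Geometric mean: the SPEC-style way to combine per-run
    (or per-benchmark) ratios into one figure."""
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

Unlike the arithmetic mean, the geometric mean is the appropriate average when the inputs are ratios, but it still cannot remove wall-clock jitter from the individual runs.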

If the machine running SPEC happens to be doing something else at the
time, the results will be affected.  Running SPEC in single-user mode is
not practical for an automated tester, so all you can do is -STOP
everything else and disallow logins.

You really need to examine trends in performance and do the detective
work.  In theory, the SPEC testers are collecting everything necessary,
down to the daily diffs.  But the process is time consuming, just the
same.


Diego.


* Re: C compile time
  2003-06-19 20:16 C compile time Dara Hazeghi
  2003-06-19 20:16 ` Andrew Pinski
  2003-06-19 20:22 ` Diego Novillo
@ 2003-06-19 20:44 ` Jan Hubicka
  2003-06-19 21:23   ` Dara Hazeghi
  2003-06-19 22:10 ` Steven Bosscher
  3 siblings, 1 reply; 71+ messages in thread
From: Jan Hubicka @ 2003-06-19 20:44 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: gcc, pinskia

> Hello,
> 
> I've now updated results at
> http://www.myownlittleworld.com/computers/gcctable.html
> to include a table of the percentage change for
> compile time between the various compilers.
> 
> Some things which stick out:
> 
> 1) Andrew Pinski's patch gets us back on average about
> 10% on mainline and 7.5% on branch. Hopefully his
> copyright assignment arrives soon!
> 
> 2) Kaveh's work on the garbage collection algorithm means
> that gcc 3.3 is the first major gcc release since egcs
> 1.0.3a that's not more than 5% slower than the last
> major release on the preceding branch. gcc 3.2.3 for
> instance is between 14% and 42% slower than gcc 3.0.4
> depending on optimizations.
> 
> 3) The biggest slowdown to gcc was between gcc 3.0.X
> and 3.1.X (ie different branches), but between 3.1.1
> and 3.2.3, there was quite a slowdown too, ~6-9%. I
> didn't realize that such big changes occurred on
> release branches.
> 
> 4) -funit-at-a-time is expensive!

Can you please try -O2 -funit-at-a-time?  That way we can see how much
of the overhead comes from -funit-at-a-time itself and inlining
functions called once, and how much from the inlining heuristics.
I will try to prepare a patch limiting the function body size for
inlining-once soon.

Honza


* Re: C compile time
  2003-06-19 20:44 ` Jan Hubicka
@ 2003-06-19 21:23   ` Dara Hazeghi
  2003-06-19 21:23     ` Jan Hubicka
  0 siblings, 1 reply; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-19 21:23 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

> Can you please try -O2 -funit-at-a-time?

Sure:

results with mainline (compiling cc1 from gcc 3.2.3):
-O2      -O2 -funit...    -O3 -fno-unit...  -O3
483.84   558.45           595.36            704.59
         +15.4%                             +18.3%

In other words, the compile time increase is
comparable, though not equal...
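The percentage rows in the table are simple ratios; as a sketch (the helper name is illustrative):

```python
def pct_slowdown(base_secs, new_secs):
    """Percent compile-time change of new_secs relative to base_secs."""
    return (new_secs / base_secs - 1.0) * 100.0

# The table's +15.4% is pct_slowdown(483.84, 558.45);
# its +18.3% is pct_slowdown(595.36, 704.59).
```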

Dara



* Re: C compile time
  2003-06-19 21:23   ` Dara Hazeghi
@ 2003-06-19 21:23     ` Jan Hubicka
  2003-06-19 21:26       ` Dara Hazeghi
  0 siblings, 1 reply; 71+ messages in thread
From: Jan Hubicka @ 2003-06-19 21:23 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: Jan Hubicka, gcc

> > Can you please try -O2 -funit-at-a-time?
> 
> Sure:
> 
> results with mainline (compiling cc1 from gcc 3.2.3):
> -O2      -O2 -funit...    -O3 -fno-unit...  -O3
> 483.84   558.45           595.36            704.59
>          +15.4%                             +18.3%
> 
> In other words, the compile time increase is
> comparable, though not equal...
Hmm, so we have a combination of these two problems...

In case it is easy for you to do so, can you also compare -O2
-funit-at-a-time -fno-inline-functions and -O2 -fno-inline-functions.
That will eliminate the inlining-once optimization that is probably
responsible for the majority of the slowdown.  In case the slowdown
remains, it is likely memory overhead - how much memory do you have in
your box?

Thanks for testing it!
Honza
> 
> Dara
> 


* Re: C compile time
  2003-06-19 21:23     ` Jan Hubicka
@ 2003-06-19 21:26       ` Dara Hazeghi
  2003-06-19 21:31         ` Jan Hubicka
  0 siblings, 1 reply; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-19 21:26 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

> Hmm, so we have combination of these two problems...
> 
> In case it is easy for you to do so, can you also
> compare -O2
> -funit-at-a-time -fno-inline-functions and -O2
> -fno-inline-functions.

Do you mean -O3 here? -fno-inline-functions
-funit-at-a-time -O2 was exactly the same as
-funit-at-a-time -O2.

> That will elliminate the inlining once optimization
> that should be
> guilty for majority of the slowdown.  In case it
> remains it is likely
> memory overhead - how much memory do you have in
> your box?

1Gig. Not enough I take it :-)

Dara



* Re: C compile time
  2003-06-19 21:26       ` Dara Hazeghi
@ 2003-06-19 21:31         ` Jan Hubicka
  2003-06-19 21:59           ` Jan Hubicka
  0 siblings, 1 reply; 71+ messages in thread
From: Jan Hubicka @ 2003-06-19 21:31 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: Jan Hubicka, gcc

> > Hmm, so we have combination of these two problems...
> > 
> > In case it is easy for you to do so, can you also
> > compare -O2
> > -funit-at-a-time -fno-inline-functions and -O2
> > -fno-inline-functions.
> 
> Do you mean -O3 here? -fno-inline-functions
> -funit-at-a-time -O2 was exactly the same as
> -funit-at-a-time -O2.

Hmm, I realize there is no way to disable inlining of functions called
once.  I will prepare a patch and then come back :)
> 
> > That will elliminate the inlining once optimization
> > that should be
> > guilty for majority of the slowdown.  In case it
> > remains it is likely
> > memory overhead - how much memory do you have in
> > your box?
> 
> 1Gig. Not enough I take it :-)
:)) I just wondered whether it is the GC triggering.  My system has
256MB, so it should not be doing any better in this regard than yours.

Honza
> 
> Dara
> 


* Re: C compile time
  2003-06-19 21:58   ` Dara Hazeghi
@ 2003-06-19 21:58     ` Diego Novillo
  2003-06-20 22:42       ` Dara Hazeghi
  2003-06-19 21:59     ` Jan Hubicka
  1 sibling, 1 reply; 71+ messages in thread
From: Diego Novillo @ 2003-06-19 21:58 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: gcc

On Thu, 2003-06-19 at 17:22, Dara Hazeghi wrote:

> April 24 and April 29, and still haven't recovered
> yet. Anybody have offhand ideas what might have caused
> that, before I start digging through gcc-cvs? Thanks,
> 
You can use the .diff files stored in the daily log directories:
http://people.redhat.com/dnovillo/spec2000/gcc/log/


Diego.


* Re: C compile time
  2003-06-19 20:22 ` Diego Novillo
@ 2003-06-19 21:58   ` Dara Hazeghi
  2003-06-19 21:58     ` Diego Novillo
  2003-06-19 21:59     ` Jan Hubicka
  0 siblings, 2 replies; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-19 21:58 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc


--- Diego Novillo <dnovillo@redhat.com> wrote:

> 
> You really need to examine trends in performance and
> do the detective
> work.  In theory, the SPEC testers are collecting
> everything necessary,
> down to the daily diffs.  But the process is time
> consuming, just the
> same.

True. I hadn't realized that noise with SPEC runs was
such a big deal. So speaking of examining the stats,
SPEC2K 176.gcc peak compiles really took a hit between
April 24 and April 29, and still haven't recovered
yet. Anybody have offhand ideas what might have caused
that, before I start digging through gcc-cvs? Thanks,

Dara



* Re: C compile time
  2003-06-19 21:58   ` Dara Hazeghi
  2003-06-19 21:58     ` Diego Novillo
@ 2003-06-19 21:59     ` Jan Hubicka
  1 sibling, 0 replies; 71+ messages in thread
From: Jan Hubicka @ 2003-06-19 21:59 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: Diego Novillo, gcc

> 
> --- Diego Novillo <dnovillo@redhat.com> wrote:
> 
> > 
> > You really need to examine trends in performance and
> > do the detective
> > work.  In theory, the SPEC testers are collecting
> > everything necessary,
> > down to the daily diffs.  But the process is time
> > consuming, just the
> > same.
> 
> True. I hadn't realized that noise with SPEC runs was
> such a big deal. So speaking of examining the stats,

Actually my experience is that the majority of patches have less effect
than the noise, but overall the curves move.  It is difficult to analyze
such data then :(

Honza
> SPEC2K 176.gcc peak compiles really took a hit between
> April 24 and April 29, and still haven't recovered
> yet. Anybody have offhand ideas what might have caused
> that, before I start digging through gcc-cvs? Thanks,
> 
> Dara
> 


* Re: C compile time
  2003-06-19 21:31         ` Jan Hubicka
@ 2003-06-19 21:59           ` Jan Hubicka
  2003-06-20  0:55             ` Dara Hazeghi
  0 siblings, 1 reply; 71+ messages in thread
From: Jan Hubicka @ 2003-06-19 21:59 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dara Hazeghi, gcc

> > > Hmm, so we have combination of these two problems...
> > > 
> > > In case it is easy for you to do so, can you also
> > > compare -O2
> > > -funit-at-a-time -fno-inline-functions and -O2
> > > -fno-inline-functions.
> > 
> > Do you mean -O3 here? -fno-inline-functions
> > -funit-at-a-time -O2 was exactly the same as
> > -funit-at-a-time -O2.
> 
> Hmm, I realize there is no way to disable inlining functions once.
> I will prepare patch and then come back :)

Here it comes.  Bootstrap in progress, OK assuming it passes?
Dara: with this patch, -O2 -funit-at-a-time should enable unit-at-a-time
compilation only, while -O2 -finline-functions-called-once will be
what -O2 -funit-at-a-time is now.

Honza

Thu Jun 19 23:28:51 CEST 2003  Jan Hubicka  <jh@suse.cz>
	* cgraphunit.c (cgraph_optimize): Inline functions called once
	only with flag_inline_functions_called_once.
	* flags.h (flag_inline_functions_called_once): Declare.
	* toplev.c (flag_inline_functions_called_once): Global variable.
	(lang_independent_options): Add optimize-functions-called-once.
	(parse_options_and_default_flags): Set
	flag_inline_functions_called_once at -O3.
	(process_options): flag_inline_functions_called_once imply
	flag_unit_at_a_time.
	* invoke.texi (-finline-functions-called-once): Document.
Index: cgraphunit.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraphunit.c,v
retrieving revision 1.5
diff -c -3 -p -r1.5 cgraphunit.c
*** cgraphunit.c	12 May 2003 09:46:25 -0000	1.5
--- cgraphunit.c	19 Jun 2003 21:26:41 -0000
*************** cgraph_optimize ()
*** 454,460 ****
  
    cgraph_mark_local_functions ();
  
!   cgraph_mark_functions_to_inline_once ();
  
    cgraph_global_info_ready = true;
    if (!quiet_flag)
--- 454,461 ----
  
    cgraph_mark_local_functions ();
  
!   if (flag_inline_functions_called_once)
!     cgraph_mark_functions_to_inline_once ();
  
    cgraph_global_info_ready = true;
    if (!quiet_flag)
Index: flags.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/flags.h,v
retrieving revision 1.111
diff -c -3 -p -r1.111 flags.h
*** flags.h	15 Jun 2003 08:29:58 -0000	1.111
--- flags.h	19 Jun 2003 21:26:42 -0000
*************** extern int flag_rerun_loop_opt;
*** 377,382 ****
--- 377,386 ----
  
  extern int flag_inline_functions;
  
+ /* Nonzero means inline functions that are called just once.  */
+ 
+ extern int flag_inline_functions_called_once;
+ 
  /* Nonzero for -fkeep-inline-functions: even if we make a function
     go inline everywhere, keep its definition around for debugging
     purposes.  */
Index: toplev.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/toplev.c,v
retrieving revision 1.771
diff -c -3 -p -r1.771 toplev.c
*** toplev.c	15 Jun 2003 08:29:59 -0000	1.771
--- toplev.c	19 Jun 2003 21:26:43 -0000
*************** int flag_rerun_loop_opt;
*** 706,711 ****
--- 707,716 ----
  
  int flag_inline_functions;
  
+ /* Nonzero means inline functions that are called just once.  */
+ 
+ int flag_inline_functions_called_once;
+ 
  /* Nonzero for -fkeep-inline-functions: even if we make a function
     go inline everywhere, keep its definition around for debugging
     purposes.  */
*************** static const lang_independent_options f_
*** 1097,1102 ****
--- 1102,1109 ----
     N_("Allow function addresses to be held in registers") },
    {"inline-functions", &flag_inline_functions, 1,
     N_("Integrate simple functions into their callers") },
+   {"inline-functions-called-once", &flag_inline_functions_called_once, 1,
+    N_("Integrate simple functions into their callers when they are called just once") },
    {"keep-inline-functions", &flag_keep_inline_functions, 1,
     N_("Generate code for funcs even if they are fully inlined") },
    {"inline", &flag_no_inline, 0,
*************** parse_options_and_default_flags (int arg
*** 5062,5067 ****
--- 5075,5081 ----
        flag_rename_registers = 1;
        flag_unswitch_loops = 1;
        flag_unit_at_a_time = 1;
+       flag_inline_functions_called_once = 1;
      }
  
    if (optimize < 2 || optimize_size)
*************** process_options (void)
*** 5278,5283 ****
--- 5292,5300 ----
      flag_asynchronous_unwind_tables = 1;
    if (flag_asynchronous_unwind_tables)
      flag_unwind_tables = 1;
+ 
+   if (flag_inline_functions_called_once)
+     flag_unit_at_a_time = 1;
  
    /* Disable unit-at-a-time mode for frontends not supporting callgraph
       interface.  */
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.291
diff -c -3 -p -r1.291 invoke.texi
*** doc/invoke.texi	13 Jun 2003 10:11:45 -0000	1.291
--- doc/invoke.texi	19 Jun 2003 21:26:48 -0000
*************** in the following sections.
*** 265,271 ****
  -fforce-addr  -fforce-mem  -ffunction-sections @gol
  -fgcse  -fgcse-lm  -fgcse-sm  -floop-optimize  -fcrossjumping @gol
  -fif-conversion  -fif-conversion2 @gol
! -finline-functions  -finline-limit=@var{n}  -fkeep-inline-functions @gol
  -fkeep-static-consts  -fmerge-constants  -fmerge-all-constants @gol
  -fmove-all-movables  -fnew-ra  -fno-branch-count-reg @gol
  -fno-default-inline  -fno-defer-pop @gol
--- 265,272 ----
  -fforce-addr  -fforce-mem  -ffunction-sections @gol
  -fgcse  -fgcse-lm  -fgcse-sm  -floop-optimize  -fcrossjumping @gol
  -fif-conversion  -fif-conversion2 @gol
! -finline-functions  -finline-limit=@var{n} @gol
! -finline-functions-called-once  -fkeep-inline-functions @gol
  -fkeep-static-consts  -fmerge-constants  -fmerge-all-constants @gol
  -fmove-all-movables  -fnew-ra  -fno-branch-count-reg @gol
  -fno-default-inline  -fno-defer-pop @gol
*************** integrating in this way.
*** 3611,3616 ****
--- 3612,3625 ----
  If all calls to a given function are integrated, and the function is
  declared @code{static}, then the function is normally not output as
  assembler code in its own right.
+ 
+ Enabled at level @option{-O3}.
+ 
+ @item -finline-functions-called-once
+ @opindex finline-functions-called-once
+ Integrate functions into their callers when they are called just once
+ and their address is not taken, so the original function body does not need
+ to be assembled.  Implies @option{-funit-at-a-time}.
  
  Enabled at level @option{-O3}.
  


* Re: C compile time
  2003-06-19 20:16 C compile time Dara Hazeghi
                   ` (2 preceding siblings ...)
  2003-06-19 20:44 ` Jan Hubicka
@ 2003-06-19 22:10 ` Steven Bosscher
  2003-06-19 22:30   ` Steven Bosscher
  3 siblings, 1 reply; 71+ messages in thread
From: Steven Bosscher @ 2003-06-19 22:10 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: gcc, pinskia

Dara Hazeghi wrote:

>Hello,
>
>I've now updated results at
>http://www.myownlittleworld.com/computers/gcctable.html
>to include a table of the percentage change for
>compile time between the various compilers.
>

Thanks for this nice comparison.  One question ("Again!?"  Yes, sorry :-)




* Re: C compile time
  2003-06-19 22:10 ` Steven Bosscher
@ 2003-06-19 22:30   ` Steven Bosscher
  0 siblings, 0 replies; 71+ messages in thread
From: Steven Bosscher @ 2003-06-19 22:30 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Dara Hazeghi, gcc, pinskia

Steven Bosscher wrote:

> Dara Hazeghi wrote:
>
>> Hello,
>>
>> I've now updated results at
>> http://www.myownlittleworld.com/computers/gcctable.html
>> to include a table of the percentage change for
>> compile time between the various compilers.
>>
>
> Thanks for this nice comparison.  One question ("Again!?"  Yes, sorry :-) 

[ Sorry, sent too early ]

You say on your page that all compilers you compare are based on 
snapshots from 20030614.  On tree-ssa some significant speedups have 
gone in since (for CCP in particular).  So could you try a more recent 
tree-ssa snapshot and see if it helps?

Gr.
Steven



* Re: C compile time
  2003-06-19 21:59           ` Jan Hubicka
@ 2003-06-20  0:55             ` Dara Hazeghi
  0 siblings, 0 replies; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-20  0:55 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc


--- Jan Hubicka <jh@suse.cz> wrote:
> Dara: with this patch you -O2 -funit-at-a-time
> should enable unit at a
> time compilation only, while -O2
> -finline-functions-called-once will be
> what -O2 -funit-at-a-time is now.

Okay, I've done that. I can't quite explain the
results (yes checking is disabled, machine otherwise
idle):
-O2      -O2 -funit...      -O2 -finline-f...
453.50   449.82             531.44
         -0.8%              +17.2%

Hope this helps. I'm a bit surprised by the -O2
-funit... timings, as well as the fact that plain -O2
is now 6% faster than with 20030614, so I'll try to
reconfirm these timings.

Dara



* Re: C compile time
  2003-06-19 21:58     ` Diego Novillo
@ 2003-06-20 22:42       ` Dara Hazeghi
  2003-06-21  0:34         ` Diego Novillo
  0 siblings, 1 reply; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-20 22:42 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc


--- Diego Novillo <dnovillo@redhat.com> wrote:
> On Thu, 2003-06-19 at 17:22, Dara Hazeghi wrote:
> 
> > April 24 and April 29, and still haven't recovered
> > yet. Anybody have offhand ideas what might have
> caused
> > that, before I start digging through gcc-cvs?
> Thanks,
> > 
> You can use the .diff files stored in the daily log
> directories:
> http://people.redhat.com/dnovillo/spec2000/gcc/log/

You guys make this too easy. One final question: do
the SPEC testers just use vanilla configurations for
gcc, or do they disable checking or other things?
Thanks,

Dara



* Re: C compile time
  2003-06-20 22:42       ` Dara Hazeghi
@ 2003-06-21  0:34         ` Diego Novillo
  0 siblings, 0 replies; 71+ messages in thread
From: Diego Novillo @ 2003-06-21  0:34 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: gcc

On Fri, 2003-06-20 at 16:07, Dara Hazeghi wrote:

> You guys make this too easy.
>
:)

> One final question, do
> the SPEC testers just use vanilla configurations for
> gcc, or do they disable checking or other things?
> Thanks,
> 
I don't know about the ones Andreas runs.  Mine configure with
--disable-checking and --disable-multilib.  Check the build.log file
that's in the daily logs for the exact flags.


Diego.


* Re: C compile time
  2003-06-18 22:03         ` Dara Hazeghi
  2003-06-20 20:36           ` Scott Robert Ladd
@ 2003-07-04  7:14           ` Ben Elliston
  1 sibling, 0 replies; 71+ messages in thread
From: Ben Elliston @ 2003-07-04  7:14 UTC (permalink / raw)
  To: gcc

Dara Hazeghi <dhazeghi@yahoo.com> writes:

> Regarding icc, I don't care much for it, but it was the only other C
> compiler I found for x86/Linux that was easy to get ahold of. I
> included it for reference purposes. If anybody has a suggestion for
> another Linux C compiler, by all means, tell me.

How about TCC?  I realise that TCC's code quality is not terrific, but
it is probably an interesting data point for non-optimised compilation
times.

There is also LCC.

Cheers, Ben



* Re: C compile time
@ 2003-06-30 15:28 Robert Dewar
  0 siblings, 0 replies; 71+ messages in thread
From: Robert Dewar @ 2003-06-30 15:28 UTC (permalink / raw)
  To: dberlin, dewar; +Cc: coyote, dhazeghi, gcc, pkoning

> It would be better to improve GDB than to dumb down gcc's optimizations.
> Also work on merging things like var-tracking and dwarf2 location list
> support from the rtlopt-branch/cfg-branch for gcc, which helps immensely
> with optimized debugging.

Yes, but you still get transformations in the optimized code that cannot
be followed by the debugger in a clearly intelligent way.

I am not talking about dumbing down -O here, but rather in practice making
-O0 better.

In our world at least, people use -O0 primarily because they
can't debug at higher levels. Yes, it would be nice if this were fixed,
but it would also be nice if there were a debuggable level which did
not generate so much junk.

In comparison with other compilers, gcc's performance in unoptimized,
clearly debuggable mode is rather poor, even if its -O2 code compares
favorably.


* Re: C compile time
  2003-06-30 14:25 Robert Dewar
@ 2003-06-30 14:58 ` Daniel Berlin
  0 siblings, 0 replies; 71+ messages in thread
From: Daniel Berlin @ 2003-06-30 14:58 UTC (permalink / raw)
  To: Robert Dewar; +Cc: pkoning, coyote, dhazeghi, gcc



On Mon, 30 Jun 2003, Robert Dewar wrote:

> > Absolutely.  The notion that you do debugging with -O0 and only final
> > build with -O2 is obsolete -- and actually never was a good one.
> >
> > If you want reliable software, you have to debug what you ship -- not
> > something totally different.
>
> Mind you, in practice we find that gdb is pretty weak debugging -O2 code,
> so this is a tricky requirement. It sometimes helps to use -O1 as a
> compromise, but I still would very much like to see -Od, meaning do all
> the optimization you can that does not interfere with debugging :-)

It would be better to improve GDB than to dumb down gcc's optimizations.
Also work on merging things like var-tracking and dwarf2 location list
support from the rtlopt-branch/cfg-branch for gcc, which helps immensely
with optimized debugging.


* Re: C compile time
@ 2003-06-30 14:25 Robert Dewar
  2003-06-30 14:58 ` Daniel Berlin
  0 siblings, 1 reply; 71+ messages in thread
From: Robert Dewar @ 2003-06-30 14:25 UTC (permalink / raw)
  To: dewar, pkoning; +Cc: coyote, dhazeghi, gcc

> Absolutely.  The notion that you do debugging with -O0 and only final
> build with -O2 is obsolete -- and actually never was a good one.  
> 
> If you want reliable software, you have to debug what you ship -- not
> something totally different.

Mind you, in practice we find that gdb is pretty weak debugging -O2 code,
so this is a tricky requirement. It sometimes helps to use -O1 as a 
compromise, but I still would very much like to see -Od, meaning do all
the optimization you can that does not interfere with debugging :-)


* Re: C compile time
  2003-06-29 13:51 Robert Dewar
@ 2003-06-30 13:50 ` Paul Koning
  0 siblings, 0 replies; 71+ messages in thread
From: Paul Koning @ 2003-06-30 13:50 UTC (permalink / raw)
  To: dewar; +Cc: coyote, dhazeghi, gcc

>>>>> "Robert" == Robert Dewar <dewar@gnat.com> writes:

 >> But for the vast majority of working programmers, I simply don't
 >> see why optimized compile times are such an issue.

 Robert> In our world, we have many customers building very large
 Robert> systems that require optimization to be turned on once they
 Robert> get past the initial integration stage, so the great majority
 Robert> of development work is done in optimized mode.  The model
 Robert> where you do everything unoptimized and turn on optimization
 Robert> for the final build and you are done is not reasonable for
 Robert> large critical systems which must be extensively tested in
 Robert> essentially final form.

 Robert> So for us, -O2 compilation time is indeed critical.

Absolutely.  The notion that you do debugging with -O0 and only final
build with -O2 is obsolete -- and actually never was a good one.  

If you want reliable software, you have to debug what you ship -- not
something totally different.

	  paul


* Re: C compile time
@ 2003-06-29 13:51 Robert Dewar
  2003-06-30 13:50 ` Paul Koning
  0 siblings, 1 reply; 71+ messages in thread
From: Robert Dewar @ 2003-06-29 13:51 UTC (permalink / raw)
  To: coyote, dhazeghi; +Cc: gcc

> But for the vast majority of working programmers, I
> simply don't see why 
> optimized compile times are such an issue.

In our world, we have many customers building very large systems that require
optimization to be turned on once they get past the initial integration
stage, so the great majority of development work is done in optimized mode.
The model where you do everything unoptimized and turn on optimization for
the final build and you are done is not reasonable for large critical systems
which must be extensively tested in essentially final form.

So for us, -O2 compilation time is indeed critical.


* Re: C compile time
  2003-06-20 20:36           ` Scott Robert Ladd
  2003-06-21  0:31             ` Dara Hazeghi
@ 2003-06-21 16:14             ` Michael S. Zick
  1 sibling, 0 replies; 71+ messages in thread
From: Michael S. Zick @ 2003-06-21 16:14 UTC (permalink / raw)
  To: Scott Robert Ladd, Dara Hazeghi; +Cc: gcc, s.bosscher, pinskia

On Friday 20 June 2003 02:30 pm, Scott Robert Ladd wrote:
> But for the vast majority of working programmers, I simply don't see why
> optimized compile times are such an issue.
>
> I'm more than open to illumination on this topic.
Scott;
1) It is a good metric to follow the evolution of compiler
development.
2) Better to track this actively than wait for the time
when it becomes faster to work a problem with paper 
and pencil, then toggle it in from the front panel.

C.S. is rumored to be past the "toggle it in from
the front panel" stage.  But I suppose there might
be a few still rewiring their plugboards...

Mike

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-20 20:36           ` Scott Robert Ladd
@ 2003-06-21  0:31             ` Dara Hazeghi
  2003-06-21 16:14             ` Michael S. Zick
  1 sibling, 0 replies; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-21  0:31 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: gcc

> But for the vast majority of working programmers, I
> simply don't see why 
> optimized compile times are such an issue.

You have a point. Mine is that gcc has tried its best
to be amenable to all types of users, be they the
typical working programmer, the kernel hacker, or
the hobbyist.

One group, for instance, is the people who develop
gcc. Since their compiles are done with -O2 -g, I'd
certainly imagine this can become an issue (though if
I'm wrong, feel free to chime in!).

Another group is distributions targeting older
machines, e.g. OpenBSD/m68k. As Marc Espie pointed out
(http://gcc.gnu.org/ml/gcc/2002-05/msg01460.html) this
is more than a minor inconvenience.

Then of course, there are the people who like to
compile their own software (I guess I fit in there).
Most free software packages default to -O2 or -O3.
Trust me, switching from gcc 2.95 to 3.1 on Darwin and
trying to rebuild the five dozen fink packages I use
was not fun. It took 4 days instead of 2.5.

And of course, there was the little thread a few
months back about how linux kernel developers were
less than pleased with the state of compile time in
gcc 3.2.

So the short answer is, of course, that it only
matters to some folks. But raising the issue at a
minimum raises awareness.

Dara

P.S. I personally would like a -O.5 option. Something
which generates decent code (not too much slower than
-O1), but doesn't constantly receive new
optimizations, and so as a result, slows down only
with the underlying infrastructure.

__________________________________
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 22:03         ` Dara Hazeghi
@ 2003-06-20 20:36           ` Scott Robert Ladd
  2003-06-21  0:31             ` Dara Hazeghi
  2003-06-21 16:14             ` Michael S. Zick
  2003-07-04  7:14           ` Ben Elliston
  1 sibling, 2 replies; 71+ messages in thread
From: Scott Robert Ladd @ 2003-06-20 20:36 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: gcc, s.bosscher, pinskia

Dara Hazeghi wrote:
 > Scott, I added -g to the table. Do you consider -pg
 > particularly important (from the real world)?

Given that -pg has little or no influence on compile time, the question 
is moot.

Going back to -g for a moment: According to your tables, the overall 
compile time has not increased precipitously for debug-style compiles. 
While optimized compile times do indeed get slower with successive 
versions of GCC, such compiles are only rarely made in my experience. 
During development, the vast, vast majority of my compiles do not 
include any optimization options or set them to -O0.

Therefore, I am somewhat confused as to why there is so much stress over 
increasing compile times for optimized builds. I can see it as an issue 
for very large executable images, such as the Linux kernel, or for 
people who design and build source-based GNU/Linux distros like Gentoo. 
But for the vast majority of working programmers, I simply don't see why 
optimized compile times are such an issue.

I'm more than open to illumination on this topic.

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-19 16:41             ` Mark Mitchell
  2003-06-19 17:08               ` Jan Hubicka
@ 2003-06-19 17:33               ` Jeff Sturm
  1 sibling, 0 replies; 71+ messages in thread
From: Jeff Sturm @ 2003-06-19 17:33 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, Zack Weinberg, Benjamin Kosnik, gcc, guenth

On 19 Jun 2003, Mark Mitchell wrote:
> > Why do we want inline before gimplifing?
>
> We might want to do that to do C++-specific optimizations.
>
> Jason, Jeff, and I talked about this some at the conference; it's not
> clear what the right answer is.  Some C++-specific optimizations
> (virtual call elimination is a canonical example) really want flow
> graphs and inlining, constant propagation, and dead code elimination to
> have happened, but still want some C++-specific bits lying around.

Java may want to do similar optimizations, plus some of its own
(elimination of redundant synchronization and class initialization come
immediately to mind).  Some of the frontend code could be simplified or
perhaps removed entirely once the CFG is available, such as
check_for_initialization.

Jeff

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-19 16:41             ` Mark Mitchell
@ 2003-06-19 17:08               ` Jan Hubicka
  2003-06-19 17:33               ` Jeff Sturm
  1 sibling, 0 replies; 71+ messages in thread
From: Jan Hubicka @ 2003-06-19 17:08 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, Zack Weinberg, Benjamin Kosnik, gcc, guenth

> 
> > Why do we want inline before gimplifing?
> 
> We might want to do that to do C++-specific optimizations.
> 
> Jason, Jeff, and I talked about this some at the conference; it's not
> clear what the right answer is.  Some C++-specific optimizations
> (virtual call elimination is a canonical example) really want flow
> graphs and inlining, constant propagation, and dead code elimination to
> have happened, but still want some C++-specific bits lying around.

I think I remember the discussion.
In the case of virtual call elimination I believe the only way is to
expose virtual calls to gimple in some generic way.  It seems to be too
much effort to do constant propagation and dead code elimination on
non-gimplified trees.  There probably should be some virtual call
expression in gimple that gets eliminated in the later passes...

Honza
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> mark@codesourcery.com
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-19 16:36           ` Jan Hubicka
@ 2003-06-19 16:41             ` Mark Mitchell
  2003-06-19 17:08               ` Jan Hubicka
  2003-06-19 17:33               ` Jeff Sturm
  0 siblings, 2 replies; 71+ messages in thread
From: Mark Mitchell @ 2003-06-19 16:41 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Zack Weinberg, Benjamin Kosnik, gcc, guenth


> Why do we want inline before gimplifing?

We might want to do that to do C++-specific optimizations.

Jason, Jeff, and I talked about this some at the conference; it's not
clear what the right answer is.  Some C++-specific optimizations
(virtual call elimination is a canonical example) really want flow
graphs and inlining, constant propagation, and dead code elimination to
have happened, but still want some C++-specific bits lying around.

-- 
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-19 16:31         ` Mark Mitchell
@ 2003-06-19 16:36           ` Jan Hubicka
  2003-06-19 16:41             ` Mark Mitchell
  0 siblings, 1 reply; 71+ messages in thread
From: Jan Hubicka @ 2003-06-19 16:36 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, Zack Weinberg, Benjamin Kosnik, gcc, guenth

> > As I described in the other mail, we can do the inlining decisions first
> > and then do the top-down inlining based on these.  It is possible to get
> > an upper estimate of the function body size after inlining; on the other
> > hand, we can probably do significantly better if we were doing inlining
> > step-by-step and optimizing the intermediate results, so we have a better
> > idea of how much the inlined versions simplify.
> 
> Yes, deciding when to do that is one of the tricky bits.

A simple heuristic would be to do this only when the resulting body
estimate is small, so the function will remain an inline candidate and the
resulting function will actually be considered for inlining many times
(as we always inline functions called once).

This should limit the overall growth caused by the decision pass to the
number of nontrivial functions (those having more than one callee) *
small_function_threshold, which should be enough.
>  
> > In unit-at-a-time I already have analysis to recognize these cases.
> > How hard would it be to make the current tree-inline.c not do the copy,
> > and instead modify trees in place?
> 
> I'm not sure.  
> 
> In theory what you suggest is possible, but it might be pretty tricky.  

That's what I was afraid of :(
> 
> (You'd have to walk the whole tree and fix up any DECL_CONTEXT entries;
> I'm not sure what else might or might not need to change.)
> 
> If we eliminated SAVE_EXPR in favor of real variables everywhere, this
> might be easier.  In other words, doing this in GIMPLE might be easier. 
> But, unfortunately, you may want to inline before that point...

Why do we want inline before gimplifing?

Honza
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> mark@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-19 15:21       ` Jan Hubicka
@ 2003-06-19 16:31         ` Mark Mitchell
  2003-06-19 16:36           ` Jan Hubicka
  0 siblings, 1 reply; 71+ messages in thread
From: Mark Mitchell @ 2003-06-19 16:31 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Zack Weinberg, Benjamin Kosnik, gcc, guenth

> As I described in the other mail, we can do the inlining decisions first
> and then do the top-down inlining based on these.  It is possible to get
> an upper estimate of the function body size after inlining; on the other
> hand, we can probably do significantly better if we were doing inlining
> step-by-step and optimizing the intermediate results, so we have a better
> idea of how much the inlined versions simplify.

Yes, deciding when to do that is one of the tricky bits.
 
> In unit-at-a-time I already have analysis to recognize these cases.
> How hard would it be to make the current tree-inline.c not do the copy,
> and instead modify trees in place?

I'm not sure.  

In theory what you suggest is possible, but it might be pretty tricky.  

(You'd have to walk the whole tree and fix up any DECL_CONTEXT entries;
I'm not sure what else might or might not need to change.)

If we eliminated SAVE_EXPR in favor of real variables everywhere, this
might be easier.  In other words, doing this in GIMPLE might be easier. 
But, unfortunately, you may want to inline before that point...

-- 
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 21:26     ` Mark Mitchell
  2003-06-18 21:51       ` Zack Weinberg
@ 2003-06-19 15:21       ` Jan Hubicka
  2003-06-19 16:31         ` Mark Mitchell
  1 sibling, 1 reply; 71+ messages in thread
From: Jan Hubicka @ 2003-06-19 15:21 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Zack Weinberg, Benjamin Kosnik, gcc, guenth, jh

> On Wed, 2003-06-18 at 13:05, Zack Weinberg wrote:
> > Mark Mitchell <mark@codesourcery.com> writes:
> > 
> > > For 3.4, we could consider going back to the "bottom-up" inlining
> > > strategy.  That might be better than what we have now, even though it's
> > > inherently quadratic.  Implementing bottom-up inlining wouldn't be
> > > terribly hard; all the same tree-inlining machinery would work.
> > 
> > With information about the complete translation unit in hand, I don't
> > think bottom-up has to be quadratic.  Consider the following
> > algorithm.
> 
> I think we need to say quadratic in what.  You're right that it's a
> linear number of inlining operations, but that's quadratic in terms of
> the number of nodes in the trees.

As I described in the other mail, we can do the inlining decisions first
and then do the top-down inlining based on these.  It is possible to get
an upper estimate of the function body size after inlining; on the other
hand, we can probably do significantly better if we were doing inlining
step-by-step and optimizing the intermediate results, so we have a better
idea of how much the inlined versions simplify.

> 
> For example:
> 
>   int f1() { return v1; }
>   int f2() { return v2 + f1(); }
>   int f3() { return v3 + f2(); }
>   ...
> 
> When you inline into f27 you'll have 27 copies of f1 lying around, 26
> copies of f2, 25 copies of f3, and so forth.
> 
> You can do better, but you have to be smarter.  If you're lucky, and
> some of these functions needn't actually be emitted in the .o file, you
> can do better.  For example, if f3 is needed, but f1 and f2 aren't, then
> you can inline f2 and f1 directly into f3, and not bother inlining f1
> into f2.

In unit-at-a-time I already have analysis to recognize these cases.
How hard would it be to make the current tree-inline.c not do the copy,
and instead modify trees in place?

Honza
> 
> In C++, this is a typical case -- f27 may well be the only function
> which needs to be emitted, while all the other ones are inline template
> functions that need not be emitted and -- often -- are used nowhere else
> in the translation unit.
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> mark@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
@ 2003-06-19 14:58 Richard Guenther
  0 siblings, 0 replies; 71+ messages in thread
From: Richard Guenther @ 2003-06-19 14:58 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc, jh, Benjamin Kosnik

>> It would be nice if some of the inlining issues got sorted out for 3.4,
>> and -Winline became deterministic again.
>
>For 3.4, we could consider going back to the "bottom-up" inlining
>strategy.  That might be better than what we have now, even though it's
>inherently quadratic.  Implementing bottom-up inlining wouldn't be
>terribly hard; all the same tree-inlining machinery would work.
>
>One of the things we seem to forget in all the inlining discussion is
>that inlining has never worked well.  In fact, one of the big
>motivations in going to function-at-a-time was to try to fix all the
>lameness in the RTL inliner!  On many large C++ programs, the 2.95 era
>compilers would simply exhaust all memory trying to do inlining...
>
>I'm pretty convinced that there's no easy fix, unfortunately.

I think at least with a callgraph available we can do better. Also,
implementing more user hints (like __attribute__((leafify))) correctly
needs a callgraph due to our function-deferring machinery.

Richard.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 21:51       ` Zack Weinberg
@ 2003-06-18 23:09         ` Mark Mitchell
  0 siblings, 0 replies; 71+ messages in thread
From: Mark Mitchell @ 2003-06-18 23:09 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Benjamin Kosnik, gcc, guenth, jh

On Wed, 2003-06-18 at 14:07, Zack Weinberg wrote:
> Mark Mitchell <mark@codesourcery.com> writes:
> 
> >   int f1() { return v1; }
> >   int f2() { return v2 + f1(); }
> >   int f3() { return v3 + f2(); }
> >   ...
> >
> > When you inline into f27 you'll have 27 copies of f1 lying around, 26
> > copies of f2, 25 copies of f3, and so forth.
> >
> > You can do better, but you have to be smarter.  If you're lucky, and
> > some of these functions needn't actually be emitted in the .o file, you
> > can do better.  For example, if f3 is needed, but f1 and f2 aren't, then
> > you can inline f2 and f1 directly into f3, and not bother inlining f1
> > into f2.
> 
> I think that can be done as a trivial adjustment to the algorithm I
> proposed -- you attach a "number of call sites" reference count to
> each FUNCTION_DECL, and decrement it every time it gets inlined; when
> it gets to zero, throw away the function body.  This effectively does
> the last step of my original algorithm at the same time as the main
> loop.  Obviously, externally visible decls have to be pinned.

If I understand correctly, that improvement decreases the total memory
usage (by throwing things away sooner), but still leaves you quadratic
in the number of node-copies you have to do.

-- 
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18  9:15       ` Steven Bosscher
  2003-06-18 10:07         ` Jan Hubicka
@ 2003-06-18 22:03         ` Dara Hazeghi
  2003-06-20 20:36           ` Scott Robert Ladd
  2003-07-04  7:14           ` Ben Elliston
  1 sibling, 2 replies; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-18 22:03 UTC (permalink / raw)
  To: gcc; +Cc: coyote, s.bosscher, pinskia

Hello,

I've tried to update my measurements to include the
recommendations made on the list.

Current table is at:
http://www.myownlittleworld.com/computers/gcctable.html

(so as to avoid yahoo mail munging it).

Steve, you're right that -funit-at-a-time is
responsible for the slowdown at -O3. Without it,
results on mainline are quite similar to 3.3. I'll try
to record executable sizes when I get a chance. I
don't think this will be a good runtime benchmark,
because gcc is supposed to be largely impervious to
such optimizations, but if you think not, I can check
that too.

Andrew, your patch obviously helped enormously.
However, I tried it against tree-ssa and saw no
difference (beyond noise). Any ideas as to why?

Scott, I added -g to the table. Do you consider -pg
particularly important (from the real world)?

As mentioned in
http://gcc.gnu.org/ml/gcc/2003-06/msg01562.html, I'm
uncertain how to get results for -fsyntax-only, but
I'll certainly check.

Regarding icc, I don't care much for it, but it was
the only other C compiler I found for x86/Linux that
was easy to get ahold of. I included it for reference
purposes. If anybody has a suggestion for another
Linux C compiler, by all means, tell me.

All in all, I'm pleased that this issue is getting
some much-deserved attention (IMO), and glad that the
issues weren't as severe as they might have initially
seemed.

Cheers,

Dara


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
@ 2003-06-18 21:51 Chris Lattner
  0 siblings, 0 replies; 71+ messages in thread
From: Chris Lattner @ 2003-06-18 21:51 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Zack Weinberg, gcc


Mark Mitchell said:
>> With information about the complete translation unit in hand, I don't
>> think bottom-up has to be quadratic.  Consider the following
>> algorithm.

> I think we need to say quadratic in what.  You're right that it's a
> linear number of inlining operations, but that's quadratic in terms of
> the number of nodes in the trees.

That's actually _exponential_ in the number of nodes in the tree, which
is why a good heuristic is a _must_.

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 21:26     ` Mark Mitchell
@ 2003-06-18 21:51       ` Zack Weinberg
  2003-06-18 23:09         ` Mark Mitchell
  2003-06-19 15:21       ` Jan Hubicka
  1 sibling, 1 reply; 71+ messages in thread
From: Zack Weinberg @ 2003-06-18 21:51 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Benjamin Kosnik, gcc, guenth, jh

Mark Mitchell <mark@codesourcery.com> writes:

>   int f1() { return v1; }
>   int f2() { return v2 + f1(); }
>   int f3() { return v3 + f2(); }
>   ...
>
> When you inline into f27 you'll have 27 copies of f1 lying around, 26
> copies of f2, 25 copies of f3, and so forth.
>
> You can do better, but you have to be smarter.  If you're lucky, and
> some of these functions needn't actually be emitted in the .o file, you
> can do better.  For example, if f3 is needed, but f1 and f2 aren't, then
> you can inline f2 and f1 directly into f3, and not bother inlining f1
> into f2.

I think that can be done as a trivial adjustment to the algorithm I
proposed -- you attach a "number of call sites" reference count to
each FUNCTION_DECL, and decrement it every time it gets inlined; when
it gets to zero, throw away the function body.  This effectively does
the last step of my original algorithm at the same time as the main
loop.  Obviously, externally visible decls have to be pinned.
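The reference-count refinement can be sketched in a few lines. This is an
editor's illustration with made-up names (InlineBody, noteInlined), not
GCC's actual data structures: each function body carries a count of
remaining call sites, and the body is freed the moment the count hits zero,
unless the symbol is externally visible and therefore pinned.

```cpp
#include <cassert>

// Hypothetical bookkeeping for the refinement described above:
// a per-function call-site reference count, decremented as call
// sites are inlined away.
struct InlineBody {
    int refCount;     // call sites still referencing this body
    bool external;    // externally visible decls must be pinned
    bool discarded;   // body has been thrown away
};

// Record that one call site of `fn` has just been inlined.
void noteInlined(InlineBody& fn) {
    if (--fn.refCount == 0 && !fn.external)
        fn.discarded = true;   // safe to free the body now
}
```

With this, the "discard unreferenced bodies" step happens incrementally
during the main loop rather than as a separate pass afterwards.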

zw

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 20:52   ` Zack Weinberg
@ 2003-06-18 21:26     ` Mark Mitchell
  2003-06-18 21:51       ` Zack Weinberg
  2003-06-19 15:21       ` Jan Hubicka
  0 siblings, 2 replies; 71+ messages in thread
From: Mark Mitchell @ 2003-06-18 21:26 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Benjamin Kosnik, gcc, guenth, jh

On Wed, 2003-06-18 at 13:05, Zack Weinberg wrote:
> Mark Mitchell <mark@codesourcery.com> writes:
> 
> > For 3.4, we could consider going back to the "bottom-up" inlining
> > strategy.  That might be better than what we have now, even though it's
> > inherently quadratic.  Implementing bottom-up inlining wouldn't be
> > terribly hard; all the same tree-inlining machinery would work.
> 
> With information about the complete translation unit in hand, I don't
> think bottom-up has to be quadratic.  Consider the following
> algorithm.

I think we need to say quadratic in what.  You're right that it's a
linear number of inlining operations, but that's quadratic in terms of
the number of nodes in the trees.

For example:

  int f1() { return v1; }
  int f2() { return v2 + f1(); }
  int f3() { return v3 + f2(); }
  ...

When you inline into f27 you'll have 27 copies of f1 lying around, 26
copies of f2, 25 copies of f3, and so forth.

You can do better, but you have to be smarter.  If you're lucky, and
some of these functions needn't actually be emitted in the .o file, you
can do better.  For example, if f3 is needed, but f1 and f2 aren't, then
you can inline f2 and f1 directly into f3, and not bother inlining f1
into f2.
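To see the quadratic term concretely, here is a toy count (an editor's
sketch, not GCC code) of how many tree nodes a naive bottom-up pass copies
on such a chain, assuming each function's own body is one node and inlining
copies the callee's already-expanded body:

```cpp
#include <cassert>

// Count node copies made by naive bottom-up inlining of the chain
// f1 <- f2 <- ... <- fN.  The total is N*(N-1)/2: quadratic in the
// number of functions, and hence in tree nodes.
long copiesForChain(int n) {
    long copied = 0;
    long bodySize = 1;           // f1 after inlining: 1 node
    for (int i = 2; i <= n; ++i) {
        copied += bodySize;      // copy f(i-1)'s expanded body into fi
        bodySize += 1;           // fi = its own node + the copy
    }
    return copied;
}
```

For the 27-function chain above this gives 27*26/2 = 351 node copies, even
though only 26 inlining operations are performed.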

In C++, this is a typical case -- f27 may well be the only function
which needs to be emitted, while all the other ones are inline template
functions that need not be emitted and -- often -- are used nowhere else
in the translation unit.

-- 
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
@ 2003-06-18 21:18 Chris Lattner
  0 siblings, 0 replies; 71+ messages in thread
From: Chris Lattner @ 2003-06-18 21:18 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Mark Mitchell, Benjamin Kosnik, gcc, guenth, jh


Zack said:
> With information about the complete translation unit in hand, I don't
> think bottom-up has to be quadratic.  Consider the following
> algorithm.
> 1) Construct a complete call graph for the translation unit.
 ...

> The key property of this algorithm is that each function is processed
> for inlining exactly once, and we never have to inline more than one
> level, because all call sites in an inlinee have already been either
> inlined or marked do-not-inline.  Thus, no quadratic term.  IIRC
> topological sort is O(n log n), which we can live with.

This is similar to what we do in LLVM, except that we don't build an
explicit call graph.  If you're interested, here's the code for the
heuristic:
http://llvm.cs.uiuc.edu/doxygen/FunctionInlining_8cpp-source.html

Basically we do a bottom-up inlining based on what amounts to the call
graph (we just don't explicitly create one).  Our heuristic currently is
set to only inline when it will shrink the program, thus the cut-off is
set to be very conservative.  Besides that though, you may be interested
in why we try harder to inline some functions than others.

The only major problem that I have with the LLVM implementation right now
is that it doesn't cache information about the size of the function it is
considering inlining (it recalculates it every time it considers the
function).  Aside from this though (and the fact that it happens to use an
std::map instead of a hash_map), the algorithm is linear in the number of
function inlines it performs.
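The caching gap Chris mentions closes with a small memo table. This is a
hypothetical sketch (the SizeCache name and interface are invented for
illustration, not taken from LLVM): measured sizes are cached per function
and invalidated only when inlining changes that function's body.

```cpp
#include <cassert>
#include <map>

// Memoize each function's measured size; re-measure only after the
// cached entry has been explicitly invalidated.
struct SizeCache {
    std::map<const void*, int> sizes;   // function -> cached node count

    template <typename MeasureFn>
    int get(const void* fn, MeasureFn measure) {
        auto it = sizes.find(fn);
        if (it != sizes.end())
            return it->second;          // cached: no re-measurement
        int s = measure(fn);
        sizes[fn] = s;
        return s;
    }

    // Call after inlining into `fn`, since its size has changed.
    void invalidate(const void* fn) { sizes.erase(fn); }
};
```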

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 20:11 ` Mark Mitchell
  2003-06-18 20:49   ` Jan Hubicka
@ 2003-06-18 20:52   ` Zack Weinberg
  2003-06-18 21:26     ` Mark Mitchell
  1 sibling, 1 reply; 71+ messages in thread
From: Zack Weinberg @ 2003-06-18 20:52 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Benjamin Kosnik, gcc, guenth, jh

Mark Mitchell <mark@codesourcery.com> writes:

> For 3.4, we could consider going back to the "bottom-up" inlining
> strategy.  That might be better than what we have now, even though it's
> inherently quadratic.  Implementing bottom-up inlining wouldn't be
> terribly hard; all the same tree-inlining machinery would work.

With information about the complete translation unit in hand, I don't
think bottom-up has to be quadratic.  Consider the following
algorithm.

1) Construct a complete call graph for the translation unit.
2) Mark all calls to functions not visible in the call graph
   not-to-be-inlined.
3) Topologically sort the call graph.  When cycles are detected,
   pick an edge in the cycle and mark that call site not-to-be-inlined.
   (Need a good heuristic for that.)  The sort goes bottom-up: if A
   calls B, B sorts before A.

4) For each function in the order produced by the topological sort:
   a) For each call site in the function which is not already marked
      not-to-be-inlined, decide heuristically whether to inline at
      that call site, and either perform the inlining or mark the call
      not-to-be-inlined.
   b) If possible, perform basic optimizations on the function -
      constant propagation and unreachable code elimination would
      probably be the biggest wins.

5) Discard function bodies which are now unreferenced and not
   externally visible.

The key property of this algorithm is that each function is processed
for inlining exactly once, and we never have to inline more than one
level, because all call sites in an inlinee have already been either
inlined or marked do-not-inline.  Thus, no quadratic term.  IIRC
topological sort is O(n log n), which we can live with.
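Steps 1-3 of the algorithm above can be sketched as a depth-first
post-order walk (an editor's illustration with invented names, not GCC
code): callees are emitted before callers, and whenever a cycle is found,
the back edge is marked do-not-inline.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy call-graph node: just the list of callees, in source order.
struct Function {
    std::vector<std::string> calls;
};

// Bottom-up topological order over the call graph.  Edges to functions
// not visible in the graph are skipped (step 2); each cycle gets one
// caller->callee edge recorded in `noInline` (step 3).
std::vector<std::string> bottomUpOrder(
        std::map<std::string, Function>& fns,
        std::set<std::pair<std::string, std::string>>& noInline) {
    std::vector<std::string> order;
    std::set<std::string> done, onStack;
    std::function<void(const std::string&)> dfs = [&](const std::string& f) {
        if (done.count(f)) return;
        onStack.insert(f);
        for (const std::string& callee : fns[f].calls) {
            if (!fns.count(callee)) continue;   // not visible: never inlined
            if (onStack.count(callee)) {        // cycle: break this edge
                noInline.insert({f, callee});
                continue;
            }
            dfs(callee);
        }
        onStack.erase(f);
        done.insert(f);
        order.push_back(f);                     // all callees already emitted
    };
    for (auto& kv : fns) dfs(kv.first);
    return order;
}
```

Step 4 would then walk `order` front to back, inlining at each call site
not marked in `noInline`, so no function body is ever processed twice.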

zw

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 20:11 ` Mark Mitchell
@ 2003-06-18 20:49   ` Jan Hubicka
  2003-06-18 20:52   ` Zack Weinberg
  1 sibling, 0 replies; 71+ messages in thread
From: Jan Hubicka @ 2003-06-18 20:49 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Benjamin Kosnik, gcc, guenth, jh

> > It would be nice if some of the inlining issues got sorted out for 3.4,
> > and -Winline became deterministic again.
> 
> -Winline has never been deterministic.  It just seemed that way.
> 
> Since 3.0, it's just been broken.  It just didn't warn you about a lot
> of the cases where it didn't inline.
> 
> Before that, it was still non-deterministic, in the sense that the RTL
> inliner had throttles that would keep it from inlining as much as you
> might have liked.
> 
> So, the problem has gotten worse, but it's not a new problem.
> 
> For 3.4, we could consider going back to the "bottom-up" inlining
> strategy.  That might be better than what we have now, even though it's
> inherently quadratic.  Implementing bottom-up inlining wouldn't be
> terribly hard; all the same tree-inlining machinery would work.
> 
> One of the things we seem to forget in all the inlining discussion is
> that inlining has never worked well.  In fact, one of the big
> motivations in going to function-at-a-time was to try to fix all the
> lameness in the RTL inliner!  On many large C++ programs, the 2.95 era
> compilers would simply exhaust all memory trying to do inlining...
> 
> I'm pretty convinced that there's no easy fix, unfortunately.

With the cgraph code, I am considering relatively simple heuristics taking
extra arguments for the maximal growth of the compilation unit, the maximal
growth of a big function, and the big function threshold.

I think we can simply order the cgraph in reverse postorder (so decisions
start from the leaves of the callgraph "tree") and decide to inline each
small function (or function called once) while computing information about
the growth of the destination function body and the overall growth.
The thresholds should prevent us from running into degenerate cases (which
are generally either that we construct extravagantly large function
bodies by deep inlining or by inlining too many times, or that we enlarge
the whole program too much).  Perhaps the maximal growth of the compilation
unit is redundant.

Then we can perform top-down inlining once we made bottom-up decisions.

I tend to believe that this should fix problems we are seeing (modulo
the determinism mentioned here, tsk :) This is currently my main
motivation to get cgraph code used :)
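The thresholds Honza describes can be sketched as a single predicate. All
names and numbers here are invented for illustration (they are not GCC's
actual --param values): a call site is inlinable only while neither the
destination function nor the whole unit would exceed its growth budget.

```cpp
#include <cassert>

// Illustrative growth budgets for the bottom-up decision pass.
struct Limits {
    int bigFunctionSize;        // "big function" threshold
    int maxUnitGrowthPercent;   // allowed whole-unit growth
};

// Decide whether inlining a callee of `calleeSize` into a caller of
// `callerSize` keeps both the function and the unit within budget.
bool shouldInline(int calleeSize, int callerSize,
                  long unitSize, long originalUnitSize,
                  const Limits& lim) {
    if (callerSize + calleeSize > lim.bigFunctionSize)
        return false;           // would create an overly large body
    long grownPercent = 100 * (unitSize + calleeSize) / originalUnitSize;
    return grownPercent <= 100 + lim.maxUnitGrowthPercent;
}
```

The unit-growth check is the one Honza suspects may be redundant; dropping
it reduces the predicate to the big-function test alone.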

Honza
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> mark@codesourcery.com
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 10:38 ` Joseph S. Myers
@ 2003-06-18 20:36   ` Dara Hazeghi
  0 siblings, 0 replies; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-18 20:36 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: gcc


--- "Joseph S. Myers" <jsm28@cam.ac.uk> wrote:
> On Tue, 17 Jun 2003, Dara Hazeghi wrote:
> 
> > gcc version     -O0     -O1     -O2     -O3
> 
> Could you also do tests with -fsyntax-only (the
> difference between
> -fsyntax-only and -O0 being all the code generation
> work outside the front
> end that still needs to be done at -O0)?

Good idea. Unfortunately I'm not sure how
to do this. With gcc 2.7.2.3-2.91.66, I can do it
easily. But with 2.95.X, -fsyntax-only is badly
broken, and with 3.X, no object files are created,
seriously confusing make. Any suggestions? I guess I
could change around the makefiles, but that'll take a
good bit of work... Thanks,

Dara


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 16:34 Benjamin Kosnik
@ 2003-06-18 20:11 ` Mark Mitchell
  2003-06-18 20:49   ` Jan Hubicka
  2003-06-18 20:52   ` Zack Weinberg
  0 siblings, 2 replies; 71+ messages in thread
From: Mark Mitchell @ 2003-06-18 20:11 UTC (permalink / raw)
  To: Benjamin Kosnik; +Cc: gcc, guenth, jh

> It would be nice if some of the inlining issues got sorted out for 3.4,
> and -Winline became deterministic again.

-Winline has never been deterministic.  It just seemed that way.

Since 3.0, it's just been broken.  It just didn't warn you about a lot
of the cases where it didn't inline.

Before that, it was still non-deterministic, in the sense that the RTL
inliner had throttles that would keep it from inlining as much as you
might have liked.

So, the problem has gotten worse, but it's not a new problem.

For 3.4, we could consider going back to the "bottom-up" inlining
strategy.  That might be better than what we have now, even though it's
inherently quadratic.  Implementing bottom-up inlining wouldn't be
terribly hard; all the same tree-inlining machinery would work.

One of the things we seem to forget in all the inlining discussion is
that inlining has never worked well.  In fact, one of the big
motivations in going to function-at-a-time was to try to fix all the
lameness in the RTL inliner!  On many large C++ programs, the 2.95 era
compilers would simply exhaust all memory trying to do inlining...

I'm pretty convinced that there's no easy fix, unfortunately.

-- 
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 19:30         ` Wolfgang Bangerth
@ 2003-06-18 19:31           ` Chris Lattner
  0 siblings, 0 replies; 71+ messages in thread
From: Chris Lattner @ 2003-06-18 19:31 UTC (permalink / raw)
  To: Wolfgang Bangerth; +Cc: Jan Hubicka, Richard Guenther, gcc

On Wed, 18 Jun 2003, Wolfgang Bangerth wrote:

> > Here is an idiom widely used in the LLVM code base, for example:
> > http://llvm.cs.uiuc.edu/doxygen/GCSE_8cpp-source.html
>
> Is this guaranteed to work? I mean, how can the address of a function with
> static linkage be used in a virtual function table? (I think it should work,
> since the vt can only be set up in this very file, but still...)

It is guaranteed to work, and the vtable itself should be marked static as
well.  The idea is that there is a public accessor function:

// createGCSEPass - The public interface to this file...
Pass *createGCSEPass() { return new GCSE(); }

Which indirectly exposes the pass to the outside world.
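For readers outside the thread, here is a minimal self-contained sketch of the idiom being described (the GCSE/createGCSEPass names follow the linked LLVM source; the run() and EliminateCSE bodies are invented stand-ins):

```cpp
#include <cassert>

// Public base class: each pass exposes a single virtual entry point.
struct Pass {
    virtual ~Pass() {}
    virtual int run() = 0;
};

namespace {  // anonymous namespace: GCSE has internal linkage
struct GCSE : public Pass {
    int run() { return EliminateCSE(); }
private:
    // Helper that no other translation unit can name or call directly;
    // ideally the compiler would emit it as a local (static) symbol.
    int EliminateCSE() { return 42; }  // stand-in for the real work
};
}  // end anonymous namespace

// createGCSEPass - the only externally visible symbol in this file.
Pass *createGCSEPass() { return new GCSE(); }
```

Callers see only the factory function and the Pass interface, never the concrete class.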

> > Besides the fact that my project would benefit a lot from this, if there
> > is no reason not to mark the methods static, we should.
>
> If the class itself is declared in an anonymous namespace, I agree.
> Not that I'm opposed to your proposal, it's just that I don't see a whole lot
> of code out there that uses this idiom. I wouldn't want to trade this feature
> against another 10% slowdown in the compiler ;-)

Me neither!  This _shouldn't_ slow down the compiler at all, in fact I
expect it to help the linker and dynamic linkers a bit.  You're probably
right that not a whole lot of code uses anonymous namespaces, but I think
more and more of it is.  I don't think we should "mangle" C static functions
into "unique" names instead of marking them static, which is basically
the situation we are in with anon-namespaces now...

Perhaps I should file a (feature request) bug report?

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 19:28       ` Chris Lattner
@ 2003-06-18 19:30         ` Wolfgang Bangerth
  2003-06-18 19:31           ` Chris Lattner
  0 siblings, 1 reply; 71+ messages in thread
From: Wolfgang Bangerth @ 2003-06-18 19:30 UTC (permalink / raw)
  To: Chris Lattner; +Cc: Jan Hubicka, Richard Guenther, gcc


> Here is an idiom widely used in the LLVM code base, for example:
> http://llvm.cs.uiuc.edu/doxygen/GCSE_8cpp-source.html

Is this guaranteed to work? I mean, how can the address of a function with 
static linkage be used in a virtual function table? (I think it should work, 
since the vt can only be set up in this very file, but still...)


> Besides the fact that my project would benefit a lot from this, if there
> is no reason not to mark the methods static, we should.

If the class itself is declared in an anonymous namespace, I agree.

Not that I'm opposed to your proposal, it's just that I don't see a whole lot 
of code out there that uses this idiom. I wouldn't want to trade this feature 
against another 10% slowdown in the compiler ;-)

W.

-------------------------------------------------------------------------
Wolfgang Bangerth              email:            bangerth@ices.utexas.edu
                               www: http://www.ices.utexas.edu/~bangerth/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 18:57     ` Wolfgang Bangerth
@ 2003-06-18 19:28       ` Chris Lattner
  2003-06-18 19:30         ` Wolfgang Bangerth
  0 siblings, 1 reply; 71+ messages in thread
From: Chris Lattner @ 2003-06-18 19:28 UTC (permalink / raw)
  To: Wolfgang Bangerth; +Cc: Jan Hubicka, Richard Guenther, gcc

On Wed, 18 Jun 2003, Wolfgang Bangerth wrote:
> Yes, yes, I understand that :-) It's just an observation that not a whole lot
> of code is being placed into anonymous namespaces, whether it be functions
> or whole classes.

Here is an idiom widely used in the LLVM code base, for example:
http://llvm.cs.uiuc.edu/doxygen/GCSE_8cpp-source.html

Most passes in the compiler are implemented as a subclass of a "Pass"
class, which basically exposes a "run" virtual method.  Subclasses (often
living in anonymous namespaces) override this virtual function, then have
a number of private non-virtual methods (like EliminateCSE) in the example
above.  No code outside of the class should ever be able to call
EliminateCSE, yet GCC makes the symbol public, which is bad for
the optimizer, and probably slows down the linker as well.

Alternatively, we could implement our passes as basically empty classes,
with a bunch of static methods.  This works, but means that state has to
be passed around through arguments, which isn't as nice.  :)  An example:
http://llvm.cs.uiuc.edu/doxygen/DeadArgumentElimination_8cpp-source.html

Besides the fact that my project would benefit a lot from this, if there
is no reason not to mark the methods static, we should.  I would not be
surprised if this also comes up for important free software applications
like KDE and other large C++ apps.  The Boost library probably uses
them as well...

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 18:48   ` Chris Lattner
@ 2003-06-18 18:57     ` Wolfgang Bangerth
  2003-06-18 19:28       ` Chris Lattner
  0 siblings, 1 reply; 71+ messages in thread
From: Wolfgang Bangerth @ 2003-06-18 18:57 UTC (permalink / raw)
  To: Chris Lattner; +Cc: Jan Hubicka, Richard Guenther, gcc


> > > How much of this is because anonymous namespaces are not marking their
> > > contents as static?
> >
> > Even if they did, I don't think it would be a big thing. Most functions
> > in C++ should be member functions, and then they can't be static.
>
> That's not true.  If defined in an anonymous namespace, the member
> > functions are not directly accessible to other translation units, thus the
> symbols would be marked private.  Anonymous namespaces effectively make
> them static, it's just that the optimizer is not currently able to take
> advantage of this piece of semantic information...

Yes, yes, I understand that :-) It's just an observation that not a whole lot 
of code is being placed into anonymous namespaces, whether it be functions
or whole classes.

As an analogy: where in C you would implement an internal interface to
some data structure as static functions, in C++ those would most likely
be private/protected member functions of a class; but if that class also
has a public interface (the usual case), then the class cannot be part
of an anonymous namespace.

Or do you mean you want to place the implementation of the member functions of a
publicly visible class into an anonymous namespace? That would strike me as a 
particularly bad and error-prone style...

W.

-------------------------------------------------------------------------
Wolfgang Bangerth              email:            bangerth@ices.utexas.edu
                               www: http://www.ices.utexas.edu/~bangerth/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 18:28 ` Wolfgang Bangerth
@ 2003-06-18 18:48   ` Chris Lattner
  2003-06-18 18:57     ` Wolfgang Bangerth
  0 siblings, 1 reply; 71+ messages in thread
From: Chris Lattner @ 2003-06-18 18:48 UTC (permalink / raw)
  To: Wolfgang Bangerth; +Cc: Jan Hubicka, Richard Guenther, gcc

On Wed, 18 Jun 2003, Wolfgang Bangerth wrote:
> > How much of this is because anonymous namespaces are not marking their
> > contents as static?
>
> Even if they did, I don't think it would be a big thing. Most functions in C++
> should be member functions, and then they can't be static.

That's not true.  If defined in an anonymous namespace, the member
functions are not directly accessible to other translation units, thus the
symbols would be marked private.  Anonymous namespaces effectively make
them static, it's just that the optimizer is not currently able to take
advantage of this piece of semantic information...

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 17:52 Chris Lattner
  2003-06-18 18:01 ` Jan Hubicka
@ 2003-06-18 18:28 ` Wolfgang Bangerth
  2003-06-18 18:48   ` Chris Lattner
  1 sibling, 1 reply; 71+ messages in thread
From: Wolfgang Bangerth @ 2003-06-18 18:28 UTC (permalink / raw)
  To: Chris Lattner, Jan Hubicka; +Cc: Richard Guenther, gcc


> How much of this is because anonymous namespaces are not marking their
> contents as static?

Even if they did, I don't think it would be a big thing. Most functions in C++ 
should be member functions, and then they can't be static.

W.

-------------------------------------------------------------------------
Wolfgang Bangerth              email:            bangerth@ices.utexas.edu
                               www: http://www.ices.utexas.edu/~bangerth/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 18:01 ` Jan Hubicka
@ 2003-06-18 18:08   ` Chris Lattner
  0 siblings, 0 replies; 71+ messages in thread
From: Chris Lattner @ 2003-06-18 18:08 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Wolfgang Bangerth, Richard Guenther, gcc

On Wed, 18 Jun 2003, Jan Hubicka wrote:

> > How much of this is because anonymous namespaces are not marking their
> > contents as static?  In C++, static functions are deprecated...

> No idea.
> How does the implementation of anonymous namespaces work?  I didn't
> know these are actually not marked as static...

Last time I checked, mainline was just giving names in an anonymous
namespace horrible "unique" names, but not marking them static.  There was
some discussion of possible interaction with the export feature of C++,
but GCC does not support that feature and if it ever does, it will be a
long time off.

From what I can tell, there should be no problem marking members of an
anonymous namespace as static in current GCC...

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 17:52 Chris Lattner
@ 2003-06-18 18:01 ` Jan Hubicka
  2003-06-18 18:08   ` Chris Lattner
  2003-06-18 18:28 ` Wolfgang Bangerth
  1 sibling, 1 reply; 71+ messages in thread
From: Jan Hubicka @ 2003-06-18 18:01 UTC (permalink / raw)
  To: Chris Lattner; +Cc: Jan Hubicka, Wolfgang Bangerth, Richard Guenther, gcc

> 
> Jan Hubicka wrote:
> > Actually there are so few static functions in C++ that unit-at-a-time as
> > implemented currently has almost no effect.
> 
> How much of this is because anonymous namespaces are not marking their
> contents as static?  In C++, static functions are deprecated...
No idea.
How does the implementation of anonymous namespaces work?  I didn't
know these are actually not marked as static...

Honza
> 
> -Chris
> 
> -- 
> http://llvm.cs.uiuc.edu/
> http://www.nondot.org/~sabre/Projects/
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
@ 2003-06-18 17:52 Chris Lattner
  2003-06-18 18:01 ` Jan Hubicka
  2003-06-18 18:28 ` Wolfgang Bangerth
  0 siblings, 2 replies; 71+ messages in thread
From: Chris Lattner @ 2003-06-18 17:52 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Wolfgang Bangerth, Richard Guenther, gcc


Jan Hubicka wrote:
> Actually there are so few static functions in C++ that unit-at-a-time as
> implemented currently has almost no effect.

How much of this is because anonymous namespaces are not marking their
contents as static?  In C++, static functions are deprecated...

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 16:12 Wolfgang Bangerth
@ 2003-06-18 17:48 ` Jan Hubicka
  0 siblings, 0 replies; 71+ messages in thread
From: Jan Hubicka @ 2003-06-18 17:48 UTC (permalink / raw)
  To: Wolfgang Bangerth; +Cc: Richard Guenther, Jan Hubicka, gcc

> 
> > Note that I consider not having unit-at-a-time for C++ for 3.4 as a
> > showstopper as it is possibly the only way to get sane inlining and
> > such sane performance out of scientific C++ codes like POOMA.
> 
> I don't think that this is a particularly true statement. C++ codes have many 
> member functions that can be called from everywhere, and few file-static 
> functions. So being able to inline static functions that are only called once 
> is not a very important feature. The general ability to inline small accessor 
> and template functions is much more important, but that is independent of 
> unit-at-a-time.
Actually there are so few static functions in C++ that unit-at-a-time as
implemented currently has almost no effect.  On the other hand, if we
reorganize the inlining heuristics to take into account the code growth
that inlining function X causes, I think we will get an easy win for the
inlining problems in C++...  This is also difficult to do without a
callgraph.

Honza
> 
> W.
> 
> -------------------------------------------------------------------------
> Wolfgang Bangerth              email:            bangerth@ices.utexas.edu
>                                www: http://www.ices.utexas.edu/~bangerth/
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 15:54 ` Mark Mitchell
@ 2003-06-18 17:42   ` Jan Hubicka
  0 siblings, 0 replies; 71+ messages in thread
From: Jan Hubicka @ 2003-06-18 17:42 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Richard Guenther, Jan Hubicka, gcc

> 
> > Note that I consider not having unit-at-a-time for C++ for 3.4 as a
> > showstopper as it is possibly the only way to get sane inlining and
> > such sane performance out of scientific C++ codes like POOMA.
> 
> Sorry -- we'll not hold 3.4 for this functionality.  It is likely to
> introduce new issues, and if it's not ready we'll hold off.
> 
> I agree that it's very desirable, but we've never had it until now, and
> we've always had a compiler people wanted to use for many projects.

I still hope to have something in the 3.4 timeframe, but any help is
appreciated :)
Today I started again from my original patch and I've cleaned up the way
functions are deferred, so now I defer all functions to the end of the
file, where I finalize them; new functions are finalized immediately, so
at least I avoid duplication between cgraphunit's and decl2.c's ideas of
deferring functions.

Still, I would like to understand when a given function can be
finalized - if I do that and extend the cgraphunit code to deal with
non-unit-at-a-time mode as well, I can get non-unit-at-a-time mode to
expand a function as soon as it is clear that it is needed and ready,
saving some memory relative to the current scheme that always defers to
the end of compilation.

Honza
> 
> Thanks,
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> mark@codesourcery.com
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
@ 2003-06-18 16:34 Benjamin Kosnik
  2003-06-18 20:11 ` Mark Mitchell
  0 siblings, 1 reply; 71+ messages in thread
From: Benjamin Kosnik @ 2003-06-18 16:34 UTC (permalink / raw)
  To: gcc; +Cc: guenth, mark, jh

> Note that I consider not having unit-at-a-time for C++ for 3.4 as a
> showstopper as it is possibly the only way to get sane inlining and such
> sane performance out of scientific C++ codes like POOMA.

It's not just scientific C++ code. It's pretty much all C++ code.

It would be nice if some of the inlining issues got sorted out for 3.4,
and -Winline became deterministic again. I'm not working on this, so
all I can say is, "would be nice." Any efforts on this front would be
appreciated.

If I had extra gnudits, I'd spend them on this.

;)

-benjamin

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
@ 2003-06-18 16:12 Wolfgang Bangerth
  2003-06-18 17:48 ` Jan Hubicka
  0 siblings, 1 reply; 71+ messages in thread
From: Wolfgang Bangerth @ 2003-06-18 16:12 UTC (permalink / raw)
  To: Richard Guenther, Jan Hubicka, gcc


> Note that I consider not having unit-at-a-time for C++ for 3.4 as a
> showstopper as it is possibly the only way to get sane inlining and
> such sane performance out of scientific C++ codes like POOMA.

I don't think that this is a particularly true statement. C++ codes have many 
member functions that can be called from everywhere, and few file-static 
functions. So being able to inline static functions that are only called once 
is not a very important feature. The general ability to inline small accessor 
and template functions is much more important, but that is independent of 
unit-at-a-time.

W.

-------------------------------------------------------------------------
Wolfgang Bangerth              email:            bangerth@ices.utexas.edu
                               www: http://www.ices.utexas.edu/~bangerth/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
       [not found] <Pine.LNX.4.44.0306181249160.6712-100000@bellatrix.tat.physik.uni-tuebingen.de>
@ 2003-06-18 15:54 ` Mark Mitchell
  2003-06-18 17:42   ` Jan Hubicka
  0 siblings, 1 reply; 71+ messages in thread
From: Mark Mitchell @ 2003-06-18 15:54 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jan Hubicka, gcc


> Note that I consider not having unit-at-a-time for C++ for 3.4 as a
> showstopper as it is possibly the only way to get sane inlining and
> such sane performance out of scientific C++ codes like POOMA.

Sorry -- we'll not hold 3.4 for this functionality.  It is likely to
introduce new issues, and if it's not ready we'll hold off.

I agree that it's very desirable, but we've never had it until now, and
we've always had a compiler people wanted to use for many projects.

Thanks,

-- 
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18  7:43 ` Dara Hazeghi
  2003-06-18  8:41   ` Steven Bosscher
@ 2003-06-18 14:00   ` Scott Robert Ladd
  1 sibling, 0 replies; 71+ messages in thread
From: Scott Robert Ladd @ 2003-06-18 14:00 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: gcc mailing list

Dara Hazeghi wrote:
 > Okay, here's the updated table. Checking is disabled
 > on all compilers.
 >
 > gcc version     -O0     -O1     -O2     -O3
[snip]

Could you run tests for debug compiles (i.e., -g, -pg)? We could then 
compare numbers directly and fairly against your numbers for optimized 
compiles.

If you haven't the time, I'm running my own benchmarks for a forthcoming 
paper that compares gcc and icc (on many levels). I'll post those 
numbers once I've completed my tests.

I rarely use optimizations until I'm in the final phase of a project. My 
development compiles often include options for debugging or profiling; 
when I compile with optimization, my goal is to generate the fastest 
possible executable image, regardless of compile time (unless it is 
egregious, i.e., on an order of days). In terms of the overall lifetime 
of a program, compile time is less significant than run time.

Another note: Intel's compiler both compiles faster and produces faster
code (in most cases). I suspect some of gcc's
compile time can be attributed to its multi-platform approach; Intel, by 
virtue of being platform specific, can focus its algorithms on one target.

Finally, it's important to consider correctness and comprehensiveness; 
if a faster compiler does not support current standards, it is generally 
less valuable to me than a slower compiler with full compliance. This 
observation is, perhaps, less critical for those who work only with C89 
and earlier standards, but is very important to those of us interested 
in C99 and C++. Recent gcc compilers may be slower, but they provide 
support for current standards.

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Professional programming for science and engineering;
Interesting and unusual bits of very free code.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
@ 2003-06-18 13:08 Richard Guenther
  0 siblings, 0 replies; 71+ messages in thread
From: Richard Guenther @ 2003-06-18 13:08 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Mark Mitchell, gcc

>> > Yes -- you recently mentioned that Mark had some ideas about C++
>> > unit-at-a-time. Could you tell a bit more about that?
>>
>> Mark does not like my current approach. C++ frontend currently contains
>> a loop that iterates over all known functions, virtual tables and
>> static data and outputs one only when a reference to it has already
>> been assembled. (the template instantiations and virtual tables are
>> created lazily when needed)
>>
>> I modified it to walk the function bodies to discover what static
>> initializers and templates are needed so I don't need to actually
>> assemble to see what is needed, but Mark would prefer an approach that
>> expands everything available and then removes dead objects.
>>
>> That means that I need to extend unit-at-a-time first to deal with data
>> structures as well (so one can cancel a data structure from being output
>> when it is already expanded) and doing so also brings a memory
>> explosion.  Mark thinks we should reduce the memory overhead of the
>> frontend first; this is a bit more involved change than I feel able to
>> do in the C++ frontend in the 3.4 horizon.
>
>Note that we also discussed an "in the middle" solution that actually
>expands all functions but delays expansion of the virtual tables and
>examines which are needed.  I am trying to implement it but at the moment
>it looks even uglier than the original and it slows libstdc++
>compilation down noticeably (12%), apparently due to extra expansion and
>memory overhead; I will try to cut this down somewhat.
>
>The expansion loop in C++ as currently written is really twisted and all
>my approaches to get unit-at-a-time in make it even worse :(((

Note that I consider not having unit-at-a-time for C++ for 3.4 as a
showstopper as it is possibly the only way to get sane inlining and
such sane performance out of scientific C++ codes like POOMA.

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 12:38             ` Jan Hubicka
@ 2003-06-18 12:51               ` Jan Hubicka
  0 siblings, 0 replies; 71+ messages in thread
From: Jan Hubicka @ 2003-06-18 12:51 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Steven Bosscher, Dara Hazeghi, Andrew Pinski, gcc

> > Yes -- you recently mentioned that Mark had some ideas about C++ 
> > unit-at-a-time. Could you tell a bit more about that?
> 
> Mark does not like my current approach. C++ frontend currently contains
> a loop that iterates over all known functions, virtual tables and
> static data and outputs one only when a reference to it has already
> been assembled. (the template instantiations and virtual tables are
> created lazily when needed)
> 
> I modified it to walk the function bodies to discover what static
> initializers and templates are needed so I don't need to actually
> assemble to see what is needed, but Mark would prefer an approach that
> expands everything available and then removes dead objects.
> 
> That means that I need to extend unit-at-a-time first to deal with data
> structures as well (so one can cancel a data structure from being output
> when it is already expanded) and doing so also brings a memory explosion.
> Mark thinks we should reduce the memory overhead of the frontend first;
> this is a bit more involved change than I feel able to do in the C++
> frontend in the 3.4 horizon.

Note that we also discussed an "in the middle" solution that actually
expands all functions but delays expansion of the virtual tables and
examines which are needed.  I am trying to implement it but at the moment
it looks even uglier than the original and it slows libstdc++
compilation down noticeably (12%), apparently due to extra expansion and
memory overhead; I will try to cut this down somewhat.

The expansion loop in C++ as currently written is really twisted and all
my approaches to get unit-at-a-time in make it even worse :(((

Honza

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 10:55           ` Steven Bosscher
@ 2003-06-18 12:38             ` Jan Hubicka
  2003-06-18 12:51               ` Jan Hubicka
  0 siblings, 1 reply; 71+ messages in thread
From: Jan Hubicka @ 2003-06-18 12:38 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Jan Hubicka, Dara Hazeghi, Andrew Pinski, gcc

> 
> Maybe there should be an expand_function_called_once_inline() function 
> that does not duplicate the whole function body for inlining.  If you 
> already know that you're only going to inline that function and not 
> output it, then this should help a lot for memory consumption.  Of 
> course that doesn't help the backend...

Yes, it would be possible to avoid the duplication, but I am not sure
how much that will help - garbage-collecting the trees does not seem
to take that much time even for insn-recog.c
> 
> With unit-at-a-time, do we start inlining from the callgraph leaves or 
> from the root?  If we do the former, then you could simply stop inlining

We still do the same inlining scheme as without unit-at-a-time (so from
the root).  I intended to change this once all frontends doing tree
inlining use unit-at-a-time.  It probably makes more sense to first
estimate the sizes of the resulting functions, decide precisely what to
inline, and move away from the current scheme of attributes deciding on
inlining, replacing them with flags on cgraph edges.

> if a function grows too large.  Another thing you could try is to first 
> compute what the total size of all the functions would be after inlining 
> everything (all the information you need is available) and make inline 
> decisions based on that.
> 
> >Unfortunate thing on this path is that I will probably not be able to do
> >unit-at-a-time for C++ without some help and thus this work won't help
> >C++ where we get major problems.
> >
> 
> Yes -- you recently mentioned that Mark had some ideas about C++ 
> unit-at-a-time. Could you tell a bit more about that?

Mark does not like my current approach. C++ frontend currently contains
a loop that iterates over all known functions, virtual tables and
static data and outputs one only when a reference to it has already
been assembled. (the template instantiations and virtual tables are
created lazily when needed)

I modified it to walk the function bodies to discover what static
initializers and templates are needed so I don't need to actually
assemble to see what is needed, but Mark would prefer an approach that
expands everything available and then removes dead objects.

That means that I need to extend unit-at-a-time first to deal with data
structures as well (so one can cancel a data structure from being output
when it is already expanded) and doing so also brings a memory explosion.
Mark thinks we should reduce the memory overhead of the frontend first;
this is a bit more involved change than I feel able to do in the C++
frontend in the 3.4 horizon.
> 
> >Another solution would be to not give up so easily and try to push more
> >on keeping the compiler linear in the compilation.  For GCSE we probably
> >can give up when having too many memory references and simply not record
> >the others.  Similarly we can deal with quadratic bottlenecks in the
> >scheduler, but I am not sure whether we will run into problems in
> >register allocation, which is unavoidably quadratic by definition but so
> >far does not appear to cause major obstacles.
> >
> 
> Well if GCSE is O(N^3) then that is obviously a good place to start and 
> see what the effects are for the generated code.  But eventually, 
> keeping N small is the only thing that really helps if you have 

The N is actually the number of memory references optimized via GCSE.
One approach is to simply cap the references being optimized - we don't
need to GCSE all of them, so we can keep N small even with function
bodies being large.  The other approach is, of course, region-based
optimization, but that is a bit difficult...
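The "cap the references being optimized" idea can be sketched as follows (a hypothetical helper invented for illustration, not the actual gcse.c interface):

```cpp
#include <cstddef>
#include <vector>

// Record a memory reference for GCSE only while we are under a cap;
// references past the cap are simply left unoptimized, which keeps the
// O(N^3) dataflow work bounded at the cost of missed redundancies.
bool record_mem_ref(std::vector<int> &recorded, int ref, std::size_t cap) {
    if (recorded.size() >= cap)
        return false;  // over budget: skip this reference
    recorded.push_back(ref);
    return true;
}
```

The cap trades optimization quality for a hard bound on compile time, which is exactly the linearity argument being made above.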

Honza
> unavoidable quadratic or worse algorithms.  So that means, make sure 
> functions don't grow excessively large.  That helps reduce memory
> consumption, too.
> 
> Gr.
> Steven
> 
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18 10:07         ` Jan Hubicka
@ 2003-06-18 10:55           ` Steven Bosscher
  2003-06-18 12:38             ` Jan Hubicka
  0 siblings, 1 reply; 71+ messages in thread
From: Steven Bosscher @ 2003-06-18 10:55 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dara Hazeghi, Andrew Pinski, gcc

Jan Hubicka wrote:

>>>>The slowdown at -O3 is of course due to unit-at-a-time, which can be a 
>>>>big memory sucker if there are lots of opportunities to inline static 
>>>>
>>>I was doing some estimates there, and for compiling SPEC sources the
>>>slowdown is proportional to the code size growth caused by
>>>unit-at-a-time.  So the problem does not seem to be increased memory
>>>overhead, just the fact that we find more inlining opportunities and
>>>inline more. GCC is probably a good testcase for this as many parts are
>>>ordered top-down instead of bottom-up as the old inliner likes.
>>>
>>You can also use the test case for PR 10155 to see the RAM abuse (and PR 
>>11121 while you're at it :-)
>>    
>>
>I seem to recall what happens for GCC compilation - in my original
>unit-at-a-time benchmarking I did the same test and found almost all
>slowdown to come from insn-recog. That one is a large file divided into
>a few functions that all inline together (as they are called once) and we
>run into several quadratic bottlenecks.  Originally it took about 1 hour
>to compile on my K6 test machine.   I implemented
>-finline-functions-called-once for this reason but during the review
>process we concluded to remove it.  Perhaps this is a good reason to put
>it back.
>

Maybe there should be an expand_function_called_once_inline() function 
that does not duplicate the whole function body for inlining.  If you 
already know that you're only going to inline that function and not 
output it, then this should help a lot for memory consumption.  Of 
course that doesn't help the backend...

>I did eliminate the majority of these (there were reload-cse, local cse,
>and scheduler problems) with the single exception of GCSE, which has
>problems with doing transparent bitmaps for memories.  I didn't see a
>direct way to get out of that - you have too many memories and too many
>alias classes to care about.  The function analyses the transparency of
>all these memories even in the areas where they are not live - this is
>how the PRE algorithm works, and without SSAPRE there is probably not too
>much to invent.  The algorithm simply is N^3.
>
>One solution would be to limit the growth of the function I am inlining
>once-called functions into, so functions won't get too large and we
>won't exhibit this too often.  I've seen some of the unit-at-a-time
>testcases and they usually seem to exhibit similar behaviour.
>

I believe that this is the solution you should pursue for now.

>There are several other things we can do with unit-at-a-time - limit
>overall growth of compilation unit or decide whether to inline or not
>based on overall growth, not function size.
>

With unit-at-a-time, do we start inlining from the callgraph leaves or 
from the root?  If we do the former, then you could simply stop inlining 
if a function grows too large.  Another thing you could try is to first 
compute what the total size of all the functions would be after inlining 
everything (all the information you need is available) and make inline 
decisions based on that.

>The unfortunate thing on this path is that I will probably not be able
>to do unit-at-a-time for C++ without some help, and thus this work won't
>help C++, where we have major problems.
>

Yes -- you recently mentioned that Mark had some ideas about C++ 
unit-at-a-time. Could you tell us a bit more about that?

>The other solution would be not to give up so easily and to try to push
>harder on keeping the compiler linear in the compilation.  For GCSE we
>can probably give up when there are too many memory references and
>simply not record the others.  Similarly we can deal with quadratic
>bottlenecks in the scheduler, but I am not sure whether we will run into
>problems in register allocation, which is unavoidably quadratic by
>definition but so far does not appear to cause major obstacles.
>

Well, if GCSE is O(N^3) then that is obviously a good place to start, to 
see what the effects are on the generated code.  But eventually, 
keeping N small is the only thing that really helps if you have 
unavoidably quadratic or worse algorithms.  So that means making sure 
functions don't grow excessively large.  That helps reduce memory 
consumption, too.

Gr.
Steven


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18  2:31 Dara Hazeghi
@ 2003-06-18 10:38 ` Joseph S. Myers
  2003-06-18 20:36   ` Dara Hazeghi
  0 siblings, 1 reply; 71+ messages in thread
From: Joseph S. Myers @ 2003-06-18 10:38 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: gcc

On Tue, 17 Jun 2003, Dara Hazeghi wrote:

> gcc version     -O0     -O1     -O2     -O3

Could you also do tests with -fsyntax-only (the difference between
-fsyntax-only and -O0 being all the code generation work outside the front
end that still needs to be done at -O0)?

-- 
Joseph S. Myers
jsm28@cam.ac.uk

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18  9:15       ` Steven Bosscher
@ 2003-06-18 10:07         ` Jan Hubicka
  2003-06-18 10:55           ` Steven Bosscher
  2003-06-18 22:03         ` Dara Hazeghi
  1 sibling, 1 reply; 71+ messages in thread
From: Jan Hubicka @ 2003-06-18 10:07 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Jan Hubicka, Dara Hazeghi, Andrew Pinski, gcc

> Jan Hubicka wrote:
> 
> >>Dara Hazeghi wrote:
> >>
> >>   
> >>
> >>>Okay, here's the updated table. Checking is disabled
> >>>on all compilers.
> >>>
> >>>gcc version     -O0     -O1     -O2     -O3
> >>>2.7.2.3         128.04  131.02  163.51  176.01
> >>>2.8.1           128.86  140.79  182.35  194.63
> >>>2.90.29         130.60  140.57  186.29  199.32
> >>>2.91.66         132.44  148.48  203.71  219.21
> >>>2.95.3          143.38  180.97  250.94  276.85
> >>>3.0.4           169.79  210.73  320.24  365.15
> >>>3.2.3           193.48  269.43  424.74  519.85
> >>>3.3             184.15  282.57  442.64  529.93
> >>>3.3-branch      184.15  283.89  443.66  535.10
> >>>3.4-mainline    195.34  302.02  483.84  704.59
> >>>3.4-mainline*   175.84  269.78  426.77  627.13
> >>>3.5-tree-ssa    223.33  327.91  503.58  702.11 
> >>>
> >>>     
> >>>
> >>So with Andrew P's patch, mainline C is faster than 3.3, hurray!
> >>
> >>The slowdown at -O3 is of course due to unit-at-a-time, which can be a 
> >>big memory sucker if there are lots of opportunities to inline static 
> >>   
> >>
> >
> >I was doing some estimates there, and for compiling SPEC sources the
> >slowdown is proportional to the code size growth caused by
> >unit-at-a-time.  So the problem does not seem to be increased memory
> >overhead, just the fact that we find more inlining opportunities and
> >inline more. GCC is probably a good test case for this, as many parts
> >are ordered top-down instead of bottom-up as the old inliner likes.
> >
> 
> You can also use the test case for PR 10155 to see the RAM abuse (and PR 
> 11121 while you're at it :-)

I seem to recall what happens for GCC compilation - in my original
unit-at-a-time benchmarking I did the same test and found almost all
slowdown to come from insn-recog. That is a large file divided into a
few functions that all inline together (as they are called once), and we
run into several quadratic bottlenecks.  Originally it took about 1 hour
to compile on my K6 test machine.   I implemented
-finline-functions-called-once for this reason, but during the review
process we concluded to remove it.  Perhaps this is a good reason to put
it back.

I eliminated the majority of these (there were reload-cse, local cse,
and scheduler problems), with the single exception of GCSE, which has
problems computing transparency bitmaps for memories.  I didn't see a
direct way out of that - you have too many memories and too many alias
classes to care about.  The function analyzes transparency of all these
memories even in the regions where they are not live - this is how the
PRE algorithm works, and without SSAPRE there is probably not much to
invent here.  The algorithm simply is O(N^3).

One solution would be to limit the growth of the function I am inlining
into, so functions won't get too large and we won't exhibit this too
often.  I have seen some of the test cases at unit-at-a-time and they
usually seem to exhibit similar behaviour.

There are several other things we can do with unit-at-a-time - limit
the overall growth of the compilation unit, or decide whether to inline
based on overall growth, not function size.

The unfortunate thing on this path is that I will probably not be able
to do unit-at-a-time for C++ without some help, and thus this work won't
help C++, where we have major problems.

The other solution would be not to give up so easily and to try to push
harder on keeping the compiler linear in the compilation.  For GCSE we
can probably give up when there are too many memory references and
simply not record the others.  Similarly we can deal with quadratic
bottlenecks in the scheduler, but I am not sure whether we will run into
problems in register allocation, which is unavoidably quadratic by
definition but so far does not appear to cause major obstacles.

What do others think?

Honza
> 
> Gr.
> Steven
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18  9:14     ` Jan Hubicka
@ 2003-06-18  9:15       ` Steven Bosscher
  2003-06-18 10:07         ` Jan Hubicka
  2003-06-18 22:03         ` Dara Hazeghi
  0 siblings, 2 replies; 71+ messages in thread
From: Steven Bosscher @ 2003-06-18  9:15 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dara Hazeghi, Andrew Pinski, gcc

Jan Hubicka wrote:

>>Dara Hazeghi wrote:
>>
>>    
>>
>>>Okay, here's the updated table. Checking is disabled
>>>on all compilers.
>>>
>>>gcc version     -O0     -O1     -O2     -O3
>>>2.7.2.3         128.04  131.02  163.51  176.01
>>>2.8.1           128.86  140.79  182.35  194.63
>>>2.90.29         130.60  140.57  186.29  199.32
>>>2.91.66         132.44  148.48  203.71  219.21
>>>2.95.3          143.38  180.97  250.94  276.85
>>>3.0.4           169.79  210.73  320.24  365.15
>>>3.2.3           193.48  269.43  424.74  519.85
>>>3.3             184.15  282.57  442.64  529.93
>>>3.3-branch      184.15  283.89  443.66  535.10
>>>3.4-mainline    195.34  302.02  483.84  704.59
>>>3.4-mainline*   175.84  269.78  426.77  627.13
>>>3.5-tree-ssa    223.33  327.91  503.58  702.11 
>>>
>>>      
>>>
>>So with Andrew P's patch, mainline C is faster than 3.3, hurray!
>>
>>The slowdown at -O3 is of course due to unit-at-a-time, which can be a 
>>big memory sucker if there are lots of opportunities to inline static 
>>    
>>
>
>I was doing some estimates there, and for compiling SPEC sources the
>slowdown is proportional to the code size growth caused by
>unit-at-a-time.  So the problem does not seem to be increased memory
>overhead, just the fact that we find more inlining opportunities and
>inline more. GCC is probably a good test case for this, as many parts
>are ordered top-down instead of bottom-up as the old inliner likes.
>

You can also use the test case for PR 10155 to see the RAM abuse (and PR 
11121 while you're at it :-)

Gr.
Steven




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18  8:41   ` Steven Bosscher
@ 2003-06-18  9:14     ` Jan Hubicka
  2003-06-18  9:15       ` Steven Bosscher
  0 siblings, 1 reply; 71+ messages in thread
From: Jan Hubicka @ 2003-06-18  9:14 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Dara Hazeghi, Andrew Pinski, gcc

> Dara Hazeghi wrote:
> 
> >Okay, here's the updated table. Checking is disabled
> >on all compilers.
> >
> >gcc version     -O0     -O1     -O2     -O3
> >2.7.2.3         128.04  131.02  163.51  176.01
> >2.8.1           128.86  140.79  182.35  194.63
> >2.90.29         130.60  140.57  186.29  199.32
> >2.91.66         132.44  148.48  203.71  219.21
> >2.95.3          143.38  180.97  250.94  276.85
> >3.0.4           169.79  210.73  320.24  365.15
> >3.2.3           193.48  269.43  424.74  519.85
> >3.3             184.15  282.57  442.64  529.93
> >3.3-branch      184.15  283.89  443.66  535.10
> >3.4-mainline    195.34  302.02  483.84  704.59
> >3.4-mainline*   175.84  269.78  426.77  627.13
> >3.5-tree-ssa    223.33  327.91  503.58  702.11 
> >
> 
> So with Andrew P's patch, mainline C is faster than 3.3, hurray!
> 
> The slowdown at -O3 is of course due to unit-at-a-time, which can be a 
> big memory sucker if there are lots of opportunities to inline static 

I was doing some estimates there, and for compiling SPEC sources the
slowdown is proportional to the code size growth caused by
unit-at-a-time.  So the problem does not seem to be increased memory
overhead, just the fact that we find more inlining opportunities and
inline more. GCC is probably a good test case for this, as many parts
are ordered top-down instead of bottom-up as the old inliner likes.

It only indicates that we should rethink the unit-at-a-time inlining
heuristics.  I will try to come up with something.

> functions (there is a PR for that IIRC).  What would that number look 
> like with -fno-unit-at-a-time?
> 
> Do you also have numbers for binary sizes and runtime performance?  :-)
> 
> >*this includes Andrew Pinski's patch for PR10962
> >
> That patch looks like a very nice improvement.
Indeed :)

Honza
> 
> Gr.
> Steven
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18  7:43 ` Dara Hazeghi
@ 2003-06-18  8:41   ` Steven Bosscher
  2003-06-18  9:14     ` Jan Hubicka
  2003-06-18 14:00   ` Scott Robert Ladd
  1 sibling, 1 reply; 71+ messages in thread
From: Steven Bosscher @ 2003-06-18  8:41 UTC (permalink / raw)
  To: Dara Hazeghi; +Cc: Andrew Pinski, gcc

Dara Hazeghi wrote:

>Okay, here's the updated table. Checking is disabled
>on all compilers.
>
>gcc version     -O0     -O1     -O2     -O3
>2.7.2.3         128.04  131.02  163.51  176.01
>2.8.1           128.86  140.79  182.35  194.63
>2.90.29         130.60  140.57  186.29  199.32
>2.91.66         132.44  148.48  203.71  219.21
>2.95.3          143.38  180.97  250.94  276.85
>3.0.4           169.79  210.73  320.24  365.15
>3.2.3           193.48  269.43  424.74  519.85
>3.3             184.15  282.57  442.64  529.93
>3.3-branch      184.15  283.89  443.66  535.10
>3.4-mainline    195.34  302.02  483.84  704.59
>3.4-mainline*   175.84  269.78  426.77  627.13
>3.5-tree-ssa    223.33  327.91  503.58  702.11 
>

So with Andrew P's patch, mainline C is faster than 3.3, hurray!

The slowdown at -O3 is of course due to unit-at-a-time, which can be a 
big memory sucker if there are lots of opportunities to inline static 
functions (there is a PR for that IIRC).  What would that number look 
like with -fno-unit-at-a-time?

Do you also have numbers for binary sizes and runtime performance?  :-)

>*this includes Andrew Pinski's patch for PR10962
>
That patch looks like a very nice improvement.

Gr.
Steven


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
  2003-06-18  3:38 Andrew Pinski
@ 2003-06-18  7:43 ` Dara Hazeghi
  2003-06-18  8:41   ` Steven Bosscher
  2003-06-18 14:00   ` Scott Robert Ladd
  0 siblings, 2 replies; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-18  7:43 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc

Okay, here's the updated table. Checking is disabled
on all compilers.

gcc version     -O0     -O1     -O2     -O3
2.7.2.3         128.04  131.02  163.51  176.01
2.8.1           128.86  140.79  182.35  194.63
2.90.29         130.60  140.57  186.29  199.32
2.91.66         132.44  148.48  203.71  219.21
2.95.3          143.38  180.97  250.94  276.85
3.0.4           169.79  210.73  320.24  365.15
3.2.3           193.48  269.43  424.74  519.85
3.3             184.15  282.57  442.64  529.93
3.3-branch      184.15  283.89  443.66  535.10
3.4-mainline    195.34  302.02  483.84  704.59
3.4-mainline*   175.84  269.78  426.77  627.13
3.5-tree-ssa    223.33  327.91  503.58  702.11 

*this includes Andrew Pinski's patch for PR10962

icc* version    -O0     -O1*    -O2     -O3*
5.01*           158.47  ~       293.47  293.11
6.0             142.81  ~       227.25  227.88
7.1             153.95  ~       243.35  243.78

*icc is Intel's C++ Compiler for Linux (unsupported
noncommercial version)
*icc sets -O1 and -O2 to be the same
*icc claims -O2 and -O3 are different, but I'm not
sure how, as the compile times suggest otherwise
*icc 5.0 would not compile df.c, so df.c was compiled
with gcc for this test

Dara

__________________________________
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
       [not found] <3EEFA473.1020800@student.tudelft.nl>
@ 2003-06-18  4:36 ` Dara Hazeghi
  0 siblings, 0 replies; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-18  4:36 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc

Steve asked the obvious question offlist: with or
without checking. Well, it looks like in the process
of updating, I goofed. As a result, everything has
checking disabled, save for 3.4 mainline (ie the
tree-ssa numbers and the 3.3-branch numbers are good).
I'll have corrected results shortly...

Dara

> >
> >Hopefully this table gives some sense of where things
> >are at.
> >
> >gcc version     -O0     -O1     -O2     -O3
> >2.7.2.3         128.04  131.02  163.51  176.01
> >2.8.1           128.86  140.79  182.35  194.63
> >2.90.29         130.60  140.57  186.29  199.32
> >2.91.66         132.44  148.48  203.71  219.21
> >2.95.3          143.38  180.97  250.94  276.85
> >3.0.4           169.79  210.73  320.24  365.15
> >3.2.3           193.48  269.43  424.74  519.85
> >3.3             184.15  282.57  442.64  529.93
> >3.3-branch      184.15  283.89  443.66  535.10
> >3.4-mainline    203.58  326.35  514.49  783.59
> >3.5-tree-ssa    223.33  327.91  503.58  702.11
> >
> >icc* version    -O0     -O1*    -O2     -O3*
> >5.01*           158.47  ~       293.47  293.11
> >6.0             142.81  ~       227.25  227.88
> >7.1             153.95  ~       243.35  243.78
> >
> >*icc is Intel's C++ Compiler for Linux (unsupported
> >noncommercial version)
> >*icc sets -O1 and -O2 to be the same
> >*icc claims -O2 and -O3 are different, but I'm not
> >sure how, as the compile times suggest otherwise
> >*icc 5.0 would not compile df.c, so df.c was
> >compiled with gcc for this test
> >
> >Test conducted was compiling cc1 from gcc 3.2.3 on
> >i686-pc-linux-gnu with different versions of gcc. cvs
> >snapshots for 3.3-branch, 3.4-mainline and
> >3.5-tree-ssa were from 20030614.
> >
> >Cheers,
> >
> >Dara
> >
> 
> The usual question: configured how (ie. checking
> enabled/disabled, etc)?
> 
> 


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: C compile time
@ 2003-06-18  3:38 Andrew Pinski
  2003-06-18  7:43 ` Dara Hazeghi
  0 siblings, 1 reply; 71+ messages in thread
From: Andrew Pinski @ 2003-06-18  3:38 UTC (permalink / raw)
  To: dhazeghi, gcc; +Cc: Andrew Pinski

Dara,
For some reason (University of Cincinnati's network sucks), I cannot 
reply to the original message, and I also have to use the list 
archives :(.

Can you see how my patch in 10962 will help compile time on the 
mainline (mostly it will help at -O0, but it could help in other 
places too)?
I have several patches which I will be submitting (after I get my 
copyright forms submitted; I am still waiting for them to arrive). 
Most of them remove invariant loads in loops (Shikari [shameless 
plug] from the CHUD tools on Mac OS X is good for finding these), and 
some will cause sibling calls to happen in more places.

Also, did you compile the mainline and 3.5-tree-ssa with 
--disable-checking? From looking at the numbers, it looks like you 
did not.


Also, for the mainline (and maybe 3.5-tree-ssa), could you run with 
-ftime-report? Yes, I know this will report a lot of information, but 
you could use a spreadsheet to look at the data and maybe get an idea 
of where the problem is.

Thanks,
Andrew Pinski

> Hello,
>
> after Wolfgang's post about C++ compile times, my
> curiosity was piqued to check how C compile times have
> been going. I used compiling gcc 3.2.3's cc1 as the
> test.
>
> Hopefully this table gives some sense of where things
> are at.
>
> gcc version     -O0     -O1     -O2     -O3
> 2.7.2.3         128.04  131.02  163.51  176.01
> 2.8.1           128.86  140.79  182.35  194.63
> 2.90.29         130.60  140.57  186.29  199.32
> 2.91.66         132.44  148.48  203.71  219.21
> 2.95.3          143.38  180.97  250.94  276.85
> 3.0.4           169.79  210.73  320.24  365.15
> 3.2.3           193.48  269.43  424.74  519.85
> 3.3             184.15  282.57  442.64  529.93
> 3.3-branch      184.15  283.89  443.66  535.10
> 3.4-mainline    203.58  326.35  514.49  783.59
> 3.5-tree-ssa    223.33  327.91  503.58  702.11
>
> icc* version    -O0     -O1*    -O2     -O3*
> 5.01*           158.47  ~       293.47  293.11
> 6.0             142.81  ~       227.25  227.88
> 7.1             153.95  ~       243.35  243.78
>
> *icc is Intel's C++ Compiler for Linux (unsupported
> noncommercial version)
> *icc sets -O1 and -O2 to be the same
> *icc claims -O2 and -O3 are different, but I'm not
> sure how, as the compile times suggest otherwise
> *icc 5.0 would not compile df.c, so df.c was compiled
> with gcc for this test
>
> Test conducted was compiling cc1 from gcc 3.2.3 on
> i686-pc-linux-gnu with different versions of gcc. cvs
> snapshots for 3.3-branch, 3.4-mainline and
> 3.5-tree-ssa were from 20030614.
>
> Cheers,
>
> Dara

^ permalink raw reply	[flat|nested] 71+ messages in thread

* C compile time
@ 2003-06-18  2:31 Dara Hazeghi
  2003-06-18 10:38 ` Joseph S. Myers
  0 siblings, 1 reply; 71+ messages in thread
From: Dara Hazeghi @ 2003-06-18  2:31 UTC (permalink / raw)
  To: gcc

Hello,

after Wolfgang's post about C++ compile times, my
curiosity was piqued to check how C compile times have
been going. I used compiling gcc 3.2.3's cc1 as the
test.

Hopefully this table gives some sense of where things
are at.

gcc version     -O0     -O1     -O2     -O3
2.7.2.3         128.04  131.02  163.51  176.01
2.8.1           128.86  140.79  182.35  194.63
2.90.29         130.60  140.57  186.29  199.32
2.91.66         132.44  148.48  203.71  219.21
2.95.3          143.38  180.97  250.94  276.85
3.0.4           169.79  210.73  320.24  365.15
3.2.3           193.48  269.43  424.74  519.85
3.3             184.15  282.57  442.64  529.93
3.3-branch      184.15  283.89  443.66  535.10
3.4-mainline    203.58  326.35  514.49  783.59
3.5-tree-ssa    223.33  327.91  503.58  702.11

icc* version    -O0     -O1*    -O2     -O3*
5.01*           158.47  ~       293.47  293.11
6.0             142.81  ~       227.25  227.88
7.1             153.95  ~       243.35  243.78

*icc is Intel's C++ Compiler for Linux (unsupported
noncommercial version)
*icc sets -O1 and -O2 to be the same
*icc claims -O2 and -O3 are different, but I'm not
sure how, as the compile times suggest otherwise
*icc 5.0 would not compile df.c, so df.c was compiled
with gcc for this test

Test conducted was compiling cc1 from gcc 3.2.3 on
i686-pc-linux-gnu with different versions of gcc. cvs
snapshots for 3.3-branch, 3.4-mainline and
3.5-tree-ssa were from 20030614.

Cheers,

Dara


^ permalink raw reply	[flat|nested] 71+ messages in thread

end of thread, other threads:[~2003-07-04  5:39 UTC | newest]

Thread overview: 71+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-06-19 20:16 C compile time Dara Hazeghi
2003-06-19 20:16 ` Andrew Pinski
2003-06-19 20:22 ` Diego Novillo
2003-06-19 21:58   ` Dara Hazeghi
2003-06-19 21:58     ` Diego Novillo
2003-06-20 22:42       ` Dara Hazeghi
2003-06-21  0:34         ` Diego Novillo
2003-06-19 21:59     ` Jan Hubicka
2003-06-19 20:44 ` Jan Hubicka
2003-06-19 21:23   ` Dara Hazeghi
2003-06-19 21:23     ` Jan Hubicka
2003-06-19 21:26       ` Dara Hazeghi
2003-06-19 21:31         ` Jan Hubicka
2003-06-19 21:59           ` Jan Hubicka
2003-06-20  0:55             ` Dara Hazeghi
2003-06-19 22:10 ` Steven Bosscher
2003-06-19 22:30   ` Steven Bosscher
  -- strict thread matches above, loose matches on Subject: below --
2003-06-30 15:28 Robert Dewar
2003-06-30 14:25 Robert Dewar
2003-06-30 14:58 ` Daniel Berlin
2003-06-29 13:51 Robert Dewar
2003-06-30 13:50 ` Paul Koning
2003-06-19 14:58 Richard Guenther
2003-06-18 21:51 Chris Lattner
2003-06-18 21:18 Chris Lattner
2003-06-18 17:52 Chris Lattner
2003-06-18 18:01 ` Jan Hubicka
2003-06-18 18:08   ` Chris Lattner
2003-06-18 18:28 ` Wolfgang Bangerth
2003-06-18 18:48   ` Chris Lattner
2003-06-18 18:57     ` Wolfgang Bangerth
2003-06-18 19:28       ` Chris Lattner
2003-06-18 19:30         ` Wolfgang Bangerth
2003-06-18 19:31           ` Chris Lattner
2003-06-18 16:34 Benjamin Kosnik
2003-06-18 20:11 ` Mark Mitchell
2003-06-18 20:49   ` Jan Hubicka
2003-06-18 20:52   ` Zack Weinberg
2003-06-18 21:26     ` Mark Mitchell
2003-06-18 21:51       ` Zack Weinberg
2003-06-18 23:09         ` Mark Mitchell
2003-06-19 15:21       ` Jan Hubicka
2003-06-19 16:31         ` Mark Mitchell
2003-06-19 16:36           ` Jan Hubicka
2003-06-19 16:41             ` Mark Mitchell
2003-06-19 17:08               ` Jan Hubicka
2003-06-19 17:33               ` Jeff Sturm
2003-06-18 16:12 Wolfgang Bangerth
2003-06-18 17:48 ` Jan Hubicka
     [not found] <Pine.LNX.4.44.0306181249160.6712-100000@bellatrix.tat.physik.uni-tuebingen. de>
2003-06-18 15:54 ` Mark Mitchell
2003-06-18 17:42   ` Jan Hubicka
2003-06-18 13:08 Richard Guenther
     [not found] <3EEFA473.1020800@student.tudelft.nl>
2003-06-18  4:36 ` Dara Hazeghi
2003-06-18  3:38 Andrew Pinski
2003-06-18  7:43 ` Dara Hazeghi
2003-06-18  8:41   ` Steven Bosscher
2003-06-18  9:14     ` Jan Hubicka
2003-06-18  9:15       ` Steven Bosscher
2003-06-18 10:07         ` Jan Hubicka
2003-06-18 10:55           ` Steven Bosscher
2003-06-18 12:38             ` Jan Hubicka
2003-06-18 12:51               ` Jan Hubicka
2003-06-18 22:03         ` Dara Hazeghi
2003-06-20 20:36           ` Scott Robert Ladd
2003-06-21  0:31             ` Dara Hazeghi
2003-06-21 16:14             ` Michael S. Zick
2003-07-04  7:14           ` Ben Elliston
2003-06-18 14:00   ` Scott Robert Ladd
2003-06-18  2:31 Dara Hazeghi
2003-06-18 10:38 ` Joseph S. Myers
2003-06-18 20:36   ` Dara Hazeghi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).