public inbox for gcc@gcc.gnu.org
* if-conversion a performance bottleneck
@ 2000-05-01 10:54 Brad Lucier
  2000-05-01 15:29 ` Richard Henderson
  0 siblings, 1 reply; 32+ messages in thread
From: Brad Lucier @ 2000-05-01 10:54 UTC (permalink / raw)
  To: gcc; +Cc: lucier

The patch

http://gcc.gnu.org/ml/gcc-patches/2000-04/msg00248.html

killed one of the biggest time hogs in gcc, but now Richard
Henderson's if-conversion pass dominates compile time.
Perhaps this is a temporary effect of the calls to verify_flow_info;
I don't know.

On this file

http://www.math.purdue.edu/~lucier/_t-c-2.i.gz

called with these options

/export/u10/egcs-profile/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -fno-math-errno -mieee -g -mcpu=ev6 -O1 _t-c-2.i > & cc1.out &

with this compiler

popov-41% gcc -v
Reading specs from /export/u10/egcs-profile/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/specs
gcc version 2.96 20000430 (experimental)

I get the following performance on a 500 MHz alpha 21264:

 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20___t_2d_c_2d_2 {GC 47533k -> 10293k} {GC 15386k -> 10788k} {GC 14832k -> 11255k} ___init_proc {GC 24248k -> 2107k} ____20___t_2d_c_2d_2
Execution times (seconds)
 garbage collection    :   0.75 ( 0%) usr   0.00 ( 0%) sys   0.75 ( 0%) wall
 parser                :   6.55 ( 0%) usr   0.18 (15%) sys   6.73 ( 0%) wall
 varconst              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 integration           :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 jump                  :   6.81 ( 0%) usr   0.40 (32%) sys   7.21 ( 0%) wall
 CSE                   :   2.29 ( 0%) usr   0.00 ( 0%) sys   2.29 ( 0%) wall
 global CSE            :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 loop analysis         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 CSE 2                 :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 flow analysis         :  61.00 ( 2%) usr   0.07 ( 6%) sys  61.05 ( 2%) wall
 combiner              :   2.82 ( 0%) usr   0.00 ( 0%) sys   2.82 ( 0%) wall
 if-conversion         : 2306.07 (64%) usr   0.44 (35%) sys 2306.74 (64%) wall
 local alloc           :   1.31 ( 0%) usr   0.00 ( 0%) sys   1.31 ( 0%) wall
 global alloc          :   2.42 ( 0%) usr   0.04 ( 4%) sys   2.46 ( 0%) wall
 reload CSE regs       :   3.62 ( 0%) usr   0.00 ( 0%) sys   3.62 ( 0%) wall
 flow 2                :  30.20 ( 1%) usr   0.00 ( 0%) sys  30.20 ( 1%) wall
 if-conversion 2       : 1196.27 (33%) usr   0.09 ( 7%) sys 1196.01 (33%) wall
 shorten branches      :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 final                 :   2.69 ( 0%) usr   0.00 ( 1%) sys   2.70 ( 0%) wall
 symout                :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 rest of compilation   :   1.06 ( 0%) usr   0.00 ( 0%) sys   1.06 ( 0%) wall
 TOTAL                 :3624.12             1.27          3625.23

Some of the details are:

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 49.50    192.40   192.40        9 21378.26 21381.98  verify_flow_info
 28.10    301.64   109.24    40604     2.69     2.69  sbitmap_intersection_of_succs
  6.75    327.90    26.25    15025     1.75     1.75  sbitmap_intersection_of_preds
  5.38    348.79    20.89    20981     1.00     1.00  for_each_successor_phi
  1.87    356.06     7.27 25079015     0.00     0.00  bitmap_operation
...
-----------------------------------------------
                0.00  288.44       9/9           rest_of_compilation [5]
[6]     74.2    0.00  288.44       9         if_convert [6]
              192.40    0.03       9/9           verify_flow_info [7]
                2.17   90.46       6/9           compute_flow_dominators [8]
                0.00    3.27       1/10          update_life_info [11]
                0.00    0.03       9/37          free_basic_block_vars [99]
                0.03    0.00       9/18          compute_bb_for_insn [177]
                0.00    0.02   15384/15384       find_if_header [331]
                0.01    0.00       6/21          sbitmap_vector_alloc [222]
                0.00    0.00       1/25          allocate_reg_info [536]
                0.00    0.00       2/64864       max_reg_num [371]
                0.00    0.00       9/5201        get_max_uid [1093]
                0.00    0.00       1/974         sbitmap_alloc [1124]
                0.00    0.00       1/31743       sbitmap_zero [1070]
                0.00    0.00       1/1           count_or_remove_death_notes [1414]
-----------------------------------------------
              192.40    0.03       9/9           if_convert [6]
[7]     49.5  192.40    0.03       9         verify_flow_info [7]
                0.00    0.03   15008/26961       returnjump_p [183]
                0.00    0.00       3/87970       condjump_p [388]
                0.00    0.00       9/5201        get_max_uid [1093]
                0.00    0.00       9/231         get_insns [1157]
-----------------------------------------------
                1.09   45.23       3/9           flow_loops_find [10]
                2.17   90.46       6/9           if_convert [6]
[8]     35.7    3.26  135.69       9         compute_flow_dominators [8]
              109.24    0.01   40604/40604       sbitmap_intersection_of_succs [9]
               26.25    0.00   15025/15025       sbitmap_intersection_of_preds [13]
                0.17    0.00   55638/55638       sbitmap_a_and_b [88]
                0.02    0.00       9/21          sbitmap_vector_alloc [222]
                0.00    0.00       9/9           sbitmap_vector_ones [688]
                0.00    0.00       9/12          sbitmap_vector_zero [788]
                0.00    0.00       9/31743       sbitmap_zero [1070]
-----------------------------------------------
              109.24    0.01   40604/40604       compute_flow_dominators [8]
[9]     28.1  109.24    0.01   40604         sbitmap_intersection_of_succs [9]
                0.01    0.00   40604/55629       sbitmap_copy [396]
-----------------------------------------------
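For readers wondering why the sbitmap intersections weigh so heavily in the profile above, here is a minimal sketch of iterative bit-vector dominator computation, written in Python rather than GCC's C. It is not the code in compute_flow_dominators; the function name `compute_dominators`, the `preds` list, and the use of Python integers as bit vectors are all assumptions made for illustration. Each block owns a vector with one bit per basic block, and every pass intersects the vectors of all predecessors, so the work per pass grows roughly with the square of the block count.

```python
def compute_dominators(preds, entry=0):
    """Iterative bit-vector dominators (illustrative sketch, not GCC's code).

    preds[b] lists the predecessors of block b.
    Returns dom, where bit d of dom[b] is set iff block d dominates block b.
    """
    n = len(preds)
    all_blocks = (1 << n) - 1          # bit vector with every block set
    dom = [all_blocks] * n             # start from "everything dominates"
    dom[entry] = 1 << entry            # the entry is dominated only by itself

    changed = True
    while changed:                     # repeat until a fixed point is reached
        changed = False
        for b in range(n):
            if b == entry:
                continue
            meet = all_blocks
            for p in preds[b]:         # intersection over all predecessors
                meet &= dom[p]
            new = meet | (1 << b)      # a block always dominates itself
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom
```

For the diamond-shaped graph `[[], [0], [0], [1, 2]]` (block 0 branches to 1 and 2, which both reach 3), block 3 ends up dominated only by block 0 and itself.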


Brad Lucier

* Re: if-conversion a performance bottleneck
@ 2000-05-04 22:32 Mike Stump
  2000-05-04 22:35 ` Richard Henderson
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Stump @ 2000-05-04 22:32 UTC (permalink / raw)
  To: lucier, rth; +Cc: gcc, matzmich

> Date: Thu, 4 May 2000 22:16:17 -0700
> From: Richard Henderson <rth@cygnus.com>

> > make -j bootstrap

> Err.. don't do that?  Try make -j3 instead.

Unfortunately -j3 gives you not three jobs, but an exponential cascade
of 3 jobs per recursive make level.  Yes, I have read the paper
`Recursive make considered harmful'.  :-)

I found that -j3 -l10 will at least try and limit them from expanding
too much, which is useful if you're swap limited (just how did those 2
emacen grow to be 60M each, and netscape inflate to 90M? :-().
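To put a number on the cascade, here is a small sketch in Python. It assumes the worst case, in which every job at every level is itself a recursive make that gets its own independent -j allowance; the function name `worst_case_jobs` and the notion of a fixed nesting depth are hypothetical, chosen only for illustration. Real builds usually fall well short of this bound.

```python
def worst_case_jobs(jobs_per_make, depth):
    """Upper bound on simultaneous leaf jobs without any shared job limit.

    Assumption: each of the jobs_per_make jobs at every level is another
    make that again starts jobs_per_make jobs, down to `depth` levels.
    """
    return jobs_per_make ** depth
```

Under that assumption, -j3 with three levels of recursive make allows up to 27 jobs at once, while a single level allows the expected 3.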

* Re: if-conversion a performance bottleneck
@ 2000-05-04 23:28 Mathias Froehlich
  0 siblings, 0 replies; 32+ messages in thread
From: Mathias Froehlich @ 2000-05-04 23:28 UTC (permalink / raw)
  To: gcc

> On Thu, May 04, 2000 at 10:31:59PM -0700, Mike Stump wrote:
> > Unfortunately -j3 gives you not three jobs, but an exponential cascade
> > of 3 jobs per recursive make level.

> I know.  But 3 or 15 is still a lot better than 1000, which might
> be what you get with just -j.

> > I found that -j3 -l10 will at least try and limit them from expanding
> > too much, which is useful if you're swap limited (just how did those 2
> > emacen grow to be 60M each, and netscape inflate to 90M? :-().

> That's what gigabytes of RAM are for.  ;-)

Or use a recent version of gmake (>=3.78.1), which implements a
so-called "jobserver mode". This limits the _overall_ number of working
make jobs to the number given by -j. This also works for recursive
makes.

Try it out, it works great ...
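The idea of one overall limit shared by all recursive makes can be sketched as a common pool of job slots. What follows is a simulation in Python with threads, not GNU make's implementation: the class `JobTokens`, the function `make`, the fan-out of 3, and the 10 ms dummy job are all hypothetical, and the sketch simplifies by letting only leaf jobs take a slot.

```python
import threading
import time


class JobTokens:
    """Shared pool of job slots, handed down to every (simulated) sub-make."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.running = 0   # jobs executing right now
        self.peak = 0      # highest value `running` ever reached

    def run(self, job):
        with self._slots:                  # block until a slot is free
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                job()
            finally:
                with self._lock:
                    self.running -= 1


def make(tokens, depth, fanout=3):
    """Hypothetical recursive make: inner levels spawn sub-makes, leaves run jobs."""
    if depth == 0:
        tokens.run(lambda: time.sleep(0.01))   # stand-in for one compile job
        return
    subs = [threading.Thread(target=make, args=(tokens, depth - 1, fanout))
            for _ in range(fanout)]
    for t in subs:
        t.start()
    for t in subs:
        t.join()
```

Calling `make(JobTokens(3), depth=3)` starts 27 leaf jobs in total, yet `peak` never exceeds 3 because every level draws from the same pool.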

  Regards,

     Mathias Froehlich

-- 
Mathias Fr"ohlich              e-mail: frohlich@na.uni-tuebingen.de
Institut f"ur Mathematik, Universit"at T"ubingen, D-72076 T"ubingen

* Re: if-conversion a performance bottleneck
@ 2000-05-05 13:46 Brad Lucier
  0 siblings, 0 replies; 32+ messages in thread
From: Brad Lucier @ 2000-05-05 13:46 UTC (permalink / raw)
  To: lucier, pfeifer; +Cc: rth, mrs, gcc, matzmich

> On Fri, 5 May 2000, Brad Lucier wrote:
> > make -j 9 bootstrap
> > 
> > or whatever, and it doesn't seem to actually set off a bunch of parallel
> > jobs---definitely, by the time the stage1 compiler starts compiling
> > things, I'm down to one job at a time.  That's with the 2.2.13 kernel
> > on alpha with make 3.77; is this a known problem?  Should I upgrade
> > make?
> 
> Please do so and let us know the results! 
> 
> It would be very useful to document properly, how to use parallel
> makes for GCC -- perhaps you could come up with a patch for our
> documentation at?
> 
>   http://gcc.gnu.org/install/build.html

After upgrading make from version 3.77 to version 3.79,

make -j 9 bootstrap

works as I think it should---it consistently has about 9 processes
running in various subdirectories throughout the build process.

Brad



Thread overview: 32+ messages
2000-05-01 10:54 if-conversion a performance bottleneck Brad Lucier
2000-05-01 15:29 ` Richard Henderson
2000-05-01 18:01   ` Brad Lucier
2000-05-02  5:45     ` Michael Matz
2000-05-02 22:58       ` Michael Matz
2000-05-03  5:50         ` Brad Lucier
2000-05-03 20:05         ` Brad Lucier
2000-05-04 11:46           ` Michael Matz
2000-05-04 19:39             ` Brad Lucier
2000-05-04 22:16               ` Richard Henderson
2000-05-10  8:30                 ` Andreas Schwab
2000-05-10  9:36                   ` Joe Buck
2000-05-10  9:52                     ` Jeffrey A Law
2000-05-10 10:49                       ` Joe Buck
2000-05-10 14:17                     ` Joern Rennecke
2000-05-10 14:24                       ` GNU make options (was Re: if-conversion ...) Joe Buck
2000-05-04 11:55         ` if-conversion a performance bottleneck Richard Henderson
2000-05-05 12:01         ` Jeffrey A Law
2000-05-05 14:32           ` flow_d_f_o_compute misnamed? (was: if-conversion a performance...) Michael Matz
2000-05-05 14:53             ` Jeffrey A Law
2000-05-05 16:10               ` Michael Matz
2000-05-05 17:05                 ` Jeffrey A Law
2000-05-05 18:21                   ` Michael Matz
2000-05-05 19:04                     ` Michael Hayes
2000-05-11 17:14                     ` Jeffrey A Law
2000-05-04 22:32 if-conversion a performance bottleneck Mike Stump
2000-05-04 22:35 ` Richard Henderson
2000-05-05  6:12   ` Brad Lucier
2000-05-05 10:37     ` Richard Henderson
2000-05-05 13:35     ` Gerald Pfeifer
2000-05-04 23:28 Mathias Froehlich
2000-05-05 13:46 Brad Lucier
