public inbox for gcc@gcc.gnu.org
* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-08-06 13:16 Brad Lucier
  1999-08-06 13:54 ` Joern Rennecke
  1999-08-31 23:20 ` Brad Lucier
From: Brad Lucier @ 1999-08-06 13:16 UTC
  To: gcc, gcc-bugs, lucier; +Cc: staff, hosking, wilker

To follow up on my comments that gcc-2.95 is much slower than egcs-1.1.2
on the alpha-ev6 for some files, here are some timing data for a profiled
cc1 on alphaev6-unknown-linux-gnu with glibc-2.1.1, binutils-2.9.5.0.4,
and kernel 2.2.10.

Because the various timings on this machine are screwed up, I don't know
how to interpret the information precisely.  But it can give you an
idea of the relative times that various parts of the process took.

gcc was called with

gcc -fPIC -save-temps -O1

on eight relatively large files (basically, each of these files
contains a 25,000+ line procedure to be compiled, with a total
of 18 local variables and one argument).

The output from cc1 on the first of these files (which is typical) is:

 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 21.472976
time in integration: 0.000000
time in jump: 1.503040
time in cse: 0.000000
time in gcse: 0.000000
time in loop: 0.000000
time in cse2: 0.000000
time in branch-prob: 0.000000
time in flow: 5.866736
time in combine: 0.000000
time in regmove: 0.000000
time in sched: 0.000000
time in local-alloc: 0.000000
time in global-alloc: 44.732032
time in flow2: 0.000000
time in sched2: 0.000000
time in shorten-branch: 1.636752
time in stack-reg: 0.000000
time in final: 7.841184
time in varconst: 0.031232
time in symout: 0.000000
time in dump: 0.000000

If I read this correctly, it's spending a lot of time in reload
(global_alloc isn't in the call graph, and reload is; in toplev.c, you
see that one or the other is called, but the time is reported above as
global_alloc either way).  Perhaps there's a problem in reload.

The gprof output file can be found at

http://www.math.purdue.edu/~lucier/gmon.summary.gz

The summary information from gprof for cc1 begins:

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  7.74      2.74     2.74    90228     0.03     0.06  order_regs_for_reload
  6.94      5.21     2.46   358919     0.01     0.01  find_reloads
  6.82      7.62     2.42        8   302.49  4312.47  yyparse
  6.51      9.93     2.31 42370698     0.00     0.00  bitmap_bit_p
  3.64     11.22     1.29  2455661     0.00     0.00  yylex
  2.60     12.15     0.92   302171     0.00     0.00  record_reg_classes
  2.50     13.03     0.89                             hard_reg_use_compare
  2.32     13.85     0.82       24    34.26    58.61  stupid_life_analysis
  1.89     14.52     0.67   512830     0.00     0.00  for_each_rtx
  1.65     15.11     0.59  8208660     0.00     0.00  count_pseudo

Some selected information from the call graph about reload, which seems
to take a long time:

-----------------------------------------------
                2.74    2.27   90228/90228       find_reload_regs [7]
[8]     14.1    2.74    2.27   90228         order_regs_for_reload [8]
                0.59    0.91 8208660/8208660     count_pseudo [23]
                0.59    0.00 10827360/42370698     bitmap_bit_p [13]
                0.18    0.00 5503908/5503956     bitmap_clear [93]
                0.00    0.00   90228/704750      bitmap_initialize [259]
-----------------------------------------------
                0.17    4.56      16/16          reload [6]
[9]     13.3    0.17    4.56      16         reload_as_needed [9]
                0.26    1.83   90228/90228       emit_reload_insns [15]
                0.38    0.95   90228/90228       choose_reload_regs [30]
                0.70    0.29  102504/358919      find_reloads [11]
                0.04    0.04  256335/864560      note_stores [74]
                0.02    0.00   90228/90228       subst_reloads [247]
                0.00    0.02   12276/268691      eliminate_regs_in_insn [52]
                0.01    0.00   50798/84370       set_offsets_for_label [312]
                0.01    0.00   90228/2814858     asm_noperands [96]
                0.00    0.00   12276/268691      update_eliminable_offsets [216]
                0.00    0.00      16/40          set_initial_elim_offsets [475]
-----------------------------------------------
                0.21    3.32      24/24          reload [6]
[10]    10.0    0.21    3.32      24         calculate_needs_all_insns [10]
                1.76    0.74  256415/358919      find_reloads [11]
                0.06    0.42  256415/268691      eliminate_regs_in_insn [52]
                0.21    0.00   90228/90228       calculate_needs [83]
                0.07    0.01  122572/122572      set_label_offsets [146]
                0.03    0.00  256415/268691      update_eliminable_offsets [216]
                0.02    0.00  256415/1897960     single_set [109]
-----------------------------------------------
                0.70    0.29  102504/358919      reload_as_needed [9]
                1.76    0.74  256415/358919      calculate_needs_all_insns [10]
[11]     9.8    2.46    1.03  358919         find_reloads [11]
                0.19    0.14  219242/243152      push_reload [57]
                0.09    0.14  358911/1750922     extract_insn [33]
                0.13    0.00 1993307/3091999     reg_fits_class_p [88]
                0.08    0.02  310922/310922      combine_reloads [123]
                0.03    0.08   92668/92668       find_reloads_address [125]
                0.08    0.00 1692578/1910746     reg_class_subset_p [136]
                0.03    0.00  358919/1897960     single_set [109]
                0.02    0.00  237120/237388      reg_alternate_class [281]
                0.01    0.00  237120/355215      reg_preferred_class [311]
                0.01    0.00   97392/304810      normal_memory_operand [268]
                0.00    0.00     676/1014        zap_mask [724]
-----------------------------------------------
[12]     6.9    0.92    1.51  218511+1228287 <cycle 3 as a whole> [12]
                0.32    0.42  267378+208665      expand_expr <cycle 3> [40]
                0.10    0.27  246016             gen_movdi <cycle 3> [58]
                0.09    0.14   56286             expand_binop <cycle 3> [80]
                0.06    0.10  246688             emit_move_insn_1 <cycle 3> [99]
                0.02    0.13   25803             do_jump_for_compare <cycle 3> [106]
                0.08    0.05   65220             store_expr <cycle 3> [113]
                0.04    0.07  125120             emit_move_insn <cycle 3> [120]
                0.03    0.06   63104             expand_assignment <cycle 3> [130]
                0.03    0.04   73195             memory_address <cycle 3> [148]
                0.03    0.05   25811+2040        emit_cmp_insn <cycle 3> [149]
                0.03    0.02   25793+2040        do_jump <cycle 3> [181]
                0.01    0.03   25811             alpha_emit_conditional_branch <cycle 3> [188]
                0.01    0.02   29226             copy_to_mode_reg <cycle 3> [209]
                0.01    0.02   28480             force_reg <cycle 3> [234]
                0.01    0.01   32680             change_address <cycle 3> [240]
                0.00    0.02   10850             gen_ble <cycle 3> [246]
                0.01    0.01   25803             compare_from_rtx <cycle 3> [264]
                0.00    0.01    5233             gen_bgt <cycle 3> [300]
                0.01    0.00   23795             compare <cycle 3> [305]
                0.00    0.01    4997             gen_bne <cycle 3> [329]
                0.01    0.00   10500+37504       force_operand <cycle 3> [349]
                0.01    0.00    9808             expand_shift <cycle 3> [366]
                0.00    0.00    2008             gen_bgtu <cycle 3> [392]
                0.00    0.00    2072             emit_unop_insn <cycle 3> [394]
                0.00    0.00    1973             gen_beq <cycle 3> [407]
                0.00    0.00    4088             convert_modes <cycle 3> [409]
                0.00    0.00    2080             convert_move <cycle 3> [426]
                0.00    0.00     698             gen_blt <cycle 3> [431]
                0.00    0.00     140             expand_divmod <cycle 3> [456]
                0.00    0.00      32             expand_call <cycle 3> [459]
                0.00    0.00     662             expand_mult <cycle 3> [525]
                0.00    0.00      52             gen_bge <cycle 3> [571]
                0.00    0.00      88             copy_to_reg <cycle 3> [584]
                0.00    0.00      16             emit_libcall_block <cycle 3> [595]
                0.00    0.00      32             load_register_parameters <cycle 3> [623]
                0.00    0.00      32             precompute_arguments <cycle 3> [628]
                0.00    0.00      32             precompute_register_parameters <cycle 3> [639]
                0.00    0.00    1058             jumpifnot <cycle 3> [721]
-----------------------------------------------
                0.45    0.00 8208660/42370698     count_pseudo [23]
                0.59    0.00 10827360/42370698     order_regs_for_reload [8]
                0.63    0.00 11549184/42370698     choose_reload_regs [30]
                0.64    0.00 11785494/42370698     finish_spills [25]
[13]     6.5    2.31    0.00 42370698         bitmap_bit_p [13]
-----------------------------------------------
                0.21    2.03      24/24          rest_of_compilation [5]
[14]     6.3    0.21    2.03      24         final [14]
                0.22    1.81  497630/497630      final_scan_insn [17]
                0.00    0.00      24/5328        oballoc [485]
                0.00    0.00      24/48          check_exception_handler_labels [816]
                0.00    0.00      24/24          init_insn_eh_region [861]
                0.00    0.00      24/104         init_recog [804]
                0.00    0.00      24/24          free_insn_eh_region [856]
-----------------------------------------------
                0.26    1.83   90228/90228       reload_as_needed [9]
[15]     5.9    0.26    1.83   90228         emit_reload_insns [15]
                0.05    1.49  121568/121568      gen_reload [22]
                0.11    0.02 2218446/2218446     emit_insns_before [114]
                0.04    0.00  227736/519956      rtx_equal_p [128]
                0.01    0.02   59574/59574       reg_set_p [241]
                0.02    0.00  102417/102417      reload_reg_reaches_end_p [250]
                0.01    0.01  102409/102409      push_to_sequence [265]
                0.02    0.00   35821/35957       reg_mentioned_p [270]
                0.01    0.01   35821/864560      note_stores [74]
                0.01    0.00  121568/541814      end_sequence [195]
                0.01    0.00   71642/1897960     single_set [109]
                0.00    0.00   85747/363523      get_last_insn [271]
                0.00    0.00  157389/279445      get_insns [371]
                0.00    0.00   19159/541814      start_sequence [156]
                0.00    0.00   35829/422200      find_reg_note [238]
                0.00    0.00   19159/19311       emit_insns [482]
                0.00    0.00    1262/1262        delete_output_reload [539]
-----------------------------------------------
[16]     5.8    0.39    1.66  417809+698063  <cycle 7 as a whole> [16]
                0.20    1.16  317912             build_binary_op <cycle 7> [29]
                0.15    0.00  707786             default_conversion <cycle 7> [101]
                0.01    0.06   25861             build_unary_op <cycle 7> [155]
                0.01    0.00   35813             truthvalue_conversion <cycle 7> [310]
-----------------------------------------------

Brad Lucier    lucier@math.purdue.edu

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-11-09  8:26 Brad Lucier
  1999-11-30 23:37 ` Brad Lucier
From: Brad Lucier @ 1999-11-09  8:26 UTC
  To: law, amylaar
  Cc: lucier, rth, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

> > ~8% off my testcase.  
> 
> Do you have any profile data for that?

I have some data for compiling a large, integer-only file with

-O2 -mcpu=ev6 -fno-math-errno -fPIC

on the alpha.  However, (1) I don't know how representative the program
is of others, and (2) I don't usually compile this file with -O2.  So
if you find this data useful, great; otherwise, ignore it.

The times for each pass are:

popov-53% /export/u10/egcs-prof/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -mcpu=ev6 -fno-math-errno -mieee -fPIC -O2 _meroon.i
__copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20___meroon {GC 160095k -> 33708k in 0.929} {GC 45097k -> 33687k in 0.940} {GC 47074k -> 37662k in 1.075} {GC 50287k -> 35884k in 1.022} {GC 54712k -> 40701k in 1.212} {GC 64544k -> 42767k in 1.267} ___init_proc {GC 86293k -> 3001k in 0.069} {GC 6054k -> 3060k in 0.145} ____20___meroon
time in parse: 21.162608 (-4%)
time in integration: 0.000000 (-0%)
time in jump: 516.838848 (-101%)
time in cse: 15.099696 (-3%)
time in gcse: -1645.-331280 (323%)
time in loop: 1.242448 (-0%)
time in cse2: 13.482464 (-3%)
time in branch-prob: 0.000000 (-0%)
time in flow: 8.959680 (-2%)
time in combine: 11.807648 (-2%)
time in regmove: 3.503840 (-1%)
time in sched: 356.688960 (-70%)
time in local-alloc: 5.562224 (-1%)
time in global-alloc: 34.060448 (-7%)
time in flow2: 8.440448 (-2%)
time in peephole2: 0.180560 (-0%)
time in sched2: 111.030736 (-22%)
time in shorten-branch: 0.581696 (-0%)
time in final: 3.423808 (-1%)
time in varconst: 0.001952 (-0%)
time in symout: 0.000000 (-0%)
time in dump: 0.000000 (-0%)
time in gc: 6.659248 (-1%)

So it seems that the ints used to hold the times in toplev.c don't
quite cut it here.  Change them to unsigned longs (which gains 1 bit on
32-bit machines, 32 bits on the alpha) or doubles?
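
To illustrate the overflow, here is a minimal sketch.  It assumes the
times are kept as microsecond counts in a 32-bit signed int and printed
as seconds.microseconds with "%d.%06d"; both are assumptions about
toplev.c, not quotes from it.  The helper names wrap_to_int32 and
format_time are hypothetical, and the input 2649636016 is simply a
value chosen so that it wraps to the figure printed for gcse above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: the value a 32-bit signed microsecond counter
   would hold after wrapping (two's complement), computed without
   relying on signed overflow, which is undefined in C.  */
static int32_t wrap_to_int32(int64_t usec)
{
    assert(usec >= 0);
    uint32_t low = (uint32_t)(usec & 0xFFFFFFFF);   /* keep the low 32 bits */
    if (low > (uint32_t)INT32_MAX)
        return (int32_t)((int64_t)low - 4294967296LL);  /* subtract 2^32 */
    return (int32_t)low;
}

/* Hypothetical helper: format a microsecond count the way the pass
   timings appear to be printed (assumed format "%d.%06d").
   Returns the number of characters written.  */
static int format_time(int32_t usec, char *buf, size_t size)
{
    return snprintf(buf, size, "%d.%06d",
                    (int)(usec / 1000000), (int)(usec % 1000000));
}
```

A 32-bit signed microsecond counter wraps after about 2147 seconds, so
any pass that runs longer than that would show up as a negative time,
with a second minus sign in the fractional part.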

The top times in the profile file are:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 24.79    435.90   435.90   205024     0.00     0.00  pre_expr_reaches_here_p_work
 21.63    816.15   380.25 812491634     0.00     0.00  expr_killed_p
  8.85    971.72   155.58        1   155.58   535.82  compute_ae_kill
  7.68   1106.72   135.00    37475     0.00     0.00  compute_block_backward_dependences
  6.31   1217.65   110.93 122524058     0.00     0.00  rtx_renumbered_equal_p
  5.00   1305.47    87.82    97294     0.00     0.00  compute_transp
  4.24   1380.02    74.54 67176104     0.00     0.00  find_cross_jump
  3.67   1444.46    64.44        3    21.48    33.16  compute_hash_table
  2.39   1486.51    42.05  9018524     0.00     0.00  sbitmap_union_of_diff
  1.57   1514.14    27.63    83378     0.00     0.00  insert_expr_in_table
  1.33   1537.50    23.36   223375     0.00     0.00  record_one_set
  1.04   1555.83    18.33    74953     0.00     0.00  count_or_remove_death_notes
  0.92   1571.98    16.16       15     1.08    14.85  jump_optimize_1
  0.70   1584.28    12.30   389447     0.00     0.00  sbitmap_a_and_b
  0.69   1596.49    12.21   389436     0.00     0.00  sbitmap_a_or_b

Something seems to be wrong with the data for pre_expr_reaches_here_p_work.

Brad

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-11-05 14:43 Brad Lucier
  1999-11-30 23:37 ` Brad Lucier
From: Brad Lucier @ 1999-11-05 14:43 UTC
  To: lucier, amylaar
  Cc: rth, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

Joern:

With your latest patch, the time for global_alloc is reduced to:

time in global-alloc: 30.228672 (20%)

which is down from 883 seconds yesterday.  Fantastic!

The top times for various functions are now:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  8.62     10.72    10.72    26627     0.40     0.40  record_conflicts
  7.36     19.87     9.15    30909     0.30     0.30  delete_from_jump_chain
  5.08     26.19     6.32        8   790.28  5353.28  jump_optimize_1
  4.65     31.98     5.79 10448965     0.00     0.00  rtx_renumbered_equal_p
  4.07     37.04     5.07  9164467     0.00     0.00  find_cross_jump
  3.64     41.57     4.53 16835940     0.00     0.00  next_active_insn
  3.41     45.81     4.24    93299     0.05     0.05  make_edge
  3.18     49.76     3.95   194464     0.02     0.02  record_one_conflict
  2.11     52.39     2.62        1  2623.05 123824.10  yyparse
  1.55     54.32     1.93  3127006     0.00     0.00  next_label
  1.50     56.18     1.87  2379715     0.00     0.00  yylex
  1.43     57.96     1.78        2   890.62   891.94  prune_preferences
  1.38     59.68     1.72        1  1715.82  1715.82  output_273
  1.32     61.32     1.64        2   819.82   819.82  mirror_conflicts

Brad

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-11-05 12:15 Thomas E Deweese
  1999-11-05 12:22 ` Joern Rennecke
                   ` (2 more replies)
From: Thomas E Deweese @ 1999-11-05 12:15 UTC
  To: Joern Rennecke, lucier; +Cc: gcc

JR> Well, since it worked for one place that looks for set bits in
JR> allocno sets, let's see if it works in the two other ones:

	I'm not 100% certain of the density of 1's in the bit fields
but my impression is that they are fairly sparse (I'm guessing,
especially in the 'conflicts' case).  In which case I would suggest at
least putting a check on any bits being set in 'word_' around the
inner for loop.

	Depending on the number of times the list is iterated over it
might even be worth doing a few steps of binary search on word_ to
throw out "large" sections of zero bits. Of course this will actually
slow things down if you approach 50% of the bits being set.

i.e.:
	if (word_ & 0xFFFF0000) {
	  if (word_ & 0xFF000000) {
	    // check high order byte
	  }
	  if (word_ & 0x00FF0000) {
	    // check 2nd from high order byte
	  }
	}
	if (word_ & 0x0000FFFF) {
	  if (word_ & 0x0000FF00) {
	    // check 2nd from low order byte
	  }
	  if (word_ & 0x000000FF) {
	    // check low order byte
	  }
	}

	I'm not certain if this is worth the 'ugliness' but this seems
to be a very tight loop given that word orienting the code made such a
large difference.
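
	A self-contained sketch of the zero-skipping idea, for anyone
who wants to try it outside the compiler.  It is not taken from
global.c: the function name collect_set_bits, the fixed 32-bit word
size, and the output-array interface are all assumptions made for
illustration.  It skips whole zero words, then zero bytes within a
word, and only then looks at individual bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the indices of the set bits of WORDS in
   OUT (at most CAP of them) and return how many set bits were found.
   Zero words and zero bytes are skipped without testing single bits.  */
static size_t collect_set_bits(const uint32_t *words, size_t n_words,
                               int *out, size_t cap)
{
    size_t n = 0;

    assert(words != NULL && out != NULL);
    for (size_t w = 0; w < n_words; w++) {
        uint32_t word = words[w];

        if (word == 0)                  /* skip 32 zero bits at once */
            continue;
        for (int byte = 0; byte < 4; byte++) {
            uint32_t chunk = (word >> (8 * byte)) & 0xFFu;

            if (chunk == 0)             /* skip 8 zero bits at once */
                continue;
            for (int bit = 0; bit < 8; bit++) {
                if ((chunk >> bit) & 1u) {
                    if (n < cap)
                        out[n] = (int)(w * 32) + 8 * byte + bit;
                    n++;
                }
            }
        }
    }
    return n;
}
```

	As noted above, the extra tests only pay off when the sets are
sparse; with dense sets they are pure overhead.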

----


Index: global.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/global.c,v
retrieving revision 1.40
diff -p -r1.40 global.c
*** global.c    1999/11/05 08:31:48     1.40
--- global.c    1999/11/05 18:31:06
*************** static int allocno_row_words;
*** 139,144 ****
--- 139,172 ----
    &= ~ ((INT_TYPE) 1 << ((J) % INT_BITS)))
  /* END CYGNUS LOCAL */
  
+ /* For any allocno set in ALLOCNO_SET, set OUT_ALLOCNO to that allocno,
+    and execute CODE.  */
+ #define EXECUTE_IF_SET_IN_ALLOCNO_SET(ALLOCNO_SET, ALLOCNO, CODE)     \
+ do {                                                                  \
+   int i_;                                                             \
+   int allocno_;                                                               \
+   INT_TYPE *p_ = (ALLOCNO_SET);               \
+                                                                       \
+   for (i_ = allocno_row_words - 1, allocno_ = 0; i_ >= 0;             \
+        i_--, allocno_ += INT_BITS)                                    \
+     {                                                                 \
+       unsigned INT_TYPE word_ = (unsigned INT_TYPE) *p_++;            \
+                                                                       \
+       for ((ALLOCNO) = allocno_; word_; word_ >>= 1, (ALLOCNO)++)     \
+       {                                                               \
+         if (word_ & 1)                                                \
+           CODE;                                                       \
+       }                                                               \
+     }                                                                 \
+ } while (0)
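
For experimenting with the macro outside the compiler, here is a
standalone sketch.  The macro body is copied from the patch above; the
definitions of INT_TYPE and INT_BITS and the wrapper count_allocnos are
assumptions added for illustration, not the definitions from global.c.

```c
#include <assert.h>
#include <limits.h>

/* Assumed stand-ins for the definitions in global.c.  */
#define INT_TYPE int
#define INT_BITS ((int) (sizeof (INT_TYPE) * CHAR_BIT))

static int allocno_row_words;

/* Copied from the patch: run CODE once for every bit set in ALLOCNO_SET,
   with ALLOCNO holding the index of that bit.  The inner loop stops as
   soon as the remaining bits of the word are all zero.  */
#define EXECUTE_IF_SET_IN_ALLOCNO_SET(ALLOCNO_SET, ALLOCNO, CODE)     \
do {                                                                  \
  int i_;                                                             \
  int allocno_;                                                       \
  INT_TYPE *p_ = (ALLOCNO_SET);                                       \
                                                                      \
  for (i_ = allocno_row_words - 1, allocno_ = 0; i_ >= 0;             \
       i_--, allocno_ += INT_BITS)                                    \
    {                                                                 \
      unsigned INT_TYPE word_ = (unsigned INT_TYPE) *p_++;            \
                                                                      \
      for ((ALLOCNO) = allocno_; word_; word_ >>= 1, (ALLOCNO)++)     \
        {                                                             \
          if (word_ & 1)                                              \
            CODE;                                                     \
        }                                                             \
    }                                                                 \
} while (0)

/* Hypothetical wrapper: count the allocnos in SET (ROW_WORDS words long)
   and remember the highest one seen in *LAST.  */
static int count_allocnos(INT_TYPE *set, int row_words, int *last)
{
  int n = 0, a;

  assert(row_words >= 0);
  allocno_row_words = row_words;
  EXECUTE_IF_SET_IN_ALLOCNO_SET (set, a, { n++; *last = a; });
  return n;
}
```

With a two-word set holding the values 5 and 1, the wrapper visits
allocnos 0, 2 and INT_BITS, in that order.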

-- 
							Thomas DeWeese
deweese@kodak.com
			"The only difference between theory and practice is
			 that in theory there isn't any." -- unknown

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-11-04 13:49 Brad Lucier
  1999-11-05 10:42 ` Joern Rennecke
  1999-11-30 23:37 ` Brad Lucier
From: Brad Lucier @ 1999-11-04 13:49 UTC
  To: lucier, amylaar
  Cc: rth, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

> Could you benchmark this patch?
> 
> Thu Nov  4 06:15:40 1999  J"orn Rennecke <amylaar@cygnus.co.uk>
> 
> 	* global.c (CONFLICTP, SET_CONFLICT): Avoid signed division.
> 	(mirror_conflicts): New function.
> 	(global_alloc): Call it.
> 	(expand_preferences): Remove redundant CONFLICTP test.
> 	(find_reg, dump_conflicts): Likewise.
> 	(prune_preferences): Process conflicts one word at a time.
> 

I bootstrapped yesterday's mainline on alphaev6-unknown-linux-gnu,
and ran it on a similar test case.  Without the patch, I get:

/export/u10/egcs-prof/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -mcpu=ev6 -fno-math-errno -mieee -fPIC -O1 _meroon.i 
 ___H__20___meroon {GC 220613k -> 46104k in 1.295} {GC 79384k -> 50882k in 1.474} {GC 69598k -> 53435k in 1.583} ___init_proc {GC 93235k -> 3595k in 0.078} ____20___meroon
...
time in global-alloc: 1237.253728 (85%)
...

and the top functions by time were:

 29.05    142.54   142.54        2 71267.58 71273.92  prune_preferences
 13.86    210.53    68.00 1599926669     0.00     0.00  bitmap_bit_p
 11.99    269.37    58.84     2789    21.10    21.10  find_reg
 10.93    322.99    53.62        3 17873.05 40871.13  build_insn_chain
  9.55    369.85    46.86   194464     0.24     0.24  record_one_conflict
  2.21    380.71    10.85    26627     0.41     0.41  record_conflicts
  2.02    390.61     9.91    30909     0.32     0.32  delete_from_jump_chain

With the patch, I got

/export/u10/egcs-prof2/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -mcpu=ev6 -fno-math-errno -mieee -fPIC -O1 _meroon.i
 ___H__20___meroon {GC 220613k -> 46104k in 1.303} {GC 79384k -> 50882k in 1.459} {GC 69598k -> 53435k in 1.566} ___init_proc {GC 93235k -> 3595k in 0.080} ____20___meroon
...
time in global-alloc: 574.101744 (72%)
...

and the top functions were

 19.82     67.88    67.88 1599926669     0.00     0.00  bitmap_bit_p
 15.83    122.07    54.19        3 18063.15 41030.16  build_insn_chain
 14.52    171.77    49.71     2789    17.82    17.82  find_reg
 13.64    218.49    46.72   194464     0.24     0.24  record_one_conflict
  3.14    229.26    10.76    26627     0.40     0.40  record_conflicts
  2.94    239.32    10.06    30909     0.33     0.33  delete_from_jump_chain
...
  0.51    277.15     1.74        2   869.63   870.11  prune_preferences

So, even though global-alloc is still the biggest time sink, your patch
makes a *big* difference.  (I don't believe the times reported in each
pass of the compiler are reliable when it is profiled with -pg.)
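
For readers who have not looked at the patch, the "one word at a time"
entry in the ChangeLog above can be pictured with the following sketch.
It is not the code from global.c; the function names and the use of
unsigned long rows are assumptions made for illustration.  Both
functions answer the same question (do two bit rows share a set bit?),
but the second does one AND per word instead of one test per bit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Bit at a time: one test per possible allocno.  The index is unsigned,
   so the division and remainder need no sign correction.  */
static int rows_conflict_bitwise(const unsigned long *a,
                                 const unsigned long *b, size_t n_bits)
{
    for (size_t i = 0; i < n_bits; i++) {
        unsigned long mask = 1UL << (i % WORD_BITS);

        if ((a[i / WORD_BITS] & mask) && (b[i / WORD_BITS] & mask))
            return 1;
    }
    return 0;
}

/* Word at a time: a single AND covers WORD_BITS allocnos.  */
static int rows_conflict_wordwise(const unsigned long *a,
                                  const unsigned long *b, size_t n_words)
{
    assert(a != NULL && b != NULL);
    for (size_t w = 0; w < n_words; w++)
        if (a[w] & b[w])
            return 1;
    return 0;
}
```

On a 64-bit host such as the alpha, the word-wise loop does up to 64
times fewer iterations than the bit-wise one for the same rows.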

Brad

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-08-06 23:17 Mike Stump
  1999-08-31 23:20 ` Mike Stump
From: Mike Stump @ 1999-08-06 23:17 UTC
  To: lucier; +Cc: gcc

> From: Brad Lucier <lucier@math.purdue.edu>
> To: amylaar@cygnus.co.uk (Joern Rennecke)
> Date: Fri, 6 Aug 1999 20:06:35 -0500 (EST)

> The biggest time sink seems to be the quadratic algorithm in
> prune_preferences in global.c.

Thanks for pointing out quadratic algorithms...  [ blush ] we usually
can't win with them, someone always finds _the_ testcase that will
kill it.

* big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-08-04  9:15 Brad Lucier
  1999-08-31 23:20 ` Brad Lucier
From: Brad Lucier @ 1999-08-04  9:15 UTC
  To: gcc, gcc-bugs; +Cc: lucier, staff, jlee, hosking, wilker

I'm running a Genetic Programming system, where compile times are as
important as run times.  I noticed that gcc-2.95 is a *lot* slower than
egcs-1.1.2 on my alpha-ev6 running Red Hat 6.0 with binutils 2.9.5.0.3.

Here are some typical times:

egcs-1.1.2:

/usr/bin/time /usr/bin/gcc -mcpu=ev6 -fPIC -O1 -c -D___DYNAMIC -D___SINGLE_HOST system.c
106.60user 0.17system 0:13.49elapsed 790%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2248major+8092minor)pagefaults 0swaps

gcc-2.95:

/usr/bin/time /export/u10/gcc-2.95/bin/gcc -mcpu=ev6 -fPIC -O1 -c -D___DYNAMIC -D___SINGLE_HOST system.c
250.77user 0.24system 0:31.59elapsed 794%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2330major+15298minor)pagefaults 0swaps

(The user and system times on this box are too big by a factor of 8.)

I try to be an independent guy---I installed a test version compiled with
-pg, but gprof on my platform dumps core (both the original one and the one
with 2.9.5.0.3) so that isn't much help.

Any suggestions?  Anybody else notice this?

Brad Lucier   lucier@math.purdue.edu


Thread overview: 54+ messages
1999-08-06 13:16 big slowdown in egcs-1.1.2->gcc-2.95 on alpha Brad Lucier
1999-08-06 13:54 ` Joern Rennecke
1999-08-06 16:25   ` Brad Lucier
1999-08-31 23:20     ` Brad Lucier
1999-08-06 18:06   ` Brad Lucier
1999-08-06 18:54     ` Joern Rennecke
1999-08-06 21:27       ` Brad Lucier
1999-08-07  1:04         ` Richard Henderson
1999-08-07  8:08           ` Brad Lucier
1999-08-31 23:20             ` Brad Lucier
1999-08-07  8:27           ` Brad Lucier
1999-08-31 23:20             ` Brad Lucier
1999-11-03 22:31             ` Joern Rennecke
1999-11-04 17:19               ` Richard Henderson
1999-11-05  0:33                 ` Jeffrey A Law
1999-11-30 23:37                   ` Jeffrey A Law
1999-11-30 23:37                 ` Richard Henderson
1999-11-30 23:37               ` Joern Rennecke
1999-08-31 23:20           ` Richard Henderson
1999-08-31 23:20         ` Brad Lucier
1999-08-11  1:57       ` Jeffrey A Law
1999-08-12 16:25         ` Joern Rennecke
1999-08-31 23:20           ` Joern Rennecke
1999-08-31 23:20         ` Jeffrey A Law
1999-08-31 23:20       ` Joern Rennecke
1999-08-31 23:20     ` Brad Lucier
1999-08-31 23:20   ` Joern Rennecke
1999-08-31 23:20 ` Brad Lucier
  -- strict thread matches above, loose matches on Subject: below --
1999-11-09  8:26 Brad Lucier
1999-11-30 23:37 ` Brad Lucier
1999-11-05 14:43 Brad Lucier
1999-11-30 23:37 ` Brad Lucier
1999-11-05 12:15 Thomas E Deweese
1999-11-05 12:22 ` Joern Rennecke
1999-11-30 23:37   ` Joern Rennecke
1999-11-05 12:31 ` Michael Meissner
1999-11-05 12:34   ` Joern Rennecke
1999-11-30 23:37     ` Joern Rennecke
1999-11-30 23:37   ` Michael Meissner
1999-11-30 23:37 ` Thomas E Deweese
1999-11-04 13:49 Brad Lucier
1999-11-05 10:42 ` Joern Rennecke
1999-11-05 15:44   ` Jeffrey A Law
1999-11-08 15:37     ` Joern Rennecke
1999-11-08 16:04       ` Jeffrey A Law
1999-11-30 23:37         ` Jeffrey A Law
1999-11-30 23:37       ` Joern Rennecke
1999-11-30 23:37     ` Jeffrey A Law
1999-11-30 23:37   ` Joern Rennecke
1999-11-30 23:37 ` Brad Lucier
1999-08-06 23:17 Mike Stump
1999-08-31 23:20 ` Mike Stump
1999-08-04  9:15 Brad Lucier
1999-08-31 23:20 ` Brad Lucier
