Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-08-06 13:16 Brad Lucier
  1999-08-06 13:54 ` Joern Rennecke
  1999-08-31 23:20 ` Brad Lucier
  0 siblings, 2 replies; 54+ messages in thread
From: Brad Lucier @ 1999-08-06 13:16 UTC (permalink / raw)
  To: gcc, gcc-bugs, lucier; +Cc: staff, hosking, wilker

To follow up on my comments that gcc-2.95 is much slower than egcs-1.1.2
on the alpha-ev6 for some files, here are some timing data for a profiled
cc1 on alphaev6-unknown-linux-gnu with glibc-2.1.1, binutils-2.9.5.0.4,
and kernel 2.2.10.

Because the various timings on this machine are screwed up, I don't know
how to interpret the information precisely.  But it can give you a
an idea of the relative times that various parts of the process took.

gcc was called with

gcc -fPIC -save-temps -O1

on eight relatively large files (basically, each of these files
contains a 25,000+ line procedure to be compiled, with a total
of 18 local variables and one argument).

The output from cc1 on the first of these files (which is typical) is:

 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 21.472976
time in integration: 0.000000
time in jump: 1.503040
time in cse: 0.000000
time in gcse: 0.000000
time in loop: 0.000000
time in cse2: 0.000000
time in branch-prob: 0.000000
time in flow: 5.866736
time in combine: 0.000000
time in regmove: 0.000000
time in sched: 0.000000
time in local-alloc: 0.000000
time in global-alloc: 44.732032
time in flow2: 0.000000
time in sched2: 0.000000
time in shorten-branch: 1.636752
time in stack-reg: 0.000000
time in final: 7.841184
time in varconst: 0.031232
time in symout: 0.000000
time in dump: 0.000000

If I read this correctly, it's spending a lot of time in reload
(global_alloc isn't in the call graph, and reload is; in toplev.c, you
see that one or the other is called, but the time is reported above as
global_alloc either way).  Perhaps there's a problem in reload.

The gprof output file can be found at

http://www.math.purdue.edu/~lucier/gmon.summary.gz

The summary information from gprof for cc1 begins:

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  7.74      2.74     2.74    90228     0.03     0.06  order_regs_for_reload
  6.94      5.21     2.46   358919     0.01     0.01  find_reloads
  6.82      7.62     2.42        8   302.49  4312.47  yyparse
  6.51      9.93     2.31 42370698     0.00     0.00  bitmap_bit_p
  3.64     11.22     1.29  2455661     0.00     0.00  yylex
  2.60     12.15     0.92   302171     0.00     0.00  record_reg_classes
  2.50     13.03     0.89                             hard_reg_use_compare
  2.32     13.85     0.82       24    34.26    58.61  stupid_life_analysis
  1.89     14.52     0.67   512830     0.00     0.00  for_each_rtx
  1.65     15.11     0.59  8208660     0.00     0.00  count_pseudo

Some selected information from the call graph about reload, which seems
to take a long time:

-----------------------------------------------
                2.74    2.27   90228/90228       find_reload_regs [7]
[8]     14.1    2.74    2.27   90228         order_regs_for_reload [8]
                0.59    0.91 8208660/8208660     count_pseudo [23]
                0.59    0.00 10827360/42370698     bitmap_bit_p [13]
                0.18    0.00 5503908/5503956     bitmap_clear [93]
                0.00    0.00   90228/704750      bitmap_initialize [259]
-----------------------------------------------
                0.17    4.56      16/16          reload [6]
[9]     13.3    0.17    4.56      16         reload_as_needed [9]
                0.26    1.83   90228/90228       emit_reload_insns [15]
                0.38    0.95   90228/90228       choose_reload_regs [30]
                0.70    0.29  102504/358919      find_reloads [11]
                0.04    0.04  256335/864560      note_stores [74]
                0.02    0.00   90228/90228       subst_reloads [247]
                0.00    0.02   12276/268691      eliminate_regs_in_insn [52]
                0.01    0.00   50798/84370       set_offsets_for_label [312]
                0.01    0.00   90228/2814858     asm_noperands [96]
                0.00    0.00   12276/268691      update_eliminable_offsets [216]
                0.00    0.00      16/40          set_initial_elim_offsets [475]
-----------------------------------------------
                0.21    3.32      24/24          reload [6]
[10]    10.0    0.21    3.32      24         calculate_needs_all_insns [10]
                1.76    0.74  256415/358919      find_reloads [11]
                0.06    0.42  256415/268691      eliminate_regs_in_insn [52]
                0.21    0.00   90228/90228       calculate_needs [83]
                0.07    0.01  122572/122572      set_label_offsets [146]
                0.03    0.00  256415/268691      update_eliminable_offsets [216]
                0.02    0.00  256415/1897960     single_set [109]
-----------------------------------------------
                0.70    0.29  102504/358919      reload_as_needed [9]
                1.76    0.74  256415/358919      calculate_needs_all_insns [10]
[11]     9.8    2.46    1.03  358919         find_reloads [11]
                0.19    0.14  219242/243152      push_reload [57]
                0.09    0.14  358911/1750922     extract_insn [33]
                0.13    0.00 1993307/3091999     reg_fits_class_p [88]
                0.08    0.02  310922/310922      combine_reloads [123]
                0.03    0.08   92668/92668       find_reloads_address [125]
                0.08    0.00 1692578/1910746     reg_class_subset_p [136]
                0.03    0.00  358919/1897960     single_set [109]
                0.02    0.00  237120/237388      reg_alternate_class [281]
                0.01    0.00  237120/355215      reg_preferred_class [311]
                0.01    0.00   97392/304810      normal_memory_operand [268]
                0.00    0.00     676/1014        zap_mask [724]
-----------------------------------------------
[12]     6.9    0.92    1.51  218511+1228287 <cycle 3 as a whole> [12]
                0.32    0.42  267378+208665      expand_expr <cycle 3> [40]
                0.10    0.27  246016             gen_movdi <cycle 3> [58]
                0.09    0.14   56286             expand_binop <cycle 3> [80]
                0.06    0.10  246688             emit_move_insn_1 <cycle 3> [99]
                0.02    0.13   25803             do_jump_for_compare <cycle 3> [106]
                0.08    0.05   65220             store_expr <cycle 3> [113]
                0.04    0.07  125120             emit_move_insn <cycle 3> [120]
                0.03    0.06   63104             expand_assignment <cycle 3> [130]
                0.03    0.04   73195             memory_address <cycle 3> [148]
                0.03    0.05   25811+2040        emit_cmp_insn <cycle 3> [149]
                0.03    0.02   25793+2040        do_jump <cycle 3> [181]
                0.01    0.03   25811             alpha_emit_conditional_branch <cycle 3> [188]
                0.01    0.02   29226             copy_to_mode_reg <cycle 3> [209]
                0.01    0.02   28480             force_reg <cycle 3> [234]
                0.01    0.01   32680             change_address <cycle 3> [240]
                0.00    0.02   10850             gen_ble <cycle 3> [246]
                0.01    0.01   25803             compare_from_rtx <cycle 3> [264]
                0.00    0.01    5233             gen_bgt <cycle 3> [300]
                0.01    0.00   23795             compare <cycle 3> [305]
                0.00    0.01    4997             gen_bne <cycle 3> [329]
                0.01    0.00   10500+37504       force_operand <cycle 3> [349]
                0.01    0.00    9808             expand_shift <cycle 3> [366]
                0.00    0.00    2008             gen_bgtu <cycle 3> [392]
                0.00    0.00    2072             emit_unop_insn <cycle 3> [394]
                0.00    0.00    1973             gen_beq <cycle 3> [407]
                0.00    0.00    4088             convert_modes <cycle 3> [409]
                0.00    0.00    2080             convert_move <cycle 3> [426]
                0.00    0.00     698             gen_blt <cycle 3> [431]
                0.00    0.00     140             expand_divmod <cycle 3> [456]
                0.00    0.00      32             expand_call <cycle 3> [459]
                0.00    0.00     662             expand_mult <cycle 3> [525]
                0.00    0.00      52             gen_bge <cycle 3> [571]
                0.00    0.00      88             copy_to_reg <cycle 3> [584]
                0.00    0.00      16             emit_libcall_block <cycle 3> [595]
                0.00    0.00      32             load_register_parameters <cycle 3> [623]
                0.00    0.00      32             precompute_arguments <cycle 3> [628]
                0.00    0.00      32             precompute_register_parameters <cycle 3> [639]
                0.00    0.00    1058             jumpifnot <cycle 3> [721]
-----------------------------------------------
                0.45    0.00 8208660/42370698     count_pseudo [23]
                0.59    0.00 10827360/42370698     order_regs_for_reload [8]
                0.63    0.00 11549184/42370698     choose_reload_regs [30]
                0.64    0.00 11785494/42370698     finish_spills [25]
[13]     6.5    2.31    0.00 42370698         bitmap_bit_p [13]
-----------------------------------------------
                0.21    2.03      24/24          rest_of_compilation [5]
[14]     6.3    0.21    2.03      24         final [14]
                0.22    1.81  497630/497630      final_scan_insn [17]
                0.00    0.00      24/5328        oballoc [485]
                0.00    0.00      24/48          check_exception_handler_labels [816]
                0.00    0.00      24/24          init_insn_eh_region [861]
                0.00    0.00      24/104         init_recog [804]
                0.00    0.00      24/24          free_insn_eh_region [856]
-----------------------------------------------
                0.26    1.83   90228/90228       reload_as_needed [9]
[15]     5.9    0.26    1.83   90228         emit_reload_insns [15]
                0.05    1.49  121568/121568      gen_reload [22]
                0.11    0.02 2218446/2218446     emit_insns_before [114]
                0.04    0.00  227736/519956      rtx_equal_p [128]
                0.01    0.02   59574/59574       reg_set_p [241]
                0.02    0.00  102417/102417      reload_reg_reaches_end_p [250]
                0.01    0.01  102409/102409      push_to_sequence [265]
                0.02    0.00   35821/35957       reg_mentioned_p [270]
                0.01    0.01   35821/864560      note_stores [74]
                0.01    0.00  121568/541814      end_sequence [195]
                0.01    0.00   71642/1897960     single_set [109]
                0.00    0.00   85747/363523      get_last_insn [271]
                0.00    0.00  157389/279445      get_insns [371]
                0.00    0.00   19159/541814      start_sequence [156]
                0.00    0.00   35829/422200      find_reg_note [238]
                0.00    0.00   19159/19311       emit_insns [482]
                0.00    0.00    1262/1262        delete_output_reload [539]
-----------------------------------------------
[16]     5.8    0.39    1.66  417809+698063  <cycle 7 as a whole> [16]
                0.20    1.16  317912             build_binary_op <cycle 7> [29]
                0.15    0.00  707786             default_conversion <cycle 7> [101]
                0.01    0.06   25861             build_unary_op <cycle 7> [155]
                0.01    0.00   35813             truthvalue_conversion <cycle 7> [310]
-----------------------------------------------

Brad Lucier    lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 13:16 big slowdown in egcs-1.1.2->gcc-2.95 on alpha Brad Lucier
@ 1999-08-06 13:54 ` Joern Rennecke
  1999-08-06 16:25   ` Brad Lucier
                     ` (2 more replies)
  1999-08-31 23:20 ` Brad Lucier
  1 sibling, 3 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-08-06 13:54 UTC (permalink / raw)
  To: Brad Lucier; +Cc: gcc, gcc-bugs, lucier, staff, hosking, wilker, bernds

> If I read this correctly, it's spending a lot of time in reload
> (global_alloc isn't in the call graph, and reload is; in toplev.c, you
> see that one or the other is called, but the time is reported above as
> global_alloc either way).  Perhaps there's a problem in reload.

That's the price we pay for doing register spilling on a per-instruction
basis, and calling compute_use_by_pseudos twice for every instruction
and reload pass.

I think we could fix this by using pseudo register birth / death lists
instead of complete register sets.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 13:54 ` Joern Rennecke
@ 1999-08-06 16:25   ` Brad Lucier
  1999-08-31 23:20     ` Brad Lucier
  1999-08-06 18:06   ` Brad Lucier
  1999-08-31 23:20   ` Joern Rennecke
  2 siblings, 1 reply; 54+ messages in thread
From: Brad Lucier @ 1999-08-06 16:25 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Brad Lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds

> I think we could fix this by using pseudo register birth / death lists
> instead of complete register sets.

That would be nice; here are times comparing egcs-1.1.2 and gcc-2.9.5 on
this alpha-ev6 box, without the profiling overhead (0 times have been elided):

popov-3% /usr/lib/gcc-lib/alpha-redhat-linux/egcs-2.91.66/cc1 g0-1.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 10.890208
time in jump: 0.900848
time in flow: 2.382416
time in global-alloc: 10.848240
time in shorten-branch: 0.927200
time in final: 5.018592
time in varconst: 0.007808

popov-4% /export/u10/gcc-2.95/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.95/cc1 g0-1.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 11.073696
time in jump: 0.923296
time in flow: 3.405264
time in global-alloc: 17.851040
time in shorten-branch: 0.829600
time in final: 5.183536

Brad Lucier    lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 13:54 ` Joern Rennecke
  1999-08-06 16:25   ` Brad Lucier
@ 1999-08-06 18:06   ` Brad Lucier
  1999-08-06 18:54     ` Joern Rennecke
  1999-08-31 23:20     ` Brad Lucier
  1999-08-31 23:20   ` Joern Rennecke
  2 siblings, 2 replies; 54+ messages in thread
From: Brad Lucier @ 1999-08-06 18:06 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Brad Lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds

> That's the price we pay for doing register spilling on a per-instruction
> basis, and calling compute_use_by_pseudos twice for every instruction
> and reload pass.
> 
> I think we could fix this by using pseudo register birth / death lists
> instead of complete register sets.
> 
I made a mistake, sorry.  I didn't pass the -O1 -fPIC to cc1 when I
did the timings; reload is not the problem.  Mea culpa.

Here are the timings for the various stages of egcs-1.1.2 and gcc-2.95
with -O1 -fPIC.

popov-2% /usr/lib/gcc-lib/alpha-redhat-linux/egcs-2.91.66/cc1 -fPIC -O1 g0-1.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 10.843360
time in integration: 0.007808
time in jump: 7.701616
time in cse: 8.022720
time in loop: 0.046848
time in flow: 2.352160
time in combine: 7.864608
time in local-alloc: 2.923120
time in global-alloc: 4.692608
time in shorten-branch: 0.361120
time in final: 2.023248

popov-4% /export/u10/gcc-2.95/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.95/cc1 -fPIC -O1 g0-1.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 10.939984
time in integration: 0.000976
time in jump: 8.168144
time in cse: 5.181584
time in loop: 0.039040
time in flow: 2.695712
time in combine: 8.443376
time in local-alloc: 3.038288
time in global-alloc: 119.387248
time in flow2: 2.311168
time in shorten-branch: 0.383568
time in final: 2.120848

You can see there is a big difference in global-alloc.

Here is the beginning of the flat composite profile

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 36.61     57.68    57.68       16  3604.80  3605.63  prune_preferences
 12.47     77.32    19.64 391960848     0.00     0.00  bitmap_bit_p
  7.29     88.80    11.48   202789     0.06     0.06  record_one_conflict
  7.22    100.18    11.38       24   474.20  1233.79  build_insn_chain
  1.56    102.64     2.46        8   306.88 19628.50  yyparse
  1.35    104.77     2.13 27806760     0.00     0.00  count_pseudo
  1.32    106.85     2.08    15315     0.14     0.44  order_regs_for_reload
  1.05    108.50     1.65     1436     1.15     1.15  find_reg
  0.83    109.82     1.31  2455661     0.00     0.00  yylex

The biggest time sink seems to be the quadratic algorithm in prune_preferences
in global.c.

Again, the complete profile summary is at:

http://www.math.purdue.edu/~lucier/gmon.summary.gz

Brad Lucier     lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 18:06   ` Brad Lucier
@ 1999-08-06 18:54     ` Joern Rennecke
  1999-08-06 21:27       ` Brad Lucier
                         ` (2 more replies)
  1999-08-31 23:20     ` Brad Lucier
  1 sibling, 3 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-08-06 18:54 UTC (permalink / raw)
  To: Brad Lucier
  Cc: amylaar, lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds

> The biggest time sink seems to be the quadratic algorithm in prune_preferences
> in global.c.

Hmm, I wonder if the CONFLICT_P tests take a lot of time, or if the rest
of the operations in the inner loop cost more.

I wonder if this works:

Thu Aug  5 22:27:15 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* global.c (prune_preferences): Move some invariants out of the
	inner loop.

*** global.c	Wed Aug  4 21:33:01 1999
--- global.c-patched	Sat Aug  7 02:32:20 1999
*************** prune_preferences ()
*** 863,869 ****
  
    for (i = max_allocno - 1; i >= 0; i--)
      {
!       HARD_REG_SET temp;
  
        allocno = allocno_order[i];
        COPY_HARD_REG_SET (temp, hard_reg_conflicts[allocno]);
--- 863,869 ----
  
    for (i = max_allocno - 1; i >= 0; i--)
      {
!       HARD_REG_SET temp, temp2;
  
        allocno = allocno_order[i];
        COPY_HARD_REG_SET (temp, hard_reg_conflicts[allocno]);
*************** prune_preferences ()
*** 881,905 ****
        AND_COMPL_HARD_REG_SET (hard_reg_copy_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_full_preferences[allocno], temp);
  
-       CLEAR_HARD_REG_SET (regs_someone_prefers[allocno]);
- 
        /* Merge in the preferences of lower-priority registers (they have
  	 already been pruned).  If we also prefer some of those registers,
  	 don't exclude them unless we are of a smaller size (in which case
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
        for (j = i + 1; j < max_allocno; j++)
  	if (CONFLICTP (allocno, allocno_order[j])
  	    || CONFLICTP (allocno_order[j], allocno))
  	  {
- 	    COPY_HARD_REG_SET (temp,
- 			       hard_reg_full_preferences[allocno_order[j]]);
  	    if (allocno_size[allocno_order[j]] <= allocno_size[allocno])
! 	      AND_COMPL_HARD_REG_SET (temp,
! 				      hard_reg_full_preferences[allocno]);
! 			       
! 	    IOR_HARD_REG_SET (regs_someone_prefers[allocno], temp);
  	  }
      }
  }
  \f
--- 881,907 ----
        AND_COMPL_HARD_REG_SET (hard_reg_copy_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_full_preferences[allocno], temp);
  
        /* Merge in the preferences of lower-priority registers (they have
  	 already been pruned).  If we also prefer some of those registers,
  	 don't exclude them unless we are of a smaller size (in which case
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
+       CLEAR_HARD_REG_SET (temp);
+       CLEAR_HARD_REG_SET (temp2);
        for (j = i + 1; j < max_allocno; j++)
  	if (CONFLICTP (allocno, allocno_order[j])
  	    || CONFLICTP (allocno_order[j], allocno))
  	  {
  	    if (allocno_size[allocno_order[j]] <= allocno_size[allocno])
! 	      IOR_HARD_REG_SET (temp,
! 				hard_reg_full_preferences[allocno_order[j]]);
! 	    else
! 	      IOR_HARD_REG_SET (temp2,
! 				hard_reg_full_preferences[allocno_order[j]]);
  	  }
+       AND_COMPL_HARD_REG_SET (temp, hard_reg_full_preferences[allocno]);
+       IOR_HARD_REG_SET (temp, temp2);
+       COPY_HARD_REG_SET (regs_someone_prefers[allocno], temp);
      }
  }
  \f

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 18:54     ` Joern Rennecke
@ 1999-08-06 21:27       ` Brad Lucier
  1999-08-07  1:04         ` Richard Henderson
  1999-08-31 23:20         ` Brad Lucier
  1999-08-11  1:57       ` Jeffrey A Law
  1999-08-31 23:20       ` Joern Rennecke
  2 siblings, 2 replies; 54+ messages in thread
From: Brad Lucier @ 1999-08-06 21:27 UTC (permalink / raw)
  To: Joern Rennecke
  Cc: Brad Lucier, amylaar, gcc, gcc-bugs, staff, hosking, wilker, bernds

> 
> > The biggest time sink seems to be the quadratic algorithm in prune_preferences
> > in global.c.
> 
> Hmm, I wonder if the CONFLICT_P tests take a lot of time, or if the rest
> of the operations in the inner loop cost more.
> 
> I wonder if this works:
> 
Sorry, no, it made little difference:

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 35.63     55.32    55.32       16  3457.52  3458.44  prune_preferences
 12.51     74.75    19.43 391960848     0.00     0.00  bitmap_bit_p
  7.40     86.24    11.49   202789     0.06     0.06  record_one_conflict
  7.20     97.41    11.17       24   465.58  1218.03  build_insn_chain
  1.55     99.83     2.41        8   301.76 19347.54  yyparse
  1.49    102.13     2.31 27806760     0.00     0.00  count_pseudo

Here are some more details about global_alloc, prune_preferences,
and build_insn_chain.

-----------------------------------------------
                0.08   99.53      24/24          rest_of_compilation [5]
[6]     64.1    0.08   99.53      24         global_alloc [6]
               55.32    0.01      16/16          prune_preferences [7]
               11.17   18.06      24/24          build_insn_chain [8]
                0.24   10.63      24/24          reload [13]
                0.12    2.22      16/16          global_conflicts [33]
                1.66    0.00    1436/1436        find_reg [39]
                0.08    0.01      16/16          expand_preferences [256]
                0.00    0.00      24/220059      xmalloc [479]
                0.00    0.00      48/5488        get_insns [995]
-----------------------------------------------
               55.32    0.01      16/16          global_alloc [6]
[7]     35.6   55.32    0.01      16         prune_preferences [7]
                0.01    0.00   78844/334068      reg_preferred_class [306]
-----------------------------------------------
               11.17   18.06      24/24          global_alloc [6]
[8]     18.8   11.17   18.06      24         build_insn_chain [8]
               17.40    0.00 350981800/391960848     bitmap_bit_p [9]
                0.02    0.43  174187/5402847     note_stores [10]
                0.08    0.01  194131/194131      new_insn_chain [263]
                0.05    0.00  388262/574314      bitmap_copy [293]
                0.04    0.00  574446/16720630     bitmap_set_bit [48]
                0.01    0.02   88014/88014       reg_dies [385]
                0.00    0.00   34596/3151441     bitmap_clear [204]
                0.00    0.00      24/1004613     bitmap_initialize [401]
-----------------------------------------------

Brad Lucier     lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 21:27       ` Brad Lucier
@ 1999-08-07  1:04         ` Richard Henderson
  1999-08-07  8:08           ` Brad Lucier
                             ` (2 more replies)
  1999-08-31 23:20         ` Brad Lucier
  1 sibling, 3 replies; 54+ messages in thread
From: Richard Henderson @ 1999-08-07  1:04 UTC (permalink / raw)
  To: Brad Lucier
  Cc: Joern Rennecke, gcc, gcc-bugs, staff, hosking, wilker, bernds,
	gcc-patches

On Fri, Aug 06, 1999 at 11:27:32PM -0500, Brad Lucier wrote:
> Each sample counts as 0.000976562 seconds.
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name    
>  35.63     55.32    55.32       16  3457.52  3458.44  prune_preferences
>  12.51     74.75    19.43 391960848     0.00     0.00  bitmap_bit_p

The following should kill bitmap_bit_p from the interesting
functions.


r~



	* global.c (build_insn_chain): Use EXECUTE_IF_SET_IN_REG_SET 
	to invert loops.  Simplify block scanning.

Index: global.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/global.c,v
retrieving revision 1.28
diff -u -p -d -r1.28 global.c
--- global.c	1999/08/04 07:50:08	1.28
+++ global.c	1999/08/07 07:58:44
@@ -271,7 +271,7 @@ static void set_preference	PROTO((rtx, r
 static void dump_conflicts	PROTO((FILE *));
 static void reg_becomes_live	PROTO((rtx, rtx));
 static void reg_dies		PROTO((int, enum machine_mode));
-static void build_insn_chain	PROTO((rtx));
+static void build_insn_chain	PROTO((void));
 \f
 /* Perform allocation of pseudo-registers not allocated by local_alloc.
    FILE is a file to output debugging information on,
@@ -578,7 +578,7 @@ global_alloc (file)
   if (n_basic_blocks > 0)
 #endif
     {
-      build_insn_chain (get_insns ());
+      build_insn_chain ();
       retval = reload (get_insns (), 1, file);
     }
 
@@ -1667,96 +1667,88 @@ reg_dies (regno, mode)
 /* Walk the insns of the current function and build reload_insn_chain,
    and record register life information.  */
 static void
-build_insn_chain (first)
-     rtx first;
+build_insn_chain ()
 {
   struct insn_chain **p = &reload_insn_chain;
   struct insn_chain *prev = 0;
-  int b = 0;
+  int b;
 
   live_relevant_regs = ALLOCA_REG_SET ();
 
-  for (; first; first = NEXT_INSN (first))
+  for (b = n_basic_blocks - 1; b >= 0; --b)
     {
-      struct insn_chain *c;
-
-      if (first == BLOCK_HEAD (b))
-	{
-	  int i;
-	  CLEAR_REG_SET (live_relevant_regs);
-	  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-	    if (REGNO_REG_SET_P (BASIC_BLOCK (b)->global_live_at_start, i)
-		&& ! TEST_HARD_REG_BIT (eliminable_regset, i))
-	      SET_REGNO_REG_SET (live_relevant_regs, i);
+      basic_block bb = BASIC_BLOCK (b);
+      rtx insn, end;
+      int i;
 
-	  for (; i < max_regno; i++)
-	    if (reg_renumber[i] >= 0
-		&& REGNO_REG_SET_P (BASIC_BLOCK (b)->global_live_at_start, i))
-	      SET_REGNO_REG_SET (live_relevant_regs, i);
-	}
+      CLEAR_REG_SET (live_relevant_regs);
 
-      if (GET_CODE (first) != NOTE && GET_CODE (first) != BARRIER)
+      EXECUTE_IF_SET_IN_REG_SET (bb->global_live_at_start, 0, i,
 	{
-	  c = new_insn_chain ();
-	  c->prev = prev;
-	  prev = c;
-	  *p = c;
-	  p = &c->next;
-	  c->insn = first;
-	  c->block = b;
+	  if ((i < FIRST_PSEUDO_REGISTER
+	       && ! TEST_HARD_REG_BIT (eliminable_regset, i))
+	      || (i >= FIRST_PSEUDO_REGISTER
+		  && reg_renumber[i] >= 0))
+	    SET_REGNO_REG_SET (live_relevant_regs, i);
+	});
 
-	  COPY_REG_SET (c->live_before, live_relevant_regs);
+      insn = bb->head, end = bb->end;
+      while (1)
+	{
+	  struct insn_chain *c;
 
-	  if (GET_RTX_CLASS (GET_CODE (first)) == 'i')
+	  if (GET_CODE (insn) != NOTE)
 	    {
-	      rtx link;
-
-	      /* Mark the death of everything that dies in this instruction.  */
+	      c = new_insn_chain ();
+	      c->prev = prev;
+	      prev = c;
+	      *p = c;
+	      p = &c->next;
+	      c->insn = insn;
+	      c->block = b;
 
-	      for (link = REG_NOTES (first); link; link = XEXP (link, 1))
-		if (REG_NOTE_KIND (link) == REG_DEAD
-		    && GET_CODE (XEXP (link, 0)) == REG)
-		  reg_dies (REGNO (XEXP (link, 0)), GET_MODE (XEXP (link, 0)));
+	      COPY_REG_SET (c->live_before, live_relevant_regs);
 
-	      /* Mark everything born in this instruction as live.  */
+	      if (GET_RTX_CLASS (GET_CODE (insn)) == 'i')
+		{
+		  rtx link;
 
-	      note_stores (PATTERN (first), reg_becomes_live);
-	    }
+		  /* Mark the death of everything that dies in this
+		     instruction.  */
+		  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+		    if (REG_NOTE_KIND (link) == REG_DEAD
+			&& GET_CODE (XEXP (link, 0)) == REG)
+		      reg_dies (REGNO (XEXP (link, 0)),
+				GET_MODE (XEXP (link, 0)));
 
-	  /* Remember which registers are live at the end of the insn, before
-	     killing those with REG_UNUSED notes.  */
-	  COPY_REG_SET (c->live_after, live_relevant_regs);
+		  /* Mark everything born in this instruction as live.  */
+		  note_stores (PATTERN (insn), reg_becomes_live);
+		}
 
-	  if (GET_RTX_CLASS (GET_CODE (first)) == 'i')
-	    {
-	      rtx link;
+	      /* Remember which registers are live at the end of the insn,
+		 before killing those with REG_UNUSED notes.  */
+	      COPY_REG_SET (c->live_after, live_relevant_regs);
 
-	      /* Mark anything that is set in this insn and then unused as dying.  */
+	      if (GET_RTX_CLASS (GET_CODE (insn)) == 'i')
+		{
+		  rtx link;
 
-	      for (link = REG_NOTES (first); link; link = XEXP (link, 1))
-		if (REG_NOTE_KIND (link) == REG_UNUSED
-		    && GET_CODE (XEXP (link, 0)) == REG)
-		  reg_dies (REGNO (XEXP (link, 0)), GET_MODE (XEXP (link, 0)));
+		  /* Mark anything that is set in this insn and then unused
+		     as dying.  */
+		  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+		    if (REG_NOTE_KIND (link) == REG_UNUSED
+			&& GET_CODE (XEXP (link, 0)) == REG)
+		      reg_dies (REGNO (XEXP (link, 0)),
+				GET_MODE (XEXP (link, 0)));
+		}
 	    }
-	}
-
-      if (first == BLOCK_END (b))
-	b++;
 
-      /* Stop after we pass the end of the last basic block.  Verify that
-	 no real insns are after the end of the last basic block.
-
-	 We may want to reorganize the loop somewhat since this test should
-	 always be the right exit test.  */
-      if (b == n_basic_blocks)
-	{
-	  for (first = NEXT_INSN (first) ; first; first = NEXT_INSN (first))
-	    if (GET_RTX_CLASS (GET_CODE (first)) == 'i'
-		&& GET_CODE (PATTERN (first)) != USE)
-	      abort ();
-	  break;
+	  if (insn == end)
+	    break;
+	  insn = NEXT_INSN (insn);
 	}
     }
+
   FREE_REG_SET (live_relevant_regs);
   *p = 0;
 }

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-07  1:04         ` Richard Henderson
@ 1999-08-07  8:08           ` Brad Lucier
  1999-08-31 23:20             ` Brad Lucier
  1999-08-07  8:27           ` Brad Lucier
  1999-08-31 23:20           ` Richard Henderson
  2 siblings, 1 reply; 54+ messages in thread
From: Brad Lucier @ 1999-08-07  8:08 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Brad Lucier, Joern Rennecke, gcc, gcc-bugs, staff, hosking,
	wilker, bernds, gcc-patches

> The following should kill bitmap_bit_p from the interesting
> functions.

I'm leaving town in about an hour, and won't be able to test this
patch until I get e-mail access or I get back on the 16th.

The *.i files I'm using to test 'cc1 -fPIC -O1' are at

http://www.math.purdue.edu/~lucier/testcases.tar.gz

in case anyone else wants to try them.

Brad Lucier     lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-07  1:04         ` Richard Henderson
  1999-08-07  8:08           ` Brad Lucier
@ 1999-08-07  8:27           ` Brad Lucier
  1999-08-31 23:20             ` Brad Lucier
  1999-11-03 22:31             ` Joern Rennecke
  1999-08-31 23:20           ` Richard Henderson
  2 siblings, 2 replies; 54+ messages in thread
From: Brad Lucier @ 1999-08-07  8:27 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Brad Lucier, Joern Rennecke, gcc, gcc-bugs, staff, hosking,
	wilker, bernds, gcc-patches

> 
> On Fri, Aug 06, 1999 at 11:27:32PM -0500, Brad Lucier wrote:
> > Each sample counts as 0.000976562 seconds.
> >   %   cumulative   self              self     total           
> >  time   seconds   seconds    calls  ms/call  ms/call  name    
> >  35.63     55.32    55.32       16  3457.52  3458.44  prune_preferences
> >  12.51     74.75    19.43 391960848     0.00     0.00  bitmap_bit_p
> 
> The following should kill bitmap_bit_p from the interesting
> functions.

Right on ...

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 44.25     56.94    56.94       16  3558.59  3559.59  prune_preferences
  8.93     68.43    11.49   202789     0.06     0.06  record_one_conflict
  2.05     71.07     2.64 40979048     0.00     0.00  bitmap_bit_p
  1.88     73.49     2.42        8   303.10 16018.62  yyparse
  1.67     75.65     2.15 27806760     0.00     0.00  count_pseudo
  1.58     77.68     2.03    15315     0.13     0.47  order_regs_for_reload

Brad Lucier    lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 18:54     ` Joern Rennecke
  1999-08-06 21:27       ` Brad Lucier
@ 1999-08-11  1:57       ` Jeffrey A Law
  1999-08-12 16:25         ` Joern Rennecke
  1999-08-31 23:20         ` Jeffrey A Law
  1999-08-31 23:20       ` Joern Rennecke
  2 siblings, 2 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-08-11  1:57 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Brad Lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds

  In message < 199908070139.CAA18804@phal.cygnus.co.uk >you write:
  > Hmm, I wonder if the CONFLICT_P tests take a lot of time, or if the rest
  > of the operations in the inner loop cost more.
  > 
  > I wonder if this works:
  > 
  > Thu Aug  5 22:27:15 1999  J"orn Rennecke <amylaar@cygnus.co.uk>
  > 
  > 	* global.c (prune_preferences): Move some invariants out of the
  > 	inner loop.
This patch is fine, even if it did not make a significant difference for
the compile time slowdown in question.   Thanks.
jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-11  1:57       ` Jeffrey A Law
@ 1999-08-12 16:25         ` Joern Rennecke
  1999-08-31 23:20           ` Joern Rennecke
  1999-08-31 23:20         ` Jeffrey A Law
  1 sibling, 1 reply; 54+ messages in thread
From: Joern Rennecke @ 1999-08-12 16:25 UTC (permalink / raw)
  To: law; +Cc: amylaar, lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds

>   > 	* global.c (prune_preferences): Move some invariants out of the
>   > 	inner loop.
> This patch is fine, even if it did not make a significant difference for
> the compile time slowdown in question.   Thanks.

I have tested and comitted the patch.

I think I understand the bottleneck in prune_preferences better.  It is
indeed the CONFLICTP check.

We can expect the conflicts to be sparse, because typically we have a lot
of pseudos that live only during a single basic block or during a loop.

Moreover, CONFLICTS are naturally symmetric, yet as the comment for
conflicts says:
   `conflicts' is not symmetric; a conflict between allocno's i and j
   is recorded either in element i,j or in element j,i.  */
All CONFLICT_P tests are therefore duplicated, thus causing yet more
unnecessary overhead.   `conflicts' is not symmetric; a conflict between allocno's i and j
   is recorded either in element i,j or in element j,i.  */

So, I think we can get better compiling performance for large programs
if conflicts allocnos_live use a representation that is efficient for
sparse sets, e.g. a hash table or a balanced tree.
Moreover, it makes sense to normalize conflicts before we store them,
e.g. we could say CONFLICTP (i, j) is normalized if i <= j.  If the
conflict is not normalized, swapping the operands will normalize it.

Or if we go for hash tables and want to encourage a more uniform distribution
of has table filling, we could say CONFLICTP (i, j) is normalized if
(unsigned)(j - i) % n <= n >> 1 , where n is the total number of allocnos.
This does not work for even allocnos, so we just bump up n to an odd number.

The implementation is left as an exercise to the reader ;-)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 13:16 big slowdown in egcs-1.1.2->gcc-2.95 on alpha Brad Lucier
  1999-08-06 13:54 ` Joern Rennecke
@ 1999-08-31 23:20 ` Brad Lucier
  1 sibling, 0 replies; 54+ messages in thread
From: Brad Lucier @ 1999-08-31 23:20 UTC (permalink / raw)
  To: gcc, gcc-bugs, lucier; +Cc: staff, hosking, wilker

To follow up on my comments that gcc-2.95 is much slower than egcs-1.1.2
on the alpha-ev6 for some files, here are some timing data for a profiled
cc1 on alphaev6-unknown-linux-gnu with glibc-2.1.1, binutils-2.9.5.0.4,
and kernel 2.2.10.

Because the various timings on this machine are screwed up, I don't know
how to interpret the information precisely.  But it can give you a
an idea of the relative times that various parts of the process took.

gcc was called with

gcc -fPIC -save-temps -O1

on eight relatively large files (basically, each of these files
contains a 25,000+ line procedure to be compiled, with a total
of 18 local variables and one argument).

The output from cc1 on the first of these files (which is typical) is:

 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 21.472976
time in integration: 0.000000
time in jump: 1.503040
time in cse: 0.000000
time in gcse: 0.000000
time in loop: 0.000000
time in cse2: 0.000000
time in branch-prob: 0.000000
time in flow: 5.866736
time in combine: 0.000000
time in regmove: 0.000000
time in sched: 0.000000
time in local-alloc: 0.000000
time in global-alloc: 44.732032
time in flow2: 0.000000
time in sched2: 0.000000
time in shorten-branch: 1.636752
time in stack-reg: 0.000000
time in final: 7.841184
time in varconst: 0.031232
time in symout: 0.000000
time in dump: 0.000000

If I read this correctly, it's spending a lot of time in reload
(global_alloc isn't in the call graph, and reload is; in toplev.c, you
see that one or the other is called, but the time is reported above as
global_alloc either way).  Perhaps there's a problem in reload.

The gprof output file can be found at

http://www.math.purdue.edu/~lucier/gmon.summary.gz

The summary information from gprof for cc1 begins:

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  7.74      2.74     2.74    90228     0.03     0.06  order_regs_for_reload
  6.94      5.21     2.46   358919     0.01     0.01  find_reloads
  6.82      7.62     2.42        8   302.49  4312.47  yyparse
  6.51      9.93     2.31 42370698     0.00     0.00  bitmap_bit_p
  3.64     11.22     1.29  2455661     0.00     0.00  yylex
  2.60     12.15     0.92   302171     0.00     0.00  record_reg_classes
  2.50     13.03     0.89                             hard_reg_use_compare
  2.32     13.85     0.82       24    34.26    58.61  stupid_life_analysis
  1.89     14.52     0.67   512830     0.00     0.00  for_each_rtx
  1.65     15.11     0.59  8208660     0.00     0.00  count_pseudo

Some selected information from the call graph about reload, which seems
to take a long time:

-----------------------------------------------
                2.74    2.27   90228/90228       find_reload_regs [7]
[8]     14.1    2.74    2.27   90228         order_regs_for_reload [8]
                0.59    0.91 8208660/8208660     count_pseudo [23]
                0.59    0.00 10827360/42370698     bitmap_bit_p [13]
                0.18    0.00 5503908/5503956     bitmap_clear [93]
                0.00    0.00   90228/704750      bitmap_initialize [259]
-----------------------------------------------
                0.17    4.56      16/16          reload [6]
[9]     13.3    0.17    4.56      16         reload_as_needed [9]
                0.26    1.83   90228/90228       emit_reload_insns [15]
                0.38    0.95   90228/90228       choose_reload_regs [30]
                0.70    0.29  102504/358919      find_reloads [11]
                0.04    0.04  256335/864560      note_stores [74]
                0.02    0.00   90228/90228       subst_reloads [247]
                0.00    0.02   12276/268691      eliminate_regs_in_insn [52]
                0.01    0.00   50798/84370       set_offsets_for_label [312]
                0.01    0.00   90228/2814858     asm_noperands [96]
                0.00    0.00   12276/268691      update_eliminable_offsets [216]
                0.00    0.00      16/40          set_initial_elim_offsets [475]
-----------------------------------------------
                0.21    3.32      24/24          reload [6]
[10]    10.0    0.21    3.32      24         calculate_needs_all_insns [10]
                1.76    0.74  256415/358919      find_reloads [11]
                0.06    0.42  256415/268691      eliminate_regs_in_insn [52]
                0.21    0.00   90228/90228       calculate_needs [83]
                0.07    0.01  122572/122572      set_label_offsets [146]
                0.03    0.00  256415/268691      update_eliminable_offsets [216]
                0.02    0.00  256415/1897960     single_set [109]
-----------------------------------------------
                0.70    0.29  102504/358919      reload_as_needed [9]
                1.76    0.74  256415/358919      calculate_needs_all_insns [10]
[11]     9.8    2.46    1.03  358919         find_reloads [11]
                0.19    0.14  219242/243152      push_reload [57]
                0.09    0.14  358911/1750922     extract_insn [33]
                0.13    0.00 1993307/3091999     reg_fits_class_p [88]
                0.08    0.02  310922/310922      combine_reloads [123]
                0.03    0.08   92668/92668       find_reloads_address [125]
                0.08    0.00 1692578/1910746     reg_class_subset_p [136]
                0.03    0.00  358919/1897960     single_set [109]
                0.02    0.00  237120/237388      reg_alternate_class [281]
                0.01    0.00  237120/355215      reg_preferred_class [311]
                0.01    0.00   97392/304810      normal_memory_operand [268]
                0.00    0.00     676/1014        zap_mask [724]
-----------------------------------------------
[12]     6.9    0.92    1.51  218511+1228287 <cycle 3 as a whole> [12]
                0.32    0.42  267378+208665      expand_expr <cycle 3> [40]
                0.10    0.27  246016             gen_movdi <cycle 3> [58]
                0.09    0.14   56286             expand_binop <cycle 3> [80]
                0.06    0.10  246688             emit_move_insn_1 <cycle 3> [99]
                0.02    0.13   25803             do_jump_for_compare <cycle 3> [106]
                0.08    0.05   65220             store_expr <cycle 3> [113]
                0.04    0.07  125120             emit_move_insn <cycle 3> [120]
                0.03    0.06   63104             expand_assignment <cycle 3> [130]
                0.03    0.04   73195             memory_address <cycle 3> [148]
                0.03    0.05   25811+2040        emit_cmp_insn <cycle 3> [149]
                0.03    0.02   25793+2040        do_jump <cycle 3> [181]
                0.01    0.03   25811             alpha_emit_conditional_branch <cycle 3> [188]
                0.01    0.02   29226             copy_to_mode_reg <cycle 3> [209]
                0.01    0.02   28480             force_reg <cycle 3> [234]
                0.01    0.01   32680             change_address <cycle 3> [240]
                0.00    0.02   10850             gen_ble <cycle 3> [246]
                0.01    0.01   25803             compare_from_rtx <cycle 3> [264]
                0.00    0.01    5233             gen_bgt <cycle 3> [300]
                0.01    0.00   23795             compare <cycle 3> [305]
                0.00    0.01    4997             gen_bne <cycle 3> [329]
                0.01    0.00   10500+37504       force_operand <cycle 3> [349]
                0.01    0.00    9808             expand_shift <cycle 3> [366]
                0.00    0.00    2008             gen_bgtu <cycle 3> [392]
                0.00    0.00    2072             emit_unop_insn <cycle 3> [394]
                0.00    0.00    1973             gen_beq <cycle 3> [407]
                0.00    0.00    4088             convert_modes <cycle 3> [409]
                0.00    0.00    2080             convert_move <cycle 3> [426]
                0.00    0.00     698             gen_blt <cycle 3> [431]
                0.00    0.00     140             expand_divmod <cycle 3> [456]
                0.00    0.00      32             expand_call <cycle 3> [459]
                0.00    0.00     662             expand_mult <cycle 3> [525]
                0.00    0.00      52             gen_bge <cycle 3> [571]
                0.00    0.00      88             copy_to_reg <cycle 3> [584]
                0.00    0.00      16             emit_libcall_block <cycle 3> [595]
                0.00    0.00      32             load_register_parameters <cycle 3> [623]
                0.00    0.00      32             precompute_arguments <cycle 3> [628]
                0.00    0.00      32             precompute_register_parameters <cycle 3> [639]
                0.00    0.00    1058             jumpifnot <cycle 3> [721]
-----------------------------------------------
                0.45    0.00 8208660/42370698     count_pseudo [23]
                0.59    0.00 10827360/42370698     order_regs_for_reload [8]
                0.63    0.00 11549184/42370698     choose_reload_regs [30]
                0.64    0.00 11785494/42370698     finish_spills [25]
[13]     6.5    2.31    0.00 42370698         bitmap_bit_p [13]
-----------------------------------------------
                0.21    2.03      24/24          rest_of_compilation [5]
[14]     6.3    0.21    2.03      24         final [14]
                0.22    1.81  497630/497630      final_scan_insn [17]
                0.00    0.00      24/5328        oballoc [485]
                0.00    0.00      24/48          check_exception_handler_labels [816]
                0.00    0.00      24/24          init_insn_eh_region [861]
                0.00    0.00      24/104         init_recog [804]
                0.00    0.00      24/24          free_insn_eh_region [856]
-----------------------------------------------
                0.26    1.83   90228/90228       reload_as_needed [9]
[15]     5.9    0.26    1.83   90228         emit_reload_insns [15]
                0.05    1.49  121568/121568      gen_reload [22]
                0.11    0.02 2218446/2218446     emit_insns_before [114]
                0.04    0.00  227736/519956      rtx_equal_p [128]
                0.01    0.02   59574/59574       reg_set_p [241]
                0.02    0.00  102417/102417      reload_reg_reaches_end_p [250]
                0.01    0.01  102409/102409      push_to_sequence [265]
                0.02    0.00   35821/35957       reg_mentioned_p [270]
                0.01    0.01   35821/864560      note_stores [74]
                0.01    0.00  121568/541814      end_sequence [195]
                0.01    0.00   71642/1897960     single_set [109]
                0.00    0.00   85747/363523      get_last_insn [271]
                0.00    0.00  157389/279445      get_insns [371]
                0.00    0.00   19159/541814      start_sequence [156]
                0.00    0.00   35829/422200      find_reg_note [238]
                0.00    0.00   19159/19311       emit_insns [482]
                0.00    0.00    1262/1262        delete_output_reload [539]
-----------------------------------------------
[16]     5.8    0.39    1.66  417809+698063  <cycle 7 as a whole> [16]
                0.20    1.16  317912             build_binary_op <cycle 7> [29]
                0.15    0.00  707786             default_conversion <cycle 7> [101]
                0.01    0.06   25861             build_unary_op <cycle 7> [155]
                0.01    0.00   35813             truthvalue_conversion <cycle 7> [310]
-----------------------------------------------

Brad Lucier    lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-07  8:08           ` Brad Lucier
@ 1999-08-31 23:20             ` Brad Lucier
  0 siblings, 0 replies; 54+ messages in thread
From: Brad Lucier @ 1999-08-31 23:20 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Brad Lucier, Joern Rennecke, gcc, gcc-bugs, staff, hosking,
	wilker, bernds, gcc-patches

> The following should kill bitmap_bit_p from the interesting
> functions.

I'm leaving town in about an hour, and won't be able to test this
patch until I get e-mail access or I get back on the 16th.

The *.i files I'm using to test 'cc1 -fPIC -O1' are at

http://www.math.purdue.edu/~lucier/testcases.tar.gz

in case anyone else wants to try them.

Brad Lucier     lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 18:54     ` Joern Rennecke
  1999-08-06 21:27       ` Brad Lucier
  1999-08-11  1:57       ` Jeffrey A Law
@ 1999-08-31 23:20       ` Joern Rennecke
  2 siblings, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-08-31 23:20 UTC (permalink / raw)
  To: Brad Lucier
  Cc: amylaar, lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds

> The biggest time sink seems to be the quadratic algorithm in prune_preferences
> in global.c.

Hmm, I wonder if the CONFLICT_P tests take a lot of time, or if the rest
of the operations in the inner loop cost more.

I wonder if this works:

Thu Aug  5 22:27:15 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* global.c (prune_preferences): Move some invariants out of the
	inner loop.

*** global.c	Wed Aug  4 21:33:01 1999
--- global.c-patched	Sat Aug  7 02:32:20 1999
*************** prune_preferences ()
*** 863,869 ****
  
    for (i = max_allocno - 1; i >= 0; i--)
      {
!       HARD_REG_SET temp;
  
        allocno = allocno_order[i];
        COPY_HARD_REG_SET (temp, hard_reg_conflicts[allocno]);
--- 863,869 ----
  
    for (i = max_allocno - 1; i >= 0; i--)
      {
!       HARD_REG_SET temp, temp2;
  
        allocno = allocno_order[i];
        COPY_HARD_REG_SET (temp, hard_reg_conflicts[allocno]);
*************** prune_preferences ()
*** 881,905 ****
        AND_COMPL_HARD_REG_SET (hard_reg_copy_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_full_preferences[allocno], temp);
  
-       CLEAR_HARD_REG_SET (regs_someone_prefers[allocno]);
- 
        /* Merge in the preferences of lower-priority registers (they have
  	 already been pruned).  If we also prefer some of those registers,
  	 don't exclude them unless we are of a smaller size (in which case
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
        for (j = i + 1; j < max_allocno; j++)
  	if (CONFLICTP (allocno, allocno_order[j])
  	    || CONFLICTP (allocno_order[j], allocno))
  	  {
- 	    COPY_HARD_REG_SET (temp,
- 			       hard_reg_full_preferences[allocno_order[j]]);
  	    if (allocno_size[allocno_order[j]] <= allocno_size[allocno])
! 	      AND_COMPL_HARD_REG_SET (temp,
! 				      hard_reg_full_preferences[allocno]);
! 			       
! 	    IOR_HARD_REG_SET (regs_someone_prefers[allocno], temp);
  	  }
      }
  }
  \f
--- 881,907 ----
        AND_COMPL_HARD_REG_SET (hard_reg_copy_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_full_preferences[allocno], temp);
  
        /* Merge in the preferences of lower-priority registers (they have
  	 already been pruned).  If we also prefer some of those registers,
  	 don't exclude them unless we are of a smaller size (in which case
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
+       CLEAR_HARD_REG_SET (temp);
+       CLEAR_HARD_REG_SET (temp2);
        for (j = i + 1; j < max_allocno; j++)
  	if (CONFLICTP (allocno, allocno_order[j])
  	    || CONFLICTP (allocno_order[j], allocno))
  	  {
  	    if (allocno_size[allocno_order[j]] <= allocno_size[allocno])
! 	      IOR_HARD_REG_SET (temp,
! 				hard_reg_full_preferences[allocno_order[j]]);
! 	    else
! 	      IOR_HARD_REG_SET (temp2,
! 				hard_reg_full_preferences[allocno_order[j]]);
  	  }
+       AND_COMPL_HARD_REG_SET (temp, hard_reg_full_preferences[allocno]);
+       IOR_HARD_REG_SET (temp, temp2);
+       COPY_HARD_REG_SET (regs_someone_prefers[allocno], temp);
      }
  }
  \f

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-07  8:27           ` Brad Lucier
@ 1999-08-31 23:20             ` Brad Lucier
  1999-11-03 22:31             ` Joern Rennecke
  1 sibling, 0 replies; 54+ messages in thread
From: Brad Lucier @ 1999-08-31 23:20 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Brad Lucier, Joern Rennecke, gcc, gcc-bugs, staff, hosking,
	wilker, bernds, gcc-patches

> 
> On Fri, Aug 06, 1999 at 11:27:32PM -0500, Brad Lucier wrote:
> > Each sample counts as 0.000976562 seconds.
> >   %   cumulative   self              self     total           
> >  time   seconds   seconds    calls  ms/call  ms/call  name    
> >  35.63     55.32    55.32       16  3457.52  3458.44  prune_preferences
> >  12.51     74.75    19.43 391960848     0.00     0.00  bitmap_bit_p
> 
> The following should kill bitmap_bit_p from the interesting
> functions.

Right on ...

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 44.25     56.94    56.94       16  3558.59  3559.59  prune_preferences
  8.93     68.43    11.49   202789     0.06     0.06  record_one_conflict
  2.05     71.07     2.64 40979048     0.00     0.00  bitmap_bit_p
  1.88     73.49     2.42        8   303.10 16018.62  yyparse
  1.67     75.65     2.15 27806760     0.00     0.00  count_pseudo
  1.58     77.68     2.03    15315     0.13     0.47  order_regs_for_reload

Brad Lucier    lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-11  1:57       ` Jeffrey A Law
  1999-08-12 16:25         ` Joern Rennecke
@ 1999-08-31 23:20         ` Jeffrey A Law
  1 sibling, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-08-31 23:20 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Brad Lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds

  In message < 199908070139.CAA18804@phal.cygnus.co.uk >you write:
  > Hmm, I wonder if the CONFLICT_P tests take a lot of time, or if the rest
  > of the operations in the inner loop cost more.
  > 
  > I wonder if this works:
  > 
  > Thu Aug  5 22:27:15 1999  J"orn Rennecke <amylaar@cygnus.co.uk>
  > 
  > 	* global.c (prune_preferences): Move some invariants out of the
  > 	inner loop.
This patch is fine, even if it did not make a significant difference for
the compile time slowdown in question.   Thanks.
jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-07  1:04         ` Richard Henderson
  1999-08-07  8:08           ` Brad Lucier
  1999-08-07  8:27           ` Brad Lucier
@ 1999-08-31 23:20           ` Richard Henderson
  2 siblings, 0 replies; 54+ messages in thread
From: Richard Henderson @ 1999-08-31 23:20 UTC (permalink / raw)
  To: Brad Lucier
  Cc: Joern Rennecke, gcc, gcc-bugs, staff, hosking, wilker, bernds,
	gcc-patches

On Fri, Aug 06, 1999 at 11:27:32PM -0500, Brad Lucier wrote:
> Each sample counts as 0.000976562 seconds.
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name    
>  35.63     55.32    55.32       16  3457.52  3458.44  prune_preferences
>  12.51     74.75    19.43 391960848     0.00     0.00  bitmap_bit_p

The following should kill bitmap_bit_p from the interesting
functions.


r~



	* global.c (build_insn_chain): Use EXECUTE_IF_SET_IN_REG_SET 
	to invert loops.  Simplify block scanning.

Index: global.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/global.c,v
retrieving revision 1.28
diff -u -p -d -r1.28 global.c
--- global.c	1999/08/04 07:50:08	1.28
+++ global.c	1999/08/07 07:58:44
@@ -271,7 +271,7 @@ static void set_preference	PROTO((rtx, r
 static void dump_conflicts	PROTO((FILE *));
 static void reg_becomes_live	PROTO((rtx, rtx));
 static void reg_dies		PROTO((int, enum machine_mode));
-static void build_insn_chain	PROTO((rtx));
+static void build_insn_chain	PROTO((void));
 \f
 /* Perform allocation of pseudo-registers not allocated by local_alloc.
    FILE is a file to output debugging information on,
@@ -578,7 +578,7 @@ global_alloc (file)
   if (n_basic_blocks > 0)
 #endif
     {
-      build_insn_chain (get_insns ());
+      build_insn_chain ();
       retval = reload (get_insns (), 1, file);
     }
 
@@ -1667,96 +1667,88 @@ reg_dies (regno, mode)
 /* Walk the insns of the current function and build reload_insn_chain,
    and record register life information.  */
 static void
-build_insn_chain (first)
-     rtx first;
+build_insn_chain ()
 {
   struct insn_chain **p = &reload_insn_chain;
   struct insn_chain *prev = 0;
-  int b = 0;
+  int b;
 
   live_relevant_regs = ALLOCA_REG_SET ();
 
-  for (; first; first = NEXT_INSN (first))
+  for (b = n_basic_blocks - 1; b >= 0; --b)
     {
-      struct insn_chain *c;
-
-      if (first == BLOCK_HEAD (b))
-	{
-	  int i;
-	  CLEAR_REG_SET (live_relevant_regs);
-	  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-	    if (REGNO_REG_SET_P (BASIC_BLOCK (b)->global_live_at_start, i)
-		&& ! TEST_HARD_REG_BIT (eliminable_regset, i))
-	      SET_REGNO_REG_SET (live_relevant_regs, i);
+      basic_block bb = BASIC_BLOCK (b);
+      rtx insn, end;
+      int i;
 
-	  for (; i < max_regno; i++)
-	    if (reg_renumber[i] >= 0
-		&& REGNO_REG_SET_P (BASIC_BLOCK (b)->global_live_at_start, i))
-	      SET_REGNO_REG_SET (live_relevant_regs, i);
-	}
+      CLEAR_REG_SET (live_relevant_regs);
 
-      if (GET_CODE (first) != NOTE && GET_CODE (first) != BARRIER)
+      EXECUTE_IF_SET_IN_REG_SET (bb->global_live_at_start, 0, i,
 	{
-	  c = new_insn_chain ();
-	  c->prev = prev;
-	  prev = c;
-	  *p = c;
-	  p = &c->next;
-	  c->insn = first;
-	  c->block = b;
+	  if ((i < FIRST_PSEUDO_REGISTER
+	       && ! TEST_HARD_REG_BIT (eliminable_regset, i))
+	      || (i >= FIRST_PSEUDO_REGISTER
+		  && reg_renumber[i] >= 0))
+	    SET_REGNO_REG_SET (live_relevant_regs, i);
+	});
 
-	  COPY_REG_SET (c->live_before, live_relevant_regs);
+      insn = bb->head, end = bb->end;
+      while (1)
+	{
+	  struct insn_chain *c;
 
-	  if (GET_RTX_CLASS (GET_CODE (first)) == 'i')
+	  if (GET_CODE (insn) != NOTE)
 	    {
-	      rtx link;
-
-	      /* Mark the death of everything that dies in this instruction.  */
+	      c = new_insn_chain ();
+	      c->prev = prev;
+	      prev = c;
+	      *p = c;
+	      p = &c->next;
+	      c->insn = insn;
+	      c->block = b;
 
-	      for (link = REG_NOTES (first); link; link = XEXP (link, 1))
-		if (REG_NOTE_KIND (link) == REG_DEAD
-		    && GET_CODE (XEXP (link, 0)) == REG)
-		  reg_dies (REGNO (XEXP (link, 0)), GET_MODE (XEXP (link, 0)));
+	      COPY_REG_SET (c->live_before, live_relevant_regs);
 
-	      /* Mark everything born in this instruction as live.  */
+	      if (GET_RTX_CLASS (GET_CODE (insn)) == 'i')
+		{
+		  rtx link;
 
-	      note_stores (PATTERN (first), reg_becomes_live);
-	    }
+		  /* Mark the death of everything that dies in this
+		     instruction.  */
+		  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+		    if (REG_NOTE_KIND (link) == REG_DEAD
+			&& GET_CODE (XEXP (link, 0)) == REG)
+		      reg_dies (REGNO (XEXP (link, 0)),
+				GET_MODE (XEXP (link, 0)));
 
-	  /* Remember which registers are live at the end of the insn, before
-	     killing those with REG_UNUSED notes.  */
-	  COPY_REG_SET (c->live_after, live_relevant_regs);
+		  /* Mark everything born in this instruction as live.  */
+		  note_stores (PATTERN (insn), reg_becomes_live);
+		}
 
-	  if (GET_RTX_CLASS (GET_CODE (first)) == 'i')
-	    {
-	      rtx link;
+	      /* Remember which registers are live at the end of the insn,
+		 before killing those with REG_UNUSED notes.  */
+	      COPY_REG_SET (c->live_after, live_relevant_regs);
 
-	      /* Mark anything that is set in this insn and then unused as dying.  */
+	      if (GET_RTX_CLASS (GET_CODE (insn)) == 'i')
+		{
+		  rtx link;
 
-	      for (link = REG_NOTES (first); link; link = XEXP (link, 1))
-		if (REG_NOTE_KIND (link) == REG_UNUSED
-		    && GET_CODE (XEXP (link, 0)) == REG)
-		  reg_dies (REGNO (XEXP (link, 0)), GET_MODE (XEXP (link, 0)));
+		  /* Mark anything that is set in this insn and then unused
+		     as dying.  */
+		  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+		    if (REG_NOTE_KIND (link) == REG_UNUSED
+			&& GET_CODE (XEXP (link, 0)) == REG)
+		      reg_dies (REGNO (XEXP (link, 0)),
+				GET_MODE (XEXP (link, 0)));
+		}
 	    }
-	}
-
-      if (first == BLOCK_END (b))
-	b++;
 
-      /* Stop after we pass the end of the last basic block.  Verify that
-	 no real insns are after the end of the last basic block.
-
-	 We may want to reorganize the loop somewhat since this test should
-	 always be the right exit test.  */
-      if (b == n_basic_blocks)
-	{
-	  for (first = NEXT_INSN (first) ; first; first = NEXT_INSN (first))
-	    if (GET_RTX_CLASS (GET_CODE (first)) == 'i'
-		&& GET_CODE (PATTERN (first)) != USE)
-	      abort ();
-	  break;
+	  if (insn == end)
+	    break;
+	  insn = NEXT_INSN (insn);
 	}
     }
+
   FREE_REG_SET (live_relevant_regs);
   *p = 0;
 }

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 18:06   ` Brad Lucier
  1999-08-06 18:54     ` Joern Rennecke
@ 1999-08-31 23:20     ` Brad Lucier
  1 sibling, 0 replies; 54+ messages in thread
From: Brad Lucier @ 1999-08-31 23:20 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Brad Lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds

> That's the price we pay for doing register spilling on a per-instruction
> basis, and calling compute_use_by_pseudos twice for every instruction
> and reload pass.
> 
> I think we could fix this by using pseudo register birth / death lists
> instead of complete register sets.
> 
I made a mistake, sorry.  I didn't pass the -O1 -fPIC to cc1 when I
did the timings; reload is not the problem.  Mea culpa.

Here are the timings for the various stages of egcs-1.1.2 and gcc-2.95
with -O1 -fPIC.

popov-2% /usr/lib/gcc-lib/alpha-redhat-linux/egcs-2.91.66/cc1 -fPIC -O1 g0-1.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 10.843360
time in integration: 0.007808
time in jump: 7.701616
time in cse: 8.022720
time in loop: 0.046848
time in flow: 2.352160
time in combine: 7.864608
time in local-alloc: 2.923120
time in global-alloc: 4.692608
time in shorten-branch: 0.361120
time in final: 2.023248

popov-4% /export/u10/gcc-2.95/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.95/cc1 -fPIC -O1 g0-1.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 10.939984
time in integration: 0.000976
time in jump: 8.168144
time in cse: 5.181584
time in loop: 0.039040
time in flow: 2.695712
time in combine: 8.443376
time in local-alloc: 3.038288
time in global-alloc: 119.387248
time in flow2: 2.311168
time in shorten-branch: 0.383568
time in final: 2.120848

You can see there is a big difference in global-alloc.

Here is the beginning of the flat composite profile

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 36.61     57.68    57.68       16  3604.80  3605.63  prune_preferences
 12.47     77.32    19.64 391960848     0.00     0.00  bitmap_bit_p
  7.29     88.80    11.48   202789     0.06     0.06  record_one_conflict
  7.22    100.18    11.38       24   474.20  1233.79  build_insn_chain
  1.56    102.64     2.46        8   306.88 19628.50  yyparse
  1.35    104.77     2.13 27806760     0.00     0.00  count_pseudo
  1.32    106.85     2.08    15315     0.14     0.44  order_regs_for_reload
  1.05    108.50     1.65     1436     1.15     1.15  find_reg
  0.83    109.82     1.31  2455661     0.00     0.00  yylex

The biggest time sink seems to be the quadratic algorithm in prune_preferences
in global.c.

Again, the complete profile summary is at:

http://www.math.purdue.edu/~lucier/gmon.summary.gz

Brad Lucier     lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 13:54 ` Joern Rennecke
  1999-08-06 16:25   ` Brad Lucier
  1999-08-06 18:06   ` Brad Lucier
@ 1999-08-31 23:20   ` Joern Rennecke
  2 siblings, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-08-31 23:20 UTC (permalink / raw)
  To: Brad Lucier; +Cc: gcc, gcc-bugs, lucier, staff, hosking, wilker, bernds

> If I read this correctly, it's spending a lot of time in reload
> (global_alloc isn't in the call graph, and reload is; in toplev.c, you
> see that one or the other is called, but the time is reported above as
> global_alloc either way).  Perhaps there's a problem in reload.

That's the price we pay for doing register spilling on a per-instruction
basis, and calling compute_use_by_pseudos twice for every instruction
and reload pass.

I think we could fix this by using pseudo register birth / death lists
instead of complete register sets.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 21:27       ` Brad Lucier
  1999-08-07  1:04         ` Richard Henderson
@ 1999-08-31 23:20         ` Brad Lucier
  1 sibling, 0 replies; 54+ messages in thread
From: Brad Lucier @ 1999-08-31 23:20 UTC (permalink / raw)
  To: Joern Rennecke
  Cc: Brad Lucier, amylaar, gcc, gcc-bugs, staff, hosking, wilker, bernds

> 
> > The biggest time sink seems to be the quadratic algorithm in prune_preferences
> > in global.c.
> 
> Hmm, I wonder if the CONFLICT_P tests take a lot of time, or if the rest
> of the operations in the inner loop cost more.
> 
> I wonder if this works:
> 
Sorry, no, it made little difference:

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 35.63     55.32    55.32       16  3457.52  3458.44  prune_preferences
 12.51     74.75    19.43 391960848     0.00     0.00  bitmap_bit_p
  7.40     86.24    11.49   202789     0.06     0.06  record_one_conflict
  7.20     97.41    11.17       24   465.58  1218.03  build_insn_chain
  1.55     99.83     2.41        8   301.76 19347.54  yyparse
  1.49    102.13     2.31 27806760     0.00     0.00  count_pseudo

Here are some more details about global_alloc, prune_preferences,
and build_insn_chain.

-----------------------------------------------
                0.08   99.53      24/24          rest_of_compilation [5]
[6]     64.1    0.08   99.53      24         global_alloc [6]
               55.32    0.01      16/16          prune_preferences [7]
               11.17   18.06      24/24          build_insn_chain [8]
                0.24   10.63      24/24          reload [13]
                0.12    2.22      16/16          global_conflicts [33]
                1.66    0.00    1436/1436        find_reg [39]
                0.08    0.01      16/16          expand_preferences [256]
                0.00    0.00      24/220059      xmalloc [479]
                0.00    0.00      48/5488        get_insns [995]
-----------------------------------------------
               55.32    0.01      16/16          global_alloc [6]
[7]     35.6   55.32    0.01      16         prune_preferences [7]
                0.01    0.00   78844/334068      reg_preferred_class [306]
-----------------------------------------------
               11.17   18.06      24/24          global_alloc [6]
[8]     18.8   11.17   18.06      24         build_insn_chain [8]
               17.40    0.00 350981800/391960848     bitmap_bit_p [9]
                0.02    0.43  174187/5402847     note_stores [10]
                0.08    0.01  194131/194131      new_insn_chain [263]
                0.05    0.00  388262/574314      bitmap_copy [293]
                0.04    0.00  574446/16720630     bitmap_set_bit [48]
                0.01    0.02   88014/88014       reg_dies [385]
                0.00    0.00   34596/3151441     bitmap_clear [204]
                0.00    0.00      24/1004613     bitmap_initialize [401]
-----------------------------------------------

Brad Lucier     lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-12 16:25         ` Joern Rennecke
@ 1999-08-31 23:20           ` Joern Rennecke
  0 siblings, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-08-31 23:20 UTC (permalink / raw)
  To: law; +Cc: amylaar, lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds

>   > 	* global.c (prune_preferences): Move some invariants out of the
>   > 	inner loop.
> This patch is fine, even if it did not make a significant difference for
> the compile time slowdown in question.   Thanks.

I have tested and comitted the patch.

I think I understand the bottleneck in prune_preferences better.  It is
indeed the CONFLICTP check.

We can expect the conflicts to be sparse, because typically we have a lot
of pseudos that live only during a single basic block or during a loop.

Moreover, CONFLICTS are naturally symmetric, yet as the comment for
conflicts says:
   `conflicts' is not symmetric; a conflict between allocno's i and j
   is recorded either in element i,j or in element j,i.  */
All CONFLICT_P tests are therefore duplicated, thus causing yet more
unnecessary overhead.   `conflicts' is not symmetric; a conflict between allocno's i and j
   is recorded either in element i,j or in element j,i.  */

So, I think we can get better compiling performance for large programs
if conflicts allocnos_live use a representation that is efficient for
sparse sets, e.g. a hash table or a balanced tree.
Moreover, it makes sense to normalize conflicts before we store them,
e.g. we could say CONFLICTP (i, j) is normalized if i <= j.  If the
conflict is not normalized, swapping the operands will normalize it.

Or if we go for hash tables and want to encourage a more uniform distribution
of has table filling, we could say CONFLICTP (i, j) is normalized if
(unsigned)(j - i) % n <= n >> 1 , where n is the total number of allocnos.
This does not work for even allocnos, so we just bump up n to an odd number.

The implementation is left as an exercise to the reader ;-)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 16:25   ` Brad Lucier
@ 1999-08-31 23:20     ` Brad Lucier
  0 siblings, 0 replies; 54+ messages in thread
From: Brad Lucier @ 1999-08-31 23:20 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Brad Lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds

> I think we could fix this by using pseudo register birth / death lists
> instead of complete register sets.

That would be nice; here are times comparing egcs-1.1.2 and gcc-2.9.5 on
this alpha-ev6 box, without the profiling overhead (0 times have been elided):

popov-3% /usr/lib/gcc-lib/alpha-redhat-linux/egcs-2.91.66/cc1 g0-1.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 10.890208
time in jump: 0.900848
time in flow: 2.382416
time in global-alloc: 10.848240
time in shorten-branch: 0.927200
time in final: 5.018592
time in varconst: 0.007808

popov-4% /export/u10/gcc-2.95/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.95/cc1 g0-1.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 11.073696
time in jump: 0.923296
time in flow: 3.405264
time in global-alloc: 17.851040
time in shorten-branch: 0.829600
time in final: 5.183536

Brad Lucier    lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-07  8:27           ` Brad Lucier
  1999-08-31 23:20             ` Brad Lucier
@ 1999-11-03 22:31             ` Joern Rennecke
  1999-11-04 17:19               ` Richard Henderson
  1999-11-30 23:37               ` Joern Rennecke
  1 sibling, 2 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-11-03 22:31 UTC (permalink / raw)
  To: Brad Lucier
  Cc: rth, lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

> Each sample counts as 0.000976562 seconds.
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name    
>  44.25     56.94    56.94       16  3558.59  3559.59  prune_preferences
>   8.93     68.43    11.49   202789     0.06     0.06  record_one_conflict
>   2.05     71.07     2.64 40979048     0.00     0.00  bitmap_bit_p
>   1.88     73.49     2.42        8   303.10 16018.62  yyparse
>   1.67     75.65     2.15 27806760     0.00     0.00  count_pseudo
>   1.58     77.68     2.03    15315     0.13     0.47  order_regs_for_reload

I think we don't have to worry about the quadratic algorithm for now if
we can gain a linear factor of 32.  prune_preferences is so slow because
it tests every bit in the array separately.  We can process them one
word at a time instead, like record_one_conflict / record_conflicts do.

Moreover, CONFLICTP and SET_CONFLICT use a signed division instead of a
shift.  A simple cast to unsigned should remove some unnecessary overhead.

Could you benchmark this patch?

Thu Nov  4 06:15:40 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* global.c (CONFLICTP, SET_CONFLICT): Avoid signed division.
	(mirror_conflicts): New function.
	(global_alloc): Call it.
	(expand_preferences): Remove redundant CONFLICTP test.
	(find_reg, dump_conflicts): Likewise.
	(prune_preferences): Process conflicts one word at a time.

Index: global.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/global.c,v
retrieving revision 1.39
diff -p -r1.39 global.c
*** global.c	1999/11/01 23:19:44	1.39
--- global.c	1999/11/04 06:12:01
*************** static int allocno_row_words;
*** 127,138 ****
  /* Two macros to test or store 1 in an element of `conflicts'.  */
  
  #define CONFLICTP(I, J) \
!  (conflicts[(I) * allocno_row_words + (J) / INT_BITS]	\
!   & ((INT_TYPE) 1 << ((J) % INT_BITS)))
  
  #define SET_CONFLICT(I, J) \
!  (conflicts[(I) * allocno_row_words + (J) / INT_BITS]	\
!   |= ((INT_TYPE) 1 << ((J) % INT_BITS)))
  
  /* Set of hard regs currently live (during scan of all insns).  */
  
--- 127,138 ----
  /* Two macros to test or store 1 in an element of `conflicts'.  */
  
  #define CONFLICTP(I, J) \
!  (conflicts[(I) * allocno_row_words + (unsigned)(J) / INT_BITS]	\
!   & ((INT_TYPE) 1 << ((unsigned)(J) % INT_BITS)))
  
  #define SET_CONFLICT(I, J) \
!  (conflicts[(I) * allocno_row_words + (unsigned)(J) / INT_BITS]	\
!   |= ((INT_TYPE) 1 << ((unsigned)(J) % INT_BITS)))
  
  /* Set of hard regs currently live (during scan of all insns).  */
  
*************** static HARD_REG_SET eliminable_regset;
*** 259,264 ****
--- 259,265 ----
  
  static int allocno_compare	PROTO((const PTR, const PTR));
  static void global_conflicts	PROTO((void));
+ static void mirror_conflicts	PROTO((void));
  static void expand_preferences	PROTO((void));
  static void prune_preferences	PROTO((void));
  static void find_reg		PROTO((int, HARD_REG_SET, int, int, int));
*************** global_alloc (file)
*** 486,491 ****
--- 487,494 ----
  
        global_conflicts ();
  
+       mirror_conflicts ();
+ 
        /* Eliminate conflicts between pseudos and eliminable registers.  If
  	 the register is not eliminated, the pseudo won't really be able to
  	 live in the eliminable register, so the conflict doesn't matter.
*************** expand_preferences ()
*** 818,826 ****
  	    && GET_CODE (XEXP (link, 0)) == REG
  	    && reg_allocno[REGNO (XEXP (link, 0))] >= 0
  	    && ! CONFLICTP (reg_allocno[REGNO (SET_DEST (set))],
! 			    reg_allocno[REGNO (XEXP (link, 0))])
! 	    && ! CONFLICTP (reg_allocno[REGNO (XEXP (link, 0))],
! 			    reg_allocno[REGNO (SET_DEST (set))]))
  	  {
  	    int a1 = reg_allocno[REGNO (SET_DEST (set))];
  	    int a2 = reg_allocno[REGNO (XEXP (link, 0))];
--- 821,827 ----
  	    && GET_CODE (XEXP (link, 0)) == REG
  	    && reg_allocno[REGNO (XEXP (link, 0))] >= 0
  	    && ! CONFLICTP (reg_allocno[REGNO (SET_DEST (set))],
! 			    reg_allocno[REGNO (XEXP (link, 0))]))
  	  {
  	    int a1 = reg_allocno[REGNO (SET_DEST (set))];
  	    int a2 = reg_allocno[REGNO (XEXP (link, 0))];
*************** prune_preferences ()
*** 856,861 ****
--- 857,863 ----
  {
    int i, j;
    int allocno;
+   int *allocno_to_order = (int *) xmalloc (max_allocno * sizeof (int));
    
    /* Scan least most important to most important.
       For each allocno, remove from preferences registers that cannot be used,
*************** prune_preferences ()
*** 864,872 ****
  
    for (i = max_allocno - 1; i >= 0; i--)
      {
!       HARD_REG_SET temp, temp2;
  
        allocno = allocno_order[i];
        COPY_HARD_REG_SET (temp, hard_reg_conflicts[allocno]);
  
        if (allocno_calls_crossed[allocno] == 0)
--- 866,875 ----
  
    for (i = max_allocno - 1; i >= 0; i--)
      {
!       HARD_REG_SET temp;
  
        allocno = allocno_order[i];
+       allocno_to_order[allocno] = i;
        COPY_HARD_REG_SET (temp, hard_reg_conflicts[allocno]);
  
        if (allocno_calls_crossed[allocno] == 0)
*************** prune_preferences ()
*** 881,909 ****
        AND_COMPL_HARD_REG_SET (hard_reg_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_copy_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_full_preferences[allocno], temp);
  
        /* Merge in the preferences of lower-priority registers (they have
  	 already been pruned).  If we also prefer some of those registers,
  	 don't exclude them unless we are of a smaller size (in which case
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
        CLEAR_HARD_REG_SET (temp);
        CLEAR_HARD_REG_SET (temp2);
!       for (j = i + 1; j < max_allocno; j++)
! 	if (CONFLICTP (allocno, allocno_order[j])
! 	    || CONFLICTP (allocno_order[j], allocno))
! 	  {
! 	    if (allocno_size[allocno_order[j]] <= allocno_size[allocno])
! 	      IOR_HARD_REG_SET (temp,
! 				hard_reg_full_preferences[allocno_order[j]]);
! 	    else
! 	      IOR_HARD_REG_SET (temp2,
! 				hard_reg_full_preferences[allocno_order[j]]);
! 	  }
        AND_COMPL_HARD_REG_SET (temp, hard_reg_full_preferences[allocno]);
        IOR_HARD_REG_SET (temp, temp2);
        COPY_HARD_REG_SET (regs_someone_prefers[allocno], temp);
      }
  }
  \f
  /* Assign a hard register to ALLOCNO; look for one that is the beginning
--- 884,932 ----
        AND_COMPL_HARD_REG_SET (hard_reg_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_copy_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_full_preferences[allocno], temp);
+     }
  
+   for (i = max_allocno - 1; i >= 0; i--)
+     {
        /* Merge in the preferences of lower-priority registers (they have
  	 already been pruned).  If we also prefer some of those registers,
  	 don't exclude them unless we are of a smaller size (in which case
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
+       HARD_REG_SET temp, temp2;
+       INT_TYPE *p;
+       int allocno2, allocno3;
+ 
+       allocno = allocno_order[i];
+       p = conflicts + allocno * allocno_row_words;
+ 
        CLEAR_HARD_REG_SET (temp);
        CLEAR_HARD_REG_SET (temp2);
! 
!       for (j = allocno_row_words - 1, allocno2 = 0; j >= 0;
! 	   j--, allocno2 += INT_BITS)
! 	{
! 	  unsigned INT_TYPE word = (unsigned INT_TYPE) *p++;
! 
! 	  for (allocno3 = allocno2; word; word >>= 1, allocno3++)
! 	    {
! 	      if (word & 1 && allocno_to_order[allocno3] > i)
! 		{
! 		  if (allocno_size[allocno3] <= allocno_size[allocno])
! 		    IOR_HARD_REG_SET (temp,
! 				      hard_reg_full_preferences[allocno3]);
! 		  else
! 		    IOR_HARD_REG_SET (temp2,
! 				      hard_reg_full_preferences[allocno3]);
! 		}
! 	    }
! 	}
! 
        AND_COMPL_HARD_REG_SET (temp, hard_reg_full_preferences[allocno]);
        IOR_HARD_REG_SET (temp, temp2);
        COPY_HARD_REG_SET (regs_someone_prefers[allocno], temp);
      }
+   free (allocno_to_order);
  }
  \f
  /* Assign a hard register to ALLOCNO; look for one that is the beginning
*************** find_reg (allocno, losers, alt_regs_p, a
*** 1219,1225 ****
  	 mark it as conflicting with the hard regs this one occupies.  */
        lim = allocno;
        for (j = 0; j < max_allocno; j++)
! 	if (CONFLICTP (lim, j) || CONFLICTP (j, lim))
  	  {
  	    IOR_HARD_REG_SET (hard_reg_conflicts[j], this_reg);
  	  }
--- 1242,1248 ----
  	 mark it as conflicting with the hard regs this one occupies.  */
        lim = allocno;
        for (j = 0; j < max_allocno; j++)
! 	if (CONFLICTP (j, lim))
  	  {
  	    IOR_HARD_REG_SET (hard_reg_conflicts[j], this_reg);
  	  }
*************** record_conflicts (allocno_vec, len)
*** 1327,1332 ****
--- 1350,1387 ----
  	conflicts[ialloc_prod + j] |= allocnos_live[j];
      }
  }
+ 
+ /* If CONFLICTP (i, j) is true, make sure CONFLICTP (j, i) is also true.  */
+ static void
+ mirror_conflicts ()
+ {
+   register int i, j;
+   int rw = allocno_row_words;
+   int rwb = rw * INT_BITS;
+   INT_TYPE *p = conflicts;
+   INT_TYPE *q0 = conflicts, *q1, *q2;
+   unsigned INT_TYPE mask;
+ 
+   for (i = max_allocno - 1, mask = 1; i >= 0; i--, mask <<= 1)
+     {
+       if (! mask)
+ 	{
+ 	  mask = 1;
+ 	  q0++;
+ 	}
+       for (j = allocno_row_words - 1, q1 = q0; j >= 0; j--, q1 += rwb)
+ 	{
+ 	  unsigned INT_TYPE word;
+ 
+ 	  for (word = (unsigned INT_TYPE) *p++, q2 = q1; word;
+ 	       word >>= 1, q2 += rw)
+ 	    {
+ 	      if (word & 1)
+ 		*q2 |= mask;
+ 	    }
+ 	}
+     }
+ }
  \f
  /* Handle the case where REG is set by the insn being scanned,
     during the forward scan to accumulate conflicts.
*************** dump_conflicts (file)
*** 1805,1811 ****
        register int j;
        fprintf (file, ";; %d conflicts:", allocno_reg[i]);
        for (j = 0; j < max_allocno; j++)
! 	if (CONFLICTP (i, j) || CONFLICTP (j, i))
  	  fprintf (file, " %d", allocno_reg[j]);
        for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
  	if (TEST_HARD_REG_BIT (hard_reg_conflicts[i], j))
--- 1860,1866 ----
        register int j;
        fprintf (file, ";; %d conflicts:", allocno_reg[i]);
        for (j = 0; j < max_allocno; j++)
! 	if (CONFLICTP (j, i))
  	  fprintf (file, " %d", allocno_reg[j]);
        for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
  	if (TEST_HARD_REG_BIT (hard_reg_conflicts[i], j))

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-03 22:31             ` Joern Rennecke
@ 1999-11-04 17:19               ` Richard Henderson
  1999-11-05  0:33                 ` Jeffrey A Law
  1999-11-30 23:37                 ` Richard Henderson
  1999-11-30 23:37               ` Joern Rennecke
  1 sibling, 2 replies; 54+ messages in thread
From: Richard Henderson @ 1999-11-04 17:19 UTC (permalink / raw)
  To: Joern Rennecke
  Cc: Brad Lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

On Thu, Nov 04, 1999 at 06:29:25AM +0000, Joern Rennecke wrote:
> ! 	      if (word & 1 && allocno_to_order[allocno3] > i)

Parenthesis problem around the `&'.

Also fix the commentary for `conflicts' where it says that
the matrix is asymmetric.

Otherwise the patch looks good, and does make an enormous difference.
Time spent in global for Wilhelm Kloke's test case is down from 1687 
seconds to 93 seconds.

Feel free to commit after fixing nits.  Thanks!

r~

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-04 17:19               ` Richard Henderson
@ 1999-11-05  0:33                 ` Jeffrey A Law
  1999-11-30 23:37                   ` Jeffrey A Law
  1999-11-30 23:37                 ` Richard Henderson
  1 sibling, 1 reply; 54+ messages in thread
From: Jeffrey A Law @ 1999-11-05  0:33 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joern Rennecke, Brad Lucier, gcc, gcc-bugs, staff, hosking,
	wilker, bernds, gcc-patches

  In message < 19991104171948.B9751@cygnus.com >you write:
  > On Thu, Nov 04, 1999 at 06:29:25AM +0000, Joern Rennecke wrote:
  > > ! 	      if (word & 1 && allocno_to_order[allocno3] > i)
  > 
  > Parenthesis problem around the `&'.
  > 
  > Also fix the commentary for `conflicts' where it says that
  > the matrix is asymmetric.
  > 
  > Otherwise the patch looks good, and does make an enormous difference.
  > Time spent in global for Wilhelm Kloke's test case is down from 1687 
  > seconds to 93 seconds.
  > 
  > Feel free to commit after fixing nits.  Thanks!
I took care of the nits and installed Joern's patch.

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05  0:33                 ` Jeffrey A Law
@ 1999-11-30 23:37                   ` Jeffrey A Law
  0 siblings, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-11-30 23:37 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joern Rennecke, Brad Lucier, gcc, gcc-bugs, staff, hosking,
	wilker, bernds, gcc-patches

  In message < 19991104171948.B9751@cygnus.com >you write:
  > On Thu, Nov 04, 1999 at 06:29:25AM +0000, Joern Rennecke wrote:
  > > ! 	      if (word & 1 && allocno_to_order[allocno3] > i)
  > 
  > Parenthesis problem around the `&'.
  > 
  > Also fix the commentary for `conflicts' where it says that
  > the matrix is asymmetric.
  > 
  > Otherwise the patch looks good, and does make an enormous difference.
  > Time spent in global for Wilhelm Kloke's test case is down from 1687 
  > seconds to 93 seconds.
  > 
  > Feel free to commit after fixing nits.  Thanks!
I took care of the nits and installed Joern's patch.

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-04 17:19               ` Richard Henderson
  1999-11-05  0:33                 ` Jeffrey A Law
@ 1999-11-30 23:37                 ` Richard Henderson
  1 sibling, 0 replies; 54+ messages in thread
From: Richard Henderson @ 1999-11-30 23:37 UTC (permalink / raw)
  To: Joern Rennecke
  Cc: Brad Lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

On Thu, Nov 04, 1999 at 06:29:25AM +0000, Joern Rennecke wrote:
> ! 	      if (word & 1 && allocno_to_order[allocno3] > i)

Parenthesis problem around the `&'.

Also fix the commentary for `conflicts' where it says that
the matrix is asymmetric.

Otherwise the patch looks good, and does make an enormous difference.
Time spent in global for Wilhelm Kloke's test case is down from 1687 
seconds to 93 seconds.

Feel free to commit after fixing nits.  Thanks!

r~

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-03 22:31             ` Joern Rennecke
  1999-11-04 17:19               ` Richard Henderson
@ 1999-11-30 23:37               ` Joern Rennecke
  1 sibling, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-11-30 23:37 UTC (permalink / raw)
  To: Brad Lucier
  Cc: rth, lucier, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

> Each sample counts as 0.000976562 seconds.
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name    
>  44.25     56.94    56.94       16  3558.59  3559.59  prune_preferences
>   8.93     68.43    11.49   202789     0.06     0.06  record_one_conflict
>   2.05     71.07     2.64 40979048     0.00     0.00  bitmap_bit_p
>   1.88     73.49     2.42        8   303.10 16018.62  yyparse
>   1.67     75.65     2.15 27806760     0.00     0.00  count_pseudo
>   1.58     77.68     2.03    15315     0.13     0.47  order_regs_for_reload

I think we don't have to worry about the quadratic algorithm for now if
we can gain a linear factor of 32.  prune_preferences is so slow because
it tests every bit in the array separately.  We can process them one
word at a time instead, like record_one_conflict / record_conflicts do.

Moreover, CONFLICTP and SET_CONFLICT use a signed division instead of a
shift.  A simple cast to unsigned should remove some unnecessary overhead.

Could you benchmark this patch?

Thu Nov  4 06:15:40 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* global.c (CONFLICTP, SET_CONFLICT): Avoid signed division.
	(mirror_conflicts): New function.
	(global_alloc): Call it.
	(expand_preferences): Remove redundant CONFLICTP test.
	(find_reg, dump_conflicts): Likewise.
	(prune_preferences): Process conflicts one word at a time.

Index: global.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/global.c,v
retrieving revision 1.39
diff -p -r1.39 global.c
*** global.c	1999/11/01 23:19:44	1.39
--- global.c	1999/11/04 06:12:01
*************** static int allocno_row_words;
*** 127,138 ****
  /* Two macros to test or store 1 in an element of `conflicts'.  */
  
  #define CONFLICTP(I, J) \
!  (conflicts[(I) * allocno_row_words + (J) / INT_BITS]	\
!   & ((INT_TYPE) 1 << ((J) % INT_BITS)))
  
  #define SET_CONFLICT(I, J) \
!  (conflicts[(I) * allocno_row_words + (J) / INT_BITS]	\
!   |= ((INT_TYPE) 1 << ((J) % INT_BITS)))
  
  /* Set of hard regs currently live (during scan of all insns).  */
  
--- 127,138 ----
  /* Two macros to test or store 1 in an element of `conflicts'.  */
  
  #define CONFLICTP(I, J) \
!  (conflicts[(I) * allocno_row_words + (unsigned)(J) / INT_BITS]	\
!   & ((INT_TYPE) 1 << ((unsigned)(J) % INT_BITS)))
  
  #define SET_CONFLICT(I, J) \
!  (conflicts[(I) * allocno_row_words + (unsigned)(J) / INT_BITS]	\
!   |= ((INT_TYPE) 1 << ((unsigned)(J) % INT_BITS)))
  
  /* Set of hard regs currently live (during scan of all insns).  */
  
*************** static HARD_REG_SET eliminable_regset;
*** 259,264 ****
--- 259,265 ----
  
  static int allocno_compare	PROTO((const PTR, const PTR));
  static void global_conflicts	PROTO((void));
+ static void mirror_conflicts	PROTO((void));
  static void expand_preferences	PROTO((void));
  static void prune_preferences	PROTO((void));
  static void find_reg		PROTO((int, HARD_REG_SET, int, int, int));
*************** global_alloc (file)
*** 486,491 ****
--- 487,494 ----
  
        global_conflicts ();
  
+       mirror_conflicts ();
+ 
        /* Eliminate conflicts between pseudos and eliminable registers.  If
  	 the register is not eliminated, the pseudo won't really be able to
  	 live in the eliminable register, so the conflict doesn't matter.
*************** expand_preferences ()
*** 818,826 ****
  	    && GET_CODE (XEXP (link, 0)) == REG
  	    && reg_allocno[REGNO (XEXP (link, 0))] >= 0
  	    && ! CONFLICTP (reg_allocno[REGNO (SET_DEST (set))],
! 			    reg_allocno[REGNO (XEXP (link, 0))])
! 	    && ! CONFLICTP (reg_allocno[REGNO (XEXP (link, 0))],
! 			    reg_allocno[REGNO (SET_DEST (set))]))
  	  {
  	    int a1 = reg_allocno[REGNO (SET_DEST (set))];
  	    int a2 = reg_allocno[REGNO (XEXP (link, 0))];
--- 821,827 ----
  	    && GET_CODE (XEXP (link, 0)) == REG
  	    && reg_allocno[REGNO (XEXP (link, 0))] >= 0
  	    && ! CONFLICTP (reg_allocno[REGNO (SET_DEST (set))],
! 			    reg_allocno[REGNO (XEXP (link, 0))]))
  	  {
  	    int a1 = reg_allocno[REGNO (SET_DEST (set))];
  	    int a2 = reg_allocno[REGNO (XEXP (link, 0))];
*************** prune_preferences ()
*** 856,861 ****
--- 857,863 ----
  {
    int i, j;
    int allocno;
+   int *allocno_to_order = (int *) xmalloc (max_allocno * sizeof (int));
    
    /* Scan least most important to most important.
       For each allocno, remove from preferences registers that cannot be used,
*************** prune_preferences ()
*** 864,872 ****
  
    for (i = max_allocno - 1; i >= 0; i--)
      {
!       HARD_REG_SET temp, temp2;
  
        allocno = allocno_order[i];
        COPY_HARD_REG_SET (temp, hard_reg_conflicts[allocno]);
  
        if (allocno_calls_crossed[allocno] == 0)
--- 866,875 ----
  
    for (i = max_allocno - 1; i >= 0; i--)
      {
!       HARD_REG_SET temp;
  
        allocno = allocno_order[i];
+       allocno_to_order[allocno] = i;
        COPY_HARD_REG_SET (temp, hard_reg_conflicts[allocno]);
  
        if (allocno_calls_crossed[allocno] == 0)
*************** prune_preferences ()
*** 881,909 ****
        AND_COMPL_HARD_REG_SET (hard_reg_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_copy_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_full_preferences[allocno], temp);
  
        /* Merge in the preferences of lower-priority registers (they have
  	 already been pruned).  If we also prefer some of those registers,
  	 don't exclude them unless we are of a smaller size (in which case
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
        CLEAR_HARD_REG_SET (temp);
        CLEAR_HARD_REG_SET (temp2);
!       for (j = i + 1; j < max_allocno; j++)
! 	if (CONFLICTP (allocno, allocno_order[j])
! 	    || CONFLICTP (allocno_order[j], allocno))
! 	  {
! 	    if (allocno_size[allocno_order[j]] <= allocno_size[allocno])
! 	      IOR_HARD_REG_SET (temp,
! 				hard_reg_full_preferences[allocno_order[j]]);
! 	    else
! 	      IOR_HARD_REG_SET (temp2,
! 				hard_reg_full_preferences[allocno_order[j]]);
! 	  }
        AND_COMPL_HARD_REG_SET (temp, hard_reg_full_preferences[allocno]);
        IOR_HARD_REG_SET (temp, temp2);
        COPY_HARD_REG_SET (regs_someone_prefers[allocno], temp);
      }
  }
  \f
  /* Assign a hard register to ALLOCNO; look for one that is the beginning
--- 884,932 ----
        AND_COMPL_HARD_REG_SET (hard_reg_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_copy_preferences[allocno], temp);
        AND_COMPL_HARD_REG_SET (hard_reg_full_preferences[allocno], temp);
+     }
  
+   for (i = max_allocno - 1; i >= 0; i--)
+     {
        /* Merge in the preferences of lower-priority registers (they have
  	 already been pruned).  If we also prefer some of those registers,
  	 don't exclude them unless we are of a smaller size (in which case
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
+       HARD_REG_SET temp, temp2;
+       INT_TYPE *p;
+       int allocno2, allocno3;
+ 
+       allocno = allocno_order[i];
+       p = conflicts + allocno * allocno_row_words;
+ 
        CLEAR_HARD_REG_SET (temp);
        CLEAR_HARD_REG_SET (temp2);
! 
!       for (j = allocno_row_words - 1, allocno2 = 0; j >= 0;
! 	   j--, allocno2 += INT_BITS)
! 	{
! 	  unsigned INT_TYPE word = (unsigned INT_TYPE) *p++;
! 
! 	  for (allocno3 = allocno2; word; word >>= 1, allocno3++)
! 	    {
! 	      if (word & 1 && allocno_to_order[allocno3] > i)
! 		{
! 		  if (allocno_size[allocno3] <= allocno_size[allocno])
! 		    IOR_HARD_REG_SET (temp,
! 				      hard_reg_full_preferences[allocno3]);
! 		  else
! 		    IOR_HARD_REG_SET (temp2,
! 				      hard_reg_full_preferences[allocno3]);
! 		}
! 	    }
! 	}
! 
        AND_COMPL_HARD_REG_SET (temp, hard_reg_full_preferences[allocno]);
        IOR_HARD_REG_SET (temp, temp2);
        COPY_HARD_REG_SET (regs_someone_prefers[allocno], temp);
      }
+   free (allocno_to_order);
  }
  \f
  /* Assign a hard register to ALLOCNO; look for one that is the beginning
*************** find_reg (allocno, losers, alt_regs_p, a
*** 1219,1225 ****
  	 mark it as conflicting with the hard regs this one occupies.  */
        lim = allocno;
        for (j = 0; j < max_allocno; j++)
! 	if (CONFLICTP (lim, j) || CONFLICTP (j, lim))
  	  {
  	    IOR_HARD_REG_SET (hard_reg_conflicts[j], this_reg);
  	  }
--- 1242,1248 ----
  	 mark it as conflicting with the hard regs this one occupies.  */
        lim = allocno;
        for (j = 0; j < max_allocno; j++)
! 	if (CONFLICTP (j, lim))
  	  {
  	    IOR_HARD_REG_SET (hard_reg_conflicts[j], this_reg);
  	  }
*************** record_conflicts (allocno_vec, len)
*** 1327,1332 ****
--- 1350,1387 ----
  	conflicts[ialloc_prod + j] |= allocnos_live[j];
      }
  }
+ 
+ /* If CONFLICTP (i, j) is true, make sure CONFLICTP (j, i) is also true.  */
+ static void
+ mirror_conflicts ()
+ {
+   register int i, j;
+   int rw = allocno_row_words;
+   int rwb = rw * INT_BITS;
+   INT_TYPE *p = conflicts;
+   INT_TYPE *q0 = conflicts, *q1, *q2;
+   unsigned INT_TYPE mask;
+ 
+   for (i = max_allocno - 1, mask = 1; i >= 0; i--, mask <<= 1)
+     {
+       if (! mask)
+ 	{
+ 	  mask = 1;
+ 	  q0++;
+ 	}
+       for (j = allocno_row_words - 1, q1 = q0; j >= 0; j--, q1 += rwb)
+ 	{
+ 	  unsigned INT_TYPE word;
+ 
+ 	  for (word = (unsigned INT_TYPE) *p++, q2 = q1; word;
+ 	       word >>= 1, q2 += rw)
+ 	    {
+ 	      if (word & 1)
+ 		*q2 |= mask;
+ 	    }
+ 	}
+     }
+ }
  \f
  /* Handle the case where REG is set by the insn being scanned,
     during the forward scan to accumulate conflicts.
*************** dump_conflicts (file)
*** 1805,1811 ****
        register int j;
        fprintf (file, ";; %d conflicts:", allocno_reg[i]);
        for (j = 0; j < max_allocno; j++)
! 	if (CONFLICTP (i, j) || CONFLICTP (j, i))
  	  fprintf (file, " %d", allocno_reg[j]);
        for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
  	if (TEST_HARD_REG_BIT (hard_reg_conflicts[i], j))
--- 1860,1866 ----
        register int j;
        fprintf (file, ";; %d conflicts:", allocno_reg[i]);
        for (j = 0; j < max_allocno; j++)
! 	if (CONFLICTP (j, i))
  	  fprintf (file, " %d", allocno_reg[j]);
        for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
  	if (TEST_HARD_REG_BIT (hard_reg_conflicts[i], j))

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 12:31 ` Michael Meissner
  1999-11-05 12:34   ` Joern Rennecke
@ 1999-11-30 23:37   ` Michael Meissner
  1 sibling, 0 replies; 54+ messages in thread
From: Michael Meissner @ 1999-11-30 23:37 UTC (permalink / raw)
  To: Thomas E Deweese, Joern Rennecke, lucier; +Cc: gcc

On Fri, Nov 05, 1999 at 03:14:50PM -0500, Thomas E Deweese wrote:
> 
> JR> Well, since it worked for one place that looks for set bits in
> JR> allocno sets, let's see if it works in the two other ones:
> 
> 	I'm not 100% certain of the density of 1's in the bit fields
> but my impression is that they are fairly sparse (I'm guessing,
> especially in the 'conflicts' case).  In which case I would suggest at
> least putting a check on any bits being set in 'word_' around the
> inner for loop.

If they are really large and sparse, bitmaps might be more effective than
sbitmaps (assuming you use the EXECUTE_ macro in bitmaps).

-- 
Michael Meissner, Cygnus Solutions
PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886
email: meissner@cygnus.com	phone: 978-486-9304	fax: 978-692-4482

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 14:43 Brad Lucier
@ 1999-11-30 23:37 ` Brad Lucier
  0 siblings, 0 replies; 54+ messages in thread
From: Brad Lucier @ 1999-11-30 23:37 UTC (permalink / raw)
  To: lucier, amylaar
  Cc: rth, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

Joern:

With your latest patch, the time for global_alloc is reduced to:

time in global-alloc: 30.228672 (20%)

which is down from 883 seconds yesterday.  Fantastic!

The top times for various functions are now:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  8.62     10.72    10.72    26627     0.40     0.40  record_conflicts
  7.36     19.87     9.15    30909     0.30     0.30  delete_from_jump_chain
  5.08     26.19     6.32        8   790.28  5353.28  jump_optimize_1
  4.65     31.98     5.79 10448965     0.00     0.00  rtx_renumbered_equal_p
  4.07     37.04     5.07  9164467     0.00     0.00  find_cross_jump
  3.64     41.57     4.53 16835940     0.00     0.00  next_active_insn
  3.41     45.81     4.24    93299     0.05     0.05  make_edge
  3.18     49.76     3.95   194464     0.02     0.02  record_one_conflict
  2.11     52.39     2.62        1  2623.05 123824.10  yyparse
  1.55     54.32     1.93  3127006     0.00     0.00  next_label
  1.50     56.18     1.87  2379715     0.00     0.00  yylex
  1.43     57.96     1.78        2   890.62   891.94  prune_preferences
  1.38     59.68     1.72        1  1715.82  1715.82  output_273
  1.32     61.32     1.64        2   819.82   819.82  mirror_conflicts

Brad

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 10:42 ` Joern Rennecke
  1999-11-05 15:44   ` Jeffrey A Law
@ 1999-11-30 23:37   ` Joern Rennecke
  1 sibling, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-11-30 23:37 UTC (permalink / raw)
  To: Brad Lucier
  Cc: lucier, amylaar, rth, gcc, gcc-bugs, staff, hosking, wilker,
	bernds, gcc-patches

> and the top functions were
> 
>  19.82     67.88    67.88 1599926669     0.00     0.00  bitmap_bit_p
>  15.83    122.07    54.19        3 18063.15 41030.16  build_insn_chain
>  14.52    171.77    49.71     2789    17.82    17.82  find_reg
>  13.64    218.49    46.72   194464     0.24     0.24  record_one_conflict
>   3.14    229.26    10.76    26627     0.40     0.40  record_conflicts
>   2.94    239.32    10.06    30909     0.33     0.33  delete_from_jump_chain
> ...
>   0.51    277.15     1.74        2   869.63   870.11  prune_preferences
> 
> So, even though global-alloc is still the biggest time sink, your patch
> makes a *big* difference.  (I don't believe the times reported in each
> pass of the compiler are reliable when it is profiled with -pg.)

Well, since it worked for one place that looks for set bits in allocno sets,
let's see if it works in the two other ones:

Fri Nov  5 18:33:39 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* global.c (EXECUTE_IF_SET_IN_ALLOCNO_SET): New macro.
	(EXECUTE_IF_CONFLICT): Likewise.
	(ALLOCNO_LIVE_P): Avoid signed division.
	(SET_ALLOCNO_LIVE, CLEAR_ALLOCNO_LIVE): Likewise.
	(prune_preferences, find_reg): Use EXECUTE_IF_CONFLICT.
	(record_one_conflict): Use EXECUTE_IF_SET_IN_ALLOCNO_SET.

Index: global.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/global.c,v
retrieving revision 1.40
diff -p -r1.40 global.c
*** global.c	1999/11/05 08:31:48	1.40
--- global.c	1999/11/05 18:31:06
*************** static int allocno_row_words;
*** 139,144 ****
--- 139,172 ----
    &= ~ ((INT_TYPE) 1 << ((J) % INT_BITS)))
  /* END CYGNUS LOCAL */
  
+ /* For any allocno set in ALLOCNO_SET, set OUT_ALLOCNO to that allocno,
+    and execute CODE.  */
+ #define EXECUTE_IF_SET_IN_ALLOCNO_SET(ALLOCNO_SET, ALLOCNO, CODE)	\
+ do {									\
+   int i_;								\
+   int allocno_;								\
+   INT_TYPE *p_ = (ALLOCNO_SET);		\
+ 									\
+   for (i_ = allocno_row_words - 1, allocno_ = 0; i_ >= 0;		\
+        i_--, allocno_ += INT_BITS)					\
+     {									\
+       unsigned INT_TYPE word_ = (unsigned INT_TYPE) *p_++;		\
+ 									\
+       for ((ALLOCNO) = allocno_; word_; word_ >>= 1, (ALLOCNO)++)	\
+ 	{								\
+ 	  if (word_ & 1)						\
+ 	    CODE;							\
+ 	}								\
+     }									\
+ } while (0)
+ 
+ /* For any allocno that conflicts with IN_ALLOCNO, set OUT_ALLOCNO to
+    the conflicting allocno, and execute CODE.  This macro assumes that
+    mirror_conflicts has been run.  */
+ #define EXECUTE_IF_CONFLICT(IN_ALLOCNO, OUT_ALLOCNO, CODE)\
+   EXECUTE_IF_SET_IN_ALLOCNO_SET (conflicts + (IN_ALLOCNO) * allocno_row_words,\
+ 				 OUT_ALLOCNO, CODE)
+ 
  /* Set of hard regs currently live (during scan of all insns).  */
  
  static HARD_REG_SET hard_regs_live;
*************** static INT_TYPE *allocnos_live;
*** 217,231 ****
  
  /* Test, set or clear bit number I in allocnos_live,
     a bit vector indexed by allocno.  */
- 
- #define ALLOCNO_LIVE_P(I) \
-   (allocnos_live[(I) / INT_BITS] & ((INT_TYPE) 1 << ((I) % INT_BITS)))
- 
- #define SET_ALLOCNO_LIVE(I) \
-   (allocnos_live[(I) / INT_BITS] |= ((INT_TYPE) 1 << ((I) % INT_BITS)))
  
! #define CLEAR_ALLOCNO_LIVE(I) \
!   (allocnos_live[(I) / INT_BITS] &= ~((INT_TYPE) 1 << ((I) % INT_BITS)))
  
  /* This is turned off because it doesn't work right for DImode.
     (And it is only used for DImode, so the other cases are worthless.)
--- 245,262 ----
  
  /* Test, set or clear bit number I in allocnos_live,
     a bit vector indexed by allocno.  */
  
! #define ALLOCNO_LIVE_P(I)				\
!   (allocnos_live[(unsigned)(I) / INT_BITS]		\
!    & ((INT_TYPE) 1 << ((unsigned)(I) % INT_BITS)))
! 
! #define SET_ALLOCNO_LIVE(I)				\
!   (allocnos_live[(unsigned)(I) / INT_BITS]		\
!      |= ((INT_TYPE) 1 << ((unsigned)(I) % INT_BITS)))
! 
! #define CLEAR_ALLOCNO_LIVE(I)				\
!   (allocnos_live[(unsigned)(I) / INT_BITS]		\
!      &= ~((INT_TYPE) 1 << ((unsigned)(I) % INT_BITS)))
  
  /* This is turned off because it doesn't work right for DImode.
     (And it is only used for DImode, so the other cases are worthless.)
*************** expand_preferences ()
*** 860,866 ****
  static void
  prune_preferences ()
  {
!   int i, j;
    int allocno;
    int *allocno_to_order = (int *) xmalloc (max_allocno * sizeof (int));
    
--- 891,897 ----
  static void
  prune_preferences ()
  {
!   int i;
    int allocno;
    int *allocno_to_order = (int *) xmalloc (max_allocno * sizeof (int));
    
*************** prune_preferences ()
*** 899,931 ****
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
        HARD_REG_SET temp, temp2;
!       INT_TYPE *p;
!       int allocno2, allocno3;
  
        allocno = allocno_order[i];
-       p = conflicts + allocno * allocno_row_words;
  
        CLEAR_HARD_REG_SET (temp);
        CLEAR_HARD_REG_SET (temp2);
  
!       for (j = allocno_row_words - 1, allocno2 = 0; j >= 0;
! 	   j--, allocno2 += INT_BITS)
  	{
! 	  unsigned INT_TYPE word = (unsigned INT_TYPE) *p++;
! 
! 	  for (allocno3 = allocno2; word; word >>= 1, allocno3++)
  	    {
! 	      if ((word & 1) && allocno_to_order[allocno3] > i)
! 		{
! 		  if (allocno_size[allocno3] <= allocno_size[allocno])
! 		    IOR_HARD_REG_SET (temp,
! 				      hard_reg_full_preferences[allocno3]);
! 		  else
! 		    IOR_HARD_REG_SET (temp2,
! 				      hard_reg_full_preferences[allocno3]);
! 		}
  	    }
! 	}
  
        AND_COMPL_HARD_REG_SET (temp, hard_reg_full_preferences[allocno]);
        IOR_HARD_REG_SET (temp, temp2);
--- 930,952 ----
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
        HARD_REG_SET temp, temp2;
!       int allocno2;
  
        allocno = allocno_order[i];
  
        CLEAR_HARD_REG_SET (temp);
        CLEAR_HARD_REG_SET (temp2);
  
!       EXECUTE_IF_CONFLICT (allocno, allocno2,
  	{
! 	  if (allocno_to_order[allocno2] > i)
  	    {
! 	      if (allocno_size[allocno2] <= allocno_size[allocno])
! 		IOR_HARD_REG_SET (temp, hard_reg_full_preferences[allocno2]);
! 	      else
! 		IOR_HARD_REG_SET (temp2, hard_reg_full_preferences[allocno2]);
  	    }
! 	});
  
        AND_COMPL_HARD_REG_SET (temp, hard_reg_full_preferences[allocno]);
        IOR_HARD_REG_SET (temp, temp2);
*************** find_reg (allocno, losers, alt_regs_p, a
*** 1246,1256 ****
        /* For each other pseudo-reg conflicting with this one,
  	 mark it as conflicting with the hard regs this one occupies.  */
        lim = allocno;
!       for (j = 0; j < max_allocno; j++)
! 	if (CONFLICTP (j, lim))
! 	  {
! 	    IOR_HARD_REG_SET (hard_reg_conflicts[j], this_reg);
! 	  }
      }
  }
  \f
--- 1267,1276 ----
        /* For each other pseudo-reg conflicting with this one,
  	 mark it as conflicting with the hard regs this one occupies.  */
        lim = allocno;
!       EXECUTE_IF_CONFLICT (lim, j,
! 	{
! 	  IOR_HARD_REG_SET (hard_reg_conflicts[j], this_reg);
! 	});
      }
  }
  \f
*************** record_one_conflict (regno)
*** 1303,1313 ****
    if (regno < FIRST_PSEUDO_REGISTER)
      /* When a hard register becomes live,
         record conflicts with live pseudo regs.  */
!     for (j = 0; j < max_allocno; j++)
        {
! 	if (ALLOCNO_LIVE_P (j))
! 	  SET_HARD_REG_BIT (hard_reg_conflicts[j], regno);
!       }
    else
      /* When a pseudo-register becomes live,
         record conflicts first with hard regs,
--- 1323,1332 ----
    if (regno < FIRST_PSEUDO_REGISTER)
      /* When a hard register becomes live,
         record conflicts with live pseudo regs.  */
!     EXECUTE_IF_SET_IN_ALLOCNO_SET (allocnos_live, j,
        {
! 	SET_HARD_REG_BIT (hard_reg_conflicts[j], regno);
!       });
    else
      /* When a pseudo-register becomes live,
         record conflicts first with hard regs,

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 12:34   ` Joern Rennecke
@ 1999-11-30 23:37     ` Joern Rennecke
  0 siblings, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-11-30 23:37 UTC (permalink / raw)
  To: Michael Meissner; +Cc: deweese, amylaar, lucier, gcc

> If they are really large and sparse, bitmaps might be more effective than
> sbitmaps (assuming you use the EXECUTE_ macro in bitmaps).

The trouble with bitmaps, at least in their current implementation, is that
they use linear search.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-04 13:49 Brad Lucier
  1999-11-05 10:42 ` Joern Rennecke
@ 1999-11-30 23:37 ` Brad Lucier
  1 sibling, 0 replies; 54+ messages in thread
From: Brad Lucier @ 1999-11-30 23:37 UTC (permalink / raw)
  To: lucier, amylaar
  Cc: rth, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

> Could you benchmark this patch?
> 
> Thu Nov  4 06:15:40 1999  J"orn Rennecke <amylaar@cygnus.co.uk>
> 
> 	* global.c (CONFLICTP, SET_CONFLICT): Avoid signed division.
> 	(mirror_conflicts): New function.
> 	(global_alloc): Call it.
> 	(expand_preferences): Remove redundant CONFLICTP test.
> 	(find_reg, dump_conflicts): Likewise.
> 	(prune_preferences): Process conflicts one word at a time.
> 

I bootstrapped yesterday's mainline on alphaev6-unknown-linux-gnu,
and ran it on a similar test case.  Without the patch, I get:

/export/u10/egcs-prof/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -mcpu=ev6 -fno-math-errno -mieee -fPIC -O1 _meroon.i 
 ___H__20___meroon {GC 220613k -> 46104k in 1.295} {GC 79384k -> 50882k in 1.474} {GC 69598k -> 53435k in 1.583} ___init_proc {GC 93235k -> 3595k in 0.078} ____20___meroon
...
time in global-alloc: 1237.253728 (85%)
...

and the top functions by time were:

 29.05    142.54   142.54        2 71267.58 71273.92  prune_preferences
 13.86    210.53    68.00 1599926669     0.00     0.00  bitmap_bit_p
 11.99    269.37    58.84     2789    21.10    21.10  find_reg
 10.93    322.99    53.62        3 17873.05 40871.13  build_insn_chain
  9.55    369.85    46.86   194464     0.24     0.24  record_one_conflict
  2.21    380.71    10.85    26627     0.41     0.41  record_conflicts
  2.02    390.61     9.91    30909     0.32     0.32  delete_from_jump_chain

With the patch, I got

/export/u10/egcs-prof2/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -mcpu=ev6 -fno-math-errno -mieee -fPIC -O1 _meroon.i
 ___H__20___meroon {GC 220613k -> 46104k in 1.303} {GC 79384k -> 50882k in 1.459} {GC 69598k -> 53435k in 1.566} ___init_proc {GC 93235k -> 3595k in 0.080} ____20___meroon
...
time in global-alloc: 574.101744 (72%)
...

and the top functions were

 19.82     67.88    67.88 1599926669     0.00     0.00  bitmap_bit_p
 15.83    122.07    54.19        3 18063.15 41030.16  build_insn_chain
 14.52    171.77    49.71     2789    17.82    17.82  find_reg
 13.64    218.49    46.72   194464     0.24     0.24  record_one_conflict
  3.14    229.26    10.76    26627     0.40     0.40  record_conflicts
  2.94    239.32    10.06    30909     0.33     0.33  delete_from_jump_chain
...
  0.51    277.15     1.74        2   869.63   870.11  prune_preferences

So, even though global-alloc is still the biggest time sink, your patch
makes a *big* difference.  (I don't believe the times reported in each
pass of the compiler are reliable when it is profiled with -pg.)

Brad

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 12:15 Thomas E Deweese
  1999-11-05 12:22 ` Joern Rennecke
  1999-11-05 12:31 ` Michael Meissner
@ 1999-11-30 23:37 ` Thomas E Deweese
  2 siblings, 0 replies; 54+ messages in thread
From: Thomas E Deweese @ 1999-11-30 23:37 UTC (permalink / raw)
  To: Joern Rennecke, lucier; +Cc: gcc

JR> Well, since it worked for one place that looks for set bits in
JR> allocno sets, let's see if it works in the two other ones:

	I'm not 100% certain of the density of 1's in the bit fields
but my impression is that they are fairly sparse (I'm guessing,
especially in the 'conflicts' case).  In which case I would suggest at
least putting a check on any bits being set in 'word_' around the
inner for loop.

	Depending on the number of times the list is iterated over it
might even be worth doing a few steps of binary search on word_ to
throw out "large" sections of zero bits. Of course this will actually
slow things down if you approach 50% of the bits being set.

ie:
	if (word_ & 0xFFFF0000) {
	  if (word_ & 0xFF000000) {
	    // check high order byte
	  }
	  if (word_ & 0x00FF0000) {
	    // check 2nd from high order byte
          }
	}
	if (word_ & 0x0000FFFF) {
	  if (word_ & 0x0000FF00) {
	    // check 2nd from low order byte
	  }
	  if (word_ & 0x000000FF) {
	    // check low order byte
          }
	}

	I'm not certain if this is worth the 'uglyness' but this seems
to be a very tight loop given that word orienting the code made such a
large difference.

----


Index: global.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/global.c,v
retrieving revision 1.40
diff -p -r1.40 global.c
*** global.c    1999/11/05 08:31:48     1.40
--- global.c    1999/11/05 18:31:06
*************** static int allocno_row_words;
*** 139,144 ****
--- 139,172 ----
    &= ~ ((INT_TYPE) 1 << ((J) % INT_BITS)))
  /* END CYGNUS LOCAL */
  
+ /* For any allocno set in ALLOCNO_SET, set OUT_ALLOCNO to that allocno,
+    and execute CODE.  */
+ #define EXECUTE_IF_SET_IN_ALLOCNO_SET(ALLOCNO_SET, ALLOCNO, CODE)     \
+ do {                                                                  \
+   int i_;                                                             \
+   int allocno_;                                                               \
+   INT_TYPE *p_ = (ALLOCNO_SET);               \
+                                                                       \
+   for (i_ = allocno_row_words - 1, allocno_ = 0; i_ >= 0;             \
+        i_--, allocno_ += INT_BITS)                                    \
+     {                                                                 \
+       unsigned INT_TYPE word_ = (unsigned INT_TYPE) *p_++;            \
+                                                                       \
+       for ((ALLOCNO) = allocno_; word_; word_ >>= 1, (ALLOCNO)++)     \
+       {                                                               \
+         if (word_ & 1)                                                \
+           CODE;                                                       \
+       }                                                               \
+     }                                                                 \
+ } while (0)

-- 
							Thomas DeWeese
deweese@kodak.com
			"The only difference between theory and practice is
			 that in theory there isn't any." -- unknown

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-08 15:37     ` Joern Rennecke
  1999-11-08 16:04       ` Jeffrey A Law
@ 1999-11-30 23:37       ` Joern Rennecke
  1 sibling, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-11-30 23:37 UTC (permalink / raw)
  To: law
  Cc: amylaar, lucier, rth, gcc, gcc-bugs, staff, hosking, wilker,
	bernds, gcc-patches

> ~8% off my testcase.  

Do you have any profile data for that?

I've noticed that the reg_may_share code in find_reg is also pretty expensive.
With a little more helpful lookup tables it could be made linear instead
of quadratic in time.  So, if find_reg still consumes a noticable chunk
of the time in your benchmark, it might be worth a try.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 15:44   ` Jeffrey A Law
  1999-11-08 15:37     ` Joern Rennecke
@ 1999-11-30 23:37     ` Jeffrey A Law
  1 sibling, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-11-30 23:37 UTC (permalink / raw)
  To: Joern Rennecke
  Cc: Brad Lucier, rth, gcc, gcc-bugs, staff, hosking, wilker, bernds,
	gcc-patches

  In message < 199911051840.SAA21033@phal.cygnus.co.uk >you write:
  > Well, since it worked for one place that looks for set bits in allocno sets
  > ,
  > let's see if it works in the two other ones:
  > 
  > Fri Nov  5 18:33:39 1999  J"orn Rennecke <amylaar@cygnus.co.uk>
  > 
  > 	* global.c (EXECUTE_IF_SET_IN_ALLOCNO_SET): New macro.
  > 	(EXECUTE_IF_CONFLICT): Likewise.
  > 	(ALLOCNO_LIVE_P): Avoid signed division.
  > 	(SET_ALLOCNO_LIVE, CLEAR_ALLOCNO_LIVE): Likewise.
  > 	(prune_preferences, find_reg): Use EXECUTE_IF_CONFLICT.
  > 	(record_one_conflict): Use EXECUTE_IF_SET_IN_ALLOCNO_SET.
~8% off my testcase.  

There was a minor buglet in the change -- you've got go be careful about
expansion of macro arguments :-)  The addition of a level of parens fixed
the problem.

jeff


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-08 16:04       ` Jeffrey A Law
@ 1999-11-30 23:37         ` Jeffrey A Law
  0 siblings, 0 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-11-30 23:37 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: gcc

[ Note, highly trimmed cc line. We don't need to give everyone 3 copies
  of this ;-)]

  In message < 199911082334.XAA19153@phal.cygnus.co.uk >you write:
  > > ~8% off my testcase.  
  > 
  > Do you have any profile data for that?
With that last change from you global's time isn't worth optimizing anymore
for my testcase.  It sits around 6% of the total time, most of that
is spent in reload.

6% may sound high, but it isn't for this file.  cse1+cse2 weigh in at
60% of the total compilation time followed by dbranch and gcse at 14% each
after all the recent work + one lcm.c patch Andrew and I have been working on.

jeff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-09  8:26 Brad Lucier
@ 1999-11-30 23:37 ` Brad Lucier
  0 siblings, 0 replies; 54+ messages in thread
From: Brad Lucier @ 1999-11-30 23:37 UTC (permalink / raw)
  To: law, amylaar
  Cc: lucier, rth, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

> > ~8% off my testcase.  
> 
> Do you have any profile data for that?

I have some data for compiling a large, integer, file with

-O2 -mcpu=ev6 -fno-math-errno -fPIC

on the alpha.  However (1) I don't know how characteristic the program
is to others, and (2) I don't usually compile this file with -O2.  So
if you find this data useful, great, otherwise, ignore it.

The times for each pass are:

popov-53% /export/u10/egcs-prof/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -mcpu=ev6 -fno-math-errno -mieee -fPIC -O2 _meroon.i
__copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20___meroon {GC 160095k -> 33708k in 0.929} {GC 45097k -> 33687k in 0.940} {GC 47074k -> 37662k in 1.075} {GC 50287k -> 35884k in 1.022} {GC 54712k -> 40701k in 1.212} {GC 64544k -> 42767k in 1.267} ___init_proc {GC 86293k -> 3001k in 0.069} {GC 6054k -> 3060k in 0.145} ____20___meroon
time in parse: 21.162608 (-4%)
time in integration: 0.000000 (-0%)
time in jump: 516.838848 (-101%)
time in cse: 15.099696 (-3%)
time in gcse: -1645.-331280 (323%)
time in loop: 1.242448 (-0%)
time in cse2: 13.482464 (-3%)
time in branch-prob: 0.000000 (-0%)
time in flow: 8.959680 (-2%)
time in combine: 11.807648 (-2%)
time in regmove: 3.503840 (-1%)
time in sched: 356.688960 (-70%)
time in local-alloc: 5.562224 (-1%)
time in global-alloc: 34.060448 (-7%)
time in flow2: 8.440448 (-2%)
time in peephole2: 0.180560 (-0%)
time in sched2: 111.030736 (-22%)
time in shorten-branch: 0.581696 (-0%)
time in final: 3.423808 (-1%)
time in varconst: 0.001952 (-0%)
time in symout: 0.000000 (-0%)
time in dump: 0.000000 (-0%)
time in gc: 6.659248 (-1%)

So it seems that the int's used to contain the times in toplev.c don't
quite cut it here--change to unsigned longs (gains 1 bit on 32bit machines,
32 bits on the alpha) or doubles?

The top times in the profile file are:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 24.79    435.90   435.90   205024     0.00     0.00  pre_expr_reaches_here_p_work
 21.63    816.15   380.25 812491634     0.00     0.00  expr_killed_p
  8.85    971.72   155.58        1   155.58   535.82  compute_ae_kill
  7.68   1106.72   135.00    37475     0.00     0.00  compute_block_backward_dependences
  6.31   1217.65   110.93 122524058     0.00     0.00  rtx_renumbered_equal_p
  5.00   1305.47    87.82    97294     0.00     0.00  compute_transp
  4.24   1380.02    74.54 67176104     0.00     0.00  find_cross_jump
  3.67   1444.46    64.44        3    21.48    33.16  compute_hash_table
  2.39   1486.51    42.05  9018524     0.00     0.00  sbitmap_union_of_diff
  1.57   1514.14    27.63    83378     0.00     0.00  insert_expr_in_table
  1.33   1537.50    23.36   223375     0.00     0.00  record_one_set
  1.04   1555.83    18.33    74953     0.00     0.00  count_or_remove_death_notes
  0.92   1571.98    16.16       15     1.08    14.85  jump_optimize_1
  0.70   1584.28    12.30   389447     0.00     0.00  sbitmap_a_and_b
  0.69   1596.49    12.21   389436     0.00     0.00  sbitmap_a_or_b

Something seems the matter with the data for pre_expr_reaches_here_p_work.

Brad

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 12:22 ` Joern Rennecke
@ 1999-11-30 23:37   ` Joern Rennecke
  0 siblings, 0 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-11-30 23:37 UTC (permalink / raw)
  To: Thomas E Deweese; +Cc: amylaar, lucier, gcc

> 	I'm not 100% certain of the density of 1's in the bit fields
> but my impression is that they are fairly sparse (I'm guessing,
> especially in the 'conflicts' case).  In which case I would suggest at
> least putting a check on any bits being set in 'word_' around the
> inner for loop.

I think that typically the 1's will come in bursts, with wide stretches
of all-zeroes.
Note that the inner loop will will terminate as soon as word is zero.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-11-09  8:26 Brad Lucier
  1999-11-30 23:37 ` Brad Lucier
  0 siblings, 1 reply; 54+ messages in thread
From: Brad Lucier @ 1999-11-09  8:26 UTC (permalink / raw)
  To: law, amylaar
  Cc: lucier, rth, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

> > ~8% off my testcase.  
> 
> Do you have any profile data for that?

I have some data for compiling a large, integer, file with

-O2 -mcpu=ev6 -fno-math-errno -fPIC

on the alpha.  However (1) I don't know how characteristic the program
is to others, and (2) I don't usually compile this file with -O2.  So
if you find this data useful, great, otherwise, ignore it.

The times for each pass are:

popov-53% /export/u10/egcs-prof/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -mcpu=ev6 -fno-math-errno -mieee -fPIC -O2 _meroon.i
__copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20___meroon {GC 160095k -> 33708k in 0.929} {GC 45097k -> 33687k in 0.940} {GC 47074k -> 37662k in 1.075} {GC 50287k -> 35884k in 1.022} {GC 54712k -> 40701k in 1.212} {GC 64544k -> 42767k in 1.267} ___init_proc {GC 86293k -> 3001k in 0.069} {GC 6054k -> 3060k in 0.145} ____20___meroon
time in parse: 21.162608 (-4%)
time in integration: 0.000000 (-0%)
time in jump: 516.838848 (-101%)
time in cse: 15.099696 (-3%)
time in gcse: -1645.-331280 (323%)
time in loop: 1.242448 (-0%)
time in cse2: 13.482464 (-3%)
time in branch-prob: 0.000000 (-0%)
time in flow: 8.959680 (-2%)
time in combine: 11.807648 (-2%)
time in regmove: 3.503840 (-1%)
time in sched: 356.688960 (-70%)
time in local-alloc: 5.562224 (-1%)
time in global-alloc: 34.060448 (-7%)
time in flow2: 8.440448 (-2%)
time in peephole2: 0.180560 (-0%)
time in sched2: 111.030736 (-22%)
time in shorten-branch: 0.581696 (-0%)
time in final: 3.423808 (-1%)
time in varconst: 0.001952 (-0%)
time in symout: 0.000000 (-0%)
time in dump: 0.000000 (-0%)
time in gc: 6.659248 (-1%)

So it seems that the int's used to contain the times in toplev.c don't
quite cut it here--change to unsigned longs (gains 1 bit on 32bit machines,
32 bits on the alpha) or doubles?

The top times in the profile file are:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 24.79    435.90   435.90   205024     0.00     0.00  pre_expr_reaches_here_p_work
 21.63    816.15   380.25 812491634     0.00     0.00  expr_killed_p
  8.85    971.72   155.58        1   155.58   535.82  compute_ae_kill
  7.68   1106.72   135.00    37475     0.00     0.00  compute_block_backward_dependences
  6.31   1217.65   110.93 122524058     0.00     0.00  rtx_renumbered_equal_p
  5.00   1305.47    87.82    97294     0.00     0.00  compute_transp
  4.24   1380.02    74.54 67176104     0.00     0.00  find_cross_jump
  3.67   1444.46    64.44        3    21.48    33.16  compute_hash_table
  2.39   1486.51    42.05  9018524     0.00     0.00  sbitmap_union_of_diff
  1.57   1514.14    27.63    83378     0.00     0.00  insert_expr_in_table
  1.33   1537.50    23.36   223375     0.00     0.00  record_one_set
  1.04   1555.83    18.33    74953     0.00     0.00  count_or_remove_death_notes
  0.92   1571.98    16.16       15     1.08    14.85  jump_optimize_1
  0.70   1584.28    12.30   389447     0.00     0.00  sbitmap_a_and_b
  0.69   1596.49    12.21   389436     0.00     0.00  sbitmap_a_or_b

Something seems the matter with the data for pre_expr_reaches_here_p_work.

Brad

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-08 15:37     ` Joern Rennecke
@ 1999-11-08 16:04       ` Jeffrey A Law
  1999-11-30 23:37         ` Jeffrey A Law
  1999-11-30 23:37       ` Joern Rennecke
  1 sibling, 1 reply; 54+ messages in thread
From: Jeffrey A Law @ 1999-11-08 16:04 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: gcc

[ Note, highly trimmed cc line. We don't need to give everyone 3 copies
  of this ;-)]

  In message < 199911082334.XAA19153@phal.cygnus.co.uk >you write:
  > > ~8% off my testcase.  
  > 
  > Do you have any profile data for that?
With that last change from you global's time isn't worth optimizing anymore
for my testcase.  It sits around 6% of the total time, most of that
is spent in reload.

6% may sound high, but it isn't for this file.  cse1+cse2 weigh in at
60% of the total compilation time followed by dbranch and gcse at 14% each
after all the recent work + one lcm.c patch Andrew and I have been working on.


jeff


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 15:44   ` Jeffrey A Law
@ 1999-11-08 15:37     ` Joern Rennecke
  1999-11-08 16:04       ` Jeffrey A Law
  1999-11-30 23:37       ` Joern Rennecke
  1999-11-30 23:37     ` Jeffrey A Law
  1 sibling, 2 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-11-08 15:37 UTC (permalink / raw)
  To: law
  Cc: amylaar, lucier, rth, gcc, gcc-bugs, staff, hosking, wilker,
	bernds, gcc-patches

> ~8% off my testcase.  

Do you have any profile data for that?

I've noticed that the reg_may_share code in find_reg is also pretty expensive.
With a little more helpful lookup tables it could be made linear instead
of quadratic in time.  So, if find_reg still consumes a noticable chunk
of the time in your benchmark, it might be worth a try.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 10:42 ` Joern Rennecke
@ 1999-11-05 15:44   ` Jeffrey A Law
  1999-11-08 15:37     ` Joern Rennecke
  1999-11-30 23:37     ` Jeffrey A Law
  1999-11-30 23:37   ` Joern Rennecke
  1 sibling, 2 replies; 54+ messages in thread
From: Jeffrey A Law @ 1999-11-05 15:44 UTC (permalink / raw)
  To: Joern Rennecke
  Cc: Brad Lucier, rth, gcc, gcc-bugs, staff, hosking, wilker, bernds,
	gcc-patches

  In message < 199911051840.SAA21033@phal.cygnus.co.uk >you write:
  > Well, since it worked for one place that looks for set bits in allocno sets
  > ,
  > let's see if it works in the two other ones:
  > 
  > Fri Nov  5 18:33:39 1999  J"orn Rennecke <amylaar@cygnus.co.uk>
  > 
  > 	* global.c (EXECUTE_IF_SET_IN_ALLOCNO_SET): New macro.
  > 	(EXECUTE_IF_CONFLICT): Likewise.
  > 	(ALLOCNO_LIVE_P): Avoid signed division.
  > 	(SET_ALLOCNO_LIVE, CLEAR_ALLOCNO_LIVE): Likewise.
  > 	(prune_preferences, find_reg): Use EXECUTE_IF_CONFLICT.
  > 	(record_one_conflict): Use EXECUTE_IF_SET_IN_ALLOCNO_SET.
~8% off my testcase.  

There was a minor buglet in the change -- you've got go be careful about
expansion of macro arguments :-)  The addition of a level of parens fixed
the problem.

jeff


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-11-05 14:43 Brad Lucier
  1999-11-30 23:37 ` Brad Lucier
  0 siblings, 1 reply; 54+ messages in thread
From: Brad Lucier @ 1999-11-05 14:43 UTC (permalink / raw)
  To: lucier, amylaar
  Cc: rth, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

Joern:

With your latest patch, the time for global_alloc is reduced to:

time in global-alloc: 30.228672 (20%)

which is down from 883 seconds yesterday.  Fantastic!

The top times for various functions are now:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  8.62     10.72    10.72    26627     0.40     0.40  record_conflicts
  7.36     19.87     9.15    30909     0.30     0.30  delete_from_jump_chain
  5.08     26.19     6.32        8   790.28  5353.28  jump_optimize_1
  4.65     31.98     5.79 10448965     0.00     0.00  rtx_renumbered_equal_p
  4.07     37.04     5.07  9164467     0.00     0.00  find_cross_jump
  3.64     41.57     4.53 16835940     0.00     0.00  next_active_insn
  3.41     45.81     4.24    93299     0.05     0.05  make_edge
  3.18     49.76     3.95   194464     0.02     0.02  record_one_conflict
  2.11     52.39     2.62        1  2623.05 123824.10  yyparse
  1.55     54.32     1.93  3127006     0.00     0.00  next_label
  1.50     56.18     1.87  2379715     0.00     0.00  yylex
  1.43     57.96     1.78        2   890.62   891.94  prune_preferences
  1.38     59.68     1.72        1  1715.82  1715.82  output_273
  1.32     61.32     1.64        2   819.82   819.82  mirror_conflicts

Brad

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 12:31 ` Michael Meissner
@ 1999-11-05 12:34   ` Joern Rennecke
  1999-11-30 23:37     ` Joern Rennecke
  1999-11-30 23:37   ` Michael Meissner
  1 sibling, 1 reply; 54+ messages in thread
From: Joern Rennecke @ 1999-11-05 12:34 UTC (permalink / raw)
  To: Michael Meissner; +Cc: deweese, amylaar, lucier, gcc

> If they are really large and sparse, bitmaps might be more effective than
> sbitmaps (assuming you use the EXECUTE_ macro in bitmaps).

The trouble with bitmaps, at least in their current implementation, is that
they use linear search.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 12:15 Thomas E Deweese
  1999-11-05 12:22 ` Joern Rennecke
@ 1999-11-05 12:31 ` Michael Meissner
  1999-11-05 12:34   ` Joern Rennecke
  1999-11-30 23:37   ` Michael Meissner
  1999-11-30 23:37 ` Thomas E Deweese
  2 siblings, 2 replies; 54+ messages in thread
From: Michael Meissner @ 1999-11-05 12:31 UTC (permalink / raw)
  To: Thomas E Deweese, Joern Rennecke, lucier; +Cc: gcc

On Fri, Nov 05, 1999 at 03:14:50PM -0500, Thomas E Deweese wrote:
> 
> JR> Well, since it worked for one place that looks for set bits in
> JR> allocno sets, let's see if it works in the two other ones:
> 
> 	I'm not 100% certain of the density of 1's in the bit fields
> but my impression is that they are fairly sparse (I'm guessing,
> especially in the 'conflicts' case).  In which case I would suggest at
> least putting a check on any bits being set in 'word_' around the
> inner for loop.

If they are really large and sparse, bitmaps might be more effective than
sbitmaps (assuming you use the EXECUTE_ macro in bitmaps).

-- 
Michael Meissner, Cygnus Solutions
PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886
email: meissner@cygnus.com	phone: 978-486-9304	fax: 978-692-4482

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-05 12:15 Thomas E Deweese
@ 1999-11-05 12:22 ` Joern Rennecke
  1999-11-30 23:37   ` Joern Rennecke
  1999-11-05 12:31 ` Michael Meissner
  1999-11-30 23:37 ` Thomas E Deweese
  2 siblings, 1 reply; 54+ messages in thread
From: Joern Rennecke @ 1999-11-05 12:22 UTC (permalink / raw)
  To: Thomas E Deweese; +Cc: amylaar, lucier, gcc

> 	I'm not 100% certain of the density of 1's in the bit fields
> but my impression is that they are fairly sparse (I'm guessing,
> especially in the 'conflicts' case).  In which case I would suggest at
> least putting a check on any bits being set in 'word_' around the
> inner for loop.

I think that typically the 1's will come in bursts, with wide stretches
of all-zeroes.
Note that the inner loop will will terminate as soon as word is zero.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-11-05 12:15 Thomas E Deweese
  1999-11-05 12:22 ` Joern Rennecke
                   ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Thomas E Deweese @ 1999-11-05 12:15 UTC (permalink / raw)
  To: Joern Rennecke, lucier; +Cc: gcc

JR> Well, since it worked for one place that looks for set bits in
JR> allocno sets, let's see if it works in the two other ones:

	I'm not 100% certain of the density of 1's in the bit fields
but my impression is that they are fairly sparse (I'm guessing,
especially in the 'conflicts' case).  In which case I would suggest at
least putting a check on any bits being set in 'word_' around the
inner for loop.

	Depending on the number of times the list is iterated over it
might even be worth doing a few steps of binary search on word_ to
throw out "large" sections of zero bits. Of course this will actually
slow things down if you approach 50% of the bits being set.

ie:
	if (word_ & 0xFFFF0000) {
	  if (word_ & 0xFF000000) {
	    // check high order byte
	  }
	  if (word_ & 0x00FF0000) {
	    // check 2nd from high order byte
          }
	}
	if (word_ & 0x0000FFFF) {
	  if (word_ & 0x0000FF00) {
	    // check 2nd from low order byte
	  }
	  if (word_ & 0x000000FF) {
	    // check low order byte
          }
	}

	I'm not certain if this is worth the 'uglyness' but this seems
to be a very tight loop given that word orienting the code made such a
large difference.

----


Index: global.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/global.c,v
retrieving revision 1.40
diff -p -r1.40 global.c
*** global.c    1999/11/05 08:31:48     1.40
--- global.c    1999/11/05 18:31:06
*************** static int allocno_row_words;
*** 139,144 ****
--- 139,172 ----
    &= ~ ((INT_TYPE) 1 << ((J) % INT_BITS)))
  /* END CYGNUS LOCAL */
  
+ /* For any allocno set in ALLOCNO_SET, set OUT_ALLOCNO to that allocno,
+    and execute CODE.  */
+ #define EXECUTE_IF_SET_IN_ALLOCNO_SET(ALLOCNO_SET, ALLOCNO, CODE)     \
+ do {                                                                  \
+   int i_;                                                             \
+   int allocno_;                                                               \
+   INT_TYPE *p_ = (ALLOCNO_SET);               \
+                                                                       \
+   for (i_ = allocno_row_words - 1, allocno_ = 0; i_ >= 0;             \
+        i_--, allocno_ += INT_BITS)                                    \
+     {                                                                 \
+       unsigned INT_TYPE word_ = (unsigned INT_TYPE) *p_++;            \
+                                                                       \
+       for ((ALLOCNO) = allocno_; word_; word_ >>= 1, (ALLOCNO)++)     \
+       {                                                               \
+         if (word_ & 1)                                                \
+           CODE;                                                       \
+       }                                                               \
+     }                                                                 \
+ } while (0)

-- 
							Thomas DeWeese
deweese@kodak.com
			"The only difference between theory and practice is
			 that in theory there isn't any." -- unknown

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-11-04 13:49 Brad Lucier
@ 1999-11-05 10:42 ` Joern Rennecke
  1999-11-05 15:44   ` Jeffrey A Law
  1999-11-30 23:37   ` Joern Rennecke
  1999-11-30 23:37 ` Brad Lucier
  1 sibling, 2 replies; 54+ messages in thread
From: Joern Rennecke @ 1999-11-05 10:42 UTC (permalink / raw)
  To: Brad Lucier
  Cc: lucier, amylaar, rth, gcc, gcc-bugs, staff, hosking, wilker,
	bernds, gcc-patches

> and the top functions were
> 
>  19.82     67.88    67.88 1599926669     0.00     0.00  bitmap_bit_p
>  15.83    122.07    54.19        3 18063.15 41030.16  build_insn_chain
>  14.52    171.77    49.71     2789    17.82    17.82  find_reg
>  13.64    218.49    46.72   194464     0.24     0.24  record_one_conflict
>   3.14    229.26    10.76    26627     0.40     0.40  record_conflicts
>   2.94    239.32    10.06    30909     0.33     0.33  delete_from_jump_chain
> ...
>   0.51    277.15     1.74        2   869.63   870.11  prune_preferences
> 
> So, even though global-alloc is still the biggest time sink, your patch
> makes a *big* difference.  (I don't believe the times reported in each
> pass of the compiler are reliable when it is profiled with -pg.)

Well, since it worked for one place that looks for set bits in allocno sets,
let's see if it works in the two other ones:

Fri Nov  5 18:33:39 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* global.c (EXECUTE_IF_SET_IN_ALLOCNO_SET): New macro.
	(EXECUTE_IF_CONFLICT): Likewise.
	(ALLOCNO_LIVE_P): Avoid signed division.
	(SET_ALLOCNO_LIVE, CLEAR_ALLOCNO_LIVE): Likewise.
	(prune_preferences, find_reg): Use EXECUTE_IF_CONFLICT.
	(record_one_conflict): Use EXECUTE_IF_SET_IN_ALLOCNO_SET.

Index: global.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/global.c,v
retrieving revision 1.40
diff -p -r1.40 global.c
*** global.c	1999/11/05 08:31:48	1.40
--- global.c	1999/11/05 18:31:06
*************** static int allocno_row_words;
*** 139,144 ****
--- 139,172 ----
    &= ~ ((INT_TYPE) 1 << ((J) % INT_BITS)))
  /* END CYGNUS LOCAL */
  
+ /* For any allocno set in ALLOCNO_SET, set OUT_ALLOCNO to that allocno,
+    and execute CODE.  */
+ #define EXECUTE_IF_SET_IN_ALLOCNO_SET(ALLOCNO_SET, ALLOCNO, CODE)	\
+ do {									\
+   int i_;								\
+   int allocno_;								\
+   INT_TYPE *p_ = (ALLOCNO_SET);		\
+ 									\
+   for (i_ = allocno_row_words - 1, allocno_ = 0; i_ >= 0;		\
+        i_--, allocno_ += INT_BITS)					\
+     {									\
+       unsigned INT_TYPE word_ = (unsigned INT_TYPE) *p_++;		\
+ 									\
+       for ((ALLOCNO) = allocno_; word_; word_ >>= 1, (ALLOCNO)++)	\
+ 	{								\
+ 	  if (word_ & 1)						\
+ 	    CODE;							\
+ 	}								\
+     }									\
+ } while (0)
+ 
+ /* For any allocno that conflicts with IN_ALLOCNO, set OUT_ALLOCNO to
+    the conflicting allocno, and execute CODE.  This macro assumes that
+    mirror_conflicts has been run.  */
+ #define EXECUTE_IF_CONFLICT(IN_ALLOCNO, OUT_ALLOCNO, CODE)\
+   EXECUTE_IF_SET_IN_ALLOCNO_SET (conflicts + (IN_ALLOCNO) * allocno_row_words,\
+ 				 OUT_ALLOCNO, CODE)
+ 
  /* Set of hard regs currently live (during scan of all insns).  */
  
  static HARD_REG_SET hard_regs_live;
*************** static INT_TYPE *allocnos_live;
*** 217,231 ****
  
  /* Test, set or clear bit number I in allocnos_live,
     a bit vector indexed by allocno.  */
- 
- #define ALLOCNO_LIVE_P(I) \
-   (allocnos_live[(I) / INT_BITS] & ((INT_TYPE) 1 << ((I) % INT_BITS)))
- 
- #define SET_ALLOCNO_LIVE(I) \
-   (allocnos_live[(I) / INT_BITS] |= ((INT_TYPE) 1 << ((I) % INT_BITS)))
  
! #define CLEAR_ALLOCNO_LIVE(I) \
!   (allocnos_live[(I) / INT_BITS] &= ~((INT_TYPE) 1 << ((I) % INT_BITS)))
  
  /* This is turned off because it doesn't work right for DImode.
     (And it is only used for DImode, so the other cases are worthless.)
--- 245,262 ----
  
  /* Test, set or clear bit number I in allocnos_live,
     a bit vector indexed by allocno.  */
  
! #define ALLOCNO_LIVE_P(I)				\
!   (allocnos_live[(unsigned)(I) / INT_BITS]		\
!    & ((INT_TYPE) 1 << ((unsigned)(I) % INT_BITS)))
! 
! #define SET_ALLOCNO_LIVE(I)				\
!   (allocnos_live[(unsigned)(I) / INT_BITS]		\
!      |= ((INT_TYPE) 1 << ((unsigned)(I) % INT_BITS)))
! 
! #define CLEAR_ALLOCNO_LIVE(I)				\
!   (allocnos_live[(unsigned)(I) / INT_BITS]		\
!      &= ~((INT_TYPE) 1 << ((unsigned)(I) % INT_BITS)))
  
  /* This is turned off because it doesn't work right for DImode.
     (And it is only used for DImode, so the other cases are worthless.)
*************** expand_preferences ()
*** 860,866 ****
  static void
  prune_preferences ()
  {
!   int i, j;
    int allocno;
    int *allocno_to_order = (int *) xmalloc (max_allocno * sizeof (int));
    
--- 891,897 ----
  static void
  prune_preferences ()
  {
!   int i;
    int allocno;
    int *allocno_to_order = (int *) xmalloc (max_allocno * sizeof (int));
    
*************** prune_preferences ()
*** 899,931 ****
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
        HARD_REG_SET temp, temp2;
!       INT_TYPE *p;
!       int allocno2, allocno3;
  
        allocno = allocno_order[i];
-       p = conflicts + allocno * allocno_row_words;
  
        CLEAR_HARD_REG_SET (temp);
        CLEAR_HARD_REG_SET (temp2);
  
!       for (j = allocno_row_words - 1, allocno2 = 0; j >= 0;
! 	   j--, allocno2 += INT_BITS)
  	{
! 	  unsigned INT_TYPE word = (unsigned INT_TYPE) *p++;
! 
! 	  for (allocno3 = allocno2; word; word >>= 1, allocno3++)
  	    {
! 	      if ((word & 1) && allocno_to_order[allocno3] > i)
! 		{
! 		  if (allocno_size[allocno3] <= allocno_size[allocno])
! 		    IOR_HARD_REG_SET (temp,
! 				      hard_reg_full_preferences[allocno3]);
! 		  else
! 		    IOR_HARD_REG_SET (temp2,
! 				      hard_reg_full_preferences[allocno3]);
! 		}
  	    }
! 	}
  
        AND_COMPL_HARD_REG_SET (temp, hard_reg_full_preferences[allocno]);
        IOR_HARD_REG_SET (temp, temp2);
--- 930,952 ----
  	 we want to give the lower-priority allocno the first chance for
  	 these registers).  */
        HARD_REG_SET temp, temp2;
!       int allocno2;
  
        allocno = allocno_order[i];
  
        CLEAR_HARD_REG_SET (temp);
        CLEAR_HARD_REG_SET (temp2);
  
!       EXECUTE_IF_CONFLICT (allocno, allocno2,
  	{
! 	  if (allocno_to_order[allocno2] > i)
  	    {
! 	      if (allocno_size[allocno2] <= allocno_size[allocno])
! 		IOR_HARD_REG_SET (temp, hard_reg_full_preferences[allocno2]);
! 	      else
! 		IOR_HARD_REG_SET (temp2, hard_reg_full_preferences[allocno2]);
  	    }
! 	});
  
        AND_COMPL_HARD_REG_SET (temp, hard_reg_full_preferences[allocno]);
        IOR_HARD_REG_SET (temp, temp2);
*************** find_reg (allocno, losers, alt_regs_p, a
*** 1246,1256 ****
        /* For each other pseudo-reg conflicting with this one,
  	 mark it as conflicting with the hard regs this one occupies.  */
        lim = allocno;
!       for (j = 0; j < max_allocno; j++)
! 	if (CONFLICTP (j, lim))
! 	  {
! 	    IOR_HARD_REG_SET (hard_reg_conflicts[j], this_reg);
! 	  }
      }
  }
  \f
--- 1267,1276 ----
        /* For each other pseudo-reg conflicting with this one,
  	 mark it as conflicting with the hard regs this one occupies.  */
        lim = allocno;
!       EXECUTE_IF_CONFLICT (lim, j,
! 	{
! 	  IOR_HARD_REG_SET (hard_reg_conflicts[j], this_reg);
! 	});
      }
  }
  \f
*************** record_one_conflict (regno)
*** 1303,1313 ****
    if (regno < FIRST_PSEUDO_REGISTER)
      /* When a hard register becomes live,
         record conflicts with live pseudo regs.  */
!     for (j = 0; j < max_allocno; j++)
        {
! 	if (ALLOCNO_LIVE_P (j))
! 	  SET_HARD_REG_BIT (hard_reg_conflicts[j], regno);
!       }
    else
      /* When a pseudo-register becomes live,
         record conflicts first with hard regs,
--- 1323,1332 ----
    if (regno < FIRST_PSEUDO_REGISTER)
      /* When a hard register becomes live,
         record conflicts with live pseudo regs.  */
!     EXECUTE_IF_SET_IN_ALLOCNO_SET (allocnos_live, j,
        {
! 	SET_HARD_REG_BIT (hard_reg_conflicts[j], regno);
!       });
    else
      /* When a pseudo-register becomes live,
         record conflicts first with hard regs,

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-11-04 13:49 Brad Lucier
  1999-11-05 10:42 ` Joern Rennecke
  1999-11-30 23:37 ` Brad Lucier
  0 siblings, 2 replies; 54+ messages in thread
From: Brad Lucier @ 1999-11-04 13:49 UTC (permalink / raw)
  To: lucier, amylaar
  Cc: rth, gcc, gcc-bugs, staff, hosking, wilker, bernds, gcc-patches

> Could you benchmark this patch?
> 
> Thu Nov  4 06:15:40 1999  J"orn Rennecke <amylaar@cygnus.co.uk>
> 
> 	* global.c (CONFLICTP, SET_CONFLICT): Avoid signed division.
> 	(mirror_conflicts): New function.
> 	(global_alloc): Call it.
> 	(expand_preferences): Remove redundant CONFLICTP test.
> 	(find_reg, dump_conflicts): Likewise.
> 	(prune_preferences): Process conflicts one word at a time.
> 

I bootstrapped yesterday's mainline on alphaev6-unknown-linux-gnu,
and ran it on a similar test case.  Without the patch, I get:

/export/u10/egcs-prof/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -mcpu=ev6 -fno-math-errno -mieee -fPIC -O1 _meroon.i 
 ___H__20___meroon {GC 220613k -> 46104k in 1.295} {GC 79384k -> 50882k in 1.474} {GC 69598k -> 53435k in 1.583} ___init_proc {GC 93235k -> 3595k in 0.078} ____20___meroon
...
time in global-alloc: 1237.253728 (85%)
...

and the top functions by time were:

 29.05    142.54   142.54        2 71267.58 71273.92  prune_preferences
 13.86    210.53    68.00 1599926669     0.00     0.00  bitmap_bit_p
 11.99    269.37    58.84     2789    21.10    21.10  find_reg
 10.93    322.99    53.62        3 17873.05 40871.13  build_insn_chain
  9.55    369.85    46.86   194464     0.24     0.24  record_one_conflict
  2.21    380.71    10.85    26627     0.41     0.41  record_conflicts
  2.02    390.61     9.91    30909     0.32     0.32  delete_from_jump_chain

With the patch, I got

/export/u10/egcs-prof2/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -mcpu=ev6 -fno-math-errno -mieee -fPIC -O1 _meroon.i
 ___H__20___meroon {GC 220613k -> 46104k in 1.303} {GC 79384k -> 50882k in 1.459} {GC 69598k -> 53435k in 1.566} ___init_proc {GC 93235k -> 3595k in 0.080} ____20___meroon
...
time in global-alloc: 574.101744 (72%)
...

and the top functions were

 19.82     67.88    67.88 1599926669     0.00     0.00  bitmap_bit_p
 15.83    122.07    54.19        3 18063.15 41030.16  build_insn_chain
 14.52    171.77    49.71     2789    17.82    17.82  find_reg
 13.64    218.49    46.72   194464     0.24     0.24  record_one_conflict
  3.14    229.26    10.76    26627     0.40     0.40  record_conflicts
  2.94    239.32    10.06    30909     0.33     0.33  delete_from_jump_chain
...
  0.51    277.15     1.74        2   869.63   870.11  prune_preferences

So, even though global-alloc is still the biggest time sink, your patch
makes a *big* difference.  (I don't believe the times reported in each
pass of the compiler are reliable when it is profiled with -pg.)

Brad

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-06 23:17 Mike Stump
@ 1999-08-31 23:20 ` Mike Stump
  0 siblings, 0 replies; 54+ messages in thread
From: Mike Stump @ 1999-08-31 23:20 UTC (permalink / raw)
  To: lucier; +Cc: gcc

> From: Brad Lucier <lucier@math.purdue.edu>
> To: amylaar@cygnus.co.uk (Joern Rennecke)
> Date: Fri, 6 Aug 1999 20:06:35 -0500 (EST)

> The biggest time sink seems to be the quadratic algorithm in
> prune_preferences in global.c.

Thanks for pointing out quadratic algorithms...  [ blush ] we usually
can't win with them, someone always finds _the_ testcase that will
kill it.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* big slowdown in egcs-1.1.2->gcc-2.95 on alpha
  1999-08-04  9:15 Brad Lucier
@ 1999-08-31 23:20 ` Brad Lucier
  0 siblings, 0 replies; 54+ messages in thread
From: Brad Lucier @ 1999-08-31 23:20 UTC (permalink / raw)
  To: gcc, gcc-bugs; +Cc: lucier, staff, jlee, hosking, wilker

I'm running a Genetic Programming system, where compile times are as
important as run times.  I noticed that gcc-2.95 is a *lot* slower than
egcs-1.1.2 on my alpha-ev6 running Red Hat 6.0 with binutils 2.9.5.0.3

Here are some typical times:

egcs-1.1.2:

/usr/bin/time /usr/bin/gcc -mcpu=ev6 -fPIC -O1 -c -D___DYNAMIC -D___SINGLE_HOST system.c
106.60user 0.17system 0:13.49elapsed 790%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2248major+8092minor)pagefaults 0swaps

gcc-2.9.5:

/usr/bin/time /export/u10/gcc-2.95/bin/gcc -mcpu=ev6 -fPIC -O1 -c -D___DYNAMIC -D___SINGLE_HOST system.c
250.77user 0.24system 0:31.59elapsed 794%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2330major+15298minor)pagefaults 0swaps

(The user and system times on this box are too big by a factor of 8.)

I try to be an independent guy---I installed a test version compiled with
-pg, but gprof on my platform dumps core (both the original one and the one
with 2.9.5.0.3) so that isn't much help.

Any suggestions?  Anybody else notice this?

Brad Lucier   lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-08-06 23:17 Mike Stump
  1999-08-31 23:20 ` Mike Stump
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Stump @ 1999-08-06 23:17 UTC (permalink / raw)
  To: lucier; +Cc: gcc

> From: Brad Lucier <lucier@math.purdue.edu>
> To: amylaar@cygnus.co.uk (Joern Rennecke)
> Date: Fri, 6 Aug 1999 20:06:35 -0500 (EST)

> The biggest time sink seems to be the quadratic algorithm in
> prune_preferences in global.c.

Thanks for pointing out quadratic algorithms...  [ blush ] we usually
can't win with them, someone always finds _the_ testcase that will
kill it.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* big slowdown in egcs-1.1.2->gcc-2.95 on alpha
@ 1999-08-04  9:15 Brad Lucier
  1999-08-31 23:20 ` Brad Lucier
  0 siblings, 1 reply; 54+ messages in thread
From: Brad Lucier @ 1999-08-04  9:15 UTC (permalink / raw)
  To: gcc, gcc-bugs; +Cc: lucier, staff, jlee, hosking, wilker

I'm running a Genetic Programming system, where compile times are as
important as run times.  I noticed that gcc-2.95 is a *lot* slower than
egcs-1.1.2 on my alpha-ev6 running Red Hat 6.0 with binutils 2.9.5.0.3

Here are some typical times:

egcs-1.1.2:

/usr/bin/time /usr/bin/gcc -mcpu=ev6 -fPIC -O1 -c -D___DYNAMIC -D___SINGLE_HOST system.c
106.60user 0.17system 0:13.49elapsed 790%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2248major+8092minor)pagefaults 0swaps

gcc-2.9.5:

/usr/bin/time /export/u10/gcc-2.95/bin/gcc -mcpu=ev6 -fPIC -O1 -c -D___DYNAMIC -D___SINGLE_HOST system.c
250.77user 0.24system 0:31.59elapsed 794%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2330major+15298minor)pagefaults 0swaps

(The user and system times on this box are too big by a factor of 8.)

I try to be an independent guy---I installed a test version compiled with
-pg, but gprof on my platform dumps core (both the original one and the one
with 2.9.5.0.3) so that isn't much help.

Any suggestions?  Anybody else notice this?

Brad Lucier   lucier@math.purdue.edu

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~1999-11-30 23:37 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-08-06 13:16 big slowdown in egcs-1.1.2->gcc-2.95 on alpha Brad Lucier
1999-08-06 13:54 ` Joern Rennecke
1999-08-06 16:25   ` Brad Lucier
1999-08-31 23:20     ` Brad Lucier
1999-08-06 18:06   ` Brad Lucier
1999-08-06 18:54     ` Joern Rennecke
1999-08-06 21:27       ` Brad Lucier
1999-08-07  1:04         ` Richard Henderson
1999-08-07  8:08           ` Brad Lucier
1999-08-31 23:20             ` Brad Lucier
1999-08-07  8:27           ` Brad Lucier
1999-08-31 23:20             ` Brad Lucier
1999-11-03 22:31             ` Joern Rennecke
1999-11-04 17:19               ` Richard Henderson
1999-11-05  0:33                 ` Jeffrey A Law
1999-11-30 23:37                   ` Jeffrey A Law
1999-11-30 23:37                 ` Richard Henderson
1999-11-30 23:37               ` Joern Rennecke
1999-08-31 23:20           ` Richard Henderson
1999-08-31 23:20         ` Brad Lucier
1999-08-11  1:57       ` Jeffrey A Law
1999-08-12 16:25         ` Joern Rennecke
1999-08-31 23:20           ` Joern Rennecke
1999-08-31 23:20         ` Jeffrey A Law
1999-08-31 23:20       ` Joern Rennecke
1999-08-31 23:20     ` Brad Lucier
1999-08-31 23:20   ` Joern Rennecke
1999-08-31 23:20 ` Brad Lucier
  -- strict thread matches above, loose matches on Subject: below --
1999-11-09  8:26 Brad Lucier
1999-11-30 23:37 ` Brad Lucier
1999-11-05 14:43 Brad Lucier
1999-11-30 23:37 ` Brad Lucier
1999-11-05 12:15 Thomas E Deweese
1999-11-05 12:22 ` Joern Rennecke
1999-11-30 23:37   ` Joern Rennecke
1999-11-05 12:31 ` Michael Meissner
1999-11-05 12:34   ` Joern Rennecke
1999-11-30 23:37     ` Joern Rennecke
1999-11-30 23:37   ` Michael Meissner
1999-11-30 23:37 ` Thomas E Deweese
1999-11-04 13:49 Brad Lucier
1999-11-05 10:42 ` Joern Rennecke
1999-11-05 15:44   ` Jeffrey A Law
1999-11-08 15:37     ` Joern Rennecke
1999-11-08 16:04       ` Jeffrey A Law
1999-11-30 23:37         ` Jeffrey A Law
1999-11-30 23:37       ` Joern Rennecke
1999-11-30 23:37     ` Jeffrey A Law
1999-11-30 23:37   ` Joern Rennecke
1999-11-30 23:37 ` Brad Lucier
1999-08-06 23:17 Mike Stump
1999-08-31 23:20 ` Mike Stump
1999-08-04  9:15 Brad Lucier
1999-08-31 23:20 ` Brad Lucier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).