public inbox for gcc@gcc.gnu.org
* Speed/profile of gcc3.4
@ 2004-01-20 12:42 Richard Guenther
  2004-01-20 12:49 ` Zack Winkles
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Richard Guenther @ 2004-01-20 12:42 UTC (permalink / raw)
  To: gcc

Hi!

To add some data to the discussion of g++ speed, I built a profiling
compiler and ran it over the tramp3d.cpp testcase
(http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d.cpp.gz). The
top of the (flat) profile is

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  7.85     24.64    24.64 71506294     0.00     0.00  ggc_alloc
  3.37     35.23    10.59 75978109     0.00     0.00  htab_find_slot_with_hash
  3.17     45.20     9.97 15526171     0.00     0.00  walk_tree
  3.06     54.80     9.60  3895596     0.00     0.00  gt_ggc_mx_lang_tree_node
  2.10     61.40     6.60    43652     0.00     0.00  fixup_var_refs_insns
  2.07     67.91     6.51 116225166     0.00     0.00  ggc_set_mark
  2.05     74.35     6.44    17854     0.00     0.00  init_alias_analysis
  1.51     79.10     4.75   221721     0.00     0.00  htab_expand
  1.47     83.71     4.61     1044     0.00     0.02  store_motion
  1.17     87.37     3.66     8618     0.00     0.00  loop_regs_scan
  1.14     90.96     3.59 13512961     0.00     0.00  fixup_var_refs_1
  1.03     94.19     3.23   238610     0.00     0.00  compute_transp
  1.03     97.41     3.22  2216398     0.00     0.00  emit_insn
  0.94    100.37     2.96 27841267     0.00     0.00  note_stores
  0.94    103.31     2.94 20724190     0.00     0.00  splay_tree_splay_helper
  0.89    106.09     2.78 11630295     0.00     0.00  for_each_rtx
  0.88    108.85     2.76  2243042     0.00     0.00  cse_insn
  0.88    111.60     2.75  7037958     0.00     0.00  reg_scan_mark_refs
  0.79    114.08     2.48 16796134     0.00     0.00  find_loads
  0.71    116.31     2.23 129249560     0.00     0.00  bitmap_set_bit
  0.68    118.45     2.14  6755763     0.00     0.00  count_reg_usage
  0.67    120.55     2.10 58964152     0.00     0.00  find_reg_note
  0.61    122.47     1.92  3829519     0.00     0.00  constrain_operands
  0.60    124.35     1.88 13513006     0.00     0.00  fixup_var_refs_insn
  0.59    126.19     1.84 22372827     0.00     0.00  mark_set_1

Ugh.

The htab_find_slot_with_hash stuff should maybe be split up, because
it seems heavily overloaded. Also, do we use only power-of-two hashtab
sizes? In that case we could save the costly division/modulo calculations.

ggc_alloc - err - which collector do we use by default? The page or the zone collector?
How do I select a different collector?

For the page collector, inside ggc_alloc we should use __builtin_expect()
for the entry==NULL || entry->num_free_objects == 0 test. Also, a wrapper
around ggc_alloc() using __builtin_constant_p() could speed up
the order calculation. push_depth/push_by_depth could also make use of
__builtin_expect() and put the realloc out of line.  Was the use of
prefetch in ggc_pop_context benchmarked?
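
To make the __builtin_constant_p() idea concrete, here is a minimal sketch.
It assumes plain power-of-two size classes with a minimum order of 3, which
is simpler than the real order table in ggc-page.c; size_order_slow and
SIZE_ORDER are hypothetical names, not existing GCC functions.

```c
#include <stddef.h>

/* Hypothetical helper: smallest ORDER >= 3 such that (1 << ORDER) >= SIZE,
   found with a loop.  This stands in for the generic order calculation.  */
static unsigned
size_order_slow (size_t size)
{
  unsigned order = 3;
  while (((size_t) 1 << order) < size)
    order++;
  return order;
}

/* Hypothetical wrapper: when SIZE is a compile-time constant, the
   conditional chain folds to a constant for the common small sizes;
   otherwise (or for larger sizes) fall back to the loop.  Note that
   SIZE is evaluated more than once.  */
#define SIZE_ORDER(size)                                        \
  (__builtin_constant_p (size)                                  \
   ? ((size) <= 8 ? 3u                                          \
      : (size) <= 16 ? 4u                                       \
      : (size) <= 32 ? 5u                                       \
      : (size) <= 64 ? 6u                                       \
      : size_order_slow (size))                                 \
   : size_order_slow (size))
```

With something like this, a call such as SIZE_ORDER (sizeof (struct foo))
would need no run-time order computation at all, while variable sizes
still take the generic path. Whether that is measurable is untested here.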

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/


* Re: Speed/profile of gcc3.4
  2004-01-20 12:42 Speed/profile of gcc3.4 Richard Guenther
@ 2004-01-20 12:49 ` Zack Winkles
  2004-01-20 13:01   ` Richard Guenther
  2004-01-20 13:09 ` Jakub Jelinek
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Zack Winkles @ 2004-01-20 12:49 UTC (permalink / raw)
  To: gcc

Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> said:
> ggc_alloc - err - which collector do we use by default? The page or
> the zone collector? How do I select a different collector?

AFAIK, page is the default.  Select a new one by passing --with-gc=zone
to ./configure.


* Re: Speed/profile of gcc3.4
  2004-01-20 12:49 ` Zack Winkles
@ 2004-01-20 13:01   ` Richard Guenther
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Guenther @ 2004-01-20 13:01 UTC (permalink / raw)
  To: Zack Winkles; +Cc: gcc

On Tue, 20 Jan 2004, Zack Winkles wrote:

> Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> said:
> > ggc_alloc - err - which collector do we use by default? The page or
> > the zone collector? How do I select a different collector?
>
> AFAIK, page is the default.  Select a new one by passing --with-gc=zone
> to ./configure.

Thanks. --with-gc is not documented in install.texi, and neither are the
choices page or zone. Only simple is mentioned, in a note for DG Unix 4.0,
but hasn't support for simple been removed?

PR/13770.

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/


* Re: Speed/profile of gcc3.4
  2004-01-20 12:42 Speed/profile of gcc3.4 Richard Guenther
  2004-01-20 12:49 ` Zack Winkles
@ 2004-01-20 13:09 ` Jakub Jelinek
  2004-01-20 13:35 ` Jan Hubicka
  2004-01-20 22:36 ` Mike Stump
  3 siblings, 0 replies; 10+ messages in thread
From: Jakub Jelinek @ 2004-01-20 13:09 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

On Tue, Jan 20, 2004 at 01:42:35PM +0100, Richard Guenther wrote:
> The htab_find_slot_with_hash stuff should maybe be split up, because
> it seems heavily overloaded. Also, do we use only power-of-two hashtab
> sizes? In that case we could save the costly division/modulo calculations.

No, there are not enough primes which are power-of-two... ;)
All hash tables using libiberty/hashtab.c use prime hashtab sizes;
ht_* in gcc/hashtable.c use power-of-two sizes.
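
For illustration, the difference between the two indexing schemes can be
sketched as follows. These helpers are hypothetical and are not the code
in libiberty or gcc/hashtable.c:

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only.  */

/* Slot index for a table whose size is prime: needs a division.  */
static size_t
index_prime (unsigned int hash, size_t prime_size)
{
  return hash % prime_size;
}

/* Slot index for a table whose size is a power of two: a single AND.
   SIZE must be a nonzero power of two.  */
static size_t
index_pow2 (unsigned int hash, size_t size)
{
  return hash & (size - 1);
}

/* Nonzero if N is a nonzero power of two.  */
static int
is_pow2 (size_t n)
{
  return n != 0 && (n & (n - 1)) == 0;
}
```

The mask only looks at the low bits of the hash, so it relies on the hash
function mixing well; a prime modulus is more forgiving of weak hashes
such as raw pointer values.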

	Jakub


* Re: Speed/profile of gcc3.4
  2004-01-20 12:42 Speed/profile of gcc3.4 Richard Guenther
  2004-01-20 12:49 ` Zack Winkles
  2004-01-20 13:09 ` Jakub Jelinek
@ 2004-01-20 13:35 ` Jan Hubicka
  2004-01-20 14:59   ` Richard Guenther
  2004-01-20 17:51   ` Giovanni Bajo
  2004-01-20 22:36 ` Mike Stump
  3 siblings, 2 replies; 10+ messages in thread
From: Jan Hubicka @ 2004-01-20 13:35 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

> Hi!
> 
> To add some data to the discussion of g++ speed, I built a profiling
> compiler and ran it over the tramp3d.cpp testcase
> (http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d.cpp.gz). The
> top of the (flat) profile is
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>   7.85     24.64    24.64 71506294     0.00     0.00  ggc_alloc
>   3.37     35.23    10.59 75978109     0.00     0.00  htab_find_slot_with_hash
>   3.17     45.20     9.97 15526171     0.00     0.00  walk_tree
>   3.06     54.80     9.60  3895596     0.00     0.00  gt_ggc_mx_lang_tree_node
>   2.10     61.40     6.60    43652     0.00     0.00  fixup_var_refs_insns
>   2.07     67.91     6.51 116225166     0.00     0.00  ggc_set_mark
>   2.05     74.35     6.44    17854     0.00     0.00  init_alias_analysis
>   1.51     79.10     4.75   221721     0.00     0.00  htab_expand
>   1.47     83.71     4.61     1044     0.00     0.02  store_motion
>   1.17     87.37     3.66     8618     0.00     0.00  loop_regs_scan
>   1.14     90.96     3.59 13512961     0.00     0.00  fixup_var_refs_1
>   1.03     94.19     3.23   238610     0.00     0.00  compute_transp
>   1.03     97.41     3.22  2216398     0.00     0.00  emit_insn
>   0.94    100.37     2.96 27841267     0.00     0.00  note_stores
>   0.94    103.31     2.94 20724190     0.00     0.00  splay_tree_splay_helper
>   0.89    106.09     2.78 11630295     0.00     0.00  for_each_rtx
>   0.88    108.85     2.76  2243042     0.00     0.00  cse_insn
>   0.88    111.60     2.75  7037958     0.00     0.00  reg_scan_mark_refs
>   0.79    114.08     2.48 16796134     0.00     0.00  find_loads
>   0.71    116.31     2.23 129249560     0.00     0.00  bitmap_set_bit
>   0.68    118.45     2.14  6755763     0.00     0.00  count_reg_usage
>   0.67    120.55     2.10 58964152     0.00     0.00  find_reg_note
>   0.61    122.47     1.92  3829519     0.00     0.00  constrain_operands
>   0.60    124.35     1.88 13513006     0.00     0.00  fixup_var_refs_insn
>   0.59    126.19     1.84 22372827     0.00     0.00  mark_set_1
> 
> Ugh.
> 
> The htab_find_slot_with_hash stuff should maybe be split up, because
> it seems heavily overloaded. Also, do we use only power-of-two hashtab
> sizes? In that case we could save the costly division/modulo calculations.

Actually, I did some profiling of this too, and at least from Gerald's
testcase I concluded that the vast majority of the hashtable uses come from
for_each_template_parm.  Jason mentioned that Mark plans to trim
down use of these.  That should make it possible to shoot this function
out of the profiles completely.

Mark, have you made any progress on this?  If not, I can try to do
something myself if you give me some guidelines.
> 
> ggc_alloc - err - which collector do we use by default? The page or the zone collector?
> How do I select a different collector?
> 
> For the page collector, inside ggc_alloc we should use __builtin_expect()
> for the entry==NULL || entry->num_free_objects == 0 test. Also, a wrapper
> around ggc_alloc() using __builtin_constant_p() could speed up
> the order calculation. push_depth/push_by_depth could also make use of
> __builtin_expect() and put the realloc out of line.  Was the use of
> prefetch in ggc_pop_context benchmarked?
The __builtin_expect tricks can be made obsolete by pushing profiledbootstrap
to be used for production builds of the compiler.  Perhaps we can do
some work to make it cheaper by producing a train-run testcase that is
smaller than a full libjava/libstdc++ build?

Honza
> 
> Richard.
> 
> --
> Richard Guenther <richard dot guenther at uni-tuebingen dot de>
> WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/


* Re: Speed/profile of gcc3.4
  2004-01-20 13:35 ` Jan Hubicka
@ 2004-01-20 14:59   ` Richard Guenther
  2004-01-20 15:29     ` Richard Guenther
  2004-01-20 17:51   ` Giovanni Bajo
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Guenther @ 2004-01-20 14:59 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

On Tue, 20 Jan 2004, Jan Hubicka wrote:

> > Hi!
> >
> > To add some data to the discussion of g++ speed, I built a profiling
> > compiler and ran it over the tramp3d.cpp testcase
> > (http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d.cpp.gz). The
> > top of the (flat) profile is
> >
> >   %   cumulative   self              self     total
> >  time   seconds   seconds    calls   s/call   s/call  name
> >   7.85     24.64    24.64 71506294     0.00     0.00  ggc_alloc
> >   3.37     35.23    10.59 75978109     0.00     0.00  htab_find_slot_with_hash
> >   3.17     45.20     9.97 15526171     0.00     0.00  walk_tree
>
> Actually, I did some profiling of this too, and at least from Gerald's
> testcase I concluded that the vast majority of the hashtable uses come from
> for_each_template_parm.  Jason mentioned that Mark plans to trim
> down use of these.  That should make it possible to shoot this function
> out of the profiles completely.
>
> Mark, have you made any progress on this?  If not, I can try to do
> something myself if you give me some guidelines.

Actually, for_each_template_parm shows up here:

  0.43    151.22     1.34    47911     0.00     0.00  sbitmap_vector_alloc
  0.39    152.45     1.23 18536823     0.00     0.00  for_each_template_parm_r
  0.39    153.67     1.22  1120852     0.00     0.00  sbitmap_union_of_diff_cg

Callgraph looks like

                              384854             cselib_lookup_mem <cycle 1> [407]
                              640103             gen_rtx_CONST_INT <cycle 1> [1087]
                             1855728             cgraph_node <cycle 1> [595]
                             3023470             cselib_lookup <cycle 1> [108]
                             69793938             htab_find_slot <cycle 1> [187]
                0.00    0.00       2/101229133     make_cpp_dir [2759]
                0.00    0.00     441/101229133     _cpp_find_file [2155]
                0.00    0.00    4518/101229133     cgraph_remove_node [1459]
[41]     3.9   10.59    1.56 75978109         htab_find_slot_with_hash <cycle 1> [41]
                0.65    0.00 64368978/64368978     eq_pointer [310]
                0.45    0.00 1517339/1517339     reg_attrs_htab_eq [389]
                0.12    0.00  462631/462631      list_hash_eq [752]
                0.11    0.00 1052895/1052895     mem_attrs_htab_eq [780]
                0.10    0.01  227704/661135      type_hash_eq [480]
                0.07    0.00 1698056/1698056     size_htab_eq [927]

and for htab_find_slot

                              940124             get_mem_attrs <cycle 1> [866]
                             1096723             get_reg_attrs <cycle 1> [657]
                             1684632             size_int_type_wide <cycle 1> [549]
                             65661439             walk_tree <cycle 1> [23]
                0.05    0.04   58629/101229133     remove_eh_handler [721]
                0.09    0.06  104145/101229133     add_ehl_entry [590]
                0.17    0.12  200965/101229133     maybe_remove_eh_handler [397]
[187]    0.5    0.88    0.57 69793938         htab_find_slot <cycle 1> [187]
                0.36    0.00 65692767/114308909     hash_pointer [318]
                0.06    0.00  940124/1008688     mem_attrs_htab_hash [985]
                0.05    0.00 1096723/1407744     reg_attrs_htab_hash [929]
                0.04    0.00    5783/5783        const_hash_1 [1123]
                0.02    0.01 1684632/1685231     size_htab_hash [1284]
                0.02    0.00  363739/366283      ehl_hash [1404]
                0.01    0.00    3599/3599        const_double_htab_hash [1755]
                0.00    0.00    6571/12650       typename_hash [2424]
                0.00    0.00   31328/764758      insns_for_mem_hash [2973]
                0.00    0.00    5783/5783        const_desc_hash [3332]
                             69793938             htab_find_slot_with_hash <cycle 1> [41]

and walk_tree

                             100948352             walk_tree <cycle 1> [23]
                                 732             break_out_target_exprs <cycle 1> [2322]
                              137222             cxx_unsave_expr_now <cycle 1> [798]
                              193876             copy_body <cycle 1> [1409]
                              197273             cgraph_create_edges <cycle 1> [1072]
                              238457             record_call_1 <cycle 1> [429]
                              377772             expand_calls_inline <cycle 1> [2988]
                              472630             remap_decl <cycle 1> [379]
                              536279             expand_call_inline <cycle 1> [132]
                              765817             walk_tree_without_duplicates <cycle 1> [646]
                             1337259             cp_walk_subtrees <cycle 1> [221]
                             11268854             for_each_template_parm <cycle 1> [333]
[23]     5.8    9.97    8.22 15526171+100948352 walk_tree <cycle 1> [23]
                0.31    6.75 15566125/15566125     cp_unsave_r [61]
                0.53    0.00 48523627/50649982     first_rtl_op [352]
                0.22    0.05 2107198/2107198     no_linkage_helper [524]
                0.03    0.10  951599/951599      inline_forbidden_p_1 [709]
                0.10    0.01 1625951/1625951     calls_setjmp_r [773]
                0.05    0.00 5275403/30960015     cp_is_overload_p [478]
                0.03    0.00  593303/593303      c_estimate_num_insns_1 [1203]
                0.03    0.00 3316016/3316016     cxx_callgraph_analyze_expr [1225]
                0.01    0.00  110823/110823      find_reachable_label_1 [1700]
                0.00    0.00     671/26098412     copy_tree_r [45]
                0.00    0.00    7357/7357        bot_replace [2470]
                0.00    0.00    2424/2424        local_variable_p_walkfn [2528]
                0.00    0.00    9990/9990        nullify_returns_r [3277]
                             65661439             htab_find_slot <cycle 1> [187]
                             51607162             cp_walk_subtrees <cycle 1> [221]
                             26625551             mark_local_for_remap_r <cycle 1> [195]
                             18536823             for_each_template_parm_r <cycle 1> [210]


--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/


* Re: Speed/profile of gcc3.4
  2004-01-20 14:59   ` Richard Guenther
@ 2004-01-20 15:29     ` Richard Guenther
  2004-01-20 16:59       ` Richard Guenther
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Guenther @ 2004-01-20 15:29 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, Mark Mitchell

On Tue, 20 Jan 2004, Richard Guenther wrote:

> On Tue, 20 Jan 2004, Jan Hubicka wrote:
>
> > > Hi!
> > >
> > > To add some data to the discussion of g++ speed, I built a profiling
> > > compiler and ran it over the tramp3d.cpp testcase
> > > (http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d.cpp.gz). The
> > > top of the (flat) profile is
> > >
> > >   %   cumulative   self              self     total
> > >  time   seconds   seconds    calls   s/call   s/call  name
> > >   7.85     24.64    24.64 71506294     0.00     0.00  ggc_alloc
> > >   3.37     35.23    10.59 75978109     0.00     0.00  htab_find_slot_with_hash
> > >   3.17     45.20     9.97 15526171     0.00     0.00  walk_tree
> >
> > Actually, I did some profiling of this too, and at least from Gerald's
> > testcase I concluded that the vast majority of the hashtable uses come from
> > for_each_template_parm.  Jason mentioned that Mark plans to trim
> > down use of these.  That should make it possible to shoot this function
> > out of the profiles completely.
> >
> > Mark, have you made any progress on this?  If not, I can try to do
> > something myself if you give me some guidelines.

Btw. - looking at walk_tree_without_duplicates and the associated
walk_tree in tree-inline.c - we're putting all visited trees in the
hashtab.  Can't we avoid putting trees in there that can't be shared?
Maybe at least optimize this in the for_each_template_parm case?
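
As a rough sketch of what I mean, using a toy node type and a toy pointer
set rather than GCC's tree and htab_t. The can_be_shared flag is an
assumption: in GCC it would have to be derived from the tree code, and the
shortcut is only valid if unshared nodes really are reachable by a single
path. The set has a fixed capacity and no overflow handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a tree node (hypothetical, not GCC's tree).  */
struct node
{
  int can_be_shared;            /* assumed flag, see above */
  struct node *kids[2];
};

#define SET_SIZE 64             /* power of two; toy fixed capacity */

/* Toy open-addressing pointer set; must hold fewer than SET_SIZE nodes.  */
struct ptr_set
{
  const struct node *slots[SET_SIZE];
  size_t lookups;               /* number of hash table probes requested */
};

/* Return 1 if P was already in SET, otherwise insert it and return 0.  */
static int
ptr_set_test_and_add (struct ptr_set *set, const struct node *p)
{
  size_t i = ((uintptr_t) p >> 3) & (SET_SIZE - 1);

  set->lookups++;
  while (set->slots[i] != NULL)
    {
      if (set->slots[i] == p)
        return 1;
      i = (i + 1) & (SET_SIZE - 1);
    }
  set->slots[i] = p;
  return 0;
}

/* Count the distinct nodes reachable from N.  Only nodes that can be
   shared go through the visited set; the rest skip the lookup.  */
static size_t
walk (const struct node *n, struct ptr_set *visited)
{
  size_t count = 1;
  size_t i;

  if (n == NULL)
    return 0;
  if (n->can_be_shared && ptr_set_test_and_add (visited, n))
    return 0;
  for (i = 0; i < 2; i++)
    count += walk (n->kids[i], visited);
  return count;
}
```

For a walk where most nodes cannot be shared, the number of hash lookups
drops from one per visited node to one per visit of a shareable node.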

Just a thought,

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/


* Re: Speed/profile of gcc3.4
  2004-01-20 15:29     ` Richard Guenther
@ 2004-01-20 16:59       ` Richard Guenther
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Guenther @ 2004-01-20 16:59 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, Mark Mitchell

[-- Attachment #1: Type: TEXT/PLAIN, Size: 599 bytes --]

On Tue, 20 Jan 2004, Richard Guenther wrote:

> Btw. - looking at walk_tree_without_duplicates and the associated
> walk_tree in tree-inline.c - we're putting all visited trees in the
> hashtab.  Can't we avoid putting trees in there that can't be shared?
> Maybe at least optimize this in the for_each_template_parm case?

Like at least cleaning the stuff up a little bit, as in the attached patch
(it just fell out while I was trying to work out what it's actually doing...).

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

[-- Attachment #2: Type: TEXT/PLAIN, Size: 12292 bytes --]

Index: cp/pt.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/pt.c,v
retrieving revision 1.816.2.2
diff -u -c -3 -p -r1.816.2.2 pt.c
*** cp/pt.c	19 Jan 2004 20:39:33 -0000	1.816.2.2
--- cp/pt.c	20 Jan 2004 16:58:15 -0000
*************** static tree convert_nontype_argument (tr
*** 114,120 ****
  static tree convert_template_argument (tree, tree, tree,
  				       tsubst_flags_t, int, tree);
  static tree get_bindings_overload (tree, tree, tree);
! static int for_each_template_parm (tree, tree_fn_t, void*, htab_t);
  static tree build_template_parm_index (int, int, int, tree, tree);
  static int inline_needs_template_parms (tree);
  static void push_inline_template_parms_recursive (tree, int);
--- 114,120 ----
  static tree convert_template_argument (tree, tree, tree,
  				       tsubst_flags_t, int, tree);
  static tree get_bindings_overload (tree, tree, tree);
! static int for_each_template_parm (tree, tree_fn_t, void*);
  static tree build_template_parm_index (int, int, int, tree, tree);
  static int inline_needs_template_parms (tree);
  static void push_inline_template_parms_recursive (tree, int);
*************** process_partial_specialization (tree dec
*** 2478,2485 ****
        tpd.current_arg = i;
        for_each_template_parm (TREE_VEC_ELT (inner_args, i),
  			      &mark_template_parm,
! 			      &tpd,
! 			      NULL);
      }
    for (i = 0; i < ntparms; ++i)
      if (tpd.parms[i] == 0)
--- 2478,2484 ----
        tpd.current_arg = i;
        for_each_template_parm (TREE_VEC_ELT (inner_args, i),
  			      &mark_template_parm,
! 			      &tpd);
      }
    for (i = 0; i < ntparms; ++i)
      if (tpd.parms[i] == 0)
*************** process_partial_specialization (tree dec
*** 2560,2567 ****
  	      memset (tpd2.parms, 0, sizeof (int) * nargs);
  	      for_each_template_parm (type,
  				      &mark_template_parm,
! 				      &tpd2,
! 				      NULL);
  		  
  	      if (tpd2.arg_uses_template_parms [i])
  		{
--- 2559,2565 ----
  	      memset (tpd2.parms, 0, sizeof (int) * nargs);
  	      for_each_template_parm (type,
  				      &mark_template_parm,
! 				      &tpd2);
  		  
  	      if (tpd2.arg_uses_template_parms [i])
  		{
*************** struct pair_fn_data 
*** 4502,4507 ****
--- 4500,4514 ----
    htab_t visited;
  };
  
+ static inline int
+ for_each_template_parm_1 (tree t, struct pair_fn_data *pfd)
+ {
+   return walk_tree (&t,
+                       for_each_template_parm_r,
+                       pfd,
+                       pfd->visited) != NULL_TREE;
+ }
+ 
  /* Called from for_each_template_parm via walk_tree.  */
  
  static tree
*************** for_each_template_parm_r (tree* tp, int*
*** 4513,4519 ****
    void *data = pfd->data;
  
    if (TYPE_P (t)
!       && for_each_template_parm (TYPE_CONTEXT (t), fn, data, pfd->visited))
      return error_mark_node;
  
    switch (TREE_CODE (t))
--- 4520,4526 ----
    void *data = pfd->data;
  
    if (TYPE_P (t)
!       && for_each_template_parm_1 (TYPE_CONTEXT (t), pfd))
      return error_mark_node;
  
    switch (TREE_CODE (t))
*************** for_each_template_parm_r (tree* tp, int*
*** 4527,4548 ****
      case ENUMERAL_TYPE:
        if (!TYPE_TEMPLATE_INFO (t))
  	*walk_subtrees = 0;
!       else if (for_each_template_parm (TREE_VALUE (TYPE_TEMPLATE_INFO (t)),
! 				       fn, data, pfd->visited))
  	return error_mark_node;
        break;
  
      case METHOD_TYPE:
        /* Since we're not going to walk subtrees, we have to do this
  	 explicitly here.  */
!       if (for_each_template_parm (TYPE_METHOD_BASETYPE (t), fn, data,
! 				  pfd->visited))
  	return error_mark_node;
        /* Fall through.  */
  
      case FUNCTION_TYPE:
        /* Check the return type.  */
!       if (for_each_template_parm (TREE_TYPE (t), fn, data, pfd->visited))
  	return error_mark_node;
  
        /* Check the parameter types.  Since default arguments are not
--- 4534,4554 ----
      case ENUMERAL_TYPE:
        if (!TYPE_TEMPLATE_INFO (t))
  	*walk_subtrees = 0;
!       else if (for_each_template_parm_1 (TREE_VALUE (TYPE_TEMPLATE_INFO (t)),
! 				       pfd))
  	return error_mark_node;
        break;
  
      case METHOD_TYPE:
        /* Since we're not going to walk subtrees, we have to do this
  	 explicitly here.  */
!       if (for_each_template_parm_1 (TYPE_METHOD_BASETYPE (t), pfd))
  	return error_mark_node;
        /* Fall through.  */
  
      case FUNCTION_TYPE:
        /* Check the return type.  */
!       if (for_each_template_parm_1 (TREE_TYPE (t), pfd))
  	return error_mark_node;
  
        /* Check the parameter types.  Since default arguments are not
*************** for_each_template_parm_r (tree* tp, int*
*** 4555,4562 ****
  	tree parm;
  
  	for (parm = TYPE_ARG_TYPES (t); parm; parm = TREE_CHAIN (parm))
! 	  if (for_each_template_parm (TREE_VALUE (parm), fn, data,
! 				      pfd->visited))
  	    return error_mark_node;
  
  	/* Since we've already handled the TYPE_ARG_TYPES, we don't
--- 4561,4567 ----
  	tree parm;
  
  	for (parm = TYPE_ARG_TYPES (t); parm; parm = TREE_CHAIN (parm))
! 	  if (for_each_template_parm_1 (TREE_VALUE (parm), pfd))
  	    return error_mark_node;
  
  	/* Since we've already handled the TYPE_ARG_TYPES, we don't
*************** for_each_template_parm_r (tree* tp, int*
*** 4566,4599 ****
        break;
  
      case TYPEOF_TYPE:
!       if (for_each_template_parm (TYPE_FIELDS (t), fn, data, 
! 				  pfd->visited))
  	return error_mark_node;
        break;
  
      case FUNCTION_DECL:
      case VAR_DECL:
        if (DECL_LANG_SPECIFIC (t) && DECL_TEMPLATE_INFO (t)
! 	  && for_each_template_parm (DECL_TI_ARGS (t), fn, data,
! 				     pfd->visited))
  	return error_mark_node;
        /* Fall through.  */
  
      case PARM_DECL:
      case CONST_DECL:
        if (TREE_CODE (t) == CONST_DECL && DECL_TEMPLATE_PARM_P (t)
! 	  && for_each_template_parm (DECL_INITIAL (t), fn, data,
! 				     pfd->visited))
  	return error_mark_node;
        if (DECL_CONTEXT (t) 
! 	  && for_each_template_parm (DECL_CONTEXT (t), fn, data,
! 				     pfd->visited))
  	return error_mark_node;
        break;
  
      case BOUND_TEMPLATE_TEMPLATE_PARM:
        /* Record template parameters such as `T' inside `TT<T>'.  */
!       if (for_each_template_parm (TYPE_TI_ARGS (t), fn, data, pfd->visited))
  	return error_mark_node;
        /* Fall through.  */
  
--- 4571,4600 ----
        break;
  
      case TYPEOF_TYPE:
!       if (for_each_template_parm_1 (TYPE_FIELDS (t), pfd))
  	return error_mark_node;
        break;
  
      case FUNCTION_DECL:
      case VAR_DECL:
        if (DECL_LANG_SPECIFIC (t) && DECL_TEMPLATE_INFO (t)
! 	  && for_each_template_parm_1 (DECL_TI_ARGS (t), pfd))
  	return error_mark_node;
        /* Fall through.  */
  
      case PARM_DECL:
      case CONST_DECL:
        if (TREE_CODE (t) == CONST_DECL && DECL_TEMPLATE_PARM_P (t)
! 	  && for_each_template_parm_1 (DECL_INITIAL (t), pfd))
  	return error_mark_node;
        if (DECL_CONTEXT (t) 
! 	  && for_each_template_parm_1 (DECL_CONTEXT (t), pfd))
  	return error_mark_node;
        break;
  
      case BOUND_TEMPLATE_TEMPLATE_PARM:
        /* Record template parameters such as `T' inside `TT<T>'.  */
!       if (for_each_template_parm_1 (TYPE_TI_ARGS (t), pfd))
  	return error_mark_node;
        /* Fall through.  */
  
*************** for_each_template_parm_r (tree* tp, int*
*** 4609,4615 ****
      case TEMPLATE_DECL:
        /* A template template parameter is encountered.  */
        if (DECL_TEMPLATE_TEMPLATE_PARM_P (t)
! 	  && for_each_template_parm (TREE_TYPE (t), fn, data, pfd->visited))
  	return error_mark_node;
  
        /* Already substituted template template parameter */
--- 4610,4616 ----
      case TEMPLATE_DECL:
        /* A template template parameter is encountered.  */
        if (DECL_TEMPLATE_TEMPLATE_PARM_P (t)
! 	  && for_each_template_parm_1 (TREE_TYPE (t), pfd))
  	return error_mark_node;
  
        /* Already substituted template template parameter */
*************** for_each_template_parm_r (tree* tp, int*
*** 4618,4633 ****
  
      case TYPENAME_TYPE:
        if (!fn 
! 	  || for_each_template_parm (TYPENAME_TYPE_FULLNAME (t), fn,
! 				     data, pfd->visited))
  	return error_mark_node;
        break;
  
      case CONSTRUCTOR:
        if (TREE_TYPE (t) && TYPE_PTRMEMFUNC_P (TREE_TYPE (t))
! 	  && for_each_template_parm (TYPE_PTRMEMFUNC_FN_TYPE
! 				     (TREE_TYPE (t)), fn, data,
! 				     pfd->visited))
  	return error_mark_node;
        break;
        
--- 4619,4632 ----
  
      case TYPENAME_TYPE:
        if (!fn 
! 	  || for_each_template_parm_1 (TYPENAME_TYPE_FULLNAME (t), pfd))
  	return error_mark_node;
        break;
  
      case CONSTRUCTOR:
        if (TREE_TYPE (t) && TYPE_PTRMEMFUNC_P (TREE_TYPE (t))
! 	  && for_each_template_parm_1 (TYPE_PTRMEMFUNC_FN_TYPE
! 				     (TREE_TYPE (t)), pfd))
  	return error_mark_node;
        break;
        
*************** for_each_template_parm_r (tree* tp, int*
*** 4658,4665 ****
  	 the BINFO hierarchy, which is circular, and therefore
  	 confuses walk_tree.  */
        *walk_subtrees = 0;
!       if (for_each_template_parm (BASELINK_FUNCTIONS (*tp), fn, data,
! 				  pfd->visited))
  	return error_mark_node;
        break;
  
--- 4657,4663 ----
  	 the BINFO hierarchy, which is circular, and therefore
  	 confuses walk_tree.  */
        *walk_subtrees = 0;
!       if (for_each_template_parm_1 (BASELINK_FUNCTIONS (*tp), pfd))
  	return error_mark_node;
        break;
  
*************** for_each_template_parm_r (tree* tp, int*
*** 4681,4687 ****
     considered to be the function which always returns 1.  */
  
  static int
! for_each_template_parm (tree t, tree_fn_t fn, void* data, htab_t visited)
  {
    struct pair_fn_data pfd;
    int result;
--- 4679,4685 ----
     considered to be the function which always returns 1.  */
  
  static int
! for_each_template_parm (tree t, tree_fn_t fn, void* data)
  {
    struct pair_fn_data pfd;
    int result;
*************** for_each_template_parm (tree t, tree_fn_
*** 4695,4713 ****
       for_each_template_parm, so we would need to reorganize a fair
       bit to use walk_tree_without_duplicates, so we keep our own
       visited list.)  */
!   if (visited)
!     pfd.visited = visited;
!   else
!     pfd.visited = htab_create (37, htab_hash_pointer, htab_eq_pointer, 
! 			       NULL);
!   result = walk_tree (&t, 
! 		      for_each_template_parm_r, 
! 		      &pfd,
! 		      pfd.visited) != NULL_TREE;
  
    /* Clean up.  */
!   if (!visited)
!     htab_delete (pfd.visited);
  
    return result;
  }
--- 4693,4704 ----
       for_each_template_parm, so we would need to reorganize a fair
       bit to use walk_tree_without_duplicates, so we keep our own
       visited list.)  */
!   pfd.visited = htab_create (37, htab_hash_pointer, htab_eq_pointer, 
! 			     NULL);
!   result = for_each_template_parm_1 (t, &pfd);
  
    /* Clean up.  */
!   htab_delete (pfd.visited);
  
    return result;
  }
*************** for_each_template_parm (tree t, tree_fn_
*** 4717,4723 ****
  int
  uses_template_parms (tree t)
  {
!   return for_each_template_parm (t, 0, 0, NULL);
  }
  
  /* Returns true if T depends on any template parameter with level LEVEL.  */
--- 4708,4714 ----
  int
  uses_template_parms (tree t)
  {
!   return for_each_template_parm (t, 0, 0);
  }
  
  /* Returns true if T depends on any template parameter with level LEVEL.  */
*************** uses_template_parms (tree t)
*** 4725,4731 ****
  int
  uses_template_parms_level (tree t, int level)
  {
!   return for_each_template_parm (t, template_parm_this_level_p, &level, NULL);
  }
  
  static int tinst_depth;
--- 4716,4722 ----
  int
  uses_template_parms_level (tree t, int level)
  {
!   return for_each_template_parm (t, template_parm_this_level_p, &level);
  }
  
  static int tinst_depth;


* Re: Speed/profile of gcc3.4
  2004-01-20 13:35 ` Jan Hubicka
  2004-01-20 14:59   ` Richard Guenther
@ 2004-01-20 17:51   ` Giovanni Bajo
  1 sibling, 0 replies; 10+ messages in thread
From: Giovanni Bajo @ 2004-01-20 17:51 UTC (permalink / raw)
  To: Jan Hubicka, Richard Guenther; +Cc: gcc

Jan Hubicka wrote:

[for_each_template_parm]

> Jason mentioned that Mark plans to trim
> down use of these.  That should make it possible to shoot this function
> out of the profiles completely.
>
> Mark, have you made any progress on this?  If not, I can try to do
> something myself if you give me some guidelines.

I'm willing to help with this, but I need some guidelines as well.

Giovanni Bajo



* Re: Speed/profile of gcc3.4
  2004-01-20 12:42 Speed/profile of gcc3.4 Richard Guenther
                   ` (2 preceding siblings ...)
  2004-01-20 13:35 ` Jan Hubicka
@ 2004-01-20 22:36 ` Mike Stump
  3 siblings, 0 replies; 10+ messages in thread
From: Mike Stump @ 2004-01-20 22:36 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

On Tuesday, January 20, 2004, at 04:42 AM, Richard Guenther wrote:
> Also push_depth/push_by_depth could make use of __builtin_expect() and 
> put the realloc out of line.

I don't know that that would improve things much, but try it out.  Maybe on
a CPU with bad prediction it might help; I wasn't on such a machine, as
near as I could tell.
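
For reference, the shape of the suggested change would be roughly the
following. This is a sketch with a hypothetical depth_stack type and
function names, not the actual push_depth/push_by_depth code in
ggc-page.c:

```c
#include <stdlib.h>

#define UNLIKELY(x) __builtin_expect (!!(x), 0)

/* Hypothetical growable stack of depths, for illustration only.  */
struct depth_stack
{
  unsigned *data;
  size_t used;
  size_t capacity;
};

/* Slow path: grow the array.  Kept out of line so that the common
   case in depth_stack_push stays small.  */
static void __attribute__ ((noinline))
depth_stack_grow (struct depth_stack *s)
{
  size_t new_capacity = s->capacity ? s->capacity * 2 : 8;
  unsigned *p = realloc (s->data, new_capacity * sizeof *p);

  if (p == NULL)
    abort ();
  s->data = p;
  s->capacity = new_capacity;
}

/* Fast path: the capacity check is marked as rarely true.  */
static inline void
depth_stack_push (struct depth_stack *s, unsigned depth)
{
  if (UNLIKELY (s->used >= s->capacity))
    depth_stack_grow (s);
  s->data[s->used++] = depth;
}
```

The hint only affects code layout and static prediction, so on a CPU with
a good dynamic predictor the gain may well be too small to measure.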

>   Was the use of prefetch in ggc_pop_context benchmarked?

Yes.  Many runs to determine the right value, and very, very fine-grained
timing to be able to notice very, very slight effects.  The target
hardware was a ppc G4.  Of course, other CPUs may like slightly different
values, but the expected improvements you could obtain should be very
slight.
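
The kind of value being tuned can be sketched like this. It is a generic
illustration, not the ggc_pop_context loop; struct item, clear_flags and
the PREFETCH_DISTANCE of 8 are placeholders, and the distance in
particular would have to be chosen by measurement on the target CPU:

```c
#include <stddef.h>

#ifndef PREFETCH_DISTANCE
#define PREFETCH_DISTANCE 8     /* placeholder; tune by benchmarking */
#endif

/* Hypothetical object with a flag byte that the loop clears.  */
struct item
{
  unsigned char flags;
};

/* Clear the flags of N items reached through pointers, prefetching the
   object PREFETCH_DISTANCE iterations ahead.  Returns how many items
   had a nonzero flag.  */
static size_t
clear_flags (struct item **items, size_t n)
{
  size_t i, cleared = 0;

  for (i = 0; i < n; i++)
    {
      if (i + PREFETCH_DISTANCE < n)
        /* Second argument 1: prefetch for write.  Third argument 0:
           low temporal locality, each object is touched only once.  */
        __builtin_prefetch (items[i + PREFETCH_DISTANCE], 1, 0);

      if (items[i]->flags)
        {
          items[i]->flags = 0;
          cleared++;
        }
    }
  return cleared;
}
```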

