* Speed/profile of gcc3.4
@ 2004-01-20 12:42 Richard Guenther
2004-01-20 12:49 ` Zack Winkles
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Richard Guenther @ 2004-01-20 12:42 UTC (permalink / raw)
To: gcc
Hi!
To add some data to the discussion about g++ speed, I built a profiling
compiler and ran it on the tramp3d.cpp testcase
(http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d.cpp.gz). The
top entries of the (flat) profile are:
  %   cumulative     self                self    total
 time    seconds  seconds      calls   s/call   s/call  name
 7.85      24.64    24.64   71506294     0.00     0.00  ggc_alloc
 3.37      35.23    10.59   75978109     0.00     0.00  htab_find_slot_with_hash
 3.17      45.20     9.97   15526171     0.00     0.00  walk_tree
 3.06      54.80     9.60    3895596     0.00     0.00  gt_ggc_mx_lang_tree_node
 2.10      61.40     6.60      43652     0.00     0.00  fixup_var_refs_insns
 2.07      67.91     6.51  116225166     0.00     0.00  ggc_set_mark
 2.05      74.35     6.44      17854     0.00     0.00  init_alias_analysis
 1.51      79.10     4.75     221721     0.00     0.00  htab_expand
 1.47      83.71     4.61       1044     0.00     0.02  store_motion
 1.17      87.37     3.66       8618     0.00     0.00  loop_regs_scan
 1.14      90.96     3.59   13512961     0.00     0.00  fixup_var_refs_1
 1.03      94.19     3.23     238610     0.00     0.00  compute_transp
 1.03      97.41     3.22    2216398     0.00     0.00  emit_insn
 0.94     100.37     2.96   27841267     0.00     0.00  note_stores
 0.94     103.31     2.94   20724190     0.00     0.00  splay_tree_splay_helper
 0.89     106.09     2.78   11630295     0.00     0.00  for_each_rtx
 0.88     108.85     2.76    2243042     0.00     0.00  cse_insn
 0.88     111.60     2.75    7037958     0.00     0.00  reg_scan_mark_refs
 0.79     114.08     2.48   16796134     0.00     0.00  find_loads
 0.71     116.31     2.23  129249560     0.00     0.00  bitmap_set_bit
 0.68     118.45     2.14    6755763     0.00     0.00  count_reg_usage
 0.67     120.55     2.10   58964152     0.00     0.00  find_reg_note
 0.61     122.47     1.92    3829519     0.00     0.00  constrain_operands
 0.60     124.35     1.88   13513006     0.00     0.00  fixup_var_refs_insn
 0.59     126.19     1.84   22372827     0.00     0.00  mark_set_1
Ugh.
The htab_find_slot_with_hash stuff should maybe be split up, because
it seems heavily overloaded. Also, do we use only power-of-two hashtab
sizes? In that case we could save the costly division/modulo calculations.
ggc_alloc - err - which collector do we use by default? The page or the zone collector?
How do I select a different collector?
For the page collector, inside ggc_alloc we should use __builtin_expect()
for the entry == NULL || entry->num_free_objects == 0 check. A wrapper
around ggc_alloc() using __builtin_constant_p() could also speed up the
order calculation. Likewise, push_depth/push_by_depth could make use of
__builtin_expect() and move the realloc out of line. Was the use of
prefetch in ggc_pop_context benchmarked?
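To make the two builtin suggestions concrete, here is a minimal sketch. It
is not the ggc-page.c code: struct page_entry, refill_page, take_object,
SIZE_TO_ORDER and size_to_order_slow are hypothetical names invented for
the illustration, and "order" is assumed to mean the smallest power-of-two
exponent that covers the requested size.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hint: tell GCC the condition is almost always false. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Generic path: smallest order with (1 << order) >= size, found by a loop. */
static unsigned size_to_order_slow(size_t size)
{
    unsigned order = 0;

    assert(size > 0);
    while (((size_t)1 << order) < size)
        order++;
    return order;
}

/* For small compile-time constant sizes the ternary chain can fold to a
   constant; everything else falls back to the loop.  Hypothetical macro. */
#define SIZE_TO_ORDER(size)                                   \
    (__builtin_constant_p(size) && (size) > 0 && (size) <= 64 \
         ? ((size) <= 1 ? 0u : (size) <= 2 ? 1u               \
            : (size) <= 4 ? 2u : (size) <= 8 ? 3u             \
            : (size) <= 16 ? 4u : (size) <= 32 ? 5u : 6u)     \
         : size_to_order_slow(size))

/* Stand-in for a page of equally sized objects. */
struct page_entry {
    size_t num_free_objects;
};

static unsigned long slow_path_hits;

/* Cold path, kept out of line so the hot path stays small. */
static __attribute__((noinline)) struct page_entry *
refill_page(struct page_entry *spare, size_t objects_per_page)
{
    assert(objects_per_page > 0);
    slow_path_hits++;
    spare->num_free_objects = objects_per_page;
    return spare;
}

/* Hot path: take one object, refilling only when there is no page yet
   or the current page has no free objects left. */
static inline struct page_entry *
take_object(struct page_entry *entry, struct page_entry *spare,
            size_t objects_per_page)
{
    if (UNLIKELY(entry == NULL || entry->num_free_objects == 0))
        entry = refill_page(spare, objects_per_page);
    entry->num_free_objects--;
    return entry;
}
```

Both paths of SIZE_TO_ORDER return the same value, so the builtin only
changes how much work is left for run time. Whether either hint pays off
in the real allocator would have to be measured.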
Richard.
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
* Re: Speed/profile of gcc3.4
2004-01-20 12:42 Speed/profile of gcc3.4 Richard Guenther
@ 2004-01-20 12:49 ` Zack Winkles
2004-01-20 13:01 ` Richard Guenther
2004-01-20 13:09 ` Jakub Jelinek
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Zack Winkles @ 2004-01-20 12:49 UTC (permalink / raw)
To: gcc
Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> said:
> ggc_alloc - err - which collector do we use by default? The page or
> the zone collector? How do I select a different collector?
AFAIK, page is the default. Select a different one by passing --with-gc=zone
to ./configure.
* Re: Speed/profile of gcc3.4
2004-01-20 12:49 ` Zack Winkles
@ 2004-01-20 13:01 ` Richard Guenther
0 siblings, 0 replies; 10+ messages in thread
From: Richard Guenther @ 2004-01-20 13:01 UTC (permalink / raw)
To: Zack Winkles; +Cc: gcc
On Tue, 20 Jan 2004, Zack Winkles wrote:
> Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> said:
> > ggc_alloc - err - which collector do we use by default? The page or
> > the zone collector? How do I select a different collector?
>
> AFAIK, page is the default. Select a different one by passing --with-gc=zone
> to ./configure.
Thanks. --with-gc is not documented in install.texi, and neither are the
choices page and zone. Only simple is mentioned, in a note for DG Unix 4.0,
but hasn't support for simple been removed?
PR/13770.
Richard.
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
* Re: Speed/profile of gcc3.4
2004-01-20 12:42 Speed/profile of gcc3.4 Richard Guenther
2004-01-20 12:49 ` Zack Winkles
@ 2004-01-20 13:09 ` Jakub Jelinek
2004-01-20 13:35 ` Jan Hubicka
2004-01-20 22:36 ` Mike Stump
3 siblings, 0 replies; 10+ messages in thread
From: Jakub Jelinek @ 2004-01-20 13:09 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
On Tue, Jan 20, 2004 at 01:42:35PM +0100, Richard Guenther wrote:
> The htab_find_slot_with_hash stuff should maybe be split up, because
> it seems heavily overloaded. Also, do we use only power-of-two hashtab
> sizes? In that case we could save the costly division/modulo calculations.
No, there are not enough primes that are powers of two... ;)
All hash tables using libiberty/hashtab.c use prime hashtab sizes;
ht_* in gcc/hashtable.c uses power-of-two sizes.
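The difference between the two sizing schemes comes down to how a hash
value is reduced to a slot index. The sketch below is only an illustration;
slot_mod, slot_mask and is_power_of_two are hypothetical helpers, not
functions from libiberty or gcc/hashtable.c.

```c
#include <assert.h>
#include <stddef.h>

/* True when n has exactly one bit set. */
static int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Works for any table size (for example a prime), but needs a division. */
static size_t slot_mod(unsigned hash, size_t size)
{
    assert(size != 0);
    return hash % size;
}

/* Only valid for power-of-two sizes; a single AND replaces the division. */
static size_t slot_mask(unsigned hash, size_t size)
{
    assert(is_power_of_two(size));
    return hash & (size - 1);
}
```

For a power-of-two size both functions give the same index. The mask keeps
only the low bits of the hash, so it relies on the hash function mixing
those bits well, whereas a prime modulus is more forgiving of weak hashes.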
Jakub
* Re: Speed/profile of gcc3.4
2004-01-20 12:42 Speed/profile of gcc3.4 Richard Guenther
2004-01-20 12:49 ` Zack Winkles
2004-01-20 13:09 ` Jakub Jelinek
@ 2004-01-20 13:35 ` Jan Hubicka
2004-01-20 14:59 ` Richard Guenther
2004-01-20 17:51 ` Giovanni Bajo
2004-01-20 22:36 ` Mike Stump
3 siblings, 2 replies; 10+ messages in thread
From: Jan Hubicka @ 2004-01-20 13:35 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
> Hi!
>
> To give some more data to the speed of g++ discussion I built a profiling
> compiler and ran it over the tramp3d.cpp testcase
> (http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d.cpp.gz). Top
> on the (flat) profile are
>
> % cumulative self self total
> time seconds seconds calls s/call s/call name
> 7.85 24.64 24.64 71506294 0.00 0.00 ggc_alloc
> 3.37 35.23 10.59 75978109 0.00 0.00 htab_find_slot_with_hash
> 3.17 45.20 9.97 15526171 0.00 0.00 walk_tree
> 3.06 54.80 9.60 3895596 0.00 0.00 gt_ggc_mx_lang_tree_node
> 2.10 61.40 6.60 43652 0.00 0.00 fixup_var_refs_insns
> 2.07 67.91 6.51 116225166 0.00 0.00 ggc_set_mark
> 2.05 74.35 6.44 17854 0.00 0.00 init_alias_analysis
> 1.51 79.10 4.75 221721 0.00 0.00 htab_expand
> 1.47 83.71 4.61 1044 0.00 0.02 store_motion
> 1.17 87.37 3.66 8618 0.00 0.00 loop_regs_scan
> 1.14 90.96 3.59 13512961 0.00 0.00 fixup_var_refs_1
> 1.03 94.19 3.23 238610 0.00 0.00 compute_transp
> 1.03 97.41 3.22 2216398 0.00 0.00 emit_insn
> 0.94 100.37 2.96 27841267 0.00 0.00 note_stores
> 0.94 103.31 2.94 20724190 0.00 0.00 splay_tree_splay_helper
> 0.89 106.09 2.78 11630295 0.00 0.00 for_each_rtx
> 0.88 108.85 2.76 2243042 0.00 0.00 cse_insn
> 0.88 111.60 2.75 7037958 0.00 0.00 reg_scan_mark_refs
> 0.79 114.08 2.48 16796134 0.00 0.00 find_loads
> 0.71 116.31 2.23 129249560 0.00 0.00 bitmap_set_bit
> 0.68 118.45 2.14 6755763 0.00 0.00 count_reg_usage
> 0.67 120.55 2.10 58964152 0.00 0.00 find_reg_note
> 0.61 122.47 1.92 3829519 0.00 0.00 constrain_operands
> 0.60 124.35 1.88 13513006 0.00 0.00 fixup_var_refs_insn
> 0.59 126.19 1.84 22372827 0.00 0.00 mark_set_1
>
> Ugh.
>
> The htab_find_slot_with_hash stuff should maybe be split up, because
> it seems heavily overloaded. Also, do we use only power-of-two hashtab
> sizes? In that case we could save the costly division/modulo calculations.
Actually I did some profiling of this too, and at least from Gerald's
testcase I concluded that the vast majority of the hashtable uses come
from for_each_template_parm. Jason mentioned that Mark plans to trim
down the use of these. That should make it possible to knock this function
out of the profiles completely.
Mark, have you made any progress on this? If not, I can try to do
something myself if you give me some guidelines.
>
> ggc_alloc - err - which collector do we use by default? The page or the zone collector?
> How do I select a different collector?
>
> For the page collector, inside ggc_alloc we should use __builtin_expect()
> for the entry==NULL || entry->num_free_objects == 0, also using a wrapper
> around ggc_alloc() with a __builtin_constant_p() could be used to speed up
> the order calculation. Also push_depth/push_by_depth could make use of
> __builtin_expect() and put the realloc out of line. Was the use of
> prefetch in ggc_pop_context benchmarked?
The __builtin_expect tricks can be made obsolete by pushing for
profiledbootstrap to be used for production builds of the compiler. Perhaps
we can do some work to make it cheaper by producing a training-run testcase
that is smaller than a full libjava/libstdc++ build?
Honza
>
> Richard.
>
> --
> Richard Guenther <richard dot guenther at uni-tuebingen dot de>
> WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
* Re: Speed/profile of gcc3.4
2004-01-20 13:35 ` Jan Hubicka
@ 2004-01-20 14:59 ` Richard Guenther
2004-01-20 15:29 ` Richard Guenther
2004-01-20 17:51 ` Giovanni Bajo
1 sibling, 1 reply; 10+ messages in thread
From: Richard Guenther @ 2004-01-20 14:59 UTC (permalink / raw)
To: Jan Hubicka; +Cc: gcc
On Tue, 20 Jan 2004, Jan Hubicka wrote:
> > Hi!
> >
> > To give some more data to the speed of g++ discussion I built a profiling
> > compiler and ran it over the tramp3d.cpp testcase
> > (http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d.cpp.gz). Top
> > on the (flat) profile are
> >
> > % cumulative self self total
> > time seconds seconds calls s/call s/call name
> > 7.85 24.64 24.64 71506294 0.00 0.00 ggc_alloc
> > 3.37 35.23 10.59 75978109 0.00 0.00 htab_find_slot_with_hash
> > 3.17 45.20 9.97 15526171 0.00 0.00 walk_tree
>
> Actually I did some profiling of this too, and at least from Gerald's
> testcase I concluded that the vast majority of the hashtable uses come
> from for_each_template_parm. Jason mentioned that Mark plans to trim
> down the use of these. That should make it possible to knock this function
> out of the profiles completely.
>
> Mark, have you made any progress on this? If not, I can try to do
> something myself if you give me some guidelines.
Actually, for_each_template_parm shows up here:
0.43 151.22 1.34 47911 0.00 0.00 sbitmap_vector_alloc
0.39 152.45 1.23 18536823 0.00 0.00 for_each_template_parm_r
0.39 153.67 1.22 1120852 0.00 0.00 sbitmap_union_of_diff_cg
The callgraph for htab_find_slot_with_hash looks like
384854 cselib_lookup_mem <cycle 1> [407]
640103 gen_rtx_CONST_INT <cycle 1> [1087]
1855728 cgraph_node <cycle 1> [595]
3023470 cselib_lookup <cycle 1> [108]
69793938 htab_find_slot <cycle 1> [187]
0.00 0.00 2/101229133 make_cpp_dir [2759]
0.00 0.00 441/101229133 _cpp_find_file [2155]
0.00 0.00 4518/101229133 cgraph_remove_node [1459]
[41] 3.9 10.59 1.56 75978109 htab_find_slot_with_hash <cycle 1> [41]
0.65 0.00 64368978/64368978 eq_pointer [310]
0.45 0.00 1517339/1517339 reg_attrs_htab_eq [389]
0.12 0.00 462631/462631 list_hash_eq [752]
0.11 0.00 1052895/1052895 mem_attrs_htab_eq [780]
0.10 0.01 227704/661135 type_hash_eq [480]
0.07 0.00 1698056/1698056 size_htab_eq [927]
and for htab_find_slot
940124 get_mem_attrs <cycle 1> [866]
1096723 get_reg_attrs <cycle 1> [657]
1684632 size_int_type_wide <cycle 1> [549]
65661439 walk_tree <cycle 1> [23]
0.05 0.04 58629/101229133 remove_eh_handler [721]
0.09 0.06 104145/101229133 add_ehl_entry [590]
0.17 0.12 200965/101229133 maybe_remove_eh_handler [397]
[187] 0.5 0.88 0.57 69793938 htab_find_slot <cycle 1> [187]
0.36 0.00 65692767/114308909 hash_pointer [318]
0.06 0.00 940124/1008688 mem_attrs_htab_hash [985]
0.05 0.00 1096723/1407744 reg_attrs_htab_hash [929]
0.04 0.00 5783/5783 const_hash_1 [1123]
0.02 0.01 1684632/1685231 size_htab_hash [1284]
0.02 0.00 363739/366283 ehl_hash [1404]
0.01 0.00 3599/3599 const_double_htab_hash [1755]
0.00 0.00 6571/12650 typename_hash [2424]
0.00 0.00 31328/764758 insns_for_mem_hash [2973]
0.00 0.00 5783/5783 const_desc_hash [3332]
69793938 htab_find_slot_with_hash <cycle 1> [41]
and walk_tree
100948352 walk_tree <cycle 1> [23]
732 break_out_target_exprs <cycle 1> [2322]
137222 cxx_unsave_expr_now <cycle 1> [798]
193876 copy_body <cycle 1> [1409]
197273 cgraph_create_edges <cycle 1> [1072]
238457 record_call_1 <cycle 1> [429]
377772 expand_calls_inline <cycle 1> [2988]
472630 remap_decl <cycle 1> [379]
536279 expand_call_inline <cycle 1> [132]
765817 walk_tree_without_duplicates <cycle 1> [646]
1337259 cp_walk_subtrees <cycle 1> [221]
11268854 for_each_template_parm <cycle 1> [333]
[23] 5.8 9.97 8.22 15526171+100948352 walk_tree <cycle 1> [23]
0.31 6.75 15566125/15566125 cp_unsave_r [61]
0.53 0.00 48523627/50649982 first_rtl_op [352]
0.22 0.05 2107198/2107198 no_linkage_helper [524]
0.03 0.10 951599/951599 inline_forbidden_p_1 [709]
0.10 0.01 1625951/1625951 calls_setjmp_r [773]
0.05 0.00 5275403/30960015 cp_is_overload_p [478]
0.03 0.00 593303/593303 c_estimate_num_insns_1 [1203]
0.03 0.00 3316016/3316016 cxx_callgraph_analyze_expr [1225]
0.01 0.00 110823/110823 find_reachable_label_1 [1700]
0.00 0.00 671/26098412 copy_tree_r [45]
0.00 0.00 7357/7357 bot_replace [2470]
0.00 0.00 2424/2424 local_variable_p_walkfn [2528]
0.00 0.00 9990/9990 nullify_returns_r [3277]
65661439 htab_find_slot <cycle 1> [187]
51607162 cp_walk_subtrees <cycle 1> [221]
26625551 mark_local_for_remap_r <cycle 1> [195]
18536823 for_each_template_parm_r <cycle 1> [210]
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
* Re: Speed/profile of gcc3.4
2004-01-20 14:59 ` Richard Guenther
@ 2004-01-20 15:29 ` Richard Guenther
2004-01-20 16:59 ` Richard Guenther
0 siblings, 1 reply; 10+ messages in thread
From: Richard Guenther @ 2004-01-20 15:29 UTC (permalink / raw)
To: Jan Hubicka; +Cc: gcc, Mark Mitchell
On Tue, 20 Jan 2004, Richard Guenther wrote:
> On Tue, 20 Jan 2004, Jan Hubicka wrote:
>
> > > Hi!
> > >
> > > To give some more data to the speed of g++ discussion I built a profiling
> > > compiler and ran it over the tramp3d.cpp testcase
> > > (http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d.cpp.gz). Top
> > > on the (flat) profile are
> > >
> > > % cumulative self self total
> > > time seconds seconds calls s/call s/call name
> > > 7.85 24.64 24.64 71506294 0.00 0.00 ggc_alloc
> > > 3.37 35.23 10.59 75978109 0.00 0.00 htab_find_slot_with_hash
> > > 3.17 45.20 9.97 15526171 0.00 0.00 walk_tree
> >
> > Actually I did some profiling of this too, and at least from Gerald's
> > testcase I concluded that the vast majority of the hashtable uses come
> > from for_each_template_parm. Jason mentioned that Mark plans to trim
> > down the use of these. That should make it possible to knock this function
> > out of the profiles completely.
> >
> > Mark, have you made any progress on this? If not, I can try to do
> > something myself if you give me some guidelines.
Btw. - looking at walk_tree_without_duplicates and the associated
walk_tree in tree-inline.c - we're putting all visited trees in the
hashtab. Can't we avoid putting trees in there that can't be shared?
Maybe at least optimize this in the for_each_template_parm case?
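A rough sketch of the idea, on a toy data structure rather than GCC trees:
struct node, its shareable flag and the small array-based visited set are
all hypothetical stand-ins. How to decide cheaply that a real tree node
cannot be shared is exactly the open question and is not answered here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VISITED 64

/* Tiny stand-in for the visited hashtab: a linear pointer set. */
struct visited_set {
    const void *items[MAX_VISITED];
    size_t count;
};

/* Returns 1 if p was already recorded, otherwise records it and returns 0. */
static int visited_test_and_set(struct visited_set *s, const void *p)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->items[i] == p)
            return 1;
    assert(s->count < MAX_VISITED);
    s->items[s->count++] = p;
    return 0;
}

struct node {
    int shareable;          /* hypothetical: reachable via several parents? */
    struct node *kids[2];
};

/* Walks the graph and returns the number of nodes processed.  Only nodes
   that may be shared are looked up in (and added to) the visited set. */
static size_t walk(struct node *n, struct visited_set *seen)
{
    if (n == NULL)
        return 0;
    if (n->shareable && visited_test_and_set(seen, n))
        return 0;
    return 1 + walk(n->kids[0], seen) + walk(n->kids[1], seen);
}
```

With one shared leaf below two unshared parents, the walk processes every
node once but performs set operations only for the shared leaf.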
Just a thought,
Richard.
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
* Re: Speed/profile of gcc3.4
2004-01-20 15:29 ` Richard Guenther
@ 2004-01-20 16:59 ` Richard Guenther
0 siblings, 0 replies; 10+ messages in thread
From: Richard Guenther @ 2004-01-20 16:59 UTC (permalink / raw)
To: Jan Hubicka; +Cc: gcc, Mark Mitchell
[-- Attachment #1: Type: TEXT/PLAIN, Size: 599 bytes --]
On Tue, 20 Jan 2004, Richard Guenther wrote:
> Btw. - looking at walk_tree_without_duplicates and the associated
> walk_tree in tree-inline.c - we're putting all visited trees in the
> hashtab. Can't we avoid putting trees in there that can't be shared?
> Maybe at least optimize this in the for_each_template_parm case?
Something like at least cleaning this up a little, as in the attached patch
(it just fell out while I was trying to understand what the code actually does...).
Richard.
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
[-- Attachment #2: Type: TEXT/PLAIN, Size: 12292 bytes --]
Index: cp/pt.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/pt.c,v
retrieving revision 1.816.2.2
diff -u -c -3 -p -r1.816.2.2 pt.c
*** cp/pt.c 19 Jan 2004 20:39:33 -0000 1.816.2.2
--- cp/pt.c 20 Jan 2004 16:58:15 -0000
*************** static tree convert_nontype_argument (tr
*** 114,120 ****
static tree convert_template_argument (tree, tree, tree,
tsubst_flags_t, int, tree);
static tree get_bindings_overload (tree, tree, tree);
! static int for_each_template_parm (tree, tree_fn_t, void*, htab_t);
static tree build_template_parm_index (int, int, int, tree, tree);
static int inline_needs_template_parms (tree);
static void push_inline_template_parms_recursive (tree, int);
--- 114,120 ----
static tree convert_template_argument (tree, tree, tree,
tsubst_flags_t, int, tree);
static tree get_bindings_overload (tree, tree, tree);
! static int for_each_template_parm (tree, tree_fn_t, void*);
static tree build_template_parm_index (int, int, int, tree, tree);
static int inline_needs_template_parms (tree);
static void push_inline_template_parms_recursive (tree, int);
*************** process_partial_specialization (tree dec
*** 2478,2485 ****
tpd.current_arg = i;
for_each_template_parm (TREE_VEC_ELT (inner_args, i),
&mark_template_parm,
! &tpd,
! NULL);
}
for (i = 0; i < ntparms; ++i)
if (tpd.parms[i] == 0)
--- 2478,2484 ----
tpd.current_arg = i;
for_each_template_parm (TREE_VEC_ELT (inner_args, i),
&mark_template_parm,
! &tpd);
}
for (i = 0; i < ntparms; ++i)
if (tpd.parms[i] == 0)
*************** process_partial_specialization (tree dec
*** 2560,2567 ****
memset (tpd2.parms, 0, sizeof (int) * nargs);
for_each_template_parm (type,
&mark_template_parm,
! &tpd2,
! NULL);
if (tpd2.arg_uses_template_parms [i])
{
--- 2559,2565 ----
memset (tpd2.parms, 0, sizeof (int) * nargs);
for_each_template_parm (type,
&mark_template_parm,
! &tpd2);
if (tpd2.arg_uses_template_parms [i])
{
*************** struct pair_fn_data
*** 4502,4507 ****
--- 4500,4514 ----
htab_t visited;
};
+ static inline int
+ for_each_template_parm_1 (tree t, struct pair_fn_data *pfd)
+ {
+ return walk_tree (&t,
+ for_each_template_parm_r,
+ pfd,
+ pfd->visited) != NULL_TREE;
+ }
+
/* Called from for_each_template_parm via walk_tree. */
static tree
*************** for_each_template_parm_r (tree* tp, int*
*** 4513,4519 ****
void *data = pfd->data;
if (TYPE_P (t)
! && for_each_template_parm (TYPE_CONTEXT (t), fn, data, pfd->visited))
return error_mark_node;
switch (TREE_CODE (t))
--- 4520,4526 ----
void *data = pfd->data;
if (TYPE_P (t)
! && for_each_template_parm_1 (TYPE_CONTEXT (t), pfd))
return error_mark_node;
switch (TREE_CODE (t))
*************** for_each_template_parm_r (tree* tp, int*
*** 4527,4548 ****
case ENUMERAL_TYPE:
if (!TYPE_TEMPLATE_INFO (t))
*walk_subtrees = 0;
! else if (for_each_template_parm (TREE_VALUE (TYPE_TEMPLATE_INFO (t)),
! fn, data, pfd->visited))
return error_mark_node;
break;
case METHOD_TYPE:
/* Since we're not going to walk subtrees, we have to do this
explicitly here. */
! if (for_each_template_parm (TYPE_METHOD_BASETYPE (t), fn, data,
! pfd->visited))
return error_mark_node;
/* Fall through. */
case FUNCTION_TYPE:
/* Check the return type. */
! if (for_each_template_parm (TREE_TYPE (t), fn, data, pfd->visited))
return error_mark_node;
/* Check the parameter types. Since default arguments are not
--- 4534,4554 ----
case ENUMERAL_TYPE:
if (!TYPE_TEMPLATE_INFO (t))
*walk_subtrees = 0;
! else if (for_each_template_parm_1 (TREE_VALUE (TYPE_TEMPLATE_INFO (t)),
! pfd))
return error_mark_node;
break;
case METHOD_TYPE:
/* Since we're not going to walk subtrees, we have to do this
explicitly here. */
! if (for_each_template_parm_1 (TYPE_METHOD_BASETYPE (t), pfd))
return error_mark_node;
/* Fall through. */
case FUNCTION_TYPE:
/* Check the return type. */
! if (for_each_template_parm_1 (TREE_TYPE (t), pfd))
return error_mark_node;
/* Check the parameter types. Since default arguments are not
*************** for_each_template_parm_r (tree* tp, int*
*** 4555,4562 ****
tree parm;
for (parm = TYPE_ARG_TYPES (t); parm; parm = TREE_CHAIN (parm))
! if (for_each_template_parm (TREE_VALUE (parm), fn, data,
! pfd->visited))
return error_mark_node;
/* Since we've already handled the TYPE_ARG_TYPES, we don't
--- 4561,4567 ----
tree parm;
for (parm = TYPE_ARG_TYPES (t); parm; parm = TREE_CHAIN (parm))
! if (for_each_template_parm_1 (TREE_VALUE (parm), pfd))
return error_mark_node;
/* Since we've already handled the TYPE_ARG_TYPES, we don't
*************** for_each_template_parm_r (tree* tp, int*
*** 4566,4599 ****
break;
case TYPEOF_TYPE:
! if (for_each_template_parm (TYPE_FIELDS (t), fn, data,
! pfd->visited))
return error_mark_node;
break;
case FUNCTION_DECL:
case VAR_DECL:
if (DECL_LANG_SPECIFIC (t) && DECL_TEMPLATE_INFO (t)
! && for_each_template_parm (DECL_TI_ARGS (t), fn, data,
! pfd->visited))
return error_mark_node;
/* Fall through. */
case PARM_DECL:
case CONST_DECL:
if (TREE_CODE (t) == CONST_DECL && DECL_TEMPLATE_PARM_P (t)
! && for_each_template_parm (DECL_INITIAL (t), fn, data,
! pfd->visited))
return error_mark_node;
if (DECL_CONTEXT (t)
! && for_each_template_parm (DECL_CONTEXT (t), fn, data,
! pfd->visited))
return error_mark_node;
break;
case BOUND_TEMPLATE_TEMPLATE_PARM:
/* Record template parameters such as `T' inside `TT<T>'. */
! if (for_each_template_parm (TYPE_TI_ARGS (t), fn, data, pfd->visited))
return error_mark_node;
/* Fall through. */
--- 4571,4600 ----
break;
case TYPEOF_TYPE:
! if (for_each_template_parm_1 (TYPE_FIELDS (t), pfd))
return error_mark_node;
break;
case FUNCTION_DECL:
case VAR_DECL:
if (DECL_LANG_SPECIFIC (t) && DECL_TEMPLATE_INFO (t)
! && for_each_template_parm_1 (DECL_TI_ARGS (t), pfd))
return error_mark_node;
/* Fall through. */
case PARM_DECL:
case CONST_DECL:
if (TREE_CODE (t) == CONST_DECL && DECL_TEMPLATE_PARM_P (t)
! && for_each_template_parm_1 (DECL_INITIAL (t), pfd))
return error_mark_node;
if (DECL_CONTEXT (t)
! && for_each_template_parm_1 (DECL_CONTEXT (t), pfd))
return error_mark_node;
break;
case BOUND_TEMPLATE_TEMPLATE_PARM:
/* Record template parameters such as `T' inside `TT<T>'. */
! if (for_each_template_parm_1 (TYPE_TI_ARGS (t), pfd))
return error_mark_node;
/* Fall through. */
*************** for_each_template_parm_r (tree* tp, int*
*** 4609,4615 ****
case TEMPLATE_DECL:
/* A template template parameter is encountered. */
if (DECL_TEMPLATE_TEMPLATE_PARM_P (t)
! && for_each_template_parm (TREE_TYPE (t), fn, data, pfd->visited))
return error_mark_node;
/* Already substituted template template parameter */
--- 4610,4616 ----
case TEMPLATE_DECL:
/* A template template parameter is encountered. */
if (DECL_TEMPLATE_TEMPLATE_PARM_P (t)
! && for_each_template_parm_1 (TREE_TYPE (t), pfd))
return error_mark_node;
/* Already substituted template template parameter */
*************** for_each_template_parm_r (tree* tp, int*
*** 4618,4633 ****
case TYPENAME_TYPE:
if (!fn
! || for_each_template_parm (TYPENAME_TYPE_FULLNAME (t), fn,
! data, pfd->visited))
return error_mark_node;
break;
case CONSTRUCTOR:
if (TREE_TYPE (t) && TYPE_PTRMEMFUNC_P (TREE_TYPE (t))
! && for_each_template_parm (TYPE_PTRMEMFUNC_FN_TYPE
! (TREE_TYPE (t)), fn, data,
! pfd->visited))
return error_mark_node;
break;
--- 4619,4632 ----
case TYPENAME_TYPE:
if (!fn
! || for_each_template_parm_1 (TYPENAME_TYPE_FULLNAME (t), pfd))
return error_mark_node;
break;
case CONSTRUCTOR:
if (TREE_TYPE (t) && TYPE_PTRMEMFUNC_P (TREE_TYPE (t))
! && for_each_template_parm_1 (TYPE_PTRMEMFUNC_FN_TYPE
! (TREE_TYPE (t)), pfd))
return error_mark_node;
break;
*************** for_each_template_parm_r (tree* tp, int*
*** 4658,4665 ****
the BINFO hierarchy, which is circular, and therefore
confuses walk_tree. */
*walk_subtrees = 0;
! if (for_each_template_parm (BASELINK_FUNCTIONS (*tp), fn, data,
! pfd->visited))
return error_mark_node;
break;
--- 4657,4663 ----
the BINFO hierarchy, which is circular, and therefore
confuses walk_tree. */
*walk_subtrees = 0;
! if (for_each_template_parm_1 (BASELINK_FUNCTIONS (*tp), pfd))
return error_mark_node;
break;
*************** for_each_template_parm_r (tree* tp, int*
*** 4681,4687 ****
considered to be the function which always returns 1. */
static int
! for_each_template_parm (tree t, tree_fn_t fn, void* data, htab_t visited)
{
struct pair_fn_data pfd;
int result;
--- 4679,4685 ----
considered to be the function which always returns 1. */
static int
! for_each_template_parm (tree t, tree_fn_t fn, void* data)
{
struct pair_fn_data pfd;
int result;
*************** for_each_template_parm (tree t, tree_fn_
*** 4695,4713 ****
for_each_template_parm, so we would need to reorganize a fair
bit to use walk_tree_without_duplicates, so we keep our own
visited list.) */
! if (visited)
! pfd.visited = visited;
! else
! pfd.visited = htab_create (37, htab_hash_pointer, htab_eq_pointer,
! NULL);
! result = walk_tree (&t,
! for_each_template_parm_r,
! &pfd,
! pfd.visited) != NULL_TREE;
/* Clean up. */
! if (!visited)
! htab_delete (pfd.visited);
return result;
}
--- 4693,4704 ----
for_each_template_parm, so we would need to reorganize a fair
bit to use walk_tree_without_duplicates, so we keep our own
visited list.) */
! pfd.visited = htab_create (37, htab_hash_pointer, htab_eq_pointer,
! NULL);
! result = for_each_template_parm_1 (t, &pfd);
/* Clean up. */
! htab_delete (pfd.visited);
return result;
}
*************** for_each_template_parm (tree t, tree_fn_
*** 4717,4723 ****
int
uses_template_parms (tree t)
{
! return for_each_template_parm (t, 0, 0, NULL);
}
/* Returns true if T depends on any template parameter with level LEVEL. */
--- 4708,4714 ----
int
uses_template_parms (tree t)
{
! return for_each_template_parm (t, 0, 0);
}
/* Returns true if T depends on any template parameter with level LEVEL. */
*************** uses_template_parms (tree t)
*** 4725,4731 ****
int
uses_template_parms_level (tree t, int level)
{
! return for_each_template_parm (t, template_parm_this_level_p, &level, NULL);
}
static int tinst_depth;
--- 4716,4722 ----
int
uses_template_parms_level (tree t, int level)
{
! return for_each_template_parm (t, template_parm_this_level_p, &level);
}
static int tinst_depth;
* Re: Speed/profile of gcc3.4
2004-01-20 13:35 ` Jan Hubicka
2004-01-20 14:59 ` Richard Guenther
@ 2004-01-20 17:51 ` Giovanni Bajo
1 sibling, 0 replies; 10+ messages in thread
From: Giovanni Bajo @ 2004-01-20 17:51 UTC (permalink / raw)
To: Jan Hubicka, Richard Guenther; +Cc: gcc
Jan Hubicka wrote:
[for_each_template_parm]
> Jason mentioned that Mark plans to trim
> down the use of these. That should make it possible to knock this function
> out of the profiles completely.
>
> Mark, have you made any progress on this? If not, I can try to do
> something myself if you give me some guidelines.
I'm willing to help with this, but I need some guidelines as well.
Giovanni Bajo
* Re: Speed/profile of gcc3.4
2004-01-20 12:42 Speed/profile of gcc3.4 Richard Guenther
` (2 preceding siblings ...)
2004-01-20 13:35 ` Jan Hubicka
@ 2004-01-20 22:36 ` Mike Stump
3 siblings, 0 replies; 10+ messages in thread
From: Mike Stump @ 2004-01-20 22:36 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
On Tuesday, January 20, 2004, at 04:42 AM, Richard Guenther wrote:
> Also push_depth/push_by_depth could make use of __builtin_expect() and
> put the realloc out of line.
I don't know that it would improve things much, but try it out. It might
help on a CPU with bad branch prediction; as near as I could tell, I
wasn't on such a machine.
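For reference, the suggestion amounts to something like the following
sketch. It is not the push_depth code from ggc-page.c; struct depth_stack,
depth_stack_grow and depth_stack_push are hypothetical names, and the
growth policy (start at 16, then double) is an arbitrary assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Growable stack of depth values. */
struct depth_stack {
    unsigned *data;
    size_t len;
    size_t cap;
};

/* Cold path: kept out of line so the push itself stays a few instructions. */
static __attribute__((noinline)) void depth_stack_grow(struct depth_stack *s)
{
    size_t new_cap = s->cap ? s->cap * 2 : 16;
    unsigned *p = realloc(s->data, new_cap * sizeof *p);

    if (p == NULL)
        abort();
    s->data = p;
    s->cap = new_cap;
}

/* Hot path: the capacity check is hinted as rarely true. */
static inline void depth_stack_push(struct depth_stack *s, unsigned v)
{
    if (__builtin_expect(s->len == s->cap, 0))
        depth_stack_grow(s);
    assert(s->len < s->cap);
    s->data[s->len++] = v;
}
```

On a CPU with good branch prediction the hint may make no measurable
difference, so this is something to try and time rather than assume.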
> Was the use of prefetch in ggc_pop_context benchmarked?
Yes. It took many runs to determine the right value, and very, very
fine-grained timing to be able to notice very slight effects, on ppc G4
target hardware. Of course, other CPUs may like slightly different
values, but the improvements you could expect to obtain should be very
slight.
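The "value" being tuned in such an experiment is typically how far ahead
to prefetch. A minimal sketch of the pattern, unrelated to the actual
ggc_pop_context code: sum_with_prefetch and PREFETCH_DISTANCE are
hypothetical, and the distance of 8 elements is a placeholder, not a
measured result.

```c
#include <assert.h>
#include <stddef.h>

/* How many elements ahead to prefetch; must be tuned by measurement. */
#define PREFETCH_DISTANCE 8

/* Sums an array while hinting the cache about elements needed soon. */
static unsigned long sum_with_prefetch(const unsigned *v, size_t n)
{
    unsigned long sum = 0;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Second argument 0 = prefetch for reading,
           third argument 1 = low temporal locality. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&v[i + PREFETCH_DISTANCE], 0, 1);
        sum += v[i];
    }
    return sum;
}
```

The prefetch never changes the result, only the timing, which is why very
fine-grained benchmarking is needed to see any effect at all.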