From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Sturm
To: David Edelsohn
Cc: "David S. Miller"
Subject: Re: Faster compilation speed
Date: Sun, 18 Aug 2002 12:58:00 -0000
Message-id:
References: <200208132103.RAA25968@makai.watson.ibm.com>
X-SW-Source: 2002-08/msg01055.html

On Tue, 13 Aug 2002, David Edelsohn wrote:

> Here's an interesting (aka depressing) data point. My previous
> cache miss statistics were for GCC -O2. At -O0, GCC's cache miss
> statistics stay the same or get up to 20% *worse*. In comparison, the
> cache statistics for IBM's compiler without optimization enabled *improve*
> up to 50 for the same reload.c and insn-recog.c input files compared to
> optimized.

Here's a data point on alpha-linux:

cc1 -quiet -O2 reload.i
issues/cycles      = 0.51
issues/dcache_miss = 26.93

Without optimization:

cc1 -quiet reload.i
issues/cycles      = 0.52
issues/dcache_miss = 31.29

This is on a ev56 with a direct-mapped cache.

To get some idea where the misses are taking place, I experimented with
iprobe's sampling mode. Omitting results below the 1% sample threshold,
I get:

function                    | issues | access | misses | i/m | a/m
----------------------------+--------+--------+--------+-----+-----
yyparse                     |   2924 |    848 |    148 |  20 |  5.7
gt_ggc_mx_lang_tree_node    |   1336 |    612 |     74 |  18 |  8.2
verify_flow_info            |   1388 |    408 |    129 |  11 |  3.1
copy_rtx_if_shared          |   2120 |   1012 |     53 |  40 | 19.0
propagate_one_insn          |   3636 |    504 |     52 |  70 |  9.6
find_temp_slot_from_address |    728 |    232 |    126 |   6 |  1.8
ggc_mark_rtx_children_1     |   1580 |    316 |     40 |  40 |  7.9
extract_insn                |   1576 |    476 |     52 |  30 |  9.1
record_reg_classes          |   3848 |    944 |     65 |  59 | 14.5
reg_scan_mark_refs          |   1472 |    632 |     66 |  22 |  9.5
find_reloads                |   7680 |   3104 |    148 |  52 | 20.9
subst_reloads               |   4772 |   2736 |    169 |  28 | 16.1
side_effects_p              |   1344 |    564 |     43 |  31 | 13.1
for_each_rtx                |   4924 |   1464 |     75 |  66 | 19.5
ggc_alloc                   |   2424 |    728 |    111 |  22 |  6.5
ggc_set_mark                |   3392 |    976 |    107 |  32 |  9.1

(Each sample reported is 2^14 events.)
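For anyone who wants to re-derive the last two columns of the table above, here is a minimal sketch. It assumes i/m means issues per miss and a/m means accesses per miss, which matches the posted figures. The names `SAMPLES`, `ratios` and `rank_by_accesses_per_miss` are made up for this illustration and are not part of iprobe or GCC.

```python
# Rows copied from the table: (function, issues, accesses, misses).
SAMPLES = [
    ("yyparse", 2924, 848, 148),
    ("gt_ggc_mx_lang_tree_node", 1336, 612, 74),
    ("verify_flow_info", 1388, 408, 129),
    ("copy_rtx_if_shared", 2120, 1012, 53),
    ("propagate_one_insn", 3636, 504, 52),
    ("find_temp_slot_from_address", 728, 232, 126),
    ("ggc_mark_rtx_children_1", 1580, 316, 40),
    ("extract_insn", 1576, 476, 52),
    ("record_reg_classes", 3848, 944, 65),
    ("reg_scan_mark_refs", 1472, 632, 66),
    ("find_reloads", 7680, 3104, 148),
    ("subst_reloads", 4772, 2736, 169),
    ("side_effects_p", 1344, 564, 43),
    ("for_each_rtx", 4924, 1464, 75),
    ("ggc_alloc", 2424, 728, 111),
    ("ggc_set_mark", 3392, 976, 107),
]


def ratios(issues, accesses, misses):
    """Return (issues per miss, accesses per miss) for one row."""
    return issues / misses, accesses / misses


def rank_by_accesses_per_miss(rows):
    """Sort rows so the lowest accesses-per-miss ratio comes first."""
    return sorted(rows, key=lambda row: row[2] / row[3])


if __name__ == "__main__":
    for name, issues, accesses, misses in rank_by_accesses_per_miss(SAMPLES):
        i_m, a_m = ratios(issues, accesses, misses)
        print(f"{name:<28} i/m={i_m:6.1f}  a/m={a_m:6.2f}")
```

Sorting by a/m rather than by raw miss count puts find_temp_slot_from_address and verify_flow_info at the top of the list, ahead of yyparse, even though find_reloads and subst_reloads account for more misses in absolute terms.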

yyparse performs badly (as would any table-driven parser), but how about
verify_flow_info and find_temp_slot_from_address? Both are reporting
awful cache behavior.

Jeff