From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 21956 invoked by alias); 14 Jun 2006 20:41:47 -0000 Received: (qmail 21942 invoked by uid 22791); 14 Jun 2006 20:41:45 -0000 X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (66.187.233.31) by sourceware.org (qpsmtpd/0.31) with ESMTP; Wed, 14 Jun 2006 20:41:40 +0000 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11.20060308/8.12.11) with ESMTP id k5EKfcS6024188 for ; Wed, 14 Jun 2006 16:41:38 -0400 Received: from pobox.toronto.redhat.com (pobox.toronto.redhat.com [172.16.14.4]) by int-mx1.corp.redhat.com (8.12.11.20060308/8.12.11) with ESMTP id k5EKfanq028520 for ; Wed, 14 Jun 2006 16:41:37 -0400 Received: from [172.16.14.227] (IDENT:liGg6eRZeaI+/8+n0QvphClEPr3vBVsi@topaz.toronto.redhat.com [172.16.14.227]) by pobox.toronto.redhat.com (8.12.8/8.12.8) with ESMTP id k5EKfZve014534 for ; Wed, 14 Jun 2006 16:41:35 -0400 Message-ID: <4490747F.1050909@redhat.com> Date: Wed, 14 Jun 2006 20:41:00 -0000 From: Dave Brolley User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050317) MIME-Version: 1.0 To: sid@sources.redhat.com Subject: [patch][commit] Performance of --gprof option Content-Type: multipart/mixed; boundary="------------050407040100090507080407" X-IsSubscribed: yes Mailing-List: contact sid-help@sourceware.org; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: sid-owner@sourceware.org X-SW-Source: 2006-q2/txt/msg00041.txt.bz2 This is a multi-part message in MIME format. --------------050407040100090507080407 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-length: 2142 Hi, We have two ports which support pipeline modelling (--final-insn-count, --trace-count) combined with use of --gprof=,cycles=1 in order to obtain cycle accurate profiles of the applications being simulated. Our clients have complained about the poor performance of SID when using these options. The poor performance is mainly due to the following factors: o The gprof component is a "regular" client of the target scheduler when in cycle mode. o It is necessary to use --insn-count=1 in order to obtain accurate samples when using --gprof=,cycles=n. Otherwise the gprof component is not triggered after the given number of cycles. These combine to slow the simulation because each simulated cycle requires one entire target scheduler iteration. Also, the gprof component's "sample" pin is driven once per simulated cycle. By making two simple changes, I was able to improve the performance of SID by 30% in this situation. 1) Have the cpu component perform gprof sampling directly. 2) Allow a given number of cycles to be driven on the gprof component's "sample" pin. Change number 1 removes the requirement of using --insn-count=1. Gprof sampling is now done by the cpu component in step_pin_handler and in stop_after_insns_p. Change number 2 allows several samples to be accumulated at the same pc (when there is some kind of stall due to latency), eliminating the need to drive the "sample" pin once per simulated cycle. This dramatically reduces the number of times the pin is driven during a typical simulation. These changes also fix a couple of "bugs" in that the gprof option wasn't working as advertised (when --help is specified) since o More samples would be taken than expected in insn-count mode if the cpu was yielded o Fewer samples than expected would be taken in cycle mode if --insn-count=1 was not used. The correct number of samples is now taken in all situations. I've committed the attached patch which implements these changes. Tested on two internal ports which support --gprof. Judging from the lack of calls to cg_profile*, no other ports are using this feature. Dave --------------050407040100090507080407 Content-Type: text/plain; name="sid-gprof.ChangeLog" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="sid-gprof.ChangeLog" Content-length: 1307 2006-06-14 Dave Brolley * gprof.cxx (gprof_component): Remove sim_sched and sim_sched_event. Remove the sim-sched-event attribute and the sim-sched relation. (accumulate): Increment the bucket count by the value driven on the pin. (configure): No longer a subscription to the target scheduler to maintain. 2006-06-14 Dave Brolley * sidcpuutil.h (basic_cpu::sample_gprof_pin): New pin. (sample_gprof): New method of basic_cpu. (step_pin_handler): Don't reset current_step_insn_count. Call sample_gprof, if requested and save latency. (stop_after_insns_p): Likewise. (unconfigure_gprof): Now takes no arguments. Remove unused code. Disconnect the sample_gprof_pin. (configure_gprof): Now takes configuration string. Connect the sample_gprof_pin. Determine the sampling mode. (gprof_cycles,gprof_counter,gprof_prev_latency): New members of basic_cpu. (configure): Initialize gprof_counter. (basic_cpu): Add sample-gprof pin. 2006-06-14 Dave Brolley * commonCfg.cxx (GprofCfg): Don't subscribe to the target scheduler. Connect the cpu's sample-gprof pin to our sample pin. No longer any relation with the target scheduler. (BoardCfg::write_config): Nl longer any relation between gprof and the target scheduler. --------------050407040100090507080407 Content-Type: text/plain; name="sid-gprof.patch.txt" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="sid-gprof.patch.txt" Content-length: 15863 Index: sid/component/profiling/gprof.cxx =================================================================== RCS file: /cvs/src/src/sid/component/profiling/gprof.cxx,v retrieving revision 1.15 diff -c -p -r1.15 gprof.cxx *** sid/component/profiling/gprof.cxx 11 May 2006 16:04:38 -0000 1.15 --- sid/component/profiling/gprof.cxx 14 Jun 2006 19:28:59 -0000 *************** namespace profiling_components *** 135,143 **** vector stats; unsigned current_stats; - component *sim_sched; - string sim_sched_event; - string target_attribute; component* target_component; --- 135,140 ---- *************** namespace profiling_components *** 272,278 **** return component::ok; } ! void accumulate (host_int_4) { if (! this->target_component) return; --- 269,275 ---- return component::ok; } ! void accumulate (host_int_4 ticks) { if (! this->target_component) return; *************** namespace profiling_components *** 300,306 **** if (quantized < this->stats[current_stats].value_min) this->stats[current_stats].value_min = quantized; if (quantized > this->stats[current_stats].value_max) this->stats[current_stats].value_max = quantized; ! this->stats[current_stats].value_hitcount_map [quantized] ++; } void accumulate_call (host_int_4 selfpc_low) --- 297,303 ---- if (quantized < this->stats[current_stats].value_min) this->stats[current_stats].value_min = quantized; if (quantized > this->stats[current_stats].value_max) this->stats[current_stats].value_max = quantized; ! this->stats[current_stats].value_hitcount_map [quantized] += ticks; } void accumulate_call (host_int_4 selfpc_low) *************** namespace profiling_components *** 559,596 **** } current_stats = i; } - - // If cycles was specified, then we need to be subscribed to the - // target scheduler - if (! sim_sched) - return; - if (parts.size () == 2) - { - host_int_4 cycles; - component::status s = parse_attribute (parts[1], cycles); - if (s == component::ok) - { - sim_sched->connect_pin (sim_sched_event + "-event", & accumulate_pin); - sim_sched->set_attribute_value (sim_sched_event + "-time", make_attribute (cycles)); - // Take a sample now to make up for the one which just got cancelled when - // N-time was set. - accumulate (1); - return; - } - } } - // No gprof config or cycles was not specified. We will not be triggered by the - // target scheduler. - if (sim_sched) - sim_sched->disconnect_pin (sim_sched_event + "-event", & accumulate_pin); return; } } public: gprof_component (): - sim_sched (0), - sim_sched_event ("0"), target_attribute ("pc"), target_component (0), output_file_format (endian_unknown), --- 556,568 ---- *************** namespace profiling_components *** 605,611 **** add_pin ("sample", & this->accumulate_pin); add_attribute ("sample", & this->accumulate_pin, "pin"); - add_attribute ("sim-sched-event", & this->sim_sched_event, "setting"); add_pin ("cg-caller", & this->cg_caller_pin); add_attribute ("cg-caller", & this->cg_caller_pin, "pin"); add_pin ("cg-caller-hi", & this->cg_caller_hi_pin); --- 577,582 ---- *************** namespace profiling_components *** 653,659 **** "setting"); add_attribute ("output-file-endianness", & this->output_file_format, "setting"); add_uni_relation ("target-component", & this->target_component); - add_uni_relation ("sim-sched", & this->sim_sched); cg_caller_hi_pin.driven (0); cg_callee_hi_pin.driven (0); --- 624,629 ---- Index: sid/include/sidcpuutil.h =================================================================== RCS file: /cvs/src/src/sid/include/sidcpuutil.h,v retrieving revision 1.35 diff -c -p -r1.35 sidcpuutil.h *** sid/include/sidcpuutil.h 11 May 2006 19:29:51 -0000 1.35 --- sid/include/sidcpuutil.h 14 Jun 2006 19:28:59 -0000 *************** namespace sidutil *** 253,258 **** --- 253,259 ---- output_pin cg_callee_pin; output_pin cg_jump_pin; output_pin cg_return_pin; + output_pin sample_gprof_pin; // tracing private: *************** namespace sidutil *** 310,316 **** recursion_record limit (& this->step_limit); if (UNLIKELY(! limit.ok())) return; - this->current_step_insn_count = 0; this->yield_p = false; // Check for triggerpoints due right now; may set yield_p! --- 311,316 ---- *************** namespace sidutil *** 320,326 **** sid::host_int_8 prev_latency = this->total_latency; sid::host_int_8 prev_insn_count = this->total_insn_count; if (! this->yield_p) ! this->step_insns (); sid::host_int_8 num_insns = this->total_insn_count - prev_insn_count; sid::host_int_8 latency = this->total_latency - prev_latency; --- 320,331 ---- sid::host_int_8 prev_latency = this->total_latency; sid::host_int_8 prev_insn_count = this->total_insn_count; if (! this->yield_p) ! { ! if (UNLIKELY (this->gprof_configured_p)) ! this->sample_gprof (1); ! this->gprof_prev_latency = this->total_latency; ! this->step_insns (); ! } sid::host_int_8 num_insns = this->total_insn_count - prev_insn_count; sid::host_int_8 latency = this->total_latency - prev_latency; *************** namespace sidutil *** 392,401 **** --- 397,436 ---- return num; } + void sample_gprof (sid::host_int_4 num_insns) + { + this->gprof_counter += num_insns; + + // Sample for gprof in insn-count mode? + if (this->gprof_cycles == 0) + { + sid::host_int_4 ticks = this->gprof_counter / this->step_insn_count; + if (ticks > 0) + { + this->sample_gprof_pin.drive (ticks); + this->gprof_counter %= this->step_insn_count; + } + return; + } + + // Sample for gprof in cycle mode + if ((sid::signed_host_int_8)(this->total_latency) > (sid::signed_host_int_8)(this->gprof_prev_latency)) + this->gprof_counter += this->total_latency - this->gprof_prev_latency; + + sid::host_int_4 ticks = this->gprof_counter / this->gprof_cycles; + if (ticks > 0) + { + this->sample_gprof_pin.drive (ticks); + this->gprof_counter %= this->gprof_cycles; + } + } + virtual void step_insns () = 0; bool stop_after_insns_p (sid::host_int_4 num) { this->current_step_insn_count += num; + + bool rc; if (UNLIKELY(this->yield_p || (this->current_step_insn_count >= this->step_insn_count))) { *************** namespace sidutil *** 403,414 **** // arithmetic overhead in the inner insn-stepping loops. this->total_insn_count += this->current_step_insn_count; this->current_step_insn_count = 0; ! return true; } else { ! return false; } } void --- 438,459 ---- // arithmetic overhead in the inner insn-stepping loops. this->total_insn_count += this->current_step_insn_count; this->current_step_insn_count = 0; ! rc = true; // stop! } else + rc = false; // don't stop + + // Sample for gprof? + if (UNLIKELY (this->gprof_configured_p)) { ! // Count 1 fewer insns if exiting to account for the one counted on entry ! if (rc) ! --num; ! this->sample_gprof (num); } + this->gprof_prev_latency = this->total_latency; + + return rc; } void *************** namespace sidutil *** 487,493 **** return static_cast(trap_disposition_pin.sense ()); } ! void unconfigure_gprof (const string &gprof_spec, sid::host_int_4 num_cycles) { if (! gprof_configured_p) return; --- 532,538 ---- return static_cast(trap_disposition_pin.sense ()); } ! void unconfigure_gprof () { if (! gprof_configured_p) return; *************** namespace sidutil *** 495,536 **** assert (gprof); sid::pin *p; - #if 0 // can't happen? - // If 'cycles' was specified on the --gprof option, then - // first, sample the address of the branch which caused - // the reconfig for the given number of cycles. - if (num_cycles && last_caller && gprof_spec.size () > 6) - { - vector parts = tokenize (gprof_spec.substr (6), ","); - if (parts.size () > 1) - { - p = gprof->find_pin ("sample"); - if (p) - { - std::string save_pc = this->attribute_value ("pc"); - if (! save_pc.empty ()) - { - sid::component::status s = this->set_attribute_value ("pc", make_numeric_attribute (last_caller)); - if (s == sid::component::ok) - for (int i = 0; i < num_cycles; ++i) - p->driven (1); - this->set_attribute_value ("pc", save_pc); - } - } - } - } - #endif - // Then disconnect the call graph notification pins. p = gprof->find_pin ("cg-caller"); if (p) cg_caller_pin.disconnect (p); p = gprof->find_pin ("cg-callee"); if (p) cg_callee_pin.disconnect (p); gprof_configured_p = false; } ! void configure_gprof () { if (gprof_configured_p) return; --- 540,557 ---- assert (gprof); sid::pin *p; // Then disconnect the call graph notification pins. p = gprof->find_pin ("cg-caller"); if (p) cg_caller_pin.disconnect (p); p = gprof->find_pin ("cg-callee"); if (p) cg_callee_pin.disconnect (p); + p = gprof->find_pin ("sample"); + if (p) sample_gprof_pin.disconnect (p); gprof_configured_p = false; } ! void configure_gprof (const string &config) { if (gprof_configured_p) return; *************** namespace sidutil *** 551,556 **** --- 572,593 ---- p->driven (last_callee); } + p = gprof->find_pin ("sample"); + if (p) + sample_gprof_pin.connect (p); + + vector parts = tokenize (config.substr (6), ","); + if (parts.size () == 2) + { + component::status s = parse_attribute (parts[1], gprof_cycles); + gprof_counter = gprof_cycles - 1; + } + else + { + gprof_cycles = 0; + gprof_counter = step_insn_count - 1; + } + gprof_configured_p = true; } *************** namespace sidutil *** 563,568 **** --- 600,608 ---- sid::host_int_4 last_caller; sid::host_int_4 last_callee; bool gprof_configured_p; + sid::host_int_4 gprof_cycles; + sid::host_int_4 gprof_counter; + sid::host_int_8 gprof_prev_latency; virtual void configure (const string &config) { *************** namespace sidutil *** 580,588 **** gprof->set_attribute_value ("configure!", config); // Now do our own configuration if (config.size () > 6) ! configure_gprof (); else ! unconfigure_gprof (config, num_cycles); return; } if (config.size () <= 11) --- 620,628 ---- gprof->set_attribute_value ("configure!", config); // Now do our own configuration if (config.size () > 6) ! configure_gprof (config); else ! unconfigure_gprof (); return; } if (config.size () <= 11) *************** namespace sidutil *** 592,598 **** sid::host_int_4 n; sid::component::status s = parse_attribute (config.substr (11), n); if (s == sid::component::ok) ! step_insn_count = n; return; } if (config.substr (0, 8) == "verbose=") --- 632,642 ---- sid::host_int_4 n; sid::component::status s = parse_attribute (config.substr (11), n); if (s == sid::component::ok) ! { ! step_insn_count = n; ! if (gprof_configured_p && gprof_cycles == 0) ! gprof_counter = step_insn_count - 1; ! } return; } if (config.substr (0, 8) == "verbose=") *************** namespace sidutil *** 838,843 **** --- 882,888 ---- public: basic_cpu (): total_latency (0), + current_step_insn_count (0), step_limit ("instruction stepping", 1), sched_query (this), triggerpoint_manager (this), *************** public: *** 857,862 **** --- 902,908 ---- last_caller (0), last_callee (0), gprof_configured_p (false), + gprof_prev_latency (0), core_probe (0), main (0) { *************** public: *** 881,886 **** --- 927,933 ---- add_pin ("cg-return", & this->cg_return_pin); add_pin ("cg-jump", & this->cg_jump_pin); add_pin ("print-insn-summary!", & this->print_insn_summary_pin); + add_pin ("sample-gprof", &sample_gprof_pin); add_pin ("endian-set!", & this->endian_set_pin); add_pin ("eflags-set!", & this->eflags_set_pin); add_watchable_pin ("trap", & this->trap_type_pin); // output side Index: sid/main/dynamic/commonCfg.cxx =================================================================== RCS file: /cvs/src/src/sid/main/dynamic/commonCfg.cxx,v retrieving revision 1.12 diff -c -p -r1.12 commonCfg.cxx *** sid/main/dynamic/commonCfg.cxx 11 May 2006 20:27:02 -0000 1.12 --- sid/main/dynamic/commonCfg.cxx 14 Jun 2006 19:28:59 -0000 *************** GprofCfg::GprofCfg (const string name, *** 1002,1022 **** { assert (cpu); assert (sess); - // Add a subscription to the target scheduler. Even if it's not - // used now, it could be used due to dynamic configuration. - assert (sess->sim_sched); - int slot = sess->sim_sched->add_subscription (this, "sample"); - sess->sim_sched->set_regular (slot, true); - sess->sim_sched->set_time (slot, interval); - sess->sim_sched->set_priority (slot, SchedCfg::gprof_priority); ! if (type != simulated_cycles) ! { ! // default to instruction_count ! string ev = sidutil::make_attribute (cpu->get_subscription_number()); ! ev += "-event"; ! conn_pin (sess->sim_sched, ev, this, "sample"); ! } sess->shutdown_seq->add_output (7, this, "store"); relate (this, "target-component", cpu); --- 1002,1009 ---- { assert (cpu); assert (sess); ! conn_pin (cpu, "sample-gprof", this, "sample"); sess->shutdown_seq->add_output (7, this, "store"); relate (this, "target-component", cpu); *************** GprofCfg::GprofCfg (const string name, *** 1025,1031 **** set (this, "value-attribute", "pc"); set (this, "bucket-size", "4"); // bytes-per-bucket set (this, "output-file", filename); - set (this, "sim-sched-event", sidutil::make_attribute (slot)); } // Create a gprof component but don't activate it --- 1012,1017 ---- *************** GprofCfg::GprofCfg (const string name, *** 1039,1057 **** { assert (cpu); assert (sess); - // Add a subscription to the target scheduler. Even if it's not - // used now, it could be used due to dynamic configuration. - assert (sess->sim_sched); - int slot = sess->sim_sched->add_subscription (this, "sample"); - sess->sim_sched->set_regular (slot, true); - sess->sim_sched->set_time (slot, 1); - sess->sim_sched->set_priority (slot, SchedCfg::gprof_priority); sess->shutdown_seq->add_output (7, this, "store"); relate (this, "target-component", cpu); set (this, "value-attribute", "pc"); set (this, "bucket-size", "4"); // bytes-per-bucket - set (this, "sim-sched-event", sidutil::make_attribute (slot)); } --- 1025,1035 ---- *************** void BoardCfg::write_config (Writer &w) *** 1330,1336 **** if (gprof) { // gprof's configure! attribute will be set by the cpu. - Relation (gprof, "sim-sched", sess->sim_sched).write_to (w); Relation (cpu, "gprof", gprof).write_to (w); } if (! gloss->possibly_wrapped ()) --- 1308,1313 ---- --------------050407040100090507080407--