public inbox for sid@sourceware.org
 help / color / mirror / Atom feed
* [patch][commit] Performance of --gprof option
@ 2006-06-14 20:41 Dave Brolley
  0 siblings, 0 replies; only message in thread
From: Dave Brolley @ 2006-06-14 20:41 UTC (permalink / raw)
  To: sid

[-- Attachment #1: Type: text/plain, Size: 2142 bytes --]

Hi,

We have two ports which support pipeline modelling (--final-insn-count, 
--trace-count) combined with use of --gprof=<file>,cycles=1 in order to 
obtain cycle accurate profiles of the applications being simulated. Our 
clients have complained about the poor performance of SID when using 
these options. The poor performance is mainly due to the following factors:

o The gprof component is a "regular" client of the target scheduler when 
in cycle mode.
o It is necessary to use --insn-count=1 in order to obtain accurate 
samples when using --gprof=<file>,cycles=n. Otherwise the gprof 
component is not triggered after the given number of cycles.

These combine to slow the simulation because each simulated cycle 
requires one entire target scheduler iteration. Also, the gprof 
component's "sample" pin is driven once per simulated cycle.

By making two simple changes, I was able to improve the performance of 
SID by 30% in this situation.

1) Have the cpu component perform gprof sampling directly.
2) Allow a given number of cycles to be driven on the gprof component's 
"sample" pin.

Change number 1 removes the requirement of using --insn-count=1. Gprof 
sampling is now done by the cpu component in step_pin_handler and in 
stop_after_insns_p.

Change number 2 allows several samples to be accumulated at the same pc 
(when there is some kind of stall due to latency), eliminating the need 
to drive the "sample" pin once per simulated cycle. This dramatically 
reduces the number of times the pin is driven during a typical simulation.

These changes also fix a couple of "bugs" in that the gprof option 
wasn't working as advertised (when --help is specified) since

o More samples would be taken than expected in insn-count mode if the 
cpu was yielded
o Fewer samples than expected would be taken in cycle mode if 
--insn-count=1 was not used.

The correct number of samples is now taken in all situations.

I've committed the attached patch which implements these changes. Tested 
on two internal ports which support --gprof. Judging from the lack of 
calls to cg_profile*, no other ports are using this feature.

Dave

[-- Attachment #2: sid-gprof.ChangeLog --]
[-- Type: text/plain, Size: 1307 bytes --]

2006-06-14  Dave Brolley  <brolley@redhat.com>

	* gprof.cxx (gprof_component): Remove sim_sched and sim_sched_event.
	Remove the sim-sched-event attribute and the sim-sched relation.
	(accumulate): Increment the bucket count by the value driven on the
	pin.
	(configure): No longer a subscription to the target scheduler to
	maintain.

2006-06-14  Dave Brolley  <brolley@redhat.com>

	* sidcpuutil.h (basic_cpu::sample_gprof_pin): New pin.
	(sample_gprof): New method of basic_cpu.
	(step_pin_handler): Don't reset current_step_insn_count.
	Call sample_gprof, if requested and save latency.
	(stop_after_insns_p): Likewise.
	(unconfigure_gprof): Now takes no arguments. Remove unused code.
	Disconnect the sample_gprof_pin.
	(configure_gprof): Now takes configuration string. Connect the
	sample_gprof_pin. Determine the sampling mode.
	(gprof_cycles,gprof_counter,gprof_prev_latency): New members of
	basic_cpu.
	(configure): Initialize gprof_counter.
	(basic_cpu): Add sample-gprof pin.

2006-06-14  Dave Brolley  <brolley@redhat.com>

	* commonCfg.cxx (GprofCfg): Don't subscribe to the target scheduler.
	Connect the cpu's sample-gprof pin to our sample pin. No longer any
	relation with the target scheduler.
	(BoardCfg::write_config): Nl longer any relation between gprof and
	the target scheduler.


[-- Attachment #3: sid-gprof.patch.txt --]
[-- Type: text/plain, Size: 15863 bytes --]

Index: sid/component/profiling/gprof.cxx
===================================================================
RCS file: /cvs/src/src/sid/component/profiling/gprof.cxx,v
retrieving revision 1.15
diff -c -p -r1.15 gprof.cxx
*** sid/component/profiling/gprof.cxx	11 May 2006 16:04:38 -0000	1.15
--- sid/component/profiling/gprof.cxx	14 Jun 2006 19:28:59 -0000
*************** namespace profiling_components
*** 135,143 ****
      vector<statistics> stats;
      unsigned current_stats;
  
-     component *sim_sched;
-     string sim_sched_event;
- 
      string target_attribute;
      component* target_component;
  
--- 135,140 ----
*************** namespace profiling_components
*** 272,278 ****
  	return component::ok;
        }
  
!     void accumulate (host_int_4)
        {
  	if (! this->target_component) return;
  
--- 269,275 ----
  	return component::ok;
        }
  
!     void accumulate (host_int_4 ticks)
        {
  	if (! this->target_component) return;
  
*************** namespace profiling_components
*** 300,306 ****
  
  	if (quantized < this->stats[current_stats].value_min) this->stats[current_stats].value_min = quantized;
  	if (quantized > this->stats[current_stats].value_max) this->stats[current_stats].value_max = quantized;
! 	this->stats[current_stats].value_hitcount_map [quantized] ++;
        }
  
      void accumulate_call (host_int_4 selfpc_low)
--- 297,303 ----
  
  	if (quantized < this->stats[current_stats].value_min) this->stats[current_stats].value_min = quantized;
  	if (quantized > this->stats[current_stats].value_max) this->stats[current_stats].value_max = quantized;
! 	this->stats[current_stats].value_hitcount_map [quantized] += ticks;
        }
  
      void accumulate_call (host_int_4 selfpc_low)
*************** namespace profiling_components
*** 559,596 ****
  		      }
  		    current_stats = i;
  		  }
- 
- 		// If cycles was specified, then we need to be subscribed to the
- 		// target scheduler
- 		if (! sim_sched)
- 		  return;
- 		if (parts.size () == 2)
- 		  {
- 		    host_int_4 cycles;
- 		    component::status s = parse_attribute (parts[1], cycles);
- 		    if (s == component::ok)
- 		      {
- 			sim_sched->connect_pin (sim_sched_event + "-event", & accumulate_pin);
- 			sim_sched->set_attribute_value (sim_sched_event + "-time", make_attribute (cycles));
- 			// Take a sample now to make up for the one which just got cancelled when
- 			// N-time was set.
- 			accumulate (1);
- 			return;
- 		      }
- 		  }
  	      }
- 	    // No gprof config or cycles was not specified. We will not be triggered by the
- 	    // target scheduler.
- 	    if (sim_sched)
- 	      sim_sched->disconnect_pin (sim_sched_event + "-event", & accumulate_pin);
  	    return;
  	  }
        }
  
    public:
      gprof_component ():
-       sim_sched (0),
-       sim_sched_event ("0"),
        target_attribute ("pc"),
        target_component (0),
        output_file_format (endian_unknown),
--- 556,568 ----
*************** namespace profiling_components
*** 605,611 ****
  
  	add_pin ("sample", & this->accumulate_pin);
  	add_attribute ("sample", & this->accumulate_pin, "pin");
- 	add_attribute ("sim-sched-event", & this->sim_sched_event, "setting");
  	add_pin ("cg-caller", & this->cg_caller_pin);
  	add_attribute ("cg-caller", & this->cg_caller_pin, "pin");
  	add_pin ("cg-caller-hi", & this->cg_caller_hi_pin);
--- 577,582 ----
*************** namespace profiling_components
*** 653,659 ****
  			       "setting");
  	add_attribute ("output-file-endianness", & this->output_file_format, "setting");
  	add_uni_relation ("target-component", & this->target_component);
- 	add_uni_relation ("sim-sched", & this->sim_sched);
  
  	cg_caller_hi_pin.driven (0);
  	cg_callee_hi_pin.driven (0);
--- 624,629 ----
Index: sid/include/sidcpuutil.h
===================================================================
RCS file: /cvs/src/src/sid/include/sidcpuutil.h,v
retrieving revision 1.35
diff -c -p -r1.35 sidcpuutil.h
*** sid/include/sidcpuutil.h	11 May 2006 19:29:51 -0000	1.35
--- sid/include/sidcpuutil.h	14 Jun 2006 19:28:59 -0000
*************** namespace sidutil
*** 253,258 ****
--- 253,259 ----
      output_pin cg_callee_pin;
      output_pin cg_jump_pin;
      output_pin cg_return_pin;
+     output_pin sample_gprof_pin;
     
      // tracing
    private:
*************** namespace sidutil
*** 310,316 ****
  	recursion_record limit (& this->step_limit);
  	if (UNLIKELY(! limit.ok())) return;
  
- 	this->current_step_insn_count = 0;
  	this->yield_p = false;
  
  	// Check for triggerpoints due right now; may set yield_p!
--- 311,316 ----
*************** namespace sidutil
*** 320,326 ****
  	sid::host_int_8 prev_latency = this->total_latency;
  	sid::host_int_8 prev_insn_count = this->total_insn_count;
  	if (! this->yield_p)
! 	  this->step_insns ();
  	sid::host_int_8 num_insns = this->total_insn_count - prev_insn_count;
  	sid::host_int_8 latency = this->total_latency - prev_latency;
  
--- 320,331 ----
  	sid::host_int_8 prev_latency = this->total_latency;
  	sid::host_int_8 prev_insn_count = this->total_insn_count;
  	if (! this->yield_p)
! 	  {
! 	    if (UNLIKELY (this->gprof_configured_p))
! 	      this->sample_gprof (1);
! 	    this->gprof_prev_latency = this->total_latency;
! 	    this->step_insns ();
! 	  }
  	sid::host_int_8 num_insns = this->total_insn_count - prev_insn_count;
  	sid::host_int_8 latency = this->total_latency - prev_latency;
  
*************** namespace sidutil
*** 392,401 ****
--- 397,436 ----
        return num;
      }
  
+     void sample_gprof (sid::host_int_4 num_insns)
+     {
+       this->gprof_counter += num_insns;
+ 
+       // Sample for gprof in insn-count mode?
+       if (this->gprof_cycles == 0)
+ 	{
+ 	  sid::host_int_4 ticks = this->gprof_counter / this->step_insn_count;
+ 	  if (ticks > 0)
+ 	    {
+ 	      this->sample_gprof_pin.drive (ticks);
+ 	      this->gprof_counter %= this->step_insn_count;
+ 	    }
+ 	  return;
+ 	}
+ 
+       // Sample for gprof in cycle mode
+       if ((sid::signed_host_int_8)(this->total_latency) > (sid::signed_host_int_8)(this->gprof_prev_latency))
+ 	this->gprof_counter += this->total_latency - this->gprof_prev_latency;
+ 
+       sid::host_int_4 ticks = this->gprof_counter / this->gprof_cycles;
+       if (ticks > 0)
+ 	{
+ 	  this->sample_gprof_pin.drive (ticks);
+ 	  this->gprof_counter %= this->gprof_cycles;
+ 	}
+     }
+ 
      virtual void step_insns () = 0;
      bool stop_after_insns_p (sid::host_int_4 num)
        {
  	this->current_step_insn_count += num;
+ 
+ 	bool rc;
  	if (UNLIKELY(this->yield_p ||
  		    (this->current_step_insn_count >= this->step_insn_count)))
            {
*************** namespace sidutil
*** 403,414 ****
              // arithmetic overhead in the inner insn-stepping loops.
  	    this->total_insn_count += this->current_step_insn_count;
  	    this->current_step_insn_count = 0;
! 	    return true;
            }
          else
  	  {
! 	    return false;
  	  }
        }
  
      void
--- 438,459 ----
              // arithmetic overhead in the inner insn-stepping loops.
  	    this->total_insn_count += this->current_step_insn_count;
  	    this->current_step_insn_count = 0;
! 	    rc = true; // stop!
            }
          else
+ 	  rc = false; // don't stop
+ 
+ 	// Sample for gprof?
+ 	if (UNLIKELY (this->gprof_configured_p))
  	  {
! 	    // Count 1 fewer insns if exiting to account for the one counted on entry
! 	    if (rc)
! 	      --num;
! 	    this->sample_gprof (num);
  	  }
+ 	this->gprof_prev_latency = this->total_latency;
+ 
+ 	return rc;
        }
  
      void
*************** namespace sidutil
*** 487,493 ****
  	return static_cast<cpu_trap_disposition>(trap_disposition_pin.sense ());
        }
  
!     void unconfigure_gprof (const string &gprof_spec, sid::host_int_4 num_cycles)
        {
  	if (! gprof_configured_p)
  	  return;
--- 532,538 ----
  	return static_cast<cpu_trap_disposition>(trap_disposition_pin.sense ());
        }
  
!     void unconfigure_gprof ()
        {
  	if (! gprof_configured_p)
  	  return;
*************** namespace sidutil
*** 495,536 ****
  	assert (gprof);
  	sid::pin *p;
  
- #if 0 // can't happen?
- 	// If 'cycles' was specified on the --gprof option, then
- 	// first, sample the address of the branch which caused
- 	// the reconfig for the given number of cycles.
- 	if (num_cycles && last_caller && gprof_spec.size () > 6)
- 	  {
- 	    vector<string> parts = tokenize (gprof_spec.substr (6), ",");
- 	    if (parts.size () > 1)
- 	      {
- 		p = gprof->find_pin ("sample");
- 		if (p)
- 		  {
- 		    std::string save_pc = this->attribute_value ("pc");
- 		    if (! save_pc.empty ())
- 		      {
- 			sid::component::status s = this->set_attribute_value ("pc", make_numeric_attribute (last_caller));
- 			if (s == sid::component::ok)
- 			  for (int i = 0; i < num_cycles; ++i)
- 			    p->driven (1);
- 			this->set_attribute_value ("pc", save_pc);
- 		      }
- 		  }
- 	      }
- 	  }
- #endif
- 
  	// Then disconnect the call graph notification pins.
  	p = gprof->find_pin ("cg-caller");
  	if (p) cg_caller_pin.disconnect (p); 
  	p = gprof->find_pin ("cg-callee");
  	if (p) cg_callee_pin.disconnect (p);
  
  	gprof_configured_p = false;
        }
  
!     void configure_gprof ()
        {
  	if (gprof_configured_p)
  	  return;
--- 540,557 ----
  	assert (gprof);
  	sid::pin *p;
  
  	// Then disconnect the call graph notification pins.
  	p = gprof->find_pin ("cg-caller");
  	if (p) cg_caller_pin.disconnect (p); 
  	p = gprof->find_pin ("cg-callee");
  	if (p) cg_callee_pin.disconnect (p);
+ 	p = gprof->find_pin ("sample");
+ 	if (p) sample_gprof_pin.disconnect (p);
  
  	gprof_configured_p = false;
        }
  
!     void configure_gprof (const string &config)
        {
  	if (gprof_configured_p)
  	  return;
*************** namespace sidutil
*** 551,556 ****
--- 572,593 ----
  	    p->driven (last_callee); 
  	}
  
+ 	p = gprof->find_pin ("sample");
+ 	if (p)
+ 	  sample_gprof_pin.connect (p);
+ 
+ 	vector<string> parts = tokenize (config.substr (6), ",");
+ 	if (parts.size () == 2)
+ 	  {
+ 	    component::status s = parse_attribute (parts[1], gprof_cycles);
+ 	    gprof_counter = gprof_cycles - 1;
+ 	  }
+ 	else
+ 	  {
+ 	    gprof_cycles = 0;
+ 	    gprof_counter = step_insn_count - 1;
+ 	  }
+ 
  	gprof_configured_p = true;
        }
  
*************** namespace sidutil
*** 563,568 ****
--- 600,608 ----
      sid::host_int_4 last_caller;
      sid::host_int_4 last_callee;
      bool gprof_configured_p;
+     sid::host_int_4 gprof_cycles;
+     sid::host_int_4 gprof_counter;
+     sid::host_int_8 gprof_prev_latency;
  
      virtual void configure (const string &config)
        {
*************** namespace sidutil
*** 580,588 ****
  	    gprof->set_attribute_value ("configure!", config);
  	    // Now do our own configuration
  	    if (config.size () > 6)
! 	      configure_gprof ();
  	    else
! 	      unconfigure_gprof (config, num_cycles);
  	    return;
  	  }
  	if (config.size () <= 11)
--- 620,628 ----
  	    gprof->set_attribute_value ("configure!", config);
  	    // Now do our own configuration
  	    if (config.size () > 6)
! 	      configure_gprof (config);
  	    else
! 	      unconfigure_gprof ();
  	    return;
  	  }
  	if (config.size () <= 11)
*************** namespace sidutil
*** 592,598 ****
  	    sid::host_int_4 n;
  	    sid::component::status s = parse_attribute (config.substr (11), n);
  	    if (s == sid::component::ok)
! 	      step_insn_count = n;
  	    return;
  	  }
  	if (config.substr (0, 8) == "verbose=")
--- 632,642 ----
  	    sid::host_int_4 n;
  	    sid::component::status s = parse_attribute (config.substr (11), n);
  	    if (s == sid::component::ok)
! 	      {
! 		step_insn_count = n;
! 		if (gprof_configured_p && gprof_cycles == 0)
! 		  gprof_counter = step_insn_count - 1;
! 	      }
  	    return;
  	  }
  	if (config.substr (0, 8) == "verbose=")
*************** namespace sidutil
*** 838,843 ****
--- 882,888 ----
  public:
      basic_cpu ():
        total_latency (0),
+       current_step_insn_count (0),
        step_limit ("instruction stepping", 1),
        sched_query (this),
        triggerpoint_manager (this),
*************** public:
*** 857,862 ****
--- 902,908 ----
        last_caller (0),
        last_callee (0),
        gprof_configured_p (false),
+       gprof_prev_latency (0),
        core_probe (0),
        main (0)
        {
*************** public:
*** 881,886 ****
--- 927,933 ----
  	add_pin ("cg-return", & this->cg_return_pin);
  	add_pin ("cg-jump", & this->cg_jump_pin);
  	add_pin ("print-insn-summary!", & this->print_insn_summary_pin);
+ 	add_pin ("sample-gprof", &sample_gprof_pin);
  	add_pin ("endian-set!", & this->endian_set_pin);
  	add_pin ("eflags-set!", & this->eflags_set_pin);
  	add_watchable_pin ("trap", & this->trap_type_pin); // output side
Index: sid/main/dynamic/commonCfg.cxx
===================================================================
RCS file: /cvs/src/src/sid/main/dynamic/commonCfg.cxx,v
retrieving revision 1.12
diff -c -p -r1.12 commonCfg.cxx
*** sid/main/dynamic/commonCfg.cxx	11 May 2006 20:27:02 -0000	1.12
--- sid/main/dynamic/commonCfg.cxx	14 Jun 2006 19:28:59 -0000
*************** GprofCfg::GprofCfg (const string name, 
*** 1002,1022 ****
  {
    assert (cpu);
    assert (sess);
-   // Add a subscription to the target scheduler. Even if it's not
-   // used now, it could be used due to dynamic configuration.
-   assert (sess->sim_sched);
-   int slot = sess->sim_sched->add_subscription (this, "sample");
-   sess->sim_sched->set_regular (slot, true);
-   sess->sim_sched->set_time (slot, interval);
-   sess->sim_sched->set_priority (slot, SchedCfg::gprof_priority);
  
!   if (type != simulated_cycles)
!     {
!       // default to instruction_count
!       string ev = sidutil::make_attribute (cpu->get_subscription_number());
!       ev += "-event";
!       conn_pin (sess->sim_sched, ev, this, "sample");
!     }
  
    sess->shutdown_seq->add_output (7, this, "store");
    relate (this, "target-component", cpu);
--- 1002,1009 ----
  {
    assert (cpu);
    assert (sess);
  
!   conn_pin (cpu, "sample-gprof", this, "sample");
  
    sess->shutdown_seq->add_output (7, this, "store");
    relate (this, "target-component", cpu);
*************** GprofCfg::GprofCfg (const string name, 
*** 1025,1031 ****
    set (this, "value-attribute", "pc");
    set (this, "bucket-size", "4"); // bytes-per-bucket
    set (this, "output-file", filename);
-   set (this, "sim-sched-event", sidutil::make_attribute (slot));
  }
  
  // Create a gprof component but don't activate it
--- 1012,1017 ----
*************** GprofCfg::GprofCfg (const string name,
*** 1039,1057 ****
  {
    assert (cpu);
    assert (sess);
-   // Add a subscription to the target scheduler. Even if it's not
-   // used now, it could be used due to dynamic configuration.
-   assert (sess->sim_sched);
-   int slot = sess->sim_sched->add_subscription (this, "sample");
-   sess->sim_sched->set_regular (slot, true);
-   sess->sim_sched->set_time (slot, 1);
-   sess->sim_sched->set_priority (slot, SchedCfg::gprof_priority);
  
    sess->shutdown_seq->add_output (7, this, "store");
    relate (this, "target-component", cpu);
    set (this, "value-attribute", "pc");
    set (this, "bucket-size", "4"); // bytes-per-bucket
-   set (this, "sim-sched-event", sidutil::make_attribute (slot));
  }
  
  
--- 1025,1035 ----
*************** void BoardCfg::write_config (Writer &w)
*** 1330,1336 ****
        if (gprof)
        	{
  	  // gprof's configure! attribute will be set by the cpu.
- 	  Relation (gprof, "sim-sched", sess->sim_sched).write_to (w);
  	  Relation (cpu, "gprof", gprof).write_to (w);
        	}
        if (! gloss->possibly_wrapped ())
--- 1308,1313 ----

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2006-06-14 20:41 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-06-14 20:41 [patch][commit] Performance of --gprof option Dave Brolley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).