public inbox for sid@sourceware.org
* Profiling: --insn-count=1
From: Scott Dattalo @ 2002-07-31 12:53 UTC (permalink / raw)
  To: sid


To profile my simulated code, I've been invoking sid like so:

arm-elf-sid --cpu arm --memory-region=0x2020000,0x2000000  \
--memory-region=0xfffe0000,0x1ffff --gdb=2000 --gprof      \
--trace-counter --insn-count=1 -EL myprog

I build the application with arm-elf-gcc, run it under sid as above, and 
examine the results with arm-elf-gprof.

Now the question I have: is there a way to count CPU cycles instead of CPU 
instructions? If there were a one-to-one relationship between the two, it 
wouldn't be an issue. However, some ARM instructions are not 
single-cycle. I suppose the real question is, "is there a way to 
precisely measure the amount of 'simulated' time it takes for a simulation 
to run?"

FWIW, I'm using a ~6-week-old copy of SID.

Scott

* Re: Profiling: --insn-count=1
From: Frank Ch. Eigler @ 2002-08-01  4:18 UTC (permalink / raw)
  To: Scott Dattalo; +Cc: sid

Hi, Scott -

On Wed, Jul 31, 2002 at 12:53:37PM -0700, Scott Dattalo wrote:
> [...]
> Now the question I have: is there a way to count CPU cycles instead of CPU 
> instructions? If there were a one-to-one relationship between the two, it 
> wouldn't be an issue. However, some ARM instructions are not 
> single-cycle. I suppose the real question is, "is there a way to 
> precisely measure the amount of 'simulated' time it takes for a simulation 
> to run?"

The current batch of CPU models in sid do not attempt to track the number
of cycles taken by any given instruction.  To do so exactly is a crazy
amount of work to do just casually.  (Think of having to model all the
pipeline interlock/bypass features and functional units.)

SID can, on the other hand, model memory latency, so if that's the bulk of
your interest, we can make the profile data collector sensitive to that.


> FWIW, I'm using a ~6-week-old copy of SID.

This hasn't changed recently.


- FChE

* Re: Profiling: --insn-count=1
From: Scott Dattalo @ 2002-08-01  5:03 UTC (permalink / raw)
  To: sid

On Thu, 1 Aug 2002, Frank Ch. Eigler wrote:

> Hi, Scott -
> 
> On Wed, Jul 31, 2002 at 12:53:37PM -0700, Scott Dattalo wrote:
> > [...]
> > Now the question I have: is there a way to count CPU cycles instead of CPU 
> > instructions? If there were a one-to-one relationship between the two, it 
> > wouldn't be an issue. However, some ARM instructions are not 
> > single-cycle. I suppose the real question is, "is there a way to 
> > precisely measure the amount of 'simulated' time it takes for a simulation 
> > to run?"
> 
> The current batch of CPU models in sid do not attempt to track the number
> of cycles taken by any given instruction.  To do so exactly is a crazy
> amount of work to do just casually.  (Think of having to model all the
> pipeline interlock/bypass features and functional units.)

Yeah, I suspected as much... If I had time to look into it, I'd try to add
that feature. I'd approach it by partitioning the time an instruction
takes to execute into two parts: the fixed amount of time the CPU
requires and the (possibly) variable amount that the memory accesses
require.  The fixed portion may be ascertained when the target program is
first loaded. The variable portion may be too, depending on the address
accessed.  If not, the address can be examined at run time to see which
region it falls in (memory can be partitioned into regions that describe
how long an access takes, e.g. single-cycle SRAM versus 7-wait-state
FLASH).

I'm sure this is already obvious to you; as you say, it would require a
fair amount of work to implement.
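A minimal sketch of the two-part cost model described above: a fixed per-instruction core cost plus the wait states of each memory access, looked up by region. The address ranges echo the --memory-region arguments in the original sid command line, but treating them as FLASH and SRAM, the wait-state counts, and the 3-cycle load cost are assumptions for illustration; `Region`, `wait_states` and `insn_cycles` are hypothetical names, not part of SID.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    wait_states: int  # extra cycles per access beyond the nominal one


# Hypothetical memory map. The ranges echo the --memory-region arguments,
# but the FLASH/SRAM roles and the wait-state counts are ASSUMED.
REGIONS = [
    Region(0x02020000, 0x02000000, 7),  # pretend: 7-wait-state FLASH
    Region(0xFFFE0000, 0x0001FFFF, 0),  # pretend: single-cycle SRAM
]


def wait_states(addr: int) -> int:
    """Variable portion: extra cycles for one access to addr."""
    for r in REGIONS:
        if r.base <= addr < r.base + r.size:
            return r.wait_states
    raise ValueError(f"unmapped address {addr:#x}")


def insn_cycles(core_cycles: int, accesses: Sequence[int]) -> int:
    """Fixed core cost plus the wait states of each data access."""
    return core_cycles + sum(wait_states(a) for a in accesses)


# A load with an assumed fixed cost of 3 cycles, from each region:
print(insn_cycles(3, [0x02030000]))  # 10
print(insn_cycles(3, [0xFFFE0100]))  # 3
```

Under these assumptions the same load costs 10 cycles from the slow region and 3 from the fast one, a difference that a plain instruction count does not show.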


> 
> SID can, on the other hand, model memory latency, so if that's the bulk of
> your interest, we can make the profile data collector sensitive to that.

Well, actually, that is half the problem.

In my particular application, I have an ~8 MB program of which only 70 kB
is code and the rest is constant data. For all intents and purposes, you
can think of the remaining ~7.93 MB as a file system. The code will be
shadowed in single-cycle SRAM. The application is an extremely low-power
one. Furthermore, it has timing constraints (i.e. it has to complete its
task in under a second or so). The code was originally written targeting a
desktop PC, i.e. little or no attention was given to code efficiency since
it was not an issue. So I've been rewriting the code to optimize it for an
ARM application.

The development scenario has thus been:

1) make optimizations to the code and test on a Linux box
2) debug and go back to step 1 about 100 times or so.
3) Once convinced that an optimization has been correctly made,
   retarget the makefile for an ARM processor
4) simulate the code (using sid as the simulator engine, of course)
5) analyze the simulation results

So far, I've been satisfied knowing the total number of executed
instructions. Objective results are easily quantified.  However, I'm now
rapidly approaching the point where the optimizations have been completed. 
While I know the approximate number of instructions, I still do not know 
the total number of CPU cycles (and hence the total time).
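To illustrate why the instruction count alone does not settle the timing question, here is a small worked sketch: cycles = instructions × average cycles per instruction (CPI), and run time = cycles / clock frequency. The 30M instruction count, 40 MHz clock, CPI values and 1 s budget are made-up numbers, not measurements from this application.

```python
def estimated_cycles(insn_count: int, avg_cpi: float) -> float:
    """Cycles = instructions x average cycles per instruction (CPI)."""
    return insn_count * avg_cpi


def run_time_s(cycles: float, clock_hz: float) -> float:
    """Simulated wall time = cycles / clock frequency."""
    return cycles / clock_hz


# Hypothetical numbers: 30M instructions on a 40 MHz core, 1 s budget.
insns, clock_hz, budget_s = 30_000_000, 40e6, 1.0
for cpi in (1.0, 1.5):
    t = run_time_s(estimated_cycles(insns, cpi), clock_hz)
    print(f"CPI {cpi}: {t:.3f} s, {'OK' if t <= budget_s else 'over budget'}")
```

With identical instruction counts, the run fits the budget at CPI 1.0 (0.750 s) but misses it at CPI 1.5 (1.125 s).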

> 
> 
> > FWIW, I'm using a ~6-week-old copy of SID.
> 
> This hasn't changed recently.

Good to know. I've seen the automated messages regarding snapshots and
wasn't sure offhand whether there had been any changes.

Scott

* Re: Profiling: --insn-count=1
From: Frank Ch. Eigler @ 2002-08-01  8:32 UTC (permalink / raw)
  To: Scott Dattalo; +Cc: sid

Hi -

On Thu, Aug 01, 2002 at 05:03:55AM -0700, Scott Dattalo wrote:
> [...]
> Yeah, I suspected as much... If I had time to look into it, I'd try to add
> that feature. I'd approach it by partitioning the time an instruction
> takes to execute into two parts: the fixed amount of time the CPU
> requires and the (possibly) variable amount that the memory accesses
> require.

> The fixed portion may be ascertained when the target program is
> first loaded. 

This computation is hard and totally target-dependent.  See, for example,
the amount of work gcc needs to model a CPU pipeline in detail
(especially the more precise DFA models).


> The variable portion may be too, depending on the address
> accessed.  [...]

SID already does this part.  You can configure memory modules, mappers,
caches, and a few other bits as having latency counts associated with
operations.  The CPU accumulates these as penalties, combines them with
a raw instruction count, and tells the target-time scheduler the sum.
So simulated target time already includes the effect of these parameters.
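A toy sketch of the accounting described above, assuming one tick per instruction: latency penalties reported by memory, mapper or cache accesses are summed with the raw instruction count, and the total is what would be handed to the target-time scheduler. `CpuTimeModel` and its methods are hypothetical and do not reflect SID's actual component interfaces.

```python
from typing import Iterable


class CpuTimeModel:
    """Toy target-time accounting: 1 tick per instruction plus latency
    penalties reported by memory, mapper, or cache accesses."""

    def __init__(self) -> None:
        self.insn_count = 0  # raw executed-instruction count
        self.penalty = 0     # accumulated latency penalties, in ticks

    def step(self, latencies: Iterable[int] = ()) -> None:
        """Account for one instruction and any access latencies it incurred."""
        self.insn_count += 1
        self.penalty += sum(latencies)

    def target_time(self) -> int:
        """Sum that would be reported to the target-time scheduler."""
        return self.insn_count + self.penalty


# Hypothetical trace: five instructions, two of which hit a
# 7-cycle-latency memory region.
cpu = CpuTimeModel()
for lat in [(), (7,), (), (), (7,)]:
    cpu.step(lat)
print(cpu.insn_count, cpu.penalty, cpu.target_time())  # 5 14 19
```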


> [...]
> The development scenario has thus been:
> 
> 1) make optimizations to the code and test on a Linux box
> 2) debug and go back to step 1 about 100 times or so.
> 3) Once convinced that an optimization has been correctly made,
>    retarget the makefile for an ARM processor
> 4) simulate the code (using sid as the simulator engine, of course)
> 5) analyze the simulation results

(You may also opt to run the Linux and ARM builds in parallel and
cross-check their results for consistency.)


> So far, I've been satisfied knowing the total number of executed
> instructions. Objective results are easily quantified.  However, I'm now
> rapidly approaching the point where the optimizations have been completed. 
> While I know the approximate number of instructions, I still do not know 
> the total number of CPU cycles (and hence the total time).
> [...]

To get the most precise answer, you'd best use hardware running a
profiling-capable OS.  If accounting for approximate memory latencies
is good enough, then SID can be of help.


- FChE

* Re: Profiling: --insn-count=1
From: Scott Dattalo @ 2002-08-01  9:00 UTC (permalink / raw)
  Cc: sid

On Thu, 1 Aug 2002, Frank Ch. Eigler wrote:

> To get the most precise answer, you'd best use hardware running a
> profiling-capable OS.  If accounting for approximate memory latencies
> is good enough, then SID can be of help.

Agreed. The whole purpose of simulation in my case is to obtain a
relatively accurate assessment of the hardware requirements. Believe me,
hardware is coming... And I should add that SID has been invaluable! Thanks.

Scott
