* Profiling: --insn-count=1
From: Scott Dattalo @ 2002-07-31 12:53 UTC
To: sid

To profile my simulated code, I've been invoking sid like so:

    arm-elf-sid --cpu arm --memory-region=0x2020000,0x2000000 \
      --memory-region=0xfffe0000,0x1ffff --gdb=2000 --gprof \
      --trace-counter --insn-count=1 -EL myprog

I then simulate the application (built with arm-elf-gcc) and examine the results with arm-elf-gprof.

Now, the question I have: is there a way to count CPU cycles instead of CPU instructions? If there were a one-to-one relationship between the two, it wouldn't be an issue. However, some instructions on the ARM are not single-cycle. I suppose the real question is, "Is there a way to concisely measure the amount of 'simulated' time it takes for a simulation to run?"

FWIW, I'm using a ~6-week-old copy of SID.

Scott
* Re: Profiling: --insn-count=1
From: Frank Ch. Eigler @ 2002-08-01 4:18 UTC
To: Scott Dattalo
Cc: sid

Hi, Scott -

On Wed, Jul 31, 2002 at 12:53:37PM -0700, Scott Dattalo wrote:
> [...]
> Now, the question I have: is there a way to count CPU cycles instead of
> CPU instructions? If there were a one-to-one relationship between the two,
> it wouldn't be an issue. However, some instructions on the ARM are not
> single-cycle. I suppose the real question is, "Is there a way to
> concisely measure the amount of 'simulated' time it takes for a simulation
> to run?"

The current batch of CPU models in sid does not attempt to track the number of cycles taken by any given instruction. Doing so exactly is a crazy amount of work to take on casually. (Think of having to model all the pipeline interlock/bypass features and functional units.)

SID can, on the other hand, model memory latency, so if that is the bulk of your interest, we can make the profile data collector sensitive to it.

> FWIW, I'm using a ~6-week-old copy of SID.

This hasn't changed recently.

- FChE
* Re: Profiling: --insn-count=1
From: Scott Dattalo @ 2002-08-01 5:03 UTC
To: sid

On Thu, 1 Aug 2002, Frank Ch. Eigler wrote:

> Hi, Scott -
>
> On Wed, Jul 31, 2002 at 12:53:37PM -0700, Scott Dattalo wrote:
> > [...]
>
> The current batch of CPU models in sid does not attempt to track the number
> of cycles taken by any given instruction. Doing so exactly is a crazy
> amount of work to take on casually. (Think of having to model all the
> pipeline interlock/bypass features and functional units.)

Yeah, I suspected as much... If I had time to look into it, I'd try to add that feature. The way I'd approach it is to partition the time it takes an instruction to execute into two parts: the fixed amount of time the CPU requires, and the (possibly) variable amount that the memory accesses require. The fixed portion could be ascertained when the target program is first loaded. The variable portion could be too, depending on the address accessed. If not, the address can be examined at run time to see which region it falls in (memory can be partitioned into regions that describe how long an access takes, e.g. single-cycle SRAM versus 7-wait-state flash). I'm sure this is already obvious to you because, as you say, it would require a fair amount of work to implement.

> SID can, on the other hand, model memory latency, so if that is the bulk of
> your interest, we can make the profile data collector sensitive to it.
Well, actually, that is half the problem. In my particular application, I have an approximately 8 MB program of which only 70 KB is code; the rest is constant data. For all intents and purposes, you can think of the remaining ~7.93 MB as a file system. The code will be shadowed in single-cycle SRAM.

The application is an extremely low-power one. Furthermore, it has time constraints (i.e. it has to complete its task in under a second or so). The code was originally written targeting a desktop PC, so little or no attention was given to code efficiency since it was not an issue. So I've been rewriting the code to optimize it for an ARM application. The development scenario has thus been:

1) make optimizations to the code and test on a Linux box
2) debug and go back to step 1, about 100 times or so
3) once convinced that an optimization has been correctly made, retarget the makefile for an ARM processor
4) simulate the code (using sid as the simulator engine, of course)
5) analyze the simulation results

So far, I've been satisfied knowing the total number of executed instructions. Objective results are easily quantified. However, I'm now rapidly approaching the point where the optimizations are complete. While I know the approximate number of instructions, I still do not know the total number of CPU cycles (and hence the total time).

> > FWIW, I'm using a ~6-week-old copy of SID.
>
> This hasn't changed recently.

Good to know. I've seen the automated messages regarding snapshots and was unsure offhand whether there had been any changes.

Scott
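The two-part split Scott proposes (a fixed per-instruction CPU cost plus a variable memory cost looked up by address region) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not SID code: the region names, base addresses, sizes, wait-state counts, and the shape of the instruction trace are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    """A memory region whose accesses cost a fixed number of wait states."""
    name: str
    base: int
    size: int
    wait_states: int

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


# Hypothetical memory map: 128 KB of single-cycle SRAM holding the shadowed
# code, and 8 MB of 7-wait-state flash holding the constant data. Addresses
# and sizes are illustrative only, not taken from any real board.
REGIONS = [
    Region("sram", base=0x0000_0000, size=0x2_0000, wait_states=0),
    Region("flash", base=0x0100_0000, size=0x80_0000, wait_states=7),
]


def wait_states_for(addr: int, regions: Sequence[Region]) -> int:
    """Variable portion: find the region an address falls in."""
    for region in regions:
        if region.contains(addr):
            return region.wait_states
    raise ValueError(f"unmapped address {addr:#x}")


def estimate_cycles(trace: Iterable[Tuple[int, Sequence[int]]],
                    regions: Sequence[Region]) -> int:
    """Sum fixed CPU cycles plus memory wait states over an instruction trace.

    Each trace entry is (fixed_cycles, data_addresses). Instruction fetches
    are assumed to hit zero-wait-state SRAM and are therefore not listed.
    """
    total = 0
    for fixed_cycles, data_addresses in trace:
        total += fixed_cycles
        total += sum(wait_states_for(a, regions) for a in data_addresses)
    return total


# Toy trace: a register-only op, a load from flash, and a load from SRAM.
trace = [
    (1, []),
    (3, [0x0100_0010]),
    (3, [0x0000_0100]),
]
print(estimate_cycles(trace, REGIONS))  # 1 + (3 + 7) + (3 + 0) = 14
```

The fixed per-instruction costs are simply taken as given here; obtaining them for a real core is the target-dependent part of the problem.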
* Re: Profiling: --insn-count=1
From: Frank Ch. Eigler @ 2002-08-01 8:32 UTC
To: Scott Dattalo
Cc: sid

Hi -

On Thu, Aug 01, 2002 at 05:03:55AM -0700, Scott Dattalo wrote:
> [...]
> Yeah, I suspected as much... If I had time to look into it, I'd try to add
> that feature. The way I'd approach it is to partition the time it takes
> an instruction to execute into two parts: the fixed amount of time the CPU
> requires, and the (possibly) variable amount that the memory accesses
> require.
> The fixed portion could be ascertained when the target program is
> first loaded.

This computation is hard and totally target-dependent. See, for example, the amount of work needed in gcc to model a CPU pipeline in detail (especially the more precise DFA models).

> The variable portion could be too, depending on the address
> accessed. [...]

SID already does this part. You can configure memory modules, mappers, caches, and a few other components as having latency counts associated with operations. The CPU accumulates these as penalties, combines them with the raw instruction count, and tells the target-time scheduler the sum. So simulated target time already includes the effect of these parameters.

> [...]
> The development scenario has thus been:
>
> 1) make optimizations to the code and test on a Linux box
> 2) debug and go back to step 1, about 100 times or so
> 3) once convinced that an optimization has been correctly made,
>    retarget the makefile for an ARM processor
> 4) simulate the code (using sid as the simulator engine, of course)
> 5) analyze the simulation results

(You may also opt to have both the Linux and ARM builds go in parallel, and cross-check the results for consistency.)

> So far, I've been satisfied knowing the total number of executed
> instructions.
> Objective results are easily quantified. However, I'm now
> rapidly approaching the point where the optimizations are complete.
> While I know the approximate number of instructions, I still do not know
> the total number of CPU cycles (and hence the total time).
> [...]

To get the most precise answer, you'd best use hardware running a profiling-capable OS. If accounting for approximate memory latencies is good enough, then SID can help.

- FChE
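The accounting Frank describes (raw instruction count plus latency penalties reported by memory-side components) can be modelled roughly as follows, together with the cycles-to-seconds conversion needed to check Scott's one-second budget. This is a toy Python model, not SID's implementation or API: the class and function names and the 40 MHz clock are assumptions for illustration, and treating one instruction as one tick is exactly the approximation discussed in this thread.

```python
class TargetTimeAccumulator:
    """Toy model (not SID's actual code): every executed instruction counts
    as one tick, and any latency reported by a memory, mapper, or cache for
    that instruction is added on as a penalty."""

    def __init__(self) -> None:
        self.insn_count = 0
        self.latency_penalty = 0

    def retire(self, *latencies: int) -> None:
        """Record one instruction plus the latencies of the accesses it made."""
        self.insn_count += 1
        self.latency_penalty += sum(latencies)

    @property
    def target_time(self) -> int:
        """Raw instruction count plus accumulated latency penalties."""
        return self.insn_count + self.latency_penalty


def to_seconds(ticks: int, clock_hz: float) -> float:
    """Convert target ticks to wall time, treating one tick as one clock cycle."""
    return ticks / clock_hz


acc = TargetTimeAccumulator()
acc.retire()    # register-only instruction
acc.retire(7)   # load from a 7-wait-state flash region
acc.retire(0)   # load from single-cycle SRAM
print(acc.target_time)  # 3 instructions + 7 penalty ticks = 10

# Hypothetical 40 MHz core: does a 30-million-tick run fit a one-second budget?
print(to_seconds(30_000_000, 40e6) <= 1.0)  # True (0.75 s)
```

Because every instruction still counts as a single tick, multi-cycle instructions remain undercounted; only memory-side latency is captured, which matches the "approximate memory latencies" caveat above.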
* Re: Profiling: --insn-count=1
From: Scott Dattalo @ 2002-08-01 9:00 UTC
Cc: sid

On Thu, 1 Aug 2002, Frank Ch. Eigler wrote:

> To get the most precise answer, you'd best use hardware running a
> profiling-capable OS. If accounting for approximate memory latencies
> is good enough, then SID can help.

Agreed. The whole purpose of simulation in my case is to obtain a relatively accurate assessment of the hardware requirements. Believe me, hardware is coming... And I should add that SID has been invaluable! Thanks.

Scott