Hi,

I'd like to commit the attached patch, which improves the performance of the --gprof option by an order of magnitude. One benchmark I tried with

  sid --board= --final-insn-count --gprof=,cycles=1

improved from 6963 seconds to 640 seconds. This is on top of the improvement I obtained with my previous patch.

The culprit was the sw-profile-gprof component's use of an attribute interface to obtain the pc for each sample. It turns out that this lookup alone completely dominates all other aspects of the simulation.

The patch does two things:

1) Make use of a local reference whenever 'this->stats[current_stats]' is used more than once in a method of gprof_component. 'this->stats' is a vector, and this change gave me about a 3% improvement.

2) Use a pin interface to provide the pc for each sample to the gprof component. This was the big win. Rather than have the gprof component obtain and parse the value of a cpu attribute for each sample, the cpu now drives the pc value on two pins (to handle 64-bit pcs) before driving its sample-gprof pin.

Since this represents an interface change, I didn't want to commit it without some review/approval. However, as far as I can tell, the gprof interface is not used by any existing port.

Comments? OK to commit?

Dave
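
For readers without the patch to hand, the two changes described above can be illustrated with a rough standalone sketch. This is not the code from the patch and it uses no sid headers: Stats, Profiler, pc_hi_pin, pc_lo_pin, sample_pin and drive_sample are all hypothetical names, and the "pins" are modelled as plain member functions. It only assumes that the 64-bit pc is split into two 32-bit halves, which is one plausible reading of "two pins".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for one set of profiling statistics.
struct Stats {
    std::map<std::uint64_t, std::uint64_t> hits;  // pc -> number of samples
    std::uint64_t total = 0;                      // total samples recorded
};

// Hypothetical profiler; NOT the real gprof_component.
class Profiler {
public:
    explicit Profiler(std::size_t sets) : stats(sets), current_stats(0) {
        assert(sets > 0);
    }

    // Modelled "pins": the cpu drives both halves of the pc first...
    void pc_hi_pin(std::uint32_t v) { pc_hi = v; }
    void pc_lo_pin(std::uint32_t v) { pc_lo = v; }

    // ...and then drives the sample pin, which records one sample.
    void sample_pin() {
        // Change 1: index the vector once and reuse a local reference.
        Stats& s = stats[current_stats];
        // Change 2: the pc arrives as integers, so nothing is parsed here.
        const std::uint64_t pc =
            (static_cast<std::uint64_t>(pc_hi) << 32) | pc_lo;
        ++s.hits[pc];
        ++s.total;
    }

    const Stats& current() const { return stats[current_stats]; }

private:
    std::vector<Stats> stats;
    std::size_t current_stats;
    std::uint32_t pc_hi = 0;
    std::uint32_t pc_lo = 0;
};

// Hypothetical cpu side: split the pc across the two pins, then sample.
inline void drive_sample(Profiler& p, std::uint64_t pc) {
    p.pc_hi_pin(static_cast<std::uint32_t>(pc >> 32));
    p.pc_lo_pin(static_cast<std::uint32_t>(pc & 0xffffffffu));
    p.sample_pin();
}
```

A caller would construct a Profiler with at least one statistics set and call drive_sample(profiler, pc) once per sample. The point of the design is that the per-sample path does only integer stores and one map update, with no string formatting or parsing.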