Hi,

I've committed the attached patch, which addresses a problem encountered on architectures with parallel insn execution.

CGEN cpus in SID compute the total cycles used as total_insn_count + total_latency. With parallel execution, total_latency may actually decrease: for a parallel insn, total_insn_count increases but the number of cycles used does not, so total_latency has to drop to compensate. The sample_gprof method I committed in my previous patch used total_latency alone to determine how many samples to take. I have now changed it to use total_insn_count + current_step_insn_count + total_latency instead.

The patch also corrects the resetting of gprof_prev_cycle, so that it is reset only when gprof has been turned off dynamically. This allows the initial latency for a cpu to be counted properly.

Dave
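
For illustration only, here is a rough sketch of the cycle computation described above. It is not the code from the patch: the Counters struct, the elapsed_cycles and samples_due helpers, and the cycles_per_sample parameter are all hypothetical names made up for this example. Only the counter names (total_insn_count, current_step_insn_count, total_latency, gprof_prev_cycle) come from the message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the cpu's counters. The field names follow the
// message; the struct itself and the signed 64-bit types are assumptions.
struct Counters {
  std::int64_t total_insn_count = 0;
  std::int64_t current_step_insn_count = 0;
  std::int64_t total_latency = 0;     // may decrease when insns run in parallel
  std::int64_t gprof_prev_cycle = 0;  // cycle count already covered by samples
};

// Cycles elapsed so far: the sum of all three counters, rather than
// total_latency on its own.
inline std::int64_t elapsed_cycles(const Counters& c) {
  return c.total_insn_count + c.current_step_insn_count + c.total_latency;
}

// Number of samples owed since the last call. cycles_per_sample is an
// assumed sampling interval, not something taken from the patch.
// gprof_prev_cycle is only advanced here, never reset, so latency that
// accrued before the first sample is still counted.
inline std::int64_t samples_due(Counters& c, std::int64_t cycles_per_sample) {
  assert(cycles_per_sample > 0);
  const std::int64_t pending = elapsed_cycles(c) - c.gprof_prev_cycle;
  if (pending < cycles_per_sample) {
    return 0;
  }
  const std::int64_t n = pending / cycles_per_sample;
  c.gprof_prev_cycle += n * cycles_per_sample;
  return n;
}
```

In this sketch, a parallel insn that bumps total_insn_count by one while total_latency drops by one leaves elapsed_cycles unchanged, so no extra sample is taken for it.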