Yes, on x86-64.

I compared the disassembly before and after commit d275970ab using
objdump. After d275970ab, the direct call to __drand48_iterate spans a
longer distance, so I reverted the commit and found that performance
recovers a little.

Thanks,
abush

On Mon, Apr 1, 2024 at 9:12 PM Florian Weimer wrote:

> * abush wang:
>
> > This is test:
> > ```
> > uint64_t getnsecs() {
> >     uint32_t lo, hi;
> >     __asm__ __volatile__ (
> >         "rdtsc" : "=a"(lo), "=d"(hi)
> >     );
> >     return ((uint64_t)hi << 32) | lo;
> > }
> >
> > int main() {
> >     const int num_iterations = 1;
> >     uint64_t start, end, total_time = 0;
> >
> >     start = getnsecs();
> >     for (int i = 0; i < num_iterations; i++) {
> >         (void) lrand48();
> >     }
> >     end = getnsecs();
> >     total_time += (end - start);
> >
> >     printf("Average time for lrand48: %lu cycles\n", total_time / num_iterations);
> >     return 0;
> > }
> > ```
> > before:
> > Average time for lrand48: 21418 cycles
> >
> > after:
> > Average time for lrand48: 9892 cycles
>
> Do you see this on x86-64? So this isn't a displacement range issue?
>
> It could be that this is a random performance change due to code
> alignment, and not actually caused by the direct call distance.
>
> Thanks,
> Florian
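
For reference, here is a rough sketch of a steadier measurement. The quoted test times a single lrand48() call, so its result likely includes one-time costs such as lazy symbol binding and cold caches. That makes it hard to tell a call-distance effect apart from alignment noise. The sketch warms up first, times many calls, and keeps the best of several rounds. The names now_ns and bench_lrand48_ns are hypothetical, and it uses clock_gettime() instead of rdtsc, which is a substitution on my part and not what the original test does.

```c
#define _XOPEN_SOURCE 700   /* exposes lrand48() and clock_gettime() */
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic time in nanoseconds (hypothetical helper). */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: average nanoseconds per lrand48() call.
   Runs `rounds` timed rounds of `iterations` calls each and reports
   the fastest round. Returns -1.0 on invalid arguments. */
double bench_lrand48_ns(long iterations, int rounds)
{
    volatile long sink = 0;     /* keeps the result observably used */
    uint64_t best = UINT64_MAX;

    if (iterations <= 0 || rounds <= 0)
        return -1.0;

    /* Warm-up: pay one-time costs outside the timed region. */
    for (int i = 0; i < 1000; i++)
        sink = lrand48();

    for (int r = 0; r < rounds; r++) {
        uint64_t start = now_ns();
        for (long i = 0; i < iterations; i++)
            sink = lrand48();
        uint64_t elapsed = now_ns() - start;
        if (elapsed < best)
            best = elapsed;
    }
    (void)sink;
    return (double)best / (double)iterations;
}
```

Something like bench_lrand48_ns(10000000, 5) could then be run against builds with and without d275970ab. If the gap persists across many iterations and several runs, that would point away from a one-off startup cost, though it still would not separate alignment from call distance on its own.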