* Runtime discrepancy clock_gettime + funcition instrumentation
@ 2021-07-29 7:26 Christian Weiss
2021-07-29 7:31 ` Florian Weimer
0 siblings, 1 reply; 5+ messages in thread
From: Christian Weiss @ 2021-07-29 7:26 UTC (permalink / raw)
To: libc-help; +Cc: Felix Uhl
Dear all,
I made the following observation which I cannot explain. Maybe you can help me out?
I create a library which contains __cyg_profile_func_enter/exit hooks for function instrumentation. I link this library to program which calls a dummy function a large number of times and measure the execution time. The resulting runtime differs significantly depending on whether I use the "-static" flag or not.
In detail:
Libtime.c:
#include <time.h>
void __cyg_profile_func_enter (void *func, void *caller) {
struct timespec local_time;
clock_gettime(CLOCK_MONOTONIC, &local_time);
}
void __cyg_profile_func_exit (void *func, void *caller) {
struct timespec local_time;
clock_gettime(CLOCK_MONOTONIC, &local_time);
}
Building a static and a shared library:
gcc -c libtime.c -o libtime.o
ar rcs libtime_gcc.a libtime.o
gcc -c -fPIC libtime.c -o libtime.o
gcc -shared -o libtime_gcc.so libtime.o
The test code test.c:
#include <stdio.h>
#include <stdlib.h>
#define N_ITER_DEFAULT 1000
long long func_1 (long long x) {
return x;
}
int main (int argc, char *argv[]) {
long long n_iter = argc > 1 ? atol (argv[1]) : N_ITER_DEFAULT;
long long sum = 0;
for (long long i = 0; i < n_iter; i++) {
sum += func_1(i);
}
printf ("sum: %lld\n", sum);
}
Building:
gcc -std=gnu99 -finstrument-functions -static test.c -o test_time_static_gcc.x -L. -ltime_gcc
gcc -std=gnu99 -finstrument-functions test.c -o test_time_dyn_gcc.x -L. -ltime_gcc
I made sure that the correct library is loaded to the dynamic binary at runtime:
[cweiss@amd022 overhead]$ ldd test_time_dyn_gcc.x | grep libtime
libtime_gcc.so => /home/cweiss/Vftrace_examples/overhead/libtime_gcc.so (0x00007fad0d42a000)
Measuring the runtime, the static binary takes about five times more time to finish than the dynamic one. I do not yet understand this observation:
[cweiss@amd022 overhead]$ time ./test_time_static_gcc.x 10000000
sum: 49999995000000
real 0m2.594s
user 0m0.396s
sys 0m2.197s
[cweiss@amd022 overhead]$ time ./test_time_dyn_gcc.x 10000000
sum: 49999995000000
real 0m0.518s
user 0m0.515s
sys 0m0.002s
Further remarks:
- Libc version: 2.17
- Gcc version: 8.20.
- OS: CentOS 7.8
- The issue does not appear when libtime is included explicitly and the "-static" flag is not used. This makes sense because then libc is linked dynamically.
- A perf profile shows that the number of failed branch predictions is significantly higher in the static version (20 million (13.34%) vs 4000 (0%)),
Best regards,
Christian Weiss
Dr. Christian Weiss
Benchmark Analyst
Tel: +49 211 5369 126
NEC Deutschland GmbH
Geschäftsführer Yuichi Kojima
Handelsregister Düsseldorf HRB 57941; VAT ID DE129424743
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Runtime discrepancy clock_gettime + funcition instrumentation
2021-07-29 7:26 Runtime discrepancy clock_gettime + funcition instrumentation Christian Weiss
@ 2021-07-29 7:31 ` Florian Weimer
2021-07-29 8:18 ` Christian Weiss
0 siblings, 1 reply; 5+ messages in thread
From: Florian Weimer @ 2021-07-29 7:31 UTC (permalink / raw)
To: Christian Weiss; +Cc: libc-help, Felix Uhl
* Christian Weiss:
> Measuring the runtime, the static binary takes about five times more time to finish than the dynamic one. I do not yet understand this observation:
>
> [cweiss@amd022 overhead]$ time ./test_time_static_gcc.x 10000000
> sum: 49999995000000
> real 0m2.594s
> user 0m0.396s
> sys 0m2.197s
> [cweiss@amd022 overhead]$ time ./test_time_dyn_gcc.x 10000000
> sum: 49999995000000
>
> real 0m0.518s
> user 0m0.515s
> sys 0m0.002s
Have you checked that the statically linked version uses vDSO
acceleration instead of system calls?
Thanks,
Florian
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: Runtime discrepancy clock_gettime + funcition instrumentation
2021-07-29 7:31 ` Florian Weimer
@ 2021-07-29 8:18 ` Christian Weiss
2021-07-29 8:20 ` Florian Weimer
0 siblings, 1 reply; 5+ messages in thread
From: Christian Weiss @ 2021-07-29 8:18 UTC (permalink / raw)
To: Florian Weimer; +Cc: libc-help, Felix Uhl
Dear Florian,
Thank you for the quick reply. As far as I understand, I can check if vdso is used by using strace on the binary. Doing so I get:
```[cweiss@amd022 overhead]$ strace ./test_time_static_gcc.x 1 2>&1 | grep -i clock_gettime
clock_gettime(CLOCK_MONOTONIC, {tv_sec=89045, tv_nsec=95268751}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95813405}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95863144}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95933573}) = 0
[cweiss@amd022 overhead]$ strace ./test_time_dyn_gcc.x 1 2>&1 | grep -i clock_gettime```
Am I right that this implies that vdso is not used in the static binary, but in the dynamic one? Is there a way to activate it in the static case or do I just have to live with that?
Best regards,
Christian
-----Original Message-----
From: Florian Weimer [mailto:fweimer@redhat.com]
Sent: 29 July 2021 09:32
To: Christian Weiss <Christian.Weiss@EMEA.NEC.COM>
Cc: libc-help@sourceware.org; Felix Uhl <Felix.Uhl@EMEA.NEC.COM>
Subject: Re: Runtime discrepancy clock_gettime + funcition instrumentation
* Christian Weiss:
> Measuring the runtime, the static binary takes about five times more time to finish than the dynamic one. I do not yet understand this observation:
>
> [cweiss@amd022 overhead]$ time ./test_time_static_gcc.x 10000000
> sum: 49999995000000
> real 0m2.594s
> user 0m0.396s
> sys 0m2.197s
> [cweiss@amd022 overhead]$ time ./test_time_dyn_gcc.x 10000000
> sum: 49999995000000
>
> real 0m0.518s
> user 0m0.515s
> sys 0m0.002s
Have you checked that the statically linked version uses vDSO acceleration instead of system calls?
Thanks,
Florian
Click https://www.mailcontrol.com/sr/fIQaiVcf9cfGX2PQPOmvUg0Q1FXI7Aabn0rrnrkHFzjvcdFTukB4r3gIRLMubDcYTms2r1hgqRkeq4Nrivo6iw== to report this email as spam.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Runtime discrepancy clock_gettime + funcition instrumentation
2021-07-29 8:18 ` Christian Weiss
@ 2021-07-29 8:20 ` Florian Weimer
2021-07-29 8:29 ` Christian Weiss
0 siblings, 1 reply; 5+ messages in thread
From: Florian Weimer @ 2021-07-29 8:20 UTC (permalink / raw)
To: Christian Weiss; +Cc: libc-help, Felix Uhl
* Christian Weiss:
> Dear Florian,
>
> Thank you for the quick reply. As far as I understand, I can check if vdso is used by using strace on the binary. Doing so I get:
>
> ```[cweiss@amd022 overhead]$ strace ./test_time_static_gcc.x 1 2>&1 | grep -i clock_gettime
> clock_gettime(CLOCK_MONOTONIC, {tv_sec=89045, tv_nsec=95268751}) = 0
> clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95813405}) = 0
> clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95863144}) = 0
> clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95933573}) = 0
> [cweiss@amd022 overhead]$ strace ./test_time_dyn_gcc.x 1 2>&1 | grep -i clock_gettime```
>
> Am I right that this implies that vdso is not used in the static
> binary, but in the dynamic one? Is there a way to activate it in the
> static case or do I just have to live with that?
Later glibc versions support vDSO in static binaries. glibc 2.17 as
used by Red Hat Enterprise Linux 7 apparently does not. (Your
observations match my recollections, but I haven't double-checked them.)
Thanks,
Florian
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: Runtime discrepancy clock_gettime + funcition instrumentation
2021-07-29 8:20 ` Florian Weimer
@ 2021-07-29 8:29 ` Christian Weiss
0 siblings, 0 replies; 5+ messages in thread
From: Christian Weiss @ 2021-07-29 8:29 UTC (permalink / raw)
To: Florian Weimer; +Cc: libc-help, Felix Uhl
Dear Florian,
I checked on a newer Ubuntu system with libc version 2.33. There, I can confirm that vdso is used and that the static binary is in fact faster. Thanks for your helping out.
-----Original Message-----
From: Florian Weimer [mailto:fweimer@redhat.com]
Sent: 29 July 2021 10:21
To: Christian Weiss <Christian.Weiss@EMEA.NEC.COM>
Cc: libc-help@sourceware.org; Felix Uhl <Felix.Uhl@EMEA.NEC.COM>
Subject: Re: Runtime discrepancy clock_gettime + funcition instrumentation
* Christian Weiss:
> Dear Florian,
>
> Thank you for the quick reply. As far as I understand, I can check if vdso is used by using strace on the binary. Doing so I get:
>
> ```[cweiss@amd022 overhead]$ strace ./test_time_static_gcc.x 1 2>&1 |
> grep -i clock_gettime clock_gettime(CLOCK_MONOTONIC, {tv_sec=89045,
> tv_nsec=95268751}) = 0 clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145,
> tv_nsec=95813405}) = 0 clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145,
> tv_nsec=95863144}) = 0 clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145,
> tv_nsec=95933573}) = 0
> [cweiss@amd022 overhead]$ strace ./test_time_dyn_gcc.x 1 2>&1 | grep
> -i clock_gettime```
>
> Am I right that this implies that vdso is not used in the static
> binary, but in the dynamic one? Is there a way to activate it in the
> static case or do I just have to live with that?
Later glibc versions support vDSO in static binaries. glibc 2.17 as used by Red Hat Enterprise Linux 7 apparently does not. (Your observations match my recollections, but I haven't double-checked them.)
Thanks,
Florian
Click https://www.mailcontrol.com/sr/ElFnKpgQyXHGX2PQPOmvUvSoGVaHHSWC2I85Is1gHgcdotMno4NXaggvmnOoSt8fTms2r1hgqRnVvyfHqPLkZg== to report this email as spam.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-07-29 8:30 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-29 7:26 Runtime discrepancy clock_gettime + funcition instrumentation Christian Weiss
2021-07-29 7:31 ` Florian Weimer
2021-07-29 8:18 ` Christian Weiss
2021-07-29 8:20 ` Florian Weimer
2021-07-29 8:29 ` Christian Weiss
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).