public inbox for libc-help@sourceware.org
 help / color / mirror / Atom feed
* Runtime discrepancy clock_gettime + funcition instrumentation
@ 2021-07-29  7:26 Christian Weiss
  2021-07-29  7:31 ` Florian Weimer
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Weiss @ 2021-07-29  7:26 UTC (permalink / raw)
  To: libc-help; +Cc: Felix Uhl

Dear all,

I made the following observation which I cannot explain. Maybe you can help me out?

I create a library which contains __cyg_profile_func_enter/exit hooks for function instrumentation.  I link this library to program which calls a dummy function a large number of times and measure the execution time. The resulting runtime differs significantly depending on whether I use the "-static" flag or not.

In detail:

Libtime.c:

#include <time.h>
void __cyg_profile_func_enter (void *func, void *caller) {
   struct timespec local_time;
   clock_gettime(CLOCK_MONOTONIC, &local_time);
}
void __cyg_profile_func_exit (void *func, void *caller) {
   struct timespec local_time;
   clock_gettime(CLOCK_MONOTONIC, &local_time);
}

Building a static and a shared library:

gcc -c libtime.c -o libtime.o
ar rcs libtime_gcc.a libtime.o
gcc -c -fPIC libtime.c -o libtime.o
gcc -shared -o libtime_gcc.so libtime.o
The test code test.c:

#include <stdio.h>
#include <stdlib.h>
#define N_ITER_DEFAULT 1000
long long func_1 (long long x) {
  return x;
}
int main (int argc, char *argv[]) {
  long long n_iter = argc > 1 ? atol (argv[1]) : N_ITER_DEFAULT;
  long long sum = 0;
  for (long long  i = 0; i < n_iter; i++) {
     sum += func_1(i);
  }
  printf ("sum: %lld\n", sum);
}

Building:

gcc -std=gnu99 -finstrument-functions -static test.c -o test_time_static_gcc.x -L. -ltime_gcc
gcc -std=gnu99 -finstrument-functions test.c -o test_time_dyn_gcc.x -L. -ltime_gcc

I made sure that the correct library is loaded to the dynamic binary at runtime:

[cweiss@amd022 overhead]$ ldd test_time_dyn_gcc.x  | grep libtime
      libtime_gcc.so => /home/cweiss/Vftrace_examples/overhead/libtime_gcc.so (0x00007fad0d42a000)
Measuring the runtime, the static binary takes about five times more time to finish than the dynamic one. I do not yet understand this observation:

[cweiss@amd022 overhead]$ time ./test_time_static_gcc.x 10000000
sum: 49999995000000
real         0m2.594s
user        0m0.396s
sys           0m2.197s
[cweiss@amd022 overhead]$ time ./test_time_dyn_gcc.x 10000000
sum: 49999995000000

real         0m0.518s
user        0m0.515s
sys           0m0.002s
Further remarks:


-          Libc version: 2.17

-          Gcc version: 8.20.

-          OS: CentOS  7.8

-          The issue does not appear when libtime is included explicitly and the "-static" flag is not used. This makes sense because then libc is linked dynamically.

-          A perf profile shows that the number of failed branch predictions is significantly higher in the static version (20 million (13.34%) vs 4000 (0%)),

Best regards,
Christian Weiss

Dr. Christian Weiss
Benchmark Analyst
Tel: +49 211 5369 126

NEC Deutschland GmbH
Geschäftsführer  Yuichi Kojima
Handelsregister Düsseldorf HRB 57941; VAT ID DE129424743


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Runtime discrepancy clock_gettime + funcition instrumentation
  2021-07-29  7:26 Runtime discrepancy clock_gettime + funcition instrumentation Christian Weiss
@ 2021-07-29  7:31 ` Florian Weimer
  2021-07-29  8:18   ` Christian Weiss
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Weimer @ 2021-07-29  7:31 UTC (permalink / raw)
  To: Christian Weiss; +Cc: libc-help, Felix Uhl

* Christian Weiss:

> Measuring the runtime, the static binary takes about five times more time to finish than the dynamic one. I do not yet understand this observation:
>
> [cweiss@amd022 overhead]$ time ./test_time_static_gcc.x 10000000
> sum: 49999995000000
> real         0m2.594s
> user        0m0.396s
> sys           0m2.197s
> [cweiss@amd022 overhead]$ time ./test_time_dyn_gcc.x 10000000
> sum: 49999995000000
>
> real         0m0.518s
> user        0m0.515s
> sys           0m0.002s

Have you checked that the statically linked version uses vDSO
acceleration instead of system calls?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: Runtime discrepancy clock_gettime + funcition instrumentation
  2021-07-29  7:31 ` Florian Weimer
@ 2021-07-29  8:18   ` Christian Weiss
  2021-07-29  8:20     ` Florian Weimer
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Weiss @ 2021-07-29  8:18 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-help, Felix Uhl

Dear Florian,

Thank you for the quick reply. As far as I understand, I can check if vdso is used by using strace on the binary. Doing so I get:

```[cweiss@amd022 overhead]$ strace ./test_time_static_gcc.x 1 2>&1 | grep -i clock_gettime
clock_gettime(CLOCK_MONOTONIC, {tv_sec=89045, tv_nsec=95268751}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95813405}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95863144}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95933573}) = 0
[cweiss@amd022 overhead]$ strace ./test_time_dyn_gcc.x 1 2>&1 | grep -i clock_gettime```

Am I right that this implies that vdso is not used in the static binary, but in the dynamic one? Is there a way to activate it in the static case or do I just have to live with that?

Best regards,
Christian

-----Original Message-----
From: Florian Weimer [mailto:fweimer@redhat.com] 
Sent: 29 July 2021 09:32
To: Christian Weiss <Christian.Weiss@EMEA.NEC.COM>
Cc: libc-help@sourceware.org; Felix Uhl <Felix.Uhl@EMEA.NEC.COM>
Subject: Re: Runtime discrepancy clock_gettime + funcition instrumentation

* Christian Weiss:

> Measuring the runtime, the static binary takes about five times more time to finish than the dynamic one. I do not yet understand this observation:
>
> [cweiss@amd022 overhead]$ time ./test_time_static_gcc.x 10000000
> sum: 49999995000000
> real         0m2.594s
> user        0m0.396s
> sys           0m2.197s
> [cweiss@amd022 overhead]$ time ./test_time_dyn_gcc.x 10000000
> sum: 49999995000000
>
> real         0m0.518s
> user        0m0.515s
> sys           0m0.002s

Have you checked that the statically linked version uses vDSO acceleration instead of system calls?

Thanks,
Florian



 Click https://www.mailcontrol.com/sr/fIQaiVcf9cfGX2PQPOmvUg0Q1FXI7Aabn0rrnrkHFzjvcdFTukB4r3gIRLMubDcYTms2r1hgqRkeq4Nrivo6iw==  to report this email as spam.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Runtime discrepancy clock_gettime + funcition instrumentation
  2021-07-29  8:18   ` Christian Weiss
@ 2021-07-29  8:20     ` Florian Weimer
  2021-07-29  8:29       ` Christian Weiss
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Weimer @ 2021-07-29  8:20 UTC (permalink / raw)
  To: Christian Weiss; +Cc: libc-help, Felix Uhl

* Christian Weiss:

> Dear Florian,
>
> Thank you for the quick reply. As far as I understand, I can check if vdso is used by using strace on the binary. Doing so I get:
>
> ```[cweiss@amd022 overhead]$ strace ./test_time_static_gcc.x 1 2>&1 | grep -i clock_gettime
> clock_gettime(CLOCK_MONOTONIC, {tv_sec=89045, tv_nsec=95268751}) = 0
> clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95813405}) = 0
> clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95863144}) = 0
> clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, tv_nsec=95933573}) = 0
> [cweiss@amd022 overhead]$ strace ./test_time_dyn_gcc.x 1 2>&1 | grep -i clock_gettime```
>
> Am I right that this implies that vdso is not used in the static
> binary, but in the dynamic one? Is there a way to activate it in the
> static case or do I just have to live with that?

Later glibc versions support vDSO in static binaries.  glibc 2.17 as
used by Red Hat Enterprise Linux 7 apparently does not.  (Your
observations match my recollections, but I haven't double-checked them.)

Thanks,
Florian


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: Runtime discrepancy clock_gettime + funcition instrumentation
  2021-07-29  8:20     ` Florian Weimer
@ 2021-07-29  8:29       ` Christian Weiss
  0 siblings, 0 replies; 5+ messages in thread
From: Christian Weiss @ 2021-07-29  8:29 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-help, Felix Uhl

Dear Florian,

I checked on a newer Ubuntu system with libc version 2.33. There, I can confirm that vdso is used and that the static binary is in fact faster. Thanks for your helping out. 

-----Original Message-----
From: Florian Weimer [mailto:fweimer@redhat.com] 
Sent: 29 July 2021 10:21
To: Christian Weiss <Christian.Weiss@EMEA.NEC.COM>
Cc: libc-help@sourceware.org; Felix Uhl <Felix.Uhl@EMEA.NEC.COM>
Subject: Re: Runtime discrepancy clock_gettime + funcition instrumentation

* Christian Weiss:

> Dear Florian,
>
> Thank you for the quick reply. As far as I understand, I can check if vdso is used by using strace on the binary. Doing so I get:
>
> ```[cweiss@amd022 overhead]$ strace ./test_time_static_gcc.x 1 2>&1 | 
> grep -i clock_gettime clock_gettime(CLOCK_MONOTONIC, {tv_sec=89045, 
> tv_nsec=95268751}) = 0 clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, 
> tv_nsec=95813405}) = 0 clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, 
> tv_nsec=95863144}) = 0 clock_gettime(CLOCK_MONOTONIC, {tv_sec=89145, 
> tv_nsec=95933573}) = 0
> [cweiss@amd022 overhead]$ strace ./test_time_dyn_gcc.x 1 2>&1 | grep 
> -i clock_gettime```
>
> Am I right that this implies that vdso is not used in the static 
> binary, but in the dynamic one? Is there a way to activate it in the 
> static case or do I just have to live with that?

Later glibc versions support vDSO in static binaries.  glibc 2.17 as used by Red Hat Enterprise Linux 7 apparently does not.  (Your observations match my recollections, but I haven't double-checked them.)

Thanks,
Florian



 Click https://www.mailcontrol.com/sr/ElFnKpgQyXHGX2PQPOmvUvSoGVaHHSWC2I85Is1gHgcdotMno4NXaggvmnOoSt8fTms2r1hgqRnVvyfHqPLkZg==  to report this email as spam.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-07-29  8:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-29  7:26 Runtime discrepancy clock_gettime + funcition instrumentation Christian Weiss
2021-07-29  7:31 ` Florian Weimer
2021-07-29  8:18   ` Christian Weiss
2021-07-29  8:20     ` Florian Weimer
2021-07-29  8:29       ` Christian Weiss

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).