public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Stefan Liebler <stli@linux.ibm.com>,
	libc-alpha@sourceware.org, Carlos O'Donell <carlos@redhat.com>
Subject: Re: [PATCH v2] Loosen the limits of time/tst-cpuclock1.
Date: Fri, 23 Oct 2020 09:03:19 -0300	[thread overview]
Message-ID: <e049719a-467e-0f55-8ac2-cf093ad86642@linaro.org> (raw)
In-Reply-To: <730e513b-2650-b6a2-98c3-cd99f1e35b6b@linux.ibm.com>



On 23/10/2020 05:59, Stefan Liebler wrote:
> On 10/21/20 2:58 PM, Adhemerval Zanella wrote:
>>
>>
>> On 19/10/2020 11:47, Stefan Liebler via Libc-alpha wrote:
>>> Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f
>>> "Fix time/tst-cpuclock1 intermitent failures" (2020-07-11),
>>> this test fails quite often on s390x/s390 with one/multiple of those:
>>> "before - after" / "nanosleep time" / "dead - after" ourside reasonable range.
>>>
>>> On a zVM/kvm guest the CPUs are shared between multiple guests.
>>> And even on the lpar (kvm host) the CPUs are usually shared between multiple lpars.
>>> The defined CPUs for a lpar/zVM-system could also have lower weights compared
>>> to other lpars which let the steal time further grow.
>>>
>>> Usually I build (-j$(nproc)) and test (PARALLELMFLAGS="-j$(nproc)") glibc multiple
>>> times, e.g. with different GCCs, on various lpars or zVM guests at the same time.
>>> During this time, I've run the test for 13500 times and obvserved the following fails:
>>> ~600x "before - after"
>>> ~60x "nanosleep time"
>>> ~70x "dead - after"
>>>
>>> I've also observed a lot of "before - after" fails on a intel kvm-guest while
>>> building/testing glibc on it.
>>>
>>> The mentioned commit has tighten the limits of valid tv_nsec ranges:
>>> "before - after" (expected: 500000000):
>>> - 100000000 ... 600000000
>>> + 450000000 ... 550000000
>>>
>>> "nanosleep time" (expected: 100000000):
>>> - 100000000 ... 200000000
>>> + 090000000 ... 120000000
>>>
>>> "dead - after" (expected: 100000000):
>>> -           ... 200000000
>>> + 090000000 ... 120000000
>>>
>>> The test itself forks a child process which chew_cpu (user- and kernel-space).
>>> The parent process sleeps with nanosleep(0.5s) and measures the child_clock time:
>>> diff = after - before
>>> With much workload on the machine, the child won't make much progess
>>> and it can fall much beyond the minimum limit. Thus this check is now removed!
>>
>> Ok.
>>
>>>
>>> Afterwards the parent process sleeps with clock_nanosleep (child_clock, 0.1s):
>>> diff = afterns - after
>>> The test currently also allows 0.9 * 0.1s.  As this would be an error, the
>>> hard limit of 1.0 * 0.1s is now used as minimum border!
>>> Depending on the workload, the maximum limit can exceed the 1.2 * 0.1s.
>>> Therefore the upper limit is set to 2.0 which was also used before the
>>> mentioned commit.
>>
>> This is still tricky and add heuristic values that might fail depending
>> of the architecture/kernel.  Wouldn't be better to follow Carlos suggestion
>> and strip down from the test all the time related checks and only keep 
>> the functional interfaces of the ABI:
>>
>>   * clock_getcpuclockid vs. ENOSYS / ESRCH / EPERM
>>   * clock_getcpuclockid vs. valid child
>>   * clock_gettime of dead child where clock is no longer valid
>>
> Sure, I can just remove this last time related check. This would be the
> following diff and the title of the new commit would be "Remove timing
> related checks of time/tst-cpuclock1":

Sounds good.

> diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c
> index cc08150654..f40b590111 100644
> --- a/time/tst-cpuclock1.c
> +++ b/time/tst-cpuclock1.c
> @@ -26,7 +26,6 @@
>  #include <signal.h>
>  #include <stdint.h>
>  #include <sys/wait.h>
> -#include <support/timespec.h>
> 
>  /* This function is intended to rack up both user and system time.  */
>  static void
> @@ -163,21 +162,6 @@ do_test (void)
>           printf ("live PID %d after sleep => %ju.%.9ju\n",
>                   child, (uintmax_t) afterns.tv_sec,
>                   (uintmax_t) afterns.tv_nsec);
> -
> -         /* As the sleep is based on the child clock, the diff should never
> -            be less than the specified sleeptime.  Otherwise this is an
> error.
> -            The upper bound is quite high in order to get no failure if
> running
> -            with high cpu usage and/or on virtualized environments with
> shared
> -            CPUs.  */
> -         struct timespec diff;
> -         diff = timespec_sub (support_timespec_normalize (afterns),
> -                              support_timespec_normalize (before));
> -         if (!support_timespec_check_in_range (sleeptime, diff, 1.0, 2.0))
> -           {
> -             printf ("nanosleep time %ju.%.9ju outside reasonable range\n",
> -                     (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
> -             result = 1;
> -           }
>         }
>      }
> 
> 
>> And then maybe we can add *another* test that might evaluate timings report
>> as the tests was originally intended?
>>
> Thus we would have time/tst-cpuclock1 which just performs functional checks
> and time/tst-cpuclock1-timings which additionally performs the just
> removed timing checks?

I don't have a strong opinion if this check is indeed required in fact 
since it is prone to false positive (even more often with more elaborated
virtual environments).  It was more a suggestion that we can move this 
functional tests to a different one in a subsequent patch.  I am fine
with just your suggestion above.

> 
> The timing checks could be enabled by setting a macro:
> tst-cpuclock1-timings.c:
> #define ENABLE_TIMING_CHECKS 1
> #include <tst-cpuclock1.c>
> 
> But then time/tst-cpuclock1-timings would fail as often as the current
> time/tst-cpuclock1 test if run on systems with high cpu-load /
> virtualized CPUs. Should the valid limits be adjusted? If yes, which
> limits should be used?
> I think at least the "before - after" check which compares different
> clocks should be removed.
> 
> Bye,
> Stefan
> 

  reply	other threads:[~2020-10-23 12:03 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-19 14:47 Stefan Liebler
2020-10-21 12:58 ` Adhemerval Zanella
2020-10-23  8:59   ` Stefan Liebler
2020-10-23 12:03     ` Adhemerval Zanella [this message]
2020-10-23 13:09       ` Stefan Liebler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e049719a-467e-0f55-8ac2-cf093ad86642@linaro.org \
    --to=adhemerval.zanella@linaro.org \
    --cc=carlos@redhat.com \
    --cc=libc-alpha@sourceware.org \
    --cc=stli@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).