From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Stefan Liebler <stli@linux.ibm.com>,
libc-alpha@sourceware.org, Carlos O'Donell <carlos@redhat.com>
Subject: Re: [PATCH v2] Loosen the limits of time/tst-cpuclock1.
Date: Fri, 23 Oct 2020 09:03:19 -0300 [thread overview]
Message-ID: <e049719a-467e-0f55-8ac2-cf093ad86642@linaro.org> (raw)
In-Reply-To: <730e513b-2650-b6a2-98c3-cd99f1e35b6b@linux.ibm.com>
On 23/10/2020 05:59, Stefan Liebler wrote:
> On 10/21/20 2:58 PM, Adhemerval Zanella wrote:
>>
>>
>> On 19/10/2020 11:47, Stefan Liebler via Libc-alpha wrote:
>>> Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f
>>> "Fix time/tst-cpuclock1 intermitent failures" (2020-07-11),
>>> this test fails quite often on s390x/s390 with one/multiple of those:
>>> "before - after" / "nanosleep time" / "dead - after" ourside reasonable range.
>>>
>>> On a zVM/kvm guest the CPUs are shared between multiple guests.
>>> And even on the lpar (kvm host) the CPUs are usually shared between multiple lpars.
>>> The defined CPUs for a lpar/zVM-system could also have lower weights compared
>>> to other lpars which let the steal time further grow.
>>>
>>> Usually I build (-j$(nproc)) and test (PARALLELMFLAGS="-j$(nproc)") glibc multiple
>>> times, e.g. with different GCCs, on various lpars or zVM guests at the same time.
>>> During this time, I've run the test for 13500 times and obvserved the following fails:
>>> ~600x "before - after"
>>> ~60x "nanosleep time"
>>> ~70x "dead - after"
>>>
>>> I've also observed a lot of "before - after" fails on a intel kvm-guest while
>>> building/testing glibc on it.
>>>
>>> The mentioned commit has tighten the limits of valid tv_nsec ranges:
>>> "before - after" (expected: 500000000):
>>> - 100000000 ... 600000000
>>> + 450000000 ... 550000000
>>>
>>> "nanosleep time" (expected: 100000000):
>>> - 100000000 ... 200000000
>>> + 090000000 ... 120000000
>>>
>>> "dead - after" (expected: 100000000):
>>> - ... 200000000
>>> + 090000000 ... 120000000
>>>
>>> The test itself forks a child process which chew_cpu (user- and kernel-space).
>>> The parent process sleeps with nanosleep(0.5s) and measures the child_clock time:
>>> diff = after - before
>>> With much workload on the machine, the child won't make much progess
>>> and it can fall much beyond the minimum limit. Thus this check is now removed!
>>
>> Ok.
>>
>>>
>>> Afterwards the parent process sleeps with clock_nanosleep (child_clock, 0.1s):
>>> diff = afterns - after
>>> The test currently also allows 0.9 * 0.1s. As this would be an error, the
>>> hard limit of 1.0 * 0.1s is now used as minimum border!
>>> Depending on the workload, the maximum limit can exceed the 1.2 * 0.1s.
>>> Therefore the upper limit is set to 2.0 which was also used before the
>>> mentioned commit.
>>
>> This is still tricky and add heuristic values that might fail depending
>> of the architecture/kernel. Wouldn't be better to follow Carlos suggestion
>> and strip down from the test all the time related checks and only keep
>> the functional interfaces of the ABI:
>>
>> * clock_getcpuclockid vs. ENOSYS / ESRCH / EPERM
>> * clock_getcpuclockid vs. valid child
>> * clock_gettime of dead child where clock is no longer valid
>>
> Sure, I can just remove this last time related check. This would be the
> following diff and the title of the new commit would be "Remove timing
> related checks of time/tst-cpuclock1":
Sounds good.
> diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c
> index cc08150654..f40b590111 100644
> --- a/time/tst-cpuclock1.c
> +++ b/time/tst-cpuclock1.c
> @@ -26,7 +26,6 @@
> #include <signal.h>
> #include <stdint.h>
> #include <sys/wait.h>
> -#include <support/timespec.h>
>
> /* This function is intended to rack up both user and system time. */
> static void
> @@ -163,21 +162,6 @@ do_test (void)
> printf ("live PID %d after sleep => %ju.%.9ju\n",
> child, (uintmax_t) afterns.tv_sec,
> (uintmax_t) afterns.tv_nsec);
> -
> - /* As the sleep is based on the child clock, the diff should never
> - be less than the specified sleeptime. Otherwise this is an
> error.
> - The upper bound is quite high in order to get no failure if
> running
> - with high cpu usage and/or on virtualized environments with
> shared
> - CPUs. */
> - struct timespec diff;
> - diff = timespec_sub (support_timespec_normalize (afterns),
> - support_timespec_normalize (before));
> - if (!support_timespec_check_in_range (sleeptime, diff, 1.0, 2.0))
> - {
> - printf ("nanosleep time %ju.%.9ju outside reasonable range\n",
> - (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
> - result = 1;
> - }
> }
> }
>
>
>> And then maybe we can add *another* test that might evaluate timings report
>> as the tests was originally intended?
>>
> Thus we would have time/tst-cpuclock1 which just performs functional checks
> and time/tst-cpuclock1-timings which additionally performs the just
> removed timing checks?
I don't have a strong opinion if this check is indeed required in fact
since it is prone to false positive (even more often with more elaborated
virtual environments). It was more a suggestion that we can move this
functional tests to a different one in a subsequent patch. I am fine
with just your suggestion above.
>
> The timing checks could be enabled by setting a macro:
> tst-cpuclock1-timings.c:
> #define ENABLE_TIMING_CHECKS 1
> #include <tst-cpuclock1.c>
>
> But then time/tst-cpuclock1-timings would fail as often as the current
> time/tst-cpuclock1 test if run on systems with high cpu-load /
> virtualized CPUs. Should the valid limits be adjusted? If yes, which
> limits should be used?
> I think at least the "before - after" check which compares different
> clocks should be removed.
>
> Bye,
> Stefan
>
next prev parent reply other threads:[~2020-10-23 12:03 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-19 14:47 Stefan Liebler
2020-10-21 12:58 ` Adhemerval Zanella
2020-10-23 8:59 ` Stefan Liebler
2020-10-23 12:03 ` Adhemerval Zanella [this message]
2020-10-23 13:09 ` Stefan Liebler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e049719a-467e-0f55-8ac2-cf093ad86642@linaro.org \
--to=adhemerval.zanella@linaro.org \
--cc=carlos@redhat.com \
--cc=libc-alpha@sourceware.org \
--cc=stli@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).