From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 120299 invoked by alias); 19 Feb 2020 18:51:49 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 120282 invoked by uid 89); 19 Feb 2020 18:51:48 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-22.1 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mail-qk1-f195.google.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=to:references:from:autocrypt:subject:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=cSV+GnYA7PNVUB7y50xW88kwfv4dNO8e3o+MotqF33o=; b=gcrX+e31kEG+vLHfwbYdVnvUP8aT+NQCSDYlbhtYx3yqxCe88Zi3m1PcR1Y8trW/BY xtPXc91Y34LfSbPcq3Vq+ieBrWX3/Fs+2DbYdHew/Zot2KH8X9TfZumW92XORRutYNRY Xe7xlnbZulb2sOLqK5bYWLJX5K4dkUzhdOBL+7MtBlSDqYMLrwAHDu68r749X0Mb28SU 5jeSTrDr36SyMW3pZhZGjmgFCWpUvWWYtoEYgcAXRISQofxOpnvDJ1WBzrPnpfmQXCbo P3kp5DvA4HPYgn8fgznzzsJcXffdj/qhZMi5opECgmpN4UQMM9Q3t1R3YXeFeqiUoycH GJIQ== Return-Path: To: "Lucas A. M. Magalhaes" , libc-alpha@sourceware.org References: <20200206144819.19046-1-lamm@linux.ibm.com> <158213053262.30076.4354318606424637635@dhcp-9-18-235-152.br.ibm.com> From: Adhemerval Zanella Subject: Re: [PATCH v2] Fix time/tst-cpuclock1 intermitent failures Message-ID: <37861457-ff4d-076c-9d9a-270266108aef@linaro.org> Date: Wed, 19 Feb 2020 18:51:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1 MIME-Version: 1.0 In-Reply-To: <158213053262.30076.4354318606424637635@dhcp-9-18-235-152.br.ibm.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-SW-Source: 2020-02/txt/msg00849.txt.bz2 On 19/02/2020 13:42, Lucas A. M. Magalhaes wrote: > Hi, > Thanks for the review :). > > Quoting Adhemerval Zanella (2020-02-18 09:44:08) >> >> >> On 06/02/2020 11:48, Lucas A. M. Magalhaes wrote: >>> This test fails intermittently in systems with heavy load as >>> CLOCK_PROCESS_CPUTIME_ID is subject to scheduler pressure. Thus the >>> test boundaries where relaxed to keep it from fail on this systems. >>> >>> A refactor of the spent time checking was made with some support >>> functions. With the advantage to represent time jitter in percent >>> of the target. >>> >>> The values used by the test boundaries are all empirical. >>> >>> --- >>> >>> Hi, >>> >>> Please Carlos see if this is what you asked. >>> >>> I spent sometime gathering the spent times to find the values >>> for the boundaries. From this I selected values that will fail less >>> than 1% of the time, in the tested machines. >>> >>> Also as I found the spent time deviation completely asymmetrical >>> I choose to separate the values in upper and lower bounds. >>> >>> changes from V2: >>> - Add support functions >> >> None of the files follow the code and style guideline [1]. >> >> [1] https://sourceware.org/glibc/wiki/Style_and_Conventions >> > > OK, Sorry about that. > >>> >>> support/Makefile | 1 + >>> support/cpuclock.c | 51 ++++++++++++++++++++++++++++++++++++++++++++ >>> support/cpuclock.h | 30 ++++++++++++++++++++++++++ >>> time/tst-cpuclock1.c | 43 ++++++++++--------------------------- >>> 4 files changed, 93 insertions(+), 32 deletions(-) >>> create mode 100644 support/cpuclock.c >>> create mode 100644 support/cpuclock.h >>> >>> diff --git a/support/Makefile b/support/Makefile >>> index 3325feb790..b308fe0856 100644 >>> --- a/support/Makefile >>> +++ b/support/Makefile >>> @@ -31,6 +31,7 @@ libsupport-routines = \ >>> check_dns_packet \ >>> check_hostent \ >>> check_netent \ >>> + cpuclock \ >>> delayed_exit \ >>> ignore_stderr \ >>> next_to_fault \ >>> diff --git a/support/cpuclock.c b/support/cpuclock.c >>> new file mode 100644 >>> index 0000000000..f40c7e8d4a >>> --- /dev/null >>> +++ b/support/cpuclock.c >>> @@ -0,0 +1,51 @@ >>> +/* Support functions for cpuclock tests. >>> + Copyright (C) 2018-2020 Free Software Foundation, Inc. >>> + This file is part of the GNU C Library. >>> + >>> + The GNU C Library is free software; you can redistribute it and/or >>> + modify it under the terms of the GNU Lesser General Public >>> + License as published by the Free Software Foundation; either >>> + version 2.1 of the License, or (at your option) any later version. >>> + >>> + The GNU C Library is distributed in the hope that it will be useful, >>> + but WITHOUT ANY WARRANTY; without even the implied warranty of >>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >>> + Lesser General Public License for more details. >>> + >>> + You should have received a copy of the GNU Lesser General Public >>> + License along with the GNU C Library; if not, see >>> + . */ >>> + >>> +#include "cpuclock.h" >>> +#include >>> + >>> +#define T_1s 1000000000.0 >>> + >>> +struct timespec time_normalize(struct timespec t) { >>> + int diff; >>> + diff = (t.tv_nsec / T_1s); >>> + t.tv_sec += diff; >>> + t.tv_nsec += -(diff * T_1s); >>> + return t; >>> +} >>> + >>> +struct timespec time_add(struct timespec a, struct timespec b) { >>> + struct timespec s = {.tv_sec = a.tv_sec + b.tv_sec, >>> + .tv_nsec = a.tv_nsec + b.tv_nsec}; >>> + return time_normalize(s); >>> +} >>> + >>> +struct timespec time_sub(struct timespec a, struct timespec b) { >>> + struct timespec d = {.tv_sec = a.tv_sec - b.tv_sec, >>> + .tv_nsec = a.tv_nsec - b.tv_nsec}; >>> + return time_normalize(d); >>> +} >> >> I don't using float operation is the correct solution to handle both >> invalid inputs and overflow results. We already have a working solution >> (support/timespec-add.c and support/timespec-sub.c), why do you need >> to add such functions? >> > > Well I actually didn't notice them, as I was suggested to implement it. > Thanks for pointing it out. I will remove them and use the existing > solution. > >>> + >>> +int percent_diff_check(struct timespec base, struct timespec dev, >>> + float upper_bound, float lower_bound) { >>> + double timea = base.tv_sec + base.tv_nsec / T_1s; >>> + double timeb = dev.tv_sec + dev.tv_nsec / T_1s; >>> + double diff = (timea - timeb) / (0.5 * (timea + timeb)); >>> + return (diff >= -upper_bound && diff <= lower_bound ); >>> +} >>> + >> >> Please add some description of what this function do. And I think >> it would be better to normalize to nanoseconds instead of seconds >> to avoid floating-point cancellation due the range difference. > > I'm using double here because this "(timea + timeb)" was sometimes > overflowing. > > IMHO this will not be used to check, or compute, really small jitter, > so floating-point cancellation should not be a problem. But if you see > that it's worth I could to change to long and use: > > double diff = (timea - timeb) / max(timea, timeb); > > instead of > > double diff = (timea - timeb) / (0.5 * (timea + timeb)); > >> Something like: >> >> /* Return true if the DIFF time is within the ratio > > IMHO it makes more sense to think in it as a relative difference rather > than in ration. As "diff time is x% bigger than base" instead of "diff > time is x% of the base". So I want to keep this approach. But the idea of test refactoring is to check if value is in a bounded range of another value, i.e, to check if the CPU time spent in the created process is within an expected value. I personally think is quite confusing the function call: percent_diff_check(sleeptime, diff, .6, .34) Which is simplified to: -0.6 <= (sleeptime - diff)/(0.5*(sleeptime + diff) <= 0.34 And the result will be negative iff diff is larger than sleeptime. And this might happen iff the calling process is using more than one CPU (so total number of CPU time will be higher than total sleep time which assumes only one CPU). Also, for such scenarios it really awkward to define the expected threshold. For instance, assuming child process has two threads chewing up CPU and the parent process sleeping for 0.5s. The result CPU for child would be close to 1s, and with your formula the resulting diff will be ~ -0.66. This is confusing to set the threshold on such scenario. With my suggestion, for diff time larger than base one you set a lower positive threshold; while for diff time lower than base you set a large higher positive threshold. > >> [upper_bound, lower_bound] of DIFF time, or false otherwise. >> >> For instance: >> >> struct timespec diff = { 3, 94956 }; >> struct timespec base = { 4, 0 }; >> >> The call checks if the ratio of diff/base is within the >> bounds of (0.65, 1.0) (i.e, if 'diff' is at least 65% of >> the 'base' value). >> >> support_timespec_check_in_range (base, diff, 1.0, 0.65); */ >> bool support_timespec_check_ratio (struct timespec base, >> struct timespec diff, >> double upper_bound, >> double lower_bound) >> { >> assert (upper_bound >= lower_bound); >> uint64_t base_norm = base.tv_sec * TIMESPEC_HZ + base.tv_nsec; >> uint64_t diff_norm = diff.tv_sec * TIMESPEC_HZ + diff.tv_nsec; >> double ratio = (double) base_norm / (double) diff_norm; >> return ratio >= lower_bound && ratio <= upper_bound; >> } In fact I think base_norm and diff_norm should be calculated as double here. >> >> >> I think it should follow the libsupport name convention of prepending >> 'support_' on file name. >> >> The same name convention should be used on the exported interfaces >> (support_*). >> >> And I see that using time_* is confusing because the underlying type >> is a 'timespec'. On 'include/time.h' the type is used on function >> name, I think we should do the same here. >> >> > > Ok. > >>> diff --git a/support/cpuclock.h b/support/cpuclock.h >>> new file mode 100644 >>> index 0000000000..2505fbdcff >>> --- /dev/null >>> +++ b/support/cpuclock.h >>> @@ -0,0 +1,30 @@ >>> +/* Support functions for cpuclock tests. >>> + Copyright (C) 2018-2020 Free Software Foundation, Inc. >> >> This implementation is not based on old tests, I think the copyright >> years should be only 2020. >> >>> + This file is part of the GNU C Library. >>> + >>> + The GNU C Library is free software; you can redistribute it and/or >>> + modify it under the terms of the GNU Lesser General Public >>> + License as published by the Free Software Foundation; either >>> + version 2.1 of the License, or (at your option) any later version. >>> + >>> + The GNU C Library is distributed in the hope that it will be useful, >>> + but WITHOUT ANY WARRANTY; without even the implied warranty of >>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >>> + Lesser General Public License for more details. >>> + >>> + You should have received a copy of the GNU Lesser General Public >>> + License along with the GNU C Library; if not, see >>> + . */ >>> + >>> +#ifndef SUPPORT_CPUCLOCK_H >>> +#define SUPPORT_CPUCLOCK_H >>> + >>> +#include >>> + >>> +struct timespec time_normalize(struct timespec t); >>> +struct timespec time_add(struct timespec a, struct timespec b); >>> +struct timespec time_sub(struct timespec a, struct timespec b); >>> +int percent_diff_check(struct timespec base, struct timespec dev, >>> + float upper_bound, float lower_bound); >>> + >>> +#endif /* SUPPORT_CPUCLOCK_H */ >> >> As before. >> >>> diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c >>> index 0120906f23..4051de1646 100644 >>> --- a/time/tst-cpuclock1.c >>> +++ b/time/tst-cpuclock1.c >>> @@ -26,6 +26,7 @@ >>> #include >>> #include >>> #include >>> +#include >>> >>> /* This function is intended to rack up both user and system time. */ >>> static void >>> @@ -155,19 +156,11 @@ do_test (void) >>> printf ("live PID %d after sleep => %ju.%.9ju\n", >>> child, (uintmax_t) after.tv_sec, (uintmax_t) after.tv_nsec); >>> >>> - struct timespec diff = { .tv_sec = after.tv_sec - before.tv_sec, >>> - .tv_nsec = after.tv_nsec - before.tv_nsec }; >>> - if (diff.tv_nsec < 0) >>> - { >>> - --diff.tv_sec; >>> - diff.tv_nsec += 1000000000; >>> - } >>> - if (diff.tv_sec != 0 >>> - || diff.tv_nsec > 600000000 >>> - || diff.tv_nsec < 100000000) >>> + struct timespec diff = time_sub(after, before); >>> + if (!percent_diff_check(sleeptime, diff, .3, 2)) >>> { >>> printf ("before - after %ju.%.9ju outside reasonable range\n", >>> - (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); >>> + (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); >>> result = 1; >>> } >>> >>> @@ -194,19 +187,11 @@ do_test (void) >>> } >>> else >>> { >>> - struct timespec d = { .tv_sec = afterns.tv_sec - after.tv_sec, >>> - .tv_nsec = afterns.tv_nsec - after.tv_nsec }; >>> - if (d.tv_nsec < 0) >>> - { >>> - --d.tv_sec; >>> - d.tv_nsec += 1000000000; >>> - } >>> - if (d.tv_sec > 0 >>> - || d.tv_nsec < sleeptime.tv_nsec >>> - || d.tv_nsec > sleeptime.tv_nsec * 2) >>> + diff = time_sub(afterns, after); >>> + if (!percent_diff_check(sleeptime, diff, .6, .34)) >>> { >>> printf ("nanosleep time %ju.%.9ju outside reasonable range\n", >>> - (uintmax_t) d.tv_sec, (uintmax_t) d.tv_nsec); >>> + (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); >>> result = 1; >>> } >>> } >>> @@ -241,20 +226,14 @@ do_test (void) >>> printf ("dead PID %d => %ju.%.9ju\n", >>> child, (uintmax_t) dead.tv_sec, (uintmax_t) dead.tv_nsec); >>> >>> - diff.tv_sec = dead.tv_sec - after.tv_sec; >>> - diff.tv_nsec = dead.tv_nsec - after.tv_nsec; >>> - if (diff.tv_nsec < 0) >>> - { >>> - --diff.tv_sec; >>> - diff.tv_nsec += 1000000000; >>> - } >>> - if (diff.tv_sec != 0 || diff.tv_nsec > 200000000) >>> + diff = time_sub(dead, after); >>> + sleeptime.tv_nsec = 100000000; >>> + if (!percent_diff_check(sleeptime, diff, .6, .36)) >>> { >>> printf ("dead - after %ju.%.9ju outside reasonable range\n", >>> - (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); >>> + (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); >>> result = 1; >>> } >>> - >>> /* Now reap the child and verify that its clock is no longer valid. */ >>> { >>> int x; >>>