public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v2] Loosen the limits of time/tst-cpuclock1.
@ 2020-10-19 14:47 Stefan Liebler
  2020-10-21 12:58 ` Adhemerval Zanella
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Liebler @ 2020-10-19 14:47 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f
"Fix time/tst-cpuclock1 intermitent failures" (2020-07-11),
this test fails quite often on s390x/s390 with one/multiple of those:
"before - after" / "nanosleep time" / "dead - after" ourside reasonable range.

On a zVM/kvm guest the CPUs are shared between multiple guests.
And even on the lpar (kvm host) the CPUs are usually shared between multiple lpars.
The defined CPUs for a lpar/zVM-system could also have lower weights compared
to other lpars which let the steal time further grow.

Usually I build (-j$(nproc)) and test (PARALLELMFLAGS="-j$(nproc)") glibc multiple
times, e.g. with different GCCs, on various lpars or zVM guests at the same time.
During this time, I've run the test for 13500 times and obvserved the following fails:
~600x "before - after"
~60x "nanosleep time"
~70x "dead - after"

I've also observed a lot of "before - after" fails on a intel kvm-guest while
building/testing glibc on it.

The mentioned commit has tighten the limits of valid tv_nsec ranges:
"before - after" (expected: 500000000):
- 100000000 ... 600000000
+ 450000000 ... 550000000

"nanosleep time" (expected: 100000000):
- 100000000 ... 200000000
+ 090000000 ... 120000000

"dead - after" (expected: 100000000):
-           ... 200000000
+ 090000000 ... 120000000

The test itself forks a child process which chew_cpu (user- and kernel-space).
The parent process sleeps with nanosleep(0.5s) and measures the child_clock time:
diff = after - before
With much workload on the machine, the child won't make much progess
and it can fall much beyond the minimum limit. Thus this check is now removed!

Afterwards the parent process sleeps with clock_nanosleep (child_clock, 0.1s):
diff = afterns - after
The test currently also allows 0.9 * 0.1s.  As this would be an error, the
hard limit of 1.0 * 0.1s is now used as minimum border!
Depending on the workload, the maximum limit can exceed the 1.2 * 0.1s.
Therefore the upper limit is set to 2.0 which was also used before the
mentioned commit.

For "dead - after", the parent process kills the child process and waits long
enough to let the child finish dying. Then it gets the time of the child:
diff = dead - after
Note that diff also contains the time for the previous clock_nanosleep.
Thus you'll often see both fails at the same time.
As we are here not interested in the timings but in checking the interface,
this timing check is also removed.
---
 time/tst-cpuclock1.c | 67 ++++++++++----------------------------------
 1 file changed, 15 insertions(+), 52 deletions(-)

diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c
index 1ac611a92b..cc08150654 100644
--- a/time/tst-cpuclock1.c
+++ b/time/tst-cpuclock1.c
@@ -125,7 +125,7 @@ do_test (void)
 	  child, (unsigned long int) child_clock,
 	  (uintmax_t) res.tv_sec, (uintmax_t) res.tv_nsec);
 
-  struct timespec before, after;
+  struct timespec before;
   if (clock_gettime (child_clock, &before) < 0)
     {
       printf ("clock_gettime on live PID %d clock %lx => %s\n",
@@ -137,38 +137,7 @@ do_test (void)
   printf ("live PID %d before sleep => %ju.%.9ju\n",
 	  child, (uintmax_t) before.tv_sec, (uintmax_t) before.tv_nsec);
 
-  struct timespec sleeptime = { .tv_nsec = 500000000 };
-  if (nanosleep (&sleeptime, NULL) != 0)
-    {
-      perror ("nanosleep");
-      result = 1;
-      goto done;
-    }
-
-  if (clock_gettime (child_clock, &after) < 0)
-    {
-      printf ("clock_gettime on live PID %d clock %lx => %s\n",
-	      child, (unsigned long int) child_clock, strerror (errno));
-      result = 1;
-      goto done;
-    }
-  /* Should be close to 0.5.  */
-  printf ("live PID %d after sleep => %ju.%.9ju\n",
-	  child, (uintmax_t) after.tv_sec, (uintmax_t) after.tv_nsec);
-
-  /* The bound values are empirically defined by testing this code over high cpu
-     usage and different nice values. Of all the values we keep the 90th
-     percentile of values and use those values for our testing allowed range.  */
-  struct timespec diff = timespec_sub (support_timespec_normalize (after),
-				       support_timespec_normalize (before));
-  if (!support_timespec_check_in_range (sleeptime, diff, .9,  1.1))
-    {
-      printf ("before - after %ju.%.9ju outside reasonable range\n",
-	      (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
-      result = 1;
-    }
-
-  sleeptime.tv_nsec = 100000000;
+  struct timespec sleeptime = { .tv_nsec = 100000000 };
   e = clock_nanosleep (child_clock, 0, &sleeptime, NULL);
   if (e == EINVAL || e == ENOTSUP || e == ENOSYS)
     {
@@ -191,13 +160,19 @@ do_test (void)
 	}
       else
 	{
-        /* The bound values are empirically defined by testing this code over
-           high cpu usage and different nice values. Of all the values we keep
-           the 90th percentile of values and use those values for our testing
-           allowed range.  */
+	  printf ("live PID %d after sleep => %ju.%.9ju\n",
+		  child, (uintmax_t) afterns.tv_sec,
+		  (uintmax_t) afterns.tv_nsec);
+
+	  /* As the sleep is based on the child clock, the diff should never
+	     be less than the specified sleeptime.  Otherwise this is an error.
+	     The upper bound is quite high in order to get no failure if running
+	     with high cpu usage and/or on virtualized environments with shared
+	     CPUs.  */
+	  struct timespec diff;
 	  diff = timespec_sub (support_timespec_normalize (afterns),
-			       support_timespec_normalize (after));
-	  if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.2))
+			       support_timespec_normalize (before));
+	  if (!support_timespec_check_in_range (sleeptime, diff, 1.0, 2.0))
 	    {
 	      printf ("nanosleep time %ju.%.9ju outside reasonable range\n",
 		      (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
@@ -231,21 +206,9 @@ do_test (void)
       result = 1;
       goto done;
     }
-  /* Should be close to 0.6.  */
+  /* Should be close to 0.1.  */
   printf ("dead PID %d => %ju.%.9ju\n",
 	  child, (uintmax_t) dead.tv_sec, (uintmax_t) dead.tv_nsec);
-  /* The bound values are empirically defined by testing this code over high cpu
-     usage and different nice values. Of all the values we keep the 90th
-     percentile of values and use those values for our testing allowed range.  */
-  diff = timespec_sub (support_timespec_normalize (dead),
-		       support_timespec_normalize (after));
-  sleeptime.tv_nsec = 100000000;
-  if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.2))
-    {
-      printf ("dead - after %ju.%.9ju outside reasonable range\n",
-	      (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
-      result = 1;
-    }
 
   /* Now reap the child and verify that its clock is no longer valid.  */
   {
-- 
2.23.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-10-23 13:09 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-19 14:47 [PATCH v2] Loosen the limits of time/tst-cpuclock1 Stefan Liebler
2020-10-21 12:58 ` Adhemerval Zanella
2020-10-23  8:59   ` Stefan Liebler
2020-10-23 12:03     ` Adhemerval Zanella
2020-10-23 13:09       ` Stefan Liebler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).