Re: load average calculation imperfections

public inbox for cygwin-developers@cygwin.com

From: Mark Geisert <mark@maxrnd.com>
To: cygwin-developers@cygwin.com
Subject: Re: load average calculation imperfections
Date: Sun, 15 May 2022 22:25:47 -0700
Message-ID: <5dbeb18a-92ef-4b6a-64eb-8fe1f60887fc@maxrnd.com>
In-Reply-To: <Yn47gb2o07WjnDlk@calimero.vinschen.de>

Corinna Vinschen wrote:
> On May 13 13:04, Corinna Vinschen wrote:
>> On May 13 11:34, Jon Turney wrote:
>>> On 12/05/2022 10:48, Corinna Vinschen wrote:
>>>> On May 11 16:40, Mark Geisert wrote:
>>>>>
>>>>> The first counter read now gets error 0xC0000BC6 == PDH_INVALID_DATA, but no
>>>>> errors on subsequent counter reads.  This sounds like it now matches what
>>>>> Corinna reported for W11.  I wonder if she's running build 1706 already.
>>>>
>>>> Erm... looks like I didn't read your mail thoroughly enough.
>>>>
>>>> This behaviour, the first call returning with PDH_INVALID_DATA and only
>>>> subsequent calls returning valid(?) values, is what breaks the
>>>> getloadavg function and, consequently, /proc/loadavg.  So maybe xload
>>>> now works, but Cygwin is still broken.
>>>
>>> The first attempt to read '% Processor Time' is expected to fail with
>>> PDH_INVALID_DATA, since it doesn't have a value at a particular instant, but
>>> one averaged over a period of time.
>>>
>>> This is what the following comment is meant to record:
>>>
>>> "Note that PDH will only return data for '% Processor Time' after the second
>>> call to PdhCollectQueryData(), as it's computed over an interval, so the
>>> first attempt to estimate load will fail and 0.0 will be returned."
>>
>> But.
>>
>> Every invocation of getloadavg() returns 0.  Even under load.  Calling
>> `cat /proc/loadavg' is an exercise in futility.
>>
>> The only way to make getloadavg() work is to call it in a loop from the
>> same process with a 1 sec pause between invocations.  In that case, even
>> a parallel `cat /proc/loadavg' shows the same load values.
>>
>> However, as soon as I stop the looping process, the /proc/loadavg values
>> are frozen in the last state they had when stopping that process.
> 
> Oh, and, stopping and restarting all Cygwin processes in the session will
> reset the loadavg to 0.
> 
>> Any suggestions how to fix this?

I'm getting somewhat better behavior from repeated 'cat /proc/loadavg' with the 
following update to Cygwin's loadavg.cc:

diff --git a/winsup/cygwin/loadavg.cc b/winsup/cygwin/loadavg.cc
index 127591a2e..cceb3e9fe 100644
--- a/winsup/cygwin/loadavg.cc
+++ b/winsup/cygwin/loadavg.cc
@@ -87,6 +87,9 @@ static bool load_init (void)
      }

      initialized = true;
+
+    /* prime the data pump, hopefully */
+    (void) PdhCollectQueryData (query);
    }

    return initialized;
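
To illustrate why the extra collection call can help, here is a minimal
sketch of an interval-averaged counter.  It is not the PDH API: RateCounter
and its members are hypothetical names, and the "busy" and "now" inputs are
made-up raw samples.  The only point is that a rate needs two raw samples,
so the first collection can never yield a value, while a priming call lets
the next collection succeed.

```cpp
// Hypothetical stand-in for an interval-averaged counter such as
// '% Processor Time'.  This is NOT the PDH API, just the same idea.
struct RateCounter {
    bool has_prev = false;    // true once one raw sample has been stored
    bool has_value = false;   // true once a rate could be computed
    double prev_busy = 0.0;   // cumulative busy seconds at the last collect
    double prev_time = 0.0;   // wall-clock seconds at the last collect
    double value = 0.0;       // percent busy over the last interval

    // Store a raw sample; a rate exists only from the second call onward.
    void collect(double busy, double now) {
        if (has_prev && now > prev_time) {
            value = 100.0 * (busy - prev_busy) / (now - prev_time);
            has_value = true;
        }
        prev_busy = busy;
        prev_time = now;
        has_prev = true;
    }
};
```

In this model a priming collect() at init time plays the role of the added
PdhCollectQueryData() call: it stores the first raw sample so that the next
collection has something to subtract from.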

It's only somewhat better because it seems like multiple updaters of the load 
average act sort of independently.  It's hard to characterize what I'm seeing but 
let me try.

First, let me set xload aside: it shows instantaneous load and is thus a
different animal.  It only looks at total % Processor Time, so its load
value never goes higher than ncpus, and it has no decay behavior built in.

Every other Cygwin app I know of uses getloadavg() under the hood.  When
getloadavg() calculates a new set of 1, 5, and 15 minute load averages, it
uses total % Processor Time and total processor queue length.  It has a decay
behavior that I think has been around since early Unix.  What I hadn't noticed
before is an "inverse" decay behavior that seems wrong to me, but maybe Linux
has this too.  That is, if you have just one compute-bound process, the load
average won't reach 1.0 until that process has been running for a full minute.
You don't see instantaneous load.
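
As a rough sketch of that decay behavior, assuming the classic exponentially
damped update (old average times a decay factor, plus the new sample times
one minus that factor), the ramp-up can be simulated as below.  The names
decay_step and simulate are made up for illustration and the 5 second update
step is arbitrary; this is not a copy of what loadavg.cc does.

```cpp
#include <cmath>

// One damped update: blend the old average with the new sample n.
// dt is the time since the previous update, period the averaging window
// (60, 300 or 900 seconds), both in seconds.
double decay_step(double load, double n, double dt, double period) {
    double df = std::exp(-dt / period);   // weight kept by the old average
    return load * df + n * (1.0 - df);
}

// Start from 0 and feed a constant sample n for `total` seconds,
// updating every `dt` seconds.
double simulate(double n, double total, double dt, double period) {
    long steps = std::lround(total / dt);
    double load = 0.0;
    for (long i = 0; i < steps; ++i)
        load = decay_step(load, n, dt, period);
    return load;
}
```

Under this model, simulate(1.0, 60.0, 5.0, 60.0) comes out at 1 - 1/e, or
roughly 0.63, so one compute-bound process would show about 0.63 after a
minute and only approach 1.0 asymptotically.  The ramp-up is the same
damping that produces the decay, just seen from the other direction.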

I guess that's all reasonable so far.  But I think the wrinkle Cygwin is adding, 
allowing the load average to be calculated by multiple updaters, makes it seem 
like updaters are not keeping in sync with each other despite the loadavginfo 
shared data.  I can't quite wrap my head around the current implementation to 
prove or disprove its correctness.

Ideally, the shared data should hold the most recently calculated 1, 5, and
15 minute load averages and a timestamp of when they were calculated.  Then
any process that calls getloadavg() should independently decide whether it's
time to calculate an updated set of values for machine-wide use.  But can the
decay calculations get messed up due to multiple updaters?  I want to say no,
but I can't quite convince myself.  Each updater has its own idea of the 1, 5,
and 15 minute timespans, doesn't it, because updates can occur at random
times rather than at a set period like a kernel would use?
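
One way to reason about the multiple-updater question is a sketch like the
following.  It assumes the exponentially damped update and a shared record
holding the averages plus the time of the last update; SharedLoad and update
are hypothetical names, and locking is left out entirely.  It says nothing
about whether the current loadavg.cc behaves this way.

```cpp
#include <cmath>

// Hypothetical shared record: the latest averages and when they were made.
struct SharedLoad {
    double avg[3] = {0.0, 0.0, 0.0};   // 1, 5 and 15 minute averages
    double last_time = 0.0;            // seconds, time of the last update
};

// Any process may call this.  The decay uses the time elapsed since the
// last update by anyone, taken from the shared timestamp, rather than a
// per-process idea of the interval.  No locking is shown.
void update(SharedLoad &s, double now, double n) {
    static const double periods[3] = {60.0, 300.0, 900.0};
    double dt = now - s.last_time;
    if (dt <= 0.0)
        return;                        // nothing has elapsed, keep values
    for (int i = 0; i < 3; ++i) {
        double df = std::exp(-dt / periods[i]);
        s.avg[i] = s.avg[i] * df + n * (1.0 - df);
    }
    s.last_time = now;
}
```

With this form, an update at t=3 followed by one at t=10 with the same
sample gives the same averages as a single update at t=10, because the decay
factors multiply: exp(-3/T) * exp(-7/T) = exp(-10/T).  So irregular update
times alone should not distort the decay in this model.  What it does assume
is that each sample n fairly represents the load over the interval since the
shared timestamp, which is a separate question for counters that each
process collects on its own schedule.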

..mark
