From: Mark Geisert <mark@maxrnd.com>
To: cygwin-developers@cygwin.com
Subject: Re: load average calculation imperfections
Date: Mon, 16 May 2022 22:39:45 -0700
Message-ID: <e94c85cd-bc94-7e48-eb14-93ce22344f90@maxrnd.com>
In-Reply-To: <ceee3f15-52ea-d679-67db-d1573eec5616@dronecode.org.uk>
Jon Turney wrote:
> On 16/05/2022 06:25, Mark Geisert wrote:
>> Corinna Vinschen wrote:
>>> On May 13 13:04, Corinna Vinschen wrote:
>>>> On May 13 11:34, Jon Turney wrote:
>>>>> On 12/05/2022 10:48, Corinna Vinschen wrote:
>>>>>> On May 11 16:40, Mark Geisert wrote:
>>>>>>>
>>>>>>> The first counter read now gets error 0xC0000BC6 == PDH_INVALID_DATA, but no
>>>>>>> errors on subsequent counter reads. This sounds like it now matches what
>>>>>>> Corinna reported for W11. I wonder if she's running build 1706 already.
>>>>>>
>>>>>> Erm... looks like I didn't read your mail thoroughly enough.
>>>>>>
>>>>>> This behaviour, the first call returning with PDH_INVALID_DATA and only
>>>>>> subsequent calls returning valid(?) values, is what breaks the
>>>>>> getloadavg function and, consequently, /proc/loadavg. So maybe xload
>>>>>> now works, but Cygwin is still broken.
>>>>>
>>>>> The first attempt to read '% Processor Time' is expected to fail with
>>>>> PDH_INVALID_DATA, since it doesn't have a value at a particular instant, but
>>>>> one averaged over a period of time.
>>>>>
>>>>> This is what the following comment is meant to record:
>>>>>
>>>>> "Note that PDH will only return data for '% Processor Time' after the second
>>>>> call to PdhCollectQueryData(), as it's computed over an interval, so the
>>>>> first attempt to estimate load will fail and 0.0 will be returned."
>>>>
>>>> But.
>>>>
>>>> Every invocation of getloadavg() returns 0. Even under load. Calling
>>>> `cat /proc/loadavg' is an exercise in futility.
>>>>
>>>> The only way to make getloadavg() work is to call it in a loop from the
>>>> same process with a 1 sec pause between invocations. In that case, even
>>>> a parallel `cat /proc/loadavg' shows the same load values.
>>>>
>>>> However, as soon as I stop the looping process, the /proc/loadavg values
>>>> are frozen in the last state they had when stopping that process.
>>>
>>> Oh, and, stopping and restarting all Cygwin processes in the session will
>>> reset the loadavg to 0.
>>>
>>>> Any suggestions how to fix this?
>>
>> I'm getting somewhat better behavior from repeated 'cat /proc/loadavg' with the
>> following update to Cygwin's loadavg.cc:
>>
>> diff --git a/winsup/cygwin/loadavg.cc b/winsup/cygwin/loadavg.cc
>> index 127591a2e..cceb3e9fe 100644
>> --- a/winsup/cygwin/loadavg.cc
>> +++ b/winsup/cygwin/loadavg.cc
>> @@ -87,6 +87,9 @@ static bool load_init (void)
>>    }
>>
>>    initialized = true;
>> +
>> +  /* prime the data pump, hopefully */
>> +  (void) PdhCollectQueryData (query);
>>  }
>
> Yeah, something like this might be a good idea, as at the moment we report load
> averages of 0 for the 5 seconds after the first time someone asks for it.
>
> It's not ideal, because with this change, we go on to call PdhCollectQueryData()
> again very shortly afterwards, so the first value for '% Processor Time' is
> measured over a very short interval, and so may be very inaccurate.
Perhaps add a short delay, say 100ms, after that first PdhCollectQueryData()?
Long enough for anything compute-bound to register, but short enough not to be
human-noticeable? Or would something even shorter do?
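
Concretely, something like this at the end of load_init(), building on the diff
above (untested sketch; the Sleep() length is only a guess at that trade-off):

  initialized = true;

  /* Prime the data pump: '% Processor Time' has no value until the
     second PdhCollectQueryData() call, so take the first sample here
     and give the caller's own collection a short but nonzero interval
     to average over.  */
  (void) PdhCollectQueryData (query);
  Sleep (100);    /* ms; "measurable but not human-noticeable" is the hope */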
[...]
>> Any other Cygwin app I know of uses getloadavg() under the hood. When it
>> calculates a new set of 1,5,15 minute load averages, it uses total %processor
>> time and total processor queue length. It has a decay behavior that I think has
>> been around since early Unix. What I hadn't noticed before is an "inverse"
>> decay behavior that seems wrong to me, but maybe Linux has this. That is, if
>> you have just one compute-bound process, the load average won't reach 1.0 until
>> that process has been running for a full minute. You don't see instantaneous load.
>
> In fact it asymptotically approaches 1, so it wouldn't reach it until you've had a
> load of 1 for a long time compared to the time you are averaging over.
>
> Starting from idle, a unit load after 1 minute would result in a 1-minute load
> average of (1 - (1/e)) = ~0.62. See
> https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html for some
> discussion of that.
>
> That's just how it works, as a measure of demand, not load.
Thanks for that link; it was interesting to read. OK, so that's how it works; the
ramp is even more drawn out over time than I was thinking.
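
To convince myself of the numbers, here's a tiny standalone model of that kind of
exponentially-decayed average (an illustration only, not the actual loadavg.cc code):

#include <cmath>
#include <cstdio>

int main ()
{
  const double interval = 5.0;   /* seconds between samples */
  const double period = 60.0;    /* the 1-minute average */
  const double df = exp (-interval / period);
  const double n = 1.0;          /* one compute-bound process */
  double avg = 0.0;              /* starting from an idle machine */

  for (double t = interval; t <= period; t += interval)
    avg = avg * df + n * (1.0 - df);

  /* prints ~0.63, i.e. the 1 - 1/e figure mentioned above */
  printf ("1-minute load after 60s of unit load: %.2f\n", avg);
  return 0;
}

So even a full minute of saturated CPU only pushes the 1-minute figure to about
0.63, and the 5 and 15 minute figures lag much further behind.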
[...]
>> Ideally, the shared data should have the most recently calculated 1,5,15 minute
>> load averages and a timestamp of when they were calculated. And then any
>> process that calls getloadavg() should independently decide whether it's time to
>> calculate an updated set of values for machine-wide use. But can the decay
>> calculations get messed up due to multiple updaters? I want to say no, but I
>> can't quite convince myself. Each updater has its own idea of the 1,5,15
>> timespans, doesn't it, because updates can occur at random, rather than at a set
>> period like a kernel would do?
>
> I think not, because last_time is part of the shared loadavginfo state, which is
> the Unix epoch time at which the last update was computed, and updating that is
> guarded by a mutex.
>
> That's not to say that this code might not be wrong in some other way :)
Alright, I see the problem with how I was visualizing multiple updaters. I was
thinking of the "real" load average over time as a superposition (sum, I guess) of
the decaying exponential curves of all the updaters' calculations. But no, each
updater replaces the current curve with a new one based on its own new data. What
I was envisioning would be much more complex and require more state memory. Oof.
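
For my own notes, here's roughly how I now picture the shared update step
(hypothetical names and shape, not the real loadavg.cc code): whoever holds the
mutex computes the elapsed time against the shared last_time and then replaces
the curve, so two updaters can't each apply a full interval's worth of decay.

#include <cmath>
#include <ctime>

struct loadavginfo_t
{
  double loadavg[3];   /* 1, 5, 15 minute averages */
  time_t last_time;    /* when they were last computed */
};

/* Caller already holds the mutex guarding *shared. */
static void
update_load (loadavginfo_t *shared, double active_tasks, time_t now)
{
  static const double decay_time[3] = { 60.0, 300.0, 900.0 };
  double delta = difftime (now, shared->last_time);

  if (delta <= 0.0)
    return;                      /* another process updated just now */

  for (int i = 0; i < 3; i++)
    {
      double df = exp (-delta / decay_time[i]);
      shared->loadavg[i] = shared->loadavg[i] * df
                           + active_tasks * (1.0 - df);
    }
  shared->last_time = now;
}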
I can submit a patch for the added PdhCollectQueryData() plus a short Sleep() if it
would make sense to try it for a while on Cygwin head. Other suggestions welcome.
..mark