public inbox for cygwin-developers@cygwin.com
From: Mark Geisert <mark@maxrnd.com>
To: cygwin-developers@cygwin.com
Subject: Re: load average calculation imperfections
Date: Mon, 16 May 2022 22:39:45 -0700
Message-ID: <e94c85cd-bc94-7e48-eb14-93ce22344f90@maxrnd.com>
In-Reply-To: <ceee3f15-52ea-d679-67db-d1573eec5616@dronecode.org.uk>

Jon Turney wrote:
> On 16/05/2022 06:25, Mark Geisert wrote:
>> Corinna Vinschen wrote:
>>> On May 13 13:04, Corinna Vinschen wrote:
>>>> On May 13 11:34, Jon Turney wrote:
>>>>> On 12/05/2022 10:48, Corinna Vinschen wrote:
>>>>>> On May 11 16:40, Mark Geisert wrote:
>>>>>>>
>>>>>>> The first counter read now gets error 0xC0000BC6 == PDH_INVALID_DATA, but no
>>>>>>> errors on subsequent counter reads.  This sounds like it now matches what
>>>>>>> Corinna reported for W11.  I wonder if she's running build 1706 already.
>>>>>>
>>>>>> Erm... looks like I didn't read your mail thoroughly enough.
>>>>>>
>>>>>> This behaviour, the first call returning with PDH_INVALID_DATA and only
>>>>>> subsequent calls returning valid(?) values, is what breaks the
>>>>>> getloadavg function and, consequently, /proc/loadavg.  So maybe xload
>>>>>> now works, but Cygwin is still broken.
>>>>>
>>>>> The first attempt to read '% Processor Time' is expected to fail with
>>>>> PDH_INVALID_DATA, since it doesn't have a value at a particular instant, but
>>>>> one averaged over a period of time.
>>>>>
>>>>> This is what the following comment is meant to record:
>>>>>
>>>>> "Note that PDH will only return data for '% Processor Time' after the second
>>>>> call to PdhCollectQueryData(), as it's computed over an interval, so the
>>>>> first attempt to estimate load will fail and 0.0 will be returned."
>>>>
>>>> But.
>>>>
>>>> Every invocation of getloadavg() returns 0.  Even under load.  Calling
>>>> `cat /proc/loadavg' is an exercise in futility.
>>>>
>>>> The only way to make getloadavg() work is to call it in a loop from the
>>>> same process with a 1 sec pause between invocations.  In that case, even
>>>> a parallel `cat /proc/loadavg' shows the same load values.
>>>>
>>>> However, as soon as I stop the looping process, the /proc/loadavg values
>>>> are frozen in the last state they had when stopping that process.
>>>
>>> Oh, and, stopping and restarting all Cygwin processes in the session will
>>> reset the loadavg to 0.
>>>
>>>> Any suggestions how to fix this?
>>
>> I'm getting somewhat better behavior from repeated 'cat /proc/loadavg' with the 
>> following update to Cygwin's loadavg.cc:
>>
>> diff --git a/winsup/cygwin/loadavg.cc b/winsup/cygwin/loadavg.cc
>> index 127591a2e..cceb3e9fe 100644
>> --- a/winsup/cygwin/loadavg.cc
>> +++ b/winsup/cygwin/loadavg.cc
>> @@ -87,6 +87,9 @@ static bool load_init (void)
>>       }
>>
>>       initialized = true;
>> +
>> +    /* prime the data pump, hopefully */
>> +    (void) PdhCollectQueryData (query);
>>     }
> 
> Yeah, something like this might be a good idea, as at the moment we report load 
> averages of 0 for the 5 seconds after the first time someone asks for it.
> 
> It's not ideal, because with this change, we go on to call PdhCollectQueryData() 
> again very shortly afterwards, so the first value for '% Processor Time' is 
> measured over a very short interval, and so may be very inaccurate.

Perhaps add a short delay, say 100ms, after that first PdhCollectQueryData()? 
Enough for anything compute-bound to be measurable but not enough to be 
human-noticeable?  Something even shorter?

[...]
>> Any other Cygwin app I know of is using getloadavg() under the hood. When it 
>> calculates a new set of 1,5,15 minute load averages, it uses total %processor 
>> time and total processor queue length.  It has a decay behavior that I think has 
>> been around since early Unix.  What I haven't noticed before is an "inverse" 
>> decay behavior that seems wrong to me, but maybe Linux has this.  That is, if 
>> you have just one compute-bound process the load average won't reach 1.0 until 
>> that process has been running for a full minute.  You don't see instantaneous load.
> 
> In fact it asymptotically approaches 1, so it wouldn't reach it until you've had a 
> load of 1 for a long time compared to the time you are averaging over.
> 
> Starting from idle, a unit load after 1 minute would result in a 1-minute load 
> average of (1 - (1/e)) = ~0.63.   See 
> https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html for some 
> discussion of that.
> 
> That's just how it works, as a measure of demand, not load.

Thanks for that link; that was interesting to read.  OK, so that's how it is: the 
ramp is even more drawn out over time than I was thinking.
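
To put numbers on that ramp, here is a minimal sketch.  It assumes a plain 
exponential decay of exp(-dt/period) per update and a 5-second update interval; 
the helper names are made up for illustration and this is not the code in 
loadavg.cc.

```cpp
#include <cmath>

// Hypothetical helper, not taken from loadavg.cc: one decay step.
// avg:    previous load average
// load:   load sampled for this update
// dt:     seconds since the previous update
// period: averaging window in seconds (60, 300 or 900)
double decay_step (double avg, double load, double dt, double period)
{
  double decay = std::exp (-dt / period);
  return avg * decay + load * (1.0 - decay);
}

// Average after `steps` updates of constant `load`, starting from idle.
double ramp (double load, int steps, double dt, double period)
{
  double avg = 0.0;
  for (int i = 0; i < steps; i++)
    avg = decay_step (avg, load, dt, period);
  return avg;
}
```

Under those assumptions, ramp (1.0, 12, 5.0, 60.0) covers one minute of unit 
load and comes out at about 0.632, i.e. 1 - 1/e.  After five minutes (60 steps) 
the 1-minute figure is about 0.993, close to 1 but still short of it.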

[...]
>> Ideally, the shared data should have the most recently calculated 1,5,15 minute 
>> load averages and a timestamp of when they were calculated.  And then any 
>> process that calls getloadavg() should independently decide whether it's time to 
>> calculate an updated set of values for machine-wide use.  But can the decay 
>> calculations get messed up due to multiple updaters?  I want to say no, but I 
>> can't quite convince myself.  Each updater has its own idea of the 1,5,15 
>> timespans, doesn't it, because updates can occur at random, rather than at a set 
>> period like a kernel would do?
> 
> I think not, because last_time is part of the shared loadavginfo state, which is 
> the unix epoch time that the last update was computed, and updating that is 
> guarded by a mutex.
> 
> That's not to say that this code might not be wrong in some other way :)

Alright, I see the problem with how I was visualizing multiple updaters.  I was 
thinking of the "real" load average over time as a superposition (sum, I guess) of 
the decaying exponential curves of all the updaters' calculations.  But no, each 
updater replaces the current curve with a new one based on its own new data.  What 
I was envisioning would be much more complex and require more state memory.  Oof.
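
For my own benefit, a rough sketch of the single-curve model.  Everything here 
is an assumption for illustration: the struct only stands in for the shared 
loadavginfo state, std::mutex stands in for whatever cross-process mutex Cygwin 
really uses, and the 5-second minimum interval is a guess.

```cpp
#include <cmath>
#include <ctime>
#include <mutex>

// Hypothetical stand-in for the shared loadavginfo state.
struct loadavg_state
{
  std::mutex lock;            // the real thing would be cross-process
  std::time_t last_time = 0;  // epoch time of the last update
  double avg[3] = { 0.0, 0.0, 0.0 };  // 1, 5 and 15 minute averages
};

static const double periods[3] = { 60.0, 300.0, 900.0 };

// Any process may call this.  The elapsed time always comes from the
// shared last_time, so it does not matter which process updates when.
// Returns false if the last update is too recent to bother.
bool update_loadavg (loadavg_state &s, std::time_t now, double load,
                     double min_interval = 5.0)
{
  std::lock_guard<std::mutex> guard (s.lock);
  double dt = std::difftime (now, s.last_time);
  if (dt < min_interval)
    return false;
  for (int i = 0; i < 3; i++)
    {
      double decay = std::exp (-dt / periods[i]);
      s.avg[i] = s.avg[i] * decay + load * (1.0 - decay);
    }
  s.last_time = now;
  return true;
}
```

With a constant load, an update after 5 seconds followed by one 7 seconds later 
leaves the same averages as a single update after 12 seconds, because the decay 
factors multiply.  So irregular update times from different processes don't 
distort the curve, at least in this simplified model.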

I can submit a patch for the added PdhCollectQueryData() plus short Sleep() if it 
would make sense to try it for a while on Cygwin head.  Other suggestions welcome.

..mark
