From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 9 May 2022 10:45:58 +0200
From: Corinna Vinschen
To: Mark Geisert, cygwin-developers@cygwin.com
Subject: Re: load average calculation failing
Reply-To: cygwin-developers@cygwin.com
Mail-Followup-To: Mark Geisert, cygwin-developers@cygwin.com
References: <3a3edd10-2617-0919-4eb0-7ca965b48963@maxrnd.com> <223aa826-7bf9-281a-aed8-e16349de5b96@dronecode.org.uk>
In-Reply-To: <223aa826-7bf9-281a-aed8-e16349de5b96@dronecode.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
List-Id: Cygwin core component developers mailing list

[redirect back to cygwin-developers]

On May 8 11:27, Jon Turney wrote:
> On 08/05/2022 08:01, Mark Geisert wrote:
> > Mark Geisert wrote (on the main Cygwin mailing list):
> > > I've recently noticed that the 'xload' I routinely run shows zero
> > > load even with compute-bound processes running.  This is on both
> > > Cygwin pre-3.4.0 as well as 3.3.4.  A test program, shown below,
> > > indicates that getloadavg() is returning with 0 status, i.e. not an
> > > error but no elems of the passed-in array updated.
> > >
> > > Stepping with gdb through the test program seems weird within the
> > > loadavginfo::load_init method.  Single-stepping at line
> > > loadavg.cc:68 goes to strace.h:52 and then to _sigbe ?!
> > >
> > > I had recently updated both Cygwin and Windows 10 to latest at the
> > > same time so I cannot say when the failure started.  Last day or two
> > > at most.
> >
> > [...]
> >
> > I've debugged a bit further..  Within Cygwin's loadavg.cc:load_init(),
> > the PdhOpenQueryW() call returns successfully.  The subsequent
> > PdhAddEnglishCounterW() call is unsuccessful.  It returns status

This is a bit weird.  I tried to debug this for a while on Friday on W11
and on W11 I can reproduce *a* problem, too, just not the same you
report here.

On W11 I see load_init() working fine, the calls to
PdhAddEnglishCounterW succeed.  But then the call to
PdhGetFormattedCounterValue in get_load() fails with error
PDH_INVALID_DATA.  The CStatus member of fmtvalue1 is set to
PDH_CSTATUS_NO_INSTANCE.

If I tweak get_load to call PdhCollectQueryData again after a fail, the
second call succeeds.  The only problem with this is, the returned data
doesn't make a lot of sense.  It only starts to make sense if I add a
Sleep(1000) before the second PdhCollectQueryData call, which is rather
disappointing.

Jon, would it, perhaps, make sense to call PdhCollectQueryData in
load_init(), without actually checking the return value?  The idea is,
to make sure to have a base for the next call to PdhCollectQueryData
from inside load_init.  But even then, the first values returned by
getloadavg might not make much sense, so I guess this is just clutching
for straws...

> > 0x800007D0 == PDH_CSTATUS_NO_MACHINE.  The code (at line 68 mentioned

This is a weird error.  "The path did not contain a computer name and
the function was unable to retrieve the local computer name."

Yeah, sure.

Mark, did you try to add the computer name to the path by calling
GetComputerName() in load_init?  I tried the patch I pasted to the end
of this mail, but it did not help the first PdhGetFormattedCounterValue
call in get_load to return success.

> > above) calls debug_printf() to conditionally display the error, which is
> > what leads to the strace.h and _sigbe; that's fine.
> >
> > The weird PDH_CSTATUS_NO_MACHINE is the problem.  I'll try running the
> > example from an elevated shell.  Or rebooting the machine.  After that
> > it's consulting some oracle TBD. :-(
> >
>
> Thanks for looking into this.
> You can find the user space version of this code I initially wrote at
> https://github.com/jon-turney/windows-loadavg, which might save you some
> time.
>
> I can't reproduce this on W10 21H1, so I think this must be due to some
> change in later Windows...

I can reproduce this on W10 21H1, too, and the problem is the one I
outlined above, with load_init working fine and just the
PdhGetFormattedCounterValue failing in get_load.


Corinna


diff --git a/winsup/cygwin/loadavg.cc b/winsup/cygwin/loadavg.cc
index 127591a2e1f5..a014c2eb758c 100644
--- a/winsup/cygwin/loadavg.cc
+++ b/winsup/cygwin/loadavg.cc
@@ -40,6 +40,7 @@
 #include
 #include
 #include
 
 static PDH_HQUERY query;
 static PDH_HCOUNTER counter1;
@@ -55,6 +55,17 @@ static bool load_init (void)
     tried = true;
 
     PDH_STATUS status;
+    DWORD size = MAX_PATH;
+    WCHAR machine_name[MAX_PATH];
+    WCHAR counter_name[MAX_PATH + 64];
+    PWCHAR counter_p = counter_name;
+
+    if (GetComputerNameW (machine_name, &size))
+      {
+        *counter_p++ = L'\\';
+        *counter_p++ = L'\\';
+        counter_p = wcpcpy (counter_p, machine_name);
+      }
 
     status = PdhOpenQueryW (NULL, 0, &query);
     if (status != STATUS_SUCCESS)
@@ -62,18 +73,17 @@ static bool load_init (void)
         debug_printf ("PdhOpenQueryW, status %y", status);
         return false;
       }
-    status = PdhAddEnglishCounterW (query,
-                                    L"\\Processor(_Total)\\% Processor Time",
-                                    0, &counter1);
+
+    wcpcpy (counter_p, L"\\Processor(_Total)\\% Processor Time");
+    status = PdhAddEnglishCounterW (query, counter_name, 0, &counter1);
     if (status != STATUS_SUCCESS)
       {
         debug_printf ("PdhAddEnglishCounterW(time), status %y", status);
         return false;
       }
-    status = PdhAddEnglishCounterW (query,
-                                    L"\\System\\Processor Queue Length",
-                                    0, &counter2);
+    wcpcpy (counter_p, L"\\System\\Processor Queue Length");
+    status = PdhAddEnglishCounterW (query, counter_name, 0, &counter2);
     if (status != STATUS_SUCCESS)
       {
         debug_printf ("PdhAddEnglishCounterW(queue length), status %y", status);