Subject: Re: load average calculation failing
To: cygwin-developers@cygwin.com
References: <3a3edd10-2617-0919-4eb0-7ca965b48963@maxrnd.com> <223aa826-7bf9-281a-aed8-e16349de5b96@dronecode.org.uk>
From: Mark Geisert
Message-ID: <53664601-5858-ffd5-f854-a5c10fc25613@maxrnd.com>
Date: Tue, 10 May 2022 01:34:50 -0700

Corinna Vinschen wrote:
> [redirect back to cygwin-developers]
>
> On May  8 11:27, Jon Turney wrote:
>> On 08/05/2022 08:01, Mark Geisert wrote:
>>> Mark Geisert wrote (on the main Cygwin mailing list):
>>>> I've recently noticed that the 'xload' I routinely run shows zero
>>>> load even with compute-bound processes running.  This is on both
>>>> Cygwin pre-3.4.0 as well as 3.3.4.  A test program, shown below,
>>>> indicates that getloadavg() is returning with 0 status, i.e. not an
>>>> error but no elems of the passed-in array updated.
>>>>
>>>> Stepping with gdb through the test program seems weird within the
>>>> loadavginfo::load_init method.  Single-stepping at line
>>>> loadavg.cc:68 goes to strace.h:52 and then to _sigbe ?!
>>>>
>>>> I had recently updated both Cygwin and Windows 10 to latest at the
>>>> same time so I cannot say when the failure started.  Last day or two
>>>> at most.
>>>>
>> [...]
>>>
>>> I've debugged a bit further..  Within Cygwin's loadavg.cc:load_init(),
>>> the PdhOpenQueryW() call returns successfully.  The subsequent
>>> PdhAddEnglishCounterW() call is unsuccessful.  It returns status
>
> This is a bit weird.  I tried to debug this for a while on Friday on
> W11 and on W11 I can reproduce *a* problem, too, just not the same you
> report here.
>
> On W11 I see load_init() working fine, the calls to
> PdhAddEnglishCounterW succeed.  But then the call to
> PdhGetFormattedCounterValue in get_load() fails with error
> PDH_INVALID_DATA.  The CStatus member of fmtvalue1 is set to
> PDH_CSTATUS_NO_INSTANCE.
>
> If I tweak get_load to call PdhCollectQueryData again after a fail,
> the second call succeeds.  The only problem with this is, the returned
> data doesn't make a lot of sense.  It only starts to make sense if I
> add a Sleep(1000) before the second PdhCollectQueryData call, which is
> rather disappointing.
>
> Jon, would it, perhaps, make sense to call PdhCollectQueryData in
> load_init(), without actually checking the return value?  The idea is,
> to make sure to have a base for the next call to PdhCollectQueryData
> from inside load_init.
>
> But even then, the first values returned by getloadavg might not make
> much sense, so I guess this is just clutching for straws...
>
>>> 0x800007D0 == PDH_CSTATUS_NO_MACHINE.  The code (at line 68 mentioned
>
> This is a weird error.
>
> "The path did not contain a computer name and the function was unable
> to retrieve the local computer name."
>
> Yeah, sure.
>
> Mark, did you try to add the computer name to the path by calling
> GetComputerName() in load_init?

I tried more ham-handedly by prepending L"\\\\hostname" or L"\\\\.".  No
change.

I'm running W10 21H2 on my home machines.  One with the issue is
up-to-date with Windows patches.  Another that still shows reasonable
load averages may not have the very latest patches; I need to verify
that.

Some web page I found while searching for PDH stuff claimed that the
performance counters are maintained by a Windows Service, which only
gets started when some process attaches to pdh.dll.  I have to find that
page again and see if it talks about which Windows versions that applies
to.

That might possibly explain why one can't get reasonable counter numbers
immediately after PdhOpenQuery.  But then, my running xload, which does
load pdh.dll, should be seeing good counters.

..mark
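
To picture the "base sample" idea quoted above: rate-style PDH counters
are documented as needing two raw samples before a formatted value can be
computed, which would be consistent with the first formatted read failing
and a later one succeeding.  The sketch below is a toy model in plain C.
It is not PDH code and not what loadavg.cc does; rate_counter,
rc_collect() and rc_formatted() are hypothetical names made up purely for
illustration.

```c
#include <stdbool.h>

/* Toy model of a rate-style counter (hypothetical; NOT the PDH API). */
typedef struct {
    int    nsamples;   /* raw samples collected so far, capped at 2 */
    double value[2];   /* [0] = previous raw value, [1] = latest raw value */
    double when[2];    /* timestamps in seconds, matching value[] */
} rate_counter;

/* Record a raw sample, shifting the latest one into the "previous" slot. */
static void rc_collect(rate_counter *c, double raw, double now)
{
    c->value[0] = c->value[1];
    c->when[0]  = c->when[1];
    c->value[1] = raw;
    c->when[1]  = now;
    if (c->nsamples < 2)
        c->nsamples++;
}

/* Compute change per second.  Fails until two samples with distinct,
   increasing timestamps have been collected. */
static bool rc_formatted(const rate_counter *c, double *out)
{
    if (c->nsamples < 2 || c->when[1] <= c->when[0])
        return false;
    *out = (c->value[1] - c->value[0]) / (c->when[1] - c->when[0]);
    return true;
}
```

In this model a single rc_collect() followed by rc_formatted() fails, and
a second rc_collect() with a later timestamp makes it succeed.  That is
the role a priming PdhCollectQueryData call in load_init() would play.
The model simply refuses a zero interval; real PDH may behave differently
when two samples are taken very close together.  None of this would
account for the PDH_CSTATUS_NO_MACHINE failure seen on W10.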