From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yb1-xb32.google.com (mail-yb1-xb32.google.com [IPv6:2607:f8b0:4864:20::b32]) by sourceware.org (Postfix) with ESMTPS id 4EE413882AC6 for ; Tue, 18 Jun 2024 14:51:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4EE413882AC6 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=zaxiom.com Authentication-Results: sourceware.org; spf=none smtp.mailfrom=zaxiom.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 4EE413882AC6 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::b32 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718722323; cv=none; b=gPno74LIjZ2ez0RAn0xQgUIq/yWVASlWioYcQDKg2uIPK3w1WQjX/WuIE88eyo42D3FsZcM1VDB++207UYJW+HhyukFNqXmROquZqHfYgmDg+K6KvqT6xBiYhz8o8/xPjzkh8KewSSkibCqqraxgbd3zGjeh7wx7/Qy5/05uJ/Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718722323; c=relaxed/simple; bh=T3p3FOeUDxUjE0tPh/kQH434rpuQFgQ/PKak3rMIQbA=; h=DKIM-Signature:From:Message-Id:Mime-Version:Subject:Date:To; b=ECvC0e03ZDyMplm32BiQM5dDbu29mW7XxIl3BSVKvJFa2hTHb4+LxQDSJ9TmhqB0wt2o5526+nxsr+bJyj9AMPXJb92iLbKn1Y7RKDsNUQm4eYIv8Gs/byq2blCfJOm7W/VPB5x2P+oWaobtcZVCl8d3qlsAD1r+uD61xA1aQM0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-yb1-xb32.google.com with SMTP id 3f1490d57ef6-dff02b8a956so5416276276.1 for ; Tue, 18 Jun 2024 07:51:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zaxiom-com.20230601.gappssmtp.com; s=20230601; t=1718722316; x=1719327116; darn=cygwin.com; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=Y+4mX+0i9sftbtJiKT1HyJQGe6iJW9GKcQZlMaZ91uc=; b=FeuXwTFHCCTFEExONRTHsgmvfZVtIuXJ7XhDBxlsDqttmKc/ZgyNP219/1nM15tniu Bm7AgabKPhIK9Hil2OXQytRg4+ETaAl0+INOrlTuvhSHqIhiZ4OfY1EH6GlbAfzY71eD WfaA4rZIjXgDioxy3ykmY09UZxhVDSYh7SStxX0cCnfPW49H3sPctOIhaue6WgbvBO3r 
4NXWJTNMrhRVZQcAba0lvpZY1tgKpX4u3I6kIWo2Aj962KdyXdwHU/5zJKLsMOKC9lmE PTwUkW11C+/5ghXamgkhzGhCDDwgKV5p+Jwp8VRUqZxKENz57zuE520UuGKwj1nf2rBD W/Rw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718722316; x=1719327116; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Y+4mX+0i9sftbtJiKT1HyJQGe6iJW9GKcQZlMaZ91uc=; b=NYBop87ny8wRyLIeiTVOillsHyaYYvE8kPAiu9vj5LllWUH2HRv065sXy4RMbDU6Gk GE1zXNwSP3ygpLjg1zDpXN6oMPy421yyYc0xE7MqQIZ9NvJwecCd47f+YuC6vVetKCfU uBt6ZPBruKHrmpCfy5d2hPwVetMQpngwToxNjNaFsdnYcTzdCrDgTubV54Xb9IOCKhrc fQHDPVkYAuXa7b1yZtFR3UhWkCiwj/FsH5gZBsQPfgJfBKE0lZ7FRR6EO72ercWPbsxr 2kut8+Qkg/NOISUD3j2Sid3EpZwH2QfLf2lO4WYpovRnvtVZQtjq5tJSi9KyV4dAY/++ MJCg== X-Gm-Message-State: AOJu0YxwmqeGvyKz0dV56JD9R/qGts4FqpPmUeP7kIKu4fr/1jBP0f23 lzvhWAS50d0R8ziFTl2IUUQsLOMpeDgbYQQFJDYBbPvfSeswoGZl0+zInYOhxPs= X-Google-Smtp-Source: AGHT+IEaXE06JzXBbrTjBSJ8iMAD9bCaue5L9RSRFbD5kZNF33hqevkrzwO7Np8VLWtiH6ymyzhqxA== X-Received: by 2002:a25:86c1:0:b0:dfd:b613:cd5f with SMTP id 3f1490d57ef6-e02be0ff3c8mr109933276.5.1718722315287; Tue, 18 Jun 2024 07:51:55 -0700 (PDT) Received: from smtpclient.apple ([96.74.41.169]) by smtp.gmail.com with ESMTPSA id 3f1490d57ef6-dff048522f9sm2314765276.5.2024.06.18.07.51.54 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 18 Jun 2024 07:51:55 -0700 (PDT) From: Nicholas Williams Message-Id: Content-Type: multipart/alternative; boundary="Apple-Mail=_C9F5244C-7FFF-4932-ABB2-89C6F0DDB9DE" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.500.171.1.1\)) Subject: Re: Cygwin outputting message to stderr on dofork EAGAIN failure even when Python exception is caught and handled Date: Tue, 18 Jun 2024 09:51:44 -0500 In-Reply-To: Cc: "cygwin@cygwin.com" , Andrey Repin To: "Dale Lobb (Sys Admin)" References: <1399464798.20240617105808@yandex.ru> X-Mailer: Apple Mail (2.3774.500.171.1.1) X-Spam-Status: No, 
score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

--Apple-Mail=_C9F5244C-7FFF-4932-ABB2-89C6F0DDB9DE
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=utf-8

Andrey and Dale,

> On Jun 17, 2024, at 11:03, Dale Lobb (Sys Admin) wrote:
>
> Greetings, Nicholas;
>
>> From: Cygwin On Behalf Of Andrey Repin via Cygwin
>> Sent: Monday, June 17, 2024 2:58 AM
>> To: Nicholas Williams; cygwin@cygwin.com
>> Cc: Andrey Repin
>> Subject: EXTERNAL SENDER: Re: Cygwin outputting message to stderr on dofork EAGAIN failure even when Python exception is caught and handled
>>
>> Greetings, Nicholas Williams!
>>
>>> We have a Python (installed and run through Cygwin) process running on
>>> Windows Server 2022 that was very, very occasionally failing when subprocess.check_output was called:
>>
>>> 0 [main] python3 28481 dofork: child -1 - forked process 16856 died
>>> unexpectedly, retry 0, exit code 0xC0000142, errno 11
>>> …
>>> subprocess.check_output(["cygpath", "-w", directory.name], encoding="utf-8").strip()
>>>   File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
>>>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>   File "/usr/lib/python3.9/subprocess.py", line 505, in run
>>>     with Popen(*popenargs, **kwargs) as process:
>>>   File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
>>>     self._execute_child(args, executable, preexec_fn, close_fds,
>>>   File "/usr/lib/python3.9/subprocess.py", line 1754, in _execute_child
>>>     self.pid = _posixsubprocess.fork_exec(
>>> BlockingIOError: [Errno 11] Resource temporarily unavailable
>>
>>> Setting aside for a minute the various reasons this might be happening
>>> occasionally, which we cannot solve for at this moment, the error number
>>> (EAGAIN) indicates that you should “try again.” So that’s exactly what we
>>> did. We added a try/catch to the Python code to catch the BlockingIOError
>>> and, if and only if the error number is EAGAIN, we try up to two more times.
>>> This fixed the problem and caused the application to stop quitting. We
>>> output a warning to our log so that we don’t forget about the problem, but
>>> the warning only ever appears once, so retrying a single time seems to help.
>>
>>> However … even though Python handles the dofork error, turns it into a
>>> Python exception, and our code catches the Python exception and handles it
>>> properly, Cygwin (not Python … Cygwin) still outputs a message to stderr
>>> right before our warning message.
>>> This Cygwin error message shows up as an error in our log tracking:
>>
>>> 0 [main] python3 15042 dofork: child -1 - forked process 6780 died
>>> unexpectedly, retry 0, exit code 0xC0000142, errno 11
>>> 06/16 13:57:53. 87520: WARNING: Retrying command in 2 seconds due to EAGAIN: [the command we’re running]
>>
>>> I’m sure there could be any number of things I might be missing, but IMO,
>>> if the process calling dofork properly handles the error raised by dofork,
>>> Cygwin should not be outputting an error message to stderr.
>>
>>> Thoughts?
>>
>> My inexperienced and uneducated thought would be that forking code is fragile
>> and some parts of it prone to misbehavior. When an unforeseen error is
>> detected, it's better to report it sooner, than to get bitten by it later.
>>
>> Regarding your specific issue, if you create a STC[1] (a minimally enough
>> version of your code that, say, fork a process thousands of times, which
>> reliable reproduce the issue) somebody else could run to test the cause, that
>> would be wonderful.
>>
>> (If, however, you could find and fix the cause, that would be even more wonderful!)
>>
>
> I have seen this exact issue on every Windows 2019 or 2022
> server where I have installed new versions of Cygwin since fall of 2023.
> Admittedly, that has only been 3 or 4 machines, but it sure seems like
> a pattern. I have resisted upgrading old Cygwin installations for fear
> that they also would start to exhibit this fork problem.
>
> https://cygwin.com/pipermail/cygwin/2023-September/254417.html
>
> The only thing I have found that decreases the frequency of the
> errors is to increase the amount of RAM assigned to the machine.
> It does not eliminate the issue. I've tried a ton of different
> options with re-basing the Cygwin executables, to no avail.
>
>
> Best Regards,
>
> Dale

To be clear, the problem I’m reporting *IS NOT* the fork failure. Sure, there might be a bug there, or it might just be that we have a resource exhaustion problem that we haven’t been able to identify yet. We’ll figure that out eventually; for now, we have successfully worked around the problem by retrying.

The problem I’m reporting is that Cygwin, for some reason, prints an error message to stderr whenever fork fails, instead of letting the application calling fork do its own error handling. This means that, even though we catch the Python exception and retry (successfully), an error message still gets written to stderr and ends up in our logs. This error message is coming from this Cygwin code:

https://github.com/cygwin/cygwin/blob/7e3c833592b282355a57dd34459b152e4e078d19/winsup/cygwin/fork.cc#L381-L382

In our opinion, low-level system calls like this shouldn’t be writing to stderr. That’s what errno is for (which this call does properly set): it lets the application making the system call decide what to do about the error. That application can always decide to print an error. The system call should not. There appears to be no way for us to disable this error printing.

Nick

--Apple-Mail=_C9F5244C-7FFF-4932-ABB2-89C6F0DDB9DE--
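
For readers of the archive, here is a minimal sketch of the retry workaround described in this message: catch BlockingIOError, retry only when errno is EAGAIN, try up to two more times, and log a warning before each retry. The function name check_output_with_retry, its parameters (retries, delay, run), and the logger are hypothetical and chosen for illustration; the actual application code was not posted. The 2-second delay is taken from the quoted warning line.

```python
import errno
import logging
import subprocess
import time

log = logging.getLogger(__name__)


def check_output_with_retry(cmd, retries=2, delay=2.0,
                            run=subprocess.check_output, **kwargs):
    """Run cmd and return its output, retrying only when fork fails with EAGAIN.

    `run` is injectable (an assumption made for testability); by default it
    is subprocess.check_output, as in the traceback quoted in this thread.
    """
    for attempt in range(retries + 1):
        try:
            return run(cmd, **kwargs)
        except BlockingIOError as exc:
            # Re-raise anything that is not EAGAIN, and EAGAIN itself once
            # the retry budget is exhausted.
            if exc.errno != errno.EAGAIN or attempt == retries:
                raise
            log.warning("Retrying command in %s seconds due to EAGAIN: %s",
                        delay, cmd)
            time.sleep(delay)
```

With this in place, the call from the traceback would become check_output_with_retry(["cygpath", "-w", directory.name], encoding="utf-8").strip(). Note that this only handles the Python exception; it does nothing about the dofork line that Cygwin itself writes to stderr, which is the subject of this report.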