public inbox for cygwin@cygwin.com
 help / color / mirror / Atom feed
From: Nicholas Williams <nicholas.williams@zaxiom.com>
To: "Dale Lobb (Sys Admin)" <Dale.Lobb@bryanhealth.org>
Cc: "cygwin@cygwin.com" <cygwin@cygwin.com>,
	Andrey Repin <anrdaemon@yandex.ru>
Subject: Re: Cygwin outputting message to stderr on dofork EAGAIN failure even when Python exception is caught and handled
Date: Tue, 18 Jun 2024 09:51:44 -0500	[thread overview]
Message-ID: <EE5DC946-7AA2-4D74-8919-8B176089550F@zaxiom.com> (raw)
In-Reply-To: <PH0PR16MB4782ED8A31ADDFE5A3AC6ABAF5CD2@PH0PR16MB4782.namprd16.prod.outlook.com>

[-- Attachment #1: Type: text/plain, Size: 6077 bytes --]

Andrey and Dale,

> On Jun 17, 2024, at 11:03, Dale Lobb (Sys Admin) <Dale.Lobb@bryanhealth.org> wrote:
> 
> Greetings, Nicholas;
> 
>> From: Cygwin <cygwin-bounces+dale.lobb=bryanhealth.org@cygwin.com <mailto:cygwin-bounces+dale.lobb=bryanhealth.org@cygwin.com>> On Behalf Of Andrey Repin via Cygwin
>> Sent: Monday, June 17, 2024 2:58 AM
>> To: Nicholas Williams <nicholas.williams@zaxiom.com <mailto:nicholas.williams@zaxiom.com>>; cygwin@cygwin.com <mailto:cygwin@cygwin.com>
>> Cc: Andrey Repin <anrdaemon@yandex.ru <mailto:anrdaemon@yandex.ru>>
>> Subject: EXTERNAL SENDER: Re: Cygwin outputting message to stderr on dofork EAGAIN failure even when Python exception is caught and handled
>> 
>> Greetings, Nicholas Williams! > We have a Python (installed and run through Cygwin) process running on > Windows Server 2022 that was very, very occasionally failing when subprocess. check_output was called: > 0 [main] python3 28481
>> 
>> Greetings, Nicholas Williams!
>> 
>>> We have a Python (installed and run through Cygwin) process running on
>>> Windows Server 2022 that was very, very occasionally failing when subprocess.check_output was called:
>> 
>>> 0 [main] python3 28481 dofork: child -1 - forked process 16856 died
>>> unexpectedly, retry 0, exit code 0xC0000142, errno 11
>>> …
>>>    subprocess.check_output(["cygpath", "-w", directory.name], encoding="utf-8").strip()
>>> File "/usr/lib/python3.9/subprocess.py", line 424, in check_output <>
>>>     <>return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, <>
>>> File "/usr/lib/python3.9/subprocess.py", line 505, in run <>
>>>     <>with Popen(*popenargs, **kwargs) as process: <>
>>> File "/usr/lib/python3.9/subprocess.py", line 951, in __init__ <>
>>>     <>self._execute_child(args, executable, preexec_fn, close_fds, <>
>>> File "/usr/lib/python3.9/subprocess.py", line 1754, in _execute_child
>>>    self.pid = _posixsubprocess.fork_exec(
>>> BlockingIOError: [Errno 11] Resource temporarily unavailable
>> 
>>> Setting aside for a minute the various reasons this might be happening
>>> occasionally, which we cannot solve for at this moment, the error number
>>> (EAGAIN) indicates that you should “try again.” So that’s exactly what we
>>> did. We added a try/catch to the Python code to catch the BlockingIOError
>>> and, if and only if the error number is EAGAIN, we try up to two more times.
>>> This fixed the problem and caused the application to stop quitting. We
>>> output a warning to our log so that we don’t forget about the problem, but
>>> the warning only ever appears once, so retrying a single time seems to help.
>> 
>>> However … even though Python handles the dofork error, turns it into a
>>> Python exception, and our code catches the Python exception and handles it
>>> properly, Cygwin (not Python … Cygwin) still outputs a message to stderr
>>> right before our warning message. This Cygwin error message shows up as an error in our log tracking:
>> 
>>> 0 [main] python3 15042 dofork: child -1 - forked process 6780 died
>>> unexpectedly, retry 0, exit code 0xC0000142, errno 11
>>> 06/16 13:57:53. 87520: WARNING: Retrying command in 2 seconds due to EAGAIN: [the command we’re running]
>> 
>>> I’m sure there could be any number of things I might be missing, but IMO,
>>> if the process calling dofork properly handles the error raised by dofork,
>>> Cygwin should not be outputting an error message to stderr.
>> 
>>> Thoughts?
>> 
>> My inexperienced and uneducated thought would be that forking code is fragile
>> and some parts of it prone to misbehavior. When an unforeseen error is
>> detected, it's better to report it sooner, than to get bitten by it later.
>> 
>> Regarding your specific issue, if you create a STC[1] (a minimally enough
>> version of your code that, say, fork a process thousands of times, which
>> reliable reproduce the issue) somebody else could run to test the cause, that
>> would be wonderful.
>> 
>> (If, however, you could find and fix the cause, that would be even more wonderful!)
>> 
> 
>  I have seen this exact issue on every Windows  2019 or 2022
> server where I have installed new versions of Cygwin since fall of 2023.
> Admittedly, that has only been 3 or 4 machines, but it sure seems like
> a pattern.  I have resisted upgrading old Cygwin installations for fear
> that they also would start to exhibit this fork problem.
> 
> https://cygwin.com/pipermail/cygwin/2023-September/254417.html
> 
>  The only thing I have found that decreases the frequency of the
> errors is to increase the amount of RAM assigned to the machine.
> It does not eliminate the issue.  I've tried a ton of different
> options with re-basing the Cygwin executables, to no avail.
> 
> 
> Best Regards,
> 
> Dale

To be clear, the problem I’m reporting *IS NOT* the fork failure. Sure, there might be a bug there, or it might just be that we have a resource exhaustion problem that we haven’t been able to identify yet. We’ll figure that out eventually; for now, we have successfully worked around the problem by retrying.

The problem I’m reporting is that Cygwin, for some reason, prints an error message to stderr whenever fork fails, instead of letting the application calling fork do its own error handling. This means that, even though we catch the Python exception and retry (successfully), an error message still gets written to stderr and ends up in our logs. This error message is coming from this Cygwin code:

https://github.com/cygwin/cygwin/blob/7e3c833592b282355a57dd34459b152e4e078d19/winsup/cygwin/fork.cc#L381-L382

In our opinion, low-level system calls like this shouldn’t be writing to stderr. That’s what errno is for (which this call does properly set)—for the application making the system call to decide what to do about the error. That application can always decide to print an error. The system call should not. There appears to be no way for us to disable this error printing.

Nick

      parent reply	other threads:[~2024-06-18 14:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-16 18:54 Nicholas Williams
2024-06-17  7:58 ` Andrey Repin
2024-06-17 16:03   ` Dale Lobb (Sys Admin)
2024-06-18 14:44     ` Dan Shelton
2024-06-18 22:22       ` Nicholas Williams
2024-06-18 14:51     ` Nicholas Williams [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=EE5DC946-7AA2-4D74-8919-8B176089550F@zaxiom.com \
    --to=nicholas.williams@zaxiom.com \
    --cc=Dale.Lobb@bryanhealth.org \
    --cc=anrdaemon@yandex.ru \
    --cc=cygwin@cygwin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).