public inbox for cygwin@cygwin.com
* GNU make losing jobserver tokens
@ 2022-03-21 14:28 Magnus Ihse Bursie
  2022-03-21 15:09 ` Ken Brown
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Magnus Ihse Bursie @ 2022-03-21 14:28 UTC (permalink / raw)
  To: cygwin

Hi,

I'm working for Oracle on the OpenJDK build team. We're using GNU make 
to build the JDK on all supported platforms. For Windows, we use Cygwin 
as our build environment, including the Cygwin version of GNU make.

We have had a long-standing issue with make losing jobserver tokens. 
("long-standing" here means for years, and years, at least since GNU 
make 4.0, up to and including the current latest version in Cygwin.)

Most runs end with something like:

make[2]: INTERNAL: Exiting with 11 jobserver tokens available; should be 
12!

Since the build still succeeds, and it just affects performance (and 
typically not that much), we have not spent too much time getting to the
bottom of this.

Now, however, I've come across a machine where this happens repeatedly, 
and on a much worse scale:

make[2]: INTERNAL: Exiting with 1 jobserver tokens available; should be 24!

This effectively turns the highly parallelized builds into 
single-threaded builds, and is absolutely detrimental for performance. 
On the flip side, this also makes for the perfect testing environment to 
really get to the bottom of this issue.
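
To track how often this happens across builds, a small helper along these lines could be used. It is only a sketch: the function name jobserver_tokens_lost is made up for this example, and it relies on nothing but the wording of the warning quoted above. It reads make output on stdin and prints, for each warning, how many tokens went missing.

```sh
# Hypothetical helper (not part of make or Cygwin): print the number of
# lost jobserver tokens for every "INTERNAL: Exiting with ..." warning
# found on stdin.
jobserver_tokens_lost() {
  # Extract "<expected> <available>" from each matching line.
  sed -n 's/.*Exiting with \([0-9][0-9]*\) jobserver tokens available; should be \([0-9][0-9]*\)!.*/\2 \1/p' |
  while read -r expected available; do
    echo "$((expected - available))"
  done
}
```

The warning is written to stderr, so the usual invocation would be something like `make 2>&1 | jobserver_tokens_lost`.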

I started out by sending a question to bug-make@gnu.org. The folks over 
there reported that this was not a known problem with GNU make on 
Windows in general, and that as far as they knew, the mingw port did not 
suffer from this problem.

Instead, they suggested that it was a Cygwin-specific problem, possibly 
related to issues with emulating Posix pipes and/or signals in Cygwin.

So, my first question is: Is this a known problem in Cygwin GNU make? 
Are there any workarounds/fixes to get around it?

Otherwise: Any suggestions on how to go on and debug this? I am willing 
to build and test an instrumented debug build of make, but I will need 
assistance to find my way around the source and spot likely candidates 
for the source of the problem.

/Magnus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GNU make losing jobserver tokens
  2022-03-21 14:28 GNU make losing jobserver tokens Magnus Ihse Bursie
@ 2022-03-21 15:09 ` Ken Brown
  2022-03-22  6:54   ` Noel Grandin
  2022-03-22 19:38   ` checking cyg version (was Re: GNU make losing jobserver tokens) L A Walsh
  2022-03-23  6:24 ` GNU make losing jobserver tokens Roumen Petrov
  2022-04-01  8:45 ` Takashi Yano
  2 siblings, 2 replies; 18+ messages in thread
From: Ken Brown @ 2022-03-21 15:09 UTC (permalink / raw)
  To: cygwin

On 3/21/2022 10:28 AM, Magnus Ihse Bursie wrote:
> Hi,
> 
> I'm working for Oracle on the OpenJDK build team. We're using GNU make to build 
> the JDK on all supported platforms. For Windows, we use Cygwin as our build 
> environment, including the Cygwin version of GNU make.
> 
> We have had a long-standing issue with make losing jobserver tokens. 
> ("long-standing" here means for years, and years, at least since GNU make 4.0, 
> up to and including the current latest version in Cygwin.)
> 
> Most runs end with something like:
> 
> make[2]: INTERNAL: Exiting with 11 jobserver tokens available; should be 12!
> 
> Since the build still succeeds, and it just affects performance (and typically 
> not that much), we have not spent too much time getting to the bottom of this.
> 
> Now, however, I've come across a machine where this happens repeatedly, and on a 
> much worse scale:
> 
> make[2]: INTERNAL: Exiting with 1 jobserver tokens available; should be 24!
> 
> This effectively turns the highly parallelized builds into single-threaded 
> builds, and is absolutely detrimental for performance. On the flip side, this 
> also makes for the perfect testing environment to really get to the bottom of 
> this issue.
> 
> I started out by sending a question to bug-make@gnu.org. The folks over there 
> reported that this was not a known problem with GNU make on Windows in general, 
> and that as far as they knew, the mingw port did not suffer from this problem.
> 
> Instead, they suggested that it was a Cygwin-specific problem, possibly related 
> to issues with emulating Posix pipes and/or signals in Cygwin.
> 
> So, my first question is: Is this a known problem in Cygwin GNU make? Are there 
> any workarounds/fixes to get around it?

No, it's not a known problem.

> Otherwise: Any suggestions on how to go on and debug this? I am willing to build 
> and test an instrumented debug build of make, but I will need assistance to find 
> my way around the source and spot likely candidates for the source of the problem.

For starters, is your Cygwin installation up to date?  Cygwin's internal 
implementation of pipes was overhauled starting with cygwin-3.3.0.

Ken


* Re: GNU make losing jobserver tokens
  2022-03-21 15:09 ` Ken Brown
@ 2022-03-22  6:54   ` Noel Grandin
  2022-03-22 17:52     ` GNU make losing jobserver tokens in pipes Brian Inglis
  2022-03-22 19:38   ` checking cyg version (was Re: GNU make losing jobserver tokens) L A Walsh
  1 sibling, 1 reply; 18+ messages in thread
From: Noel Grandin @ 2022-03-22  6:54 UTC (permalink / raw)
  To: Ken Brown, cygwin, magnus.ihse.bursie

On 2022/03/21 5:09 pm, Ken Brown wrote:
> On 3/21/2022 10:28 AM, Magnus Ihse Bursie wrote:
>>
>> We have had a long-standing issue with make losing jobserver tokens. ("long-standing" here means for years, and years, 
>> at least since GNU make 4.0, up to and including the current latest version in Cygwin.)
>>

Hi

It was not that long ago that Linus Torvalds found a bug in the Linux kernel pipe implementation which caused GNU make 
to lose jobserver tokens, so possibly researching that bug may shed some light on the kinds of things that could be 
wrong with the Cygwin pipe code.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ddad21d3e99c743a3aa473121dc5561679e26bb
https://lkml.org/lkml/2019/12/18/1064
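
For readers unfamiliar with the mechanism: on POSIX systems GNU make's jobserver hands out job slots as single-byte tokens through a pipe. The top-level make keeps one implicit token and puts the remaining ones in the pipe; a make that wants to start another job reads one byte and writes it back when the job finishes. The toy model below is plain shell, not make's code. The name jobserver_demo, its LEAKED parameter, and the sentinel byte used for counting are all invented for illustration. It shows how tokens that are read but never written back lead to the kind of count reported in this thread.

```sh
# Toy model of jobserver token accounting; NOT GNU make's actual code.
# usage: jobserver_demo SLOTS LEAKED
jobserver_demo() {
  slots=$1 leaked=$2
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/tokens" || return 1
  # Open the FIFO read-write so neither side blocks waiting for a peer.
  exec 3<>"$dir/tokens"

  # One token is implicit; SLOTS-1 tokens go into the pipe.
  i=1
  while [ "$i" -lt "$slots" ]; do printf '+' >&3; i=$((i + 1)); done

  # Each simulated job acquires a token by reading one byte. The first
  # LEAKED jobs never give it back; the others release it by writing it.
  i=1
  while [ "$i" -lt "$slots" ]; do
    token=$(dd bs=1 count=1 <&3 2>/dev/null)
    [ "$i" -le "$leaked" ] || printf '%s' "$token" >&3
    i=$((i + 1))
  done

  # Count what is left. The sentinel byte is a shortcut for this demo so
  # that no non-blocking read is needed.
  printf 'E' >&3
  available=1   # the implicit token
  while [ "$(dd bs=1 count=1 <&3 2>/dev/null)" = "+" ]; do
    available=$((available + 1))
  done

  exec 3<&-
  rm -rf "$dir"
  echo "$available of $slots tokens available at exit"
}
```

With these assumptions, `jobserver_demo 24 23` models the worst case from the original report, and `jobserver_demo 12 1` the milder one.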

Regards, Noel Grandin




* Re: GNU make losing jobserver tokens in pipes
  2022-03-22  6:54   ` Noel Grandin
@ 2022-03-22 17:52     ` Brian Inglis
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Inglis @ 2022-03-22 17:52 UTC (permalink / raw)
  To: cygwin

On 2022-03-22 00:54, Noel Grandin wrote:
> On 2022/03/21 5:09 pm, Ken Brown wrote:
>> On 3/21/2022 10:28 AM, Magnus Ihse Bursie wrote:
>>> We have had a long-standing issue with make losing jobserver tokens. 
>>> ("long-standing" here means for years, and years, at least since GNU 
>>> make 4.0, up to and including the current latest version in Cygwin.)

> It was not that long ago that Linus Torvalds found a bug in the Linux 
> kernel pipe implementation which caused GNU make to lose jobserver 
> tokens, so possibly researching that bug may shed some light on the 
> kinds of things that could be wrong with the Cygwin pipe code.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ddad21d3e99c743a3aa473121dc5561679e26bb 
> 
> https://lkml.org/lkml/2019/12/18/1064

Perhaps add "in pipes" to subject to get that maintainer's attention?

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]


* checking cyg version (was Re: GNU make losing jobserver tokens)
  2022-03-21 15:09 ` Ken Brown
  2022-03-22  6:54   ` Noel Grandin
@ 2022-03-22 19:38   ` L A Walsh
  2022-03-22 21:58     ` Adam Dinwoodie
  2022-03-22 23:06     ` Mark Geisert
  1 sibling, 2 replies; 18+ messages in thread
From: L A Walsh @ 2022-03-22 19:38 UTC (permalink / raw)
  To: cygwin





On 2022/03/21 08:09, Ken Brown wrote:
>
> For starters, is your Cygwin installation up to date?  Cygwin's internal 
> implementation of pipes was overhauled starting with cygwin-3.3.0.
>   
How does one check the version of cygwin?  I've updated cygwin files this year,
but if I use cygcheck -V, I only see cygwin-3.2, which looks to be from last year.

Is that the right way to check the cygwin version?

thanks!
-linda



* Re: checking cyg version (was Re: GNU make losing jobserver tokens)
  2022-03-22 19:38   ` checking cyg version (was Re: GNU make losing jobserver tokens) L A Walsh
@ 2022-03-22 21:58     ` Adam Dinwoodie
  2022-03-22 23:06     ` Mark Geisert
  1 sibling, 0 replies; 18+ messages in thread
From: Adam Dinwoodie @ 2022-03-22 21:58 UTC (permalink / raw)
  To: cygwin

On Tue, Mar 22, 2022 at 12:38:34PM -0700, L A Walsh wrote:
> On 2022/03/21 08:09, Ken Brown wrote:
> > 
> > For starters, is your Cygwin installation up to date?  Cygwin's internal
> > implementation of pipes was overhauled starting with cygwin-3.3.0.
> How does one check the version of cygwin?  I've updated cygwin files this
> year,
> but if I use cygcheck -V, I only see cygwin-3.2, which looks to be from last
> year.

Unless you're doing something very odd, that implies you're running a
version that's just under a year old, and which won't include the pipe
changes Ken is referring to.

> Is that the right way to check the cygwin version?

Not really: `cygcheck -V` is just reporting the version of cygcheck that
you have installed.  That should normally match the version of the core
"cygwin" package, which also includes the cygwin1.dll library that makes
everything else possible.  However "the version of Cygwin" isn't a
meaningful concept: a Cygwin installation is made up of many parts, each
of which has its own version number.  There's more detail at
https://cygwin.com/faq/faq.html#faq.what.version


* Re: checking cyg version (was Re: GNU make losing jobserver tokens)
  2022-03-22 19:38   ` checking cyg version (was Re: GNU make losing jobserver tokens) L A Walsh
  2022-03-22 21:58     ` Adam Dinwoodie
@ 2022-03-22 23:06     ` Mark Geisert
  2022-03-23 17:47       ` Samuel Lelièvre
  1 sibling, 1 reply; 18+ messages in thread
From: Mark Geisert @ 2022-03-22 23:06 UTC (permalink / raw)
  To: cygwin

L A Walsh wrote:
> On 2022/03/21 08:09, Ken Brown wrote:
>>
>> For starters, is your Cygwin installation up to date?  Cygwin's internal 
>> implementation of pipes was overhauled starting with cygwin-3.3.0.
> How does one check the version of cygwin?  I've updated cygwin files this year,
> but if I use cygcheck -V, I only see cygwin-3.2, which looks to be from last year.
> 
> Is that the right way to check the cygwin version?

uname -r

..or the catch-all when I can't remember the -r option:

uname -a
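
To turn that into a yes/no answer to Ken's question (is this release at least 3.3.0, where the pipe code was overhauled?), something like the following sketch might do. The function name is made up, and it assumes the release string starts with "major.minor", as in "3.2.0(0.340/5/3)"; anything after the first character that is not a digit or dot is ignored.

```sh
# Hypothetical helper: succeed if a `uname -r` style release string is
# 3.3 or newer. With no argument it checks the running system.
cygwin_release_at_least_3_3() {
  release=${1:-$(uname -r)}
  # Keep the leading digits-and-dots part, e.g. "3.3.4(0.341/5/3)" -> "3.3.4".
  version=${release%%[!0-9.]*}
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 3 ]; }
}
```

For example: `cygwin_release_at_least_3_3 && echo "3.3.0 or newer" || echo "older than 3.3.0"`.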

..mark


* Re: GNU make losing jobserver tokens
  2022-03-21 14:28 GNU make losing jobserver tokens Magnus Ihse Bursie
  2022-03-21 15:09 ` Ken Brown
@ 2022-03-23  6:24 ` Roumen Petrov
  2022-04-01  8:45 ` Takashi Yano
  2 siblings, 0 replies; 18+ messages in thread
From: Roumen Petrov @ 2022-03-23  6:24 UTC (permalink / raw)
  To: cygwin

Hi,

Magnus Ihse Bursie wrote:
> Hi,
>
> I'm working for Oracle on the OpenJDK build team. We're using GNU make to build the JDK on all supported platforms. For Windows, we use Cygwin as our build environment, including the Cygwin version of GNU make.
>
> We have had a long-standing issue with make losing jobserver tokens. ("long-standing" here means for years, and years, at least since GNU make 4.0, up to and including the current latest version in Cygwin.)
>
Parallel builds were working for me on 32-bit Cygwin in the past.


> Most runs end with something like:
>
> make[2]: INTERNAL: Exiting with 11 jobserver tokens available; should be 12!
>
> Since the build still succeeds, and it just affects performance (and typically not that much), we have not spent too much time getting to the bottom of this.

Now I cannot get it working on 64-bit cygwin -
https://www.mail-archive.com/cygwin@cygwin.com/msg169861.html

Interesting that your build succeeds.


>
> [SNIP]
> Instead, they suggested that it was a Cygwin-specific problem, possibly related to issues with emulating Posix pipes and/or signals in Cygwin.

There are some issues in my build environment:
- Unix domain sockets as a non-administrator (https://www.mail-archive.com/cygwin@cygwin.com/msg169832.html). If I remember correctly, those sockets do not work properly in general.
- setup as a non-admin: package upgrade failed (https://www.mail-archive.com/cygwin@cygwin.com/msg169830.html).

The second one looks like an issue with pipes.


>
> So, my first question is: Is this a known problem in Cygwin GNU make? Are there any workarounds/fixes to get around it?

This does not look like an issue in make. It looks like a general issue.
It may be related to Microsoft Windows restrictions on the user account.

>
> Otherwise: Any suggestions on how to go on and debug this? I am willing to build and test an instrumented debug build of make, but I will need assistance to find my way around the source and spot likely candidates for the source of the problem.

I'm not a regular Cygwin user and I do not have the time or the environments to test build variations.
Perhaps the build could use a local administrator account, or use fewer jobs than the number of cores.

> /Magnus

Regards,
Roumen Petrov



* Re: checking cyg version (was Re: GNU make losing jobserver tokens)
  2022-03-22 23:06     ` Mark Geisert
@ 2022-03-23 17:47       ` Samuel Lelièvre
  0 siblings, 0 replies; 18+ messages in thread
From: Samuel Lelièvre @ 2022-03-23 17:47 UTC (permalink / raw)
  To: cygwin

Version number of the "cygwin" Cygwin package:
```
cygcheck -c cygwin
```

Version numbers of all installed Cygwin packages:
```
cygcheck -c
```

Save that information to a file:
```
cygcheck -c > cygwin-package-versions.txt
```

Save more complete information to a file:
```
cygcheck -s -r -v > cygcheck.out
```
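
Extract just the version string of the "cygwin" package (a sketch: the function name is made up, and it assumes the usual cygcheck layout of a header followed by a "cygwin  <version>  <status>" line):
```sh
# Hypothetical helper: read `cygcheck -c cygwin` output on stdin and
# print the second column of the line whose first column is "cygwin".
cygwin_pkg_version() {
  awk '$1 == "cygwin" { print $2; exit }'
}
```

Usage: `cygcheck -c cygwin | cygwin_pkg_version`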


* Re: GNU make losing jobserver tokens
  2022-03-21 14:28 GNU make losing jobserver tokens Magnus Ihse Bursie
  2022-03-21 15:09 ` Ken Brown
  2022-03-23  6:24 ` GNU make losing jobserver tokens Roumen Petrov
@ 2022-04-01  8:45 ` Takashi Yano
  2022-04-27 14:13   ` Takashi Yano
  2 siblings, 1 reply; 18+ messages in thread
From: Takashi Yano @ 2022-04-01  8:45 UTC (permalink / raw)
  To: cygwin

On Mon, 21 Mar 2022 15:28:17 +0100
Magnus Ihse Bursie wrote:
> Hi,
> 
> I'm working for Oracle on the OpenJDK build team. We're using GNU make 
> to build the JDK on all supported platforms. For Windows, we use Cygwin 
> as our build environment, including the Cygwin version of GNU make.
> 
> We have had a long-standing issue with make losing jobserver tokens. 
> ("long-standing" here means for years, and years, at least since GNU 
> make 4.0, up to and including the current latest version in Cygwin.)
> 
> Most runs end with something like:
> 
> make[2]: INTERNAL: Exiting with 11 jobserver tokens available; should be 
> 12!
> 
> Since the build still succeeds, and it just affects performance (and 
> typically not that much), we have not spent too much time getting to the
> bottom of this.
> 
> Now, however, I've come across a machine where this happens repeatedly, 
> and on a much worse scale:
> 
> make[2]: INTERNAL: Exiting with 1 jobserver tokens available; should be 24!
> 
> This effectively turns the highly parallelized builds into 
> single-threaded builds, and is absolutely detrimental for performance. 
> On the flip side, this also makes for the perfect testing environment to 
> really get to the bottom of this issue.
> 
> I started out by sending a question to bug-make@gnu.org. The folks over 
> there reported that this was not a known problem with GNU make on 
> Windows in general, and that as far as they knew, the mingw port did not 
> suffer from this problem.
> 
> Instead, they suggested that it was a Cygwin-specific problem, possibly 
> related to issues with emulating Posix pipes and/or signals in Cygwin.
> 
> So, my first question is: Is this a known problem in Cygwin GNU make? 
> Are there any workarounds/fixes to get around it?
> 
> Otherwise: Any suggestions on how to go on and debug this? I am willing 
> to build and test an instrumented debug build of make, but I will need 
> assistance to find my way around the source and spot likely candidates 
> for the source of the problem.

I have tried to reproduce the issue by building OpenJDK
from source, however, I could not.

Instead, I encountered another issue.

Building OpenJDK sometimes (rarely) failed with errors such as:

      0 [sig] make 5484 sig_send: error sending signal 11, pid 5484, pipe handle 0x118, nb 0, packsize 176, Win32 error 0
 124917 [main] make 5484 sig_send: error sending signal -72, pid 5484, pipe handle 0x118, nb 0, packsize 176, Win32 error 0
common/modules/GensrcModuleInfo.gmk:77: *** open: /home/yano/jdk/build/windows-x86-server-release/make-support/vardeps/make/common/modules/GensrcModuleInfo.gmk/jdk.accessibility/ALL_MODULES.vardeps: No such file or directory.  Stop.
make[2]: *** [make/Main.gmk:141: jdk.accessibility-gensrc-moduleinfo] Error 2
make[2]: *** Waiting for unfinished jobs....


I looked into this new problem and found that the wait_sig() thread
crashes with a segfault. It seems that accessing _main_tls causes an
access violation if a signal is sent just after the process is
started.

static void WINAPI
wait_sig (VOID *)
{
  [...]
      if (!pack.mask)
	{
	  tl_entry = cygheap->find_tls (_main_tls);
	  dummy_mask = _main_tls->sigmask;       // <--- Segfault here
	  cygheap->unlock_tls (tl_entry);
	  pack.mask = &dummy_mask;
	}

I also found the following patch resolves the issue.

diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc
index 62df96652..3824af199 100644
--- a/winsup/cygwin/sigproc.cc
+++ b/winsup/cygwin/sigproc.cc
@@ -1325,6 +1325,10 @@ wait_sig (VOID *)
   _sig_tls = &_my_tls;
   bool sig_held = false;
 
+  /* Wait for _main_tls initialization. */
+  while (!cygwin_finished_initializing)
+    Sleep (10);
+
   sigproc_printf ("entering ReadFile loop, my_readsig %p, my_sendsig %p",
 		  my_readsig, my_sendsig);
 

I guess _main_tls may not be initialized correctly until
cygwin_finished_initializing is set.
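
The idea behind the patch can be shown outside Cygwin with a loose shell analogy. This is not Cygwin code, and every name in it is invented for illustration. A background worker stands in for wait_sig, a file stands in for _main_tls->sigmask, and a flag file stands in for cygwin_finished_initializing. The worker polls for the flag before touching the state, just as the patch polls before entering the ReadFile loop.

```sh
# Loose analogy only; none of these names exist in Cygwin.
run_startup_wait_demo() {
  dir=$(mktemp -d) || return 1
  (
    # "wait_sig": do not touch the shared state until initialization is
    # flagged as finished (mirrors the while/Sleep loop in the patch).
    while [ ! -e "$dir/initialized" ]; do sleep 0.01; done
    cat "$dir/sigmask" > "$dir/result"
  ) &
  worker=$!

  # "main thread": finish initialization a little later.
  sleep 0.05
  echo "0x0" > "$dir/sigmask"
  : > "$dir/initialized"

  wait "$worker"
  cat "$dir/result"
  rm -rf "$dir"
}
```

Without the polling loop the worker would try to read "$dir/sigmask" before it exists, which is the shell-level counterpart of the early access described above.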

Any comments would be appreciated.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>


* Re: GNU make losing jobserver tokens
  2022-04-01  8:45 ` Takashi Yano
@ 2022-04-27 14:13   ` Takashi Yano
  2022-04-28 13:42     ` Ken Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Takashi Yano @ 2022-04-27 14:13 UTC (permalink / raw)
  To: cygwin

On Fri, 1 Apr 2022 17:45:51 +0900
Takashi Yano wrote:
> On Mon, 21 Mar 2022 15:28:17 +0100
> Magnus Ihse Bursie wrote:
> > Hi,
> > 
> > I'm working for Oracle on the OpenJDK build team. We're using GNU make 
> > to build the JDK on all supported platforms. For Windows, we use Cygwin 
> > as our build environment, including the Cygwin version of GNU make.
> > 
> > We have had a long-standing issue with make losing jobserver tokens. 
> > ("long-standing" here means for years, and years, at least since GNU 
> > make 4.0, up to and including the current latest version in Cygwin.)
> > 
> > Most runs end with something like:
> > 
> > make[2]: INTERNAL: Exiting with 11 jobserver tokens available; should be 
> > 12!
> > 
> > Since the build still succeeds, and it just affects performance (and 
> > typically not that much), we have not spent too much time getting to the
> > bottom of this.
> > 
> > Now, however, I've come across a machine where this happens repeatedly, 
> > and on a much worse scale:
> > 
> > make[2]: INTERNAL: Exiting with 1 jobserver tokens available; should be 24!
> > 
> > This effectively turns the highly parallelized builds into 
> > single-threaded builds, and is absolutely detrimental for performance. 
> > On the flip side, this also makes for the perfect testing environment to 
> > really get to the bottom of this issue.
> > 
> > I started out by sending a question to bug-make@gnu.org. The folks over 
> > there reported that this was not a known problem with GNU make on 
> > Windows in general, and that as far as they knew, the mingw port did not 
> > suffer from this problem.
> > 
> > Instead, they suggested that it was a Cygwin-specific problem, possibly 
> > related to issues with emulating Posix pipes and/or signals in Cygwin.
> > 
> > So, my first question is: Is this a known problem in Cygwin GNU make? 
> > Are there any workarounds/fixes to get around it?
> > 
> > Otherwise: Any suggestions on how to go on and debug this? I am willing 
> > to build and test an instrumented debug build of make, but I will need 
> > assistance to find my way around the source and spot likely candidates 
> > for the source of the problem.
> 
> I have tried to reproduce the issue by building OpenJDK
> from source, however, I could not.
> 
> Instead, I encountered another issue.
> 
> Building OpenJDK sometimes (rarely) failed with error such as:
> 
>       0 [sig] make 5484 sig_send: error sending signal 11, pid 5484, pipe handle 0x118, nb 0, packsize 176, Win32 error 0
>  124917 [main] make 5484 sig_send: error sending signal -72, pid 5484, pipe handle 0x118, nb 0, packsize 176, Win32 error 0
> common/modules/GensrcModuleInfo.gmk:77: *** open: /home/yano/jdk/build/windows-x86-server-release/make-support/vardeps/make/common/modules/GensrcModuleInfo.gmk/jdk.accessibility/ALL_MODULES.vardeps: No such file or directory.  Stop.
> make[2]: *** [make/Main.gmk:141: jdk.accessibility-gensrc-moduleinfo] Error 2
> make[2]: *** Waiting for unfinished jobs....
> 
> 
> I looked into this new problem and found that wait_sig() thread
> crashes with segfault. It seems that accessing _main_tls causes
> access violation if a signal is sent just after the process is
> started.
> 
> static void WINAPI
> wait_sig (VOID *)
> {
>   [...]
>       if (!pack.mask)
> 	{
> 	  tl_entry = cygheap->find_tls (_main_tls);
> 	  dummy_mask = _main_tls->sigmask;       // <--- Segfault here
> 	  cygheap->unlock_tls (tl_entry);
> 	  pack.mask = &dummy_mask;
> 	}
> 
> I also found the following patch resolves the issue.
> 
> diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc
> index 62df96652..3824af199 100644
> --- a/winsup/cygwin/sigproc.cc
> +++ b/winsup/cygwin/sigproc.cc
> @@ -1325,6 +1325,10 @@ wait_sig (VOID *)
>    _sig_tls = &_my_tls;
>    bool sig_held = false;
>  
> +  /* Wait for _main_tls initialization. */
> +  while (!cygwin_finished_initializing)
> +    Sleep (10);
> +
>    sigproc_printf ("entering ReadFile loop, my_readsig %p, my_sendsig %p",
>  		  my_readsig, my_sendsig);
>  
> 
> I guess _main_tls may not be initialized correctly until
> cygwin_finished_initializing is set.
> 
> Any comments would be appreciated.

Ping?

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>


* Re: GNU make losing jobserver tokens
  2022-04-27 14:13   ` Takashi Yano
@ 2022-04-28 13:42     ` Ken Brown
  2022-04-28 14:09       ` Corinna Vinschen
  0 siblings, 1 reply; 18+ messages in thread
From: Ken Brown @ 2022-04-28 13:42 UTC (permalink / raw)
  To: cygwin

On 4/27/2022 10:13 AM, Takashi Yano wrote:
> On Fri, 1 Apr 2022 17:45:51 +0900
> Takashi Yano wrote:
>> I have tried to reproduce the issue by building OpenJDK
>> from source, however, I could not.
>>
>> Instead, I encountered another issue.
>>
>> Building OpenJDK sometimes (rarely) failed with error such as:
>>
>>        0 [sig] make 5484 sig_send: error sending signal 11, pid 5484, pipe handle 0x118, nb 0, packsize 176, Win32 error 0
>>   124917 [main] make 5484 sig_send: error sending signal -72, pid 5484, pipe handle 0x118, nb 0, packsize 176, Win32 error 0
>> common/modules/GensrcModuleInfo.gmk:77: *** open: /home/yano/jdk/build/windows-x86-server-release/make-support/vardeps/make/common/modules/GensrcModuleInfo.gmk/jdk.accessibility/ALL_MODULES.vardeps: No such file or directory.  Stop.
>> make[2]: *** [make/Main.gmk:141: jdk.accessibility-gensrc-moduleinfo] Error 2
>> make[2]: *** Waiting for unfinished jobs....
>>
>>
>> I looked into this new problem and found that wait_sig() thread
>> crashes with segfault. It seems that accessing _main_tls causes
>> access violation if a signal is sent just after the process is
>> started.
>>
>> static void WINAPI
>> wait_sig (VOID *)
>> {
>>    [...]
>>        if (!pack.mask)
>> 	{
>> 	  tl_entry = cygheap->find_tls (_main_tls);
>> 	  dummy_mask = _main_tls->sigmask;       // <--- Segfault here
>> 	  cygheap->unlock_tls (tl_entry);
>> 	  pack.mask = &dummy_mask;
>> 	}
>>
>> I also found the following patch resolves the issue.
>>
>> diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc
>> index 62df96652..3824af199 100644
>> --- a/winsup/cygwin/sigproc.cc
>> +++ b/winsup/cygwin/sigproc.cc
>> @@ -1325,6 +1325,10 @@ wait_sig (VOID *)
>>     _sig_tls = &_my_tls;
>>     bool sig_held = false;
>>   
>> +  /* Wait for _main_tls initialization. */
>> +  while (!cygwin_finished_initializing)
>> +    Sleep (10);
>> +
>>     sigproc_printf ("entering ReadFile loop, my_readsig %p, my_sendsig %p",
>>   		  my_readsig, my_sendsig);
>>   
>>
>> I guess _main_tls may not be initialized correctly until
>> cygwin_finished_initializing is set.
>>
>> Any comments would be appreciated.

This seems reasonable to me.

Ken


* Re: GNU make losing jobserver tokens
  2022-04-28 13:42     ` Ken Brown
@ 2022-04-28 14:09       ` Corinna Vinschen
  2022-04-28 15:01         ` Takashi Yano
  0 siblings, 1 reply; 18+ messages in thread
From: Corinna Vinschen @ 2022-04-28 14:09 UTC (permalink / raw)
  To: cygwin

On Apr 28 09:42, Ken Brown wrote:
> On 4/27/2022 10:13 AM, Takashi Yano wrote:
> > On Fri, 1 Apr 2022 17:45:51 +0900
> > Takashi Yano wrote:
> > > I have tried to reproduce the issue by building OpenJDK
> > > from source, however, I could not.
> > > 
> > > Instead, I encountered another issue.
> > > 
> > > Building OpenJDK sometimes (rarely) failed with error such as:
> > > 
> > >        0 [sig] make 5484 sig_send: error sending signal 11, pid 5484, pipe handle 0x118, nb 0, packsize 176, Win32 error 0
> > >   124917 [main] make 5484 sig_send: error sending signal -72, pid 5484, pipe handle 0x118, nb 0, packsize 176, Win32 error 0
> > > common/modules/GensrcModuleInfo.gmk:77: *** open: /home/yano/jdk/build/windows-x86-server-release/make-support/vardeps/make/common/modules/GensrcModuleInfo.gmk/jdk.accessibility/ALL_MODULES.vardeps: No such file or directory.  Stop.
> > > make[2]: *** [make/Main.gmk:141: jdk.accessibility-gensrc-moduleinfo] Error 2
> > > make[2]: *** Waiting for unfinished jobs....
> > > 
> > > 
> > > I looked into this new problem and found that wait_sig() thread
> > > crashes with segfault. It seems that accessing _main_tls causes
> > > access violation if a signal is sent just after the process is
> > > started.
> > > 
> > > static void WINAPI
> > > wait_sig (VOID *)
> > > {
> > >    [...]
> > >        if (!pack.mask)
> > > 	{
> > > 	  tl_entry = cygheap->find_tls (_main_tls);
> > > 	  dummy_mask = _main_tls->sigmask;       // <--- Segfault here
> > > 	  cygheap->unlock_tls (tl_entry);
> > > 	  pack.mask = &dummy_mask;
> > > 	}
> > > 
> > > I also found the following patch resolves the issue.
> > > 
> > > diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc
> > > index 62df96652..3824af199 100644
> > > --- a/winsup/cygwin/sigproc.cc
> > > +++ b/winsup/cygwin/sigproc.cc
> > > @@ -1325,6 +1325,10 @@ wait_sig (VOID *)
> > >     _sig_tls = &_my_tls;
> > >     bool sig_held = false;
> > > +  /* Wait for _main_tls initialization. */
> > > +  while (!cygwin_finished_initializing)
> > > +    Sleep (10);
> > > +
> > >     sigproc_printf ("entering ReadFile loop, my_readsig %p, my_sendsig %p",
> > >   		  my_readsig, my_sendsig);
> > > 
> > > I guess _main_tls may not be initialized correctly until
> > > cygwin_finished_initializing is set.
> > > 
> > > Any comments would be appreciated.
> 
> This seems reasonable to me.

Missed that, sorry. I agree this seems reasonable, but wouldn't it be
cleaner if we *start* wait_sig only after cygwin_finished_initializing
is set to true?


Corinna


* Re: GNU make losing jobserver tokens
  2022-04-28 14:09       ` Corinna Vinschen
@ 2022-04-28 15:01         ` Takashi Yano
  2022-04-28 15:32           ` Corinna Vinschen
  0 siblings, 1 reply; 18+ messages in thread
From: Takashi Yano @ 2022-04-28 15:01 UTC (permalink / raw)
  To: cygwin

On Thu, 28 Apr 2022 16:09:24 +0200
Corinna Vinschen wrote:
> On Apr 28 09:42, Ken Brown wrote:
> > On 4/27/2022 10:13 AM, Takashi Yano wrote:
> > > On Fri, 1 Apr 2022 17:45:51 +0900
> > > Takashi Yano wrote:
> > > > I have tried to reproduce the issue by building OpenJDK
> > > > from source, however, I could not.
> > > > 
> > > > Instead, I encountered another issue.
> > > > 
> > > > Building OpenJDK sometimes (rarely) failed with error such as:
> > > > 
> > > >        0 [sig] make 5484 sig_send: error sending signal 11, pid 5484, pipe handle 0x118, nb 0, packsize 176, Win32 error 0
> > > >   124917 [main] make 5484 sig_send: error sending signal -72, pid 5484, pipe handle 0x118, nb 0, packsize 176, Win32 error 0
> > > > common/modules/GensrcModuleInfo.gmk:77: *** open: /home/yano/jdk/build/windows-x86-server-release/make-support/vardeps/make/common/modules/GensrcModuleInfo.gmk/jdk.accessibility/ALL_MODULES.vardeps: No such file or directory.  Stop.
> > > > make[2]: *** [make/Main.gmk:141: jdk.accessibility-gensrc-moduleinfo] Error 2
> > > > make[2]: *** Waiting for unfinished jobs....
> > > > 
> > > > 
> > > > I looked into this new problem and found that wait_sig() thread
> > > > crashes with segfault. It seems that accessing _main_tls causes
> > > > access violation if a signal is sent just after the process is
> > > > started.
> > > > 
> > > > static void WINAPI
> > > > wait_sig (VOID *)
> > > > {
> > > >    [...]
> > > >        if (!pack.mask)
> > > > 	{
> > > > 	  tl_entry = cygheap->find_tls (_main_tls);
> > > > 	  dummy_mask = _main_tls->sigmask;       // <--- Segfault here
> > > > 	  cygheap->unlock_tls (tl_entry);
> > > > 	  pack.mask = &dummy_mask;
> > > > 	}
> > > > 
> > > > I also found the following patch resolves the issue.
> > > > 
> > > > diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc
> > > > index 62df96652..3824af199 100644
> > > > --- a/winsup/cygwin/sigproc.cc
> > > > +++ b/winsup/cygwin/sigproc.cc
> > > > @@ -1325,6 +1325,10 @@ wait_sig (VOID *)
> > > >     _sig_tls = &_my_tls;
> > > >     bool sig_held = false;
> > > > +  /* Wait for _main_tls initialization. */
> > > > +  while (!cygwin_finished_initializing)
> > > > +    Sleep (10);
> > > > +
> > > >     sigproc_printf ("entering ReadFile loop, my_readsig %p, my_sendsig %p",
> > > >   		  my_readsig, my_sendsig);
> > > > 
> > > > I guess _main_tls may not be initialized correctly until
> > > > cygwin_finished_initializing is set.
> > > > 
> > > > Any comments would be appreciated.
> > 
> > This seems reasonable to me.

Thanks Ken and Corinna.

> Missed that, sorry. I agree this seems reasonable, but wouldn't it be
> cleaner if we *start* wait_sig only after cygwin_finished_initializing
> is set to true?

I also thought so; however, there is a comment in dcrt0.cc
as follows. So, there seems to be some reason to start
wait_sig before cygwin_finished_initializing is set.

  /* Initialize signal processing here, early, in the hopes that the creation
     of a thread early in the process will cause more predictability in memory
     layout for the main thread. */
  if (!dynamically_loaded)
    sigproc_init ();
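
For readers following along, the patch quoted above boils down to an early-started thread that polls a flag before it touches data the main thread is still setting up. Below is a minimal standalone sketch of that pattern in portable C++. It is not Cygwin code: tls_area, main_tls, finished_initializing, signal_worker and run_demo are hypothetical stand-ins for _main_tls, cygwin_finished_initializing and wait_sig, and std::atomic is used here only to keep the sketch well-defined.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-ins for the names in the patch; not Cygwin code.
struct tls_area { unsigned sigmask; };

static tls_area *main_tls = nullptr;                    // role of _main_tls
static std::atomic<bool> finished_initializing{false};  // role of cygwin_finished_initializing

// Started early, like wait_sig(): must not dereference main_tls
// until the main thread has published it.
static unsigned signal_worker ()
{
  while (!finished_initializing.load (std::memory_order_acquire))
    std::this_thread::sleep_for (std::chrono::milliseconds (10)); // like Sleep (10)
  return main_tls->sigmask;   // initialization is complete at this point
}

static unsigned run_demo ()
{
  unsigned seen = 0;
  // The worker is created before initialization has finished.
  std::thread worker ([&seen] { seen = signal_worker (); });

  // Simulated slow startup in the main thread.
  std::this_thread::sleep_for (std::chrono::milliseconds (50));
  static tls_area area = { 0x2au };
  main_tls = &area;
  finished_initializing.store (true, std::memory_order_release);

  worker.join ();
  return seen;
}
```

Calling run_demo () from main () and building with thread support (for example -pthread) should be enough to try it. Without the polling loop, signal_worker could dereference a null main_tls, which is the analogue of the access violation described above.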


-- 
Takashi Yano <takashi.yano@nifty.ne.jp>


* Re: GNU make losing jobserver tokens
  2022-04-28 15:01         ` Takashi Yano
@ 2022-04-28 15:32           ` Corinna Vinschen
  2022-04-29  9:10             ` Takashi Yano
  0 siblings, 1 reply; 18+ messages in thread
From: Corinna Vinschen @ 2022-04-28 15:32 UTC (permalink / raw)
  To: cygwin

On Apr 29 00:01, Takashi Yano wrote:
> On Thu, 28 Apr 2022 16:09:24 +0200
> Corinna Vinschen wrote:
> > On Apr 28 09:42, Ken Brown wrote:
> > > On 4/27/2022 10:13 AM, Takashi Yano wrote:
> > > > On Fri, 1 Apr 2022 17:45:51 +0900
> > > > Takashi Yano wrote:
> > > > > [...]
> > > > > diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc
> > > > > index 62df96652..3824af199 100644
> > > > > --- a/winsup/cygwin/sigproc.cc
> > > > > +++ b/winsup/cygwin/sigproc.cc
> > > > > @@ -1325,6 +1325,10 @@ wait_sig (VOID *)
> > > > >     _sig_tls = &_my_tls;
> > > > >     bool sig_held = false;
> > > > > +  /* Wait for _main_tls initialization. */
> > > > > +  while (!cygwin_finished_initializing)
> > > > > +    Sleep (10);
> > > > > +
> > > > >     sigproc_printf ("entering ReadFile loop, my_readsig %p, my_sendsig %p",
> > > > >   		  my_readsig, my_sendsig);
> > > > > 
> > > > > I guess _main_tls may not be initialized correctly until
> > > > > cygwin_finished_initializing is set.
> > > > > 
> > > > > Any comments would be appreciated.
> > > 
> > > This seems reasonable to me.
> 
> Thanks Ken and Corinna.
> 
> > Missed that, sorry. I agree this seems reasonable, but wouldn't it be
> > cleaner if we *start* wait_sig only after cygwin_finished_initializing
> > is set to true?
> 
> I thought so too; however, there is the following comment in dcrt0.cc,
> so there seems to be some reason to start wait_sig before
> cygwin_finished_initializing is set.
> 
>   /* Initialize signal processing here, early, in the hopes that the creation
>      of a thread early in the process will cause more predictability in memory
>      layout for the main thread. */
>   if (!dynamically_loaded)
>     sigproc_init ();

This is a 32-bit only problem. The 64 bit address space layout is as
predictable as can be.  Maybe the above fix should go into 3.3 and for
3.4 we try differently?


Corinna


* Re: GNU make losing jobserver tokens
  2022-04-28 15:32           ` Corinna Vinschen
@ 2022-04-29  9:10             ` Takashi Yano
  2022-04-30 21:51               ` Ken Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Takashi Yano @ 2022-04-29  9:10 UTC (permalink / raw)
  To: cygwin

On Thu, 28 Apr 2022 17:32:22 +0200
Corinna Vinschen wrote:
> On Apr 29 00:01, Takashi Yano wrote:
> > On Thu, 28 Apr 2022 16:09:24 +0200
> > Corinna Vinschen wrote:
> > > On Apr 28 09:42, Ken Brown wrote:
> > > > On 4/27/2022 10:13 AM, Takashi Yano wrote:
> > > > > On Fri, 1 Apr 2022 17:45:51 +0900
> > > > > Takashi Yano wrote:
> > > > > > [...]
> > > > > > diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc
> > > > > > index 62df96652..3824af199 100644
> > > > > > --- a/winsup/cygwin/sigproc.cc
> > > > > > +++ b/winsup/cygwin/sigproc.cc
> > > > > > @@ -1325,6 +1325,10 @@ wait_sig (VOID *)
> > > > > >     _sig_tls = &_my_tls;
> > > > > >     bool sig_held = false;
> > > > > > +  /* Wait for _main_tls initialization. */
> > > > > > +  while (!cygwin_finished_initializing)
> > > > > > +    Sleep (10);
> > > > > > +
> > > > > >     sigproc_printf ("entering ReadFile loop, my_readsig %p, my_sendsig %p",
> > > > > >   		  my_readsig, my_sendsig);
> > > > > > 
> > > > > > I guess _main_tls may not be initialized correctly until
> > > > > > cygwin_finished_initializing is set.
> > > > > > 
> > > > > > Any comments would be appreciated.
> > > > 
> > > > This seems reasonable to me.
> > 
> > Thanks Ken and Corinna.
> > 
> > > Missed that, sorry. I agree this seems reasonable, but wouldn't it be
> > > cleaner if we *start* wait_sig only after cygwin_finished_initializing
> > > is set to true?
> > 
> > I thought so too; however, there is the following comment in dcrt0.cc,
> > so there seems to be some reason to start wait_sig before
> > cygwin_finished_initializing is set.
> > 
> >   /* Initialize signal processing here, early, in the hopes that the creation
> >      of a thread early in the process will cause more predictability in memory
> >      layout for the main thread. */
> >   if (!dynamically_loaded)
> >     sigproc_init ();
> 
> This is a 32-bit only problem. The 64 bit address space layout is as
> predictable as can be.  Maybe the above fix should go into 3.3 and for
> 3.4 we try differently?

I tried moving the sigproc_init() call from dll_crt0_0() to
fork::child() for 64-bit Cygwin; however, that causes a hang
at Cygwin startup.

Am I missing something?

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>


* Re: GNU make losing jobserver tokens
  2022-04-29  9:10             ` Takashi Yano
@ 2022-04-30 21:51               ` Ken Brown
  2022-05-01  0:51                 ` Takashi Yano
  0 siblings, 1 reply; 18+ messages in thread
From: Ken Brown @ 2022-04-30 21:51 UTC (permalink / raw)
  To: cygwin

On 4/29/2022 5:10 AM, Takashi Yano wrote:
> On Thu, 28 Apr 2022 17:32:22 +0200
> I tried moving the sigproc_init() call from dll_crt0_0() to
> fork::child() for 64-bit Cygwin; however, that causes a hang
> at Cygwin startup.
> 
> Am I missing something?

I've never looked into the Cygwin startup code, so just ignore me if what I say 
is nonsense.

Currently sigproc_init is called either from dll_crt0_0 or from dll_crt0_1, 
depending on the value of dynamically_loaded.  What would happen if you always 
call it from dll_crt0_1, right after

   cygwin_finished_initializing = true;

Ken


* Re: GNU make losing jobserver tokens
  2022-04-30 21:51               ` Ken Brown
@ 2022-05-01  0:51                 ` Takashi Yano
  0 siblings, 0 replies; 18+ messages in thread
From: Takashi Yano @ 2022-05-01  0:51 UTC (permalink / raw)
  To: cygwin

On Sat, 30 Apr 2022 17:51:03 -0400
Ken Brown wrote:
> On 4/29/2022 5:10 AM, Takashi Yano wrote:
> > On Thu, 28 Apr 2022 17:32:22 +0200
> > I tried moving the sigproc_init() call from dll_crt0_0() to
> > fork::child() for 64-bit Cygwin; however, that causes a hang
> > at Cygwin startup.
> > 
> > Am I missing something?
> 
> I've never looked into the Cygwin startup code, so just ignore me if what I say 
> is nonsense.
> 
> Currently sigproc_init is called either from dll_crt0_0 or from dll_crt0_1, 
> depending on the value of dynamically_loaded.  What would happen if you always 
> call it from dll_crt0_1, right after
> 
>    cygwin_finished_initializing = true;

Thanks for the advice.
Unfortunately, that causes a hang at Cygwin startup because fork() fails :(
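
For reference, the ordering Ken suggested (create the signal thread only after initialization has finished) looks like this in a standalone sketch. Again this is portable C++ with hypothetical stand-in names (tls_area, main_tls, signal_worker, run_deferred_start), not Cygwin code, and it does not model fork(), which is where the real attempt runs into trouble.

```cpp
#include <thread>

// Hypothetical stand-ins; not Cygwin code.
struct tls_area { unsigned sigmask; };

static tls_area *main_tls = nullptr;   // role of _main_tls

// No polling needed here: the thread does not exist until main_tls is set.
static unsigned signal_worker ()
{
  return main_tls->sigmask;
}

static unsigned run_deferred_start ()
{
  // Initialization completes first ...
  static tls_area area = { 0x2au };
  main_tls = &area;

  // ... and only then is the worker created. Thread creation orders the
  // write to main_tls before anything the new thread does.
  unsigned seen = 0;
  std::thread worker ([&seen] { seen = signal_worker (); });
  worker.join ();
  return seen;
}
```

In this toy form the deferred start needs no flag at all, which is presumably why it looks cleaner. The sketch says nothing about why the real startup sequence depends on the signal thread existing earlier.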

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>


Thread overview: 18+ messages
2022-03-21 14:28 GNU make losing jobserver tokens Magnus Ihse Bursie
2022-03-21 15:09 ` Ken Brown
2022-03-22  6:54   ` Noel Grandin
2022-03-22 17:52     ` GNU make losing jobserver tokens in pipes Brian Inglis
2022-03-22 19:38   ` checking cyg version (was Re: GNU make losing jobserver tokens) L A Walsh
2022-03-22 21:58     ` Adam Dinwoodie
2022-03-22 23:06     ` Mark Geisert
2022-03-23 17:47       ` Samuel Lelièvre
2022-03-23  6:24 ` GNU make losing jobserver tokens Roumen Petrov
2022-04-01  8:45 ` Takashi Yano
2022-04-27 14:13   ` Takashi Yano
2022-04-28 13:42     ` Ken Brown
2022-04-28 14:09       ` Corinna Vinschen
2022-04-28 15:01         ` Takashi Yano
2022-04-28 15:32           ` Corinna Vinschen
2022-04-29  9:10             ` Takashi Yano
2022-04-30 21:51               ` Ken Brown
2022-05-01  0:51                 ` Takashi Yano
