fork failure?

public inbox for cygwin@cygwin.com

* fork failure?
@ 2009-10-15 14:34 Charles Wilson
  2009-10-15 14:56 ` Dave Korn
  0 siblings, 1 reply; 28+ messages in thread
From: Charles Wilson @ 2009-10-15 14:34 UTC (permalink / raw)
  To: cygwin

I'm trying to track down a rather weird problem. I don't have an STC
because none of my attempted STCs exhibit the problem.  Schematically, I
have:

=============================================
pid_t pid = fork();

if (pid < 0) {
  printf("fork failed\n");
  exit (1);
}

if (pid > 0) {
  /* parent */
  printf("parent: child pid=%d\n", pid);
  sleep(30);
  exit(0);
}

/* child */
printf("child lives!");
sleep(30);
exit(0);
=============================================

although the actual app is much more complicated, AND an STC based on
the above actually works as expected. But, in the actual app, what
happens is:

1) fork appears to succeed, because I see:
parent: child pid=xxx

2) but fork fails, because I never see:
child lives!
Worse, I never see the pid xxx show up in 'ps' output, even though I
have plenty of time (30 seconds) to spot it (and STCs based on the above
simplification do work as expected; I see 'child lives!' and the process
shows up in 'ps').

I even tried (in the actual app) other methods of indication than
printing to some FILE*, since maybe stdout got scrogged by fork, such as:
a) writing to an inherited fd other than 0,1,2
b) writing to a brand new fd or FILE* opened by the child
c) simply touching a "sentinel" file to change its atime
Nothing. For all I can tell, the fork() fails to *actually* produce a
child process -- even though the *parent* seems to think one has been
created, and has a pid for the so-called child. Which doesn't actually
exist.

Can anybody think of a reason that might cause this behavior? I'm using
stock cygwin-1.7.0-62...

--
Chuck

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: fork failure?
  2009-10-15 14:34 fork failure? Charles Wilson
@ 2009-10-15 14:56 ` Dave Korn
  2009-10-15 15:54   ` Charles Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Korn @ 2009-10-15 14:56 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:

> Nothing. For all I can tell, the fork() fails to *actually* produce a
> child process -- even though the *parent* seems to think one has been
> created, and has a pid for the so-called child. Which doesn't actually
> exist.
> 
> Can anybody think of a reason that might cause this behavior? I'm using
> stock cygwin-1.7.0-62...

  Child gets started long enough to communicate with parent but then dies
subsequently, still in early init, but late enough that the parent process
create retries don't kick in.  Don't hang about; get strace or procmon
straight on the case with this, find out definitively if that process actually
does get created or not.  Oh, and maybe try a --enable-debugging build and set
CYGWIN_DEBUG so you can intercept it if it does get started.
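
One cheap way to confirm from the parent side that a child existed and then died is to reap it and decode the wait status. The following is only a minimal POSIX sketch under that assumption, not code from the application being discussed; fork_and_reap, child_ok and child_killed are hypothetical names, and the parent is assumed to be free to block in waitpid().

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, run child_fn() in the child, then reap it.
   Returns the child's exit code (0..255), the negated signal number if
   a signal killed it, or -1000 if fork() or waitpid() itself failed. */
int fork_and_reap(int (*child_fn)(void))
{
    int status;
    pid_t pid;

    assert(child_fn != NULL);
    pid = fork();
    if (pid < 0)
        return -1000;
    if (pid == 0)
        _exit(child_fn());      /* child: leave without running atexit handlers */

    /* parent: retry if a signal interrupts the wait */
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1000;

    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;
}

/* Example children (hypothetical). */
int child_ok(void)     { return 0; }
int child_killed(void) { raise(SIGKILL); return 0; }
```

A negative result other than -1000 means the child was created and then killed by a signal, which is a different failure from fork() never producing a process at all.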

    cheers,
      DaveK


* Re: fork failure?
  2009-10-15 14:56 ` Dave Korn
@ 2009-10-15 15:54   ` Charles Wilson
  2009-10-15 16:35     ` Dave Korn
  0 siblings, 1 reply; 28+ messages in thread
From: Charles Wilson @ 2009-10-15 15:54 UTC (permalink / raw)
  To: cygwin

Dave Korn wrote:
>   Child gets started long enough to communicate with parent but then dies
> subsequently, still in early init, but late enough that the parent process
> create retries don't kick in.  Don't hang about; get strace or procmon
> straight on the case with this, find out definitively if that process actually
> does get created or not.  Oh, and maybe try a --enable-debugging build and set
> CYGWIN_DEBUG so you can intercept it if it does get started.

Well, here's where the complications of the real program kick in: it's a
daemon, which itself forked earlier in order to dissociate from the
controlling terminal.  At this point in the process, it's trying to
fork/exec a different program as a slave to do some work, and
communicate results back to the main process.

Anyway, because the daemon itself was earlier forked, I can't use
'strace <cmd line>' because that only traces the initial process, not
the daemonized one.  If I launch <cmd line> and then attach strace to
the eventual pid of the daemonized process, it hangs (both strace and
the process).  For some reason, if I launch the original program in
non-daemon mode, I can't get it to work at all, strace or not -- I'm
probably invoking it incorrectly, but I can't see how from the man page.

But that's what's so strange: whatever this is, occurs *after* the
fork() in the child -- if there IS a child -- but before the exec() of
the slave.  So it's not like it can't find the slave program -- it
hasn't even looked yet...
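
A common POSIX trick for telling "child never ran", "exec failed" and "exec succeeded" apart is a close-on-exec pipe from child to parent. What follows is a sketch under that assumption only; it is not how gpg-agent or libassuan do it, and spawn_checked is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork and exec argv[0].
   Returns  0 if the child ran and exec succeeded,
            the child's errno (> 0) if exec failed,
           -1 if pipe/fcntl/fork failed in the parent,
           -2 if the child never reported in (died right after fork). */
int spawn_checked(char *const argv[])
{
    int fds[2], err = 0, status;
    char buf[1 + sizeof(int)];
    size_t got = 0;
    ssize_t n;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    if (pipe(fds) < 0)
        return -1;
    /* A successful exec closes the write end automatically. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0 || (pid = fork()) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        if (write(fds[1], "A", 1) != 1)     /* "child lives" marker */
            _exit(125);
        execvp(argv[0], argv);
        err = errno;                        /* only reached if exec failed */
        if (write(fds[1], &err, sizeof err) < 0)
            _exit(126);
        _exit(127);
    }
    close(fds[1]);
    while (got < sizeof buf) {              /* read until EOF or buffer full */
        n = read(fds[0], buf + got, sizeof buf - got);
        if (n > 0)
            got += (size_t) n;
        else if (n == 0 || errno != EINTR)
            break;
    }
    close(fds[0]);
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
        ;
    if (got == 0)
        return -2;
    if (got == sizeof buf) {
        memcpy(&err, buf + 1, sizeof err);
        return err;
    }
    return 0;
}
```

The marker byte is written with write() rather than stdio, so it does not depend on any FILE* surviving the fork. Note that the helper waits for the child to finish, which suits a short-lived slave but not a long-running one.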

I'm not familiar at all with procmon (sysinternals, right?) but I'll
look into it.

P.S. I fear I'm doing something wrong in my cygwin CVS builds, because
the last several times I have done so I've gotten weird behaviors after
installing the dll and .a -- /some/ newly built progs against the new
DLL die in weird ways, that do NOT occur when using the official
snapshots from the same source checkout.  This just started happening
very recently, but I'm not sure what changed (this has nothing to do
with the recent pseudo-reloc stuff, because I rolled back to before that
and still see the same problem).  So it's not the cygwin source code,
it's my build/install procedure somehow:

  ${src}/configure \
    --srcdir=${src} --prefix=/usr \
    --exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib \
    --includedir=/nonexistent/include \
    --with-included-gettext
  make CFLAGS="-ggdb -O0" tooldir=/usr && \
  make info CFLAGS="-ggdb -O0" tooldir=/usr
  make install prefix=$inst/usr exec_prefix=$inst/usr \
    bindir=$inst/usr/bin libdir=$inst/usr/lib \
    sysconfdir=$inst/etc includedir=$inst/usr/include \
    tooldir=$inst/usr && \
  make install-info prefix=$inst/usr exec_prefix=$inst/usr \
    bindir=$inst/usr/bin libdir=$inst/usr/lib \
    sysconfdir=$inst/etc includedir=$inst/usr/include \
    tooldir=$inst/usr

...is what I've been doing, and it's worked for years until recently.
IIRC, it was taken by inspection of mknetrel, at some point in the
distant past. I notice that it differs from
http://cygwin.com/faq/faq.programming.html#faq.programming.building-cygwin

I do not have cocom installed, nor docbook-xml42, docbook-xsl, and xmlto.

Care to post your recipe, Dave? I'm sure it's more up-to-date than the
FAQ...

--
Chuck


* Re: fork failure?
  2009-10-15 15:54   ` Charles Wilson
@ 2009-10-15 16:35     ` Dave Korn
  2009-10-15 17:07       ` Charles Wilson
  2009-10-15 23:33       ` Charles Wilson
  0 siblings, 2 replies; 28+ messages in thread
From: Dave Korn @ 2009-10-15 16:35 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:

> the daemonized one.  If I launch <cmd line> and then attach strace to
> the eventual pid of the daemonized process, it hangs (both strace and
> the process).  

  How about "gdb --attach PID"?  Does that succeed?  GDB has the advantage of
being a Cygwin rather than Win32 exe, which might make it work better when
taking hold of the running process.

> For some reason, if I launch the original program in
> non-daemon mode, I can't get it to work at all, strace or not -- I'm
> probably invoking it incorrectly, but I can't see how from the man page.

  Well, that's pretty dubious right there; I'd focus on solving that problem
first, you want to be sure you've got all the basics correctly working before
you try to debug it in a more complicated environment such as running daemonized.

> I'm not familiar at all with procmon (sysinternals, right?) but I'll
> look into it.

  Yep, it's dead useful for making sure that stuff at least starts up, and you
can often get a clue how far the code has got by seeing what handles it's
opened and syscalls it's made.

> P.S. I fear I'm doing something wrong in my cygwin CVS builds, 

  Didn't spot anything terribly suspicious there I'm afraid.

> Care to post your recipe, Dave? I'm sure it's more up-to-date than the
> FAQ...

  It's nothing special: roughly, since I'm doing this purely from memory
untested, it goes like so -

${src}/configure -v --prefix=/usr
make -j6
make DESTDIR=`pwd`/.inst install
cd .inst/usr
mv etc ..
mv i686-pc-cygwin/{include,lib} .
rmdir i686-pc-cygwin
rm include/iconv.h
^^^^^^^^^^^^^^^^^^^^^ v.important that last one
cd bin/
cp ../../i686-pc-cygwin/winsup/cygwin/cygwin1.dbg .
rename cygwin1 cygwin1-`date +%Y%m%d` cygwin1*
cd ../..
tar cfvj cygwin1-`date +%Y%m%d`.tar.bz2 etc/ usr/
tar -C / -xjf cygwin1-`date +%Y%m%d`.tar.bz2

  Then I exit my final bash shell and rename the new dll and dbg files in
place using cmd.exe.

    cheers,
      DaveK


* Re: fork failure?
  2009-10-15 16:35     ` Dave Korn
@ 2009-10-15 17:07       ` Charles Wilson
  2009-10-15 17:21         ` Charles Wilson
  2009-10-15 17:33         ` Christopher Faylor
  2009-10-15 23:33       ` Charles Wilson
  1 sibling, 2 replies; 28+ messages in thread
From: Charles Wilson @ 2009-10-15 17:07 UTC (permalink / raw)
  To: cygwin

Dave Korn wrote:
> Charles Wilson wrote:
>> the daemonized one.  If I launch <cmd line> and then attach strace to
>> the eventual pid of the daemonized process, it hangs (both strace and
>> the process).  
> 
>   How about "gdb --attach PID"?  Does that succeed?  GDB has the advantage of
> being a Cygwin rather than Win32 exe, which might make it work better when
> taking hold of the running process.

Well, since we don't support set fork-follow-child, I'm stuck in the
parent (and I don't get far enough in the child to reach a 'sleep(N)'
call, so the typical attach-after-fork approach won't work either).  And
the parent thinks everything is fine.  I haven't tried linking against a
debug-built DLL so as to step into the fork() call itself (hmm...why
aren't the cygwin1.dbg files available for the 1.7.0-nn releases? They
used to be shipped with 1.5.2x releases...) but I don't think that will
show me anything relevant. Again, I'm stuck in the parent process...
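
For anyone unfamiliar with the attach-after-fork approach mentioned above: the child parks itself right after fork() until a debugger attaches and releases it. A minimal sketch follows, with hypothetical names (wait_for_debugger, debugger_attached). It is of limited use in this particular case, since the child apparently dies before it could reach any such call.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Flip this from the debugger, e.g. in gdb: set var debugger_attached = 1 */
volatile int debugger_attached = 0;

/* Hypothetical helper: call first thing in the child after fork().
   Polls once per second for up to max_seconds.
   Returns 1 if a debugger released us, 0 if we timed out. */
int wait_for_debugger(unsigned max_seconds)
{
    unsigned waited = 0;

    assert(max_seconds <= 3600);    /* guard against accidental huge waits */
    fprintf(stderr, "pid %ld: waiting for debugger\n", (long) getpid());
    while (!debugger_attached && waited < max_seconds) {
        sleep(1);
        waited++;
    }
    return debugger_attached ? 1 : 0;
}
```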

>> For some reason, if I launch the original program in
>> non-daemon mode, I can't get it to work at all, strace or not -- I'm
>> probably invoking it incorrectly, but I can't see how from the man page.
> 
>   Well, that's pretty dubious right there; I'd focus on solving that problem
> first, you want to be sure you've got all the basics correctly working before
> you try to debug it in a more complicated environment such as running daemonized.

I'm fighting a double learning curve here; it's gpg-agent from gnupg2
[*] that I'm trying to get working -- but the fork call is implemented
in the (external, static) library libassuan, so the change/rebuild/test
cycle is a PITA.  But I am not all that familiar with gpg even on linux, so:

1) odd behavior
2) is that a bug, or me screwing up, or is it supposed to do that?
3) check linux
4) hmm...go back to 1)

>> I'm not familiar at all with procmon (sysinternals, right?) but I'll
>> look into it.
> 
>   Yep, it's dead useful for making sure that stuff at least starts up, and you
> can often get a clue how far the code has got by seeing what handles it's
> opened and syscalls it's made.
> 
>> P.S. I fear I'm doing something wrong in my cygwin CVS builds, 
> 
>   Didn't spot anything terribly suspicious there I'm afraid.
> 
>> Care to post your recipe, Dave? I'm sure it's more up-to-date than the
>> FAQ...
> 
>   It's nothing special: roughly, since I'm doing this purely from memory
> untested, it goes like so -
...
>   Then I exit my final bash shell and rename the new dll and dbg files in
> place using cmd.exe.

Thanks...I'm going to try a snapshot first with its .dbg file, then use
your recipe to build my own.

[*] requires the existing libgpg-error and libgcrypt packages, plus
(newly ported):
  libksba
  pth
  pinentry
  libassuan
  gnupg2
I can post all of these ports somewhere if somebody wants to help track
this problem down?  gpg2 itself seems to work fine.  I haven't tested
any of the smartcard/usb stuff, nor gpgsm (S/MIME enabled proggie).  I'm
just trying to get gpg-agent working.  Current behavior is:

1) launch gpg-agent as a daemon
2) set GPG_AGENT_INFO in some shell
3) run a gpg2 command to sign something
   what ought to happen is that gpg2 tells the gpg-agent to get the
passphrase.  First gpg-agent tries to do a lookup in its cache, which
fails, and then it should try to run one of the pinentry programs [**],
you enter your passphrase, and then report the result back to gpg2.

   what DOES happen is that, while gpg2 and gpg-agent can communicate,
gpg-agent fails to fork/exec the pinentry program for entry of a
passphrase not found in gpg-agent's cache.  Confusingly, this is
reported as a problem communicating with gpg-agent -- when it isn't.
It's a problem with gpg-agent fork/execing a third program (pinentry).

Note that libksba and pth pass all tests. pinentry doesn't have any
built-in tests, but manual testing works ok [**].  libassuan fails its
one test, which is "passing file descriptors to separate processes", but
that doesn't apply here, because we're talking about fork/exec, (i.e.
process inheritance as already handled by cygwin's fork) not completely
unrelated processes.

Oh, CRAP.

Wait.

The libassuan test ALSO uses fork/exec.  It is NOT trying to pass fds
between completely unrelated processes.  I bet if I get libassuan's test
working, that will solve the gpg-agent problem too.

Well, at least that makes the change/build/test cycle easier.  And it
means I don't need to worry about 'why can't I get gpg-agent to work in
non-daemon mode'.

[**] I've built -curses, gtk, and gtk-2. Each works standalone, if you
manually pump it with the stdin "commands" using the protocol it
normally uses to communicate with the gpg-agent.
 --
Chuck


* Re: fork failure?
  2009-10-15 17:07       ` Charles Wilson
@ 2009-10-15 17:21         ` Charles Wilson
  2009-10-15 17:33         ` Christopher Faylor
  1 sibling, 0 replies; 28+ messages in thread
From: Charles Wilson @ 2009-10-15 17:21 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:
> Oh, CRAP.
> 
> Wait.
> 
> The libassuan test ALSO uses fork/exec.  It is NOT trying to pass fds
> between completely unrelated processes.  I bet if I get libassuan's test
> working, that will solve the gpg-agent problem too.
> 
> Well, at least that makes the change/build/test cycle easier.  And it
> means I don't need to worry about 'why can't I get gpg-agent to work in
> non-daemon mode'.

False alarm. Yes, while the two processes do have a parent/child
relationship, (a) fork succeeds -- an actual child process is created
(b) the file descriptor passed is NOT opened prior to fork and
inherited, in this test. It is opened by the parent AFTER the child is
forked, and (supposedly) sent to the child over fairly typical
parent/child pipes using the assuan protocol.  This doesn't work on
cygwin, as discussed in other threads.
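
For reference, the usual POSIX way to hand an already-open descriptor to another process is SCM_RIGHTS ancillary data on a Unix-domain socket. The sketch below assumes that mechanism; it is not libassuan's code, the names send_fd, recv_fd and fd_passing_selftest are hypothetical, and it says nothing about whether this works on cygwin.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer, aligned for a cmsghdr, with room for one descriptor. */
union fd_ctl { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; };

/* Send descriptor fd over Unix-domain socket sock.  Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char payload = 'F';             /* carry one data byte with the fd */
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    assert(fd >= 0);
    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;
    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor from sock.  Returns the new fd or -1. */
int recv_fd(int sock)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}

/* Self-test: pass a pipe's read end across a socket pair, then check
   that data written to the pipe arrives through the received fd. */
int fd_passing_selftest(void)
{
    int sv[2], p[2], got;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0)
        return 0;
    if (send_fd(sv[0], p[0]) < 0 || (got = recv_fd(sv[1])) < 0)
        return 0;
    if (write(p[1], "x", 1) != 1 || read(got, &c, 1) != 1)
        return 0;
    return c == 'x';
}
```

The self-test keeps both socket ends in one process for brevity; in the scenario described above, the parent would hold one end and the forked child the other.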

So, I'm back to
  1) muck with libassuan, build, install
  2) muck with gpg-agent, build, install
  3) test gpg-agent using gpg2
  4) return to 1)

sigh.

--
Chuck


* Re: fork failure?
  2009-10-15 17:07       ` Charles Wilson
  2009-10-15 17:21         ` Charles Wilson
@ 2009-10-15 17:33         ` Christopher Faylor
  2009-10-15 18:17           ` Dave Korn
  1 sibling, 1 reply; 28+ messages in thread
From: Christopher Faylor @ 2009-10-15 17:33 UTC (permalink / raw)
  To: cygwin

On Thu, Oct 15, 2009 at 01:07:10PM -0400, Charles Wilson wrote:
>Dave Korn wrote:
>> Charles Wilson wrote:
>>> the daemonized one.  If I launch <cmd line> and then attach strace to
>>> the eventual pid of the daemonized process, it hangs (both strace and
>>> the process).  
>> 
>>   How about "gdb --attach PID"?  Does that succeed?  GDB has the advantage of
>> being a Cygwin rather than Win32 exe, which might make it work better when
>> taking hold of the running process.
>
>Well, since we don't support set fork-follow-child, I'm stuck in the
>parent (and I don't get far enough in the child to reach a 'sleep(N)'
>call, so the typical attach-after-fork approach won't work either).  And
>the parent thinks everything is fine.  I haven't tried linking against a
>debug-built DLL so as to step into the fork() call itself (hmm...why
>aren't the cygwin1.dbg files available for the 1.7.0-nn releases? They
>used to be shipped with 1.5.2x releases...)

...different person doing the release...

cgf


* Re: fork failure?
  2009-10-15 17:33         ` Christopher Faylor
@ 2009-10-15 18:17           ` Dave Korn
  2009-10-15 19:21             ` Charles Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Korn @ 2009-10-15 18:17 UTC (permalink / raw)
  To: cygwin

Christopher Faylor wrote:
> On Thu, Oct 15, 2009 at 01:07:10PM -0400, Charles Wilson wrote:
> (hmm...why
>> aren't the cygwin1.dbg files available for the 1.7.0-nn releases? They
>> used to be shipped with 1.5.2x releases...)
> 
> ...different person doing the release...

 ... who leaves a copy in the -src tarball instead.

    cheers,
      DaveK


* Re: fork failure?
  2009-10-15 18:17           ` Dave Korn
@ 2009-10-15 19:21             ` Charles Wilson
  2009-10-16  7:58               ` Corinna Vinschen
  0 siblings, 1 reply; 28+ messages in thread
From: Charles Wilson @ 2009-10-15 19:21 UTC (permalink / raw)
  To: cygwin

Dave Korn wrote:
>  ... who leaves a copy in the -src tarball instead.
Oh, I didn't know that. Thanks, Dave.

I guess that kinda makes sense; you need the exact source for the .dbg
to work anyway.  I'd been rolling my CVS dir back to the specified
version, and then using 'set substitute-path' to make it work.  If I
download the release source tarball, I might as well unpack it in
(??guess??) /mnt/netrel/corinna/foo/ or wherever the .dbg indicates that
the source code was when the release was built, and leave my CVS dir alone.

--
Chuck


* Re: fork failure?
  2009-10-15 16:35     ` Dave Korn
  2009-10-15 17:07       ` Charles Wilson
@ 2009-10-15 23:33       ` Charles Wilson
  2009-10-15 23:58         ` Dave Korn
                           ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Charles Wilson @ 2009-10-15 23:33 UTC (permalink / raw)
  To: cygwin

Dave Korn wrote:
> Charles Wilson wrote:
>> I'm not familiar at all with procmon (sysinternals, right?) but I'll
>> look into it.
> 
>   Yep, it's dead useful for making sure that stuff at least starts up, and you
> can often get a clue how far the code has got by seeing what handles it's
> opened and syscalls it's made.

Well, it appears that the child is dying in dcrt0.c dll_crt0_1() when it
calls cygheap->cwd.init().  The line numbers below are a little messed
up (I need to build with -O0 to get more info), but here are the last
four interesting events from the child...with the top few frames of the
stack trace for each, manually converted using addr2line...



43673	6:58:17.3634216 PM	gpg-agent.exe	568	CreateFile	C:\cygwin-1.7
SUCCESS	Desired Access: Execute/Traverse, Synchronize, Disposition:
Open, Options: Directory, Synchronous IO Non-Alert, Attributes: n/a,
ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened	7628

	"12","ntdll.dll","ntdll.dll + 0x587f4",	"0x771387f4",
	"13","cygwin1.dll","cygwin1.dll + 0x89fe8","0x61089fe8"
		path.cc:3225
		cwdstuff::set(_UNICODE_STRING*, char const*, bool)
	"14","cygwin1.dll","cygwin1.dll + 0x8a4d8","0x6108a4d8",
		path.cc:3168
		cwdstuff::init()
	"15","cygwin1.dll","cygwin1.dll + 0x6722","0x61006722",
		dcrt0.c:798
		dll_crt0_1()

43674	6:58:17.3634756 PM	gpg-agent.exe	568	CloseFile	C:\cygwin-1.7
SUCCESS		7628

	"11","ntdll.dll","ntdll.dll + 0x57f54","0x77137f54",
	"12","cygwin1.dll","cygwin1.dll + 0x8a003","0x6108a003",
		path.cc:3241
		cwdstuff::set(_UNICODE_STRING*, char const*, bool)
	"13","cygwin1.dll","cygwin1.dll + 0x8a4d8","0x6108a4d8",
		path.cc:3168
		cwdstuff::init()
	"14","cygwin1.dll","cygwin1.dll + 0x6722","0x61006722",
		dcrt0.c:798
		dll_crt0_1()


NOTE: next entry is from thread 2228, not 7628
53626	6:58:19.0002272 PM	gpg-agent.exe	568	Thread Exit		SUCCESS	User
Time: 0.0000000, Kernel Time: 0.0000000	2228
	
	"5","kernel32.dll","kernel32.dll + 0x4046b","0x7704046b",
	"6","cygwin1.dll","cygwin1.dll + 0xbabf5","0x610babf5",
		sigproc.cc:1191
		wait_sig(void*)
	"7","cygwin1.dll","cygwin1.dll + 0x3fec","0x61003fec",
		cygthread.cc:50
		cygthread::callfunc(bool)

53627	6:58:19.0005306 PM	gpg-agent.exe	568	Thread Exit		SUCCESS	User
Time: 0.0000000, Kernel Time: 0.0156001	7628

	"7","ntkrnlpa.exe","ntkrnlpa.exe + 0x57a1a","0x820a3a1a"
	"8","ntdll.dll","ntdll.dll + 0x80e35","0x77160e35",

So it looks to me like something in cwdstuff::init() triggered a signal,
which was caught by cygwin and handled by aborting.

Does that analysis look right?  If so, then I guess I need to rebuild
the DLL with --enable-debugging and -O0, so I can find out exactly WHY
that's happening.
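
For the record, the absolute addresses in the frames above are just the module's load address plus the offset procmon reports; for cygwin1.dll here the base works out to 0x61000000 (0x61089fe8 - 0x89fe8). A small sketch of that arithmetic, with a hypothetical helper name frame_to_address; the result is the address one would then look up with addr2line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given a procmon frame such as
   "cygwin1.dll + 0x89fe8" and the module's load address, return the
   absolute address, or 0 if the frame has no "+ offset" part. */
unsigned long frame_to_address(unsigned long image_base, const char *frame)
{
    const char *plus;

    assert(frame != NULL);
    plus = strchr(frame, '+');
    if (plus == NULL)
        return 0;
    /* strtoul skips the blank and accepts the "0x" prefix in base 16 */
    return image_base + strtoul(plus + 1, NULL, 16);
}
```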

--
Chuck


* Re: fork failure?
  2009-10-15 23:33       ` Charles Wilson
@ 2009-10-15 23:58         ` Dave Korn
  2009-10-16  0:31         ` Dave Korn
  2009-10-16  7:35         ` Charles Wilson
  2 siblings, 0 replies; 28+ messages in thread
From: Dave Korn @ 2009-10-15 23:58 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:
> So it looks to me like something in cwdstuff::init() triggered a signal,
> which was caught by cygwin and handled by aborting.
>
> Does that analysis look right?

  It certainly does.

> [ ... ] appears that the child is dying in dcrt0.c dll_crt0_1() when it
> calls cygheap->cwd.init().  

  That so rings a bell, I'm sure I've had that happen to me or seen or read it
before somewhere.  Is that maybe where it crashed when the DLL was in the root
dir of a drive and it wasn't able to ascend up a level to find the definition
of '/' after assuming the dll had to be in '/bin'?

    cheers,
      DaveK


* Re: fork failure?
  2009-10-15 23:33       ` Charles Wilson
  2009-10-15 23:58         ` Dave Korn
@ 2009-10-16  0:31         ` Dave Korn
  2009-10-16  0:46           ` Dave Korn
  2009-10-16  2:06           ` Charles Wilson
  2009-10-16  7:35         ` Charles Wilson
  2 siblings, 2 replies; 28+ messages in thread
From: Dave Korn @ 2009-10-16  0:31 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:

> 43673	6:58:17.3634216 PM	gpg-agent.exe	568	CreateFile	C:\cygwin-1.7
> SUCCESS	Desired Access: Execute/Traverse, Synchronize, Disposition:
> Open, Options: Directory, Synchronous IO Non-Alert, Attributes: n/a,
> ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened	7628

> 43674	6:58:17.3634756 PM	gpg-agent.exe	568	CloseFile	C:\cygwin-1.7
> SUCCESS		7628

  And then "boom".  I suspect that must be:

>       status = NtOpenFile (&h, SYNCHRONIZE | FILE_TRAVERSE, &attr, &io,
> 			   FILE_SHARE_VALID_FLAGS,
> 			   FILE_DIRECTORY_FILE
> 			   | FILE_SYNCHRONOUS_IO_NONALERT
> 			   | FILE_OPEN_FOR_BACKUP_INTENT);
>       if (!NT_SUCCESS (status))
> 	{
> 	  RtlReleasePebLock ();
> 	  __seterrno_from_nt_status (status);
> 	  res = -1;
> 	  goto out;
> 	}
>       /* Workaround a problem in Vista/Longhorn which fails in subsequent
> 	 calls to CreateFile with ERROR_INVALID_HANDLE if the handle in
> 	 CurrentDirectoryHandle changes without calling SetCurrentDirectory,
> 	 and the filename given to CreateFile is a relative path.  It looks
> 	 like Vista stores a copy of the CWD handle in some other undocumented
> 	 place.  The NtClose/DuplicateHandle reuses the original handle for
> 	 the copy of the new handle and the next CreateFile works.
> 	 Note that this is not thread-safe (yet?) */
>       NtClose (*phdl);
>       if (DuplicateHandle (GetCurrentProcess (), h, GetCurrentProcess (), phdl,
> 			   0, TRUE, DUPLICATE_SAME_ACCESS))
> 	NtClose (h);
>       else
> 	*phdl = h;
>       dir = *phdl;

  I'd investigate phdl.  What OS version are you using?  32- or 64-bit?  Have
we possibly got the wrong packing or alignment of the PEB structure or
something like that?  I'm not sure if I'm reading it right (or if the sources
are necessarily correct), but

http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Process/PEB.html

> typedef struct _PEB {
>   BOOLEAN InheritedAddressSpace;
>   BOOLEAN ReadImageFileExecOptions;
>   BOOLEAN BeingDebugged;
>   BOOLEAN Spare;
>   HANDLE Mutant;
>   PVOID ImageBaseAddress;
>   PPEB_LDR_DATA LoaderData;
>   PRTL_USER_PROCESS_PARAMETERS ProcessParameters;

http://www.nirsoft.net/kernel_struct/vista/PEB.html

> typedef struct _PEB
> {
>      UCHAR InheritedAddressSpace;
>      UCHAR ReadImageFileExecOptions;
>      UCHAR BeingDebugged;
>      UCHAR BitField;
>      ULONG ImageUsesLargePages: 1;
>      ULONG IsProtectedProcess: 1;
>      ULONG IsLegacyProcess: 1;
>      ULONG IsImageDynamicallyRelocated: 1;
>      ULONG SpareBits: 4;
>      PVOID Mutant;
>      PVOID ImageBaseAddress;
>      PPEB_LDR_DATA Ldr;
>      PRTL_USER_PROCESS_PARAMETERS ProcessParameters;


  Now is it just my imagination, or does Vista really insert an extra ULONG
bitfield at offset 4 in the struct resulting in all subsequent members being
offset by 4 relative to other versions of Windows?  This needs verifying
against the official MS headers.

    cheers,
      DaveK



* Re: fork failure?
  2009-10-16  0:31         ` Dave Korn
@ 2009-10-16  0:46           ` Dave Korn
  2009-10-16  2:06           ` Charles Wilson
  1 sibling, 0 replies; 28+ messages in thread
From: Dave Korn @ 2009-10-16  0:46 UTC (permalink / raw)
  To: Dave Korn; +Cc: cygwin

Dave Korn wrote:

>   I'd investigate phdl.  

>   Now is it just my imagination, or does Vista really insert an extra ULONG
> bitfield at offset 4 in the struct resulting in all subsequent members being
> offset by 4 relative to other versions of Windows?  This needs verifying
> against the official MS headers.

  Nope, just looks like an artifact of the way that HTML documentation was
generated; the supposed "ULONG :1" bitfields are actually just the bits of the
BitField UCHAR, and not actually ULONG at all:

http://forum.sysinternals.com/forum_posts.asp?TID=14624

> From WinDbg, in Vista SP1:
> 0:000> dt ntdll!_PEB
>    +0x000 InheritedAddressSpace : UChar
>    +0x001 ReadImageFileExecOptions : UChar
>    +0x002 BeingDebugged    : UChar
>    +0x003 BitField         : UChar
>    +0x003 ImageUsesLargePages : Pos 0, 1 Bit
>    +0x003 IsProtectedProcess : Pos 1, 1 Bit
>    +0x003 IsLegacyProcess  : Pos 2, 1 Bit
>    +0x003 IsImageDynamicallyRelocated : Pos 3, 1 Bit
>    +0x003 SkipPatchingUser32Forwarders : Pos 4, 1 Bit
>    +0x003 SpareBits        : Pos 5, 3 Bits
>    +0x004 Mutant           : Ptr32 Void
>    +0x008 ImageBaseAddress : Ptr32 Void
>    +0x00c Ldr              : Ptr32 _PEB_LDR_DATA
>    +0x010 ProcessParameters : Ptr32 _RTL_USER_PROCESS_PARAMETERS

  Ah well, sorry about the red herring.

  (Mind you, it does also say

> // The PEB and TEB structures are subject to changes between Windows
> // releases, thus the fields offsets may change as well as the Reserved
> // fields.  The Reserved fields are reserved for use only by the Windows
> // operating systems.  Do not assume a maximum size for the structures.

but that doesn't seem to be a possibility here in particular right at the
start of the struct.)
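
  To make the difference between the two readings concrete, here is a rough model of the first few PEB members in plain C.  It is only a sketch: uint32_t stands in for the 32-bit pointers, the struct and function names are made up for illustration, and the second layout assumes the compiler gives an unsigned int bitfield its own 4-byte storage unit (as GCC does).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as WinDbg reports it: the flag bits live inside BitField. */
struct peb_windbg {
    uint8_t  InheritedAddressSpace;
    uint8_t  ReadImageFileExecOptions;
    uint8_t  BeingDebugged;
    uint8_t  BitField;
    uint32_t Mutant;
    uint32_t ImageBaseAddress;
    uint32_t Ldr;
    uint32_t ProcessParameters;
};

/* The misreading: flag bits as a separate ULONG bitfield after BitField. */
struct peb_misread {
    uint8_t  InheritedAddressSpace;
    uint8_t  ReadImageFileExecOptions;
    uint8_t  BeingDebugged;
    uint8_t  BitField;
    unsigned int ImageUsesLargePages : 1;
    unsigned int IsProtectedProcess : 1;
    unsigned int IsLegacyProcess : 1;
    unsigned int IsImageDynamicallyRelocated : 1;
    unsigned int SpareBits : 4;
    uint32_t Mutant;
    uint32_t ImageBaseAddress;
    uint32_t Ldr;
    uint32_t ProcessParameters;
};

/* Offset of ProcessParameters under either reading (misread: 0 or 1). */
size_t process_parameters_offset(int misread)
{
    assert(misread == 0 || misread == 1);
    return misread ? offsetof(struct peb_misread, ProcessParameters)
                   : offsetof(struct peb_windbg, ProcessParameters);
}
```

  Under the first layout ProcessParameters sits at 0x10, matching the WinDbg dump; the misread layout would push it out by four bytes.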

    cheers,
      DaveK


* Re: fork failure?
  2009-10-16  0:31         ` Dave Korn
  2009-10-16  0:46           ` Dave Korn
@ 2009-10-16  2:06           ` Charles Wilson
  1 sibling, 0 replies; 28+ messages in thread
From: Charles Wilson @ 2009-10-16  2:06 UTC (permalink / raw)
  To: cygwin

Dave Korn wrote:
>   I'd investigate phdl.  What OS version are you using?  32- or 64-bit?  Have
> we possibly got the wrong packing or alignment of the PEB structure or
> something like that?

32bit Vista SP1. I can't imagine that the PEB would be messed up,
because then /every/ fork would die horribly -- not just this one. More
later...

--
Chuck



* Re: fork failure?
  2009-10-15 23:33       ` Charles Wilson
  2009-10-15 23:58         ` Dave Korn
  2009-10-16  0:31         ` Dave Korn
@ 2009-10-16  7:35         ` Charles Wilson
  2009-10-16 17:29           ` Charles Wilson
  2 siblings, 1 reply; 28+ messages in thread
From: Charles Wilson @ 2009-10-16  7:35 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:
> Dave Korn wrote:
>> Charles Wilson wrote:
>>> I'm not familiar at all with procmon (sysinternals, right?) but I'll
>>> look into it.
>>   Yep, it's dead useful for making sure that stuff at least starts up, and you
>> can often get a clue how far the code has got by seeing what handles it's
>> opened and syscalls it's made.
> 
> Well, it appears that the child is dying in dcrt0.c dll_crt0_1() when it
> calls cygheap->cwd.init().  

Apparently not.

It seems that there is a large time gap between the NT syscalls
displayed by procmon:

43673	6:58:17.3634216 PM gpg-agent.exe   568  CreateFile
43674	6:58:17.3634756 PM gpg-agent.exe   568  CloseFile
          <<< large gap here, where the actual error   >>>
          <<< occurs, prior to signal being handled... >>>
53626	6:58:19.0002272 PM gpg-agent.exe   568  Thread Exit <<HERE>>
53627	6:58:19.0005306 PM gpg-agent.exe   568  Thread Exit

See the timestamps? More than 1.5 seconds pass after the
cwdstuff::set(_UNICODE_STRING*, char const*, bool) call before the
signal handler thread gets woken up and kills the process.

After adding a ton of console_printf()s, I see the following (which is
displayed in the console in which gpg-agent is running, triggered when I
launch gpg2 in a separate window).

<<< gpg2 launched >>>
about to setjmp using 0x7FF8F5CC               [1]
returned from setjmp using 0x7FF8F5CC (parent) [2]
phdl=0x00880BEC, *phdl=0x00000018
attr.RootDirectory=0x00000018
h=0x000000F8 status=0x00000000
phdl closed
dup succeeded, h closed (new phdl=0x00880BEC, *phdl=0x00000018)
A
res=0
after cygheap->cwd.init()
in_forkee; skipped pthread::init_mainthread()
in_forkee; pre-forkee
in_forkee; about to longjmp using 0x00881560   [3]
returned from setjmp using 0x7FF8F5CC (child)  [2]

Which appears ok, as far as it goes. 'Course, something dies eventually
because I never do get to gpg-agent's "exec()" of pinentry, after that fork.

[1] this is in fork(), near fork.cc:598. just before the setjmp call.
[2] this is in fork(), near fork.cc:598. just after the setjmp call.
[3] this is in dll_crt0_1(), near dcrt0.cc:840, just before the longjmp
call.

The rest of the lines are various checkpoints in dll_crt0_1()/dcrt0.cc
and in cwdstuff::set()/path.cc.

Man, this is tedious...

--
Chuck


* Re: fork failure?
  2009-10-15 19:21             ` Charles Wilson
@ 2009-10-16  7:58               ` Corinna Vinschen
  0 siblings, 0 replies; 28+ messages in thread
From: Corinna Vinschen @ 2009-10-16  7:58 UTC (permalink / raw)
  To: cygwin

On Oct 15 15:20, Charles Wilson wrote:
> Dave Korn wrote:
> >  ... who leaves a copy in the -src tarball instead.
> Oh, I didn't know that. Thanks, Dave.
> 
> I guess that kinda makes sense; you need the exact source for the .dbg
> to work anyway.  I'd been rolling my CVS dir back to the specified
> version, and then using 'set substitute-path' to make it work.  If I
> download the release source tarball, I might as well unpack it in
> (??guess??) /mnt/netrel/corinna/foo/ or wherever the .dbg indicates that
> the source code was when the release was built, and leave my CVS dir alone.

/ext/build/netrel/src/cygwin-1.7.0-62


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: fork failure?
  2009-10-16  7:35         ` Charles Wilson
@ 2009-10-16 17:29           ` Charles Wilson
  2009-10-16 18:04             ` Dave Korn
  0 siblings, 1 reply; 28+ messages in thread
From: Charles Wilson @ 2009-10-16 17:29 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:
> Which appears ok, as far as it goes. 'Course, something dies eventually
> because I never do get to gpg-agent's "exec()" of pinentry, after that fork.

Well, bad news. In the kernel, the child gets right up to the return
statement in fork(), returning 0.  But it never reaches the line after
the call to fork() in the user code.

Help?

In libassuan (which provides fork-a-child and communication protocols
between parent and child for gpg-agent), I've done this:

=====
cygwin_internal(CW_CONSOLE_PRINTF,
                "***USER: [%s] about to fork (pid=%d)\n",
                name, getpid());

  (*ctx)->pid = fork ();

cygwin_internal(CW_CONSOLE_PRINTF,
                "***USER: [%s] after fork (mypid=%d, forkpid=%d)\n",
                name, getpid(), (*ctx)->pid);
=====

where CW_CONSOLE_PRINTF is a new call that allows access from user space
to the kernel mode console_printf() [*] functionality -- I know this is
not something we'd really want in the kernel, but it helps me ensure
that my in-kernel and user-mode debug output go to the same place.

[*] actually, a new console_vprintf(), for obvious reasons
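
The "obvious reasons" are presumably the usual ones: a variadic entry
point cannot hand its "..." on to another variadic function, so the
forwarding has to go through a va_list variant. A minimal sketch of that
pattern in plain C follows. debug_vformat() and debug_format() are
hypothetical stand-ins that format into a caller-supplied buffer; they
are not the actual console_printf()/console_vprintf() code.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* va_list worker: any number of variadic front ends can forward to it. */
int debug_vformat(char *buf, size_t len, const char *fmt, va_list ap)
{
    return vsnprintf(buf, len, fmt, ap);
}

/* Variadic front end: packs "..." into a va_list and delegates. */
int debug_format(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = debug_vformat(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

A second front end (for example one reached through a dispatcher that
already holds a va_list) can then call debug_vformat() directly.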

I'm stuck with this printf-style debugging because (a) strace causes the
process to hang, and (b) attaching with gdb follows only the parent. I
haven't tried the FORKDEBUG or CYGWIN_FORK_SLEEP yet...  The former
probably won't work, as I'm debugging a grandchild but don't want to
debug the child.

Now, inside fork.cc I have a ton of console_printf's. Most
interestingly, I have

===== in frok::child =====
...
console_printf("child: frok::child returning 0\n");
  return 0;
}
==========================

and in fork() itself, the following is very near the end of the function:

===== in fork() =====
  if (ischild || res > 0)
{
console_printf("(child?): everything is ok (ischild=%d)\n",ischild);
    /* everything is ok */;
}
  else
     ... error handling ...

  syscall_printf ("%d = fork()", res);
console_printf("returning from fork: ischild=%d, res=%d\n",ischild,res);
  return res;
}
==========================

Here's my output:

   ***USER: [/usr/bin/pinentry-gtk-2.exe] about to fork (pid=6664)
   about to setjmp using 0x7FF8F5DC
   returned from setjmp using 0x7FF8F5DC (parent)
   ...
   (child): about to call grouped.child(esp)
[this is stuff in frok::child():]
   child is running.  pid 1452, ppid 0, stack here 0x7FF8F534
   child: sync with parent returned
   child: hParent 0x228, load_dlls 0, fork_info->stacksize 0x00000000
   child: about to call set_cygwin_privileges
   child: about to call clear_procimptoken
   child: about to call cygheap->user.reimpersonate
   child: about to do some debugging stuff
   child: about fixup_shms_after_fork
   child: getting ready to finish up: load_dlls=0
   child:a returned from fixup_after_fork; syncing with parent
   child:a returned from sync_with_parent
[this is the parent, now that the child has sync'ed:]
   (child?): everything is ok (ischild=0)
   returning from fork: ischild=0, res=1452
[child again:]
   child: returned from init_console_handle. calling ForceCloseHandle1
[this is the parent, back in user code in libassuan:]
   ***USER: [/usr/bin/pinentry-gtk-2.exe] after fork
            (mypid=6664,forkpid=1452)
[child is still in kernel, in frok::child()]
   child: about to call pthread::atforkchild
P: ***USER: [/usr/bin/pinentry-gtk-2.exe] parent. child pid=1452
C: child: about to call fixup_timers_after_fork
P: ***USER: [/usr/bin/pinentry-gtk-2.exe] parent about to handshake

[parent will now attempt to send data over the pipe to the child. This]
[should block until child is ready?]

[child, still in frok::child()]
   child: about to call ld_preload
   child: about to call fixup_hooks_after_fork
   child: about to call _my_tls.fixup_hooks_after_fork
   child: waiting for sigthread...
   child: frok::child returning 0
[child returns from frok::child to fork()]
   (child): returned from grouped.child(esp) res=0
   (child?): everything is ok (ischild=1)

[so, child says everything is fine, and is about to return from fork()]
[to caller]

   returning from fork: ischild=1, res=0
   ***USER: [/usr/bin/pinentry-gtk-2.exe] parent returning with 67125247
The End.

The statement after fork() is never reached in the child process, even
tho fork() SAYS it is returning with 0. I would have EXPECTED to see:
   ***USER: [/usr/bin/pinentry-gtk-2.exe] after fork
            (mypid=1452,forkpid=0)

Since the handshake fails, parent returns with an error code (67125247).

I note that procmon indicates that the child process exited with code
-1073741783, which in hex is 0xc0000029.  If that's an NTSTATUS value,
then it means STATUS_INVALID_UNWIND_TARGET, but that might be a red herring.
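
The decimal-to-hex step is just a reinterpretation of the signed 32-bit
exit code as an unsigned value. A small sketch, with a hypothetical
helper name:

```c
#include <stdint.h>

/* Reinterpret a signed 32-bit process exit code as the unsigned
   32-bit value that NTSTATUS codes are normally written in. */
uint32_t exit_code_as_ntstatus(int32_t exit_code)
{
    return (uint32_t)exit_code;
}
```

exit_code_as_ntstatus(-1073741783) yields 0xC0000029; printing it with
"0x%08" PRIX32 from <inttypes.h> gives the form used in the NTSTATUS
tables.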

Any ideas?

--
Chuck


* Re: fork failure?
  2009-10-16 17:29           ` Charles Wilson
@ 2009-10-16 18:04             ` Dave Korn
  2009-10-16 19:46               ` Charles Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Korn @ 2009-10-16 18:04 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:
> Charles Wilson wrote:
>> Which appears ok, as far as it goes. 'Course, something dies eventually
>> because I never do get to gpg-agent's "exec()" of pinentry, after that fork.
> 
> Well, bad news. In the kernel, the child gets right up to the return
> statement in fork(), returning 0.  But it never reaches the line after
> the call to fork() in the user code.
> 
> Help?

  Trashed stack?

> I note that procmon indicates that the child process exited with code
> -1073741783, which in hex is 0xc0000029.  If that's an NTSTATUS value,
> then it means STATUS_INVALID_UNWIND_TARGET, but that might be a red herring.

  Trashed stack during SEH unwind?

  Might be able to get some useful info by running this under windbg and
intercepting the exception; even just an $eip to tell you where it's coming
from might be all the clue you needed.

    cheers,
      DaveK


* Re: fork failure?
  2009-10-16 18:04             ` Dave Korn
@ 2009-10-16 19:46               ` Charles Wilson
  2009-10-16 20:01                 ` Dave Korn
  0 siblings, 1 reply; 28+ messages in thread
From: Charles Wilson @ 2009-10-16 19:46 UTC (permalink / raw)
  To: cygwin

Dave Korn wrote:
> Charles Wilson wrote:
>> Help?
> 
>   Trashed stack?
> 
>> I note that procmon indicates that the child process exited with code
>> -1073741783, which in hex is 0xc0000029.  If that's an NTSTATUS value,
>> then it means STATUS_INVALID_UNWIND_TARGET, but that might be a red herring.
> 
>   Trashed stack during SEH unwind?
> 
>   Might be able to get some useful info by running this under windbg and
> intercepting the exception; even just an $eip to tell you where it's coming
> from might be all the clue you needed.

*** wait with pending attach
Symbol search path is:
srv*c:\Temp\websymbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00400000 00998000   c:\cygwin-1.7\bin\gpg-agent.exe
ModLoad: 770e0000 77207000   C:\Windows\system32\ntdll.dll
ModLoad: 77000000 770db000   C:\Windows\system32\kernel32.dll
ModLoad: 67f00000 67f0f000   c:\cygwin-1.7\bin\cyggcc_s-1.dll
ModLoad: 61000000 61300000   c:\cygwin-1.7\bin\cygwin1.dll
ModLoad: 760a0000 76166000   C:\Windows\system32\ADVAPI32.DLL
ModLoad: 75960000 75a22000   C:\Windows\system32\RPCRT4.dll
ModLoad: 684a0000 6850d000   c:\cygwin-1.7\bin\cyggcrypt-11.dll
ModLoad: 71950000 71958000   c:\cygwin-1.7\bin\cyggpg-error-0.dll
ModLoad: 6a960000 6a96d000   c:\cygwin-1.7\bin\cygintl-8.dll
ModLoad: 6ca10000 6cb0a000   c:\cygwin-1.7\bin\cygiconv-2.dll
ModLoad: 76290000 7632d000   C:\Windows\system32\USER32.dll
ModLoad: 76050000 7609b000   C:\Windows\system32\GDI32.dll
ModLoad: 6fa40000 6fa54000   c:\cygwin-1.7\bin\cygpth-20.dll
ModLoad: 76170000 7618e000   C:\Windows\system32\IMM32.DLL
ModLoad: 76190000 76258000   C:\Windows\system32\MSCTF.dll
ModLoad: 75bd0000 75c7a000   C:\Windows\system32\msvcrt.dll
ModLoad: 76330000 76339000   C:\Windows\system32\LPK.DLL
ModLoad: 75de0000 75e5d000   C:\Windows\system32\USP10.dll
ModLoad: 6c1b0000 6c1b5000   C:\Windows\system32\avgrsstx.dll
(f78.1938): Break instruction exception - code 80000003 (first chance)
eax=7ffdc000 ebx=00000000 ecx=00000000 edx=7716d094 esi=00000000
edi=00000000
eip=77127dfe esp=1a49ff5c ebp=1a49ff88 iopl=0         nv up ei pl zr na
pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000
efl=00000246
ntdll!DbgBreakPoint:
77127dfe cc              int     3
0:002> g
(f78.1118): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=0136cab8 ecx=00000000 edx=ffffffff esi=00000007
edi=00404119
eip=00000000 esp=7ff8f6fc ebp=7ff8f984 iopl=0         nv up ei pl zr na
pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000
efl=00010246
00000000 ??              ???
0:000> k
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
7ff8f984 00000000 0x0


which is just after the output window gets:
returning from fork: ischild=1, res=0

So, this is the right spot.  And $eip is 0x0.  That doesn't tell me much...

Something is obviously going badly wrong in the guts of fork(). Unless
somebody has a brilliant idea, I'm going to mothball this until after
cgf (Ye Olde Forke Experte) returns.

--
Chuck


* Re: fork failure?
  2009-10-16 19:46               ` Charles Wilson
@ 2009-10-16 20:01                 ` Dave Korn
  2009-10-16 20:43                   ` Charles Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Korn @ 2009-10-16 20:01 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:

> ModLoad: 75bd0000 75c7a000   C:\Windows\system32\msvcrt.dll

  Say, what's that doing there?  Might like to check who's pulling it in, just
in case something's gone all win32 on you that shouldn't be.

> ModLoad: 6c1b0000 6c1b5000   C:\Windows\system32\avgrsstx.dll

  Let's hope AVG hasn't gone (even further) over to the dark side.

> (f78.1938): Break instruction exception - code 80000003 (first chance)
> eax=7ffdc000 ebx=00000000 ecx=00000000 edx=7716d094 esi=00000000
> edi=00000000
> eip=77127dfe esp=1a49ff5c ebp=1a49ff88 iopl=0         nv up ei pl zr na
> pe nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000
> efl=00000246
> ntdll!DbgBreakPoint:
> 77127dfe cc              int     3
> 0:002> g
> (f78.1118): Access violation - code c0000005 (first chance)
> First chance exceptions are reported before any exception handling.
> This exception may be expected and handled.
> eax=00000000 ebx=0136cab8 ecx=00000000 edx=ffffffff esi=00000007
> edi=00404119
> eip=00000000 esp=7ff8f6fc ebp=7ff8f984 iopl=0         nv up ei pl zr na
> pe nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000
> efl=00010246
> 00000000 ??              ???
> 0:000> k
> ChildEBP RetAddr
> WARNING: Frame IP not in any known module. Following frames may be wrong.
> 7ff8f984 00000000 0x0
> 
> 
> which is just after the output window gets:
> returning from fork: ischild=1, res=0
> 
> So, this is the right spot.  And $eip is 0x0.  That doesn't tell me much...

  So, the dreaded jump-to-zero.  Always a tricky one, since by the time you
get there you have no idea where you came there from.  Except that we suspect
fork().  I'd set a breakpoint on the start of fork and another one on the ret
at the end of it, (did you try mingw gdb yet?  it might be easier here than
windbg since it'll understand the symbols, but if you can't get it to work
then you can manually look up symbol addresses and set the breakpoints by hex
address), and then I'd restart the program, note the value of $esp and verify
a sane-looking return address on entry to the function, let it run to the end
of the function and find out if the stack pointer wasn't back at the same
location or if the return address there had been corrupted.  The second of
those could potentially be tracked down using a hardware breakpoint
(watchpoint in gdb terminology), the first of those two would require reading
the code to see why it's not popping and pushing in equal amounts.

    cheers,
      DaveK


* Re: fork failure?
  2009-10-16 20:01                 ` Dave Korn
@ 2009-10-16 20:43                   ` Charles Wilson
  2009-10-17  3:41                     ` Charles Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Charles Wilson @ 2009-10-16 20:43 UTC (permalink / raw)
  To: cygwin

Dave Korn wrote:
> Charles Wilson wrote:
> 
>> ModLoad: 75bd0000 75c7a000   C:\Windows\system32\msvcrt.dll
> 
>   Say, what's that doing there?  Might like to check who's pulling it in, just
> in case something's gone all win32 on you that shouldn't be.

It appears to be pulled in by winsock2, which is on-demand loaded by
cygwin, so it doesn't show up in the explicit dependencies as reported
by cygcheck.  But that's all "behind the cygwin layer" -- the way I've
built gnupg2 and libassuan, they don't go behind cygwin's back to access
windows socket functions directly. They use cygwin functionality for that.

>> ModLoad: 6c1b0000 6c1b5000   C:\Windows\system32\avgrsstx.dll
> 
>   Let's hope AVG hasn't gone (even further) over to the dark side.

Aw geez.  I tried running with AVG both enabled and disabled (but not
uninstalled).  There was a difference in the ProcMon output -- obviously
the disabled AVG makes fewer syscalls -- but the gpg-agent behavior was
unchanged.

I guess I'll try to uninstall AVG and see if that makes a difference.

>> which is just after the output window gets:
>> returning from fork: ischild=1, res=0
>>
>> So, this is the right spot.  And $eip is 0x0.  That doesn't tell me much...
> 
>   So, the dreaded jump-to-zero.  Always a tricky one, since by the time you
> get there you have no idea where you came there from.  Except that we suspect
> fork().  I'd set a breakpoint on the start of fork and another one on the ret
> at the end of it, (did you try mingw gdb yet? 

Not yet. Chris S. has recently released an updated mingw gdb based on
7.0, but I haven't installed or tested that one yet.

> it might be easier here than
> windbg since it'll understand the symbols, but if you can't get it to work
> then you can manually look up symbol addresses and set the breakpoints by hex
> address), 

Well, I did this in windbg (manually setting breakpoints).
Unfortunately, they appeared to have no effect -- after "g", it blew
right past them and into the exception.  Maybe I'll have better luck
with mingw-gdb.

First I'm going to rip out a lot of the debugging cruft from my cygwin
DLL, now that I know (part of) it was a wild goose chase.

> and then I'd restart the program, note the value of $esp and verify
> a sane-looking return address on entry to the function, let it run to the end
> of the function and find out if the stack pointer wasn't back at the same
> location or if the return address there had been corrupted.

Ah. Well...that won't actually work.  The *parent* is the only one of
the two that actually /enters/ the fork() function in the normal way,
and thus could be expected to have a reasonable return address (and hit
a breakpoint at the beginning of the function).

The child...not so much. It "enters" fork() by way of the longjmp, using
the jmp_buf set by the parent when IT was inside fork(), before the
parent (via a roundabout method) called CreateProcess to create the
child in the first place.
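
For anyone not used to that idea, here is a minimal, single-process
sketch of a function whose setjmp() call site is passed twice: once by
the direct call and once via longjmp(). It only illustrates the control
flow. In the real fork() the jmp_buf travels to a different process,
which plain C cannot show, and the function name is made up for the
example.

```c
#include <setjmp.h>

static jmp_buf resume_point;

/* Returns how many times control came out of the setjmp() call site. */
int count_setjmp_passes(void)
{
    /* volatile: the value is modified between setjmp() and longjmp() */
    volatile int passes = 0;

    if (setjmp(resume_point) == 0) {
        /* first pass: setjmp() was called directly */
        passes++;
        longjmp(resume_point, 1);
    } else {
        /* second pass: control arrived here through longjmp() */
        passes++;
    }
    return passes;
}
```

The second pass never goes through the function's normal entry, which is
why a breakpoint at the start of the function only fires once.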

I suppose I could debug both the parent AND the child: since the forkee
should have exactly the same memory layout (and stack trace) once they
return from fork(), I suppose that I could

  1) look at the parent's stack trace when it is inside fork(). Ditto
     its return address.
  2) after the child longjmp's back into fork() from dll_crt0_1,
     look at its stack trace and return address. (although I can't
     really catch it that early. I can only catch it in the debugger
     just after the CYGWIN_FORK_SLEEP...but at least I'm still
     back inside fork() at that point).

They ought to match in all respects, correct?

> The second of
> those could potentially be tracked down using a hardware breakpoint
> (watchpoint in gdb terminology), the first of those two would require reading
> the code to see why it's not popping and pushing in equal amounts.

But setjmp and longjmp are nasty black magic assembly generated by
winsup/cygwin/gendef... Ow! Stop! That hurts!

--
Chuck


* Re: fork failure?
  2009-10-16 20:43                   ` Charles Wilson
@ 2009-10-17  3:41                     ` Charles Wilson
  2009-10-17  5:26                       ` Dave Korn
  0 siblings, 1 reply; 28+ messages in thread
From: Charles Wilson @ 2009-10-17  3:41 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 8037 bytes --]

Charles Wilson wrote:
>   1) look at the parent's stack trace when it is inside fork(). Ditto
>      its return address.
>   2) after the child longjmp's back into fork() from dll_crt0_1,
>      look at its stack trace and return address. (although I can't
>      really catch it that early. I can only catch it in the debugger
>      just after the CYGWIN_FORK_SLEEP...but at least I'm still
>      back inside fork() at that point).
> 
> They ought to match in all respects, correct?

I think...maybe...I've located the primary symptom.  I don't yet know
the cause, so I don't have a fix.  But...

The stack traces are "close", but do not match exactly.  Here's an
interleaved version, "P" for parent, "C" for child (and edited for
formatting, and to redact private data):

A: I /think/ this is the import thunk for fork, rather than
   the /actual/ _fork in the cygwin1.dll. Blame mingw gdb.
B: where I set the breakpoint in child. So that's reassuring.


P	#0  0x00439bb4 in fork ()  [[[[ A ]]]]
C	#0  fork () at fork.cc:617 [[[[ B ]]]]

P	#1  0x0043843f in pipe_connect_unix (ctx=0x7ff8fa30,
	    name=0x137ad68 "/usr/bin/pinentry-gtk-2.exe",
	    argv=0x7ff8fa1c, fd_child_list=0x7ff8fa10, atfork=0x40c803
	    <atfork_cb>, atforkvalue=0x136b948, flags=128)
	    at assuan-pipe-connect.c:234

C	#1  0x610cd03c in _sigfe () from /usr/bin/cygwin1.dll
C	#2  0x0137ad68 in ?? ()
C	#3  0x00001120 in ?? ()
C	#4  0x00000019 in ?? ()

P	#2  0x00438f56 in assuan_pipe_connect_ext (ctx=0x7ff8fa30,
	    name=0x137ad68 "/usr/bin/pinentry-gtk-2.exe",
	    argv=0x7ff8fa1c, fd_child_list=0x7ff8fa10, atfork=0x40c803
	    <atfork_cb>, atforkvalue=0x136b948, flags=128)
	    at assuan-pipe-connect.c:920
C	#5  0x00438f56 in assuan_pipe_connect_ext (ctx=0x7ff8fa30,
	    name=0x137ad68 "/usr/bin/pinentry-gtk-2.exe",
	    argv=0x7ff8fa1c, fd_child_list=0x7ff8fa10, atfork=0x40c803
	    <atfork_cb>, atforkvalue=0x136b948, flags=128)
	    at assuan-pipe-connect.c:920

P	#3  0x0040cbd0 in start_pinentry (ctrl=0x136b948)
	    at agent/call-pinentry.c:316
C	#6  0x0040cbd0 in start_pinentry (ctrl=0x136b948)
	    at agent/call-pinentry.c:316		
		
P	#4  0x0040e013 in agent_get_passphrase (ctrl=0x136b948,
	    retpass=0x7ff8fec0,
	    desc=0x136bae9 "Please enter the passphrase to unlock the
	    secret key for the OpenPGP certificate:REDACTED"...,
	    prompt=0x0,errtext=0x0, with_qualitybar=0)
	    at agent/call-pinentry.c:809		
C	#7  0x0040e013 in agent_get_passphrase (ctrl=0x136b948,
	    retpass=0x7ff8fec0,
	    desc=0x136bae9 "Please enter the passphrase to unlock the
	    secret key for the OpenPGP certificate:REDACTED"...,
	    prompt=0x0, errtext=0x0, with_qualitybar=0)
	    at agent/call-pinentry.c:809
		
P	#5  0x00406f15 in cmd_get_passphrase (ctx=0x136ba50,
	    line=0x136babc "REDACTED")
	    at agent/command.c:1111
C	#8  0x00406f15 in cmd_get_passphrase (ctx=0x136ba50,
	    line=0x136babc "REDACTED")
	    at agent/command.c:1111

P	#6  0x00435115 in dispatch_command (ctx=0x136ba50,
	    line=0x136baa7 "--data --repeat=0 -- REDACTED", linelen=282)
	    at assuan-handler.c:497
C	#9  0x00435115 in dispatch_command (ctx=0x136ba50,
	    line=0x136baa7 "--data --repeat=0 -- REDACTED", linelen=282)
	    at assuan-handler.c:497
		
P	#7  0x004355a9 in process_request (ctx=0x136ba50)
	    at assuan-handler.c:720
C	#10 0x004355a9 in process_request (ctx=0x136ba50)
	    at assuan-handler.c:720

P	#8  0x004355d7 in assuan_process (ctx=0x136ba50)
	    at assuan-handler.c:742
C	#11 0x004355d7 in assuan_process (ctx=0x136ba50)
	    at assuan-handler.c:742
		
P	#9  0x00408519 in start_command_handler (ctrl=0x136b948,
	    listen_fd=-1, fd=7)
	    at agent/command.c:1944
C	#12 0x00408519 in start_command_handler (ctrl=0x136b948,
	    listen_fd=-1, fd=7)
	    at agent/command.c:1944

P	#10 0x004041a2 in start_connection_thread (arg=0x136b948)
	    at agent/gpg-agent.c:1755	
C	#13 0x004041a2 in start_connection_thread (arg=0x136b948)
	    at agent/gpg-agent.c:1755
		
P	#11 0x6fa44bba in ?? ()
C	#14 0x6fa44bba in pth_exit () from /usr/bin/cygpth-20.dll

P	#12 0x0136b948 in ?? ()
C	#15 0x00000000 in ?? ()

P	#13 0x00000202 in ?? ()
P	#14 0x0022c9e8 in ?? ()


The register dump and $esp inspection for both parent and child are in
the attached file, for the truly interested.


So, what's the smoking gun?  This, in the child:

(gdb) p *(child_info_fork *)child_proc_info
$5 = {<child_info> = {msv_count = 0, cb = 288, intro = 2936076800,
    magic = 3897586042, type = 2, subproc_ready = 0x230, user_h = 0xb4,
    parent = 0x234, cygheap = 0x61245650, cygheap_max = 0x6124c1ac,
    cygheap_reserve_sz = 0, flag = 2 '\002', fhandler_union_cb = 464,
    retry = 10, exit_code = 0, static retry_count = 10},
  forker_finished = 0x238, stacksize = 0, jmp = {1, 22793784, 0, 0, 7,
    4210969, 2147022580, 2147022140, 1627853804, 3866659, 2293760,
    2281064, 2284368, 0 <repeats 39 times>}, stacktop = 0x7ff8f53c,
  stackbottom = 0x230000, filler = "\000\000\000"}

fork_info->stacksize is zero. According to fork.cc in frok::child:

  /* If we've played with the stack, stacksize != 0.  That means that
     fork() was invoked from other than the main thread.  Make sure that
     the threadinfo information is properly set up.  */
  if (fork_info->stacksize)
    {
      _main_tls = &_my_tls;
      _main_tls->init_thread (NULL, NULL);
      _main_tls->local_clib = *_impure_ptr;
      _impure_ptr = &_main_tls->local_clib;
    }

So, since in this child, we have stacksize == 0, that must mean that
fork() was invoked from the main thread (of the parent, since a child
/never/ "invokes" fork. It can't; parents invoke fork).  Right?

Wrong: look at frame #10(parent)/#13(child)
0x004041a2 in start_connection_thread (arg=0x136b948)
	    at agent/gpg-agent.c:1755

Here's the start of that function:

/* This is the standard connection thread's main function.  */
static void *
start_connection_thread (void *arg)
{
...
}

It is launched inside:


/* Connection handler loop.  Wait for connection requests and spawn a
   thread after accepting a connection.  */
static void
handle_connections (gnupg_fd_t listen_fd, gnupg_fd_t listen_fd_ssh)
{
...
  for (;;)
    {
...
     /* We now might create new threads and because we don't want any
         signals (as we are handling them here) to be delivered to a
         new thread.  Thus we need to block those signals. */
      pth_sigmask (SIG_BLOCK, &sigs, NULL);

      if (!shutdown_pending && FD_ISSET (FD2INT (listen_fd),
                                         &read_fdset))
        {
>>> HERE <<<  if (!pth_spawn (tattr, start_connection_thread, ctrl))
                {
                  log_error ("error spawning connection handler: %s\n",
                             strerror (errno) );
                  assuan_sock_close (fd);
                  xfree (ctrl);
                }

        }
...
    }
...
}


So, we have a thread, which is not the main thread, which calls fork.
Yet, when the forkee (child) checks that, it thinks that it WAS launched
from the main thread, and does not do the main_tls/_my_tls fixup stuff
for the stack.
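
One more thing that might be worth a look in that dump, offered as a
guess rather than a diagnosis: stacktop is 0x7ff8f53c while stackbottom
is 0x230000. If stackbottom is meant to be the high end of the forking
thread's stack and stacktop the lowest address in use (an assumption
based only on the field names and on x86 stacks growing downward), then
stacktop should be the smaller of the two. A trivial sanity check along
those lines, with a hypothetical helper name:

```c
#include <stdint.h>

/* Assumes a downward-growing stack: "bottom" is the high end (base),
   "top" is the lowest address currently in use.  Returns 1 if the
   pair is plausible under that assumption, 0 otherwise. */
int stack_bounds_plausible(uintptr_t stacktop, uintptr_t stackbottom)
{
    return stacktop < stackbottom;
}
```

With the values from the dump, stack_bounds_plausible(0x7ff8f53c,
0x230000) returns 0. If the assumption about the fields holds, that
would suggest the two values describe different stacks.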


Right now the "CYGWIN_FORK_SLEEP" check comes AFTER the main_tls fixup
stuff, so there's another cygwin kernel rebuild in my future, if I want
to step into the child early enough to look at the main_tls fixup stuff.
However, even if I poke at fork_info->stacksize and make it non-zero, so
that the main_tls/my_tls fixup happens --- I still need to figure out
exactly WHAT non-zero value it should have. I suspect that's almost as
hard as figuring out why cygwin didn't do it in the first place -- and
if I knew that, I'd just fix it there, rather than poking at it manually
using gdb...

I have a hunch that an STC (okay, less-hellaciously-complicated test
case) could be developed, using just GNU pth and avoiding all the
libassuan/gnupg gobbledygook.

Anyway, I'm going to be AFK (or, at least AF-This-Problem) for the weekend.

I've attached the entire log of relevant data from parent and child,
gzipped.

--
Chuck


[-- Attachment #2: fork-data.tar.gz --]
[-- Type: application/gzip, Size: 2005 bytes --]


* Re: fork failure?
  2009-10-17  3:41                     ` Charles Wilson
@ 2009-10-17  5:26                       ` Dave Korn
  2009-10-17  6:55                         ` Charles Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Korn @ 2009-10-17  5:26 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:

> I have a hunch that an STC (okay, less-hellaciously-complicated test
> case) could be developed, using just GNU pth and avoiding all the
> libassuan/gnupg gobbledygook.

  Oh yuck.  So there's this alternative user-land pthreads library that runs a
scheduler within a single real machine thread, using some hairy sjlj hackery
to perform context switches?  That's kinda asking for trouble, isn't it?

  Anyway, look here: pth_mctx.c line ~ 514

> /*
>  * VARIANT 5: WIN32 SPECIFIC JMP_BUF FIDDLING
>  *
>  * Oh hell, Win32 has setjmp(3), but no sigstack(2) or sigaltstack(2).
>  * So we have to fiddle around with the jmp_buf here too...
>  */
> 
> #elif PTH_MCTX_MTH(sjlj) && PTH_MCTX_DSP(sjljw32)
> intern int
> pth_mctx_set(pth_mctx_t *mctx, void (*func)(void),
>                      char *sk_addr_lo, char *sk_addr_hi)
> {
>     pth_mctx_save(mctx);
> #if i386
>     mctx->jb[7] = (int)sk_addr_hi;
>     mctx->jb[8] = (int)func;
> #else
> #error "Unsupported Win32 architecture"
> #endif
>     sigemptyset(&mctx->sigs);
>     mctx->error = 0;
>     return TRUE;
> }

  Umm, yes.  Poking around directly inside a sigjmp_buf.  Wonder if the layout
is actually what that code expects it to be or not?  That's where I'd start
looking next, anyway, if I was wondering why maybe things were randomly
jumping to unexpected places ...
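
  For comparison, what pth_mctx_set() is after (give a saved context a fresh
stack and a new entry point) can be done without touching jmp_buf internals
on platforms that provide the ucontext calls.  This is only a sketch under
that assumption; I'm not claiming Cygwin provides them, and the demo_* names
are made up for illustration, not taken from pth:

```c
#include <stdlib.h>
#include <ucontext.h>

#define DEMO_STACK_SIZE (64 * 1024)

static ucontext_t demo_main_ctx;   /* context we switch away from and return to */
static ucontext_t demo_coro_ctx;   /* context of the user-land "thread" */
static volatile int demo_ran = 0;  /* set by the entry function */

/* Entry point of the user-land thread; runs on the private stack. */
static void demo_entry(void)
{
    demo_ran = 1;
    /* Returning here resumes the context named in uc_link. */
}

/* Build a context with its own stack, switch to it once, and come back.
 * Returns 1 if the entry function ran, 0 if it did not, -1 on error. */
int demo_switch_once(void)
{
    char *stack = malloc(DEMO_STACK_SIZE);
    if (stack == NULL)
        return -1;

    if (getcontext(&demo_coro_ctx) != 0) {
        free(stack);
        return -1;
    }

    /* The documented way to say "use this stack, start at this function". */
    demo_coro_ctx.uc_stack.ss_sp = stack;
    demo_coro_ctx.uc_stack.ss_size = DEMO_STACK_SIZE;
    demo_coro_ctx.uc_link = &demo_main_ctx;
    makecontext(&demo_coro_ctx, demo_entry, 0);

    demo_ran = 0;
    if (swapcontext(&demo_main_ctx, &demo_coro_ctx) != 0) {
        free(stack);
        return -1;
    }

    free(stack);
    return demo_ran;
}
```

  Calling demo_switch_once() from main() should return 1 where ucontext
works.  The point is just that the stack and entry point go through a
documented interface, so nothing depends on which jmp_buf slot holds what.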

    cheers,
      DaveK




* Re: fork failure?
  2009-10-17  5:26                       ` Dave Korn
@ 2009-10-17  6:55                         ` Charles Wilson
  2009-10-17  9:48                           ` Charles Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Charles Wilson @ 2009-10-17  6:55 UTC (permalink / raw)
  To: cygwin

Dave Korn wrote:
> Charles Wilson wrote:
> 
>> I have a hunch that an STC (okay, less-hellaciously-complicated test
>> case) could be developed, using just GNU pth and avoiding all the
>> libassuan/gnupg gobbledygook.
> 
>   Oh yuck.  So there's this alternative user-land pthreads library that runs a
> scheduler within a single real machine thread, using some hairy sjlj hackery
> to perform context switches?  That's kinda asking for trouble, isn't it?

Well, I haven't looked closely at it at all. I compiled it, it passed
its own testsuite, so I figured Great! moving on...

I was sorta under the impression that Pth acted as a wrapper around
pthreads if available, which seems relatively harmless. But maybe I was
wrong.

If, instead, we're NOT actually using real threads, but instead PTH is
faking them all within a single thread...well, (a) my guess about the
innards of frok::child and main_tls/my_tls is wrong, and (b) that's
just...evil.

>   Anyway, look here: pth_mctx.c line ~ 514

Well, we're not "windows" are we? I'll have to look, but I thought the
PTH configury was smart enough to treat cygwin as more unixy than that.

>   Umm, yes.  Poking around directly inside a sigjmp_buf.  Wonder if the layout
> is actually what that code expects it to be or not?  That's where I'd start
> looking next, anyway, if I was wondering why maybe things were randomly
> jumping to unexpected places ...

Oh gosh. I hope that code isn't actually "live" in the cygwin
build...yeah, messing around with jmp_bufs behind cygwin's -- or ANY
OS's -- back is just bound to screw up.  Sigh.

--
Chuck


* Re: fork failure?
  2009-10-17  6:55                         ` Charles Wilson
@ 2009-10-17  9:48                           ` Charles Wilson
  2009-10-17 10:18                             ` GNU pth + cygwin + fork [Was: Re: fork failure?] Charles Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Charles Wilson @ 2009-10-17  9:48 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 2104 bytes --]

Charles Wilson wrote:
> Dave Korn wrote:
>>   Umm, yes.  Poking around directly inside a sigjmp_buf.  Wonder if the layout
>> is actually what that code expects it to be or not?  That's where I'd start
>> looking next, anyway, if I was wondering why maybe things were randomly
>> jumping to unexpected places ...
> 
> Oh gosh. I hope that code isn't actually "live" in the cygwin
> build...yeah, messing around with jmp_bufs behind cygwin's -- or ANY
> OS's -- back is just bound to screw up.  Sigh.

Ok, it's pth's fault.  As it happens, none of the tests in pth's test
suite use fork(); that seems to be a serious oversight.  Anyway, here
are two dirt-simple apps. One using pthreads, the other using pth.
'Course, you need pth to build the latter.

gcc -o pth-fork pth-fork.c -lpth
gcc -o pthreads-fork pthreads-fork.c


$ ./pthreads-fork
FORKPARENT: mypid=7328 childpid=7724
FORKCHILD: mypid=7724

$ ./pth-fork
FORKPARENT: mypid=7140 childpid=7840

It looks like this is a long-standing problem:
http://www.cygwin.com/ml/cygwin/2001-05/threads.html#01131

Then, as now, suspicion falls on messing with jmp_buf and/or the stack
in bad ways.

(Note: using -DPTH_SYSCALL_SOFT=1 to force using pth_fork() as a wrapper
around the system fork() doesn't help. Same bad behavior.)
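
A check that doesn't rely on the child managing to print anything is to
have the parent reap it and look at the exit status.  This is a minimal
sketch, not what the attached test programs do; fork_and_reap and
CHILD_EXIT_CODE are names invented here:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_EXIT_CODE 42   /* arbitrary marker value for the demo */

/* Fork, have the child exit immediately with a known status, and reap it.
 * Returns the child's exit status (CHILD_EXIT_CODE when all is well),
 * or a negative value if fork/waitpid failed or the child died abnormally. */
int fork_and_reap(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork itself reported failure */

    if (pid == 0)
        _exit(CHILD_EXIT_CODE);    /* child: no stdio involved at all */

    int status = 0;
    pid_t reaped;
    do {
        reaped = waitpid(pid, &status, 0);
    } while (reaped < 0 && errno == EINTR);   /* retry if interrupted */

    if (reaped != pid)
        return -2;                 /* e.g. ECHILD: no such child to wait for */
    if (!WIFEXITED(status))
        return -3;                 /* child was killed by a signal */
    return WEXITSTATUS(status);
}
```

If the parent gets a pid back but no child ever really runs, I'd expect
this to return something other than 42 instead of silently printing
nothing, though I haven't tried it under pth.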

I wonder what's more difficult...fixing pth, or modifying libassuan and
gnupg to use plain old pthreads instead of pth?

Typically, there's a big performance gap between native threads
(slow, but pre-emptive) and user-mode threads (fast, but
non-pre-emptive).  However, on windows, I believe you don't have nearly
as much of a performance penalty using native threads (which cygwin's
pthread implementation uses under the hood).  So, modifying the code to
use pthreads wouldn't be bad, from a performance standpoint...but the
APIs are /just annoyingly different enough/ to be painful.

Hmmm...or writing shim wrappers to translate pth calls to pthread?
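
Something along these lines, maybe.  This is a rough sketch only: the
shim_* names are hypothetical (they are not pth or pthreads API), thread
attributes are ignored, and pth's non-preemptive scheduling is NOT
reproduced, so code that relies on it would still need real locking:

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* Opaque handle, so callers can keep testing for NULL as they would
 * with the pointer returned by pth_spawn(). */
typedef struct shim_thread {
    pthread_t tid;
} *shim_t;

/* Start a thread; returns a handle, or NULL with errno set on failure. */
shim_t shim_spawn(void *(*func)(void *), void *arg)
{
    shim_t t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;

    int err = pthread_create(&t->tid, NULL, func, arg);
    if (err != 0) {
        free(t);
        errno = err;      /* pthreads returns the error; pth-style code reads errno */
        return NULL;
    }
    return t;
}

/* Wait for a thread; returns TRUE/FALSE instead of an error number. */
int shim_join(shim_t t, void **value)
{
    if (t == NULL) {
        errno = EINVAL;
        return FALSE;
    }
    int err = pthread_join(t->tid, value);
    if (err != 0) {
        errno = err;
        return FALSE;     /* handle is left allocated on failure */
    }
    free(t);
    return TRUE;
}

/* Tiny self-check: the thread body just hands its argument back. */
static void *shim_demo_body(void *arg)
{
    return arg;
}

int shim_demo(void)
{
    int token = 0;
    void *rv = NULL;
    shim_t t = shim_spawn(shim_demo_body, &token);
    if (t == NULL)
        return 0;
    if (shim_join(t, &rv) != TRUE)
        return 0;
    return rv == &token;  /* 1 if the value round-tripped */
}
```

The error-reporting mismatch (return code vs. TRUE/FALSE plus errno) is
the sort of thing that makes the two APIs annoying to swap by hand.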

Ach, the purist in me just wants to get pth working...

I've attached the two test programs and the cygport for pth (sans source;
use 'cygport *.cygport get' to fetch it).

--
Chuck

[-- Attachment #2: pth-2.0.7-1.cygport.tar.bz2 --]
[-- Type: application/octet-stream, Size: 20480 bytes --]

[-- Attachment #3: pthreads-fork.c --]
[-- Type: text/plain, Size: 831 bytes --]

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>		/* fork, getpid */

void *test (void *arg);

int
main (int argc, char *argv[])
{
  int err;
  pthread_t thread;
  void *threadrv;

  if ((err = pthread_create (&thread, NULL, test, NULL)) != 0)
    {
      printf ("Error on pthread_create %d:%s\n", err, strerror (err));
      exit (1);
    }

  if ((err = pthread_join (thread, &threadrv)) != 0)
    {
      printf ("Error on pthread_join %d:%s\n", err, strerror (err));
      exit (1);
    }
  return 0;
}

void *
test (void *arg)
{
  int pid;

  pid = fork ();
  if (pid < 0)
    {
      printf ("FORKFAILED\n");
    }
  else if (pid == 0)
    {
      printf ("FORKCHILD: mypid=%d\n", getpid ());
    }
  else
    {
      printf ("FORKPARENT: mypid=%d childpid=%d\n", getpid (), pid);
    }
  return NULL;
}


[-- Attachment #4: pth-fork.c --]
[-- Type: text/plain, Size: 976 bytes --]

#include <stdlib.h>
#include <stdio.h>
#include <pth.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>		/* fork, getpid */

void *test (void *arg);

int
main (int argc, char *argv[])
{
  int err;
  pth_t thread;
  void *threadrv;

  if ((err = pth_init ()) != TRUE)
    {
      printf ("Error on pth_init: %d: %s\n", errno, strerror (errno));
      exit (1);
    }

  if ((thread = pth_spawn (PTH_ATTR_DEFAULT, test, NULL)) == NULL)
    {
      printf ("Error on pth_spawn: %d: %s\n", errno, strerror (errno));
      exit (1);
    }

  if ((err = pth_join (thread, &threadrv)) != TRUE)
    {
      printf ("Error on pth_join %d: %s\n", errno, strerror (errno));
      exit (1);
    }
  return 0;
}

void *
test (void *arg)
{
  int pid;

  pid = fork ();
  if (pid < 0)
    {
      printf ("FORKFAILED\n");
    }
  else if (pid == 0)
    {
      printf ("FORKCHILD: mypid=%d\n", getpid ());
    }
  else
    {
      printf ("FORKPARENT: mypid=%d childpid=%d\n", getpid (), pid);
    }
  return NULL;
}




* GNU pth + cygwin + fork [Was: Re: fork failure?]
  2009-10-17  9:48                           ` Charles Wilson
@ 2009-10-17 10:18                             ` Charles Wilson
  2009-10-17 15:37                               ` Dave Korn
  0 siblings, 1 reply; 28+ messages in thread
From: Charles Wilson @ 2009-10-17 10:18 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:

> Ach, the purist in me just wants to get pth working...

Hmm...it appears the right way to do this is NOT to add another special
case in pth: "no, on cygwin THIS is the way you poke around in the
jmp_buf" + extra cygwin TLC in pth_fork().  Instead, cygwin pth should
use the standard posix sigstack/sigaltstack approach.
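
For reference, the first step of that approach (getting a signal handler
to run on a stack you allocated yourself) looks roughly like this on a
system that has sigaltstack().  It's a sketch only, and it deliberately
stops short of the setjmp/longjmp dance a threading library would do
from inside the handler; alt_handler and handler_runs_on_alt_stack are
names made up for the example:

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static char *alt_base;                          /* start of the alternate stack */
static size_t alt_size;                         /* its size in bytes */
static volatile sig_atomic_t handler_on_alt = 0;

/* Records whether a local variable of the handler lives inside the
 * alternate stack, i.e. whether the handler really runs on it. */
static void alt_handler(int sig)
{
    char probe;
    uintptr_t p = (uintptr_t)&probe;
    (void)sig;
    handler_on_alt = (p >= (uintptr_t)alt_base &&
                      p < (uintptr_t)alt_base + alt_size);
}

/* Returns 1 if the handler ran on the alternate stack, 0 if it ran
 * elsewhere, -1 on setup failure. */
int handler_runs_on_alt_stack(void)
{
    stack_t ss, old_ss;
    struct sigaction sa, old_sa;

    alt_size = 64 * 1024;
    alt_base = malloc(alt_size);
    if (alt_base == NULL)
        return -1;

    ss.ss_sp = alt_base;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, &old_ss) != 0) {
        free(alt_base);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = alt_handler;
    sa.sa_flags = SA_ONSTACK;       /* ask for delivery on the alternate stack */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0) {
        sigaltstack(&old_ss, NULL);
        free(alt_base);
        return -1;
    }

    handler_on_alt = 0;
    raise(SIGUSR1);                 /* handler has returned by the time raise() does */

    /* Put everything back the way we found it. */
    sigaction(SIGUSR1, &old_sa, NULL);
    sigaltstack(&old_ss, NULL);
    free(alt_base);
    return handler_on_alt;
}
```

No jmp_buf slots are touched anywhere, which is the whole attraction.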

But that'll have to wait until after cygwin-1.7.1:
http://cygwin.com/ml/cygwin/2009-07/msg00859.html
> Let me add a new data point: I'll implement sigaltstack after 1.7.1 is
> released.

And, of course, cgf's statement above doesn't mean that sigaltstack will
be available the day after 1.7.1 is released, either. I'm sure it will
be devilishly tricky to get right, and will take a lot of time and effort.

In the short-to-medium term, it looks like converting libassuan and
gnupg to use pthreads instead of pth won't be terribly difficult.  Once
sig[alt]stack is available I can modify cygwin-pth to use the
sig[alt]stack "Machine Context Implementation" instead of the current
"sjlj/sjljw32/none" one, and then restore libassuan and gnupg to the pth
status quo ante.

I think that pretty much ends this nightmare thread -- but chalk another
vote up there for "pretty please, cgf, implement sigaltstack soonish".

--
Chuck


* Re: GNU pth + cygwin + fork [Was: Re: fork failure?]
  2009-10-17 10:18                             ` GNU pth + cygwin + fork [Was: Re: fork failure?] Charles Wilson
@ 2009-10-17 15:37                               ` Dave Korn
  2009-10-17 16:10                                 ` Charles Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Korn @ 2009-10-17 15:37 UTC (permalink / raw)
  To: cygwin

Charles Wilson wrote:

> In the short-to-medium term, it looks like converting libassuan and
> gnupg to use pthreads instead of pth won't be terribly difficult.  Once
> sig[alt]stack is available I can modify cygwin-pth to use the
> sig[alt]stack "Machine Context Implementation" instead of the current
> "sjlj/sjljw32/none" one, and then restore libassuan and gnupg to the pth
> status quo ante.

  My first thought would be to figure out what pth is attempting to do while
messing in jmp_buf, and make it work.  It's bad, unmaintainable code, that
will break again in the future if ever jmp_buf is rearranged - but it only has
to stagger along for another couple of months until you can do it right using
sigaltstack.  Until then, slapping a band-aid on pth might be a lot less
work-that-soon-has-to-be-thrown-away than hacking both libassuan and gpg to
handle a different API.  (I say this without having yet done the research to
figure out exactly what pth thinks it is doing to that jmp_buf and whether
it's necessarily possible, but it ought to be.)

  Anyway, it's your effort so it's your call but I suggest this strategy
because you didn't explicitly mention having considered it in your
deliberations above.

    cheers,
      DaveK


* Re: GNU pth + cygwin + fork [Was: Re: fork failure?]
  2009-10-17 15:37                               ` Dave Korn
@ 2009-10-17 16:10                                 ` Charles Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Charles Wilson @ 2009-10-17 16:10 UTC (permalink / raw)
  To: cygwin

Dave Korn wrote:
>   My first thought would be to figure out what pth is attempting to do while
> messing in jmp_buf, and make it work.  It's bad, unmaintainable code, that
> will break again in the future if ever jmp_buf is rearranged - but it only has
> to stagger along for another couple of months until you can do it right using
> sigaltstack.

I did consider that, but frankly this whole business of "adjusting the
stack" by messing around inside the undocumented jmp_buf scares the
*bleep* out of me.  WAY too easy to get wrong, and my asm skills are too
rusty -- and too non-x86 -- to be trusted.

> Until then, slapping a band-aid on pth might be a lot less
> work-that-soon-has-to-be-thrown-away than hacking both libassuan and gpg to
> handle a different API.  (I say this without having yet done the research to
> figure out exactly what pth thinks it is doing to that jmp_buf and whether
> it's necessarily possible, but it ought to be.)

I'm sure it's possible, but IMO migrating libassuan/gnupg is easier --
for me.  As it turns out, in libassuan there's only one file that needs
to be modified. gnupg is a bit trickier, but not terrible -- and the
APIs, while annoyingly different, aren't totally dissimilar.

--
Chuck


Thread overview: 28+ messages
2009-10-15 14:34 fork failure? Charles Wilson
2009-10-15 14:56 ` Dave Korn
2009-10-15 15:54   ` Charles Wilson
2009-10-15 16:35     ` Dave Korn
2009-10-15 17:07       ` Charles Wilson
2009-10-15 17:21         ` Charles Wilson
2009-10-15 17:33         ` Christopher Faylor
2009-10-15 18:17           ` Dave Korn
2009-10-15 19:21             ` Charles Wilson
2009-10-16  7:58               ` Corinna Vinschen
2009-10-15 23:33       ` Charles Wilson
2009-10-15 23:58         ` Dave Korn
2009-10-16  0:31         ` Dave Korn
2009-10-16  0:46           ` Dave Korn
2009-10-16  2:06           ` Charles Wilson
2009-10-16  7:35         ` Charles Wilson
2009-10-16 17:29           ` Charles Wilson
2009-10-16 18:04             ` Dave Korn
2009-10-16 19:46               ` Charles Wilson
2009-10-16 20:01                 ` Dave Korn
2009-10-16 20:43                   ` Charles Wilson
2009-10-17  3:41                     ` Charles Wilson
2009-10-17  5:26                       ` Dave Korn
2009-10-17  6:55                         ` Charles Wilson
2009-10-17  9:48                           ` Charles Wilson
2009-10-17 10:18                             ` GNU pth + cygwin + fork [Was: Re: fork failure?] Charles Wilson
2009-10-17 15:37                               ` Dave Korn
2009-10-17 16:10                                 ` Charles Wilson
