STC for libapr1 failure

public inbox for cygwin@cygwin.com

* STC for libapr1 failure
@ 2011-08-26  0:39 David Rothenberger
  2011-08-26 11:16 ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: David Rothenberger @ 2011-08-26  0:39 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 836 bytes --]

For a while now, the test cases that come with libapr1 have been
bombing with this message:

  *** fatal error - NtCreateEvent(lock): 0xC0000035

I finally took some time to investigate and have extracted a STC
that demonstrates the problem.

It's been a decade since I did any C programming, so I'm not really
sure that the STC is valid. However, it does work on my Debian
box. (I know that doesn't really mean anything, but it's the best I
can do.)

I've tried this on my Win7-64 box running the 20110822 snapshot and
on a WinXP VM running 1.7.9. I get the same results in both places.

Regards,
David

-- 
David Rothenberger  ----  daveroth@acm.org

The First Rule of Program Optimization:
        Don't do it.

The Second Rule of Program Optimization (for experts only!):
        Don't do it yet.
                -- Michael Jackson

[-- Attachment #2: stc-flock-fork.c --]
[-- Type: text/plain, Size: 2522 bytes --]

/***********************************************************************
 * This is a STC that causes the following error on my test machine:
 *   NtCreateEvent(lock): 0xC0000035
 *
 * It tries to use flock() for file locking. It creates a temporary
 * file, then uses fork to spawn a number of children. Each child opens
 * the file, then repeatedly uses flock to lock and unlock it.
 *
 * This test was extracted from the APR test suite.
 *
 * Compile: gcc -Wall -o stc-flock-fork stc-flock-fork.c
 ***********************************************************************/

#include <sys/types.h>
#include <sys/file.h>
#include <sys/wait.h>

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#define MAX_ITER 2000
#define CHILDREN 6

/* A temporary file used for flock. */
char tmpfilename[] = "/tmp/flocktstXXXXXX";

/* Fork and use flock to lock and unlock the file repeatedly in the child. */
void make_child(int trylock, pid_t *pid)
{
    if ((*pid = fork()) < 0) {
        perror("fork failed");
        exit(1);
    }
    else if (*pid == 0) {
        int fd2 = open(tmpfilename, O_RDONLY);
        if (fd2 < 0) {
            perror("child open");
            exit(1);
        }

        int rc;
        int i;
        for (i=0; i<MAX_ITER; ++i) {
            do {
                rc = flock(fd2, LOCK_EX);
            } while (rc < 0 && errno == EINTR);
            if (rc < 0) {
                perror("lock");
                exit(1);
            }
            
            do {
                rc = flock(fd2, LOCK_UN);
            } while (rc < 0 && errno == EINTR);
            if (rc < 0) {
                perror("unlock");
                exit(1);
            }
        }
        exit(0);
    }
}

/* Wait for the child to finish. */
void await_child(pid_t pid)
{
    pid_t pstatus;
    int exit_int;

    do {
        pstatus = waitpid(pid, &exit_int, WUNTRACED);
    } while (pstatus < 0 && errno == EINTR);
}

int main(int argc, const char * const * argv, const char * const *env)
{
    pid_t child[CHILDREN];
    int n;
    int fd;
 
    /* Create the temporary file. */
    fd = mkstemp(tmpfilename);
    if (fd < 0) {
        perror("open failed");
        exit(1);
    }
    close(fd);

    /* Create the children. */
    for (n = 0; n < CHILDREN; n++)
        make_child(0, &child[n]);

    /* Wait for them to finish. */
    for (n = 0; n < CHILDREN; n++)
        await_child(child[n]);

    /* Clean up. */
    unlink(tmpfilename);
    return 0;
}


[-- Attachment #3: Type: text/plain, Size: 218 bytes --]

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: STC for libapr1 failure
  2011-08-26  0:39 STC for libapr1 failure David Rothenberger
@ 2011-08-26 11:16 ` Corinna Vinschen
  2011-08-27 20:37   ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2011-08-26 11:16 UTC (permalink / raw)
  To: cygwin

On Aug 25 17:39, David Rothenberger wrote:
> For a while now, the test cases that come with libapr1 have been
> bombing with this message:
> 
>   *** fatal error - NtCreateEvent(lock): 0xC0000035
> 
> I finally took some time to investigate and have extracted a STC
> that demonstrates the problem.

Thanks a lot for the testcase.  In theory, the NtCreateEvent call should
not have happened at all, since it's called under lock, and the code
around that should have made sure that the object doesn't exist at the
time.

After a few hours of extreme puzzlement, I now finally know what happens.
It's kinda hard to explain.

A lock on a file is represented by an event object.  Process A holds the
lock corresponding with event a.  Process B tries to lock, but the lock
of process A blocks that.  So B now waits for event a, until it gets
signalled.  Now A unlocks, thus signalling event a and closing the handle
afterwards.  But A's time slice isn't up yet, so it tries again to lock
the file before B has returned from the wait for a.  And here a wrong
condition fails to recognize the situation.  A finds the event object,
but since it's recognized as "that's me", A doesn't treat the event as
a blocking factor.  This in turn allows A to create its own lock
event object.  However, the object still exists, since B still has an
open handle to it.  So creating the event fails, and rightfully so.
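
The sequence described above can be sketched as a toy model in plain C.
This is not Cygwin's code: the struct, the obj_* helpers and
simulate_race() are hypothetical names, and the model assumes only that
a named object stays alive while at least one handle to it is open.
The one real-world fact used is that NTSTATUS 0xC0000035 is
STATUS_OBJECT_NAME_COLLISION, the value in the fatal error message.

```c
#include <assert.h>

/* NTSTATUS values as defined by Windows. */
#define STATUS_SUCCESS               0x00000000u
#define STATUS_OBJECT_NAME_COLLISION 0xC0000035u

/* Hypothetical stand-in for the named event that represents one lock.
   The object exists for as long as its handle count is above zero. */
struct named_object {
    int handles;
};

static struct named_object lock_event;

/* Exclusive create: fails if an object of that name is still alive. */
static unsigned obj_create(struct named_object *o)
{
    if (o->handles > 0)
        return STATUS_OBJECT_NAME_COLLISION;
    o->handles = 1;
    return STATUS_SUCCESS;
}

/* Another process opens a handle to the existing object. */
static void obj_open(struct named_object *o)
{
    assert(o->handles > 0);
    o->handles++;
}

/* Closing the last handle destroys the object. */
static void obj_close(struct named_object *o)
{
    assert(o->handles > 0);
    o->handles--;
}

/* Replay the sequence from the explanation.  If b_wakes_first is zero,
   A re-locks before B has returned from its wait, as in the bug. */
unsigned simulate_race(int b_wakes_first)
{
    unsigned status;

    lock_event.handles = 0;
    status = obj_create(&lock_event);   /* A takes the lock            */
    assert(status == STATUS_SUCCESS);
    obj_open(&lock_event);              /* B blocks, waiting on event  */
    obj_close(&lock_event);             /* A unlocks: signal and close */
    if (b_wakes_first)
        obj_close(&lock_event);         /* B wakes, closes its handle  */
    return obj_create(&lock_event);     /* A tries to lock again       */
}
```

In this model simulate_race(0) returns STATUS_OBJECT_NAME_COLLISION,
because B's handle keeps the old object alive, while simulate_race(1)
succeeds.  The model covers only the handle lifetime, not the faulty
"that's me" check that let A reach the create call in the first place.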

What I don't have is an idea how to fix this problem correctly.  I have
to think about that.  Stay tuned.

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: STC for libapr1 failure
  2011-08-26 11:16 ` Corinna Vinschen
@ 2011-08-27 20:37   ` Corinna Vinschen
  2011-08-27 22:27     ` David Rothenberger
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2011-08-27 20:37 UTC (permalink / raw)
  To: cygwin

On Aug 26 13:15, Corinna Vinschen wrote:
> On Aug 25 17:39, David Rothenberger wrote:
> > For a while now, the test cases that come with libapr1 have been
> > bombing with this message:
> > 
> >   *** fatal error - NtCreateEvent(lock): 0xC0000035
> > 
> > I finally took some time to investigate and have extracted a STC
> > that demonstrates the problem.
> 
> Thanks a lot for the testcase.  In theory, the NtCreateEvent call should
> not have happened at all, since it's called under lock, and the code
> around that should have made sure that the object doesn't exist at the
> time.
> 
> After a few hours of extreme puzzlement, I now finally know what happens.
> It's kinda hard to explain.
> 
> A lock on a file is represented by an event object.  Process A holds the
> lock corresponding with event a.  Process B tries to lock, but the lock
> of process A blocks that.  So B now waits for event a, until it gets
> signalled.  Now A unlocks, thus signalling event a and closing the handle
> afterwards.  But A's time slice isn't up yet, so it tries again to lock
> the file before B has returned from the wait for a.  And here a wrong
> condition fails to recognize the situation.  A finds the event object,
> but since it's recognized as "that's me", A doesn't treat the event as
> a blocking factor.  This in turn allows A to create its own lock
> event object.  However, the object still exists, since B still has an
> open handle to it.  So creating the event fails, and rightfully so.
> 
> What I don't have is an idea how to fix this problem correctly.  I have
> to think about that.  Stay tuned.

Please test the latest snapshot.  It should fix this problem, as well as
a starvation problem with signals (and, fwiw, thread cancel events) in
flock, lockf, and POSIX fcntl locks.

Thanks again for the testcase.  It was very helpful to test both problems.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: STC for libapr1 failure
  2011-08-27 20:37   ` Corinna Vinschen
@ 2011-08-27 22:27     ` David Rothenberger
  2011-08-29 13:55       ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: David Rothenberger @ 2011-08-27 22:27 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1469 bytes --]

On 8/27/2011 1:37 PM, Corinna Vinschen wrote:
> On Aug 26 13:15, Corinna Vinschen wrote:
>> On Aug 25 17:39, David Rothenberger wrote:
>>> For a while now, the test cases that come with libapr1 have been
>>> bombing with this message:
>>>
>>>   *** fatal error - NtCreateEvent(lock): 0xC0000035
>>>
>>> I finally took some time to investigate and have extracted a STC
>>> that demonstrates the problem.
>>
>> Thanks a lot for the testcase.  In theory, the NtCreateEvent call should
>> not have happened at all, since it's called under lock, and the code
>> around that should have made sure that the object doesn't exist at the
>> time.
>>
>> After a few hours of extreme puzzlement, I now finally know what happens.
>> It's kinda hard to explain.
>>
[... very good description of flock problem ...]
> 
> Please test the latest snapshot.  It should fix this problem, as well as
> a starvation problem with signals (and, fwiw, thread cancel events) in
> flock, lockf, and POSIX fcntl locks.

The new snapshot runs the flock STC. Thanks!

I've been building libapr1 without F_SETLK support for a while since
it was also triggering the "NtCreateEvent(lock): 0xC0000035"
error. Since you mentioned fcntl, I tried re-enabling the fcntl
mutexes. They still trigger the error.

I've attached a similar STC that uses fcntl instead of flock.

-- 
David Rothenberger  ----  daveroth@acm.org

"It's what you learn after you know it all that counts."
                -- John Wooden

[-- Attachment #2: stc-fcntl-fork.c --]
[-- Type: text/plain, Size: 3254 bytes --]

/***********************************************************************
 * This is a STC that causes the following error on my test machine:
 *   NtCreateEvent(lock): 0xC0000035
 *
 * It tries to use fcntl() for file locking. It creates a temporary
 * file, then uses fork to spawn a number of children. Each child opens
 * the file, then repeatedly uses fcntl to lock and unlock it.
 *
 * This test was extracted from the APR test suite.
 *
 * Compile: gcc -Wall -o stc-fcntl-fork stc-fcntl-fork.c
 ***********************************************************************/

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/file.h>
#include <fcntl.h>      /* fcntl(), struct flock, F_SETLKW */

#define MAX_ITER 2000
#define CHILDREN 6

/* A temporary file used for fcntl. */
char tmpfilename[] = "/tmp/fcntlXXXXXX";

struct flock mutex_lock_it;
struct flock mutex_unlock_it;

/* Fork and use fcntl to lock and unlock the file repeatedly in the child. */
void make_child(int trylock, pid_t *pid)
{
    if ((*pid = fork()) < 0) {
        perror("fork failed");
        exit(1);
    }
    else if (*pid == 0) {
        int fd2 = open(tmpfilename, O_RDWR);
        if (fd2 < 0) {
            perror("child open");
            exit(1);
        }

        int rc;
        int i;
        for (i=0; i<MAX_ITER; ++i) {
            do {
                rc = fcntl(fd2, F_SETLKW, &mutex_lock_it);
            } while (rc < 0 && errno == EINTR);
            if (rc < 0) {
                perror("lock");
                exit(1);
            }
            
            do {
                rc = fcntl(fd2, F_SETLKW, &mutex_unlock_it);
            } while (rc < 0 && errno == EINTR);
            if (rc < 0) {
                perror("unlock");
                exit(1);
            }
        }
        exit(0);
    }
}

/* Wait for the child to finish. */
void await_child(pid_t pid)
{
    pid_t pstatus;
    int exit_int;

    do {
        pstatus = waitpid(pid, &exit_int, WUNTRACED);
    } while (pstatus < 0 && errno == EINTR);
}

int main(int argc, const char * const * argv, const char * const *env)
{
    pid_t child[CHILDREN];
    int n;
    int fd;
 
    /* Create the temporary file. */
    fd = mkstemp(tmpfilename);
    if (fd < 0) {
        perror("open failed");
        exit(1);
    }
    close(fd);

    /* Set up the lock and unlock requests. */
    mutex_lock_it.l_whence = SEEK_SET;   /* offsets relative to start of file */
    mutex_lock_it.l_start = 0;           /* from offset 0 */
    mutex_lock_it.l_len = 0;             /* until end of file */
    mutex_lock_it.l_type = F_WRLCK;      /* set exclusive/write lock */
    mutex_lock_it.l_pid = 0;             /* pid not actually interesting */
    mutex_unlock_it.l_whence = SEEK_SET; /* offsets relative to start of file */
    mutex_unlock_it.l_start = 0;         /* from offset 0 */
    mutex_unlock_it.l_len = 0;           /* until end of file */
    mutex_unlock_it.l_type = F_UNLCK;    /* release the lock */
    mutex_unlock_it.l_pid = 0;           /* pid not actually interesting */

    /* Create the children. */
    for (n = 0; n < CHILDREN; n++)
        make_child(0, &child[n]);

    /* Wait for them to finish. */
    for (n = 0; n < CHILDREN; n++)
        await_child(child[n]);

    /* Clean up. */
    unlink(tmpfilename);
    return 0;
}



* Re: STC for libapr1 failure
  2011-08-27 22:27     ` David Rothenberger
@ 2011-08-29 13:55       ` Corinna Vinschen
  2011-08-29 17:09         ` David Rothenberger
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2011-08-29 13:55 UTC (permalink / raw)
  To: cygwin

On Aug 27 15:27, David Rothenberger wrote:
> On 8/27/2011 1:37 PM, Corinna Vinschen wrote:
> > On Aug 26 13:15, Corinna Vinschen wrote:
> >> On Aug 25 17:39, David Rothenberger wrote:
> >>> For a while now, the test cases that come with libapr1 have been
> >>> bombing with this message:
> >>>
> >>>   *** fatal error - NtCreateEvent(lock): 0xC0000035
> >>>
> >>> I finally took some time to investigate and have extracted a STC
> >>> that demonstrates the problem.
> >>
> >> Thanks a lot for the testcase.  In theory, the NtCreateEvent call should
> >> not have happened at all, since it's called under lock, and the code
> >> around that should have made sure that the object doesn't exist at the
> >> time.
> >>
> >> After a few hours of extreme puzzlement, I now finally know what happens.
> >> It's kinda hard to explain.
> >>
> [... very good description of flock problem ...]
> > 
> > Please test the latest snapshot.  It should fix this problem, as well as
> > a starvation problem with signals (and, fwiw, thread cancel events) in
> > flock, lockf, and POSIX fcntl locks.
> 
> The new snapshot runs the flock STC. Thanks!
> 
> I've been building libapr1 without F_SETLK support for a while since
> it was also triggering the "NtCreateEvent(lock): 0xC0000035"
> error. Since you mentioned fcntl, I tried re-enabling the fcntl
> mutexes. They still trigger the error.
> 
> I've attached a similar STC that uses fcntl instead of flock.

I made a couple more changes to the file locking code to accommodate
POSIX locks as well.  Please test today's developer snapshot,
which I'm creating right now.


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: STC for libapr1 failure
  2011-08-29 13:55       ` Corinna Vinschen
@ 2011-08-29 17:09         ` David Rothenberger
  0 siblings, 0 replies; 32+ messages in thread
From: David Rothenberger @ 2011-08-29 17:09 UTC (permalink / raw)
  To: cygwin

On 8/29/2011 6:54 AM, Corinna Vinschen wrote:
> On Aug 27 15:27, David Rothenberger wrote:
>> On 8/27/2011 1:37 PM, Corinna Vinschen wrote:
>>> On Aug 26 13:15, Corinna Vinschen wrote:
>>>> On Aug 25 17:39, David Rothenberger wrote:
>>>>> For a while now, the test cases that come with libapr1 have been
>>>>> bombing with this message:
>>>>>
>>>>>   *** fatal error - NtCreateEvent(lock): 0xC0000035
>>>>>
>>>>> I finally took some time to investigate and have extracted a STC
>>>>> that demonstrates the problem.
>>>>
>>>> Thanks a lot for the testcase.  In theory, the NtCreateEvent call should
>>>> not have happened at all, since it's called under lock, and the code
>>>> around that should have made sure that the object doesn't exist at the
>>>> time.
>>>>
>>>> After a few hours of extreme puzzlement, I now finally know what happens.
>>>> It's kinda hard to explain.
>>>>
>> [... very good description of flock problem ...]
>>>
>>> Please test the latest snapshot.  It should fix this problem, as well as
>>> a starvation problem with signals (and, fwiw, thread cancel events) in
>>> flock, lockf, and POSIX fcntl locks.
>>
>> The new snapshot runs the flock STC. Thanks!
>>
>> I've been building libapr1 without F_SETLK support for a while since
>> it was also triggering the "NtCreateEvent(lock): 0xC0000035"
>> error. Since you mentioned fcntl, I tried re-enabling the fcntl
>> mutexes. They still trigger the error.
>>
>> I've attached a similar STC that uses fcntl instead of flock.
> 
> I made a couple more changes to the file locking code to accommodate
> POSIX locks as well.  Please test today's developer snapshot,
> which I'm creating right now.

The latest baseline fixes my STC and the libapr1 test suite. Thanks!

-- 
David Rothenberger  ----  daveroth@acm.org

The Beatles:
        Paul McCartney's old back-up band.


* Re: STC for libapr1 failure
  2012-02-24  3:49                                         ` Yaakov (Cygwin/X)
@ 2012-02-24  8:15                                           ` Corinna Vinschen
  0 siblings, 0 replies; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-24  8:15 UTC (permalink / raw)
  To: cygwin

On Feb 23 21:49, Yaakov (Cygwin/X) wrote:
> On Thu, 2012-02-23 at 15:19 +0100, Corinna Vinschen wrote:
> > On Feb 21 18:09, Corinna Vinschen wrote:
> > > Btw., in what way is XWin broken?  I just tried to start it from the
> > > start menu and that worked perfectly fine.  I get the default xterm
> > > and that works.
> > 
> > I really need something reproducible here. 
> 
> I am unable to reproduce it anymore.  While I thought I excluded other
> possible causes, it seems that I was wrong.  Sorry for the noise.

Never mind.  I'm glad that the code got more testing.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: STC for libapr1 failure
  2012-02-23 14:20                                       ` Corinna Vinschen
  2012-02-23 18:43                                         ` Achim Gratz
@ 2012-02-24  3:49                                         ` Yaakov (Cygwin/X)
  2012-02-24  8:15                                           ` Corinna Vinschen
  1 sibling, 1 reply; 32+ messages in thread
From: Yaakov (Cygwin/X) @ 2012-02-24  3:49 UTC (permalink / raw)
  To: cygwin

On Thu, 2012-02-23 at 15:19 +0100, Corinna Vinschen wrote:
> On Feb 21 18:09, Corinna Vinschen wrote:
> > Btw., in what way is XWin broken?  I just tried to start it from the
> > start menu and that worked perfectly fine.  I get the default xterm
> > and that works.
> 
> I really need something reproducible here. 

I am unable to reproduce it anymore.  While I thought I excluded other
possible causes, it seems that I was wrong.  Sorry for the noise.


Yaakov




* Re: STC for libapr1 failure
  2012-02-23 14:20                                       ` Corinna Vinschen
@ 2012-02-23 18:43                                         ` Achim Gratz
  2012-02-24  3:49                                         ` Yaakov (Cygwin/X)
  1 sibling, 0 replies; 32+ messages in thread
From: Achim Gratz @ 2012-02-23 18:43 UTC (permalink / raw)
  To: cygwin

Corinna Vinschen writes:
>> > > I'm sorry to report that the 20120220 snapshot breaks the X server,
>> > > which uses fcntl() with a lock file.
>> > 
>> > STC?
>> 
>> Btw., in what way is XWin broken?  I just tried to start it from the
>> start menu and that worked perfectly fine.  I get the default xterm
>> and that works.
>
> I really need something reproducible here.  Otherwise I will release
> Cygwin 1.7.11 this weekend.

FWIW, I did a full install plus the 20120220 snapshot yesterday (and
some things from cygports, like texlive, on top of that) and XWin/wmaker
works fine.  After a rebaseall, the Tk-based applications also started
working correctly.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf rackAttack V1.04R1:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada



* Re: STC for libapr1 failure
  2012-02-21 17:10                                     ` Corinna Vinschen
@ 2012-02-23 14:20                                       ` Corinna Vinschen
  2012-02-23 18:43                                         ` Achim Gratz
  2012-02-24  3:49                                         ` Yaakov (Cygwin/X)
  0 siblings, 2 replies; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-23 14:20 UTC (permalink / raw)
  To: cygwin

Hi Yaakov,

On Feb 21 18:09, Corinna Vinschen wrote:
> On Feb 21 09:58, Corinna Vinschen wrote:
> > On Feb 20 19:29, Yaakov (Cygwin/X) wrote:
> > > On Mon, 2012-02-20 at 15:17 +0100, Corinna Vinschen wrote:
> > > > As always, thanks for the testcase.  I think I found the problem.  It's
> > > > hard to explain if you don't know how the code works, but it boils down
> > > > to the fact that my last round of patches back in August were not
> > > > actually fixing the problem, but only working around it.  I'm hopeful
> > > > that I got it right this time.  I'm just generating a new snapshot.
> > > > Please give it another hit with the APR testsuite.
> > > 
> > > I'm sorry to report that the 20120220 snapshot breaks the X server,
> > > which uses fcntl() with a lock file.
> > 
> > STC?
> 
> Btw., in what way is XWin broken?  I just tried to start it from the
> start menu and that worked perfectly fine.  I get the default xterm
> and that works.

I really need something reproducible here.  Otherwise I will release
Cygwin 1.7.11 this weekend.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: STC for libapr1 failure
  2012-02-21  8:59                                   ` Corinna Vinschen
@ 2012-02-21 17:10                                     ` Corinna Vinschen
  2012-02-23 14:20                                       ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-21 17:10 UTC (permalink / raw)
  To: cygwin

On Feb 21 09:58, Corinna Vinschen wrote:
> On Feb 20 19:29, Yaakov (Cygwin/X) wrote:
> > On Mon, 2012-02-20 at 15:17 +0100, Corinna Vinschen wrote:
> > > As always, thanks for the testcase.  I think I found the problem.  It's
> > > hard to explain if you don't know how the code works, but it boils down
> > > to the fact that my last round of patches back in August were not
> > > actually fixing the problem, but only working around it.  I'm hopeful
> > > that I got it right this time.  I'm just generating a new snapshot.
> > > Please give it another hit with the APR testsuite.
> > 
> > I'm sorry to report that the 20120220 snapshot breaks the X server,
> > which uses fcntl() with a lock file.
> 
> STC?

Btw., in what way is XWin broken?  I just tried to start it from the
start menu and that worked perfectly fine.  I get the default xterm
and that works.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: STC for libapr1 failure
  2012-02-21  1:29                                 ` Yaakov (Cygwin/X)
@ 2012-02-21  8:59                                   ` Corinna Vinschen
  2012-02-21 17:10                                     ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-21  8:59 UTC (permalink / raw)
  To: cygwin

On Feb 20 19:29, Yaakov (Cygwin/X) wrote:
> On Mon, 2012-02-20 at 15:17 +0100, Corinna Vinschen wrote:
> > As always, thanks for the testcase.  I think I found the problem.  It's
> > hard to explain if you don't know how the code works, but it boils down
> > to the fact that my last round of patches back in August were not
> > actually fixing the problem, but only working around it.  I'm hopeful
> > that I got it right this time.  I'm just generating a new snapshot.
> > Please give it another hit with the APR testsuite.
> 
> I'm sorry to report that the 20120220 snapshot breaks the X server,
> which uses fcntl() with a lock file.

STC?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: STC for libapr1 failure
  2012-02-20 14:19                               ` Corinna Vinschen
  2012-02-20 20:15                                 ` David Rothenberger
@ 2012-02-21  1:29                                 ` Yaakov (Cygwin/X)
  2012-02-21  8:59                                   ` Corinna Vinschen
  1 sibling, 1 reply; 32+ messages in thread
From: Yaakov (Cygwin/X) @ 2012-02-21  1:29 UTC (permalink / raw)
  To: cygwin

On Mon, 2012-02-20 at 15:17 +0100, Corinna Vinschen wrote:
> As always, thanks for the testcase.  I think I found the problem.  It's
> hard to explain if you don't know how the code works, but it boils down
> to the fact that my last round of patches back in August were not
> actually fixing the problem, but only working around it.  I'm hopeful
> that I got it right this time.  I'm just generating a new snapshot.
> Please give it another hit with the APR testsuite.

I'm sorry to report that the 20120220 snapshot breaks the X server,
which uses fcntl() with a lock file.


Yaakov




* Re: STC for libapr1 failure
  2012-02-20 14:19                               ` Corinna Vinschen
@ 2012-02-20 20:15                                 ` David Rothenberger
  2012-02-21  1:29                                 ` Yaakov (Cygwin/X)
  1 sibling, 0 replies; 32+ messages in thread
From: David Rothenberger @ 2012-02-20 20:15 UTC (permalink / raw)
  To: cygwin

On 2/20/2012 6:17 AM, Corinna Vinschen wrote:
> As always, thanks for the testcase.  I think I found the problem.  It's
> hard to explain if you don't know how the code works, but it boils down
> to the fact that my last round of patches back in August were not
> actually fixing the problem, but only working around it.  I'm hopeful
> that I got it right this time.  I'm just generating a new snapshot.
> Please give it another hit with the APR testsuite.

All the tests pass with the latest snapshot. Thanks for all the hard work!

-- 
David Rothenberger  ----  daveroth@acm.org

Too much is just enough.
                -- Mark Twain, on whiskey


* Re: STC for libapr1 failure
  2012-02-18 21:52                             ` David Rothenberger
@ 2012-02-20 14:19                               ` Corinna Vinschen
  2012-02-20 20:15                                 ` David Rothenberger
  2012-02-21  1:29                                 ` Yaakov (Cygwin/X)
  0 siblings, 2 replies; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-20 14:19 UTC (permalink / raw)
  To: cygwin

On Feb 18 13:51, David Rothenberger wrote:
> On 2/16/2012 8:04 AM, Corinna Vinschen wrote:
> > On Feb 16 07:56, David Rothenberger wrote:
> >> On 2/16/2012 6:09 AM, Corinna Vinschen wrote:
> >>> I read the Linux man page again (http://linux.die.net/man/2/flock)
> >>> and I just hacked the following testcase, based on your flock STC.
> >>
> >> That sounds pretty close to what the APR test case is doing, as far as I
> >> understand.
> >>
> >>> The testcase is attached.  I'm pretty curious what your test is actually
> >>> testing.
> >>
> >> I got to work at my real job all last night, so couldn't extract the STC
> >> from the APR test suite. But, here's the test in APR-ese in case you're
> >> interested. I'll remove the APRisms as soon as I can to get you another
> >> test case.
> 
> I've extracted the test case, which is attached.
> 
> I must humbly apologize. The test case was actually using fcntl() for
> file locking, not flock(). I got thrown off by the name of the test:
> "testflock". It seems APR prefers fcntl() for file locking if available.
> 
> The attached test works fine for me on Linux, but fails on Cygwin
> starting with the 20120215 snapshot.

As always, thanks for the testcase.  I think I found the problem.  It's
hard to explain if you don't know how the code works, but it boils down
to the fact that my last round of patches back in August were not
actually fixing the problem, but only working around it.  I'm hopeful
that I got it right this time.  I'm just generating a new snapshot.
Please give it another hit with the APR testsuite.


Thanks again,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-16 16:06                           ` Corinna Vinschen
@ 2012-02-18 21:52                             ` David Rothenberger
  2012-02-20 14:19                               ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: David Rothenberger @ 2012-02-18 21:52 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1234 bytes --]

On 2/16/2012 8:04 AM, Corinna Vinschen wrote:
> On Feb 16 07:56, David Rothenberger wrote:
>> On 2/16/2012 6:09 AM, Corinna Vinschen wrote:
>>> I read the Linux man page again (http://linux.die.net/man/2/flock)
>>> and I just hacked the following testcase, based on your flock STC.
>>
>> That sounds pretty close to what the APR test case is doing, as far as I
>> understand.
>>
>>> The testcase is attached.  I'm pretty curious what your test is actually
>>> testing.
>>
>> I got to work at my real job all last night, so couldn't extract the STC
>> from the APR test suite. But, here's the test in APR-ese in case you're
>> interested. I'll remove the APRisms as soon as I can to get you another
>> test case.

I've extracted the test case, which is attached.

I must humbly apologize. The test case was actually using fcntl() for
file locking, not flock(). I got thrown off by the name of the test:
"testflock". It seems APR prefers fcntl() for file locking if available.

The attached test works fine for me on Linux, but fails on Cygwin
starting with the 20120215 snapshot.
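
For comparison with flock(), here is a minimal sketch of the fcntl()
behaviour the test relies on. It is not taken from APR or from the
attached STC, and it assumes POSIX record-lock semantics as seen on
Linux: the lock belongs to the process, so a forked child does not
inherit it, and F_GETLK in the child reports the parent as the holder.
The function name check_fcntl_lock_owner and the /tmp file name
template are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>     /* only for the assert() in the usage note */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a forked child sees the parent's
   fcntl() write lock as a conflicting lock owned by the parent,
   1 if it does not, and -1 on a setup error.  */
int
check_fcntl_lock_owner (void)
{
  char path[] = "/tmp/fcntldemoXXXXXX";   /* illustrative name only */
  struct flock l;
  pid_t pid;
  int fd, status, result = -1;

  fd = mkstemp (path);
  if (fd < 0)
    return -1;

  /* Parent: take a write lock covering the whole file.  */
  memset (&l, 0, sizeof l);
  l.l_type = F_WRLCK;
  l.l_whence = SEEK_SET;
  l.l_start = 0;
  l.l_len = 0;
  if (fcntl (fd, F_SETLK, &l) == 0 && (pid = fork ()) >= 0)
    {
      if (pid == 0)
        {
          /* Child: the lock was not inherited, so ask which lock
             would block a write lock of our own.  */
          struct flock q;

          memset (&q, 0, sizeof q);
          q.l_type = F_WRLCK;
          q.l_whence = SEEK_SET;
          if (fcntl (fd, F_GETLK, &q) < 0)
            _exit (2);
          _exit (q.l_type == F_WRLCK && q.l_pid == getppid () ? 0 : 1);
        }
      /* Parent: collect the child's verdict.  */
      if (waitpid (pid, &status, 0) == pid && WIFEXITED (status))
        result = WEXITSTATUS (status) == 0 ? 0 : 1;
    }

  close (fd);
  unlink (path);
  return result;
}
```

Calling assert (check_fcntl_lock_owner () == 0) from a small main()
should pass on a system with conforming fcntl() locks. The child uses
F_GETLK rather than F_SETLK so that it can also confirm who holds the
lock, not just that the request would be refused.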


-- 
David Rothenberger  ----  daveroth@acm.org

"So why don't you make like a tree, and get outta here."
                -- Biff in "Back to the Future"

[-- Attachment #2: stc-fcntl-forkexec.c --]
[-- Type: text/plain, Size: 3531 bytes --]

/***********************************************************************
 * This is a STC to show a process can get an exclusive lock on a file using
 * fcntl, even though another process has an exclusive lock.
 *
 * A parent process uses fcntl to get an exclusive lock. It then
 * uses fork/exec to spawn a child of itself, which also tries to get an
 * exclusive lock on the file.
 *
 * If all works correctly, the child should not be able to get the
 * lock. However, the child is able to get the lock.
 *
 * This test was extracted from the APR test suite.
 *
 * Compile: gcc -Wall -o stc-fcntl-forkexec stc-fcntl-forkexec.c
 ***********************************************************************/
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/file.h>
#include <sys/wait.h>

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#define TESTFILE "testfile.lock"

error_t
lock_file (int fd, int cmd)
{
  int rc;
  struct flock l = { 0 };
  int fc;

  l.l_whence = SEEK_SET;  /* l_start is relative to the start of the file */
  l.l_start = 0;          /* begin lock at this offset */
  l.l_len = 0;            /* lock to end of file */
  l.l_type = F_WRLCK;
  fc = cmd;

  /* keep trying if fcntl() gets interrupted (by a signal) */
  while ((rc = fcntl(fd, fc, &l)) < 0 && errno == EINTR)
    continue;

  if (rc == -1) {
    /* on some Unix boxes (e.g., Tru64), we get EACCES instead
     * of EAGAIN; we don't want APR_STATUS_IS_EAGAIN() matching EACCES
     * since that breaks other things, so fix up the retcode here
     */
    if (errno == EACCES) {
      return EAGAIN;
    }
    return errno;
  }
  return 0;
}

/* The child */
void
tryread ()
{
  int fd;
  error_t status;
  
  fd = open (TESTFILE, O_WRONLY, 0666);
  if (fd < 0)
    {
      perror ("child open failed");
      exit (2);
    }

  status = lock_file (fd, F_SETLK);
  if (status == 0)
    exit(0);
  if (status == EAGAIN)
    exit(1);
  exit(2);
}

int
main (int argc, const char *const *argv)
{
  int fd;
  const char *args[3];
  pid_t pid;
  pid_t pstatus;
  int exit_int;

  if (argc > 1)
    {
      /* Called to run the child. */
      tryread ();
      fprintf (stderr, "Should not get here!\n");
      return 2;
    }  

  /* Create the lock file. */
  fd = open (TESTFILE, O_WRONLY|O_CREAT, 0666);
  if (fd < 0)
    {
      perror ("open failed");
      return 1;
    }

  /* Lock the file. */
  if (lock_file (fd, F_SETLKW) != 0)
    {
      perror ("lock");
      return 1;
    }

  /* Spawn the child reader */
  if ((pid = fork ()) < 0)
    {
      perror ("fork");
      return 1;
    }
  else if (pid == 0) {
    /* child */
    args[0] = program_invocation_name;
    args[1] = "child";
    args[2] = NULL;
    execv (program_invocation_name, (char *const *) args);
    fprintf (stderr, "execv failed\n");
    _exit (2);
  }

  /* Wait for the child. */
  do {
    pstatus = waitpid (pid, &exit_int, WUNTRACED);
  } while (pstatus < 0 && errno == EINTR);

  if (WIFEXITED (exit_int))
    {
      exit_int = WEXITSTATUS (exit_int);
      if (exit_int == 0)
        printf ("FAILED: Child was able to get a lock when it shouldn't.\n");
      else if (exit_int == 1)
        printf ("SUCCESS: Child was not able to get the lock.\n");
      else
        fprintf (stderr, "Unexpected error from child: %d\n", exit_int);
    }
  else
    fprintf (stderr, "Child did not terminate normally.\n");
  
  /* Close the file */
  close (fd);

  /* Clean up. */
  unlink (TESTFILE);

  return 0;
}


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-16 15:57                         ` David Rothenberger
@ 2012-02-16 16:06                           ` Corinna Vinschen
  2012-02-18 21:52                             ` David Rothenberger
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-16 16:06 UTC (permalink / raw)
  To: cygwin

On Feb 16 07:56, David Rothenberger wrote:
> On 2/16/2012 6:09 AM, Corinna Vinschen wrote:
> > I read the Linux man page again (http://linux.die.net/man/2/flock)
> > and I just hacked the following testcase, based on your flock STC.
> 
> That sounds pretty close to what the APR test case is doing, as far as I
> understand.
> 
> > The testcase is attached.  I'm pretty curious what your test is actually
> > testing.
> 
> I got to work at my real job all last night, so couldn't extract the STC
> from the APR test suite. But, here's the test in APR-ese in case you're
> interested. I'll remove the APRisms as soon as I can to get you another
> test case.

Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-16 14:11                       ` Corinna Vinschen
@ 2012-02-16 15:57                         ` David Rothenberger
  2012-02-16 16:06                           ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: David Rothenberger @ 2012-02-16 15:57 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1693 bytes --]

On 2/16/2012 6:09 AM, Corinna Vinschen wrote:
> On Feb 15 14:14, David Rothenberger wrote:
>> On 2/15/2012 1:20 PM, Corinna Vinschen wrote:
>>> On Feb 15 13:15, David Rothenberger wrote:
>>>> On 2/15/2012 12:45 PM, Corinna Vinschen wrote:
>>>>> On Feb 15 11:39, David Rothenberger wrote:
>>>>>> But... now one of the flock tests is failing. It takes a while to
>>>>>> extract a STC from the APR test suite because everything is written in
>>>>>> APR-ese and I have to convert every APR call into the base C library
>>>>>> calls. I'll work on that over the next day or three.
>>>>>>
>>>>>> The gist of the test that's failing is this:
>>>>>>
>>>>>>  * Create a file.
>>>>>>  * Get an exclusive flock on it.
>>>>>>  * Spawn a child process that attempts to get an exclusive, non-blocking
>>>>>>    lock on the file.
>>>>>>
>>>>>> The test is expecting that the child will not be able to get the lock,
>>>>>> but the child is able to.
>>>>> [...]
>>>>> Does it fork/exec or does it only exec? 
>>>>
>>>> Looks like fork/exec. execv to be precise.
>>>>
>>>>> I guess I really need the testcase.
>>>> [...]
> 
> I read the Linux man page again (http://linux.die.net/man/2/flock)
> and I just hacked the following testcase, based on your flock STC.

That sounds pretty close to what the APR test case is doing, as far as I
understand.

> The testcase is attached.  I'm pretty curious what your test is actually
> testing.

I got to work at my real job all last night, so couldn't extract the STC
from the APR test suite. But, here's the test in APR-ese in case you're
interested. I'll remove the APRisms as soon as I can to get you another
test case.

-- 
David Rothenberger  ----  daveroth@acm.org

[-- Attachment #2: testflock.c --]
[-- Type: text/plain, Size: 3281 bytes --]

/* Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include "testflock.h"
#include "testutil.h"
#include "apr_pools.h"
#include "apr_thread_proc.h"
#include "apr_file_io.h"
#include "apr_file_info.h"
#include "apr_general.h"
#include "apr_strings.h"

static int launch_reader(abts_case *tc)
{
    apr_proc_t proc = {0};
    apr_procattr_t *procattr;
    const char *args[2];
    apr_status_t rv;
    apr_exit_why_e why;
    int exitcode;

    rv = apr_procattr_create(&procattr, p);
    APR_ASSERT_SUCCESS(tc, "Couldn't create procattr", rv);

    rv = apr_procattr_io_set(procattr, APR_NO_PIPE, APR_NO_PIPE,
            APR_NO_PIPE);
    APR_ASSERT_SUCCESS(tc, "Couldn't set io in procattr", rv);

    rv = apr_procattr_cmdtype_set(procattr, APR_PROGRAM_ENV);
    APR_ASSERT_SUCCESS(tc, "Couldn't set copy environment", rv);

    rv = apr_procattr_error_check_set(procattr, 1);
    APR_ASSERT_SUCCESS(tc, "Couldn't set error check in procattr", rv);

    args[0] = "tryread" EXTENSION;
    args[1] = NULL;
    rv = apr_proc_create(&proc, TESTBINPATH "tryread" EXTENSION, args, NULL, procattr, p);
    APR_ASSERT_SUCCESS(tc, "Couldn't launch program", rv);

    ABTS_ASSERT(tc, "wait for child process",
            apr_proc_wait(&proc, &exitcode, &why, APR_WAIT) == APR_CHILD_DONE);

    ABTS_ASSERT(tc, "child terminated normally", why == APR_PROC_EXIT);
    return exitcode;
}

static void test_withlock(abts_case *tc, void *data)
{
    apr_file_t *file;
    apr_status_t rv;
    int code;

    rv = apr_file_open(&file, TESTFILE, APR_FOPEN_WRITE|APR_FOPEN_CREATE,
                       APR_OS_DEFAULT, p);
    APR_ASSERT_SUCCESS(tc, "Could not create file.", rv);
    ABTS_PTR_NOTNULL(tc, file);

    rv = apr_file_lock(file, APR_FLOCK_EXCLUSIVE);
    APR_ASSERT_SUCCESS(tc, "Could not lock the file.", rv);
    ABTS_PTR_NOTNULL(tc, file);

    code = launch_reader(tc);
    ABTS_INT_EQUAL(tc, FAILED_READ, code);

    (void) apr_file_close(file);
}

static void test_withoutlock(abts_case *tc, void *data)
{
    int code;

    code = launch_reader(tc);
    ABTS_INT_EQUAL(tc, SUCCESSFUL_READ, code);
}

static void remove_lockfile(abts_case *tc, void *data)
{
    APR_ASSERT_SUCCESS(tc, "Couldn't remove lock file.",
                       apr_file_remove(TESTFILE, p));
}

abts_suite *testflock(abts_suite *suite)
{
    suite = ADD_SUITE(suite)

    abts_run_test(suite, test_withlock, NULL);
    abts_run_test(suite, test_withoutlock, NULL);
    abts_run_test(suite, remove_lockfile, NULL);

    return suite;
}

[-- Attachment #3: testflock.h --]
[-- Type: text/plain, Size: 966 bytes --]

/* Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#ifndef TESTFLOCK
#define TESTFLOCK

#define TESTFILE "data/testfile.lock"

#define FAILED_READ      0
#define SUCCESSFUL_READ  1
#define UNEXPECTED_ERROR 2

#endif

[-- Attachment #4: tryread.c --]
[-- Type: text/plain, Size: 1530 bytes --]

/* Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include "testflock.h"
#include "apr_pools.h"
#include "apr_file_io.h"
#include "apr_general.h"
#include "apr.h"

#if APR_HAVE_STDLIB_H
#include <stdlib.h>
#endif

int main(int argc, const char * const *argv)
{
    apr_file_t *file;
    apr_status_t status;
    apr_pool_t *p;

    apr_initialize();
    apr_pool_create(&p, NULL);

    if (apr_file_open(&file, TESTFILE, APR_FOPEN_WRITE, APR_OS_DEFAULT, p)
        != APR_SUCCESS) {

        exit(UNEXPECTED_ERROR);
    }
    status = apr_file_lock(file, APR_FLOCK_EXCLUSIVE | APR_FLOCK_NONBLOCK);
    if (status == APR_SUCCESS) {
        exit(SUCCESSFUL_READ);
    }
    if (APR_STATUS_IS_EAGAIN(status)) {
        exit(FAILED_READ);
    }
    exit(UNEXPECTED_ERROR);
}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-15 22:14                     ` David Rothenberger
@ 2012-02-16 14:11                       ` Corinna Vinschen
  2012-02-16 15:57                         ` David Rothenberger
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-16 14:11 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 2743 bytes --]

On Feb 15 14:14, David Rothenberger wrote:
> On 2/15/2012 1:20 PM, Corinna Vinschen wrote:
> > On Feb 15 13:15, David Rothenberger wrote:
> >> On 2/15/2012 12:45 PM, Corinna Vinschen wrote:
> >>> On Feb 15 11:39, David Rothenberger wrote:
> >>>> But... now one of the flock tests is failing. It takes a while to
> >>>> extract a STC from the APR test suite because everything is written in
> >>>> APR-ese and I have to convert every APR call into the base C library
> >>>> calls. I'll work on that over the next day or three.
> >>>>
> >>>> The gist of the test that's failing is this:
> >>>>
> >>>>  * Create a file.
> >>>>  * Get an exclusive flock on it.
> >>>>  * Spawn a child process that attempts to get an exclusive, non-blocking
> >>>>    lock on the file.
> >>>>
> >>>> The test is expecting that the child will not be able to get the lock,
> >>>> but the child is able to.
> >>>[...]
> >>> Does it fork/exec or does it only exec? 
> >>
> >> Looks like fork/exec. execv to be precise.
> >>
> >>> I guess I really need the testcase.
> >> [...]

I read the Linux man page again (http://linux.die.net/man/2/flock)
and I just hacked the following testcase, based on your flock STC.

It creates a lock in the parent, then forks a child.  The child tries to
grab the lock, first using the inherited file descriptor.  This is
supposed to work.  Then it opens the file again and tries to lock the
file using that descriptor.  This is supposed to fail with EWOULDBLOCK.
If it failed to lock the file one way or the other, it tries to unlock
the file using the second descriptor.  In theory this should fail.  If
it doesn't fail, it tries to lock the file again using both descriptors.
The expected result is the same as in the first two tries.  Eventually
the child exec's, and runs the entire set of tests again.  The result
should be the same as for the forked child.

I tried this test on both Linux and Cygwin (latest from CVS), and it
behaves identically:

Linux$ ./stc-flock-forkexec
funlock from forked child with new descriptor succeeded but shouldn't
funlock from execed child with new descriptor succeeded but shouldn't

Cygwin$ ./stc-flock-forkexec
funlock from forked child with new descriptor succeeded but shouldn't
funlock from execed child with new descriptor succeeded but shouldn't

Funny enough, unlocking always returns success on the descriptor not
holding the lock, even on Linux.  But the second set of tests shows that
the lock is still firm in the hands of the first descriptor.
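
The observation above can be reduced to a few lines. This is a minimal
sketch, assuming Linux flock() semantics as observed in the output
above; it is not the attached testcase. The function name
check_unlock_by_non_owner and the /tmp file name template are
hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>     /* only for the assert() in the usage note */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if LOCK_UN on a descriptor that does
   not hold the lock reports success and leaves the real lock intact,
   the number of the failing step otherwise, and -1 on a setup error.  */
int
check_unlock_by_non_owner (void)
{
  char path[] = "/tmp/funlockdemoXXXXXX";   /* illustrative name only */
  int fd, fd2, result = 0;

  fd = mkstemp (path);
  if (fd < 0)
    return -1;
  /* A second open() creates an independent open file table entry.  */
  fd2 = open (path, O_RDONLY);
  if (fd2 < 0)
    {
      close (fd);
      unlink (path);
      return -1;
    }

  /* Step 1: the first descriptor takes the exclusive lock.  */
  if (flock (fd, LOCK_EX | LOCK_NB) != 0)
    result = 1;
  /* Step 2: unlocking via the second descriptor reports success.  */
  else if (flock (fd2, LOCK_UN) != 0)
    result = 2;
  /* Step 3: the lock is nevertheless still held by the first one.  */
  else if (flock (fd2, LOCK_EX | LOCK_NB) == 0 || errno != EWOULDBLOCK)
    result = 3;

  close (fd2);
  close (fd);
  unlink (path);
  return result;
}
```

With the behaviour described above, assert (check_unlock_by_non_owner () == 0)
should pass. Everything runs in one process, which is enough here
because flock() conflicts are decided per open file table entry, not
per process.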

The testcase is attached.  I'm pretty curious what your test is actually
testing.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: stc-flock-forkexec.c --]
[-- Type: text/plain, Size: 3214 bytes --]

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <sys/stat.h>

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

/* A temporary file used for flock. */
char tmpfilename[] = "/tmp/flocktstXXXXXX";

void
test_child (int fd, const char *type, const char *fname)
{
  int rc;

  /* First try to lock using fd. */
  do
    {
      rc = flock (fd, LOCK_EX | LOCK_NB);
    }
  while (rc < 0 && errno == EINTR);
  if (rc < 0)
    fprintf (stderr, "flock from %s child with same descriptor: %s\n", type, strerror (errno));

  int fd2 = open (fname, O_RDONLY);
  if (fd2 < 0)
    perror ("child open");
  else
    {
      /* Try another descriptor. */
      do
	{
	  rc = flock (fd2, LOCK_EX | LOCK_NB);
	}
      while (rc < 0 && errno == EINTR);
      if (rc == 0)
	fprintf (stderr, "flock from %s child with new descriptor succeeded but shouldn't\n", type);
      else if (errno != EWOULDBLOCK)
	  fprintf (stderr, "flock from %s child with new descriptor: %s\n", type, strerror (errno));
      if (rc < 0)
	{
	  do
	    {
	      rc = flock (fd2, LOCK_UN);
	    }
	  while (rc < 0 && errno == EINTR);
	  if (rc == 0)
	    {
	      fprintf (stderr, "funlock from %s child with new descriptor succeeded but shouldn't\n", type);
	      do
		{
		  rc = flock (fd2, LOCK_EX | LOCK_NB);
		}
	      while (rc < 0 && errno == EINTR);
	      if (rc == 0)
		fprintf (stderr, "flock from %s child with new descriptor succeeded but shouldn't\n", type);
	      else if (errno != EWOULDBLOCK)
		  fprintf (stderr, "flock from %s child with new descriptor: %s\n", type, strerror (errno));
	      do
		{
		  rc = flock (fd, LOCK_EX | LOCK_NB);
		}
	      while (rc < 0 && errno == EINTR);
	      if (rc < 0)
		fprintf (stderr, "flock from %s child with same descriptor: %s\n", type, strerror (errno));
	    }
	}
      close (fd2);
    }
}

/* Fork a child that runs the lock tests, then re-execs this program to
   run the same tests again. */
void
make_child (int fd, pid_t * pid)
{
  if ((*pid = fork ()) < 0)
    {
      perror ("fork failed");
      exit (1);
    }
  else if (*pid == 0)
    {
      char buf[32];

      test_child (fd, "forked", tmpfilename);
      snprintf (buf, 32, "%d", fd);
      execl (program_invocation_name, program_invocation_name, buf, tmpfilename, NULL);
      perror ("execl");
      exit (1);
    }
}

/* Wait for the child to finish. */
void
await_child (pid_t pid)
{
  pid_t pstatus;
  int exit_int;

  do
    {
      pstatus = waitpid (pid, &exit_int, WUNTRACED);
    }
  while (pstatus < 0 && errno == EINTR);
}

int
main (int argc, const char *const *argv)
{
  pid_t child;
  int rc;
  int fd;

  if (argc > 1)
    {
      test_child (atoi (argv[1]), "execed", argv[2]);
      exit (0);
    }

  /* Create the temporary file. */
  fd = mkstemp (tmpfilename);
  if (fd < 0)
    {
      perror ("mkstemp failed");
      exit (1);
    }
  do
    {
      rc = flock (fd, LOCK_EX);
    }
  while (rc < 0 && errno == EINTR);
  if (rc < 0)
    {
      perror ("lock");
      exit (1);
    }

  make_child (fd, &child);

  await_child (child);

  close (fd);

  /* Clean up. */
  unlink (tmpfilename);

  return 0;
}


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-15 21:20                   ` Corinna Vinschen
@ 2012-02-15 22:14                     ` David Rothenberger
  2012-02-16 14:11                       ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: David Rothenberger @ 2012-02-15 22:14 UTC (permalink / raw)
  To: cygwin

On 2/15/2012 1:20 PM, Corinna Vinschen wrote:
> On Feb 15 13:15, David Rothenberger wrote:
>> On 2/15/2012 12:45 PM, Corinna Vinschen wrote:
>>> On Feb 15 11:39, David Rothenberger wrote:
>>>> On 2/15/2012 7:38 AM, Corinna Vinschen wrote:
>>>>> Did I mention that I hate synchronization problems?  Anyway, I think I
>>>>> found the problem.  I applied a patch which fixes the problem for me
>>>>> and, surprise!, the flock test still runs fine, too.  I've just uploaded
>>>>> a new snapshot.  Please give it a try.
>>>>
>>>> All the procmutex tests pass now! Awesome!
>>>>
>>>> But... now one of the flock tests is failing. It takes a while to
>>>> extract a STC from the APR test suite because everything is written in
>>>> APR-ese and I have to convert every APR call into the base C library
>>>> calls. I'll work on that over the next day or three.
>>>>
>>>> The gist of the test that's failing is this:
>>>>
>>>>  * Create a file.
>>>>  * Get an exclusive flock on it.
>>>>  * Spawn a child process that attempts to get an exclusive, non-blocking
>>>>    lock on the file.
>>>>
>>>> The test is expecting that the child will not be able to get the lock,
>>>> but the child is able to.
>>>
>>> Did I really mention that I hate synchronization problems?
>>
>> Yeah, you mentioned it. :-)
>>
>>> Does it fork/exec or does it only exec? 
>>
>> Looks like fork/exec. execv to be precise.
>>
>>> I guess I really need the testcase.
>>
>> I'll try to work on that tonight.
> 
> Thanks.  Btw., does that testcase fail in 1.7.9 as well?

I'm pretty sure it did. I think all the tests passed the last time I
released this package (2011-09-10), but I might have been testing
against a snapshot. It's hard for me to tell now. If I just install
1.7.9 on my system, things like /bin/ls stop working. The compiled tests
don't run, either.

FYI, the test was passing with 1.7.10 and the 20120214 snapshot. It
didn't start failing until your last snapshot (20120215).

-- 
David Rothenberger  ----  daveroth@acm.org

Don't panic.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-15 21:16                 ` David Rothenberger
@ 2012-02-15 21:20                   ` Corinna Vinschen
  2012-02-15 22:14                     ` David Rothenberger
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-15 21:20 UTC (permalink / raw)
  To: cygwin

On Feb 15 13:15, David Rothenberger wrote:
> On 2/15/2012 12:45 PM, Corinna Vinschen wrote:
> > On Feb 15 11:39, David Rothenberger wrote:
> >> On 2/15/2012 7:38 AM, Corinna Vinschen wrote:
> >>> Did I mention that I hate synchronization problems?  Anyway, I think I
> >>> found the problem.  I applied a patch which fixes the problem for me
> >>> and, surprise!, the flock test still runs fine, too.  I've just uploaded
> >>> a new snapshot.  Please give it a try.
> >>
> >> All the procmutex tests pass now! Awesome!
> >>
> >> But... now one of the flock tests is failing. It takes a while to
> >> extract a STC from the APR test suite because everything is written in
> >> APR-ese and I have to convert every APR call into the base C library
> >> calls. I'll work on that over the next day or three.
> >>
> >> The gist of the test that's failing is this:
> >>
> >>  * Create a file.
> >>  * Get an exclusive flock on it.
> >>  * Spawn a child process that attempts to get an exclusive, non-blocking
> >>    lock on the file.
> >>
> >> The test is expecting that the child will not be able to get the lock,
> >> but the child is able to.
> > 
> > Did I really mention that I hate synchronization problems?
> 
> Yeah, you mentioned it. :-)
> 
> > Does it fork/exec or does it only exec? 
> 
> Looks like fork/exec. execv to be precise.
> 
> > I guess I really need the testcase.
> 
> I'll try to work on that tonight.

Thanks.  Btw., does that testcase fail in 1.7.9 as well?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-15 20:46               ` Corinna Vinschen
@ 2012-02-15 21:16                 ` David Rothenberger
  2012-02-15 21:20                   ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: David Rothenberger @ 2012-02-15 21:16 UTC (permalink / raw)
  To: cygwin

On 2/15/2012 12:45 PM, Corinna Vinschen wrote:
> On Feb 15 11:39, David Rothenberger wrote:
>> On 2/15/2012 7:38 AM, Corinna Vinschen wrote:
>>> Did I mention that I hate synchronization problems?  Anyway, I think I
>>> found the problem.  I applied a patch which fixes the problem for me
>>> and, surprise!, the flock test still runs fine, too.  I've just uploaded
>>> a new snapshot.  Please give it a try.
>>
>> All the procmutex tests pass now! Awesome!
>>
>> But... now one of the flock tests is failing. It takes a while to
>> extract a STC from the APR test suite because everything is written in
>> APR-ese and I have to convert every APR call into the base C library
>> calls. I'll work on that over the next day or three.
>>
>> The gist of the test that's failing is this:
>>
>>  * Create a file.
>>  * Get an exclusive flock on it.
>>  * Spawn a child process that attempts to get an exclusive, non-blocking
>>    lock on the file.
>>
>> The test is expecting that the child will not be able to get the lock,
>> but the child is able to.
> 
> Did I really mention that I hate synchronization problems?

Yeah, you mentioned it. :-)

> Does it fork/exec or does it only exec? 

Looks like fork/exec. execv to be precise.

> I guess I really need the testcase.

I'll try to work on that tonight.

-- 
David Rothenberger  ----  daveroth@acm.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-15 19:39             ` David Rothenberger
@ 2012-02-15 20:46               ` Corinna Vinschen
  2012-02-15 21:16                 ` David Rothenberger
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-15 20:46 UTC (permalink / raw)
  To: cygwin

On Feb 15 11:39, David Rothenberger wrote:
> On 2/15/2012 7:38 AM, Corinna Vinschen wrote:
> > Did I mention that I hate synchronization problems?  Anyway, I think I
> > found the problem.  I applied a patch which fixes the problem for me
> > and, surprise!, the flock test still runs fine, too.  I've just uploaded
> > a new snapshot.  Please give it a try.
> 
> All the procmutex tests pass now! Awesome!
> 
> But... now one of the flock tests is failing. It takes a while to
> extract a STC from the APR test suite because everything is written in
> APR-ese and I have to convert every APR call into the base C library
> calls. I'll work on that over the next day or three.
> 
> The gist of the test that's failing is this:
> 
>  * Create a file.
>  * Get an exclusive flock on it.
>  * Spawn a child process that attempts to get an exclusive, non-blocking
>    lock on the file.
> 
> The test is expecting that the child will not be able to get the lock,
> but the child is able to.

Did I really mention that I hate synchronization problems?

Does it fork/exec or does it only exec?  If the latter, and if the child
uses the file descriptor inherited from the parent, then it's ok that
it gets the lock, afaics.  From the Linux man page:

   Locks  created by flock() are associated with an open file table entry.
   This means that duplicate file descriptors (created  by,  for  example,
   fork(2)  or  dup(2)) refer to the same lock, and this lock may be
   modified or released using any of these descriptors.  Furthermore, the lock
   is  released  either  by  an explicit LOCK_UN operation on any of these
   duplicate descriptors, or when all such descriptors have been closed.
   [...]
   Locks created by flock() are preserved across an execve(2)

But maybe I misunderstood something when implementing this?  I guess I really
need the testcase.
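
The man page wording above can be checked with a short sketch, assuming
a POSIX system with BSD-style flock(). The helper name
inherited_fd_can_relock and the temporary file template are made up for
illustration and are not part of APR or Cygwin. The parent takes an
exclusive lock and forks; the child then requests LOCK_EX | LOCK_NB
through the descriptor it inherited. Because both descriptors share one
open file table entry, the request is expected to succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from APR or Cygwin).
 * Returns 1 if a forked child can take the lock through the inherited
 * descriptor, 0 if it is refused, -1 on any other error. */
int inherited_fd_can_relock(void)
{
    char path[] = "/tmp/flockinhXXXXXX";   /* assumed template name */
    int fd, status;
    pid_t pid, r;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* the file lives on while fd is open */

    if (flock(fd, LOCK_EX) < 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: fd duplicates the parent's descriptor, so it refers to
         * the same open file table entry and therefore the same lock. */
        _exit(flock(fd, LOCK_EX | LOCK_NB) == 0 ? 0 : 1);
    }

    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    close(fd);
    if (r < 0 || !WIFEXITED(status))
        return -1;
    assert(r == pid);           /* waitpid reaped the child we forked */
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

A return value of 1 matches the behaviour the man page describes; a
child that opens the file itself gets a separate open file table entry
and is a different case.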

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-15 15:39           ` Corinna Vinschen
@ 2012-02-15 19:39             ` David Rothenberger
  2012-02-15 20:46               ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: David Rothenberger @ 2012-02-15 19:39 UTC (permalink / raw)
  To: cygwin

On 2/15/2012 7:38 AM, Corinna Vinschen wrote:
> Did I mention that I hate synchronization problems?  Anyway, I think I
> found the problem.  I applied a patch which fixes the problem for me
> and, surprise!, the flock test still runs fine, too.  I've just uploaded
> a new snapshot.  Please give it a try.

All the procmutex tests pass now! Awesome!

But... now one of the flock tests is failing. It takes a while to
extract a STC from the APR test suite because everything is written in
APR-ese and I have to convert every APR call into the base C library
calls. I'll work on that over the next day or three.

The gist of the test that's failing is this:

 * Create a file.
 * Get an exclusive flock on it.
 * Spawn a child process that attempts to get an exclusive, non-blocking
   lock on the file.

The test is expecting that the child will not be able to get the lock,
but the child is able to.
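
The three steps above can be sketched in plain C as follows. This is
only an approximation of the APR test, under the assumption that the
child opens the file itself instead of reusing a descriptor inherited
from the parent; the helper name child_is_locked_out and the temporary
file template are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from APR).
 * Returns 1 if the child is refused the lock (the expected result),
 * 0 if the child obtains it, -1 on any other error. */
int child_is_locked_out(void)
{
    char path[] = "/tmp/flocktstXXXXXX";   /* assumed template name */
    int fd, status;
    pid_t pid, r;

    /* Create a file and get an exclusive flock on it. */
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) < 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        /* Child: a fresh open() yields a new open file table entry,
         * so this request competes with the parent's lock. */
        int fd2 = open(path, O_RDWR);
        if (fd2 < 0)
            _exit(2);
        if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
            _exit(1);                       /* lock granted */
        _exit(errno == EWOULDBLOCK ? 0 : 2);
    }

    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    flock(fd, LOCK_UN);
    close(fd);
    unlink(path);

    if (r < 0 || !WIFEXITED(status))
        return -1;
    assert(r == pid);           /* waitpid reaped the child we forked */
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

A result of 0 would correspond to the failure described here, where the
child obtains the lock even though the parent still holds it.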

-- 
David Rothenberger  ----  daveroth@acm.org

Hubbard's Law:
        Don't take life too seriously; you won't get out of it alive.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-14 21:43         ` David Rothenberger
@ 2012-02-15 15:39           ` Corinna Vinschen
  2012-02-15 19:39             ` David Rothenberger
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-15 15:39 UTC (permalink / raw)
  To: cygwin

On Feb 14 13:43, David Rothenberger wrote:
> On 2/14/2012 10:24 AM, Corinna Vinschen wrote:
> > On Feb 14 09:58, David Rothenberger wrote:
> >> On 2/14/2012 6:45 AM, Corinna Vinschen wrote:
> >>> On Feb 14 15:02, Corinna Vinschen wrote:
> >>>> On Feb 14 00:00, David Rothenberger wrote:
> >>>>> The libapr1 test cases are failing again for flock locks. This same
> >>>>> test case failed with 1.7.9 with a fatal error[1], but that was
> >>>>> corrected. The test is no longer encountering the fatal error, but
> >>>>> it is producing the wrong result.
> >>>>
> >>>> Thanks for the testcase.  I think I found the issue.  An event handle
> >>>> was closed in the wrong place, outside of the important mutex lock for
> >>>> the lock object.  I applied the patch to CVS.  Your testcase now appears
> >>>> to run fine for me.  Can you try your entire testsuite again and see
> >>>> if there's another failure lurking?
> >>>
> >>> I uploaded a snapshot for testing.
> >>
> >> Thanks. The snapshot fixes the flock test case, but now the fcntl test
> >> case is failing.
> > 
> > *Sob*.  How so?  Does it hang or does it allow multiple concurrent
> > exclusive locks as the flock case?
> 
> Sorry, I should have said. It hangs.
> 
> >> I'll try to send an STC for that case, but I suspect the one from last
> >> year will have the problem.
> > 
> > Please send it anyway.
> 
> It's attached. If you run it with an argument (any argument), each child
> will print its loop count and you can see what happens. If it doesn't
> hang for you, try increasing MAX_ITER or CHILDREN at the top.

Did I mention that I hate synchronization problems?  Anyway, I think I
found the problem.  I applied a patch which fixes the problem for me
and, surprise!, the flock test still runs fine, too.  I've just uploaded
a new snapshot.  Please give it a try.


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-14 18:25       ` Corinna Vinschen
@ 2012-02-14 21:43         ` David Rothenberger
  2012-02-15 15:39           ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: David Rothenberger @ 2012-02-14 21:43 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1593 bytes --]

On 2/14/2012 10:24 AM, Corinna Vinschen wrote:
> On Feb 14 09:58, David Rothenberger wrote:
>> On 2/14/2012 6:45 AM, Corinna Vinschen wrote:
>>> On Feb 14 15:02, Corinna Vinschen wrote:
>>>> On Feb 14 00:00, David Rothenberger wrote:
>>>>> The libapr1 test cases are failing again for flock locks. This same
>>>>> test case failed with 1.7.9 with a fatal error[1], but that was
>>>>> corrected. The test is no longer encountering the fatal error, but
>>>>> it is producing the wrong result.
>>>>
>>>> Thanks for the testcase.  I think I found the issue.  An event handle
>>>> was closed in the wrong place, outside of the important mutex lock for
>>>> the lock object.  I applied the patch to CVS.  Your testcase now appears
>>>> to run fine for me.  Can you try your entire testsuite again and see
>>>> if there's another failure lurking?
>>>
>>> I uploaded a snapshot for testing.
>>
>> Thanks. The snapshot fixes the flock test case, but now the fcntl test
>> case is failing.
> 
> *Sob*.  How so?  Does it hang or does it allow multiple concurrent
> exclusive locks as the flock case?

Sorry, I should have said. It hangs.

>> I'll try to send an STC for that case, but I suspect the one from last
>> year will have the problem.
> 
> Please send it anyway.

It's attached. If you run it with an argument (any argument), each child
will print its loop count and you can see what happens. If it doesn't
hang for you, try increasing MAX_ITER or CHILDREN at the top.



-- 
David Rothenberger  ----  daveroth@acm.org

QOTD:
        "Oh, no, no...  I'm not beautiful.  Just very, very pretty."

[-- Attachment #2: stc-fcntl-fork.c --]
[-- Type: text/plain, Size: 4653 bytes --]

/***********************************************************************
 * This is a STC to show that fcntl hangs.
 *
 * It tries to use fcntl() for file locking. It creates a temporary
 * file, then uses fork to spawn a number of children. Each child opens
 * the file, then repeatedly uses fcntl to lock and unlock it.
 *
 * While each child has the lock, it increments a counter stored in
 * shared memory in a racy way, passing the current value to a function
 * which sleeps briefly, then returns the incremented counter.
 *
 * If all works correctly, the counter should end up being incremented
 * by each child iteration.
 *
 * However, this test currently just hangs.
 *
 * Compile: gcc -Wall -o stc-fcntl-fork stc-fcntl-fork.c
 ***********************************************************************/

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/file.h>
#include <sys/mman.h>

#define NUM_TEST_ITERS 1
#define MAX_ITER 10
#define CHILDREN 3
#define MAX_COUNT (MAX_ITER * CHILDREN)

/* Counter stored in shared memory. */
static volatile int *x;

/* A temporary file used for fcntl. */
char tmpfilename[] = "/tmp/fcntlXXXXXX";

struct flock mutex_lock_it;
struct flock mutex_unlock_it;


/* a slower more racy way to implement (*x)++ */
static int increment(int n)
{
    usleep(1);
    return n+1;
}

/* Fork and use fcntl to lock and unlock the file repeatedly in the child. */
void make_child(int childnum, int verbose, int trylock, pid_t *pid)
{
    if ((*pid = fork()) < 0) {
        perror("fork failed");
        exit(1);
    }
    else if (*pid == 0) {
        int fd2 = open(tmpfilename, O_RDWR);
        if (fd2 < 0) {
            perror("child open");
            exit(1);
        }

        int rc;
        int i;
        for (i=0; i<MAX_ITER; ++i) {

            if (verbose)
                printf("Child %d: %d\n", childnum, i);

            /* Get the lock. */
            do {
                rc = fcntl(fd2, F_SETLKW, &mutex_lock_it);
            } while (rc < 0 && errno == EINTR);
            if (rc < 0) {
                perror("lock");
                exit(1);
            }
            
            /* Increment the counter. */
            *x = increment(*x);

            /* Release the lock. */
            do {
                rc = fcntl(fd2, F_SETLKW, &mutex_unlock_it);
            } while (rc < 0 && errno == EINTR);
            if (rc < 0) {
                perror("unlock");
                exit(1);
            }
        }
        exit(0);
    }
}

/* Wait for the child to finish. */
void await_child(pid_t pid)
{
    pid_t pstatus;
    int exit_int;

    do {
        pstatus = waitpid(pid, &exit_int, WUNTRACED);
    } while (pstatus < 0 && errno == EINTR);
}


/* Allocate and attach shared memory */
void init_shm ()
{
    x = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (x == MAP_FAILED) {
        perror ("mmap failed");
        exit (1);
    }
}

int main(int argc, const char * const * argv, const char * const *env)
{
    pid_t child[CHILDREN];
    int n;
    int fd;
    int i;
    int verbose = (argc > 1) ? 1 : 0;
 
    /* Create the temporary file. */
    fd = mkstemp(tmpfilename);
    if (fd < 0) {
        perror("open failed");
        exit(1);
    }
    close(fd);

    /* Initialize shared memory */
    init_shm();

    /* Setup mutexes */
    mutex_lock_it.l_whence = SEEK_SET;   /* from start of file */
    mutex_lock_it.l_start = 0;           /* offset 0 */
    mutex_lock_it.l_len = 0;             /* until end of file */
    mutex_lock_it.l_type = F_WRLCK;      /* set exclusive/write lock */
    mutex_lock_it.l_pid = 0;             /* pid not actually interesting */
    mutex_unlock_it.l_whence = SEEK_SET; /* from start of file */
    mutex_unlock_it.l_start = 0;         /* offset 0 */
    mutex_unlock_it.l_len = 0;           /* until end of file */
    mutex_unlock_it.l_type = F_UNLCK;    /* release the lock */
    mutex_unlock_it.l_pid = 0;           /* pid not actually interesting */

    /* Perform the test multiple times. */
    for (i = 0; i < NUM_TEST_ITERS; ++i) {
        /* Initialize counter */
        *x = 0;

        /* Create the children. */
        for (n = 0; n < CHILDREN; n++)
            make_child(n, verbose, 0, &child[n]);

        /* Wait for them to finish. */
        for (n = 0; n < CHILDREN; n++)
            await_child(child[n]);

        /* Check counter */
        if (*x != MAX_COUNT) {
            printf("Iteration %d: FAILED: *x (%d) != MAX_COUNT (%d)\n", i, *x, MAX_COUNT);
            exit(1);
        }
    }

    /* Clean up. */
    unlink(tmpfilename);
    return 0;
}


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-14 17:58     ` David Rothenberger
@ 2012-02-14 18:25       ` Corinna Vinschen
  2012-02-14 21:43         ` David Rothenberger
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-14 18:25 UTC (permalink / raw)
  To: cygwin

On Feb 14 09:58, David Rothenberger wrote:
> On 2/14/2012 6:45 AM, Corinna Vinschen wrote:
> > On Feb 14 15:02, Corinna Vinschen wrote:
> >> On Feb 14 00:00, David Rothenberger wrote:
> >>> The libapr1 test cases are failing again for flock locks. This same
> >>> test case failed with 1.7.9 with a fatal error[1], but that was
> >>> corrected. The test is no longer encountering the fatal error, but
> >>> it is producing the wrong result.
> >>
> >> Thanks for the testcase.  I think I found the issue.  An event handle
> >> was closed in the wrong place, outside of the important mutex lock for
> >> the lock object.  I applied the patch to CVS.  Your testcase now appears
> >> to run fine for me.  Can you try your entire testsuite again and see
> >> if there's another failure lurking?
> > 
> > I uploaded a snapshot for testing.
> 
> Thanks. The snapshot fixes the flock test case, but now the fcntl test
> case is failing.

*Sob*.  How so?  Does it hang or does it allow multiple concurrent
exclusive locks as the flock case?

> I'll try to send an STC for that case, but I suspect the one from last
> year will have the problem.

Please send it anyway.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-14 14:46   ` Corinna Vinschen
@ 2012-02-14 17:58     ` David Rothenberger
  2012-02-14 18:25       ` Corinna Vinschen
  0 siblings, 1 reply; 32+ messages in thread
From: David Rothenberger @ 2012-02-14 17:58 UTC (permalink / raw)
  To: cygwin

On 2/14/2012 6:45 AM, Corinna Vinschen wrote:
> On Feb 14 15:02, Corinna Vinschen wrote:
>> On Feb 14 00:00, David Rothenberger wrote:
>>> The libapr1 test cases are failing again for flock locks. This same
>>> test case failed with 1.7.9 with a fatal error[1], but that was
>>> corrected. The test is no longer encountering the fatal error, but
>>> it is producing the wrong result.
>>
>> Thanks for the testcase.  I think I found the issue.  An event handle
>> was closed in the wrong place, outside of the important mutex lock for
>> the lock object.  I applied the patch to CVS.  Your testcase now appears
>> to run fine for me.  Can you try your entire testsuite again and see
>> if there's another failure lurking?
> 
> I uploaded a snapshot for testing.

Thanks. The snapshot fixes the flock test case, but now the fcntl test
case is failing.

I'll try to send an STC for that case, but I suspect the one from last
year will have the problem.

-- 
David Rothenberger  ----  daveroth@acm.org

QOTD:
        "I drive my car quietly, for it goes without saying."

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-14 14:03 ` Corinna Vinschen
@ 2012-02-14 14:46   ` Corinna Vinschen
  2012-02-14 17:58     ` David Rothenberger
  0 siblings, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-14 14:46 UTC (permalink / raw)
  To: cygwin

On Feb 14 15:02, Corinna Vinschen wrote:
> On Feb 14 00:00, David Rothenberger wrote:
> > The libapr1 test cases are failing again for flock locks. This same
> > test case failed with 1.7.9 with a fatal error[1], but that was
> > corrected. The test is no longer encountering the fatal error, but
> > it is producing the wrong result.
> 
> Thanks for the testcase.  I think I found the issue.  An event handle
> was closed in the wrong place, outside of the important mutex lock for
> the lock object.  I applied the patch to CVS.  Your testcase now appears
> to run fine for me.  Can you try your entire testsuite again and see
> if there's another failure lurking?

I uploaded a snapshot for testing.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-14  8:00 David Rothenberger
  2012-02-14  8:07 ` David Rothenberger
@ 2012-02-14 14:03 ` Corinna Vinschen
  2012-02-14 14:46   ` Corinna Vinschen
  1 sibling, 1 reply; 32+ messages in thread
From: Corinna Vinschen @ 2012-02-14 14:03 UTC (permalink / raw)
  To: cygwin

On Feb 14 00:00, David Rothenberger wrote:
> The libapr1 test cases are failing again for flock locks. This same
> test case failed with 1.7.9 with a fatal error[1], but that was
> corrected. The test is no longer encountering the fatal error, but
> it is producing the wrong result.

Thanks for the testcase.  I think I found the issue.  An event handle
was closed in the wrong place, outside of the important mutex lock for
the lock object.  I applied the patch to CVS.  Your testcase now appears
to run fine for me.  Can you try your entire testsuite again and see
if there's another failure lurking?

Btw., mmap is really simple.  For your testcase that could be, for
instance:

#include <sys/mman.h>

void init_shm ()
{
  x = mmap (NULL, getpagesize (), PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (x == MAP_FAILED)
    {
      perror ("mmap failed");
      exit (1);
    }
}


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: STC for libapr1 failure
  2012-02-14  8:00 David Rothenberger
@ 2012-02-14  8:07 ` David Rothenberger
  2012-02-14 14:03 ` Corinna Vinschen
  1 sibling, 0 replies; 32+ messages in thread
From: David Rothenberger @ 2012-02-14  8:07 UTC (permalink / raw)
  To: cygwin

On 2/14/2012 12:00 AM, David Rothenberger wrote:
> The libapr1 test cases are failing again for flock locks.

I forgot to mention that this same test is failing in the libapr1 test
suite when using fcntl locks. I haven't extracted an STC for that, but
it's probably very similar to the previous one here

  http://cygwin.com/ml/cygwin/2011-08/msg00496.html

with the shared memory counter added.

-- 
David Rothenberger  ----  daveroth@acm.org

The main problem I have with cats is, they're not dogs.
                -- Kevin Cowherd

^ permalink raw reply	[flat|nested] 32+ messages in thread

* STC for libapr1 failure
@ 2012-02-14  8:00 David Rothenberger
  2012-02-14  8:07 ` David Rothenberger
  2012-02-14 14:03 ` Corinna Vinschen
  0 siblings, 2 replies; 32+ messages in thread
From: David Rothenberger @ 2012-02-14  8:00 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]

The libapr1 test cases are failing again for flock locks. This same
test case failed with 1.7.9 with a fatal error[1], but that was
corrected. The test is no longer encountering the fatal error, but
it is producing the wrong result.

I extracted the attached STC to demonstrate the problem. It starts a
number of child processes, each of which repeatedly grabs and releases
a lock on a temporary file. While they have the lock, they increment
a counter in shared memory in a racy way.

If all goes well, the counter should end up having the value of
CHILDREN * MAX_ITER (MAX_COUNT in the code). And it does, sometimes.
Other times, however, it's less than this value, indicating the lock
did not work.

(I'm using shmget for shared memory, so you have to have cygserver
running. APR has a number of shared memory methods, including mmap,
but this was the easiest for me to extract.)

As before, I haven't been doing C programming in a while, so I'm not
100% sure the test case is valid, but it does demonstrate the same
problem the APR test case is having.

I've tried this on my Win7-64 box running the 20120210 snapshot and
on a WinXP box running stock 1.7.10. I get the same results in both
places.

Regards,
David

[1] http://cygwin.com/ml/cygwin/2011-08/msg00480.html

-- 
David Rothenberger  ----  daveroth@acm.org

I think we are in Rats' Alley where the dead men lost their bones.
                -- T.S. Eliot

[-- Attachment #2: stc-flock-fork.c --]
[-- Type: text/plain, Size: 4089 bytes --]

/***********************************************************************
 * This is a STC to show that flock occasionally does not work.
 *
 * It tries to use flock() for file locking. It creates a temporary
 * file, then uses fork to spawn a number of children. Each child opens
 * the file, then repeatedly uses flock to lock and unlock it.
 *
 * While each child has the lock, it increments a counter stored in
 * shared memory in a racy way, passing the current value to a function
 * which sleeps briefly, then returns the incremented counter.
 *
 * If all works correctly, the counter should end up being incremented
 * by each child iteration.
 *
 * However, this is failing for me occasionally. The counter ends up
 * being less than the expected value.
 *
 * This test was extracted from the APR test suite.
 *
 * Compile: gcc -Wall -o stc-flock-fork stc-flock-fork.c
 ***********************************************************************/

#include <sys/types.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/stat.h>

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#define MAX_ITER 200
#define CHILDREN 6
#define MAX_COUNT (MAX_ITER * CHILDREN)

/* Counter stored in shared memory. */
static volatile int *x;

/* A temporary file used for flock. */
char tmpfilename[] = "/tmp/flocktstXXXXXX";

/* a slower more racy way to implement (*x)++ */
static int increment(int n)
{
    usleep(1);
    return n+1;
}

/* Fork and use flock to lock and unlock the file repeatedly in the child. */
void make_child(int trylock, pid_t *pid)
{
    if ((*pid = fork()) < 0) {
        perror("fork failed");
        exit(1);
    }
    else if (*pid == 0) {
        int fd2 = open(tmpfilename, O_RDONLY);
        if (fd2 < 0) {
            perror("child open");
            exit(1);
        }

        int rc;
        int i;
        for (i=0; i<MAX_ITER; ++i) {
            /* Get the lock. */
            do {
                rc = flock(fd2, LOCK_EX);
            } while (rc < 0 && errno == EINTR);
            if (rc < 0) {
                perror("lock");
                exit(1);
            }

            /* Have the lock. Increment the counter. */
            *x = increment(*x);

            /* Release the lock. */
            do {
                rc = flock(fd2, LOCK_UN);
            } while (rc < 0 && errno == EINTR);
            if (rc < 0) {
                perror("unlock");
                exit(1);
            }
        }
        exit(0);
    }
}

/* Wait for the child to finish. */
void await_child(pid_t pid)
{
    pid_t pstatus;
    int exit_int;

    do {
        pstatus = waitpid(pid, &exit_int, WUNTRACED);
    } while (pstatus < 0 && errno == EINTR);
}

/* Allocate and attach shared memory */
void init_shm ()
{
    int shmid;
    if ((shmid = shmget(IPC_PRIVATE, sizeof(int), S_IRUSR | S_IWUSR | IPC_CREAT)) < 0) {
        perror("shmget failed");
        exit(1);
    }
    if ((x = shmat(shmid, NULL, 0)) == (void *) -1) {
        perror("shmat failed");
        exit(1);
    }
}

int main(int argc, const char * const * argv, const char * const *env)
{
    pid_t child[CHILDREN];
    int i;
    int n;
    int fd;

    /* Create the temporary file. */
    fd = mkstemp(tmpfilename);
    if (fd < 0) {
        perror("open failed");
        exit(1);
    }
    close(fd);

    /* Initialize shared memory */
    init_shm();

    /* Perform the test multiple times, since this fails only intermittently. */
    for (i = 0; i < 100; ++i) {
        /* Initialize counter */
        *x = 0;

        /* Create the children. */
        for (n = 0; n < CHILDREN; n++)
            make_child(0, &child[n]);

        /* Wait for them to finish. */
        for (n = 0; n < CHILDREN; n++)
            await_child(child[n]);

        /* Check counter */
        if (*x != MAX_COUNT) {
            printf("Iteration %d: FAILED: *x (%d) != MAX_COUNT (%d)\n", i, *x, MAX_COUNT);
            exit(1);
        }
    }

    /* Clean up. */
    unlink(tmpfilename);

    return 0;
}

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2012-02-24  8:15 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-26  0:39 STC for libapr1 failure David Rothenberger
2011-08-26 11:16 ` Corinna Vinschen
2011-08-27 20:37   ` Corinna Vinschen
2011-08-27 22:27     ` David Rothenberger
2011-08-29 13:55       ` Corinna Vinschen
2011-08-29 17:09         ` David Rothenberger
2012-02-14  8:00 David Rothenberger
2012-02-14  8:07 ` David Rothenberger
2012-02-14 14:03 ` Corinna Vinschen
2012-02-14 14:46   ` Corinna Vinschen
2012-02-14 17:58     ` David Rothenberger
2012-02-14 18:25       ` Corinna Vinschen
2012-02-14 21:43         ` David Rothenberger
2012-02-15 15:39           ` Corinna Vinschen
2012-02-15 19:39             ` David Rothenberger
2012-02-15 20:46               ` Corinna Vinschen
2012-02-15 21:16                 ` David Rothenberger
2012-02-15 21:20                   ` Corinna Vinschen
2012-02-15 22:14                     ` David Rothenberger
2012-02-16 14:11                       ` Corinna Vinschen
2012-02-16 15:57                         ` David Rothenberger
2012-02-16 16:06                           ` Corinna Vinschen
2012-02-18 21:52                             ` David Rothenberger
2012-02-20 14:19                               ` Corinna Vinschen
2012-02-20 20:15                                 ` David Rothenberger
2012-02-21  1:29                                 ` Yaakov (Cygwin/X)
2012-02-21  8:59                                   ` Corinna Vinschen
2012-02-21 17:10                                     ` Corinna Vinschen
2012-02-23 14:20                                       ` Corinna Vinschen
2012-02-23 18:43                                         ` Achim Gratz
2012-02-24  3:49                                         ` Yaakov (Cygwin/X)
2012-02-24  8:15                                           ` Corinna Vinschen
