public inbox for cygwin@cygwin.com
* STC for libapr1 flock failure
@ 2013-04-26 17:52 David Rothenberger
  2013-04-27  4:22 ` David Rothenberger
  0 siblings, 1 reply; 5+ messages in thread
From: David Rothenberger @ 2013-04-26 17:52 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 839 bytes --]

The libapr1 test cases are failing for flock locks. I've extracted
the attached STC to demonstrate the problem. It starts a number of
child processes, each of which repeatedly grabs and releases a lock on
a temporary file. While a child holds the lock, it increments a counter
in shared memory in a deliberately racy way, so a lock that fails to
exclude the other children would show up as a lost update.

If all goes well, the counter should end up with the value
CHILDREN * MAX_ITER (MAX_COUNT in the code). And it does, except that
sometimes the test just hangs. The attached program runs the test 10
times, which is sufficient to reproduce the hang on my machine.
Sometimes the first iteration hangs, sometimes it's the last one.
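Because the failure shows up as a hang rather than a wrong count, a hung run never reaches the "*x != MAX_COUNT" check and has to be interrupted by hand. One way to make the STC report the hang and exit is to wait for the children with a deadline instead of a blocking waitpid(). The sketch below is not part of the original STC: the helper name wait_all_with_timeout, the 10 ms polling interval and the kill-on-timeout policy are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of the STC): reap the n children in
 * pids[], but give up after timeout_sec seconds instead of blocking
 * forever in waitpid(). Returns 0 if every child exited in time, or -1
 * if the deadline passed, in which case the children still running are
 * killed with SIGKILL and reaped. Reaped entries are set to 0. */
int wait_all_with_timeout(pid_t *pids, int n, int timeout_sec)
{
    const struct timespec poll_interval = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    time_t deadline = time(NULL) + timeout_sec;
    int i;

    for (;;) {
        int remaining = 0;

        /* Reap whichever children have already exited, without blocking. */
        for (i = 0; i < n; i++) {
            if (pids[i] <= 0)
                continue;                     /* already reaped */
            pid_t r = waitpid(pids[i], NULL, WNOHANG);
            if (r == pids[i] || (r < 0 && errno == ECHILD))
                pids[i] = 0;                  /* exited (or not our child) */
            else
                remaining++;                  /* still running */
        }
        if (remaining == 0)
            return 0;

        if (time(NULL) >= deadline) {
            /* Treat the stragglers as hung: kill and reap them. */
            for (i = 0; i < n; i++) {
                if (pids[i] > 0) {
                    kill(pids[i], SIGKILL);
                    waitpid(pids[i], NULL, 0);
                    pids[i] = 0;
                }
            }
            return -1;
        }
        nanosleep(&poll_interval, NULL);
    }
}
```

In the STC's main(), the second loop that calls await_child() could be replaced by a single call such as wait_all_with_timeout(child, CHILDREN, 60), printing a failure and exiting when it returns -1. The 60-second limit is an arbitrary, generous choice; a healthy iteration finishes far sooner.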

This was working the last time I built libapr1 (19-Feb-2012).

To run the test, just run "make".

Regards,
David

-- 
David Rothenberger  ----  daveroth@acm.org

question = ( to ) ? be : ! be;
                -- Wm. Shakespeare

[-- Attachment #2: Makefile --]
[-- Type: text/plain, Size: 179 bytes --]

CC=gcc
CFLAGS=-Wall
STC=stc-flock-fork

.PHONY: test
test: $(STC)
	./$(STC)

$(STC): $(STC).c
	$(CC) $(CFLAGS) -o $@ $^

.PHONY: clean
clean:
	rm -f $(STC)
	-rm -f /tmp/flocktst*

[-- Attachment #3: stc-flock-fork.c --]
[-- Type: text/plain, Size: 4043 bytes --]

/***********************************************************************
 * This is a STC to show that flock occasionally does not work.
 *
 * It tries to use flock() for file locking. It creates a temporary
 * file, then uses fork to spawn a number of children. Each child opens
 * the file, then repeatedly uses flock to lock and unlock it.
 *
 * While each child has the lock, it increments a counter stored in
 * shared memory in a racy way, passing the current value to a function
 * which sleeps briefly, then returns the incremented counter.
 *
 * If all works correctly, the counter should end up incremented once
 * per child iteration, i.e. MAX_COUNT times in total.
 *
 * However, this is failing for me occasionally: the test case hangs
 * after a few iterations.
 *
 * This test was extracted from the APR test suite.
 *
 * Compile: gcc -Wall -o stc-flock-fork stc-flock-fork.c
 ***********************************************************************/

#include <sys/types.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/stat.h>
#include <sys/mman.h>

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#define MAX_ITER 10
#define CHILDREN 6
#define MAX_COUNT (MAX_ITER * CHILDREN)

/* Counter stored in shared memory. */
static volatile int *x;

/* A temporary file used for flock. */
char tmpfilename[] = "/tmp/flocktstXXXXXX";

/* a slower more racy way to implement (*x)++ */
static int increment(int n)
{
    usleep(1);
    return n+1;
}

/* Fork and use flock to lock and unlock the file repeatedly in the child. */
void make_child(int trylock, pid_t *pid)
{
    if ((*pid = fork()) < 0) {
        perror("fork failed");
        exit(1);
    }
    else if (*pid == 0) {
        int fd2 = open(tmpfilename, O_RDONLY);
        if (fd2 < 0) {
            perror("child open");
            exit(1);
        }

        int rc;
        int i;
        for (i=0; i<MAX_ITER; ++i) {
            /* Get the lock. */
            do {
                rc = flock(fd2, LOCK_EX);
            } while (rc < 0 && errno == EINTR);
            if (rc < 0) {
                perror("lock");
                exit(1);
            }

            /* Have the lock. Increment the counter. */
            *x = increment(*x);

            /* Release the lock. */
            do {
                rc = flock(fd2, LOCK_UN);
            } while (rc < 0 && errno == EINTR);
            if (rc < 0) {
                perror("unlock");
                exit(1);
            }
        }
        exit(0);
    }
}

/* Wait for the child to finish. */
void await_child(pid_t pid)
{
    pid_t pstatus;
    int exit_int;

    do {
        pstatus = waitpid(pid, &exit_int, WUNTRACED);
    } while (pstatus < 0 && errno == EINTR);
}

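/* Allocate an anonymous shared mapping for the counter. */
void init_shm(void)
{
    void *p = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    /* mmap() reports failure with MAP_FAILED, not NULL. */
    if (p == MAP_FAILED) {
        perror("mmap failed");
        exit(1);
    }
    x = p;
}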

int main(int argc, const char * const * argv, const char * const *env)
{
    pid_t child[CHILDREN];
    int i;
    int n;
    int fd;

    /* Create the temporary file. */
    fd = mkstemp(tmpfilename);
    if (fd < 0) {
        perror("mkstemp failed");
        exit(1);
    }
    close(fd);

    /* Initialize shared memory */
    init_shm();

    /* Perform the test multiple times, since this fails only intermittently. */
    for (i = 0; i < 10; ++i) {
        printf("Iteration %d\n", i);

        /* Initialize counter */
        *x = 0;

        /* Create the children. */
        for (n = 0; n < CHILDREN; n++)
            make_child(0, &child[n]);

        /* Wait for them to finish. */
        for (n = 0; n < CHILDREN; n++)
            await_child(child[n]);

        /* Check counter */
        if (*x != MAX_COUNT) {
            printf("Iteration %d: FAILED: *x (%d) != MAX_COUNT (%d)\n", i, *x, MAX_COUNT);
            exit(1);
        }
    }

    /* Clean up. */
    unlink(tmpfilename);

    return 0;
}


[-- Attachment #4: Type: text/plain, Size: 218 bytes --]

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: STC for libapr1 flock failure
  2013-04-26 17:52 STC for libapr1 flock failure David Rothenberger
@ 2013-04-27  4:22 ` David Rothenberger
  2013-05-01 23:23   ` David Rothenberger
  0 siblings, 1 reply; 5+ messages in thread
From: David Rothenberger @ 2013-04-27  4:22 UTC (permalink / raw)
  To: cygwin

On 4/26/2013 9:25 AM, David Rothenberger wrote:
> This was working the last time I built libapr1 (19-Feb-2012).

The test case does work with 1.7.17-1. The last snapshot I could find
where it worked is 2012-12-18 17:38:50 UTC. The 2012-12-21 snapshot
broke it badly, causing

   wait_sig: WaitForSingleObject(0x6C8) for thread exit returned 258

messages on even the first iteration. Subsequent snapshots just hang
after a few iterations.

-- 
David Rothenberger  ----  daveroth@acm.org

"Buy land.  They've stopped making it."
                -- Mark Twain

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: STC for libapr1 flock failure
  2013-04-27  4:22 ` David Rothenberger
@ 2013-05-01 23:23   ` David Rothenberger
  2013-05-15 18:52     ` David Rothenberger
  0 siblings, 1 reply; 5+ messages in thread
From: David Rothenberger @ 2013-05-01 23:23 UTC (permalink / raw)
  To: cygwin

David Rothenberger wrote:
> On 4/26/2013 9:25 AM, David Rothenberger wrote:
>> This was working the last time I built libapr1 (19-Feb-2012).
> 
> The test case does work with 1.7.17-1. The last snapshot I could find
> where it worked is 2012-12-18 17:38:50 UTC. The 2012-12-21 snapshot
> broke it badly, causing
> 
>    wait_sig: WaitForSingleObject(0x6C8) for thread exit returned 258
> 
> messages on even the first iteration. Subsequent snapshots just hang
> after a few iterations.

I retested the STC with the latest snapshot and it still hangs.

I know Christopher didn't say it might be fixed, but since there were
signal handler changes involved I thought I'd give it a try.

-- 
David Rothenberger                spammer? -> spam@daveroth.dyndns.org
GPG/PGP: 0x7F67E734, C233 365A 25EF 2C5F C8E1 43DF B44F BA26 7F67 E734

"The computer programmer is a creator of universes for which he alone
 is responsible. Universes of virtually unlimited complexity can be
 created in the form of computer programs."
                -- Joseph Weizenbaum, _Computer Power and Human Reason_

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: STC for libapr1 flock failure
  2013-05-01 23:23   ` David Rothenberger
@ 2013-05-15 18:52     ` David Rothenberger
  2013-05-16  5:38       ` Christopher Faylor
  0 siblings, 1 reply; 5+ messages in thread
From: David Rothenberger @ 2013-05-15 18:52 UTC (permalink / raw)
  To: cygwin

David Rothenberger wrote:
> David Rothenberger wrote:
>> On 4/26/2013 9:25 AM, David Rothenberger wrote:
>>> This was working the last time I built libapr1 (19-Feb-2012).
>>
>> The test case does work with 1.7.17-1. The last snapshot I could find
>> where it worked is 2012-12-18 17:38:50 UTC. The 2012-12-21 snapshot
>> broke it badly, causing
>>
>>    wait_sig: WaitForSingleObject(0x6C8) for thread exit returned 258
>>
>> messages on even the first iteration. Subsequent snapshots just hang
>> after a few iterations.
> 
> I retested the STC with the latest snapshot and it still hangs.
> 
> I know Christopher didn't say it might be fixed, but since there were
> signal handler changes involved I thought I'd give it a try.

There hasn't been any response to the bug report yet and I'm wondering
why. I don't want to be pushy, but I'm curious if this is because core
developers haven't yet had time to look at it, the STC does not provide
enough information, or perhaps it was just overlooked.

I'm trying to build a 64-bit version of this library and the test case
is failing there, too. If this won't be addressed, I will disable flock
locking in libapr1.
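For reference, the usual alternative to flock() on POSIX systems is a record lock taken with fcntl(). The following is a minimal sketch, not APR's own implementation: the helper name fcntl_lock is hypothetical, and it simply mirrors the STC's EINTR retry loop around F_SETLKW.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: take or release a whole-file POSIX record lock.
 * type is F_WRLCK for an exclusive lock or F_UNLCK to release it.
 * F_SETLKW blocks until the lock is available; like the STC's flock()
 * loops, the call is retried if a signal interrupts it.
 * Returns 0 on success, -1 with errno set on failure. */
int fcntl_lock(int fd, short type)
{
    struct flock fl;
    int rc;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;              /* 0 means "through end of file": whole file */

    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc < 0 && errno == EINTR);
    return rc;
}
```

Two differences from flock() would matter if the STC were switched over. An exclusive (F_WRLCK) lock needs a descriptor opened for writing, so the child's open(tmpfilename, O_RDONLY) would have to become O_RDWR; on a read-only descriptor the call fails with EBADF. And record locks belong to the process: they are not inherited across fork(), and they are dropped when the process closes any descriptor for the file. Since each child in the STC opens the file itself, the first point is the only code change the sketch would need.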

-- 
David Rothenberger                spammer? -> spam@daveroth.dyndns.org
GPG/PGP: 0x7F67E734, C233 365A 25EF 2C5F C8E1 43DF B44F BA26 7F67 E734

"Those who will be able to conquer software will be able to conquer the
world."
                -- Tadahiro Sekimoto, president, NEC Corp.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: STC for libapr1 flock failure
  2013-05-15 18:52     ` David Rothenberger
@ 2013-05-16  5:38       ` Christopher Faylor
  0 siblings, 0 replies; 5+ messages in thread
From: Christopher Faylor @ 2013-05-16  5:38 UTC (permalink / raw)
  To: cygwin

On Wed, May 15, 2013 at 11:52:04AM -0700, David Rothenberger wrote:
>David Rothenberger wrote:
>> David Rothenberger wrote:
>>> On 4/26/2013 9:25 AM, David Rothenberger wrote:
>>>> This was working the last time I built libapr1 (19-Feb-2012).
>>>
>>> The test case does work with 1.7.17-1. The last snapshot I could find
>>> where it worked is 2012-12-18 17:38:50 UTC. The 2012-12-21 snapshot
>>> broke it badly, causing
>>>
>>>    wait_sig: WaitForSingleObject(0x6C8) for thread exit returned 258
>>>
>>> messages on even the first iteration. Subsequent snapshots just hang
>>> after a few iterations.
>> 
>> I retested the STC with the latest snapshot and it still hangs.
>> 
>> I know Christopher didn't say it might be fixed, but since there were
>> signal handler changes involved I thought I'd give it a try.
>
>There hasn't been any response to the bug report yet and I'm wondering
>why. I don't want to be pushy, but I'm curious if this is because core
>developers haven't yet had time to look at it, the STC does not provide
>enough information, or perhaps it was just overlooked.

My time is in very short supply these days so I hadn't had a chance to
look into this.  I just did now, however.  This should be fixed in the
next snapshot (building now).

http://cygwin.com/snapshots/

Thanks for the test case.

cgf

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread

Thread overview: 5+ messages
2013-04-26 17:52 STC for libapr1 flock failure David Rothenberger
2013-04-27  4:22 ` David Rothenberger
2013-05-01 23:23   ` David Rothenberger
2013-05-15 18:52     ` David Rothenberger
2013-05-16  5:38       ` Christopher Faylor
