public inbox for cygwin@cygwin.com
 help / color / mirror / Atom feed
* RE: Possible race in SYSV IPC (semaphores)
@ 2012-11-19 21:06 Lavrentiev, Anton (NIH/NLM/NCBI) [C]
  2012-11-23 11:37 ` Corinna Vinschen
  0 siblings, 1 reply; 18+ messages in thread
From: Lavrentiev, Anton (NIH/NLM/NCBI) [C] @ 2012-11-19 21:06 UTC (permalink / raw)
  To: cygwin

Hello again,

I can now positively confirm the race condition in cygserver w.r.t. the named
pipe used to serialize SYSV requests through the server.  The race is due to
that transport_layer_pipes::accept() (bool *const recoverable) (file: transport_pipes.cc) does actually _create_ the pipe when pipe_instance == 0
(ironically, transport_layer_pipes::listen() does not create any OS primitives
at all!).

This means that under heavy load, cygserver threads may all end up processing
their requests and closing all instances of the pipe (bringing pipe_instance == 0)
yet not being able to get to the point of accepting new request (that is, to
re-create the pipe).  For the client (user process), this looks like the pipe
does not exist (during that very tiny period of time), and the following message
gets printed:

Iteration 3016
      1 [main] a 4872 transport_layer_pipes::connect: lost connection to cygserver, error = 2

Note "error 2" for ERROR_FILE_NOT_FOUND.

The error is so subtle it is very difficult to reproduce with the default number of
threads in cygserver, 10 (although, not impossible, esp. on a moderately-to-heavily
loaded system).

The error is not due to deadlocking (reported earlier today) -- the server remains
afloat coming to CreateNamedPipe() just a jiffy too late, it's the client that suffers
from the pipe disappearance.  Deadlocking of the logging cygserver seems to remain a
yet another issue.

Anton Lavrentiev
Contractor NIH/NLM/NCBI

P.S. The code to reveal the race is the very same test program posted today
(stripped of the BUGx conditionals):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/sem.h>


#define SEMKEY 1


union semun {
    int val;                    /* value for SETVAL */
    struct semid_ds *buf;       /* buffer for IPC_STAT, IPC_SET */
    unsigned short  *array;     /* array for GETALL, SETALL */
};


static void doCriticalWork(void)
{
    return;
}


int main(void)
{
    struct sembuf lock[2];
    int n, semid;

    if ((semid = semget(SEMKEY, 2, IPC_CREAT | 0666)) < 0) {
        perror("semget(IPC_CREATE)");
        return 1;
    }

    lock[0].sem_num = 0;
    lock[0].sem_op  = 0;
    lock[0].sem_flg = IPC_NOWAIT;
    lock[1].sem_num = 0;
    lock[1].sem_op  = 1;

    lock[1].sem_flg = SEM_UNDO;

    if (semop(semid, lock, 2) != 0) {
        perror("semop(LOCK[0])");
        return 1;
    }

    for (n = 0; ; ++n) {
        static const union semun arg = { 0 };
        int error;

        printf("Iteration %d\n", n);

        lock[0].sem_num = 1;
        lock[0].sem_op  = 0; /* precondition:  [1] == 0 */
        lock[0].sem_flg = 0;
        lock[1].sem_num = 1;
        lock[1].sem_op  = 1; /* postcondition: [1] == 1 */
        lock[1].sem_flg = SEM_UNDO;

        if (semop(semid, lock, 2) < 0) {
            error = errno;
            fprintf(stderr, "semop(LOCK[1]): %d, %s\n",
                    error, strerror(error));
            break;
        }

        doCriticalWork();

        lock[0].sem_num =  1;
        lock[0].sem_op  = -1;
        lock[0].sem_flg =  SEM_UNDO | IPC_NOWAIT;

        if (semop(semid, lock, 1) < 0) {
            error = errno;
            fprintf(stderr, "semop(UNLOCK[1]): %d, %s\n",
                    error, strerror(error));
            break;
        }
    }

    return 1;
}


^ permalink raw reply	[flat|nested] 18+ messages in thread
* RE: Possible race in SYSV IPC (semaphores)
@ 2012-11-19 16:23 Lavrentiev, Anton (NIH/NLM/NCBI) [C]
  0 siblings, 0 replies; 18+ messages in thread
From: Lavrentiev, Anton (NIH/NLM/NCBI) [C] @ 2012-11-19 16:23 UTC (permalink / raw)
  To: cygwin

> For testing it would be most helpful if you would create a STC.

The test case attached from my earlier post from today, also shows
a race (dead-lock, actually) occurring in CYGSERVER, when it is run
with the "-d -l 7" command line (even when it is only one application,
the test case is using it).  There is an internal disconcert between
the cygserver threads.

The log ends with this, and there is no way to stop the server
(like with "cygrunsrv -S") other than killing it from the task manager:

cygserver: /home/corinna/src/cygwin/cygwin-1.7.15/cygwin-1.7.15-1/src/cygwin-1.7.15/winsup/cygserver/transport_pipes.cc, line 132: Try to create named pipe: \\.\pipe\cygwin-13a7ed34cc1953a9-lpccygserver: /home/corinna/src/cygwin/cygwin-1.7.15/cygwin-1.7.15-1/src/cygwin-1.7.15/winsup/cygserver/process.cc, line 287: Try hold(4956)
cygserver: /home/corinna/src/cygwin/cygwin-1.7.15/cygwin-1.7.15-1/src/cygwin-1.7.15/winsup/cygserver/process.cc, line 287: holding (4956)

Anton Lavrentiev
Contractor NIH/NLM/NCBI


^ permalink raw reply	[flat|nested] 18+ messages in thread
* Possible race in SYSV IPC (semaphores)
@ 2012-10-26 18:37 Lavrentiev, Anton (NIH/NLM/NCBI) [C]
  2012-10-27 10:24 ` Corinna Vinschen
  0 siblings, 1 reply; 18+ messages in thread
From: Lavrentiev, Anton (NIH/NLM/NCBI) [C] @ 2012-10-26 18:37 UTC (permalink / raw)
  To: cygwin

Hi,

For now, I can only report the observed (mis) behavior of SYSV semop() call,
which (on the client side) gets manifested as the following:

transport_layer_pipes::connect: lost connection to cygserver, error = 2

(this code then does a by-hand adjustment with semctl(SETVAL)).

Note that there is a dedicated cygserver process running for my single-threaded
application.

Looking at the debugging output of cygserver, this is what I see in the log
(around the only time semctl() is logged there):

cygserver: /home/corinna/src/cygwin/cygwin-1.7.15/cygwin-1.7.15-1/src/cygwin-1.7.15/winsup/cygserver/transport_pipes.cc, line 132: Try to create named pipe: \\.\pipe\cygwin-13a7ed34cc1953a9-lpc
cygserver: /home/corinna/src/cygwin/cygwin-1.7.15/cygwin-1.7.15-1/src/cygwin-1.7.15/winsup/cygserver/transport_pipes.cc, line 132: Try to create named pipe: \\.\pipe\cygwin-13a7ed34cc1953a9-lpc

Note the double pipe creation call, and only a single "exit" log line such as:

cygserver: /home/corinna/src/cygwin/cygwin-1.7.15/cygwin-1.7.15-1/src/cygwin-1.7.15/winsup/cygserver/sem.cc, line 81: leaving (3416)

Cygserver does not stop (also, since SIGSYS is set to ignore in the program,
it also keeps running -- although, not always quite successfully once the semop()
failure occurred.)

The semaphore operations are very intensive; and involve arrays of 5 sems at
some times;  also, there are quite large chunks of shmem updated every now
and then.

I studied the source of cygserver, and noticed that pipe_instance (transport_pipes.cc)
is not declared "volatile".  This is strange because the compiler can rearrange lines
of code that include this variable, otherwise.  And that seems rather critical.

Right now what I observe, is that SYSV IPC is unreliable, and I'm yet to figure
out why;  the very same code (the locking logic) works on Linux/Solaris/Mac for
years and on thousands (yes, that many) of hosts.  With CYGWIN the instability can
appear within a wide range of run time: from just a few minutes to some long hours,
rather randomly.

Any input can be greatly appreciated.

Anton Lavrentiev
Contractor NIH/NLM/NCBI


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-11-28 21:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-19 21:06 Possible race in SYSV IPC (semaphores) Lavrentiev, Anton (NIH/NLM/NCBI) [C]
2012-11-23 11:37 ` Corinna Vinschen
2012-11-23 13:11   ` Corinna Vinschen
2012-11-23 13:34     ` Corinna Vinschen
2012-11-23 16:44       ` Corinna Vinschen
2012-11-23 17:16         ` Lavrentiev, Anton (NIH/NLM/NCBI) [C]
2012-11-23 17:29           ` Corinna Vinschen
2012-11-23 17:47             ` Lavrentiev, Anton (NIH/NLM/NCBI) [C]
2012-11-23 17:59               ` Corinna Vinschen
2012-11-26 12:54                 ` Corinna Vinschen
2012-11-26 18:09                   ` Lavrentiev, Anton (NIH/NLM/NCBI) [C]
2012-11-26 18:14                     ` Lavrentiev, Anton (NIH/NLM/NCBI) [C]
2012-11-27  7:13                     ` Corinna Vinschen
2012-11-27 14:37                       ` Lavrentiev, Anton (NIH/NLM/NCBI) [C]
2012-11-28 21:20                       ` Lavrentiev, Anton (NIH/NLM/NCBI) [C]
  -- strict thread matches above, loose matches on Subject: below --
2012-11-19 16:23 Lavrentiev, Anton (NIH/NLM/NCBI) [C]
2012-10-26 18:37 Lavrentiev, Anton (NIH/NLM/NCBI) [C]
2012-10-27 10:24 ` Corinna Vinschen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).