public inbox for glibc-bugs@sourceware.org
* [Bug libc/18493] New: Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
@ 2015-06-05 11:12 dan at censornet dot com
  2015-06-05 11:13 ` [Bug libc/18493] " dan at censornet dot com
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: dan at censornet dot com @ 2015-06-05 11:12 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=18493

            Bug ID: 18493
           Summary: Infinite loop/deadlock? in __libc_recv
                    (fd=fd@entry=300, buf=buf@entry=0x7f6042880600,
                    n=n@entry=5, flags=-1, flags@entry=258) at
                    ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
           Product: glibc
           Version: 2.19
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: dan at censornet dot com
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

In a multi-threaded pthreads process running on Ubuntu 14.04 AMD64 (with over
1000 threads) which uses real-time FIFO scheduling, we occasionally see calls
to recv() with flags (MSG_PEEK | MSG_WAITALL) get stuck in an infinite loop or
deadlock, meaning the threads lock up, consuming as much CPU as they can (due
to FIFO scheduling) while stuck inside recv().

Here's an example gdb back trace:

[Switching to thread 4 (Thread 0x7f6040546700 (LWP 27251))]
#0  0x00007f6231d2f7eb in __libc_recv (fd=fd@entry=146,
buf=buf@entry=0x7f6040543600, n=n@entry=5, flags=-1, flags@entry=258) at
../sysdeps/unix/sysv/linux/x86_64/recv.c:33
33      ../sysdeps/unix/sysv/linux/x86_64/recv.c: No such file or directory.
(gdb) bt
#0  0x00007f6231d2f7eb in __libc_recv (fd=fd@entry=146,
buf=buf@entry=0x7f6040543600, n=n@entry=5, flags=-1, flags@entry=258) at
../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x0000000000421945 in recv (__flags=258, __n=5, __buf=0x7f6040543600,
__fd=146) at /usr/include/x86_64-linux-gnu/bits/socket2.h:44
[snip]
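
For readers decoding the backtrace: the flags value 258 is what
MSG_PEEK | MSG_WAITALL evaluates to on Linux (0x2 | 0x100). The sketch below
is not part of the original report; the helper names is_peek_waitall and
print_recv_flags are made up for illustration.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: true when both MSG_PEEK and MSG_WAITALL are set. */
static int is_peek_waitall(int flags)
{
    const int both = MSG_PEEK | MSG_WAITALL;
    return (flags & both) == both;
}

/* Hypothetical helper: print which of the two bits a flags value carries. */
static void print_recv_flags(int flags)
{
    printf("flags=%d (0x%x): MSG_PEEK=%d MSG_WAITALL=%d\n",
           flags, (unsigned)flags,
           (flags & MSG_PEEK) != 0,
           (flags & MSG_WAITALL) != 0);
}
```

Calling print_recv_flags(258) from a main() on a Linux system should report
both bits as set, matching the flags named in the report.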

The socket is a TCP socket in blocking mode. The recv() call is inside an outer
loop with a counter; I've checked the counter with gdb and it's always at 1, so
I'm sure that the outer loop isn't the problem: the thread is indeed stuck
inside the recv() internals.

Other notes:
* There always seem to be 2 or more threads deadlocked in the same place (same
recv() call but with distinct FDs)
* The threads calling recv() have cancellation disabled by previously
executing: pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);

I've even tried adding a poll() call for POLLRDNORM on the socket before
calling recv() with MSG_PEEK | MSG_WAITALL flags to try to make sure there's
data available on the socket before calling poll(), but it makes no difference.

So, I don't know what is wrong here. I've read all the recv() documentation and
believe that recv() is being used correctly; the only conclusion I can come to
is that there is a bug in libc recv() when using flags MSG_PEEK | MSG_WAITALL
with thousands of pthreads running.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug libc/18493] Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  2015-06-05 11:12 [Bug libc/18493] New: Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33 dan at censornet dot com
@ 2015-06-05 11:13 ` dan at censornet dot com
  2015-06-05 11:17 ` dan at censornet dot com
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: dan at censornet dot com @ 2015-06-05 11:13 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=18493

Dan Searle <dan at censornet dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |critical


* [Bug libc/18493] Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  2015-06-05 11:12 [Bug libc/18493] New: Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33 dan at censornet dot com
  2015-06-05 11:13 ` [Bug libc/18493] " dan at censornet dot com
@ 2015-06-05 11:17 ` dan at censornet dot com
  2015-06-05 11:44 ` schwab@linux-m68k.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: dan at censornet dot com @ 2015-06-05 11:17 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=18493

--- Comment #1 from Dan Searle <dan at censornet dot com> ---
Typo, original: "I've even tried adding a poll() call for POLLRDNORM on the
socket before calling recv() with MSG_PEEK | MSG_WAITALL flags to try to make
sure there's data available on the socket before calling poll(), but it makes
no difference."

Should have been: "I've even tried adding a poll() call for POLLRDNORM on the
socket before calling recv() with MSG_PEEK | MSG_WAITALL flags to try to make
sure there's data available on the socket before calling *recv()*, but it makes
no difference."


* [Bug libc/18493] Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  2015-06-05 11:12 [Bug libc/18493] New: Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33 dan at censornet dot com
  2015-06-05 11:13 ` [Bug libc/18493] " dan at censornet dot com
  2015-06-05 11:17 ` dan at censornet dot com
@ 2015-06-05 11:44 ` schwab@linux-m68k.org
  2015-06-05 11:52 ` dan at censornet dot com
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: schwab@linux-m68k.org @ 2015-06-05 11:44 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=18493

Andreas Schwab <schwab@linux-m68k.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |MOVED

--- Comment #2 from Andreas Schwab <schwab@linux-m68k.org> ---
The __libc_recv function is just a thin wrapper around the recvfrom system
call.  Please report this to the kernel people.
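
To illustrate the point, the recv(2) manual page documents recv(fd, buf, n,
flags) as equivalent to recvfrom(fd, buf, n, flags, NULL, NULL). The sketch
below shows that equivalence only; it is not the glibc source, and the name
recv_via_recvfrom is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Illustrative stand-in for recv(): forwards to recvfrom() with no source
   address requested, which is the equivalence documented in recv(2). */
static ssize_t recv_via_recvfrom(int fd, void *buf, size_t n, int flags)
{
    assert(buf != NULL);
    return recvfrom(fd, buf, n, flags, NULL, NULL);
}
```

Since nothing beyond the system call is involved in this sketch, any blocking
or spinning happens on the kernel side of the call.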


* [Bug libc/18493] Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  2015-06-05 11:12 [Bug libc/18493] New: Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33 dan at censornet dot com
                   ` (2 preceding siblings ...)
  2015-06-05 11:44 ` schwab@linux-m68k.org
@ 2015-06-05 11:52 ` dan at censornet dot com
  2015-06-10 14:00 ` fweimer at redhat dot com
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: dan at censornet dot com @ 2015-06-05 11:52 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=18493

--- Comment #3 from Dan Searle <dan at censornet dot com> ---
by "kernel people", you mean https://bugzilla.kernel.org/ ?


* [Bug libc/18493] Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  2015-06-05 11:12 [Bug libc/18493] New: Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33 dan at censornet dot com
                   ` (3 preceding siblings ...)
  2015-06-05 11:52 ` dan at censornet dot com
@ 2015-06-10 14:00 ` fweimer at redhat dot com
  2015-06-11  8:00 ` dan at censornet dot com
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: fweimer at redhat dot com @ 2015-06-10 14:00 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=18493

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

--- Comment #4 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Dan Searle from comment #3)
> by "kernel people", you mean https://bugzilla.kernel.org/ ?

More like one of the mailing lists, either linux-kernel or netdev.  You will
need to provide a proper test case, though.  It's also not quite clear what you
mean by “the threads lock up chewing as much CPU as they can”. Does recv
actually return from the kernel?


From: carlos at redhat dot com @ 2015-06-10 18:43 UTC
  To: glibc-bugs
Subject: [Bug build/18512] make install failure with overridden prefix

https://sourceware.org/bugzilla/show_bug.cgi?id=18512

--- Comment #1 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Martin Sebor from comment #0)
> Attempting to install glibc configured with --prefix=/usr into a
> non-standard directory specified by the prefix make variable fails with the
> error below:
>
> $ /src/glibc-trunk/configure --prefix=/usr
> ...
> $ nice make install prefix=/build/glibc-trunk-install-prefix-override-usr
> make[3]: Leaving directory `/src/glibc-trunk/elf'
> /usr/bin/install -c /build/glibc-trunk/elf/ld.so /lib64/ld-2.21.90.so.new
> /usr/bin/install: cannot create regular file '/lib64/ld-2.21.90.so.new':
> Permission denied
> make[2]: *** [/lib64/ld-2.21.90.so] Error 1
> make[2]: Leaving directory `/src/glibc-trunk/elf'
> make[1]: *** [elf/ldso_install] Error 2
> make[1]: Leaving directory `/src/glibc-trunk'
> make: *** [install] Error 2
>
> However, with glibc configured with a different prefix the same installation
> succeeds.
>
> (Setting the DESTDIR variable works as one would expect.)

This is an unsupported use case.

The prefix is a part of the ABI for glibc, and you can't change it at install
time, only at configure time.

My preference would be to have it error out that you've changed the prefix.


* [Bug libc/18493] Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  2015-06-05 11:12 [Bug libc/18493] New: Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33 dan at censornet dot com
                   ` (4 preceding siblings ...)
  2015-06-10 14:00 ` fweimer at redhat dot com
@ 2015-06-11  8:00 ` dan at censornet dot com
  2015-06-11  8:07 ` fweimer at redhat dot com
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: dan at censornet dot com @ 2015-06-11  8:00 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=18493

--- Comment #5 from Dan Searle <dan at censornet dot com> ---
There is a tracker for this bug with a test case here:
https://bugzilla.redhat.com/show_bug.cgi?id=1205258


* [Bug libc/18493] Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  2015-06-05 11:12 [Bug libc/18493] New: Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33 dan at censornet dot com
                   ` (5 preceding siblings ...)
  2015-06-11  8:00 ` dan at censornet dot com
@ 2015-06-11  8:07 ` fweimer at redhat dot com
  2015-06-11  8:08 ` fweimer at redhat dot com
  2015-06-11  8:15 ` dan at censornet dot com
  8 siblings, 0 replies; 10+ messages in thread
From: fweimer at redhat dot com @ 2015-06-11  8:07 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=18493

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.redhat.com
                   |                            |/show_bug.cgi?id=1205258


* [Bug libc/18493] Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  2015-06-05 11:12 [Bug libc/18493] New: Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33 dan at censornet dot com
                   ` (6 preceding siblings ...)
  2015-06-11  8:07 ` fweimer at redhat dot com
@ 2015-06-11  8:08 ` fweimer at redhat dot com
  2015-06-11  8:15 ` dan at censornet dot com
  8 siblings, 0 replies; 10+ messages in thread
From: fweimer at redhat dot com @ 2015-06-11  8:08 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=18493

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.kernel.org
                   |                            |/show_bug.cgi?id=99461

--- Comment #6 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Dan Searle from comment #5)
> There is a tracker for this bug with a test case here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1205258

Also <https://bugzilla.kernel.org/show_bug.cgi?id=99461>.  This clearly is a
kernel bug.


* [Bug libc/18493] Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  2015-06-05 11:12 [Bug libc/18493] New: Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33 dan at censornet dot com
                   ` (7 preceding siblings ...)
  2015-06-11  8:08 ` fweimer at redhat dot com
@ 2015-06-11  8:15 ` dan at censornet dot com
  8 siblings, 0 replies; 10+ messages in thread
From: dan at censornet dot com @ 2015-06-11  8:15 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=18493

--- Comment #7 from Dan Searle <dan at censornet dot com> ---
Agreed, it's a kernel bug: it doesn't handle simultaneous use of both the
MSG_PEEK and MSG_WAITALL flags in the recvfrom syscall in certain edge case(s).

I have worked around the issue for now by not using MSG_WAITALL (while still
using MSG_PEEK) with an outer loop around recv() with a sleep() and a counter
to retry the recv() call a set number of times before timing out.
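
A rough sketch of that kind of workaround, under stated assumptions: this is
not the actual code used, and the name peek_exact, its parameters, and the use
of nanosleep() rather than sleep() are choices made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: peek until n bytes are queued, without MSG_WAITALL.
   Returns n (or more is never returned, recv() caps at n) on success, 0 if
   the peer closed the connection, or -1 with errno set. errno is ETIMEDOUT
   when max_tries attempts did not yield n bytes. */
static ssize_t peek_exact(int fd, void *buf, size_t n,
                          int max_tries, long sleep_ms)
{
    assert(buf != NULL && max_tries > 0 && sleep_ms >= 0);

    struct timespec pause = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };

    for (int attempt = 0; attempt < max_tries; attempt++) {
        ssize_t got = recv(fd, buf, n, MSG_PEEK);   /* no MSG_WAITALL */
        if (got < 0 && errno != EINTR)
            return -1;                  /* real error, errno set by recv() */
        if (got == 0)
            return 0;                   /* orderly shutdown by the peer */
        if (got > 0 && (size_t)got >= n)
            return got;                 /* all n bytes are available */
        nanosleep(&pause, NULL);        /* short read: wait, then retry */
    }

    errno = ETIMEDOUT;
    return -1;
}
```

The retry count and sleep interval bound the total wait, so a peer that never
sends the remaining bytes leads to a timeout instead of an indefinite block.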

