public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug linuxthreads/25774] New: Suggestion: count number of retries in __lll_lock_wait
@ 2020-04-04  9:52 rigault.francois+glib at gmail dot com
  2020-04-04 16:10 ` [Bug linuxthreads/25774] " romain.geissler at amadeus dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: rigault.francois+glib at gmail dot com @ 2020-04-04  9:52 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=25774

            Bug ID: 25774
           Summary: Suggestion: count number of retries in __lll_lock_wait
           Product: glibc
           Version: 2.30
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: linuxthreads
          Assignee: unassigned at sourceware dot org
          Reporter: rigault.francois+glib at gmail dot com
                CC: drow at false dot org
  Target Milestone: ---

There is no guarantee that threads trying to concurrently access the same
mutex, acquire it in a fair way.

If you consider a c++ program running multiple of these threads:


```
#include <mutex>
#include <chrono>
#include <iostream>
#include <thread>
using namespace std;
using namespace std::chrono;

mutex m;

void func(string s) {
        while(true) {
                lock_guard<mutex> l(m);
                cout << s << endl;
                this_thread::sleep_for(seconds(1));
        }
}
int main() {

        thread one(func, "one");
        thread two(func, "two");

        one.join();
        return 0;
}
```

here the first thread acquiring the mutex will keep it for one second.
When it releases the mutex, it wakes up the other thread.

However, practically, there is no guarantee that the other thread successfully
acquires it. Both threads are still trying to concurrently acquire the mutex.
In practice, the same thread gets to acquire the mutex again and again. The
second thread will seem stuck and will never acquire the mutex.

In order to ease the investigation of these kinds of stuck threads, what do you
think of improving the current stap probe point in this fashion:

```
--- glibc-2.30/nptl/lowlevellock.c      2019-08-01 06:29:31.000000000 +0200
+++ ../glibc/glibc-2.30/nptl/lowlevellock.c     2020-04-04 11:24:50.367896172
+0200
@@ -42,13 +42,16 @@
 void
 __lll_lock_wait (int *futex, int private)
 {
+  int count = 0;
+
   if (atomic_load_relaxed (futex) == 2)
     goto futex;

   while (atomic_exchange_acquire (futex, 2) != 0)
     {
+      ++count;
     futex:
-      LIBC_PROBE (lll_lock_wait, 1, futex);
+      LIBC_PROBE (lll_lock_wait, 2, futex, count);
       lll_futex_wait (futex, 2, private); /* Wait if *futex == 2.  */
     }
 }
```

this would allow a system tap script to detect how many times a stuck thread
has been woken up without managing to acquire the lock.

It seems pretty easy to fall into this caveat when developing a multi threaded
application, I don't know if there is any other mean to investigate this.
Feedback is very welcome.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug linuxthreads/25774] Suggestion: count number of retries in __lll_lock_wait
  2020-04-04  9:52 [Bug linuxthreads/25774] New: Suggestion: count number of retries in __lll_lock_wait rigault.francois+glib at gmail dot com
@ 2020-04-04 16:10 ` romain.geissler at amadeus dot com
  2020-04-04 16:17 ` carlos at redhat dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: romain.geissler at amadeus dot com @ 2020-04-04 16:10 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=25774

Romain Geissler <romain.geissler at amadeus dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |romain.geissler at amadeus dot com

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug linuxthreads/25774] Suggestion: count number of retries in __lll_lock_wait
  2020-04-04  9:52 [Bug linuxthreads/25774] New: Suggestion: count number of retries in __lll_lock_wait rigault.francois+glib at gmail dot com
  2020-04-04 16:10 ` [Bug linuxthreads/25774] " romain.geissler at amadeus dot com
@ 2020-04-04 16:17 ` carlos at redhat dot com
  2020-04-04 20:51 ` rigault.francois+glib at gmail dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: carlos at redhat dot com @ 2020-04-04 16:17 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=25774

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
                 CC|                            |carlos at redhat dot com
   Last reconfirmed|                            |2020-04-04
     Ever confirmed|0                           |1

--- Comment #1 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to François Rigault from comment #0)
> ```
> --- glibc-2.30/nptl/lowlevellock.c      2019-08-01 06:29:31.000000000 +0200
> +++ ../glibc/glibc-2.30/nptl/lowlevellock.c     2020-04-04
> 11:24:50.367896172 +0200
> @@ -42,13 +42,16 @@
>  void
>  __lll_lock_wait (int *futex, int private)
>  {
> +  int count = 0;
> +
>    if (atomic_load_relaxed (futex) == 2)
>      goto futex;
> 
>    while (atomic_exchange_acquire (futex, 2) != 0)
>      {
> +      ++count;
>      futex:
> -      LIBC_PROBE (lll_lock_wait, 1, futex);
> +      LIBC_PROBE (lll_lock_wait, 2, futex, count);
>        lll_futex_wait (futex, 2, private); /* Wait if *futex == 2.  */
>      }
>  }
> ```

It adds a small amount of code in the slow path before the wait, which seems
fine to me, we're about to enter the kernel and sleep.

The value is local, and should be 'unsigned' to allow wrapping.

I think your suggestion is a good one.

Would you like to put a patch together for this?

I'd expect:
- The change to the probe.
- Add the probe to manual/probes.texi under a new node "POSIX Thread Probes"
and document the behaviour (previously undocumented) of just this probe.

Have a look at our contribution checklist:
https://sourceware.org/glibc/wiki/Contribution%20checklist

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug linuxthreads/25774] Suggestion: count number of retries in __lll_lock_wait
  2020-04-04  9:52 [Bug linuxthreads/25774] New: Suggestion: count number of retries in __lll_lock_wait rigault.francois+glib at gmail dot com
  2020-04-04 16:10 ` [Bug linuxthreads/25774] " romain.geissler at amadeus dot com
  2020-04-04 16:17 ` carlos at redhat dot com
@ 2020-04-04 20:51 ` rigault.francois+glib at gmail dot com
  2020-04-04 20:52 ` rigault.francois+glib at gmail dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rigault.francois+glib at gmail dot com @ 2020-04-04 20:51 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=25774

--- Comment #2 from François Rigault <rigault.francois+glib at gmail dot com> ---
Thanks for the feedback! I will try to read all that.

In the meantime, I was scratching my head to make this work with an older glibc
2.23. It turns out that in glibc 2.23, there was already a second argument to
the probe, that used to dump the "futex_op" argument of the futex syscall

```
1:     LIBC_PROBE (lll_lock_wait, 2, %rdi, %rsi)
```

so, depending on the version of glibc you will find either a probe with a
single argument, either a probe which second argument is either futex_op, or a
counter of how many times the thread was woken up uselessly. Hope it's not a
big deal.

Attaching what it looks like in glibc 2.23, purely fyi

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug linuxthreads/25774] Suggestion: count number of retries in __lll_lock_wait
  2020-04-04  9:52 [Bug linuxthreads/25774] New: Suggestion: count number of retries in __lll_lock_wait rigault.francois+glib at gmail dot com
                   ` (2 preceding siblings ...)
  2020-04-04 20:51 ` rigault.francois+glib at gmail dot com
@ 2020-04-04 20:52 ` rigault.francois+glib at gmail dot com
  2020-04-06 12:00 ` [Bug nptl/25774] " fweimer at redhat dot com
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rigault.francois+glib at gmail dot com @ 2020-04-04 20:52 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=25774

--- Comment #3 from François Rigault <rigault.francois+glib at gmail dot com> ---
Created attachment 12433
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12433&action=edit
fyi changing the probe in glibc 2.23

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug nptl/25774] Suggestion: count number of retries in __lll_lock_wait
  2020-04-04  9:52 [Bug linuxthreads/25774] New: Suggestion: count number of retries in __lll_lock_wait rigault.francois+glib at gmail dot com
                   ` (3 preceding siblings ...)
  2020-04-04 20:52 ` rigault.francois+glib at gmail dot com
@ 2020-04-06 12:00 ` fweimer at redhat dot com
  2020-04-06 14:28 ` carlos at redhat dot com
  2020-04-11 10:35 ` rigault.francois+glib at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: fweimer at redhat dot com @ 2020-04-06 12:00 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=25774

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-
                 CC|                            |drepper.fsp at gmail dot com,
                   |                            |fweimer at redhat dot com
          Component|linuxthreads                |nptl

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug nptl/25774] Suggestion: count number of retries in __lll_lock_wait
  2020-04-04  9:52 [Bug linuxthreads/25774] New: Suggestion: count number of retries in __lll_lock_wait rigault.francois+glib at gmail dot com
                   ` (4 preceding siblings ...)
  2020-04-06 12:00 ` [Bug nptl/25774] " fweimer at redhat dot com
@ 2020-04-06 14:28 ` carlos at redhat dot com
  2020-04-11 10:35 ` rigault.francois+glib at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: carlos at redhat dot com @ 2020-04-06 14:28 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=25774

--- Comment #4 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to François Rigault from comment #3)
> Created attachment 12433 [details]
> fyi changing the probe in glibc 2.23

There is no lowlevellock.S anymore in upstream. Please work on master.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug nptl/25774] Suggestion: count number of retries in __lll_lock_wait
  2020-04-04  9:52 [Bug linuxthreads/25774] New: Suggestion: count number of retries in __lll_lock_wait rigault.francois+glib at gmail dot com
                   ` (5 preceding siblings ...)
  2020-04-06 14:28 ` carlos at redhat dot com
@ 2020-04-11 10:35 ` rigault.francois+glib at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: rigault.francois+glib at gmail dot com @ 2020-04-11 10:35 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=25774

--- Comment #5 from François Rigault <rigault.francois+glib at gmail dot com> ---
Created attachment 12451
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12451&action=edit
Proposed patch

Proposed patch. Can you please help me review it?

This is a tiny change that should not be legally significant as I understand.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-04-11 10:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-04  9:52 [Bug linuxthreads/25774] New: Suggestion: count number of retries in __lll_lock_wait rigault.francois+glib at gmail dot com
2020-04-04 16:10 ` [Bug linuxthreads/25774] " romain.geissler at amadeus dot com
2020-04-04 16:17 ` carlos at redhat dot com
2020-04-04 20:51 ` rigault.francois+glib at gmail dot com
2020-04-04 20:52 ` rigault.francois+glib at gmail dot com
2020-04-06 12:00 ` [Bug nptl/25774] " fweimer at redhat dot com
2020-04-06 14:28 ` carlos at redhat dot com
2020-04-11 10:35 ` rigault.francois+glib at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).