public inbox for glibc-bugs@sourceware.org
* [Bug nptl/30977] New: Hang in pthread_cond_wait
@ 2023-10-17 13:19 zengzetang at bytedance dot com
  2023-10-17 13:44 ` [Bug nptl/30977] " schwab@linux-m68k.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: zengzetang at bytedance dot com @ 2023-10-17 13:19 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30977

            Bug ID: 30977
           Summary: Hang in pthread_cond_wait
           Product: glibc
           Version: 2.17
            Status: UNCONFIRMED
          Severity: critical
          Priority: P2
         Component: nptl
          Assignee: unassigned at sourceware dot org
          Reporter: zengzetang at bytedance dot com
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Created attachment 15177
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15177&action=edit
repeat

We recently hit a hang in MySQL
(https://bugs.mysql.com/bug.php?id=112277&thanks=5&notify=71). When it occurs,
many MySQL threads are blocked in pthread_cond_wait.


After investigating the MySQL source code, we suspect a bug in the pthread
library.


We wrote a small program to reproduce the stack traces seen in MySQL. The
traces are not identical, but the behavior is the same. Run the attached
program and it hangs within 1 minute.


Stack traces from gdb:

(gdb) bt
#0  0x00007ffff7bcd54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff7bcb240 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#2  0x00000000004013f0 in CountDownLatch::countDown (this=0x7fffe80008c0) at con.cpp:50
#3  0x0000000000400fa8 in agree () at con.cpp:107
#4  0x0000000000401223 in main () at con.cpp:142
(gdb) t 2
[Switching to thread 2 (Thread 0x7ffff6fd0700 (LWP 1583721))]
#0  0x00007ffff7bcaa35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bcaa35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000040138b in CountDownLatch::wait (this=0x7ffff00008c0) at con.cpp:38
#2  0x0000000000400e43 in waitForAgree (val=0x1) at con.cpp:90
#3  0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff70cfb0d in clone () from /lib64/libc.so.6
(gdb) t 3
[Switching to thread 3 (Thread 0x7ffff67cf700 (LWP 1583722))]
#0  0x00007ffff7bcd54d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bcd54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff7bc8e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007ffff7bc8d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000040136e in CountDownLatch::wait (this=0x7fffe80008c0) at con.cpp:36
#4  0x0000000000400e43 in waitForAgree (val=0x2) at con.cpp:90
#5  0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007ffff70cfb0d in clone () from /lib64/libc.so.6



ldd --version
ldd (GNU libc) 2.17

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug nptl/30977] Hang in pthread_cond_wait
  2023-10-17 13:19 [Bug nptl/30977] New: Hang in pthread_cond_wait zengzetang at bytedance dot com
@ 2023-10-17 13:44 ` schwab@linux-m68k.org
  2023-10-17 14:01 ` schwab@linux-m68k.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: schwab@linux-m68k.org @ 2023-10-17 13:44 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30977

--- Comment #1 from Andreas Schwab <schwab@linux-m68k.org> ---
Use after free?


* [Bug nptl/30977] Hang in pthread_cond_wait
  2023-10-17 13:19 [Bug nptl/30977] New: Hang in pthread_cond_wait zengzetang at bytedance dot com
  2023-10-17 13:44 ` [Bug nptl/30977] " schwab@linux-m68k.org
@ 2023-10-17 14:01 ` schwab@linux-m68k.org
  2023-10-18  1:48 ` zengzetang at bytedance dot com
  2023-10-18  8:01 ` schwab@linux-m68k.org
  3 siblings, 0 replies; 5+ messages in thread
From: schwab@linux-m68k.org @ 2023-10-17 14:01 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30977

--- Comment #2 from Andreas Schwab <schwab@linux-m68k.org> ---
The function agree accesses counts[] without any locks, while waitForAgree
modifies them in parallel.
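
A minimal sketch of the locking this comment asks for. The attached con.cpp is not reproduced in this thread, so everything here is an assumption for illustration: the latch implementation, the element type of counts[], the lock name g_counts_lock, and the helper run_rounds are all hypothetical. Only the names CountDownLatch, agree, waitForAgree and counts come from the report. The idea is that every read or write of counts[] happens under one mutex, and that a waiter takes the same mutex once more before destroying its latch, so agree can never touch a latch that is already gone.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Hypothetical latch; the real con.cpp is not shown in this thread.
class CountDownLatch {
public:
    explicit CountDownLatch(int count) : count_(count) {
        pthread_mutex_init(&mutex_, nullptr);
        pthread_cond_init(&cond_, nullptr);
    }
    ~CountDownLatch() {
        pthread_cond_destroy(&cond_);
        pthread_mutex_destroy(&mutex_);
    }
    CountDownLatch(const CountDownLatch&) = delete;
    CountDownLatch& operator=(const CountDownLatch&) = delete;

    void wait() {
        pthread_mutex_lock(&mutex_);
        while (count_ > 0)               // loop guards against spurious wakeups
            pthread_cond_wait(&cond_, &mutex_);
        pthread_mutex_unlock(&mutex_);
    }
    void countDown() {
        pthread_mutex_lock(&mutex_);
        assert(count_ > 0);
        if (--count_ == 0)
            pthread_cond_broadcast(&cond_);
        pthread_mutex_unlock(&mutex_);
    }
private:
    pthread_mutex_t mutex_;
    pthread_cond_t cond_;
    int count_;
};

constexpr int kWaiters = 2;
pthread_mutex_t g_counts_lock = PTHREAD_MUTEX_INITIALIZER;  // hypothetical name
CountDownLatch* counts[kWaiters] = {};   // guarded by g_counts_lock

struct WaiterArgs { int slot; int rounds; };

void* waitForAgree(void* p) {
    const WaiterArgs* args = static_cast<const WaiterArgs*>(p);
    for (int i = 0; i < args->rounds; ++i) {
        CountDownLatch latch(1);
        pthread_mutex_lock(&g_counts_lock);
        counts[args->slot] = &latch;     // publish under the lock
        pthread_mutex_unlock(&g_counts_lock);
        latch.wait();
        // Taking the lock again ensures agree() has completely finished with
        // the latch before it is destroyed at the end of this iteration.
        pthread_mutex_lock(&g_counts_lock);
        pthread_mutex_unlock(&g_counts_lock);
    }
    return nullptr;
}

// Releases the waiter in `slot` if one is published; returns 1 if it did.
int agree(int slot) {
    int released = 0;
    pthread_mutex_lock(&g_counts_lock);
    if (CountDownLatch* latch = counts[slot]) {
        counts[slot] = nullptr;          // never count the same latch twice
        latch->countDown();              // still holding g_counts_lock
        released = 1;
    }
    pthread_mutex_unlock(&g_counts_lock);
    return released;
}

// Hypothetical driver: starts the waiters and releases each one `rounds` times.
int run_rounds(int rounds) {
    pthread_t tids[kWaiters];
    WaiterArgs args[kWaiters];
    for (int i = 0; i < kWaiters; ++i) {
        args[i] = {i, rounds};
        pthread_create(&tids[i], nullptr, waitForAgree, &args[i]);
    }
    int released = 0;
    while (released < kWaiters * rounds) {
        int progress = 0;
        for (int i = 0; i < kWaiters; ++i) progress += agree(i);
        released += progress;
        if (progress == 0) sched_yield();
    }
    for (int i = 0; i < kWaiters; ++i) pthread_join(tids[i], nullptr);
    return released;
}
```

To try it, call run_rounds from a main function and build with g++ -pthread. The lock order is always g_counts_lock first and the latch's own mutex second, and a waiter never holds its latch mutex while taking g_counts_lock, so the two locks cannot deadlock against each other.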


* [Bug nptl/30977] Hang in pthread_cond_wait
  2023-10-17 13:19 [Bug nptl/30977] New: Hang in pthread_cond_wait zengzetang at bytedance dot com
  2023-10-17 13:44 ` [Bug nptl/30977] " schwab@linux-m68k.org
  2023-10-17 14:01 ` schwab@linux-m68k.org
@ 2023-10-18  1:48 ` zengzetang at bytedance dot com
  2023-10-18  8:01 ` schwab@linux-m68k.org
  3 siblings, 0 replies; 5+ messages in thread
From: zengzetang at bytedance dot com @ 2023-10-18  1:48 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30977

--- Comment #3 from 曾泽堂 <zengzetang at bytedance dot com> ---
Oh, yes, you are right. It's my fault.

Is there any documentation on the meaning of the fields in pthread_cond_t and
pthread_mutex_t? For example, the following dump is from MySQL, where the
countDown number changed from 1 to 808991033 and the threads were blocked,
which is clearly wrong:

  {
  _vptr.CountDownLatch = 0x1414d71000007fa0, 
  lock = {
    m_mutex = {
      __data = {
        __lock = 32671, 
        __count = 0, 
        __owner = 512, 
        __nusers = 0, 
        __kind = 65536, 
        __spins = 80, 
        __list = {
          __prev = 0x1b500000000, 
          __next = 0x64f077a800000000
        }
      }, 
      __size =
"\237\177\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\000\001\000P\000\000\000\000\000\000\000\265\001\000\000\000\000\000\000\250w\360d", 
      __align = 32671
    }, 
    m_psi = 0x8900b134de24
  }, 
  cond = {
    m_cond = {
      __data = {
        __lock = 35072, 
        __futex = 604012544, 
        __total_seq = 70368744199395, 
        __wakeup_seq = 0, 
        __woken_seq = 4049354197659235840, 
        __mutex = 0x312d326563332d64, 
        __nwaiters = 761619761, 
        __broadcast_seq = 959931489
      }, 
      __size = "\000\211\000\000\000\200\000$\343T\000\000\000@", '\000' <repeats 11 times>, "\066\067\071\060\062\062\070d-3ce2-11ee-ad79", 
      __align = 2594214122853796096
    }, 
    m_psi = 0x306533363130302d
  },
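
One observation that needs no knowledge of the glibc internals: several cond fields in this dump decode as printable ASCII when read byte by byte, and they line up with the text visible in the __size string. The helper below is a hypothetical sketch (decode_le is not a glibc or gdb function) and assumes a little-endian machine such as x86-64, which the /lib64 paths suggest but the thread does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Interprets the low `nbytes` bytes of `value` in little-endian order and
// returns them as text, replacing non-printable bytes with '.'.
std::string decode_le(std::uint64_t value, int nbytes) {
    assert(nbytes >= 0 && nbytes <= 8);
    std::string out;
    for (int i = 0; i < nbytes; ++i) {
        const unsigned char byte = static_cast<unsigned char>(value >> (8 * i));
        out += (byte >= 0x20 && byte < 0x7f) ? static_cast<char>(byte) : '.';
    }
    return out;
}
```

Applied to the values above, decode_le(0x312d326563332d64, 8) for __mutex gives "d-3ce2-1", decode_le(761619761, 4) for __nwaiters gives "1ee-", decode_le(959931489, 4) for __broadcast_seq gives "ad79", and decode_le(0x306533363130302d, 8) for m_psi gives "-00163e0". Together these read as one contiguous string, which is consistent with the object's memory having been overwritten by unrelated string data rather than holding a meaningful condition-variable state. It does not by itself show how the overwrite happened.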


* [Bug nptl/30977] Hang in pthread_cond_wait
  2023-10-17 13:19 [Bug nptl/30977] New: Hang in pthread_cond_wait zengzetang at bytedance dot com
                   ` (2 preceding siblings ...)
  2023-10-18  1:48 ` zengzetang at bytedance dot com
@ 2023-10-18  8:01 ` schwab@linux-m68k.org
  3 siblings, 0 replies; 5+ messages in thread
From: schwab@linux-m68k.org @ 2023-10-18  8:01 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30977

Andreas Schwab <schwab@linux-m68k.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |NOTABUG

--- Comment #4 from Andreas Schwab <schwab@linux-m68k.org> ---
Not a bug, missing locking.

These are all implementation details; try looking at the comments in the code.
