From: "arun11299 at gmail dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Tue, 05 May 2020 11:50:46 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Changed-Fields: cc

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

Arun changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arun11299 at gmail dot com

--- Comment #3 from Arun ---
I am probably facing the same issue with Python as well, where the GIL
(Global Interpreter Lock) is the lock protecting the critical region of code.
My setup spawns 4 to 8 Python threads, and each thread makes lots of Redis
socket calls. Python releases the GIL whenever a socket call is made so that
other threads can make some progress.
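
For readers unfamiliar with this pattern, here is a minimal sketch in C of a GIL-style handoff built on a mutex, a condition variable, and a "locked" flag. It is an illustration only, not CPython's actual implementation: the names fake_gil, take_gil, worker, and run_workers are hypothetical, and the comment in worker() marks where a blocking socket call would run while the lock is released. A thread stuck in the signal path or a waiter that never wakes would show up here as run_workers() never returning.

```c
/* Hypothetical sketch of a GIL-style handoff; not CPython's real code. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    pthread_mutex_t mutex;  /* protects `locked` */
    pthread_cond_t cond;    /* signalled whenever the lock is dropped */
    int locked;             /* 1 while some thread holds the fake GIL */
    long counter;           /* only touched while holding the fake GIL */
} fake_gil;

typedef struct {
    fake_gil *gil;
    int iterations;
} worker_arg;

/* Wait until the fake GIL is free, then take it. */
static void take_gil(fake_gil *gil)
{
    pthread_mutex_lock(&gil->mutex);
    while (gil->locked)
        pthread_cond_wait(&gil->cond, &gil->mutex);
    gil->locked = 1;
    pthread_mutex_unlock(&gil->mutex);
}

/* Release the fake GIL and wake one waiter, mirroring the
   lock / clear flag / signal / unlock order of the quoted drop_gil. */
static void drop_gil(fake_gil *gil)
{
    pthread_mutex_lock(&gil->mutex);
    gil->locked = 0;
    pthread_cond_signal(&gil->cond);
    pthread_mutex_unlock(&gil->mutex);
}

static void *worker(void *p)
{
    worker_arg *arg = p;
    for (int i = 0; i < arg->iterations; i++) {
        take_gil(arg->gil);
        arg->gil->counter++;   /* work done while holding the fake GIL */
        drop_gil(arg->gil);    /* a blocking socket call would go here */
    }
    return NULL;
}

/* Run `nthreads` workers and return the final counter value
   (or -1 if the thread array cannot be allocated). */
static long run_workers(int nthreads, int iterations)
{
    assert(nthreads > 0 && iterations >= 0);
    fake_gil gil = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };
    worker_arg arg = { &gil, iterations };
    pthread_t *threads = malloc(sizeof(*threads) * (size_t)nthreads);
    if (threads == NULL)
        return -1;

    int started = 0;
    for (; started < nthreads; started++) {
        if (pthread_create(&threads[started], NULL, worker, &arg) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    return gil.counter;
}
```

Calling run_workers(4, 1000) from a main() and building with -pthread approximates the 4-thread case described above. Because the bug is a rare race, a short run like this is not expected to reproduce the hang; the sketch only shows where the signalling and waiting threads sit.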

Below is the backtrace of the thread which was supposed to release the GIL by
doing a condition signal first.

-------------------
(gdb) bt
#0  0x00007f12c99e04b0 in futex_wait (private=, expected=4, futex_word=0xa745b0 ) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  0x00007f12c99e04b0 in futex_wait_simple (private=, expected=4, futex_word=0xa745b0 ) at ../sysdeps/nptl/futex-internal.h:135
#2  0x00007f12c99e04b0 in __condvar_quiesce_and_switch_g1 (private=, g1index=, wseq=, cond=0xa745a0 ) at pthread_cond_common.c:412
#3  0x00007f12c99e04b0 in __pthread_cond_signal (cond=0xa745a0 ) at pthread_cond_signal.c:78
#4  0x000000000050bfc8 in drop_gil.lto_priv (tstate=0x146ee3f0) at ../Python/ceval_gil.h:187
#5  0x000000000050c0ed in PyEval_SaveThread () at ../Python/ceval.c:356
#6  0x00000000005c04bb in sock_call_ex (s=0x7f125b97a9a8, writing=1, sock_func=0x5bdbf0 , data=0x7f12594650a0, connect=0, err=0x0, timeout=-1000000000) at ../Modules/socketmodule.c:899
#7  0x00000000005c0659 in sock_sendall (s=0x7f125b97a9a8, args=) at ../Modules/socketmodule.c:3833
#8  0x000000000050a8af in _PyCFunction_FastCallDict (kwargs=, nargs=, args=, func_obj=) at ../Objects/methodobject.c:234
#9  0x000000000050a8af in _PyCFunction_FastCallKeywords (kwnames=, nargs=, stack=, func=) at ../Objects/methodobject.c:294
#10 0x000000000050a8af in call_function.lto_priv (pp_stack=0x7f1259465250, oparg=, kwnames=) at ../Python/ceval.c:4851
#11 0x000000000050c5b9 in _PyEval_EvalFrameDefault (f=, throwflag=) at ../Python/ceval.c:3335
#12 0x0000000000509d48 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x16cdcbf8, for file /usr/lib/python3/dist-packages/redis/connection.py, line 590, in send_packed_command (self=, _sock=, _parser=, _sock=<...>, _reader=, _next_response=False) at remote 0x7f1259c84518>, _description_args={'host': '10.64.219.4', 'port': 6379, 'db': 0}, _connect_callbacks=[]) at remote
0x7f1259c84ba8>, command=[b'*2\r\n$3\r\nGET\r\n$54\r\nproj_00|mock1|cps|07b6e3d7-5ed1-36e8-81ef-e53777298405\r\n'], item=b'*2\r\n$3\r\nGET\r\n$54\r\nproj_00|mock1|cps|07b6e3d7-5ed1-36e8-81ef-e53777298405\r\n')) at ../Python/ceval.c:754
#13 0x0000000000509d48 in _PyFunction_FastCall (globals=, nargs=382585848, args=, co=) at ../Python/ceval.c:4933
-------------------

Part of the Python code that drops the GIL:

--------------------
static void
drop_gil(struct _ceval_runtime_state *ceval, PyThreadState *tstate)
{
    struct _gil_runtime_state *gil = &ceval->gil;
    if (!_Py_atomic_load_relaxed(&gil->locked)) {
        Py_FatalError("drop_gil: GIL is not locked");
    }

    /* tstate is allowed to be NULL (early interpreter init) */
    if (tstate != NULL) {
        /* Sub-interpreter support: threads might have been switched
           under our feet using PyThreadState_Swap(). Fix the GIL last
           holder variable so that our heuristics work. */
        _Py_atomic_store_relaxed(&gil->last_holder, (uintptr_t)tstate);
    }

    MUTEX_LOCK(gil->mutex);
    _Py_ANNOTATE_RWLOCK_RELEASED(&gil->locked, /*is_write=*/1);
    _Py_atomic_store_relaxed(&gil->locked, 0);
    COND_SIGNAL(gil->cond);
    MUTEX_UNLOCK(gil->mutex);

#ifdef FORCE_SWITCHING
    if (_Py_atomic_load_relaxed(&ceval->gil_drop_request) && tstate != NULL) {
        MUTEX_LOCK(gil->switch_mutex);
        /* Not switched yet => wait */
        if (((PyThreadState*)_Py_atomic_load_relaxed(&gil->last_holder)) == tstate)
        {
            assert(is_tstate_valid(tstate));
            RESET_GIL_DROP_REQUEST(tstate->interp);
            /* NOTE: if COND_WAIT does not atomically start waiting when
               releasing the mutex, another thread can run through, take
               the GIL and drop it again, and reset the condition
               before we even had a chance to wait for it. */
            COND_WAIT(gil->switch_cond, gil->switch_mutex);
        }
        MUTEX_UNLOCK(gil->switch_mutex);
    }
#endif
}
--------------------