* [Bug nptl/30041] New: pthread_cancel() hangs under gdb on aarch64
From: stsp at users dot sourceforge.net @ 2023-01-24 14:21 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

            Bug ID: 30041
           Summary: pthread_cancel() hangs under gdb on aarch64
           Product: glibc
           Version: 2.36
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: nptl
          Assignee: unassigned at sourceware dot org
          Reporter: stsp at users dot sourceforge.net
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Created attachment 14612
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14612&action=edit
test case

Under qemu's aarch64 please do the following:

$ gcc -Wall -ggdb3 tcanc.c
$ ./a.out
1
2
3
Stopping
4
OK

So far so good.
Now:

$ gdb ./a.out
r
1
2
3
Stopping
4
5
6
[ counting continues indefinitely - main thread stuck in pthread_cancel() ]
[ let's disable SIGALRM just to make sure the hang is permanent ]
^C
Thread 1 "a.out" received signal SIGINT, Interrupt.
__GI__dl_debug_state () at ./elf/dl-debug.c:117
117     ./elf/dl-debug.c: No such file or directory.
(gdb) handle SIGALRM nopass
Signal        Stop      Print   Pass to program Description
SIGALRM       No        No      No              Alarm clock
(gdb) c
Continuing.
^C
Thread 1 "a.out" received signal SIGINT, Interrupt.
__GI__dl_debug_state () at ./elf/dl-debug.c:117
117     in ./elf/dl-debug.c
(gdb) 
[ yes, the hang is permanent, it won't advance even w/o SIGALRM ]


This hang doesn't depend on the SIGALRM rate: SIGALRM isn't
starving the CPU, and the rate in the test case is actually
rather low. But SIGALRM is a necessary "ingredient": without
it the hang is not reproducible.

The stack trace points to some dlopen/unwind games, so I suspect
it's a glibc bug. But if not, maybe it's a gdb bug?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

* [Bug nptl/30041] pthread_cancel() hangs under gdb on aarch64
From: schwab@linux-m68k.org @ 2023-01-24 14:52 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

--- Comment #1 from Andreas Schwab <schwab@linux-m68k.org> ---
I cannot reproduce that, with or without gdb.

* [Bug nptl/30041] pthread_cancel() hangs under gdb on aarch64
From: stsp at users dot sourceforge.net @ 2023-01-24 15:25 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

--- Comment #2 from Stas Sergeev <stsp at users dot sourceforge.net> ---
(In reply to Andreas Schwab from comment #1)
> I cannot reproduce that, with or without gdb.

Are you running under QEMU?
I use the Ubuntu image kinetic-server-cloudimg-arm64.img
with all updates, and "-cpu cortex-a57 -M virt".
Not sure what else would be helpful; maybe you want
SSH access to my VM?

* [Bug nptl/30041] pthread_cancel() hangs under gdb on aarch64
From: stsp at users dot sourceforge.net @ 2023-01-24 15:34 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

--- Comment #3 from Stas Sergeev <stsp at users dot sourceforge.net> ---
gdb is 12.1-3ubuntu2.
What's yours?
If it's a gdb problem, then we
first need to sync up the gdb
version.

* [Bug nptl/30041] pthread_cancel() hangs under gdb on aarch64
From: schwab@linux-m68k.org @ 2023-01-24 15:41 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

--- Comment #4 from Andreas Schwab <schwab@linux-m68k.org> ---
I have tested it on real hardware.

* [Bug nptl/30041] pthread_cancel() hangs under gdb on aarch64
From: stsp at users dot sourceforge.net @ 2023-01-24 15:55 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

--- Comment #5 from Stas Sergeev <stsp at users dot sourceforge.net> ---
(In reply to Andreas Schwab from comment #4)
> I have tested it on real hardware.

OK, then it seems you need to raise the
SIGALRM frequency. Please change line 35 and
replace the value 4000 with e.g. 500.
That way it even hangs without gdb,
but the behavior seems more random, i.e.
now it also hangs in pthread_join().

* [Bug nptl/30041] pthread_cancel() hangs under gdb on aarch64
From: stsp at users dot sourceforge.net @ 2023-01-24 18:20 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

Stas Sergeev <stsp at users dot sourceforge.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #14612|0                           |1
        is obsolete|                            |

--- Comment #6 from Stas Sergeev <stsp at users dot sourceforge.net> ---
Created attachment 14613
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14613&action=edit
test case

So I lowered the tick interval in
the hope of reproducing this on real hardware.
But I can't promise; maybe you need
to lower it even more. That frequency allows
reproducing without gdb, which is already
better.

* [Bug nptl/30041] pthread_cancel() hangs on aarch64
From: stsp at users dot sourceforge.net @ 2023-01-24 18:20 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

Stas Sergeev <stsp at users dot sourceforge.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|pthread_cancel() hangs      |pthread_cancel() hangs on
                   |under gdb on aarch64        |aarch64

* [Bug nptl/30041] pthread_cancel() hangs on aarch64
From: schwab@linux-m68k.org @ 2023-01-25  9:57 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

--- Comment #7 from Andreas Schwab <schwab@linux-m68k.org> ---
If anything this is a bug in the debugger.  There are two concurrent types of
events, the signal and the shlib events, and the constant flow of signal events
prevents the shlib event from making forward progress.

* [Bug nptl/30041] pthread_cancel() hangs on aarch64
From: stsp at users dot sourceforge.net @ 2023-01-25 10:17 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

--- Comment #8 from Stas Sergeev <stsp at users dot sourceforge.net> ---
(In reply to Andreas Schwab from comment #7)
> If anything this is a bug in the debugger.  There are two concurrent types
> of events, the signal and the shlib events, and the constant flow of signal
> events prevents the shlib event from making forward progress.

Yes, this seems to be the case.
I modified the test so that the
second thread disables the timer
after some time. If pthread_cancel()
was hanging, it gets unstuck.
If pthread_join() was hanging, it
doesn't get unstuck, because the
second thread has already terminated,
so the timer shutdown doesn't happen.

But I thought I had excluded such a possibility
by at least 2 things:
- attaching with gdb and doing "handle SIGALRM nopass"
- lowering the SIGALRM rate and making
sure both threads can execute code and
print things.

So I still don't understand what's
going on. If both threads could
sleep() and printf() relatively happily
under a much higher SIGALRM rate, then
why does a rather small SIGALRM rate still
cause pthread_cancel() to stall indefinitely?
It's not like anything else stalls.
In fact, I discovered that effect in a
real program of mine, which works perfectly
(and is used by people) under the exact
SIGALRM rate which causes the full stall of
pthread_cancel().
So how is that possible without a bug?

* [Bug nptl/30041] pthread_cancel() hangs on aarch64
From: schwab@linux-m68k.org @ 2023-01-25 10:25 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

--- Comment #9 from Andreas Schwab <schwab@linux-m68k.org> ---
Telling the debugger not to forward the signal does not change the overhead of
signal delivery through the debugger.  You are still stuck in the shlib event. 
The only way to prevent the overhead of the shlib event is to make sure
libgcc_s is already loaded by the time pthread_cancel is called.
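
One way to get libgcc_s loaded early, sketched here as an assumption
rather than as an established recipe: dlopen() it at program start,
before any thread can call pthread_cancel(). The helper name
preload_libgcc_s is hypothetical, and the soname libgcc_s.so.1 is the
usual one on Linux but may differ on other targets.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load libgcc_s up front so that the first
   pthread_cancel() does not have to load it.
   Returns 0 on success, -1 if the library could not be loaded. */
static int preload_libgcc_s(void)
{
    /* Assumed soname; adjust if the target names it differently. */
    void *handle = dlopen("libgcc_s.so.1", RTLD_NOW | RTLD_GLOBAL);

    if (handle == NULL) {
        fprintf(stderr, "preload_libgcc_s: %s\n", dlerror());
        return -1;
    }
    /* The handle is deliberately never dlclose()d, so the library
       stays mapped for the lifetime of the process. */
    return 0;
}
```

Call preload_libgcc_s() first thing in main(). With glibc older than
2.34 the program also has to be linked with -ldl.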

* [Bug nptl/30041] pthread_cancel() hangs on aarch64
From: stsp at users dot sourceforge.net @ 2023-01-25 10:29 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

Stas Sergeev <stsp at users dot sourceforge.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #14613|0                           |1
        is obsolete|                            |

--- Comment #10 from Stas Sergeev <stsp at users dot sourceforge.net> ---
Created attachment 14624
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14624&action=edit
test case

Here's the updated test case that
shows that both threads are alive
and kicking before pthread_cancel().
After pthread_cancel(), both are either
stuck forever, or stuck until the second
thread shuts down the timer.

> You are still stuck in the shlib event.

But could you please explain in a bit
more detail? If both threads could
progress, then why can't the "shlib event"?
How is it different from the prints
that I have now inserted into the test
to make sure SIGALRM doesn't hog the CPU?

> libgcc_s is already loaded by the time pthread_cancel is called.

Wow! Then nothing would stall?
How can I do that?

* [Bug nptl/30041] pthread_cancel() hangs on aarch64
From: schwab@linux-m68k.org @ 2023-01-25 10:42 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

--- Comment #11 from Andreas Schwab <schwab@linux-m68k.org> ---
Only one thread progresses.  The other is stuck in the shlib event.

* [Bug nptl/30041] pthread_cancel() hangs on aarch64
From: stsp at users dot sourceforge.net @ 2023-01-25 10:57 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

--- Comment #12 from Stas Sergeev <stsp at users dot sourceforge.net> ---
(In reply to Andreas Schwab from comment #11)
> Only one thread progresses.  The other is stuck in the shlib event.

So you mean gdb can't handle the shlib
event because of the SIGALRMs?
So is it a gdb bug, in that it doesn't
stop signals while performing the
shlib event?
You mentioned gdb from the very
beginning, but only now am I starting
to understand what you mean by
"shlib event".

* [Bug nptl/30041] pthread_cancel() hangs under gdb on aarch64
From: stsp at users dot sourceforge.net @ 2023-01-25 12:45 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30041

Stas Sergeev <stsp at users dot sourceforge.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|pthread_cancel() hangs on   |pthread_cancel() hangs
                   |aarch64                     |under gdb on aarch64
