public inbox for gdb-patches@sourceware.org
* [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process.
@ 2024-05-09 10:53 Partha Satapathy
  2024-05-09 14:32 ` Tom Tromey
  2024-05-10 20:19 ` Pedro Alves
  0 siblings, 2 replies; 11+ messages in thread
From: Partha Satapathy @ 2024-05-09 10:53 UTC (permalink / raw)
  To: partha.satapathy, gdb-patches, rajesh.sivaramasubramaniom,
	bert.barbe, blarsen, cupertino.miranda

From: Partha Sarathi Satapathy <partha.satapathy@oracle.com>

Problem: If Ctrl-C is pressed while gdb is still attaching to an
inferior, the SIGINT is passed on to the debugged process, which then
exits.  For example, with pstack, printing a stack can take significant
time, and the user presses Ctrl-C to abort the pstack/gdb application.
This in turn kills the debugged process, which can be critical for the
system.  In this case, the intention of Ctrl-C is to kill pstack/gdb,
not the inferior application.
gdb -p <<pid>>
or gdb /proc/<<pid>>/exe pid
Attaching to process
<< ctrl+c is pressed during attach
(gdb) q
<<<< inferior process exited >>>>

A Ctrl-C/SIGINT received by gdb while it is attaching to an inferior is
passed to the debugged process at certain points within the attach
window.  Attaching is a multistep process, and it takes time before the
GDB prompt is ready.  Because the debugger and the debuggee are not fully
attached during this period, the SIGINT takes its default action and
terminates the process.

Solution: While GDB is attaching to a process, the inferior is not the
current session leader.  Hence, until the attach is complete and the GDB
prompt is available, the SIGINT should not be passed to the inferior.
The signal should be skipped if the process runs in the background.  With
this approach, we skip passing the signal if the process is attached to
GDB and the attach is not yet complete.

attach_flag : Set if the process is attached
sync_flag   : Set if the attach is complete
If attach_flag is set and sync_flag is not set, don't kill the attached process
---
 gdb/infcmd.c   | 2 ++
 gdb/inferior.h | 3 +++
 gdb/inflow.c   | 2 ++
 3 files changed, 7 insertions(+)

diff --git a/gdb/infcmd.c b/gdb/infcmd.c
index 0309658690c1..ee34498525fd 100644
--- a/gdb/infcmd.c
+++ b/gdb/infcmd.c
@@ -2510,6 +2510,8 @@ setup_inferior (int from_tty)
   target_post_attach (inferior_ptid.pid ());
 
   post_create_inferior (from_tty);
+  check_quit_flag();
+  current_inferior ()->sync_flag = true;
 }
 
 /* What to do after the first program stops after attaching.  */
diff --git a/gdb/inferior.h b/gdb/inferior.h
index e239aa5b3cf0..8f54db17a0de 100644
--- a/gdb/inferior.h
+++ b/gdb/inferior.h
@@ -603,6 +603,9 @@ class inferior : public refcounted_object,
   /* True if this child process was attached rather than forked.  */
   bool attach_flag = false;
 
+  /* True if inferior has been fully attached*/
+  bool sync_flag = false;
+
   /* If this inferior is a vfork child, then this is the pointer to
      its vfork parent, if GDB is still attached to it.  */
   inferior *vfork_parent = NULL;
diff --git a/gdb/inflow.c b/gdb/inflow.c
index 773ac0ba4997..381e6e4c22dd 100644
--- a/gdb/inflow.c
+++ b/gdb/inflow.c
@@ -585,6 +585,8 @@ child_pass_ctrlc (struct target_ops *self)
       if (inf->terminal_state != target_terminal_state::is_ours)
 	{
 	  gdb_assert (inf->pid != 0);
+	  if ((inf->attach_flag) && !(inf->sync_flag))
+	    return;
 
 #ifndef _WIN32
 	  kill (inf->pid, SIGINT);
-- 
2.39.3


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process.
  2024-05-09 10:53 [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process Partha Satapathy
@ 2024-05-09 14:32 ` Tom Tromey
  2024-05-10 20:19 ` Pedro Alves
  1 sibling, 0 replies; 11+ messages in thread
From: Tom Tromey @ 2024-05-09 14:32 UTC (permalink / raw)
  To: Partha Satapathy
  Cc: gdb-patches, rajesh.sivaramasubramaniom, bert.barbe, blarsen,
	cupertino.miranda

>>>>> "Partha" == Partha Satapathy <partha.satapathy@oracle.com> writes:

IMO it would be best if Pedro reviewed this, but...

Partha> diff --git a/gdb/inflow.c b/gdb/inflow.c
Partha> index 773ac0ba4997..381e6e4c22dd 100644
Partha> --- a/gdb/inflow.c
Partha> +++ b/gdb/inflow.c
Partha> @@ -585,6 +585,8 @@ child_pass_ctrlc (struct target_ops *self)
Partha>        if (inf->terminal_state != target_terminal_state::is_ours)
Partha>  	{
Partha>  	  gdb_assert (inf->pid != 0);
Partha> +	  if ((inf->attach_flag) && !(inf->sync_flag))
Partha> +	    return;

... if this code is run, doesn't it mean a C-c will be ignored?

Also earlier:

Partha> +  check_quit_flag();

I think this just returns true/false and clears the flag.
That doesn't seem right.

Maybe check_quit_flag should be marked [[nodiscard]].
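
To illustrate the point about the return value, here is a minimal C++
sketch (not GDB's code) of a read-and-clear flag function marked
[[nodiscard]].  The names quit_flag_model, set_quit_flag_model,
check_quit_flag_model and discard_pending_quit_model are made up for
this example.  With the attribute, a bare call like the one in the patch
would draw a compiler warning, and an intentional discard has to be
spelled out.

```cpp
#include <atomic>

// Hypothetical stand-in for a quit flag; this is NOT GDB's implementation.
static std::atomic<bool> quit_flag_model{false};

// Record that the user asked to interrupt (e.g. from a SIGINT path).
void set_quit_flag_model()
{
  quit_flag_model.store(true);
}

// Read-and-clear: report whether a quit was pending and reset the flag.
// [[nodiscard]] makes the compiler warn when the result is ignored.
[[nodiscard]] bool check_quit_flag_model()
{
  return quit_flag_model.exchange(false);
}

// If the caller really only wants the clearing side effect, the discard
// has to be explicit, which documents the intent at the call site.
void discard_pending_quit_model()
{
  (void) check_quit_flag_model();
}
```

A caller that wants to act on the interrupt would write something like
"if (check_quit_flag_model ()) { ... }" instead of dropping the result.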

Tom


* Re: [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process.
  2024-05-09 10:53 [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process Partha Satapathy
  2024-05-09 14:32 ` Tom Tromey
@ 2024-05-10 20:19 ` Pedro Alves
  2024-05-13 14:49   ` Pedro Alves
  1 sibling, 1 reply; 11+ messages in thread
From: Pedro Alves @ 2024-05-10 20:19 UTC (permalink / raw)
  To: Partha Satapathy, gdb-patches, rajesh.sivaramasubramaniom,
	bert.barbe, blarsen, cupertino.miranda

Hi.

Just wanted to let you know that I've read all the discussion around this until this
email I'm replying to, and started thinking about it a bit.  Unfortunately this is one of
those areas in GDB where the right change is rarely immediately obvious (to me).

Some questions:

 - If you ctrl-c to abort the attach, do we really abort the 
   attach properly?  Or do we stay attached in some half broken state?

 - Below you mention pstack, where can we find it?  And you mention
   that ctrl-c is pressed while that is printing a stack.  I'm assuming
   that's a backtrace command.  I'm confused in that case, as if that is
   so, then we should already be past the initial attach.  The question
   would then become, shouldn't gdb have the terminal at that point?
   How come it does not?

I'm wondering whether Baris's patch to eliminate the inferior
continuations would help with this, as it probably makes the attaching
sequence synchronous.  I should probably look at that one.

Pedro Alves



* Re: [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process.
  2024-05-10 20:19 ` Pedro Alves
@ 2024-05-13 14:49   ` Pedro Alves
  2024-05-16  7:15     ` Partha Satapathy
  0 siblings, 1 reply; 11+ messages in thread
From: Pedro Alves @ 2024-05-13 14:49 UTC (permalink / raw)
  To: Partha Satapathy, gdb-patches, rajesh.sivaramasubramaniom,
	bert.barbe, blarsen, cupertino.miranda

Adding another question to the list below.  (I haven't tried to reproduce this yet myself, btw.)

On 2024-05-10 21:19, Pedro Alves wrote:
> Hi.
> 
> Just wanted to let you know that I've read all the discussion around this until this
> email I'm replying to, and started thinking about it a bit.  Unfortunately this is one of
> those areas in GDB where the right change is rarely immediately obvious (to me).
> 
> Some questions:
> 
>  - If you ctrl-c to abort the attach, do we really abort the 
>    attach properly?  Or do we stay attached in some half broken state?
> 
>  - Below you mention pstack, where can we find it?  And you mention
>    that ctrl-c is pressed while that is printing a stack.  I'm assuming
>    that's a backtrace command.  I'm confused in that case, as if that is
>    so, then we should already be past the initial attach.  The question
>    would then becomes, shouldn't gdb have the terminal at that point?
>    How come it does not?

#3 - The patch description states:

 > Problem: While gdb is attaching an inferior, if ctrl-c is pressed in the
 > middle of the process attach,  the sigint is passed to the debugged
 > process.  This triggers the exit of the inferior.

This SIGINT passing is done with "kill(-pgrp, SIGINT)".  How does that manage
to trigger the exit of the inferior at all?  ptrace should intercept the
SIGINT before the inferior ever sees it.  Did it not?

Or could it be that the real issue is that because that sends the SIGINT
to all the processes in the inferior's pgrp, we kill more processes than
the one we're attaching to, and those processes exiting cause the inferior
to exit as well.  If so, then this is orthogonal to the initial attach,
and can happen after the attach as well.  There is a bug open about this
on bugzilla.
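
The process-group effect described above can be shown outside GDB with a
small POSIX sketch.  It does not involve ptrace and is not taken from
GDB; spawn_sleeper and count_killed_by_group_sigint are hypothetical
helpers written for this example.  Several children are placed in one
process group, a single kill(-pgid, SIGINT) is sent, and the function
counts how many children were terminated by SIGINT.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Fork a child that joins process group PGID (0 = start a new group led
// by the child) and then sleeps until a signal arrives.  Returns the
// child's pid, or -1 on failure.
static pid_t spawn_sleeper(pid_t pgid)
{
  int ready[2];
  if (pipe(ready) != 0)
    return -1;

  pid_t pid = fork();
  if (pid < 0) {
    close(ready[0]);
    close(ready[1]);
    return -1;
  }

  if (pid == 0) {
    close(ready[0]);
    // Make sure SIGINT is unblocked and has its default (terminate) action.
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigprocmask(SIG_UNBLOCK, &set, nullptr);
    signal(SIGINT, SIG_DFL);
    alarm(10);  // safety net so the child never lingers
    char byte = 1;
    if (setpgid(0, pgid) != 0 || write(ready[1], &byte, 1) != 1)
      _exit(1);
    close(ready[1]);
    for (;;)
      pause();
  }

  // Parent: wait until the child has joined its group and is ready.
  close(ready[1]);
  char byte = 0;
  bool ok = read(ready[0], &byte, 1) == 1;
  close(ready[0]);
  if (!ok) {
    waitpid(pid, nullptr, 0);
    return -1;
  }
  return pid;
}

// Put NCHILDREN sleepers into one process group, send one SIGINT to the
// whole group, and return how many of them were terminated by SIGINT.
int count_killed_by_group_sigint(int nchildren)
{
  std::vector<pid_t> pids;
  pid_t pgid = 0;
  for (int i = 0; i < nchildren; ++i) {
    pid_t pid = spawn_sleeper(pgid);
    if (pid < 0)
      break;
    if (pgid == 0)
      pgid = pid;  // the first child leads the group
    pids.push_back(pid);
  }

  if (pgid != 0)
    kill(-pgid, SIGINT);  // negative pid: signal every member of the group

  int killed = 0;
  for (pid_t pid : pids) {
    int status = 0;
    if (waitpid(pid, &status, 0) == pid
        && WIFSIGNALED(status) && WTERMSIG(status) == SIGINT)
      ++killed;
  }
  return killed;
}
```

Calling count_killed_by_group_sigint (3) sends one signal but affects all
three children, which is the "more processes than the one we're attaching
to" scenario in miniature.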

Pedro Alves

> 
> I'm wondering whether Baris's patch to eliminate the inferior
> continuations would help with this, as it probably makes the attaching
> sequence synchronous.  I should probably look at that one.
> 
> Pedro Alves
> 


* Re: [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process.
  2024-05-13 14:49   ` Pedro Alves
@ 2024-05-16  7:15     ` Partha Satapathy
  2024-06-03  5:21       ` Partha Satapathy
  0 siblings, 1 reply; 11+ messages in thread
From: Partha Satapathy @ 2024-05-16  7:15 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches, rajesh.sivaramasubramaniom, bert.barbe,
	blarsen, cupertino.miranda

On 5/13/2024 8:19 PM, Pedro Alves wrote:
> Adding another question to the list below.  (I haven't tried to reproduce this yet myself, btw.)
> 
> On 2024-05-10 21:19, Pedro Alves wrote:
>> Hi.
>>
>> Just wanted to let you know that I've read all the discussion around this until this
>> email I'm replying to, and started thinking about it a bit.  Unfortunately this is one of
>> those areas in GDB where the right change is rarely immediately obvious (to me).
>>
>> Some questions:
>>
>>   - If you ctrl-c to abort the attach, do we really abort the
>>     attach properly?  Or do we stay attached in some half broken state?
>>
>>   - Below you mention pstack, where can we find it?  And you mention
>>     that ctrl-c is pressed while that is printing a stack.  I'm assuming
>>     that's a backtrace command.  I'm confused in that case, as if that is
>>     so, then we should already be past the initial attach.  The question
>>     would then becomes, shouldn't gdb have the terminal at that point?
>>     How come it does not?
> 
> #3 - The patch description states:
> 
>   > Problem: While gdb is attaching an inferior, if ctrl-c is pressed in the
>   > middle of the process attach,  the sigint is passed to the debugged
>   > process.  This triggers the exit of the inferior.
> 
> This SIGINT passing is done with "kill(-pgrp, SIGINT)".  How does that manage
> to trigger the exit of the inferior at all?  ptrace should intercept the
> SIGINT before the inferior ever sees it.  Did it not?
> 
> Or could it be that the real issue is that because that sends the SIGINT
> to all the processes in the inferior's pgrp, we kill more processes than
> the one we're attaching to, and those processes exiting cause the inferior
> to exit as well.  If so, then this is orthogonal to the initial attach,
> and can happen after the attach as well.  There is a bug open about this
> on bugzilla.
> 
> Pedro Alves
> 
>>
>> I'm wondering whether Baris's patch to eliminate the inferior
>> continuations would help with this, as it probably makes the attaching
>> sequence synchronous.  I should probably look at that one.
>>
>> Pedro Alves
>>


Thanks Pedro and Tom for reviewing the problem.


Problem:
pstack dumps the stacks of all threads in a process.  In some cases
printing the stacks can take significant time, and Ctrl-C is pressed to
abort the pstack/gdb application.  This in turn kills the debugged process,
which can be critical for the system.  In this case the intention of
Ctrl-C is to kill pstack/gdb, not the target application.


# tail pstack -n 12

# Run GDB, strip out unwanted noise.
# --readnever is no longer used since .gdb_index is now in use.
$GDB --quiet -nx $GDBARGS /proc/$1/exe $1 <<EOF 2>&1 |
set width 0
set height 0
set pagination no
$backtrace
EOF
/bin/sed -n \
     -e 's/^\((gdb) \)*//' \
     -e '/^#/p' \
     -e '/^Thread/p'


This is the interesting part of pstack; the rest is cosmetic.

pstack usage:
# pstack 1

#0  0x00007fa18cf44017 in epoll_wait () from /lib64/libc.so.6
#1  0x00007fa18e67e036 in sd_event_wait () from /usr/lib/systemd/libsystemd-shared-239.so
#2  0x00007fa18e67f33b in sd_event_run () from /usr/lib/systemd/libsystemd-shared-239.so
#3  0x000055c155da8c22 in manager_loop ()
#4  0x000055c155d5f133 in main ()

Reproduction:

GDB is generally attached to the debugged process with:
gdb -p <<pid>>
or gdb /proc/<<pid>>/exe pid
pstack uses the latter method to attach gdb to the debugged process.  If
the application is large or reading symbols is slow, there is a good
window in which to press Ctrl-C during the attach.  Spawning "gdb" under
"strace -k" makes gdb a lot slower and gives a larger window in which to
press Ctrl-C at the precise moment, i.e. during the attach of the debugged
process.  This strace hack increases the reproduction rate of the issue.

Testcase:

With GDB 13.1
ps aux | grep abrtd
root     2195168   /usr/sbin/abrtd -d -s

#strace -k -o log gdb -p 2195168
Attaching to process 2195168
[New LWP 2195177]
[New LWP 2195179]
^C[Thread debugging using libthread_db enabled]
<<<< Note that Ctrl-C is pressed after the attach is initiated, while gdb
is still reading the symbols from the libraries >>>>
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fe3ed6d70d1 in poll () from /lib64/libc.so.6
(gdb) q
A debugging session is active.
            Inferior 1 [process 2195168] will be detached
Quit anyway? (y or n) y
Detaching from program: /usr/sbin/abrtd, process 2195168

# ps aux | grep 2195168
<<<< Process exited >>>>

The window in which to press Ctrl-C is very narrow.
Session1 :

]$ ps aux | grep abrtd
root        1329  0.0  0.0 602624 13076 ?        Ssl  May03   0:00 /usr/sbin/abrtd -d -s

Session2:

# ./tpstack 1329

+ strace -o omlog -k ./gdb --quiet -nx -ex 'set width 0' -ex 'set height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all bt' -ex quit /proc/1329/exe 1329
Reading symbols from /proc/1329/exe...
Python Exception <class 'AttributeError'>: module 'gdb' has no attribute '_handle_missing_debuginfo'
Reading symbols from .gnu_debugdata for /usr/sbin/abrtd...
(No debugging symbols found in .gnu_debugdata for /usr/sbin/abrtd)
Attaching to program: /proc/1329/exe, process 1329
[New LWP 1399]
[New LWP 1349] ^C

Session1:
[opc@pssatapa-ol8 TEST]$ ps aux | grep abrtd
<<< 1329 is killed >>>

This is a very small window, so a heavy application is good for
reproduction.  I modified the last part of pstack as follows:
# Run GDB, strip out unwanted noise.
# --readnever is no longer used since .gdb_index is now in use.
strace -o omlog -k  ./gdb  --quiet -nx  -ex 'set width 0' -ex 'set height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all bt' -ex quit  /proc/$1/exe $1

Running gdb under strace -k makes it slow, which gives us a window in
which to press Ctrl-C; otherwise the window is too small to time the
signal.  We observe the problem when the file system, the kernel, or
procfs is slow.

The signal is not intended for the inferior.
The signal is passed from "gdb" to the inferior.

The SIGINT handler in gdb sets the quit flag, and in some paths we
check the quit flag and pass the signal to the inferior.
That is what kills the inferior.

On:
+  check_quit_flag();
This should be called only when inf->attach_flag is true.
I will add the check in the next iteration.
The idea here is to clear any pending quit flag set by the SIGINT;
otherwise, after we set sync_flag, a later check of the quit flag
can still kill the inferior.
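
The intended ordering can be written down as a small standalone model.
This is only a sketch of the logic described in this thread, not GDB
code: inferior_model, session_model, finish_attach_model and
would_signal_inferior are hypothetical names, and the real inferior
class and child_pass_ctrlc carry much more state.

```cpp
// Toy model of the two flags from the patch.
struct inferior_model {
  bool attach_flag = false;  // process was attached rather than forked
  bool sync_flag = false;    // attach has fully completed
};

struct session_model {
  bool quit_pending = false;  // models a quit flag set by the SIGINT handler
  inferior_model inf;
};

// Models the check added to child_pass_ctrlc in the patch: do not forward
// a Ctrl-C while an attach is still in progress.
bool may_forward_ctrlc(const inferior_model &inf)
{
  return !(inf.attach_flag && !inf.sync_flag);
}

// Models the end of setup_inferior as proposed: for an attached inferior,
// drop any Ctrl-C that arrived during the attach, then mark the attach
// as complete.
void finish_attach_model(session_model &s)
{
  if (s.inf.attach_flag)
    s.quit_pending = false;
  s.inf.sync_flag = true;
}

// Would a pending Ctrl-C reach the inferior in the current state?
bool would_signal_inferior(const session_model &s)
{
  return s.quit_pending && may_forward_ctrlc(s.inf);
}
```

In this model, if the pending flag were not cleared before sync_flag is
set, would_signal_inferior would turn true right after the attach
completes, which is the case the clearing is meant to avoid.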

Thanks
Partha


* Re: [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process.
  2024-05-16  7:15     ` Partha Satapathy
@ 2024-06-03  5:21       ` Partha Satapathy
  2024-06-10  5:41         ` Partha Satapathy
  2024-06-14 17:19         ` Pedro Alves
  0 siblings, 2 replies; 11+ messages in thread
From: Partha Satapathy @ 2024-06-03  5:21 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches, rajesh.sivaramasubramaniom, bert.barbe,
	blarsen, cupertino.miranda

Hi Pedro,

I understand that my last reply was a bit long in explaining the
context.  Please find the answers to your questions below.

Here are the questions:
 >  - If you ctrl-c to abort the attach, do we really abort the
 >    attach properly?  Or do we stay attached in some half broken state?
 >
Let's take the case of "gdb -p pid" with Ctrl-C pressed immediately.
The intention is to kill gdb, not to abort the attach or
kill the attached pid.  In the problem case, the debugged process is killed.

 >  - Below you mention pstack, where can we find it?  And you mention
 >    that ctrl-c is pressed while that is printing a stack.  I'm assuming
 >    that's a backtrace command.  I'm confused in that case, as if that is
 >    so, then we should already be past the initial attach.  The question
 >    would then becomes, shouldn't gdb have the terminal at that point?
 >    How come it does not?

pstack is a wrapper around gdb and runs it as:
gdb --quiet -nx -ex 'set width 0' -ex 'set height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all bt' -ex quit /proc/<<pid>>/exe <<pid>>

Here gdb is run with "-ex", so it does not need the gdb prompt to
issue the backtrace command.

The Ctrl-C is issued in the window between the initial attach and
target_post_attach.

 >This SIGINT passing is done with "kill(-pgrp, SIGINT)".  How does that 
 >manage to trigger the exit of the inferior at all?  ptrace should 
 >intercept the SIGINT before the inferior ever sees it.  Did it not?

/* Handle a SIGINT.  */
void
handle_sigint (int sig)
{
   signal (sig, handle_sigint);

   /* We could be running in a loop reading in symfiles or something so
      it may be quite a while before we get back to the event loop.  So
      set quit_flag to 1 here.  Then if QUIT is called before we get to
      the event loop, we will unwind as expected.  */
   set_quit_flag ();

In the problem case, gdb is reading the symbol files when it receives
the SIGINT.  We set the quit flag in this context.

In the event loop we check the quit flag (check_quit_flag) and pass the
signal to the inferior with "child_pass_ctrlc".  This kills the target.
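
The general pattern described here, where the handler only records the
interrupt and a later point in the program acts on it, can be sketched
in standalone C++ as follows.  This is an illustration of the pattern,
not GDB's handler; pending_sigint, note_sigint, install_sigint_note and
take_pending_sigint are names invented for the example.

```cpp
#include <csignal>

// Set from the signal handler, consumed later by ordinary code.
static volatile std::sig_atomic_t pending_sigint = 0;

// Handler: do the minimum that is safe in signal context, i.e. just
// remember that an interrupt arrived.
extern "C" void note_sigint(int)
{
  pending_sigint = 1;
}

// Install the handler; returns false if std::signal reports an error.
bool install_sigint_note()
{
  return std::signal(SIGINT, note_sigint) != SIG_ERR;
}

// Called from the main loop: report whether an interrupt is pending and
// clear it, so that each Ctrl-C is acted on once.
bool take_pending_sigint()
{
  if (!pending_sigint)
    return false;
  pending_sigint = 0;
  return true;
}
```

Because the decision is deferred, what happens to the interrupt depends
on the program state at the time the flag is consumed, not at the time
the key was pressed.  That gap is where the attach window matters.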

It is fine when gdb has reached the GDB prompt and owns the terminal,
and we then press Ctrl-C and it is passed to the target.  But with the
"-ex" option, the user never sees the GDB prompt, and the signal should
only kill gdb, not the attached process.

The best way to see the issue is to run:
gdb --quiet -nx -ex 'set width 0' -ex 'set height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all bt' -ex quit /proc/1/exe 1

and send a SIGINT (Ctrl-C) while gdb is reading the symbol files.

Thanks
Partha


* Re: [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process.
  2024-06-03  5:21       ` Partha Satapathy
@ 2024-06-10  5:41         ` Partha Satapathy
  2024-06-14 17:19         ` Pedro Alves
  1 sibling, 0 replies; 11+ messages in thread
From: Partha Satapathy @ 2024-06-10  5:41 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches, rajesh.sivaramasubramaniom, bert.barbe,
	blarsen, cupertino.miranda, tom

On 6/3/2024 10:51 AM, Partha Satapathy wrote:
> On 5/16/2024 12:45 PM, Partha Satapathy wrote:
>> On 5/13/2024 8:19 PM, Pedro Alves wrote:
>>> Adding another question to the list below.  (I haven't tried to 
>>> reproduce this yet myself, btw.)
>>>
>>> On 2024-05-10 21:19, Pedro Alves wrote:
>>>> Hi.
>>>>
>>>> Just wanted to let you know that I've read all the discussion around 
>>>> this until this
>>>> email I'm replying to, and started thinking about it a bit. 
>>>> Unfortunately this is one of
>>>> those areas in GDB where the right change is rarely immediately 
>>>> obvious (to me).
>>>>
>>>> Some questions:
>>>>
>>>>   - If you ctrl-c to abort the attach, do we really abort the
>>>>     attach properly?  Or do we stay attached in some half broken state?
>>>>
>>>>   - Below you mention pstack, where can we find it?  And you mention
>>>>     that ctrl-c is pressed while that is printing a stack.  I'm 
>>>> assuming
>>>>     that's a backtrace command.  I'm confused in that case, as if 
>>>> that is
>>>>     so, then we should already be past the initial attach.  The 
>>>> question
>>>>     would then becomes, shouldn't gdb have the terminal at that point?
>>>>     How come it does not?
>>>
>>> #3 - The patch description states:
>>>
>>>   > Problem: While gdb is attaching an inferior, if ctrl-c is pressed 
>>> in the
>>>   > middle of the process attach,  the sigint is passed to the debugged
>>>   > process.  This triggers the exit of the inferior.
>>>
>>> This SIGINT passing is done with "kill(-pgrp, SIGINT)".  How does 
>>> that manage
>>> to trigger the exit of the inferior at all?  ptrace should intercept the
>>> SIGINT before the inferior ever sees it.  Did it not?
>>>
>>> Or could it be that the real issue is that because that sends the SIGINT
>>> to all the processes in the inferior's pgrp, we kill more processes than
>>> the one we're attaching to, and those processes exiting cause the 
>>> inferior
>>> to exit as well.  If so, then this is orthogonal to the initial attach,
>>> and can happen after the attach as well.  There is a bug open about this
>>> on bugzilla.
>>>
>>> Pedro Alves
>>>
>>>>
>>>> I'm wondering whether Baris's patch to eliminate the inferior
>>>> continuations would help with this, as it probably makes the attaching
>>>> sequence synchronous.  I should probably look at that one.
>>>>
>>>> Pedro Alves
>>>>
>>
>>
>> Thanks Pedro and Tom for reviewing the problem.
>>
>>
>> Problem :
>> pstack,  dumps the stack of all threads in a process. In some cases 
>> printing of stack can take significant time and ctrl-c is pressed to 
>> abort pstack/gdb application. This in turn kills the debugged process, 
>> which can be  critical for the system. In this case the intention of 
>> “ctrl+c” to kill pstack/gdb, but not the target application.
>>
>>
>> # tail pstack -n 12
>>
>> # Run GDB, strip out unwanted noise.
>> # --readnever is no longer used since .gdb_index is now in use.
>> $GDB --quiet -nx $GDBARGS /proc/$1/exe $1 <<EOF 2>&1 |
>> set width 0
>> set height 0
>> set pagination no
>> $backtrace
>> EOF
>> /bin/sed -n \
>>      -e 's/^\((gdb) \)*//' \
>>      -e '/^#/p' \
>>      -e '/^Thread/p'
>>
>>
>> This is the interest part in the pstack, rest is cosmetic.
>>
>> pstack uses:
>> # pstack 1
>>
>> #0  0x00007fa18cf44017 in epoll_wait () from /lib64/libc.so.6
>> #1  0x00007fa18e67e036 in sd_event_wait () from 
>> /usr/lib/systemd/libsystemd-shared-239.so
>> #2  0x00007fa18e67f33b in sd_event_run () from 
>> /usr/lib/systemd/libsystemd-shared-239.so
>> #3  0x000055c155da8c22 in manager_loop ()
>> #4  0x000055c155d5f133 in main ()
>>
>> Reproduction:
>>
>> The debugged application generally attached to process by:
>> gdb -p <<pid>>
>> or gdb /proc/<<pid>>/exe pid
>> pstack uses the latter  method to attach the debugged to gdb. If the
>> application is large or process of reading symbols is slow, gives a good
>> window to press the ctrl+c during attach. Spawning "gdb" under "strace
>> -k" makes gdb a lot slower and gives a larger window to easily press the
>> ctrl+c at the precise period i.e. during the attach of the debugged
>> process. The above strace hack will enhance rate of reproduction of the
>> issue. Testcase:
>>
>> With GDB 13.1
>> ps aux | grep abrtd
>> root     2195168   /usr/sbin/abrtd -d -s
>>
>> #strace -k -o log gdb -p 2195168
>> Attaching to process 2195168
>> [New LWP 2195177]
>> [New LWP 2195179]
>> ^C[Thread debugging using libthread_db enabled]
>> <<<<   Note the ctrl+c is pressed after attach is initiated and it’s
>> still reading the symbols from library >>>> Using host libthread_db
>> library "/lib64/libthread_db.so.1".
>> 0x00007fe3ed6d70d1 in poll () from /lib64/libc.so.6
>> (gdb) q
>> A debugging session is active.
>>             Inferior 1 [process 2195168] will be detached Quit anyway? (y
>> or n) y Detaching from program: /usr/sbin/abrtd, process 2195168
>>
>> # ps aux | grep 2195168
>> <<<< Process exited >>>>
>>
>> This is having a very narrow window to press the ctrlc.
>> Session1 :
>>
>> ]$ ps aux | grep abrtd
>> root        1329  0.0  0.0 602624 13076 ?        Ssl  May03   0:00 
>> /usr/sbin/abrtd -d -s
>>
>> Session2:
>>
>> # ./tpstack 1329
>>
>> + strace -o omlog -k ./gdb --quiet -nx -ex 'set width 0' -ex 'set 
>> height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread 
>> apply all bt' -ex quit /proc/1329/exe 1329
>> Reading symbols from /proc/1329/exe...
>> Python Exception <class 'AttributeError'>: module 'gdb' has no 
>> attribute '_handle_missing_debuginfo'
>> Reading symbols from .gnu_debugdata for /usr/sbin/abrtd...
>> (No debugging symbols found in .gnu_debugdata for /usr/sbin/abrtd)
>> Attaching to program: /proc/1329/exe, process 1329
>> [New LWP 1399]
>> [New LWP 1349] ^C
>>
>> Session1:
>> [opc@pssatapa-ol8 TEST]$ ps aux | grep abrtd
>> <<<1329 Is killed >>>
>>
>> This is a very small window, so a heavy application is good for
>> reproduction. I modified the last part of pstack like:
>> # Run GDB, strip out unwanted noise.
>> # --readnever is no longer used since .gdb_index is now in use.
>> strace -o omlog -k  ./gdb  --quiet -nx  -ex 'set width 0' -ex 'set 
>> height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread 
>> apply all bt' -ex quit  /proc/$1/exe $1
>>
>> Running gdb under strace with -k makes gdb slow and gives us a window
>> to press Ctrl+C; otherwise the window is too small to time the signal.
>> We observe the problem when the filesystem, kernel, or procfs is slow.
>>
>> The signal is not intended for the inferior.
>> The signal is passed from "gdb" to the inferior.
>>
>> The SIGINT handler in gdb, marks the QUIT flag and
>> in some paths we check the quit flag and pass the signal to inferior.
>> That is killing the inferior.
>>
>> On :
>> +  check_quit_flag();
>> This should be set only when inf->attach_flag is true.
>> I will add the check in next iteration.
>> The idea here is to clear any pending QUIT flag set by SIGINT;
>> otherwise, after we set the sync_flag, a check of the QUIT flag
>> can kill the inferior.
>>
>> Thanks
>> Partha
> 
> Hi Pedro,
> 
> I do understand the last reply way a bit long to explain you the 
> context. Please find the answers to your questions.
> 
> Here are the questions:
>  >  - If you ctrl-c to abort the attach, do we really abort the
>  >    attach properly?  Or do we stay attached in some half broken state?
>  >
> Let take the case of "gdb -p pid" and press a ctrl + c immediately.
> The intention is to kill the gdb , not to abort the attachment or
> kill the attached pid. In the problem case debugged process is killed.
> 
>  >  - Below you mention pstack, where can we find it?  And you mention
>  >    that ctrl-c is pressed while that is printing a stack.  I'm assuming
>  >    that's a backtrace command.  I'm confused in that case, as if that is
>  >    so, then we should already be past the initial attach.  The question
>  >    would then becomes, shouldn't gdb have the terminal at that point?
>  >    How come it does not?
> 
> pstack is a wrapper over gdb and run as:
> gdb --quiet -nx -ex 'set width 0' -ex 'set height
> 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all
> bt' -ex quit /proc/<<pid>/exe <<pid>>
> 
> Here gdb is run with -"ex" so so it does not need the gdb prompt to 
> issue the "back trace" command.
> 
> The Ctrl+C is issued in a window between initial attach and 
> target_post_attach.
> 
>  >This SIGINT passing is done with "kill(-pgrp, SIGINT)".  How does that 
>  >manage to trigger the exit of the inferior at all?  ptrace should 
>  >intercept the SIGINT before the inferior ever sees it.  Did it not?
> 
> /* Handle a SIGINT.  */
> void
> handle_sigint (int sig)
> {
>    signal (sig, handle_sigint);
> 
>    /* We could be running in a loop reading in symfiles or something so
>       it may be quite a while before we get back to the event loop.  So
>       set quit_flag to 1 here.  Then if QUIT is called before we get to
>       the event loop, we will unwind as expected.  */
>    set_quit_flag ();
> 
> In the problem case, gdb is reading the sysmbol files and we received 
> the sigint. We set the quit flag in this context.
> 
> In the event loop we check the quit flag (check_quit_flag) and pass the 
> signal to the inferior with "child_pass_ctrlc". This kills the target.
> 
> Its fine when gdb has the "GDB" prompt and it owns the terminal and we 
> once we press Ctrl+C and that is passed to the target. But with -"ex" 
> option, use never experience the GDB prompt and the signal should only 
> kill gdb not the attached.
> 
> Best way to visualize the issue is to run:
> gdb --quiet -nx -ex 'set width 0' -ex 'set height 0' -ex 'set pagination 
> no' -ex 'set confirm off' -ex 'thread apply all  bt' -ex quit /proc/1/exe 1
> 
> and we have a sigint (Ctrl+C) while gdb reading the symbol files.
> 
> Thanks
> Partha

Hi GDB team,

Could you please provide an update on this?

Thanks
Partha


* Re: [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process.
  2024-06-03  5:21       ` Partha Satapathy
  2024-06-10  5:41         ` Partha Satapathy
@ 2024-06-14 17:19         ` Pedro Alves
  2024-06-20  7:24           ` [External] : " Partha Satapathy
  1 sibling, 1 reply; 11+ messages in thread
From: Pedro Alves @ 2024-06-14 17:19 UTC (permalink / raw)
  To: Partha Satapathy, gdb-patches, rajesh.sivaramasubramaniom,
	bert.barbe, blarsen, cupertino.miranda

Hi!

Sorry for the delay.

On 2024-06-03 06:21, Partha Satapathy wrote:

> I do understand the last reply way a bit long to explain you the context. Please find the answers to your questions.
> 
> Here are the questions:
>>  - If you ctrl-c to abort the attach, do we really abort the
>>    attach properly?  Or do we stay attached in some half broken state?
>>
> Let take the case of "gdb -p pid" and press a ctrl + c immediately.
> The intention is to kill the gdb , not to abort the attachment or
> kill the attached pid. In the problem case debugged process is killed.

The question was really, even with your patch, when we Ctrl-C and that
aborts the attach, if GDB doesn't exit, are we in a good state?  Did
we detach properly?

> 
>>  - Below you mention pstack, where can we find it?  And you mention
>>    that ctrl-c is pressed while that is printing a stack.  I'm assuming
>>    that's a backtrace command.  I'm confused in that case, as if that is
>>    so, then we should already be past the initial attach.  The question
>>    would then becomes, shouldn't gdb have the terminal at that point?
>>    How come it does not?
> 
> pstack is a wrapper over gdb and run as:
> gdb --quiet -nx -ex 'set width 0' -ex 'set height
> 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all
> bt' -ex quit /proc/<<pid>/exe <<pid>>
> 
> Here gdb is run with -"ex" so so it does not need the gdb prompt to issue the "back trace" command.
> 
> The Ctrl+C is issued in a window between initial attach and target_post_attach.
> 
>> This SIGINT passing is done with "kill(-pgrp, SIGINT)".  How does that manage to trigger the exit of the inferior at all?  ptrace should intercept the SIGINT before the inferior ever sees it.  Did it not?

This question ^^^ was the most important.  I didn't see an answer for it.

> 
> /* Handle a SIGINT.  */
> void
> handle_sigint (int sig)
> {
>   signal (sig, handle_sigint);
> 
>   /* We could be running in a loop reading in symfiles or something so
>      it may be quite a while before we get back to the event loop.  So
>      set quit_flag to 1 here.  Then if QUIT is called before we get to
>      the event loop, we will unwind as expected.  */
>   set_quit_flag ();
> 
> In the problem case, gdb is reading the sysmbol files and we received the sigint. We set the quit flag in this context.
> 
> In the event loop we check the quit flag (check_quit_flag) and pass the signal to the inferior with "child_pass_ctrlc". This kills the target.

child_pass_ctrlc "kills" the target in the sense that it uses "kill(.., SIGINT)".  But that
should _not_ make the process actually die!  If we are already attached to the process,
then ptrace should intercept the SIGINT, and linux-nat.c:linux_nat_target::wait should see
the kernel reporting the SIGINT stop out of waitpid.  The only way it really makes the
inferior die is if _after_ that, we somehow pass the signal to the inferior, with
"ptrace(PTRACE_CONT, pid, ..., SIGINT)".  Did _that_ happen?  If so, where, how?
We need to understand exactly what is going on before we even think about what
a fix should look like.
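
The interception described above can be illustrated with a small standalone
program. This is only a sketch, not GDB code: the function name
sigint_stop_seen_by_tracer is made up for the illustration, and it assumes
Linux with permission to ptrace a forked child. It attaches to a child, lets
it run, sends SIGINT, and returns the signal that waitpid reports for the
resulting stop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: return the signal reported by the stop of a
   running tracee after it is sent SIGINT, or -1 on error.  */
int
sigint_stop_seen_by_tracer (void)
{
  int status;
  int result = -1;

  /* Make sure the child starts with the default SIGINT action.  */
  void (*old) (int) = signal (SIGINT, SIG_DFL);
  pid_t pid = fork ();
  if (pid == 0)
    {
      for (;;)
        pause ();
    }
  signal (SIGINT, old);
  if (pid < 0)
    return -1;

  if (ptrace (PTRACE_ATTACH, pid, NULL, NULL) != 0
      || waitpid (pid, &status, 0) != pid)
    goto cleanup;
  assert (WIFSTOPPED (status));

  /* Resume the tracee without delivering the attach-time SIGSTOP,
     so that it is running when the SIGINT arrives.  */
  if (ptrace (PTRACE_CONT, pid, NULL, NULL) != 0)
    goto cleanup;

  /* The tracee stops before acting on the signal and the tracer is
     told about it through waitpid.  */
  if (kill (pid, SIGINT) == 0
      && waitpid (pid, &status, 0) == pid
      && WIFSTOPPED (status))
    result = WSTOPSIG (status);

cleanup:
  /* End the demo child and reap it.  */
  kill (pid, SIGKILL);
  while (waitpid (pid, &status, 0) == pid
         && !WIFEXITED (status) && !WIFSIGNALED (status))
    ;
  return result;
}
```

On a typical Linux system the function is expected to return SIGINT: the
tracer sees the stop, and the tracee is not terminated by the signal unless
the tracer passes it on when resuming.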

> 
> Its fine when gdb has the "GDB" prompt and it owns the terminal and we once we press Ctrl+C and that is passed to the target. But with -"ex" option, use never experience the GDB prompt and the signal should only kill gdb not the attached.
> 
> Best way to visualize the issue is to run:
> gdb --quiet -nx -ex 'set width 0' -ex 'set height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all  bt' -ex quit /proc/1/exe 1
> 
> and we have a sigint (Ctrl+C) while gdb reading the symbol files.

I tried to reproduce by attaching to a "/usr/bin/sleep 10000" process,
but gdb reads its debug info too quickly (or really, there is no debug info at all to read),
so I added a hack to gdb to sleep a bit to give me time to ctrl-c:

diff --git c/gdb/symfile.c w/gdb/symfile.c
index 5a03def91c6..69b475131b2 100644
--- c/gdb/symfile.c
+++ w/gdb/symfile.c
@@ -1072,6 +1072,9 @@ symbol_file_add_with_addrs (const gdb_bfd_ref_ptr &abfd, const char *name,
       else
        gdb_printf (_("Reading symbols from %ps...\n"),
                    styled_string (file_name_style.style (), name));
+
+      gdb_printf (_("Sleeping, press ctrl-c...\n"));
+      sleep (3);
     }
   syms_from_objfile (objfile, addrs, add_flags);


Still, I wasn't able to reproduce what you see:

$ /path/to/current/master/gdb --quiet -nx -ex 'set width 0' -ex 'set height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all bt' -ex quit /proc/2621255/exe 2621255
Reading symbols from /proc/2621255/exe...
Sleeping, press ctrl-c...
^CQuit
Attaching to program: /proc/2621255/exe, process 2621255
Error reading attached process's symbol file.
: No such file or directory.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Sleeping, press ctrl-c...
Reading symbols from /usr/lib/debug/.build-id/49/0fef8403240c91833978d494d39e537409b92e.debug...
Sleeping, press ctrl-c...
^CReading symbols from /lib64/ld-linux-x86-64.so.2...
Sleeping, press ctrl-c...
^CReading symbols from /usr/lib/debug/.build-id/41/86944c50f8a32b47d74931e3f512b811813b64.debug...
Sleeping, press ctrl-c...
^C[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x000074a84dee578a in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7ffd69aa9540, rem=0x7ffd69aa9530)
    at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78

warning: 78     ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory

Thread 1 (Thread 0x74a84e03b740 (LWP 2621255) "sleep"):
#0  0x000074a84dee578a in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7ffd69aa9540, rem=0x7ffd69aa9530) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
#1  0x000074a84deea677 in __GI___nanosleep (req=<optimized out>, rem=<optimized out>) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
#2  0x00006119b1d789f0 in ?? ()
#3  0x0000000000000000 in ?? ()
Detaching from program: /proc/2621255/exe, process 2621255
[Inferior 1 (process 2621255) detached]

I tried this multiple times, and the inferior process never died.

Pedro Alves



* Re: [External] : Re: [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process.
  2024-06-14 17:19         ` Pedro Alves
@ 2024-06-20  7:24           ` Partha Satapathy
  2024-06-23  5:42             ` Partha Satapathy
  0 siblings, 1 reply; 11+ messages in thread
From: Partha Satapathy @ 2024-06-20  7:24 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches, rajesh.sivaramasubramaniom, bert.barbe,
	blarsen, cupertino.miranda

On 6/14/2024 10:49 PM, Pedro Alves wrote:
> Hi!
> 
> Sorry for the delay.
> 
> On 2024-06-03 06:21, Partha Satapathy wrote:
> 
>> I do understand the last reply way a bit long to explain you the context. Please find the answers to your questions.
>>
>> Here are the questions:
>>>    - If you ctrl-c to abort the attach, do we really abort the
>>>      attach properly?  Or do we stay attached in some half broken state?
>>>
>> Let take the case of "gdb -p pid" and press a ctrl + c immediately.
>> The intention is to kill the gdb , not to abort the attachment or
>> kill the attached pid. In the problem case debugged process is killed.
> 
> The question was really, even with your patch, when we Ctrl-C and that
> aborts the attach, if GDB doesn't exit, are we in a good state?  Did
> we detach properly?
> 
>>
>>>    - Below you mention pstack, where can we find it?  And you mention
>>>      that ctrl-c is pressed while that is printing a stack.  I'm assuming
>>>      that's a backtrace command.  I'm confused in that case, as if that is
>>>      so, then we should already be past the initial attach.  The question
>>>      would then becomes, shouldn't gdb have the terminal at that point?
>>>      How come it does not?
>>
>> pstack is a wrapper over gdb and run as:
>> gdb --quiet -nx -ex 'set width 0' -ex 'set height
>> 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all
>> bt' -ex quit /proc/<<pid>/exe <<pid>>
>>
>> Here gdb is run with -"ex" so so it does not need the gdb prompt to issue the "back trace" command.
>>
>> The Ctrl+C is issued in a window between initial attach and target_post_attach.
>>
>>> This SIGINT passing is done with "kill(-pgrp, SIGINT)".  How does that manage to trigger the exit of the inferior at all?  ptrace should intercept the SIGINT before the inferior ever sees it.  Did it not?
> 
> This question ^^^ was the most important.  I didn't see an answer for it.
> 
>>
>> /* Handle a SIGINT.  */
>> void
>> handle_sigint (int sig)
>> {
>>    signal (sig, handle_sigint);
>>
>>    /* We could be running in a loop reading in symfiles or something so
>>       it may be quite a while before we get back to the event loop.  So
>>       set quit_flag to 1 here.  Then if QUIT is called before we get to
>>       the event loop, we will unwind as expected.  */
>>    set_quit_flag ();
>>
>> In the problem case, gdb is reading the sysmbol files and we received the sigint. We set the quit flag in this context.
>>
>> In the event loop we check the quit flag (check_quit_flag) and pass the signal to the inferior with "child_pass_ctrlc". This kills the target.
> 
> child_pass_ctrlc "kills" the target in the sense that it uses "kill(.., SIGINT)".  But that
> should _not_ make process actually die!  If we are already attached to the process,
> then ptrace should intercept the SIGINT, linux-nat.c:linux_nat_target::wait should see
> the kernel reporting the SIGINT stop out of waitpid.  The only way it really makes the
> inferior die if _after_ that, we somehow pass the signal to the inferior, with
> "ptrace(PTRACE_CONTINUE, pid, ..., SIGINT)".  Did _that_ happen?  If so, where, how?
> We need to understand exactly what is going on before we even think about what
> a fix should look like.
> 
>>
>> Its fine when gdb has the "GDB" prompt and it owns the terminal and we once we press Ctrl+C and that is passed to the target. But with -"ex" option, use never experience the GDB prompt and the signal should only kill gdb not the attached.
>>
>> Best way to visualize the issue is to run:
>> gdb --quiet -nx -ex 'set width 0' -ex 'set height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all  bt' -ex quit /proc/1/exe 1
>>
>> and we have a sigint (Ctrl+C) while gdb reading the symbol files.
> 
> I tried to reproduce by attaching to a "/usr/bin/sleep 10000" process,
> but gdb reads debug info (or really, no debug info at all to read),
> so I added a hack to gdb to sleep a bit to give me time to ctrl-c:
> 
> diff --git c/gdb/symfile.c w/gdb/symfile.c
> index 5a03def91c6..69b475131b2 100644
> --- c/gdb/symfile.c
> +++ w/gdb/symfile.c
> @@ -1072,6 +1072,9 @@ symbol_file_add_with_addrs (const gdb_bfd_ref_ptr &abfd, const char *name,
>         else
>          gdb_printf (_("Reading symbols from %ps...\n"),
>                      styled_string (file_name_style.style (), name));
> +
> +      gdb_printf (_("Sleeping, press ctrl-c...\n"));
> +      sleep (3);
>       }
>     syms_from_objfile (objfile, addrs, add_flags);
> 
> 
> Still, I wasn't able to reproduce what you see:
> 
> $ /path/to/current/master/gdb --quiet -nx -ex 'set width 0' -ex 'set height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all bt' -ex quit /proc/2621255/exe 2621255
> Reading symbols from /proc/2621255/exe...
> Sleeping, press ctrl-c...
> ^CQuit
> Attaching to program: /proc/2621255/exe, process 2621255
> Error reading attached process's symbol file.
> : No such file or directory.
> Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
> Sleeping, press ctrl-c...
> Reading symbols from /usr/lib/debug/.build-id/49/0fef8403240c91833978d494d39e537409b92e.debug...
> Sleeping, press ctrl-c...
> ^CReading symbols from /lib64/ld-linux-x86-64.so.2...
> Sleeping, press ctrl-c...
> ^CReading symbols from /usr/lib/debug/.build-id/41/86944c50f8a32b47d74931e3f512b811813b64.debug...
> Sleeping, press ctrl-c...
> ^C[Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> 0x000074a84dee578a in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7ffd69aa9540, rem=0x7ffd69aa9530)
>      at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
> 
> warning: 78     ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory
> 
> Thread 1 (Thread 0x74a84e03b740 (LWP 2621255) "sleep"):
> #0  0x000074a84dee578a in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7ffd69aa9540, rem=0x7ffd69aa9530) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
> #1  0x000074a84deea677 in __GI___nanosleep (req=<optimized out>, rem=<optimized out>) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
> #2  0x00006119b1d789f0 in ?? ()
> #3  0x0000000000000000 in ?? ()
> Detaching from program: /proc/2621255/exe, process 2621255
> [Inferior 1 (process 2621255) detached]
> 
> I tried this multiple times, and the inferior process never died.
> 
> Pedro Alves
> 


Hi Pedro,
Thanks for looking into this.

Console 1:
------------------------------------------------------
./gdb -v
GNU gdb (GDB) 16.0.50.20240605-git



# ./inf &
[1] 63855
# gdb -p 63855
(gdb) c
Continuing.

Console2:
-----------------------------------------------------
# cat /proc/63855/task/63855/stat
63855 (inf) R
# kill -2 63855

Console 1:
------------------------------------------------------
Program received signal SIGINT, Interrupt.
0x000000000040053a in main ()
(gdb)
Console2:
-----------------------------------------------------
# cat /proc/63855/task/63855/stat
cat /proc/63855/task/63855/stat
63855 (inf)

Console 1:
------------------------------------------------------
Program received signal SIGINT, Interrupt.
0x000000000040053a in main ()
(gdb) c
Continuing.

(gdb) q
A debugging session is active.
         Inferior 1 [process 63855] will be detached.
Quit anyway? (y or n) y
Detaching from program: /home/opc/BLD/HTTP_GDB_ML/VN/T/inf, process 63855
[Inferior 1 (process 63855) detached]
]# ps aux | grep 63855
root       63855 39.1  0.0   4232   916 pts/2    R    05:37   3:42 ./inf


This is the passing case. Here the “inf” process (PID 63855) is
running (R state) when the shell queues the SIGINT to it. Because process
63855 is running, it takes the SIGINT and stops, and the notification
(SIGCHLD) is sent to the debugger. GDB handles the SIGCHLD and reports the
SIGINT for the debugged process. GDB masks the SIGINT for the child and
execution continues. After quit, the process continues in the running state.


Similarly :
# ./inf&
[1] 123586
# gdb -p 123586
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x000000000040053a in main ()
(gdb)c
(gdb) q
A debugging session is active.
         Inferior 1 [process 123586] will be detached.
Quit anyway? (y or n) y
Detaching from program: inf, process 123586
[Inferior 1 (process 123586) detached]

We pressed Ctrl+C and gdb passed the SIGINT to the debugged process with
child_pass_ctrlc. The debugged process, which was in the running state,
received the SIGINT and stopped, and SIGCHLD was sent to gdb. gdb handled
the SIGCHLD, and after quit the debugged process continued in the running state.


In the failing case:
Console 1:
------------------------------------------------------
#gdb -p 63855
(gdb)

Console 2:
------------------------------------------------------
# cat /proc/63855/task/63855/stat
63855 (inf) t
# kill -2 63855

Console 1:
------------------------------------------------------
0x000000000040053a in main ()
(gdb)

(gdb) q
A debugging session is active.
         Inferior 1 [process 63855] will be detached.
Quit anyway? (y or n) y
Detaching from program: inf, process 63855
[Inferior 1 (process 63855) detached]
[1]+  Interrupt               ./inf
# ps aux | grep 63855
<< 63855 is killed >>

The “inf” process (PID 63855) is in the ptrace-stop state (t state). The
shell queues the SIGINT to process 63855. Because the process is stopped, it
does not act on the signal. gdb never receives a SIGCHLD for the child's
SIGINT, so no action is taken. After quit, the process starts running,
takes the normal SIGINT path, and exits.
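
The failing sequence can be reproduced without gdb. The sketch below is an
illustration under assumptions, not GDB code: the function name
sigint_delivered_after_detach is hypothetical, and it assumes Linux with
permission to ptrace a forked child. It attaches to a child, leaves it in
ptrace-stop, sends SIGINT, and then detaches; the return value is the signal
that terminated the child.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: return the signal that terminated the child
   after attach, SIGINT, detach; or -1 on error.  */
int
sigint_delivered_after_detach (void)
{
  int status;

  /* Make sure the child starts with the default SIGINT action.  */
  void (*old) (int) = signal (SIGINT, SIG_DFL);
  pid_t pid = fork ();
  if (pid == 0)
    {
      for (;;)
        pause ();
    }
  signal (SIGINT, old);
  if (pid < 0)
    return -1;

  if (ptrace (PTRACE_ATTACH, pid, NULL, NULL) != 0
      || waitpid (pid, &status, 0) != pid)
    {
      /* Attach failed (for example, ptrace is restricted): clean up.  */
      kill (pid, SIGKILL);
      waitpid (pid, &status, 0);
      return -1;
    }
  assert (WIFSTOPPED (status));

  /* The tracee stays in ptrace-stop (state "t").  The SIGINT is queued,
     but the tracee does not run, so the tracer gets no new stop.  */
  kill (pid, SIGINT);

  /* Detach without injecting a signal.  The queued SIGINT is still
     pending and is now handled with nobody tracing the process.  */
  ptrace (PTRACE_DETACH, pid, NULL, NULL);

  if (waitpid (pid, &status, 0) != pid || !WIFSIGNALED (status))
    return -1;
  return WTERMSIG (status);
}
```

On a typical Linux system the function is expected to return SIGINT, which
matches the "[1]+  Interrupt" report in the transcript above: the signal sat
pending while the process was stopped and took its default action after the
detach.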

In my case:
./gdb  -ex 'thread apply all bt' -ex quit  /proc/<<pid>>/exe <<pid>>
There is no interactive debugger session or "continue" command here. We
print the stack and quit. Hence the process never needs to enter the running
state to handle the SIGINT until detach.

If we press Ctrl+C while the symbols are being read:
./gdb  -ex 'thread apply all bt' -ex quit  /proc/<<pid>>/exe <<pid>>
^CReading symbols from /proc/2888105/exe...
At this time PTRACE_ATTACH has already been done by gdb, and the symbols are
read after that. gdb receives the SIGINT and passes it to the debugged
process via child_pass_ctrlc. The debugged process has been in the stopped
state from the start and does not respond to the SIGINT. After quit issues
PTRACE_DETACH, the process handles the pending SIGINT and exits.


I am attaching some snippets of strace output from my reproduction:
1. ptrace attach:

  123833 ptrace(PTRACE_ATTACH, 2888105)
  123835 gdb(inf_ptrace_target::attach
  123836 gdb(linux_nat_target::attach
  123837 gdb(attach_command(char const*,

2. While gdb was reading symbols in read_symbols, it got the SIGINT from
Ctrl+C; handle_sigint set the quit flag via set_quit_flag.

  270701  gdb(set_quit_flag()
  270702  gdb(handle_sigint(
  270708  gdb(elf_symfile_read(objfile*,
  270709  gdb(read_symbols(objfile*,

3. gdb continues with the symbol file read.

  312569  gdb(elf_symfile_read(objfile*,
  312570  gdb(read_symbols(objfile*,

4. At some point, gdb passes the signal to the child.

  312759  gdb(child_pass_ctrlc(target_ops*)
  312760  gdb(target_pass_ctrlc()
  312761  gdb(target_read(
  312762  gdb(target_read_memory(unsigned long, unsigned char*,
  312763  gdb(read_program_header(int, int*, unsigned long*)
  312764  gdb(scan_dyntag_auxv(int, unsigned long*, unsigned long*)
  312765  gdb(elf_locate_base()

5. gdb did not receive the SIGCHLD and continued reading symbols.

  316150  gdb(elf_symfile_read(objfile*,
  316151  gdb(read_symbols(objfile*, enum_flags<symfile_add_flag>)
  316154  gdb(add_vsyscall_page(inferior*)

6. Finally, the ptrace detach happened and the process exited.
346819 ptrace(PTRACE_DETACH, 2888105, NULL, 0) = 0
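
Steps 2 to 4 above rely on the SIGINT handler only recording the request,
with the action taken later when the flag is checked. A minimal sketch of
that deferred-flag pattern follows. The demo_ names are hypothetical and this
is not GDB's implementation; raise() stands in for the user pressing Ctrl+C.

```c
#include <assert.h>
#include <signal.h>

/* Set from the signal handler, read and cleared from normal code.  */
static volatile sig_atomic_t demo_quit_flag = 0;

/* Handler: only record that an interrupt was requested.  */
static void
demo_handle_sigint (int sig)
{
  (void) sig;
  demo_quit_flag = 1;
}

/* Return whether an interrupt is pending, and clear the flag.  */
int
demo_check_quit_flag (void)
{
  int was_set = demo_quit_flag;
  demo_quit_flag = 0;
  return was_set;
}

/* Simulate a SIGINT arriving during long-running work and being
   noticed only at a later check.  Returns 1 if the flag was seen.  */
int
demo_deferred_sigint (void)
{
  int seen;
  void (*old) (int) = signal (SIGINT, demo_handle_sigint);
  assert (old != SIG_ERR);

  raise (SIGINT);   /* Stands in for Ctrl+C during symbol reading.  */

  /* Long-running work would continue here, unaware of the signal,
     until some later point polls the flag.  */
  seen = demo_check_quit_flag ();

  signal (SIGINT, old);
  return seen;
}
```

What the code does with the flag at the polling point is the design decision
under discussion in this thread: it can unwind the current operation, or
forward the interrupt to the inferior.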

You can try the simple experiment described for the failing case above.
The only difference is that the SIGINT is sent from the shell rather than
from gdb. gdb will not react to the SIGINT unless the "continue" command
is issued.

“
child_pass_ctrlc "kills" the target in the sense that it uses "kill(.., 
SIGINT)".  But that should _not_ make process actually die!  If we are 
already attached to the process, then ptrace should intercept the 
SIGINT, linux-nat.c:linux_nat_target::wait should see the kernel 
reporting the SIGINT stop out of waitpid.  The only way it really makes 
the inferior die if _after_ that, we somehow pass the signal to the 
inferior, with "ptrace(PTRACE_CONTINUE, pid, ..., SIGINT)".  Did _that_ 
happen?  If so, where, how?
We need to understand exactly what is going on before we even think 
about what a fix should look like.
“
Yes, I agree with you on this. We should see:

  142357  > gdb(sigchld_handler
  142360  > gdb(child_pass_ctrlc(
  142361  > gdb(target_pass_ctrlc()

Followed by:
  142402  gdb(linux_nat_target::wait
  142403  gdb(target_wait(ptid_t, target_waitstatus*,
  142404  gdb(do_target_wait_1(inferior*, ptid_t, target_waitstatus*,
  142405  gdb(fetch_inferior_event()

I can see this when the debugged process is in the running state, but not
when it is in the ptrace-stop state.

Thanks
Partha




* Re: [External] : Re: [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process.
  2024-06-20  7:24           ` [External] : " Partha Satapathy
@ 2024-06-23  5:42             ` Partha Satapathy
  2024-06-24 20:04               ` Partha Satapathy
  0 siblings, 1 reply; 11+ messages in thread
From: Partha Satapathy @ 2024-06-23  5:42 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches, rajesh.sivaramasubramaniom, bert.barbe,
	blarsen, cupertino.miranda

On 6/20/2024 12:54 PM, Partha Satapathy wrote:
> On 6/14/2024 10:49 PM, Pedro Alves wrote:
>> Hi!
>>
>> Sorry for the delay.
>>
>> On 2024-06-03 06:21, Partha Satapathy wrote:
>>
>>> I do understand the last reply way a bit long to explain you the 
>>> context. Please find the answers to your questions.
>>>
>>> Here are the questions:
>>>>    - If you ctrl-c to abort the attach, do we really abort the
>>>>      attach properly?  Or do we stay attached in some half broken 
>>>> state?
>>>>
>>> Let take the case of "gdb -p pid" and press a ctrl + c immediately.
>>> The intention is to kill the gdb , not to abort the attachment or
>>> kill the attached pid. In the problem case debugged process is killed.
>>
>> The question was really, even with your patch, when we Ctrl-C and that
>> aborts the attach, if GDB doesn't exit, are we in a good state?  Did
>> we detach properly?
>>
>>>
>>>>    - Below you mention pstack, where can we find it?  And you mention
>>>>      that ctrl-c is pressed while that is printing a stack.  I'm 
>>>> assuming
>>>>      that's a backtrace command.  I'm confused in that case, as if 
>>>> that is
>>>>      so, then we should already be past the initial attach.  The 
>>>> question
>>>>      would then becomes, shouldn't gdb have the terminal at that point?
>>>>      How come it does not?
>>>
>>> pstack is a wrapper over gdb and run as:
>>> gdb --quiet -nx -ex 'set width 0' -ex 'set height
>>> 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all
>>> bt' -ex quit /proc/<<pid>/exe <<pid>>
>>>
>>> Here gdb is run with -"ex" so so it does not need the gdb prompt to 
>>> issue the "back trace" command.
>>>
>>> The Ctrl+C is issued in a window between initial attach and 
>>> target_post_attach.
>>>
>>>> This SIGINT passing is done with "kill(-pgrp, SIGINT)".  How does
>>>> that manage to trigger the exit of the inferior at all?  ptrace
>>>> should intercept the SIGINT before the inferior ever sees it.  Did
>>>> it not?
>>
>> This question ^^^ was the most important.  I didn't see an answer for it.
>>
>>>
>>> /* Handle a SIGINT.  */
>>> void
>>> handle_sigint (int sig)
>>> {
>>>    signal (sig, handle_sigint);
>>>
>>>    /* We could be running in a loop reading in symfiles or something so
>>>       it may be quite a while before we get back to the event loop.  So
>>>       set quit_flag to 1 here.  Then if QUIT is called before we get to
>>>       the event loop, we will unwind as expected.  */
>>>    set_quit_flag ();
>>>
>>> In the problem case, gdb is reading the sysmbol files and we received 
>>> the sigint. We set the quit flag in this context.
>>>
>>> In the event loop we check the quit flag (check_quit_flag) and pass 
>>> the signal to the inferior with "child_pass_ctrlc". This kills the 
>>> target.
>>
>> child_pass_ctrlc "kills" the target in the sense that it uses 
>> "kill(.., SIGINT)".  But that
>> should _not_ make process actually die!  If we are already attached to 
>> the process,
>> then ptrace should intercept the SIGINT, 
>> linux-nat.c:linux_nat_target::wait should see
>> the kernel reporting the SIGINT stop out of waitpid.  The only way it 
>> really makes the
>> inferior die if _after_ that, we somehow pass the signal to the 
>> inferior, with
>> "ptrace(PTRACE_CONTINUE, pid, ..., SIGINT)".  Did _that_ happen?  If 
>> so, where, how?
>> We need to understand exactly what is going on before we even think 
>> about what
>> a fix should look like.
>>
>>>
>>> Its fine when gdb has the "GDB" prompt and it owns the terminal and 
>>> we once we press Ctrl+C and that is passed to the target. But with 
>>> -"ex" option, use never experience the GDB prompt and the signal 
>>> should only kill gdb not the attached.
>>>
>>> Best way to visualize the issue is to run:
>>> gdb --quiet -nx -ex 'set width 0' -ex 'set height 0' -ex 'set 
>>> pagination no' -ex 'set confirm off' -ex 'thread apply all  bt' -ex 
>>> quit /proc/1/exe 1
>>>
>>> and we have a sigint (Ctrl+C) while gdb reading the symbol files.
>>
>> I tried to reproduce by attaching to a "/usr/bin/sleep 10000" process,
>> but gdb reads debug info (or really, no debug info at all to read),
>> so I added a hack to gdb to sleep a bit to give me time to ctrl-c:
>>
>> diff --git c/gdb/symfile.c w/gdb/symfile.c
>> index 5a03def91c6..69b475131b2 100644
>> --- c/gdb/symfile.c
>> +++ w/gdb/symfile.c
>> @@ -1072,6 +1072,9 @@ symbol_file_add_with_addrs (const 
>> gdb_bfd_ref_ptr &abfd, const char *name,
>>         else
>>          gdb_printf (_("Reading symbols from %ps...\n"),
>>                      styled_string (file_name_style.style (), name));
>> +
>> +      gdb_printf (_("Sleeping, press ctrl-c...\n"));
>> +      sleep (3);
>>       }
>>     syms_from_objfile (objfile, addrs, add_flags);
>>
>>
>> Still, I wasn't able to reproduce what you see:
>>
>> $ /path/to/current/master/gdb --quiet -nx -ex 'set width 0' -ex 'set 
>> height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread 
>> apply all bt' -ex quit /proc/2621255/exe 2621255
>> Reading symbols from /proc/2621255/exe...
>> Sleeping, press ctrl-c...
>> ^CQuit
>> Attaching to program: /proc/2621255/exe, process 2621255
>> Error reading attached process's symbol file.
>> : No such file or directory.
>> Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
>> Sleeping, press ctrl-c...
>> Reading symbols from 
>> /usr/lib/debug/.build-id/49/0fef8403240c91833978d494d39e537409b92e.debug...
>> Sleeping, press ctrl-c...
>> ^CReading symbols from /lib64/ld-linux-x86-64.so.2...
>> Sleeping, press ctrl-c...
>> ^CReading symbols from 
>> /usr/lib/debug/.build-id/41/86944c50f8a32b47d74931e3f512b811813b64.debug...
>> Sleeping, press ctrl-c...
>> ^C[Thread debugging using libthread_db enabled]
>> Using host libthread_db library 
>> "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> 0x000074a84dee578a in __GI___clock_nanosleep 
>> (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7ffd69aa9540, 
>> rem=0x7ffd69aa9530)
>>      at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
>>
>> warning: 78     ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such 
>> file or directory
>>
>> Thread 1 (Thread 0x74a84e03b740 (LWP 2621255) "sleep"):
>> #0  0x000074a84dee578a in __GI___clock_nanosleep 
>> (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7ffd69aa9540, 
>> rem=0x7ffd69aa9530) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
>> #1  0x000074a84deea677 in __GI___nanosleep (req=<optimized out>, 
>> rem=<optimized out>) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
>> #2  0x00006119b1d789f0 in ?? ()
>> #3  0x0000000000000000 in ?? ()
>> Detaching from program: /proc/2621255/exe, process 2621255
>> [Inferior 1 (process 2621255) detached]
>>
>> I tried this multiple times, and the inferior process never died.
>>
>> Pedro Alves
>>
> 
> 
> Hi Pedro,
> Thanks for looking in to this.
> 
> Console 1:
> ------------------------------------------------------
> ./gdb -v
> GNU gdb (GDB) 16.0.50.20240605-git
> 
> 
> 
> # ./inf &
> [1] 63855
> # gdb -p 63855
> (gdb) c
> Continuing.
> 
> Console2:
> -----------------------------------------------------
> # cat /proc/63855/task/63855/stat
> 63855 (inf) R
> # kill -2 63855
> 
> Console 1:
> ------------------------------------------------------
> Program received signal SIGINT, Interrupt.
> 0x000000000040053a in main ()
> (gdb)
> Console2:
> -----------------------------------------------------
> # cat /proc/63855/task/63855/stat
> cat /proc/63855/task/63855/stat
> 63855 (inf)
> 
> Console 1:
> ------------------------------------------------------
> Program received signal SIGINT, Interrupt.
> 0x000000000040053a in main ()
> (gdb) c
> Continuing.
> 
> (gdb) q
> A debugging session is active.
>          Inferior 1 [process 63855] will be detached.
> Quit anyway? (y or n) y
> Detaching from program: /home/opc/BLD/HTTP_GDB_ML/VN/T/inf, process 63855
> [Inferior 1 (process 63855) detached]
> ]# ps aux | grep 63855
> root       63855 39.1  0.0   4232   916 pts/2    R    05:37   3:42 ./inf
> 
> 
> This is the pass case. Here the “inf” process (PID 63855) is running 
> (R state) when the shell queues the SIGINT to it. Because the process 
> is running, it handles the SIGINT and a notification (SIGCHLD) is sent 
> to the debugger. GDB handles the SIGCHLD and reports the SIGINT on the 
> debuggee. GDB masks the SIGINT on the child and execution continues. 
> After quit, the process continues in the running state.
> 
> 
> Similarly :
> # ./inf&
> [1] 123586
> # gdb -p 123586
> (gdb) c
> Continuing.
> ^C
> Program received signal SIGINT, Interrupt.
> 0x000000000040053a in main ()
> (gdb)c
> (gdb) q
> A debugging session is active.
>          Inferior 1 [process 123586] will be detached.
> Quit anyway? (y or n) y
> Detaching from program: inf, process 123586
> [Inferior 1 (process 123586) detached]
> 
> We pressed Ctrl+C; gdb passes the SIGINT to the debuggee with 
> child_pass_ctrlc, and the debuggee receives it. Since it is in the 
> running state, a SIGCHLD is sent to gdb and the debuggee stops. gdb 
> handles the SIGCHLD; after quit the debuggee continues in the running state.
> 
> 
> In the fail case :
> Console 1:
> ------------------------------------------------------
> #gdb -p 63855
> (gdb)
> 
> Console 2:
> ------------------------------------------------------
> # cat /proc/63855/task/63855/stat
> 63855 (inf) t
> # kill -2 63855
> 
> Console 1:
> ------------------------------------------------------
> 0x000000000040053a in main ()
> (gdb)
> 
> (gdb) q
> A debugging session is active.
>          Inferior 1 [process 63855] will be detached.
> Quit anyway? (y or n) y
> Detaching from program: inf, process 63855
> [Inferior 1 (process 63855) detached]
> [1]+  Interrupt               ./inf
> # ps aux | grep 63855
> << 63855 is killed >>
> 
> The “inf” process (PID 63855) is in the ptrace-stop state (t state). The 
> shell queued the SIGINT to process 63855. Because the process is 
> stopped, it did not act on the signal. “gdb” never received a SIGCHLD 
> for the child's SIGINT, so no action was taken. After quit, the process 
> starts running, takes the normal SIGINT path, and exits.
> 
> In my case:
> ./gdb  -ex 'thread apply all bt' -ex quit  /proc/<<pid>>/exe <<pid>>
> There is no interactive debugger or cont command here. We print the 
> stack and quit. Hence the process never need to go to running state to 
> handle the “sigint” till detach.
> 
> If we press a Ctrl+C in while the symbol reading :
> ./gdb  -ex 'thread apply all bt' -ex quit  /proc/<<pid>>/exe <<pid>>
> ^CReading symbols from /proc/2888105/exe...
> At this point gdb has already done the PTRACE_ATTACH, and symbols are 
> read after that. gdb receives the SIGINT and passes it to the debuggee 
> via child_pass_ctrlc. The debuggee has been in the stop state from the 
> start and does not respond to the SIGINT. After quit and PTRACE_DETACH, 
> the process handles the pending SIGINT and exits.
> 
> 
> I am attaching some snip of strace OP from my  reproduction :
> 1.ptrace attach:
> 
>   123833 ptrace(PTRACE_ATTACH, 2888105)
>   123835 gdb(inf_ptrace_target::attach
>   123836 gdb(linux_nat_target::attach
>   123837 gdb(attach_command(char const*,
> 
> 2. GDB reading symbols with read_symbols, got sigint from “Ctrl+c” 
> handle_sigint
> and set the quit flag by set_quit_flag.
> 
>   270701  gdb(set_quit_flag()
>   270702  gdb(handle_sigint(
>   270708  gdb(elf_symfile_read(objfile*,
>   270709  gdb(read_symbols(objfile*,
> 
> 3. gdb continue with symbol file read.
> 
>   312569  gdb(elf_symfile_read(objfile*,
>   312570  gdb(read_symbols(objfile*,
> 
> 4. At some point , gdb pass the signal to the child.
> 
>   312759  gdb(child_pass_ctrlc(target_ops*)
>   312760  gdb(target_pass_ctrlc()
>   312761  gdb(target_read(
>   312762  gdb(target_read_memory(unsigned long, unsigned char*,
>   312763  gdb(read_program_header(int, int*, unsigned long*)
>   312764  gdb(scan_dyntag_auxv(int, unsigned long*, unsigned long*)
>   312765  gdb(elf_locate_base()
> 
> 5. gdb did not receive the sigchild and continue with reading the symbol.
> 
>   316150  gdb(elf_symfile_read(objfile*,
>   316151  gdb(read_symbols(objfile*, enum_flags<symfile_add_flag>)
>   316154  gdb(add_vsyscall_page(inferior*)
> 
> 6. at last ptrace detach happened and process is exited.
> 346819 ptrace(PTRACE_DETACH, 2888105, NULL, 0) = 0
> 
> You can try the with the simple experiment explained in “In the fail 
> case :”
> The only difference is the sigint is sent from shell rather than “gdb” . 
> “gdb” will not react to the sigint if “continue command” is not applies.
> 
> “
> child_pass_ctrlc "kills" the target in the sense that it uses "kill(.., 
> SIGINT)".  But that should _not_ make process actually die!  If we are 
> already attached to the process, then ptrace should intercept the 
> SIGINT, linux-nat.c:linux_nat_target::wait should see the kernel 
> reporting the SIGINT stop out of waitpid.  The only way it really makes 
> the inferior die if _after_ that, we somehow pass the signal to the 
> inferior, with "ptrace(PTRACE_CONTINUE, pid, ..., SIGINT)".  Did _that_ 
> happen?  If so, where, how?
> We need to understand exactly what is going on before we even think 
> about what a fix should look like.\
> “
> Yes I agree with you on this. We should see :
> 
>   142357  > gdb(sigchld_handler
>   142360  > gdb(child_pass_ctrlc(
>   142361  > gdb(target_pass_ctrlc()
> 
> Followed by :
>   142402  gdb(linux_nat_target::wait
>   142403  gdb(target_wait(ptid_t, target_waitstatus*,
>   142404  gdb(do_target_wait_1(inferior*, ptid_t, target_waitstatus*,
>   142405  gdb(fetch_inferior_event()
> 
> This I can see in the cases where the debugged process is in running 
> state, but not when debugged process is in ptrace stop state.
> 
> Thanks
> Partha
> 
> 


Hi Pedro,

The problem occurs with a multi-threaded process.
Once Ctrl+C (SIGINT) sets gdb's quit flag, gdb checks the quit flag at 
various places and then calls target_pass_ctrlc, which calls child_pass_ctrlc.
In target_pass_ctrlc we check:
       for (thread_info *thr : inf->non_exited_threads ())
         {
           /* A thread can be THREAD_STOPPED and executing, while
              running an infcall.  */
           if (thr->state == THREAD_RUNNING || thr->executing ())
             {

And then it calls current_inferior ()->top_target ()->pass_ctrlc ();

So the target has to be multi-threaded to hit this.
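
To make the check above concrete, here is a minimal sketch in plain C. It is not gdb code: the `model_*` names, the struct, and the enum are hypothetical stand-ins for gdb's thread bookkeeping, and the function only mirrors the condition quoted above (a non-exited thread that is THREAD_RUNNING or executing).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for gdb's per-thread bookkeeping.  */
enum model_thread_state
{
  MODEL_THREAD_STOPPED,
  MODEL_THREAD_RUNNING,
  MODEL_THREAD_EXITED
};

struct model_thread
{
  enum model_thread_state state;  /* user-visible thread state */
  bool executing;                 /* target-level "executing" flag */
};

/* Return true if a Ctrl-C would be forwarded to the inferior, i.e. if any
   non-exited thread is running or executing.  This mirrors only the
   condition quoted from target_pass_ctrlc, nothing else.  */
bool
model_would_pass_ctrlc (const struct model_thread *threads, size_t count)
{
  for (size_t i = 0; i < count; i++)
    {
      if (threads[i].state == MODEL_THREAD_EXITED)
        continue;
      if (threads[i].state == MODEL_THREAD_RUNNING || threads[i].executing)
        return true;
    }
  return false;
}
```

With every thread stopped and not executing, the model returns false; a single thread still marked running or executing is enough to make it return true.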

Here is how I can easily reproduce the issue:

# git remote -v
origin  https://sourceware.org/git/binutils-gdb.git
# git branch
* master

# git diff
diff --git a/gdb/elfread.c b/gdb/elfread.c
index 7a6a8cadcedd..6bdfa05f6519 100644
--- a/gdb/elfread.c
+++ b/gdb/elfread.c
@@ -49,6 +49,10 @@
  #include "gdbsupport/scoped_fd.h"
  #include "dwarf2/public.h"
  #include "cli/cli-cmds.h"
+#include <unistd.h>
+#include <stdlib.h>
+#include <signal.h>
+

  /* Whether ctf should always be read, or only if no dwarf is present.  */
  static bool always_read_ctf;
@@ -1254,6 +1258,12 @@ elf_symfile_read (struct objfile *objfile, 
symfile_add_flags symfile_flags)
  {
    bfd *abfd = objfile->obfd.get ();
    struct elfinfo ei;
+  int pid;
+
+  pid = getpid();
+  kill(pid, 2);
+  gdb_printf ("_DEBUG_ : Killing pid %d in elf_symfile_read\n", pid);
+

    memset ((char *) &ei, 0, sizeof (ei));
    if (!(objfile->flags & OBJF_READNEVER))

Instead of the self-kill, you can insert a sleep and press Ctrl+C; this 
yields the same behavior.

# cat infth.c
-------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

/* Thread 1 spins in a busy loop; thread 2 sleeps.  */
void * worker(void *data)
{
     int num = *(int *)data;

     printf("Thread no %d\n", num);
     while(1) {
         if (num > 1) {
            sleep(3000);
         }
     }
     return NULL;
}


int main(void) {
     pthread_t th1, th2;
     int x=1, y=2;

     pthread_create(&th1, NULL, worker, (void *)(&x));
     pthread_create(&th2, NULL, worker, (void *)(&y));
     sleep(2);

     /* The workers never return, so these joins block forever.  */
     pthread_join(th1, NULL);
     pthread_join(th2, NULL);

     while(1) {
         sleep(300);
     }
     return 0;
}


# ./gdb -v
GNU gdb (GDB) 16.0.50.20240621-git

# ps aux | grep infth
root       13462  107  0.0  88484  3432 pts/1    Sl+  04:21   0:13 ./infth

# ./gdb  --quiet -nx  -ex 'set width 0' -ex 'set height 0' -ex 'set 
pagination no' -ex 'set confirm off' -ex 'thread apply all bt' -ex quit 
/proc/13462/exe  13462
Reading symbols from /proc/13462/exe...
_DEBUG_ : Killing pid 14346 in elf_symfile_read
Error occurred computing Python error message.
(No debugging symbols found in /proc/13462/exe)
Attaching to program: /proc/13462/exe, process 13462
[New LWP 13464]
[New LWP 13463]
_DEBUG_ : Killing pid 14346 in elf_symfile_read
Python Exception <class 'AttributeError'>: module 'gdb' has no attribute 
'_handle_missing_debuginfo'
_DEBUG_ : Killing pid 14346 in elf_symfile_read
…
…
Detaching from program: /proc/13462/exe, process 13462
[Inferior 1 (process 13462) detached]
[root@pssatapa-ol8 FTEST]# ps aux | grep 13462
<<<<  13462


This is not limited to command line gdb, I can see the issue with the 
interactive gdb as well.
[root@pssatapa-ol8 FTEST]# ./gdb -p 14913
GNU gdb (GDB) 16.0.50.20240621-git

Attaching to process 14913
[New LWP 14915]
[New LWP 14914]
_DEBUG_ : Killing pid 15042 in elf_symfile_read
Python Exception <class 'AttributeError'>: module 'gdb' has no attribute 
'_handle_missing_debuginfo'
_DEBUG_ : Killing pid 15042 in elf_symfile_read
Python Exception <class 'AttributeError'>: module 'gdb' has no attribute 
'_handle_missing_debuginfo'
_DEBUG_ : Killing pid 15042 in elf_symfile_read

Detaching from program: /home/opc/BLD/HTTP_GDB_ML/T1/FTEST/infth, 
process 14913
[Inferior 1 (process 14913) detached]
[root@pssatapa-ol8 FTEST]# ps aux | grep 14913
<<< Process 14913 is killed.

Besides elf_symfile_read (), I have also instrumented 
gdb_bfd_map_section and find_separate_debug_file_by_debuglink, with 
the same result. Whether target_pass_ctrlc () is called on SIGINT depends 
on where the signal is injected. With the above instrumentation, I observed 
target_pass_ctrlc being reached via infrun_quit_handler and 
target_terminal::inferior.


Possible solutions:
1. Until the symbol files are read, the process stays in the ptrace-stop 
state (t state). We should not send the signal until that point.
The current fix tries to address that.
2. Do not send the signal in child_pass_ctrlc
if inferior_thread ()->state == THREAD_STOPPED.
3. Delay sending the signal until cont is issued.
We already do not send the signal as soon as we get the SIGINT:
it is sent after checking quit_flag in the event loop, not in 
handle_sigint.
So let's delay it until cont is issued and the process is out of the 
debug stop state (t state).
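
As an illustration of option 3, here is a minimal sketch in plain C of deferring the forward until the inferior is resumed. It is not gdb code: the `model_*` names are hypothetical, and the only idea borrowed from the discussion above is that the SIGINT handler merely records the request while a later check decides what to do with it.

```c
#include <signal.h>
#include <stdbool.h>

/* Set from the signal handler, consumed later from normal context.  */
static volatile sig_atomic_t model_quit_flag = 0;

/* Handler: only record that a SIGINT arrived.  */
static void
model_handle_sigint (int sig)
{
  (void) sig;
  model_quit_flag = 1;
}

/* Install the recording handler for SIGINT.  */
void
model_install_sigint_handler (void)
{
  signal (SIGINT, model_handle_sigint);
}

/* Test-and-clear the pending request.  */
bool
model_check_quit_flag (void)
{
  if (model_quit_flag)
    {
      model_quit_flag = 0;
      return true;
    }
  return false;
}

/* Forward the Ctrl-C only once the inferior has been resumed.  While it is
   still stopped, the request stays pending and is not consumed.  */
bool
model_should_forward_now (bool inferior_resumed)
{
  return inferior_resumed && model_check_quit_flag ();
}
```

In this model, a SIGINT that arrives while the inferior is stopped is held back, and it is forwarded exactly once after the inferior has been resumed.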


Thanks
Partha

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [External] : Re: [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process.
  2024-06-23  5:42             ` Partha Satapathy
@ 2024-06-24 20:04               ` Partha Satapathy
  0 siblings, 0 replies; 11+ messages in thread
From: Partha Satapathy @ 2024-06-24 20:04 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches, rajesh.sivaramasubramaniom, bert.barbe,
	blarsen, cupertino.miranda

On 6/23/2024 11:12 AM, Partha Satapathy wrote:
> On 6/20/2024 12:54 PM, Partha Satapathy wrote:
>> On 6/14/2024 10:49 PM, Pedro Alves wrote:
>>> Hi!
>>>
>>> Sorry for the delay.
>>>
>>> On 2024-06-03 06:21, Partha Satapathy wrote:
>>>
>>>> I do understand the last reply way a bit long to explain you the 
>>>> context. Please find the answers to your questions.
>>>>
>>>> Here are the questions:
>>>>>    - If you ctrl-c to abort the attach, do we really abort the
>>>>>      attach properly?  Or do we stay attached in some half broken 
>>>>> state?
>>>>>
>>>> Let take the case of "gdb -p pid" and press a ctrl + c immediately.
>>>> The intention is to kill the gdb , not to abort the attachment or
>>>> kill the attached pid. In the problem case debugged process is killed.
>>>
>>> The question was really, even with your patch, when we Ctrl-C and that
>>> aborts the attach, if GDB doesn't exit, are we in a good state?  Did
>>> we detach properly?
>>>
>>>>
>>>>>    - Below you mention pstack, where can we find it?  And you mention
>>>>>      that ctrl-c is pressed while that is printing a stack.  I'm 
>>>>> assuming
>>>>>      that's a backtrace command.  I'm confused in that case, as if 
>>>>> that is
>>>>>      so, then we should already be past the initial attach.  The 
>>>>> question
>>>>>      would then becomes, shouldn't gdb have the terminal at that 
>>>>> point?
>>>>>      How come it does not?
>>>>
>>>> pstack is a wrapper over gdb and run as:
>>>> gdb --quiet -nx -ex 'set width 0' -ex 'set height
>>>> 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread apply all
>>>> bt' -ex quit /proc/<<pid>/exe <<pid>>
>>>>
>>>> Here gdb is run with -"ex" so so it does not need the gdb prompt to 
>>>> issue the "back trace" command.
>>>>
>>>> The Ctrl+C is issued in a window between initial attach and 
>>>> target_post_attach.
>>>>
>>>>> This SIGINT passing is done with "kill(-pgrp, SIGINT)".  How does 
>>>>> that >manage to trigger the exit of the inferior at all?  ptrace 
>>>>> should >intercept the SIGINT before the inferior ever sees it.  Did 
>>>>> it not?
>>>
>>> This question ^^^ was the most important.  I didn't see an answer for 
>>> it.
>>>
>>>>
>>>> /* Handle a SIGINT.  */
>>>> void
>>>> handle_sigint (int sig)
>>>> {
>>>>    signal (sig, handle_sigint);
>>>>
>>>>    /* We could be running in a loop reading in symfiles or something so
>>>>       it may be quite a while before we get back to the event loop.  So
>>>>       set quit_flag to 1 here.  Then if QUIT is called before we get to
>>>>       the event loop, we will unwind as expected.  */
>>>>    set_quit_flag ();
>>>>
>>>> In the problem case, gdb is reading the sysmbol files and we 
>>>> received the sigint. We set the quit flag in this context.
>>>>
>>>> In the event loop we check the quit flag (check_quit_flag) and pass 
>>>> the signal to the inferior with "child_pass_ctrlc". This kills the 
>>>> target.
>>>
>>> child_pass_ctrlc "kills" the target in the sense that it uses 
>>> "kill(.., SIGINT)".  But that
>>> should _not_ make process actually die!  If we are already attached 
>>> to the process,
>>> then ptrace should intercept the SIGINT, 
>>> linux-nat.c:linux_nat_target::wait should see
>>> the kernel reporting the SIGINT stop out of waitpid.  The only way it 
>>> really makes the
>>> inferior die if _after_ that, we somehow pass the signal to the 
>>> inferior, with
>>> "ptrace(PTRACE_CONTINUE, pid, ..., SIGINT)".  Did _that_ happen?  If 
>>> so, where, how?
>>> We need to understand exactly what is going on before we even think 
>>> about what
>>> a fix should look like.
>>>
>>>>
>>>> Its fine when gdb has the "GDB" prompt and it owns the terminal and 
>>>> we once we press Ctrl+C and that is passed to the target. But with 
>>>> -"ex" option, use never experience the GDB prompt and the signal 
>>>> should only kill gdb not the attached.
>>>>
>>>> Best way to visualize the issue is to run:
>>>> gdb --quiet -nx -ex 'set width 0' -ex 'set height 0' -ex 'set 
>>>> pagination no' -ex 'set confirm off' -ex 'thread apply all  bt' -ex 
>>>> quit /proc/1/exe 1
>>>>
>>>> and we have a sigint (Ctrl+C) while gdb reading the symbol files.
>>>
>>> I tried to reproduce by attaching to a "/usr/bin/sleep 10000" process,
>>> but gdb reads debug info (or really, no debug info at all to read),
>>> so I added a hack to gdb to sleep a bit to give me time to ctrl-c:
>>>
>>> diff --git c/gdb/symfile.c w/gdb/symfile.c
>>> index 5a03def91c6..69b475131b2 100644
>>> --- c/gdb/symfile.c
>>> +++ w/gdb/symfile.c
>>> @@ -1072,6 +1072,9 @@ symbol_file_add_with_addrs (const 
>>> gdb_bfd_ref_ptr &abfd, const char *name,
>>>         else
>>>          gdb_printf (_("Reading symbols from %ps...\n"),
>>>                      styled_string (file_name_style.style (), name));
>>> +
>>> +      gdb_printf (_("Sleeping, press ctrl-c...\n"));
>>> +      sleep (3);
>>>       }
>>>     syms_from_objfile (objfile, addrs, add_flags);
>>>
>>>
>>> Still, I wasn't able to reproduce what you see:
>>>
>>> $ /path/to/current/master/gdb --quiet -nx -ex 'set width 0' -ex 'set 
>>> height 0' -ex 'set pagination no' -ex 'set confirm off' -ex 'thread 
>>> apply all bt' -ex quit /proc/2621255/exe 2621255
>>> Reading symbols from /proc/2621255/exe...
>>> Sleeping, press ctrl-c...
>>> ^CQuit
>>> Attaching to program: /proc/2621255/exe, process 2621255
>>> Error reading attached process's symbol file.
>>> : No such file or directory.
>>> Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
>>> Sleeping, press ctrl-c...
>>> Reading symbols from 
>>> /usr/lib/debug/.build-id/49/0fef8403240c91833978d494d39e537409b92e.debug...
>>> Sleeping, press ctrl-c...
>>> ^CReading symbols from /lib64/ld-linux-x86-64.so.2...
>>> Sleeping, press ctrl-c...
>>> ^CReading symbols from 
>>> /usr/lib/debug/.build-id/41/86944c50f8a32b47d74931e3f512b811813b64.debug...
>>> Sleeping, press ctrl-c...
>>> ^C[Thread debugging using libthread_db enabled]
>>> Using host libthread_db library 
>>> "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>> 0x000074a84dee578a in __GI___clock_nanosleep 
>>> (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7ffd69aa9540, 
>>> rem=0x7ffd69aa9530)
>>>      at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
>>>
>>> warning: 78     ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such 
>>> file or directory
>>>
>>> Thread 1 (Thread 0x74a84e03b740 (LWP 2621255) "sleep"):
>>> #0  0x000074a84dee578a in __GI___clock_nanosleep 
>>> (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7ffd69aa9540, 
>>> rem=0x7ffd69aa9530) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
>>> #1  0x000074a84deea677 in __GI___nanosleep (req=<optimized out>, 
>>> rem=<optimized out>) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
>>> #2  0x00006119b1d789f0 in ?? ()
>>> #3  0x0000000000000000 in ?? ()
>>> Detaching from program: /proc/2621255/exe, process 2621255
>>> [Inferior 1 (process 2621255) detached]
>>>
>>> I tried this multiple times, and the inferior process never died.
>>>
>>> Pedro Alves
>>>
>>
>>
>> Hi Pedro,
>> Thanks for looking in to this.
>>
>> Console 1:
>> ------------------------------------------------------
>> ./gdb -v
>> GNU gdb (GDB) 16.0.50.20240605-git
>>
>>
>>
>> # ./inf &
>> [1] 63855
>> # gdb -p 63855
>> (gdb) c
>> Continuing.
>>
>> Console2:
>> -----------------------------------------------------
>> # cat /proc/63855/task/63855/stat
>> 63855 (inf) R
>> # kill -2 63855
>>
>> Console 1:
>> ------------------------------------------------------
>> Program received signal SIGINT, Interrupt.
>> 0x000000000040053a in main ()
>> (gdb)
>> Console2:
>> -----------------------------------------------------
>> # cat /proc/63855/task/63855/stat
>> cat /proc/63855/task/63855/stat
>> 63855 (inf)
>>
>> Console 1:
>> ------------------------------------------------------
>> Program received signal SIGINT, Interrupt.
>> 0x000000000040053a in main ()
>> (gdb) c
>> Continuing.
>>
>> (gdb) q
>> A debugging session is active.
>>          Inferior 1 [process 63855] will be detached.
>> Quit anyway? (y or n) y
>> Detaching from program: /home/opc/BLD/HTTP_GDB_ML/VN/T/inf, process 63855
>> [Inferior 1 (process 63855) detached]
>> ]# ps aux | grep 63855
>> root       63855 39.1  0.0   4232   916 pts/2    R    05:37   3:42 ./inf
>>
>>
>> This is the pass case, and here:  The  “inf” process PID 63855  is 
>> running (R state) , Shell queued the sigint to process 63855. As 
>> process 63855 is running handled sigint and sent the notification
>> (sigchild) to Debugger.  GDB handle the sigchild and report the sigint 
>> on the debugged. GDB masks the sigint on the child and execution 
>> continues. After quit , the process continues with  running state.
>>
>>
>> Similarly :
>> # ./inf&
>> [1] 123586
>> # gdb -p 123586
>> (gdb) c
>> Continuing.
>> ^C
>> Program received signal SIGINT, Interrupt.
>> 0x000000000040053a in main ()
>> (gdb)c
>> (gdb) q
>> A debugging session is active.
>>          Inferior 1 [process 123586] will be detached.
>> Quit anyway? (y or n) y
>> Detaching from program: inf, process 123586
>> [Inferior 1 (process 123586) detached]
>>
>> We pressed Ctrl+C gdb passes the sigint to debugged with 
>> child_pass_ctrlc, debugged received the sigint. it’s in running state, 
>> sent the sigchld to gdb and stopped. gdb handle sigchld after quit 
>> debugged  continued with running  state.
>>
>>
>> In the fail case :
>> Console 1:
>> ------------------------------------------------------
>> #gdb -p 63855
>> (gdb)
>>
>> Console 2:
>> ------------------------------------------------------
>> # cat /proc/63855/task/63855/stat
>> 63855 (inf) t
>> # kill -2 63855
>>
>> Console 1:
>> ------------------------------------------------------
>> 0x000000000040053a in main ()
>> (gdb)
>>
>> (gdb) q
>> A debugging session is active.
>>          Inferior 1 [process 63855] will be detached.
>> Quit anyway? (y or n) y
>> Detaching from program: inf, process 63855
>> [Inferior 1 (process 63855) detached]
>> [1]+  Interrupt               ./inf
>> # ps aux | grep 63855
>> << 63855 is killed >>
>>
>> The  “inf” process PID 63855  is ptrace stop state  (t state) . Shell 
>> queued the sigint to process 63855. As process 63855 is stop state, 
>> did not act to the signal. “gdb” never received sigchild from  the 
>> child sigint context and no action taken. After quit , the process 
>> starts running and executes the normal sigint path and exit.
>>
>> In my case:
>> ./gdb  -ex 'thread apply all bt' -ex quit  /proc/<<pid>>/exe <<pid>>
>> There is no interactive debugger or cont command here. We print the 
>> stack and quit. Hence the process never need to go to running state to 
>> handle the “sigint” till detach.
>>
>> If we press a Ctrl+C in while the symbol reading :
>> ./gdb  -ex 'thread apply all bt' -ex quit  /proc/<<pid>>/exe <<pid>>
>> ^CReading symbols from /proc/2888105/exe...
>> At this time PTRACE_ATTACH is done from gdb, and symbols are read 
>> after that. gdb receives the sigint and passes it to the debugged by 
>> child_pass_ctrlc. Debugged is in stop state from the start and did not 
>> response to the sigint. After quit PTRACE_DETACH, process handles the 
>> pending sigint and exit.
>>
>>
>> I am attaching some snip of strace OP from my  reproduction :
>> 1.ptrace attach:
>>
>>   123833 ptrace(PTRACE_ATTACH, 2888105)
>>   123835 gdb(inf_ptrace_target::attach
>>   123836 gdb(linux_nat_target::attach
>>   123837 gdb(attach_command(char const*,
>>
>> 2. GDB reading symbols with read_symbols, got sigint from “Ctrl+c” 
>> handle_sigint
>> and set the quit flag by set_quit_flag.
>>
>>   270701  gdb(set_quit_flag()
>>   270702  gdb(handle_sigint(
>>   270708  gdb(elf_symfile_read(objfile*,
>>   270709  gdb(read_symbols(objfile*,
>>
>> 3. gdb continue with symbol file read.
>>
>>   312569  gdb(elf_symfile_read(objfile*,
>>   312570  gdb(read_symbols(objfile*,
>>
>> 4. At some point , gdb pass the signal to the child.
>>
>>   312759  gdb(child_pass_ctrlc(target_ops*)
>>   312760  gdb(target_pass_ctrlc()
>>   312761  gdb(target_read(
>>   312762  gdb(target_read_memory(unsigned long, unsigned char*,
>>   312763  gdb(read_program_header(int, int*, unsigned long*)
>>   312764  gdb(scan_dyntag_auxv(int, unsigned long*, unsigned long*)
>>   312765  gdb(elf_locate_base()
>>
>> 5. gdb did not receive the sigchild and continue with reading the symbol.
>>
>>   316150  gdb(elf_symfile_read(objfile*,
>>   316151  gdb(read_symbols(objfile*, enum_flags<symfile_add_flag>)
>>   316154  gdb(add_vsyscall_page(inferior*)
>>
>> 6. at last ptrace detach happened and process is exited.
>> 346819 ptrace(PTRACE_DETACH, 2888105, NULL, 0) = 0
>>
>> You can try the with the simple experiment explained in “In the fail 
>> case :”
>> The only difference is the sigint is sent from shell rather than “gdb” 
>> . “gdb” will not react to the sigint if “continue command” is not 
>> applies.
>>
>> “
>> child_pass_ctrlc "kills" the target in the sense that it uses 
>> "kill(.., SIGINT)".  But that should _not_ make process actually die!  
>> If we are already attached to the process, then ptrace should 
>> intercept the SIGINT, linux-nat.c:linux_nat_target::wait should see 
>> the kernel reporting the SIGINT stop out of waitpid.  The only way it 
>> really makes the inferior die if _after_ that, we somehow pass the 
>> signal to the inferior, with "ptrace(PTRACE_CONTINUE, pid, ..., 
>> SIGINT)".  Did _that_ happen?  If so, where, how?
>> We need to understand exactly what is going on before we even think 
>> about what a fix should look like.\
>> “
>> Yes I agree with you on this. We should see :
>>
>>   142357  > gdb(sigchld_handler
>>   142360  > gdb(child_pass_ctrlc(
>>   142361  > gdb(target_pass_ctrlc()
>>
>> Followed by :
>>   142402  gdb(linux_nat_target::wait
>>   142403  gdb(target_wait(ptid_t, target_waitstatus*,
>>   142404  gdb(do_target_wait_1(inferior*, ptid_t, target_waitstatus*,
>>   142405  gdb(fetch_inferior_event()
>>
>> This I can see in the cases where the debugged process is in running 
>> state, but not when debugged process is in ptrace stop state.
>>
>> Thanks
>> Partha
>>
>>
> 
> 
> Hi Pedro,
> 
> The problem is with multi-threaded process.
> Once Ctrl+C or sigint to gdb set the quit flag, check quit flag at 
> places and then call target_pass_ctrlc that calls child_pass_ctrlc.
> In target_pass_ctrlc we check for :
>        for (thread_info *thr : inf->non_exited_threads ())
>          {
>            /* A thread can be THREAD_STOPPED and executing, while
>               running an infcall.  */
>            if (thr->state == THREAD_RUNNING || thr->executing ())
>              {
> 
> And then current_inferior ()->top_target ()->pass_ctrlc ();
> 
> So the target should be multithread.
> 
> Here is how I can easily recreate the issue.:
> 
> # git remote -v
> origin  https://sourceware.org/git/binutils-gdb.git
> # git branch
> * master
> 
> # git diff
> diff --git a/gdb/elfread.c b/gdb/elfread.c
> index 7a6a8cadcedd..6bdfa05f6519 100644
> --- a/gdb/elfread.c
> +++ b/gdb/elfread.c
> @@ -49,6 +49,10 @@
>   #include "gdbsupport/scoped_fd.h"
>   #include "dwarf2/public.h"
>   #include "cli/cli-cmds.h"
> +#include <unistd.h>
> +#include <stdlib.h>
> +#include <signal.h>
> +
> 
>   /* Whether ctf should always be read, or only if no dwarf is present.  */
>   static bool always_read_ctf;
> @@ -1254,6 +1258,12 @@ elf_symfile_read (struct objfile *objfile, 
> symfile_add_flags symfile_flags)
>   {
>     bfd *abfd = objfile->obfd.get ();
>     struct elfinfo ei;
> +  int pid;
> +
> +  pid = getpid();
> +  kill(pid, 2);
> +  gdb_printf ("_DEBUG_ : Killing pid %d in elf_symfile_read\n", pid);
> +
> 
>     memset ((char *) &ei, 0, sizeof (ei));
>     if (!(objfile->flags & OBJF_READNEVER))
> 
> Instead of a self-kill, you can put a sleep and press Ctrl+C., this will 
>   also yield the same behavior.
> 
> # cat infth.c
> -------------------------------------
> #include <stdio.h>
> #include <stdlib.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <pthread.h>
> 
> void * worker(void *data)
> {
>      int num = *(int *)data;
>      char *com;
> 
>      printf("Thread no %d\n", num);
>      while(1) {
>          if (num > 1) {
>             sleep(3000);
>          }
>      }
>      return NULL;
> }
> 
> 
> void main(void) {
>      pthread_t th1, th2;
>      int x=1, y=2;
> 
>      pthread_create(&th1, NULL, worker, (void *)(&x));
>      pthread_create(&th2, NULL, worker, (void *)(&y));
>      sleep(2);
> 
>      pthread_join(th1, NULL);
>      pthread_join(th2, NULL);
> 
>      while(1) {
>          sleep(300);
>      }
> }
> 
> 
> # ./gdb -v
> GNU gdb (GDB) 16.0.50.20240621-git
> 
> # ps aux | grep infth
> root       13462  107  0.0  88484  3432 pts/1    Sl+  04:21   0:13 ./infth
> 
> # ./gdb  --quiet -nx  -ex 'set width 0' -ex 'set height 0' -ex 'set 
> pagination no' -ex 'set confirm off' -ex 'thread apply all bt' -ex quit 
> /proc/13462/exe  13462
> Reading symbols from /proc/13462/exe...
> _DEBUG_ : Killing pid 14346 in elf_symfile_read
> Error occurred computing Python error message.
> (No debugging symbols found in /proc/13462/exe)
> Attaching to program: /proc/13462/exe, process 13462
> [New LWP 13464]
> [New LWP 13463]
> _DEBUG_ : Killing pid 14346 in elf_symfile_read
> Python Exception <class 'AttributeError'>: module 'gdb' has no attribute 
> '_handle_missing_debuginfo'
> _DEBUG_ : Killing pid 14346 in elf_symfile_read
> …
> …
> Detaching from program: /proc/13462/exe, process 13462
> [Inferior 1 (process 13462) detached]
> [root@pssatapa-ol8 FTEST]# ps aux | grep 13462
> <<<<  13462
> 
> 
> This is not limited to command line gdb, I can see the issue with the 
> interactive gdb as well.
> [root@pssatapa-ol8 FTEST]# ./gdb -p 14913
> GNU gdb (GDB) 16.0.50.20240621-git
> 
> Attaching to process 14913
> [New LWP 14915]
> [New LWP 14914]
> _DEBUG_ : Killing pid 15042 in elf_symfile_read
> Python Exception <class 'AttributeError'>: module 'gdb' has no attribute 
> '_handle_missing_debuginfo'
> _DEBUG_ : Killing pid 15042 in elf_symfile_read
> Python Exception <class 'AttributeError'>: module 'gdb' has no attribute 
> '_handle_missing_debuginfo'
> _DEBUG_ : Killing pid 15042 in elf_symfile_read
> 
> Detaching from program: /home/opc/BLD/HTTP_GDB_ML/T1/FTEST/infth, 
> process 14913
> [Inferior 1 (process 14913) detached]
> [root@pssatapa-ol8 FTEST]# ps aux | grep 14913
> <<< Process 14913 is killed.
> 
> Not only elf_symfile_read() , I have also instrumented 
> gdb_bfd_map_section  and find_separate_debug_file_by_debuglink, yielding 
> the same result. A call to target_pass_ctrlc () on sigint, depends upon 
> the  signal instrumention.  Observed target_pass_ctrlc  with 
> infrun_quit_handler and target_terminal::inferior with the above 
> instrumentation.
> 
> 
> Probable Solution could be:
> 1.    Till the symbol files are read the process will be  trace stop 
> state (t state).
> We should not send the signal till that point.
> The current fix trying to address that.
> 2.    Should not  send the signal in child_process_ctrlc
> If inferior_thread ()->state == THREAD_STOPPED.
> 3.    Delay sending the signal till cont is pressed.
> Anyway we are not sending the signal as soon as we get sigint.
> The signal is sent after checking quit_flag in event loop, but not in 
> handle_sigint.
> So let’s delay till cont is pressed and process is out of debug stop 
> state (t state).
> 
> 
> Thanks
> Partha



Hi Pedro,

For a multi-threaded inferior, during the initial attach, handle_one sets
  t->set_executing (false);
and then proceeds to:
       if (t->inf->needs_setup)
         {
           switch_to_thread_no_regs (t);
           setup_inferior (0);
         }

In the setup_inferior context we read the symbol files. At this time the 
thread state is not yet set to stopped. The other threads may also still 
have their state set to running and executing set, as left by the native 
attach calls. A SIGINT during this time triggers target_pass_ctrlc, and 
there (thr->state == THREAD_RUNNING || thr->executing ()) may pass. The 
check may also pass for threads that have not yet reported their stop 
event to handle_one, resulting in a signal to the child. The master 
thread is in ptrace stop state and cannot respond to the signal, so the 
pending signal triggers the child's exit after detach.
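
To make the window concrete, here is a rough stand-alone model of it. 
This is not GDB code: model_thread, model_handle_one, 
model_finish_thread_state, would_pass_ctrlc and 
ctrlc_leaks_only_during_attach are names made up for illustration, and 
only the condition inside would_pass_ctrlc is taken from 
target_pass_ctrlc.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the thread_info fields discussed above.
enum thread_state { THREAD_STOPPED, THREAD_RUNNING };

struct model_thread
{
  thread_state state = THREAD_RUNNING; // as left by the attach calls
  bool executing = true;
};

// Same condition target_pass_ctrlc uses to pick a thread.
static bool
would_pass_ctrlc (const std::vector<model_thread> &threads)
{
  for (const model_thread &thr : threads)
    if (thr.state == THREAD_RUNNING || thr.executing)
      return true;
  return false;
}

// Models handle_one on a stop event: only `executing` is cleared,
// `state` is left alone.
static void
model_handle_one (model_thread &thr)
{
  thr.executing = false;
}

// Models finish_thread_state: `state` is synced from `executing`.
static void
model_finish_thread_state (std::vector<model_thread> &threads)
{
  for (model_thread &thr : threads)
    thr.state = thr.executing ? THREAD_RUNNING : THREAD_STOPPED;
}

// Walks through the attach sequence and reports whether a Ctrl-C
// would be forwarded during symbol reading but not afterwards.
static bool
ctrlc_leaks_only_during_attach ()
{
  std::vector<model_thread> threads (3);

  // First stop event arrives; setup_inferior would run now.
  model_handle_one (threads[0]);
  bool during_setup = would_pass_ctrlc (threads);

  // Remaining stop events arrive, then the states are synced.
  for (model_thread &thr : threads)
    model_handle_one (thr);
  model_finish_thread_state (threads);
  bool after_attach = would_pass_ctrlc (threads);

  return during_setup && !after_attach;
}
```

In this model the check passes while the first thread is being set up, 
because its state is still THREAD_RUNNING and the other threads have not 
been handled yet, and it stops passing once the states are synced.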

finish_thread_state syncs the executing flag and the state once the attach is done:
finish_thread_state (process_stratum_target *targ, ptid_t ptid)
{
…
     if (set_running_thread (tp, tp->executing ()))
       any_started = true;

I am introducing one more variable to mark that a thread's attach has 
completed. We will check this variable in target_pass_ctrlc to confirm 
that all threads have attached successfully before passing the signal to 
the child.
Please find the fix below:
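
For illustration only, the check can also be modeled outside GDB for a 
single inferior; the actual patch comes after this sketch. The names 
model_thread, all_attached, would_pass_ctrlc_gated and 
model_finish_thread_state are made up for the sketch. Only the 
thread_attached flag and the running/executing condition correspond to 
the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the thread_info fields; not GDB code.
enum thread_state { THREAD_STOPPED, THREAD_RUNNING };

struct model_thread
{
  thread_state state = THREAD_RUNNING;
  bool executing = true;
  bool thread_attached = false; // set once, never cleared
};

// True only when every thread has finished its initial attach.
static bool
all_attached (const std::vector<model_thread> &threads)
{
  for (const model_thread &thr : threads)
    if (!thr.thread_attached)
      return false;
  return true;
}

// The running/executing condition, gated on attach completion.
static bool
would_pass_ctrlc_gated (const std::vector<model_thread> &threads)
{
  if (!all_attached (threads))
    return false;

  for (const model_thread &thr : threads)
    if (thr.state == THREAD_RUNNING || thr.executing)
      return true;
  return false;
}

// Models finish_thread_state: sync `state` from `executing` and mark
// the initial attach as finished.
static void
model_finish_thread_state (std::vector<model_thread> &threads)
{
  for (model_thread &thr : threads)
    {
      thr.state = thr.executing ? THREAD_RUNNING : THREAD_STOPPED;
      thr.thread_attached = true;
    }
}
```

With this gate, a Ctrl-C during the attach window is never forwarded, 
while a thread that is resumed after the attach still gets it as before.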


------------

gdb : Signal to pstack/gdb kills the attached process.

Problem: While gdb is attaching a multi-threaded inferior,
if ctrl-c is pressed in the middle of the process attach,
the sigint is passed to the debugged process.
This triggers the exit of the inferior. For example in pstack,
printing a stack can take significant time, and ctrl-c is pressed to
abort the pstack/gdb application. This in turn kills the debugged
process, which can be critical for the system. In this case, the
intention of ctrl+c is to kill pstack/gdb, but not the inferior
application.
gdb -p <<pid>>
or gdb /proc/<<pid>>/exe pid
Attaching to process
<< ctrl+c is pressed during attach
(gdb) q
<<<< inferior process exited >>>>

Root Cause: handle_one() handles an event after stopping threads.
On a TARGET_WAITKIND_STOPPED event, handle_one() calls
t->set_executing (false) and goes on to setup_inferior(), which reads
the dependent symbols. The thread state may not be stopped at this time.
handle_one() serves a single thread at a time, and other threads might not
have stopped by this time. A SIGINT to GDB during symbol reading
calls target_pass_ctrlc(), and
(thr->state == THREAD_RUNNING || thr->executing ()) may pass for some
threads at this time, resulting in a SIGINT to the inferior.

Solution: Add a thread_attached flag to thread_info that is set at the end
of the thread's initial attach. After a thread is attached,
finish_thread_state() syncs the thread's executing () and state variables;
this is where we set thread_attached. On a SIGINT to gdb, target_pass_ctrlc
checks that thread_attached is set for all threads before passing the
SIGINT to the child. thread_attached is set only once and never unset: it
marks the end of the initial attach and does not change as the thread
state changes at runtime.

Signed-off-by: Partha Sarathi Satapathy <partha.satapathy@oracle.com>
---
  gdb/gdbthread.h |  3 +++
  gdb/target.c    | 10 +++++++++-
  gdb/thread.c    |  5 ++++-
  3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/gdb/gdbthread.h b/gdb/gdbthread.h
index 73f6895fe467..aeaa9a083320 100644
--- a/gdb/gdbthread.h
+++ b/gdb/gdbthread.h
@@ -568,6 +568,9 @@ class thread_info : public intrusive_list_node<thread_info>,
    /* Displaced-step state for this thread.  */
    displaced_step_thread_state displaced_step_state;

+  /* Set when the thread's initial attach has finished.  */
+  bool thread_attached = false;
+
  private:
    /* True if this thread is resumed from infrun's perspective.
       Note that a thread can be marked both as not-executing and
diff --git a/gdb/target.c b/gdb/target.c
index 1b5aa11ed6f5..21bb35525c8a 100644
--- a/gdb/target.c
+++ b/gdb/target.c
@@ -3759,6 +3759,7 @@ target_interrupt ()
  void
  target_pass_ctrlc (void)
  {
+  bool attached = true;
    /* Pass the Ctrl-C to the first target that has a thread
       running.  */
    for (inferior *inf : all_inferiors ())
@@ -3767,11 +3768,18 @@ target_pass_ctrlc (void)
        if (proc_target == NULL)
         continue;

+      /* Ensure all threads are attached before passing CtrlC */
+      for (thread_info *thr : inf->non_exited_threads ()) {
+       if(thr->thread_attached == false) {
+         attached = false;
+       }
+      }
+
        for (thread_info *thr : inf->non_exited_threads ())
         {
           /* A thread can be THREAD_STOPPED and executing, while
              running an infcall.  */
-         if (thr->state == THREAD_RUNNING || thr->executing ())
+         if ((thr->state == THREAD_RUNNING || thr->executing ()) && (attached))
             {
               /* We can get here quite deep in target layers.  Avoid
                  switching thread context or anything that would
diff --git a/gdb/thread.c b/gdb/thread.c
index 4ee469368610..2120274bd31f 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -969,10 +969,13 @@ finish_thread_state (process_stratum_target *targ, ptid_t ptid)
  {
    bool any_started = false;

-  for (thread_info *tp : all_non_exited_threads (targ, ptid))
+  for (thread_info *tp : all_non_exited_threads (targ, ptid)) {
      if (set_running_thread (tp, tp->executing ()))
        any_started = true;

+    tp->thread_attached = true;
+  }
+
    if (any_started)
      notify_target_resumed (ptid);
  }

Thanks
Partha


end of thread, other threads:[~2024-06-24 20:05 UTC | newest]

Thread overview: 11+ messages
2024-05-09 10:53 [PATCH 1/1 V5] gdb : Signal to pstack/gdb kills the attached process Partha Satapathy
2024-05-09 14:32 ` Tom Tromey
2024-05-10 20:19 ` Pedro Alves
2024-05-13 14:49   ` Pedro Alves
2024-05-16  7:15     ` Partha Satapathy
2024-06-03  5:21       ` Partha Satapathy
2024-06-10  5:41         ` Partha Satapathy
2024-06-14 17:19         ` Pedro Alves
2024-06-20  7:24           ` [External] : " Partha Satapathy
2024-06-23  5:42             ` Partha Satapathy
2024-06-24 20:04               ` Partha Satapathy
