RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS

public inbox for gdb-patches@sourceware.org

* RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
@ 2014-11-20  5:11 Joel Brobecker
  2014-11-20  5:12 ` Joel Brobecker
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Brobecker @ 2014-11-20  5:11 UTC
  To: gdb-patches

Hello,

I was wondering what you guys would think of a patch like this.
I am a bit uncertain, because I don't understand everything
that is happening - and the problem is that this is happening
with a fairly massive and complex program that I don't have access
to, on a system that is also fairly opaque. When I'm lucky, getting
answers is only very hard.

I am still trying to reproduce the problem locally in order to
find out more, but I couldn't understand why, in principle,
one thread couldn't receive multiple notifications during
the same single-step if the system decides to queue up signals?
If that were the case, wouldn't the attached patch make sense?
(currently untested against the program that triggered the issue,
as I think I understand how inline-frame works, and what it does,
but I am not sure I get it all).

Thank you!
-- 
Joel


* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
  2014-11-20  5:11 RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS Joel Brobecker
@ 2014-11-20  5:12 ` Joel Brobecker
  2014-11-20  9:55   ` Pedro Alves
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Brobecker @ 2014-11-20  5:12 UTC
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 903 bytes --]

[Fixing ENOPATCH... sigh.]
> I was wondering what you guys would think of a patch like this.
> I am a bit uncertain, because I don't understand everything
> that is happening - and the problem is that this is happening
> with a fairly massive and complex program that I don't have access
> to, on a system that is also fairly opaque. When I'm lucky, getting
> answers is only very hard.
> 
> I am still trying to reproduce the problem locally in order to
> find out more, but I couldn't understand why, in principle,
> one thread couldn't receive multiple notifications during
> the same single-step if the system decides to queue up signals?
> If that were the case, wouldn't the attached patch make sense?
> (currently untested against the program that triggered the issue,
> as I think I understand how inline-frame works, and what it does,
> but I am not sure I get it all).

Thanks again!
-- 
Joel

[-- Attachment #2: 0001-skip_inline_frames-failed-assertion-resuming-from-br.patch --]
[-- Type: text/x-diff, Size: 4704 bytes --]

From f7ad35aa92a7007194582b1e23a110fc06b50cd1 Mon Sep 17 00:00:00 2001
From: Joel Brobecker <brobecker@adacore.com>
Date: Thu, 20 Nov 2014 08:38:08 +0400
Subject: [PATCH] skip_inline_frames failed assertion resuming from breakpoint
 on LynxOS

A user reported a failed assertion while debugging their program
on a LynxOS system (thus via GDBserver), when trying to resume
the program's execution after having reached a breakpoint:

    (gdb) continue
    [...]
    ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.

Turning on infrun debug traces helps understand a little better what
happens:

    (gdb) continue
    Continuing.
    infrun: clear_proceed_status_thread (Thread 126)
    [...]
    infrun: clear_proceed_status_thread (Thread 142)
    [...]
    infrun: clear_proceed_status_thread (Thread 146)
    infrun: clear_proceed_status_thread (Thread 125)
    infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0)
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
    infrun: wait_for_inferior ()
    infrun: target_wait (-1, status) =
    infrun:   42000 [Thread 146],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
    infrun: infwait_normal_state
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x10a187f4
    infrun: context switch
    infrun: Switching context from Thread 142 to Thread 146
    infrun: random signal (GDB_SIGNAL_REALTIME_34)
    infrun: switching back to stepped thread
    infrun: Switching context from Thread 146 to Thread 142
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
    infrun: prepare_to_wait
    [...handling of similar events for threads 145, 144 and 143 snipped...]
    infrun: prepare_to_wait
    infrun: target_wait (-1, status) =
    infrun:   42000 [Thread 146],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
    infrun: infwait_normal_state
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x10a187f4
    infrun: context switch
    infrun: Switching context from Thread 142 to Thread 146
    ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.

It all happens while we're trying to single-step out of the breakpoint.
We keep resuming the inferior trying to single-step the thread that
hit the breakpoint, but each time we get a notification that another
thread received a particular signal. This is OK until the same thread
receives a signal a second time, without having actually
run further (same PC). That's when we hit the assertion in
skip_inline_frames.

This patch avoids the assertion by recognizing that a thread can
indeed potentially receive multiple events without changing PC,
and by therefore changing skip_inline_frames to return immediately
if we have already computed the inline_state for this thread's
PC.

gdb/ChangeLog:

        * inline-frame.c (skip_inline_frames): Do not raise a failed
        assertion if find_inline_frame_state finds an inlined frame
        state for PTID.  Return early instead.

Tested on x86_64-linux.
---
 gdb/inline-frame.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/gdb/inline-frame.c b/gdb/inline-frame.c
index cecb2af..c60820c 100644
--- a/gdb/inline-frame.c
+++ b/gdb/inline-frame.c
@@ -307,6 +307,24 @@ skip_inline_frames (ptid_t ptid)
   int skip_count = 0;
   struct inline_state *state;
 
+  if (find_inline_frame_state (ptid) != NULL)
+    {
+      /* This thread is receiving multiple notifications without
+	 making progress in its execution (same PC).
+
+	 This was seen happening on LynxOS where a program appears
+	 to have a number of signals being queued then delivered
+	 while trying to single-step a thread out of a breakpoint.
+	 The single-step operation makes no progress until all signals
+	 get delivered first, which can result in the same thread
+	 receiving multiple signals during the same single-step
+	 attempt.
+
+	 We have already computed the inline_state for that thread,
+	 so there is no need to redo it again.  */
+      return;
+    }
+
   /* This function is called right after reinitializing the frame
      cache.  We try not to do more unwinding than absolutely
      necessary, for performance.  */
@@ -335,7 +353,6 @@ skip_inline_frames (ptid_t ptid)
 	}
     }
 
-  gdb_assert (find_inline_frame_state (ptid) == NULL);
   state = allocate_inline_frame_state (ptid);
   state->skipped_frames = skip_count;
   state->saved_pc = this_pc;
-- 
1.9.1



* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
  2014-11-20  5:12 ` Joel Brobecker
@ 2014-11-20  9:55   ` Pedro Alves
  2014-11-20 17:11     ` Joel Brobecker
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Alves @ 2014-11-20  9:55 UTC
  To: Joel Brobecker, gdb-patches

On 11/20/2014 05:12 AM, Joel Brobecker wrote:

>> > I am still trying to reproduce the problem locally in order to
>> > find out more, but I couldn't understand why, in principle,
>> > one thread couldn't receive multiple notifications during
>> > the same single-step if the system decides to queue up signals?
>> > If that were the case, wouldn't the attached patch make sense?
>> > (currently untested against the program that triggered the issue,
>> > as I think I understand how inline-frame works, and what it does,
>> > but I am not sure I get it all).
> Thanks again!
> -- Joel
> 
> 
> 0001-skip_inline_frames-failed-assertion-resuming-from-br.patch
> 
> 
> From f7ad35aa92a7007194582b1e23a110fc06b50cd1 Mon Sep 17 00:00:00 2001
> From: Joel Brobecker <brobecker@adacore.com>
> Date: Thu, 20 Nov 2014 08:38:08 +0400
> Subject: [PATCH] skip_inline_frames failed assertion resuming from breakpoint
>  on LynxOS
> 
> A user reported a failed assertion while debugging their program
> on a LynxOS system (thus via GDBserver), when trying to resume
> the program's execution after having reached a breakpoint:
> 
>     (gdb) continue
>     [...]
>     ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.
> 
> Turning on infrun debug traces helps understand a little better what
> happens:
> 
>     (gdb) continue
>     Continuing.
>     infrun: clear_proceed_status_thread (Thread 126)
>     [...]
>     infrun: clear_proceed_status_thread (Thread 142)
>     [...]
>     infrun: clear_proceed_status_thread (Thread 146)
>     infrun: clear_proceed_status_thread (Thread 125)
>     infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0)
>     infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838

trap_expected=1 indicates that GDB is about to step thread 142 _only_, leaving
everything else stopped.  Can you enable "set debug remote 1" as well?

>     infrun: wait_for_inferior ()
>     infrun: target_wait (-1, status) =
>     infrun:   42000 [Thread 146],
>     infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34

So how come we see an event for thread 146?  That thread shouldn't
have been resumed, so GDB shouldn't be getting an event for it.

This is sounding like a bug in the target.

>     infrun: infwait_normal_state
>     infrun: TARGET_WAITKIND_STOPPED
>     infrun: stop_pc = 0x10a187f4
>     infrun: context switch
>     infrun: Switching context from Thread 142 to Thread 146
>     infrun: random signal (GDB_SIGNAL_REALTIME_34)
>     infrun: switching back to stepped thread
>     infrun: Switching context from Thread 146 to Thread 142
>     infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
>     infrun: prepare_to_wait
>     [...handling of similar events for threads 145, 144 and 143 snipped...]
>     infrun: prepare_to_wait
>     infrun: target_wait (-1, status) =
>     infrun:   42000 [Thread 146],
>     infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
>     infrun: infwait_normal_state
>     infrun: TARGET_WAITKIND_STOPPED
>     infrun: stop_pc = 0x10a187f4
>     infrun: context switch
>     infrun: Switching context from Thread 142 to Thread 146
>     ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.


Thanks,
Pedro Alves


* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
  2014-11-20  9:55   ` Pedro Alves
@ 2014-11-20 17:11     ` Joel Brobecker
  2014-11-21 10:43       ` Pedro Alves
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Brobecker @ 2014-11-20 17:11 UTC
  To: Pedro Alves; +Cc: gdb-patches

Hi Pedro,

> >     infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0)
> >     infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
> 
> trap_expected=1 indicates that GDB is about to step thread 142 _only_, leaving
> everything else stopped.  Can you enable "set debug remote 1" as well?

Correct (we are single-stepping out of a breakpoint).

Here is the output with remote debugging:
| Continuing.
| infrun: clear_proceed_status_thread (Thread 126)
| infrun: clear_proceed_status_thread (Thread 147)
| infrun: clear_proceed_status_thread (Thread 134)
| infrun: clear_proceed_status_thread (Thread 135)
| infrun: clear_proceed_status_thread (Thread 133)
| infrun: clear_proceed_status_thread (Thread 136)
| infrun: clear_proceed_status_thread (Thread 127)
| infrun: clear_proceed_status_thread (Thread 129)
| infrun: clear_proceed_status_thread (Thread 128)
| infrun: clear_proceed_status_thread (Thread 130)
| infrun: clear_proceed_status_thread (Thread 132)
| infrun: clear_proceed_status_thread (Thread 141)
| infrun: clear_proceed_status_thread (Thread 131)
| infrun: clear_proceed_status_thread (Thread 137)
| infrun: clear_proceed_status_thread (Thread 138)
| infrun: clear_proceed_status_thread (Thread 139)
| infrun: clear_proceed_status_thread (Thread 140)
| infrun: clear_proceed_status_thread (Thread 142)
| infrun: clear_proceed_status_thread (Thread 143)
| infrun: clear_proceed_status_thread (Thread 144)
| infrun: clear_proceed_status_thread (Thread 145)
| infrun: clear_proceed_status_thread (Thread 146)
| infrun: clear_proceed_status_thread (Thread 125)
| infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0)
| infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
| Sending packet: $Hg8e#4c...Packet received: OK
| Sending packet: $m10684838,4#73...Packet received: 4ba1db21
| Sending packet: $QPassSignals:#f3...Packet received: OK
| Sending packet: $vCont;s:8e#8f...infrun: wait_for_inferior ()
| Packet received: T2e01:3a440910;40:10a187f4;thread:92;
| infrun: target_wait (-1, status) =
| infrun:   42000 [Thread 146],
| infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
| infrun: infwait_normal_state
| infrun: TARGET_WAITKIND_STOPPED
| infrun: stop_pc = 0x10a187f4
| infrun: context switch
| infrun: Switching context from Thread 142 to Thread 146
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| Sending packet: $g#67...Packet received: 000000c33a4409102003b21020ed76a83a4408d80000000000000007000100010001005b20ed76a820ed79380000000010abd7a10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402000e6883a4409103fc34833395728754082c13483339573000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003e112e0be826d69500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010a187f40000f030220000481099d6dc109db33420000000fff80000
| infrun: random signal (GDB_SIGNAL_REALTIME_34)
| Sending packet: $T8e#f1...Packet received: OK
| infrun: switching back to stepped thread
| infrun: Switching context from Thread 146 to Thread 142
| Sending packet: $Hg8e#4c...Packet received: OK
| Sending packet: $g#67...Packet received: 103422e83a8948e02003b21000000000000000000000000900000008000000090000000020037a301068480800000000220000420000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402001791c3a89499000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000106848380002f43042000042103422e81068480820000000fff80000
| infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
| Sending packet: $m10684838,4#73...Packet received: 4ba1db21
| Sending packet: $vCont;s:8e#8f...infrun: prepare_to_wait
| Packet received: T2f01:3a55b910;40:10a187f4;thread:91;
| infrun: target_wait (-1, status) =
| infrun:   42000 [Thread 145],
| infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_35
| infrun: infwait_normal_state
| infrun: TARGET_WAITKIND_STOPPED
| infrun: stop_pc = 0x10a187f4
| infrun: context switch
| infrun: Switching context from Thread 142 to Thread 145
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| Sending packet: $g#67...Packet received: 000000c33a55b9102003b21020ed76b03a55b8d800000000000001fe000000010000000120ed76b0100703ac00000000280000020000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402000e6883a55b9100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010a187f40000f030280000081099d6dc109db33400000000fff80000
| infrun: random signal (GDB_SIGNAL_REALTIME_35)
| Sending packet: $T8e#f1...Packet received: OK
| infrun: switching back to stepped thread
| infrun: Switching context from Thread 145 to Thread 142
| Sending packet: $Hg8e#4c...Packet received: OK
| Sending packet: $g#67...Packet received: 103422e83a8948e02003b21000000000000000000000000900000008000000090000000020037a301068480800000000220000420000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402001791c3a89499000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000106848380002f43042000042103422e81068480820000000fff80000
| infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
| Sending packet: $m10684838,4#73...Packet received: 4ba1db21
| Sending packet: $vCont;s:8e#8f...infrun: prepare_to_wait
| Packet received: T3001:3a65e910;40:10a187f4;thread:90;
| infrun: target_wait (-1, status) =
| infrun:   42000 [Thread 144],
| infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_36
| infrun: infwait_normal_state
| infrun: TARGET_WAITKIND_STOPPED
| infrun: stop_pc = 0x10a187f4
| infrun: context switch
| infrun: Switching context from Thread 142 to Thread 144
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| Sending packet: $g#67...Packet received: 000000c33a65e9102003b21020ed76b83a65e8d820f44dcc00000001000000020000000220ed76b800000060000016e020ef70900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402000e6883a65e910408206d1cf98259e4081f6d1cf98259e000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004081f6d1cf98259e00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010a187f40000f030230000481099d6dc109db33400000000fff80000
| infrun: random signal (GDB_SIGNAL_REALTIME_36)
| Sending packet: $T8e#f1...Packet received: OK
| infrun: switching back to stepped thread
| infrun: Switching context from Thread 144 to Thread 142
| Sending packet: $Hg8e#4c...Packet received: OK
| Sending packet: $g#67...Packet received: 103422e83a8948e02003b21000000000000000000000000900000008000000090000000020037a301068480800000000220000420000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402001791c3a89499000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000106848380002f43042000042103422e81068480820000000fff80000
| infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
| Sending packet: $m10684838,4#73...Packet received: 4ba1db21
| Sending packet: $vCont;s:8e#8f...infrun: prepare_to_wait
| Packet received: T3101:3a791910;40:10a187f4;thread:8f;
| infrun: target_wait (-1, status) =
| infrun:   42000 [Thread 143],
| infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_37
| infrun: infwait_normal_state
| infrun: TARGET_WAITKIND_STOPPED
| infrun: stop_pc = 0x10a187f4
| infrun: context switch
| infrun: Switching context from Thread 142 to Thread 143
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| Sending packet: $g#67...Packet received: 000000c33a7919102003b21020ed76c03a7918d800000000000002123a7919905448524420ed76c0b07da7b020f07728200000040000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402000e6883a7919100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010a187f40000f030280000081099d6dc109db33400000000fff80000
| infrun: random signal (GDB_SIGNAL_REALTIME_37)
| Sending packet: $T8e#f1...Packet received: OK
| infrun: switching back to stepped thread
| infrun: Switching context from Thread 143 to Thread 142
| Sending packet: $Hg8e#4c...Packet received: OK
| Sending packet: $g#67...Packet received: 103422e83a8948e02003b21000000000000000000000000900000008000000090000000020037a301068480800000000220000420000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402001791c3a89499000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000106848380002f43042000042103422e81068480820000000fff80000
| infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
| Sending packet: $m10684838,4#73...Packet received: 4ba1db21
| Sending packet: $vCont;s:8e#8f...infrun: prepare_to_wait
| Packet received: T2e01:3a440910;40:10a187f4;thread:92;
| infrun: target_wait (-1, status) =
| infrun:   42000 [Thread 146],
| infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
| infrun: infwait_normal_state
| infrun: TARGET_WAITKIND_STOPPED
| infrun: stop_pc = 0x10a187f4
| infrun: context switch
| infrun: Switching context from Thread 142 to Thread 146
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.
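
For readers following the packet traces, the stop replies above can be decoded by hand. The sketch below is only an illustration, not GDB or GDBserver code: the struct and function names are hypothetical, and it assumes that register number 0x40 in these replies carries the PC, which matches the stop_pc values printed by infrun in this log.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the fields of interest; not a GDB type.  */
struct stop_reply
{
  int signal;            /* the two hex digits that follow 'T' */
  unsigned long thread;  /* value of the "thread:" pair (hex) */
  unsigned long pc;      /* value of register 0x40, assumed to be the PC */
};

/* Parse a 'T' stop reply such as
   "T2e01:3a440910;40:10a187f4;thread:92;".
   Return 0 on success, -1 if PKT is not a well-formed T packet.  */
int
parse_stop_reply (const char *pkt, struct stop_reply *out)
{
  unsigned int sig;
  const char *p;

  assert (pkt != NULL && out != NULL);

  if (strlen (pkt) < 3 || pkt[0] != 'T'
      || sscanf (pkt + 1, "%2x", &sig) != 1)
    return -1;

  out->signal = (int) sig;
  out->thread = 0;
  out->pc = 0;

  /* The remainder is a list of "name:value;" pairs.  */
  p = pkt + 3;
  while (*p != '\0')
    {
      const char *colon = strchr (p, ':');
      const char *semi = colon != NULL ? strchr (colon, ';') : NULL;

      if (colon == NULL || semi == NULL)
        break;

      if (colon - p == 6 && strncmp (p, "thread", 6) == 0)
        out->thread = strtoul (colon + 1, NULL, 16);
      else if (colon - p == 2 && strncmp (p, "40", 2) == 0)
        out->pc = strtoul (colon + 1, NULL, 16);

      p = semi + 1;
    }

  return 0;
}
```

Applied to the first and last replies in the trace, this yields signal 0x2e, thread 0x92 (146) and PC 0x10a187f4 both times, which is the "same thread, same PC" situation that trips the assertion.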

> >     infrun: wait_for_inferior ()
> >     infrun: target_wait (-1, status) =
> >     infrun:   42000 [Thread 146],
> >     infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
> 
> So how come we see an event for thread 146?  That thread shouldn't
> have been resumed, so GDB shouldn't be getting an event for it.
> 
> This is sounding like a bug in the target.

I thought about this too, and there might be a ptrace request
I can use to absolutely limit the resumption to the one thread.
I say "might" because only testing will show if the request is
supported, and works, on all versions of LynxOS.

But I have always been reluctant to do so for 2 reasons [1]:
  - It affects the program's scheduling;
  - Can the program lock up if we're trying to single-step
    a thread that's blocked?

Also, what made me consider this change independently of the questions
above is that it seems to me that the situation we are facing here
can be easily handled. So, to avoid headaches from other "buggy"
targets, containing this situation seemed friendlier. Don't we also
have other targets that don't have the capability to resume one single
thread?

-- 
Joel

[1]: I realize that this opens the door for other threads executing
     this instruction without triggering a breakpoint. I can't
     explain why I am more concerned by scheduling interference
     than the probability of missing a breakpoint. I may bite the
     bullet at some point...


* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
  2014-11-20 17:11     ` Joel Brobecker
@ 2014-11-21 10:43       ` Pedro Alves
  2014-12-13 15:46         ` Joel Brobecker
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Alves @ 2014-11-21 10:43 UTC
  To: Joel Brobecker; +Cc: gdb-patches

On 11/20/2014 05:11 PM, Joel Brobecker wrote:
> Hi Pedro,
> 
>>>     infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0)
>>>     infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
>>
>> trap_expected=1 indicates that GDB is about to step thread 142 _only_, leaving
>> everything else stopped.  Can you enable "set debug remote 1" as well?
> 
> Correct (we are single-stepping out of a breakpoint).
> 
> Here is the output with remote debugging:

> | Sending packet: $vCont;s:8e#8f...infrun: wait_for_inferior ()

Alright, GDB really did resume only thread 0x8e/142.
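
As a quick cross-check of that reading, here is a minimal sketch (hypothetical helper names, not GDB code) that recomputes the packet checksum and extracts the thread id from the vCont payload. It relies on the remote protocol's framing, where the two hex digits after '#' are the modulo-256 sum of the payload bytes, and on thread ids being written in hex.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Modulo-256 sum of the payload bytes, i.e. the value that follows
   '#' in a "$payload#xx" packet.  */
unsigned char
rsp_checksum (const char *payload)
{
  unsigned char sum = 0;

  assert (payload != NULL);
  while (*payload != '\0')
    sum = (unsigned char) (sum + (unsigned char) *payload++);
  return sum;
}

/* Return the thread id named by a "vCont;s:<hex-tid>" payload, or -1
   if PAYLOAD is not a single-step request of that exact shape.  */
long
vcont_step_thread (const char *payload)
{
  static const char prefix[] = "vCont;s:";

  assert (payload != NULL);
  if (strncmp (payload, prefix, sizeof prefix - 1) != 0)
    return -1;
  return strtol (payload + sizeof prefix - 1, NULL, 16);
}
```

For the payload "vCont;s:8e" the checksum comes out as 0x8f, matching the trace, and the thread id is 0x8e, that is 142. The packet carries no default action for the other threads, so nothing in it asks the target to resume them.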

>> So how come we see an event for thread 146?  That thread shouldn't
>> have been resumed, so GDB shouldn't be getting an event for it.
>>
>> This is sounding like a bug in the target.
> 
> I thought about this too, and there might be a ptrace request
> I can use to absolutely limit the resumption to the one thread.
> I say "might" because only testing will show if the request is
> supported, and works, on all versions of LynxOS.

I had a feeling we had discussed this before...  See:

 https://sourceware.org/ml/gdb-patches/2013-05/msg00436.html

The (very) old gdb/lynx-nat.c code in GDB used to do this, so it
should work.  Could you try it?  We're going to keep hitting
all sorts of issues until this is finally done.

> 
> But I have always been reluctant to do so for 2 reasons [1]:
>   - It affects the program's scheduling;

That's hardly an issue, when the program had just completely
stopped for a breakpoint.  :-)

>   - Can the program lock up if we're trying to single-step
>     a thread that's blocked?

The thread just hit a breakpoint, so it was not blocked in the sense
of the kernel not allowing its scheduling before.

The main issue is that we're trying to move the thread past a
breakpoint.  Barring displaced stepping support, to move the
thread past the breakpoint, we have to remove the breakpoint from
the target temporarily.  But then we _cannot_ resume other threads
but the one that is stopped at the breakpoint, because then those
other threads could fly by the removed breakpoint and miss it.

Regarding lock up, the only issue I see is if the instruction the breakpoint
was put on is a syscall instruction that calls into the kernel and that
could block.  That's a corner case that we e.g., never found the need to
handle on Linux.  Syscalls tend to be wrapped in libc functions, so users
don't normally put breakpoints on syscall instructions.  But still, there
would be ways to handle it.  E.g., when stepping, ask the kernel to report
syscall entry, and if a syscall entry is detected, we know the instruction
has executed, so we can reinsert breakpoints, and resume execution of all
threads again.  Similarly to how we always want to be notified of
signals when we step.  (From infrun.c:

 "If we have removed breakpoints because we are stepping over one (in any
 thread), we need to receive all signals to avoid accidentally skipping
 a breakpoint during execution of a signal handler.")

> Also, what made me consider this change independently of the questions
> above is that it seems to me that the situation we are facing here
> can be easily handled. So, to avoid headaches from other "buggy"
> targets, containing this situation seemed friendlier. Don't we also
> have other targets that don't have the capability to resume one single
> thread?

I honestly hope not.  Resuming only a particular thread is a very
basic debug API feature.

Thanks,
Pedro Alves


* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
  2014-11-21 10:43       ` Pedro Alves
@ 2014-12-13 15:46         ` Joel Brobecker
  2014-12-15 13:11           ` Pedro Alves
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Brobecker @ 2014-12-13 15:46 UTC
  To: Pedro Alves; +Cc: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 1970 bytes --]

Hi Pedro,

> The main issue is that we're trying to move the thread past a
> breakpoint.  Barring displaced stepping support, to move the
> thread past the breakpoint, we have to remove the breakpoint from
> the target temporarily.  But then we _cannot_ resume other threads
> but the one that is stopped at the breakpoint, because then those
> other threads could fly by the removed breakpoint and miss it.

Attached is a patch that does just that, tested on ppc-lynx5 and
ppc-lynx178.  I waited a while before posting it here, because
I wanted to put it in observation for a while first...

gdb/gdbserver/ChangeLog:

        * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
        Remove FIXME comment about assumption about N.

OK to commit?

Note that parallel to that, I came across another issue, which I am
going to call a limitation for now: consider the case where we have
2 threads, A and B, and we are trying to next/step some code in thread
A. While doing so, thread B receives a signal, and therefore reports
it to GDB. GDB sees that this signal is configured as
nostop/noprint/pass, so presumably, you would think that we'd resume
the inferior passing that signal to thread B. However, how do you do
that while at the same time stepping thread A?

IIRC, what happens currently in this case is that GDB keeps trying
to resume/step thread A, and the kernel keeps telling GDB "no,
thread B just received a signal", and so GDB and the kernel go
into that infinite loop where nothing advances. I'm not quite sure
why we keep getting the signal for thread B, if it's a new signal
each time, or if it's about the signal not being passed back (the
program I saw this in is fairly large and complicated).

In any case, I don't see how we could improve this situation
without setting sss-like breakpoints... Something I'm not really
eager to do, at least for now, since "set scheduler-locking step"
seems to work around the issue.

Thanks!
-- 
Joel

[-- Attachment #2: 0001-gdbserver-lynxos-Use-PTRACE_SINGLESTEP_ONE-when-sing.patch --]
[-- Type: text/x-diff, Size: 3831 bytes --]

From ea7e173463120d24417a7706f98fff850f9aaa1a Mon Sep 17 00:00:00 2001
From: Joel Brobecker <brobecker@adacore.com>
Date: Tue, 25 Nov 2014 11:12:10 -0500
Subject: [PATCH] [gdbserver/lynxos] Use PTRACE_SINGLESTEP_ONE when
 single-stepping one thread.

Currently, when we receive a request to single-step one single thread
(e.g., when single-stepping out of a breakpoint), we use the
PTRACE_SINGLESTEP ptrace request, which does single-step
the corresponding thread, but also resumes execution of all
other threads in the inferior.

This causes problems when debugging programs where another thread
receives multiple debug events while trying to single-step a specific
thread out of a breakpoint (with infrun traces turned on):

    (gdb) continue
    Continuing.
    infrun: clear_proceed_status_thread (Thread 126)
    [...]
    infrun: clear_proceed_status_thread (Thread 142)
    [...]
    infrun: clear_proceed_status_thread (Thread 146)
    infrun: clear_proceed_status_thread (Thread 125)
    infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0)
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
    infrun: wait_for_inferior ()
    infrun: target_wait (-1, status) =
    infrun:   42000 [Thread 146],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
    infrun: infwait_normal_state
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x10a187f4
    infrun: context switch
    infrun: Switching context from Thread 142 to Thread 146
    infrun: random signal (GDB_SIGNAL_REALTIME_34)
    infrun: switching back to stepped thread
    infrun: Switching context from Thread 146 to Thread 142
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
    infrun: prepare_to_wait
    [...handling of similar events for threads 145, 144 and 143 snipped...]
    infrun: prepare_to_wait
    infrun: target_wait (-1, status) =
    infrun:   42000 [Thread 146],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
    infrun: infwait_normal_state
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x10a187f4
    infrun: context switch
    infrun: Switching context from Thread 142 to Thread 146
    ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.

What happens is that GDB keeps sending requests to resume one specific
thread, and keeps receiving debugging events for other threads.
Things break down when one of the other threads receives a debug
event for the second time (thread 146 in the example above).

This patch fixes the problem by making sure that only one thread
gets resumed, thus preventing the other threads from generating
an unexpected event.

gdb/gdbserver/ChangeLog:

        * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
        Remove FIXME comment about assumption about N.
---
 gdb/gdbserver/lynx-low.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gdb/gdbserver/lynx-low.c b/gdb/gdbserver/lynx-low.c
index 6178e03..3b83669 100644
--- a/gdb/gdbserver/lynx-low.c
+++ b/gdb/gdbserver/lynx-low.c
@@ -320,10 +320,11 @@ lynx_attach (unsigned long pid)
 static void
 lynx_resume (struct thread_resume *resume_info, size_t n)
 {
-  /* FIXME: Assume for now that n == 1.  */
   ptid_t ptid = resume_info[0].thread;
-  const int request = (resume_info[0].kind == resume_step
-                       ? PTRACE_SINGLESTEP : PTRACE_CONT);
+  const int request
+    = (resume_info[0].kind == resume_step
+       ? (n == 1 ? PTRACE_SINGLESTEP_ONE : PTRACE_SINGLESTEP)
+       : PTRACE_CONT);
   const int signal = resume_info[0].sig;

   /* If given a minus_one_ptid, then try using the current_process'
-- 
1.9.1


* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
  2014-12-13 15:46         ` Joel Brobecker
@ 2014-12-15 13:11           ` Pedro Alves
  2014-12-15 14:58             ` Joel Brobecker
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Alves @ 2014-12-15 13:11 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches

On 12/13/2014 03:46 PM, Joel Brobecker wrote:
> Hi Pedro,
> 
>> The main issue is that we're trying to move the thread past a
>> breakpoint.  Barring displaced stepping support, to move the
>> thread past the breakpoint, we have to remove the breakpoint from
>> the target temporarily.  But then we _cannot_ resume other threads
>> but the one that is stopped at the breakpoint, because then those
>> other threads could fly by the removed breakpoint and miss it.
> 
> Attached is a patch that does just that, tested on ppc-lynx5 and
> ppc-lynx178.  I waited a while before posting it here, because
> I wanted to keep it under observation for a while first...
> 
> gdb/gdbserver/ChangeLog:
> 
>         * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
>         Remove FIXME comment about assumption about N.
> 
> OK to commit?

Sure, OK.

> 
> Note that parallel to that, I came across another issue, which I am
> going to call a limitation for now: consider the case where we have
> 2 threads, A and B, and we are trying to next/step some code in thread
> A. While doing so, thread B receives a signal, and therefore reports
> it to GDB. GDB sees that this signal is configured as
> nostop/noprint/pass, so presumably, you would think that we'd resume
> the inferior passing that signal to thread B. However, how do you do
> that while at the same time stepping thread A?

GDB nowadays sends a single vCont packet that both steps thread A,
continues thread B with a signal and continues all other threads with
no signal (previously in some cases it'd just lose control of the
inferior, or deliver the signal to the wrong thread).  Something like:

  vCont;s:A;C SIG:B;c

See the switch_back_to_stepped_thread calls within:

  if (random_signal)
    {

at the tail end of handle_signal_stop, and
remote.c:append_pending_thread_resumptions.

There are tests in the testsuite that result in packets
just like that.
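
To make the shape of such a packet concrete, here is a minimal sketch
that formats one from a list of per-thread actions.  It is only an
illustration and not the code in remote.c: the names thread_action,
ALL_OTHER_THREADS and format_vcont are hypothetical, thread ids are
written as plain hex numbers (no multiprocess "pPID.TID" form), and
the signal number used in the example is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types for this sketch; not GDB's own.  */
enum action_kind { ACTION_STEP, ACTION_CONT };

#define ALL_OTHER_THREADS (-1L)

struct thread_action
{
  enum action_kind kind;
  int sig;    /* Signal to deliver on resume, or 0 for none.  */
  long tid;   /* Thread id, or ALL_OTHER_THREADS for the default action.  */
};

/* Write a vCont packet body for the N entries of ACTIONS into BUF
   (SIZE bytes).  Return the packet length, or -1 if BUF is too small.  */
int
format_vcont (char *buf, size_t size,
              const struct thread_action *actions, size_t n)
{
  static const char prefix[] = "vCont";
  size_t used = strlen (prefix);
  size_t i;

  assert (buf != NULL && actions != NULL);
  if (used >= size)
    return -1;
  memcpy (buf, prefix, used + 1);

  for (i = 0; i < n; i++)
    {
      const struct thread_action *a = &actions[i];
      int stepping = (a->kind == ACTION_STEP);
      int len;

      /* Lowercase action when no signal is passed; uppercase action
         followed by two hex digits when a signal is delivered.  */
      if (a->sig != 0)
        len = snprintf (buf + used, size - used, ";%c%02x",
                        stepping ? 'S' : 'C', (unsigned int) a->sig);
      else
        len = snprintf (buf + used, size - used, ";%c",
                        stepping ? 's' : 'c');
      if (len < 0 || (size_t) len >= size - used)
        return -1;
      used += (size_t) len;

      /* An action without a thread id applies to all remaining threads.  */
      if (a->tid != ALL_OTHER_THREADS)
        {
          len = snprintf (buf + used, size - used, ":%lx",
                          (unsigned long) a->tid);
          if (len < 0 || (size_t) len >= size - used)
            return -1;
          used += (size_t) len;
        }
    }

  return (int) used;
}
```

Stepping thread 142 (0x8e), continuing thread 146 (0x92) with signal
number 0x1e, and continuing everything else would come out of this
sketch as "vCont;s:8e;C1e:92;c".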

> 
> IIRC, what happens currently in this case is that GDB keeps trying
> to resume/step thread A, and the kernel keeps telling GDB "no,
> thread B just received a signal", and so GDB and the kernel go
> into that infinite loop where nothing advances. I'm not quite sure
> why we keep getting the signal for thread B, if it's a new signal
> each time, or if it's about the signal not being passed back (the
> program I saw this in is fairly large and complicated).
> 
> In any case, I don't see how we could improve this situation
> without setting sss-like breakpoints... Something I'm not really
> eager to do, at least for now, since "set scheduler-locking step"
> seems to work around the issue.

Couldn't you iterate over the threads, and use PTRACE_STEP_ONE
for the stepped threads, and PTRACE_CONT_ONE for the others,
instead of PTRACE_CONT ?  For the case above, lynx_resume would
end up issuing:

 PTRACE_STEP_ONE, thread A, sig 0
 PTRACE_CONT_ONE, thread B, sig SIG
 PTRACE_CONT_ONE, thread C, sig 0
 PTRACE_CONT_ONE, thread D, sig 0
 ...

Otherwise, yeah, sounds like handling the step request with
breakpoints instead might be the solution.
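
A rough sketch of that per-thread loop, under loud assumptions: the
request constants below are stand-ins for the per-thread requests
named above, the actual ptrace call is hidden behind a callback since
I can't vouch for the LynxOS signature, and every name here
(resume_action, resume_each_thread, record_request, ...) is
hypothetical.  It is not lynx-low.c code, and it says nothing about
whether LynxOS accepts requests while other threads are already
running.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the per-thread requests; the real LynxOS constants
   and ptrace signature are deliberately not reproduced here.  */
enum request { REQ_STEP_ONE, REQ_CONT_ONE };
enum resume_kind { KIND_STEP, KIND_CONTINUE };

#define ANY_THREAD (-1L)

struct resume_action
{
  long tid;               /* Thread id, or ANY_THREAD as a wildcard.  */
  enum resume_kind kind;
  int sig;                /* Signal to pass on resume, or 0.  */
};

/* Issues one request; a real target would call ptrace here.  */
typedef int (*issue_fn) (enum request req, long tid, int sig, void *ctx);

/* Return the action for TID: an exact match wins, otherwise the first
   wildcard entry, otherwise NULL (the thread is left stopped).  */
static const struct resume_action *
find_action (const struct resume_action *actions, size_t n, long tid)
{
  const struct resume_action *fallback = NULL;
  size_t i;

  for (i = 0; i < n; i++)
    {
      if (actions[i].tid == tid)
        return &actions[i];
      if (actions[i].tid == ANY_THREAD && fallback == NULL)
        fallback = &actions[i];
    }
  return fallback;
}

/* Issue one request per thread in TIDS.  Return the number of requests
   issued, or -1 as soon as one of them fails.  */
int
resume_each_thread (const long *tids, size_t n_tids,
                    const struct resume_action *actions, size_t n_actions,
                    issue_fn issue, void *ctx)
{
  int issued = 0;
  size_t i;

  for (i = 0; i < n_tids; i++)
    {
      const struct resume_action *a
        = find_action (actions, n_actions, tids[i]);
      enum request req;

      if (a == NULL)
        continue;
      req = (a->kind == KIND_STEP ? REQ_STEP_ONE : REQ_CONT_ONE);
      if (issue (req, tids[i], a->sig, ctx) != 0)
        return -1;
      issued++;
    }
  return issued;
}

/* Mock issuer that only records what would have been sent.  */
#define MAX_LOGGED 16

struct logged_request { enum request req; long tid; int sig; };
struct request_log { struct logged_request entries[MAX_LOGGED]; size_t count; };

int
record_request (enum request req, long tid, int sig, void *ctx)
{
  struct request_log *log = ctx;

  if (log->count >= MAX_LOGGED)
    return -1;
  log->entries[log->count].req = req;
  log->entries[log->count].tid = tid;
  log->entries[log->count].sig = sig;
  log->count++;
  return 0;
}
```

The exact-match-before-wildcard lookup mirrors how the vCont actions
above read: explicit entries for threads A and B, then a default for
everything else.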

Thanks,
Pedro Alves


* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
  2014-12-15 13:11           ` Pedro Alves
@ 2014-12-15 14:58             ` Joel Brobecker
  2014-12-15 16:01               ` Pedro Alves
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Brobecker @ 2014-12-15 14:58 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

> > gdb/gdbserver/ChangeLog:
> > 
> >         * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
> >         Remove FIXME comment about assumption about N.
> > 
> > OK to commit?
> 
> Sure, OK.

Thank you, pushed!

> GDB nowadays sends a single vCont packet that both steps thread A,
> continues thread B with a signal and continues all other threads with
> no signal (previously in some cases it'd just lose control of the
> inferior, or deliver the signal to the wrong thread).  Something like:
> 
>   vCont;s:A;C SIG:B;c
[...]
> Couldn't you iterate over the threads, and use PTRACE_STEP_ONE
> for the stepped threads, and PTRACE_CONT_ONE for the others,
> instead of PTRACE_CONT ?  For the case above, lynx_resume would
> end up issuing:
> 
>  PTRACE_STEP_ONE, thread A, sig 0
>  PTRACE_CONT_ONE, thread B, sig SIG
>  PTRACE_CONT_ONE, thread C, sig 0
>  PTRACE_CONT_ONE, thread D, sig 0

Interesting. Do you mean sending those requests without waiting
for the inferior to stop? I'd have to verify that it's possible
to send ptrace requests while the inferior is "in flight", but
wouldn't you then have possible race conditions?

-- 
Joel


* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
  2014-12-15 14:58             ` Joel Brobecker
@ 2014-12-15 16:01               ` Pedro Alves
  0 siblings, 0 replies; 9+ messages in thread
From: Pedro Alves @ 2014-12-15 16:01 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches

On 12/15/2014 02:58 PM, Joel Brobecker wrote:

>> GDB nowadays sends a single vCont packet that both steps thread A,
>> continues thread B with a signal and continues all other threads with
>> no signal (previously in some cases it'd just lose control of the
>> inferior, or deliver the signal to the wrong thread).  Something like:
>>
>>   vCont;s:A;C SIG:B;c
> [...]
>> Couldn't you iterate over the threads, and use PTRACE_STEP_ONE
>> for the stepped threads, and PTRACE_CONT_ONE for the others,
>> instead of PTRACE_CONT ?  For the case above, lynx_resume would
>> end up issuing:
>>
>>  PTRACE_STEP_ONE, thread A, sig 0
>>  PTRACE_CONT_ONE, thread B, sig SIG
>>  PTRACE_CONT_ONE, thread C, sig 0
>>  PTRACE_CONT_ONE, thread D, sig 0
> 
> Interesting. Do you mean sending those requests without waiting
> for the inferior to stop?

Yes.  This is what we do e.g., on Linux.  It just sounds like
Lynx's PTRACE_CONT_ONE is like Linux's PTRACE_CONT.  Linux has
no equivalent of Lynx's PTRACE_CONT (resume all threads with
a single request).

> I'd have to verify that it's possible
> to send ptrace requests while the inferior is "in flight", but
> wouldn't you then have possible race conditions?

Not sure what sort of race conditions you mean, but keep in mind
that I'm pretty clueless about Lynx.  :-)

Thanks,
Pedro Alves


end of thread, other threads:[~2014-12-15 16:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-20  5:11 RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS Joel Brobecker
2014-11-20  5:12 ` Joel Brobecker
2014-11-20  9:55   ` Pedro Alves
2014-11-20 17:11     ` Joel Brobecker
2014-11-21 10:43       ` Pedro Alves
2014-12-13 15:46         ` Joel Brobecker
2014-12-15 13:11           ` Pedro Alves
2014-12-15 14:58             ` Joel Brobecker
2014-12-15 16:01               ` Pedro Alves
