Moribund breakpoints and hardware single-step

public inbox for gdb@sourceware.org

* Moribund breakpoints and hardware single-step
From: Frederic Riss @ 2011-04-28 16:27 UTC
  To: gdb

Hi,

I just debugged a very interesting problem in the moribund breakpoints
machinery. First, a caveat: I'm working on sources that are roughly two
months old. I haven't had time to upgrade, but from looking at the
diff, the current GDB master should exhibit the same behavior.

The target is in async + non-stop mode and uses displaced stepping.
When stepping into a function, infrun.c:handle_step_into_function()
inserts a step_resume breakpoint at the end of the prologue and
resumes execution. When the breakpoint is hit, it is removed from the
target and from the breakpoint list, and it is kept on the moribund
breakpoints list for a while. The current PC now points at the
location of the moribund breakpoint, and we try to step further: GDB
asks the target to step one instruction and gets control back.
Currently, if the size of that instruction equals
decr_pc_after_break(), infrun.c:adjust_pc_after_break() concludes
that the target hit the moribund breakpoint and resets the PC to the
breakpoint address. The same instruction is thus executed again and
again until the breakpoint is finally retired from the moribund list.
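
A toy model of the address arithmetic shows why the two kinds of stop look alike. It is not GDB code: DECR_PC_AFTER_BREAK, the helper names and the addresses are illustrative assumptions, using x86-style trap semantics where decr_pc_after_break() is 1. After a clean hardware step of an instruction exactly decr_pc_after_break() bytes long, the candidate breakpoint address computed from the stop PC is the very address the step started from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-style traps, where the PC reported after an int3 is
   one byte past the breakpoint, so decr_pc_after_break is 1.  */
#define DECR_PC_AFTER_BREAK 1u

/* Address that would be blamed for the stop if it were a breakpoint hit.  */
static uint64_t candidate_breakpoint_pc (uint64_t stop_pc)
{
  return stop_pc - DECR_PC_AFTER_BREAK;
}

/* PC reported after a clean hardware single-step of a non-branching
   instruction of INSN_LEN bytes starting at PREV_PC.  */
static uint64_t pc_after_hw_step (uint64_t prev_pc, unsigned insn_len)
{
  assert (insn_len > 0);
  return prev_pc + insn_len;
}

/* True when a completed step cannot be told apart, by addresses alone,
   from "the thread trapped on a breakpoint at PREV_PC".  */
static bool stop_looks_like_breakpoint (uint64_t prev_pc, unsigned insn_len)
{
  uint64_t stop_pc = pc_after_hw_step (prev_pc, insn_len);
  return candidate_breakpoint_pc (stop_pc) == prev_pc;
}
```

With these definitions, stop_looks_like_breakpoint (0x1000, 1) is true and stop_looks_like_breakpoint (0x1000, 3) is false, matching the observation that only instructions of exactly that size are affected.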

The issue is quite serious because it changes the inferior's behavior.
It goes unnoticed if executing the instruction repeatedly has the same
effect as executing it once, but $r0 = $r0 + 1 becomes
$r0 = $r0 + 3 * (thread_count () + 1).
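
The multiplier can be reproduced with a small simulation. It assumes, as the formula above implies, that a moribund location survives 3 * (thread_count () + 1) stop events and that every bogus rewind costs one such event; simulate_step, the one-register inferior and the address are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DECR_PC_AFTER_BREAK 1u  /* assumption: x86-style, 1-byte int3 */

/* Step once over a 1-byte "r0 = r0 + 1" instruction that sits on a
   moribund breakpoint location and return the final r0.
   BACK_UP_ON_MORIBUND selects the original (unpatched) decision.  */
static int simulate_step (int thread_count, bool back_up_on_moribund)
{
  const uint64_t bp_addr = 0x1000;           /* illustrative address */
  int events_left = 3 * (thread_count + 1);  /* moribund lifetime */
  uint64_t pc = bp_addr;
  int r0 = 0;

  for (;;)
    {
      uint64_t prev_pc = pc;

      /* Hardware single-step: the instruction really executes.  */
      r0 = r0 + 1;
      pc = prev_pc + 1;

      /* Each stop event ages the moribund location.  */
      events_left--;
      bool moribund_here = events_left > 0
                           && pc - DECR_PC_AFTER_BREAK == bp_addr;

      if (back_up_on_moribund && moribund_here
          && prev_pc == pc - DECR_PC_AFTER_BREAK)
        pc = pc - DECR_PC_AFTER_BREAK;  /* bogus rewind: run it again */
      else
        break;                          /* the step really completes */
    }
  return r0;
}
```

simulate_step (1, true) returns 6, that is 3 * (1 + 1), while simulate_step (1, false), which skips the rewind for a location that is no longer inserted, returns 1.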

The comment in adjust_pc_after_break reads:

      /* When using hardware single-step, a SIGTRAP is reported for both
	 a completed single-step and a software breakpoint.  Need to
	 differentiate between the two, as the latter needs adjusting
	 but the former does not.

	 The SIGTRAP can be due to a completed hardware single-step only if
	  - we didn't insert software single-step breakpoints
	  - the thread to be examined is still the current thread
	  - this thread is currently being stepped

	 If any of these events did not occur, we must have stopped due
	 to hitting a software breakpoint, and have to back up to the
	 breakpoint address.

	 As a special case, we could have hardware single-stepped a
	 software breakpoint.  In this case (prev_pc == breakpoint_pc),
	 we also need to back up to the breakpoint address.  */

It's the last special case here that bites. I 'fixed' that in my tree
with the following simple patch:

@@ -2941,7 +2884,8 @@ adjust_pc_after_break (struct execution_control_state *ecs)
       if (singlestep_breakpoints_inserted_p
          || !ptid_equal (ecs->ptid, inferior_ptid)
          || !currently_stepping (ecs->event_thread)
-         || ecs->event_thread->prev_pc == breakpoint_pc)
+         || (software_breakpoint_inserted_here_p (aspace, breakpoint_pc)
+             && ecs->event_thread->prev_pc == breakpoint_pc))
        regcache_write_pc (regcache, breakpoint_pc);

       if (RECORD_IS_USED)
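
The difference between the original condition and the patched one can be written as two pure predicates over a flattened view of their inputs. This is only a sketch: struct stop_view and its fields are hypothetical stand-ins for the GDB calls named in the comments, and the check is assumed to be reached only once a breakpoint (possibly a moribund one, per the description above) has been found at breakpoint_pc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flattened view of what the check consults; field names are
   illustrative stand-ins for the GDB calls noted on each line.  */
struct stop_view
{
  bool singlestep_bps_inserted;  /* singlestep_breakpoints_inserted_p */
  bool is_current_thread;        /* ptid_equal (ecs->ptid, inferior_ptid) */
  bool thread_stepping;          /* currently_stepping (ecs->event_thread) */
  bool sw_bp_inserted_here;      /* software_breakpoint_inserted_here_p () */
  uint64_t prev_pc;              /* ecs->event_thread->prev_pc */
  uint64_t breakpoint_pc;        /* stop PC - decr_pc_after_break */
};

/* Original rule: any of these means "back up to breakpoint_pc".  */
static bool back_up_old (const struct stop_view *v)
{
  return v->singlestep_bps_inserted
         || !v->is_current_thread
         || !v->thread_stepping
         || v->prev_pc == v->breakpoint_pc;
}

/* Patched rule: the prev_pc special case only applies when a software
   breakpoint is actually still inserted at breakpoint_pc.  */
static bool back_up_patched (const struct stop_view *v)
{
  return v->singlestep_bps_inserted
         || !v->is_current_thread
         || !v->thread_stepping
         || (v->sw_bp_inserted_here && v->prev_pc == v->breakpoint_pc);
}
```

For a completed 1-byte step off a moribund location (current thread, stepping, nothing inserted, prev_pc == breakpoint_pc), back_up_old returns true and back_up_patched returns false. When a software breakpoint really is inserted there, both return true, so the special case described in the comment is preserved.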

The patch relies on the assumption that we will never hardware
single-step a moribund breakpoint. However, I'm not sure this
assumption always holds, and I'm a bit nervous that other cases might
lead to the same kind of behavior. What do you think?

As an aside, why do we use a step-resume breakpoint when stepping into
a function? In these days of massive multi-threading, wouldn't it be
much better to just change the thread's stepping range, so that other
threads cannot hit the temporary breakpoint?
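
To make the question concrete, here is a sketch of the two mechanisms as the question frames them; the struct and function names are hypothetical and do not correspond to GDB's own data structures. A breakpoint is a property of an address, so any thread that executes it traps, whereas a stepping range is consulted only for the thread that owns it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread stepping state: keep single-stepping while
   the PC stays inside [range_start, range_end).  */
struct thread_step
{
  bool stepping;
  uint64_t range_start;
  uint64_t range_end;
};

/* Breakpoint approach: the trap belongs to the address, so every thread
   that executes it stops, whether or not it is the one being stepped.  */
static bool stops_at_breakpoint (uint64_t pc, uint64_t bp_addr)
{
  return pc == bp_addr;
}

/* Range approach: only the stepping thread consults its own range;
   other threads run through the same code undisturbed.  */
static bool keeps_stepping (const struct thread_step *t, uint64_t pc)
{
  return t->stepping && pc >= t->range_start && pc < t->range_end;
}
```

Under these assumptions the trade-off is that a range covering the prologue costs the stepping thread one single-step stop per prologue instruction instead of a single trap, but it cannot disturb any other thread.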

Regards,
Fred


* Re: Moribund breakpoints and hardware single-step
From: Frederic Riss @ 2011-05-02  7:31 UTC
  To: gdb

On 28 April 2011 18:26, Frederic Riss <frederic.riss@gmail.com> wrote:
> Hi,
>
> I just debugged a very interesting problem in the moribund breakpoints
> machinery. First, a caveat: I'm working on sources that are roughly two
> months old. I haven't had time to upgrade, but from looking at the
> diff, the current GDB master should exhibit the same behavior.

For the record, I reproduced this issue on HEAD on x86_64. It's
quite easy to reproduce, as every 1-byte instruction with cumulative
side effects (e.g. a push) is a candidate reproducer:

-------------------8<-----------------------------------8<----------------------------
$ gcc -g -fno-omit-frame-pointer ../gdb/testsuite/gdb.base/recurse.c
$ ./gdb/gdbserver/gdbserver :10000 ./a.out &
[1] 30543
Process ./a.out created; pid = 30548
Listening on port 10000
$ ./gdb/gdb a.out --silent
Reading symbols from /tmp/gdb/build/a.out...done.
(gdb) set target-async
(gdb) set non-stop
(gdb) tar extended-remote :10000
Remote debugging using :10000
Remote debugging from host 127.0.0.1
[New Thread 30548.30548]
(gdb)
[Thread 30548.30548] #1 stopped.
0x0000003eac400b20 in ?? ()
tb *main
Temporary breakpoint 1 at 0x4004b9: file ../gdb/testsuite/gdb.base/recurse.c, line 24.
(gdb) c
Continuing.

Temporary breakpoint 1, main () at ../gdb/testsuite/gdb.base/recurse.c:24
24	{
(gdb) s
main () at ../gdb/testsuite/gdb.base/recurse.c:29
29	  recurse (10);
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
-------------------8<-----------------------------------8<----------------------------
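
The SIGSEGV at address 0 at the end of the transcript is what one would expect if main's 1-byte prologue push ran more than once. The toy stack model below rests on stated assumptions: a `push %rbp; mov %rsp,%rbp` prologue, a stack-balanced body, a `leave; ret` style epilogue, and a caller whose %rbp happened to be 0 (the transcript does not show this). STACK_TOP, slot and return_target are hypothetical helpers. Once the push has run more than once, `ret` pops a stale copy of the caller's %rbp instead of the return address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP 0x7000u  /* illustrative address just above the stack */
#define SLOTS 16

static uint64_t mem[SLOTS];

/* Map an 8-byte-aligned stack address below STACK_TOP to its slot.  */
static uint64_t *slot (uint64_t addr)
{
  uint64_t index = (STACK_TOP - addr) / 8 - 1;
  assert (index < SLOTS);
  return &mem[index];
}

/* Address that main's final `ret` jumps to if the 1-byte `push %rbp`
   at its entry was executed PUSH_COUNT times instead of once.  */
static uint64_t return_target (int push_count, uint64_t caller_rbp,
                               uint64_t return_addr)
{
  uint64_t rsp = STACK_TOP;
  uint64_t rbp = caller_rbp;

  rsp -= 8;                  /* call main: push the return address */
  *slot (rsp) = return_addr;

  for (int i = 0; i < push_count; i++)
    {
      rsp -= 8;              /* push %rbp, replayed by the rewinds */
      *slot (rsp) = rbp;
    }

  rbp = rsp;                 /* mov %rsp,%rbp */

  /* ... body of main, assumed to leave the stack balanced ...  */

  rsp = rbp;                 /* leave: mov %rbp,%rsp ...  */
  rbp = *slot (rsp);         /* ... then pop %rbp */
  rsp += 8;

  return *slot (rsp);        /* ret: pops whatever sits there */
}
```

With a single thread, the formula from the original message gives 3 * (1 + 1) = 6 executions; return_target (6, 0, ra) returns 0 for any ra, while return_target (1, 0, ra) returns ra.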

What happened is that the push instruction at the start of main was
executed several times by the 'step' command because of the behavior
described below, which corrupted the stack. I kept the full
description below:

> The target is in async + non-stop mode and uses displaced stepping.
> When stepping into a function, infrun.c:handle_step_into_function()
> inserts a step_resume breakpoint at the end of the prologue and
> resumes execution. When the breakpoint is hit, it is removed from the
> target and from the breakpoint list, and it is kept on the moribund
> breakpoints list for a while. The current PC now points at the
> location of the moribund breakpoint, and we try to step further: GDB
> asks the target to step one instruction and gets control back.
> Currently, if the size of that instruction equals
> decr_pc_after_break(), infrun.c:adjust_pc_after_break() concludes
> that the target hit the moribund breakpoint and resets the PC to the
> breakpoint address. The same instruction is thus executed again and
> again until the breakpoint is finally retired from the moribund list.
>
> The issue is quite serious because it changes the inferior's behavior.
> It goes unnoticed if executing the instruction repeatedly has the same
> effect as executing it once, but $r0 = $r0 + 1 becomes
> $r0 = $r0 + 3 * (thread_count () + 1).
>
> The comment in adjust_pc_after_break reads:
>
>      /* When using hardware single-step, a SIGTRAP is reported for both
>         a completed single-step and a software breakpoint.  Need to
>         differentiate between the two, as the latter needs adjusting
>         but the former does not.
>
>         The SIGTRAP can be due to a completed hardware single-step only if
>          - we didn't insert software single-step breakpoints
>          - the thread to be examined is still the current thread
>          - this thread is currently being stepped
>
>         If any of these events did not occur, we must have stopped due
>         to hitting a software breakpoint, and have to back up to the
>         breakpoint address.
>
>         As a special case, we could have hardware single-stepped a
>         software breakpoint.  In this case (prev_pc == breakpoint_pc),
>         we also need to back up to the breakpoint address.  */
>
> It's the last special case here that bites. I 'fixed' that in my tree
> with the following simple patch:
>
> @@ -2941,7 +2884,8 @@ adjust_pc_after_break (struct execution_control_state *ecs)
>       if (singlestep_breakpoints_inserted_p
>          || !ptid_equal (ecs->ptid, inferior_ptid)
>          || !currently_stepping (ecs->event_thread)
> -         || ecs->event_thread->prev_pc == breakpoint_pc)
> +         || (software_breakpoint_inserted_here_p (aspace, breakpoint_pc)
> +             && ecs->event_thread->prev_pc == breakpoint_pc))
>        regcache_write_pc (regcache, breakpoint_pc);
>
>       if (RECORD_IS_USED)
>
> The patch relies on the assumption that we will never hardware
> single-step a moribund breakpoint. However, I'm not sure this
> assumption always holds, and I'm a bit nervous that other cases might
> lead to the same kind of behavior. What do you think?
>
> As an aside, why do we use a step-resume breakpoint when stepping into
> a function? In these days of massive multi-threading, wouldn't it be
> much better to just change the thread's stepping range, so that other
> threads cannot hit the temporary breakpoint?
>
> Regards,
> Fred
>
