public inbox for gdb-patches@sourceware.org
* [PATCH] Honour SIGILL and SIGSEGV in cancel breakpoint
@ 2014-09-14 12:11 Yao Qi
  2014-09-16 12:13 ` Pedro Alves
  0 siblings, 1 reply; 7+ messages in thread
From: Yao Qi @ 2014-09-14 12:11 UTC
  To: gdb-patches

I see the following failure when testing on arm-none-linux-gnueabi:

(gdb) continue^M
Continuing.^M
^M
Program received signal SIGILL, Illegal instruction.^M
[Switching to Thread 1003]^M
handler (signo=10) at
/scratch/yqi/arm-none-linux-gnueabi/src/gdb-trunk/gdb/testsuite/gdb.threads/sigstep-threads.c:33^M
33        tgkill (getpid (), gettid (), SIGUSR1);       /* step-2 */^M
(gdb) FAIL: gdb.threads/sigstep-threads.exp: continue

The cause is that GDBserver doesn't cancel the breakpoint if the stop
signal is SIGILL.  The kernel used here is a little old, 2.6.x, and
doesn't translate SIGILL to SIGTRAP when the program hits a breakpoint
instruction (which is actually an illegal instruction).  GDB and
GDBserver can translate SIGILL to SIGTRAP under certain circumstances,
so that by itself is not a problem.  See gdbserver/linux-low.c:linux_wait_1:

  /* If this event was not handled before, and is not a SIGTRAP, we
     report it.  SIGILL and SIGSEGV are also treated as traps in case
     a breakpoint is inserted at the current PC.  If this target does
     not support internal breakpoints at all, we also report the
     SIGTRAP without further processing; it's of no concern to us.  */
  maybe_internal_trap
    = (supports_breakpoints ()
       && (WSTOPSIG (w) == SIGTRAP
	   || ((WSTOPSIG (w) == SIGILL
		|| WSTOPSIG (w) == SIGSEGV)
	       && (*the_low_target.breakpoint_at) (event_child->stop_pc))));

However, SIGILL and SIGSEGV are not considered when cancelling
breakpoints, which causes the failure above.  That is, when GDB is doing
a software single step at address ADDR, both thread A and thread B hit the
software single-step breakpoint and get SIGILL.  GDB selects the event
from thread A, removes the software single-step breakpoint, and resumes
the program.  The event (SIGILL) from thread B is then reported to GDB, but
GDB doesn't regard this SIGILL as a SIGTRAP, because the breakpoint at
address ADDR has been removed, so GDB reports "Program received signal
SIGILL".

The patch allows calling cancel_breakpoint if the signal is
SIGILL or SIGSEGV.  This fixes the failure above.
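
To make the race concrete, here is a minimal, self-contained sketch of
the idea.  It is not the GDBserver code: struct toy_lwp and the toy_*
functions are hypothetical stand-ins, toy_stop_status assumes the Linux
wait-status encoding, and the real cancel_breakpoint does more than
drop the pending status.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* Hypothetical stand-in for GDBserver's struct lwp_info; only the
   fields this sketch needs.  */
struct toy_lwp
{
  bool status_pending_p;   /* is an unreported event queued?  */
  int status_pending;      /* wait status, as waitpid would return it  */
  unsigned long stop_pc;   /* PC at which the thread stopped  */
};

/* Build a "stopped by SIG" wait status.  Assumption: the Linux
   encoding, where the low byte is 0x7f and the next byte holds the
   signal number.  */
static int
toy_stop_status (int sig)
{
  assert (sig > 0 && sig < 128);
  return (sig << 8) | 0x7f;
}

/* Return true if the pending event may have been caused by a
   breakpoint: SIGTRAP, or SIGILL/SIGSEGV on kernels that do not
   translate the breakpoint instruction's signal.  */
static bool
toy_maybe_breakpoint (const struct toy_lwp *lp)
{
  if (!lp->status_pending_p || !WIFSTOPPED (lp->status_pending))
    return false;

  switch (WSTOPSIG (lp->status_pending))
    {
    case SIGTRAP:
    case SIGILL:
    case SIGSEGV:
      return true;
    default:
      return false;
    }
}

/* Discard LP's pending event if it looks like a hit of the breakpoint
   at BP_ADDR, so it is never reported after the breakpoint has been
   removed.  Returns true if the event was cancelled.  */
static bool
toy_cancel_breakpoint (struct toy_lwp *lp, unsigned long bp_addr)
{
  if (!toy_maybe_breakpoint (lp) || lp->stop_pc != bp_addr)
    return false;

  lp->status_pending_p = false;
  return true;
}
```

With this model, thread B's pending SIGILL at ADDR is cancelled while
the breakpoint is still known, whereas an unrelated signal such as
SIGUSR1 stays pending and is reported normally.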

Regression tested on arm-none-linux-gnueabi and x86_64-linux.

gdb/gdbserver:

2014-09-12  Yao Qi  <yao@codesourcery.com>

	* linux-low.c (cancel_breakpoints_callback): Allow calling
	cancel_breakpoint if stop signal is SIGILL or SIGSEGV.
	(linux_low_filter_event): Likewise.
---
 gdb/gdbserver/linux-low.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index ec3260e..e2c9814 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -1936,7 +1936,12 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat)
 		 the core before this one is handled.  All-stop always
 		 cancels breakpoint hits in all threads.  */
 	      if (non_stop
-		  && WSTOPSIG (wstat) == SIGTRAP
+		  && (WSTOPSIG (wstat) == SIGTRAP
+		      /* SIGILL and SIGSEGV are also treated as traps in
+			 case a breakpoint is inserted at the current PC,
+			 which is checked in cancel_breakpoints below.  */
+		      || WSTOPSIG (wstat) == SIGILL
+		      || WSTOPSIG (wstat) == SIGSEGV)
 		  && cancel_breakpoint (child))
 		{
 		  /* Throw away the SIGTRAP.  */
@@ -2273,7 +2278,12 @@ cancel_breakpoints_callback (struct inferior_list_entry *entry, void *data)
       && thread->last_status.kind == TARGET_WAITKIND_IGNORE
       && lp->status_pending_p
       && WIFSTOPPED (lp->status_pending)
-      && WSTOPSIG (lp->status_pending) == SIGTRAP
+      && (WSTOPSIG (lp->status_pending) == SIGTRAP
+	  /* SIGILL and SIGSEGV are also treated as traps in case a
+	     breakpoint is inserted at the current PC, which is checked
+	     in cancel_breakpoints below.  */
+	  || WSTOPSIG (lp->status_pending) == SIGILL
+	  || WSTOPSIG (lp->status_pending) == SIGSEGV)
       && !lp->stepping
       && !lp->stopped_by_watchpoint
       && cancel_breakpoint (lp))
-- 
1.9.3


* Re: [PATCH] Honour SIGILL and SIGSEGV in cancel breakpoint
  2014-09-14 12:11 [PATCH] Honour SIGILL and SIGSEGV in cancel breakpoint Yao Qi
@ 2014-09-16 12:13 ` Pedro Alves
  2014-09-18  2:34   ` Yao Qi
  0 siblings, 1 reply; 7+ messages in thread
From: Pedro Alves @ 2014-09-16 12:13 UTC
  To: Yao Qi, gdb-patches

On 09/14/2014 01:06 PM, Yao Qi wrote:
> I see the following failure when testing on arm-none-linux-gnueabi:
> 
> (gdb) continue^M
> Continuing.^M
> ^M
> Program received signal SIGILL, Illegal instruction.^M
> [Switching to Thread 1003]^M
> handler (signo=10) at
> /scratch/yqi/arm-none-linux-gnueabi/src/gdb-trunk/gdb/testsuite/gdb.threads/sigstep-threads.c:33^M
> 33        tgkill (getpid (), gettid (), SIGUSR1);       /* step-2 */^M
> (gdb) FAIL: gdb.threads/sigstep-threads.exp: continue
> 
> The cause is that GDBserver doesn't cancel the breakpoint if the stop
> signal is SIGILL.  The kernel used here is a little old, 2.6.x, and
> doesn't translate SIGILL to SIGTRAP when the program hits a breakpoint
> instruction (which is actually an illegal instruction).  GDB and
> GDBserver can translate SIGILL to SIGTRAP under certain circumstances,
> so that by itself is not a problem.  See gdbserver/linux-low.c:linux_wait_1:
> 
>   /* If this event was not handled before, and is not a SIGTRAP, we
>      report it.  SIGILL and SIGSEGV are also treated as traps in case
>      a breakpoint is inserted at the current PC.  If this target does
>      not support internal breakpoints at all, we also report the
>      SIGTRAP without further processing; it's of no concern to us.  */
>   maybe_internal_trap
>     = (supports_breakpoints ()
>        && (WSTOPSIG (w) == SIGTRAP
> 	   || ((WSTOPSIG (w) == SIGILL
> 		|| WSTOPSIG (w) == SIGSEGV)
> 	       && (*the_low_target.breakpoint_at) (event_child->stop_pc))));
> 
> However, SIGILL and SIGSEGV are not considered when cancelling
> breakpoints, which causes the failure above.  That is, when GDB is doing
> a software single step at address ADDR, both thread A and thread B hit the
> software single-step breakpoint and get SIGILL.  GDB selects the event
> from thread A, removes the software single-step breakpoint, and resumes
> the program.  The event (SIGILL) from thread B is then reported to GDB, but
> GDB doesn't regard this SIGILL as a SIGTRAP, because the breakpoint at
> address ADDR has been removed, so GDB reports "Program received signal
> SIGILL".
> 
> The patch allows calling cancel_breakpoint if the signal is
> SIGILL or SIGSEGV.  This fixes the failure above.
> 
> Regression tested on arm-none-linux-gnueabi and x86_64-linux.
> 
> gdb/gdbserver:
> 
> 2014-09-12  Yao Qi  <yao@codesourcery.com>
> 
> 	* linux-low.c (cancel_breakpoints_callback): Allow calling
> 	cancel_breakpoint if stop signal is SIGILL or SIGSEGV.
> 	(linux_low_filter_event): Likewise.
> ---
>  gdb/gdbserver/linux-low.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
> index ec3260e..e2c9814 100644
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -1936,7 +1936,12 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat)
>  		 the core before this one is handled.  All-stop always
>  		 cancels breakpoint hits in all threads.  */
>  	      if (non_stop
> -		  && WSTOPSIG (wstat) == SIGTRAP
> +		  && (WSTOPSIG (wstat) == SIGTRAP
> +		      /* SIGILL and SIGSEGV are also treated as traps in
> +			 case a breakpoint is inserted at the current PC,
> +			 which is checked in cancel_breakpoints below.  */
> +		      || WSTOPSIG (wstat) == SIGILL
> +		      || WSTOPSIG (wstat) == SIGSEGV)
>  		  && cancel_breakpoint (child))
>  		{
>  		  /* Throw away the SIGTRAP.  */
> @@ -2273,7 +2278,12 @@ cancel_breakpoints_callback (struct inferior_list_entry *entry, void *data)
>        && thread->last_status.kind == TARGET_WAITKIND_IGNORE
>        && lp->status_pending_p
>        && WIFSTOPPED (lp->status_pending)
> -      && WSTOPSIG (lp->status_pending) == SIGTRAP
> +      && (WSTOPSIG (lp->status_pending) == SIGTRAP
> +	  /* SIGILL and SIGSEGV are also treated as traps in case a
> +	     breakpoint is inserted at the current PC, which is checked
> +	     in cancel_breakpoints below.  */
> +	  || WSTOPSIG (lp->status_pending) == SIGILL
> +	  || WSTOPSIG (lp->status_pending) == SIGSEGV)
>        && !lp->stepping
>        && !lp->stopped_by_watchpoint
>        && cancel_breakpoint (lp))
> 

Instead of duplicating the code and comments, please factor out
the SIGTRAP+SIGILL+SIGSEGV checks into a helper function.  On the GDB side,
we have linux_nat_lp_status_is_event, and we see that it's also used
in count_events_callback (which gdbserver also has), which makes
sense, as it's counting threads that had breakpoint SIGTRAP-ish
events (though I'm not sure why we only consider breakpoints when
selecting the event lwp).

Thanks,
Pedro Alves


* Re: [PATCH] Honour SIGILL and SIGSEGV in cancel breakpoint
  2014-09-16 12:13 ` Pedro Alves
@ 2014-09-18  2:34   ` Yao Qi
  2014-09-19 17:04     ` Pedro Alves
  0 siblings, 1 reply; 7+ messages in thread
From: Yao Qi @ 2014-09-18  2:34 UTC
  To: Pedro Alves; +Cc: gdb-patches

Pedro Alves <palves@redhat.com> writes:

> Instead of duplicating the code and comments, please factor out
> the SIGTRAP+SIGILL+SIGSEGV checks into a helper function.  On the GDB side,
> we have linux_nat_lp_status_is_event, and we see that it's also used
> in count_events_callback (which gdbserver also has), which makes
> sense, as it's counting threads that had breakpoint SIGTRAP-ish
> events (though I'm not sure why we only consider breakpoints when
> selecting the event lwp).

I took a look at linux_nat_lp_status_is_event and the email discussion
about adding it <https://sourceware.org/ml/gdb-patches/2010-07/msg00275.html>,
and added a new function, lp_status_is_sigtrap_like_event.  I don't use
the same name because I feel linux_nat_lp_status_is_event isn't clear
enough.  Secondly, I don't check "waitstatus.kind == TARGET_WAITKIND_IGNORE"
inside lp_status_is_sigtrap_like_event, because IMO that check was
used in linux_nat_lp_status_is_event due to the lack of an lp->status_p
flag, as its comments describe.  GDBserver, however, has the
status_pending_p flag, so the "waitstatus.kind == TARGET_WAITKIND_IGNORE"
check is not needed.

count_events_callback and select_event_lwp_callback in GDBserver need to
honour SIGILL and SIGSEGV too.  I wrote a patch to call
lp_status_is_sigtrap_like_event in them, but the regression test results
didn't change, which surprised me; I expected some failures to be
fixed.  I'll look into it more deeply.

Here is the updated patch to fix the issue we've seen with cancelling
breakpoints.

-- 
Yao (齐尧)

From: Yao Qi <yao@codesourcery.com>
Subject: [PATCH] Honour SIGILL and SIGSEGV in cancel breakpoint

I see the following failure when testing on arm-none-linux-gnueabi:

(gdb) continue^M
Continuing.^M
^M
Program received signal SIGILL, Illegal instruction.^M
[Switching to Thread 1003]^M
handler (signo=10) at
/scratch/yqi/arm-none-linux-gnueabi/src/gdb-trunk/gdb/testsuite/gdb.threads/sigstep-threads.c:33^M
33        tgkill (getpid (), gettid (), SIGUSR1);       /* step-2 */^M
(gdb) FAIL: gdb.threads/sigstep-threads.exp: continue

The cause is that GDBserver doesn't cancel the breakpoint if the stop
signal is SIGILL.  The kernel used here is a little old, 2.6.x, and
doesn't translate SIGILL to SIGTRAP when the program hits a breakpoint
instruction (which is actually an illegal instruction).  GDB and
GDBserver can translate SIGILL to SIGTRAP under certain circumstances,
so that by itself is not a problem.  See gdbserver/linux-low.c:linux_wait_1:

  /* If this event was not handled before, and is not a SIGTRAP, we
     report it.  SIGILL and SIGSEGV are also treated as traps in case
     a breakpoint is inserted at the current PC.  If this target does
     not support internal breakpoints at all, we also report the
     SIGTRAP without further processing; it's of no concern to us.  */
  maybe_internal_trap
    = (supports_breakpoints ()
       && (WSTOPSIG (w) == SIGTRAP
	   || ((WSTOPSIG (w) == SIGILL
		|| WSTOPSIG (w) == SIGSEGV)
	       && (*the_low_target.breakpoint_at) (event_child->stop_pc))));

However, SIGILL and SIGSEGV are not considered when cancelling
breakpoints, which causes the failure above.  That is, when GDB is doing
a software single step at address ADDR, both thread A and thread B hit the
software single-step breakpoint and get SIGILL.  GDB selects the event
from thread A, removes the software single-step breakpoint, and resumes
the program.  The event (SIGILL) from thread B is then reported to GDB, but
GDB doesn't regard this SIGILL as a SIGTRAP, because the breakpoint at
address ADDR has been removed, so GDB reports "Program received signal
SIGILL".

The patch allows calling cancel_breakpoint if the signal is
SIGILL or SIGSEGV.  This fixes the failure above.

gdb/gdbserver:

2014-09-18  Yao Qi  <yao@codesourcery.com>

	* linux-low.c (lp_status_is_sigtrap_like_event): New function.
	(cancel_breakpoints_callback): Call
	lp_status_is_sigtrap_like_event.
	(linux_low_filter_event): Likewise.

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index ec3260e..9c9a303 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -1739,6 +1739,20 @@ cancel_breakpoint (struct lwp_info *lwp)
   return 0;
 }
 
+/* Check for SIGTRAP-like events in LP.  */
+
+static int
+lp_status_is_sigtrap_like_event (struct lwp_info *lp)
+{
+  return (lp->status_pending_p
+	  && WIFSTOPPED (lp->status_pending)
+	  && (WSTOPSIG (lp->status_pending) == SIGTRAP
+	      /* SIGILL and SIGSEGV are also treated as traps in case a
+		 breakpoint is inserted at the current PC.  */
+	      || WSTOPSIG (lp->status_pending) == SIGILL
+	      || WSTOPSIG (lp->status_pending) == SIGSEGV));
+}
+
 /* Do low-level handling of the event, and check if we should go on
    and pass it to caller code.  Return the affected lwp if we are, or
    NULL otherwise.  */
@@ -1936,7 +1950,7 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat)
 		 the core before this one is handled.  All-stop always
 		 cancels breakpoint hits in all threads.  */
 	      if (non_stop
-		  && WSTOPSIG (wstat) == SIGTRAP
+		  && lp_status_is_sigtrap_like_event (child)
 		  && cancel_breakpoint (child))
 		{
 		  /* Throw away the SIGTRAP.  */
@@ -2271,9 +2285,7 @@ cancel_breakpoints_callback (struct inferior_list_entry *entry, void *data)
 
   if (thread->last_resume_kind != resume_stop
       && thread->last_status.kind == TARGET_WAITKIND_IGNORE
-      && lp->status_pending_p
-      && WIFSTOPPED (lp->status_pending)
-      && WSTOPSIG (lp->status_pending) == SIGTRAP
+      && lp_status_is_sigtrap_like_event (lp)
       && !lp->stepping
       && !lp->stopped_by_watchpoint
       && cancel_breakpoint (lp))


* Re: [PATCH] Honour SIGILL and SIGSEGV in cancel breakpoint
  2014-09-18  2:34   ` Yao Qi
@ 2014-09-19 17:04     ` Pedro Alves
  2014-09-23  8:47       ` Yao Qi
  0 siblings, 1 reply; 7+ messages in thread
From: Pedro Alves @ 2014-09-19 17:04 UTC
  To: Yao Qi; +Cc: gdb-patches

On 09/18/2014 03:30 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>> Instead of duplicating the code and comments, please factor out
>> the SIGTRAP+SIGILL+SIGSEGV checks into a helper function.  On the GDB side,
>> we have linux_nat_lp_status_is_event, and we see that it's also used
>> in count_events_callback (which gdbserver also has), which makes
>> sense, as it's counting threads that had breakpoint SIGTRAP-ish
>> events (though I'm not sure why we only consider breakpoints when
>> selecting the event lwp).
> 
> I take a look at linux_nat_lp_status_is_event and email discussions on
> adding this function <https://sourceware.org/ml/gdb-patches/2010-07/msg00275.html>,
> a new function lp_status_is_sigtrap_like_event is added.  

I think something with "breakpoint" in the name,
like lp_status_maybe_breakpoint would be even clearer.  The event is
SIGTRAP-like only in the sense that it may signal a breakpoint like
SIGTRAP does.  A SIGILL is not sigtrap-like for single-steps, for example.

> I don't use
> the same name because I feel linux_nat_lp_status_is_event isn't clear
> enough.  Secondly, I don't use "waitstatus.kind == TARGET_WAITKIND_IGNORE"
> condition check inside lp_status_is_sigtrap_like_event, because IMO it
> was used in linux_nat_lp_status_is_event due to lack of lp->status_p
> flag, as the comments described.  However, in GDBserver, we have
> status_pending_p flag, so "waitstatus.kind == TARGET_WAITKIND_IGNORE" is
> not needed.
> 
> count_events_callback and select_event_lwp_callback in GDBServer need to
> honour SIGILL and SIGSEGV too.  I write a patch to call
> lp_status_is_sigtrap_like_event in them, but regression test result
> isn't changed, which is a surprise to me.  I thought some fails should
> be fixed.  I'll look into it deeply.

Maybe you're getting lucky with scheduling.
pthreads.exp and schedlock.exp I think are the most sensitive to this.

See:
 https://www.sourceware.org/ml/gdb-patches/2001-06/msg00250.html

Thanks,
Pedro Alves


* Re: [PATCH] Honour SIGILL and SIGSEGV in cancel breakpoint
  2014-09-19 17:04     ` Pedro Alves
@ 2014-09-23  8:47       ` Yao Qi
  2014-09-23  9:58         ` Pedro Alves
  0 siblings, 1 reply; 7+ messages in thread
From: Yao Qi @ 2014-09-23  8:47 UTC
  To: Pedro Alves; +Cc: gdb-patches

Pedro Alves <palves@redhat.com> writes:

>> count_events_callback and select_event_lwp_callback in GDBServer need to
>> honour SIGILL and SIGSEGV too.  I write a patch to call
>> lp_status_is_sigtrap_like_event in them, but regression test result
>> isn't changed, which is a surprise to me.  I thought some fails should
>> be fixed.  I'll look into it deeply.
>
> Maybe you're getting lucky with scheduling.
> pthreads.exp and schedlock.exp I think are the most sensitive to this.

I ran them ten times; the results didn't change.

>
> See:
>  https://www.sourceware.org/ml/gdb-patches/2001-06/msg00250.html

Random selection of the event lwp was added in the message at the URL
you gave, to prevent starvation of threads.  However, in my configuration
(arm-linux with SIGILL), event lwp selection does nothing, yet no test
failures result.  GDBserver processes events like this:

 1. GDBserver gets a breakpoint event from waitpid (-1, ).
 2. GDBserver calls stop_all_lwps, in which wait_for_sigstop drains
 all pending reports from the kernel.
 3. GDBserver selects one lwp and cancels the breakpoint on the rest.  If
 event lwp selection does nothing, the selected lwp is the one GDBserver
 got in step 1.
 4. GDBserver steps over the breakpoint and resumes all the threads,
 then goes back to step 1 and waits until any thread hits a breakpoint.

As we can see, if waitpid (-1, ) in step 1 returns the event lwp randomly,
we don't have to randomly select the event lwp again in step 3.  IMO,
which thread hits the breakpoint first in a multi-threaded program is
naturally random.  That is why no test failures are caused
without event lwp selection in my experiments.  IOW, on a platform
where waitpid (-1, ) returns the event lwp randomly, we don't need such
random lwp selection at all.  However, if the kernel's waitpid
implementation always iterates over the list of children in a fixed
order, it is possible that the event of the lwp at the front of the list
is always reported and the remaining lwps are starved.  In this case, we
still have to rely on random selection inside GDB/GDBserver to avoid
starvation.
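
The selection in step 3 can be sketched as a count pass followed by a
pick pass.  This is only an illustration of the technique under
simplifying assumptions: struct toy_thread and the toy_* functions are
hypothetical names, not the GDBserver callbacks, and rand () with a
modulo is used for brevity even though it is slightly biased.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified per-thread record.  */
struct toy_thread
{
  int lwpid;
  bool breakpoint_event_pending;
};

/* First pass: count the threads that have a breakpoint-like event
   pending.  */
static size_t
toy_count_events (const struct toy_thread *threads, size_t n)
{
  size_t count = 0;

  for (size_t i = 0; i < n; i++)
    if (threads[i].breakpoint_event_pending)
      count++;
  return count;
}

/* Second pass: return the SELECTOR-th (0-based) thread with a pending
   event, or NULL if there are not that many.  */
static const struct toy_thread *
toy_select_nth_event (const struct toy_thread *threads, size_t n,
		      size_t selector)
{
  for (size_t i = 0; i < n; i++)
    if (threads[i].breakpoint_event_pending && selector-- == 0)
      return &threads[i];
  return NULL;
}

/* Pick one of the pending events at random, so that no thread's event
   is systematically preferred.  Returns NULL if nothing is pending.  */
static const struct toy_thread *
toy_select_event_lwp (const struct toy_thread *threads, size_t n)
{
  size_t num_events = toy_count_events (threads, n);

  if (num_events == 0)
    return NULL;

  size_t pick = (size_t) rand () % num_events;
  return toy_select_nth_event (threads, n, pick);
}
```

The events that are not picked are the ones that then need to be
cancelled, which is why both passes have to agree on what counts as a
breakpoint-like event.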

The patch below is updated to call lp_status_maybe_breakpoint in both
breakpoint cancellation and event lwp selection.

-- 
Yao (齐尧)

Subject: [PATCH] Honour SIGILL and SIGSEGV in cancel breakpoint and event lwp selection

I see the following failure when testing on arm-none-linux-gnueabi:

(gdb) continue^M
Continuing.^M
^M
Program received signal SIGILL, Illegal instruction.^M
[Switching to Thread 1003]^M
handler (signo=10) at
/scratch/yqi/arm-none-linux-gnueabi/src/gdb-trunk/gdb/testsuite/gdb.threads/sigstep-threads.c:33^M
33        tgkill (getpid (), gettid (), SIGUSR1);       /* step-2 */^M
(gdb) FAIL: gdb.threads/sigstep-threads.exp: continue

The cause is that GDBserver doesn't cancel the breakpoint if the stop
signal is SIGILL.  The kernel used here is a little old, 2.6.x, and
doesn't translate SIGILL to SIGTRAP when the program hits a breakpoint
instruction (which is actually an illegal instruction).  GDB and
GDBserver can translate SIGILL to SIGTRAP under certain circumstances,
so that by itself is not a problem.  See gdbserver/linux-low.c:linux_wait_1:

  /* If this event was not handled before, and is not a SIGTRAP, we
     report it.  SIGILL and SIGSEGV are also treated as traps in case
     a breakpoint is inserted at the current PC.  If this target does
     not support internal breakpoints at all, we also report the
     SIGTRAP without further processing; it's of no concern to us.  */
  maybe_internal_trap
    = (supports_breakpoints ()
       && (WSTOPSIG (w) == SIGTRAP
	   || ((WSTOPSIG (w) == SIGILL
		|| WSTOPSIG (w) == SIGSEGV)
	       && (*the_low_target.breakpoint_at) (event_child->stop_pc))));

However, SIGILL and SIGSEGV are not considered when cancelling
breakpoints, which causes the failure above.  That is, when GDB is doing
a software single step at address ADDR, both thread A and thread B hit the
software single-step breakpoint and get SIGILL.  GDB selects the event
from thread A, removes the software single-step breakpoint, and resumes
the program.  The event (SIGILL) from thread B is then reported to GDB, but
GDB doesn't regard this SIGILL as a SIGTRAP, because the breakpoint at
address ADDR has been removed, so GDB reports "Program received signal
SIGILL".

The patch allows calling cancel_breakpoint if the signal is
SIGILL or SIGSEGV.  This fixes the failure above.  Likewise, event
lwp selection should honour SIGILL and SIGSEGV too.

gdb/gdbserver:

2014-09-23  Yao Qi  <yao@codesourcery.com>

	* linux-low.c (lp_status_maybe_breakpoint): New function.
	(linux_low_filter_event): Call lp_status_maybe_breakpoint.
	(count_events_callback): Likewise.
	(select_event_lwp_callback): Likewise.
	(cancel_breakpoints_callback): Likewise.

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 705edde..2f860e9 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -1739,6 +1739,20 @@ cancel_breakpoint (struct lwp_info *lwp)
   return 0;
 }
 
+/* Return true if the event in LP may be caused by breakpoint.  */
+
+static int
+lp_status_maybe_breakpoint (struct lwp_info *lp)
+{
+  return (lp->status_pending_p
+	  && WIFSTOPPED (lp->status_pending)
+	  && (WSTOPSIG (lp->status_pending) == SIGTRAP
+	      /* SIGILL and SIGSEGV are also treated as traps in case a
+		 breakpoint is inserted at the current PC.  */
+	      || WSTOPSIG (lp->status_pending) == SIGILL
+	      || WSTOPSIG (lp->status_pending) == SIGSEGV));
+}
+
 /* Do low-level handling of the event, and check if we should go on
    and pass it to caller code.  Return the affected lwp if we are, or
    NULL otherwise.  */
@@ -1936,7 +1950,7 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat)
 		 the core before this one is handled.  All-stop always
 		 cancels breakpoint hits in all threads.  */
 	      if (non_stop
-		  && WSTOPSIG (wstat) == SIGTRAP
+		  && lp_status_maybe_breakpoint (child)
 		  && cancel_breakpoint (child))
 		{
 		  /* Throw away the SIGTRAP.  */
@@ -2197,9 +2211,7 @@ count_events_callback (struct inferior_list_entry *entry, void *data)
      should be reported to GDB.  */
   if (thread->last_status.kind == TARGET_WAITKIND_IGNORE
       && thread->last_resume_kind != resume_stop
-      && lp->status_pending_p
-      && WIFSTOPPED (lp->status_pending)
-      && WSTOPSIG (lp->status_pending) == SIGTRAP
+      && lp_status_maybe_breakpoint (lp)
       && !breakpoint_inserted_here (lp->stop_pc))
     (*count)++;
 
@@ -2237,9 +2249,7 @@ select_event_lwp_callback (struct inferior_list_entry *entry, void *data)
   /* Select only resumed LWPs that have a SIGTRAP event pending. */
   if (thread->last_resume_kind != resume_stop
       && thread->last_status.kind == TARGET_WAITKIND_IGNORE
-      && lp->status_pending_p
-      && WIFSTOPPED (lp->status_pending)
-      && WSTOPSIG (lp->status_pending) == SIGTRAP
+      && lp_status_maybe_breakpoint (lp)
       && !breakpoint_inserted_here (lp->stop_pc))
     if ((*selector)-- == 0)
       return 1;
@@ -2271,9 +2281,7 @@ cancel_breakpoints_callback (struct inferior_list_entry *entry, void *data)
 
   if (thread->last_resume_kind != resume_stop
       && thread->last_status.kind == TARGET_WAITKIND_IGNORE
-      && lp->status_pending_p
-      && WIFSTOPPED (lp->status_pending)
-      && WSTOPSIG (lp->status_pending) == SIGTRAP
+      && lp_status_maybe_breakpoint (lp)
       && !lp->stepping
       && !lp->stopped_by_watchpoint
       && cancel_breakpoint (lp))


* Re: [PATCH] Honour SIGILL and SIGSEGV in cancel breakpoint
  2014-09-23  8:47       ` Yao Qi
@ 2014-09-23  9:58         ` Pedro Alves
  2014-09-23 12:55           ` Yao Qi
  0 siblings, 1 reply; 7+ messages in thread
From: Pedro Alves @ 2014-09-23  9:58 UTC
  To: Yao Qi; +Cc: gdb-patches

On 09/23/2014 09:42 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>>> count_events_callback and select_event_lwp_callback in GDBServer need to
>>> honour SIGILL and SIGSEGV too.  I write a patch to call
>>> lp_status_is_sigtrap_like_event in them, but regression test result
>>> isn't changed, which is a surprise to me.  I thought some fails should
>>> be fixed.  I'll look into it deeply.
>>
>> Maybe you're getting lucky with scheduling.
>> pthreads.exp and schedlock.exp I think are the most sensitive to this.
> 
> I run them ten times, the results aren't changed.
> 
>>
>> See:
>>  https://www.sourceware.org/ml/gdb-patches/2001-06/msg00250.html
> 
> Random selection of the event lwp was added in the message at the URL
> you gave, to prevent starvation of threads.  However, in my configuration
> (arm-linux with SIGILL), event lwp selection does nothing, yet no test
> failures result.  GDBserver processes events like this:
> 
>  1. GDBserver gets a breakpoint event from waitpid (-1, ).
>  2. GDBserver calls stop_all_lwps, in which wait_for_sigstop drains
>  all pending reports from the kernel.
>  3. GDBserver selects one lwp and cancels the breakpoint on the rest.  If
>  event lwp selection does nothing, the selected lwp is the one GDBserver
>  got in step 1.
>  4. GDBserver steps over the breakpoint and resumes all the threads,
>  then goes back to step 1 and waits until any thread hits a breakpoint.
> 
> As we can see, if waitpid (-1, ) in step 1 returns the event lwp randomly,
> we don't have to randomly select the event lwp again in step 3.  IMO,
> which thread hits the breakpoint first in a multi-threaded program is
> naturally random.

Depends on scheduling.  When the program is resumed, the thread that had
last hit the breakpoint may manage to be scheduled before other threads
manage to be scheduled and hit a breakpoint themselves.

> That is why no test failures are caused
> without event lwp selection in my experiments.  IOW, on a platform
> where waitpid (-1, ) returns the event lwp randomly, we don't need such
> random lwp selection at all.  However, if the kernel's waitpid
> implementation always iterates over the list of children in a fixed
> order, it is possible that the event of the lwp at the front of the list
> is always reported and the remaining lwps are starved.  In this case, we
> still have to rely on random selection inside GDB/GDBserver to avoid
> starvation.

I'm looking at kernel/exit.c on the Linux kernel's sources I have handy
(14186fea0cb06bc43181ce239efe0df6f1af260a), specifically at
do_wait() / do_wait_thread() / ptrace_do_wait() and it seems to me
that waitpid always walks the task list in the same order:

	set_current_state(TASK_INTERRUPTIBLE);
	read_lock(&tasklist_lock);
	tsk = current;
	do {
		retval = do_wait_thread(wo, tsk);
		if (retval)
			goto end;

		retval = ptrace_do_wait(wo, tsk);
		if (retval)
			goto end;

		if (wo->wo_flags & __WNOTHREAD)
			break;
	} while_each_thread(current, tsk);
	read_unlock(&tasklist_lock);

So seems like it's still like Michael said back then: "If more than one
LWP is currently stopped at a breakpoint, the highest-numbered one
will be returned.", and it's likely you're being lucky with scheduling.
E.g., multi-core vs single-core, or the scheduling algorithms in the
kernel improved and are masking the issue.  Or, simply the tests
don't really exercise the starvation issue properly.
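
A toy simulation of the worst case shows why a fixed walk order can
starve threads.  Everything here is hypothetical: the toy_* names are
made up, the scan direction is arbitrary (the argument is the same
whether the lowest or the highest-numbered thread wins), and it assumes
every thread has hit the breakpoint again by the time the debugger
looks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the index of the first thread with a pending event, scanning
   in a fixed order, or -1 if no event is pending.  */
static int
toy_fixed_order_pick (const bool *pending, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (pending[i])
      return (int) i;
  return -1;
}

/* Simulate ROUNDS stops of N threads (at most 16).  In each round every
   thread is assumed to have an event pending, and the one found first
   by the fixed-order scan is reported.  REPORTED[i] receives the number
   of times thread I's event was the one reported.  */
static void
toy_simulate_fixed_order (size_t n, unsigned rounds, unsigned *reported)
{
  bool pending[16];

  assert (n > 0 && n <= 16);

  for (size_t i = 0; i < n; i++)
    reported[i] = 0;

  for (unsigned r = 0; r < rounds; r++)
    {
      /* Worst case: all threads are at the breakpoint again.  */
      for (size_t i = 0; i < n; i++)
	pending[i] = true;

      int picked = toy_fixed_order_pick (pending, n);
      reported[picked]++;
    }
}
```

Under that assumption one thread receives every report and the others
none, which is the starvation that random selection inside
GDB/GDBserver is meant to avoid.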

Anyway,

> The patch below is updated to call lp_status_maybe_breakpoint in both
> breakpoint cancellation and event lwp selection.

This patch is OK.

Thanks,
Pedro Alves


* Re: [PATCH] Honour SIGILL and SIGSEGV in cancel breakpoint
  2014-09-23  9:58         ` Pedro Alves
@ 2014-09-23 12:55           ` Yao Qi
  0 siblings, 0 replies; 7+ messages in thread
From: Yao Qi @ 2014-09-23 12:55 UTC
  To: Pedro Alves; +Cc: gdb-patches

Pedro Alves <palves@redhat.com> writes:

> This patch is OK.

Thanks for the review.  Patch is pushed in.

-- 
Yao (齐尧)

