* [PATCH] Fix GDB hang with remote after error from resume @ 2018-01-10 16:56 Andreas Arnez 2018-01-12 19:11 ` [pushed + testcase] " Pedro Alves 0 siblings, 1 reply; 10+ messages in thread From: Andreas Arnez @ 2018-01-10 16:56 UTC (permalink / raw) To: gdb-patches; +Cc: Pedro Alves Since this commit -- Fix PR18360 - internal error when using "interrupt -a" (https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=c65d6b55) -- the testsuite shows long delays on s390 with native-gdbserver when executing certain tests, such as watchpoints.exp. These hangs have been discussed before in the context of buildbot problems, see here: https://sourceware.org/ml/gdb-patches/2017-12/msg00413.html The problem can easily be triggered by stopping on a breakpoint, then setting impossible watchpoints, and finally doing "continue". Then, after having set the step-over state (in keep_going_pass_signal in infrun.c), GDB tries to insert breakpoints and watchpoints into the inferior. This fails, and the "continue" command is aborted. But the step-over state is not cleared in this case, which causes future step-over attempts to be skipped since GDB thinks that "we already have an in-line step-over operation ongoing" (see start_step_over in infrun.c). Thus the next "continue" just goes on to wait for events from the remote, which will never occur. The problem can also be reproduced on amd64 with native-gdbserver, using the following change to watchpoints.exp: -- >8 -- --- a/gdb/testsuite/gdb.base/watchpoints.exp +++ b/gdb/testsuite/gdb.base/watchpoints.exp @@ -61,2 +61,3 @@ with_test_prefix "before inferior start" { gdb_test "watch ival3" ".*" "" + gdb_test "watch *(char \[256\] *) main" -- >8 -- To fix the hang, this patch clears the step-over info when insert_breakpoints has failed. Of course, with native-gdbserver the watchpoints.exp test case still causes many FAILs on s390, because gdbserver does not support watchpoints for that target. This is a separate issue. gdb/ChangeLog: * infrun.c (keep_going_pass_signal): Clear step-over info when insert_breakpoints fails. --- gdb/infrun.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gdb/infrun.c b/gdb/infrun.c index 7e8d8da..720443b 100644 --- a/gdb/infrun.c +++ b/gdb/infrun.c @@ -7751,6 +7751,7 @@ keep_going_pass_signal (struct execution_control_state *ecs) { exception_print (gdb_stderr, e); stop_waiting (ecs); + clear_step_over_info (); return; } END_CATCH -- 2.5.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [pushed + testcase] Re: [PATCH] Fix GDB hang with remote after error from resume 2018-01-10 16:56 [PATCH] Fix GDB hang with remote after error from resume Andreas Arnez @ 2018-01-12 19:11 ` Pedro Alves 2018-01-15 14:43 ` Andreas Arnez 0 siblings, 1 reply; 10+ messages in thread From: Pedro Alves @ 2018-01-12 19:11 UTC (permalink / raw) To: Andreas Arnez, gdb-patches On 01/10/2018 04:56 PM, Andreas Arnez wrote: > Since this commit -- > > Fix PR18360 - internal error when using "interrupt -a" > (https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=c65d6b55) > > -- the testsuite shows long delays on s390 with native-gdbserver when > executing certain tests, such as watchpoints.exp. These hangs have been > discussed before in the context of buildbot problems, see here: > > https://sourceware.org/ml/gdb-patches/2017-12/msg00413.html > > The problem can easily be triggered by stopping on a breakpoint, then > setting impossible watchpoints, and finally doing "continue". Then, after > having set the step-over state (in keep_going_pass_signal in infrun.c), > GDB tries to insert breakpoints and watchpoints into the inferior. This > fails, and the "continue" command is aborted. But the step-over state is > not cleared in this case, which causes future step-over attempts to be > skipped since GDB thinks that "we already have an in-line step-over > operation ongoing" (see start_step_over in infrun.c). Thus the next > "continue" just goes on to wait for events from the remote, which will > never occur. > Thanks much for the fix. > The problem can also be reproduced on amd64 with native-gdbserver, using > the following change to watchpoints.exp: > > -- >8 -- > --- a/gdb/testsuite/gdb.base/watchpoints.exp > +++ b/gdb/testsuite/gdb.base/watchpoints.exp > @@ -61,2 +61,3 @@ with_test_prefix "before inferior start" { > gdb_test "watch ival3" ".*" "" > + gdb_test "watch *(char \[256\] *) main" > > -- >8 -- > One question I had with this is why would it only trigger with native-gdbserver. After debugging a bit, it was obvious -- the reason is simply that native debugging uses displaced stepping by default, unlike remote debugging, because native debugging enables all-stop-on-top-of-non-stop by default. I never got around to flipping that on ("maint set target-non-stop on") by default with remote debugging. I've pushed in your patch to both master and 8.1, along with this follow up commit adding a testcase. From 1d17025506de70cb1d9d5b7a5654e40ce689bf26 Mon Sep 17 00:00:00 2001 From: Pedro Alves <palves@redhat.com> Date: Fri, 12 Jan 2018 18:59:40 +0000 Subject: [PATCH] Add testcase for GDB hang fixed by previous commit This adds a testcase for the previous commit. The regression was related to in-line step overs. The reason we didn't see it on native x86-64/s390 GNU/Linux testing is that native debugging uses displaced stepping by default (because native debugging defaults to "maint set target-non-stop on"), unlike remote debugging. So in order to trigger the bug with native debugging as well, the testcase disables displaced stepping explicitly. Also, instead of using watchpoints to trigger the regression, the testcase uses a breakpoint at address 0, which should be more portable. gdb/testsuite/ChangeLog: 2018-01-12 Pedro Alves <palves@redhat.com> * gdb.base/continue-after-aborted-step-over.c: New. * gdb.base/continue-after-aborted-step-over.exp: New. --- gdb/testsuite/ChangeLog | 5 ++ .../gdb.base/continue-after-aborted-step-over.c | 29 ++++++++ .../gdb.base/continue-after-aborted-step-over.exp | 87 ++++++++++++++++++++++ 3 files changed, 121 insertions(+) create mode 100644 gdb/testsuite/gdb.base/continue-after-aborted-step-over.c create mode 100644 gdb/testsuite/gdb.base/continue-after-aborted-step-over.exp diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog index 7d7c389d98d..90ffb4fa443 100644 --- a/gdb/testsuite/ChangeLog +++ b/gdb/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2018-01-12 Pedro Alves <palves@redhat.com> + + * gdb.base/continue-after-aborted-step-over.c: New. + * gdb.base/continue-after-aborted-step-over.exp: New. + 2018-01-11 Pedro Alves <palves@redhat.com> PR remote/22597 diff --git a/gdb/testsuite/gdb.base/continue-after-aborted-step-over.c b/gdb/testsuite/gdb.base/continue-after-aborted-step-over.c new file mode 100644 index 00000000000..7252648084e --- /dev/null +++ b/gdb/testsuite/gdb.base/continue-after-aborted-step-over.c @@ -0,0 +1,29 @@ +/* This testcase is part of GDB, the GNU debugger. + + Copyright 2018 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. */ + +void +function (void) +{ +} + +int +main () +{ + function (); + + return 0; +} diff --git a/gdb/testsuite/gdb.base/continue-after-aborted-step-over.exp b/gdb/testsuite/gdb.base/continue-after-aborted-step-over.exp new file mode 100644 index 00000000000..297cb638587 --- /dev/null +++ b/gdb/testsuite/gdb.base/continue-after-aborted-step-over.exp @@ -0,0 +1,87 @@ +# Copyright 2018 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +# This testcase is a regression test for a regression in the in-line +# step-over machinery. If a resumption that starts a step-over +# failed, a following resumption would make GDB hang forever: +# +# (gdb) b *0 +# Breakpoint 2 at 0x0 +# continue +# Continuing. +# Warning: +# Cannot insert breakpoint 2. +# Cannot access memory at address 0x0 +# +# Command aborted. +# delete breakpoints +# Delete all breakpoints? (y or n) y +# (gdb) b function +# Breakpoint 3 at 0x40048b: file test.c, line 33. +# continue +# Continuing. +# *GDB hangs forever* + +standard_testfile + +if {[build_executable "failed to prepare" $testfile $srcfile debug]} { + return -1 +} + +# DISPLACED indicates whether to use displaced-stepping. +proc do_test {displaced} { + global gdb_prompt decimal + global srcfile binfile + + clean_restart $binfile + + gdb_test_no_output "set displaced-stepping $displaced" + + if ![runto_main] { + fail "run to main" + return -1 + } + + # We rely on not being able to set a breakpoint at 0, as proxy for + # any kind of breakpoint insertion failure. If we can examine + # what's at memory address 0, it is possible that we could also + # execute it. + if [is_address_zero_readable] { + untested "memory at address 0 is possibly executable" + return + } + + # Set a breakpoint that fails to insert. + gdb_test "b *0" "Breakpoint $decimal at 0x0" + + gdb_test "continue" \ + "Command aborted\\." \ + "continue aborts" + + # Delete the "bad" breakpoint and try continuing again. + delete_breakpoints + gdb_test "b function" "Breakpoint $decimal .*$srcfile.*" + + gdb_test "continue" \ + "Breakpoint $decimal, function \\(\\) at .*$srcfile:.*" \ + "continue to function" +} + +# This testcase exercises a regression with the in-line step-over +# machinery. So make sure this runs with displaced stepping disabled, +# and for good measure, also try with displaced stepping enabled. +foreach_with_prefix displaced-stepping {"off" "on"} { + do_test ${displaced-stepping} +} -- 2.14.3 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [pushed + testcase] Re: [PATCH] Fix GDB hang with remote after error from resume 2018-01-12 19:11 ` [pushed + testcase] " Pedro Alves @ 2018-01-15 14:43 ` Andreas Arnez 2018-01-15 14:53 ` Yao Qi 2018-01-15 15:18 ` Sergio Durigan Junior 0 siblings, 2 replies; 10+ messages in thread From: Andreas Arnez @ 2018-01-15 14:43 UTC (permalink / raw) To: Sergio Durigan Junior, Pedro Alves Cc: gdb-patches, David Edelsohn, Edjunior Machado On Fri, Jan 12 2018, Pedro Alves wrote: > I've pushed in your patch to both master and 8.1, along with this > follow up commit adding a testcase. Great, thanks! Sergio, now that this is in, we should be able to re-activate the reporting for the s390 (and Power) Buildbots, right? -- Andreas ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [pushed + testcase] Re: [PATCH] Fix GDB hang with remote after error from resume 2018-01-15 14:43 ` Andreas Arnez @ 2018-01-15 14:53 ` Yao Qi 2018-01-15 15:22 ` Andreas Arnez 2018-01-15 17:37 ` Edjunior Machado 2018-01-15 15:18 ` Sergio Durigan Junior 1 sibling, 2 replies; 10+ messages in thread From: Yao Qi @ 2018-01-15 14:53 UTC (permalink / raw) To: Andreas Arnez Cc: Sergio Durigan Junior, Pedro Alves, GDB Patches, David Edelsohn, Edjunior Machado On Mon, Jan 15, 2018 at 2:43 PM, Andreas Arnez <arnez@linux.vnet.ibm.com> wrote: > > Sergio, now that this is in, we should be able to re-activate the > reporting for the s390 (and Power) Buildbots, right? > I am not sure ppc64 issue is resolved. See https://sourceware.org/ml/gdb-patches/2017-12/msg00414.html Note that I had a patch to fix fails in gdb.threads/process-dies-while-handling-bp.exp in this thread, but the tests with gdbserver still hang in some watchpoint tests. -- Yao (齐尧) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [pushed + testcase] Re: [PATCH] Fix GDB hang with remote after error from resume 2018-01-15 14:53 ` Yao Qi @ 2018-01-15 15:22 ` Andreas Arnez 2018-01-15 16:10 ` Yao Qi 2018-01-15 17:37 ` Edjunior Machado 1 sibling, 1 reply; 10+ messages in thread From: Andreas Arnez @ 2018-01-15 15:22 UTC (permalink / raw) To: Yao Qi Cc: Sergio Durigan Junior, Pedro Alves, GDB Patches, David Edelsohn, Edjunior Machado On Mon, Jan 15 2018, Yao Qi wrote: > On Mon, Jan 15, 2018 at 2:43 PM, Andreas Arnez <arnez@linux.vnet.ibm.com> wrote: >> >> Sergio, now that this is in, we should be able to re-activate the >> reporting for the s390 (and Power) Buildbots, right? >> > > I am not sure ppc64 issue is resolved. See > https://sourceware.org/ml/gdb-patches/2017-12/msg00414.html > Note that I had a patch to fix fails in > gdb.threads/process-dies-while-handling-bp.exp > in this thread, but the tests with gdbserver still hang in some watchpoint > tests. Hm, too bad. Well, since this type of hang doesn't seem to occur on s390, we can at least try re-activating s390 then. -- Andreas ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [pushed + testcase] Re: [PATCH] Fix GDB hang with remote after error from resume 2018-01-15 15:22 ` Andreas Arnez @ 2018-01-15 16:10 ` Yao Qi 0 siblings, 0 replies; 10+ messages in thread From: Yao Qi @ 2018-01-15 16:10 UTC (permalink / raw) To: Andreas Arnez Cc: Sergio Durigan Junior, Pedro Alves, GDB Patches, David Edelsohn, Edjunior Machado On Mon, Jan 15, 2018 at 3:22 PM, Andreas Arnez <arnez@linux.vnet.ibm.com> wrote: > > Hm, too bad. Well, since this type of hang doesn't seem to occur on > s390, we can at least try re-activating s390 then. > Yeah, I agree! -- Yao (齐尧) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [pushed + testcase] Re: [PATCH] Fix GDB hang with remote after error from resume 2018-01-15 14:53 ` Yao Qi 2018-01-15 15:22 ` Andreas Arnez @ 2018-01-15 17:37 ` Edjunior Machado 2018-01-15 18:17 ` Pedro Alves 1 sibling, 1 reply; 10+ messages in thread From: Edjunior Machado @ 2018-01-15 17:37 UTC (permalink / raw) To: Yao Qi Cc: Andreas Arnez, Sergio Durigan Junior, Pedro Alves, GDB Patches, David Edelsohn, Edjunior Machado Hi all, On Mon, Jan 15, 2018 at 3:52 PM, Yao Qi <qiyaoltc@gmail.com> wrote: > On Mon, Jan 15, 2018 at 2:43 PM, Andreas Arnez <arnez@linux.vnet.ibm.com> > wrote: > > > > Sergio, now that this is in, we should be able to re-activate the > > reporting for the s390 (and Power) Buildbots, right? > > > > I am not sure ppc64 issue is resolved. See > https://sourceware.org/ml/gdb-patches/2017-12/msg00414.html > Note that I had a patch to fix fails in > gdb.threads/process-dies-while-handling-bp.exp > in this thread, but the tests with gdbserver still hang in some watchpoint > tests. > > Andreas' patch does fix the gdb.base/watchpoint.exp hang on ppc64* native-gdbserver, but as Yao mentioned, it is still hanging on gdb.threads/process-dies-while-handling-bp.exp without his fix ( https://sourceware.org/ml/gdb-patches/2017-12/msg00434.html). That said, just tested upstream testsuite, with Andreas' patch included, plus Yao's aforementioned patch and native-gdbserver testrun no longer hangs. These are the results I got with native-gdbserver on Fedora26: ppc64le: === gdb Summary === # of expected passes 50410 # of unexpected failures 550 # of unexpected successes 3 # of expected failures 58 # of unknown successes 4 # of known failures 64 # of unresolved testcases 5 # of untested testcases 49 # of unsupported tests 171 /home/fedora/binutils-gdb.git/build/gdb/gdb version 8.1.50.20180115-git -nw -nx -data-directory /home/fedora/binutils-gdb.git/build/gdb/testsuite/../data-directory -ex "set auto-connect-native-target off" ppc64: === gdb Summary === # of expected passes 50133 # of unexpected failures 1109 # of unexpected successes 3 # of expected failures 58 # of unknown successes 4 # of known failures 65 # of unresolved testcases 6 # of untested testcases 51 # of unsupported tests 162 /home/fedora/binutils-gdb.git/build/gdb/gdb version 8.1.50.20180115-git -nw -nx -data-directory /home/fedora/binutils-gdb.git/build/gdb/testsuite/../data-directory -ex "set auto-connect-native-target off" (FWIW, at first glance, gdb.linespec/cpcompletion.exp in ppc64 BE seems to be one of the main culprits for the whole testrun taking longer (~1h) and with considerably more failures than in ppc64le, aparently due to the ppc64's function descriptor.) Please let me know if you need any additional testing. Thanks and regards, Edjunior ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [pushed + testcase] Re: [PATCH] Fix GDB hang with remote after error from resume 2018-01-15 17:37 ` Edjunior Machado @ 2018-01-15 18:17 ` Pedro Alves 2018-01-15 20:11 ` Sergio Durigan Junior 0 siblings, 1 reply; 10+ messages in thread From: Pedro Alves @ 2018-01-15 18:17 UTC (permalink / raw) To: Edjunior Machado, Yao Qi Cc: Andreas Arnez, Sergio Durigan Junior, GDB Patches, David Edelsohn, Edjunior Machado On 01/15/2018 05:37 PM, Edjunior Machado wrote: > (FWIW, at first glance, gdb.linespec/cpcompletion.exp in ppc64 BE seems to > be one of the main culprits for the whole testrun taking longer (~1h) and > with considerably more failures than in ppc64le, aparently due to the > ppc64's function descriptor.) Sergio, would it possible to include the time it took to run the testsuite in the messages sent to gdb-testers? As is, such new long cascading timeouts are introduced without one realizing, unless you go look at the builds' pages one by one. Ideally we'd even get a notification if time jumped seemingly too much between builds. And IWBN if the gdb buildbot web frontend had some kind of graph tracking/plotting test run duration over time (build# for a given builder), so we could just open some URL and notice the spikes. </pie in the sky>. Thanks, Pedro Alves ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [pushed + testcase] Re: [PATCH] Fix GDB hang with remote after error from resume 2018-01-15 18:17 ` Pedro Alves @ 2018-01-15 20:11 ` Sergio Durigan Junior 0 siblings, 0 replies; 10+ messages in thread From: Sergio Durigan Junior @ 2018-01-15 20:11 UTC (permalink / raw) To: Pedro Alves Cc: Edjunior Machado, Yao Qi, Andreas Arnez, GDB Patches, David Edelsohn, Edjunior Machado On Monday, January 15 2018, Pedro Alves wrote: > On 01/15/2018 05:37 PM, Edjunior Machado wrote: >> (FWIW, at first glance, gdb.linespec/cpcompletion.exp in ppc64 BE seems to >> be one of the main culprits for the whole testrun taking longer (~1h) and >> with considerably more failures than in ppc64le, aparently due to the >> ppc64's function descriptor.) > > Sergio, would it possible to include the time it took to run the > testsuite in the messages sent to gdb-testers? As is, such new > long cascading timeouts are introduced without one realizing, > unless you go look at the builds' pages one by one. I'll see about that. There must be a way, because the web interface provides this information, but I don't remember seeing it before. > Ideally we'd even get a notification if time jumped > seemingly too much between builds. This would take more time to implement, but if there is a way to obtain the build time, then it should be possible. > And IWBN if the gdb buildbot web frontend had some kind of graph > tracking/plotting test run duration over time (build# for a > given builder), so we could just open some URL and > notice the spikes. </pie in the sky>. I'll leave this pie in the sky :-). -- Sergio GPG key ID: 237A 54B1 0287 28BF 00EF 31F4 D0EB 7628 65FC 5E36 Please send encrypted e-mail if possible http://sergiodj.net/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [pushed + testcase] Re: [PATCH] Fix GDB hang with remote after error from resume 2018-01-15 14:43 ` Andreas Arnez 2018-01-15 14:53 ` Yao Qi @ 2018-01-15 15:18 ` Sergio Durigan Junior 1 sibling, 0 replies; 10+ messages in thread From: Sergio Durigan Junior @ 2018-01-15 15:18 UTC (permalink / raw) To: Andreas Arnez; +Cc: Pedro Alves, gdb-patches, David Edelsohn, Edjunior Machado On Monday, January 15 2018, Andreas Arnez wrote: > On Fri, Jan 12 2018, Pedro Alves wrote: > >> I've pushed in your patch to both master and 8.1, along with this >> follow up commit adding a testcase. > > Great, thanks! > > Sergio, now that this is in, we should be able to re-activate the > reporting for the s390 (and Power) Buildbots, right? Thanks for taking care of this, Andreas. I had to cancel all pending builds for the 3 s390x builders, because they had more than 130 builds in the queue each. Now, I'll monitor the next builds and check if the problem still persists. If not, then I can certainly re-activate the notifications for s390x. As for the PPC builders, Yao mentioned that there are probably other issues to be solved. I know Edjunior is also testing your fix on PPC as I write this e-mail, so I'll wait for their opinions here. Thanks! -- Sergio GPG key ID: 237A 54B1 0287 28BF 00EF 31F4 D0EB 7628 65FC 5E36 Please send encrypted e-mail if possible http://sergiodj.net/ ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-01-15 20:11 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-01-10 16:56 [PATCH] Fix GDB hang with remote after error from resume Andreas Arnez 2018-01-12 19:11 ` [pushed + testcase] " Pedro Alves 2018-01-15 14:43 ` Andreas Arnez 2018-01-15 14:53 ` Yao Qi 2018-01-15 15:22 ` Andreas Arnez 2018-01-15 16:10 ` Yao Qi 2018-01-15 17:37 ` Edjunior Machado 2018-01-15 18:17 ` Pedro Alves 2018-01-15 20:11 ` Sergio Durigan Junior 2018-01-15 15:18 ` Sergio Durigan Junior
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).