[PATCH v2] Fix random dejagnu test abort with simulator target

public inbox for gdb-patches@sourceware.org

* [PATCH v2] Fix random dejagnu test abort with simulator target
@ 2024-04-22  9:55 Bernd Edlinger
  2024-04-22 13:49 ` Andrew Burgess
From: Bernd Edlinger @ 2024-04-22  9:55 UTC
  To: gdb-patches, Tom de Vries

This is probably a DejaGnu issue with the gdb testsuite
when a simulator target is used.  I observed random
test-run aborts with DejaGnu 1.6.2-1 from Ubuntu 20.04.
The problem starts when the test case gdb.base/sigwinch-notty.exp
tries to execute "sleep", although that is impossible with a
simulator.  For an unknown reason, the test case then completes
(with errors) before the "after 1000" block is run.

Then, in a completely unrelated test, this happens about 50% of the time:

ERROR: (DejaGnu) proc "bgerror {can't read "gdb_pid": no such variable}" does not exist.
The error code is TCL LOOKUP COMMAND bgerror
The info on the error is:
invalid command name "bgerror"
    while executing
"::tcl_unknown bgerror {can't read "gdb_pid": no such variable}"
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 ::tcl_unknown $args"

                === gdb Summary ===

 # of expected passes            30815
 # of unexpected failures        241
 # of expected failures          3
 # of known failures             23
 # of unresolved testcases       241
 # of untested testcases         96
 # of unsupported tests          532
 # of paths in test names        1

So the whole test run is aborted in the middle.

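This patch should fix the issue by skipping the test unless build,
host and target are the same machine and GDB uses "target native".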

Co-Authored-By: Tom de Vries <tdevries@suse.de>
---
 gdb/testsuite/gdb.base/sigwinch-notty.exp | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

v2: I adopted Tom's suggestion verbatim and did a few test runs,
with no unexpected test aborts so far.
So this looks quite good to me, and it also has the nice improvement
of giving an UNSUPPORTED message that states exactly why the test
did not run.

diff --git a/gdb/testsuite/gdb.base/sigwinch-notty.exp b/gdb/testsuite/gdb.base/sigwinch-notty.exp
index cef21c07c59..621231df6af 100644
--- a/gdb/testsuite/gdb.base/sigwinch-notty.exp
+++ b/gdb/testsuite/gdb.base/sigwinch-notty.exp
@@ -19,11 +19,17 @@
 
 require {!target_info exists gdb,nosignals}
 
-# The testfile relies on "run" from the command line, so only works
-# with "target native".
-if { [target_info gdb_protocol] != "" } {
-    return
-}
+# The test-case relies on "run" from the command line, so it only works
+# with "target native", so we need host == target.
+#
+# The test-case uses "exp_pid -i $gdb_spawn_id" which doesn't work with
+# remote host, so we need build == host.
+#
+# In other words, we need build == host == target.
+require {!is_remote host} {!is_remote target}
+
+# Check that we have "target native" as opposed to native-gdbserver etc.
+require {string equal [target_info gdb_protocol] ""}
 
 gdb_exit
 
-- 
2.39.2



* Re: [PATCH v2] Fix random dejagnu test abort with simulator target
  2024-04-22  9:55 [PATCH v2] Fix random dejagnu test abort with simulator target Bernd Edlinger
@ 2024-04-22 13:49 ` Andrew Burgess
  2024-04-22 14:53   ` Bernd Edlinger
From: Andrew Burgess @ 2024-04-22 13:49 UTC
  To: Bernd Edlinger, gdb-patches, Tom de Vries

Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

> diff --git a/gdb/testsuite/gdb.base/sigwinch-notty.exp b/gdb/testsuite/gdb.base/sigwinch-notty.exp
> index cef21c07c59..621231df6af 100644
> --- a/gdb/testsuite/gdb.base/sigwinch-notty.exp
> +++ b/gdb/testsuite/gdb.base/sigwinch-notty.exp
> @@ -19,11 +19,17 @@
>  
>  require {!target_info exists gdb,nosignals}
>  
> -# The testfile relies on "run" from the command line, so only works
> -# with "target native".
> -if { [target_info gdb_protocol] != "" } {
> -    return
> -}
> +# The test-case relies on "run" from the command line, so it only works
> +# with "target native", so we need host == target.

I'm currently reading and rereading the V1 series to try and understand
what's going on here, but I don't understand this comment.

  # The test-case relies on "run" from the command line, so it only works
  # with "target native", so we need host == target.

That just doesn't make sense to me: 'target extended-remote' will also
support the `run` command, as will 'target sim'.

The conclusion might well be valid, but I think the logic used to
justify it is incorrect.   Can we rephrase this comment so it makes
sense?

Thanks,
Andrew





* Re: [PATCH v2] Fix random dejagnu test abort with simulator target
  2024-04-22 13:49 ` Andrew Burgess
@ 2024-04-22 14:53   ` Bernd Edlinger
From: Bernd Edlinger @ 2024-04-22 14:53 UTC
  To: Andrew Burgess, gdb-patches, Tom de Vries

On 4/22/24 15:49, Andrew Burgess wrote:
> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>> +# The test-case relies on "run" from the command line, so it only works
>> +# with "target native", so we need host == target.
> 
> I'm currently reading and rereading the V1 series to try and understand
> what's going on here, but I don't understand this comment.
> 
>   # The test-case relies on "run" from the command line, so it only works
>   # with "target native", so we need host == target.
> 
> Just doesn't make sense to me: 'target extended-remote' will also
> support the `run` command, as will 'target sim'.
> 
> The conclusion might well be valid, but I think the logic used to
> justify it is incorrect.   Can we rephrase this comment so it makes
> sense?
> 

I probably don't know how to explain that properly,
but I would very much appreciate any advice on how to.

This is what the test case tries to do:
The GDB host/build environment is x86_64-pc-linux-gnu,
and the target environment is riscv-unknown-elf
with newlib, so the run command would work, but only
for a program that was built with my riscv-unknown-elf-gcc.
The test case, however, tries to run "/usr/bin/sleep 3",
which GDB is unable to do, and since stdin is redirected
to /dev/null, GDB terminates immediately.  The test
script therefore finishes with one FAIL and one PASS, BUT
the "after 1000" block is still pending.  When it fires,
it interrupts either the whole test run or just one test
case, depending on some kind of race condition, and maybe
also a bit on the DejaGnu version.
Interestingly, gdb_pid is no longer known when the
"after 1000" block executes, which produces the cryptic message
ERROR: (DejaGnu) proc "bgerror {can't read "gdb_pid": no such variable}" does not exist.
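
The failure mode described above can be reduced to a few lines of plain Tcl. This is a minimal sketch, not gdb testsuite or DejaGnu code, assuming (as the log suggests) that a timer scheduled with `after` outlives the test file, fires after `gdb_pid` has been unset, and that the resulting background error is routed to a `bgerror` command. The procs `run_test`, `cleanup_globals`, `drain_events` and `scenario` are hypothetical stand-ins. Here `bgerror` is defined so the error is recorded, whereas in the log above it was undefined and the call fell through to DejaGnu's `unknown` handler.

```tcl
# Hypothetical, DejaGnu-free reduction of the "after 1000" race.
# Assumes Tcl 8.5 or later.

# Collect background errors.  In the run above no "bgerror" proc was
# defined, so the call ended up in DejaGnu's "unknown" handler instead.
set bg_errors {}
proc bgerror {msg} {
    lappend ::bg_errors $msg
}

# Stand-in for the test file: records a pid and schedules a timer that
# reads it later, like the "after 1000" block in sigwinch-notty.exp.
proc run_test {cancel_timer} {
    global gdb_pid
    set gdb_pid 12345 ;# made-up value
    # Braces delay the $gdb_pid lookup until the timer fires, and timer
    # scripts are evaluated at global level.
    set id [after 10 {lappend ::signalled $gdb_pid}]
    # The test ends early here (GDB could not run "sleep").
    if {$cancel_timer} {
        # Drop the pending timer before the test returns.
        after cancel $id
    }
}

# Stand-in for the test file's globals going away once it finishes.
proc cleanup_globals {} {
    unset -nocomplain ::gdb_pid
}

# Enter the event loop long enough for any pending timer to fire.
proc drain_events {} {
    after 50 {set ::drained 1}
    vwait ::drained
}

# Run one test, clean up, let timers fire, return the recorded errors.
proc scenario {cancel_timer} {
    set ::bg_errors {}
    run_test $cancel_timer
    cleanup_globals
    drain_events
    return $::bg_errors
}

set pending   [scenario 0]
set cancelled [scenario 1]
puts "timer left pending: $pending"
puts "timer cancelled:    $cancelled"
```

Only the first scenario records a background error, and its text matches the one embedded in the DejaGnu error above.  Presumably the v2 patch avoids the problem the same way in effect: on such configurations the test is skipped by `require` before any timer is scheduled.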

Please feel free to ask if I can give more information.

Thanks
Bernd.

