public inbox for gcc-bugs@sourceware.org
* [Bug go/98823] New: go testsuite and timeouts
@ 2021-01-25 15:29 jakub at gcc dot gnu.org
  2021-01-25 18:55 ` [Bug go/98823] " ian at airs dot com
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-25 15:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

            Bug ID: 98823
           Summary: go testsuite and timeouts
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: go
          Assignee: ian at airs dot com
          Reporter: jakub at gcc dot gnu.org
                CC: cmang at google dot com
  Target Milestone: ---

Is there something in the libgo testsuite (as well as go.test) that should be
killing tests if they are stuck?
Normally in dejagnu driven tests there is a timeout and if a test for whatever
reason doesn't finish within that timeout, dejagnu kills it.
But I think go testing bypasses that.
E.g. today we've seen on armv7hl the
gcc/testsuite/go/issue19182.x test getting stuck for several hours, but there
was nothing that would just kill the test, so manual killall -9 on it was
needed to make the build finish.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
@ 2021-01-25 18:55 ` ian at airs dot com
  2021-01-25 19:09 ` schwab@linux-m68k.org
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: ian at airs dot com @ 2021-01-25 18:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #1 from Ian Lance Taylor <ian at airs dot com> ---
The Go testsuite is intended to have timeouts for all tests.

The test gcc/testsuite/go.test/test/fixedbugs/issue19182.go is just passed off
to the TCL function go-torture-execute.  Running the executable
gcc/testsuite/go/issue19182.x is the "execute" part of the test. 
go-torture-execute calls the TCL function go_load to run the test.

My assumption, which may well be wrong, is that a TCL function like go_load
applies a timeout by default.  I'm not even sure where go_load is defined, but
it clearly does exist.  I'm still baffled by how all the DejaGNU code works.

It's odd that issue19182.x keeps running.  The test itself is intended to be
self-limiting.  I'm not sure what is going on there.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
  2021-01-25 18:55 ` [Bug go/98823] " ian at airs dot com
@ 2021-01-25 19:09 ` schwab@linux-m68k.org
  2021-01-25 19:34 ` ian at airs dot com
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: schwab@linux-m68k.org @ 2021-01-25 19:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #2 from Andreas Schwab <schwab@linux-m68k.org> ---
go_load is defined in lib/gcc-dg.exp.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
  2021-01-25 18:55 ` [Bug go/98823] " ian at airs dot com
  2021-01-25 19:09 ` schwab@linux-m68k.org
@ 2021-01-25 19:34 ` ian at airs dot com
  2021-01-25 19:41 ` schwab@linux-m68k.org
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: ian at airs dot com @ 2021-01-25 19:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #3 from Ian Lance Taylor <ian at airs dot com> ---
I'm sure I'm missing something, but what I see in lib/gcc-dg.exp is code that
says "if ${tool}_load already exists, then wrap it."  I don't see the original
implementation of ${tool}_load.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-01-25 19:34 ` ian at airs dot com
@ 2021-01-25 19:41 ` schwab@linux-m68k.org
  2021-01-25 19:44 ` schwab@linux-m68k.org
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: schwab@linux-m68k.org @ 2021-01-25 19:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #4 from Andreas Schwab <schwab@linux-m68k.org> ---
That's a standard part of DejaGNU.

/usr/share/dejagnu/standard.exp


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-01-25 19:41 ` schwab@linux-m68k.org
@ 2021-01-25 19:44 ` schwab@linux-m68k.org
  2021-01-25 20:13 ` ian at airs dot com
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: schwab@linux-m68k.org @ 2021-01-25 19:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #5 from Andreas Schwab <schwab@linux-m68k.org> ---
And for the unix board, its implementation is in
/usr/share/dejagnu/config/unix.exp.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-01-25 19:44 ` schwab@linux-m68k.org
@ 2021-01-25 20:13 ` ian at airs dot com
  2021-01-25 20:52 ` schwab@linux-m68k.org
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: ian at airs dot com @ 2021-01-25 20:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #6 from Ian Lance Taylor <ian at airs dot com> ---
Thanks.  So, unix_load does seem to have a timeout by default, and as far as I
can see the Go testsuite code isn't doing anything to change that.  Why isn't
the timeout working?


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-01-25 20:13 ` ian at airs dot com
@ 2021-01-25 20:52 ` schwab@linux-m68k.org
  2021-01-25 20:59 ` ian at airs dot com
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: schwab@linux-m68k.org @ 2021-01-25 20:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #7 from Andreas Schwab <schwab@linux-m68k.org> ---
Perhaps the test is blocking or ignoring SIGTERM, or handling it in some
incompatible way.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2021-01-25 20:52 ` schwab@linux-m68k.org
@ 2021-01-25 20:59 ` ian at airs dot com
  2021-01-25 23:31 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: ian at airs dot com @ 2021-01-25 20:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #8 from Ian Lance Taylor <ian at airs dot com> ---
The test is pretty simple.

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/go.test/test/fixedbugs/issue19182.go;h=e1f3ffb4749f4dbb4c2204c4a0f484aea91b4771;hb=HEAD

The interesting thing it does is start a goroutine that runs an infinite loop. 
But the main goroutine should always terminate.

The test doesn't do any signal handling at all so a SIGTERM should always just
terminate the program.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2021-01-25 20:59 ` ian at airs dot com
@ 2021-01-25 23:31 ` jakub at gcc dot gnu.org
  2021-02-01 14:15 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-25 23:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I don't have access to the box where it happened; I was just lucky that
somebody else did and could find the stuck process for me and kill it.
In the past 2 months gcc builds got stuck in a similar way several times, but I
didn't have anybody to tell me what was going on, so I just killed the
builds and retried; I have no idea whether it was go or something else.
Anyway, I guess we could try a go test that just sleeps for 20 minutes
and see if it gets killed by the timeout, and similarly a go test that runs,
say, a busy loop and see if it gets killed.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2021-01-25 23:31 ` jakub at gcc dot gnu.org
@ 2021-02-01 14:15 ` jakub at gcc dot gnu.org
  2021-02-07 17:48 ` ian at airs dot com
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-02-01 14:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Got it stuck again over the weekend:
kojibui+  9992  101  0.1 884512 29120 ?        Sl   Jan31 968:04  \_ /builddir/build/BUILD/gcc-11.0.0-20210130/obj-armv7hl-redhat-linux-gnueabi/gcc/testsuite/go/issue19182.x
It could be a kernel bug or whatever, but it would be nice if it didn't halt
the whole regtesting forever (or until killed manually).


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2021-02-01 14:15 ` jakub at gcc dot gnu.org
@ 2021-02-07 17:48 ` ian at airs dot com
  2021-02-07 18:09 ` ian at airs dot com
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: ian at airs dot com @ 2021-02-07 17:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #11 from Ian Lance Taylor <ian at airs dot com> ---
I'm just noting that DejaGNU appears to have a bug in the standard_wait
procedure:

http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=blob;f=lib/remote.exp;h=1c9971a076415adc2fcdc04ab8f78cc832ce1098;hb=HEAD#l1162

The code seems to assume that the parameter timeout will set the timeout for
the remote_expect.  But as far as I can tell, when running under expect,
"timeout" is always a global variable.  So the $timeout that appears in the
function refers to the global variable named "timeout", not the parameter named
"timeout".

This means that although the DejaGNU procedure unix_load appears to set the
timeout to the value of the global variable "test_timeout", and logs various
messages to that effect, in fact that variable has no effect.  Only the
variable "timeout" matters.

However, this DejaGNU bug is not important in the larger scheme of things and
does not seem to affect this issue.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2021-02-07 17:48 ` ian at airs dot com
@ 2021-02-07 18:09 ` ian at airs dot com
  2021-02-07 19:00 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: ian at airs dot com @ 2021-02-07 18:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #12 from Ian Lance Taylor <ian at airs dot com> ---
Other than the timeout issue, DejaGNU appears to work as expected on my system.
After 300 seconds, it will run the shell command "kill -2 $spid" which kills
the program.  The program issue19182.go does nothing special with SIGINT, and
in my testing it simply exits.  In any case, if the program does not exit
after the "kill -2", DejaGNU waits 5 seconds and does a "kill -15", then waits
another 5 seconds and does a "kill -9".

In short, I can't recreate the problem and I don't see where the bug is.

What version of DejaGNU is in use on the system where the problem occurs?  I'm
testing with DejaGNU 1.6.2.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2021-02-07 18:09 ` ian at airs dot com
@ 2021-02-07 19:00 ` jakub at gcc dot gnu.org
  2021-02-12 17:25 ` ian at airs dot com
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-02-07 19:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I got that with dejagnu 1.6.1.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2021-02-07 19:00 ` jakub at gcc dot gnu.org
@ 2021-02-12 17:25 ` ian at airs dot com
  2021-04-22 12:03 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: ian at airs dot com @ 2021-02-12 17:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #14 from Ian Lance Taylor <ian at airs dot com> ---
The code that kills the test process (close_wait_program in lib/remote.exp) has
indeed changed between DejaGNU 1.6.1 and 1.6.2.  That said, I don't see any
reason why the 1.6.1 code wouldn't kill the test process.


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2021-02-12 17:25 ` ian at airs dot com
@ 2021-04-22 12:03 ` jakub at gcc dot gnu.org
  2021-04-22 12:46 ` redi at gcc dot gnu.org
  2023-05-04 13:02 ` bergner at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-04-22 12:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Might very well be
https://lists.gnu.org/archive/html/bug-dejagnu/2018-07/msg00000.html


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2021-04-22 12:03 ` jakub at gcc dot gnu.org
@ 2021-04-22 12:46 ` redi at gcc dot gnu.org
  2023-05-04 13:02 ` bergner at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: redi at gcc dot gnu.org @ 2021-04-22 12:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

--- Comment #16 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Ian Lance Taylor from comment #11)
> I'm just noting that DejaGNU appears to have a bug in the standard_wait
> procedure:
> 
> http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=blob;f=lib/remote.exp;h=1c9971a076415adc2fcdc04ab8f78cc832ce1098;hb=HEAD#l1162
> 
> The code seems to assume that the parameter timeout will set the timeout for
> the remote_expect.  But as far as I can tell, when running under expect,
> "timeout" is always a global variable.  So the $timeout that appears in the
> function refers to the global variable named "timeout", not the parameter
> named "timeout".
> 
> This means that although the DejaGNU procedure unix_load appears to set the
> timeout to the value of the global variable "test_timeout", and logs various
> messages to that effect, in fact that variable has no effect.  Only the
> variable "timeout" matters.

I raised this on the DejaGNU mailing list a few months ago, see:
https://lists.gnu.org/archive/html/dejagnu/2020-12/msg00000.html
It's actually GCC's fault for monkeypatching the standard_wait proc:
https://lists.gnu.org/archive/html/dejagnu/2020-12/msg00003.html


* [Bug go/98823] go testsuite and timeouts
  2021-01-25 15:29 [Bug go/98823] New: go testsuite and timeouts jakub at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2021-04-22 12:46 ` redi at gcc dot gnu.org
@ 2023-05-04 13:02 ` bergner at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: bergner at gcc dot gnu.org @ 2023-05-04 13:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98823

Peter Bergner <bergner at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2023-05-04
                 CC|                            |bergner at gcc dot gnu.org,
                   |                            |boger at gcc dot gnu.org

--- Comment #17 from Peter Bergner <bergner at gcc dot gnu.org> ---
I just hit this twice yesterday (i.e., an infinite hang and no timeout) on our
big internal P10 development system.  Talking with Lynn Boger, who is our
golang lead, she mentioned hitting this too on big systems with lots of cores.
She had opened an upstream golang issue discussing it:

    https://github.com/golang/go/issues/47246

Lynn's workaround of setting GOMAXPROCS=64 worked for me.
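
The workaround above is an environment variable set before the test runs. As a rough in-program analogue, the sketch below caps GOMAXPROCS at 64 using runtime.GOMAXPROCS; the helper name cappedProcs is hypothetical, and the thread does not establish that capping from inside the program avoids the hang in the same way.

```go
package main

import (
	"fmt"
	"runtime"
)

// cappedProcs returns ncpu, limited to at most limit.
func cappedProcs(ncpu, limit int) int {
	if ncpu > limit {
		return limit
	}
	return ncpu
}

func main() {
	const limit = 64 // the value reported to work in the comment above
	n := cappedProcs(runtime.NumCPU(), limit)
	prev := runtime.GOMAXPROCS(n) // returns the previous setting
	fmt.Printf("GOMAXPROCS: %d -> %d\n", prev, runtime.GOMAXPROCS(0))
}
```

On machines with 64 or fewer CPUs this changes nothing, which would be consistent with the hang having been reported only on large systems.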
