public inbox for gcc-bugs@sourceware.org
* [Bug testsuite/100203] New: Dejagnu timeouts don't work
From: jakub at gcc dot gnu.org @ 2021-04-22 10:46 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100203

            Bug ID: 100203
           Summary: Dejagnu timeouts don't work
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: testsuite
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

On i686-linux, the libstdc++ 29_atomics/atomic_float/wait_notify.cc testcase is
miscompiled and hangs.
That is tracked elsewhere; this PR is about make check getting stuck forever
when the test times out.
If I run
cd i686-pc-linux-gnu/libstdc++-v3/testsuite
make check RUNTESTFLAGS='-v -v -v conformance.exp=wait_notify.cc'
then I see
...
spawn -ignore SIGHUP /home/jakub/src/gcc/obj11/./gcc/xg++ -shared-libgcc
-B/home/jakub/src/gcc/obj11/./gcc -nostdinc++
-L/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/src
-L/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/src/.libs
-L/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
-B/usr/local/i686-pc-linux-gnu/bin/ -B/usr/local/i686-pc-linux-gnu/lib/
-isystem /usr/local/i686-pc-linux-gnu/include -isystem
/usr/local/i686-pc-linux-gnu/sys-include -fchecking=1
-B/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/src/.libs
-fmessage-length=0 -fno-show-column -ffunction-sections -fdata-sections -g -O2
-DLOCALEDIR="." -nostdinc++
-I/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/include/i686-pc-linux-gnu
-I/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/include
-I/home/jakub/src/gcc/libstdc++-v3/libsupc++
-I/home/jakub/src/gcc/libstdc++-v3/include/backward
-I/home/jakub/src/gcc/libstdc++-v3/testsuite/util
/home/jakub/src/gcc/libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc
-std=gnu++2a -pthread -fdiagnostics-plain-output ./libtestc++.a
-Wl,--gc-sections
-L/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/src/filesystem/.libs
-lm -o ./wait_notify.exe
pid is 1403276 -1403276
pid is -1
waitres is 1403276 exp7 0 0
output is  status 0
calling is_remote host
board_info build name 
getting tucnak name
board_info host name 
getting tucnak name
board is host, host is local
Checking pattern "sparc-*-sunos*" with i686-pc-linux-gnu
Checking pattern "alpha*-*-*" with i686-pc-linux-gnu
Checking pattern "hppa*-*-hpux*" with i686-pc-linux-gnu
Checking pattern "sparc-*-sunos*" with i686-pc-linux-gnu
Checking pattern "alpha*-*-*" with i686-pc-linux-gnu
Checking pattern "hppa*-*-hpux*" with i686-pc-linux-gnu
board_info target name 
getting unix name
calling is_remote target
board_info build name 
getting tucnak name
board_info host name 
getting tucnak name
calling is_remote unix
board_info build name 
getting tucnak name
board_info host name 
getting tucnak name
board is unix, not remote
board_info target exists is_simulator
board_info unix exists name
board_info unix name 
getting unix name
board_info unix exists name
board_info unix exists protocol
board_info unix protocol 
getting unix protocol
call_remote  load unix ./wait_notify.exe {} {} 
board_info unix file_transfer 
getting unix file_transfer
board_info unix connect 
getting unix connect
call_remote calling unix_load
loading to unix
calling is_remote unix
board_info build name 
getting tucnak name
board_info host name 
getting tucnak name
board is unix, not remote
Setting LD_LIBRARY_PATH to
:/home/jakub/src/gcc/obj11/gcc:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/../libatomic/.libs:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/../libgomp/.libs:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/src/.libs::/home/jakub/src/gcc/obj11/gcc:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/../libatomic/.libs:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/../libgomp/.libs:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/src/.libs
Execution timeout is: 300
calling is_remote unix
board_info build name 
getting tucnak name
board_info host name 
getting tucnak name
board is unix, not remote
remote_spawn is local
board_info unix exists name
board_info unix name 
getting unix name
spawning command ./wait_notify.exe 
spawn [open ...]
setting board_info(unix,fileid) to exp13
board_info unix exists name
board_info unix name 
getting unix name
board_info unix exists name
board_info unix exists protocol
board_info unix protocol 
getting unix protocol
call_remote  wait unix 300 
board_info unix file_transfer 
getting unix file_transfer
board_info unix connect 
getting unix connect
call_remote calling standard_wait
board_info target exists gcc,timeout
board_info target exists gcc,timeout
board_info unix fileid 
getting unix fileid
====
WARNING: program timed out.
board_info unix exists name
board_info unix name 
getting unix name
board_info unix exists name
board_info unix exists protocol
board_info unix protocol 
getting unix protocol
call_remote  close unix  
board_info unix connect 
getting unix connect
call_remote calling standard_close
board_info unix exists fileid
board_info unix fileid 
getting unix fileid
Closing the remote shell exp13
board_info unix exists fileid_origid
board_info unix fileid_origid 
getting unix fileid_origid
doing kill, pid is 1403285 1403286
pid is 1403285 1403286

Here 1403285 is the stuck wait_notify.exe process and 1403286 is a cat process
that (apparently) DejaGnu pipes the program's output through for some reason.
DejaGnu's remote.exp seems to run
sh -c "exec > /dev/null 2>&1 && (kill -2 -1403285 1403286 || kill -2 1403285 1403286)"
and
sh -c "exec > /dev/null 2>&1 && sleep 5 && (kill -15 -1403285 1403286 || kill -15 1403285 1403286) && sleep 5 && (kill -9 -1403285 1403286 || kill -9 1403285 1403286) && sleep 5"
I think the problem is that $pid contains more than one pid.
If I run the kill command manually and without stderr redirection, I get:
kill -2 -1403285 1403286; echo $?
sh: kill: (-1403285) - No such process
0
similarly for -15 or -9.
1403285 pts/23   S+     0:00 ./wait_notify.exe
1403286 pts/23   Z+     0:00 [cat] <defunct>
The kill man page says that when multiple processes are specified and there
is only partial success, 64 should be returned rather than 0, but
that is not what happens for me.

So, I wonder if
    if { $pid > 0 } {
        # Tcl has no kill primitive, so we have to execute an external
        # command in order to kill the process.
        verbose "doing kill, pid is $pid"
        # Prepend "-" to generate the "process group ID" needed by
        # kill.
        set pgid "-$pid"
        # Send SIGINT to give the program a better chance to interrupt
        # whatever it might be doing and react to stdin closing.
        # eg, in case of GDB, this should get it back to the prompt.
        exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid || kill -2 $pid)"

        # If the program doesn't exit gracefully when stdin closes,
        # we'll need to kill it.  But only do this after 'wait'ing a
        # bit, to avoid killing the wrong process in case of a
        # PID-reuse race.  The extra sleep at the end is there to give
        # time to kill $exec_pid without having _that_ be subject to a
        # PID reuse race.
        set secs 5
        set sh_cmd "exec > /dev/null 2>&1"
        append sh_cmd " && sleep $secs && (kill -15 $pgid || kill -15 $pid)"
        append sh_cmd " && sleep $secs && (kill -9 $pgid || kill -9 $pid)"
        append sh_cmd " && sleep $secs"
        set exec_pid [exec sh -c "$sh_cmd" &]
    }
shouldn't be changed so that, when $pid contains more than one number, instead
of running a single (kill -SIGNUM $pgid || kill -SIGNUM $pid) it runs a
separate kill -SIGNUM -$pid || kill -SIGNUM $pid for each of the pids in the
list.
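
The per-pid variant suggested above could be sketched in plain sh roughly as
follows. This is only an illustration, not DejaGnu code: the helper name
kill_each and the two sleep processes standing in for the test program and
cat are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper (kill_each is not part of DejaGnu): signal every PID on
# its own, trying the process group first and the plain PID second.
kill_each() {
    sig=$1
    shift
    rc=0
    for p in "$@"; do
        kill "-$sig" "-$p" 2>/dev/null || kill "-$sig" "$p" 2>/dev/null || rc=1
    done
    return "$rc"
}

# Two stand-in processes playing the role of the test program and cat.
sleep 60 &
pid1=$!
sleep 60 &
pid2=$!

kill_status=0
kill_each 15 "$pid1" "$pid2" || kill_status=$?

# Reap both children and record how they ended (a status greater than 128
# means the child was terminated by a signal).
st1=0; wait "$pid1" 2>/dev/null || st1=$?
st2=0; wait "$pid2" 2>/dev/null || st2=$?
echo "kill_each=$kill_status sleep1=$st1 sleep2=$st2"
```

Because each kill invocation sees exactly one pid, its exit status is
unambiguous, so the fallback from -$pid to $pid no longer depends on how a
particular kill implementation reports partial success.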


* [Bug testsuite/100203] Dejagnu timeouts don't work
From: jakub at gcc dot gnu.org @ 2021-04-22 11:03 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100203

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The multiple pids in $pid is a result of standard_close, which does:
        if {[board_info ${host} exists fileid_origid]} {
            set oid [board_info ${host} fileid_origid]
            set pid [pid $oid]
            unset board_info(${host},fileid_origid)
Where that [pid $oid] where $oid is file7 for me results in the multiple pids.


* [Bug testsuite/100203] Dejagnu timeouts don't work
From: redi at gcc dot gnu.org @ 2021-04-22 11:19 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100203

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
This seems to be a bug in the Bash 'kill' builtin.

#!/bin/bash

# pass /usr/bin/kill as $1 to use the command not the bash builtin
kill=${1:-kill}

sleep 60 &
pid1=$!
sleep 60 &
pid2=$!

sh -c "exec > /dev/stdout 2>&1 && ($kill -15 -$pid1 $pid2 || { echo kill pgrp failed, trying again... ; $kill -15 $pid1 $pid2 || echo $?;})"
ps -ef | awk 'NR==1 || /[s]leep/ {print}'

kill $pid1 $pid2 2>/dev/null


Run without arguments, this script prints something like:

sh: line 0: kill: (-2977556) - No such process
UID          PID    PPID  C STIME TTY          TIME CMD
jwakely  2977556 2977555  0 12:14 pts/2    00:00:00 sleep 60

i.e. the first kill command kills $pid2 but fails to kill -$pid1 (because there
is no such process group), yet it exits with status 0, so we never try again
with just $pid1 instead of -$pid1.

Passing /usr/bin/kill as $1 to the script we get:

kill: sending signal to -2977547 failed: No such process
kill pgrp failed, trying again...
kill: sending signal to 2977548 failed: No such process
0
UID          PID    PPID  C STIME TTY          TIME CMD

i.e. the first kill command kills $pid2 and fails to kill -$pid1 (as before),
but because it was only partial success it exits with non-zero status, and we
try again using $pid1 instead of -$pid1. That second command succeeds in
killing $pid1 this time, but gives an error for $pid2 (because it was already
killed).

POSIX says:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html


    The following exit values shall be returned:

     0
        At least one matching process was found for each pid operand, and the
        specified signal was successfully processed for at least one matching process.
    >0
        An error occurred.


The Bash builtin seems to be wrong here, because no matching process was found
for -$pid1 so it should not return 0.


* [Bug testsuite/100203] Dejagnu timeouts don't work
From: jakub at gcc dot gnu.org @ 2021-04-22 11:23 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100203

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note that bash documents its behavior:
'kill'
          kill [-s SIGSPEC] [-n SIGNUM] [-SIGSPEC] JOBSPEC or PID
          kill -l|-L [EXIT_STATUS]

     Send a signal specified by SIGSPEC or SIGNUM to the process named
     by job specification JOBSPEC or process ID PID.  SIGSPEC is either
     a case-insensitive signal name such as 'SIGINT' (with or without
     the 'SIG' prefix) or a signal number; SIGNUM is a signal number.
     If SIGSPEC and SIGNUM are not present, 'SIGTERM' is used.  The '-l'
     option lists the signal names.  If any arguments are supplied when
     '-l' is given, the names of the signals corresponding to the
     arguments are listed, and the return status is zero.  EXIT_STATUS
     is a number specifying a signal number or the exit status of a
     process terminated by a signal.  The '-L' option is equivalent to
     '-l'.  The return status is zero if at least one signal was
     successfully sent, or non-zero if an error occurs or an invalid
     option is encountered.
but yes, it is different from the man 1 kill documentation.

Just tried:
--- /usr/share/dejagnu/remote.exp.jj    2020-07-27 18:54:19.000000000 +0200
+++ /usr/share/dejagnu/remote.exp       2021-04-22 13:12:21.843958084 +0200
@@ -76,7 +76,7 @@ proc close_wait_program { program_id pid
        # Send SIGINT to give the program a better chance to interrupt
        # whatever it might be doing and react to stdin closing.
        # eg, in case of GDB, this should get it back to the prompt.
-       exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid || kill -2 $pid)"
+       exec sh -c "exec > /dev/null 2>&1 && (env kill -2 $pgid || env kill -2 $pid)"

        # If the program doesn't exit gracefully when stdin closes,
        # we'll need to kill it.  But only do this after 'wait'ing a
@@ -86,8 +86,8 @@ proc close_wait_program { program_id pid
        # PID reuse race.
        set secs 5
        set sh_cmd "exec > /dev/null 2>&1"
-       append sh_cmd " && sleep $secs && (kill -15 $pgid || kill -15 $pid)"
-       append sh_cmd " && sleep $secs && (kill -9 $pgid || kill -9 $pid)"
+       append sh_cmd " && sleep $secs && (env kill -15 $pgid || env kill -15 $pid)"
+       append sh_cmd " && sleep $secs && (env kill -9 $pgid || env kill -9 $pid)"
        append sh_cmd " && sleep $secs"
        set exec_pid [exec sh -c "$sh_cmd" &]
     }
@@ -104,7 +104,7 @@ proc close_wait_program { program_id pid
        # We reaped the process, so cancel the pending force-kills, as
        # otherwise if the PID is reused for some other unrelated
        # process, we'd kill the wrong process.
-       exec sh -c "exec > /dev/null 2>&1 && kill -9 $exec_pid"
+       exec sh -c "exec > /dev/null 2>&1 && env kill -9 $exec_pid"
     }

     return $res
and with that change make check doesn't hang, but works as expected:
make check RUNTESTFLAGS='conformance.exp=wait_notify.cc'
make  check-DEJAGNU
make[1]: Entering directory
'/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/testsuite'
...
Native configuration is i686-pc-linux-gnu

                === libstdc++ tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for
target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/jakub/src/gcc/libstdc++-v3/testsuite/config/default.exp as
tool-and-target-specific interface file.
Running /home/jakub/src/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
...
WARNING: program timed out.
FAIL: 29_atomics/atomic_float/wait_notify.cc execution test

                === libstdc++ Summary ===

# of expected passes            5
# of unexpected failures        1
make[1]: Leaving directory
'/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/testsuite'


* [Bug testsuite/100203] Dejagnu timeouts don't work
From: redi at gcc dot gnu.org @ 2021-04-22 11:26 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100203

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Arguably the Bash builtin is behaving as documented. It doesn't mention any
support for process group IDs, and says it returns 0 if at least one signal was
sent, which is true because dejagnu is passing two PIDs.
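
Since a single status of 0 can hide a target that was never signalled, one
way to avoid relying on it is to check each pid individually afterwards. The
following is a minimal sketch under that assumption; the helper name
still_alive and the two sleep processes are made up for the example and are
not taken from DejaGnu.

```shell
#!/bin/sh
# Hypothetical helper (still_alive is not part of DejaGnu): print every PID
# from the argument list that can still be signalled.
still_alive() {
    for p in "$@"; do
        # Signal 0 only performs the existence and permission check.
        if kill -0 "$p" 2>/dev/null; then
            echo "$p"
        fi
    done
    return 0
}

sleep 60 &
pid1=$!
sleep 60 &
pid2=$!

# Terminate and reap only the first process; an unreaped zombie would
# still answer to signal 0.
kill -15 "$pid1"
wait "$pid1" 2>/dev/null || true

alive=$(still_alive "$pid1" "$pid2")
echo "still alive: $alive"

# Clean up the survivor.
kill -15 "$pid2"
wait "$pid2" 2>/dev/null || true
```

Here only the second pid should be reported, regardless of whether the shell
builtin or an external kill did the signalling.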


* [Bug testsuite/100203] Dejagnu timeouts don't work
From: redi at gcc dot gnu.org @ 2021-04-22 11:38 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100203

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Previously reported upstream by Richi:
https://lists.gnu.org/archive/html/bug-dejagnu/2018-07/msg00000.html
But apparently not actually fixed.


* [Bug testsuite/100203] Dejagnu timeouts don't work
From: redi at gcc dot gnu.org @ 2021-04-22 11:59 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100203

--- Comment #6 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Richi's patch is in DejaGnu 1.6.2, but Jakub and I are using 1.6.1.


* [Bug testsuite/100203] Dejagnu timeouts don't work
From: msebor at gcc dot gnu.org @ 2021-04-22 15:15 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100203

Martin Sebor <msebor at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=98823
                 CC|                            |msebor at gcc dot gnu.org

--- Comment #7 from Martin Sebor <msebor at gcc dot gnu.org> ---
Possibly related to pr98823?
