public inbox for gcc-bugs@sourceware.org
* [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls
@ 2020-03-11 14:54 trnka at scm dot com
From: trnka at scm dot com @ 2020-03-11 14:54 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

            Bug ID: 94143
           Summary: [9/10 Regression] Asynchronous execute_command_line()
                    breaks following synchronous calls
           Product: gcc
           Version: 9.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libfortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: trnka at scm dot com
  Target Milestone: ---

Since PR90038 introduced a SIGCHLD handler into execute_command_line(), calling
an asynchronous execute_command_line(wait=.false.) breaks all subsequent
synchronous calls (whether made through execute_command_line(wait=.true.) or
by other libraries), because the signal handler stays installed for the rest
of the process lifetime and indiscriminately reaps every child process.

The result is that the internal wait() at the end of system()-like calls fails
with ECHILD if the signal handler fires earlier and does a wait() on that
process.
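
As an illustration of that failure mode, here is a minimal single-threaded C
sketch. It is not the libgfortran code; the names reap_all and
demo_blanket_reaper are hypothetical. It installs a blanket SIGCHLD handler,
forces the handler to win the race by waiting until it has run, and then
reports what the explicit waitpid() on the child sees.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;

/* Blanket handler: reaps every exited child, whoever spawned it. */
static void reap_all(int signo)
{
    (void)signo;
    int saved = errno;
    while (waitpid((pid_t)-1, NULL, WNOHANG) > 0)
        reaped = 1;
    errno = saved;
}

/* Returns the errno seen by the explicit waitpid(), 0 if it succeeded,
   or -1 if the setup itself failed. */
int demo_blanket_reaper(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_all;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old) != 0)
        return -1;

    reaped = 0;
    pid_t pid = fork();
    if (pid < 0) {
        sigaction(SIGCHLD, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(0);                      /* child: exit at once, like /bin/true */

    /* Force the losing side of the race: let the handler run first
       (bounded to roughly 5 s so the sketch cannot hang). */
    struct timespec tick = {0, 1000000};   /* 1 ms */
    for (int i = 0; i < 5000 && !reaped; i++)
        nanosleep(&tick, NULL);

    int result = 0;
    if (waitpid(pid, NULL, 0) < 0)
        result = errno;                /* the child is already gone */

    sigaction(SIGCHLD, &old, NULL);    /* restore the previous disposition */
    return result;
}
```

Once the handler has reaped the child, demo_blanket_reaper() is expected to
return ECHILD, which is the condition the runtime error below reports.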

Given that this is a race between the signal handler and the synchronous
wait(), it's somewhat tricky to reproduce reliably. The following test case
triggers it on my machine:

program asyncexec
   implicit none

   integer :: i

!$omp parallel default(shared)
!$omp single
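   ! Asynchronous launch: this installs the persistent SIGCHLD handler.
   call execute_command_line('sleep 30', wait=.false.)
   do i = 1, 10
      write(*,*) i
      ! Synchronous launch: fails if the handler reaps the child first.
      call execute_command_line('/bin/true')
   end do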
!$omp end single
!$omp end parallel
end program

This typically leads to the following error on the first or second iteration:

Fortran runtime error: EXECUTE_COMMAND_LINE: Termination status of the
command-language interpreter cannot be obtained

Error termination. Backtrace:
#0  0x7f979747c5fa in set_cmdstat
        at ../../../libgfortran/intrinsics/execute_command_line.c:63
#1  0x7f979747c829 in set_cmdstat
        at ../../../libgfortran/intrinsics/execute_command_line.c:58
#2  0x7f979747c829 in execute_command_line
        at ../../../libgfortran/intrinsics/execute_command_line.c:133

The issue has nothing to do with OpenMP; I'm just using it to get multiple
concurrent threads to maximize the chance that the signal handler will run on a
different thread before the forking thread has a chance to call wait(). In real
life, this issue affects MPI applications because MPI libraries typically spawn
some background event-handling threads even if the program itself is
single-threaded.

I don't see a way to work around this in user code, so I'd suggest removing the
offending SIGCHLD handler as a quick "fix". That'll leave zombie processes
around, but those are mostly harmless. IMHO there are two possible proper
solutions:

1) Spawn a dedicated thread to specifically wait for the PID launched by the
asynchronous call, instead of a blanket waitpid(-1, ...).
2) Record all asynchronously launched PIDs in a global list. The SIGCHLD
handler would then extract the PID from siginfo and consult the list to see
whether it should call wait().
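
A rough C sketch of option 2 follows, under stated assumptions. All names
(async_pids, MAX_ASYNC, reap_recorded, install_selective_reaper, spawn_async)
are hypothetical and not taken from libgfortran. It also deviates from the
description above in one respect: standard signals can coalesce, so instead of
trusting the single PID in siginfo, the handler polls only the recorded PIDs
with waitpid(pid, ..., WNOHANG). Registration blocks SIGCHLD in the calling
thread only; a real multi-threaded implementation would need more care.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ASYNC 64                          /* arbitrary capacity (assumption) */
static volatile pid_t async_pids[MAX_ASYNC];  /* 0 marks a free slot */

/* Reap only children that were recorded as asynchronous. */
static void reap_recorded(int signo)
{
    (void)signo;
    int saved = errno;
    for (int i = 0; i < MAX_ASYNC; i++) {
        pid_t p = async_pids[i];
        if (p <= 0)
            continue;
        pid_t r = waitpid(p, NULL, WNOHANG);
        if (r == p || (r < 0 && errno == ECHILD))
            async_pids[i] = 0;   /* reaped, or already gone: free the slot */
    }
    errno = saved;
}

int install_selective_reaper(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_recorded;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Start cmd via /bin/sh without waiting; returns the child PID or -1. */
pid_t spawn_async(const char *cmd)
{
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    /* Keep the handler out of this thread until the PID is recorded. */
    sigprocmask(SIG_BLOCK, &block, &old);

    int slot = -1;
    for (int i = 0; i < MAX_ASYNC && slot < 0; i++)
        if (async_pids[i] == 0)
            slot = i;

    pid_t pid = (slot < 0) ? -1 : fork();
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);   /* child: restore the mask */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                             /* exec failed */
    }
    if (pid > 0)
        async_pids[slot] = pid;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}
```

With this handler installed, a child that was not started through
spawn_async() is left alone, so a later synchronous waitpid() or system()
call can still collect its status.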

Option #1 seems easier to implement to me. I can try to come up with a patch if
desired.
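
For comparison, option 1 could look roughly like the following C sketch. The
names reap_one and spawn_async_with_reaper are hypothetical, and this is only
an outline under the assumption that POSIX threads are available, not a
proposed patch. Each asynchronous launch gets a detached thread that waits for
exactly one PID, so no SIGCHLD handler is installed at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: wait for one specific child and nothing else. */
static void *reap_one(void *arg)
{
    pid_t pid = (pid_t)(intptr_t)arg;
    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
        ;                               /* retry if interrupted by a signal */
    return NULL;
}

/* Start cmd via /bin/sh without waiting; returns the child PID or -1. */
pid_t spawn_async_with_reaper(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                     /* exec failed */
    }

    pthread_t tid;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    /* If thread creation fails, the child simply stays a zombie until the
       parent exits, which matches the "quick fix" behaviour. */
    (void)pthread_create(&tid, &attr, reap_one, (void *)(intptr_t)pid);
    pthread_attr_destroy(&attr);
    return pid;
}
```

Because the reaper thread only ever waits on its own PID, synchronous
system()-like calls made afterwards keep exclusive access to their children.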


end of thread, other threads:[~2023-07-07 10:37 UTC | newest]

Thread overview: 10+ messages
2020-03-11 14:54 [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls trnka at scm dot com
2020-03-12  8:43 ` [Bug libfortran/94143] " rguenth at gcc dot gnu.org
2020-03-12 11:58 ` jakub at gcc dot gnu.org
2020-04-18 15:54 ` tkoenig at gcc dot gnu.org
2020-05-07 21:05 ` [Bug libfortran/94143] [9/10/11 " anlauf at gcc dot gnu.org
2020-05-08  9:57 ` trnka at scm dot com
2021-06-01  8:16 ` [Bug libfortran/94143] [9/10/11/12 " rguenth at gcc dot gnu.org
2022-05-27  9:42 ` [Bug libfortran/94143] [10/11/12/13 " rguenth at gcc dot gnu.org
2022-06-28 10:40 ` jakub at gcc dot gnu.org
2023-07-07 10:37 ` [Bug libfortran/94143] [11/12/13/14 " rguenth at gcc dot gnu.org
