public inbox for gcc-bugs@sourceware.org
* [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls
@ 2020-03-11 14:54 trnka at scm dot com
  2020-03-12  8:43 ` [Bug libfortran/94143] " rguenth at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: trnka at scm dot com @ 2020-03-11 14:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

            Bug ID: 94143
           Summary: [9/10 Regression] Asynchronous execute_command_line()
                    breaks following synchronous calls
           Product: gcc
           Version: 9.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libfortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: trnka at scm dot com
  Target Milestone: ---

Since PR90038 introduced a SIGCHLD handler into execute_command_line(), a
single asynchronous call, execute_command_line(wait=.false.), breaks all
subsequent synchronous calls (whether made through
execute_command_line(wait=.true.) or through other libraries), because the
signal handler stays installed forever and indiscriminately reaps any child
process.

The result is that the internal wait() at the end of system()-like calls fails
with ECHILD if the signal handler fires earlier and does a wait() on that
process.

Given that this is a race between the signal handler and the synchronous
wait(), it's somewhat tricky to reproduce reliably. The following test case
triggers it on my machine:

program asyncexec
   implicit none

   integer :: i

!$omp parallel default(shared)
!$omp single
   call execute_command_line('sleep 30', wait=.false.)
   do i = 1, 10
      write(*,*) i
      call execute_command_line('/bin/true')
   end do
!$omp end single
!$omp end parallel
end program

This typically leads to the following error on the first or second iteration:

Fortran runtime error: EXECUTE_COMMAND_LINE: Termination status of the
command-language interpreter cannot be obtained

Error termination. Backtrace:
#0  0x7f979747c5fa in set_cmdstat
        at ../../../libgfortran/intrinsics/execute_command_line.c:63
#1  0x7f979747c829 in set_cmdstat
        at ../../../libgfortran/intrinsics/execute_command_line.c:58
#2  0x7f979747c829 in execute_command_line
        at ../../../libgfortran/intrinsics/execute_command_line.c:133

The issue has nothing to do with OpenMP; I'm just using it to get multiple
concurrent threads, to maximize the chance that the signal handler will run on
a different thread before the forking thread has a chance to call wait(). In
real life, this issue affects MPI applications because MPI libraries typically
spawn some background event-handling threads even if the program itself is
single-threaded.

I don't see a way to work around this in user code, so I'd suggest removing the
offending SIGCHLD handler as a quick "fix". That'll leave zombie processes
around, but those are mostly harmless. IMHO there are two possible proper
solutions:

1) Spawn a dedicated thread to specifically wait for the PID launched by the
asynchronous call, instead of a blanket wait on any child (PID -1).
2) Record all asynchronously launched PIDs in a global list. The SIGCHLD
handler would then extract the PID from siginfo and consult the list to see
whether it should call wait().

Option #1 seems easier to implement to me. I can try to come up with a patch if
desired.
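
For illustration, option 1 might be sketched in C as follows. This is a
minimal sketch under stated assumptions, not a proposed patch: spawn_async()
and reap_one_child() are hypothetical names, and the command is run through
/bin/sh purely to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: block until one specific child exits, then reap it. */
static void *reap_one_child(void *arg)
{
    pid_t pid = (pid_t)(intptr_t)arg;
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;                             /* retry if interrupted by a signal */
    return NULL;
}

/* Start cmd via /bin/sh without waiting. Returns the PID or -1. */
pid_t spawn_async(const char *cmd)
{
    assert(cmd != NULL);

    pid_t pid = fork();
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                   /* exec failed */
    }
    if (pid > 0) {
        pthread_t tid;
        /* If the thread cannot be created, the child merely stays a
           zombie after it exits; nothing else is affected. */
        if (pthread_create(&tid, NULL, reap_one_child,
                           (void *)(intptr_t)pid) == 0)
            pthread_detach(tid);
    }
    return pid;
}
```

No SIGCHLD handler is installed at all here, so nothing competes with the
wait() performed by later synchronous calls. The cost is one short-lived
thread per asynchronous command.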


* [Bug libfortran/94143] [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls
  2020-03-11 14:54 [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls trnka at scm dot com
@ 2020-03-12  8:43 ` rguenth at gcc dot gnu.org
  2020-03-12 11:58 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-03-12  8:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4
   Target Milestone|---                         |9.3


* [Bug libfortran/94143] [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls
  2020-03-11 14:54 [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls trnka at scm dot com
  2020-03-12  8:43 ` [Bug libfortran/94143] " rguenth at gcc dot gnu.org
@ 2020-03-12 11:58 ` jakub at gcc dot gnu.org
  2020-04-18 15:54 ` tkoenig at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-03-12 11:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.3                         |9.4

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 9.3.0 has been released, adjusting target milestone.


* [Bug libfortran/94143] [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls
  2020-03-11 14:54 [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls trnka at scm dot com
  2020-03-12  8:43 ` [Bug libfortran/94143] " rguenth at gcc dot gnu.org
  2020-03-12 11:58 ` jakub at gcc dot gnu.org
@ 2020-04-18 15:54 ` tkoenig at gcc dot gnu.org
  2020-05-07 21:05 ` [Bug libfortran/94143] [9/10/11 " anlauf at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-04-18 15:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-04-18
           Keywords|                            |wrong-code
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Confirmed.

It would be nice to get this fixed before releasing 10.0.


* [Bug libfortran/94143] [9/10/11 Regression] Asynchronous execute_command_line() breaks following synchronous calls
  2020-03-11 14:54 [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls trnka at scm dot com
                   ` (2 preceding siblings ...)
  2020-04-18 15:54 ` tkoenig at gcc dot gnu.org
@ 2020-05-07 21:05 ` anlauf at gcc dot gnu.org
  2020-05-08  9:57 ` trnka at scm dot com
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: anlauf at gcc dot gnu.org @ 2020-05-07 21:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

anlauf at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |anlauf at gcc dot gnu.org

--- Comment #3 from anlauf at gcc dot gnu.org ---
Funny.  I do not get failures when compiling with -fsanitize=thread.

With valgrind --tool=helgrind I get lots of "Possible data race" hints
that I'm not sure how to interpret.


* [Bug libfortran/94143] [9/10/11 Regression] Asynchronous execute_command_line() breaks following synchronous calls
  2020-03-11 14:54 [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls trnka at scm dot com
                   ` (3 preceding siblings ...)
  2020-05-07 21:05 ` [Bug libfortran/94143] [9/10/11 " anlauf at gcc dot gnu.org
@ 2020-05-08  9:57 ` trnka at scm dot com
  2021-06-01  8:16 ` [Bug libfortran/94143] [9/10/11/12 " rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: trnka at scm dot com @ 2020-05-08  9:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

--- Comment #4 from Tomáš Trnka <trnka at scm dot com> ---
(In reply to anlauf from comment #3)
> Funny.  I do not get failures when compiling with -fsanitize=thread.

I don't think TSAN can help here. This is not a data race between two threads,
but between our SIGCHLD signal handler calling wait() and the wait() inside
system().

So once the shell spawned by system() exits, one of the following happens:

A) Everything works OK:
   1. The SIGCHLD handler fires and calls wait().
   2. The wait() in system() reaps the child and returns its exit status.
   3. The wait() in the signal handler fails with ECHILD, but errors are
      ignored anyway.

B) This bug:
   1. The SIGCHLD handler fires and calls wait().
   2. The wait() in the signal handler reaps the child (and ignores its exit
      status).
   3. The wait() in system() fails with ECHILD, triggering a Fortran runtime
      error.


* [Bug libfortran/94143] [9/10/11/12 Regression] Asynchronous execute_command_line() breaks following synchronous calls
  2020-03-11 14:54 [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls trnka at scm dot com
                   ` (4 preceding siblings ...)
  2020-05-08  9:57 ` trnka at scm dot com
@ 2021-06-01  8:16 ` rguenth at gcc dot gnu.org
  2022-05-27  9:42 ` [Bug libfortran/94143] [10/11/12/13 " rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-01  8:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.4                         |9.5

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9.4 is being released, retargeting bugs to GCC 9.5.


* [Bug libfortran/94143] [10/11/12/13 Regression] Asynchronous execute_command_line() breaks following synchronous calls
  2020-03-11 14:54 [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls trnka at scm dot com
                   ` (5 preceding siblings ...)
  2021-06-01  8:16 ` [Bug libfortran/94143] [9/10/11/12 " rguenth at gcc dot gnu.org
@ 2022-05-27  9:42 ` rguenth at gcc dot gnu.org
  2022-06-28 10:40 ` jakub at gcc dot gnu.org
  2023-07-07 10:37 ` [Bug libfortran/94143] [11/12/13/14 " rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-05-27  9:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.5                         |10.4

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9 branch is being closed


* [Bug libfortran/94143] [10/11/12/13 Regression] Asynchronous execute_command_line() breaks following synchronous calls
  2020-03-11 14:54 [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls trnka at scm dot com
                   ` (6 preceding siblings ...)
  2022-05-27  9:42 ` [Bug libfortran/94143] [10/11/12/13 " rguenth at gcc dot gnu.org
@ 2022-06-28 10:40 ` jakub at gcc dot gnu.org
  2023-07-07 10:37 ` [Bug libfortran/94143] [11/12/13/14 " rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-06-28 10:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.4                        |10.5

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.


* [Bug libfortran/94143] [11/12/13/14 Regression] Asynchronous execute_command_line() breaks following synchronous calls
  2020-03-11 14:54 [Bug libfortran/94143] New: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls trnka at scm dot com
                   ` (7 preceding siblings ...)
  2022-06-28 10:40 ` jakub at gcc dot gnu.org
@ 2023-07-07 10:37 ` rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |11.5

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 10 branch is being closed.

