public inbox for gcc-bugs@sourceware.org
* [Bug fortran/51591] New: Strange output from STOP statement in OpenMP region
From: longb at cray dot com @ 2011-12-16 22:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51591

             Bug #: 51591
           Summary: Strange output from STOP statement in OpenMP region
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: fortran
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: longb@cray.com


> cat testc.c
#include <unistd.h>
/*  extern unsigned int sleep (unsigned int __seconds);  */
int sleepc_ (unsigned int *sec)
{
    sleep(*sec);
    return 0;
}

> cat test.f90
  use omp_lib
  implicit none
  integer i
  print *,"Hello World"
  call omp_set_num_threads(5)
!$omp parallel 
!$omp do schedule(static,1)
  do i=1,omp_get_num_threads()
!$omp critical
   print *, "I am",omp_get_thread_num()," of",omp_get_num_threads()
!$omp end critical
   select case (omp_get_thread_num())
    case (0)
      call sleep (1)
      stop 0
    case (1)
      stop 1
    case (2)
      stop 2
    case (3)
      stop 3
    case default
      stop
   end select
  enddo
!$omp end do
!$omp barrier
!$omp end parallel
  end
> cc -c testc.c
> ftn -fopenmp test.f90 testc.o



Sometimes output looks OK:

> aprun -n1 -d5 ./a.out
 Hello World
STOP 1
 I am           1  of           5
 I am           2  of           5
Application 5777837 exit codes: 1
Application 5777837 resources: utime ~0s, stime ~0s

But more often there is some garbled text output:

> aprun -n1 -d5 ./a.out
 Hello World
STOP 1
 I am           1  of           5
0im `m5                               <<<<-----  What's this?
Application 5777838 exit codes: 1
Application 5777838 resources: utime ~0s, stime ~0s

<Nice to see that the STOP 1 results in an exit code of 1, though - new F08
feature.>



* [Bug fortran/51591] Strange output from STOP statement in OpenMP region
From: burnus at gcc dot gnu.org @ 2011-12-17 10:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51591

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jb at gcc dot gnu.org

--- Comment #1 from Tobias Burnus <burnus at gcc dot gnu.org> 2011-12-17 09:39:05 UTC ---
(In reply to comment #0)
> > cat testc.c
> int sleepc_ (unsigned int *sec)

This function is not actually used; gfortran's intrinsic "sleep" is called
instead. (The Fortran program calls "sleep", not "sleepc".)

> Sometimes output looks OK:
> But more often there is some garbled text output:
> 0im `m5                               <<<<-----  What's this?

I can reproduce this - though for me the output is more often OK than garbled
(60% vs. 40% of the output of:
  for((I=0;$I < 20; I++)); do ./a.out ; done
)

That's with GCC 4.6. In GCC 4.7, it works much more often (I have to run the
line above about 20 times, i.e. it fails roughly every 400th run).
Additionally, in 4.7 I do not see garbled output but a segfault.

A backtrace of the core dump shows:

Program terminated with signal 11, Segmentation fault.
#0  _gfortrani_fbuf_flush (u=0x6055d0, mode=<optimized out>)
    at /home/tob/projects/gcc-git/gcc/libgfortran/io/fbuf.c:166
166       if (u->fbuf->act > u->fbuf->pos && u->fbuf->pos > 0)
(gdb) bt
#1  0x00002b37379836bd in _gfortrani_next_record (dtp=0x2b373926dc50, done=1)
    at /home/tob/projects/gcc-git/gcc/libgfortran/io/transfer.c:3397
#2  0x00002b3737983f79 in _gfortran_st_write_done (dtp=0x2b373926dc50)
    at /home/tob/projects/gcc-git/gcc/libgfortran/io/transfer.c:3592

(gdb) p u->fbuf
$3 = (struct fbuf *) 0x7e7e7e7e7e7e7e7e

The value matches:
$ echo $MALLOC_PERTURB_ 
126

Thus, "fbuf" points to malloced memory, which has never been initialized.
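
To make the byte arithmetic explicit: 126 is 0x7e, and a pointer-sized word
whose bytes are all 0x7e reads back as 0x7e7e7e7e7e7e7e7e. A minimal C sketch
of that check follows. The helper name perturb_word is hypothetical, and the
sketch assumes 8-bit bytes and a 64-bit word, as on the machine that produced
the backtrace above.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdint.h>
#include <string.h>

/* Build a 64-bit value whose bytes all equal `byte`, i.e. what a heap
   region filled with that byte looks like when read as a 64-bit pointer.
   Hypothetical helper for illustration; not part of glibc or libgfortran. */
uint64_t perturb_word(unsigned char byte)
{
    uint64_t word;
    memset(&word, byte, sizeof word);
    return word;
}
```

For example, perturb_word(126) yields 0x7e7e7e7e7e7e7e7e, the value gdb
printed for u->fbuf.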


> <Nice to see that the STOP 1 results in an exit code of 1, though - new F08
> feature.>

I think gfortran (like several other compilers) has done so for years; what is
new (since 4.6) is the support for constant character and integer expressions
for (error) stop.



* [Bug fortran/51591] Strange output from STOP statement in OpenMP region
From: jb at gcc dot gnu.org @ 2011-12-17 11:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51591

--- Comment #2 from Janne Blomqvist <jb at gcc dot gnu.org> 2011-12-17 11:27:40 UTC ---
Looks like some kind of race condition.

E.g. what about: STOP calls exit(), which leads to the library destructor being
called, which calls close_units(), which closes each open unit in the tree. But
somehow the print statement from another thread also thinks it has access to
the unit, and then tries to print something, which segfaults because the other
thread is in the process of shutting down the same unit?

Hmm, now that I quickly looked at the code, the above looks likely. So
close_units() acquires unit_lock (the global lock protecting the unit tree),
then closes each unit without acquiring the unit's own lock (u->lock).

For comparison, in normal IO statements, first we acquire unit_lock, find the
unit in the tree, acquire u->lock, then release unit_lock. Then do the IO with
u->lock held, and finally release u->lock.

So it seems that it would be possible for the print statement to acquire the
u->lock before the close_units gets to lock unit_lock, and thus we have a race?

Of course, this is based on a very quick scan of the code, and I could be all
wrong. Perhaps Jakub knows better, as he designed the libgfortran locking
scheme?
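
The two lock orderings described above can be sketched in plain C with
pthreads. This is only an illustration under stated assumptions, not
libgfortran's actual code: struct unit, get_unit, write_record and
close_units_locked are hypothetical names, and libgfortran itself uses the
__gthread_mutex_* wrappers rather than pthread calls directly. The shutdown
path here takes each unit's own lock before marking it closed, so it has to
wait for any write that is still holding that lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a libgfortran unit; not the real structure. */
struct unit {
    pthread_mutex_t lock;  /* per-unit lock ("u->lock" in the text) */
    int closed;            /* set once the unit has been shut down */
    int records;           /* number of records written so far */
};

/* Global lock protecting the set of units ("unit_lock" in the text). */
static pthread_mutex_t unit_lock = PTHREAD_MUTEX_INITIALIZER;

#define NUNITS 2
static struct unit units[NUNITS] = {
    { PTHREAD_MUTEX_INITIALIZER, 0, 0 },
    { PTHREAD_MUTEX_INITIALIZER, 0, 0 },
};

/* Normal I/O path: take unit_lock, find the unit, take u->lock,
   then release unit_lock. Returns with u->lock held, or NULL. */
static struct unit *get_unit(int n)
{
    struct unit *u = NULL;
    pthread_mutex_lock(&unit_lock);
    if (n >= 0 && n < NUNITS && !units[n].closed) {
        u = &units[n];
        pthread_mutex_lock(&u->lock);
    }
    pthread_mutex_unlock(&unit_lock);
    return u;
}

/* Write one record; returns 0 on success, -1 if the unit is unavailable. */
int write_record(int n)
{
    struct unit *u = get_unit(n);
    if (u == NULL)
        return -1;
    assert(!u->closed);              /* cannot be closed while we hold u->lock */
    u->records++;                    /* the "I/O" happens with u->lock held */
    pthread_mutex_unlock(&u->lock);
    return 0;
}

/* Shutdown path that takes unit_lock AND each unit's own lock, so it
   cannot close a unit in the middle of a write_record() on that unit. */
void close_units_locked(void)
{
    pthread_mutex_lock(&unit_lock);
    for (int i = 0; i < NUNITS; i++) {
        pthread_mutex_lock(&units[i].lock);
        units[i].closed = 1;
        pthread_mutex_unlock(&units[i].lock);
    }
    pthread_mutex_unlock(&unit_lock);
}
```

Both paths take unit_lock before u->lock, so the ordering is consistent and
the writer never needs unit_lock while holding u->lock. Dropping the inner
lock/unlock pair in close_units_locked() would give the pattern suspected
above, where a unit can be closed while another thread is still writing to it.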



* [Bug fortran/51591] Strange output from STOP statement in OpenMP region
From: bdavis at gcc dot gnu.org @ 2012-02-03 22:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51591

Bud Davis <bdavis at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bdavis at gcc dot gnu.org

--- Comment #3 from Bud Davis <bdavis at gcc dot gnu.org> 2012-02-03 22:08:10 UTC ---
Index: gcc/libgfortran/io/unit.c
===================================================================
--- gcc/libgfortran/io/unit.c   (revision 183873)
+++ gcc/libgfortran/io/unit.c   (working copy)
@@ -637,6 +637,7 @@
   if (u->previous_nonadvancing_write)
     finish_last_advance_record (u);

+  __gthread_mutex_lock (&u->lock);
   rc = (u->s == NULL) ? 0 : sclose (u->s) == -1;

   u->closed = 1;


As theorized, the above patch does seem to correct the problem with no
regressions in the testsuite.



* [Bug fortran/51591] Strange output from STOP statement in OpenMP region
From: bdavis at gcc dot gnu.org @ 2013-05-11 17:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51591

--- Comment #4 from Bud Davis <bdavis at gcc dot gnu.org> ---
Upon closer reflection, the underlying problem is the OpenMP threads doing I/O
while the units are being closed.
So "STOP" shows up in the output, followed by output from threads whose units
have already been destroyed while the exit() handler is still running.



* [Bug fortran/51591] Strange output from STOP statement in OpenMP region
From: dominiq at lps dot ens.fr @ 2015-10-20 14:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51591

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2015-10-20
     Ever confirmed|0                           |1

--- Comment #5 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> Upon closer reflection, the underlying problems is the OpenMP threads doing
> I/O while the units are being closed.
> So, stop shows in the output, followed by output from threads whose units
> have been destroyed, but the call to exit() handler has not yet terminated.

Any progress after more than two years?



* [Bug fortran/51591] Strange output from STOP statement in OpenMP region
From: dominiq at lps dot ens.fr @ 2020-07-30 15:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51591

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #8 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> Jakub, do you know what the OMP standard has to say on this?
> Is "STOP 1" in an OMP region defined behavior?

No answer after more than one year. Closing.

Open a new PR if the problem is still there.


* [Bug fortran/51591] Strange output from STOP statement in OpenMP region
From: jakub at gcc dot gnu.org @ 2020-07-30 15:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51591

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
OpenMP just says that a structured block
"may contain STOP or ERROR STOP statements."
and nothing else; the particular behavior of STOP is covered by the base
language or is up to the implementation.
