public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug fortran/99529] New: libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings
@ 2021-03-10 21:10 burnus at gcc dot gnu.org
  2021-03-10 21:30 ` [Bug fortran/99529] " burnus at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: burnus at gcc dot gnu.org @ 2021-03-10 21:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99529

            Bug ID: 99529
           Summary: libgfortran I/O: Data races related to new unit / new
                    unit calls for I/O to strings
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: jvdelisle at gcc dot gnu.org
  Target Milestone: ---

Reported at https://gcc.gnu.org/pipermail/fortran/2021-March/055795.html

Using helgrind on a simple omp do loop with write to
a character variable, I get some possible data races
in libgfortran/io/unit.c.

There global array newunits is
allocated and possibly reallocated in
"newunit_alloc". According to the lock outputs from
helgrind I see that this routine is called even if
output is into a character variable.

Routine "newunit_alloc" uses a lock to avoid having several
threads all over the place. But newunit_free also
writes to newunits array. And this routine does not
obtain a lock itself (see comment above newunit_free
in unit.c) So in theory it can happen that newunit_alloc
reallocated newunits, and newunit_free writes to it just
at this time. As I also use 18 threads the initial size of 16
does not suffice and reallocation does probably
indeed happen.

Also access to newunit_lwi is not protected as well
(and complained about by helgrind).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug fortran/99529] libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings
  2021-03-10 21:10 [Bug fortran/99529] New: libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings burnus at gcc dot gnu.org
@ 2021-03-10 21:30 ` burnus at gcc dot gnu.org
  2021-03-11  8:30 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: burnus at gcc dot gnu.org @ 2021-03-10 21:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99529

--- Comment #1 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Regarding calling newunit_alloc, that's due to libgfortran/io/unit.c's:

get_unit (st_parameter_dt *dtp, int do_create)
{
  gfc_unit *unit;

  if ((dtp->common.flags & IOPARM_DT_HAS_INTERNAL_UNIT) != 0)
    {
      int kind;
      if (dtp->common.unit == GFC_INTERNAL_UNIT)
        kind = 1;
      else if (dtp->common.unit == GFC_INTERNAL_UNIT4)
        kind = 4;
      else
        internal_error (&dtp->common, "get_unit(): Bad internal unit KIND");

      dtp->u.p.unit_is_internal = 1;
      dtp->common.unit = newunit_alloc ();
      unit = get_gfc_unit (dtp->common.unit, do_create);
      set_internal_unit (dtp, unit, kind);
      fbuf_init (unit, 128);
      return unit;
    }

 * * *

>From reading the report & looking at the code superficially, it seems as if a
patch is needed like:

diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index 27bee9d4e01..a6f115d5803 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -4360,3 +4360,5 @@ st_read_done_worker (st_parameter_dt *dtp)
            }
+         LOCK (&unit_lock);
          newunit_free (dtp->common.unit);
+         UNLOCK (&unit_lock);
        }
@@ -4448,3 +4450,5 @@ st_write_done_worker (st_parameter_dt *dtp)
            }
+         LOCK (&unit_lock);
          newunit_free (dtp->common.unit);
+         UNLOCK (&unit_lock);
        }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug fortran/99529] libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings
  2021-03-10 21:10 [Bug fortran/99529] New: libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings burnus at gcc dot gnu.org
  2021-03-10 21:30 ` [Bug fortran/99529] " burnus at gcc dot gnu.org
@ 2021-03-11  8:30 ` rguenth at gcc dot gnu.org
  2021-03-11  8:52 ` mscfd at gmx dot net
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-03-11  8:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99529

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Or make an internal (static) newunit_free_unlocked and make the exported
newunit_free take the lock.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug fortran/99529] libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings
  2021-03-10 21:10 [Bug fortran/99529] New: libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings burnus at gcc dot gnu.org
  2021-03-10 21:30 ` [Bug fortran/99529] " burnus at gcc dot gnu.org
  2021-03-11  8:30 ` rguenth at gcc dot gnu.org
@ 2021-03-11  8:52 ` mscfd at gmx dot net
  2021-03-11  9:07 ` mscfd at gmx dot net
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: mscfd at gmx dot net @ 2021-03-11  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99529

--- Comment #3 from martin <mscfd at gmx dot net> ---
Thanks for looking at the report and providing a patch to test.
For completeness sake, here is the simple test code, which does not fail, but
which shows the data race in helgrind (compile with -fopenmp -g2):

program omp_write_str

use OMP_LIB
implicit none

integer :: i
character(len=16) :: out

!$omp parallel do schedule(static,10) default(shared) private(i, out)
do i=1,100000
   write(out,'(i8)') omp_get_thread_num()
end do
!$omp end parallel do

end program omp_write_str


The provided patch indeed removes the data races (there are still data races at
startup and exit, which are false positives, as those occur in a single
threaded region.)

However, with the my real code, it does not help. I will further check with
helgrind (the amount of false positives is staggering, though...)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug fortran/99529] libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings
  2021-03-10 21:10 [Bug fortran/99529] New: libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings burnus at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-03-11  8:52 ` mscfd at gmx dot net
@ 2021-03-11  9:07 ` mscfd at gmx dot net
  2021-03-11  9:31 ` burnus at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: mscfd at gmx dot net @ 2021-03-11  9:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99529

--- Comment #4 from martin <mscfd at gmx dot net> ---
Ok, here is another suspicious data race in unit.c (backtrace from helgrind):

==7831== Possible data race during read of size 4 at 0x5DE4620 by thread #296
==7831== Locks held: 1, at address 0x2EE5D00
==7831==    at 0x10067CB: get_gfc_unit (unit.c:336)
==7831==    by 0x10071DB: _gfortrani_get_unit (unit.c:556)
==7831==    by 0x10050DC: data_transfer_init (transfer.c:2851)
...
==7831== 
==7831== This conflicts with a previous write of size 4 by thread #3
==7831== Locks held: 1, at address 0x5DE4700
==7831==    at 0x10069F4: _gfortrani_set_internal_unit (unit.c:459)
==7831==    by 0x10071EC: _gfortrani_get_unit (unit.c:557)
==7831==    by 0x10050DC: data_transfer_init (transfer.c:2851)

There seems to be a conflicting access to "unit_cache[c]->unit_number",
set_internal_unit does not hold UNIT_LOCK.

I cannot observe this in the simple testcase, but in the real code at the right
place where I suspect the write-caused memory corruption.

(Just in case line numbers have changed:
unit.c:336:     if (unit_cache[c] != NULL && unit_cache[c]->unit_number == n)
unit.c:556:  iunit->unit_number = dtp->common.unit;
)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug fortran/99529] libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings
  2021-03-10 21:10 [Bug fortran/99529] New: libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings burnus at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-03-11  9:07 ` mscfd at gmx dot net
@ 2021-03-11  9:31 ` burnus at gcc dot gnu.org
  2021-03-11  9:48 ` burnus at gcc dot gnu.org
  2021-03-15 13:29 ` burnus at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: burnus at gcc dot gnu.org @ 2021-03-11  9:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99529

--- Comment #5 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to martin from comment #4)
> Ok, here is another suspicious data race in unit.c (backtrace from helgrind):

It looks as if - for the libgfortran internal use - it first gets the unit
based on that number – and then sets it again. Namely, the call is:

      unit = get_gfc_unit (dtp->common.unit, do_create);
      set_internal_unit (dtp, unit, kind);

Just to make sure, I add an assert - but I think the line can also be removed
for good.

--- a/libgfortran/io/unit.c
+++ b/libgfortran/io/unit.c
@@ -459 +459 @@ set_internal_unit (st_parameter_dt *dtp, gfc_unit *iunit, int
kind)
-  iunit->unit_number = dtp->common.unit;
+  assert (iunit->unit_number == dtp->common.unit);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug fortran/99529] libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings
  2021-03-10 21:10 [Bug fortran/99529] New: libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings burnus at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-03-11  9:31 ` burnus at gcc dot gnu.org
@ 2021-03-11  9:48 ` burnus at gcc dot gnu.org
  2021-03-15 13:29 ` burnus at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: burnus at gcc dot gnu.org @ 2021-03-11  9:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99529

--- Comment #6 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566596.html

@Martin: Besides helgrind, you could also try building your program with
-fsanitize=thread - this might also help finding some issues (for environment
flags, see https://github.com/google/sanitizers/wiki/ThreadSanitizerFlags )

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug fortran/99529] libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings
  2021-03-10 21:10 [Bug fortran/99529] New: libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings burnus at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-03-11  9:48 ` burnus at gcc dot gnu.org
@ 2021-03-15 13:29 ` burnus at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: burnus at gcc dot gnu.org @ 2021-03-15 13:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99529

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #7 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Missed to add a PR fortran/99529, hence, the commit did not show up:

r11-7647-ga6e9633ccb593937fceec67fafc2afe5d518d735

commit a6e9633ccb593937fceec67fafc2afe5d518d735
Author: Tobias Burnus <tobias@codesourcery.com>
Date:   Fri Mar 12 16:31:32 2021 +0100

    Fortran: Fix libgfortran I/O race with newunit_free [PR99529]

    libgfortran/ChangeLog:

            * io/transfer.c (st_read_done_worker, st_write_done_worker):
            Call unlock_unit here, add unit_lock lock around newunit_free call.
            (st_read_done, st_write_done): Only call unlock_unit when not
            calling the worker function.
            * io/unit.c (set_internal_unit): Don't reset the unit_number
            to the same number as this cause race warnings.

 * * *

Committed patch was the revised version
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566600.html

Martin (who reported the issue + helped debugging it) wrote in one of the email
threads:
"BTW, I will do some more tests, but it looks like the patch fixes
the memory corruption issue."

Thus, hopefully, this issue is fixed for good. Thanks to all involved for the
help!

→ Close as FIXED (on mainline, GCC 11).

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-03-15 13:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-10 21:10 [Bug fortran/99529] New: libgfortran I/O: Data races related to new unit / new unit calls for I/O to strings burnus at gcc dot gnu.org
2021-03-10 21:30 ` [Bug fortran/99529] " burnus at gcc dot gnu.org
2021-03-11  8:30 ` rguenth at gcc dot gnu.org
2021-03-11  8:52 ` mscfd at gmx dot net
2021-03-11  9:07 ` mscfd at gmx dot net
2021-03-11  9:31 ` burnus at gcc dot gnu.org
2021-03-11  9:48 ` burnus at gcc dot gnu.org
2021-03-15 13:29 ` burnus at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).