Data race within write intrinsic with output into a character variable

public inbox for fortran@gcc.gnu.org

* Data race within write intrinsic with output into a character variable
@ 2021-03-10 11:35 Martin Stein
  2021-03-10 15:41 ` Kay Diederichs
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Stein @ 2021-03-10 11:35 UTC (permalink / raw)
  To: fortran

Hi,

I am seeing rare but reproducible memory corruptions which I can trace back to lines like

write(out,'(a,i8)') 'short string', k

where out is a (sufficiently large) character(len=...) variable and k some small integer. The line itself occurs in a subroutine called from within an openmp region.

I have seen this in two rather different circumstances. If I change the line to

out = 'short string' // toStr(k)

and write my own small toStr function, which translates an integer to its string representation, then the memory corruption (which usually shows up shortly afterwards, in seemingly unrelated code) disappears.
As out is usually not even used (the routine is for debugging and only uses the output if something goes wrong), I am fairly sure that the problem lies within the write code.

Unfortunately I cannot create a small reproducer. As I have already seen data races/memory corruption with write (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88899 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88768) I am inclined to conclude that the write intrinsic is at fault here.

Any idea on how this can be further investigated? If write is indeed at fault, that would be pretty bad.

Best regards
Martin


* Re: Data race within write intrinsic with output into a character variable
  2021-03-10 11:35 Data race within write intrinsic with output into a character variable Martin Stein
@ 2021-03-10 15:41 ` Kay Diederichs
  0 siblings, 0 replies; 4+ messages in thread
From: Kay Diederichs @ 2021-03-10 15:41 UTC (permalink / raw)
  To: fortran

On 3/10/21 12:35 PM, Martin Stein via Fortran wrote:
> [...]

Which version of gfortran, and which operating system?

Kay



* Re: Data race within write intrinsic with output into a character variable
  2021-03-10 20:40 mscfd
@ 2021-03-10 21:38 ` Tobias Burnus
  0 siblings, 0 replies; 4+ messages in thread
From: Tobias Burnus @ 2021-03-10 21:38 UTC (permalink / raw)
  To: mscfd, fortran

Hi Martin,

On 10.03.21 21:40, mscfd via Fortran wrote:
> Using helgrind on a simple omp do loop with write to
> a character variable, I get some possible data races
> in libgfortran/io/unit.c.

[...]

Thanks for digging. I have filed https://gcc.gnu.org/PR99529

> There global array newunits is
> allocated and possibly reallocated in
> "newunit_alloc". According to the lock outputs from
> helgrind I see that this routine is called even if
> output is into a character variable. [...]
Concur - I don't know why this is needed (it is clearly done on
purpose), but I only know libgfortran's I/O superficially.
> Could it be that the corresponding write routine in
> transfer.c which calls newunit_free does not obtain
> the necessary lock. I cannot find it (which does not
> count for much).

Glancing at the code, I came to the same conclusion (see patch in the
linked PR). I have not tested whether it helps, but at least it compiles.

Could you test whether it helps on your side? Otherwise, I hope
that Jerry chimes in.

Tobias



* Data race within write intrinsic with output into a character variable
@ 2021-03-10 20:40 mscfd
  2021-03-10 21:38 ` Tobias Burnus
  0 siblings, 1 reply; 4+ messages in thread
From: mscfd @ 2021-03-10 20:40 UTC (permalink / raw)
  To: fortran

Sorry for the noise, but line breaks and subject
were somehow missing...

> which version of gfortran, and which operating system?
I have seen this on two different Linux distros on x86
with a recently compiled version, but also some
time ago with an older gfortran 10 version.

Using helgrind on a simple omp do loop with a write to
a character variable, I get some possible data races
in libgfortran/io/unit.c. There, the global array newunits
is allocated and possibly reallocated in
"newunit_alloc". According to the lock output from
helgrind, this routine is called even if the
output goes into a character variable. Routine
"newunit_alloc" takes a lock so that only one thread
at a time modifies the array. But newunit_free also
writes to the newunits array, and this routine does not
obtain a lock itself (see the comment above newunit_free
in unit.c). So in theory it can happen that newunit_alloc
reallocates newunits while newunit_free writes to it.
As I use 18 threads, the initial size of 16
does not suffice, so reallocation probably
does happen.
Access to newunit_lwi is not protected either
(and helgrind complains about it).
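
The suspected pattern can be sketched in C as follows. This is a
minimal illustration under stated assumptions, not the libgfortran
source: the names newunits, newunit_lwi and the initial size of 16
come from the description above, while unit_lock, newunit_size, the
sketch_ function names and the doubling growth policy are hypothetical.
The sketch takes the lock in both routines; dropping it from
sketch_newunit_free would reproduce the kind of unprotected write
described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the shared state discussed above. */
static pthread_mutex_t unit_lock = PTHREAD_MUTEX_INITIALIZER;
static bool *newunits = NULL;  /* true = slot in use */
static int newunit_size = 0;   /* current capacity of newunits */
static int newunit_lwi = 0;    /* every slot below this index is in use */

/* Reserve a slot, growing the array when it is full.
   Returns the slot index, or -1 if the allocation fails. */
int sketch_newunit_alloc(void)
{
    pthread_mutex_lock(&unit_lock);
    for (int i = newunit_lwi; i < newunit_size; i++) {
        if (!newunits[i]) {
            newunits[i] = true;
            newunit_lwi = i + 1;
            pthread_mutex_unlock(&unit_lock);
            return i;
        }
    }
    /* No free slot: grow. realloc may move the array, so any thread
       writing through the old pointer without the lock would race. */
    int old_size = newunit_size;
    int new_size = old_size ? 2 * old_size : 16;
    bool *p = realloc(newunits, (size_t)new_size * sizeof *p);
    if (p == NULL) {
        pthread_mutex_unlock(&unit_lock);
        return -1;
    }
    memset(p + old_size, 0, (size_t)(new_size - old_size) * sizeof *p);
    newunits = p;
    newunit_size = new_size;
    newunits[old_size] = true;
    newunit_lwi = old_size + 1;
    pthread_mutex_unlock(&unit_lock);
    return old_size;
}

/* Release a slot. The lock covers both the array write and the
   update of newunit_lwi. */
void sketch_newunit_free(int i)
{
    pthread_mutex_lock(&unit_lock);
    assert(i >= 0 && i < newunit_size);
    newunits[i] = false;
    if (i < newunit_lwi)
        newunit_lwi = i;
    pthread_mutex_unlock(&unit_lock);
}
```

With more concurrent callers than the initial capacity (for example
18 threads against 16 slots), the growth branch is reached, which is
the situation where an unlocked free would be dangerous.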

Could it be that the corresponding write routine in
transfer.c, which calls newunit_free, does not obtain
the necessary lock? I cannot find it (which does not
count for much).

Any thoughts?
Martin

