public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/104783] New: [nvptx, openmp] Hang/abort with atomic update in simd construct
@ 2022-03-04 13:35 vries at gcc dot gnu.org
  2022-03-04 13:48 ` [Bug target/104783] " vries at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: vries at gcc dot gnu.org @ 2022-03-04 13:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783

            Bug ID: 104783
           Summary: [nvptx, openmp] Hang/abort with atomic update in simd
                    construct
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Minimized from
https://github.com/TApplencourt/OvO/blob/master/test_src/cpp/hierarchical_parallelism/atomic_add-float/target_teams_distribute__parallel_for__simd.cpp
.

Test-case:
...
$ cat libgomp/testsuite/libgomp.c/target_teams_distribute__parallel_for__simd.c 
/* { dg-options "-O2" } */

int
main (void)
{
  const unsigned expected_value = 1;

  unsigned counter_N0 = 0;

#pragma omp target map(tofrom: counter_N0)
#pragma omp simd
  for (int i = 0 ; i < 1 ; i++ )
    {
#pragma omp atomic update
      counter_N0 = counter_N0 + 1 ;
    }

  return 0;
}
...

With target board unix/-foffload=-mptx=3.1:
...
libgomp: cuCtxSynchronize error: unspecified launch failure (perhaps abort was
called)


libgomp: cuMemFree_v2 error: unspecified launch failure

libgomp: device finalization failed
FAIL: libgomp.c/target_teams_distribute__parallel_for__simd.c execution test
...

With target board unix: hangs.

With dg-options -O0:
...
PASS: libgomp.c/target_teams_distribute__parallel_for__simd.c execution test
...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct
  2022-03-04 13:35 [Bug target/104783] New: [nvptx, openmp] Hang/abort with atomic update in simd construct vries at gcc dot gnu.org
@ 2022-03-04 13:48 ` vries at gcc dot gnu.org
  2022-03-04 16:16 ` vries at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: vries at gcc dot gnu.org @ 2022-03-04 13:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
Hmm, I wonder if nvptx_reorg_uniform_simt should run inbetween SIMT_ENTER and
SIMT_EXIT.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct
  2022-03-04 13:35 [Bug target/104783] New: [nvptx, openmp] Hang/abort with atomic update in simd construct vries at gcc dot gnu.org
  2022-03-04 13:48 ` [Bug target/104783] " vries at gcc dot gnu.org
@ 2022-03-04 16:16 ` vries at gcc dot gnu.org
  2022-03-08 12:51 ` vries at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: vries at gcc dot gnu.org @ 2022-03-04 16:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783

--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
Hmm, the atom insn sets a register that is not used anywhere.  So the shuffle
communicating the result doesn't make much sense.

We can fix that by doing:
...
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index c6cec0c27c2..60d02c02452 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -3265,7 +3265,9 @@ static bool
 nvptx_unisimt_handle_set (rtx set, rtx_insn *insn, rtx master)
 {
   rtx reg;
-  if (GET_CODE (set) == SET && REG_P (reg = SET_DEST (set)))
+  if (GET_CODE (set) == SET
+      && REG_P (reg = SET_DEST (set))
+      && find_reg_note (insn, REG_UNUSED, reg) == NULL_RTX)
     {
       emit_insn_after (nvptx_gen_shuffle (reg, reg, master, SHUFFLE_IDX),
                       insn);
...

But that gives us a warp sync instead of a shuffle:
...
$L2:
ld.u64 %r29,[%r27];
@ %r33 atom.add.u32 %r30,[%r29],1;
bar.warp.sync 0xffffffff;
...
so the problem of the hang persists.

But, if we roll back the recent change of commit 8e5c34ab45f ("[nvptx] Use
nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt", the test-case
passes.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct
  2022-03-04 13:35 [Bug target/104783] New: [nvptx, openmp] Hang/abort with atomic update in simd construct vries at gcc dot gnu.org
  2022-03-04 13:48 ` [Bug target/104783] " vries at gcc dot gnu.org
  2022-03-04 16:16 ` vries at gcc dot gnu.org
@ 2022-03-08 12:51 ` vries at gcc dot gnu.org
  2022-03-09  7:16 ` vries at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: vries at gcc dot gnu.org @ 2022-03-08 12:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783

--- Comment #3 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 52584
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52584&action=edit
Tentative patch

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct
  2022-03-04 13:35 [Bug target/104783] New: [nvptx, openmp] Hang/abort with atomic update in simd construct vries at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2022-03-08 12:51 ` vries at gcc dot gnu.org
@ 2022-03-09  7:16 ` vries at gcc dot gnu.org
  2022-03-10 11:22 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: vries at gcc dot gnu.org @ 2022-03-09  7:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783

--- Comment #4 from Tom de Vries <vries at gcc dot gnu.org> ---
The patch I have works for target boards unix and unix/-foffload=-mptx=3.1, but
I run into the hang for --target_board=unix/-foffload=-misa=sm_75.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct
  2022-03-04 13:35 [Bug target/104783] New: [nvptx, openmp] Hang/abort with atomic update in simd construct vries at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2022-03-09  7:16 ` vries at gcc dot gnu.org
@ 2022-03-10 11:22 ` cvs-commit at gcc dot gnu.org
  2022-03-10 12:25 ` vries at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-03-10 11:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tom de Vries <vries@gcc.gnu.org>:

https://gcc.gnu.org/g:f07178ca3c1e5dff799fb5016bb3767571db3165

commit r12-7586-gf07178ca3c1e5dff799fb5016bb3767571db3165
Author: Tom de Vries <tdevries@suse.de>
Date:   Tue Mar 8 10:15:45 2022 +0100

    [nvptx] Disable warp sync in simt region

    I ran into a hang for this code:
    ...
      #pragma omp target map(tofrom: counter_N0)
      #pragma omp simd
      for (int i = 0 ; i < 1 ; i++ )
        {
          #pragma omp atomic update
          counter_N0 = counter_N0 + 1 ;
        }
    ...

    This has to do with the nature of -muniform-simt.  It has two modes of
    operation: inside and outside an SIMT region.

    Outside an SIMT region, a warp pretends to execute a single thread, but
    actually executes in all threads, to keep the local registers in all
threads
    consistent.  This approach works unless the insn that is executed is a
syscall
    or an atomic insn.  In that case, the insn is predicated, such that it
    executes in only one thread.  If the predicated insn writes a result to a
    register, then that register is propagated to the other threads, after
which
    the local registers in all threads are consistent again.

    Inside an SIMT region, a warp executes in all threads.  However, the
    predication and propagation for syscalls and atomic insns is also present
    here, because nvptx_reorg_uniform_simt works on all code.  Care has been
taken
    though to ensure that the predication and propagation is a nop.  That is,
    inside an SIMT region:
    - the predicate evalutes to true for each thread, and
    - the propagation insn copies a register from each thread to the same
thread.

    That works fine, until we use -mptx=6.0, and instead of using the
deprecated
    warp propagation insn shfl, we start using shfl.sync:
    ...
      @%r33 atom.add.u32            _, [%r29], 1;
            shfl.sync.idx.b32       %r30, %r30, %r32, 31, 0xffffffff;
    ...

    The shfl.sync specifies a member mask indicating all threads, but given
that
    the loop only has a single iteration, only thread 0 will execute the insn,
    where it will hang waiting for the other threads.

    Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the
    uniform warp check) such that it only executes outside the SIMT region.

    Tested on x86_64 with nvptx accelerator.

    gcc/ChangeLog:

    2022-03-08  Tom de Vries  <tdevries@suse.de>

            PR target/104783
            * config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate)
            (nvptx_output_unisimt_switch): Handle
unisimt_outside_simt_predicate.
            (nvptx_get_unisimt_outside_simt_predicate): New function.
            (predicate_insn): New function, factored out of ...
            (nvptx_reorg_uniform_simt): ... here.  Predicate all emitted insns.
            * config/nvptx/nvptx.h (struct machine_function): Add
            unisimt_outside_simt_predicate field.
            * config/nvptx/nvptx.md (define_insn "nvptx_warpsync")
            (define_insn "nvptx_uniform_warp_check"): Make predicable.

    libgomp/ChangeLog:

    2022-03-10  Tom de Vries  <tdevries@suse.de>

            * testsuite/libgomp.c/pr104783.c: New test.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct
  2022-03-04 13:35 [Bug target/104783] New: [nvptx, openmp] Hang/abort with atomic update in simd construct vries at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2022-03-10 11:22 ` cvs-commit at gcc dot gnu.org
@ 2022-03-10 12:25 ` vries at gcc dot gnu.org
  2022-03-22 13:40 ` cvs-commit at gcc dot gnu.org
  2022-03-22 13:47 ` vries at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: vries at gcc dot gnu.org @ 2022-03-10 12:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783

--- Comment #6 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 52593
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52593&action=edit
Tentative patch

(In reply to Tom de Vries from comment #4)
> The patch I have works for target boards unix and unix/-foffload=-mptx=3.1,
> but I run into the hang for --target_board=unix/-foffload=-misa=sm_75.

I also have a tentative patch for this, but unfortunately no proper root cause
analysis.  I'm testing this now on a build with misa=sm_75 as default.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct
  2022-03-04 13:35 [Bug target/104783] New: [nvptx, openmp] Hang/abort with atomic update in simd construct vries at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2022-03-10 12:25 ` vries at gcc dot gnu.org
@ 2022-03-22 13:40 ` cvs-commit at gcc dot gnu.org
  2022-03-22 13:47 ` vries at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-03-22 13:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tom de Vries <vries@gcc.gnu.org>:

https://gcc.gnu.org/g:a624388b9546b066250be8baa118b7d50c403c25

commit r12-7765-ga624388b9546b066250be8baa118b7d50c403c25
Author: Tom de Vries <tdevries@suse.de>
Date:   Wed Mar 9 10:35:14 2022 +0100

    [nvptx] Add warp sync at simt exit

    Consider this code (with N defined to 1024):
    ...
      float v = 0.0;
      #pragma omp target map(tofrom: v)
      #pragma omp parallel for simd
      for (int i = 0 ; i < N; i++)
        {
          #pragma omp atomic update
          v = v + 1.0;
        }
    ...

    It hangs when executing on target board unix/-foffload=-misa=sm_75, using
    drivers 470.103.01 and 510.54 on a T400 board (sm_75).

    I'm tentatively identifying the problem as a bug in -muniform-simt for
    architectures that support Independent Thread Scheduling (sm_70 and later).

    The problem -muniform-simt is trying to address is to make sure that a
    register produced outside an openmp simd region is available when used in
any
    lane inside an simd region.

    The solution is to, outside an simd region, execute in all warp lanes, thus
    producing consistent values in result registers in each warp thread.

    This approach doesn't work when executing in all warp lanes multiplies the
    side effects from 1 to 32 separate side effects, which is the case for
atomic
    insns.  So atomic insns are rewritten to execute only in lane 0, and if
    there are any results, those are propagated to the other threads in the
warp.
    [ And likewise for system calls malloc, free, vprintf. ]

    Now, consider a non-atomic update: ld, add, store.  The store has side
    effects, are those multiplied or not?

    Pre-sm_70 we can assume that at the end of an SIMT region, any divergent
    control flow has reconverged, and we have a uniform warp, executing in lock
    step.  So:
    - the load will load the same value into the result register across the
warp,
    - the add will write the same value into the result register across the
warp,
    - the store will write the same value to the same memory location, 32
times,
      at once, having the result of a single store.
    So, no side-effect multiplication (well, at least that's the observation).

    Starting sm_70, the threads in a warp are no longer guaranteed to
reconverge
    after divergence.  There's a "Convergence Optimizer" that can can identify
    that it is safe for a warp to reconverge, but that works only as long as
the
    code does not contain "synchronizing operations".

    Consequently, the ld, add, store sequence can be executed by a non-uniform
    warp, which means the side effects can have multiplied, and the registers
are
    no longer guarantueed to be in sync.

    The atomic update in the example above is translated using an atom.cas
loop,
    which means that we have divergence (because only one thread is allowed to
    succeed at a time) and the "Convergence Optimizer" doesn't reconverge
probably
    because the atom.cas counts as a "synchronizing operation".  So, it seems
    plausible that the root cause for the mentioned hang is the problem
described
    above.

    Fix this by adding an explicit warp sync at simt exit.

    Note that we're assuming here that the warp will stay uniform until the
next
    SIMT region entry.

    Tested on x86_64 with nvptx accelerator.

    gcc/ChangeLog:

    2022-03-09  Tom de Vries  <tdevries@suse.de>

            PR target/104916
            PR target/104783
            * config/nvptx/nvptx.md (define_expand "omp_simt_exit"): Emit warp
            sync (or uniform warp check for mptx < 6.0).

    libgomp/ChangeLog:

    2022-03-15  Tom de Vries  <tdevries@suse.de>

            PR target/104916
            PR target/104783
            * testsuite/libgomp.c/pr104783-2.c: New test.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct
  2022-03-04 13:35 [Bug target/104783] New: [nvptx, openmp] Hang/abort with atomic update in simd construct vries at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2022-03-22 13:40 ` cvs-commit at gcc dot gnu.org
@ 2022-03-22 13:47 ` vries at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: vries at gcc dot gnu.org @ 2022-03-22 13:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |12.0
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #8 from Tom de Vries <vries at gcc dot gnu.org> ---
Fixed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-03-22 13:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-04 13:35 [Bug target/104783] New: [nvptx, openmp] Hang/abort with atomic update in simd construct vries at gcc dot gnu.org
2022-03-04 13:48 ` [Bug target/104783] " vries at gcc dot gnu.org
2022-03-04 16:16 ` vries at gcc dot gnu.org
2022-03-08 12:51 ` vries at gcc dot gnu.org
2022-03-09  7:16 ` vries at gcc dot gnu.org
2022-03-10 11:22 ` cvs-commit at gcc dot gnu.org
2022-03-10 12:25 ` vries at gcc dot gnu.org
2022-03-22 13:40 ` cvs-commit at gcc dot gnu.org
2022-03-22 13:47 ` vries at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).