* [Bug target/100232] New: [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
@ 2021-04-23 13:20 burnus at gcc dot gnu.org
  2021-04-23 14:19 ` [Bug target/100232] " vries at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: burnus at gcc dot gnu.org @ 2021-04-23 13:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

            Bug ID: 100232
           Summary: [OpenMP][nvptx] Reduction fails with optimization and
                    'loop'/'for simd' but not with 'for'
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: openmp, wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: vries at gcc dot gnu.org
  Target Milestone: ---
            Target: nvptx-none

Created attachment 50661
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50661&action=edit
Testcase: gcc -fopenmp -O1 (fails, -O0 works)  - to be run with nvptx
offloading

(Based on tests/5.0/loop/test_loop_reduction_{and,or}_device.c from
https://github.com/SOLLVE/sollve_vv/ )

The code works with nvptx offloading with -O0 but fails with -O1 and higher.
(It also works on AMD GCN or with host fallback.)

A reduction of 'result = result && 1' yields 0 instead of the expected 1.

I note that it works with 'for' but fails with 'loop' and 'for simd'; hence, I
think it might be related to SIMT (→ some other PRs about SIMT).
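
For readers without the attachment, the failing pattern can be sketched roughly
as below. This is an illustrative reconstruction, not attachment 50661: the
function name reduce_and, the size N, the char element and result types, and
the exact directive combination are assumptions, and the sketch has not been
verified to reproduce the failure.

```c
#define N 1024

/* AND-reduce an all-ones array using the selected worksharing construct.
   variant: 0 = 'for', 1 = 'for simd', 2 = 'loop'.  Expected result: 1.
   Hypothetical helper; names and types are assumptions.  */
int
reduce_and (int variant)
{
  char a[N];
  char result = 1;

  for (int i = 0; i < N; i++)
    a[i] = 1;

#pragma omp target parallel map(to: a) map(tofrom: result)
  {
    if (variant == 0)
      {
#pragma omp for reduction(&&:result)
        for (int i = 0; i < N; i++)
          result = result && a[i];
      }
    else if (variant == 1)
      {
#pragma omp for simd reduction(&&:result)
        for (int i = 0; i < N; i++)
          result = result && a[i];
      }
    else
      {
#pragma omp loop reduction(&&:result)
        for (int i = 0; i < N; i++)
          result = result && a[i];
      }
  }

  return result;
}
```

Every variant should return 1. Built without -fopenmp the pragmas are ignored
and the loops run serially on the host, which gives the reference result.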

* [Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
From: vries at gcc dot gnu.org @ 2021-04-23 14:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
Can you try the patch for PR81778?

It's possible you're looking at a duplicate.

* [Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
From: burnus at gcc dot gnu.org @ 2021-04-23 15:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #2 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #1)
> Can you try the patch for PR81778?
> It's possible you're looking at a duplicate.

Unfortunately, it does not seem to make a difference; it still fails.

* [Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
From: vries at gcc dot gnu.org @ 2021-04-28 12:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #3 from Tom de Vries <vries at gcc dot gnu.org> ---
In expand_GOMP_SIMT_XCHG_BFLY, we have a subreg target:
...
(gdb) call debug_rtx ( target )
(subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
...

During expand_insn, the operands are legitimized, and this changes the state of
the output operand to:
...
(gdb) call debug_rtx ( ops[0].value )
(reg:QI 57)
...

So the value is written to reg 57, but never actually copied back to reg 40.

Tentative fix:
...
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index dd7173126fb..28ae3ed167a 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -361,6 +361,8 @@ expand_GOMP_SIMT_XCHG_BFLY (internal_fn, gcall *stmt)
   create_input_operand (&ops[2], idx, SImode);
   gcc_assert (targetm.have_omp_simt_xchg_bfly ());
   expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
+  if (ops[0].value != target)
+    emit_move_insn (target, ops[0].value);
 }

 /* Exchange between SIMT lanes according to given source lane index.  */
...
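
To make the failure mode concrete, here is a toy model in plain C. It is not
GCC code: the toy_* names, the flat register array and the 'acceptable' flag
are hypothetical stand-ins for operand legitimization. Only the register
numbers 40 and 57 are taken from the debug output above.

```c
#include <stdbool.h>

enum { TOY_NREGS = 64 };
static int toy_regs[TOY_NREGS];

/* Records which register the expanded instruction really writes.  */
struct toy_operand
{
  int reg;
};

/* Stand-in for legitimization: a target the pattern cannot accept
   (e.g. a subreg) is replaced by a fresh temporary register.  */
static void
toy_legitimize (struct toy_operand *op, int target, bool acceptable,
                int fresh_reg)
{
  op->reg = acceptable ? target : fresh_reg;
}

/* Stand-in for the expanded insn: writes to wherever the operand ended up.  */
static void
toy_expand_insn (const struct toy_operand *op, int result)
{
  toy_regs[op->reg] = result;
}

/* Expand with or without the copy-back and return what 'target' holds.  */
int
toy_expand (int target, bool acceptable, bool copy_back, int result)
{
  struct toy_operand op;

  toy_regs[target] = 0;
  toy_legitimize (&op, target, acceptable, 57);
  toy_expand_insn (&op, result);

  /* The tentative fix: move the value back if it landed elsewhere.  */
  if (copy_back && op.reg != target)
    toy_regs[target] = toy_regs[op.reg];

  return toy_regs[target];
}
```

In this model, toy_expand (40, false, false, 1) leaves register 40 at its old
value 0, while the same call with copy_back set returns 1.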

* [Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
From: vries at gcc dot gnu.org @ 2021-04-28 13:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #4 from Tom de Vries <vries at gcc dot gnu.org> ---
This commit:
...
commit 3af3bec2e4d344bd54a134d8b2263f44d788c3d8
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Mon May 4 21:21:16 2020 +0100

    internal-fn: Avoid dropping the lhs of some calls [PR94941]
...
adds:
...
   expand_insn (get_multi_vector_move (type, optab), 2, ops);
+  if (!rtx_equal_p (target, ops[0].value))
+    emit_move_insn (target, ops[0].value);
...
in expand_load_lanes_optab_fn and mentions:
...
    create_output_operand coerces an output operand to the insn's
    predicates, using a suggested rtx location if convenient.
    But if that rtx location is actually required rather than
    optional, the builder of the insn has to emit a move afterwards.

    (We could instead add a new interface that does this automatically,
    but that's future work.)

    This PR shows that we were failing to emit the move for some of the
    vector load internal functions.  I think there are other routines in
    internal-fn.c that potentially have the same problem, but this patch is
    supposed to be a conservative subset suitable for backporting to GCC 10.
...

* [Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
From: vries at gcc dot gnu.org @ 2021-04-28 14:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #5 from Tom de Vries <vries at gcc dot gnu.org> ---
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/569038.html

* [Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
From: cvs-commit at gcc dot gnu.org @ 2021-04-29  7:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tom de Vries <vries@gcc.gnu.org>:

https://gcc.gnu.org/g:4d7c874e2c64ebf7631049ace642d246843febae

commit r12-249-g4d7c874e2c64ebf7631049ace642d246843febae
Author: Tom de Vries <tdevries@suse.de>
Date:   Wed Apr 28 16:00:01 2021 +0200

    [omp, simt] Fix expand_GOMP_SIMT_*

    When running the test-case included in this patch using an
    nvptx accelerator, it fails in execution.

    The problem is that the expansion of GOMP_SIMT_XCHG_BFLY is optimized away
    during pass_jump as "trivially dead insns".

    This is caused by this code in expand_GOMP_SIMT_XCHG_BFLY:
    ...
      class expand_operand ops[3];
      create_output_operand (&ops[0], target, mode);
      ...
      expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
    ...
    which doesn't guarantee that target is assigned to by the expanded insn.

    F.i., if target is:
    ...
    (gdb) call debug_rtx ( target )
    (subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
    ...
    then after expand_insn, we have:
    ...
    (gdb) call debug_rtx ( ops[0].value )
    (reg:QI 57)
    ...

    See commit 3af3bec2e4d "internal-fn: Avoid dropping the lhs of some
    calls [PR94941]" for a similar problem.

    Fix this in the same way, by adding:
    ...
      if (!rtx_equal_p (target, ops[0].value))
        emit_move_insn (target, ops[0].value);
    ...
    where applicable in the expand_GOMP_SIMT_* functions.

    Tested libgomp on x86_64 with nvptx accelerator.

    gcc/ChangeLog:

    2021-04-28  Tom de Vries  <tdevries@suse.de>

            PR target/100232
            * internal-fn.c (expand_GOMP_SIMT_ENTER_ALLOC)
            (expand_GOMP_SIMT_LAST_LANE, expand_GOMP_SIMT_ORDERED_PRED)
            (expand_GOMP_SIMT_VOTE_ANY, expand_GOMP_SIMT_XCHG_BFLY)
            (expand_GOMP_SIMT_XCHG_IDX): Ensure target is assigned to.
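
Why the exchange ends up "trivially dead" can be illustrated with a toy
liveness sweep. This is a simplified sketch, not GCC's actual pass: the toy_*
names, the one-source instruction format and the register numbers 39 and 60
are assumptions; 40 and 57 are the registers from the debug output.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TOY_NREGS = 64 };

/* A minimal "dest = f (src)" instruction without side effects.  */
struct toy_insn
{
  int dest;
  int src;
  bool deleted;
};

/* Walk backwards; delete every insn whose result is never read later.
   Returns the number of deleted insns.  */
int
toy_delete_dead (struct toy_insn *insns, size_t n, int live_out)
{
  bool live[TOY_NREGS] = { false };
  int deleted = 0;

  live[live_out] = true;
  for (size_t i = n; i-- > 0;)
    {
      if (!live[insns[i].dest])
        {
          insns[i].deleted = true;
          deleted++;
          continue;
        }
      live[insns[i].dest] = false;
      live[insns[i].src] = true;
    }
  return deleted;
}

/* Build the sequence with or without the copy-back move and count the
   deleted insns.  */
int
toy_dead_count (bool with_copy_back)
{
  /* r57 = xchg (r39); r60 = use (r40): nothing reads r57.  */
  struct toy_insn without[] = { { 57, 39, false }, { 60, 40, false } };
  /* Same, plus r40 = r57 in between, which keeps the exchange alive.  */
  struct toy_insn with[] = { { 57, 39, false }, { 40, 57, false },
                             { 60, 40, false } };

  if (with_copy_back)
    return toy_delete_dead (with, 3, 60);
  return toy_delete_dead (without, 2, 60);
}
```

Without the move nothing reads register 57, so the exchange is deleted and the
later use of register 40 sees a stale value; with the move nothing is deleted.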

* [Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
From: cvs-commit at gcc dot gnu.org @ 2021-04-29  8:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Tom de Vries
<vries@gcc.gnu.org>:

https://gcc.gnu.org/g:f94c6caac7f03815c26c03a532f834c37517519c

commit r11-8324-gf94c6caac7f03815c26c03a532f834c37517519c
Author: Tom de Vries <tdevries@suse.de>
Date:   Wed Apr 28 16:00:01 2021 +0200

    [omp, simt] Fix expand_GOMP_SIMT_*

    (cherry picked from commit 4d7c874e2c64ebf7631049ace642d246843febae)

* [Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
From: vries at gcc dot gnu.org @ 2021-04-29  9:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
   Target Milestone|---                         |11.2
         Resolution|---                         |FIXED

--- Comment #8 from Tom de Vries <vries at gcc dot gnu.org> ---
I tried backporting to releases/gcc-10, but ran into:
...
FAIL: libgomp.c/target-43.c (test for excess errors)
Excess errors:
unresolved symbol __sync_val_compare_and_swap_1
mkoffload: fatal error:
/home/vries/oacc/trunk/install/offload-nvptx-none/bin//x86_64-pc-linux-gnu-accel-nvptx-none-gcc
returned 1 exit status
compilation terminated.
...

So I guess backporting stops at gcc-11.
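
For context on the unresolved symbol: __sync_val_compare_and_swap_1 is the
out-of-line, 1-byte form of the __sync_val_compare_and_swap built-in, which GCC
calls when it cannot expand the operation inline for the target. The sketch
below shows source code that needs a byte-wide compare-and-swap. That
target-43.c reaches the symbol this way is an assumption, not something stated
in this thread, and the helper name atomic_and_byte is hypothetical.

```c
/* Atomically AND 'val' into the byte at 'p' with a compare-and-swap loop
   and return the new value.  On a target without an inline byte-wide CAS,
   the built-in becomes a call to __sync_val_compare_and_swap_1.  */
unsigned char
atomic_and_byte (unsigned char *p, unsigned char val)
{
  unsigned char old_val, new_val;

  do
    {
      old_val = *p;
      new_val = old_val & val;
    }
  while (__sync_val_compare_and_swap (p, old_val, new_val) != old_val);

  return new_val;
}
```

On x86_64 this normally expands inline, so no library call is needed there.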
