public inbox for gcc-bugs@sourceware.org
* [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
@ 2020-06-12 11:33 tschwinge at gcc dot gnu.org
  2020-06-12 14:18 ` [Bug fortran/95654] " tschwinge at gcc dot gnu.org
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2020-06-12 11:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

            Bug ID: 95654
           Summary: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90
                     -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops
                    -ftracer -finline-functions  execution test
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: openmp
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: burnus at gcc dot gnu.org
          Reporter: tschwinge at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

The recent PR94690 commit f884bef21cccc05d748fd7869cd641cbb4f6b6bb "[Fortran]
OpenMP - permit lastprivate in distribute + SIMD fixes (PR94690)" added new
test cases, one of which FAILs for nvptx offloading in a high optimization
variant (only):

    PASS: libgomp.fortran/pr66199-5.f90   -O0  (test for excess errors)
    PASS: libgomp.fortran/pr66199-5.f90   -O0  execution test
    PASS: libgomp.fortran/pr66199-5.f90   -O1  (test for excess errors)
    PASS: libgomp.fortran/pr66199-5.f90   -O1  execution test
    PASS: libgomp.fortran/pr66199-5.f90   -O2  (test for excess errors)
    PASS: libgomp.fortran/pr66199-5.f90   -O2  execution test
    PASS: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess
errors)
    FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: libgomp.fortran/pr66199-5.f90   -O3 -g  (test for excess errors)
    PASS: libgomp.fortran/pr66199-5.f90   -O3 -g  execution test
    PASS: libgomp.fortran/pr66199-5.f90   -Os  (test for excess errors)
    PASS: libgomp.fortran/pr66199-5.f90   -Os  execution test

The one FAIL is:

    STOP 3
    FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

Compiling manually, I'm seeing:

    source-gcc/libgomp/testsuite/libgomp.fortran/pr66199-5.f90: In function
‘__m_MOD_f4._omp_fn.0’:
    source-gcc/libgomp/testsuite/libgomp.fortran/pr66199-5.f90:61: warning:
‘.count.16’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       61 | end
          | 
    source-gcc/libgomp/testsuite/libgomp.fortran/pr66199-5.f90: In function
‘__m_MOD_f3._omp_fn.0’:
    source-gcc/libgomp/testsuite/libgomp.fortran/pr66199-5.f90:48: warning:
‘.count.102’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       48 | end
          | 

I haven't assessed whether that's relevant.

I don't see the problem when running with Intel MIC (emulated) offloading (via
setting 'CUDA_VISIBLE_DEVICES=n_o-N-e'), and I have not yet tested GCN
offloading.


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
@ 2020-06-12 14:18 ` tschwinge at gcc dot gnu.org
  2020-06-12 15:40 ` burnus at gcc dot gnu.org
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2020-06-12 14:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #1 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Tobias, I see you've unassigned yourself here, and set Depends on: PR95109. 
Have you verified that it's the same underlying issue, or do you just want to
wait for PR95109 to be resolved before analyzing this one, as it might be
the same underlying issue?  And, if you're not going to work on these items
now, is there any time scale?


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
  2020-06-12 14:18 ` [Bug fortran/95654] " tschwinge at gcc dot gnu.org
@ 2020-06-12 15:40 ` burnus at gcc dot gnu.org
  2020-09-09 11:27 ` burnus at gcc dot gnu.org
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: burnus at gcc dot gnu.org @ 2020-06-12 15:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #2 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to Thomas Schwinge from comment #1)
> Have you verified that it's the same underlying issue
Not verified; if it were the same issue, this would be a duplicate.

> or do you just want to wait for PR95109 to be resolved before analyzing this one
The latter. However, it is probably a separate issue.

> And, if you're not going to work on these items now, is there any time scale?
No – there currently seems to be an exponential growth of work items.


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
  2020-06-12 14:18 ` [Bug fortran/95654] " tschwinge at gcc dot gnu.org
  2020-06-12 15:40 ` burnus at gcc dot gnu.org
@ 2020-09-09 11:27 ` burnus at gcc dot gnu.org
  2020-09-15 23:29 ` burnus at gcc dot gnu.org
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: burnus at gcc dot gnu.org @ 2020-09-09 11:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654
Bug 95654 depends on bug 95109, which changed state.

Bug 95109 Summary: [11 regression] ICE in gfortran.dg/gomp/target1.f90 after r11-349
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95109

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2020-09-09 11:27 ` burnus at gcc dot gnu.org
@ 2020-09-15 23:29 ` burnus at gcc dot gnu.org
  2020-09-16  7:24 ` burnus at gcc dot gnu.org
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: burnus at gcc dot gnu.org @ 2020-09-15 23:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #3 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Created attachment 49222
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49222&action=edit
Slightly reduced example, compile with gfortran -fopenmp -O1 -ftracer

Some testing; with gfortran -fopenmp -O1 -ftracer it shows something like:
         -31 =?= 64
The expected output is the last value of the loop iteration ("lastprivate"),
i.e. (b-1+1)*2 = 32*2 = 64. – When running the target section on the host
(-foffload=disable) it works, which indicates either a shared-memory vs.
non-shared issue or an nvptx problem.

At a glance, the original dump looks fine; it seems as if either the
  .omp_data_i_28(D)->d1 = d1_9;
  .omp_data_i_28(D)->d2 = d2_11;
is not executed or the PHI handling goes wrong.


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2020-09-15 23:29 ` burnus at gcc dot gnu.org
@ 2020-09-16  7:24 ` burnus at gcc dot gnu.org
  2020-09-16 13:23 ` vries at gcc dot gnu.org
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: burnus at gcc dot gnu.org @ 2020-09-16  7:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to Tobias Burnus from comment #3)
> Created attachment 49222 [details]
> Slightly reduced example, compile with gfortran -fopenmp -O1 -ftracer

On the host side, a single BB gets inserted – but more interesting is the
device side:

For the nvptx lto1, all dumped trees are identical until profile_estimate.

The result (the non-64 value) varies slightly, but one can disable all of the
following tree passes, leaving tree-tracer enabled, and still get a failure:

gfortran -fopenmp -ftracer -O1 test.f90
  -foffload='-fdisable-tree-dom3 -fdisable-tree-profile_estimate
-fdisable-tree-fixup_cfg3 -fdisable-tree-strlen1 -fdisable-tree-copyprop5
-fdisable-tree-wrestrict -fdisable-tree-dse3 -fdump-tree-all
-fdisable-tree-cddce3 -fdisable-tree-forwprop4 -fdisable-tree-phiopt4
-fdisable-tree-fab1 -fdisable-tree-dce7 -fdisable-tree-crited1
-fdisable-tree-uncprop1 -fdisable-tree-local-pure-const2 -fdisable-tree-nrv
-fdisable-tree-isel -fdisable-tree-optimized'

Adding -foffload=-fdisable-tree-tracer unsurprisingly makes all tree dumps
identical and yields the expected 64.


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2020-09-16  7:24 ` burnus at gcc dot gnu.org
@ 2020-09-16 13:23 ` vries at gcc dot gnu.org
  2020-09-16 13:33 ` vries at gcc dot gnu.org
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: vries at gcc dot gnu.org @ 2020-09-16 13:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vries at gcc dot gnu.org

--- Comment #5 from Tom de Vries <vries at gcc dot gnu.org> ---
Minimal example:
...
! { dg-do run }                                                                 
program main
  implicit none
  integer :: d1, d2
  !$omp target map(from: d1)                                                    
  !$omp teams distribute parallel do simd default(none) lastprivate(d1) &       
  !$omp&      collapse(2)                                                       
  do d1 = 0, 31
    do d2 = 0, 31
    end do
  end do
  !$omp end target                                                              
  if (d1 /= 32) stop 3
end program
...


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2020-09-16 13:23 ` vries at gcc dot gnu.org
@ 2020-09-16 13:33 ` vries at gcc dot gnu.org
  2020-09-16 15:54 ` vries at gcc dot gnu.org
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: vries at gcc dot gnu.org @ 2020-09-16 13:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #6 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 49227
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49227&action=edit
Dumps for failing test-case

(In reply to Tom de Vries from comment #5)
> Minimal example:
> ...
> ! { dg-do run }
> program main
>   implicit none
>   integer :: d1, d2
>   !$omp target map(from: d1)
>   !$omp teams distribute parallel do simd default(none) lastprivate(d1) &
>   !$omp&      collapse(2)
>   do d1 = 0, 31
>     do d2 = 0, 31
>     end do
>   end do
>   !$omp end target
>   if (d1 /= 32) stop 3
> end program
> ...

Corresponding dumps.


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2020-09-16 13:33 ` vries at gcc dot gnu.org
@ 2020-09-16 15:54 ` vries at gcc dot gnu.org
  2020-09-16 15:57 ` vries at gcc dot gnu.org
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: vries at gcc dot gnu.org @ 2020-09-16 15:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
Minimal example after commit 91347c3bbf7 "Fortran: OpenMP - fix simd with
(last)private (PR97061)":
...
! { dg-do run }                                                                 
program main
  implicit none
  integer :: d1
  !$omp target map(from: d1)                                                    
  !$omp teams distribute parallel do simd default(none) lastprivate(d1)         
  do d1 = 0, 31
  end do
  !$omp end target                                                              
  if (d1 /= 32) stop 3
end program
...


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2020-09-16 15:54 ` vries at gcc dot gnu.org
@ 2020-09-16 15:57 ` vries at gcc dot gnu.org
  2020-09-16 21:11 ` burnus at gcc dot gnu.org
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: vries at gcc dot gnu.org @ 2020-09-16 15:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #8 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 49228
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49228&action=edit
Dumps for failing test-case (no collapse case)

(In reply to Tom de Vries from comment #7)
> Minimal example after commit 91347c3bbf7 "Fortran: OpenMP - fix simd with
> (last)private (PR97061)":
> ...
> ! { dg-do run }
> program main
>   implicit none
>   integer :: d1
>   !$omp target map(from: d1)
>   !$omp teams distribute parallel do simd default(none) lastprivate(d1)
>   do d1 = 0, 31
>   end do
>   !$omp end target
>   if (d1 /= 32) stop 3
> end program
> ...

Corresponding dumps.


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2020-09-16 15:57 ` vries at gcc dot gnu.org
@ 2020-09-16 21:11 ` burnus at gcc dot gnu.org
  2020-09-17  9:33 ` burnus at gcc dot gnu.org
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: burnus at gcc dot gnu.org @ 2020-09-16 21:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-09-16

--- Comment #9 from Tobias Burnus <burnus at gcc dot gnu.org> ---
See also thread at:
  https://gcc.gnu.org/pipermail/gcc-patches/2020-September/thread.html#554054

Related PR mentioned there:
  PR 80053 – similar issue related to label with address taken


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2020-09-16 21:11 ` burnus at gcc dot gnu.org
@ 2020-09-17  9:33 ` burnus at gcc dot gnu.org
  2020-09-17 10:20 ` vries at gcc dot gnu.org
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: burnus at gcc dot gnu.org @ 2020-09-17  9:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #10 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to Tobias Burnus from comment #9)
> See also thread at:
>   https://gcc.gnu.org/pipermail/gcc-patches/2020-September/thread.html#554054

Regarding the patch there, the proper way is to adapt can_duplicate_block_p,
which calls the associated cfg_hooks->can_duplicate_block_p.

Currently, there is the stub (in gimple_cfg_hooks)
  tree-cfg.c:  gimple_can_duplicate_bb_p
and (for rtl_cfg_hooks and cfg_layout_rtl_cfg_hooks)
  cfgrtl.c:    cfg_layout_can_duplicate_bb_p
The latter avoids duplicating tablejumps and calls
targetm.cannot_copy_insn_p.


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2020-09-17  9:33 ` burnus at gcc dot gnu.org
@ 2020-09-17 10:20 ` vries at gcc dot gnu.org
  2020-09-17 13:53 ` vries at gcc dot gnu.org
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: vries at gcc dot gnu.org @ 2020-09-17 10:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #11 from Tom de Vries <vries at gcc dot gnu.org> ---
So, at this point we know that duplicating the BB containing VOTE_ANY causes
problems in executing.  But AFAIU, we do not know why.

Is VOTE_ANY not supposed to be duplicated by design? If so, is there any
documentation of that design, that explains that?

At the nvptx level, VOTE_ANY translates to vote.ballot.b32, which does
cross-lane communication, but has defined behaviour in divergent mode AFAICT.
From that perspective at least, there's no problem with duplicating VOTE_ANY.

My guess at this point, is that duplicating the block with VOTE_ANY has the
effect that the JIT compiler doesn't recognize control flow divergence before
XCHG_IDX, and fails to insert the proper barrier.

And XCHG_IDX translates to shfl.idx.b32, which has undefined behaviour in
divergent mode.


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2020-09-17 10:20 ` vries at gcc dot gnu.org
@ 2020-09-17 13:53 ` vries at gcc dot gnu.org
  2020-09-17 15:19 ` vries at gcc dot gnu.org
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: vries at gcc dot gnu.org @ 2020-09-17 13:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #12 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #7)
> Minimal example after commit 91347c3bbf7 "Fortran: OpenMP - fix simd with
> (last)private (PR97061)":
> ...
> ! { dg-do run }
> program main
>   implicit none
>   integer :: d1
>   !$omp target map(from: d1)
>   !$omp teams distribute parallel do simd default(none) lastprivate(d1)
>   do d1 = 0, 31
>   end do
>   !$omp end target
>   if (d1 /= 32) stop 3
> end program
> ...

To reduce further, set num_teams to 2 and num_threads to 1:
...
  !$omp teams distribute parallel do simd default(none) lastprivate(d1) &
  !$omp&      num_teams (2) num_threads (1)
...
This makes it easier to reason about the code.


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2020-09-17 13:53 ` vries at gcc dot gnu.org
@ 2020-09-17 15:19 ` vries at gcc dot gnu.org
  2020-09-22 17:16 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: vries at gcc dot gnu.org @ 2020-09-17 15:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #13 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #11)
> My guess at this point, is that duplicating the block with VOTE_ANY has the
> effect that the JIT compiler doesn't recognize control flow divergence
> before XCHG_IDX, and fails to insert the proper barrier.

Turns out, it's not that complicated.

Before ftracer we have:
...
  <bb 4> [local count: 268435456]:
  _30 = _18 + _27;
  _31 = _18 + _28;
  _46 = .GOMP_SIMT_ENTER_ALLOC (0, 1);
  _47 = .GOMP_SIMT_LANE ();
  _48 = (int) _47;
  _49 = _30 + _48;
  if (_31 > _49)
    goto <bb 8>; [87.50%]
  else
    goto <bb 5>; [12.50%]

  <bb 8> [local count: 117440512]:
  ...
  goto <bb 5>; [100.00%]

  <bb 5> [local count: 134217728]:
  # _54 = PHI <_50(D)(4), _67(8)>
  # _34 = PHI <_49(4), _71(8)>
  _55 = _34 == 63;
  _56 = (int) _55;
  _57 = .GOMP_SIMT_VOTE_ANY (_56);
  if (_57 != 0)
    goto <bb 7>; [50.00%]
  else
    goto <bb 6>; [50.00%]

  <bb 7> [local count: 67108864]:
  _58 = .GOMP_SIMT_LAST_LANE (_56);
  _60 = .GOMP_SIMT_XCHG_IDX (_54, _58);
  _61 = _60 + 1;
  goto <bb 6>; [100.00%]

  <bb 6> [local count: 268435456]:
  # d1_6 = PHI <_61(7), d1_29(D)(5)>
  *_46 ={v} {CLOBBER};
  .GOMP_SIMT_EXIT (_46);
  if (_31 == 32)
    goto <bb 11>; [34.00%]
  else
    goto <bb 9>; [66.00%]
...

At bb4 entry, we have unified control flow (that is, all threads in the warp
execute the same code in lockstep).

That's no longer the case at bb5/bb8.  In team 0, threads 0..15 execute the
loop body (bb8), and threads 16..31 don't.  In team 1, it's the opposite.

However, at bb5 the control flow from bb4 and bb8 joins, so control flow is
once again unified.

Then VOTE_ANY is executed in bb5, with team 1 subsequently going to the block
with XCHG_IDX (bb7) and team 0 skipping straight to bb6.

After ftracer, we have:
...
  <bb 5> [local count: 16777216]:
  # _54 = PHI <_50(D)(4)>
  # _34 = PHI <_49(4)>
  _55 = _34 == 63;
  _56 = (int) _55;
  _57 = .GOMP_SIMT_VOTE_ANY (_56);
  if (_57 != 0)
    goto <bb 7>; [50.00%]
  else
    goto <bb 6>; [50.00%]

  <bb 8> [local count: 117440512]:
  ...
  _80 = _71 == 63;
  _81 = (int) _80;
  _82 = .GOMP_SIMT_VOTE_ANY (_81);
  if (_82 != 0)
    goto <bb 7>; [50.00%]
  else
    goto <bb 6>; [50.00%]
...

Now control flow is no longer unified at bb5, and consequently it is not
unified in bb7 either when XCHG_IDX is executed.  That is the root cause of
the failure we're seeing.

So, one way to handle this is to consider VOTE_ANY as a "join" to the "fork" of
ENTER_ALLOC (which means: don't duplicate, unless you duplicate the pair).

But, after reading this:
...
/* Allocate per-lane storage and begin non-uniform execution region.  */

static void
expand_GOMP_SIMT_ENTER_ALLOC (internal_fn, gcall *stmt)
...
and this:
...
/* Deallocate per-lane storage and leave non-uniform execution region.  */

static void
expand_GOMP_SIMT_EXIT (internal_fn, gcall *stmt)
...
it seems that spot is already taken.

So I wonder, isn't the problem that we do the lastprivate stuff before
SIMT_EXIT? [ Of course after fixing that we might run into SIMT_EXIT being
duplicated by ftracer. But there at least the description of the internal-fn
would make it clear why we don't want to duplicate it. ]


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2020-09-17 15:19 ` vries at gcc dot gnu.org
@ 2020-09-22 17:16 ` cvs-commit at gcc dot gnu.org
  2020-09-25  8:55 ` burnus at gcc dot gnu.org
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-09-22 17:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tobias Burnus <burnus@gcc.gnu.org>:

https://gcc.gnu.org/g:f74c87f85f510248874cb90ad0b0527b015034b9

commit r11-3363-gf74c87f85f510248874cb90ad0b0527b015034b9
Author: Tobias Burnus <tobias@codesourcery.com>
Date:   Tue Sep 22 19:15:44 2020 +0200

    libgomp.fortran/pr66199-5.f90: Make stop codes unique

    libgomp/ChangeLog:

            PR fortran/95654
            * testsuite/libgomp.fortran/pr66199-5.f90: Make stop codes unique.


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2020-09-22 17:16 ` cvs-commit at gcc dot gnu.org
@ 2020-09-25  8:55 ` burnus at gcc dot gnu.org
  2020-10-05  6:54 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: burnus at gcc dot gnu.org @ 2020-09-25  8:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #15 from Tobias Burnus <burnus at gcc dot gnu.org> ---
See also PR97203 + PR97203, and PR80053.

And the thread:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/thread.html#554054


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2020-09-25  8:55 ` burnus at gcc dot gnu.org
@ 2020-10-05  6:54 ` cvs-commit at gcc dot gnu.org
  2020-10-05  7:01 ` vries at gcc dot gnu.org
  2020-10-05  7:01 ` vries at gcc dot gnu.org
  18 siblings, 0 replies; 20+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-05  6:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

--- Comment #16 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tom de Vries <vries@gcc.gnu.org>:

https://gcc.gnu.org/g:ab3f4b27abe8abc947e84ef84bfc9a18797c5868

commit r11-3648-gab3f4b27abe8abc947e84ef84bfc9a18797c5868
Author: Tom de Vries <tdevries@suse.de>
Date:   Tue Sep 22 16:38:07 2020 +0200

    [omp, ftracer] Don't duplicate blocks in SIMT region

    When running the libgomp testsuite on x86_64-linux with nvptx accelerator
    on the test-case included in this patch, we run into:
    ...
    FAIL: libgomp.fortran/pr95654.f90 -O3 -fomit-frame-pointer -funroll-loops \
      -fpeel-loops -ftracer -finline-functions  execution test
    ...

    The test-case is a minimal version of this FAIL:
    ...
    FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops \
      -fpeel-loops -ftracer -finline-functions  execution test
    ...
    but that one has stopped failing at commit c2ebf4f10de "openmp: Add support
    for non-rect simd and improve collapsed simd support".

    The problem is that ftracer duplicates a block containing
    GOMP_SIMT_VOTE_ANY.

    That is, before ftracer we have (dropping the GOMP_SIMT_ prefix):
    ...
    bb4(ENTER_ALLOC)
    *----------+
    |           \
    |            \
    |             v
    |             *
    v             bb8
    *<------------*
    bb5(VOTE_ANY)
    *-------------+
    |             |
    |             |
    |             |
    |             |
    |             v
    |             *
    v             bb7(XCHG_IDX)
    *<------------*
    bb6(EXIT)
    ...

    The XCHG_IDX internal-fn does inter-SIMT-lane communication, which for
    nvptx maps onto shfl, an operator which has the requirement that the warp
    executing the operator is convergent.  The warp diverges at bb4, and
    reconverges at bb5, and does not diverge by going to bb7, so the shfl is
    indeed executed by a convergent warp.

    After ftracer, we have:
    ...
    bb4(ENTER_ALLOC)
    *----------+
    |           \
    |            \
    |             \
    |              \
    v               v
    *               *
    bb5(VOTE_ANY)   bb8(VOTE_ANY)
    *               *
    |\             /|
    | \  +--------+ |
    |  \/           |
    |  /\           |
    | /  +----------v
    |/              *
    v               bb7(XCHG_IDX)
    *<--------------*
    bb6(EXIT)
    ...

    The warp diverges again at bb5, but does not reconverge again before bb6,
    so the shfl is executed by a divergent warp, which causes the FAIL.

    Fix this by making ftracer ignore blocks containing ENTER_ALLOC, VOTE_ANY
    and EXIT, effectively treating the SIMT region conservatively.

    An argument can be made that the test needs to be added in a more
    generic place, like gimple_can_duplicate_bb_p or some such, and that
    ftracer then needs to use the generic test.  But that's a discussion with
    a much broader scope, so I'm leaving that for another patch.

    Bootstrapped and reg-tested on x86_64-linux.

    Built on x86_64-linux with nvptx accelerator, tested with libgomp.

    gcc/ChangeLog:

            PR fortran/95654
            * tracer.c (ignore_bb_p): Ignore GOMP_SIMT_ENTER_ALLOC,
            GOMP_SIMT_VOTE_ANY and GOMP_SIMT_EXIT.

    libgomp/ChangeLog:

    2020-10-05  Tom de Vries  <tdevries@suse.de>

            PR fortran/95654
            * testsuite/libgomp.fortran/pr95654.f90: New test.
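
The check described in the commit message above can be illustrated with a
small stand-alone model.  This is only a sketch under stated assumptions, not
the code from tracer.c: the names toy_ifn, toy_bb, toy_ignore_bb_p and
toy_self_check are hypothetical, and a basic block is reduced to a plain array
of internal-function tags instead of GCC's gimple statements.  It shows the
idea of refusing to duplicate any block that contains ENTER_ALLOC, VOTE_ANY or
EXIT, while a block holding only XCHG_IDX is not rejected by this test alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags standing in for GCC's GOMP_SIMT_* internal functions.  */
enum toy_ifn
{
  TOY_OTHER,
  TOY_SIMT_ENTER_ALLOC,
  TOY_SIMT_VOTE_ANY,
  TOY_SIMT_XCHG_IDX,
  TOY_SIMT_EXIT
};

/* Hypothetical basic block: just the list of calls it contains.  */
struct toy_bb
{
  const enum toy_ifn *calls;
  size_t n_calls;
};

/* Return true if BB must not be duplicated, i.e. if it contains one of
   the calls that delimit or steer the SIMT region.  */
static bool
toy_ignore_bb_p (const struct toy_bb *bb)
{
  for (size_t i = 0; i < bb->n_calls; i++)
    switch (bb->calls[i])
      {
      case TOY_SIMT_ENTER_ALLOC:
      case TOY_SIMT_VOTE_ANY:
      case TOY_SIMT_EXIT:
        return true;
      default:
        break;
      }
  return false;
}

/* Usage example mirroring bb5 and bb7 from the diagrams.  */
void
toy_self_check (void)
{
  const enum toy_ifn bb5_calls[] = { TOY_SIMT_VOTE_ANY };
  const enum toy_ifn bb7_calls[] = { TOY_SIMT_XCHG_IDX };
  const struct toy_bb bb5 = { bb5_calls, 1 };
  const struct toy_bb bb7 = { bb7_calls, 1 };

  assert (toy_ignore_bb_p (&bb5));   /* VOTE_ANY block is never duplicated.  */
  assert (!toy_ignore_bb_p (&bb7));  /* XCHG_IDX alone is not in the list.  */
}
```

In this model, keeping bb5 as a single block is what preserves the
reconvergence point in front of bb7, which is the property the shfl relies on.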


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2020-10-05  6:54 ` cvs-commit at gcc dot gnu.org
@ 2020-10-05  7:01 ` vries at gcc dot gnu.org
  2020-10-05  7:01 ` vries at gcc dot gnu.org
  18 siblings, 0 replies; 20+ messages in thread
From: vries at gcc dot gnu.org @ 2020-10-05  7:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #17 from Tom de Vries <vries at gcc dot gnu.org> ---
Patch with test-case committed, marking resolved-fixed.


* [Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
  2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2020-10-05  7:01 ` vries at gcc dot gnu.org
@ 2020-10-05  7:01 ` vries at gcc dot gnu.org
  18 siblings, 0 replies; 20+ messages in thread
From: vries at gcc dot gnu.org @ 2020-10-05  7:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0


Thread overview: 20+ messages
2020-06-12 11:33 [Bug fortran/95654] New: nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test tschwinge at gcc dot gnu.org
2020-06-12 14:18 ` [Bug fortran/95654] " tschwinge at gcc dot gnu.org
2020-06-12 15:40 ` burnus at gcc dot gnu.org
2020-09-09 11:27 ` burnus at gcc dot gnu.org
2020-09-15 23:29 ` burnus at gcc dot gnu.org
2020-09-16  7:24 ` burnus at gcc dot gnu.org
2020-09-16 13:23 ` vries at gcc dot gnu.org
2020-09-16 13:33 ` vries at gcc dot gnu.org
2020-09-16 15:54 ` vries at gcc dot gnu.org
2020-09-16 15:57 ` vries at gcc dot gnu.org
2020-09-16 21:11 ` burnus at gcc dot gnu.org
2020-09-17  9:33 ` burnus at gcc dot gnu.org
2020-09-17 10:20 ` vries at gcc dot gnu.org
2020-09-17 13:53 ` vries at gcc dot gnu.org
2020-09-17 15:19 ` vries at gcc dot gnu.org
2020-09-22 17:16 ` cvs-commit at gcc dot gnu.org
2020-09-25  8:55 ` burnus at gcc dot gnu.org
2020-10-05  6:54 ` cvs-commit at gcc dot gnu.org
2020-10-05  7:01 ` vries at gcc dot gnu.org
2020-10-05  7:01 ` vries at gcc dot gnu.org
