public inbox for gcc-patches@gcc.gnu.org
* [nvptx, libgomp, testsuite, PR85519] Reduce recursion depth in declare_target-{1,2}.f90
@ 2018-04-25 11:03 Tom de Vries
From: Tom de Vries @ 2018-04-25 11:03 UTC
  To: Jakub Jelinek; +Cc: GCC Patches

Hi,

When running the libgomp tests with the nvptx accelerator on an Nvidia
Titan V, we run into these failures:
...
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -O1  execution test
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -O2  execution test
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -Os  execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90   -O1  execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90   -O2  execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90   -Os  execution test
...

These tests contain recursive functions, and the failures occur because
execution runs out of thread stack. The symptom is:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
which we can turn into this symptom:
...
libgomp: cuStreamSynchronize error: an illegal instruction was encountered
...
by using GOMP_NVPTX_JIT=-O0, which inserts a valid thread stack check 
after the thread stack decrement at the start of each function.

The thread stack limit defaults to 1024 bytes on all the boards that I've
checked, including Titan V. The tests have a recursion depth of ~25, so
when the frame size of the recursive function exceeds ~40 bytes
(1024 / 25), we can be sure to run out of thread stack. [ It may also
happen at a smaller frame size, given that some thread stack space may
have already been consumed before calling the recursive function. ]
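
[ A minimal sketch of that arithmetic, for illustration only. It assumes
the limit is counted in bytes and that every recursion level uses the
same frame size; max_frame_bytes is a hypothetical helper, not something
from libgomp or the nvptx plugin. ]

```c
#include <assert.h>

/* Upper bound on the per-call frame size (in bytes) that still fits
   DEPTH nested calls into STACK_LIMIT bytes of thread stack.
   Assumption: nothing else has consumed thread stack before the first
   call, so the real bound may be lower.  */
static unsigned
max_frame_bytes (unsigned stack_limit, unsigned depth)
{
  assert (depth > 0);  /* Avoid dividing by zero.  */
  return stack_limit / depth;
}
```

With the default limit, max_frame_bytes (1024, 25) gives 40 and
max_frame_bytes (1024, 23) gives 44, so lowering the depth from 25 to 23
leaves a few more bytes of headroom per frame.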

[ The nvptx libgomp port uses a 128k per-warp stack in global memory,
avoiding the use of the .local directive in offloading functions, which
would be mapped onto thread stack. But doing so does not eliminate
thread stack usage. For instance, device routine parameters can be
stored on thread stack. ]


Concluding, these tests run out of thread stack on Nvidia Titan V because
the recursive functions have a larger frame size than we've seen for the
Nvidia architecture flavours that we've tested before.

The patch fixes this by reducing the recursion depth.

OK for stage4 trunk?

Thanks,
- Tom

[-- Attachment #2: 0001-nvptx-libgomp-testsuite-Reduce-recursion-depth-in-declare_target-1-2-.f90.patch --]
[-- Type: text/x-patch, Size: 1887 bytes --]

[nvptx, libgomp, testsuite] Reduce recursion depth in declare_target-{1,2}.f90

2018-04-25  Tom de Vries  <tom@codesourcery.com>

	PR target/85519
	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
	recursion depth from 25 to 23.
	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

---
 libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 | 4 +++-
 libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90 | 6 ++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
index df941ee..51de6b2 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
@@ -27,5 +27,7 @@ end module
 program e_53_1
   use e_53_1_mod, only : fib, fib_wrapper
   if (fib (15) /= fib_wrapper (15)) STOP 1
-  if (fib (25) /= fib_wrapper (25)) STOP 2
+  ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+  ! Nvidia Titan V.
+  if (fib (23) /= fib_wrapper (23)) STOP 2
 end program
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
index 9c31569..76cce01 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
@@ -4,9 +4,11 @@ program e_53_2
   !$omp declare target (fib)
   integer :: x, fib
   !$omp target map(from: x)
-    x = fib (25)
+    ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+    ! Nvidia Titan V.
+    x = fib (23)
   !$omp end target
-  if (x /= fib (25)) STOP 1
+  if (x /= fib (23)) STOP 1
 end program
 
 integer recursive function fib (n) result (f)

* Re: [nvptx, libgomp, testsuite, PR85519] Reduce recursion depth in declare_target-{1,2}.f90
@ 2018-04-25 11:43 Jakub Jelinek
From: Jakub Jelinek @ 2018-04-25 11:43 UTC
  To: Tom de Vries; +Cc: GCC Patches

On Wed, Apr 25, 2018 at 12:58:47PM +0200, Tom de Vries wrote:
> Concluding, these tests run out of thread stack on Nvidia Titan V because the
> recursive functions have a larger frame size than we've seen for the Nvidia
> architecture flavours that we've tested before.
> 
> The patch fixes this by reducing the recursion depth.
> 
> OK for stage4 trunk?

Ok for trunk (i.e. 9.x) and 8.2 after 8.1 is released.

> [nvptx, libgomp, testsuite] Reduce recursion depth in declare_target-{1,2}.f90
> 
> 2018-04-25  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/85519
> 	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
> 	recursion depth from 25 to 23.
> 	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

	Jakub
