[PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

* [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90
@ 2022-04-01 11:24 Tom de Vries
  2022-04-01 11:26 ` Jakub Jelinek
  2022-04-01 12:28 ` Thomas Schwinge
  0 siblings, 2 replies; 9+ messages in thread
From: Tom de Vries @ 2022-04-01 11:24 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Thomas Schwinge

Hi,

When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on
an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run
into:
...
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \
  -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O0 \
  -DGOMP_NVPTX_JIT=-O0 execution test
...

Fix this by further limiting recursion depth in the test-cases for nvptx.

Furthermore, make the recursion depth limiting nvptx-specific.
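
As a rough illustration of why the argument matters: the naive recursive
Fibonacci used by these test-cases nests as many live call frames as its
argument, so lowering the depth from 25 to 20 should lower the peak
per-thread stack use accordingly.  The following is a minimal host-only
sketch in C, not part of the patch; fib_depth, fib_max_depth and max_depth
are hypothetical names introduced only for this example.

```c
#include <assert.h>

/* Deepest call level seen so far (hypothetical bookkeeping, for
   illustration only).  */
int max_depth;

/* Naive Fibonacci with the same recursion shape as the test-cases;
   DEPTH is the nesting level of the current call.  */
int
fib_depth (int n, int depth)
{
  if (depth > max_depth)
    max_depth = depth;
  if (n <= 0)
    return 0;
  if (n == 1)
    return 1;
  return fib_depth (n - 1, depth + 1) + fib_depth (n - 2, depth + 1);
}

/* Run fib (N) and return how many frames were live at the deepest point.  */
int
fib_max_depth (int n)
{
  assert (n >= 1);
  max_depth = 0;
  (void) fib_depth (n, 1);
  return max_depth;
}
```

In this sketch fib_max_depth (n) equals n for n >= 1, because the chain
fib (n), fib (n - 1), ..., fib (1) is the longest path in the call tree.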

Tested on x86_64 with nvptx accelerator.

Any comments?

Thanks,
- Tom

[libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

libgomp/ChangeLog:

2022-04-01  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Define
	and use REC_DEPTH.
	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

---
 .../libgomp.fortran/examples-4/declare_target-1.f90  | 18 +++++++++++++-----
 .../libgomp.fortran/examples-4/declare_target-2.f90  | 20 ++++++++++++++------
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
index b761979ecde..03c5c53ed67 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
@@ -1,4 +1,16 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp" }
+! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+! Nvidia Titan V.
+! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+! Reduced from 22 to 20, otherwise execution runs out of thread stack on
+! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
+! { dg-additional-options "-DREC_DEPTH=20" { target { offload_target_nvptx } } } */
+
+#ifndef REC_DEPTH
+#define REC_DEPTH 25
+#endif
 
 module e_53_1_mod
   integer :: THRESHOLD = 20
@@ -27,9 +39,5 @@ end module
 program e_53_1
   use e_53_1_mod, only : fib, fib_wrapper
   if (fib (15) /= fib_wrapper (15)) stop 1
-  ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
-  ! Nvidia Titan V.
-  ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
-  ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
-  if (fib (22) /= fib_wrapper (22)) stop 2
+  if (fib (REC_DEPTH) /= fib_wrapper (REC_DEPTH)) stop 2
 end program
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
index f576c25ba39..0e8bea578a8 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
@@ -1,16 +1,24 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp" }
+! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+! Nvidia Titan V.
+! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+! Reduced from 22 to 18, otherwise execution runs out of thread stack on
+! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
+! { dg-additional-options "-DREC_DEPTH=18" { target { offload_target_nvptx } } } */
+
+#ifndef REC_DEPTH
+#define REC_DEPTH 25
+#endif
 
 program e_53_2
   !$omp declare target (fib)
   integer :: x, fib
   !$omp target map(from: x)
-    ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
-    ! Nvidia Titan V.
-    ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
-    ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
-    x = fib (22)
+    x = fib (REC_DEPTH)
   !$omp end target
-  if (x /= fib (22)) stop 1
+  if (x /= fib (REC_DEPTH)) stop 1
 end program
 
 integer recursive function fib (n) result (f)


* Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90
  2022-04-01 11:24 [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90 Tom de Vries
@ 2022-04-01 11:26 ` Jakub Jelinek
  2022-04-01 12:28 ` Thomas Schwinge
  1 sibling, 0 replies; 9+ messages in thread
From: Jakub Jelinek @ 2022-04-01 11:26 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Thomas Schwinge

On Fri, Apr 01, 2022 at 01:24:40PM +0200, Tom de Vries wrote:
> Hi,
> 
> When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on
> an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run
> into:
> ...
> FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \
>   -DGOMP_NVPTX_JIT=-O0 execution test
> FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O0 \
>   -DGOMP_NVPTX_JIT=-O0 execution test
> ...
> 
> Fix this by further limiting recursion depth in the test-cases for nvptx.
> 
> Furthermore, make the recursion depth limiting nvptx-specific.
> 
> Tested on x86_64 with nvptx accelerator.
> 
> Any comments?
> 
> Thanks,
> - Tom
> 
> [libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90
> 
> libgomp/ChangeLog:
> 
> 2022-04-01  Tom de Vries  <tdevries@suse.de>
> 
> 	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Define
> 	and use REC_DEPTH.
> 	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

Ok.

	Jakub



* Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90
  2022-04-01 11:24 [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90 Tom de Vries
  2022-04-01 11:26 ` Jakub Jelinek
@ 2022-04-01 12:28 ` Thomas Schwinge
  2022-04-01 15:34   ` Tom de Vries
  1 sibling, 1 reply; 9+ messages in thread
From: Thomas Schwinge @ 2022-04-01 12:28 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Jakub Jelinek, gcc-patches

Hi Tom!

On 2022-04-01T13:24:40+0200, Tom de Vries <tdevries@suse.de> wrote:
> When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on
> an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run
> into:
> ...
> FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \
>   -DGOMP_NVPTX_JIT=-O0 execution test
> FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O0 \
>   -DGOMP_NVPTX_JIT=-O0 execution test
> ...
>
> Fix this by further limiting recursion depth in the test-cases for nvptx.
>
> Furthermore, make the recursion depth limiting nvptx-specific.

Careful:

> --- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
> +++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
> @@ -1,4 +1,16 @@
>  ! { dg-do run }
> +! { dg-additional-options "-cpp" }
> +! Reduced from 25 to 23, otherwise execution runs out of thread stack on
> +! Nvidia Titan V.
> +! Reduced from 23 to 22, otherwise execution runs out of thread stack on
> +! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
> +! Reduced from 22 to 20, otherwise execution runs out of thread stack on
> +! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
> +! { dg-additional-options "-DREC_DEPTH=20" { target { offload_target_nvptx } } } */

'offload_target_nvptx' doesn't mean that offloading execution is done on
nvptx, but rather that we're "*compiling* for offload target nvptx"
(emphasis mine).  That means, with such a change we're now getting
different behavior in a system with an AMD GPU, when using a toolchain
that only has GCN offloading configured vs. a toolchain that has GCN and
nvptx offloading configured.  This isn't going to cause any real
problems, of course, but it's confusing, and a bad example of
'offload_target_nvptx'.

'offload_device_nvptx' ought to work: "using nvptx offload device".

But to keep things simple, I again suggest unconditionally reducing the
recursion depth for all configurations, unless there exists an
actual rationale for the original value.


Grüße
 Thomas


> +
> +#ifndef REC_DEPTH
> +#define REC_DEPTH 25
> +#endif
>
>  module e_53_1_mod
>    integer :: THRESHOLD = 20
> @@ -27,9 +39,5 @@ end module
>  program e_53_1
>    use e_53_1_mod, only : fib, fib_wrapper
>    if (fib (15) /= fib_wrapper (15)) stop 1
> -  ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
> -  ! Nvidia Titan V.
> -  ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
> -  ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
> -  if (fib (22) /= fib_wrapper (22)) stop 2
> +  if (fib (REC_DEPTH) /= fib_wrapper (REC_DEPTH)) stop 2
>  end program

> --- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
> +++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
> @@ -1,16 +1,24 @@
>  ! { dg-do run }
> +! { dg-additional-options "-cpp" }
> +! Reduced from 25 to 23, otherwise execution runs out of thread stack on
> +! Nvidia Titan V.
> +! Reduced from 23 to 22, otherwise execution runs out of thread stack on
> +! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
> +! Reduced from 22 to 18, otherwise execution runs out of thread stack on
> +! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
> +! { dg-additional-options "-DREC_DEPTH=18" { target { offload_target_nvptx } } } */
> +
> +#ifndef REC_DEPTH
> +#define REC_DEPTH 25
> +#endif
>
>  program e_53_2
>    !$omp declare target (fib)
>    integer :: x, fib
>    !$omp target map(from: x)
> -    ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
> -    ! Nvidia Titan V.
> -    ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
> -    ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
> -    x = fib (22)
> +    x = fib (REC_DEPTH)
>    !$omp end target
> -  if (x /= fib (22)) stop 1
> +  if (x /= fib (REC_DEPTH)) stop 1
>  end program
>
>  integer recursive function fib (n) result (f)

* Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90
  2022-04-01 12:28 ` Thomas Schwinge
@ 2022-04-01 15:34   ` Tom de Vries
  2022-04-01 15:38     ` Jakub Jelinek
  0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2022-04-01 15:34 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Jakub Jelinek, gcc-patches

On 4/1/22 14:28, Thomas Schwinge wrote:
> Hi Tom!
> 
> On 2022-04-01T13:24:40+0200, Tom de Vries <tdevries@suse.de> wrote:
>> When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on
>> an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run
>> into:
>> ...
>> FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \
>>    -DGOMP_NVPTX_JIT=-O0 execution test
>> FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O0 \
>>    -DGOMP_NVPTX_JIT=-O0 execution test
>> ...
>>
>> Fix this by further limiting recursion depth in the test-cases for nvptx.
>>
>> Furthermore, make the recursion depth limiting nvptx-specific.
> 
> Careful:
> 
>> --- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
>> +++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
>> @@ -1,4 +1,16 @@
>>   ! { dg-do run }
>> +! { dg-additional-options "-cpp" }
>> +! Reduced from 25 to 23, otherwise execution runs out of thread stack on
>> +! Nvidia Titan V.
>> +! Reduced from 23 to 22, otherwise execution runs out of thread stack on
>> +! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
>> +! Reduced from 22 to 20, otherwise execution runs out of thread stack on
>> +! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
>> +! { dg-additional-options "-DREC_DEPTH=20" { target { offload_target_nvptx } } } */
> 
> 'offload_target_nvptx' doesn't mean that offloading execution is done on
> nvptx, but rather that we're "*compiling* for offload target nvptx"
> (emphasis mine).  That means, with such a change we're now getting
> different behavior in a system with an AMD GPU, when using a toolchain
> that only has GCN offloading configured vs. a toolchain that has GCN and
> nvptx offloading configured.  This isn't going to cause any real
> problems, of course, but it's confusing, and a bad example of
> 'offload_target_nvptx'.
> 
> 'offload_device_nvptx' ought to work: "using nvptx offload device".
> 

Thanks for pointing that out.

I tried to understand this multiple offloading configuration a bit, and
came up with the following mental model: it's possible to have a host
with, say, an nvptx and an amd offloading device, and then configure and
build a toolchain that can generate a single executable that can offload
to either device, depending on the value of the appropriate OpenACC/OpenMP
environment variables.

So, in principle the libgomp testsuite could have a mode in which it 
does that: run the same executable twice, once for each offloading 
device.  In that case, even using offload_device_nvptx would not be 
accurate enough, and we'd need to test for offload device type at 
runtime, as used to be done in 
libgomp/testsuite/libgomp.fortran/task-detach-6.f90.

I've tried to copy that setup to 
libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90, but 
that doesn't seem to work anymore.  I've also tried copying that 
test-case to 
libgomp/testsuite/libgomp.fortran/copy-of-declare_target-1.f90 to rule 
out any subdir-related problems, but no luck there either.

Attached is that copy approach, could you try it out and see if it works 
for you?

Do you perhaps have an idea why it's failing?

I can make a patch using offload_device_nvptx, but I'd prefer to 
understand first why the approach above isn't working.

Thanks,
- Tom

[-- Attachment #2: 0001-libgomp-testsuite-Add-libgomp.fortran-copy-of-declare_target-1.f90.patch --]
[-- Type: text/x-patch, Size: 1864 bytes --]

[libgomp/testsuite] Add libgomp.fortran/copy-of-declare_target-1.f90

---
 .../libgomp.fortran/copy-of-declare_target-1.f90   | 49 ++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/libgomp/testsuite/libgomp.fortran/copy-of-declare_target-1.f90 b/libgomp/testsuite/libgomp.fortran/copy-of-declare_target-1.f90
new file mode 100644
index 00000000000..6dcf5312070
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/copy-of-declare_target-1.f90
@@ -0,0 +1,49 @@
+! { dg-do run }
+! { dg-additional-sources on_device_arch.c }
+
+module e_53_1_mod
+  integer :: THRESHOLD = 20
+contains
+  integer recursive function fib (n) result (f)
+    !$omp declare target
+    integer :: n
+    if (n <= 0) then
+      f = 0
+    else if (n == 1) then
+      f = 1
+    else
+      f = fib (n - 1) + fib (n - 2)
+    end if
+  end function
+
+  integer function fib_wrapper (n)
+    integer :: x
+    !$omp target map(to: n) map(from: x) if(n > THRESHOLD)
+      x = fib (n)
+    !$omp end target
+    fib_wrapper = x
+  end function
+end module
+
+program e_53_1
+  use e_53_1_mod, only : fib, fib_wrapper
+  integer :: REC_DEPTH = 25
+
+  interface
+    integer function on_device_arch_nvptx() bind(C)
+    end function on_device_arch_nvptx
+  end interface
+
+  if (on_device_arch_nvptx () /= 0) then
+     ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+     ! Nvidia Titan V.
+     ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+     ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+     ! Reduced from 22 to 20, otherwise execution runs out of thread stack on
+     ! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
+     REC_DEPTH = 20
+  end if
+
+  if (fib (15) /= fib_wrapper (15)) stop 1
+  if (fib (REC_DEPTH) /= fib_wrapper (REC_DEPTH)) stop 2
+end program


* Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90
  2022-04-01 15:34   ` Tom de Vries
@ 2022-04-01 15:38     ` Jakub Jelinek
  2022-04-01 15:57       ` Tom de Vries
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Jelinek @ 2022-04-01 15:38 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Thomas Schwinge, gcc-patches

On Fri, Apr 01, 2022 at 05:34:50PM +0200, Tom de Vries wrote:
> Do you perhaps have an idea why it's failing?

Because you call on_device_arch_nvptx () outside of
!$omp target region, so unless the host device is NVPTX,
it will not be true.

> +program e_53_1
> +  use e_53_1_mod, only : fib, fib_wrapper
> +  integer :: REC_DEPTH = 25
> +
> +  interface
> +    integer function on_device_arch_nvptx() bind(C)
> +    end function on_device_arch_nvptx
> +  end interface
> +
> +  if (on_device_arch_nvptx () /= 0) then
> +     ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
> +     ! Nvidia Titan V.
> +     ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
> +     ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
> +     ! Reduced from 22 to 20, otherwise execution runs out of thread stack on
> +     ! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
> +     REC_DEPTH = 20
> +  end if
> +
> +  if (fib (15) /= fib_wrapper (15)) stop 1
> +  if (fib (REC_DEPTH) /= fib_wrapper (REC_DEPTH)) stop 2
> +end program

	Jakub



* Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90
  2022-04-01 15:38     ` Jakub Jelinek
@ 2022-04-01 15:57       ` Tom de Vries
  2022-04-04 11:05         ` Tom de Vries
  0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2022-04-01 15:57 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches

On 4/1/22 17:38, Jakub Jelinek wrote:
> On Fri, Apr 01, 2022 at 05:34:50PM +0200, Tom de Vries wrote:
>> Do you perhaps have an idea why it's failing?
> 
> Because you call on_device_arch_nvptx () outside of
> !$omp target region, so unless the host device is NVPTX,
> it will not be true.
> 

That bit does work, because on_device_arch_nvptx calls on_device_arch
which contains the omp target bit:
...
static int
on_device_arch (int d)
{
   int d_cur;
   #pragma omp target map(from:d_cur)
   d_cur = device_arch ();

   return d_cur == d;
}

int
on_device_arch_nvptx ()
{
   return on_device_arch (GOMP_DEVICE_NVIDIA_PTX);
}
...
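
As a self-contained illustration of the pattern quoted above, here is a
minimal sketch in C.  It is not the testsuite's on_device_arch.h: the enum
values, the device_arch stub (which always reports the host) and
choose_rec_depth are hypothetical stand-ins.  The point it tries to show is
that the wrapper opens the target region itself, so plain host code can call
it and then pick the recursion depth at run time.

```c
#include <assert.h>

/* Hypothetical architecture codes, standing in for the GOMP_DEVICE_*
   constants used by the real helper.  */
enum arch { ARCH_HOST = 0, ARCH_NVPTX = 1 };

/* Hypothetical stub: always reports the host.  The real helper is expected
   to report the architecture the target region actually runs on.  */
int
device_arch (void)
{
  return ARCH_HOST;
}

/* Same shape as the quoted helper: the target region lives inside the
   wrapper, so callers do not need a target region of their own.  */
int
on_device_arch (int d)
{
  int d_cur;
  #pragma omp target map(from:d_cur)
  d_cur = device_arch ();

  return d_cur == d;
}

/* Pick the recursion depth at run time instead of via a dg directive.  */
int
choose_rec_depth (int default_depth, int nvptx_depth)
{
  assert (nvptx_depth <= default_depth);
  return on_device_arch (ARCH_NVPTX) ? nvptx_depth : default_depth;
}
```

With the stub above, choose_rec_depth (25, 20) yields 25; built without
OpenMP support the pragma is simply ignored and the region runs on the host.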

So I realized that I didn't do a good job of specifying the problem I 
encountered, and went looking at it, at which point I realized the error 
message had changed, and knew how to fix it ... So, my apologies, some 
confusion on my part.

Anyway, the attached patch avoids any nvptx-related tcl directives (just for
one test-case for now).  To me, this seems the most robust solution.

Is this approach acceptable?

Thanks,
- Tom

[-- Attachment #2: 0001-libgomp-testsuite-Fix-libgomp.fortran-examples-4-declare_target-1.f90.patch --]
[-- Type: text/x-patch, Size: 2648 bytes --]

[libgomp/testsuite] Fix libgomp.fortran/examples-4/declare_target-1.f90

---
 .../examples-4/declare_target-1.f90                | 31 +++++++++++++---------
 .../libgomp.fortran/examples-4/on_device_arch.c    |  3 +++
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
index 03c5c53ed67..acded20f756 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
@@ -1,16 +1,6 @@
 ! { dg-do run }
-! { dg-additional-options "-cpp" }
-! Reduced from 25 to 23, otherwise execution runs out of thread stack on
-! Nvidia Titan V.
-! Reduced from 23 to 22, otherwise execution runs out of thread stack on
-! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
-! Reduced from 22 to 20, otherwise execution runs out of thread stack on
-! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
-! { dg-additional-options "-DREC_DEPTH=20" { target { offload_target_nvptx } } } */
-
-#ifndef REC_DEPTH
-#define REC_DEPTH 25
-#endif
+! { dg-additional-sources on_device_arch.c }
+! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
 
 module e_53_1_mod
   integer :: THRESHOLD = 20
@@ -38,6 +28,23 @@ end module
 
 program e_53_1
   use e_53_1_mod, only : fib, fib_wrapper
+  integer :: REC_DEPTH = 25
+
+  interface
+    integer function on_device_arch_nvptx() bind(C)
+    end function on_device_arch_nvptx
+  end interface
+
+  if (on_device_arch_nvptx () /= 0) then
+     ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+     ! Nvidia Titan V.
+     ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+     ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+     ! Reduced from 22 to 20, otherwise execution runs out of thread stack on
+     ! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
+     REC_DEPTH = 20
+  end if
+
   if (fib (15) /= fib_wrapper (15)) stop 1
   if (fib (REC_DEPTH) /= fib_wrapper (REC_DEPTH)) stop 2
 end program
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/on_device_arch.c b/libgomp/testsuite/libgomp.fortran/examples-4/on_device_arch.c
new file mode 100644
index 00000000000..f8bef19e021
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/on_device_arch.c
@@ -0,0 +1,3 @@
+/* Auxiliar file.  */
+/* { dg-do compile  { target skip-all-targets } } */
+#include "../../libgomp.c-c++-common/on_device_arch.h"


* Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90
  2022-04-01 15:57       ` Tom de Vries
@ 2022-04-04 11:05         ` Tom de Vries
  2022-04-04 11:07           ` Jakub Jelinek
  0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2022-04-04 11:05 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches

On 4/1/22 17:57, Tom de Vries wrote:
> On 4/1/22 17:38, Jakub Jelinek wrote:
>> On Fri, Apr 01, 2022 at 05:34:50PM +0200, Tom de Vries wrote:
>>> Do you perhaps have an idea why it's failing?
>>
>> Because you call on_device_arch_nvptx () outside of
>> !$omp target region, so unless the host device is NVPTX,
>> it will not be true.
>>
> 
> That bit does work, because on_device_arch_nvptx calls on_device_arch
> which contains the omp target bit:
> ...
> static int
> on_device_arch (int d)
> {
>    int d_cur;
>    #pragma omp target map(from:d_cur)
>    d_cur = device_arch ();
> 
>    return d_cur == d;
> }
> 
> int
> on_device_arch_nvptx ()
> {
>    return on_device_arch (GOMP_DEVICE_NVIDIA_PTX);
> }
> ...
> 
> So I realized that I didn't do a good job of specifying the problem I 
> encountered, and went looking at it, at which point I realized the error 
> message had changed, and knew how to fix it ... So, my apologies, some 
> confusion on my part.
> 
> Anyway, the attached patch avoids any nvptx-related tcl directives (just for
> one test-case for now).  To me, this seems the most robust solution.
> 
> Is this approach acceptable?

I intend to commit this in a few days, unless there are objections.

Thanks,
- Tom

[-- Attachment #2: 0001-libgomp-testsuite-Fix-libgomp.fortran-examples-4-declare_target-1-2-.f90.patch --]
[-- Type: text/x-patch, Size: 5335 bytes --]

[libgomp/testsuite] Fix libgomp.fortran/examples-4/declare_target-{1,2}.f90

The test-cases libgomp.fortran/examples-4/declare_target-{1,2}.f90 mean to
set an nvptx-specific limit using offload_target_nvptx, but also change
behaviour for amd.

That is, there is now a difference in behaviour between:
- a compiler configured for GCN offloading, and
- a compiler configured for both GCN and nvptx offloading.

Fix this by using on_device_arch_nvptx instead.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-04-04  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.fortran/examples-4/on_device_arch.c: Copy from
	parent dir.
	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Use
	on_device_arch_nvptx instead of offload_target_nvptx.
	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

---
 .../examples-4/declare_target-1.f90                | 31 +++++++++++++---------
 .../examples-4/declare_target-2.f90                | 31 +++++++++++++---------
 .../libgomp.fortran/examples-4/on_device_arch.c    |  3 +++
 3 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
index 03c5c53ed67..acded20f756 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
@@ -1,16 +1,6 @@
 ! { dg-do run }
-! { dg-additional-options "-cpp" }
-! Reduced from 25 to 23, otherwise execution runs out of thread stack on
-! Nvidia Titan V.
-! Reduced from 23 to 22, otherwise execution runs out of thread stack on
-! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
-! Reduced from 22 to 20, otherwise execution runs out of thread stack on
-! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
-! { dg-additional-options "-DREC_DEPTH=20" { target { offload_target_nvptx } } } */
-
-#ifndef REC_DEPTH
-#define REC_DEPTH 25
-#endif
+! { dg-additional-sources on_device_arch.c }
+! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
 
 module e_53_1_mod
   integer :: THRESHOLD = 20
@@ -38,6 +28,23 @@ end module
 
 program e_53_1
   use e_53_1_mod, only : fib, fib_wrapper
+  integer :: REC_DEPTH = 25
+
+  interface
+    integer function on_device_arch_nvptx() bind(C)
+    end function on_device_arch_nvptx
+  end interface
+
+  if (on_device_arch_nvptx () /= 0) then
+     ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+     ! Nvidia Titan V.
+     ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+     ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+     ! Reduced from 22 to 20, otherwise execution runs out of thread stack on
+     ! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
+     REC_DEPTH = 20
+  end if
+
   if (fib (15) /= fib_wrapper (15)) stop 1
   if (fib (REC_DEPTH) /= fib_wrapper (REC_DEPTH)) stop 2
 end program
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
index 0e8bea578a8..27a5cec2e9d 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
@@ -1,20 +1,27 @@
 ! { dg-do run }
-! { dg-additional-options "-cpp" }
-! Reduced from 25 to 23, otherwise execution runs out of thread stack on
-! Nvidia Titan V.
-! Reduced from 23 to 22, otherwise execution runs out of thread stack on
-! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
-! Reduced from 22 to 18, otherwise execution runs out of thread stack on
-! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
-! { dg-additional-options "-DREC_DEPTH=18" { target { offload_target_nvptx } } } */
-
-#ifndef REC_DEPTH
-#define REC_DEPTH 25
-#endif
+! { dg-additional-sources on_device_arch.c }
+! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
 
 program e_53_2
   !$omp declare target (fib)
   integer :: x, fib
+  integer :: REC_DEPTH = 25
+
+  interface
+    integer function on_device_arch_nvptx() bind(C)
+    end function on_device_arch_nvptx
+  end interface
+
+  if (on_device_arch_nvptx () /= 0) then
+     ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+     ! Nvidia Titan V.
+     ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+     ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+     ! Reduced from 22 to 18, otherwise execution runs out of thread stack on
+     ! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
+     REC_DEPTH = 18
+  end if
+
   !$omp target map(from: x)
     x = fib (REC_DEPTH)
   !$omp end target
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/on_device_arch.c b/libgomp/testsuite/libgomp.fortran/examples-4/on_device_arch.c
new file mode 100644
index 00000000000..f8bef19e021
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/on_device_arch.c
@@ -0,0 +1,3 @@
+/* Auxiliar file.  */
+/* { dg-do compile  { target skip-all-targets } } */
+#include "../../libgomp.c-c++-common/on_device_arch.h"


* Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90
  2022-04-04 11:05         ` Tom de Vries
@ 2022-04-04 11:07           ` Jakub Jelinek
  2022-04-04 11:37             ` Tom de Vries
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Jelinek @ 2022-04-04 11:07 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Thomas Schwinge, gcc-patches

On Mon, Apr 04, 2022 at 01:05:12PM +0200, Tom de Vries wrote:
> 2022-04-04  Tom de Vries  <tdevries@suse.de>
> 
> 	* testsuite/libgomp.fortran/examples-4/on_device_arch.c: Copy from
> 	parent dir.

Wouldn't just ! { dg-additional-sources ../on_device_arch.c }
work?

	Jakub



* Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90
  2022-04-04 11:07           ` Jakub Jelinek
@ 2022-04-04 11:37             ` Tom de Vries
  0 siblings, 0 replies; 9+ messages in thread
From: Tom de Vries @ 2022-04-04 11:37 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches

On 4/4/22 13:07, Jakub Jelinek wrote:
> On Mon, Apr 04, 2022 at 01:05:12PM +0200, Tom de Vries wrote:
>> 2022-04-04  Tom de Vries  <tdevries@suse.de>
>>
>> 	* testsuite/libgomp.fortran/examples-4/on_device_arch.c: Copy from
>> 	parent dir.
> 
> Wouldn't just ! { dg-additional-sources ../on_device_arch.c }
> work?

It does, pushed with that update.

Thanks,
- Tom

